[RFC - 0/13] NTP cleanup work (v. B5)

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [RFC - 0/13] NTP cleanup work (v. B5)
@ 2005-08-11  1:21 john stultz
  2005-08-11  1:23 ` [RFC][PATCH - 1/13] NTP cleanup: Move NTP code into ntp.c john stultz
                   ` (2 more replies)
  0 siblings, 3 replies; 71+ messages in thread
From: john stultz @ 2005-08-11  1:21 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, zippel, Ulrich Windl

All,
	The goal of this patch set is to isolate the in kernel NTP state
machine in the hope of simplifying the current timekeeping code and
allowing for optional future changes in the timekeeping subsystem.

I've tried to address some of the complexity concerns for systems that
do not have a continuous timesource, preserving the existing behavior
while still providing a ppm interface allowing smooth adjustments to
continuous timesources. 

Patches 1-10 and patch 13 should be fairly straight forward only moving
and cleaning up various bits of code.

Patches 11 and 12 are somewhat more functional changes and should be
reviewed more carefully. Especially by someone who knows the PPC64
ppc_adjtimex() function in depth.

I've lightly tested this code on i386 and it seems to work properly. I'm
currently working to get more thorough testing done, but I wanted to get
it out there for review and hopefully a bit of testing.

Please let me know if you have any feedback or suggestions.

thanks
-john

^ permalink raw reply	[flat|nested] 71+ messages in thread

* [RFC][PATCH - 1/13] NTP cleanup: Move NTP code into ntp.c
  2005-08-11  1:21 [RFC - 0/13] NTP cleanup work (v. B5) john stultz
@ 2005-08-11  1:23 ` john stultz
  2005-08-11  1:25   ` [RFC][PATCH - 2/13] NTP cleanup: Move arches to new ntp interfaces john stultz
  2005-08-11  2:13 ` [RFC - 0/9] Generic timekeeping subsystem (v. B5) john stultz
  2005-08-15 22:12 ` [RFC - 0/13] NTP cleanup work " Roman Zippel
  2 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  1:23 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,
	This patch moves the generic NTP code from time.c and timer.c into
ntp.c. It makes most of the NTP variables static providing more
understandable interfaces like ntp_synced() and ntp_clear(). 
	
Since some of the newly made static variables are used in arch generic
code, this patch alone will not compile. Thus this patch requires part 2
of the series which fixes the arch specific uses of the newly static
variables.

Any comments or feedback would be greatly appreciated.

thanks
-john


linux-2.6.13-rc6_timeofday-ntp-part1_B5.patch
============================================
diff --git a/include/linux/ntp.h b/include/linux/ntp.h
new file mode 100644
--- /dev/null
+++ b/include/linux/ntp.h
@@ -0,0 +1,31 @@
+/*  linux/include/linux/ntp.h
+ *
+ *  This file NTP state machine accessor functions.
+ */
+
+#ifndef _LINUX_NTP_H
+#define _LINUX_NTP_H
+#include <linux/types.h>
+#include <linux/time.h>
+#include <linux/timex.h>
+
+/* NTP state machine interfaces */
+void ntp_advance(void);
+int ntp_adjtimex(struct timex*);
+void second_overflow(void);
+void ntp_clear(void);
+int ntp_synced(void);
+long ntp_get_fixed_ns_adjustment(void);
+
+
+extern int tickadj;
+extern long time_adjust;
+
+/* Due to ppc64 having its own NTP  code,
+ * these variables cannot be made static just yet
+ */
+extern long time_offset;
+extern long time_freq;
+extern long time_constant;
+
+#endif
diff --git a/include/linux/timex.h b/include/linux/timex.h
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -226,39 +226,6 @@ struct timex {
  */
 extern unsigned long tick_usec;		/* USER_HZ period (usec) */
 extern unsigned long tick_nsec;		/* ACTHZ          period (nsec) */
-extern int tickadj;			/* amount of adjustment per tick */
-
-/*
- * phase-lock loop variables
- */
-extern int time_state;		/* clock status */
-extern int time_status;		/* clock synchronization status bits */
-extern long time_offset;	/* time adjustment (us) */
-extern long time_constant;	/* pll time constant */
-extern long time_tolerance;	/* frequency tolerance (ppm) */
-extern long time_precision;	/* clock precision (us) */
-extern long time_maxerror;	/* maximum error */
-extern long time_esterror;	/* estimated error */
-
-extern long time_freq;		/* frequency offset (scaled ppm) */
-extern long time_reftime;	/* time at last adjustment (s) */
-
-extern long time_adjust;	/* The amount of adjtime left */
-extern long time_next_adjust;	/* Value for time_adjust at next tick */
-
-/* interface variables pps->timer interrupt */
-extern long pps_offset;		/* pps time offset (us) */
-extern long pps_jitter;		/* time dispersion (jitter) (us) */
-extern long pps_freq;		/* frequency offset (scaled ppm) */
-extern long pps_stabil;		/* frequency dispersion (scaled ppm) */
-extern long pps_valid;		/* pps signal watchdog counter */
-
-/* interface variables pps->adjtimex */
-extern int pps_shift;		/* interval duration (s) (shift) */
-extern long pps_jitcnt;		/* jitter limit exceeded */
-extern long pps_calcnt;		/* calibration intervals */
-extern long pps_errcnt;		/* calibration errors */
-extern long pps_stbcnt;		/* stability limit exceeded */
 
 #ifdef CONFIG_TIME_INTERPOLATION
 
diff --git a/kernel/Makefile b/kernel/Makefile
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -7,7 +7,7 @@ obj-y     = sched.o fork.o exec_domain.o
 	    sysctl.o capability.o ptrace.o timer.o user.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
 	    rcupdate.o intermodule.o extable.o params.o posix-timers.o \
-	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o
+	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o ntp.o
 
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
diff --git a/kernel/ntp.c b/kernel/ntp.c
new file mode 100644
--- /dev/null
+++ b/kernel/ntp.c
@@ -0,0 +1,490 @@
+/********************************************************************
+* linux/kernel/ntp.c
+*
+* NTP state machine and time scaling code.
+*
+* Code moved from kernel/time.c and kernel/timer.c
+* Please see those files for original copyrights.
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+*
+* Notes:
+*
+* Hopefully you should never have to understand or touch
+* any of the code below. but don't let that keep you from trying!
+*
+* This code is loosely based on David Mills' RFC 1589 and its
+* updates. Please see the following for more details:
+*  http://www.eecis.udel.edu/~mills/database/rfc/rfc1589.txt
+*  http://www.eecis.udel.edu/~mills/database/reports/kern/kernb.pdf
+*
+* NOTE:	To simplify the code, we do not implement any of
+* the PPS code, as the code that uses it never was merged.
+*                             -johnstul@us.ibm.com
+*
+*********************************************************************/
+
+#include <linux/ntp.h>
+#include <linux/jiffies.h>
+#include <linux/errno.h>
+
+#ifdef CONFIG_TIME_INTERPOLATION
+void time_interpolator_update(long delta_nsec);
+#else
+#define time_interpolator_update(x) do {} while (0)
+#endif
+
+
+static long pps_offset;            /* pps time offset (us) */
+static long pps_jitter = MAXTIME;  /* time dispersion (jitter) (us) */
+
+static long pps_freq;              /* frequency offset (scaled ppm) */
+static long pps_stabil = MAXFREQ;  /* frequency dispersion (scaled ppm) */
+
+static long pps_valid = PPS_VALID; /* pps signal watchdog counter */
+
+static int pps_shift = PPS_SHIFT;  /* interval duration (s) (shift) */
+
+static long pps_jitcnt;            /* jitter limit exceeded */
+static long pps_calcnt;            /* calibration intervals */
+static long pps_errcnt;            /* calibration errors */
+static long pps_stbcnt;            /* stability limit exceeded */
+
+/* Don't completely fail for HZ > 500.  */
+int tickadj = 500/HZ ? : 1;		/* microsecs */
+
+
+/*
+ * phase-lock loop variables
+ */
+/* TIME_ERROR prevents overwriting the CMOS clock */
+static int time_state = TIME_OK;              /* clock synchronization status */
+static int time_status = STA_UNSYNC;          /* clock status bits */
+long time_offset;                      /* time adjustment (us) */
+long time_constant = 2;                /* pll time constant */
+static long time_tolerance = MAXFREQ;         /* frequency tolerance (ppm) */
+static long time_precision = 1;               /* clock precision (us) */
+static long time_maxerror = NTP_PHASE_LIMIT;  /* maximum error (us) */
+static long time_esterror = NTP_PHASE_LIMIT;  /* estimated error (us) */
+static long time_phase;                       /* phase offset (scaled us) */
+long time_freq = (((NSEC_PER_SEC + HZ/2) % HZ - HZ/2) << SHIFT_USEC) / NSEC_PER_USEC;
+                                        /* frequency offset (scaled ppm) */
+static long time_adj;                   /* tick adjust (scaled 1 / HZ) */
+static long time_reftime;                      /* time at last adjustment (s) */
+long time_adjust;
+static long time_next_adjust;
+
+static long fixed_tick_ns_adj;
+
+void ntp_advance(void)
+{
+	long time_adjust_step;
+
+	if ( (time_adjust_step = time_adjust) != 0 ) {
+	    /* We are doing an adjtime thing.
+	     *
+	     * Prepare time_adjust_step to be within bounds.
+	     * Note that a positive time_adjust means we want the clock
+	     * to run faster.
+	     *
+	     * Limit the amount of the step to be in the range
+	     * -tickadj .. +tickadj
+	     */
+		if (time_adjust > tickadj)
+			time_adjust_step = tickadj;
+		else if (time_adjust < -tickadj)
+			time_adjust_step = -tickadj;
+
+	    /* Reduce by this step the amount of time left  */
+	    time_adjust -= time_adjust_step;
+	}
+	fixed_tick_ns_adj = time_adjust_step * 1000;
+
+	/*
+	 * Advance the phase, once it gets to one microsecond, then
+	 * advance the tick more.
+	 */
+	time_phase += time_adj;
+	if (time_phase <= -FINENSEC) {
+		long ltemp = -time_phase >> (SHIFT_SCALE - 10);
+		time_phase += ltemp << (SHIFT_SCALE - 10);
+		fixed_tick_ns_adj -= ltemp;
+	} else if (time_phase >= FINENSEC) {
+		long ltemp = time_phase >> (SHIFT_SCALE - 10);
+		time_phase -= ltemp << (SHIFT_SCALE - 10);
+		fixed_tick_ns_adj += ltemp;
+	}
+
+	/* Changes by adjtime() do not take effect till next tick. */
+	if (time_next_adjust != 0) {
+		time_adjust = time_next_adjust;
+		time_next_adjust = 0;
+	}
+}
+
+
+/*
+ * this routine handles the overflow of the microsecond field
+ *
+ * The tricky bits of code to handle the accurate clock support
+ * were provided by Dave Mills (Mills@UDEL.EDU) of NTP fame.
+ * They were originally developed for SUN and DEC kernels.
+ * All the kudos should go to Dave for this stuff.
+ *
+ */
+void second_overflow(void)
+{
+	long ltemp;
+
+	/* Bump the maxerror field */
+	time_maxerror += time_tolerance >> SHIFT_USEC;
+	if ( time_maxerror > NTP_PHASE_LIMIT ) {
+		time_maxerror = NTP_PHASE_LIMIT;
+		time_status |= STA_UNSYNC;
+	}
+
+	/*
+	 * Leap second processing. If in leap-insert state at
+	 * the end of the day, the system clock is set back one
+	 * second; if in leap-delete state, the system clock is
+	 * set ahead one second. The microtime() routine or
+	 * external clock driver will insure that reported time
+	 * is always monotonic. The ugly divides should be
+	 * replaced.
+	 */
+	switch (time_state) {
+
+	case TIME_OK:
+		if (time_status & STA_INS)
+			time_state = TIME_INS;
+		else if (time_status & STA_DEL)
+			time_state = TIME_DEL;
+		break;
+
+	case TIME_INS:
+		if (xtime.tv_sec % 86400 == 0) {
+			xtime.tv_sec--;
+			wall_to_monotonic.tv_sec++;
+			/* The timer interpolator will make time change gradually instead
+			 * of an immediate jump by one second.
+			 */
+			time_interpolator_update(-NSEC_PER_SEC);
+			time_state = TIME_OOP;
+			clock_was_set();
+			printk(KERN_NOTICE "Clock: inserting leap second 23:59:60 UTC\n");
+		}
+		break;
+
+	case TIME_DEL:
+		if ((xtime.tv_sec + 1) % 86400 == 0) {
+			xtime.tv_sec++;
+			wall_to_monotonic.tv_sec--;
+			/* Use of time interpolator for a gradual change of time */
+			time_interpolator_update(NSEC_PER_SEC);
+			time_state = TIME_WAIT;
+			clock_was_set();
+			printk(KERN_NOTICE "Clock: deleting leap second 23:59:59 UTC\n");
+		}
+		break;
+
+	case TIME_OOP:
+		time_state = TIME_WAIT;
+		break;
+
+	case TIME_WAIT:
+		if (!(time_status & (STA_INS | STA_DEL)))
+			time_state = TIME_OK;
+	}
+
+	/*
+	 * Compute the phase adjustment for the next second. In
+	 * PLL mode, the offset is reduced by a fixed factor
+	 * times the time constant. In FLL mode the offset is
+	 * used directly. In either mode, the maximum phase
+	 * adjustment for each second is clamped so as to spread
+	 * the adjustment over not more than the number of
+	 * seconds between updates.
+	 */
+	if (time_offset < 0) {
+		ltemp = -time_offset;
+		if (!(time_status & STA_FLL))
+			ltemp >>= SHIFT_KG + time_constant;
+		if (ltemp > (MAXPHASE / MINSEC) << SHIFT_UPDATE)
+			ltemp = (MAXPHASE / MINSEC) << SHIFT_UPDATE;
+		time_offset += ltemp;
+		time_adj = -ltemp << (SHIFT_SCALE - SHIFT_HZ - SHIFT_UPDATE);
+	} else {
+		ltemp = time_offset;
+		if (!(time_status & STA_FLL))
+			ltemp >>= SHIFT_KG + time_constant;
+		if (ltemp > (MAXPHASE / MINSEC) << SHIFT_UPDATE)
+			ltemp = (MAXPHASE / MINSEC) << SHIFT_UPDATE;
+		time_offset -= ltemp;
+		time_adj = ltemp << (SHIFT_SCALE - SHIFT_HZ - SHIFT_UPDATE);
+	}
+
+	/*
+	 * Compute the frequency estimate and additional phase
+	 * adjustment due to frequency error for the next
+	 * second. When the PPS signal is engaged, gnaw on the
+	 * watchdog counter and update the frequency computed by
+	 * the pll and the PPS signal.
+	 */
+	pps_valid++;
+	if (pps_valid == PPS_VALID) {	/* PPS signal lost */
+		pps_jitter = MAXTIME;
+		pps_stabil = MAXFREQ;
+		time_status &= ~(STA_PPSSIGNAL | STA_PPSJITTER |
+			STA_PPSWANDER | STA_PPSERROR);
+	}
+	ltemp = time_freq + pps_freq;
+	if (ltemp < 0)
+		time_adj -= -ltemp >> (SHIFT_USEC + SHIFT_HZ - SHIFT_SCALE);
+	else
+		time_adj += ltemp >> (SHIFT_USEC + SHIFT_HZ - SHIFT_SCALE);
+
+#if HZ == 100
+    /* Compensate for (HZ==100) != (1 << SHIFT_HZ).
+     * Add 25% and 3.125% to get 128.125; => only 0.125% error (p. 14)
+     */
+	if (time_adj < 0)
+		time_adj -= (-time_adj >> 2) + (-time_adj >> 5);
+	else
+		time_adj += (time_adj >> 2) + (time_adj >> 5);
+#endif
+#if HZ == 1000
+    /* Compensate for (HZ==1000) != (1 << SHIFT_HZ).
+     * Add 1.5625% and 0.78125% to get 1023.4375; => only 0.05% error (p. 14)
+     */
+	if (time_adj < 0)
+		time_adj -= (-time_adj >> 6) + (-time_adj >> 7);
+	else
+		time_adj += (time_adj >> 6) + (time_adj >> 7);
+#endif
+}
+
+/* adjtimex mainly allows reading (and writing, if superuser) of
+ * kernel time-keeping variables. used by xntpd.
+ */
+int ntp_adjtimex(struct timex *txc)
+{
+	long ltemp, mtemp, save_adjust;
+	int result;
+
+	/* Now we validate the data before disabling interrupts */
+
+	if ((txc->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT)
+		/* singleshot must not be used with any other mode bits */
+		if (txc->modes != ADJ_OFFSET_SINGLESHOT)
+			return -EINVAL;
+
+	if (txc->modes != ADJ_OFFSET_SINGLESHOT && (txc->modes & ADJ_OFFSET))
+		/* adjustment Offset limited to +- .512 seconds */
+		if (txc->offset <= - MAXPHASE || txc->offset >= MAXPHASE )
+			return -EINVAL;
+
+	/* if the quartz is off by more than 10% something is VERY wrong ! */
+	if (txc->modes & ADJ_TICK)
+		if (txc->tick <  900000/USER_HZ ||
+				txc->tick > 1100000/USER_HZ)
+			return -EINVAL;
+
+	write_seqlock_irq(&xtime_lock);
+	result = time_state;       /* mostly `TIME_OK' */
+
+	/* Save for later - semantics of adjtime is to return old value */
+	save_adjust = time_next_adjust ? time_next_adjust : time_adjust;
+
+	/* If there are input parameters, then process them */
+	if (txc->modes)	{
+		if (txc->modes & ADJ_STATUS)	/* only set allowed bits */
+			time_status =  (txc->status & ~STA_RONLY) |
+					(time_status & STA_RONLY);
+
+		if (txc->modes & ADJ_FREQUENCY) {	/* p. 22 */
+			if (txc->freq > MAXFREQ || txc->freq < -MAXFREQ) {
+				result = -EINVAL;
+				goto leave;
+			}
+			time_freq = txc->freq - pps_freq;
+		}
+
+		if (txc->modes & ADJ_MAXERROR) {
+			if (txc->maxerror < 0
+					|| txc->maxerror >= NTP_PHASE_LIMIT) {
+				result = -EINVAL;
+				goto leave;
+			}
+			time_maxerror = txc->maxerror;
+		}
+
+		if (txc->modes & ADJ_ESTERROR) {
+			if (txc->esterror < 0
+					|| txc->esterror >= NTP_PHASE_LIMIT) {
+				result = -EINVAL;
+				goto leave;
+			}
+			time_esterror = txc->esterror;
+		}
+
+		if (txc->modes & ADJ_TIMECONST) {	/* p. 24 */
+			if (txc->constant < 0) { /* NTP v4 uses values > 6 */
+				result = -EINVAL;
+				goto leave;
+			}
+			time_constant = txc->constant;
+		}
+
+		if (txc->modes & ADJ_OFFSET) { /* values checked earlier */
+			if (txc->modes == ADJ_OFFSET_SINGLESHOT) {
+				/* adjtime() is independent from ntp_adjtime() */
+				if ((time_next_adjust = txc->offset) == 0)
+					time_adjust = 0;
+			} else if ( time_status & (STA_PLL | STA_PPSTIME) ) {
+				ltemp = (time_status
+					& (STA_PPSTIME | STA_PPSSIGNAL))
+					== (STA_PPSTIME | STA_PPSSIGNAL) ?
+			            pps_offset : txc->offset;
+
+				/*
+				 * Scale the phase adjustment and
+				 * clamp to the operating range.
+				 */
+				if (ltemp > MAXPHASE)
+					time_offset = MAXPHASE << SHIFT_UPDATE;
+				else if (ltemp < -MAXPHASE)
+					time_offset = -(MAXPHASE
+							<< SHIFT_UPDATE);
+				else
+					time_offset = ltemp << SHIFT_UPDATE;
+
+				/*
+				 * Select whether the frequency is to be controlled
+				 * and in which mode (PLL or FLL). Clamp to the operating
+				 * range. Ugly multiply/divide should be replaced someday.
+				 */
+
+				if (time_status & STA_FREQHOLD || time_reftime == 0)
+					time_reftime = xtime.tv_sec;
+
+				mtemp = xtime.tv_sec - time_reftime;
+				time_reftime = xtime.tv_sec;
+
+				if (time_status & STA_FLL) {
+					if (mtemp >= MINSEC) {
+						ltemp = (time_offset / mtemp) << (SHIFT_USEC -
+									SHIFT_UPDATE);
+						if (ltemp < 0)
+							time_freq -= -ltemp >> SHIFT_KH;
+						else
+							time_freq += ltemp >> SHIFT_KH;
+					} else /* calibration interval too short (p. 12) */
+						result = TIME_ERROR;
+				} else {	/* PLL mode */
+					if (mtemp < MAXSEC) {
+						ltemp *= mtemp;
+						if (ltemp < 0)
+							time_freq -= -ltemp >> (time_constant +
+									time_constant +
+									SHIFT_KF - SHIFT_USEC);
+					    else
+				    	    time_freq += ltemp >> (time_constant +
+								       time_constant +
+								       SHIFT_KF - SHIFT_USEC);
+					} else /* calibration interval too long (p. 12) */
+						result = TIME_ERROR;
+				}
+
+				if (time_freq > time_tolerance)
+					time_freq = time_tolerance;
+				else if (time_freq < -time_tolerance)
+					time_freq = -time_tolerance;
+			} /* STA_PLL || STA_PPSTIME */
+		} /* txc->modes & ADJ_OFFSET */
+
+		if (txc->modes & ADJ_TICK) {
+			tick_usec = txc->tick;
+			tick_nsec = TICK_USEC_TO_NSEC(tick_usec);
+		}
+	} /* txc->modes */
+leave:
+
+	if ((time_status & (STA_UNSYNC|STA_CLOCKERR)) != 0
+		|| ((time_status & (STA_PPSFREQ|STA_PPSTIME)) != 0
+		&& (time_status & STA_PPSSIGNAL) == 0)
+		/* p. 24, (b) */
+		|| ((time_status & (STA_PPSTIME|STA_PPSJITTER))
+		== (STA_PPSTIME|STA_PPSJITTER))
+		/* p. 24, (c) */
+		|| ((time_status & STA_PPSFREQ) != 0
+		&& (time_status & (STA_PPSWANDER|STA_PPSERROR)) != 0))
+		/* p. 24, (d) */
+			result = TIME_ERROR;
+
+	if ((txc->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT)
+		txc->offset = save_adjust;
+	else {
+		if (time_offset < 0)
+			txc->offset = -(-time_offset >> SHIFT_UPDATE);
+		else
+			txc->offset = time_offset >> SHIFT_UPDATE;
+	}
+	txc->freq = time_freq + pps_freq;
+	txc->maxerror = time_maxerror;
+	txc->esterror = time_esterror;
+	txc->status = time_status;
+	txc->constant = time_constant;
+	txc->precision = time_precision;
+	txc->tolerance = time_tolerance;
+	txc->tick = tick_usec;
+	txc->ppsfreq = pps_freq;
+	txc->jitter = pps_jitter >> PPS_AVG;
+	txc->shift = pps_shift;
+	txc->stabil = pps_stabil;
+	txc->jitcnt = pps_jitcnt;
+	txc->calcnt = pps_calcnt;
+	txc->errcnt = pps_errcnt;
+	txc->stbcnt = pps_stbcnt;
+	write_sequnlock_irq(&xtime_lock);
+	do_gettimeofday(&txc->time);
+	return result;
+}
+
+/**
+ * ntp_clear - Clears the NTP state machine.
+ *
+ * Must be called while holding a write on the xtime_lock
+ */
+void ntp_clear(void)
+{
+	time_adjust = 0;		/* stop active adjtime() */
+	time_status |= STA_UNSYNC;
+	time_maxerror = NTP_PHASE_LIMIT;
+	time_esterror = NTP_PHASE_LIMIT;
+}
+
+/**
+ * ntp_synced - Returns 1 if the NTP status is not UNSYNC
+ *
+ */
+int ntp_synced(void)
+{
+	return !(time_status & STA_UNSYNC);
+}
+
+long ntp_get_fixed_ns_adjustment(void)
+{
+	return fixed_tick_ns_adj;
+}
diff --git a/kernel/time.c b/kernel/time.c
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -35,6 +35,7 @@
 #include <linux/security.h>
 #include <linux/fs.h>
 #include <linux/module.h>
+#include <linux/ntp.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -198,20 +199,6 @@ asmlinkage long sys_settimeofday(struct 
 	return do_sys_settimeofday(tv ? &new_ts : NULL, tz ? &new_tz : NULL);
 }
 
-long pps_offset;		/* pps time offset (us) */
-long pps_jitter = MAXTIME;	/* time dispersion (jitter) (us) */
-
-long pps_freq;			/* frequency offset (scaled ppm) */
-long pps_stabil = MAXFREQ;	/* frequency dispersion (scaled ppm) */
-
-long pps_valid = PPS_VALID;	/* pps signal watchdog counter */
-
-int pps_shift = PPS_SHIFT;	/* interval duration (s) (shift) */
-
-long pps_jitcnt;		/* jitter limit exceeded */
-long pps_calcnt;		/* calibration intervals */
-long pps_errcnt;		/* calibration errors */
-long pps_stbcnt;		/* stability limit exceeded */
 
 /* hook for a loadable hardpps kernel module */
 void (*hardpps_ptr)(struct timeval *);
@@ -229,184 +216,14 @@ void __attribute__ ((weak)) notify_arch_
  */
 int do_adjtimex(struct timex *txc)
 {
-        long ltemp, mtemp, save_adjust;
 	int result;
 
 	/* In order to modify anything, you gotta be super-user! */
 	if (txc->modes && !capable(CAP_SYS_TIME))
 		return -EPERM;
 		
-	/* Now we validate the data before disabling interrupts */
-
-	if ((txc->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT)
-	  /* singleshot must not be used with any other mode bits */
-		if (txc->modes != ADJ_OFFSET_SINGLESHOT)
-			return -EINVAL;
-
-	if (txc->modes != ADJ_OFFSET_SINGLESHOT && (txc->modes & ADJ_OFFSET))
-	  /* adjustment Offset limited to +- .512 seconds */
-		if (txc->offset <= - MAXPHASE || txc->offset >= MAXPHASE )
-			return -EINVAL;	
-
-	/* if the quartz is off by more than 10% something is VERY wrong ! */
-	if (txc->modes & ADJ_TICK)
-		if (txc->tick <  900000/USER_HZ ||
-		    txc->tick > 1100000/USER_HZ)
-			return -EINVAL;
-
-	write_seqlock_irq(&xtime_lock);
-	result = time_state;	/* mostly `TIME_OK' */
-
-	/* Save for later - semantics of adjtime is to return old value */
-	save_adjust = time_next_adjust ? time_next_adjust : time_adjust;
-
-#if 0	/* STA_CLOCKERR is never set yet */
-	time_status &= ~STA_CLOCKERR;		/* reset STA_CLOCKERR */
-#endif
-	/* If there are input parameters, then process them */
-	if (txc->modes)
-	{
-	    if (txc->modes & ADJ_STATUS)	/* only set allowed bits */
-		time_status =  (txc->status & ~STA_RONLY) |
-			      (time_status & STA_RONLY);
-
-	    if (txc->modes & ADJ_FREQUENCY) {	/* p. 22 */
-		if (txc->freq > MAXFREQ || txc->freq < -MAXFREQ) {
-		    result = -EINVAL;
-		    goto leave;
-		}
-		time_freq = txc->freq - pps_freq;
-	    }
-
-	    if (txc->modes & ADJ_MAXERROR) {
-		if (txc->maxerror < 0 || txc->maxerror >= NTP_PHASE_LIMIT) {
-		    result = -EINVAL;
-		    goto leave;
-		}
-		time_maxerror = txc->maxerror;
-	    }
-
-	    if (txc->modes & ADJ_ESTERROR) {
-		if (txc->esterror < 0 || txc->esterror >= NTP_PHASE_LIMIT) {
-		    result = -EINVAL;
-		    goto leave;
-		}
-		time_esterror = txc->esterror;
-	    }
-
-	    if (txc->modes & ADJ_TIMECONST) {	/* p. 24 */
-		if (txc->constant < 0) {	/* NTP v4 uses values > 6 */
-		    result = -EINVAL;
-		    goto leave;
-		}
-		time_constant = txc->constant;
-	    }
-
-	    if (txc->modes & ADJ_OFFSET) {	/* values checked earlier */
-		if (txc->modes == ADJ_OFFSET_SINGLESHOT) {
-		    /* adjtime() is independent from ntp_adjtime() */
-		    if ((time_next_adjust = txc->offset) == 0)
-			 time_adjust = 0;
-		}
-		else if ( time_status & (STA_PLL | STA_PPSTIME) ) {
-		    ltemp = (time_status & (STA_PPSTIME | STA_PPSSIGNAL)) ==
-		            (STA_PPSTIME | STA_PPSSIGNAL) ?
-		            pps_offset : txc->offset;
-
-		    /*
-		     * Scale the phase adjustment and
-		     * clamp to the operating range.
-		     */
-		    if (ltemp > MAXPHASE)
-		        time_offset = MAXPHASE << SHIFT_UPDATE;
-		    else if (ltemp < -MAXPHASE)
-			time_offset = -(MAXPHASE << SHIFT_UPDATE);
-		    else
-		        time_offset = ltemp << SHIFT_UPDATE;
-
-		    /*
-		     * Select whether the frequency is to be controlled
-		     * and in which mode (PLL or FLL). Clamp to the operating
-		     * range. Ugly multiply/divide should be replaced someday.
-		     */
-
-		    if (time_status & STA_FREQHOLD || time_reftime == 0)
-		        time_reftime = xtime.tv_sec;
-		    mtemp = xtime.tv_sec - time_reftime;
-		    time_reftime = xtime.tv_sec;
-		    if (time_status & STA_FLL) {
-		        if (mtemp >= MINSEC) {
-			    ltemp = (time_offset / mtemp) << (SHIFT_USEC -
-							      SHIFT_UPDATE);
-			    if (ltemp < 0)
-			        time_freq -= -ltemp >> SHIFT_KH;
-			    else
-			        time_freq += ltemp >> SHIFT_KH;
-			} else /* calibration interval too short (p. 12) */
-				result = TIME_ERROR;
-		    } else {	/* PLL mode */
-		        if (mtemp < MAXSEC) {
-			    ltemp *= mtemp;
-			    if (ltemp < 0)
-			        time_freq -= -ltemp >> (time_constant +
-							time_constant +
-							SHIFT_KF - SHIFT_USEC);
-			    else
-			        time_freq += ltemp >> (time_constant +
-						       time_constant +
-						       SHIFT_KF - SHIFT_USEC);
-			} else /* calibration interval too long (p. 12) */
-				result = TIME_ERROR;
-		    }
-		    if (time_freq > time_tolerance)
-		        time_freq = time_tolerance;
-		    else if (time_freq < -time_tolerance)
-		        time_freq = -time_tolerance;
-		} /* STA_PLL || STA_PPSTIME */
-	    } /* txc->modes & ADJ_OFFSET */
-	    if (txc->modes & ADJ_TICK) {
-		tick_usec = txc->tick;
-		tick_nsec = TICK_USEC_TO_NSEC(tick_usec);
-	    }
-	} /* txc->modes */
-leave:	if ((time_status & (STA_UNSYNC|STA_CLOCKERR)) != 0
-	    || ((time_status & (STA_PPSFREQ|STA_PPSTIME)) != 0
-		&& (time_status & STA_PPSSIGNAL) == 0)
-	    /* p. 24, (b) */
-	    || ((time_status & (STA_PPSTIME|STA_PPSJITTER))
-		== (STA_PPSTIME|STA_PPSJITTER))
-	    /* p. 24, (c) */
-	    || ((time_status & STA_PPSFREQ) != 0
-		&& (time_status & (STA_PPSWANDER|STA_PPSERROR)) != 0))
-	    /* p. 24, (d) */
-		result = TIME_ERROR;
+	result = ntp_adjtimex(txc);
 	
-	if ((txc->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT)
-	    txc->offset	   = save_adjust;
-	else {
-	    if (time_offset < 0)
-		txc->offset = -(-time_offset >> SHIFT_UPDATE);
-	    else
-		txc->offset = time_offset >> SHIFT_UPDATE;
-	}
-	txc->freq	   = time_freq + pps_freq;
-	txc->maxerror	   = time_maxerror;
-	txc->esterror	   = time_esterror;
-	txc->status	   = time_status;
-	txc->constant	   = time_constant;
-	txc->precision	   = time_precision;
-	txc->tolerance	   = time_tolerance;
-	txc->tick	   = tick_usec;
-	txc->ppsfreq	   = pps_freq;
-	txc->jitter	   = pps_jitter >> PPS_AVG;
-	txc->shift	   = pps_shift;
-	txc->stabil	   = pps_stabil;
-	txc->jitcnt	   = pps_jitcnt;
-	txc->calcnt	   = pps_calcnt;
-	txc->errcnt	   = pps_errcnt;
-	txc->stbcnt	   = pps_stbcnt;
-	write_sequnlock_irq(&xtime_lock);
-	do_gettimeofday(&txc->time);
 	notify_arch_cmos_timer();
 	return(result);
 }
@@ -522,10 +339,8 @@ int do_settimeofday (struct timespec *tv
 		set_normalized_timespec(&xtime, sec, nsec);
 		set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
 
-		time_adjust = 0;		/* stop active adjtime() */
-		time_status |= STA_UNSYNC;
-		time_maxerror = NTP_PHASE_LIMIT;
-		time_esterror = NTP_PHASE_LIMIT;
+		ntp_clear();
+
 		time_interpolator_reset();
 	}
 	write_sequnlock_irq(&xtime_lock);
diff --git a/kernel/timer.c b/kernel/timer.c
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -33,6 +33,7 @@
 #include <linux/posix-timers.h>
 #include <linux/cpu.h>
 #include <linux/syscalls.h>
+#include <linux/ntp.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -41,7 +42,7 @@
 #include <asm/io.h>
 
 #ifdef CONFIG_TIME_INTERPOLATION
-static void time_interpolator_update(long delta_nsec);
+void time_interpolator_update(long delta_nsec);
 #else
 #define time_interpolator_update(x)
 #endif
@@ -597,219 +598,19 @@ struct timespec wall_to_monotonic __attr
 
 EXPORT_SYMBOL(xtime);
 
-/* Don't completely fail for HZ > 500.  */
-int tickadj = 500/HZ ? : 1;		/* microsecs */
-
-
-/*
- * phase-lock loop variables
- */
-/* TIME_ERROR prevents overwriting the CMOS clock */
-int time_state = TIME_OK;		/* clock synchronization status	*/
-int time_status = STA_UNSYNC;		/* clock status bits		*/
-long time_offset;			/* time adjustment (us)		*/
-long time_constant = 2;			/* pll time constant		*/
-long time_tolerance = MAXFREQ;		/* frequency tolerance (ppm)	*/
-long time_precision = 1;		/* clock precision (us)		*/
-long time_maxerror = NTP_PHASE_LIMIT;	/* maximum error (us)		*/
-long time_esterror = NTP_PHASE_LIMIT;	/* estimated error (us)		*/
-static long time_phase;			/* phase offset (scaled us)	*/
-long time_freq = (((NSEC_PER_SEC + HZ/2) % HZ - HZ/2) << SHIFT_USEC) / NSEC_PER_USEC;
-					/* frequency offset (scaled ppm)*/
-static long time_adj;			/* tick adjust (scaled 1 / HZ)	*/
-long time_reftime;			/* time at last adjustment (s)	*/
-long time_adjust;
-long time_next_adjust;
-
-/*
- * this routine handles the overflow of the microsecond field
- *
- * The tricky bits of code to handle the accurate clock support
- * were provided by Dave Mills (Mills@UDEL.EDU) of NTP fame.
- * They were originally developed for SUN and DEC kernels.
- * All the kudos should go to Dave for this stuff.
- *
- */
-static void second_overflow(void)
-{
-    long ltemp;
-
-    /* Bump the maxerror field */
-    time_maxerror += time_tolerance >> SHIFT_USEC;
-    if ( time_maxerror > NTP_PHASE_LIMIT ) {
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_status |= STA_UNSYNC;
-    }
-
-    /*
-     * Leap second processing. If in leap-insert state at
-     * the end of the day, the system clock is set back one
-     * second; if in leap-delete state, the system clock is
-     * set ahead one second. The microtime() routine or
-     * external clock driver will insure that reported time
-     * is always monotonic. The ugly divides should be
-     * replaced.
-     */
-    switch (time_state) {
-
-    case TIME_OK:
-	if (time_status & STA_INS)
-	    time_state = TIME_INS;
-	else if (time_status & STA_DEL)
-	    time_state = TIME_DEL;
-	break;
-
-    case TIME_INS:
-	if (xtime.tv_sec % 86400 == 0) {
-	    xtime.tv_sec--;
-	    wall_to_monotonic.tv_sec++;
-	    /* The timer interpolator will make time change gradually instead
-	     * of an immediate jump by one second.
-	     */
-	    time_interpolator_update(-NSEC_PER_SEC);
-	    time_state = TIME_OOP;
-	    clock_was_set();
-	    printk(KERN_NOTICE "Clock: inserting leap second 23:59:60 UTC\n");
-	}
-	break;
-
-    case TIME_DEL:
-	if ((xtime.tv_sec + 1) % 86400 == 0) {
-	    xtime.tv_sec++;
-	    wall_to_monotonic.tv_sec--;
-	    /* Use of time interpolator for a gradual change of time */
-	    time_interpolator_update(NSEC_PER_SEC);
-	    time_state = TIME_WAIT;
-	    clock_was_set();
-	    printk(KERN_NOTICE "Clock: deleting leap second 23:59:59 UTC\n");
-	}
-	break;
-
-    case TIME_OOP:
-	time_state = TIME_WAIT;
-	break;
-
-    case TIME_WAIT:
-	if (!(time_status & (STA_INS | STA_DEL)))
-	    time_state = TIME_OK;
-    }
-
-    /*
-     * Compute the phase adjustment for the next second. In
-     * PLL mode, the offset is reduced by a fixed factor
-     * times the time constant. In FLL mode the offset is
-     * used directly. In either mode, the maximum phase
-     * adjustment for each second is clamped so as to spread
-     * the adjustment over not more than the number of
-     * seconds between updates.
-     */
-    if (time_offset < 0) {
-	ltemp = -time_offset;
-	if (!(time_status & STA_FLL))
-	    ltemp >>= SHIFT_KG + time_constant;
-	if (ltemp > (MAXPHASE / MINSEC) << SHIFT_UPDATE)
-	    ltemp = (MAXPHASE / MINSEC) << SHIFT_UPDATE;
-	time_offset += ltemp;
-	time_adj = -ltemp << (SHIFT_SCALE - SHIFT_HZ - SHIFT_UPDATE);
-    } else {
-	ltemp = time_offset;
-	if (!(time_status & STA_FLL))
-	    ltemp >>= SHIFT_KG + time_constant;
-	if (ltemp > (MAXPHASE / MINSEC) << SHIFT_UPDATE)
-	    ltemp = (MAXPHASE / MINSEC) << SHIFT_UPDATE;
-	time_offset -= ltemp;
-	time_adj = ltemp << (SHIFT_SCALE - SHIFT_HZ - SHIFT_UPDATE);
-    }
-
-    /*
-     * Compute the frequency estimate and additional phase
-     * adjustment due to frequency error for the next
-     * second. When the PPS signal is engaged, gnaw on the
-     * watchdog counter and update the frequency computed by
-     * the pll and the PPS signal.
-     */
-    pps_valid++;
-    if (pps_valid == PPS_VALID) {	/* PPS signal lost */
-	pps_jitter = MAXTIME;
-	pps_stabil = MAXFREQ;
-	time_status &= ~(STA_PPSSIGNAL | STA_PPSJITTER |
-			 STA_PPSWANDER | STA_PPSERROR);
-    }
-    ltemp = time_freq + pps_freq;
-    if (ltemp < 0)
-	time_adj -= -ltemp >>
-	    (SHIFT_USEC + SHIFT_HZ - SHIFT_SCALE);
-    else
-	time_adj += ltemp >>
-	    (SHIFT_USEC + SHIFT_HZ - SHIFT_SCALE);
-
-#if HZ == 100
-    /* Compensate for (HZ==100) != (1 << SHIFT_HZ).
-     * Add 25% and 3.125% to get 128.125; => only 0.125% error (p. 14)
-     */
-    if (time_adj < 0)
-	time_adj -= (-time_adj >> 2) + (-time_adj >> 5);
-    else
-	time_adj += (time_adj >> 2) + (time_adj >> 5);
-#endif
-#if HZ == 1000
-    /* Compensate for (HZ==1000) != (1 << SHIFT_HZ).
-     * Add 1.5625% and 0.78125% to get 1023.4375; => only 0.05% error (p. 14)
-     */
-    if (time_adj < 0)
-	time_adj -= (-time_adj >> 6) + (-time_adj >> 7);
-    else
-	time_adj += (time_adj >> 6) + (time_adj >> 7);
-#endif
-}
 
 /* in the NTP reference this is called "hardclock()" */
 static void update_wall_time_one_tick(void)
 {
-	long time_adjust_step, delta_nsec;
+	long delta_nsec;
 
-	if ( (time_adjust_step = time_adjust) != 0 ) {
-	    /* We are doing an adjtime thing. 
-	     *
-	     * Prepare time_adjust_step to be within bounds.
-	     * Note that a positive time_adjust means we want the clock
-	     * to run faster.
-	     *
-	     * Limit the amount of the step to be in the range
-	     * -tickadj .. +tickadj
-	     */
-	     if (time_adjust > tickadj)
-		time_adjust_step = tickadj;
-	     else if (time_adjust < -tickadj)
-		time_adjust_step = -tickadj;
+	ntp_advance();
+
+	delta_nsec = tick_nsec + ntp_get_fixed_ns_adjustment();
 
-	    /* Reduce by this step the amount of time left  */
-	    time_adjust -= time_adjust_step;
-	}
-	delta_nsec = tick_nsec + time_adjust_step * 1000;
-	/*
-	 * Advance the phase, once it gets to one microsecond, then
-	 * advance the tick more.
-	 */
-	time_phase += time_adj;
-	if (time_phase <= -FINENSEC) {
-		long ltemp = -time_phase >> (SHIFT_SCALE - 10);
-		time_phase += ltemp << (SHIFT_SCALE - 10);
-		delta_nsec -= ltemp;
-	}
-	else if (time_phase >= FINENSEC) {
-		long ltemp = time_phase >> (SHIFT_SCALE - 10);
-		time_phase -= ltemp << (SHIFT_SCALE - 10);
-		delta_nsec += ltemp;
-	}
 	xtime.tv_nsec += delta_nsec;
 	time_interpolator_update(delta_nsec);
 
-	/* Changes by adjtime() do not take effect till next tick. */
-	if (time_next_adjust != 0) {
-		time_adjust = time_next_adjust;
-		time_next_adjust = 0;
-	}
 }
 
 /*
@@ -1473,7 +1274,7 @@ unsigned long time_interpolator_get_offs
 #define INTERPOLATOR_ADJUST 65536
 #define INTERPOLATOR_MAX_SKIP 10*INTERPOLATOR_ADJUST
 
-static void time_interpolator_update(long delta_nsec)
+void time_interpolator_update(long delta_nsec)
 {
 	u64 counter;
 	unsigned long offset;



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [RFC][PATCH - 2/13] NTP cleanup: Move arches to new ntp interfaces
  2005-08-11  1:23 ` [RFC][PATCH - 1/13] NTP cleanup: Move NTP code into ntp.c john stultz
@ 2005-08-11  1:25   ` john stultz
  2005-08-11  1:26     ` [RFC][PATCH - 3/13] NTP cleanup: Remove unused NTP PPS code john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  1:25 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,
	This patch converts all of arch specific code to use the new ntp_synced
() and ntp_clear() interfaces. This patch is required for the patch 1 of
the series to build. 

Any comments or feedback would be greatly appreciated.

thanks
-john


linux-2.6.13-rc6_timeofday-ntp-part2_B5.patch
============================================
diff --git a/arch/alpha/kernel/time.c b/arch/alpha/kernel/time.c
--- a/arch/alpha/kernel/time.c
+++ b/arch/alpha/kernel/time.c
@@ -42,6 +42,7 @@
 #include <linux/init.h>
 #include <linux/bcd.h>
 #include <linux/profile.h>
+#include <linux/ntp.h>
 
 #include <asm/uaccess.h>
 #include <asm/io.h>
@@ -149,7 +150,7 @@ irqreturn_t timer_interrupt(int irq, voi
 	 * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
 	 * called as close as possible to 500 ms before the new second starts.
 	 */
-	if ((time_status & STA_UNSYNC) == 0
+	if (ntp_synced()
 	    && xtime.tv_sec > state.last_rtc_update + 660
 	    && xtime.tv_nsec >= 500000 - ((unsigned) TICK_SIZE) / 2
 	    && xtime.tv_nsec <= 500000 + ((unsigned) TICK_SIZE) / 2) {
@@ -502,10 +503,7 @@ do_settimeofday(struct timespec *tv)
 	set_normalized_timespec(&xtime, sec, nsec);
 	set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
 
-	time_adjust = 0;		/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
diff --git a/arch/arm/kernel/time.c b/arch/arm/kernel/time.c
--- a/arch/arm/kernel/time.c
+++ b/arch/arm/kernel/time.c
@@ -28,6 +28,7 @@
 #include <linux/profile.h>
 #include <linux/sysdev.h>
 #include <linux/timer.h>
+#include <linux/ntp.h>
 
 #include <asm/hardware.h>
 #include <asm/io.h>
@@ -102,7 +103,7 @@ static unsigned long next_rtc_update;
  */
 static inline void do_set_rtc(void)
 {
-	if (time_status & STA_UNSYNC || set_rtc == NULL)
+	if (!ntp_synced() || set_rtc == NULL)
 		return;
 
 	if (next_rtc_update &&
@@ -292,10 +293,7 @@ int do_settimeofday(struct timespec *tv)
 	set_normalized_timespec(&xtime, sec, nsec);
 	set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
 
-	time_adjust = 0;		/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
 	return 0;
diff --git a/arch/arm26/kernel/time.c b/arch/arm26/kernel/time.c
--- a/arch/arm26/kernel/time.c
+++ b/arch/arm26/kernel/time.c
@@ -28,6 +28,7 @@
 #include <linux/timex.h>
 #include <linux/errno.h>
 #include <linux/profile.h>
+#include <linux/ntp.h>
 
 #include <asm/hardware.h>
 #include <asm/io.h>
@@ -114,7 +115,7 @@ static unsigned long next_rtc_update;
  */
 static inline void do_set_rtc(void)
 {
-	if (time_status & STA_UNSYNC || set_rtc == NULL)
+	if (!ntp_synced() || set_rtc == NULL)
 		return;
 
 //FIXME - timespec.tv_sec is a time_t not unsigned long
@@ -189,10 +190,7 @@ int do_settimeofday(struct timespec *tv)
 
 	xtime.tv_sec = tv->tv_sec;
 	xtime.tv_nsec = tv->tv_nsec;
-	time_adjust = 0;		/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
 	return 0;
diff --git a/arch/cris/arch-v10/kernel/time.c b/arch/cris/arch-v10/kernel/time.c
--- a/arch/cris/arch-v10/kernel/time.c
+++ b/arch/cris/arch-v10/kernel/time.c
@@ -240,7 +240,7 @@ timer_interrupt(int irq, void *dev_id, s
 	 * The division here is not time critical since it will run once in 
 	 * 11 minutes
 	 */
-	if ((time_status & STA_UNSYNC) == 0 &&
+	if (ntp_synced() &&
 	    xtime.tv_sec > last_rtc_update + 660 &&
 	    (xtime.tv_nsec / 1000) >= 500000 - (tick_nsec / 1000) / 2 &&
 	    (xtime.tv_nsec / 1000) <= 500000 + (tick_nsec / 1000) / 2) {
diff --git a/arch/cris/kernel/time.c b/arch/cris/kernel/time.c
--- a/arch/cris/kernel/time.c
+++ b/arch/cris/kernel/time.c
@@ -30,6 +30,7 @@
 #include <linux/bcd.h>
 #include <linux/timex.h>
 #include <linux/init.h>
+#include <linux/ntp.h>
 #include <linux/profile.h>
 
 u64 jiffies_64 = INITIAL_JIFFIES;
@@ -114,10 +115,7 @@ int do_settimeofday(struct timespec *tv)
 	set_normalized_timespec(&xtime, sec, nsec);
 	set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
 
-	time_adjust = 0;		/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
 	return 0;
diff --git a/arch/frv/kernel/time.c b/arch/frv/kernel/time.c
--- a/arch/frv/kernel/time.c
+++ b/arch/frv/kernel/time.c
@@ -21,6 +21,7 @@
 #include <linux/profile.h>
 #include <linux/irq.h>
 #include <linux/mm.h>
+#include <linux/ntp.h>
 
 #include <asm/io.h>
 #include <asm/timer-regs.h>
@@ -85,7 +86,7 @@ static irqreturn_t timer_interrupt(int i
 	 * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
 	 * called as close as possible to 500 ms before the new second starts.
 	 */
-	if ((time_status & STA_UNSYNC) == 0 &&
+	if (ntp_synced() &&
 	    xtime.tv_sec > last_rtc_update + 660 &&
 	    (xtime.tv_nsec / 1000) >= 500000 - ((unsigned) TICK_SIZE) / 2 &&
 	    (xtime.tv_nsec / 1000) <= 500000 + ((unsigned) TICK_SIZE) / 2
@@ -216,10 +217,7 @@ int do_settimeofday(struct timespec *tv)
 	set_normalized_timespec(&xtime, sec, nsec);
 	set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
 
-	time_adjust = 0;		/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
 	return 0;
diff --git a/arch/h8300/kernel/time.c b/arch/h8300/kernel/time.c
--- a/arch/h8300/kernel/time.c
+++ b/arch/h8300/kernel/time.c
@@ -26,6 +26,7 @@
 #include <linux/mm.h>
 #include <linux/timex.h>
 #include <linux/profile.h>
+#include <linux/ntp.h>
 
 #include <asm/io.h>
 #include <asm/target_time.h>
@@ -116,10 +117,7 @@ int do_settimeofday(struct timespec *tv)
 
 	xtime.tv_sec = tv->tv_sec;
 	xtime.tv_nsec = tv->tv_nsec;
-	time_adjust = 0;		/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
 	return 0;
diff --git a/arch/i386/kernel/time.c b/arch/i386/kernel/time.c
--- a/arch/i386/kernel/time.c
+++ b/arch/i386/kernel/time.c
@@ -46,6 +46,7 @@
 #include <linux/bcd.h>
 #include <linux/efi.h>
 #include <linux/mca.h>
+#include <linux/ntp.h>
 
 #include <asm/io.h>
 #include <asm/smp.h>
@@ -194,10 +195,7 @@ int do_settimeofday(struct timespec *tv)
 	set_normalized_timespec(&xtime, sec, nsec);
 	set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
 
-	time_adjust = 0;		/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
 	return 0;
@@ -348,7 +346,7 @@ static void sync_cmos_clock(unsigned lon
 	 * This code is run on a timer.  If the clock is set, that timer
 	 * may not expire at the correct time.  Thus, we adjust...
 	 */
-	if ((time_status & STA_UNSYNC) != 0)
+	if (!ntp_synced())
 		/*
 		 * Not synced, exit, do not restart a timer (if one is
 		 * running, let it run out).
diff --git a/arch/m32r/kernel/time.c b/arch/m32r/kernel/time.c
--- a/arch/m32r/kernel/time.c
+++ b/arch/m32r/kernel/time.c
@@ -28,6 +28,7 @@
 #include <linux/mm.h>
 #include <linux/interrupt.h>
 #include <linux/profile.h>
+#include <linux/ntp.h>
 
 #include <asm/io.h>
 #include <asm/m32r.h>
@@ -171,10 +172,7 @@ int do_settimeofday(struct timespec *tv)
 	set_normalized_timespec(&xtime, sec, nsec);
 	set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
 
-	time_adjust = 0;		/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
 
@@ -221,7 +219,7 @@ irqreturn_t timer_interrupt(int irq, voi
 	 * called as close as possible to 500 ms before the new second starts.
 	 */
 	write_seqlock(&xtime_lock);
-	if ((time_status & STA_UNSYNC) == 0
+	if (ntp_synced()
 		&& xtime.tv_sec > last_rtc_update + 660
 		&& (xtime.tv_nsec / 1000) >= 500000 - ((unsigned)TICK_SIZE) / 2
 		&& (xtime.tv_nsec / 1000) <= 500000 + ((unsigned)TICK_SIZE) / 2)
diff --git a/arch/m68k/kernel/time.c b/arch/m68k/kernel/time.c
--- a/arch/m68k/kernel/time.c
+++ b/arch/m68k/kernel/time.c
@@ -19,6 +19,7 @@
 #include <linux/string.h>
 #include <linux/mm.h>
 #include <linux/rtc.h>
+#include <linux/ntp.h>
 
 #include <asm/machdep.h>
 #include <asm/io.h>
@@ -166,10 +167,7 @@ int do_settimeofday(struct timespec *tv)
 	set_normalized_timespec(&xtime, sec, nsec);
 	set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
 
-	time_adjust = 0;		/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
 	return 0;
diff --git a/arch/m68knommu/kernel/time.c b/arch/m68knommu/kernel/time.c
--- a/arch/m68knommu/kernel/time.c
+++ b/arch/m68knommu/kernel/time.c
@@ -21,6 +21,7 @@
 #include <linux/profile.h>
 #include <linux/time.h>
 #include <linux/timex.h>
+#include <linux/ntp.h>
 
 #include <asm/machdep.h>
 #include <asm/io.h>
@@ -68,7 +69,7 @@ static irqreturn_t timer_interrupt(int i
 	 * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
 	 * called as close as possible to 500 ms before the new second starts.
 	 */
-	if ((time_status & STA_UNSYNC) == 0 &&
+	if (ntp_synced() &&
 	    xtime.tv_sec > last_rtc_update + 660 &&
 	    (xtime.tv_nsec / 1000) >= 500000 - ((unsigned) TICK_SIZE) / 2 &&
 	    (xtime.tv_nsec  / 1000) <= 500000 + ((unsigned) TICK_SIZE) / 2) {
@@ -178,10 +179,7 @@ int do_settimeofday(struct timespec *tv)
 	set_normalized_timespec(&xtime, sec, nsec);
 	set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
 
-	time_adjust = 0;		/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
 	return 0;
diff --git a/arch/mips/kernel/sysirix.c b/arch/mips/kernel/sysirix.c
--- a/arch/mips/kernel/sysirix.c
+++ b/arch/mips/kernel/sysirix.c
@@ -30,6 +30,7 @@
 #include <linux/socket.h>
 #include <linux/security.h>
 #include <linux/syscalls.h>
+#include <linux/ntp.h>
 
 #include <asm/ptrace.h>
 #include <asm/page.h>
@@ -632,10 +633,7 @@ asmlinkage int irix_stime(int value)
 	write_seqlock_irq(&xtime_lock);
 	xtime.tv_sec = value;
 	xtime.tv_nsec = 0;
-	time_adjust = 0;			/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 	write_sequnlock_irq(&xtime_lock);
 
 	return 0;
diff --git a/arch/mips/kernel/time.c b/arch/mips/kernel/time.c
--- a/arch/mips/kernel/time.c
+++ b/arch/mips/kernel/time.c
@@ -23,6 +23,7 @@
 #include <linux/spinlock.h>
 #include <linux/interrupt.h>
 #include <linux/module.h>
+#include <linux/ntp.h>
 
 #include <asm/bootinfo.h>
 #include <asm/compiler.h>
@@ -223,10 +224,7 @@ int do_settimeofday(struct timespec *tv)
 	set_normalized_timespec(&xtime, sec, nsec);
 	set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
 
-	time_adjust = 0;			/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
@@ -442,7 +440,7 @@ irqreturn_t timer_interrupt(int irq, voi
 	 * called as close as possible to 500 ms before the new second starts.
 	 */
 	write_seqlock(&xtime_lock);
-	if ((time_status & STA_UNSYNC) == 0 &&
+	if (ntp_synced() &&
 	    xtime.tv_sec > last_rtc_update + 660 &&
 	    (xtime.tv_nsec / 1000) >= 500000 - ((unsigned) TICK_SIZE) / 2 &&
 	    (xtime.tv_nsec / 1000) <= 500000 + ((unsigned) TICK_SIZE) / 2) {
diff --git a/arch/mips/sgi-ip27/ip27-timer.c b/arch/mips/sgi-ip27/ip27-timer.c
--- a/arch/mips/sgi-ip27/ip27-timer.c
+++ b/arch/mips/sgi-ip27/ip27-timer.c
@@ -12,6 +12,7 @@
 #include <linux/time.h>
 #include <linux/timex.h>
 #include <linux/mm.h>
+#include <linux/ntp.h>
 
 #include <asm/time.h>
 #include <asm/pgtable.h>
@@ -118,7 +119,7 @@ again:
 	 * RTC clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
 	 * called as close as possible to when a second starts.
 	 */
-	if ((time_status & STA_UNSYNC) == 0 &&
+	if (ntp_synced() &&
 	    xtime.tv_sec > last_rtc_update + 660 &&
 	    (xtime.tv_nsec / 1000) >= 500000 - ((unsigned) TICK_SIZE) / 2 &&
 	    (xtime.tv_nsec / 1000) <= 500000 + ((unsigned) TICK_SIZE) / 2) {
diff --git a/arch/parisc/kernel/time.c b/arch/parisc/kernel/time.c
--- a/arch/parisc/kernel/time.c
+++ b/arch/parisc/kernel/time.c
@@ -23,6 +23,7 @@
 #include <linux/init.h>
 #include <linux/smp.h>
 #include <linux/profile.h>
+#include <linux/ntp.h>
 
 #include <asm/uaccess.h>
 #include <asm/io.h>
@@ -188,10 +189,7 @@ do_settimeofday (struct timespec *tv)
 		set_normalized_timespec(&xtime, sec, nsec);
 		set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
 
-		time_adjust = 0;		/* stop active adjtime() */
-		time_status |= STA_UNSYNC;
-		time_maxerror = NTP_PHASE_LIMIT;
-		time_esterror = NTP_PHASE_LIMIT;
+		ntp_clear();
 	}
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
diff --git a/arch/ppc/kernel/time.c b/arch/ppc/kernel/time.c
--- a/arch/ppc/kernel/time.c
+++ b/arch/ppc/kernel/time.c
@@ -57,6 +57,7 @@
 #include <linux/time.h>
 #include <linux/init.h>
 #include <linux/profile.h>
+#include <linux/ntp.h>
 
 #include <asm/segment.h>
 #include <asm/io.h>
@@ -169,7 +170,7 @@ void timer_interrupt(struct pt_regs * re
 		 * We should have an rtc call that only sets the minutes and
 		 * seconds like on Intel to avoid problems with non UTC clocks.
 		 */
-		if ( ppc_md.set_rtc_time && (time_status & STA_UNSYNC) == 0 &&
+		if ( ppc_md.set_rtc_time && ntp_synced() &&
 		     xtime.tv_sec - last_rtc_update >= 659 &&
 		     abs((xtime.tv_nsec / 1000) - (1000000-1000000/HZ)) < 500000/HZ &&
 		     jiffies - wall_jiffies == 1) {
@@ -271,10 +272,7 @@ int do_settimeofday(struct timespec *tv)
 	 */
 	last_rtc_update = new_sec - 658;
 
-	time_adjust = 0;                /* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 	write_sequnlock_irqrestore(&xtime_lock, flags);
 	clock_was_set();
 	return 0;
diff --git a/arch/ppc64/kernel/time.c b/arch/ppc64/kernel/time.c
--- a/arch/ppc64/kernel/time.c
+++ b/arch/ppc64/kernel/time.c
@@ -50,6 +50,7 @@
 #include <linux/profile.h>
 #include <linux/cpu.h>
 #include <linux/security.h>
+#include <linux/ntp.h>
 
 #include <asm/segment.h>
 #include <asm/io.h>
@@ -128,7 +129,7 @@ static __inline__ void timer_check_rtc(v
          * We should have an rtc call that only sets the minutes and
          * seconds like on Intel to avoid problems with non UTC clocks.
          */
-        if ( (time_status & STA_UNSYNC) == 0 &&
+        if (ntp_synced() &&
              xtime.tv_sec - last_rtc_update >= 659 &&
              abs((xtime.tv_nsec/1000) - (1000000-1000000/HZ)) < 500000/HZ &&
              jiffies - wall_jiffies == 1) {
@@ -437,10 +438,7 @@ int do_settimeofday(struct timespec *tv)
 	 */
 	last_rtc_update = new_sec - 658;
 
-	time_adjust = 0;                /* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 
 	delta_xsec = mulhdu( (tb_last_stamp-do_gtod.varp->tb_orig_stamp),
 			     do_gtod.varp->tb_to_xs );
diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -29,6 +29,7 @@
 #include <linux/profile.h>
 #include <linux/timex.h>
 #include <linux/notifier.h>
+#include <linux/ntp.h>
 
 #include <asm/uaccess.h>
 #include <asm/delay.h>
@@ -139,10 +140,7 @@ int do_settimeofday(struct timespec *tv)
 	set_normalized_timespec(&xtime, sec, nsec);
 	set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
 
-	time_adjust = 0;		/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
 	return 0;
diff --git a/arch/sh/kernel/time.c b/arch/sh/kernel/time.c
--- a/arch/sh/kernel/time.c
+++ b/arch/sh/kernel/time.c
@@ -24,6 +24,7 @@
 #include <linux/init.h>
 #include <linux/smp.h>
 #include <linux/profile.h>
+#include <linux/ntp.h>
 
 #include <asm/processor.h>
 #include <asm/uaccess.h>
@@ -215,10 +216,7 @@ int do_settimeofday(struct timespec *tv)
 	set_normalized_timespec(&xtime, sec, nsec);
 	set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
 
-	time_adjust = 0;		/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
 
@@ -252,7 +250,7 @@ static inline void do_timer_interrupt(in
 	 * RTC clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
 	 * called as close as possible to 500 ms before the new second starts.
 	 */
-	if ((time_status & STA_UNSYNC) == 0 &&
+	if (ntp_synced() &&
 	    xtime.tv_sec > last_rtc_update + 660 &&
 	    (xtime.tv_nsec / 1000) >= 500000 - ((unsigned) TICK_SIZE) / 2 &&
 	    (xtime.tv_nsec / 1000) <= 500000 + ((unsigned) TICK_SIZE) / 2) {
diff --git a/arch/sh64/kernel/time.c b/arch/sh64/kernel/time.c
--- a/arch/sh64/kernel/time.c
+++ b/arch/sh64/kernel/time.c
@@ -29,6 +29,7 @@
 #include <linux/init.h>
 #include <linux/profile.h>
 #include <linux/smp.h>
+#include <linux/ntp.h>
 
 #include <asm/registers.h>	 /* required by inline __asm__ stmt. */
 
@@ -247,10 +248,7 @@ int do_settimeofday(struct timespec *tv)
 	set_normalized_timespec(&xtime, sec, nsec);
 	set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
 
-	time_adjust = 0;		/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
 
@@ -328,7 +326,7 @@ static inline void do_timer_interrupt(in
 	 * RTC clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
 	 * called as close as possible to 500 ms before the new second starts.
 	 */
-	if ((time_status & STA_UNSYNC) == 0 &&
+	if (ntp_synced() &&
 	    xtime.tv_sec > last_rtc_update + 660 &&
 	    (xtime.tv_nsec / 1000) >= 500000 - ((unsigned) TICK_SIZE) / 2 &&
 	    (xtime.tv_nsec / 1000) <= 500000 + ((unsigned) TICK_SIZE) / 2) {
diff --git a/arch/sparc/kernel/pcic.c b/arch/sparc/kernel/pcic.c
--- a/arch/sparc/kernel/pcic.c
+++ b/arch/sparc/kernel/pcic.c
@@ -17,6 +17,7 @@
 #include <linux/mm.h>
 #include <linux/slab.h>
 #include <linux/jiffies.h>
+#include <linux/ntp.h>
 
 #include <asm/ebus.h>
 #include <asm/sbus.h> /* for sanity check... */
@@ -840,10 +841,7 @@ static int pci_do_settimeofday(struct ti
 
 	xtime.tv_sec = tv->tv_sec;
 	xtime.tv_nsec = tv->tv_nsec;
-	time_adjust = 0;		/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 	return 0;
 }
 
diff --git a/arch/sparc/kernel/time.c b/arch/sparc/kernel/time.c
--- a/arch/sparc/kernel/time.c
+++ b/arch/sparc/kernel/time.c
@@ -30,6 +30,7 @@
 #include <linux/pci.h>
 #include <linux/ioport.h>
 #include <linux/profile.h>
+#include <linux/ntp.h>
 
 #include <asm/oplib.h>
 #include <asm/segment.h>
@@ -140,7 +141,7 @@ irqreturn_t timer_interrupt(int irq, voi
 
 
 	/* Determine when to update the Mostek clock. */
-	if ((time_status & STA_UNSYNC) == 0 &&
+	if (ntp_synced() &&
 	    xtime.tv_sec > last_rtc_update + 660 &&
 	    (xtime.tv_nsec / 1000) >= 500000 - ((unsigned) TICK_SIZE) / 2 &&
 	    (xtime.tv_nsec / 1000) <= 500000 + ((unsigned) TICK_SIZE) / 2) {
@@ -555,10 +556,7 @@ static int sbus_do_settimeofday(struct t
 	set_normalized_timespec(&xtime, sec, nsec);
 	set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
 
-	time_adjust = 0;		/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 	return 0;
 }
 
diff --git a/arch/sparc64/kernel/time.c b/arch/sparc64/kernel/time.c
--- a/arch/sparc64/kernel/time.c
+++ b/arch/sparc64/kernel/time.c
@@ -30,6 +30,7 @@
 #include <linux/cpufreq.h>
 #include <linux/percpu.h>
 #include <linux/profile.h>
+#include <linux/ntp.h>
 
 #include <asm/oplib.h>
 #include <asm/mostek.h>
@@ -449,7 +450,7 @@ static inline void timer_check_rtc(void)
 	static long last_rtc_update;
 
 	/* Determine when to update the Mostek clock. */
-	if ((time_status & STA_UNSYNC) == 0 &&
+	if (ntp_synced() &&
 	    xtime.tv_sec > last_rtc_update + 660 &&
 	    (xtime.tv_nsec / 1000) >= 500000 - ((unsigned) TICK_SIZE) / 2 &&
 	    (xtime.tv_nsec / 1000) <= 500000 + ((unsigned) TICK_SIZE) / 2) {
diff --git a/arch/v850/kernel/time.c b/arch/v850/kernel/time.c
--- a/arch/v850/kernel/time.c
+++ b/arch/v850/kernel/time.c
@@ -21,6 +21,7 @@
 #include <linux/time.h>
 #include <linux/timex.h>
 #include <linux/profile.h>
+#include <linux/ntp.h>
 
 #include <asm/io.h>
 
@@ -66,7 +67,7 @@ static irqreturn_t timer_interrupt (int 
 	 * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
 	 * called as close as possible to 500 ms before the new second starts.
 	 */
-	if ((time_status & STA_UNSYNC) == 0 &&
+	if (ntp_synced() &&
 	    xtime.tv_sec > last_rtc_update + 660 &&
 	    (xtime.tv_nsec / 1000) >= 500000 - ((unsigned) TICK_SIZE) / 2 &&
 	    (xtime.tv_nsec / 1000) <= 500000 + ((unsigned) TICK_SIZE) / 2) {
@@ -169,10 +170,7 @@ int do_settimeofday(struct timespec *tv)
 	xtime.tv_sec = tv->tv_sec;
 	xtime.tv_nsec = tv->tv_nsec;
 
-	time_adjust = 0;		/* stop active adjtime () */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 
 	write_sequnlock_irq (&xtime_lock);
 	clock_was_set();
diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -27,6 +27,8 @@
 #include <linux/bcd.h>
 #include <linux/kallsyms.h>
 #include <linux/acpi.h>
+#include <linux/ntp.h>
+
 #ifdef CONFIG_ACPI
 #include <acpi/achware.h>	/* for PM timer frequency */
 #endif
@@ -176,10 +178,7 @@ int do_settimeofday(struct timespec *tv)
 	set_normalized_timespec(&xtime, sec, nsec);
 	set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
 
-	time_adjust = 0;		/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
@@ -471,7 +470,7 @@ static irqreturn_t timer_interrupt(int i
  * off) isn't likely to go away much sooner anyway.
  */
 
-	if ((~time_status & STA_UNSYNC) && xtime.tv_sec > rtc_update &&
+	if (ntp_synced() && xtime.tv_sec > rtc_update &&
 		abs(xtime.tv_nsec - 500000000) <= tick_nsec / 2) {
 		set_rtc_mmss(xtime.tv_sec);
 		rtc_update = xtime.tv_sec + 660;
diff --git a/arch/xtensa/kernel/time.c b/arch/xtensa/kernel/time.c
--- a/arch/xtensa/kernel/time.c
+++ b/arch/xtensa/kernel/time.c
@@ -22,6 +22,7 @@
 #include <linux/irq.h>
 #include <linux/profile.h>
 #include <linux/delay.h>
+#include <linux/ntp.h>
 
 #include <asm/timex.h>
 #include <asm/platform.h>
@@ -122,10 +123,7 @@ int do_settimeofday(struct timespec *tv)
 	set_normalized_timespec(&xtime, sec, nsec);
 	set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
 
-	time_adjust = 0;                /* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_clear();
 	write_sequnlock_irq(&xtime_lock);
 	return 0;
 }
@@ -184,7 +182,7 @@ again:
 		next += CCOUNT_PER_JIFFY;
 		do_timer (regs); /* Linux handler in kernel/timer.c */
 
-		if ((time_status & STA_UNSYNC) == 0 &&
+		if (ntp_synced() &&
 		    xtime.tv_sec - last_rtc_update >= 659 &&
 		    abs((xtime.tv_nsec/1000)-(1000000-1000000/HZ))<5000000/HZ &&
 		    jiffies - wall_jiffies == 1) {



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [RFC][PATCH - 3/13] NTP cleanup: Remove unused NTP PPS code
  2005-08-11  1:25   ` [RFC][PATCH - 2/13] NTP cleanup: Move arches to new ntp interfaces john stultz
@ 2005-08-11  1:26     ` john stultz
  2005-08-11  1:27       ` [RFC][PATCH - 4/13] NTP cleanup: Breakup ntp_adjtimex() john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  1:26 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,
	Since the NTP PPS code required an out of tree patch which I don't
believe there is a 2.6 version of, this patch removes the unused PPS
logic in the kernel.

Any comments or feedback would be greatly appreciated.

thanks
-john


linux-2.6.13-rc6_timeofday-ntp-part3_B5.patch
============================================
diff --git a/kernel/ntp.c b/kernel/ntp.c
--- a/kernel/ntp.c
+++ b/kernel/ntp.c
@@ -46,26 +46,9 @@ void time_interpolator_update(long delta
 #define time_interpolator_update(x) do {} while (0)
 #endif
 
-
-static long pps_offset;            /* pps time offset (us) */
-static long pps_jitter = MAXTIME;  /* time dispersion (jitter) (us) */
-
-static long pps_freq;              /* frequency offset (scaled ppm) */
-static long pps_stabil = MAXFREQ;  /* frequency dispersion (scaled ppm) */
-
-static long pps_valid = PPS_VALID; /* pps signal watchdog counter */
-
-static int pps_shift = PPS_SHIFT;  /* interval duration (s) (shift) */
-
-static long pps_jitcnt;            /* jitter limit exceeded */
-static long pps_calcnt;            /* calibration intervals */
-static long pps_errcnt;            /* calibration errors */
-static long pps_stbcnt;            /* stability limit exceeded */
-
 /* Don't completely fail for HZ > 500.  */
 int tickadj = 500/HZ ? : 1;		/* microsecs */
 
-
 /*
  * phase-lock loop variables
  */
@@ -235,21 +218,7 @@ void second_overflow(void)
 		time_adj = ltemp << (SHIFT_SCALE - SHIFT_HZ - SHIFT_UPDATE);
 	}
 
-	/*
-	 * Compute the frequency estimate and additional phase
-	 * adjustment due to frequency error for the next
-	 * second. When the PPS signal is engaged, gnaw on the
-	 * watchdog counter and update the frequency computed by
-	 * the pll and the PPS signal.
-	 */
-	pps_valid++;
-	if (pps_valid == PPS_VALID) {	/* PPS signal lost */
-		pps_jitter = MAXTIME;
-		pps_stabil = MAXFREQ;
-		time_status &= ~(STA_PPSSIGNAL | STA_PPSJITTER |
-			STA_PPSWANDER | STA_PPSERROR);
-	}
-	ltemp = time_freq + pps_freq;
+	ltemp = time_freq;
 	if (ltemp < 0)
 		time_adj -= -ltemp >> (SHIFT_USEC + SHIFT_HZ - SHIFT_SCALE);
 	else
@@ -318,7 +287,7 @@ int ntp_adjtimex(struct timex *txc)
 				result = -EINVAL;
 				goto leave;
 			}
-			time_freq = txc->freq - pps_freq;
+			time_freq = txc->freq;
 		}
 
 		if (txc->modes & ADJ_MAXERROR) {
@@ -352,11 +321,8 @@ int ntp_adjtimex(struct timex *txc)
 				/* adjtime() is independent from ntp_adjtime() */
 				if ((time_next_adjust = txc->offset) == 0)
 					time_adjust = 0;
-			} else if ( time_status & (STA_PLL | STA_PPSTIME) ) {
-				ltemp = (time_status
-					& (STA_PPSTIME | STA_PPSSIGNAL))
-					== (STA_PPSTIME | STA_PPSSIGNAL) ?
-			            pps_offset : txc->offset;
+			} else if (time_status & STA_PLL) {
+				ltemp = txc->offset;
 
 				/*
 				 * Scale the phase adjustment and
@@ -411,7 +377,7 @@ int ntp_adjtimex(struct timex *txc)
 					time_freq = time_tolerance;
 				else if (time_freq < -time_tolerance)
 					time_freq = -time_tolerance;
-			} /* STA_PLL || STA_PPSTIME */
+			} /* STA_PLL */
 		} /* txc->modes & ADJ_OFFSET */
 
 		if (txc->modes & ADJ_TICK) {
@@ -421,17 +387,8 @@ int ntp_adjtimex(struct timex *txc)
 	} /* txc->modes */
 leave:
 
-	if ((time_status & (STA_UNSYNC|STA_CLOCKERR)) != 0
-		|| ((time_status & (STA_PPSFREQ|STA_PPSTIME)) != 0
-		&& (time_status & STA_PPSSIGNAL) == 0)
-		/* p. 24, (b) */
-		|| ((time_status & (STA_PPSTIME|STA_PPSJITTER))
-		== (STA_PPSTIME|STA_PPSJITTER))
-		/* p. 24, (c) */
-		|| ((time_status & STA_PPSFREQ) != 0
-		&& (time_status & (STA_PPSWANDER|STA_PPSERROR)) != 0))
-		/* p. 24, (d) */
-			result = TIME_ERROR;
+	if ((time_status & (STA_UNSYNC|STA_CLOCKERR)) != 0)
+		result = TIME_ERROR;
 
 	if ((txc->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT)
 		txc->offset = save_adjust;
@@ -441,7 +398,7 @@ leave:
 		else
 			txc->offset = time_offset >> SHIFT_UPDATE;
 	}
-	txc->freq = time_freq + pps_freq;
+	txc->freq = time_freq;
 	txc->maxerror = time_maxerror;
 	txc->esterror = time_esterror;
 	txc->status = time_status;
@@ -449,14 +406,17 @@ leave:
 	txc->precision = time_precision;
 	txc->tolerance = time_tolerance;
 	txc->tick = tick_usec;
-	txc->ppsfreq = pps_freq;
-	txc->jitter = pps_jitter >> PPS_AVG;
-	txc->shift = pps_shift;
-	txc->stabil = pps_stabil;
-	txc->jitcnt = pps_jitcnt;
-	txc->calcnt = pps_calcnt;
-	txc->errcnt = pps_errcnt;
-	txc->stbcnt = pps_stbcnt;
+
+	/* PPS is not implemented, so these are zero */
+	txc->ppsfreq = 0;
+	txc->jitter = 0;
+	txc->shift = 0;
+	txc->stabil = 0;
+	txc->jitcnt = 0;
+	txc->calcnt = 0;
+	txc->errcnt = 0;
+	txc->stbcnt = 0;
+
 	write_sequnlock_irq(&xtime_lock);
 	do_gettimeofday(&txc->time);
 	return result;



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [RFC][PATCH - 4/13] NTP cleanup: Breakup ntp_adjtimex()
  2005-08-11  1:26     ` [RFC][PATCH - 3/13] NTP cleanup: Remove unused NTP PPS code john stultz
@ 2005-08-11  1:27       ` john stultz
  2005-08-11  1:28         ` [RFC][PATCH - 5/13] NTP cleanup: Break out leapsecond processing john stultz
  2005-08-16  2:08         ` [RFC][PATCH - 4/13] NTP cleanup: Breakup ntp_adjtimex() john stultz
  0 siblings, 2 replies; 71+ messages in thread
From: john stultz @ 2005-08-11  1:27 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,
	This patch breaks up the complex nesting of code in ntp_adjtimex() by
creating a ntp_hardupdate() function and simplifying some of the logic.
This also mimics the documented NTP spec somewhat better.

Any comments or feedback would be greatly appreciated.

thanks
-john


linux-2.6.13-rc6_timeofday-ntp-part4_B5.patch
============================================
diff --git a/include/linux/ntp.h b/include/linux/ntp.h
--- a/include/linux/ntp.h
+++ b/include/linux/ntp.h
@@ -9,6 +9,11 @@
 #include <linux/time.h>
 #include <linux/timex.h>
 
+/* Required to safely shift negative values */
+#define shiftR(x, s) ({ __typeof__(x) __x = x;\
+		__typeof__(s) __s = s; \
+		(__x < 0) ? (-((-__x) >> (__s))) : ((__x) >> (__s));})
+
 /* NTP state machine interfaces */
 void ntp_advance(void);
 int ntp_adjtimex(struct timex*);
diff --git a/kernel/ntp.c b/kernel/ntp.c
--- a/kernel/ntp.c
+++ b/kernel/ntp.c
@@ -244,12 +244,79 @@ void second_overflow(void)
 #endif
 }
 
+/**
+ * ntp_hardupdate - Calculates the offset and freq values
+ * offset: current offset
+ * tv: timeval holding the current time
+ *
+ * Private function, called only by ntp_adjtimex
+ *
+ * This function is called when an offset adjustment is requested.
+ * It calculates the offset adjustment and manipulates the
+ * frequency adjustement accordingly.
+ */
+static int ntp_hardupdate(long offset, struct timespec tv)
+{
+	int ret;
+	long current_offset, interval;
+
+	ret = 0;
+	if (!(time_status & STA_PLL))
+		return ret;
+
+	current_offset = offset;
+	/* Make sure offset is bounded by MAXPHASE */
+	current_offset = min(current_offset, MAXPHASE);
+	current_offset = max(current_offset, -MAXPHASE);
+	time_offset = current_offset << SHIFT_UPDATE;
+
+	if (time_status & STA_FREQHOLD || time_reftime == 0)
+		time_reftime = tv.tv_sec;
+
+	/* calculate seconds since last call to hardupdate */
+	interval = tv.tv_sec - time_reftime;
+	time_reftime = tv.tv_sec;
+
+	/*
+	 * Select whether the frequency is to be controlled
+	 * and in which mode (PLL or FLL). Clamp to the operating
+	 * range. Ugly multiply/divide should be replaced someday.
+	 */
+	if ((time_status & STA_FLL) && (interval >= MINSEC)) {
+		long offset_ppm;
+
+		offset_ppm = time_offset / interval;
+		offset_ppm <<= (SHIFT_USEC - SHIFT_UPDATE);
+
+		time_freq += shiftR(offset_ppm, SHIFT_KH);
+
+	} else if ((time_status & STA_PLL) && (interval < MAXSEC)) {
+		long damping, offset_ppm;
+
+		offset_ppm = offset * interval;
+
+		damping = (2 * time_constant) + SHIFT_KF - SHIFT_USEC;
+
+		time_freq += shiftR(offset_ppm, damping);
+
+	} else { /* calibration interval out of bounds (p. 12) */
+		ret = TIME_ERROR;
+	}
+
+	/* bound time_freq */
+	time_freq = min(time_freq, time_tolerance);
+	time_freq = max(time_freq, -time_tolerance);
+
+	return ret;
+}
+
+
 /* adjtimex mainly allows reading (and writing, if superuser) of
  * kernel time-keeping variables. used by xntpd.
  */
 int ntp_adjtimex(struct timex *txc)
 {
-	long ltemp, mtemp, save_adjust;
+	long save_adjust;
 	int result;
 
 	/* Now we validate the data before disabling interrupts */
@@ -321,63 +388,9 @@ int ntp_adjtimex(struct timex *txc)
 				/* adjtime() is independent from ntp_adjtime() */
 				if ((time_next_adjust = txc->offset) == 0)
 					time_adjust = 0;
-			} else if (time_status & STA_PLL) {
-				ltemp = txc->offset;
-
-				/*
-				 * Scale the phase adjustment and
-				 * clamp to the operating range.
-				 */
-				if (ltemp > MAXPHASE)
-					time_offset = MAXPHASE << SHIFT_UPDATE;
-				else if (ltemp < -MAXPHASE)
-					time_offset = -(MAXPHASE
-							<< SHIFT_UPDATE);
-				else
-					time_offset = ltemp << SHIFT_UPDATE;
-
-				/*
-				 * Select whether the frequency is to be controlled
-				 * and in which mode (PLL or FLL). Clamp to the operating
-				 * range. Ugly multiply/divide should be replaced someday.
-				 */
-
-				if (time_status & STA_FREQHOLD || time_reftime == 0)
-					time_reftime = xtime.tv_sec;
-
-				mtemp = xtime.tv_sec - time_reftime;
-				time_reftime = xtime.tv_sec;
-
-				if (time_status & STA_FLL) {
-					if (mtemp >= MINSEC) {
-						ltemp = (time_offset / mtemp) << (SHIFT_USEC -
-									SHIFT_UPDATE);
-						if (ltemp < 0)
-							time_freq -= -ltemp >> SHIFT_KH;
-						else
-							time_freq += ltemp >> SHIFT_KH;
-					} else /* calibration interval too short (p. 12) */
-						result = TIME_ERROR;
-				} else {	/* PLL mode */
-					if (mtemp < MAXSEC) {
-						ltemp *= mtemp;
-						if (ltemp < 0)
-							time_freq -= -ltemp >> (time_constant +
-									time_constant +
-									SHIFT_KF - SHIFT_USEC);
-					    else
-				    	    time_freq += ltemp >> (time_constant +
-								       time_constant +
-								       SHIFT_KF - SHIFT_USEC);
-					} else /* calibration interval too long (p. 12) */
-						result = TIME_ERROR;
-				}
-
-				if (time_freq > time_tolerance)
-					time_freq = time_tolerance;
-				else if (time_freq < -time_tolerance)
-					time_freq = -time_tolerance;
-			} /* STA_PLL */
+				else if (ntp_hardupdate(txc->offset, xtime))
+					result = TIME_ERROR;
+			}
 		} /* txc->modes & ADJ_OFFSET */
 
 		if (txc->modes & ADJ_TICK) {



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [RFC][PATCH - 5/13] NTP cleanup: Break out leapsecond processing
  2005-08-11  1:27       ` [RFC][PATCH - 4/13] NTP cleanup: Breakup ntp_adjtimex() john stultz
@ 2005-08-11  1:28         ` john stultz
  2005-08-11  1:28           ` [RFC][PATCH - 6/13] NTP cleanup: Clean up ntp_adjtimex() arguement checking john stultz
  2005-08-16  2:08         ` [RFC][PATCH - 4/13] NTP cleanup: Breakup ntp_adjtimex() john stultz
  1 sibling, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  1:28 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,
	This patch breaks the leapsecond processing logic into its own
function. By making the NTP code avoid making any direct changes to
time, instead allowing the time code to use NTP to decide when to change
time, we better isolate the NTP subsystem.

Any comments or feedback would be greatly appreciated.

thanks
-john


linux-2.6.13-rc6_timeofday-ntp-part5_B5.patch
============================================
diff --git a/include/linux/ntp.h b/include/linux/ntp.h
--- a/include/linux/ntp.h
+++ b/include/linux/ntp.h
@@ -18,6 +18,7 @@
 void ntp_advance(void);
 int ntp_adjtimex(struct timex*);
 void second_overflow(void);
+int ntp_leapsecond(struct timespec now);
 void ntp_clear(void);
 int ntp_synced(void);
 long ntp_get_fixed_ns_adjustment(void);
diff --git a/kernel/ntp.c b/kernel/ntp.c
--- a/kernel/ntp.c
+++ b/kernel/ntp.c
@@ -40,12 +40,6 @@
 #include <linux/jiffies.h>
 #include <linux/errno.h>
 
-#ifdef CONFIG_TIME_INTERPOLATION
-void time_interpolator_update(long delta_nsec);
-#else
-#define time_interpolator_update(x) do {} while (0)
-#endif
-
 /* Don't completely fail for HZ > 500.  */
 int tickadj = 500/HZ ? : 1;		/* microsecs */
 
@@ -71,6 +65,8 @@ static long time_next_adjust;
 
 static long fixed_tick_ns_adj;
 
+#define SEC_PER_DAY 86400
+
 void ntp_advance(void)
 {
 	long time_adjust_step;
@@ -139,59 +135,6 @@ void second_overflow(void)
 	}
 
 	/*
-	 * Leap second processing. If in leap-insert state at
-	 * the end of the day, the system clock is set back one
-	 * second; if in leap-delete state, the system clock is
-	 * set ahead one second. The microtime() routine or
-	 * external clock driver will insure that reported time
-	 * is always monotonic. The ugly divides should be
-	 * replaced.
-	 */
-	switch (time_state) {
-
-	case TIME_OK:
-		if (time_status & STA_INS)
-			time_state = TIME_INS;
-		else if (time_status & STA_DEL)
-			time_state = TIME_DEL;
-		break;
-
-	case TIME_INS:
-		if (xtime.tv_sec % 86400 == 0) {
-			xtime.tv_sec--;
-			wall_to_monotonic.tv_sec++;
-			/* The timer interpolator will make time change gradually instead
-			 * of an immediate jump by one second.
-			 */
-			time_interpolator_update(-NSEC_PER_SEC);
-			time_state = TIME_OOP;
-			clock_was_set();
-			printk(KERN_NOTICE "Clock: inserting leap second 23:59:60 UTC\n");
-		}
-		break;
-
-	case TIME_DEL:
-		if ((xtime.tv_sec + 1) % 86400 == 0) {
-			xtime.tv_sec++;
-			wall_to_monotonic.tv_sec--;
-			/* Use of time interpolator for a gradual change of time */
-			time_interpolator_update(NSEC_PER_SEC);
-			time_state = TIME_WAIT;
-			clock_was_set();
-			printk(KERN_NOTICE "Clock: deleting leap second 23:59:59 UTC\n");
-		}
-		break;
-
-	case TIME_OOP:
-		time_state = TIME_WAIT;
-		break;
-
-	case TIME_WAIT:
-		if (!(time_status & (STA_INS | STA_DEL)))
-			time_state = TIME_OK;
-	}
-
-	/*
 	 * Compute the phase adjustment for the next second. In
 	 * PLL mode, the offset is reduced by a fixed factor
 	 * times the time constant. In FLL mode the offset is
@@ -435,6 +378,70 @@ leave:
 	return result;
 }
 
+
+/**
+ * ntp_leapsecond - NTP leapsecond processing code.
+ * now: the current time
+ *
+ * Returns the number of seconds (-1, 0, or 1) that
+ * should be added to the current time to properly
+ * adjust for leapseconds.
+ */
+int ntp_leapsecond(struct timespec now)
+{
+	/*
+	 * Leap second processing. If in leap-insert state at
+	 * the end of the day, the system clock is set back one
+	 * second; if in leap-delete state, the system clock is
+	 * set ahead one second.
+	 */
+	static time_t leaptime = 0;
+
+	switch (time_state) {
+	case TIME_OK:
+		if (time_status & STA_INS)
+			time_state = TIME_INS;
+		else if (time_status & STA_DEL)
+			time_state = TIME_DEL;
+
+		/* calculate end of today (23:59:59)*/
+		leaptime = now.tv_sec + SEC_PER_DAY -
+					(now.tv_sec % SEC_PER_DAY) - 1;
+		break;
+
+	case TIME_INS:
+		/* Once we are at (or past) leaptime, insert the second */
+		if (now.tv_sec >= leaptime) {
+			time_state = TIME_OOP;
+			printk(KERN_NOTICE "Clock: inserting leap second 23:59:60 UTC\n");
+			return -1;
+		}
+		break;
+
+	case TIME_DEL:
+		/* Once we are at (or past) leaptime, delete the second */
+		if (now.tv_sec >= leaptime) {
+			time_state = TIME_WAIT;
+			printk(KERN_NOTICE "Clock: deleting leap second 23:59:59 UTC\n");
+			return 1;
+		}
+		break;
+
+	case TIME_OOP:
+		/*  Wait for the end of the leap second*/
+		if (now.tv_sec > (leaptime + 1))
+			time_state = TIME_WAIT;
+		break;
+
+	case TIME_WAIT:
+		if (!(time_status & (STA_INS | STA_DEL)))
+			time_state = TIME_OK;
+	}
+
+	return 0;
+}
+
+
 /**
  * ntp_clear - Clears the NTP state machine.
  *
diff --git a/kernel/timer.c b/kernel/timer.c
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -42,7 +42,7 @@
 #include <asm/io.h>
 
 #ifdef CONFIG_TIME_INTERPOLATION
-void time_interpolator_update(long delta_nsec);
+static void time_interpolator_update(long delta_nsec);
 #else
 #define time_interpolator_update(x)
 #endif
@@ -626,9 +626,19 @@ static void update_wall_time(unsigned lo
 		ticks--;
 		update_wall_time_one_tick();
 		if (xtime.tv_nsec >= 1000000000) {
+			int leapsecond;
 			xtime.tv_nsec -= 1000000000;
 			xtime.tv_sec++;
 			second_overflow();
+
+			/* apply leapsecond if appropriate */
+			leapsecond = ntp_leapsecond(xtime);
+			if (leapsecond) {
+				xtime.tv_sec += leapsecond;
+				wall_to_monotonic.tv_sec -= leapsecond;
+				time_interpolator_update(leapsecond * NSEC_PER_SEC);
+				clock_was_set();
+			}
 		}
 	} while (ticks);
 }
@@ -1274,7 +1284,7 @@ unsigned long time_interpolator_get_offs
 #define INTERPOLATOR_ADJUST 65536
 #define INTERPOLATOR_MAX_SKIP 10*INTERPOLATOR_ADJUST
 
-void time_interpolator_update(long delta_nsec)
+static void time_interpolator_update(long delta_nsec)
 {
 	u64 counter;
 	unsigned long offset;



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [RFC][PATCH - 6/13] NTP cleanup: Clean up ntp_adjtimex() arguement checking
  2005-08-11  1:28         ` [RFC][PATCH - 5/13] NTP cleanup: Break out leapsecond processing john stultz
@ 2005-08-11  1:28           ` john stultz
  2005-08-11  1:31             ` [RFC][PATCH - 7/13] NTP cleanup: Cleanup signed shifting logic john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  1:28 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All
	Currently ntp_adjtimex() checks the validity of a few arguments values
then takes the xtime_lock then checks the validity of more arguements
while it parses them. This separates the logic so we check the validity
of all arguements before aquiring the xtime lock. This greatly improves
the readability of the code.

Any comments or feedback would be greatly appreciated.

thanks
-john


linux-2.6.13-rc6_timeofday-ntp-part6_B5.patch
============================================
diff --git a/kernel/ntp.c b/kernel/ntp.c
--- a/kernel/ntp.c
+++ b/kernel/ntp.c
@@ -264,21 +264,44 @@ int ntp_adjtimex(struct timex *txc)
 
 	/* Now we validate the data before disabling interrupts */
 
-	if ((txc->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT)
-		/* singleshot must not be used with any other mode bits */
-		if (txc->modes != ADJ_OFFSET_SINGLESHOT)
-			return -EINVAL;
-
-	if (txc->modes != ADJ_OFFSET_SINGLESHOT && (txc->modes & ADJ_OFFSET))
-		/* adjustment Offset limited to +- .512 seconds */
-		if (txc->offset <= - MAXPHASE || txc->offset >= MAXPHASE )
-			return -EINVAL;
-
-	/* if the quartz is off by more than 10% something is VERY wrong ! */
-	if (txc->modes & ADJ_TICK)
-		if (txc->tick <  900000/USER_HZ ||
-				txc->tick > 1100000/USER_HZ)
-			return -EINVAL;
+	/* frequency adjustment limited to +/- MAXFREQ */
+	if ((txc->modes & ADJ_FREQUENCY)
+			&& (abs(txc->freq) > MAXFREQ))
+		return -EINVAL;
+
+	/* maxerror adjustment limited to NTP_PHASE_LIMIT */
+	if ((txc->modes & ADJ_MAXERROR)
+			&& (txc->maxerror < 0
+				|| txc->maxerror >= NTP_PHASE_LIMIT))
+		return -EINVAL;
+
+	/* esterror adjustment limited to NTP_PHASE_LIMIT */
+	if ((txc->modes & ADJ_ESTERROR)
+			&& (txc->esterror < 0
+				|| txc->esterror >= NTP_PHASE_LIMIT))
+		return -EINVAL;
+
+	/* constant adjustment must be positive */
+	if ((txc->modes & ADJ_TIMECONST)
+			&& (txc->constant < 0)) /* NTP v4 uses values > 6 */
+		return -EINVAL;
+
+	/* Single shot mode can only be used by itself */
+	if (((txc->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT)
+			&& (txc->modes != ADJ_OFFSET_SINGLESHOT))
+		return -EINVAL;
+
+	/* offset adjustment limited to +/- MAXPHASE */
+	if ((txc->modes != ADJ_OFFSET_SINGLESHOT)
+			&& (txc->modes & ADJ_OFFSET)
+			&& (abs(txc->offset)>= MAXPHASE))
+		return -EINVAL;
+
+	/* tick adjustment limited to 10% */
+	if ((txc->modes & ADJ_TICK)
+			&& ((txc->tick < 900000/USER_HZ)
+				||(txc->tick > 11000000/USER_HZ)))
+		return -EINVAL;
 
 	write_seqlock_irq(&xtime_lock);
 	result = time_state;       /* mostly `TIME_OK' */
@@ -287,65 +310,42 @@ int ntp_adjtimex(struct timex *txc)
 	save_adjust = time_next_adjust ? time_next_adjust : time_adjust;
 
 	/* If there are input parameters, then process them */
-	if (txc->modes)	{
-		if (txc->modes & ADJ_STATUS)	/* only set allowed bits */
-			time_status =  (txc->status & ~STA_RONLY) |
-					(time_status & STA_RONLY);
-
-		if (txc->modes & ADJ_FREQUENCY) {	/* p. 22 */
-			if (txc->freq > MAXFREQ || txc->freq < -MAXFREQ) {
-				result = -EINVAL;
-				goto leave;
-			}
-			time_freq = txc->freq;
-		}
-
-		if (txc->modes & ADJ_MAXERROR) {
-			if (txc->maxerror < 0
-					|| txc->maxerror >= NTP_PHASE_LIMIT) {
-				result = -EINVAL;
-				goto leave;
-			}
-			time_maxerror = txc->maxerror;
-		}
-
-		if (txc->modes & ADJ_ESTERROR) {
-			if (txc->esterror < 0
-					|| txc->esterror >= NTP_PHASE_LIMIT) {
-				result = -EINVAL;
-				goto leave;
-			}
-			time_esterror = txc->esterror;
-		}
 
-		if (txc->modes & ADJ_TIMECONST) {	/* p. 24 */
-			if (txc->constant < 0) { /* NTP v4 uses values > 6 */
-				result = -EINVAL;
-				goto leave;
-			}
-			time_constant = txc->constant;
-		}
+	if (txc->modes & ADJ_STATUS)	/* only set allowed bits */
+		time_status = (txc->status & ~STA_RONLY) |
+				(time_status & STA_RONLY);
+
+	if (txc->modes & ADJ_FREQUENCY)
+		time_freq = txc->freq;
+
+	if (txc->modes & ADJ_MAXERROR)
+		time_maxerror = txc->maxerror;
+
+	if (txc->modes & ADJ_ESTERROR)
+		time_esterror = txc->esterror;
+
+	if (txc->modes & ADJ_TIMECONST)
+		time_constant = txc->constant;
+
+	if (txc->modes & ADJ_OFFSET) {
+		if (txc->modes == ADJ_OFFSET_SINGLESHOT) {
+			/* adjtime() is independent from ntp_adjtime() */
+			if ((time_next_adjust = txc->offset) == 0)
+				time_adjust = 0;
+		} else if (ntp_hardupdate(txc->offset, xtime))
+			result = TIME_ERROR;
+	}
 
-		if (txc->modes & ADJ_OFFSET) { /* values checked earlier */
-			if (txc->modes == ADJ_OFFSET_SINGLESHOT) {
-				/* adjtime() is independent from ntp_adjtime() */
-				if ((time_next_adjust = txc->offset) == 0)
-					time_adjust = 0;
-				else if (ntp_hardupdate(txc->offset, xtime))
-					result = TIME_ERROR;
-			}
-		} /* txc->modes & ADJ_OFFSET */
-
-		if (txc->modes & ADJ_TICK) {
-			tick_usec = txc->tick;
-			tick_nsec = TICK_USEC_TO_NSEC(tick_usec);
-		}
-	} /* txc->modes */
-leave:
+	if (txc->modes & ADJ_TICK) {
+		tick_usec = txc->tick;
+		tick_nsec = TICK_USEC_TO_NSEC(tick_usec);
+	}
 
 	if ((time_status & (STA_UNSYNC|STA_CLOCKERR)) != 0)
 		result = TIME_ERROR;
 
+	/* write kernel state to user timex values*/
+
 	if ((txc->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT)
 		txc->offset = save_adjust;
 	else {



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [RFC][PATCH - 7/13] NTP cleanup: Cleanup signed shifting logic
  2005-08-11  1:28           ` [RFC][PATCH - 6/13] NTP cleanup: Clean up ntp_adjtimex() arguement checking john stultz
@ 2005-08-11  1:31             ` john stultz
  2005-08-11  1:31               ` [RFC][PATCH - 8/13] NTP cleanup: Integrate second_overflow() logic john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  1:31 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,
	Signed shifting must be done carefully, and the ntp code has quite a
number of conditionals to do the signed shifting. This patch makes use
of the shiftR() macro introduced in a previous patch to simplify a bit
of logic.

Any comments or feedback would be greatly appreciated.

thanks
-john


linux-2.6.13-rc6_timeofday-ntp-part7_B5.patch
============================================
diff --git a/kernel/ntp.c b/kernel/ntp.c
--- a/kernel/ntp.c
+++ b/kernel/ntp.c
@@ -162,28 +162,19 @@ void second_overflow(void)
 	}
 
 	ltemp = time_freq;
-	if (ltemp < 0)
-		time_adj -= -ltemp >> (SHIFT_USEC + SHIFT_HZ - SHIFT_SCALE);
-	else
-		time_adj += ltemp >> (SHIFT_USEC + SHIFT_HZ - SHIFT_SCALE);
+	time_adj += shiftR(ltemp, (SHIFT_USEC + SHIFT_HZ - SHIFT_SCALE));
 
 #if HZ == 100
     /* Compensate for (HZ==100) != (1 << SHIFT_HZ).
      * Add 25% and 3.125% to get 128.125; => only 0.125% error (p. 14)
      */
-	if (time_adj < 0)
-		time_adj -= (-time_adj >> 2) + (-time_adj >> 5);
-	else
-		time_adj += (time_adj >> 2) + (time_adj >> 5);
+	time_adj += shiftR(time_adj,2) + shiftR(time_adj,5);
 #endif
 #if HZ == 1000
     /* Compensate for (HZ==1000) != (1 << SHIFT_HZ).
      * Add 1.5625% and 0.78125% to get 1023.4375; => only 0.05% error (p. 14)
      */
-	if (time_adj < 0)
-		time_adj -= (-time_adj >> 6) + (-time_adj >> 7);
-	else
-		time_adj += (time_adj >> 6) + (time_adj >> 7);
+	time_adj += shiftR(time_adj,6) + shiftR(time_adj,7);
 #endif
 }
 
@@ -349,10 +340,7 @@ int ntp_adjtimex(struct timex *txc)
 	if ((txc->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT)
 		txc->offset = save_adjust;
 	else {
-		if (time_offset < 0)
-			txc->offset = -(-time_offset >> SHIFT_UPDATE);
-		else
-			txc->offset = time_offset >> SHIFT_UPDATE;
+		txc->offset = shiftR(time_offset, SHIFT_UPDATE);
 	}
 	txc->freq = time_freq;
 	txc->maxerror = time_maxerror;



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [RFC][PATCH - 8/13] NTP cleanup: Integrate second_overflow() logic
  2005-08-11  1:31             ` [RFC][PATCH - 7/13] NTP cleanup: Cleanup signed shifting logic john stultz
@ 2005-08-11  1:31               ` john stultz
  2005-08-11  1:33                 ` [RFC][PATCH - 9/13] NTP cleanup: Improve NTP variable names john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  1:31 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All
	This patch removes the second_overflow() logic integrating it into the
ntp_advance() function. This provides a single interface to advance the
internal NTP state machine.

Any comments or feedback would be greatly appreciated.

thanks
-john


linux-2.6.13-rc6_timeofday-ntp-part8_B5.patch
============================================
diff --git a/include/linux/ntp.h b/include/linux/ntp.h
--- a/include/linux/ntp.h
+++ b/include/linux/ntp.h
@@ -15,9 +15,8 @@
 		(__x < 0) ? (-((-__x) >> (__s))) : ((__x) >> (__s));})
 
 /* NTP state machine interfaces */
-void ntp_advance(void);
+void ntp_advance(unsigned long interval_nsec);
 int ntp_adjtimex(struct timex*);
-void second_overflow(void);
 int ntp_leapsecond(struct timespec now);
 void ntp_clear(void);
 int ntp_synced(void);
diff --git a/kernel/ntp.c b/kernel/ntp.c
--- a/kernel/ntp.c
+++ b/kernel/ntp.c
@@ -30,6 +30,11 @@
 *  http://www.eecis.udel.edu/~mills/database/rfc/rfc1589.txt
 *  http://www.eecis.udel.edu/~mills/database/reports/kern/kernb.pdf
 *
+* The tricky bits of code to handle the accurate clock support
+* were provided by Dave Mills (Mills@UDEL.EDU) of NTP fame.
+* They were originally developed for SUN and DEC kernels.
+* All the kudos should go to Dave for this stuff.
+*
 * NOTE:	To simplify the code, we do not implement any of
 * the PPS code, as the code that uses it never was merged.
 *                             -johnstul@us.ibm.com
@@ -67,10 +72,69 @@ static long fixed_tick_ns_adj;
 
 #define SEC_PER_DAY 86400
 
-void ntp_advance(void)
+void ntp_advance(unsigned long interval_nsec)
 {
+	static unsigned long interval_sum = 0;
 	long time_adjust_step;
 
+
+	/* Some components of the NTP state machine are advanced
+	 * in full second increments (this is a hold-over from
+	 * the old second_overflow() code)
+	 *
+	 * XXX - I'd prefer to smoothly apply this math at each
+	 * call to ntp_advance() rather then each second.
+	 */
+	interval_sum += interval_nsec;
+	while (interval_sum > NSEC_PER_SEC) {
+		long next_adj;
+		interval_sum -= NSEC_PER_SEC;
+
+		/* Bump the maxerror field */
+		time_maxerror += time_tolerance >> SHIFT_USEC;
+		if ( time_maxerror > NTP_PHASE_LIMIT ) {
+			time_maxerror = NTP_PHASE_LIMIT;
+			time_status |= STA_UNSYNC;
+		}
+
+		/*
+		 * Compute the phase adjustment for the next second. In
+		 * PLL mode, the offset is reduced by a fixed factor
+		 * times the time constant. In FLL mode the offset is
+		 * used directly. In either mode, the maximum phase
+		 * adjustment for each second is clamped so as to spread
+		 * the adjustment over not more than the number of
+		 * seconds between updates.
+		 */
+		next_adj = time_offset;
+		if (!(time_status & STA_FLL))
+			next_adj = shiftR(next_adj, SHIFT_KG + time_constant);
+		next_adj = min(next_adj, (MAXPHASE / MINSEC) << SHIFT_UPDATE);
+		next_adj = max(next_adj, -(MAXPHASE / MINSEC) << SHIFT_UPDATE);
+		time_offset -= next_adj;
+
+		time_adj = next_adj << (SHIFT_SCALE - SHIFT_HZ - SHIFT_UPDATE);
+
+		time_adj += shiftR(time_freq, (SHIFT_USEC + SHIFT_HZ - SHIFT_SCALE));
+
+#if HZ == 100
+    	/* Compensate for (HZ==100) != (1 << SHIFT_HZ).
+	     * Add 25% and 3.125% to get 128.125;
+		 * => only 0.125% error (p. 14)
+    	 */
+		time_adj += shiftR(time_adj,2) + shiftR(time_adj,5);
+#endif
+#if HZ == 1000
+	    /* Compensate for (HZ==1000) != (1 << SHIFT_HZ).
+    	 * Add 1.5625% and 0.78125% to get 1023.4375;
+		 * => only 0.05% error (p. 14)
+	     */
+		time_adj += shiftR(time_adj,6) + shiftR(time_adj,7);
+#endif
+
+	}
+
+
 	if ( (time_adjust_step = time_adjust) != 0 ) {
 	    /* We are doing an adjtime thing.
 	     *
@@ -113,71 +177,6 @@ void ntp_advance(void)
 	}
 }
 
-
-/*
- * this routine handles the overflow of the microsecond field
- *
- * The tricky bits of code to handle the accurate clock support
- * were provided by Dave Mills (Mills@UDEL.EDU) of NTP fame.
- * They were originally developed for SUN and DEC kernels.
- * All the kudos should go to Dave for this stuff.
- *
- */
-void second_overflow(void)
-{
-	long ltemp;
-
-	/* Bump the maxerror field */
-	time_maxerror += time_tolerance >> SHIFT_USEC;
-	if ( time_maxerror > NTP_PHASE_LIMIT ) {
-		time_maxerror = NTP_PHASE_LIMIT;
-		time_status |= STA_UNSYNC;
-	}
-
-	/*
-	 * Compute the phase adjustment for the next second. In
-	 * PLL mode, the offset is reduced by a fixed factor
-	 * times the time constant. In FLL mode the offset is
-	 * used directly. In either mode, the maximum phase
-	 * adjustment for each second is clamped so as to spread
-	 * the adjustment over not more than the number of
-	 * seconds between updates.
-	 */
-	if (time_offset < 0) {
-		ltemp = -time_offset;
-		if (!(time_status & STA_FLL))
-			ltemp >>= SHIFT_KG + time_constant;
-		if (ltemp > (MAXPHASE / MINSEC) << SHIFT_UPDATE)
-			ltemp = (MAXPHASE / MINSEC) << SHIFT_UPDATE;
-		time_offset += ltemp;
-		time_adj = -ltemp << (SHIFT_SCALE - SHIFT_HZ - SHIFT_UPDATE);
-	} else {
-		ltemp = time_offset;
-		if (!(time_status & STA_FLL))
-			ltemp >>= SHIFT_KG + time_constant;
-		if (ltemp > (MAXPHASE / MINSEC) << SHIFT_UPDATE)
-			ltemp = (MAXPHASE / MINSEC) << SHIFT_UPDATE;
-		time_offset -= ltemp;
-		time_adj = ltemp << (SHIFT_SCALE - SHIFT_HZ - SHIFT_UPDATE);
-	}
-
-	ltemp = time_freq;
-	time_adj += shiftR(ltemp, (SHIFT_USEC + SHIFT_HZ - SHIFT_SCALE));
-
-#if HZ == 100
-    /* Compensate for (HZ==100) != (1 << SHIFT_HZ).
-     * Add 25% and 3.125% to get 128.125; => only 0.125% error (p. 14)
-     */
-	time_adj += shiftR(time_adj,2) + shiftR(time_adj,5);
-#endif
-#if HZ == 1000
-    /* Compensate for (HZ==1000) != (1 << SHIFT_HZ).
-     * Add 1.5625% and 0.78125% to get 1023.4375; => only 0.05% error (p. 14)
-     */
-	time_adj += shiftR(time_adj,6) + shiftR(time_adj,7);
-#endif
-}
-
 /**
  * ntp_hardupdate - Calculates the offset and freq values
  * offset: current offset
diff --git a/kernel/timer.c b/kernel/timer.c
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -604,7 +604,7 @@ static void update_wall_time_one_tick(vo
 {
 	long delta_nsec;
 
-	ntp_advance();
+	ntp_advance(tick_nsec);
 
 	delta_nsec = tick_nsec + ntp_get_fixed_ns_adjustment();
 
@@ -629,7 +629,6 @@ static void update_wall_time(unsigned lo
 			int leapsecond;
 			xtime.tv_nsec -= 1000000000;
 			xtime.tv_sec++;
-			second_overflow();
 
 			/* apply leapsecond if appropriate */
 			leapsecond = ntp_leapsecond(xtime);



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [RFC][PATCH - 9/13] NTP cleanup: Improve NTP variable names
  2005-08-11  1:31               ` [RFC][PATCH - 8/13] NTP cleanup: Integrate second_overflow() logic john stultz
@ 2005-08-11  1:33                 ` john stultz
  2005-08-11  1:33                   ` [RFC][PATCH - 10/13] NTP cleanup: Use ntp_lock instead of xtime_lock john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  1:33 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All
	This patch changes the NTP variable names from time_* to ntp_* further
clarifing their use. 

Any comments or feedback would be greatly appreciated.

thanks
-john


linux-2.6.13-rc6_timeofday-ntp-part9_B5.patch
============================================
diff --git a/arch/cris/kernel/time.c b/arch/cris/kernel/time.c
--- a/arch/cris/kernel/time.c
+++ b/arch/cris/kernel/time.c
@@ -69,11 +69,11 @@ void do_gettimeofday(struct timeval *tv)
 	}
 
         /*
-	 * If time_adjust is negative then NTP is slowing the clock
+	 * If ntp_adjtime_offset is negative then NTP is slowing the clock
 	 * so make sure not to go into next possible interval.
 	 * Better to lose some accuracy than have time go backwards..
 	 */
-	if (unlikely(time_adjust < 0) && usec > tickadj)
+	if (unlikely(ntp_adjtime_offset < 0) && usec > tickadj)
 		usec = tickadj;
 
 	sec = xtime.tv_sec;
diff --git a/arch/frv/kernel/time.c b/arch/frv/kernel/time.c
--- a/arch/frv/kernel/time.c
+++ b/arch/frv/kernel/time.c
@@ -166,11 +166,11 @@ void do_gettimeofday(struct timeval *tv)
 		lost = jiffies - wall_jiffies;
 
 		/*
-		 * If time_adjust is negative then NTP is slowing the clock
+		 * If ntp_adjtime_offset is negative then NTP is slowing the clock
 		 * so make sure not to go into next possible interval.
 		 * Better to lose some accuracy than have time go backwards..
 		 */
-		if (unlikely(time_adjust < 0)) {
+		if (unlikely(ntp_adjtime_offset < 0)) {
 			max_ntp_tick = (USEC_PER_SEC / HZ) - tickadj;
 			usec = min(usec, max_ntp_tick);
 
diff --git a/arch/i386/kernel/time.c b/arch/i386/kernel/time.c
--- a/arch/i386/kernel/time.c
+++ b/arch/i386/kernel/time.c
@@ -142,11 +142,11 @@ void do_gettimeofday(struct timeval *tv)
 		lost = jiffies - wall_jiffies;
 
 		/*
-		 * If time_adjust is negative then NTP is slowing the clock
+		 * If ntp_adjtime_offset is negative then NTP is slowing the clock
 		 * so make sure not to go into next possible interval.
 		 * Better to lose some accuracy than have time go backwards..
 		 */
-		if (unlikely(time_adjust < 0)) {
+		if (unlikely(ntp_adjtime_offset < 0)) {
 			max_ntp_tick = (USEC_PER_SEC / HZ) - tickadj;
 			usec = min(usec, max_ntp_tick);
 
diff --git a/arch/m32r/kernel/time.c b/arch/m32r/kernel/time.c
--- a/arch/m32r/kernel/time.c
+++ b/arch/m32r/kernel/time.c
@@ -122,11 +122,11 @@ void do_gettimeofday(struct timeval *tv)
 		lost = jiffies - wall_jiffies;
 
 		/*
-		 * If time_adjust is negative then NTP is slowing the clock
+		 * If ntp_adjtime_offset is negative then NTP is slowing the clock
 		 * so make sure not to go into next possible interval.
 		 * Better to lose some accuracy than have time go backwards..
 		 */
-		if (unlikely(time_adjust < 0)) {
+		if (unlikely(ntp_adjtime_offset < 0)) {
 			usec = min(usec, max_ntp_tick);
 			if (lost)
 				usec += lost * max_ntp_tick;
diff --git a/arch/mips/kernel/time.c b/arch/mips/kernel/time.c
--- a/arch/mips/kernel/time.c
+++ b/arch/mips/kernel/time.c
@@ -171,11 +171,11 @@ void do_gettimeofday(struct timeval *tv)
 		lost = jiffies - wall_jiffies;
 
 		/*
-		 * If time_adjust is negative then NTP is slowing the clock
+		 * If ntp_adjtime_offset is negative then NTP is slowing the clock
 		 * so make sure not to go into next possible interval.
 		 * Better to lose some accuracy than have time go backwards..
 		 */
-		if (unlikely(time_adjust < 0)) {
+		if (unlikely(ntp_adjtime_offset < 0)) {
 			usec = min(usec, max_ntp_tick);
 
 			if (lost)
diff --git a/arch/ppc64/kernel/time.c b/arch/ppc64/kernel/time.c
--- a/arch/ppc64/kernel/time.c
+++ b/arch/ppc64/kernel/time.c
@@ -355,7 +355,7 @@ int timer_interrupt(struct pt_regs * reg
 			timer_sync_xtime(lpaca->next_jiffy_update_tb);
 			timer_check_rtc();
 			write_sequnlock(&xtime_lock);
-			if ( adjusting_time && (time_adjust == 0) )
+			if ( adjusting_time && (ntp_adjtime_offset == 0) )
 				ppc_adjtimex();
 		}
 		lpaca->next_jiffy_update_tb += tb_ticks_per_jiffy;
@@ -582,7 +582,7 @@ void __init time_init(void)
 	systemcfg->stamp_xsec = xtime.tv_sec * XSEC_PER_SEC;
 	systemcfg->tb_to_xs = tb_to_xs;
 
-	time_freq = 0;
+	ntp_freq = 0;
 
 	xtime.tv_nsec = 0;
 	last_rtc_update = xtime.tv_sec;
@@ -599,7 +599,7 @@ void __init time_init(void)
  * to microseconds to keep do_gettimeofday synchronized 
  * with ntpd.
  *
- * Use the time_adjust, time_freq and time_offset computed by adjtimex to 
+ * Use the ntp_adjtime_offset, ntp_freq and ntp_offset computed by adjtimex to
  * adjust the frequency.
  */
 
@@ -617,32 +617,32 @@ void ppc_adjtimex(void)
 	long singleshot_ppm = 0;
 
 	/* Compute parts per million frequency adjustment to accomplish the time adjustment
-	   implied by time_offset to be applied over the elapsed time indicated by time_constant.
-	   Use SHIFT_USEC to get it into the same units as time_freq. */
-	if ( time_offset < 0 ) {
-		ltemp = -time_offset;
+	   implied by ntp_offset to be applied over the elapsed time indicated by ntp_constant.
+	   Use SHIFT_USEC to get it into the same units as ntp_freq. */
+	if ( ntp_offset < 0 ) {
+		ltemp = -ntp_offset;
 		ltemp <<= SHIFT_USEC - SHIFT_UPDATE;
-		ltemp >>= SHIFT_KG + time_constant;
+		ltemp >>= SHIFT_KG + ntp_constant;
 		ltemp = -ltemp;
 	}
 	else {
-		ltemp = time_offset;
+		ltemp = ntp_offset;
 		ltemp <<= SHIFT_USEC - SHIFT_UPDATE;
-		ltemp >>= SHIFT_KG + time_constant;
+		ltemp >>= SHIFT_KG + ntp_constant;
 	}
 	
 	/* If there is a single shot time adjustment in progress */
-	if ( time_adjust ) {
+	if ( ntp_adjtime_offset ) {
 #ifdef DEBUG_PPC_ADJTIMEX
 		printk("ppc_adjtimex: ");
 		if ( adjusting_time == 0 )
 			printk("starting ");
-		printk("single shot time_adjust = %ld\n", time_adjust);
+		printk("single shot ntp_adjtime_offset = %ld\n", ntp_adjtime_offset);
 #endif	
 	
 		adjusting_time = 1;
 		
-		/* Compute parts per million frequency adjustment to match time_adjust */
+		/* Compute parts per million frequency adjustment to match ntp_adjtime_offset */
 		singleshot_ppm = tickadj * HZ;	
 		/*
 		 * The adjustment should be tickadj*HZ to match the code in
@@ -650,21 +650,21 @@ void ppc_adjtimex(void)
 		 * large. 3/4 of tickadj*HZ seems about right
 		 */
 		singleshot_ppm -= singleshot_ppm / 4;
-		/* Use SHIFT_USEC to get it into the same units as time_freq */	
+		/* Use SHIFT_USEC to get it into the same units as ntp_freq */
 		singleshot_ppm <<= SHIFT_USEC;
-		if ( time_adjust < 0 )
+		if ( ntp_adjtime_offset < 0 )
 			singleshot_ppm = -singleshot_ppm;
 	}
 	else {
 #ifdef DEBUG_PPC_ADJTIMEX
 		if ( adjusting_time )
-			printk("ppc_adjtimex: ending single shot time_adjust\n");
+			printk("ppc_adjtimex: ending single shot ntp_adjtime_offset\n");
 #endif
 		adjusting_time = 0;
 	}
 	
 	/* Add up all of the frequency adjustments */
-	delta_freq = time_freq + ltemp + singleshot_ppm;
+	delta_freq = ntp_freq + ltemp + singleshot_ppm;
 	
 	/* Compute a new value for tb_ticks_per_sec based on the frequency adjustment */
 	den = 1000000 * (1 << (SHIFT_USEC - 8));
@@ -678,7 +678,7 @@ void ppc_adjtimex(void)
 	}
 	
 #ifdef DEBUG_PPC_ADJTIMEX
-	printk("ppc_adjtimex: ltemp = %ld, time_freq = %ld, singleshot_ppm = %ld\n", ltemp, time_freq, singleshot_ppm);
+	printk("ppc_adjtimex: ltemp = %ld, ntp_freq = %ld, singleshot_ppm = %ld\n", ltemp, ntp_freq, singleshot_ppm);
 	printk("ppc_adjtimex: tb_ticks_per_sec - base = %ld  new = %ld\n", tb_ticks_per_sec, new_tb_ticks_per_sec);
 #endif
 				
diff --git a/arch/sparc/kernel/pcic.c b/arch/sparc/kernel/pcic.c
--- a/arch/sparc/kernel/pcic.c
+++ b/arch/sparc/kernel/pcic.c
@@ -783,11 +783,11 @@ static void pci_do_gettimeofday(struct t
 		lost = jiffies - wall_jiffies;
 
 		/*
-		 * If time_adjust is negative then NTP is slowing the clock
+		 * If ntp_adjtime_offset is negative then NTP is slowing the clock
 		 * so make sure not to go into next possible interval.
 		 * Better to lose some accuracy than have time go backwards..
 		 */
-		if (unlikely(time_adjust < 0)) {
+		if (unlikely(ntp_adjtime_offset < 0)) {
 			usec = min(usec, max_ntp_tick);
 
 			if (lost)
diff --git a/arch/sparc/kernel/time.c b/arch/sparc/kernel/time.c
--- a/arch/sparc/kernel/time.c
+++ b/arch/sparc/kernel/time.c
@@ -492,11 +492,11 @@ void do_gettimeofday(struct timeval *tv)
 		lost = jiffies - wall_jiffies;
 
 		/*
-		 * If time_adjust is negative then NTP is slowing the clock
+		 * If ntp_adjtime_offset is negative then NTP is slowing the clock
 		 * so make sure not to go into next possible interval.
 		 * Better to lose some accuracy than have time go backwards..
 		 */
-		if (unlikely(time_adjust < 0)) {
+		if (unlikely(ntp_adjtime_offset < 0)) {
 			usec = min(usec, max_ntp_tick);
 
 			if (lost)
diff --git a/include/linux/ntp.h b/include/linux/ntp.h
--- a/include/linux/ntp.h
+++ b/include/linux/ntp.h
@@ -24,13 +24,13 @@ long ntp_get_fixed_ns_adjustment(void);
 
 
 extern int tickadj;
-extern long time_adjust;
+extern long ntp_adjtime_offset;
 
 /* Due to ppc64 having its own NTP  code,
  * these variables cannot be made static just yet
  */
-extern long time_offset;
-extern long time_freq;
-extern long time_constant;
+extern long ntp_offset;
+extern long ntp_freq;
+extern long ntp_constant;
 
 #endif
diff --git a/kernel/ntp.c b/kernel/ntp.c
--- a/kernel/ntp.c
+++ b/kernel/ntp.c
@@ -47,26 +47,28 @@
 
 /* Don't completely fail for HZ > 500.  */
 int tickadj = 500/HZ ? : 1;		/* microsecs */
-
-/*
- * phase-lock loop variables
- */
-/* TIME_ERROR prevents overwriting the CMOS clock */
-static int time_state = TIME_OK;              /* clock synchronization status */
-static int time_status = STA_UNSYNC;          /* clock status bits */
-long time_offset;                      /* time adjustment (us) */
-long time_constant = 2;                /* pll time constant */
-static long time_tolerance = MAXFREQ;         /* frequency tolerance (ppm) */
-static long time_precision = 1;               /* clock precision (us) */
-static long time_maxerror = NTP_PHASE_LIMIT;  /* maximum error (us) */
-static long time_esterror = NTP_PHASE_LIMIT;  /* estimated error (us) */
-static long time_phase;                       /* phase offset (scaled us) */
-long time_freq = (((NSEC_PER_SEC + HZ/2) % HZ - HZ/2) << SHIFT_USEC) / NSEC_PER_USEC;
-                                        /* frequency offset (scaled ppm) */
+static long time_phase;                 /* phase offset (scaled us) */
 static long time_adj;                   /* tick adjust (scaled 1 / HZ) */
-static long time_reftime;                      /* time at last adjustment (s) */
-long time_adjust;
-static long time_next_adjust;
+
+/* Chapter 5: Kernel Variables [RFC 1589 pg. 28] */
+/* 5.1 Interface Variables */
+static int ntp_status = STA_UNSYNC;             /* status */
+long ntp_offset;                                /* usec */
+long ntp_constant = 2;                          /* ntp magic? */
+static long ntp_maxerror = NTP_PHASE_LIMIT;     /* usec */
+static long ntp_esterror = NTP_PHASE_LIMIT;     /* usec */
+static const long ntp_tolerance	= MAXFREQ;      /* shifted ppm */
+static const long ntp_precision	= 1;            /* constant */
+
+/* 5.2 Phase-Lock Loop Variables */
+long ntp_freq;                                  /* shifted ppm */
+static long ntp_reftime;                        /* sec */
+
+/* Extra values */
+static int ntp_state    = TIME_OK;              /* leapsecond state */
+long ntp_adjtime_offset;
+static long ntp_next_adjtime_offset;
+
 
 static long fixed_tick_ns_adj;
 
@@ -91,10 +93,10 @@ void ntp_advance(unsigned long interval_
 		interval_sum -= NSEC_PER_SEC;
 
 		/* Bump the maxerror field */
-		time_maxerror += time_tolerance >> SHIFT_USEC;
-		if ( time_maxerror > NTP_PHASE_LIMIT ) {
-			time_maxerror = NTP_PHASE_LIMIT;
-			time_status |= STA_UNSYNC;
+		ntp_maxerror += shiftR(ntp_tolerance, SHIFT_USEC);
+		if (ntp_maxerror > NTP_PHASE_LIMIT) {
+			ntp_maxerror = NTP_PHASE_LIMIT;
+			ntp_status |= STA_UNSYNC;
 		}
 
 		/*
@@ -106,16 +108,16 @@ void ntp_advance(unsigned long interval_
 		 * the adjustment over not more than the number of
 		 * seconds between updates.
 		 */
-		next_adj = time_offset;
-		if (!(time_status & STA_FLL))
-			next_adj = shiftR(next_adj, SHIFT_KG + time_constant);
+		next_adj = ntp_offset;
+		if (!(ntp_status & STA_FLL))
+			next_adj = shiftR(next_adj, SHIFT_KG + ntp_constant);
 		next_adj = min(next_adj, (MAXPHASE / MINSEC) << SHIFT_UPDATE);
 		next_adj = max(next_adj, -(MAXPHASE / MINSEC) << SHIFT_UPDATE);
-		time_offset -= next_adj;
+		ntp_offset -= next_adj;
 
 		time_adj = next_adj << (SHIFT_SCALE - SHIFT_HZ - SHIFT_UPDATE);
 
-		time_adj += shiftR(time_freq, (SHIFT_USEC + SHIFT_HZ - SHIFT_SCALE));
+		time_adj += shiftR(ntp_freq, (SHIFT_USEC + SHIFT_HZ - SHIFT_SCALE));
 
 #if HZ == 100
     	/* Compensate for (HZ==100) != (1 << SHIFT_HZ).
@@ -135,23 +137,23 @@ void ntp_advance(unsigned long interval_
 	}
 
 
-	if ( (time_adjust_step = time_adjust) != 0 ) {
+	if ( (time_adjust_step = ntp_adjtime_offset) != 0 ) {
 	    /* We are doing an adjtime thing.
 	     *
 	     * Prepare time_adjust_step to be within bounds.
-	     * Note that a positive time_adjust means we want the clock
+	     * Note that a positive ntp_adjtime_offset means we want the clock
 	     * to run faster.
 	     *
 	     * Limit the amount of the step to be in the range
 	     * -tickadj .. +tickadj
 	     */
-		if (time_adjust > tickadj)
+		if (ntp_adjtime_offset > tickadj)
 			time_adjust_step = tickadj;
-		else if (time_adjust < -tickadj)
+		else if (ntp_adjtime_offset < -tickadj)
 			time_adjust_step = -tickadj;
 
 	    /* Reduce by this step the amount of time left  */
-	    time_adjust -= time_adjust_step;
+	    ntp_adjtime_offset -= time_adjust_step;
 	}
 	fixed_tick_ns_adj = time_adjust_step * 1000;
 
@@ -171,9 +173,9 @@ void ntp_advance(unsigned long interval_
 	}
 
 	/* Changes by adjtime() do not take effect till next tick. */
-	if (time_next_adjust != 0) {
-		time_adjust = time_next_adjust;
-		time_next_adjust = 0;
+	if (ntp_next_adjtime_offset != 0) {
+		ntp_adjtime_offset = ntp_next_adjtime_offset;
+		ntp_next_adjtime_offset = 0;
 	}
 }
 
@@ -194,51 +196,51 @@ static int ntp_hardupdate(long offset, s
 	long current_offset, interval;
 
 	ret = 0;
-	if (!(time_status & STA_PLL))
+	if (!(ntp_status & STA_PLL))
 		return ret;
 
 	current_offset = offset;
 	/* Make sure offset is bounded by MAXPHASE */
 	current_offset = min(current_offset, MAXPHASE);
 	current_offset = max(current_offset, -MAXPHASE);
-	time_offset = current_offset << SHIFT_UPDATE;
+	ntp_offset = current_offset << SHIFT_UPDATE;
 
-	if (time_status & STA_FREQHOLD || time_reftime == 0)
-		time_reftime = tv.tv_sec;
+	if (ntp_status & STA_FREQHOLD || ntp_reftime == 0)
+		ntp_reftime = tv.tv_sec;
 
 	/* calculate seconds since last call to hardupdate */
-	interval = tv.tv_sec - time_reftime;
-	time_reftime = tv.tv_sec;
+	interval = tv.tv_sec - ntp_reftime;
+	ntp_reftime = tv.tv_sec;
 
 	/*
 	 * Select whether the frequency is to be controlled
 	 * and in which mode (PLL or FLL). Clamp to the operating
 	 * range. Ugly multiply/divide should be replaced someday.
 	 */
-	if ((time_status & STA_FLL) && (interval >= MINSEC)) {
+	if ((ntp_status & STA_FLL) && (interval >= MINSEC)) {
 		long offset_ppm;
 
-		offset_ppm = time_offset / interval;
+		offset_ppm = ntp_offset / interval;
 		offset_ppm <<= (SHIFT_USEC - SHIFT_UPDATE);
 
-		time_freq += shiftR(offset_ppm, SHIFT_KH);
+		ntp_freq += shiftR(offset_ppm, SHIFT_KH);
 
-	} else if ((time_status & STA_PLL) && (interval < MAXSEC)) {
+	} else if ((ntp_status & STA_PLL) && (interval < MAXSEC)) {
 		long damping, offset_ppm;
 
 		offset_ppm = offset * interval;
 
-		damping = (2 * time_constant) + SHIFT_KF - SHIFT_USEC;
+		damping = (2 * ntp_constant) + SHIFT_KF - SHIFT_USEC;
 
-		time_freq += shiftR(offset_ppm, damping);
+		ntp_freq += shiftR(offset_ppm, damping);
 
 	} else { /* calibration interval out of bounds (p. 12) */
 		ret = TIME_ERROR;
 	}
 
-	/* bound time_freq */
-	time_freq = min(time_freq, time_tolerance);
-	time_freq = max(time_freq, -time_tolerance);
+	/* bound ntp_freq */
+	ntp_freq = min(ntp_freq, ntp_tolerance);
+	ntp_freq = max(ntp_freq, -ntp_tolerance);
 
 	return ret;
 }
@@ -294,34 +296,34 @@ int ntp_adjtimex(struct timex *txc)
 		return -EINVAL;
 
 	write_seqlock_irq(&xtime_lock);
-	result = time_state;       /* mostly `TIME_OK' */
+	result = ntp_state;       /* mostly `TIME_OK' */
 
 	/* Save for later - semantics of adjtime is to return old value */
-	save_adjust = time_next_adjust ? time_next_adjust : time_adjust;
+	save_adjust = ntp_next_adjtime_offset ? ntp_next_adjtime_offset : ntp_adjtime_offset;
 
 	/* If there are input parameters, then process them */
 
 	if (txc->modes & ADJ_STATUS)	/* only set allowed bits */
-		time_status = (txc->status & ~STA_RONLY) |
-				(time_status & STA_RONLY);
+		ntp_status = (txc->status & ~STA_RONLY) |
+				(ntp_status & STA_RONLY);
 
 	if (txc->modes & ADJ_FREQUENCY)
-		time_freq = txc->freq;
+		ntp_freq = txc->freq;
 
 	if (txc->modes & ADJ_MAXERROR)
-		time_maxerror = txc->maxerror;
+		ntp_maxerror = txc->maxerror;
 
 	if (txc->modes & ADJ_ESTERROR)
-		time_esterror = txc->esterror;
+		ntp_esterror = txc->esterror;
 
 	if (txc->modes & ADJ_TIMECONST)
-		time_constant = txc->constant;
+		ntp_constant = txc->constant;
 
 	if (txc->modes & ADJ_OFFSET) {
 		if (txc->modes == ADJ_OFFSET_SINGLESHOT) {
 			/* adjtime() is independent from ntp_adjtime() */
-			if ((time_next_adjust = txc->offset) == 0)
-				time_adjust = 0;
+			if ((ntp_next_adjtime_offset = txc->offset) == 0)
+				ntp_adjtime_offset = 0;
 		} else if (ntp_hardupdate(txc->offset, xtime))
 			result = TIME_ERROR;
 	}
@@ -331,23 +333,22 @@ int ntp_adjtimex(struct timex *txc)
 		tick_nsec = TICK_USEC_TO_NSEC(tick_usec);
 	}
 
-	if ((time_status & (STA_UNSYNC|STA_CLOCKERR)) != 0)
+	if ((ntp_status & (STA_UNSYNC|STA_CLOCKERR)) != 0)
 		result = TIME_ERROR;
 
 	/* write kernel state to user timex values*/
-
 	if ((txc->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT)
 		txc->offset = save_adjust;
-	else {
-		txc->offset = shiftR(time_offset, SHIFT_UPDATE);
-	}
-	txc->freq = time_freq;
-	txc->maxerror = time_maxerror;
-	txc->esterror = time_esterror;
-	txc->status = time_status;
-	txc->constant = time_constant;
-	txc->precision = time_precision;
-	txc->tolerance = time_tolerance;
+	else
+		txc->offset = shiftR(ntp_offset, SHIFT_UPDATE);
+
+	txc->freq = ntp_freq;
+	txc->maxerror = ntp_maxerror;
+	txc->esterror = ntp_esterror;
+	txc->status = ntp_status;
+	txc->constant = ntp_constant;
+	txc->precision = ntp_precision;
+	txc->tolerance = ntp_tolerance;
 	txc->tick = tick_usec;
 
 	/* PPS is not implemented, so these are zero */
@@ -384,12 +385,12 @@ int ntp_leapsecond(struct timespec now)
 	 */
 	static time_t leaptime = 0;
 
-	switch (time_state) {
+	switch (ntp_state) {
 	case TIME_OK:
-		if (time_status & STA_INS)
-			time_state = TIME_INS;
-		else if (time_status & STA_DEL)
-			time_state = TIME_DEL;
+		if (ntp_status & STA_INS)
+			ntp_state = TIME_INS;
+		else if (ntp_status & STA_DEL)
+			ntp_state = TIME_DEL;
 
 		/* calculate end of today (23:59:59)*/
 		leaptime = now.tv_sec + SEC_PER_DAY -
@@ -399,7 +400,7 @@ int ntp_leapsecond(struct timespec now)
 	case TIME_INS:
 		/* Once we are at (or past) leaptime, insert the second */
 		if (now.tv_sec >= leaptime) {
-			time_state = TIME_OOP;
+			ntp_state = TIME_OOP;
 			printk(KERN_NOTICE "Clock: inserting leap second 23:59:60 UTC\n");
 			return -1;
 		}
@@ -408,7 +409,7 @@ int ntp_leapsecond(struct timespec now)
 	case TIME_DEL:
 		/* Once we are at (or past) leaptime, delete the second */
 		if (now.tv_sec >= leaptime) {
-			time_state = TIME_WAIT;
+			ntp_state = TIME_WAIT;
 			printk(KERN_NOTICE "Clock: deleting leap second 23:59:59 UTC\n");
 			return 1;
 		}
@@ -417,12 +418,12 @@ int ntp_leapsecond(struct timespec now)
 	case TIME_OOP:
 		/*  Wait for the end of the leap second*/
 		if (now.tv_sec > (leaptime + 1))
-			time_state = TIME_WAIT;
+			ntp_state = TIME_WAIT;
 		break;
 
 	case TIME_WAIT:
-		if (!(time_status & (STA_INS | STA_DEL)))
-			time_state = TIME_OK;
+		if (!(ntp_status & (STA_INS | STA_DEL)))
+			ntp_state = TIME_OK;
 	}
 
 	return 0;
@@ -436,10 +437,10 @@ int ntp_leapsecond(struct timespec now)
  */
 void ntp_clear(void)
 {
-	time_adjust = 0;		/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
+	ntp_next_adjtime_offset = 0;		/* stop active adjtime() */
+	ntp_status |= STA_UNSYNC;
+	ntp_maxerror = NTP_PHASE_LIMIT;
+	ntp_esterror = NTP_PHASE_LIMIT;
 }
 
 /**
@@ -448,7 +449,7 @@ void ntp_clear(void)
  */
 int ntp_synced(void)
 {
-	return !(time_status & STA_UNSYNC);
+	return !(ntp_status & STA_UNSYNC);
 }
 
 long ntp_get_fixed_ns_adjustment(void)



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [RFC][PATCH - 10/13] NTP cleanup: Use ntp_lock instead of xtime_lock
  2005-08-11  1:33                 ` [RFC][PATCH - 9/13] NTP cleanup: Improve NTP variable names john stultz
@ 2005-08-11  1:33                   ` john stultz
  2005-08-11  1:35                     ` [RFC][PATCH - 11/13] NTP cleanup: Introduce PPM adjustment variables john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  1:33 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,
	This patch introduces the ntp_lock which replaces the xtime_lock for
serialization in the NTP subsystem. This further isolates the NTP
subsystem from the time subsystem.

Any comments or feedback would be greatly appreciated.

thanks
-john

linux-2.6.13-rc6_timeofday-ntp-part10_B5.patch
============================================
diff --git a/kernel/ntp.c b/kernel/ntp.c
--- a/kernel/ntp.c
+++ b/kernel/ntp.c
@@ -69,6 +69,8 @@ static int ntp_state    = TIME_OK;      
 long ntp_adjtime_offset;
 static long ntp_next_adjtime_offset;
 
+/* lock for the above variables */
+static seqlock_t ntp_lock = SEQLOCK_UNLOCKED;
 
 static long fixed_tick_ns_adj;
 
@@ -78,7 +80,9 @@ void ntp_advance(unsigned long interval_
 {
 	static unsigned long interval_sum = 0;
 	long time_adjust_step;
+	unsigned long flags;
 
+	write_seqlock_irqsave(&ntp_lock, flags);
 
 	/* Some components of the NTP state machine are advanced
 	 * in full second increments (this is a hold-over from
@@ -177,6 +181,8 @@ void ntp_advance(unsigned long interval_
 		ntp_adjtime_offset = ntp_next_adjtime_offset;
 		ntp_next_adjtime_offset = 0;
 	}
+
+	write_sequnlock_irqrestore(&ntp_lock, flags);
 }
 
 /**
@@ -184,13 +190,13 @@ void ntp_advance(unsigned long interval_
  * offset: current offset
  * tv: timeval holding the current time
  *
- * Private function, called only by ntp_adjtimex
+ * Private function, called only by ntp_adjtimex while holding ntp_lock
  *
  * This function is called when an offset adjustment is requested.
  * It calculates the offset adjustment and manipulates the
  * frequency adjustement accordingly.
  */
-static int ntp_hardupdate(long offset, struct timespec tv)
+static int ntp_hardupdate(long offset, struct timeval tv)
 {
 	int ret;
 	long current_offset, interval;
@@ -253,6 +259,7 @@ int ntp_adjtimex(struct timex *txc)
 {
 	long save_adjust;
 	int result;
+	unsigned long flags;
 
 	/* Now we validate the data before disabling interrupts */
 
@@ -295,7 +302,8 @@ int ntp_adjtimex(struct timex *txc)
 				||(txc->tick > 11000000/USER_HZ)))
 		return -EINVAL;
 
-	write_seqlock_irq(&xtime_lock);
+	write_seqlock_irqsave(&ntp_lock, flags);
+
 	result = ntp_state;       /* mostly `TIME_OK' */
 
 	/* Save for later - semantics of adjtime is to return old value */
@@ -324,7 +332,7 @@ int ntp_adjtimex(struct timex *txc)
 			/* adjtime() is independent from ntp_adjtime() */
 			if ((ntp_next_adjtime_offset = txc->offset) == 0)
 				ntp_adjtime_offset = 0;
-		} else if (ntp_hardupdate(txc->offset, xtime))
+		} else if (ntp_hardupdate(txc->offset, txc->time))
 			result = TIME_ERROR;
 	}
 
@@ -361,8 +369,8 @@ int ntp_adjtimex(struct timex *txc)
 	txc->errcnt = 0;
 	txc->stbcnt = 0;
 
-	write_sequnlock_irq(&xtime_lock);
-	do_gettimeofday(&txc->time);
+	write_sequnlock_irqrestore(&ntp_lock, flags);
+
 	return result;
 }
 
@@ -384,6 +392,8 @@ int ntp_leapsecond(struct timespec now)
 	 * set ahead one second.
 	 */
 	static time_t leaptime = 0;
+	unsigned long flags;
+	write_seqlock_irqsave(&ntp_lock, flags);
 
 	switch (ntp_state) {
 	case TIME_OK:
@@ -426,6 +436,7 @@ int ntp_leapsecond(struct timespec now)
 			ntp_state = TIME_OK;
 	}
 
+	write_sequnlock_irqrestore(&ntp_lock, flags);
 	return 0;
 }
 
@@ -433,14 +444,18 @@ int ntp_leapsecond(struct timespec now)
 /**
  * ntp_clear - Clears the NTP state machine.
  *
- * Must be called while holding a write on the xtime_lock
  */
 void ntp_clear(void)
 {
+	unsigned long flags;
+	write_seqlock_irqsave(&ntp_lock, flags);
+
 	ntp_next_adjtime_offset = 0;		/* stop active adjtime() */
 	ntp_status |= STA_UNSYNC;
 	ntp_maxerror = NTP_PHASE_LIMIT;
 	ntp_esterror = NTP_PHASE_LIMIT;
+
+	write_sequnlock_irqrestore(&ntp_lock, flags);
 }
 
 /**
diff --git a/kernel/time.c b/kernel/time.c
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -222,6 +222,11 @@ int do_adjtimex(struct timex *txc)
 	if (txc->modes && !capable(CAP_SYS_TIME))
 		return -EPERM;
 		
+	/* Note: We set tx->time first,
+	 * because ntp_adjtimex uses it
+	 */
+	do_gettimeofday(&txc->time);
+
 	result = ntp_adjtimex(txc);
 	
 	notify_arch_cmos_timer();



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [RFC][PATCH - 11/13] NTP cleanup: Introduce PPM adjustment variables
  2005-08-11  1:33                   ` [RFC][PATCH - 10/13] NTP cleanup: Use ntp_lock instead of xtime_lock john stultz
@ 2005-08-11  1:35                     ` john stultz
  2005-08-11  1:36                       ` [RFC][PATCH - 12/13] NTP cleanup: cleanup ntp_advance() adjtime code john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  1:35 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,
	This patch introduces variables to keep track of the different NTP
adjustment values in PPM units. It also introduces the
ntp_get_ppm_adjustment() interface which returns shifted PPM units. The
patch also changes the ppc64 ppc_adjtimex() function to use
ntp_get_ppm_adjustment(). 
	
	This will likely need careful review from the ppc64 folks. There are
some subtle differences in the calling frequency as well as between what
ntp_get_adjustment() returns and what ppc_adjtimex() used to calculate.

Any comments or feedback would be greatly appreciated.

thanks
-john

linux-2.6.13-rc6_timeofday-ntp-part11_B5.patch
============================================
diff --git a/arch/ppc64/kernel/time.c b/arch/ppc64/kernel/time.c
--- a/arch/ppc64/kernel/time.c
+++ b/arch/ppc64/kernel/time.c
@@ -106,8 +106,6 @@ extern struct timezone sys_tz;
 
 void ppc_adjtimex(void);
 
-static unsigned adjusting_time = 0;
-
 unsigned long ppc_proc_freq;
 unsigned long ppc_tb_freq;
 
@@ -355,8 +353,7 @@ int timer_interrupt(struct pt_regs * reg
 			timer_sync_xtime(lpaca->next_jiffy_update_tb);
 			timer_check_rtc();
 			write_sequnlock(&xtime_lock);
-			if ( adjusting_time && (ntp_adjtime_offset == 0) )
-				ppc_adjtimex();
+			ppc_adjtimex();
 		}
 		lpaca->next_jiffy_update_tb += tb_ticks_per_jiffy;
 	}
@@ -582,7 +579,7 @@ void __init time_init(void)
 	systemcfg->stamp_xsec = xtime.tv_sec * XSEC_PER_SEC;
 	systemcfg->tb_to_xs = tb_to_xs;
 
-	ntp_freq = 0;
+	ntp_clear();
 
 	xtime.tv_nsec = 0;
 	last_rtc_update = xtime.tv_sec;
@@ -599,7 +596,7 @@ void __init time_init(void)
  * to microseconds to keep do_gettimeofday synchronized 
  * with ntpd.
  *
- * Use the ntp_adjtime_offset, ntp_freq and ntp_offset computed by adjtimex to
+ * Use the ntp_get_ppm_adjustment computed by adjtimex to
  * adjust the frequency.
  */
 
@@ -609,62 +606,14 @@ void ppc_adjtimex(void)
 {
 	unsigned long den, new_tb_ticks_per_sec, tb_ticks, old_xsec, new_tb_to_xs, new_xsec, new_stamp_xsec;
 	unsigned long tb_ticks_per_sec_delta;
-	long delta_freq, ltemp;
+	long delta_freq;
 	struct div_result divres; 
 	unsigned long flags;
 	struct gettimeofday_vars * temp_varp;
 	unsigned temp_idx;
-	long singleshot_ppm = 0;
 
-	/* Compute parts per million frequency adjustment to accomplish the time adjustment
-	   implied by ntp_offset to be applied over the elapsed time indicated by ntp_constant.
-	   Use SHIFT_USEC to get it into the same units as ntp_freq. */
-	if ( ntp_offset < 0 ) {
-		ltemp = -ntp_offset;
-		ltemp <<= SHIFT_USEC - SHIFT_UPDATE;
-		ltemp >>= SHIFT_KG + ntp_constant;
-		ltemp = -ltemp;
-	}
-	else {
-		ltemp = ntp_offset;
-		ltemp <<= SHIFT_USEC - SHIFT_UPDATE;
-		ltemp >>= SHIFT_KG + ntp_constant;
-	}
-	
-	/* If there is a single shot time adjustment in progress */
-	if ( ntp_adjtime_offset ) {
-#ifdef DEBUG_PPC_ADJTIMEX
-		printk("ppc_adjtimex: ");
-		if ( adjusting_time == 0 )
-			printk("starting ");
-		printk("single shot ntp_adjtime_offset = %ld\n", ntp_adjtime_offset);
-#endif	
-	
-		adjusting_time = 1;
-		
-		/* Compute parts per million frequency adjustment to match ntp_adjtime_offset */
-		singleshot_ppm = tickadj * HZ;	
-		/*
-		 * The adjustment should be tickadj*HZ to match the code in
-		 * linux/kernel/timer.c, but experiments show that this is too
-		 * large. 3/4 of tickadj*HZ seems about right
-		 */
-		singleshot_ppm -= singleshot_ppm / 4;
-		/* Use SHIFT_USEC to get it into the same units as ntp_freq */
-		singleshot_ppm <<= SHIFT_USEC;
-		if ( ntp_adjtime_offset < 0 )
-			singleshot_ppm = -singleshot_ppm;
-	}
-	else {
-#ifdef DEBUG_PPC_ADJTIMEX
-		if ( adjusting_time )
-			printk("ppc_adjtimex: ending single shot ntp_adjtime_offset\n");
-#endif
-		adjusting_time = 0;
-	}
-	
 	/* Add up all of the frequency adjustments */
-	delta_freq = ntp_freq + ltemp + singleshot_ppm;
+	delta_freq = ntp_get_ppm_adjustment();
 	
 	/* Compute a new value for tb_ticks_per_sec based on the frequency adjustment */
 	den = 1000000 * (1 << (SHIFT_USEC - 8));
@@ -678,7 +627,7 @@ void ppc_adjtimex(void)
 	}
 	
 #ifdef DEBUG_PPC_ADJTIMEX
-	printk("ppc_adjtimex: ltemp = %ld, ntp_freq = %ld, singleshot_ppm = %ld\n", ltemp, ntp_freq, singleshot_ppm);
+	printk("ppc_adjtimex: delta_freq = %ld\n", delta_freq);
 	printk("ppc_adjtimex: tb_ticks_per_sec - base = %ld  new = %ld\n", tb_ticks_per_sec, new_tb_ticks_per_sec);
 #endif
 				
diff --git a/include/linux/ntp.h b/include/linux/ntp.h
--- a/include/linux/ntp.h
+++ b/include/linux/ntp.h
@@ -20,17 +20,11 @@ int ntp_adjtimex(struct timex*);
 int ntp_leapsecond(struct timespec now);
 void ntp_clear(void);
 int ntp_synced(void);
+long ntp_get_ppm_adjustment(void);
 long ntp_get_fixed_ns_adjustment(void);
 
-
+/* these variables cannot be made static just yet */
 extern int tickadj;
 extern long ntp_adjtime_offset;
 
-/* Due to ppc64 having its own NTP  code,
- * these variables cannot be made static just yet
- */
-extern long ntp_offset;
-extern long ntp_freq;
-extern long ntp_constant;
-
 #endif
diff --git a/kernel/ntp.c b/kernel/ntp.c
--- a/kernel/ntp.c
+++ b/kernel/ntp.c
@@ -41,6 +41,7 @@
 *
 *********************************************************************/
 
+#include <linux/kernel.h>
 #include <linux/ntp.h>
 #include <linux/jiffies.h>
 #include <linux/errno.h>
@@ -53,15 +54,15 @@ static long time_adj;                   
 /* Chapter 5: Kernel Variables [RFC 1589 pg. 28] */
 /* 5.1 Interface Variables */
 static int ntp_status = STA_UNSYNC;             /* status */
-long ntp_offset;                                /* usec */
-long ntp_constant = 2;                          /* ntp magic? */
+static long ntp_offset;                         /* usec */
+static long ntp_constant = 2;                   /* ntp magic? */
 static long ntp_maxerror = NTP_PHASE_LIMIT;     /* usec */
 static long ntp_esterror = NTP_PHASE_LIMIT;     /* usec */
 static const long ntp_tolerance	= MAXFREQ;      /* shifted ppm */
 static const long ntp_precision	= 1;            /* constant */
 
 /* 5.2 Phase-Lock Loop Variables */
-long ntp_freq;                                  /* shifted ppm */
+static long ntp_freq;                           /* shifted ppm */
 static long ntp_reftime;                        /* sec */
 
 /* Extra values */
@@ -69,11 +70,17 @@ static int ntp_state    = TIME_OK;      
 long ntp_adjtime_offset;
 static long ntp_next_adjtime_offset;
 
-/* lock for the above variables */
-static seqlock_t ntp_lock = SEQLOCK_UNLOCKED;
+static long singleshot_adj; /* +/- MAX_SINGLESHOT_ADJ (ppm)*/
+static long tick_adj;       /* txc->tick adjustment (ppm) */
+static long offset_adj;     /* offset adjustment (ppm) */
 
+static long shifted_ppm_sum; /* ppm<<SHIFT_USEC sum of total freq adj */
 static long fixed_tick_ns_adj;
 
+/* lock for the above variables */
+static seqlock_t ntp_lock = SEQLOCK_UNLOCKED;
+
+#define MAX_SINGLESHOT_ADJ 500 /* (ppm) */
 #define SEC_PER_DAY 86400
 
 void ntp_advance(unsigned long interval_nsec)
@@ -81,6 +88,7 @@ void ntp_advance(unsigned long interval_
 	static unsigned long interval_sum = 0;
 	long time_adjust_step;
 	unsigned long flags;
+	static long ss_adj = 0;
 
 	write_seqlock_irqsave(&ntp_lock, flags);
 
@@ -118,11 +126,10 @@ void ntp_advance(unsigned long interval_
 		next_adj = min(next_adj, (MAXPHASE / MINSEC) << SHIFT_UPDATE);
 		next_adj = max(next_adj, -(MAXPHASE / MINSEC) << SHIFT_UPDATE);
 		ntp_offset -= next_adj;
+		offset_adj = shiftR(next_adj, SHIFT_UPDATE); /* ppm */
 
 		time_adj = next_adj << (SHIFT_SCALE - SHIFT_HZ - SHIFT_UPDATE);
-
 		time_adj += shiftR(ntp_freq, (SHIFT_USEC + SHIFT_HZ - SHIFT_SCALE));
-
 #if HZ == 100
     	/* Compensate for (HZ==100) != (1 << SHIFT_HZ).
 	     * Add 25% and 3.125% to get 128.125;
@@ -137,9 +144,17 @@ void ntp_advance(unsigned long interval_
 	     */
 		time_adj += shiftR(time_adj,6) + shiftR(time_adj,7);
 #endif
-
+		/* Set ss_adj for the next second */
+		ss_adj = min_t(unsigned long, singleshot_adj, MAX_SINGLESHOT_ADJ);
+		singleshot_adj -= ss_adj;
 	}
 
+	/* calculate total shifted ppm adjustment for the next interval */
+	shifted_ppm_sum = tick_adj<<SHIFT_USEC;
+	shifted_ppm_sum += offset_adj << SHIFT_USEC;
+	shifted_ppm_sum += ntp_freq; /* already shifted by SHIFT_USEC */
+	shifted_ppm_sum += ss_adj << SHIFT_USEC;
+
 
 	if ( (time_adjust_step = ntp_adjtime_offset) != 0 ) {
 	    /* We are doing an adjtime thing.
@@ -332,11 +347,17 @@ int ntp_adjtimex(struct timex *txc)
 			/* adjtime() is independent from ntp_adjtime() */
 			if ((ntp_next_adjtime_offset = txc->offset) == 0)
 				ntp_adjtime_offset = 0;
+			singleshot_adj = txc->offset;
 		} else if (ntp_hardupdate(txc->offset, txc->time))
 			result = TIME_ERROR;
 	}
 
 	if (txc->modes & ADJ_TICK) {
+		/* first calculate usec/user_tick offset */
+		tick_adj = ((USEC_PER_SEC + USER_HZ/2)/USER_HZ) - txc->tick;
+		/* multiply by user_hz to get usec/sec => ppm */
+		tick_adj *= USER_HZ;
+
 		tick_usec = txc->tick;
 		tick_nsec = TICK_USEC_TO_NSEC(tick_usec);
 	}
@@ -454,6 +475,9 @@ void ntp_clear(void)
 	ntp_status |= STA_UNSYNC;
 	ntp_maxerror = NTP_PHASE_LIMIT;
 	ntp_esterror = NTP_PHASE_LIMIT;
+	singleshot_adj = 0;
+	tick_adj = 0;
+	offset_adj =0;
 
 	write_sequnlock_irqrestore(&ntp_lock, flags);
 }
@@ -467,6 +491,15 @@ int ntp_synced(void)
 	return !(ntp_status & STA_UNSYNC);
 }
 
+/**
+ * ntp_get_ppm_adjustment - Returns Shifted PPM adjustment
+ *
+ */
+long ntp_get_ppm_adjustment(void)
+{
+	return shifted_ppm_sum;
+}
+
 long ntp_get_fixed_ns_adjustment(void)
 {
 	return fixed_tick_ns_adj;



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [RFC][PATCH - 12/13] NTP cleanup: cleanup ntp_advance() adjtime code
  2005-08-11  1:35                     ` [RFC][PATCH - 11/13] NTP cleanup: Introduce PPM adjustment variables john stultz
@ 2005-08-11  1:36                       ` john stultz
  2005-08-11  1:38                         ` [RFC][PATCH - 13/13] NTP cleanup: drop time_phase and time_adj add copyright john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  1:36 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,
	This patch simplifies and cleans up the adjtime code in ntp_advance and
corrects a comment.

Any comments or feedback would be greatly appreciated.

thanks
-john

linux-2.6.13-rc6_timeofday-ntp-part12_B5.patch
============================================
diff --git a/kernel/ntp.c b/kernel/ntp.c
--- a/kernel/ntp.c
+++ b/kernel/ntp.c
@@ -156,29 +156,25 @@ void ntp_advance(unsigned long interval_
 	shifted_ppm_sum += ss_adj << SHIFT_USEC;
 
 
-	if ( (time_adjust_step = ntp_adjtime_offset) != 0 ) {
-	    /* We are doing an adjtime thing.
-	     *
-	     * Prepare time_adjust_step to be within bounds.
-	     * Note that a positive ntp_adjtime_offset means we want the clock
-	     * to run faster.
-	     *
-	     * Limit the amount of the step to be in the range
-	     * -tickadj .. +tickadj
-	     */
-		if (ntp_adjtime_offset > tickadj)
-			time_adjust_step = tickadj;
-		else if (ntp_adjtime_offset < -tickadj)
-			time_adjust_step = -tickadj;
+	/* Calculate the fixed tick adjustment */
+	fixed_tick_ns_adj = 0;
 
-	    /* Reduce by this step the amount of time left  */
-	    ntp_adjtime_offset -= time_adjust_step;
+	/* If we are doing an adjtime thing */
+	if (ntp_adjtime_offset) {
+		long adjust_step = ntp_adjtime_offset;
+		/* Limit the amount of the step to be in the range
+		 * -tickadj .. +tickadj
+		 */
+		adjust_step = min_t(long, tickadj, adjust_step);
+		adjust_step = max_t(long, -tickadj, adjust_step);
+		/* Reduce by this step the amount of time left  */
+		ntp_adjtime_offset -= adjust_step;
+		fixed_tick_ns_adj += adjust_step * 1000;
 	}
-	fixed_tick_ns_adj = time_adjust_step * 1000;
 
 	/*
-	 * Advance the phase, once it gets to one microsecond, then
-	 * advance the tick more.
+	 * Advance the phase, once it gets to one nanosecond,
+	 * then advance the fixed_tick_ns_adj.
 	 */
 	time_phase += time_adj;
 	if (time_phase <= -FINENSEC) {



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [RFC][PATCH - 13/13] NTP cleanup: drop time_phase and time_adj add copyright
  2005-08-11  1:36                       ` [RFC][PATCH - 12/13] NTP cleanup: cleanup ntp_advance() adjtime code john stultz
@ 2005-08-11  1:38                         ` john stultz
  0 siblings, 0 replies; 71+ messages in thread
From: john stultz @ 2005-08-11  1:38 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,
	This patch replaces two global variables with better named static local
variables. Additionally it adds my copyright.

Any comments or feedback would be greatly appreciated.

thanks
-john

linux-2.6.13-rc6_timeofday-ntp-part13_B5.patch
============================================
diff --git a/kernel/ntp.c b/kernel/ntp.c
--- a/kernel/ntp.c
+++ b/kernel/ntp.c
@@ -3,7 +3,9 @@
 *
 * NTP state machine and time scaling code.
 *
-* Code moved from kernel/time.c and kernel/timer.c
+* Copyright (C) 2004, 2005 IBM, John Stultz (johnstul@us.ibm.com)
+*
+* Code moved and rewritten from kernel/time.c and kernel/timer.c
 * Please see those files for original copyrights.
 *
 * This program is free software; you can redistribute it and/or modify
@@ -48,8 +50,6 @@
 
 /* Don't completely fail for HZ > 500.  */
 int tickadj = 500/HZ ? : 1;		/* microsecs */
-static long time_phase;                 /* phase offset (scaled us) */
-static long time_adj;                   /* tick adjust (scaled 1 / HZ) */
 
 /* Chapter 5: Kernel Variables [RFC 1589 pg. 28] */
 /* 5.1 Interface Variables */
@@ -86,9 +86,9 @@ static seqlock_t ntp_lock = SEQLOCK_UNLO
 void ntp_advance(unsigned long interval_nsec)
 {
 	static unsigned long interval_sum = 0;
-	long time_adjust_step;
 	unsigned long flags;
 	static long ss_adj = 0;
+	static long phase, phase_adj = 0;
 
 	write_seqlock_irqsave(&ntp_lock, flags);
 
@@ -128,22 +128,24 @@ void ntp_advance(unsigned long interval_
 		ntp_offset -= next_adj;
 		offset_adj = shiftR(next_adj, SHIFT_UPDATE); /* ppm */
 
-		time_adj = next_adj << (SHIFT_SCALE - SHIFT_HZ - SHIFT_UPDATE);
-		time_adj += shiftR(ntp_freq, (SHIFT_USEC + SHIFT_HZ - SHIFT_SCALE));
+		/* calculate the per tick phase adjustment for the next second */
+		phase_adj = next_adj << (SHIFT_SCALE - SHIFT_HZ - SHIFT_UPDATE);
+		phase_adj += shiftR(ntp_freq, (SHIFT_USEC + SHIFT_HZ - SHIFT_SCALE));
 #if HZ == 100
     	/* Compensate for (HZ==100) != (1 << SHIFT_HZ).
 	     * Add 25% and 3.125% to get 128.125;
 		 * => only 0.125% error (p. 14)
     	 */
-		time_adj += shiftR(time_adj,2) + shiftR(time_adj,5);
+		phase_adj += shiftR(phase_adj,2) + shiftR(phase_adj,5);
 #endif
 #if HZ == 1000
 	    /* Compensate for (HZ==1000) != (1 << SHIFT_HZ).
     	 * Add 1.5625% and 0.78125% to get 1023.4375;
 		 * => only 0.05% error (p. 14)
 	     */
-		time_adj += shiftR(time_adj,6) + shiftR(time_adj,7);
+		phase_adj += shiftR(phase_adj,6) + shiftR(phase_adj,7);
 #endif
+
 		/* Set ss_adj for the next second */
 		ss_adj = min_t(unsigned long, singleshot_adj, MAX_SINGLESHOT_ADJ);
 		singleshot_adj -= ss_adj;
@@ -176,14 +178,14 @@ void ntp_advance(unsigned long interval_
 	 * Advance the phase, once it gets to one nanosecond,
 	 * then advance the fixed_tick_ns_adj.
 	 */
-	time_phase += time_adj;
-	if (time_phase <= -FINENSEC) {
-		long ltemp = -time_phase >> (SHIFT_SCALE - 10);
-		time_phase += ltemp << (SHIFT_SCALE - 10);
+	phase += phase_adj;
+	if (phase <= -FINENSEC) {
+		long ltemp = -phase >> (SHIFT_SCALE - 10);
+		phase += ltemp << (SHIFT_SCALE - 10);
 		fixed_tick_ns_adj -= ltemp;
-	} else if (time_phase >= FINENSEC) {
-		long ltemp = time_phase >> (SHIFT_SCALE - 10);
-		time_phase -= ltemp << (SHIFT_SCALE - 10);
+	} else if (phase >= FINENSEC) {
+		long ltemp = phase >> (SHIFT_SCALE - 10);
+		phase -= ltemp << (SHIFT_SCALE - 10);
 		fixed_tick_ns_adj += ltemp;
 	}
 



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-11  1:21 [RFC - 0/13] NTP cleanup work (v. B5) john stultz
  2005-08-11  1:23 ` [RFC][PATCH - 1/13] NTP cleanup: Move NTP code into ntp.c john stultz
@ 2005-08-11  2:13 ` john stultz
  2005-08-11  2:14   ` [PATCH 1/9] Timesource management code john stultz
                     ` (3 more replies)
  2005-08-15 22:12 ` [RFC - 0/13] NTP cleanup work " Roman Zippel
  2 siblings, 4 replies; 71+ messages in thread
From: john stultz @ 2005-08-11  2:13 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,
	Here's the next rev in my rework of the current timekeeping subsystem.
No major changes, only some cleanups and further splitting the larger
patches into smaller ones.

The goal of this patch set is to provide a simplified and streamlined
common timekeeping infrastructure that architectures can optionally use
to avoid duplicating code with other architectures.

This generic timekeeping subsystem is designed around systems that have
continuous timesources to insure correctness and avoid interpolation
errors. Additionally it allows the timekeeping to correctly function
independently from timer interrupts.

For systems that do not have a continuous timesource, no changes are
necessary, the existing tick-based timekeeping still remains. This code
just avoids needless duplication in the arches that do.

For another description on the rework, see here: 
http://lwn.net/Articles/120850/ (Many thanks to the LWN team for that
easy to understand writeup!)

I'd like to thank the following people who have contributed ideas,
criticism, testing and code that has helped shape this work:

	George Anzinger, Nish Aravamudan, Max Asbock, Dominik Brodowski, Darren
Hart, Christoph Lameter, Matt Mackal, Keith Mannthey, Ingo Oeser, Martin
Schwidefsky, Frank Sorenson, Ulrich Windl, Darrick Wong, Roman Zippel
and any others whom I've accidentally forgotten.

thanks
-john

^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH 1/9] Timesource management code
  2005-08-11  2:13 ` [RFC - 0/9] Generic timekeeping subsystem (v. B5) john stultz
@ 2005-08-11  2:14   ` john stultz
  2005-08-11  2:16     ` [PATCH 2/9] Generic timekeeping core subsystem john stultz
  2005-08-11  2:32   ` [RFC - 0/9] Generic timekeeping subsystem (v. B5) Lee Revell
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  2:14 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,
	This patch introduces the timesource management infrastructure. A
timesource is a driver-like architecture generic abstraction of a
freerunning counter. This patch defines the timesource structure, and
provides management code for registering, selecting, accessing and
scaling timesources. The timesource structure is influenced by the
time_interpolator code, although I feel it has a cleaner interface and
avoids preserving system state in the timesource structure.

Additionally, this patch includes the trivial jiffies timesource, a
lowest common denominator timesource, provided mainly for use as an
example.

This patch applies ontop of my ntp cleanup patchset.

Since this patch provides the groundwork for the generic timeofday core,
it will not function without the generic timeofday patches to follow.

thanks
-john

linux-2.6.13-rc6_timeofday-timesource-core_B5.patch
============================================
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -52,6 +52,7 @@ restrictions referred to are that the re
 	MTD	MTD support is enabled.
 	NET	Appropriate network support is enabled.
 	NUMA	NUMA support is enabled.
+	GENERIC_TOD The generic timeofday code is enabled.
 	NFS	Appropriate NFS support is enabled.
 	OSS	OSS sound support is enabled.
 	PARIDE	The ParIDE subsystem is enabled.
@@ -309,7 +310,7 @@ running once the system is up.
 			Default value is set via a kernel config option.
 			Value can be changed at runtime via /selinux/checkreqprot.
  
- 	clock=		[BUGS=IA-32, HW] gettimeofday timesource override. 
+ 	clock=		[BUGS=IA-32, HW] gettimeofday timesource override. [Deprecated]
 			Forces specified timesource (if avaliable) to be used
 			when calculating gettimeofday(). If specicified timesource
 			is not avalible, it defaults to PIT. 
@@ -1442,6 +1443,10 @@ running once the system is up.
 
 	time		Show timing data prefixed to each printk message line
 
+	timesource=		[GENERIC_TOD] Override the default timesource
+			Override the default timesource and use the timesource
+			with the name specified.
+
 	tipar.timeout=	[HW,PPT]
 			Set communications timeout in tenths of a second
 			(default 15).
diff --git a/drivers/Makefile b/drivers/Makefile
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -65,3 +65,4 @@ obj-$(CONFIG_INFINIBAND)	+= infiniband/
 obj-$(CONFIG_SGI_IOC4)		+= sn/
 obj-y				+= firmware/
 obj-$(CONFIG_CRYPTO)		+= crypto/
+obj-$(CONFIG_GENERICTOD)		+= timesource/
diff --git a/drivers/timesource/Makefile b/drivers/timesource/Makefile
new file mode 100644
--- /dev/null
+++ b/drivers/timesource/Makefile
@@ -0,0 +1 @@
+obj-y += jiffies.o
diff --git a/drivers/timesource/jiffies.c b/drivers/timesource/jiffies.c
new file mode 100644
--- /dev/null
+++ b/drivers/timesource/jiffies.c
@@ -0,0 +1,69 @@
+/***********************************************************************
+* linux/drivers/timesource/jiffies.c
+*
+* This file contains the jiffies based time source.
+*
+* Copyright (C) 2004, 2005 IBM, John Stultz (johnstul@us.ibm.com)
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+*
+************************************************************************/
+#include <linux/timesource.h>
+#include <linux/jiffies.h>
+#include <linux/init.h>
+
+/* The Jiffies based timesource is the lowest common
+ * denominator time source which should function on
+ * all systems. It has the same coarse resolution as
+ * the timer interrupt frequency HZ and it suffers
+ * inaccuracies caused by missed or lost timer
+ * interrupts and the inability for the timer
+ * interrupt hardware to accuratly tick at the
+ * requested HZ value. It is also not reccomended
+ * for "tick-less" systems.
+ */
+#define NSEC_PER_JIFFY ((u32)((((u64)NSEC_PER_SEC)<<8)/ACTHZ))
+
+/* Since jiffies uses a simple NSEC_PER_JIFFY multiplier
+ * conversion, the .shift value could be zero. However
+ * this would make NTP adjustments impossible as they are
+ * in units of 1/2^.shift. Thus we use JIFFIES_SHIFT to
+ * shift both the nominator and denominator the same
+ * amount, and give ntp adjustments in units of 1/2^10
+ */
+#define JIFFIES_SHIFT 10
+
+static cycle_t jiffies_read(void)
+{
+	cycle_t ret = get_jiffies_64();
+	return ret;
+}
+
+struct timesource_t timesource_jiffies = {
+	.name = "jiffies",
+	.priority = 0, /* lowest priority*/
+	.type = TIMESOURCE_FUNCTION,
+	.read_fnct = jiffies_read,
+	.mask = (cycle_t)-1,
+	.mult = NSEC_PER_JIFFY << JIFFIES_SHIFT, /* See above for details */
+	.shift = JIFFIES_SHIFT,
+};
+
+static int __init init_jiffies_timesource(void)
+{
+	register_timesource(&timesource_jiffies);
+	return 0;
+}
+module_init(init_jiffies_timesource);
diff --git a/include/linux/timesource.h b/include/linux/timesource.h
new file mode 100644
--- /dev/null
+++ b/include/linux/timesource.h
@@ -0,0 +1,172 @@
+/*  linux/include/linux/timesource.h
+ *
+ *  This file contains the structure definitions for timesources.
+ *
+ *  If you are not a timesource, or the time of day code, you should
+ *  not be including this file!
+ */
+#ifndef _LINUX_TIMESOURCE_H
+#define _LINUX_TIMESOURCE_H
+
+#include <linux/types.h>
+#include <linux/time.h>
+#include <linux/timex.h>
+#include <asm/io.h>
+#include <asm/div64.h>
+
+/* struct timesource_t:
+ *      Provides mostly state-free accessors to the underlying hardware.
+ *
+ * name:                ptr to timesource name
+ * priority:            priority value for selection (higher is better)
+ *                      To avoid priority inflation the following
+ *                      list should give you a guide as to how
+ *                      to assign your timesource a priority
+ *                      1-99: Unfit for real use
+ *                          Only available for bootup and testing purposes.
+ *                      100-199: Base level usability.
+ *                          Functional for real use, but not desired.
+ *                      200-299: Good.
+ *                           A correct and usable timesource.
+ *                      300-399: Desired.
+ *                           A reasonably fast and accurate timesource.
+ *                      400-499: Perfect
+ *                           The ideal timesource. A must-use where available.
+ * type:                defines timesource type
+ * @read_fnct:          returns a cycle value
+ * ptr:                 ptr to MMIO'ed counter
+ * mask:                bitmask for two's complement
+ *                      subtraction of non 64 bit counters
+ * mult:                cycle to nanosecond multiplier
+ * shift:               cycle to nanosecond divisor (power of two)
+ * @update_callback:    called when safe to alter timesource values
+ */
+struct timesource_t {
+	char* name;
+	int priority;
+	enum {
+		TIMESOURCE_FUNCTION,
+		TIMESOURCE_CYCLES,
+		TIMESOURCE_MMIO_32,
+		TIMESOURCE_MMIO_64
+	} type;
+	cycle_t (*read_fnct)(void);
+	void __iomem *mmio_ptr;
+	cycle_t mask;
+	u32 mult;
+	u32 shift;
+	void (*update_callback)(void);
+};
+
+
+/* Helper functions that converts a khz counter
+ * frequency to a timsource multiplier, given the
+ * timesource shift value
+ */
+static inline u32 timesource_khz2mult(u32 khz, u32 shift_constant)
+{
+	/*  khz = cyc/(Million ns)
+	 *  mult/2^shift  = ns/cyc
+	 *  mult = ns/cyc * 2^shift
+	 *  mult = 1Million/khz * 2^shift
+	 *  mult = 1000000 * 2^shift / khz
+	 *  mult = (1000000<<shift) / khz
+	 */
+	u64 tmp = ((u64)1000000) << shift_constant;
+	tmp += khz/2; /* round for do_div */
+	do_div(tmp, khz);
+	return (u32)tmp;
+}
+
+/* Helper functions that converts a hz counter
+ * frequency to a timsource multiplier, given the
+ * timesource shift value
+ */
+static inline u32 timesource_hz2mult(u32 hz, u32 shift_constant)
+{
+	/*  hz = cyc/(Billion ns)
+	 *  mult/2^shift  = ns/cyc
+	 *  mult = ns/cyc * 2^shift
+	 *  mult = 1Billion/hz * 2^shift
+	 *  mult = 1000000000 * 2^shift / hz
+	 *  mult = (1000000000<<shift) / hz
+	 */
+	u64 tmp = ((u64)1000000000) << shift_constant;
+	tmp += hz/2; /* round for do_div */
+	do_div(tmp, hz);
+	return (u32)tmp;
+}
+
+
+#ifndef readq
+/* Provide an a way to atomically read a u64 on a 32bit arch */
+static inline unsigned long long timesource_readq(void __iomem *addr)
+{
+	u32 low, high;
+	/* loop is required to make sure we get an atomic read */
+	do {
+		high = readl(addr+4);
+		low = readl(addr);
+	} while (high != readl(addr+4));
+
+	return low | (((unsigned long long)high) << 32LL);
+}
+#else
+#define timesource_readq(x) readq(x)
+#endif
+
+
+/* read_timesource():
+ *      Uses the timesource to return the current cycle_t value
+ */
+static inline cycle_t read_timesource(struct timesource_t *ts)
+{
+	switch (ts->type) {
+	case TIMESOURCE_MMIO_32:
+		return (cycle_t)readl(ts->mmio_ptr);
+	case TIMESOURCE_MMIO_64:
+		return (cycle_t)timesource_readq(ts->mmio_ptr);
+	case TIMESOURCE_CYCLES:
+		return (cycle_t)get_cycles();
+	default:/* case: TIMESOURCE_FUNCTION */
+		return ts->read_fnct();
+	}
+}
+
+/* cyc2ns():
+ *      Uses the timesource and ntp ajdustment interval to
+ *      convert cycle_ts to nanoseconds.
+ */
+static inline nsec_t cyc2ns(struct timesource_t *ts, int ntp_adj, cycle_t cycles)
+{
+	u64 ret;
+	ret = (u64)cycles;
+	ret *= (ts->mult + ntp_adj);
+	ret >>= ts->shift;
+	return (nsec_t)ret;
+}
+
+/* cyc2ns_rem():
+ *      Uses the timesource and ntp ajdustment interval to
+ *      convert cycle_ts to nanoseconds. Add in remainder portion
+ *      which is stored in ns<<ts->shift units and save the new
+ *      remainder off.
+ */
+static inline nsec_t cyc2ns_rem(struct timesource_t *ts, int ntp_adj, cycle_t cycles, u64* rem)
+{
+	u64 ret;
+	ret = (u64)cycles;
+	ret *= (ts->mult + ntp_adj);
+	if (rem) {
+		ret += *rem;
+		*rem = ret & ((1<<ts->shift)-1);
+	}
+	ret >>= ts->shift;
+	return (nsec_t)ret;
+}
+
+/* used to install a new time source */
+void register_timesource(struct timesource_t*);
+void reselect_timesource(void);
+struct timesource_t* get_next_timesource(void);
+#endif
diff --git a/kernel/timesource.c b/kernel/timesource.c
new file mode 100644
--- /dev/null
+++ b/kernel/timesource.c
@@ -0,0 +1,259 @@
+/*********************************************************************
+* linux/kernel/timesource.c
+*
+* This file contains the functions which manage timesource drivers.
+*
+* Copyright (C) 2004, 2005 IBM, John Stultz (johnstul@us.ibm.com)
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+*
+* TODO WishList:
+*   o Allow timesource drivers to be unregistered
+*   o get rid of timesource_jiffies extern
+**********************************************************************/
+
+#include <linux/timesource.h>
+#include <linux/sysdev.h>
+#include <linux/init.h>
+#include <linux/module.h>
+
+#define MAX_TIMESOURCES 10
+
+
+/* XXX - Would like a better way for initializing curr_timesource */
+extern struct timesource_t timesource_jiffies;
+
+/*[Timesource internal variables]---------
+ * curr_timesource:
+ *     currently selected timesource. Initialized to timesource_jiffies.
+ * next_timesource:
+ *     pending next selected timesource.
+ * timesource_list:
+ *     array of pointers pointing to registered timesources
+ * timesource_list_counter:
+ *     value which counts the number of registered timesources
+ * timesource_lock:
+ *     protects manipulations to curr_timesource and next_timesource
+ *     and the timesource_list
+ */
+static struct timesource_t *curr_timesource = &timesource_jiffies;
+static struct timesource_t *next_timesource;
+static struct timesource_t *timesource_list[MAX_TIMESOURCES];
+static int timesource_list_counter;
+static seqlock_t timesource_lock = SEQLOCK_UNLOCKED;
+
+static char override_name[32];
+
+/**
+ * get_next_timesource - Returns the selected timesource
+ *
+ */
+struct timesource_t* get_next_timesource(void)
+{
+	write_seqlock(&timesource_lock);
+	if (next_timesource) {
+		curr_timesource = next_timesource;
+		next_timesource = NULL;
+	}
+	write_sequnlock(&timesource_lock);
+
+	return curr_timesource;
+}
+
+/**
+ * select_timesource - Finds the best registered timesource.
+ *
+ * Private function. Must have a writelock on timesource_lock
+ * when called.
+ */
+static struct timesource_t* select_timesource(void)
+{
+	struct timesource_t* best = timesource_list[0];
+	int i;
+
+	for (i=0; i < timesource_list_counter; i++) {
+		/* Check for override */
+		if ((override_name[0] != 0) &&
+			(strlen(override_name)
+				== strlen(timesource_list[i]->name)) &&
+			(!strncmp(timesource_list[i]->name, override_name,
+				 strlen(override_name)))) {
+			best = timesource_list[i];
+			break;
+		}
+		/* Pick the highest priority */
+		if (timesource_list[i]->priority > best->priority)
+		 	best = timesource_list[i];
+	}
+	return best;
+}
+
+/**
+ * register_timesource - Used to install new timesources
+ * @t: timesource to be registered
+ *
+ */
+void register_timesource(struct timesource_t* t)
+{
+	char* error_msg = 0;
+	int i;
+	write_seqlock(&timesource_lock);
+
+	/* check if timesource is already registered */
+	for (i=0; i < timesource_list_counter; i++)
+		if (!strncmp(timesource_list[i]->name, t->name, strlen(t->name))){
+			error_msg = "Already registered!";
+			break;
+		}
+
+	/* check that the list isn't full */
+	if (timesource_list_counter >= MAX_TIMESOURCES)
+		error_msg = "Too many timesources!";
+
+	if(!error_msg)
+		timesource_list[timesource_list_counter++] = t;
+	else
+		printk("register_timesource: Cannot register %s. %s\n",
+					t->name, error_msg);
+
+	/* select next timesource */
+	next_timesource = select_timesource();
+
+	write_sequnlock(&timesource_lock);
+}
+EXPORT_SYMBOL(register_timesource);
+
+
+/**
+ * reselect_timesource - Rescan list for next timesource
+ *
+ * A quick helper function to be used if a timesource
+ * changes its priority. Forces the timesource list to
+ * be re-scaned for the best timesource.
+ */
+void reselect_timesource(void)
+{
+	write_seqlock(&timesource_lock);
+	next_timesource = select_timesource();
+	write_sequnlock(&timesource_lock);
+}
+
+/**
+ * sysfs_show_timesources - sysfs interface for listing timesource
+ * @dev: unused
+ * @buf: char buffer to be filled with timesource list
+ *
+ * Provides sysfs interface for listing registered timesources
+ */
+static ssize_t sysfs_show_timesources(struct sys_device *dev, char *buf)
+{
+	int i;
+	char* curr = buf;
+	write_seqlock(&timesource_lock);
+	for(i=0; i < timesource_list_counter; i++) {
+		/* Mark current timesource w/ a star */
+		if (timesource_list[i] == curr_timesource)
+			curr += sprintf(curr, "*");
+		curr += sprintf(curr, "%s ",timesource_list[i]->name);
+	}
+	write_sequnlock(&timesource_lock);
+
+	curr += sprintf(curr, "\n");
+	return curr - buf;
+}
+
+/**
+ * sysfs_override_timesource - interface for manually overriding timesource
+ * @dev: unused
+ * @buf: name of override timesource
+ *
+ *
+ *     Takes input from sysfs interface for manually overriding
+ *     the default timesource selction
+ */
+static ssize_t sysfs_override_timesource(struct sys_device *dev,
+			const char *buf, size_t count)
+{
+	/* check to avoid underflow later */
+	if (strlen(buf) == 0)
+		return count;
+
+	write_seqlock(&timesource_lock);
+
+	/* copy the name given */
+	strncpy(override_name, buf, strlen(buf)-1);
+	override_name[strlen(buf)-1] = 0;
+
+	/* see if we can find it */
+	next_timesource = select_timesource();
+
+	write_sequnlock(&timesource_lock);
+	return count;
+}
+
+/* Sysfs setup bits:
+ */
+static SYSDEV_ATTR(timesource, 0600, sysfs_show_timesources, sysfs_override_timesource);
+
+static struct sysdev_class timesource_sysclass = {
+	set_kset_name("timesource"),
+};
+
+static struct sys_device device_timesource = {
+	.id	= 0,
+	.cls	= &timesource_sysclass,
+};
+
+static int init_timesource_sysfs(void)
+{
+	int error = sysdev_class_register(&timesource_sysclass);
+	if (!error) {
+		error = sysdev_register(&device_timesource);
+		if (!error)
+			error = sysdev_create_file(&device_timesource, &attr_timesource);
+	}
+	return error;
+}
+device_initcall(init_timesource_sysfs);
+
+
+/**
+ * boot_override_timesource - boot time override
+ * @str: override name
+ *
+ * Takes a timesource= boot argument and uses it
+ * as the timesource override name
+ */
+static int __init boot_override_timesource(char* str)
+{
+	if (str)
+		strlcpy(override_name, str, sizeof(override_name));
+	return 1;
+}
+__setup("timesource=", boot_override_timesource);
+
+/**
+ * boot_override_clock - Compatibility layer for deprecated boot option
+ * @str: override name
+ *
+ * DEPRECATED! Takes a clock= boot argument and uses it
+ * as the timesource override name
+ */
+static int __init boot_override_clock(char* str)
+{
+	printk("Warning! clock= boot option is deprecated.\n");
+	return boot_override_timesource(str);
+}
+__setup("clock=", boot_override_clock);



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH 2/9] Generic timekeeping core subsystem
  2005-08-11  2:14   ` [PATCH 1/9] Timesource management code john stultz
@ 2005-08-11  2:16     ` john stultz
  2005-08-11  2:18       ` [PATCH 3/9] Generic timekeeping i386 arch specific changes, part 1 john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  2:16 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,

This patch implements the architecture independent portion of the
generic time of day subsystem. Included below is timeofday.c (which
includes all the time of day management and accessor functions), and
minimal hooks into arch independent code.

This patch applies on top of my timesource managment patch.

The patch does nothing without at least minimal architecture specific
hooks (i386, x86-64 and other architecture examples to follow), and it
should be able to be applied to a tree without affecting the existing
code.

thanks
-john


linux-2.6.13-rc6_timeofday-core_B5.patch
============================================
diff --git a/drivers/char/hangcheck-timer.c b/drivers/char/hangcheck-timer.c
--- a/drivers/char/hangcheck-timer.c
+++ b/drivers/char/hangcheck-timer.c
@@ -49,6 +49,7 @@
 #include <linux/delay.h>
 #include <asm/uaccess.h>
 #include <linux/sysrq.h>
+#include <linux/timeofday.h>
 
 
 #define VERSION_STR "0.9.0"
@@ -130,8 +131,12 @@ __setup("hcheck_dump_tasks", hangcheck_p
 #endif
 
 #ifdef HAVE_MONOTONIC
+#ifndef CONFIG_GENERICTOD
 extern unsigned long long monotonic_clock(void);
 #else
+#define monotonic_clock() do_monotonic_clock()
+#endif
+#else
 static inline unsigned long long monotonic_clock(void)
 {
 # ifdef __s390__
diff --git a/include/asm-generic/timeofday.h b/include/asm-generic/timeofday.h
new file mode 100644
--- /dev/null
+++ b/include/asm-generic/timeofday.h
@@ -0,0 +1,26 @@
+/*  linux/include/asm-generic/timeofday.h
+ *
+ *  This file contains the asm-generic interface
+ *  to the arch specific calls used by the time of day subsystem
+ */
+#ifndef _ASM_GENERIC_TIMEOFDAY_H
+#define _ASM_GENERIC_TIMEOFDAY_H
+#include <linux/types.h>
+#include <linux/time.h>
+#include <linux/timex.h>
+#include <asm/div64.h>
+#ifdef CONFIG_GENERICTOD
+
+/* Required externs */
+extern nsec_t read_persistent_clock(void);
+extern void sync_persistent_clock(struct timespec ts);
+
+#ifdef CONFIG_GENERICTOD_VSYSCALL
+extern void arch_update_vsyscall_gtod(nsec_t wall_time, cycle_t offset_base,
+				struct timesource_t* timesource, int ntp_adj);
+#else
+#define arch_update_vsyscall_gtod(x,y,z,w) {}
+#endif /* CONFIG_GENERICTOD_VSYSCALL */
+
+#endif /* CONFIG_GENERICTOD */
+#endif
diff --git a/include/linux/time.h b/include/linux/time.h
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -27,6 +27,10 @@ struct timezone {
 
 #ifdef __KERNEL__
 
+/* timeofday base types */
+typedef u64 nsec_t;
+typedef u64 cycle_t;
+
 /* Parameters used to convert the timespec values */
 #ifndef USEC_PER_SEC
 #define USEC_PER_SEC (1000000L)
diff --git a/include/linux/timeofday.h b/include/linux/timeofday.h
new file mode 100644
--- /dev/null
+++ b/include/linux/timeofday.h
@@ -0,0 +1,58 @@
+/*  linux/include/linux/timeofday.h
+ *
+ *  This file contains the interface to the time of day subsystem
+ */
+#ifndef _LINUX_TIMEOFDAY_H
+#define _LINUX_TIMEOFDAY_H
+#include <linux/types.h>
+#include <linux/time.h>
+#include <linux/timex.h>
+#include <asm/div64.h>
+
+#ifdef CONFIG_GENERICTOD
+/* Public definitions */
+extern nsec_t get_lowres_timestamp(void);
+extern nsec_t get_lowres_timeofday(void);
+extern nsec_t do_monotonic_clock(void);
+
+extern void do_gettimeofday(struct timeval *tv);
+extern void getnstimeofday(struct timespec *ts);
+extern int do_settimeofday(struct timespec *tv);
+
+extern void timeofday_init(void);
+
+/* Inline helper functions */
+static inline struct timeval ns_to_timeval(nsec_t ns)
+{
+	struct timeval tv;
+	tv.tv_sec = div_long_long_rem(ns, NSEC_PER_SEC, &tv.tv_usec);
+	tv.tv_usec = (tv.tv_usec + NSEC_PER_USEC/2) / NSEC_PER_USEC;
+	return tv;
+}
+
+static inline struct timespec ns_to_timespec(nsec_t ns)
+{
+	struct timespec ts;
+	ts.tv_sec = div_long_long_rem(ns, NSEC_PER_SEC, &ts.tv_nsec);
+	return ts;
+}
+
+static inline nsec_t timespec_to_ns(struct timespec* ts)
+{
+	nsec_t ret;
+	ret = ((nsec_t)ts->tv_sec) * NSEC_PER_SEC;
+	ret += ts->tv_nsec;
+	return ret;
+}
+
+static inline nsec_t timeval_to_ns(struct timeval* tv)
+{
+	nsec_t ret;
+	ret = ((nsec_t)tv->tv_sec) * NSEC_PER_SEC;
+	ret += tv->tv_usec * NSEC_PER_USEC;
+	return ret;
+}
+#else /* CONFIG_GENERICTOD */
+#define timeofday_init()
+#endif /* CONFIG_GENERICTOD */
+#endif /* _LINUX_TIMEOFDAY_H */
diff --git a/include/linux/timex.h b/include/linux/timex.h
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -227,6 +227,8 @@ struct timex {
 extern unsigned long tick_usec;		/* USER_HZ period (usec) */
 extern unsigned long tick_nsec;		/* ACTHZ          period (nsec) */
 
+#ifndef CONFIG_GENERICTOD
+
 #ifdef CONFIG_TIME_INTERPOLATION
 
 #define TIME_SOURCE_CPU 0
@@ -281,6 +283,7 @@ time_interpolator_reset(void)
 }
 
 #endif /* !CONFIG_TIME_INTERPOLATION */
+#endif /* !CONFIG_GENERICTOD */
 
 #endif /* KERNEL */
 
diff --git a/init/main.c b/init/main.c
--- a/init/main.c
+++ b/init/main.c
@@ -47,6 +47,7 @@
 #include <linux/rmap.h>
 #include <linux/mempolicy.h>
 #include <linux/key.h>
+#include <linux/timeofday.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -473,6 +474,7 @@ asmlinkage void __init start_kernel(void
 	pidhash_init();
 	init_timers();
 	softirq_init();
+	timeofday_init();
 	time_init();
 
 	/*
diff --git a/kernel/Makefile b/kernel/Makefile
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -9,6 +9,7 @@ obj-y     = sched.o fork.o exec_domain.o
 	    rcupdate.o intermodule.o extable.o params.o posix-timers.o \
 	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o ntp.o
 
+obj-$(CONFIG_GENERICTOD) += timeofday.o timesource.o
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
 obj-$(CONFIG_SMP) += cpu.o spinlock.o
diff --git a/kernel/ntp.c b/kernel/ntp.c
--- a/kernel/ntp.c
+++ b/kernel/ntp.c
@@ -48,6 +48,8 @@
 #include <linux/jiffies.h>
 #include <linux/errno.h>
 
+#define NTP_DEBUG 0
+
 /* Don't completely fail for HZ > 500.  */
 int tickadj = 500/HZ ? : 1;		/* microsecs */
 
@@ -67,6 +69,7 @@ static long ntp_reftime;                
 
 /* Extra values */
 static int ntp_state    = TIME_OK;              /* leapsecond state */
+static long ntp_tick    = USEC_PER_SEC/USER_HZ; /* tick length */
 long ntp_adjtime_offset;
 static long ntp_next_adjtime_offset;
 
@@ -83,6 +86,14 @@ static seqlock_t ntp_lock = SEQLOCK_UNLO
 #define MAX_SINGLESHOT_ADJ 500 /* (ppm) */
 #define SEC_PER_DAY 86400
 
+/**
+ * ntp_advance - Periodic hook which increments NTP state machine
+ * interval_nsec: interval value used to increment the state machine
+ *
+ *  Periodic hook which increments NTP state machine by interval.
+ *
+ *  This is ntp_hardclock in the RFC.
+ */
 void ntp_advance(unsigned long interval_nsec)
 {
 	static unsigned long interval_sum = 0;
@@ -243,6 +254,9 @@ static int ntp_hardupdate(long offset, s
 		offset_ppm <<= (SHIFT_USEC - SHIFT_UPDATE);
 
 		ntp_freq += shiftR(offset_ppm, SHIFT_KH);
+#if NTP_DEBUG
+		printk("ntp->freq change: %ld\n",shiftR(offset_ppm, damping));
+#endif
 
 	} else if ((ntp_status & STA_PLL) && (interval < MAXSEC)) {
 		long damping, offset_ppm;
@@ -253,7 +267,14 @@ static int ntp_hardupdate(long offset, s
 
 		ntp_freq += shiftR(offset_ppm, damping);
 
+#if NTP_DEBUG
+		printk("ntp->freq change: %ld\n", shiftR(offset_ppm, damping));
+#endif
 	} else { /* calibration interval out of bounds (p. 12) */
+#if NTP_DEBUG
+		printk("ntp_hardupdate(): interval out of bounds: %ld status: 0x%x\n",
+				interval, ntp_status);
+#endif
 		ret = TIME_ERROR;
 	}
 
@@ -264,9 +285,9 @@ static int ntp_hardupdate(long offset, s
 	return ret;
 }
 
-
-/* adjtimex mainly allows reading (and writing, if superuser) of
- * kernel time-keeping variables. used by xntpd.
+/**
+ * ntp_adjtimex - Interface to change NTP state machine
+ * @txc: timex value passed to the kernel to be used
  */
 int ntp_adjtimex(struct timex *txc)
 {
@@ -315,6 +336,13 @@ int ntp_adjtimex(struct timex *txc)
 				||(txc->tick > 11000000/USER_HZ)))
 		return -EINVAL;
 
+#if NTP_DEBUG
+	if(txc->modes) {
+		printk("adjtimex: txc->offset: %ld    txc->freq: %ld\n",
+				txc->offset, txc->freq);
+	}
+#endif
+
 	write_seqlock_irqsave(&ntp_lock, flags);
 
 	result = ntp_state;       /* mostly `TIME_OK' */
@@ -355,9 +383,8 @@ int ntp_adjtimex(struct timex *txc)
 		tick_adj = ((USEC_PER_SEC + USER_HZ/2)/USER_HZ) - txc->tick;
 		/* multiply by user_hz to get usec/sec => ppm */
 		tick_adj *= USER_HZ;
-
-		tick_usec = txc->tick;
-		tick_nsec = TICK_USEC_TO_NSEC(tick_usec);
+		/* save txc->tick for future calls to adjtimex */
+		ntp_tick = txc->tick;
 	}
 
 	if ((ntp_status & (STA_UNSYNC|STA_CLOCKERR)) != 0)
@@ -376,7 +403,7 @@ int ntp_adjtimex(struct timex *txc)
 	txc->constant = ntp_constant;
 	txc->precision = ntp_precision;
 	txc->tolerance = ntp_tolerance;
-	txc->tick = tick_usec;
+	txc->tick = ntp_tick;
 
 	/* PPS is not implemented, so these are zero */
 	txc->ppsfreq = 0;
diff --git a/kernel/time.c b/kernel/time.c
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -39,6 +39,7 @@
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
+#include <linux/timeofday.h>
 
 /* 
  * The timezone where the local system is located.  Used as a default by some
@@ -129,6 +130,7 @@ asmlinkage long sys_gettimeofday(struct 
  * as real UNIX machines always do it. This avoids all headaches about
  * daylight saving times and warping kernel clocks.
  */
+#ifndef CONFIG_GENERICTOD
 static inline void warp_clock(void)
 {
 	write_seqlock_irq(&xtime_lock);
@@ -138,6 +140,18 @@ static inline void warp_clock(void)
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
 }
+#else /* !CONFIG_GENERICTOD */
+/* XXX - this is somewhat cracked out and should
+         be checked  -johnstul@us.ibm.com
+*/
+static inline void warp_clock(void)
+{
+	struct timespec ts;
+	getnstimeofday(&ts);
+	ts.tv_sec += sys_tz.tz_minuteswest * 60;
+	do_settimeofday(&ts);
+}
+#endif /* !CONFIG_GENERICTOD */
 
 /*
  * In case for some reason the CMOS clock has not already been running
@@ -308,6 +322,7 @@ struct timespec timespec_trunc(struct ti
 }
 EXPORT_SYMBOL(timespec_trunc);
 
+#ifndef CONFIG_GENERICTOD
 #ifdef CONFIG_TIME_INTERPOLATION
 void getnstimeofday (struct timespec *tv)
 {
@@ -391,6 +406,7 @@ void getnstimeofday(struct timespec *tv)
 	tv->tv_nsec = x.tv_usec * NSEC_PER_USEC;
 }
 #endif
+#endif /* !CONFIG_GENERICTOD */
 
 #if (BITS_PER_LONG < 64)
 u64 get_jiffies_64(void)
diff --git a/kernel/timeofday.c b/kernel/timeofday.c
new file mode 100644
--- /dev/null
+++ b/kernel/timeofday.c
@@ -0,0 +1,548 @@
+/*********************************************************************
+* linux/kernel/timeofday.c
+*
+* This file contains the functions which access and manage
+* the system's time of day functionality.
+*
+* Copyright (C) 2003, 2004, 2005 IBM, John Stultz (johnstul@us.ibm.com)
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+*
+* TODO WishList:
+*   o See XXX's below.
+**********************************************************************/
+
+#include <linux/timeofday.h>
+#include <linux/timesource.h>
+#include <linux/ntp.h>
+#include <linux/timex.h>
+#include <linux/timer.h>
+#include <linux/module.h>
+#include <linux/sched.h> /* Needed for capable() */
+#include <linux/sysdev.h>
+#include <linux/jiffies.h>
+#include <asm/timeofday.h>
+
+#define TIME_DBG 0
+#define TIME_DBG_FREQ 60000
+
+/* only run periodic_hook every 50ms */
+#define PERIODIC_INTERVAL_MS 50
+
+/*[Nanosecond based variables]
+ * system_time:
+ *     Monotonically increasing counter of the number of nanoseconds
+ *     since boot.
+ * wall_time_offset:
+ *     Offset added to system_time to provide accurate time-of-day
+ */
+static nsec_t system_time;
+static nsec_t wall_time_offset;
+
+/*[Cycle based variables]
+ * offset_base:
+ *     Value of the timesource at the last timeofday_periodic_hook()
+ *     (adjusted only minorly to account for rounded off cycles)
+ */
+static cycle_t offset_base;
+
+/*[Time source data]
+ * timesource:
+ *     current timesource pointer
+ */
+static struct timesource_t *timesource;
+
+/*[NTP adjustment]
+ * ntp_adj:
+ *     value of the current ntp adjustment,
+ *     stored in timesource multiplier units.
+ */
+int ntp_adj;
+
+/*[Locks]
+ * system_time_lock:
+ *     generic lock for all locally scoped time values
+ */
+static seqlock_t system_time_lock = SEQLOCK_UNLOCKED;
+
+
+/*[Suspend/Resume info]
+ * time_suspend_state:
+ *     variable that keeps track of suspend state
+ * suspend_start:
+ *     start of the suspend call
+ */
+static enum {
+	TIME_RUNNING,
+	TIME_SUSPENDED
+} time_suspend_state = TIME_RUNNING;
+
+static nsec_t suspend_start;
+
+/* [Soft-Timers]
+ * timeofday_timer:
+ *     soft-timer used to call timeofday_periodic_hook()
+ */
+struct timer_list timeofday_timer;
+
+
+/* [Functions]
+ */
+
+/**
+ * get_lowres_timestamp - Returns a low res timestamp
+ *
+ * Returns a low res timestamp w/ PERIODIC_INTERVAL_MS
+ * granularity. (ie: the value of system_time as
+ * calculated at the last invocation of
+ * timeofday_periodic_hook())
+ */
+nsec_t get_lowres_timestamp(void)
+{
+	nsec_t ret;
+	unsigned long seq;
+	do {
+		seq = read_seqbegin(&system_time_lock);
+
+		ret = system_time;
+
+	} while (read_seqretry(&system_time_lock, seq));
+
+	return ret;
+}
+
+
+/**
+ * get_lowres_timeofday - Returns a low res time of day
+ *
+ * Returns a low res time of day, as calculated at the
+ * last invocation of timeofday_periodic_hook().
+ */
+nsec_t get_lowres_timeofday(void)
+{
+	nsec_t ret;
+	unsigned long seq;
+	do {
+		seq = read_seqbegin(&system_time_lock);
+
+		ret = system_time + wall_time_offset;
+
+	} while (read_seqretry(&system_time_lock, seq));
+
+	return ret;
+}
+
+
+/**
+ * update_legacy_time_values - Used to sync legacy time values
+ *
+ * Private function. Used to sync legacy time values to
+ * current timeofday. Assumes we have the system_time_lock.
+ * Hopefully someday this function can be removed.
+ */
+static void update_legacy_time_values(void)
+{
+	unsigned long flags;
+	write_seqlock_irqsave(&xtime_lock, flags);
+	xtime = ns_to_timespec(system_time + wall_time_offset);
+	wall_to_monotonic = ns_to_timespec(wall_time_offset);
+	set_normalized_timespec(&wall_to_monotonic,
+		-wall_to_monotonic.tv_sec, -wall_to_monotonic.tv_nsec);
+	/* We don't update jiffies here because it is its own time domain */
+	write_sequnlock_irqrestore(&xtime_lock, flags);
+}
+
+
+/**
+ * __monotonic_clock - Returns monotonically increasing nanoseconds
+ *
+ * private function, must hold system_time_lock lock when being
+ * called. Returns the monotonically increasing number of
+ * nanoseconds since the system booted (adjusted by NTP scaling)
+ */
+static inline nsec_t __monotonic_clock(void)
+{
+	nsec_t ret, ns_offset;
+	cycle_t now, cycle_delta;
+
+	/* read timesource */
+	now = read_timesource(timesource);
+
+	/* calculate the delta since the last timeofday_periodic_hook */
+	cycle_delta = (now - offset_base) & timesource->mask;
+
+	/* convert to nanoseconds */
+	ns_offset = cyc2ns(timesource, ntp_adj, cycle_delta);
+
+	/* add result to system time */
+	ret = system_time + ns_offset;
+
+	return ret;
+}
+
+
+/**
+ * do_monotonic_clock - Returns monotonically increasing nanoseconds
+ *
+ * Returns the monotonically increasing number of nanoseconds
+ * since the system booted via __monotonic_clock()
+ */
+nsec_t do_monotonic_clock(void)
+{
+	nsec_t ret;
+	unsigned long seq;
+
+	/* atomically read __monotonic_clock() */
+	do {
+		seq = read_seqbegin(&system_time_lock);
+
+		ret = __monotonic_clock();
+
+	} while (read_seqretry(&system_time_lock, seq));
+
+	return ret;
+}
+
+
+/**
+ * __gettimeofday - Returns the timeofday in nsec_t.
+ *
+ * Private function. Returns the timeofday in nsec_t.
+ */
+static inline nsec_t __gettimeofday(void)
+{
+	nsec_t wall, sys;
+	unsigned long seq;
+
+	/* atomically read wall and sys time */
+	do {
+		seq = read_seqbegin(&system_time_lock);
+
+		wall = wall_time_offset;
+		sys = __monotonic_clock();
+
+	} while (read_seqretry(&system_time_lock, seq));
+
+	return wall + sys;
+}
+
+
+/**
+ * getnstimeofday - Returns the time of day in a timespec
+ * @ts: pointer to the timespec to be set
+ *
+ * Returns the time of day in a timespec
+ * For consistency should be renamed
+ * later to do_getnstimeofday()
+ */
+void getnstimeofday(struct timespec *ts)
+{
+	*ts = ns_to_timespec(__gettimeofday());
+}
+EXPORT_SYMBOL(getnstimeofday);
+
+
+/**
+ * do_gettimeofday - Returns the time of day in a timeval
+ * @tv: pointer to the timeval to be set
+ *
+ */
+void do_gettimeofday(struct timeval *tv)
+{
+	*tv = ns_to_timeval(__gettimeofday());
+}
+EXPORT_SYMBOL(do_gettimeofday);
+
+
+/**
+ * do_settimeofday - Sets the time of day
+ * @tv: pointer to the timespec that will be used to set the time
+ *
+ */
+int do_settimeofday(struct timespec *tv)
+{
+	unsigned long flags;
+	nsec_t newtime = timespec_to_ns(tv);
+
+	/* atomically adjust wall_time_offset & clear ntp state machine */
+	write_seqlock_irqsave(&system_time_lock, flags);
+
+	wall_time_offset = newtime - __monotonic_clock();
+	ntp_clear();
+
+	update_legacy_time_values();
+
+	arch_update_vsyscall_gtod(system_time + wall_time_offset, offset_base,
+							timesource, ntp_adj);
+
+	write_sequnlock_irqrestore(&system_time_lock, flags);
+
+	/* signal posix-timers about time change */
+	clock_was_set();
+
+	return 0;
+}
+EXPORT_SYMBOL(do_settimeofday);
+
+
+/**
+ * timeofday_suspend_hook - allows the timeofday subsystem to be shutdown
+ * @dev: unused
+ * state: unused
+ *
+ * This function allows the timeofday subsystem to
+ * be shutdown for a period of time. Usefull when
+ * going into suspend/hibernate mode. The code is
+ * very similar to the first half of
+ * timeofday_periodic_hook().
+ */
+static int timeofday_suspend_hook(struct sys_device *dev, u32 state)
+{
+	unsigned long flags;
+
+	write_seqlock_irqsave(&system_time_lock, flags);
+
+	BUG_ON(time_suspend_state != TIME_RUNNING);
+
+	/* First off, save suspend start time
+	 * then quickly call __monotonic_clock.
+	 * These two calls hopefully occur quickly
+	 * because the difference between reads will
+	 * accumulate as time drift on resume.
+	 */
+	suspend_start = read_persistent_clock();
+	system_time = __monotonic_clock();
+
+	time_suspend_state = TIME_SUSPENDED;
+
+	write_sequnlock_irqrestore(&system_time_lock, flags);
+	return 0;
+}
+
+
+/**
+ * timeofday_resume_hook - Resumes the timeofday subsystem.
+ * @dev: unused
+ *
+ * This function resumes the timeofday subsystem
+ * from a previous call to timeofday_suspend_hook.
+ */
+static int timeofday_resume_hook(struct sys_device *dev)
+{
+	nsec_t now, suspend_time;
+	unsigned long flags;
+
+	write_seqlock_irqsave(&system_time_lock, flags);
+
+	BUG_ON(time_suspend_state != TIME_SUSPENDED);
+
+	/* Read persistent clock to mark the end of
+	 * the suspend interval then rebase the
+	 * offset_base to current timesource value.
+	 * Again, time between these two calls will
+	 * not be accounted for and will show up as
+	 * time drift.
+	 */
+	now = read_persistent_clock();
+	offset_base = read_timesource(timesource);
+
+	suspend_time = now - suspend_start;
+
+	system_time += suspend_time;
+
+	ntp_clear();
+
+	time_suspend_state = TIME_RUNNING;
+
+	update_legacy_time_values();
+
+	write_sequnlock_irqrestore(&system_time_lock, flags);
+
+	/* signal posix-timers about time change */
+	clock_was_set();
+
+	return 0;
+}
+
+/* sysfs resume/suspend bits */
+static struct sysdev_class timeofday_sysclass = {
+	.resume = timeofday_resume_hook,
+	.suspend = timeofday_suspend_hook,
+	set_kset_name("timeofday"),
+};
+static struct sys_device device_timer = {
+	.id	= 0,
+	.cls	= &timeofday_sysclass,
+};
+static int timeofday_init_device(void)
+{
+	int error = sysdev_class_register(&timeofday_sysclass);
+	if (!error)
+		error = sysdev_register(&device_timer);
+	return error;
+}
+device_initcall(timeofday_init_device);
+
+/**
+ * timeofday_periodic_hook - Does periodic update of timekeeping values.
+ * unused: unused
+ *
+ * Calculates the delta since the last call,
+ * updates system time and clears the offset.
+ *
+ * Called via timeofday_timer.
+ */
+static void timeofday_periodic_hook(unsigned long unused)
+{
+	cycle_t now, cycle_delta;
+	static u64 remainder;
+	nsec_t ns, ns_ntp;
+	long leapsecond;
+	struct timesource_t* next;
+	unsigned long flags;
+	u64 mult_adj;
+	int ppm;
+
+	write_seqlock_irqsave(&system_time_lock, flags);
+
+	/* read time source & calc time since last call*/
+	now = read_timesource(timesource);
+	cycle_delta = (now - offset_base) & timesource->mask;
+
+	/* convert cycles to ntp adjusted ns and save remainder */
+	ns_ntp = cyc2ns_rem(timesource, ntp_adj, cycle_delta, &remainder);
+
+	/* convert cycles to raw ns for ntp advance */
+	ns = cyc2ns(timesource, 0, cycle_delta);
+
+#if TIME_DBG
+	static int dbg=0;
+	if(!(dbg++%TIME_DBG_FREQ)){
+		printk(KERN_INFO "now: %lluc - then: %lluc = delta: %lluc -> %llu ns + %llu shift_ns (ntp_adj: %i)\n",
+			(unsigned long long)now, (unsigned long long)offset_base,
+			(unsigned long long)cycle_delta, (unsigned long long)ns,
+			(unsigned long long)remainder, ntp_adj);
+	}
+}
+#endif
+
+	/* update system_time */
+	system_time += ns_ntp;
+
+	/* reset the offset_base */
+	offset_base = now;
+
+	/* advance the ntp state machine by ns interval*/
+	ntp_advance((unsigned long)ns);
+
+	/* do ntp leap second processing*/
+	leapsecond = ntp_leapsecond(ns_to_timespec(system_time+wall_time_offset));
+	wall_time_offset += leapsecond * NSEC_PER_SEC;
+
+	/* sync the persistent clock */
+	if (ntp_synced())
+		sync_persistent_clock(ns_to_timespec(system_time + wall_time_offset));
+
+	/* if necessary, switch timesources */
+	next = get_next_timesource();
+	if (next != timesource) {
+		/* immediately set new offset_base */
+		offset_base = read_timesource(next);
+		/* swap timesources */
+		timesource = next;
+		printk(KERN_INFO "Time: %s timesource has been installed.\n",
+					timesource->name);
+		ntp_clear();
+		ntp_adj = 0;
+		remainder = 0;
+	}
+
+	/* now is a safe time, so allow timesource to adjust
+	 * itself (for example: to make cpufreq changes).
+	 */
+	if(timesource->update_callback)
+		timesource->update_callback();
+
+
+	/* Convert the signed ppm to timesource multiplier adjustment */
+	ppm = ntp_get_ppm_adjustment();
+	mult_adj = abs(ppm);
+	mult_adj = (mult_adj * timesource->mult)>>SHIFT_USEC;
+	mult_adj += 1000000/2; /* round for div*/
+	do_div(mult_adj, 1000000);
+	if (ppm < 0)
+		ntp_adj = -(int)mult_adj;
+	else
+		ntp_adj = (int)mult_adj;
+
+
+	update_legacy_time_values();
+
+	arch_update_vsyscall_gtod(system_time + wall_time_offset, offset_base,
+							timesource, ntp_adj);
+
+	write_sequnlock_irqrestore(&system_time_lock, flags);
+
+	/* XXX - Do we need to call clock_was_set() here? */
+
+	/* Set us up to go off on the next interval */
+	mod_timer(&timeofday_timer,
+				jiffies + msecs_to_jiffies(PERIODIC_INTERVAL_MS));
+}
+
+
+/**
+ * timeofday_init - Initializes time variables
+ *
+ */
+void __init timeofday_init(void)
+{
+	unsigned long flags;
+#if TIME_DBG
+	printk(KERN_INFO "timeofday_init: Starting up!\n");
+#endif
+	write_seqlock_irqsave(&system_time_lock, flags);
+
+	/* initialize the timesource variable */
+	timesource = get_next_timesource();
+
+	/* clear and initialize offsets */
+	offset_base = read_timesource(timesource);
+	wall_time_offset = read_persistent_clock();
+
+	/* clear NTP scaling factor & state machine */
+	ntp_adj = 0;
+	ntp_clear();
+
+	/* initialize legacy time values */
+	update_legacy_time_values();
+
+	arch_update_vsyscall_gtod(system_time + wall_time_offset, offset_base,
+							timesource, ntp_adj);
+
+	write_sequnlock_irqrestore(&system_time_lock, flags);
+
+	/* Install timeofday_periodic_hook timer */
+	init_timer(&timeofday_timer);
+	timeofday_timer.function = timeofday_periodic_hook;
+	timeofday_timer.expires = jiffies + 1;
+	add_timer(&timeofday_timer);
+
+
+#if TIME_DBG
+	printk(KERN_INFO "timeofday_init: finished!\n");
+#endif
+	return;
+}
diff --git a/kernel/timer.c b/kernel/timer.c
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -599,6 +599,7 @@ struct timespec wall_to_monotonic __attr
 EXPORT_SYMBOL(xtime);
 
 
+#ifndef CONFIG_GENERICTOD
 /* in the NTP reference this is called "hardclock()" */
 static void update_wall_time_one_tick(void)
 {
@@ -641,6 +642,9 @@ static void update_wall_time(unsigned lo
 		}
 	} while (ticks);
 }
+#else /* !CONFIG_GENERICTOD */
+#define update_wall_time(x)
+#endif /* !CONFIG_GENERICTOD */
 
 /*
  * Called from the timer interrupt handler to charge one tick to the current 



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH 3/9] Generic timekeeping i386 arch specific changes, part 1
  2005-08-11  2:16     ` [PATCH 2/9] Generic timekeeping core subsystem john stultz
@ 2005-08-11  2:18       ` john stultz
  2005-08-11  2:19         ` [PATCH 4/9] generic timekeeping i386 arch specific changes, part 2 john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  2:18 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,
	The conversion of i386 to use the generic timekeeping subsystem has
been split into 6 parts. This patch, the first of six, is just a simple
cleanup for the i386 arch in preperation of moving the the generic
timekeeping infrastructure. It simply moves some code from timer_pit.c
to i8259.c.
	
It applies on top of my timeofday-core patch. This patch is part the
timeofday-arch-i386 patchset, so without the following parts it is not
expected to compile (although just this one should).
	

thanks
-john

linux-2.6.13-rc6_timeofday-arch-i386-part1_B5.patch
============================================
diff --git a/arch/i386/kernel/i8259.c b/arch/i386/kernel/i8259.c
--- a/arch/i386/kernel/i8259.c
+++ b/arch/i386/kernel/i8259.c
@@ -24,6 +24,7 @@
 #include <asm/apic.h>
 #include <asm/arch_hooks.h>
 #include <asm/i8259.h>
+#include <asm/i8253.h>
 
 #include <linux/irq.h>
 
@@ -399,6 +400,45 @@ void __init init_ISA_irqs (void)
 	}
 }
 
+void setup_pit_timer(void)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&i8253_lock, flags);
+	outb_p(0x34,PIT_MODE);		/* binary, mode 2, LSB/MSB, ch 0 */
+	udelay(10);
+	outb_p(LATCH & 0xff , PIT_CH0);	/* LSB */
+	udelay(10);
+	outb(LATCH >> 8 , PIT_CH0);	/* MSB */
+	spin_unlock_irqrestore(&i8253_lock, flags);
+}
+
+static int timer_resume(struct sys_device *dev)
+{
+	setup_pit_timer();
+	return 0;
+}
+
+static struct sysdev_class timer_sysclass = {
+	set_kset_name("timer_pit"),
+	.resume	= timer_resume,
+};
+
+static struct sys_device device_timer = {
+	.id	= 0,
+	.cls	= &timer_sysclass,
+};
+
+static int __init init_timer_sysfs(void)
+{
+	int error = sysdev_class_register(&timer_sysclass);
+	if (!error)
+		error = sysdev_register(&device_timer);
+	return error;
+}
+
+device_initcall(init_timer_sysfs);
+
 void __init init_IRQ(void)
 {
 	int i;
diff --git a/arch/i386/kernel/timers/timer_pit.c b/arch/i386/kernel/timers/timer_pit.c
--- a/arch/i386/kernel/timers/timer_pit.c
+++ b/arch/i386/kernel/timers/timer_pit.c
@@ -162,43 +162,3 @@ struct init_timer_opts __initdata timer_
 	.init = init_pit, 
 	.opts = &timer_pit,
 };
-
-void setup_pit_timer(void)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&i8253_lock, flags);
-	outb_p(0x34,PIT_MODE);		/* binary, mode 2, LSB/MSB, ch 0 */
-	udelay(10);
-	outb_p(LATCH & 0xff , PIT_CH0);	/* LSB */
-	udelay(10);
-	outb(LATCH >> 8 , PIT_CH0);	/* MSB */
-	spin_unlock_irqrestore(&i8253_lock, flags);
-}
-
-static int timer_resume(struct sys_device *dev)
-{
-	setup_pit_timer();
-	return 0;
-}
-
-static struct sysdev_class timer_sysclass = {
-	set_kset_name("timer_pit"),
-	.resume	= timer_resume,
-};
-
-static struct sys_device device_timer = {
-	.id	= 0,
-	.cls	= &timer_sysclass,
-};
-
-static int __init init_timer_sysfs(void)
-{
-	int error = sysdev_class_register(&timer_sysclass);
-	if (!error)
-		error = sysdev_register(&device_timer);
-	return error;
-}
-
-device_initcall(init_timer_sysfs);
-



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH 4/9] generic timekeeping i386 arch specific changes, part 2
  2005-08-11  2:18       ` [PATCH 3/9] Generic timekeeping i386 arch specific changes, part 1 john stultz
@ 2005-08-11  2:19         ` john stultz
  2005-08-11  2:20           ` [PATCH 5/9] generic timekeeping i386 arch specific changes, part 3 john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  2:19 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,
	The conversion of i386 to use the generic timekeeping subsystem has
been split into 6 parts. This patch, the second of six, is a cleanup
patch for the i386 arch in preperation of moving the the generic
timekeping infrastructure. It moves some code from timer_tsc.c to a new
tsc.c file.

It applies on top of my timeofday-arch-i386-part1 patch. This patch is
part the timeofday-arch-i386 patchset, so without the following parts it
is not expected to compile.

thanks
-john

linux-2.6.13-rc6_timeofday-arch-i386-part2_B5.patch
============================================
diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -7,7 +7,7 @@ extra-y := head.o init_task.o vmlinux.ld
 obj-y	:= process.o semaphore.o signal.o entry.o traps.o irq.o vm86.o \
 		ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
 		pci-dma.o i386_ksyms.o i387.o dmi_scan.o bootflag.o \
-		doublefault.o quirks.o
+		doublefault.o quirks.o tsc.o
 
 obj-y				+= cpu/
 obj-y				+= timers/
diff --git a/arch/i386/kernel/timers/common.c b/arch/i386/kernel/timers/common.c
--- a/arch/i386/kernel/timers/common.c
+++ b/arch/i386/kernel/timers/common.c
@@ -14,66 +14,6 @@
 
 #include "mach_timer.h"
 
-/* ------ Calibrate the TSC -------
- * Return 2^32 * (1 / (TSC clocks per usec)) for do_fast_gettimeoffset().
- * Too much 64-bit arithmetic here to do this cleanly in C, and for
- * accuracy's sake we want to keep the overhead on the CTC speaker (channel 2)
- * output busy loop as low as possible. We avoid reading the CTC registers
- * directly because of the awkward 8-bit access mechanism of the 82C54
- * device.
- */
-
-#define CALIBRATE_TIME	(5 * 1000020/HZ)
-
-unsigned long calibrate_tsc(void)
-{
-	mach_prepare_counter();
-
-	{
-		unsigned long startlow, starthigh;
-		unsigned long endlow, endhigh;
-		unsigned long count;
-
-		rdtsc(startlow,starthigh);
-		mach_countup(&count);
-		rdtsc(endlow,endhigh);
-
-
-		/* Error: ECTCNEVERSET */
-		if (count <= 1)
-			goto bad_ctc;
-
-		/* 64-bit subtract - gcc just messes up with long longs */
-		__asm__("subl %2,%0\n\t"
-			"sbbl %3,%1"
-			:"=a" (endlow), "=d" (endhigh)
-			:"g" (startlow), "g" (starthigh),
-			 "0" (endlow), "1" (endhigh));
-
-		/* Error: ECPUTOOFAST */
-		if (endhigh)
-			goto bad_ctc;
-
-		/* Error: ECPUTOOSLOW */
-		if (endlow <= CALIBRATE_TIME)
-			goto bad_ctc;
-
-		__asm__("divl %2"
-			:"=a" (endlow), "=d" (endhigh)
-			:"r" (endlow), "0" (0), "1" (CALIBRATE_TIME));
-
-		return endlow;
-	}
-
-	/*
-	 * The CTC wasn't reliable: we got a hit on the very first read,
-	 * or the CPU was so fast/slow that the quotient wouldn't fit in
-	 * 32 bits..
-	 */
-bad_ctc:
-	return 0;
-}
-
 #ifdef CONFIG_HPET_TIMER
 /* ------ Calibrate the TSC using HPET -------
  * Return 2^32 * (1 / (TSC clocks per usec)) for getting the CPU freq.
@@ -146,27 +86,3 @@ unsigned long read_timer_tsc(void)
 	rdtscl(retval);
 	return retval;
 }
-
-
-/* calculate cpu_khz */
-void init_cpu_khz(void)
-{
-	if (cpu_has_tsc) {
-		unsigned long tsc_quotient = calibrate_tsc();
-		if (tsc_quotient) {
-			/* report CPU clock rate in Hz.
-			 * The formula is (10^6 * 2^32) / (2^32 * 1 / (clocks/us)) =
-			 * clock/second. Our precision is about 100 ppm.
-			 */
-			{	unsigned long eax=0, edx=1000;
-				__asm__("divl %2"
-		       		:"=a" (cpu_khz), "=d" (edx)
-        	       		:"r" (tsc_quotient),
-	                	"0" (eax), "1" (edx));
-				printk("Detected %u.%03u MHz processor.\n",
-					cpu_khz / 1000, cpu_khz % 1000);
-			}
-		}
-	}
-}
-
diff --git a/arch/i386/kernel/timers/timer_tsc.c b/arch/i386/kernel/timers/timer_tsc.c
--- a/arch/i386/kernel/timers/timer_tsc.c
+++ b/arch/i386/kernel/timers/timer_tsc.c
@@ -32,10 +32,6 @@ static unsigned long hpet_last;
 static struct timer_opts timer_tsc;
 #endif
 
-static inline void cpufreq_delayed_get(void);
-
-int tsc_disable __devinitdata = 0;
-
 static int use_tsc;
 /* Number of usecs that the last interrupt was delayed */
 static int delay_at_last_interrupt;
@@ -45,34 +41,6 @@ static unsigned long last_tsc_high; /* m
 static unsigned long long monotonic_base;
 static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
 
-/* convert from cycles(64bits) => nanoseconds (64bits)
- *  basic equation:
- *		ns = cycles / (freq / ns_per_sec)
- *		ns = cycles * (ns_per_sec / freq)
- *		ns = cycles * (10^9 / (cpu_mhz * 10^6))
- *		ns = cycles * (10^3 / cpu_mhz)
- *
- *	Then we use scaling math (suggested by george@mvista.com) to get:
- *		ns = cycles * (10^3 * SC / cpu_mhz) / SC
- *		ns = cycles * cyc2ns_scale / SC
- *
- *	And since SC is a constant power of two, we can convert the div
- *  into a shift.   
- *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
- */
-static unsigned long cyc2ns_scale; 
-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
-
-static inline void set_cyc2ns_scale(unsigned long cpu_mhz)
-{
-	cyc2ns_scale = (1000 << CYC2NS_SCALE_FACTOR)/cpu_mhz;
-}
-
-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
-{
-	return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
-}
-
 static int count2; /* counter for mark_offset_tsc() */
 
 /* Cached *multiplier* to convert TSC counts to microseconds.
@@ -130,29 +98,6 @@ static unsigned long long monotonic_cloc
 	return base + cycles_2_ns(this_offset - last_offset);
 }
 
-/*
- * Scheduler clock - returns current time in nanosec units.
- */
-unsigned long long sched_clock(void)
-{
-	unsigned long long this_offset;
-
-	/*
-	 * In the NUMA case we dont use the TSC as they are not
-	 * synchronized across all CPUs.
-	 */
-#ifndef CONFIG_NUMA
-	if (!use_tsc)
-#endif
-		/* no locking but a rare wrong value is not a big deal */
-		return jiffies_64 * (1000000000 / HZ);
-
-	/* Read the Time Stamp Counter */
-	rdtscll(this_offset);
-
-	/* return the value in ns */
-	return cycles_2_ns(this_offset);
-}
 
 static void delay_tsc(unsigned long loops)
 {
@@ -217,127 +162,6 @@ static void mark_offset_tsc_hpet(void)
 #endif
 
 
-#ifdef CONFIG_CPU_FREQ
-#include <linux/workqueue.h>
-
-static unsigned int cpufreq_delayed_issched = 0;
-static unsigned int cpufreq_init = 0;
-static struct work_struct cpufreq_delayed_get_work;
-
-static void handle_cpufreq_delayed_get(void *v)
-{
-	unsigned int cpu;
-	for_each_online_cpu(cpu) {
-		cpufreq_get(cpu);
-	}
-	cpufreq_delayed_issched = 0;
-}
-
-/* if we notice lost ticks, schedule a call to cpufreq_get() as it tries
- * to verify the CPU frequency the timing core thinks the CPU is running
- * at is still correct.
- */
-static inline void cpufreq_delayed_get(void) 
-{
-	if (cpufreq_init && !cpufreq_delayed_issched) {
-		cpufreq_delayed_issched = 1;
-		printk(KERN_DEBUG "Losing some ticks... checking if CPU frequency changed.\n");
-		schedule_work(&cpufreq_delayed_get_work);
-	}
-}
-
-/* If the CPU frequency is scaled, TSC-based delays will need a different
- * loops_per_jiffy value to function properly.
- */
-
-static unsigned int  ref_freq = 0;
-static unsigned long loops_per_jiffy_ref = 0;
-
-#ifndef CONFIG_SMP
-static unsigned long fast_gettimeoffset_ref = 0;
-static unsigned int cpu_khz_ref = 0;
-#endif
-
-static int
-time_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
-		       void *data)
-{
-	struct cpufreq_freqs *freq = data;
-
-	if (val != CPUFREQ_RESUMECHANGE)
-		write_seqlock_irq(&xtime_lock);
-	if (!ref_freq) {
-		ref_freq = freq->old;
-		loops_per_jiffy_ref = cpu_data[freq->cpu].loops_per_jiffy;
-#ifndef CONFIG_SMP
-		fast_gettimeoffset_ref = fast_gettimeoffset_quotient;
-		cpu_khz_ref = cpu_khz;
-#endif
-	}
-
-	if ((val == CPUFREQ_PRECHANGE  && freq->old < freq->new) ||
-	    (val == CPUFREQ_POSTCHANGE && freq->old > freq->new) ||
-	    (val == CPUFREQ_RESUMECHANGE)) {
-		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
-			cpu_data[freq->cpu].loops_per_jiffy = cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
-#ifndef CONFIG_SMP
-		if (cpu_khz)
-			cpu_khz = cpufreq_scale(cpu_khz_ref, ref_freq, freq->new);
-		if (use_tsc) {
-			if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
-				fast_gettimeoffset_quotient = cpufreq_scale(fast_gettimeoffset_ref, freq->new, ref_freq);
-				set_cyc2ns_scale(cpu_khz/1000);
-			}
-		}
-#endif
-	}
-
-	if (val != CPUFREQ_RESUMECHANGE)
-		write_sequnlock_irq(&xtime_lock);
-
-	return 0;
-}
-
-static struct notifier_block time_cpufreq_notifier_block = {
-	.notifier_call	= time_cpufreq_notifier
-};
-
-
-static int __init cpufreq_tsc(void)
-{
-	int ret;
-	INIT_WORK(&cpufreq_delayed_get_work, handle_cpufreq_delayed_get, NULL);
-	ret = cpufreq_register_notifier(&time_cpufreq_notifier_block,
-					CPUFREQ_TRANSITION_NOTIFIER);
-	if (!ret)
-		cpufreq_init = 1;
-	return ret;
-}
-core_initcall(cpufreq_tsc);
-
-#else /* CONFIG_CPU_FREQ */
-static inline void cpufreq_delayed_get(void) { return; }
-#endif 
-
-int recalibrate_cpu_khz(void)
-{
-#ifndef CONFIG_SMP
-	unsigned int cpu_khz_old = cpu_khz;
-
-	if (cpu_has_tsc) {
-		init_cpu_khz();
-		cpu_data[0].loops_per_jiffy =
-		    cpufreq_scale(cpu_data[0].loops_per_jiffy,
-			          cpu_khz_old,
-				  cpu_khz);
-		return 0;
-	} else
-		return -ENODEV;
-#else
-	return -ENODEV;
-#endif
-}
-EXPORT_SYMBOL(recalibrate_cpu_khz);
 
 static void mark_offset_tsc(void)
 {
@@ -543,24 +367,6 @@ static int __init init_tsc(char* overrid
 	return -ENODEV;
 }
 
-#ifndef CONFIG_X86_TSC
-/* disable flag for tsc.  Takes effect by clearing the TSC cpu flag
- * in cpu/common.c */
-static int __init tsc_setup(char *str)
-{
-	tsc_disable = 1;
-	return 1;
-}
-#else
-static int __init tsc_setup(char *str)
-{
-	printk(KERN_WARNING "notsc: Kernel compiled with CONFIG_X86_TSC, "
-				"cannot disable TSC.\n");
-	return 1;
-}
-#endif
-__setup("notsc", tsc_setup);
-
 
 
 /************************************************************/
diff --git a/arch/i386/kernel/tsc.c b/arch/i386/kernel/tsc.c
new file mode 100644
--- /dev/null
+++ b/arch/i386/kernel/tsc.c
@@ -0,0 +1,298 @@
+/*
+ * This code largely moved from arch/i386/kernel/timer/timer_tsc.c
+ * which was originally moved from arch/i386/kernel/time.c.
+ * See comments there for proper credits.
+ */
+
+#include <linux/init.h>
+#include <linux/timex.h>
+#include <linux/cpufreq.h>
+#include <asm/io.h>
+#include "mach_timer.h"
+
+int tsc_disable __initdata = 0;
+#ifndef CONFIG_X86_TSC
+/* disable flag for tsc.  Takes effect by clearing the TSC cpu flag
+ * in cpu/common.c */
+static int __init tsc_setup(char *str)
+{
+	tsc_disable = 1;
+	return 1;
+}
+#else
+static int __init tsc_setup(char *str)
+{
+	printk(KERN_WARNING "notsc: Kernel compiled with CONFIG_X86_TSC, "
+				"cannot disable TSC.\n");
+	return 1;
+}
+#endif
+__setup("notsc", tsc_setup);
+
+
+int read_current_timer(unsigned long *timer_val)
+{
+	if (cur_timer->read_timer) {
+		*timer_val = cur_timer->read_timer();
+		return 0;
+	}
+	return -1;
+}
+
+
+/* convert from cycles(64bits) => nanoseconds (64bits)
+ *  basic equation:
+ *		ns = cycles / (freq / ns_per_sec)
+ *		ns = cycles * (ns_per_sec / freq)
+ *		ns = cycles * (10^9 / (cpu_mhz * 10^6))
+ *		ns = cycles * (10^3 / cpu_mhz)
+ *
+ *	Then we use scaling math (suggested by george@mvista.com) to get:
+ *		ns = cycles * (10^3 * SC / cpu_mhz) / SC
+ *		ns = cycles * cyc2ns_scale / SC
+ *
+ *	And since SC is a constant power of two, we can convert the div
+ *  into a shift.
+ *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
+ */
+static unsigned long cyc2ns_scale;
+#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
+
+static inline void set_cyc2ns_scale(unsigned long cpu_mhz)
+{
+	cyc2ns_scale = (1000 << CYC2NS_SCALE_FACTOR)/cpu_mhz;
+}
+
+static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+	return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
+}
+
+/*
+ * Scheduler clock - returns current time in nanosec units.
+ */
+unsigned long long sched_clock(void)
+{
+	unsigned long long this_offset;
+
+	/*
+	 * In the NUMA case we dont use the TSC as they are not
+	 * synchronized across all CPUs.
+	 */
+#ifndef CONFIG_NUMA
+	if (!use_tsc)
+#endif
+		/* no locking but a rare wrong value is not a big deal */
+		return jiffies_64 * (1000000000 / HZ);
+
+	/* Read the Time Stamp Counter */
+	rdtscll(this_offset);
+
+	/* return the value in ns */
+	return cycles_2_ns(this_offset);
+}
+
+/* ------ Calibrate the TSC -------
+ * Return 2^32 * (1 / (TSC clocks per usec)) for do_fast_gettimeoffset().
+ * Too much 64-bit arithmetic here to do this cleanly in C, and for
+ * accuracy's sake we want to keep the overhead on the CTC speaker (channel 2)
+ * output busy loop as low as possible. We avoid reading the CTC registers
+ * directly because of the awkward 8-bit access mechanism of the 82C54
+ * device.
+ */
+
+#define CALIBRATE_TIME	(5 * 1000020/HZ)
+
+unsigned long calibrate_tsc(void)
+{
+	mach_prepare_counter();
+
+	{
+		unsigned long startlow, starthigh;
+		unsigned long endlow, endhigh;
+		unsigned long count;
+
+		rdtsc(startlow,starthigh);
+		mach_countup(&count);
+		rdtsc(endlow,endhigh);
+
+
+		/* Error: ECTCNEVERSET */
+		if (count <= 1)
+			goto bad_ctc;
+
+		/* 64-bit subtract - gcc just messes up with long longs */
+		__asm__("subl %2,%0\n\t"
+			"sbbl %3,%1"
+			:"=a" (endlow), "=d" (endhigh)
+			:"g" (startlow), "g" (starthigh),
+			 "0" (endlow), "1" (endhigh));
+
+		/* Error: ECPUTOOFAST */
+		if (endhigh)
+			goto bad_ctc;
+
+		/* Error: ECPUTOOSLOW */
+		if (endlow <= CALIBRATE_TIME)
+			goto bad_ctc;
+
+		__asm__("divl %2"
+			:"=a" (endlow), "=d" (endhigh)
+			:"r" (endlow), "0" (0), "1" (CALIBRATE_TIME));
+
+		return endlow;
+	}
+
+	/*
+	 * The CTC wasn't reliable: we got a hit on the very first read,
+	 * or the CPU was so fast/slow that the quotient wouldn't fit in
+	 * 32 bits..
+	 */
+bad_ctc:
+	return 0;
+}
+
+int recalibrate_cpu_khz(void)
+{
+#ifndef CONFIG_SMP
+	unsigned long cpu_khz_old = cpu_khz;
+
+	if (cpu_has_tsc) {
+		init_cpu_khz();
+		cpu_data[0].loops_per_jiffy =
+		    cpufreq_scale(cpu_data[0].loops_per_jiffy,
+			          cpu_khz_old,
+				  cpu_khz);
+		return 0;
+	} else
+		return -ENODEV;
+#else
+	return -ENODEV;
+#endif
+}
+EXPORT_SYMBOL(recalibrate_cpu_khz);
+
+
+/* calculate cpu_khz */
+void init_cpu_khz(void)
+{
+	if (cpu_has_tsc) {
+		unsigned long tsc_quotient = calibrate_tsc();
+		if (tsc_quotient) {
+			/* report CPU clock rate in Hz.
+			 * The formula is (10^6 * 2^32) / (2^32 * 1 / (clocks/us)) =
+			 * clock/second. Our precision is about 100 ppm.
+			 */
+			{	unsigned long eax=0, edx=1000;
+				__asm__("divl %2"
+		       		:"=a" (cpu_khz), "=d" (edx)
+        	       		:"r" (tsc_quotient),
+	                	"0" (eax), "1" (edx));
+				printk("Detected %lu.%03lu MHz processor.\n", cpu_khz / 1000, cpu_khz % 1000);
+			}
+		}
+	}
+}
+
+
+#ifdef CONFIG_CPU_FREQ
+#include <linux/workqueue.h>
+
+static unsigned int cpufreq_delayed_issched = 0;
+static unsigned int cpufreq_init = 0;
+static struct work_struct cpufreq_delayed_get_work;
+
+static void handle_cpufreq_delayed_get(void *v)
+{
+	unsigned int cpu;
+	for_each_online_cpu(cpu) {
+		cpufreq_get(cpu);
+	}
+	cpufreq_delayed_issched = 0;
+}
+
+/* if we notice lost ticks, schedule a call to cpufreq_get() as it tries
+ * to verify the CPU frequency the timing core thinks the CPU is running
+ * at is still correct.
+ */
+void cpufreq_delayed_get(void)
+{
+	if (cpufreq_init && !cpufreq_delayed_issched) {
+		cpufreq_delayed_issched = 1;
+		printk(KERN_DEBUG "Losing some ticks... checking if CPU frequency changed.\n");
+		schedule_work(&cpufreq_delayed_get_work);
+	}
+}
+
+/* If the CPU frequency is scaled, TSC-based delays will need a different
+ * loops_per_jiffy value to function properly.
+ */
+
+static unsigned int  ref_freq = 0;
+static unsigned long loops_per_jiffy_ref = 0;
+
+#ifndef CONFIG_SMP
+static unsigned long fast_gettimeoffset_ref = 0;
+static unsigned long cpu_khz_ref = 0;
+#endif
+
+static int
+time_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
+		       void *data)
+{
+	struct cpufreq_freqs *freq = data;
+
+	if (val != CPUFREQ_RESUMECHANGE)
+		write_seqlock_irq(&xtime_lock);
+	if (!ref_freq) {
+		ref_freq = freq->old;
+		loops_per_jiffy_ref = cpu_data[freq->cpu].loops_per_jiffy;
+#ifndef CONFIG_SMP
+		fast_gettimeoffset_ref = fast_gettimeoffset_quotient;
+		cpu_khz_ref = cpu_khz;
+#endif
+	}
+
+	if ((val == CPUFREQ_PRECHANGE  && freq->old < freq->new) ||
+	    (val == CPUFREQ_POSTCHANGE && freq->old > freq->new) ||
+	    (val == CPUFREQ_RESUMECHANGE)) {
+		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
+			cpu_data[freq->cpu].loops_per_jiffy = cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
+#ifndef CONFIG_SMP
+		if (cpu_khz)
+			cpu_khz = cpufreq_scale(cpu_khz_ref, ref_freq, freq->new);
+		if (use_tsc) {
+			if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
+				fast_gettimeoffset_quotient = cpufreq_scale(fast_gettimeoffset_ref, freq->new, ref_freq);
+				set_cyc2ns_scale(cpu_khz/1000);
+			}
+		}
+#endif
+	}
+
+	if (val != CPUFREQ_RESUMECHANGE)
+		write_sequnlock_irq(&xtime_lock);
+
+	return 0;
+}
+
+static struct notifier_block time_cpufreq_notifier_block = {
+	.notifier_call	= time_cpufreq_notifier
+};
+
+
+static int __init cpufreq_tsc(void)
+{
+	int ret;
+	INIT_WORK(&cpufreq_delayed_get_work, handle_cpufreq_delayed_get, NULL);
+	ret = cpufreq_register_notifier(&time_cpufreq_notifier_block,
+					CPUFREQ_TRANSITION_NOTIFIER);
+	if (!ret)
+		cpufreq_init = 1;
+	return ret;
+}
+core_initcall(cpufreq_tsc);
+
+#else /* CONFIG_CPU_FREQ */
+void cpufreq_delayed_get(void) { return; }
+#endif
diff --git a/include/asm-i386/timex.h b/include/asm-i386/timex.h
--- a/include/asm-i386/timex.h
+++ b/include/asm-i386/timex.h
@@ -8,6 +8,7 @@
 
 #include <linux/config.h>
 #include <asm/processor.h>
+#include <asm/tsc.h>
 
 #ifdef CONFIG_X86_ELAN
 #  define CLOCK_TICK_RATE 1189200 /* AMD Elan has different frequency! */
@@ -16,39 +17,6 @@
 #endif
 
 
-/*
- * Standard way to access the cycle counter on i586+ CPUs.
- * Currently only used on SMP.
- *
- * If you really have a SMP machine with i486 chips or older,
- * compile for that, and this will just always return zero.
- * That's ok, it just means that the nicer scheduling heuristics
- * won't work for you.
- *
- * We only use the low 32 bits, and we'd simply better make sure
- * that we reschedule before that wraps. Scheduling at least every
- * four billion cycles just basically sounds like a good idea,
- * regardless of how fast the machine is. 
- */
-typedef unsigned long long cycles_t;
-
-static inline cycles_t get_cycles (void)
-{
-	unsigned long long ret=0;
-
-#ifndef CONFIG_X86_TSC
-	if (!cpu_has_tsc)
-		return 0;
-#endif
-
-#if defined(CONFIG_X86_GENERIC) || defined(CONFIG_X86_TSC)
-	rdtscll(ret);
-#endif
-	return ret;
-}
-
-extern unsigned int cpu_khz;
-
 extern int read_current_timer(unsigned long *timer_value);
 #define ARCH_HAS_READ_CURRENT_TIMER	1
 
diff --git a/include/asm-i386/tsc.h b/include/asm-i386/tsc.h
new file mode 100644
--- /dev/null
+++ b/include/asm-i386/tsc.h
@@ -0,0 +1,44 @@
+/*
+ * linux/include/asm-i386/tsc.h
+ *
+ * i386 TSC related functions
+ */
+#ifndef _ASM_i386_TSC_H
+#define _ASM_i386_TSC_H
+
+#include <linux/config.h>
+#include <asm/processor.h>
+
+/*
+ * Standard way to access the cycle counter on i586+ CPUs.
+ * Currently only used on SMP.
+ *
+ * If you really have a SMP machine with i486 chips or older,
+ * compile for that, and this will just always return zero.
+ * That's ok, it just means that the nicer scheduling heuristics
+ * won't work for you.
+ *
+ * We only use the low 32 bits, and we'd simply better make sure
+ * that we reschedule before that wraps. Scheduling at least every
+ * four billion cycles just basically sounds like a good idea,
+ * regardless of how fast the machine is.
+ */
+typedef unsigned long long cycles_t;
+
+static inline cycles_t get_cycles (void)
+{
+	unsigned long long ret=0;
+
+#ifndef CONFIG_X86_TSC
+	if (!cpu_has_tsc)
+		return 0;
+#endif
+
+#if defined(CONFIG_X86_GENERIC) || defined(CONFIG_X86_TSC)
+	rdtscll(ret);
+#endif
+	return ret;
+}
+
+extern unsigned int cpu_khz;
+#endif



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH 5/9] generic timekeeping i386 arch specific changes, part 3
  2005-08-11  2:19         ` [PATCH 4/9] generic timekeeping i386 arch specific changes, part 2 john stultz
@ 2005-08-11  2:20           ` john stultz
  2005-08-11  2:21             ` [PATCH 6/9] generic timekeeping i386 arch specific changes, part 4 john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  2:20 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl


All,
	The conversion of i386 to use the generic timekeeping subsystem has
been split into 6 parts. This patch, the third of six, reworks some of
the code in the new tsc.c file, adding some new interfaces and hooks to
use these new interfaces appropriately. 

It applies on top of my timeofday-arch-i386-part2 patch. This patch is
part the timeofday-arch-i386 patchset, so without the following parts it
is not expected to compile.
	
thanks
-john

linux-2.6.13-rc6_timeofday-arch-i386-part3_B5.patch
============================================
diff --git a/arch/i386/kernel/setup.c b/arch/i386/kernel/setup.c
--- a/arch/i386/kernel/setup.c
+++ b/arch/i386/kernel/setup.c
@@ -1598,6 +1598,7 @@ void __init setup_arch(char **cmdline_p)
 	conswitchp = &dummy_con;
 #endif
 #endif
+	tsc_init();
 }
 
 #include "setup_arch_post.h"
diff --git a/arch/i386/kernel/tsc.c b/arch/i386/kernel/tsc.c
--- a/arch/i386/kernel/tsc.c
+++ b/arch/i386/kernel/tsc.c
@@ -5,11 +5,17 @@
  */
 
 #include <linux/init.h>
-#include <linux/timex.h>
 #include <linux/cpufreq.h>
+#include <asm/tsc.h>
 #include <asm/io.h>
 #include "mach_timer.h"
 
+/* On some systems the TSC frequency does not
+ * change with the cpu frequency. So we need
+ * an extra value to store the TSC freq
+ */
+unsigned int tsc_khz;
+
 int tsc_disable __initdata = 0;
 #ifndef CONFIG_X86_TSC
 /* disable flag for tsc.  Takes effect by clearing the TSC cpu flag
@@ -32,15 +38,42 @@ __setup("notsc", tsc_setup);
 
 int read_current_timer(unsigned long *timer_val)
 {
-	if (cur_timer->read_timer) {
-		*timer_val = cur_timer->read_timer();
+	if (!tsc_disable && cpu_khz) {
+		rdtscl(*timer_val);
 		return 0;
 	}
 	return -1;
 }
 
+/* Code to mark and check if the TSC is unstable
+ * due to cpufreq or due to unsynced TSCs
+ */
+static int tsc_unstable;
+int check_tsc_unstable(void)
+{
+	return tsc_unstable;
+}
+
+void mark_tsc_unstable(void)
+{
+	tsc_unstable = 1;
+}
+
+/* Code to compensate for C3 stalls */
+static u64 tsc_c3_offset;
+void tsc_c3_compensate(unsigned long usecs)
+{
+	u64 cycles = (usecs * tsc_khz)/1000;
+	tsc_c3_offset += cycles;
+}
+
+u64 tsc_read_c3_time(void)
+{
+	return tsc_c3_offset;
+}
 
-/* convert from cycles(64bits) => nanoseconds (64bits)
+/* Accellerators for sched_clock()
+ * convert from cycles(64bits) => nanoseconds (64bits)
  *  basic equation:
  *		ns = cycles / (freq / ns_per_sec)
  *		ns = cycles * (ns_per_sec / freq)
@@ -80,76 +113,54 @@ unsigned long long sched_clock(void)
 	 * synchronized across all CPUs.
 	 */
 #ifndef CONFIG_NUMA
-	if (!use_tsc)
+	if (!cpu_khz || check_tsc_unstable())
 #endif
 		/* no locking but a rare wrong value is not a big deal */
-		return jiffies_64 * (1000000000 / HZ);
+		return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);
 
 	/* Read the Time Stamp Counter */
 	rdtscll(this_offset);
+	this_offset += tsc_read_c3_time();
 
 	/* return the value in ns */
 	return cycles_2_ns(this_offset);
 }
 
-/* ------ Calibrate the TSC -------
- * Return 2^32 * (1 / (TSC clocks per usec)) for do_fast_gettimeoffset().
- * Too much 64-bit arithmetic here to do this cleanly in C, and for
- * accuracy's sake we want to keep the overhead on the CTC speaker (channel 2)
- * output busy loop as low as possible. We avoid reading the CTC registers
- * directly because of the awkward 8-bit access mechanism of the 82C54
- * device.
- */
-
-#define CALIBRATE_TIME	(5 * 1000020/HZ)
 
-unsigned long calibrate_tsc(void)
+static unsigned long calculate_cpu_khz(void)
 {
-	mach_prepare_counter();
-
-	{
-		unsigned long startlow, starthigh;
-		unsigned long endlow, endhigh;
-		unsigned long count;
-
-		rdtsc(startlow,starthigh);
+	unsigned long long start, end;
+	unsigned long count;
+	u64 delta64;
+	int i;
+	/* run 3 times to ensure the cache is warm */
+	for(i=0; i<3; i++) {
+		mach_prepare_counter();
+		rdtscll(start);
 		mach_countup(&count);
-		rdtsc(endlow,endhigh);
-
-
-		/* Error: ECTCNEVERSET */
-		if (count <= 1)
-			goto bad_ctc;
-
-		/* 64-bit subtract - gcc just messes up with long longs */
-		__asm__("subl %2,%0\n\t"
-			"sbbl %3,%1"
-			:"=a" (endlow), "=d" (endhigh)
-			:"g" (startlow), "g" (starthigh),
-			 "0" (endlow), "1" (endhigh));
-
-		/* Error: ECPUTOOFAST */
-		if (endhigh)
-			goto bad_ctc;
-
-		/* Error: ECPUTOOSLOW */
-		if (endlow <= CALIBRATE_TIME)
-			goto bad_ctc;
-
-		__asm__("divl %2"
-			:"=a" (endlow), "=d" (endhigh)
-			:"r" (endlow), "0" (0), "1" (CALIBRATE_TIME));
-
-		return endlow;
+		rdtscll(end);
 	}
-
-	/*
+	/* Error: ECTCNEVERSET
 	 * The CTC wasn't reliable: we got a hit on the very first read,
 	 * or the CPU was so fast/slow that the quotient wouldn't fit in
 	 * 32 bits..
 	 */
-bad_ctc:
-	return 0;
+	if (count <= 1)
+		return 0;
+
+	delta64 = end - start;
+
+	/* cpu freq too fast */
+	if(delta64 > (1ULL<<32))
+		return 0;
+	/* cpu freq too slow */
+	if (delta64 <= CALIBRATE_TIME_MSEC)
+		return 0;
+
+	delta64 += CALIBRATE_TIME_MSEC/2; /* round for do_div */
+	do_div(delta64,CALIBRATE_TIME_MSEC);
+
+	return (unsigned long)delta64;
 }
 
 int recalibrate_cpu_khz(void)
@@ -158,11 +169,11 @@ int recalibrate_cpu_khz(void)
 	unsigned long cpu_khz_old = cpu_khz;
 
 	if (cpu_has_tsc) {
-		init_cpu_khz();
+		cpu_khz = calculate_cpu_khz();
+		tsc_khz = cpu_khz;
 		cpu_data[0].loops_per_jiffy =
-		    cpufreq_scale(cpu_data[0].loops_per_jiffy,
-			          cpu_khz_old,
-				  cpu_khz);
+			cpufreq_scale(cpu_data[0].loops_per_jiffy,
+					cpu_khz_old, cpu_khz);
 		return 0;
 	} else
 		return -ENODEV;
@@ -173,25 +184,22 @@ int recalibrate_cpu_khz(void)
 EXPORT_SYMBOL(recalibrate_cpu_khz);
 
 
-/* calculate cpu_khz */
-void init_cpu_khz(void)
+void tsc_init(void)
 {
-	if (cpu_has_tsc) {
-		unsigned long tsc_quotient = calibrate_tsc();
-		if (tsc_quotient) {
-			/* report CPU clock rate in Hz.
-			 * The formula is (10^6 * 2^32) / (2^32 * 1 / (clocks/us)) =
-			 * clock/second. Our precision is about 100 ppm.
-			 */
-			{	unsigned long eax=0, edx=1000;
-				__asm__("divl %2"
-		       		:"=a" (cpu_khz), "=d" (edx)
-        	       		:"r" (tsc_quotient),
-	                	"0" (eax), "1" (edx));
-				printk("Detected %lu.%03lu MHz processor.\n", cpu_khz / 1000, cpu_khz % 1000);
-			}
-		}
-	}
+	if(!cpu_has_tsc || tsc_disable)
+		return;
+
+	cpu_khz = calculate_cpu_khz();
+	tsc_khz = cpu_khz;
+
+	if (!cpu_khz)
+		return;
+
+	printk("Detected %lu.%03lu MHz processor.\n",
+				(unsigned long)cpu_khz / 1000, 
+				(unsigned long)cpu_khz % 1000);
+
+	set_cyc2ns_scale(cpu_khz/1000);
 }
 
 
@@ -211,15 +219,15 @@ static void handle_cpufreq_delayed_get(v
 	cpufreq_delayed_issched = 0;
 }
 
-/* if we notice lost ticks, schedule a call to cpufreq_get() as it tries
+/* if we notice cpufreq oddness, schedule a call to cpufreq_get() as it tries
  * to verify the CPU frequency the timing core thinks the CPU is running
  * at is still correct.
  */
-void cpufreq_delayed_get(void)
+static inline void cpufreq_delayed_get(void)
 {
 	if (cpufreq_init && !cpufreq_delayed_issched) {
 		cpufreq_delayed_issched = 1;
-		printk(KERN_DEBUG "Losing some ticks... checking if CPU frequency changed.\n");
+		printk(KERN_DEBUG "Checking if CPU frequency changed.\n");
 		schedule_work(&cpufreq_delayed_get_work);
 	}
 }
@@ -232,13 +240,11 @@ static unsigned int  ref_freq = 0;
 static unsigned long loops_per_jiffy_ref = 0;
 
 #ifndef CONFIG_SMP
-static unsigned long fast_gettimeoffset_ref = 0;
 static unsigned long cpu_khz_ref = 0;
 #endif
 
-static int
-time_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
-		       void *data)
+static int time_cpufreq_notifier(struct notifier_block *nb,
+		unsigned long val, void *data)
 {
 	struct cpufreq_freqs *freq = data;
 
@@ -248,7 +254,6 @@ time_cpufreq_notifier(struct notifier_bl
 		ref_freq = freq->old;
 		loops_per_jiffy_ref = cpu_data[freq->cpu].loops_per_jiffy;
 #ifndef CONFIG_SMP
-		fast_gettimeoffset_ref = fast_gettimeoffset_quotient;
 		cpu_khz_ref = cpu_khz;
 #endif
 	}
@@ -258,16 +263,20 @@ time_cpufreq_notifier(struct notifier_bl
 	    (val == CPUFREQ_RESUMECHANGE)) {
 		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
 			cpu_data[freq->cpu].loops_per_jiffy = cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
+
+		if (cpu_khz) {
 #ifndef CONFIG_SMP
-		if (cpu_khz)
 			cpu_khz = cpufreq_scale(cpu_khz_ref, ref_freq, freq->new);
-		if (use_tsc) {
+#endif
 			if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
-				fast_gettimeoffset_quotient = cpufreq_scale(fast_gettimeoffset_ref, freq->new, ref_freq);
+				tsc_khz = cpu_khz;
 				set_cyc2ns_scale(cpu_khz/1000);
+				/* TSC based sched_clock turns
+				 * to junk w/ cpufreq
+				 */
+				mark_tsc_unstable();
 			}
 		}
-#endif
 	}
 
 	if (val != CPUFREQ_RESUMECHANGE)
@@ -289,10 +298,9 @@ static int __init cpufreq_tsc(void)
 					CPUFREQ_TRANSITION_NOTIFIER);
 	if (!ret)
 		cpufreq_init = 1;
+
 	return ret;
 }
 core_initcall(cpufreq_tsc);
 
-#else /* CONFIG_CPU_FREQ */
-void cpufreq_delayed_get(void) { return; }
 #endif
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -166,6 +166,7 @@ acpi_processor_power_activate (
 	return;
 }
 
+extern void tsc_c3_compensate(unsigned long usecs);
 
 static atomic_t 	c3_cpu_count;
 
@@ -334,6 +335,9 @@ static void acpi_processor_idle (void)
 			acpi_set_register(ACPI_BITREG_ARB_DISABLE, 0, ACPI_MTX_DO_NOT_LOCK);
 		}
 
+		/* compensate for TSC pause */
+		tsc_c3_compensate((((t2-t1)&0xFFFFFF)*286)>>10);
+
 		/* Re-enable interrupts */
 		local_irq_enable();
 		/* Compute time (ticks) that we were actually asleep */
diff --git a/include/asm-i386/mach-default/mach_timer.h b/include/asm-i386/mach-default/mach_timer.h
--- a/include/asm-i386/mach-default/mach_timer.h
+++ b/include/asm-i386/mach-default/mach_timer.h
@@ -15,7 +15,9 @@
 #ifndef _MACH_TIMER_H
 #define _MACH_TIMER_H
 
-#define CALIBRATE_LATCH	(5 * LATCH)
+#define CALIBRATE_TIME_MSEC 30 /* 30 msecs */
+#define CALIBRATE_LATCH	\
+	((CLOCK_TICK_RATE * CALIBRATE_TIME_MSEC + 1000/2)/1000)
 
 static inline void mach_prepare_counter(void)
 {
diff --git a/include/asm-i386/mach-summit/mach_mpparse.h b/include/asm-i386/mach-summit/mach_mpparse.h
--- a/include/asm-i386/mach-summit/mach_mpparse.h
+++ b/include/asm-i386/mach-summit/mach_mpparse.h
@@ -30,6 +30,7 @@ static inline int mps_oem_check(struct m
 			(!strncmp(productid, "VIGIL SMP", 9) 
 			 || !strncmp(productid, "EXA", 3)
 			 || !strncmp(productid, "RUTHLESS SMP", 12))){
+		mark_tsc_unstable();
 		use_cyclone = 1; /*enable cyclone-timer*/
 		setup_summit();
 		usb_early_handoff = 1;
@@ -44,6 +45,7 @@ static inline int acpi_madt_oem_check(ch
 	if (!strncmp(oem_id, "IBM", 3) &&
 	    (!strncmp(oem_table_id, "SERVIGIL", 8)
 	     || !strncmp(oem_table_id, "EXA", 3))){
+		mark_tsc_unstable();
 		use_cyclone = 1; /*enable cyclone-timer*/
 		setup_summit();
 		usb_early_handoff = 1;
diff --git a/include/asm-i386/tsc.h b/include/asm-i386/tsc.h
--- a/include/asm-i386/tsc.h
+++ b/include/asm-i386/tsc.h
@@ -41,4 +41,10 @@ static inline cycles_t get_cycles (void)
 }
 
 extern unsigned int cpu_khz;
+extern unsigned int tsc_khz;
+extern void tsc_init(void);
+void tsc_c3_compensate(unsigned long usecs);
+u64 tsc_read_c3_time(void);
+extern int check_tsc_unstable(void);
+extern void mark_tsc_unstable(void);
 #endif



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH 6/9] generic timekeeping i386 arch specific changes, part 4
  2005-08-11  2:20           ` [PATCH 5/9] generic timekeeping i386 arch specific changes, part 3 john stultz
@ 2005-08-11  2:21             ` john stultz
  2005-08-11  2:23               ` [PATCH 7/9] generic timekeeping i386 arch specific changes, part 5 john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  2:21 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,
	The conversion of i386 to use the generic timekeeping subsystem has
been split into 6 parts. This patch, the fourth of six, renames some
ACPI PM variables. 

It applies on top of my timeofday-arch-i386-part3 patch. This patch is
part the timeofday-arch-i386 patchset, so without the following parts it
is not expected to compile.
	
thanks
-john

linux-2.6.13-rc6_timeofday-arch-i386-part4_B5.patch
============================================
diff --git a/arch/i386/kernel/acpi/boot.c b/arch/i386/kernel/acpi/boot.c
--- a/arch/i386/kernel/acpi/boot.c
+++ b/arch/i386/kernel/acpi/boot.c
@@ -633,7 +633,8 @@ static int __init acpi_parse_hpet(unsign
 #endif
 
 #ifdef CONFIG_X86_PM_TIMER
-extern u32 pmtmr_ioport;
+u32 acpi_pmtmr_ioport;
+int acpi_pmtmr_buggy;
 #endif
 
 static int __init acpi_parse_fadt(unsigned long phys, unsigned long size)
@@ -664,13 +665,13 @@ static int __init acpi_parse_fadt(unsign
 		if (fadt->xpm_tmr_blk.address_space_id != ACPI_ADR_SPACE_SYSTEM_IO)
 			return 0;
 
-		pmtmr_ioport = fadt->xpm_tmr_blk.address;
+		acpi_pmtmr_ioport = fadt->xpm_tmr_blk.address;
 	} else {
 		/* FADT rev. 1 */
-		pmtmr_ioport = fadt->V1_pm_tmr_blk;
+		acpi_pmtmr_ioport = fadt->V1_pm_tmr_blk;
 	}
-	if (pmtmr_ioport)
-		printk(KERN_INFO PREFIX "PM-Timer IO Port: %#x\n", pmtmr_ioport);
+	if (acpi_pmtmr_ioport)
+		printk(KERN_INFO PREFIX "PM-Timer IO Port: %#x\n", acpi_pmtmr_ioport);
 #endif
 	return 0;
 }



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH 7/9] generic timekeeping i386 arch specific changes, part 5
  2005-08-11  2:21             ` [PATCH 6/9] generic timekeeping i386 arch specific changes, part 4 john stultz
@ 2005-08-11  2:23               ` john stultz
  2005-08-11  2:24                 ` [PATCH 8/9] generic timekeeping i386 arch specific changes, part 6 john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  2:23 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,
	The conversion of i386 to use the generic timekeeping subsystem has
been split into 6 parts. This patch, the fifth of six, converts the i386
arch to use the generic timekeeping subsystem.

It applies on top of my timeofday-arch-i386-part4 patch. This patch is
part the timeofday-arch-i386 patchset, so without the following parts it
is not expected to compile.

thanks
-john

linux-2.6.13-rc6_timeofday-arch-i386-part5_B5.patch
============================================
diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -14,6 +14,10 @@ config X86
 	  486, 586, Pentiums, and various instruction-set-compatible chips by
 	  AMD, Cyrix, and others.
 
+config GENERICTOD
+	bool
+	default y
+
 config MMU
 	bool
 	default y
diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -10,7 +10,6 @@ obj-y	:= process.o semaphore.o signal.o 
 		doublefault.o quirks.o tsc.o
 
 obj-y				+= cpu/
-obj-y				+= timers/
 obj-$(CONFIG_ACPI_BOOT)		+= acpi/
 obj-$(CONFIG_X86_BIOS_REBOOT)	+= reboot.o
 obj-$(CONFIG_MCA)		+= mca.o
diff --git a/arch/i386/kernel/time.c b/arch/i386/kernel/time.c
--- a/arch/i386/kernel/time.c
+++ b/arch/i386/kernel/time.c
@@ -46,7 +46,6 @@
 #include <linux/bcd.h>
 #include <linux/efi.h>
 #include <linux/mca.h>
-#include <linux/ntp.h>
 
 #include <asm/io.h>
 #include <asm/smp.h>
@@ -57,6 +56,7 @@
 #include <asm/uaccess.h>
 #include <asm/processor.h>
 #include <asm/timer.h>
+#include <asm/timeofday.h>
 
 #include "mach_time.h"
 
@@ -92,8 +92,6 @@ EXPORT_SYMBOL(rtc_lock);
 DEFINE_SPINLOCK(i8253_lock);
 EXPORT_SYMBOL(i8253_lock);
 
-struct timer_opts *cur_timer __read_mostly = &timer_none;
-
 /*
  * This is a special lock that is owned by the CPU and holds the index
  * register we are working with.  It is required for NMI access to the
@@ -123,99 +121,19 @@ void rtc_cmos_write(unsigned char val, u
 }
 EXPORT_SYMBOL(rtc_cmos_write);
 
-/*
- * This version of gettimeofday has microsecond resolution
- * and better than microsecond precision on fast x86 machines with TSC.
- */
-void do_gettimeofday(struct timeval *tv)
-{
-	unsigned long seq;
-	unsigned long usec, sec;
-	unsigned long max_ntp_tick;
-
-	do {
-		unsigned long lost;
-
-		seq = read_seqbegin(&xtime_lock);
-
-		usec = cur_timer->get_offset();
-		lost = jiffies - wall_jiffies;
-
-		/*
-		 * If ntp_adjtime_offset is negative then NTP is slowing the clock
-		 * so make sure not to go into next possible interval.
-		 * Better to lose some accuracy than have time go backwards..
-		 */
-		if (unlikely(ntp_adjtime_offset < 0)) {
-			max_ntp_tick = (USEC_PER_SEC / HZ) - tickadj;
-			usec = min(usec, max_ntp_tick);
-
-			if (lost)
-				usec += lost * max_ntp_tick;
-		}
-		else if (unlikely(lost))
-			usec += lost * (USEC_PER_SEC / HZ);
-
-		sec = xtime.tv_sec;
-		usec += (xtime.tv_nsec / 1000);
-	} while (read_seqretry(&xtime_lock, seq));
-
-	while (usec >= 1000000) {
-		usec -= 1000000;
-		sec++;
-	}
-
-	tv->tv_sec = sec;
-	tv->tv_usec = usec;
-}
-
-EXPORT_SYMBOL(do_gettimeofday);
-
-int do_settimeofday(struct timespec *tv)
-{
-	time_t wtm_sec, sec = tv->tv_sec;
-	long wtm_nsec, nsec = tv->tv_nsec;
-
-	if ((unsigned long)tv->tv_nsec >= NSEC_PER_SEC)
-		return -EINVAL;
-
-	write_seqlock_irq(&xtime_lock);
-	/*
-	 * This is revolting. We need to set "xtime" correctly. However, the
-	 * value in this location is the value at the most recent update of
-	 * wall time.  Discover what correction gettimeofday() would have
-	 * made, and then undo it!
-	 */
-	nsec -= cur_timer->get_offset() * NSEC_PER_USEC;
-	nsec -= (jiffies - wall_jiffies) * TICK_NSEC;
-
-	wtm_sec  = wall_to_monotonic.tv_sec + (xtime.tv_sec - sec);
-	wtm_nsec = wall_to_monotonic.tv_nsec + (xtime.tv_nsec - nsec);
-
-	set_normalized_timespec(&xtime, sec, nsec);
-	set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
-
-	ntp_clear();
-	write_sequnlock_irq(&xtime_lock);
-	clock_was_set();
-	return 0;
-}
-
-EXPORT_SYMBOL(do_settimeofday);
-
 static int set_rtc_mmss(unsigned long nowtime)
 {
 	int retval;
-
-	WARN_ON(irqs_disabled());
+	unsigned long flags;
 
 	/* gets recalled with irq locally disabled */
-	spin_lock_irq(&rtc_lock);
+	/* XXX - does irqsave resolve this? -johnstul */
+	spin_lock_irqsave(&rtc_lock, flags);
 	if (efi_enabled)
 		retval = efi_set_rtc_mmss(nowtime);
 	else
 		retval = mach_set_rtc_mmss(nowtime);
-	spin_unlock_irq(&rtc_lock);
+	spin_unlock_irqrestore(&rtc_lock, flags);
 
 	return retval;
 }
@@ -223,16 +141,6 @@ static int set_rtc_mmss(unsigned long no
 
 int timer_ack;
 
-/* monotonic_clock(): returns # of nanoseconds passed since time_init()
- *		Note: This function is required to return accurate
- *		time even in the absence of multiple timer ticks.
- */
-unsigned long long monotonic_clock(void)
-{
-	return cur_timer->monotonic_clock();
-}
-EXPORT_SYMBOL(monotonic_clock);
-
 #if defined(CONFIG_SMP) && defined(CONFIG_FRAME_POINTER)
 unsigned long profile_pc(struct pt_regs *regs)
 {
@@ -247,12 +155,21 @@ EXPORT_SYMBOL(profile_pc);
 #endif
 
 /*
- * timer_interrupt() needs to keep up the real-time clock,
- * as well as call the "do_timer()" routine every clocktick
+ * This is the same as the above, except we _also_ save the current
+ * Time Stamp Counter value at the time of the timer interrupt, so that
+ * we later on can estimate the time of day more exactly.
  */
-static inline void do_timer_interrupt(int irq, void *dev_id,
-					struct pt_regs *regs)
+irqreturn_t timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
 {
+	/*
+	 * Here we are in the timer irq handler. We just have irqs locally
+	 * disabled but we don't know if the timer_bh is running on the other
+	 * CPU. We need to avoid to SMP race with it. NOTE: we don' t need
+	 * the irq version of write_lock because as just said we have irq
+	 * locally disabled. -arca
+	 */
+	write_seqlock(&xtime_lock);
+
 #ifdef CONFIG_X86_IO_APIC
 	if (timer_ack) {
 		/*
@@ -285,27 +202,6 @@ static inline void do_timer_interrupt(in
 		irq = inb_p( 0x61 );	/* read the current state */
 		outb_p( irq|0x80, 0x61 );	/* reset the IRQ */
 	}
-}
-
-/*
- * This is the same as the above, except we _also_ save the current
- * Time Stamp Counter value at the time of the timer interrupt, so that
- * we later on can estimate the time of day more exactly.
- */
-irqreturn_t timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
-{
-	/*
-	 * Here we are in the timer irq handler. We just have irqs locally
-	 * disabled but we don't know if the timer_bh is running on the other
-	 * CPU. We need to avoid to SMP race with it. NOTE: we don' t need
-	 * the irq version of write_lock because as just said we have irq
-	 * locally disabled. -arca
-	 */
-	write_seqlock(&xtime_lock);
-
-	cur_timer->mark_offset();
- 
-	do_timer_interrupt(irq, NULL, regs);
 
 	write_sequnlock(&xtime_lock);
 	return IRQ_HANDLED;
@@ -329,55 +225,34 @@ unsigned long get_cmos_time(void)
 }
 EXPORT_SYMBOL(get_cmos_time);
 
-static void sync_cmos_clock(unsigned long dummy);
-
-static struct timer_list sync_cmos_timer =
-                                      TIMER_INITIALIZER(sync_cmos_clock, 0, 0);
-
-static void sync_cmos_clock(unsigned long dummy)
+/* arch specific timeofday hooks */
+nsec_t read_persistent_clock(void)
 {
-	struct timeval now, next;
-	int fail = 1;
+	return (nsec_t)get_cmos_time() * NSEC_PER_SEC;
+}
 
+void sync_persistent_clock(struct timespec ts)
+{
+	static unsigned long last_rtc_update;
 	/*
 	 * If we have an externally synchronized Linux clock, then update
 	 * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
 	 * called as close as possible to 500 ms before the new second starts.
-	 * This code is run on a timer.  If the clock is set, that timer
-	 * may not expire at the correct time.  Thus, we adjust...
 	 */
-	if (!ntp_synced())
-		/*
-		 * Not synced, exit, do not restart a timer (if one is
-		 * running, let it run out).
-		 */
+	if (ts.tv_sec <= last_rtc_update + 660)
 		return;
 
-	do_gettimeofday(&now);
-	if (now.tv_usec >= USEC_AFTER - ((unsigned) TICK_SIZE) / 2 &&
-	    now.tv_usec <= USEC_BEFORE + ((unsigned) TICK_SIZE) / 2)
-		fail = set_rtc_mmss(now.tv_sec);
-
-	next.tv_usec = USEC_AFTER - now.tv_usec;
-	if (next.tv_usec <= 0)
-		next.tv_usec += USEC_PER_SEC;
-
-	if (!fail)
-		next.tv_sec = 659;
-	else
-		next.tv_sec = 0;
-
-	if (next.tv_usec >= USEC_PER_SEC) {
-		next.tv_sec++;
-		next.tv_usec -= USEC_PER_SEC;
+	if((ts.tv_nsec / 1000) >= USEC_AFTER - ((unsigned) TICK_SIZE) / 2 &&
+		(ts.tv_nsec / 1000) <= USEC_BEFORE + ((unsigned) TICK_SIZE) / 2) {
+		/* horrible...FIXME */
+		if (set_rtc_mmss(ts.tv_sec) == 0)
+			last_rtc_update = ts.tv_sec;
+		else
+			last_rtc_update = ts.tv_sec - 600; /* do it again in 60 s */
 	}
-	mod_timer(&sync_cmos_timer, jiffies + timeval_to_jiffies(&next));
 }
 
-void notify_arch_cmos_timer(void)
-{
-	mod_timer(&sync_cmos_timer, jiffies + 1);
-}
+
 
 static long clock_cmos_diff, sleep_start;
 
@@ -394,7 +269,6 @@ static int timer_suspend(struct sys_devi
 
 static int timer_resume(struct sys_device *dev)
 {
-	unsigned long flags;
 	unsigned long sec;
 	unsigned long sleep_length;
 
@@ -404,10 +278,6 @@ static int timer_resume(struct sys_devic
 #endif
 	sec = get_cmos_time() + clock_cmos_diff;
 	sleep_length = (get_cmos_time() - sleep_start) * HZ;
-	write_seqlock_irqsave(&xtime_lock, flags);
-	xtime.tv_sec = sec;
-	xtime.tv_nsec = 0;
-	write_sequnlock_irqrestore(&xtime_lock, flags);
 	jiffies += sleep_length;
 	wall_jiffies += sleep_length;
 	return 0;
@@ -441,17 +311,10 @@ extern void (*late_time_init)(void);
 /* Duplicate of time_init() below, with hpet_enable part added */
 static void __init hpet_time_init(void)
 {
-	xtime.tv_sec = get_cmos_time();
-	xtime.tv_nsec = (INITIAL_JIFFIES % HZ) * (NSEC_PER_SEC / HZ);
-	set_normalized_timespec(&wall_to_monotonic,
-		-xtime.tv_sec, -xtime.tv_nsec);
-
 	if ((hpet_enable() >= 0) && hpet_use_timer) {
 		printk("Using HPET for base-timer\n");
 	}
 
-	cur_timer = select_timer();
-	printk(KERN_INFO "Using %s for high-res timesource\n",cur_timer->name);
 
 	time_init_hook();
 }
@@ -469,13 +332,5 @@ void __init time_init(void)
 		return;
 	}
 #endif
-	xtime.tv_sec = get_cmos_time();
-	xtime.tv_nsec = (INITIAL_JIFFIES % HZ) * (NSEC_PER_SEC / HZ);
-	set_normalized_timespec(&wall_to_monotonic,
-		-xtime.tv_sec, -xtime.tv_nsec);
-
-	cur_timer = select_timer();
-	printk(KERN_INFO "Using %s for high-res timesource\n",cur_timer->name);
-
 	time_init_hook();
 }
diff --git a/arch/i386/lib/delay.c b/arch/i386/lib/delay.c
--- a/arch/i386/lib/delay.c
+++ b/arch/i386/lib/delay.c
@@ -13,6 +13,7 @@
 #include <linux/config.h>
 #include <linux/sched.h>
 #include <linux/delay.h>
+#include <linux/timeofday.h>
 #include <linux/module.h>
 #include <asm/processor.h>
 #include <asm/delay.h>
@@ -22,11 +23,20 @@
 #include <asm/smp.h>
 #endif
 
-extern struct timer_opts* timer;
-
+/* XXX - For now just use a simple loop delay
+ *  This has cpufreq issues, but so did the old method.
+ */
 void __delay(unsigned long loops)
 {
-	cur_timer->delay(loops);
+	int d0;
+	__asm__ __volatile__(
+		"\tjmp 1f\n"
+		".align 16\n"
+		"1:\tjmp 2f\n"
+		".align 16\n"
+		"2:\tdecl %0\n\tjns 2b"
+		:"=&a" (d0)
+		:"0" (loops));
 }
 
 inline void __const_udelay(unsigned long xloops)
diff --git a/include/asm-i386/timeofday.h b/include/asm-i386/timeofday.h
new file mode 100644
--- /dev/null
+++ b/include/asm-i386/timeofday.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_I386_TIMEOFDAY_H
+#define _ASM_I386_TIMEOFDAY_H
+#include <asm-generic/timeofday.h>
+#endif
diff --git a/include/asm-i386/timer.h b/include/asm-i386/timer.h
--- a/include/asm-i386/timer.h
+++ b/include/asm-i386/timer.h
@@ -2,66 +2,11 @@
 #define _ASMi386_TIMER_H
 #include <linux/init.h>
 
-/**
- * struct timer_ops - used to define a timer source
- *
- * @name: name of the timer.
- * @init: Probes and initializes the timer. Takes clock= override 
- *        string as an argument. Returns 0 on success, anything else
- *        on failure.
- * @mark_offset: called by the timer interrupt.
- * @get_offset:  called by gettimeofday(). Returns the number of microseconds
- *               since the last timer interupt.
- * @monotonic_clock: returns the number of nanoseconds since the init of the
- *                   timer.
- * @delay: delays this many clock cycles.
- */
-struct timer_opts {
-	char* name;
-	void (*mark_offset)(void);
-	unsigned long (*get_offset)(void);
-	unsigned long long (*monotonic_clock)(void);
-	void (*delay)(unsigned long);
-	unsigned long (*read_timer)(void);
-};
-
-struct init_timer_opts {
-	int (*init)(char *override);
-	struct timer_opts *opts;
-};
-
 #define TICK_SIZE (tick_nsec / 1000)
-
-extern struct timer_opts* __init select_timer(void);
-extern void clock_fallback(void);
 void setup_pit_timer(void);
-
 /* Modifiers for buggy PIT handling */
-
 extern int pit_latch_buggy;
-
-extern struct timer_opts *cur_timer;
 extern int timer_ack;
-
-/* list of externed timers */
-extern struct timer_opts timer_none;
-extern struct timer_opts timer_pit;
-extern struct init_timer_opts timer_pit_init;
-extern struct init_timer_opts timer_tsc_init;
-#ifdef CONFIG_X86_CYCLONE_TIMER
-extern struct init_timer_opts timer_cyclone_init;
-#endif
-
-extern unsigned long calibrate_tsc(void);
-extern unsigned long read_timer_tsc(void);
-extern void init_cpu_khz(void);
 extern int recalibrate_cpu_khz(void);
-#ifdef CONFIG_HPET_TIMER
-extern struct init_timer_opts timer_hpet_init;
-extern unsigned long calibrate_tsc_hpet(unsigned long *tsc_hpet_quotient_ptr);
-#endif
 
-#ifdef CONFIG_X86_PM_TIMER
-extern struct init_timer_opts timer_pmtmr_init;
-#endif
 #endif



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH 8/9] generic timekeeping i386 arch specific changes, part 6
  2005-08-11  2:23               ` [PATCH 7/9] generic timekeeping i386 arch specific changes, part 5 john stultz
@ 2005-08-11  2:24                 ` john stultz
  2005-08-11  2:25                   ` [PATCH 9/9] generic timekeeping i386 specific timesources john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  2:24 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,
	The conversion of i386 to use the generic timeofday subsystem has been
split into 6 parts. This patch, the final of four, removes the old
timers/timer_opts infrastructure. 

It applies on top of my timeofday-arch-i386-part5 patch. This patch is
the last in the the timeofday-arch-i386 patchset, so you should be able
to build and boot a kernel after it has been applied. 

Note that this patch does not provide any i386 timesources, so you will
only have the jiffies timesource. To get full replacements for the code
being removed here, the following timeofday-timesources-i386 patch will
need to be applied.

thanks
-john

linux-2.6.13-rc6_timeofday-arch-i386-part6_B5.patch
============================================
diff --git a/arch/i386/kernel/timers/Makefile b/arch/i386/kernel/timers/Makefile
deleted file mode 100644
--- a/arch/i386/kernel/timers/Makefile
+++ /dev/null
@@ -1,9 +0,0 @@
-#
-# Makefile for x86 timers
-#
-
-obj-y := timer.o timer_none.o timer_tsc.o timer_pit.o common.o
-
-obj-$(CONFIG_X86_CYCLONE_TIMER)	+= timer_cyclone.o
-obj-$(CONFIG_HPET_TIMER)	+= timer_hpet.o
-obj-$(CONFIG_X86_PM_TIMER)	+= timer_pm.o
diff --git a/arch/i386/kernel/timers/common.c b/arch/i386/kernel/timers/common.c
deleted file mode 100644
--- a/arch/i386/kernel/timers/common.c
+++ /dev/null
@@ -1,88 +0,0 @@
-/*
- *	Common functions used across the timers go here
- */
-
-#include <linux/init.h>
-#include <linux/timex.h>
-#include <linux/errno.h>
-#include <linux/jiffies.h>
-#include <linux/module.h>
-
-#include <asm/io.h>
-#include <asm/timer.h>
-#include <asm/hpet.h>
-
-#include "mach_timer.h"
-
-#ifdef CONFIG_HPET_TIMER
-/* ------ Calibrate the TSC using HPET -------
- * Return 2^32 * (1 / (TSC clocks per usec)) for getting the CPU freq.
- * Second output is parameter 1 (when non NULL)
- * Set 2^32 * (1 / (tsc per HPET clk)) for delay_hpet().
- * calibrate_tsc() calibrates the processor TSC by comparing
- * it to the HPET timer of known frequency.
- * Too much 64-bit arithmetic here to do this cleanly in C
- */
-#define CALIBRATE_CNT_HPET 	(5 * hpet_tick)
-#define CALIBRATE_TIME_HPET 	(5 * KERNEL_TICK_USEC)
-
-unsigned long __devinit calibrate_tsc_hpet(unsigned long *tsc_hpet_quotient_ptr)
-{
-	unsigned long tsc_startlow, tsc_starthigh;
-	unsigned long tsc_endlow, tsc_endhigh;
-	unsigned long hpet_start, hpet_end;
-	unsigned long result, remain;
-
-	hpet_start = hpet_readl(HPET_COUNTER);
-	rdtsc(tsc_startlow, tsc_starthigh);
-	do {
-		hpet_end = hpet_readl(HPET_COUNTER);
-	} while ((hpet_end - hpet_start) < CALIBRATE_CNT_HPET);
-	rdtsc(tsc_endlow, tsc_endhigh);
-
-	/* 64-bit subtract - gcc just messes up with long longs */
-	__asm__("subl %2,%0\n\t"
-		"sbbl %3,%1"
-		:"=a" (tsc_endlow), "=d" (tsc_endhigh)
-		:"g" (tsc_startlow), "g" (tsc_starthigh),
-		 "0" (tsc_endlow), "1" (tsc_endhigh));
-
-	/* Error: ECPUTOOFAST */
-	if (tsc_endhigh)
-		goto bad_calibration;
-
-	/* Error: ECPUTOOSLOW */
-	if (tsc_endlow <= CALIBRATE_TIME_HPET)
-		goto bad_calibration;
-
-	ASM_DIV64_REG(result, remain, tsc_endlow, 0, CALIBRATE_TIME_HPET);
-	if (remain > (tsc_endlow >> 1))
-		result++; /* rounding the result */
-
-	if (tsc_hpet_quotient_ptr) {
-		unsigned long tsc_hpet_quotient;
-
-		ASM_DIV64_REG(tsc_hpet_quotient, remain, tsc_endlow, 0,
-			CALIBRATE_CNT_HPET);
-		if (remain > (tsc_endlow >> 1))
-			tsc_hpet_quotient++; /* rounding the result */
-		*tsc_hpet_quotient_ptr = tsc_hpet_quotient;
-	}
-
-	return result;
-bad_calibration:
-	/*
-	 * the CPU was so fast/slow that the quotient wouldn't fit in
-	 * 32 bits..
-	 */
-	return 0;
-}
-#endif
-
-
-unsigned long read_timer_tsc(void)
-{
-	unsigned long retval;
-	rdtscl(retval);
-	return retval;
-}
diff --git a/arch/i386/kernel/timers/timer.c b/arch/i386/kernel/timers/timer.c
deleted file mode 100644
--- a/arch/i386/kernel/timers/timer.c
+++ /dev/null
@@ -1,75 +0,0 @@
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/string.h>
-#include <asm/timer.h>
-
-#ifdef CONFIG_HPET_TIMER
-/*
- * HPET memory read is slower than tsc reads, but is more dependable as it
- * always runs at constant frequency and reduces complexity due to
- * cpufreq. So, we prefer HPET timer to tsc based one. Also, we cannot use
- * timer_pit when HPET is active. So, we default to timer_tsc.
- */
-#endif
-/* list of timers, ordered by preference, NULL terminated */
-static struct init_timer_opts* __initdata timers[] = {
-#ifdef CONFIG_X86_CYCLONE_TIMER
-	&timer_cyclone_init,
-#endif
-#ifdef CONFIG_HPET_TIMER
-	&timer_hpet_init,
-#endif
-#ifdef CONFIG_X86_PM_TIMER
-	&timer_pmtmr_init,
-#endif
-	&timer_tsc_init,
-	&timer_pit_init,
-	NULL,
-};
-
-static char clock_override[10] __initdata;
-
-static int __init clock_setup(char* str)
-{
-	if (str)
-		strlcpy(clock_override, str, sizeof(clock_override));
-	return 1;
-}
-__setup("clock=", clock_setup);
-
-
-/* The chosen timesource has been found to be bad.
- * Fall back to a known good timesource (the PIT)
- */
-void clock_fallback(void)
-{
-	cur_timer = &timer_pit;
-}
-
-/* iterates through the list of timers, returning the first 
- * one that initializes successfully.
- */
-struct timer_opts* __init select_timer(void)
-{
-	int i = 0;
-	
-	/* find most preferred working timer */
-	while (timers[i]) {
-		if (timers[i]->init)
-			if (timers[i]->init(clock_override) == 0)
-				return timers[i]->opts;
-		++i;
-	}
-		
-	panic("select_timer: Cannot find a suitable timer\n");
-	return NULL;
-}
-
-int read_current_timer(unsigned long *timer_val)
-{
-	if (cur_timer->read_timer) {
-		*timer_val = cur_timer->read_timer();
-		return 0;
-	}
-	return -1;
-}
diff --git a/arch/i386/kernel/timers/timer_cyclone.c b/arch/i386/kernel/timers/timer_cyclone.c
deleted file mode 100644
--- a/arch/i386/kernel/timers/timer_cyclone.c
+++ /dev/null
@@ -1,259 +0,0 @@
-/*	Cyclone-timer: 
- *		This code implements timer_ops for the cyclone counter found
- *		on IBM x440, x360, and other Summit based systems.
- *
- *	Copyright (C) 2002 IBM, John Stultz (johnstul@us.ibm.com)
- */
-
-
-#include <linux/spinlock.h>
-#include <linux/init.h>
-#include <linux/timex.h>
-#include <linux/errno.h>
-#include <linux/string.h>
-#include <linux/jiffies.h>
-
-#include <asm/timer.h>
-#include <asm/io.h>
-#include <asm/pgtable.h>
-#include <asm/fixmap.h>
-#include <asm/i8253.h>
-
-#include "io_ports.h"
-
-/* Number of usecs that the last interrupt was delayed */
-static int delay_at_last_interrupt;
-
-#define CYCLONE_CBAR_ADDR 0xFEB00CD0
-#define CYCLONE_PMCC_OFFSET 0x51A0
-#define CYCLONE_MPMC_OFFSET 0x51D0
-#define CYCLONE_MPCS_OFFSET 0x51A8
-#define CYCLONE_TIMER_FREQ 100000000
-#define CYCLONE_TIMER_MASK (((u64)1<<40)-1) /* 40 bit mask */
-int use_cyclone = 0;
-
-static u32* volatile cyclone_timer;	/* Cyclone MPMC0 register */
-static u32 last_cyclone_low;
-static u32 last_cyclone_high;
-static unsigned long long monotonic_base;
-static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
-
-/* helper macro to atomically read both cyclone counter registers */
-#define read_cyclone_counter(low,high) \
-	do{ \
-		high = cyclone_timer[1]; low = cyclone_timer[0]; \
-	} while (high != cyclone_timer[1]);
-
-
-static void mark_offset_cyclone(void)
-{
-	unsigned long lost, delay;
-	unsigned long delta = last_cyclone_low;
-	int count;
-	unsigned long long this_offset, last_offset;
-
-	write_seqlock(&monotonic_lock);
-	last_offset = ((unsigned long long)last_cyclone_high<<32)|last_cyclone_low;
-	
-	spin_lock(&i8253_lock);
-	read_cyclone_counter(last_cyclone_low,last_cyclone_high);
-
-	/* read values for delay_at_last_interrupt */
-	outb_p(0x00, 0x43);     /* latch the count ASAP */
-
-	count = inb_p(0x40);    /* read the latched count */
-	count |= inb(0x40) << 8;
-
-	/*
-	 * VIA686a test code... reset the latch if count > max + 1
-	 * from timer_pit.c - cjb
-	 */
-	if (count > LATCH) {
-		outb_p(0x34, PIT_MODE);
-		outb_p(LATCH & 0xff, PIT_CH0);
-		outb(LATCH >> 8, PIT_CH0);
-		count = LATCH - 1;
-	}
-	spin_unlock(&i8253_lock);
-
-	/* lost tick compensation */
-	delta = last_cyclone_low - delta;	
-	delta /= (CYCLONE_TIMER_FREQ/1000000);
-	delta += delay_at_last_interrupt;
-	lost = delta/(1000000/HZ);
-	delay = delta%(1000000/HZ);
-	if (lost >= 2)
-		jiffies_64 += lost-1;
-	
-	/* update the monotonic base value */
-	this_offset = ((unsigned long long)last_cyclone_high<<32)|last_cyclone_low;
-	monotonic_base += (this_offset - last_offset) & CYCLONE_TIMER_MASK;
-	write_sequnlock(&monotonic_lock);
-
-	/* calculate delay_at_last_interrupt */
-	count = ((LATCH-1) - count) * TICK_SIZE;
-	delay_at_last_interrupt = (count + LATCH/2) / LATCH;
-
-
-	/* catch corner case where tick rollover occured 
-	 * between cyclone and pit reads (as noted when 
-	 * usec delta is > 90% # of usecs/tick)
-	 */
-	if (lost && abs(delay - delay_at_last_interrupt) > (900000/HZ))
-		jiffies_64++;
-}
-
-static unsigned long get_offset_cyclone(void)
-{
-	u32 offset;
-
-	if(!cyclone_timer)
-		return delay_at_last_interrupt;
-
-	/* Read the cyclone timer */
-	offset = cyclone_timer[0];
-
-	/* .. relative to previous jiffy */
-	offset = offset - last_cyclone_low;
-
-	/* convert cyclone ticks to microseconds */	
-	/* XXX slow, can we speed this up? */
-	offset = offset/(CYCLONE_TIMER_FREQ/1000000);
-
-	/* our adjusted time offset in microseconds */
-	return delay_at_last_interrupt + offset;
-}
-
-static unsigned long long monotonic_clock_cyclone(void)
-{
-	u32 now_low, now_high;
-	unsigned long long last_offset, this_offset, base;
-	unsigned long long ret;
-	unsigned seq;
-
-	/* atomically read monotonic base & last_offset */
-	do {
-		seq = read_seqbegin(&monotonic_lock);
-		last_offset = ((unsigned long long)last_cyclone_high<<32)|last_cyclone_low;
-		base = monotonic_base;
-	} while (read_seqretry(&monotonic_lock, seq));
-
-
-	/* Read the cyclone counter */
-	read_cyclone_counter(now_low,now_high);
-	this_offset = ((unsigned long long)now_high<<32)|now_low;
-
-	/* convert to nanoseconds */
-	ret = base + ((this_offset - last_offset)&CYCLONE_TIMER_MASK);
-	return ret * (1000000000 / CYCLONE_TIMER_FREQ);
-}
-
-static int __init init_cyclone(char* override)
-{
-	u32* reg;	
-	u32 base;		/* saved cyclone base address */
-	u32 pageaddr;	/* page that contains cyclone_timer register */
-	u32 offset;		/* offset from pageaddr to cyclone_timer register */
-	int i;
-	
-	/* check clock override */
-	if (override[0] && strncmp(override,"cyclone",7))
-			return -ENODEV;
-
-	/*make sure we're on a summit box*/
-	if(!use_cyclone) return -ENODEV; 
-	
-	printk(KERN_INFO "Summit chipset: Starting Cyclone Counter.\n");
-
-	/* find base address */
-	pageaddr = (CYCLONE_CBAR_ADDR)&PAGE_MASK;
-	offset = (CYCLONE_CBAR_ADDR)&(~PAGE_MASK);
-	set_fixmap_nocache(FIX_CYCLONE_TIMER, pageaddr);
-	reg = (u32*)(fix_to_virt(FIX_CYCLONE_TIMER) + offset);
-	if(!reg){
-		printk(KERN_ERR "Summit chipset: Could not find valid CBAR register.\n");
-		return -ENODEV;
-	}
-	base = *reg;	
-	if(!base){
-		printk(KERN_ERR "Summit chipset: Could not find valid CBAR value.\n");
-		return -ENODEV;
-	}
-	
-	/* setup PMCC */
-	pageaddr = (base + CYCLONE_PMCC_OFFSET)&PAGE_MASK;
-	offset = (base + CYCLONE_PMCC_OFFSET)&(~PAGE_MASK);
-	set_fixmap_nocache(FIX_CYCLONE_TIMER, pageaddr);
-	reg = (u32*)(fix_to_virt(FIX_CYCLONE_TIMER) + offset);
-	if(!reg){
-		printk(KERN_ERR "Summit chipset: Could not find valid PMCC register.\n");
-		return -ENODEV;
-	}
-	reg[0] = 0x00000001;
-
-	/* setup MPCS */
-	pageaddr = (base + CYCLONE_MPCS_OFFSET)&PAGE_MASK;
-	offset = (base + CYCLONE_MPCS_OFFSET)&(~PAGE_MASK);
-	set_fixmap_nocache(FIX_CYCLONE_TIMER, pageaddr);
-	reg = (u32*)(fix_to_virt(FIX_CYCLONE_TIMER) + offset);
-	if(!reg){
-		printk(KERN_ERR "Summit chipset: Could not find valid MPCS register.\n");
-		return -ENODEV;
-	}
-	reg[0] = 0x00000001;
-
-	/* map in cyclone_timer */
-	pageaddr = (base + CYCLONE_MPMC_OFFSET)&PAGE_MASK;
-	offset = (base + CYCLONE_MPMC_OFFSET)&(~PAGE_MASK);
-	set_fixmap_nocache(FIX_CYCLONE_TIMER, pageaddr);
-	cyclone_timer = (u32*)(fix_to_virt(FIX_CYCLONE_TIMER) + offset);
-	if(!cyclone_timer){
-		printk(KERN_ERR "Summit chipset: Could not find valid MPMC register.\n");
-		return -ENODEV;
-	}
-
-	/*quick test to make sure its ticking*/
-	for(i=0; i<3; i++){
-		u32 old = cyclone_timer[0];
-		int stall = 100;
-		while(stall--) barrier();
-		if(cyclone_timer[0] == old){
-			printk(KERN_ERR "Summit chipset: Counter not counting! DISABLED\n");
-			cyclone_timer = 0;
-			return -ENODEV;
-		}
-	}
-
-	init_cpu_khz();
-
-	/* Everything looks good! */
-	return 0;
-}
-
-
-static void delay_cyclone(unsigned long loops)
-{
-	unsigned long bclock, now;
-	if(!cyclone_timer)
-		return;
-	bclock = cyclone_timer[0];
-	do {
-		rep_nop();
-		now = cyclone_timer[0];
-	} while ((now-bclock) < loops);
-}
-/************************************************************/
-
-/* cyclone timer_opts struct */
-static struct timer_opts timer_cyclone = {
-	.name = "cyclone",
-	.mark_offset = mark_offset_cyclone, 
-	.get_offset = get_offset_cyclone,
-	.monotonic_clock =	monotonic_clock_cyclone,
-	.delay = delay_cyclone,
-};
-
-struct init_timer_opts __initdata timer_cyclone_init = {
-	.init = init_cyclone,
-	.opts = &timer_cyclone,
-};
diff --git a/arch/i386/kernel/timers/timer_hpet.c b/arch/i386/kernel/timers/timer_hpet.c
deleted file mode 100644
--- a/arch/i386/kernel/timers/timer_hpet.c
+++ /dev/null
@@ -1,195 +0,0 @@
-/*
- * This code largely moved from arch/i386/kernel/time.c.
- * See comments there for proper credits.
- */
-
-#include <linux/spinlock.h>
-#include <linux/init.h>
-#include <linux/timex.h>
-#include <linux/errno.h>
-#include <linux/string.h>
-#include <linux/jiffies.h>
-
-#include <asm/timer.h>
-#include <asm/io.h>
-#include <asm/processor.h>
-
-#include "io_ports.h"
-#include "mach_timer.h"
-#include <asm/hpet.h>
-
-static unsigned long __read_mostly hpet_usec_quotient;	/* convert hpet clks to usec */
-static unsigned long tsc_hpet_quotient;		/* convert tsc to hpet clks */
-static unsigned long hpet_last; 	/* hpet counter value at last tick*/
-static unsigned long last_tsc_low;	/* lsb 32 bits of Time Stamp Counter */
-static unsigned long last_tsc_high; 	/* msb 32 bits of Time Stamp Counter */
-static unsigned long long monotonic_base;
-static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
-
-/* convert from cycles(64bits) => nanoseconds (64bits)
- *  basic equation:
- *		ns = cycles / (freq / ns_per_sec)
- *		ns = cycles * (ns_per_sec / freq)
- *		ns = cycles * (10^9 / (cpu_mhz * 10^6))
- *		ns = cycles * (10^3 / cpu_mhz)
- *
- *	Then we use scaling math (suggested by george@mvista.com) to get:
- *		ns = cycles * (10^3 * SC / cpu_mhz) / SC
- *		ns = cycles * cyc2ns_scale / SC
- *
- *	And since SC is a constant power of two, we can convert the div
- *  into a shift.
- *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
- */
-static unsigned long cyc2ns_scale;
-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
-
-static inline void set_cyc2ns_scale(unsigned long cpu_mhz)
-{
-	cyc2ns_scale = (1000 << CYC2NS_SCALE_FACTOR)/cpu_mhz;
-}
-
-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
-{
-	return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
-}
-
-static unsigned long long monotonic_clock_hpet(void)
-{
-	unsigned long long last_offset, this_offset, base;
-	unsigned seq;
-
-	/* atomically read monotonic base & last_offset */
-	do {
-		seq = read_seqbegin(&monotonic_lock);
-		last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
-		base = monotonic_base;
-	} while (read_seqretry(&monotonic_lock, seq));
-
-	/* Read the Time Stamp Counter */
-	rdtscll(this_offset);
-
-	/* return the value in ns */
-	return base + cycles_2_ns(this_offset - last_offset);
-}
-
-static unsigned long get_offset_hpet(void)
-{
-	register unsigned long eax, edx;
-
-	eax = hpet_readl(HPET_COUNTER);
-	eax -= hpet_last;	/* hpet delta */
-	eax = min(hpet_tick, eax);
-	/*
-         * Time offset = (hpet delta) * ( usecs per HPET clock )
-	 *             = (hpet delta) * ( usecs per tick / HPET clocks per tick)
-	 *             = (hpet delta) * ( hpet_usec_quotient ) / (2^32)
-	 *
-	 * Where,
-	 * hpet_usec_quotient = (2^32 * usecs per tick)/HPET clocks per tick
-	 *
-	 * Using a mull instead of a divl saves some cycles in critical path.
-         */
-	ASM_MUL64_REG(eax, edx, hpet_usec_quotient, eax);
-
-	/* our adjusted time offset in microseconds */
-	return edx;
-}
-
-static void mark_offset_hpet(void)
-{
-	unsigned long long this_offset, last_offset;
-	unsigned long offset;
-
-	write_seqlock(&monotonic_lock);
-	last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
-	rdtsc(last_tsc_low, last_tsc_high);
-
-	if (hpet_use_timer)
-		offset = hpet_readl(HPET_T0_CMP) - hpet_tick;
-	else
-		offset = hpet_readl(HPET_COUNTER);
-	if (unlikely(((offset - hpet_last) >= (2*hpet_tick)) && (hpet_last != 0))) {
-		int lost_ticks = ((offset - hpet_last) / hpet_tick) - 1;
-		jiffies_64 += lost_ticks;
-	}
-	hpet_last = offset;
-
-	/* update the monotonic base value */
-	this_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
-	monotonic_base += cycles_2_ns(this_offset - last_offset);
-	write_sequnlock(&monotonic_lock);
-}
-
-static void delay_hpet(unsigned long loops)
-{
-	unsigned long hpet_start, hpet_end;
-	unsigned long eax;
-
-	/* loops is the number of cpu cycles. Convert it to hpet clocks */
-	ASM_MUL64_REG(eax, loops, tsc_hpet_quotient, loops);
-
-	hpet_start = hpet_readl(HPET_COUNTER);
-	do {
-		rep_nop();
-		hpet_end = hpet_readl(HPET_COUNTER);
-	} while ((hpet_end - hpet_start) < (loops));
-}
-
-static int __init init_hpet(char* override)
-{
-	unsigned long result, remain;
-
-	/* check clock override */
-	if (override[0] && strncmp(override,"hpet",4))
-		return -ENODEV;
-
-	if (!is_hpet_enabled())
-		return -ENODEV;
-
-	printk("Using HPET for gettimeofday\n");
-	if (cpu_has_tsc) {
-		unsigned long tsc_quotient = calibrate_tsc_hpet(&tsc_hpet_quotient);
-		if (tsc_quotient) {
-			/* report CPU clock rate in Hz.
-			 * The formula is (10^6 * 2^32) / (2^32 * 1 / (clocks/us)) =
-			 * clock/second. Our precision is about 100 ppm.
-			 */
-			{	unsigned long eax=0, edx=1000;
-				ASM_DIV64_REG(cpu_khz, edx, tsc_quotient,
-						eax, edx);
-				printk("Detected %u.%03u MHz processor.\n",
-					cpu_khz / 1000, cpu_khz % 1000);
-			}
-			set_cyc2ns_scale(cpu_khz/1000);
-		}
-	}
-
-	/*
-	 * Math to calculate hpet to usec multiplier
-	 * Look for the comments at get_offset_hpet()
-	 */
-	ASM_DIV64_REG(result, remain, hpet_tick, 0, KERNEL_TICK_USEC);
-	if (remain > (hpet_tick >> 1))
-		result++; /* rounding the result */
-	hpet_usec_quotient = result;
-
-	return 0;
-}
-
-/************************************************************/
-
-/* tsc timer_opts struct */
-static struct timer_opts timer_hpet __read_mostly = {
-	.name = 		"hpet",
-	.mark_offset =		mark_offset_hpet,
-	.get_offset =		get_offset_hpet,
-	.monotonic_clock =	monotonic_clock_hpet,
-	.delay = 		delay_hpet,
-	.read_timer = 		read_timer_tsc,
-};
-
-struct init_timer_opts __initdata timer_hpet_init = {
-	.init =	init_hpet,
-	.opts = &timer_hpet,
-};
diff --git a/arch/i386/kernel/timers/timer_none.c b/arch/i386/kernel/timers/timer_none.c
deleted file mode 100644
--- a/arch/i386/kernel/timers/timer_none.c
+++ /dev/null
@@ -1,39 +0,0 @@
-#include <linux/init.h>
-#include <asm/timer.h>
-
-static void mark_offset_none(void)
-{
-	/* nothing needed */
-}
-
-static unsigned long get_offset_none(void)
-{
-	return 0;
-}
-
-static unsigned long long monotonic_clock_none(void)
-{
-	return 0;
-}
-
-static void delay_none(unsigned long loops)
-{
-	int d0;
-	__asm__ __volatile__(
-		"\tjmp 1f\n"
-		".align 16\n"
-		"1:\tjmp 2f\n"
-		".align 16\n"
-		"2:\tdecl %0\n\tjns 2b"
-		:"=&a" (d0)
-		:"0" (loops));
-}
-
-/* none timer_opts struct */
-struct timer_opts timer_none = {
-	.name = 	"none",
-	.mark_offset =	mark_offset_none, 
-	.get_offset =	get_offset_none,
-	.monotonic_clock =	monotonic_clock_none,
-	.delay = delay_none,
-};
diff --git a/arch/i386/kernel/timers/timer_pit.c b/arch/i386/kernel/timers/timer_pit.c
deleted file mode 100644
--- a/arch/i386/kernel/timers/timer_pit.c
+++ /dev/null
@@ -1,164 +0,0 @@
-/*
- * This code largely moved from arch/i386/kernel/time.c.
- * See comments there for proper credits.
- */
-
-#include <linux/spinlock.h>
-#include <linux/module.h>
-#include <linux/device.h>
-#include <linux/irq.h>
-#include <linux/sysdev.h>
-#include <linux/timex.h>
-#include <asm/delay.h>
-#include <asm/mpspec.h>
-#include <asm/timer.h>
-#include <asm/smp.h>
-#include <asm/io.h>
-#include <asm/arch_hooks.h>
-#include <asm/i8253.h>
-
-#include "do_timer.h"
-#include "io_ports.h"
-
-static int count_p; /* counter in get_offset_pit() */
-
-static int __init init_pit(char* override)
-{
- 	/* check clock override */
- 	if (override[0] && strncmp(override,"pit",3))
- 		printk(KERN_ERR "Warning: clock= override failed. Defaulting to PIT\n");
- 
-	count_p = LATCH;
-	return 0;
-}
-
-static void mark_offset_pit(void)
-{
-	/* nothing needed */
-}
-
-static unsigned long long monotonic_clock_pit(void)
-{
-	return 0;
-}
-
-static void delay_pit(unsigned long loops)
-{
-	int d0;
-	__asm__ __volatile__(
-		"\tjmp 1f\n"
-		".align 16\n"
-		"1:\tjmp 2f\n"
-		".align 16\n"
-		"2:\tdecl %0\n\tjns 2b"
-		:"=&a" (d0)
-		:"0" (loops));
-}
-
-
-/* This function must be called with xtime_lock held.
- * It was inspired by Steve McCanne's microtime-i386 for BSD.  -- jrs
- * 
- * However, the pc-audio speaker driver changes the divisor so that
- * it gets interrupted rather more often - it loads 64 into the
- * counter rather than 11932! This has an adverse impact on
- * do_gettimeoffset() -- it stops working! What is also not
- * good is that the interval that our timer function gets called
- * is no longer 10.0002 ms, but 9.9767 ms. To get around this
- * would require using a different timing source. Maybe someone
- * could use the RTC - I know that this can interrupt at frequencies
- * ranging from 8192Hz to 2Hz. If I had the energy, I'd somehow fix
- * it so that at startup, the timer code in sched.c would select
- * using either the RTC or the 8253 timer. The decision would be
- * based on whether there was any other device around that needed
- * to trample on the 8253. I'd set up the RTC to interrupt at 1024 Hz,
- * and then do some jiggery to have a version of do_timer that 
- * advanced the clock by 1/1024 s. Every time that reached over 1/100
- * of a second, then do all the old code. If the time was kept correct
- * then do_gettimeoffset could just return 0 - there is no low order
- * divider that can be accessed.
- *
- * Ideally, you would be able to use the RTC for the speaker driver,
- * but it appears that the speaker driver really needs interrupt more
- * often than every 120 us or so.
- *
- * Anyway, this needs more thought....		pjsg (1993-08-28)
- * 
- * If you are really that interested, you should be reading
- * comp.protocols.time.ntp!
- */
-
-static unsigned long get_offset_pit(void)
-{
-	int count;
-	unsigned long flags;
-	static unsigned long jiffies_p = 0;
-
-	/*
-	 * cache volatile jiffies temporarily; we have xtime_lock. 
-	 */
-	unsigned long jiffies_t;
-
-	spin_lock_irqsave(&i8253_lock, flags);
-	/* timer count may underflow right here */
-	outb_p(0x00, PIT_MODE);	/* latch the count ASAP */
-
-	count = inb_p(PIT_CH0);	/* read the latched count */
-
-	/*
-	 * We do this guaranteed double memory access instead of a _p 
-	 * postfix in the previous port access. Wheee, hackady hack
-	 */
- 	jiffies_t = jiffies;
-
-	count |= inb_p(PIT_CH0) << 8;
-	
-        /* VIA686a test code... reset the latch if count > max + 1 */
-        if (count > LATCH) {
-                outb_p(0x34, PIT_MODE);
-                outb_p(LATCH & 0xff, PIT_CH0);
-                outb(LATCH >> 8, PIT_CH0);
-                count = LATCH - 1;
-        }
-	
-	/*
-	 * avoiding timer inconsistencies (they are rare, but they happen)...
-	 * there are two kinds of problems that must be avoided here:
-	 *  1. the timer counter underflows
-	 *  2. hardware problem with the timer, not giving us continuous time,
-	 *     the counter does small "jumps" upwards on some Pentium systems,
-	 *     (see c't 95/10 page 335 for Neptun bug.)
-	 */
-
-	if( jiffies_t == jiffies_p ) {
-		if( count > count_p ) {
-			/* the nutcase */
-			count = do_timer_overflow(count);
-		}
-	} else
-		jiffies_p = jiffies_t;
-
-	count_p = count;
-
-	spin_unlock_irqrestore(&i8253_lock, flags);
-
-	count = ((LATCH-1) - count) * TICK_SIZE;
-	count = (count + LATCH/2) / LATCH;
-
-	return count;
-}
-
-
-/* tsc timer_opts struct */
-struct timer_opts timer_pit = {
-	.name = "pit",
-	.mark_offset = mark_offset_pit, 
-	.get_offset = get_offset_pit,
-	.monotonic_clock = monotonic_clock_pit,
-	.delay = delay_pit,
-};
-
-struct init_timer_opts __initdata timer_pit_init = {
-	.init = init_pit, 
-	.opts = &timer_pit,
-};
diff --git a/arch/i386/kernel/timers/timer_pm.c b/arch/i386/kernel/timers/timer_pm.c
deleted file mode 100644
--- a/arch/i386/kernel/timers/timer_pm.c
+++ /dev/null
@@ -1,259 +0,0 @@
-/*
- * (C) Dominik Brodowski <linux@brodo.de> 2003
- *
- * Driver to use the Power Management Timer (PMTMR) available in some
- * southbridges as primary timing source for the Linux kernel.
- *
- * Based on parts of linux/drivers/acpi/hardware/hwtimer.c, timer_pit.c,
- * timer_hpet.c, and on Arjan van de Ven's implementation for 2.4.
- *
- * This file is licensed under the GPL v2.
- */
-
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/device.h>
-#include <linux/init.h>
-#include <asm/types.h>
-#include <asm/timer.h>
-#include <asm/smp.h>
-#include <asm/io.h>
-#include <asm/arch_hooks.h>
-
-#include <linux/timex.h>
-#include "mach_timer.h"
-
-/* Number of PMTMR ticks expected during calibration run */
-#define PMTMR_TICKS_PER_SEC 3579545
-#define PMTMR_EXPECTED_RATE \
-  ((CALIBRATE_LATCH * (PMTMR_TICKS_PER_SEC >> 10)) / (CLOCK_TICK_RATE>>10))
-
-
-/* The I/O port the PMTMR resides at.
- * The location is detected during setup_arch(),
- * in arch/i386/acpi/boot.c */
-u32 pmtmr_ioport = 0;
-
-
-/* value of the Power timer at last timer interrupt */
-static u32 offset_tick;
-static u32 offset_delay;
-
-static unsigned long long monotonic_base;
-static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
-
-#define ACPI_PM_MASK 0xFFFFFF /* limit it to 24 bits */
-
-/*helper function to safely read acpi pm timesource*/
-static inline u32 read_pmtmr(void)
-{
-	u32 v1=0,v2=0,v3=0;
-	/* It has been reported that because of various broken
-	 * chipsets (ICH4, PIIX4 and PIIX4E) where the ACPI PM time
-	 * source is not latched, so you must read it multiple
-	 * times to insure a safe value is read.
-	 */
-	do {
-		v1 = inl(pmtmr_ioport);
-		v2 = inl(pmtmr_ioport);
-		v3 = inl(pmtmr_ioport);
-	} while ((v1 > v2 && v1 < v3) || (v2 > v3 && v2 < v1)
-			|| (v3 > v1 && v3 < v2));
-
-	/* mask the output to 24 bits */
-	return v2 & ACPI_PM_MASK;
-}
-
-
-/*
- * Some boards have the PMTMR running way too fast. We check
- * the PMTMR rate against PIT channel 2 to catch these cases.
- */
-static int verify_pmtmr_rate(void)
-{
-	u32 value1, value2;
-	unsigned long count, delta;
-
-	mach_prepare_counter();
-	value1 = read_pmtmr();
-	mach_countup(&count);
-	value2 = read_pmtmr();
-	delta = (value2 - value1) & ACPI_PM_MASK;
-
-	/* Check that the PMTMR delta is within 5% of what we expect */
-	if (delta < (PMTMR_EXPECTED_RATE * 19) / 20 ||
-	    delta > (PMTMR_EXPECTED_RATE * 21) / 20) {
-		printk(KERN_INFO "PM-Timer running at invalid rate: %lu%% of normal - aborting.\n", 100UL * delta / PMTMR_EXPECTED_RATE);
-		return -1;
-	}
-
-	return 0;
-}
-
-
-static int init_pmtmr(char* override)
-{
-	u32 value1, value2;
-	unsigned int i;
-
- 	if (override[0] && strncmp(override,"pmtmr",5))
-		return -ENODEV;
-
-	if (!pmtmr_ioport)
-		return -ENODEV;
-
-	/* we use the TSC for delay_pmtmr, so make sure it exists */
-	if (!cpu_has_tsc)
-		return -ENODEV;
-
-	/* "verify" this timing source */
-	value1 = read_pmtmr();
-	for (i = 0; i < 10000; i++) {
-		value2 = read_pmtmr();
-		if (value2 == value1)
-			continue;
-		if (value2 > value1)
-			goto pm_good;
-		if ((value2 < value1) && ((value2) < 0xFFF))
-			goto pm_good;
-		printk(KERN_INFO "PM-Timer had inconsistent results: 0x%#x, 0x%#x - aborting.\n", value1, value2);
-		return -EINVAL;
-	}
-	printk(KERN_INFO "PM-Timer had no reasonable result: 0x%#x - aborting.\n", value1);
-	return -ENODEV;
-
-pm_good:
-	if (verify_pmtmr_rate() != 0)
-		return -ENODEV;
-
-	init_cpu_khz();
-	return 0;
-}
-
-static inline u32 cyc2us(u32 cycles)
-{
-	/* The Power Management Timer ticks at 3.579545 ticks per microsecond.
-	 * 1 / PM_TIMER_FREQUENCY == 0.27936511 =~ 286/1024 [error: 0.024%]
-	 *
-	 * Even with HZ = 100, delta is at maximum 35796 ticks, so it can
-	 * easily be multiplied with 286 (=0x11E) without having to fear
-	 * u32 overflows.
-	 */
-	cycles *= 286;
-	return (cycles >> 10);
-}
-
-/*
- * this gets called during each timer interrupt
- *   - Called while holding the writer xtime_lock
- */
-static void mark_offset_pmtmr(void)
-{
-	u32 lost, delta, last_offset;
-	static int first_run = 1;
-	last_offset = offset_tick;
-
-	write_seqlock(&monotonic_lock);
-
-	offset_tick = read_pmtmr();
-
-	/* calculate tick interval */
-	delta = (offset_tick - last_offset) & ACPI_PM_MASK;
-
-	/* convert to usecs */
-	delta = cyc2us(delta);
-
-	/* update the monotonic base value */
-	monotonic_base += delta * NSEC_PER_USEC;
-	write_sequnlock(&monotonic_lock);
-
-	/* convert to ticks */
-	delta += offset_delay;
-	lost = delta / (USEC_PER_SEC / HZ);
-	offset_delay = delta % (USEC_PER_SEC / HZ);
-
-
-	/* compensate for lost ticks */
-	if (lost >= 2)
-		jiffies_64 += lost - 1;
-
-	/* don't calculate delay for first run,
-	   or if we've got less then a tick */
-	if (first_run || (lost < 1)) {
-		first_run = 0;
-		offset_delay = 0;
-	}
-}
-
-
-static unsigned long long monotonic_clock_pmtmr(void)
-{
-	u32 last_offset, this_offset;
-	unsigned long long base, ret;
-	unsigned seq;
-
-
-	/* atomically read monotonic base & last_offset */
-	do {
-		seq = read_seqbegin(&monotonic_lock);
-		last_offset = offset_tick;
-		base = monotonic_base;
-	} while (read_seqretry(&monotonic_lock, seq));
-
-	/* Read the pmtmr */
-	this_offset =  read_pmtmr();
-
-	/* convert to nanoseconds */
-	ret = (this_offset - last_offset) & ACPI_PM_MASK;
-	ret = base + (cyc2us(ret) * NSEC_PER_USEC);
-	return ret;
-}
-
-static void delay_pmtmr(unsigned long loops)
-{
-	unsigned long bclock, now;
-
-	rdtscl(bclock);
-	do
-	{
-		rep_nop();
-		rdtscl(now);
-	} while ((now-bclock) < loops);
-}
-
-
-/*
- * get the offset (in microseconds) from the last call to mark_offset()
- *	- Called holding a reader xtime_lock
- */
-static unsigned long get_offset_pmtmr(void)
-{
-	u32 now, offset, delta = 0;
-
-	offset = offset_tick;
-	now = read_pmtmr();
-	delta = (now - offset)&ACPI_PM_MASK;
-
-	return (unsigned long) offset_delay + cyc2us(delta);
-}
-
-
-/* acpi timer_opts struct */
-static struct timer_opts timer_pmtmr = {
-	.name			= "pmtmr",
-	.mark_offset		= mark_offset_pmtmr,
-	.get_offset		= get_offset_pmtmr,
-	.monotonic_clock 	= monotonic_clock_pmtmr,
-	.delay 			= delay_pmtmr,
-	.read_timer 		= read_timer_tsc,
-};
-
-struct init_timer_opts __initdata timer_pmtmr_init = {
-	.init = init_pmtmr,
-	.opts = &timer_pmtmr,
-};
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Dominik Brodowski <linux@brodo.de>");
-MODULE_DESCRIPTION("Power Management Timer (PMTMR) as primary timing source for x86");
diff --git a/arch/i386/kernel/timers/timer_tsc.c b/arch/i386/kernel/timers/timer_tsc.c
deleted file mode 100644
--- a/arch/i386/kernel/timers/timer_tsc.c
+++ /dev/null
@@ -1,387 +0,0 @@
-/*
- * This code largely moved from arch/i386/kernel/time.c.
- * See comments there for proper credits.
- *
- * 2004-06-25    Jesper Juhl
- *      moved mark_offset_tsc below cpufreq_delayed_get to avoid gcc 3.4
- *      failing to inline.
- */
-
-#include <linux/spinlock.h>
-#include <linux/init.h>
-#include <linux/timex.h>
-#include <linux/errno.h>
-#include <linux/cpufreq.h>
-#include <linux/string.h>
-#include <linux/jiffies.h>
-
-#include <asm/timer.h>
-#include <asm/io.h>
-/* processor.h for distable_tsc flag */
-#include <asm/processor.h>
-
-#include "io_ports.h"
-#include "mach_timer.h"
-
-#include <asm/hpet.h>
-#include <asm/i8253.h>
-
-#ifdef CONFIG_HPET_TIMER
-static unsigned long hpet_usec_quotient;
-static unsigned long hpet_last;
-static struct timer_opts timer_tsc;
-#endif
-
-static int use_tsc;
-/* Number of usecs that the last interrupt was delayed */
-static int delay_at_last_interrupt;
-
-static unsigned long last_tsc_low; /* lsb 32 bits of Time Stamp Counter */
-static unsigned long last_tsc_high; /* msb 32 bits of Time Stamp Counter */
-static unsigned long long monotonic_base;
-static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
-
-static int count2; /* counter for mark_offset_tsc() */
-
-/* Cached *multiplier* to convert TSC counts to microseconds.
- * (see the equation below).
- * Equal to 2^32 * (1 / (clocks per usec) ).
- * Initialized in time_init.
- */
-static unsigned long fast_gettimeoffset_quotient;
-
-static unsigned long get_offset_tsc(void)
-{
-	register unsigned long eax, edx;
-
-	/* Read the Time Stamp Counter */
-
-	rdtsc(eax,edx);
-
-	/* .. relative to previous jiffy (32 bits is enough) */
-	eax -= last_tsc_low;	/* tsc_low delta */
-
-	/*
-         * Time offset = (tsc_low delta) * fast_gettimeoffset_quotient
-         *             = (tsc_low delta) * (usecs_per_clock)
-         *             = (tsc_low delta) * (usecs_per_jiffy / clocks_per_jiffy)
-	 *
-	 * Using a mull instead of a divl saves up to 31 clock cycles
-	 * in the critical path.
-         */
-
-	__asm__("mull %2"
-		:"=a" (eax), "=d" (edx)
-		:"rm" (fast_gettimeoffset_quotient),
-		 "0" (eax));
-
-	/* our adjusted time offset in microseconds */
-	return delay_at_last_interrupt + edx;
-}
-
-static unsigned long long monotonic_clock_tsc(void)
-{
-	unsigned long long last_offset, this_offset, base;
-	unsigned seq;
-	
-	/* atomically read monotonic base & last_offset */
-	do {
-		seq = read_seqbegin(&monotonic_lock);
-		last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
-		base = monotonic_base;
-	} while (read_seqretry(&monotonic_lock, seq));
-
-	/* Read the Time Stamp Counter */
-	rdtscll(this_offset);
-
-	/* return the value in ns */
-	return base + cycles_2_ns(this_offset - last_offset);
-}
-
-
-static void delay_tsc(unsigned long loops)
-{
-	unsigned long bclock, now;
-	
-	rdtscl(bclock);
-	do
-	{
-		rep_nop();
-		rdtscl(now);
-	} while ((now-bclock) < loops);
-}
-
-#ifdef CONFIG_HPET_TIMER
-static void mark_offset_tsc_hpet(void)
-{
-	unsigned long long this_offset, last_offset;
- 	unsigned long offset, temp, hpet_current;
-
-	write_seqlock(&monotonic_lock);
-	last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
-	/*
-	 * It is important that these two operations happen almost at
-	 * the same time. We do the RDTSC stuff first, since it's
-	 * faster. To avoid any inconsistencies, we need interrupts
-	 * disabled locally.
-	 */
-	/*
-	 * Interrupts are just disabled locally since the timer irq
-	 * has the SA_INTERRUPT flag set. -arca
-	 */
-	/* read Pentium cycle counter */
-
-	hpet_current = hpet_readl(HPET_COUNTER);
-	rdtsc(last_tsc_low, last_tsc_high);
-
-	/* lost tick compensation */
-	offset = hpet_readl(HPET_T0_CMP) - hpet_tick;
-	if (unlikely(((offset - hpet_last) > hpet_tick) && (hpet_last != 0))) {
-		int lost_ticks = (offset - hpet_last) / hpet_tick;
-		jiffies_64 += lost_ticks;
-	}
-	hpet_last = hpet_current;
-
-	/* update the monotonic base value */
-	this_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
-	monotonic_base += cycles_2_ns(this_offset - last_offset);
-	write_sequnlock(&monotonic_lock);
-
-	/* calculate delay_at_last_interrupt */
-	/*
-	 * Time offset = (hpet delta) * ( usecs per HPET clock )
-	 *             = (hpet delta) * ( usecs per tick / HPET clocks per tick)
-	 *             = (hpet delta) * ( hpet_usec_quotient ) / (2^32)
-	 * Where,
-	 * hpet_usec_quotient = (2^32 * usecs per tick)/HPET clocks per tick
-	 */
-	delay_at_last_interrupt = hpet_current - offset;
-	ASM_MUL64_REG(temp, delay_at_last_interrupt,
-			hpet_usec_quotient, delay_at_last_interrupt);
-}
-#endif
-
-
-
-static void mark_offset_tsc(void)
-{
-	unsigned long lost,delay;
-	unsigned long delta = last_tsc_low;
-	int count;
-	int countmp;
-	static int count1 = 0;
-	unsigned long long this_offset, last_offset;
-	static int lost_count = 0;
-
-	write_seqlock(&monotonic_lock);
-	last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
-	/*
-	 * It is important that these two operations happen almost at
-	 * the same time. We do the RDTSC stuff first, since it's
-	 * faster. To avoid any inconsistencies, we need interrupts
-	 * disabled locally.
-	 */
-
-	/*
-	 * Interrupts are just disabled locally since the timer irq
-	 * has the SA_INTERRUPT flag set. -arca
-	 */
-
-	/* read Pentium cycle counter */
-
-	rdtsc(last_tsc_low, last_tsc_high);
-
-	spin_lock(&i8253_lock);
-	outb_p(0x00, PIT_MODE);     /* latch the count ASAP */
-
-	count = inb_p(PIT_CH0);    /* read the latched count */
-	count |= inb(PIT_CH0) << 8;
-
-	/*
-	 * VIA686a test code... reset the latch if count > max + 1
-	 * from timer_pit.c - cjb
-	 */
-	if (count > LATCH) {
-		outb_p(0x34, PIT_MODE);
-		outb_p(LATCH & 0xff, PIT_CH0);
-		outb(LATCH >> 8, PIT_CH0);
-		count = LATCH - 1;
-	}
-
-	spin_unlock(&i8253_lock);
-
-	if (pit_latch_buggy) {
-		/* get center value of last 3 time lutch */
-		if ((count2 >= count && count >= count1)
-		    || (count1 >= count && count >= count2)) {
-			count2 = count1; count1 = count;
-		} else if ((count1 >= count2 && count2 >= count)
-			   || (count >= count2 && count2 >= count1)) {
-			countmp = count;count = count2;
-			count2 = count1;count1 = countmp;
-		} else {
-			count2 = count1; count1 = count; count = count1;
-		}
-	}
-
-	/* lost tick compensation */
-	delta = last_tsc_low - delta;
-	{
-		register unsigned long eax, edx;
-		eax = delta;
-		__asm__("mull %2"
-		:"=a" (eax), "=d" (edx)
-		:"rm" (fast_gettimeoffset_quotient),
-		 "0" (eax));
-		delta = edx;
-	}
-	delta += delay_at_last_interrupt;
-	lost = delta/(1000000/HZ);
-	delay = delta%(1000000/HZ);
-	if (lost >= 2) {
-		jiffies_64 += lost-1;
-
-		/* sanity check to ensure we're not always losing ticks */
-		if (lost_count++ > 100) {
-			printk(KERN_WARNING "Losing too many ticks!\n");
-			printk(KERN_WARNING "TSC cannot be used as a timesource.  \n");
-			printk(KERN_WARNING "Possible reasons for this are:\n");
-			printk(KERN_WARNING "  You're running with Speedstep,\n");
-			printk(KERN_WARNING "  You don't have DMA enabled for your hard disk (see hdparm),\n");
-			printk(KERN_WARNING "  Incorrect TSC synchronization on an SMP system (see dmesg).\n");
-			printk(KERN_WARNING "Falling back to a sane timesource now.\n");
-
-			clock_fallback();
-		}
-		/* ... but give the TSC a fair chance */
-		if (lost_count > 25)
-			cpufreq_delayed_get();
-	} else
-		lost_count = 0;
-	/* update the monotonic base value */
-	this_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
-	monotonic_base += cycles_2_ns(this_offset - last_offset);
-	write_sequnlock(&monotonic_lock);
-
-	/* calculate delay_at_last_interrupt */
-	count = ((LATCH-1) - count) * TICK_SIZE;
-	delay_at_last_interrupt = (count + LATCH/2) / LATCH;
-
-	/* catch corner case where tick rollover occured
-	 * between tsc and pit reads (as noted when
-	 * usec delta is > 90% # of usecs/tick)
-	 */
-	if (lost && abs(delay - delay_at_last_interrupt) > (900000/HZ))
-		jiffies_64++;
-}
-
-static int __init init_tsc(char* override)
-{
-
-	/* check clock override */
-	if (override[0] && strncmp(override,"tsc",3)) {
-#ifdef CONFIG_HPET_TIMER
-		if (is_hpet_enabled()) {
-			printk(KERN_ERR "Warning: clock= override failed. Defaulting to tsc\n");
-		} else
-#endif
-		{
-			return -ENODEV;
-		}
-	}
-
-	/*
-	 * If we have APM enabled or the CPU clock speed is variable
-	 * (CPU stops clock on HLT or slows clock to save power)
-	 * then the TSC timestamps may diverge by up to 1 jiffy from
-	 * 'real time' but nothing will break.
-	 * The most frequent case is that the CPU is "woken" from a halt
-	 * state by the timer interrupt itself, so we get 0 error. In the
-	 * rare cases where a driver would "wake" the CPU and request a
-	 * timestamp, the maximum error is < 1 jiffy. But timestamps are
-	 * still perfectly ordered.
-	 * Note that the TSC counter will be reset if APM suspends
-	 * to disk; this won't break the kernel, though, 'cuz we're
-	 * smart.  See arch/i386/kernel/apm.c.
-	 */
- 	/*
- 	 *	Firstly we have to do a CPU check for chips with
- 	 * 	a potentially buggy TSC. At this point we haven't run
- 	 *	the ident/bugs checks so we must run this hook as it
- 	 *	may turn off the TSC flag.
- 	 *
- 	 *	NOTE: this doesn't yet handle SMP 486 machines where only
- 	 *	some CPU's have a TSC. Thats never worked and nobody has
- 	 *	moaned if you have the only one in the world - you fix it!
- 	 */
-
-	count2 = LATCH; /* initialize counter for mark_offset_tsc() */
-
-	if (cpu_has_tsc) {
-		unsigned long tsc_quotient;
-#ifdef CONFIG_HPET_TIMER
-		if (is_hpet_enabled() && hpet_use_timer) {
-			unsigned long result, remain;
-			printk("Using TSC for gettimeofday\n");
-			tsc_quotient = calibrate_tsc_hpet(NULL);
-			timer_tsc.mark_offset = &mark_offset_tsc_hpet;
-			/*
-			 * Math to calculate hpet to usec multiplier
-			 * Look for the comments at get_offset_tsc_hpet()
-			 */
-			ASM_DIV64_REG(result, remain, hpet_tick,
-					0, KERNEL_TICK_USEC);
-			if (remain > (hpet_tick >> 1))
-				result++; /* rounding the result */
-
-			hpet_usec_quotient = result;
-		} else
-#endif
-		{
-			tsc_quotient = calibrate_tsc();
-		}
-
-		if (tsc_quotient) {
-			fast_gettimeoffset_quotient = tsc_quotient;
-			use_tsc = 1;
-			/*
-			 *	We could be more selective here I suspect
-			 *	and just enable this for the next intel chips ?
-			 */
-			/* report CPU clock rate in Hz.
-			 * The formula is (10^6 * 2^32) / (2^32 * 1 / (clocks/us)) =
-			 * clock/second. Our precision is about 100 ppm.
-			 */
-			{	unsigned long eax=0, edx=1000;
-				__asm__("divl %2"
-		       		:"=a" (cpu_khz), "=d" (edx)
-        	       		:"r" (tsc_quotient),
-	                	"0" (eax), "1" (edx));
-				printk("Detected %u.%03u MHz processor.\n",
-					cpu_khz / 1000, cpu_khz % 1000);
-			}
-			set_cyc2ns_scale(cpu_khz/1000);
-			return 0;
-		}
-	}
-	return -ENODEV;
-}
-
-
-
-/************************************************************/
-
-/* tsc timer_opts struct */
-static struct timer_opts timer_tsc = {
-	.name = "tsc",
-	.mark_offset = mark_offset_tsc, 
-	.get_offset = get_offset_tsc,
-	.monotonic_clock = monotonic_clock_tsc,
-	.delay = delay_tsc,
-	.read_timer = read_timer_tsc,
-};
-
-struct init_timer_opts __initdata timer_tsc_init = {
-	.init = init_tsc,
-	.opts = &timer_tsc,
-};



^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH 9/9] generic timekeeping i386 specific timesources
  2005-08-11  2:24                 ` [PATCH 8/9] generic timekeeping i386 arch specific changes, part 6 john stultz
@ 2005-08-11  2:25                   ` john stultz
  0 siblings, 0 replies; 71+ messages in thread
From: john stultz @ 2005-08-11  2:25 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

All,
	This patch implements the time sources shared between i386 and x86-64
(acpi_pm, cyclone, hpet, pit, tsc and tsc-interp). The patch should
apply on top of the timeofday-arch-i386-part4 patch
	
The patch should be fairly straight forward, only adding the new
timesources.

thanks
-john

linux-2.6.13-rc6_timeofday-timesources-i386_B5.patch
============================================
diff --git a/drivers/timesource/Makefile b/drivers/timesource/Makefile
--- a/drivers/timesource/Makefile
+++ b/drivers/timesource/Makefile
@@ -1 +1,8 @@
 obj-y += jiffies.o
+
+obj-$(CONFIG_X86) += tsc.o
+obj-$(CONFIG_X86) += i386_pit.o
+obj-$(CONFIG_X86) += tsc-interp.o
+obj-$(CONFIG_X86_CYCLONE_TIMER) += cyclone.o
+obj-$(CONFIG_X86_PM_TIMER) += acpi_pm.o
+obj-$(CONFIG_HPET_TIMER) += hpet.o
diff --git a/drivers/timesource/acpi_pm.c b/drivers/timesource/acpi_pm.c
new file mode 100644
--- /dev/null
+++ b/drivers/timesource/acpi_pm.c
@@ -0,0 +1,152 @@
+/*
+ * linux/drivers/timesource/acpi_pm.c
+ *
+ * This file contains the ACPI PM based time source.
+ *
+ * This code was largely moved from the i386 timer_pm.c file
+ * which was (C) Dominik Brodowski <linux@brodo.de> 2003
+ * and contained the following comments:
+ *
+ * Driver to use the Power Management Timer (PMTMR) available in some
+ * southbridges as primary timing source for the Linux kernel.
+ *
+ * Based on parts of linux/drivers/acpi/hardware/hwtimer.c, timer_pit.c,
+ * timer_hpet.c, and on Arjan van de Ven's implementation for 2.4.
+ *
+ * This file is licensed under the GPL v2.
+ */
+
+
+#include <linux/timesource.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <asm/io.h>
+
+/* Number of PMTMR ticks expected during calibration run */
+#define PMTMR_TICKS_PER_SEC 3579545
+
+#if (CONFIG_X86 && (!CONFIG_X86_64))
+#include "mach_timer.h"
+#define PMTMR_EXPECTED_RATE ((PMTMR_TICKS_PER_SEC*CALIBRATE_TIME_MSEC)/1000)
+#endif
+
+/* The I/O port the PMTMR resides at.
+ * The location is detected during setup_arch(),
+ * in arch/i386/acpi/boot.c */
+extern u32 acpi_pmtmr_ioport;
+extern int acpi_pmtmr_buggy;
+
+#define ACPI_PM_MASK 0xFFFFFF /* limit it to 24 bits */
+
+
+static inline u32 read_pmtmr(void)
+{
+	/* mask the output to 24 bits */
+	return inl(acpi_pmtmr_ioport) & ACPI_PM_MASK;
+}
+
+static cycle_t acpi_pm_read_verified(void)
+{
+	u32 v1=0,v2=0,v3=0;
+	/* It has been reported that because of various broken
+	 * chipsets (ICH4, PIIX4 and PIIX4E) where the ACPI PM time
+	 * source is not latched, so you must read it multiple
+	 * times to insure a safe value is read.
+	 */
+	do {
+		v1 = read_pmtmr();
+		v2 = read_pmtmr();
+		v3 = read_pmtmr();
+	} while ((v1 > v2 && v1 < v3) || (v2 > v3 && v2 < v1)
+			|| (v3 > v1 && v3 < v2));
+
+	return (cycle_t)v2;
+}
+
+
+static cycle_t acpi_pm_read(void)
+{
+	return (cycle_t)read_pmtmr();
+}
+
+struct timesource_t timesource_acpi_pm = {
+	.name = "acpi_pm",
+	.priority = 200,
+	.type = TIMESOURCE_FUNCTION,
+	.read_fnct = acpi_pm_read,
+	.mask = (cycle_t)ACPI_PM_MASK,
+	.mult = 0, /*to be caluclated*/
+	.shift = 22,
+};
+
+#if (CONFIG_X86 && (!CONFIG_X86_64))
+/*
+ * Some boards have the PMTMR running way too fast. We check
+ * the PMTMR rate against PIT channel 2 to catch these cases.
+ */
+static int __init verify_pmtmr_rate(void)
+{
+	u32 value1, value2;
+	unsigned long count, delta;
+
+	mach_prepare_counter();
+	value1 = read_pmtmr();
+	mach_countup(&count);
+	value2 = read_pmtmr();
+	delta = (value2 - value1) & ACPI_PM_MASK;
+
+	/* Check that the PMTMR delta is within 5% of what we expect */
+	if (delta < (PMTMR_EXPECTED_RATE * 19) / 20 ||
+	    delta > (PMTMR_EXPECTED_RATE * 21) / 20) {
+		printk(KERN_INFO "PM-Timer running at invalid rate: %lu%% of normal - aborting.\n", 100UL * delta / PMTMR_EXPECTED_RATE);
+		return -1;
+	}
+
+	return 0;
+}
+#else
+#define verify_pmtmr_rate() (0)
+#endif
+
+static int __init init_acpi_pm_timesource(void)
+{
+	u32 value1, value2;
+	unsigned int i;
+
+	if (!acpi_pmtmr_ioport)
+		return -ENODEV;
+
+	timesource_acpi_pm.mult = timesource_hz2mult(PMTMR_TICKS_PER_SEC,
+									timesource_acpi_pm.shift);
+
+	/* "verify" this timing source */
+	value1 = read_pmtmr();
+	for (i = 0; i < 10000; i++) {
+		value2 = read_pmtmr();
+		if (value2 == value1)
+			continue;
+		if (value2 > value1)
+			goto pm_good;
+		if ((value2 < value1) && ((value2) < 0xFFF))
+			goto pm_good;
+		printk(KERN_INFO "PM-Timer had inconsistent results: 0x%#x, 0x%#x - aborting.\n", value1, value2);
+		return -EINVAL;
+	}
+	printk(KERN_INFO "PM-Timer had no reasonable result: 0x%#x - aborting.\n", value1);
+	return -ENODEV;
+
+pm_good:
+	if (verify_pmtmr_rate() != 0)
+		return -ENODEV;
+
+	/* check to see if pmtmr is known buggy */
+	if (acpi_pmtmr_buggy) {
+		timesource_acpi_pm.read_fnct = acpi_pm_read_verified;
+		timesource_acpi_pm.priority = 110;
+	}
+
+	register_timesource(&timesource_acpi_pm);
+	return 0;
+}
+
+module_init(init_acpi_pm_timesource);
diff --git a/drivers/timesource/cyclone.c b/drivers/timesource/cyclone.c
new file mode 100644
--- /dev/null
+++ b/drivers/timesource/cyclone.c
@@ -0,0 +1,138 @@
+#include <linux/timesource.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/timex.h>
+#include <linux/init.h>
+
+#include <asm/io.h>
+#include <asm/pgtable.h>
+#include "mach_timer.h"
+
+#define CYCLONE_CBAR_ADDR 0xFEB00CD0		/* base address ptr*/
+#define CYCLONE_PMCC_OFFSET 0x51A0		/* offset to control register */
+#define CYCLONE_MPCS_OFFSET 0x51A8		/* offset to select register */
+#define CYCLONE_MPMC_OFFSET 0x51D0		/* offset to count register */
+#define CYCLONE_TIMER_FREQ 100000000
+#define CYCLONE_TIMER_MASK (0xFFFFFFFF) /* 32 bit mask */
+
+int use_cyclone = 0;
+
+struct timesource_t timesource_cyclone = {
+	.name = "cyclone",
+	.priority = 250,
+	.type = TIMESOURCE_MMIO_32,
+	.mmio_ptr = NULL, /* to be set */
+	.mask = (cycle_t)CYCLONE_TIMER_MASK,
+	.mult = 10,
+	.shift = 0,
+};
+
+static unsigned long __init calibrate_cyclone(void)
+{
+	u64 delta64;
+	unsigned long start, end;
+	unsigned long i, count;
+	unsigned long cyclone_freq_khz;
+
+	/* repeat 3 times to make sure the cache is warm */
+	for(i=0; i < 3; i++) {
+		mach_prepare_counter();
+		start = readl(timesource_cyclone.mmio_ptr);
+		mach_countup(&count);
+		end = readl(timesource_cyclone.mmio_ptr);
+	}
+
+	delta64 = end - start;
+
+	delta64 += CALIBRATE_TIME_MSEC/2; /* round for do_div */
+	do_div(delta64,CALIBRATE_TIME_MSEC);
+
+	cyclone_freq_khz = (unsigned long)delta64;
+
+	printk("calculated cyclone_freq: %lu khz\n", cyclone_freq_khz);
+	return cyclone_freq_khz;
+}
+
+static int __init init_cyclone_timesource(void)
+{
+	unsigned long base;	/* saved value from CBAR */
+	unsigned long offset;
+	u32 __iomem* reg;
+	u32 __iomem* volatile cyclone_timer;	/* Cyclone MPMC0 register */
+	unsigned long khz;
+	int i;
+
+	/*make sure we're on a summit box*/
+	if (!use_cyclone) return -ENODEV;
+
+	printk(KERN_INFO "Summit chipset: Starting Cyclone Counter.\n");
+
+	/* find base address */
+	offset = CYCLONE_CBAR_ADDR;
+	reg = ioremap_nocache(offset, sizeof(reg));
+	if(!reg){
+		printk(KERN_ERR "Summit chipset: Could not find valid CBAR register.\n");
+		return -ENODEV;
+	}
+	/* even on 64bit systems, this is only 32bits */
+	base = readl(reg);
+	if(!base){
+		printk(KERN_ERR "Summit chipset: Could not find valid CBAR value.\n");
+		return -ENODEV;
+	}
+	iounmap(reg);
+
+	/* setup PMCC */
+	offset = base + CYCLONE_PMCC_OFFSET;
+	reg = ioremap_nocache(offset, sizeof(reg));
+	if(!reg){
+		printk(KERN_ERR "Summit chipset: Could not find valid PMCC register.\n");
+		return -ENODEV;
+	}
+	writel(0x00000001,reg);
+	iounmap(reg);
+
+	/* setup MPCS */
+	offset = base + CYCLONE_MPCS_OFFSET;
+	reg = ioremap_nocache(offset, sizeof(reg));
+	if(!reg){
+		printk(KERN_ERR "Summit chipset: Could not find valid MPCS register.\n");
+		return -ENODEV;
+	}
+	writel(0x00000001,reg);
+	iounmap(reg);
+
+	/* map in cyclone_timer */
+	offset = base + CYCLONE_MPMC_OFFSET;
+	cyclone_timer = ioremap_nocache(offset, sizeof(u64));
+	if(!cyclone_timer){
+		printk(KERN_ERR "Summit chipset: Could not find valid MPMC register.\n");
+		return -ENODEV;
+	}
+
+	/*quick test to make sure its ticking*/
+	for(i=0; i<3; i++){
+		u32 old = readl(cyclone_timer);
+		int stall = 100;
+		while(stall--) barrier();
+		if(readl(cyclone_timer) == old){
+			printk(KERN_ERR "Summit chipset: Counter not counting! DISABLED\n");
+			iounmap(cyclone_timer);
+			cyclone_timer = NULL;
+			return -ENODEV;
+		}
+	}
+	timesource_cyclone.mmio_ptr = cyclone_timer;
+
+	/* sort out mult/shift values */
+	khz = calibrate_cyclone();
+	timesource_cyclone.shift = 22;
+	timesource_cyclone.mult = timesource_khz2mult(khz,
+									timesource_cyclone.shift);
+
+	register_timesource(&timesource_cyclone);
+
+	return 0;
+}
+
+module_init(init_cyclone_timesource);
diff --git a/drivers/timesource/hpet.c b/drivers/timesource/hpet.c
new file mode 100644
--- /dev/null
+++ b/drivers/timesource/hpet.c
@@ -0,0 +1,59 @@
+#include <linux/timesource.h>
+#include <linux/hpet.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <asm/io.h>
+#include <asm/hpet.h>
+
+#define HPET_MASK (0xFFFFFFFF)
+#define HPET_SHIFT 22
+
+/* FSEC = 10^-15 NSEC = 10^-9 */
+#define FSEC_PER_NSEC 1000000
+
+struct timesource_t timesource_hpet = {
+	.name = "hpet",
+	.priority = 250,
+	.type = TIMESOURCE_MMIO_32,
+	.mmio_ptr = NULL,
+	.mask = (cycle_t)HPET_MASK,
+	.mult = 0, /* set below */
+	.shift = HPET_SHIFT,
+};
+
+static int __init init_hpet_timesource(void)
+{
+	unsigned long hpet_period;
+	void __iomem* hpet_base;
+	u64 tmp;
+
+	if (!hpet_address)
+		return -ENODEV;
+
+	/* calculate the hpet address */
+	hpet_base =
+		(void __iomem*)ioremap_nocache(hpet_address, HPET_MMAP_SIZE);
+	timesource_hpet.mmio_ptr = hpet_base + HPET_COUNTER;
+
+	/* calculate the frequency */
+	hpet_period = readl(hpet_base + HPET_PERIOD);
+
+
+	/* hpet period is in femto seconds per cycle
+	 * so we need to convert this to ns/cyc units
+	 * aproximated by mult/2^shift
+	 *
+	 *  fsec/cyc * 1nsec/1000000fsec = nsec/cyc = mult/2^shift
+	 *  fsec/cyc * 1ns/1000000fsec * 2^shift = mult
+	 *  fsec/cyc * 2^shift * 1nsec/1000000fsec = mult
+	 *  (fsec/cyc << shift)/1000000 = mult
+	 *  (hpet_period << shift)/FSEC_PER_NSEC = mult
+	 */
+	tmp = (u64)hpet_period << HPET_SHIFT;
+	do_div(tmp, FSEC_PER_NSEC);
+	timesource_hpet.mult = (u32)tmp;
+
+	register_timesource(&timesource_hpet);
+	return 0;
+}
+module_init(init_hpet_timesource);
diff --git a/drivers/timesource/i386_pit.c b/drivers/timesource/i386_pit.c
new file mode 100644
--- /dev/null
+++ b/drivers/timesource/i386_pit.c
@@ -0,0 +1,64 @@
+#include <linux/timesource.h>
+#include <linux/jiffies.h>
+#include <linux/init.h>
+/* pit timesource does not build on x86-64 */
+#ifndef CONFIG_X86_64
+#include <asm/io.h>
+#include "io_ports.h"
+
+extern spinlock_t i8253_lock;
+
+/* Since the PIT overflows every tick, its not very useful
+ * to just read by itself. So use jiffies to emulate a free
+ * running counter.
+ */
+
+static cycle_t pit_read(void)
+{
+	unsigned long flags, seq;
+	int count;
+	u64 jifs;
+
+	do {
+		seq = read_seqbegin(&xtime_lock);
+
+		spin_lock_irqsave(&i8253_lock, flags);
+
+		outb_p(0x00, PIT_MODE);	/* latch the count ASAP */
+		count = inb_p(PIT_CH0);	/* read the latched count */
+		count |= inb_p(PIT_CH0) << 8;
+
+		/* VIA686a test code... reset the latch if count > max + 1 */
+		if (count > LATCH) {
+			outb_p(0x34, PIT_MODE);
+			outb_p(LATCH & 0xff, PIT_CH0);
+			outb(LATCH >> 8, PIT_CH0);
+			count = LATCH - 1;
+		}
+		spin_unlock_irqrestore(&i8253_lock, flags);
+		jifs = get_jiffies_64() - INITIAL_JIFFIES;
+	} while (read_seqretry(&xtime_lock, seq));
+
+	count = (LATCH-1) - count;
+
+	return (cycle_t)(jifs * LATCH) + count;
+}
+
+static struct timesource_t timesource_pit = {
+	.name = "pit",
+	.priority = 110,
+	.type = TIMESOURCE_FUNCTION,
+	.read_fnct = pit_read,
+	.mask = (cycle_t)-1,
+	.mult = 0,
+	.shift = 20,
+};
+
+static int __init init_pit_timesource(void)
+{
+	timesource_pit.mult = timesource_hz2mult(CLOCK_TICK_RATE, 20);
+	register_timesource(&timesource_pit);
+	return 0;
+}
+module_init(init_pit_timesource);
+#endif
diff --git a/drivers/timesource/tsc-interp.c b/drivers/timesource/tsc-interp.c
new file mode 100644
--- /dev/null
+++ b/drivers/timesource/tsc-interp.c
@@ -0,0 +1,112 @@
+/* TSC-Jiffies Interpolation timesource
+	Example interpolation timesource.
+TODO:
+	o per-cpu TSC offsets
+*/
+#include <linux/timesource.h>
+#include <linux/timer.h>
+#include <linux/timex.h>
+#include <linux/init.h>
+#include <linux/jiffies.h>
+#include <linux/threads.h>
+#include <linux/smp.h>
+
+static unsigned long current_tsc_khz = 0;
+
+static seqlock_t tsc_interp_lock = SEQLOCK_UNLOCKED;
+static unsigned long tsc_then;
+static unsigned long jiffies_then;
+struct timer_list tsc_interp_timer;
+
+static unsigned long mult, shift;
+
+#define NSEC_PER_JIFFY ((((unsigned long long)NSEC_PER_SEC)<<8)/ACTHZ)
+#define SHIFT_VAL 22
+
+static cycle_t read_tsc_interp(void);
+static void tsc_interp_update_callback(void);
+
+static struct timesource_t timesource_tsc_interp = {
+	.name = "tsc-interp",
+	.priority = 150,
+	.type = TIMESOURCE_FUNCTION,
+	.read_fnct = read_tsc_interp,
+	.mask = (cycle_t)((1ULL<<32)-1),
+	.mult = 1<<SHIFT_VAL,
+	.shift = SHIFT_VAL,
+	.update_callback = tsc_interp_update_callback,
+};
+
+static void tsc_interp_sync(unsigned long unused)
+{
+	cycle_t tsc_now;
+	unsigned long jiffies_now;
+
+	do {
+		jiffies_now = jiffies;
+		rdtscll(tsc_now);
+	} while (jiffies_now != jiffies);
+
+	write_seqlock(&tsc_interp_lock);
+	jiffies_then = jiffies_now;
+	tsc_then = tsc_now;
+	write_sequnlock(&tsc_interp_lock);
+
+	mod_timer(&tsc_interp_timer, jiffies+1);
+}
+
+
+static cycle_t read_tsc_interp(void)
+{
+	cycle_t ret;
+	cycle_t now, then;
+	unsigned long jiffs_now, jiffs_then;
+	unsigned long seq;
+
+	do {
+		seq = read_seqbegin(&tsc_interp_lock);
+
+		jiffs_now = jiffies;
+		jiffs_then = jiffies_then;
+		then = tsc_then;
+
+	} while (read_seqretry(&tsc_interp_lock, seq));
+
+	rdtscll(now);
+	ret = (cycle_t)jiffs_then * NSEC_PER_JIFFY;
+	if (jiffs_then == jiffs_now)
+		ret += min((cycle_t)NSEC_PER_JIFFY,(cycle_t)((now - then)*mult)>> shift);
+	else
+		ret += (cycle_t)(jiffs_now - jiffs_then)*NSEC_PER_JIFFY;
+
+	return ret;
+}
+
+static void tsc_interp_update_callback(void)
+{
+	/* only update if tsc_khz has changed */
+	if (current_tsc_khz != tsc_khz){
+		current_tsc_khz = tsc_khz;
+		mult = timesource_khz2mult(current_tsc_khz, shift);
+	}
+}
+
+
+static int __init init_tsc_interp_timesource(void)
+{
+	/* TSC initialization is done in arch/i386/kernel/tsc.c */
+	if (cpu_has_tsc && tsc_khz) {
+		current_tsc_khz = tsc_khz;
+		shift = SHIFT_VAL;
+		mult = timesource_khz2mult(current_tsc_khz, shift);
+		/* setup periodic soft-timer */
+		init_timer(&tsc_interp_timer);
+		tsc_interp_timer.function = tsc_interp_sync;
+		tsc_interp_timer.expires = jiffies;
+		add_timer(&tsc_interp_timer);
+
+		register_timesource(&timesource_tsc_interp);
+	}
+	return 0;
+}
+module_init(init_tsc_interp_timesource);
diff --git a/drivers/timesource/tsc.c b/drivers/timesource/tsc.c
new file mode 100644
--- /dev/null
+++ b/drivers/timesource/tsc.c
@@ -0,0 +1,83 @@
+/* TODO:
+ *		o better calibration
+ */
+
+#include <linux/timesource.h>
+#include <linux/timex.h>
+#include <linux/init.h>
+
+static unsigned long current_tsc_khz = 0;
+
+static cycle_t read_safe_tsc(void);
+static void tsc_update_callback(void);
+
+static struct timesource_t timesource_safe_tsc = {
+	.name = "c3tsc",
+	.priority = 300,
+	.type = TIMESOURCE_FUNCTION,
+	.read_fnct = read_safe_tsc,
+	.mask = (cycle_t)-1,
+	.mult = 0, /* to be set */
+	.shift = 22,
+	.update_callback = tsc_update_callback,
+};
+
+static struct timesource_t timesource_raw_tsc = {
+	.name = "tsc",
+	.priority = 300,
+	.type = TIMESOURCE_CYCLES,
+	.mask = (cycle_t)-1,
+	.mult = 0, /* to be set */
+	.shift = 22,
+	.update_callback = tsc_update_callback,
+};
+
+static struct timesource_t timesource_tsc;
+
+static cycle_t read_safe_tsc(void)
+{
+	cycle_t ret;
+	rdtscll(ret);
+	return ret + tsc_read_c3_time();
+}
+
+static void tsc_update_callback(void)
+{
+	/* check to see if we should switch to the safe timesource */
+	if (tsc_read_c3_time() &&
+		strncmp(timesource_tsc.name, "c3tsc", 5)) {
+		printk("Falling back to C3 safe TSC\n");
+		timesource_safe_tsc.mult = timesource_tsc.mult;
+		timesource_safe_tsc.priority = timesource_tsc.priority;
+		timesource_tsc = timesource_safe_tsc;
+	}
+
+	if (check_tsc_unstable()) {
+		timesource_tsc.priority = 50;
+		reselect_timesource();
+	}
+	/* only update if tsc_khz has changed */
+	if (current_tsc_khz != tsc_khz){
+		current_tsc_khz = tsc_khz;
+		timesource_tsc.mult = timesource_khz2mult(current_tsc_khz,
+							timesource_tsc.shift);
+	}
+}
+
+static int __init init_tsc_timesource(void)
+{
+
+	timesource_tsc = timesource_raw_tsc;
+
+	/* TSC initialization is done in arch/i386/kernel/tsc.c */
+	if (cpu_has_tsc && tsc_khz) {
+		current_tsc_khz = tsc_khz;
+		timesource_tsc.mult = timesource_khz2mult(current_tsc_khz,
+							timesource_tsc.shift);
+		register_timesource(&timesource_tsc);
+	}
+	return 0;
+}
+
+module_init(init_tsc_timesource);
+



^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-11  2:13 ` [RFC - 0/9] Generic timekeeping subsystem (v. B5) john stultz
  2005-08-11  2:14   ` [PATCH 1/9] Timesource management code john stultz
@ 2005-08-11  2:32   ` Lee Revell
  2005-08-11  2:39     ` john stultz
  2005-08-11  6:17     ` Ulrich Windl
  2005-08-11  2:37   ` [RFC] Cumulative NTP cleanujp and generic timekeeping patch (v B5) john stultz
  2005-08-15 22:14   ` [RFC - 0/9] Generic timekeeping subsystem (v. B5) Roman Zippel
  3 siblings, 2 replies; 71+ messages in thread
From: Lee Revell @ 2005-08-11  2:32 UTC (permalink / raw)
  To: john stultz
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

On Wed, 2005-08-10 at 19:13 -0700, john stultz wrote:
> All,
> 	Here's the next rev in my rework of the current timekeeping subsystem.
> No major changes, only some cleanups and further splitting the larger
> patches into smaller ones.

Last I heard this made gettimeofday() 20% slower on x86.  Is this still
the case?

Lee


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [RFC] Cumulative NTP cleanujp and generic timekeeping patch (v B5)
  2005-08-11  2:13 ` [RFC - 0/9] Generic timekeeping subsystem (v. B5) john stultz
  2005-08-11  2:14   ` [PATCH 1/9] Timesource management code john stultz
  2005-08-11  2:32   ` [RFC - 0/9] Generic timekeeping subsystem (v. B5) Lee Revell
@ 2005-08-11  2:37   ` john stultz
  2005-08-15 22:14   ` [RFC - 0/9] Generic timekeeping subsystem (v. B5) Roman Zippel
  3 siblings, 0 replies; 71+ messages in thread
From: john stultz @ 2005-08-11  2:37 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

[-- Attachment #1: Type: text/plain, Size: 353 bytes --]

All,
	Since I doubt everyone wants to save and apply 20-some patches to test
my patch (and that assumes they want to test my patch ;), attached is
the cumulative patch (thankyou git!) of the NTP cleanup work as well as
the generic timekeeping code against linus's current tree.

Please let me know if you have any comments or feedback.

thanks
-john




[-- Attachment #2: linux-2.6.13-rc6_timeofday-all.patch.gz --]
[-- Type: application/x-gzip, Size: 48155 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-11  2:32   ` [RFC - 0/9] Generic timekeeping subsystem (v. B5) Lee Revell
@ 2005-08-11  2:39     ` john stultz
  2005-08-11  2:44       ` Lee Revell
  2005-08-11  6:17     ` Ulrich Windl
  1 sibling, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-11  2:39 UTC (permalink / raw)
  To: Lee Revell
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

On Wed, 2005-08-10 at 22:32 -0400, Lee Revell wrote:
> On Wed, 2005-08-10 at 19:13 -0700, john stultz wrote:
> > All,
> > 	Here's the next rev in my rework of the current timekeeping subsystem.
> > No major changes, only some cleanups and further splitting the larger
> > patches into smaller ones.
> 
> Last I heard this made gettimeofday() 20% slower on x86.  Is this still
> the case?

Ah, I've got a patch on my laptop that takes that down to ~2% or less. I
didn't include it in this patch set but I'll work to get it integrated
before the next release. Sorry about that.

If you have any suggestions for further performance improvements, please
let me know.

thanks
-john


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-11  2:39     ` john stultz
@ 2005-08-11  2:44       ` Lee Revell
  0 siblings, 0 replies; 71+ messages in thread
From: Lee Revell @ 2005-08-11  2:44 UTC (permalink / raw)
  To: john stultz
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

On Wed, 2005-08-10 at 19:39 -0700, john stultz wrote:
> Ah, I've got a patch on my laptop that takes that down to ~2% or less.
> I didn't include it in this patch set but I'll work to get it
> integrated before the next release. Sorry about that.
> 
> If you have any suggestions for further performance improvements,
> please let me know.

2% sounds reasonable to me.  I just don't think we can afford 20%
because so many stupid apps bang on gettimeofday() constantly.

Lee


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-11  2:32   ` [RFC - 0/9] Generic timekeeping subsystem (v. B5) Lee Revell
  2005-08-11  2:39     ` john stultz
@ 2005-08-11  6:17     ` Ulrich Windl
  1 sibling, 0 replies; 71+ messages in thread
From: Ulrich Windl @ 2005-08-11  6:17 UTC (permalink / raw)
  To: Lee Revell; +Cc: lkml

On 10 Aug 2005 at 22:32, Lee Revell wrote:

> On Wed, 2005-08-10 at 19:13 -0700, john stultz wrote:
> > All,
> > 	Here's the next rev in my rework of the current timekeeping subsystem.
> > No major changes, only some cleanups and further splitting the larger
> > patches into smaller ones.
> 
> Last I heard this made gettimeofday() 20% slower on x86.  Is this still
> the case?

If it's only 20% for an increase in resolution of 100000%, it's quite good ;-)

Regards,
Ulrich


> 
> Lee
> 



^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/13] NTP cleanup work (v. B5)
  2005-08-11  1:21 [RFC - 0/13] NTP cleanup work (v. B5) john stultz
  2005-08-11  1:23 ` [RFC][PATCH - 1/13] NTP cleanup: Move NTP code into ntp.c john stultz
  2005-08-11  2:13 ` [RFC - 0/9] Generic timekeeping subsystem (v. B5) john stultz
@ 2005-08-15 22:12 ` Roman Zippel
  2005-08-15 22:46   ` john stultz
  2 siblings, 1 reply; 71+ messages in thread
From: Roman Zippel @ 2005-08-15 22:12 UTC (permalink / raw)
  To: john stultz
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

Hi,

On Wed, 10 Aug 2005, john stultz wrote:

> 	The goal of this patch set is to isolate the in kernel NTP state
> machine in the hope of simplifying the current timekeeping code and
> allowing for optional future changes in the timekeeping subsystem.
> 
> I've tried to address some of the complexity concerns for systems that
> do not have a continuous timesource, preserving the existing behavior
> while still providing a ppm interface allowing smooth adjustments to
> continuous timesources. 

I think most of this is premature cleanup. As it also changes the logic in 
small ways, I'm not even sure it qualifies as a cleanup.
The only obvious patch is the PPS code removal, which is fine.
For the rest I can't agree on to move everything that aggressively into 
the ntp namespace. The kernel clock is controlled via NTP, but how it 
actually works has little to do with "network time". Some of the 
parameters are even private clock variables (e.g. time adjustment, phase), 
which don't belong in any common code. (I'll expand on that in the next 
mail.)

bye, Roman

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-11  2:13 ` [RFC - 0/9] Generic timekeeping subsystem (v. B5) john stultz
                     ` (2 preceding siblings ...)
  2005-08-11  2:37   ` [RFC] Cumulative NTP cleanujp and generic timekeeping patch (v B5) john stultz
@ 2005-08-15 22:14   ` Roman Zippel
  2005-08-16  0:10     ` john stultz
  3 siblings, 1 reply; 71+ messages in thread
From: Roman Zippel @ 2005-08-15 22:14 UTC (permalink / raw)
  To: john stultz
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

Hi,

On Wed, 10 Aug 2005, john stultz wrote:

> 	Here's the next rev in my rework of the current timekeeping subsystem.
> No major changes, only some cleanups and further splitting the larger
> patches into smaller ones.
> 
> The goal of this patch set is to provide a simplified and streamlined
> common timekeeping infrastructure that architectures can optionally use
> to avoid duplicating code with other architectures.

It's still the same old abstraction. Let's try it in OO terms why it's the 
wrong one. What basically is needed is something like this:

	base clock	-> continuous clock	-> clock implemention
			-> tick clock		-> ....

The base clock class is an abstract class which provides the basic time 
services like get time, set time...
The continuous clock class and tick clock class are also abstract classes,
but provide basic template functions, which can be used by the actual 
implementations do most of the work.

What you do with your patches is to just provide an abstract class for 
continous clocks and tick based clocks have to emulate a continuous clock. 
Please provide the right abstractions, e.g. leave the gettimeofday 
implementation to the timesource and use some helper (template) functions 
to do the actual work. Basically as long as you have a cycle_t in the 
common code something is really wrong, this code belongs in the continous 
clock template.

This also allows better implementations, e.g. gettimeofday can be done in 
a single step instead of two using a single lock instead of two.

bye, Roman

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/13] NTP cleanup work (v. B5)
  2005-08-15 22:12 ` [RFC - 0/13] NTP cleanup work " Roman Zippel
@ 2005-08-15 22:46   ` john stultz
  2005-08-17  0:10     ` Roman Zippel
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-15 22:46 UTC (permalink / raw)
  To: Roman Zippel
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

On Tue, 2005-08-16 at 00:12 +0200, Roman Zippel wrote:
> On Wed, 10 Aug 2005, john stultz wrote:
> 
> > 	The goal of this patch set is to isolate the in kernel NTP state
> > machine in the hope of simplifying the current timekeeping code and
> > allowing for optional future changes in the timekeeping subsystem.
> > 
> > I've tried to address some of the complexity concerns for systems that
> > do not have a continuous timesource, preserving the existing behavior
> > while still providing a ppm interface allowing smooth adjustments to
> > continuous timesources. 
> 
> I think most of this is premature cleanup. As it also changes the logic in 
> small ways, I'm not even sure it qualifies as a cleanup.

Please, Roman, I'm spending quite a bit of time breaking this up into
small chunks specifically to help this discussion. Rather then just
stating that the logic is changed in small ways, could you please be
specific and point to where that logic has changed and we can fix or
discuss it.

> The only obvious patch is the PPS code removal, which is fine.

Ok, one down, 12 to go ;)

> For the rest I can't agree on to move everything that aggressively into 
> the ntp namespace. The kernel clock is controlled via NTP, but how it 
> actually works has little to do with "network time". 

Eh? The adjtimex() interface causes a small adjustment to be added or
removed from the system time each tick. Why should this code not be put
behind a clear interface?

> Some of the 
> parameters are even private clock variables (e.g. time adjustment, phase), 
> which don't belong in any common code. (I'll expand on that in the next 
> mail.)

Again, I'm not understanding your objection. Its exactly because the
time_adjust and phase values are NTP specific variables that I'm trying
to move them from the timer.c code into ntp.c

I'm trying to address your concerns, I just need you to be more
explicit.

thanks
-john


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-15 22:14   ` [RFC - 0/9] Generic timekeeping subsystem (v. B5) Roman Zippel
@ 2005-08-16  0:10     ` john stultz
  2005-08-16 18:25       ` Christoph Lameter
  2005-08-17  0:28       ` Roman Zippel
  0 siblings, 2 replies; 71+ messages in thread
From: john stultz @ 2005-08-16  0:10 UTC (permalink / raw)
  To: Roman Zippel
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

On Tue, 2005-08-16 at 00:14 +0200, Roman Zippel wrote:
> On Wed, 10 Aug 2005, john stultz wrote:
> 
> > 	Here's the next rev in my rework of the current timekeeping subsystem.
> > No major changes, only some cleanups and further splitting the larger
> > patches into smaller ones.
> > 
> > The goal of this patch set is to provide a simplified and streamlined
> > common timekeeping infrastructure that architectures can optionally use
> > to avoid duplicating code with other architectures.
> 
> It's still the same old abstraction. Let's try it in OO terms why it's the 
> wrong one. What basically is needed is something like this:
> 
> 	base clock	-> continuous clock	-> clock implemention
> 			-> tick clock		-> ....
> 
> The base clock class is an abstract class which provides the basic time 
> services like get time, set time...
> The continuous clock class and tick clock class are also abstract classes,
> but provide basic template functions, which can be used by the actual 
> implementations do most of the work.
> 
> What you do with your patches is to just provide an abstract class for 
> continous clocks and tick based clocks have to emulate a continuous clock. 

Sorry. It was subtle, but after thinking more about your arguments, I've
stepped back from my earlier goals of replacing the timekeeping code for
all arches and instead I've decided to just focus on allowing
architectures that would duplicate code using a continuous timesource
use a common code base.  

Think of it more as a replacement for the time_interpolator code (which
thanks to Christoph Lameter, it is quite influenced by).

So in that way the "abstract class" will just be the current interface
of:

1. do_gettimeofday()
2. do_settimeofday()
3. getnstimeofday()
4. periodic hook (update_wall_time)
5. init code

To that I'd like to add 
6. do_monotonic_clock() which I've just added and implementation for
tick based systems.

Then in the tick based class, nothing changes (except for the new
do_monotonic_clock implementation). And in the continuous timesource
class, it uses my generic-tod code.

> Please provide the right abstractions, e.g. leave the gettimeofday 
> implementation to the timesource and use some helper (template) functions 
> to do the actual work. Basically as long as you have a cycle_t in the 
> common code something is really wrong, this code belongs in the continous 
> clock template.

I'm not sure I agree. By pushing all the gettimeofday logic behind the
timesource or clock class you describe above, you end up with lots of
duplicated and error prone code. That's the issue I'm trying to avoid
between the different arches. Additionally The current i386 timer_opts
code (which I'm to blame for) does almost exactly this at the timesource
level, and while it did allow for alternate timesources to be easily
used, it caused a large amount of almost duplicate code with slightly
differing behavior, and has made changes like dynamic ticks difficult to
do correctly.

It was this reason (along with Christoph's proddings - due to the
fsyscall requirements) that the timesource structure only provides an
abstraction to a free running counter instead of a state-full structure
with function pointers that return the timeofday.

Now, this does not limit any arch from doing their own thing and
implementing their own "timeofday abstract class". I'm just trying to
provide a correct and clean infrastructure for the arches that could use
a continuous timesource. 

> This also allows better implementations, e.g. gettimeofday can be done in 
> a single step instead of two using a single lock instead of two.

This is a miss-characterization. In most cases the continuous
gettimeofday is done in a single step with a single lock. Although it
does have the flexibility to allow for more complex setups, as well as
the ability to opt out and use the existing tick based code. 

thanks
-john

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC][PATCH - 4/13] NTP cleanup: Breakup ntp_adjtimex()
  2005-08-11  1:27       ` [RFC][PATCH - 4/13] NTP cleanup: Breakup ntp_adjtimex() john stultz
  2005-08-11  1:28         ` [RFC][PATCH - 5/13] NTP cleanup: Break out leapsecond processing john stultz
@ 2005-08-16  2:08         ` john stultz
  1 sibling, 0 replies; 71+ messages in thread
From: john stultz @ 2005-08-16  2:08 UTC (permalink / raw)
  To: lkml
  Cc: George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Roman Zippel, Ulrich Windl

On Wed, 2005-08-10 at 18:27 -0700, john stultz wrote:
> All,
> 	This patch breaks up the complex nesting of code in ntp_adjtimex() by
> creating a ntp_hardupdate() function and simplifying some of the logic.
> This also mimics the documented NTP spec somewhat better.
> 
> Any comments or feedback would be greatly appreciated.

Ugh. I just caught a bug where I misplaced the parens. 

> -			} /* STA_PLL */
> +				else if (ntp_hardupdate(txc->offset, xtime))
> +					result = TIME_ERROR;
> +			}
>  		} /* txc->modes & ADJ_OFFSET */
 
That's wrong. The following patch fixes it. 

thanks
-john


diff --git a/kernel/ntp.c b/kernel/ntp.c
--- a/kernel/ntp.c
+++ b/kernel/ntp.c
@@ -388,9 +388,8 @@ int ntp_adjtimex(struct timex *txc)
 				/* adjtime() is independent from ntp_adjtime() */
 				if ((time_next_adjust = txc->offset) == 0)
 					time_adjust = 0;
-				else if (ntp_hardupdate(txc->offset, xtime))
-					result = TIME_ERROR;
-			}
+			} else if (ntp_hardupdate(txc->offset, xtime))
+				result = TIME_ERROR;
 		} /* txc->modes & ADJ_OFFSET */
 
 		if (txc->modes & ADJ_TICK) {



^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-16  0:10     ` john stultz
@ 2005-08-16 18:25       ` Christoph Lameter
  2005-08-16 23:48         ` john stultz
  2005-08-17  6:08         ` Ulrich Windl
  2005-08-17  0:28       ` Roman Zippel
  1 sibling, 2 replies; 71+ messages in thread
From: Christoph Lameter @ 2005-08-16 18:25 UTC (permalink / raw)
  To: john stultz
  Cc: Roman Zippel, lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

On Mon, 15 Aug 2005, john stultz wrote:

> Sorry. It was subtle, but after thinking more about your arguments, I've
> stepped back from my earlier goals of replacing the timekeeping code for
> all arches and instead I've decided to just focus on allowing
> architectures that would duplicate code using a continuous timesource
> use a common code base.  

Thats great!

> Think of it more as a replacement for the time_interpolator code (which
> thanks to Christoph Lameter, it is quite influenced by).

I have no objection to replacing the time_interpolator code if the 
timesources provide a superset of functionality. Rename time_interpolator 
to timesource (including all currently existing interpolator defintions 
which will become time sources) and modify/add fields to be able to 
satisfy your requirements. The interpolator compensations may become not 
necessary if the upper layers can deal with discrepancies between timer 
interrupts and actual intervals occurring between these interrupts and if 
the upper layer can adjust the time source in use.

You mentioned that the NTP code has some issues with time interpolation 
at the KS. This is due to the NTP layer not being aware of actual time 
differences between timer interrupts that the interpolator knows about. If 
the NTP layer would be aware of the actual intervals measured by the 
timesource (or interpolator) then presumably time could be adjusted in a 
more accurate way.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-16 18:25       ` Christoph Lameter
@ 2005-08-16 23:48         ` john stultz
  2005-08-17  0:14           ` Christoph Lameter
  2005-08-17  6:08         ` Ulrich Windl
  1 sibling, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-16 23:48 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Roman Zippel, lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

On Tue, 2005-08-16 at 11:25 -0700, Christoph Lameter wrote:
> You mentioned that the NTP code has some issues with time interpolation 
> at the KS. This is due to the NTP layer not being aware of actual time 
> differences between timer interrupts that the interpolator knows about.

My understanding of the issue was that when NTP makes an adjustment, it
only affects xtime, and any difference between the adjusted time and the
interpolator's time was just accumulated in the interpolator's offset.
That then, to my understanding, required the bit about adjusting the
interpolator frequency to be slower then what we expect so negative
offsets can be applied. 

Looking at it closer, it may very work, but it does seem to be
addressing the issue somewhat indirectly.

> If the NTP layer would be aware of the actual intervals measured by the 
> timesource (or interpolator) then presumably time could be adjusted in a 
> more accurate way.

This is basically what I do in my patch. I directly apply the NTP
adjustment to the timesource interval, and periodically increment the
NTP state machine by the timesource interval when we accumulate it.

thanks
-john

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/13] NTP cleanup work (v. B5)
  2005-08-15 22:46   ` john stultz
@ 2005-08-17  0:10     ` Roman Zippel
  0 siblings, 0 replies; 71+ messages in thread
From: Roman Zippel @ 2005-08-17  0:10 UTC (permalink / raw)
  To: john stultz
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

Hi,

On Mon, 15 Aug 2005, john stultz wrote:

> > I think most of this is premature cleanup. As it also changes the logic in 
> > small ways, I'm not even sure it qualifies as a cleanup.
> 
> Please, Roman, I'm spending quite a bit of time breaking this up into
> small chunks specifically to help this discussion. Rather then just
> stating that the logic is changed in small ways, could you please be
> specific and point to where that logic has changed and we can fix or
> discuss it.

Well, part of the problem is that you didn't really explain, why your 
logical are better in any way. You broke them out of your big patch and 
they lose their context.
It's probably better to first to understand and agree on the functional 
changes, moving code around can still be done afterwards.

> > For the rest I can't agree on to move everything that aggressively into 
> > the ntp namespace. The kernel clock is controlled via NTP, but how it 
> > actually works has little to do with "network time". 
> 
> Eh? The adjtimex() interface causes a small adjustment to be added or
> removed from the system time each tick. Why should this code not be put
> behind a clear interface?

I don't mind the clear interface...

> > Some of the 
> > parameters are even private clock variables (e.g. time adjustment, phase), 
> > which don't belong in any common code. (I'll expand on that in the next 
> > mail.)
> 
> Again, I'm not understanding your objection. Its exactly because the
> time_adjust and phase values are NTP specific variables that I'm trying
> to move them from the timer.c code into ntp.c

The point is you're starting at the wrong point, first we need a clean 
clock interface, only then should we change the ntp code to it.
(More again in the next mail.)

bye, Roman

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-16 23:48         ` john stultz
@ 2005-08-17  0:14           ` Christoph Lameter
  2005-08-17  0:17             ` john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: Christoph Lameter @ 2005-08-17  0:14 UTC (permalink / raw)
  To: john stultz
  Cc: Roman Zippel, lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

On Tue, 16 Aug 2005, john stultz wrote:

> This is basically what I do in my patch. I directly apply the NTP
> adjustment to the timesource interval, and periodically increment the
> NTP state machine by the timesource interval when we accumulate it.

Is there some way to tell the NTP code how much the time_interpolator time 
deviates from xtime?

If the NTP code would use getnstimeofday or 
do_gettimeofday then it would already get interpolated time.

The curious issue in the current arrangement is that the interpolator 
knows much more accurately how much time has passed between interrupts 
than the timer interrupt but it has no time to make that information 
available to the NTP code.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-17  0:14           ` Christoph Lameter
@ 2005-08-17  0:17             ` john stultz
  2005-08-17  0:21               ` Christoph Lameter
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-17  0:17 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Roman Zippel, lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

On Tue, 2005-08-16 at 17:14 -0700, Christoph Lameter wrote:
> On Tue, 16 Aug 2005, john stultz wrote:
> 
> > This is basically what I do in my patch. I directly apply the NTP
> > adjustment to the timesource interval, and periodically increment the
> > NTP state machine by the timesource interval when we accumulate it.
> 
> Is there some way to tell the NTP code how much the time_interpolator time 
> deviates from xtime?
> 
> If the NTP code would use getnstimeofday or 
> do_gettimeofday then it would already get interpolated time.

That seems a bit backwards, no?

> The curious issue in the current arrangement is that the interpolator 
> knows much more accurately how much time has passed between interrupts 
> than the timer interrupt but it has no time to make that information 
> available to the NTP code.

That is why I'm suggesting time_interpolator users to move to my code
(when they're ready, of course :).

thanks
-john



^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-17  0:17             ` john stultz
@ 2005-08-17  0:21               ` Christoph Lameter
  0 siblings, 0 replies; 71+ messages in thread
From: Christoph Lameter @ 2005-08-17  0:21 UTC (permalink / raw)
  To: john stultz
  Cc: Roman Zippel, lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

On Tue, 16 Aug 2005, john stultz wrote:

> That is why I'm suggesting time_interpolator users to move to my code
> (when they're ready, of course :).

Both are basically timesources. That is why I would suggest you upgrade 
the interpolators to timesources. Doing that would enable a gradual 
transition instead of a cutover to a new time subsystem. It should also 
insure that the gains we have made in terms of accuracy of time will be 
preserved in the new system. And the code would be able to use the 
existing proven code that already allows system time with nanosecond 
precision.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-16  0:10     ` john stultz
  2005-08-16 18:25       ` Christoph Lameter
@ 2005-08-17  0:28       ` Roman Zippel
  2005-08-17  1:17         ` john stultz
  2005-08-17 19:03         ` George Anzinger
  1 sibling, 2 replies; 71+ messages in thread
From: Roman Zippel @ 2005-08-17  0:28 UTC (permalink / raw)
  To: john stultz
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

Hi,

On Mon, 15 Aug 2005, john stultz wrote:

> > Please provide the right abstractions, e.g. leave the gettimeofday 
> > implementation to the timesource and use some helper (template) functions 
> > to do the actual work. Basically as long as you have a cycle_t in the 
> > common code something is really wrong, this code belongs in the continous 
> > clock template.
> 
> I'm not sure I agree. By pushing all the gettimeofday logic behind the
> timesource or clock class you describe above, you end up with lots of
> duplicated and error prone code.

That's there template code comes in, so that the drivers need only to add 
little code themselves. The point is that every time source has different 
requirements and if we want efficient implementations, too much common 
code doesn't really help. It's a tradeoff.

Let's look at the example patch below. I played a little with some code 
and this just demonstrates an accurate conversion of the tick/freq values 
into the internal values in ns resolution. It does a little more work 
ahead, but the interrupt code becomes simpler and most important it 
doesn't require any expensive 64bit math and you can't get it much more 
accurate than that. The current gettimeofday code for tick based sources 
is really cheap and I'd like to keep that (i.e. free of 64bit math). The 
accuracy can and should be fixed (the change to timespec wasn't really a 
major improvement, as it introduced new rounding errors).

The other thing the example demonstrates is the interface from NTP to 
timer code. The NTP code provides the basic parameters and then leaves it 
to the clock implementation how they apply. The adjustment and phase 
variables are really private variables. In the code below it's rather 
easily possible to make HZ another parameter and you can have clocks 
running at different frequencies (e.g. to implement dynamic ticks). A low 
frequency timer provides the wall clock and a separate timer takes care of 
the kernel timer.

The code below needs of course a little more work, currently I use it to 
collect some data on how the current code behaves. I'll add the adjustment 
code and then I'll see how it compares to it.

> > This also allows better implementations, e.g. gettimeofday can be done in 
> > a single step instead of two using a single lock instead of two.
> 
> This is a miss-characterization. In most cases the continuous
> gettimeofday is done in a single step with a single lock. Although it
> does have the flexibility to allow for more complex setups, as well as
> the ability to opt out and use the existing tick based code. 

You have it the wrong way around. In the general case you need two locks 
and only in some cases can you optimize one away. To evaluate the 
complexity of the design you really have to look at the general case for 
each component. You're rather focused on just the best cases.

bye, Roman

---

 kernel/time.c  |    3 ++-
 kernel/timer.c |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/time.c
===================================================================
--- linux-2.6.orig/kernel/time.c	2005-07-13 03:18:04.000000000 +0200
+++ linux-2.6/kernel/time.c	2005-08-16 01:37:20.000000000 +0200
@@ -366,8 +366,9 @@ int do_adjtimex(struct timex *txc)
 	    } /* txc->modes & ADJ_OFFSET */
 	    if (txc->modes & ADJ_TICK) {
 		tick_usec = txc->tick;
-		tick_nsec = TICK_USEC_TO_NSEC(tick_usec);
 	    }
+	    if (txc->modes & (ADJ_FREQUENCY|ADJ_OFFSET|ADJ_TICK))
+		    time_recalc();
 	} /* txc->modes */
 leave:	if ((time_status & (STA_UNSYNC|STA_CLOCKERR)) != 0
 	    || ((time_status & (STA_PPSFREQ|STA_PPSTIME)) != 0
Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c	2005-07-13 03:18:04.000000000 +0200
+++ linux-2.6/kernel/timer.c	2005-08-16 23:10:53.000000000 +0200
@@ -559,6 +559,7 @@ found:
  */
 unsigned long tick_usec = TICK_USEC; 		/* USER_HZ period (usec) */
 unsigned long tick_nsec = TICK_NSEC;		/* ACTHZ period (nsec) */
+unsigned long tick_nsec2 = TICK_NSEC;
 
 /* 
  * The current time 
@@ -569,6 +570,7 @@ unsigned long tick_nsec = TICK_NSEC;		/*
  * the usual normalization.
  */
 struct timespec xtime __attribute__ ((aligned (16)));
+struct timespec xtime2 __attribute__ ((aligned (16)));
 struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
 
 EXPORT_SYMBOL(xtime);
@@ -596,6 +598,33 @@ static long time_adj;			/* tick adjust (
 long time_reftime;			/* time at last adjustment (s)	*/
 long time_adjust;
 long time_next_adjust;
+static long time_adj2, time_adj2_cur, time_freq_adj2, time_freq_phase2, time_phase2;
+
+void time_recalc(void)
+{
+	long f, t;
+	tick_nsec = TICK_USEC_TO_NSEC(tick_usec);
+
+	t = time_freq >> (SHIFT_USEC + 8);
+	if (t) {
+		time_freq -= t << (SHIFT_USEC + 8);
+		t *= 1000 << 8;
+	}
+	f = time_freq * 125;
+	t += tick_usec * USER_HZ * 1000 + (f >> (SHIFT_USEC - 3));
+	f &= (1 << (SHIFT_USEC - 3)) - 1;
+	tick_nsec2 = t / HZ;
+	f += (t % HZ) << (SHIFT_USEC - 3);
+	f <<= 5;
+	time_adj2 = f / HZ;
+	time_freq_adj2 = f % HZ;
+
+	printk("tr: %ld.%09ld(%ld,%ld,%ld,%ld) - %ld.%09ld(%ld,%ld,%ld)\n",
+		xtime.tv_sec, xtime.tv_sec,
+		tick_nsec, time_freq, time_offset, time_next_adjust,
+		xtime2.tv_sec, xtime2.tv_nsec,
+		tick_nsec2, time_adj2, time_freq_adj2);
+}
 
 /*
  * this routine handles the overflow of the microsecond field
@@ -739,6 +768,16 @@ static void second_overflow(void)
 #endif
 }
 
+static void second_overflow2(void)
+{
+	time_adj2_cur = time_adj2;
+	time_freq_phase2 += time_freq_adj2;
+	if (time_freq_phase2 > HZ) {
+		time_freq_phase2 -= HZ;
+		time_adj2_cur++;
+	}
+}
+
 /* in the NTP reference this is called "hardclock()" */
 static void update_wall_time_one_tick(void)
 {
@@ -786,6 +825,20 @@ static void update_wall_time_one_tick(vo
 		time_adjust = time_next_adjust;
 		time_next_adjust = 0;
 	}
+
+	delta_nsec = tick_nsec2;
+	time_phase2 += time_adj2_cur;
+	if (time_phase2 >= (1 << (SHIFT_USEC + 2))) {
+		long ltemp = time_phase2 >> (SHIFT_USEC + 2);
+		time_phase2 -= ltemp << (SHIFT_USEC + 2);
+		delta_nsec += ltemp;
+	}
+	xtime2.tv_nsec += delta_nsec;
+	if (xtime2.tv_nsec >= NSEC_PER_SEC) {
+		xtime2.tv_nsec -= NSEC_PER_SEC;
+		xtime2.tv_sec++;
+		second_overflow2();
+	}
 }
 
 /*

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-17  0:28       ` Roman Zippel
@ 2005-08-17  1:17         ` john stultz
  2005-08-17  7:40           ` Ulrich Windl
  2005-08-19  0:27           ` Roman Zippel
  2005-08-17 19:03         ` George Anzinger
  1 sibling, 2 replies; 71+ messages in thread
From: john stultz @ 2005-08-17  1:17 UTC (permalink / raw)
  To: Roman Zippel
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

On Wed, 2005-08-17 at 02:28 +0200, Roman Zippel wrote:

> Let's look at the example patch below. I played a little with some code 
> and this just demonstrates an accurate conversion of the tick/freq values 
> into the internal values in ns resolution. It does a little more work 
> ahead, but the interrupt code becomes simpler and most important it 
> doesn't require any expensive 64bit math and you can't get it much more 
> accurate than that. The current gettimeofday code for tick based sources 
> is really cheap and I'd like to keep that (i.e. free of 64bit math). The 
> accuracy can and should be fixed (the change to timespec wasn't really a 
> major improvement, as it introduced new rounding errors).

Hmm. It could really use some comments, but it looks interesting. Let me
continue reading it and play around with it some more. 

> The other thing the example demonstrates is the interface from NTP to 
> timer code. The NTP code provides the basic parameters and then leaves it 
> to the clock implementation how they apply. The adjustment and phase 
> variables are really private variables. 

If they are private clock variables, why are they in the generic
timer.c? Everyone is using it in exactly the same way, no?  Why do you
oppose having the adjustment and phase values behind an ntp_function()
interface?

Maybe to focus this productively, I'll try to step back and outline the
goals at a high level and you can address those. 

My Assumptions:
1. adjtimex() sets/gets NTP state values
2. Every tick we adjust those state values
3. Every tick we use those values to make a nanosecond adjustment to
time.
4. Those state values are otherwise unused.

Goals:
1. Isolate NTP code to clean up the tick based timekeeping, reducing the
spaghetti-like code interactions.
2. Add interfaces to allow for continuous, rather then tick based,
adjustments (much how ppc64 does currently, only shareable).

thanks
-john

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-16 18:25       ` Christoph Lameter
  2005-08-16 23:48         ` john stultz
@ 2005-08-17  6:08         ` Ulrich Windl
  2005-08-17 14:07           ` Christoph Lameter
  1 sibling, 1 reply; 71+ messages in thread
From: Ulrich Windl @ 2005-08-17  6:08 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Roman Zippel, lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan

On 16 Aug 2005 at 11:25, Christoph Lameter wrote:

> You mentioned that the NTP code has some issues with time interpolation 
> at the KS. This is due to the NTP layer not being aware of actual time 
> differences between timer interrupts that the interpolator knows about. If 
> the NTP layer would be aware of the actual intervals measured by the 
> timesource (or interpolator) then presumably time could be adjusted in a 
> more accurate way.

Hi,

whatever the implementation is, at some point there must exist an interface go get 
and set "normal time", free of any jumps and jitter. That "frontend time" will be 
used a a base of correction. Basically that means time should be as monotonic and 
jitter free as possible for any measurement interval you like.

Otherwise when extrapolating the time-error, it (NTP) will try to overcompensate 
(or undercompensate), making the whole thing instable.

Here's a sample from some ancient NTP distribution (pre-nanosecond), but you'll 
get the idea what to check:

more util/jitter.c
/*
 * This program can be used to calibrate the clock reading jitter of a
 * particular CPU and operating system. It first tickles every element
 * of an array, in order to force pages into memory, then repeatedly calls
 * gettimeofday() and, finally, writes out the time values for later
 * analysis. From this you can determine the jitter and if the clock ever
 * runs backwards.
 */
#include <sys/time.h>
#include <stdio.h>

#define NBUF 20002

void
main()
{
        struct timeval ts, tr;
        struct timezone tzp;
        long temp, j, i, gtod[NBUF];

        gettimeofday(&ts, &tzp);

        /*
         * Force pages into memory
         */
        for (i = 0; i < NBUF; i ++)
                gtod[i] = 0;

        /*
         * Construct gtod array
         */
        for (i = 0; i < NBUF; i ++) {
                gettimeofday(&tr, &tzp);
                gtod[i] = (tr.tv_sec - ts.tv_sec) * 1000000 + tr.tv_usec;
        }

        /*
         * Write out gtod array for later processing with S
         */
        for (i = 0; i < NBUF - 2; i++) {
/*
                printf("%lu\n", gtod[i]);
*/
                gtod[i] = gtod[i + 1] - gtod[i];
                printf("%lu\n", gtod[i]);
        }

        /*
         * Sort the gtod array and display deciles
         */
        for (i = 0; i < NBUF - 2; i++) {
                for (j = 0; j <= i; j++) {
                        if (gtod[j] > gtod[i]) {
                                temp = gtod[j];
                                gtod[j] = gtod[i];
                                gtod[i] = temp;
                        }
                }
        }
        fprintf(stderr, "First rank\n");
        for (i = 0; i < 10; i++)
                fprintf(stderr, "%10ld%10ld\n", i, gtod[i]);
        fprintf(stderr, "Last rank\n");
        for (i = NBUF - 12; i < NBUF - 2; i++)
                fprintf(stderr, "%10ld%10ld\n", i, gtod[i]);
}

Regards,
Ulrich


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-17  1:17         ` john stultz
@ 2005-08-17  7:40           ` Ulrich Windl
  2005-08-19  0:27           ` Roman Zippel
  1 sibling, 0 replies; 71+ messages in thread
From: Ulrich Windl @ 2005-08-17  7:40 UTC (permalink / raw)
  To: john stultz
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan

On 16 Aug 2005 at 18:17, john stultz wrote:

[...]
> Maybe to focus this productively, I'll try to step back and outline the
> goals at a high level and you can address those. 
> 
> My Assumptions:
> 1. adjtimex() sets/gets NTP state values

One of the greatest mistakes in the past which still affects us now was the 
decision to piggy-back ntp_adjtime and ntp_gettime on top of adjtime() and thus 
creating adjtimex(). Only to save a system-call number or two. WE REALLY SHOULD 
GET RID OF THAT going back to Linux 0.something.

> 2. Every tick we adjust those state values

... which require it. 

> 3. Every tick we use those values to make a nanosecond adjustment to
> time.

...or even more frequent. In my code I tried to scale the tick interpolation as 
well, thus effectively making adjustments even within timer ticks (so far the 
theory...). I was assuming however that ticks and interpolation clocks are derived 
from one single source and would "float" the same way relative to each other.

> 4. Those state values are otherwise unused.

What is "otherwise"? Outside the "NTP clock model", or "between ticks"?

> 
> Goals:
> 1. Isolate NTP code to clean up the tick based timekeeping, reducing the
> spaghetti-like code interactions.

First you need a new clock model that's compatible with NTP. Then you can consider 
how to implement the NTP stuff. So the clock even without NTP has to be strictly 
monotonic for any interval it is read, be it nanoseconds, microseconds, 
milliseconds, seconds, minutes, hours, days, ... The clock delta (=increase of 
time) over time should be as constant as possible (i.e. time shouldn't go up like 
stairs).

> 2. Add interfaces to allow for continuous, rather then tick based,
> adjustments (much how ppc64 does currently, only shareable).

Adjustments to the clock _model_ are asynchronous by definition, while adjustments 
to the clock itself are, well, periodic. Whatever the period.

Maybe this helps and can be agreed on.

Regards,
Ulrich

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-17  6:08         ` Ulrich Windl
@ 2005-08-17 14:07           ` Christoph Lameter
  0 siblings, 0 replies; 71+ messages in thread
From: Christoph Lameter @ 2005-08-17 14:07 UTC (permalink / raw)
  To: Ulrich Windl
  Cc: Roman Zippel, lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan

On Wed, 17 Aug 2005, Ulrich Windl wrote:

> whatever the implementation is, at some point there must exist an interface go get 
> and set "normal time", free of any jumps and jitter. That "frontend time" will be 
> used a a base of correction. Basically that means time should be as monotonic and 
> jitter free as possible for any measurement interval you like.

The interpolator provides such a time as xtime + offset and will self-tune 
to be as accurate as possible given fluctuations of the timer interrupt. 
It will even adapt to NTP interval variations.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-17  0:28       ` Roman Zippel
  2005-08-17  1:17         ` john stultz
@ 2005-08-17 19:03         ` George Anzinger
  1 sibling, 0 replies; 71+ messages in thread
From: George Anzinger @ 2005-08-17 19:03 UTC (permalink / raw)
  To: Roman Zippel
  Cc: john stultz, lkml, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

Roman Zippel wrote:


~

The thing that worries me about this function is that it does every 
thing in usec.  We are using nsec in xtime now and I wonder if it would 
not be more accurate to do the math in nsecs.  Even tick size 
(tick_nsec) does not translate well to usec, it currently being 999849 
nsecs.

George
> ---
> 
>  kernel/time.c  |    3 ++-
>  kernel/timer.c |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 55 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/kernel/time.c
> ===================================================================
> --- linux-2.6.orig/kernel/time.c	2005-07-13 03:18:04.000000000 +0200
> +++ linux-2.6/kernel/time.c	2005-08-16 01:37:20.000000000 +0200
> @@ -366,8 +366,9 @@ int do_adjtimex(struct timex *txc)
>  	    } /* txc->modes & ADJ_OFFSET */
>  	    if (txc->modes & ADJ_TICK) {
>  		tick_usec = txc->tick;
> -		tick_nsec = TICK_USEC_TO_NSEC(tick_usec);
>  	    }
> +	    if (txc->modes & (ADJ_FREQUENCY|ADJ_OFFSET|ADJ_TICK))
> +		    time_recalc();
>  	} /* txc->modes */
>  leave:	if ((time_status & (STA_UNSYNC|STA_CLOCKERR)) != 0
>  	    || ((time_status & (STA_PPSFREQ|STA_PPSTIME)) != 0
> Index: linux-2.6/kernel/timer.c
> ===================================================================
> --- linux-2.6.orig/kernel/timer.c	2005-07-13 03:18:04.000000000 +0200
> +++ linux-2.6/kernel/timer.c	2005-08-16 23:10:53.000000000 +0200
> @@ -559,6 +559,7 @@ found:
>   */
>  unsigned long tick_usec = TICK_USEC; 		/* USER_HZ period (usec) */
>  unsigned long tick_nsec = TICK_NSEC;		/* ACTHZ period (nsec) */
> +unsigned long tick_nsec2 = TICK_NSEC;
>  
>  /* 
>   * The current time 
> @@ -569,6 +570,7 @@ unsigned long tick_nsec = TICK_NSEC;		/*
>   * the usual normalization.
>   */
>  struct timespec xtime __attribute__ ((aligned (16)));
> +struct timespec xtime2 __attribute__ ((aligned (16)));
>  struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
>  
>  EXPORT_SYMBOL(xtime);
> @@ -596,6 +598,33 @@ static long time_adj;			/* tick adjust (
>  long time_reftime;			/* time at last adjustment (s)	*/
>  long time_adjust;
>  long time_next_adjust;
> +static long time_adj2, time_adj2_cur, time_freq_adj2, time_freq_phase2, time_phase2;
> +
> +void time_recalc(void)
> +{
> +	long f, t;
> +	tick_nsec = TICK_USEC_TO_NSEC(tick_usec);

This leaves bits on the floor.  Is it not possible to do this whole 
calculation in nano seconds?  Currently, for example, tick_nsec is 999849...
> +
> +	t = time_freq >> (SHIFT_USEC + 8);
> +	if (t) {
> +		time_freq -= t << (SHIFT_USEC + 8);
> +		t *= 1000 << 8;
> +	}
> +	f = time_freq * 125;
> +	t += tick_usec * USER_HZ * 1000 + (f >> (SHIFT_USEC - 3));
> +	f &= (1 << (SHIFT_USEC - 3)) - 1;
> +	tick_nsec2 = t / HZ;
> +	f += (t % HZ) << (SHIFT_USEC - 3);
> +	f <<= 5;
> +	time_adj2 = f / HZ;
> +	time_freq_adj2 = f % HZ;
> +
> +	printk("tr: %ld.%09ld(%ld,%ld,%ld,%ld) - %ld.%09ld(%ld,%ld,%ld)\n",
> +		xtime.tv_sec, xtime.tv_sec,
> +		tick_nsec, time_freq, time_offset, time_next_adjust,
> +		xtime2.tv_sec, xtime2.tv_nsec,
> +		tick_nsec2, time_adj2, time_freq_adj2);
> +}
>  
>  /*
>   * this routine handles the overflow of the microsecond field
> @@ -739,6 +768,16 @@ static void second_overflow(void)
>  #endif
>  }
>  
> +static void second_overflow2(void)
> +{
> +	time_adj2_cur = time_adj2;
> +	time_freq_phase2 += time_freq_adj2;
> +	if (time_freq_phase2 > HZ) {
> +		time_freq_phase2 -= HZ;
> +		time_adj2_cur++;
> +	}
> +}
> +
>  /* in the NTP reference this is called "hardclock()" */
>  static void update_wall_time_one_tick(void)
>  {
> @@ -786,6 +825,20 @@ static void update_wall_time_one_tick(vo
>  		time_adjust = time_next_adjust;
>  		time_next_adjust = 0;
>  	}
> +
> +	delta_nsec = tick_nsec2;
> +	time_phase2 += time_adj2_cur;
> +	if (time_phase2 >= (1 << (SHIFT_USEC + 2))) {
> +		long ltemp = time_phase2 >> (SHIFT_USEC + 2);
> +		time_phase2 -= ltemp << (SHIFT_USEC + 2);
> +		delta_nsec += ltemp;
> +	}
> +	xtime2.tv_nsec += delta_nsec;
> +	if (xtime2.tv_nsec >= NSEC_PER_SEC) {
> +		xtime2.tv_nsec -= NSEC_PER_SEC;
> +		xtime2.tv_sec++;
> +		second_overflow2();
> +	}
>  }
>  
>  /*
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-17  1:17         ` john stultz
  2005-08-17  7:40           ` Ulrich Windl
@ 2005-08-19  0:27           ` Roman Zippel
  2005-08-20  2:32             ` john stultz
  1 sibling, 1 reply; 71+ messages in thread
From: Roman Zippel @ 2005-08-19  0:27 UTC (permalink / raw)
  To: john stultz
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

Hi,

On Tue, 16 Aug 2005, john stultz wrote:

> If they are private clock variables, why are they in the generic
> timer.c? Everyone is using it in exactly the same way, no?  Why do you
> oppose having the adjustment and phase values behind an ntp_function()
> interface?

These values belong to the clock, NTP specifies the speed of the clock via 
tick/frequency, these variables specify the current state of the clock. If 
we assume there is only one system clock, we need only one set of those, 
so leaving them in timer.c doesn't really hurt. How these variables are 
updated depends on the clock, so separating them doesn't make much sense.

> Maybe to focus this productively, I'll try to step back and outline the
> goals at a high level and you can address those. 
> 
> My Assumptions:
> 1. adjtimex() sets/gets NTP state values
> 2. Every tick we adjust those state values
> 3. Every tick we use those values to make a nanosecond adjustment to
> time.
> 4. Those state values are otherwise unused.
> 
> Goals:
> 1. Isolate NTP code to clean up the tick based timekeeping, reducing the
> spaghetti-like code interactions.
> 2. Add interfaces to allow for continuous, rather then tick based,
> adjustments (much how ppc64 does currently, only shareable).

Cleaning up the code would be nice, but that shouldn't be the priority 
right now, first we should get the math right.
I looked a bit more on this aspect of your patch and I think it's overly 
complex even for continuous time sources. You can reduce the complexity 
by updating the clock in more regular intervals. 

What basically is needed to update in constant intervals (n cycles) a 
reference time controlled via NTP and the system time. The difference 
between those two can be used to adjust the cycle multiplier for the next 
n cycles to speed up or slow down the system clock.
Calculating the offset in constant intervals makes the math a lot simpler, 
basically the current code is just a special case of that, where it 
directly updates the system time from the reference time at every tick.
(In the end the differences between tick based and continuous sources may 
be even smaller than your current patches suggest. :) )

bye, Roman

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-19  0:27           ` Roman Zippel
@ 2005-08-20  2:32             ` john stultz
  2005-08-21 23:19               ` Roman Zippel
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-20  2:32 UTC (permalink / raw)
  To: Roman Zippel
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

On Fri, 2005-08-19 at 02:27 +0200, Roman Zippel wrote:
> On Tue, 16 Aug 2005, john stultz wrote:
> > Maybe to focus this productively, I'll try to step back and outline the
> > goals at a high level and you can address those. 
> > 
> > My Assumptions:
> > 1. adjtimex() sets/gets NTP state values
> > 2. Every tick we adjust those state values
> > 3. Every tick we use those values to make a nanosecond adjustment to
> > time.
> > 4. Those state values are otherwise unused.
> > 
> > Goals:
> > 1. Isolate NTP code to clean up the tick based timekeeping, reducing the
> > spaghetti-like code interactions.
> > 2. Add interfaces to allow for continuous, rather then tick based,
> > adjustments (much how ppc64 does currently, only shareable).
> 
> Cleaning up the code would be nice, but that shouldn't be the priority 
> right now, first we should get the math right.
> I looked a bit more on this aspect of your patch and I think it's overly 
> complex even for continuous time sources. You can reduce the complexity 
> by updating the clock in more regular intervals. 

I feel in some ways I do this (inside the second overflow loop), but
maybe I'm misunderstanding you.


> What basically is needed to update in constant intervals (n cycles) a 
> reference time controlled via NTP and the system time. The difference 
> between those two can be used to adjust the cycle multiplier for the next 
> n cycles to speed up or slow down the system clock.
> Calculating the offset in constant intervals makes the math a lot simpler, 
> basically the current code is just a special case of that, where it 
> directly updates the system time from the reference time at every tick.
> (In the end the differences between tick based and continuous sources may 
> be even smaller than your current patches suggest. :) )

That would be great! So, would you mind helping me scratch out some
pseudo code for your idea?

Currently we have something like: 
===============================================
do_adjtimex():
	set ntp_status/maxerror/esterror/constant values
	set ntp_freq
	set ntp_tick
	if (singleshot_mode):
		set ntp_adjtime_offset
	else:
		set ntp_offset
		if appropriate, adjust ntp_freq


timer_interrupt():
	if (second_overflow):
		adjust ntp_maxerror/status
		/* calculate per tick phase adjustment 
		   using ntp_offset and ntp_freq
		*/
		sub_offset = math(ntp_offset)
		ntp_offset -= sub_offset
		phase_adj = math(sub_offset)		
		phase_adj += math(ntp_freq)

		leapsecond_stuff()

	tick_adjustment = 0;

	/* calculate singleshot adjustment */
	if (ntp_adjtime_offset):
		adj = min(ntp_adjtime_offset, tick_adj)
		ntp_adjtime_offset -= adj
	
		tick_adjustment += adj

	/* calculate the phase adjustment */
	phase += phase_adj
	if (phase > UNIT):
		phase -= UNIT
		tick_adjustment += UNIT


	xtime += ntp_tick + tick_adjustment



gettimeofday():
	return xtime + hardware_offset()




For continuous timesources, I'd like to see something like:
===============================================
do_adjtimex():
	no changes, only the addition of
	ntp_tick_ppm = calulate_ppm(ntp_tick)


timekeeping_perioidic_hook():

	/* get ntp adjusted interval length*/
	interval_length = get_timesource_interval(ppm)

	/* accumulate the NTP adjusted interval */
	xtime += interval_length

	/* inform NTP state machine that we have 
	   applied the last calculated adjustment for 
	   the interval length
	*/

	ntp_interval += interval_length
	while (ntp_interval > SECOND): /* just like second_overflow */
		adjust ntp_maxerror/status
		/* calculate the offset ppm adjustment */
		sub_offset = math(ntp_offset)
		ntp_offset -= sub_offset
		offset_ppm = math(sub_offset)

		/* same thing for single shot ntp_adjtime_offset */
		sub_ss_offset = math(ntp_adjtime_offset)
		ntp_adjtime_offset -= sub_ss_offset
		ss_offset_ppm = math(sub_ss_offset)


	/* sum up the ppm adjustments into a single ntp adjustment */
	ppm = offset_ppm + ntp_freq + ss_offset_ppm + ntp_tick_ppm

	leapsecond_stuff()

do_gettimeofday():
	interval = get_timesource_interval(ppm)
	return xtime + interval



Now could you adapt this to better show me what you're thinking of?

thanks
-john





^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-20  2:32             ` john stultz
@ 2005-08-21 23:19               ` Roman Zippel
  2005-08-22 18:57                 ` john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: Roman Zippel @ 2005-08-21 23:19 UTC (permalink / raw)
  To: john stultz
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

Hi,

On Fri, 19 Aug 2005, john stultz wrote:

> timekeeping_perioidic_hook():
> 
> 	/* get ntp adjusted interval length*/
> 	interval_length = get_timesource_interval(ppm)

Here starts the problem, this requires more expensive math than necessary, 
as every time you first have to scale the values.

Let's take a standard PIT timer as an example. With HZ=100 we program it 
with 11932, for simplicity let's assume this corresponds to 10^7ns and 
scale this by 2^8. This means the timer multiplier is initially 214549, 
this updates the system time by 214549*11932 and the reference time by 
10^7*2^8 every tick. We can now just ignore the error or as soon as it 
exceeds 11932/2 we increase/decrease the mutiplier. The error calculation 
is rather simple, usually just adds and shifts, only if the error exceeds 
2*11932 it gets a little more complicated, but even here the possible 
divide is avoidable.
The gettimeofday would then basically be "xtime + (cycle_offset * mult + 
error_offset) / 2^8". Depending on the update frequency and the required 
precision it's even possible to keep this within 32bit. The ntp part stays 
pretty much the same and the time source can add anything it wants on top 
of that. The basic math is also pretty much the same so we can generate 
most of the code depending on various parameters.

bye, Roman

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-21 23:19               ` Roman Zippel
@ 2005-08-22 18:57                 ` john stultz
  2005-08-23 11:30                   ` Roman Zippel
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-22 18:57 UTC (permalink / raw)
  To: Roman Zippel
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

On Mon, 2005-08-22 at 01:19 +0200, Roman Zippel wrote:
> Hi,
> 
> On Fri, 19 Aug 2005, john stultz wrote:
> 
> > timekeeping_perioidic_hook():
> > 
> > 	/* get ntp adjusted interval length*/
> > 	interval_length = get_timesource_interval(ppm)
> 
> Here starts the problem, this requires more expensive math than necessary, 
> as every time you first have to scale the values.

Hmmm. I feel like we're mixing signals. See below for more on this. 

> Let's take a standard PIT timer as an example. With HZ=100 we program it 
> with 11932, for simplicity let's assume this corresponds to 10^7ns and 
> scale this by 2^8. This means the timer multiplier is initially 214549, 
> this updates the system time by 214549*11932 and the reference time by 
> 10^7*2^8 every tick. We can now just ignore the error or as soon as it 
> exceeds 11932/2 we increase/decrease the mutiplier. The error calculation 
> is rather simple, usually just adds and shifts, only if the error exceeds 
> 2*11932 it gets a little more complicated, but even here the possible 
> divide is avoidable.

I feel like we're talking about different problems. Which reference
clock (other then the system clock) are you wanting to increment at the
tick time? Do you mean the ntp time_offset value? A little bit of psudo
code might go a long way in helping me understand your solution.

Also I'm not sure how this is connects to the continuous timesource
situation where we do not assume timer ticks are not lost or late. 

> The gettimeofday would then basically be "xtime + (cycle_offset * mult + 
> error_offset) / 2^8". Depending on the update frequency and the required 
> precision it's even possible to keep this within 32bit. The ntp part stays 
> pretty much the same and the time source can add anything it wants on top 
> of that. The basic math is also pretty much the same so we can generate 
> most of the code depending on various parameters.

Again, I must not be understanding what you're suggesting.  Above where
you called get_timesource_interval(ppm) too expensive, what you're
suggesting here is almost exactly what get_timesource_interval(ppm)
would do.  In my timeofday patches, its called cyc2ns() and gettimeofday
looks like:
  xtime + cyc2ns(timesource, ntp_adjustment, cycle_delta)

Where cyc2ns does:
  (cycle_delta * (timesource->mult + ntp_adjustment))>>timesource->shift

The reason why we calculate the interval_length in the continuous
timesource case is because we are not assuming anything about the
frequency that the timekeeping_periodic_hook() is called.

Again, I'm really wanting to address your concerns, but I still do not
really understand the specific objections. 

thanks
-john

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-22 18:57                 ` john stultz
@ 2005-08-23 11:30                   ` Roman Zippel
  2005-08-23 18:52                     ` john stultz
  2005-08-23 20:51                     ` john stultz
  0 siblings, 2 replies; 71+ messages in thread
From: Roman Zippel @ 2005-08-23 11:30 UTC (permalink / raw)
  To: john stultz
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

Hi,

On Mon, 22 Aug 2005, john stultz wrote:

> The reason why we calculate the interval_length in the continuous
> timesource case is because we are not assuming anything about the
> frequency that the timekeeping_periodic_hook() is called.

The problem with your patch is that it doesn't allow making such 
assumptions.
Anyway, it's rather simple, if you want to update the time asynchronously:

	cycle_offset = get_cycles() - last_update;

	while (cycle_offset >= update_cycles) {
		cycle_offset -= update_cycles;
		last_update += update_cycles;
		// at init: system_update = update_cycles * mult;
		system_time += system_update;
		xtime += [tick_nsec, time_adj];
	}

	error = system_time - (xtime.tv_nsec << shift);

	if (abs(error) > update_cycles/2) {
		mult_adj = (error +- update_cycles/2) / update_cycles;
		mult += mult_adj;
		system_update += mult_adj * update_cycles;
		system_time -= mult_adj * cycle_offset;
		error -= mult_adj * cycle_offset;
	}

	if (xtime.tv_nsec + (error >> shift) > NSEC_PER_SEC) {
		system_time -= NSEC_PER_SEC << shift;
		second_overflow();
	}

Since we usually don't have to adjust for the error all at once, it should 
be possible to precalculate some of it in adjtimex/second_overflow and 
turn mult_adj into a mult_adj_shift.
I didn't really check the math here in detail, so there should be enough 
errors left :), but I hope it's enough to show the idea (especially how to 
do it without mult/divide).

There are now variations of this possible, the initial cycle_offset can be 
constant, this happens if it's regularly  called from an interrupt (and 
it's sufficient for UP systems). We could also completely ignore the 
error, so that the core calculation of the above results in the familiar:

	xtime += [tick_nsec, time_adj];
	if (xtime.tv_nsec > NSEC_PER_SEC)
		second_overflow();

Another variation would be useful for ppc64 (or maybe any 64bit arch, but 
ppc64 has already the matching gettimeofday). In this case we don't use a 
timespec based xtime and don't scale it to ns, but use 64bit values 
instead scaled to seconds.
The last one may become a bit of a challenge to keep as much as possible 
code common without abusing the preprocessor too much. In any case some 
functions will differ completely anyway, especially gettimeofday will be 
optimized differently depending on the arch/clock requirements, OTOH
introducing a common gettimeofday (that would even require a 64bit 
divide) would be a huge mistake.

bye, Roman

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-23 11:30                   ` Roman Zippel
@ 2005-08-23 18:52                     ` john stultz
  2005-08-23 20:51                     ` john stultz
  1 sibling, 0 replies; 71+ messages in thread
From: john stultz @ 2005-08-23 18:52 UTC (permalink / raw)
  To: Roman Zippel
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

On Tue, 2005-08-23 at 13:30 +0200, Roman Zippel wrote:
> Hi,
> 
> On Mon, 22 Aug 2005, john stultz wrote:
> 
> > The reason why we calculate the interval_length in the continuous
> > timesource case is because we are not assuming anything about the
> > frequency that the timekeeping_periodic_hook() is called.
> 
> The problem with your patch is that it doesn't allow making such 
> assumptions.
> Anyway, it's rather simple, if you want to update the time asynchronously:
> 
> 	cycle_offset = get_cycles() - last_update;
> 
> 	while (cycle_offset >= update_cycles) {
> 		cycle_offset -= update_cycles;
> 		last_update += update_cycles;
> 		// at init: system_update = update_cycles * mult;
> 		system_time += system_update;
> 		xtime += [tick_nsec, time_adj];
> 	}
> 
> 	error = system_time - (xtime.tv_nsec << shift);
> 
> 	if (abs(error) > update_cycles/2) {
> 		mult_adj = (error +- update_cycles/2) / update_cycles;
> 		mult += mult_adj;
> 		system_update += mult_adj * update_cycles;
> 		system_time -= mult_adj * cycle_offset;
> 		error -= mult_adj * cycle_offset;
> 	}
> 
> 	if (xtime.tv_nsec + (error >> shift) > NSEC_PER_SEC) {
> 		system_time -= NSEC_PER_SEC << shift;
> 		second_overflow();
> 	}


AH! Ok, now I get it. Forgive me for being so dense, but code is just so
much more concrete and understandable. Let me take a swing at
integrating some of this idea into my code and then we can go around
again. :)


> The last one may become a bit of a challenge to keep as much as possible 
> code common without abusing the preprocessor too much. In any case some 
> functions will differ completely anyway, especially gettimeofday will be 
> optimized differently depending on the arch/clock requirements, OTOH
> introducing a common gettimeofday (that would even require a 64bit 
> divide) would be a huge mistake.

I'd always want to allow for arch specific implementations, but there
are many cases where the code is doing the exact same thing, so I'd like
to at least consolidate those users. No divides in the hot-path are
necessary.

Thanks again for the review and patience. I really do appreciate it.

thanks
-john


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-23 11:30                   ` Roman Zippel
  2005-08-23 18:52                     ` john stultz
@ 2005-08-23 20:51                     ` john stultz
  2005-08-23 21:34                       ` Roman Zippel
  1 sibling, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-23 20:51 UTC (permalink / raw)
  To: Roman Zippel
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

On Tue, 2005-08-23 at 13:30 +0200, Roman Zippel wrote:
> On Mon, 22 Aug 2005, john stultz wrote:
> 
> > The reason why we calculate the interval_length in the continuous
> > timesource case is because we are not assuming anything about the
> > frequency that the timekeeping_periodic_hook() is called.
> 
> The problem with your patch is that it doesn't allow making such 
> assumptions.
> Anyway, it's rather simple, if you want to update the time asynchronously:
> 
> 	cycle_offset = get_cycles() - last_update;
> 
> 	while (cycle_offset >= update_cycles) {
> 		cycle_offset -= update_cycles;
> 		last_update += update_cycles;
> 		// at init: system_update = update_cycles * mult;
> 		system_time += system_update;
> 		xtime += [tick_nsec, time_adj];
> 	}

Hmm. An issue cropped up when I started working on this: It seems its
prone to time inconsistencies. 

One of the bug issues with my work is that we consistently accumulate
time in the exact same manner that we use it when calculating
gettimeofday.

That is:
	gettimeofday():
		xtime + cyc2ns(timesource, ntp_adj, cycle_delta)

	periodic_hook():
		interval = cyc2ns(timesource, ntp_adj, cycle_delta)
		xtime += interval
		...

Since we accumulate the entire interval using the same ntp_adjustment,
we ensure that time will not go briefly backwards around a call to
periodic_hook().

In the case above, you're accumulating in fixed cycle intervals. This
does avoid having to do the mult/shift combo each interrupt, however
since you do not accumulate the entire interval, and there is some
sub-tick remainder in cycle_offset. We have to ensure that that sub-tick
remainder is accumulated at the next interrupt using the same ntp
adjustment it would use in a call to gettimeofday() just prior to this
interrupt.

Not yet sure how to get around that issue. I'll keep working on it, and
maybe you might be able to shed some light on it?

thanks
-john

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-23 20:51                     ` john stultz
@ 2005-08-23 21:34                       ` Roman Zippel
  2005-08-23 23:14                         ` john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: Roman Zippel @ 2005-08-23 21:34 UTC (permalink / raw)
  To: john stultz
  Cc: lkml, George Anzinger, frank, Anton Blanchard, benh,
	Nishanth Aravamudan, Ulrich Windl

Hi,

On Tue, 23 Aug 2005, john stultz wrote:

> In the case above, you're accumulating in fixed cycle intervals. This
> does avoid having to do the mult/shift combo each interrupt, however
> since you do not accumulate the entire interval, and there is some
> sub-tick remainder in cycle_offset. We have to ensure that that sub-tick
> remainder is accumulated at the next interrupt using the same ntp
> adjustment it would use in a call to gettimeofday() just prior to this
> interrupt.

Look closer and you'll notice that the cycle_offset remainder isn't lost. :)

bye, Roman

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-23 21:34                       ` Roman Zippel
@ 2005-08-23 23:14                         ` john stultz
  2005-08-23 23:54                           ` Roman Zippel
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-23 23:14 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ulrich Windl, Nishanth Aravamudan, benh, Anton Blanchard, frank,
	George Anzinger, lkml

On Tue, 2005-08-23 at 23:34 +0200, Roman Zippel wrote: 
> Hi,
> 
> On Tue, 23 Aug 2005, john stultz wrote:
> 
> > In the case above, you're accumulating in fixed cycle intervals. This
> > does avoid having to do the mult/shift combo each interrupt, however
> > since you do not accumulate the entire interval, and there is some
> > sub-tick remainder in cycle_offset. We have to ensure that that sub-tick
> > remainder is accumulated at the next interrupt using the same ntp
> > adjustment it would use in a call to gettimeofday() just prior to this
> > interrupt.
> 
> Look closer and you'll notice that the cycle_offset remainder isn't lost. :)

I'm not saying its lost, but that it is accumulated differently then how
it would be used in gettimeofday().

So a somewhat lengthy and exaggerated example with a clock that has 2
cycles per milisecond, HZ=1000 and a 0ppm adjustment to begin with.

I'm assuming gettimeofday()/clock_gettime() looks something like:
   xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift

Using (1,000,000, 1) for the (mult,shift) pair.

So first, the easy case:

time(0): gettimeofday: 0 + (0 - 0)*(1M+0)>>1 = 0 ns
time(1): gettimeofday: 0 + (1 - 0)*(1M+0)>>1 = 0.5M ns
time(2): gettimeofday: 0 + (2 - 0)*(1M+0)>>1 = 1M ns
time(2): interrupt
time(2): gettimeofday: 1M + (2 - 2)*(1M+0)>>1 = 1M ns
time(3): gettimeofday: 1M + (3 - 2)*(1M+0)>>1 = 1.5M ns


Now, lets look at how we deal with ticks that arrive late:
time(6): gettimeofday: 2M + (6 - 4)*(1M+0)>>1 = 3M ns
time(7): gettimeofday: 2M + (7 - 4)*(1M+0)>>1 = 3.5M ns
time(7): interrupt
time(7): gettimeofday: 3M + (7 - 6)*(1M+0)>>1 = 3.5M ns
time(8): gettimeofday: 3M + (8 - 6)*(1M+0)>>1 = 4M ns

So everything looks ducky. Now on to when we make NTP adjustments.

time(11): gettimeofday: 5M + (11 - 10)*(1M+0)>>1 = 5.5M ns
time(12): gettimeofday: 5M + (12 - 10)*(1M+0)>>1 = 6M ns
time(12): interrupt (set ntp_adj = 1000 ~= 500ppm)
time(12): gettimeofday: 6M + (12 - 12)*(1M+ 1000)>>1 = 6M ns
time(13): gettimeofday: 6M + (13 - 12)*(1M+ 1000)>>1 = 6,500,500 ns
time(14): gettimeofday: 6M + (14 - 12)*(1M+ 1000)>>1 = 7,001,000 ns

Still doing fine. Now lets look at doing NTP adjustments while ticks
arrive late:

time(15): gettimeofday: 7,001k + (15 - 14)*(1M+ 1000)>>1 = 7,501,500 ns
time(16): gettimeofday: 7,001k + (16 - 14)*(1M+ 1000)>>1 = 8,002,000 ns
time(17): gettimeofday: 7,001k + (17 - 14)*(1M+ 1000)>>1 = 8,502,500 ns
time(17): interrupt, (set ntp_adj = 0ppm)
time(17): gettimeofday: 8,002k + (17 - 16)*(1M+ 0)>>1 = 8,502,000 ns


And bang, we have a 500 ns time inconsistency!

And that was only with a tick arriving 1/2 a tick late. I've dealt with
systems that on occasion miss 30ms worth of ticks due to SMI crazyness.

This is why I accumulate the entire interval with NTP adjustments
consistently between the timer tick and gettimeofday. 

Right now I'm not sure how to work around this issue with your proposal,
but let me know if you have an idea or I'm missing some other subtlety.

thanks
-john





^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-23 23:14                         ` john stultz
@ 2005-08-23 23:54                           ` Roman Zippel
  2005-08-24  0:29                             ` George Anzinger
                                               ` (2 more replies)
  0 siblings, 3 replies; 71+ messages in thread
From: Roman Zippel @ 2005-08-23 23:54 UTC (permalink / raw)
  To: john stultz
  Cc: Ulrich Windl, Nishanth Aravamudan, benh, Anton Blanchard, frank,
	George Anzinger, lkml

Hi,

On Tue, 23 Aug 2005, john stultz wrote:

> I'm assuming gettimeofday()/clock_gettime() looks something like:
>    xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift

Where did you get the ntp_adj from? It's not in my example.
gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult +
error) >> shift". The difference between system time and reference 
time is really important. gettimeofday() returns the system time, NTP 
controls the reference time and these two are synchronized regularly.
I didn't see that anywhere in your example.

bye, Roman

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-23 23:54                           ` Roman Zippel
@ 2005-08-24  0:29                             ` George Anzinger
  2005-08-24 20:36                               ` john stultz
  2005-08-24  6:34                             ` Ulrich Windl
  2005-08-24 18:00                             ` john stultz
  2 siblings, 1 reply; 71+ messages in thread
From: George Anzinger @ 2005-08-24  0:29 UTC (permalink / raw)
  To: Roman Zippel
  Cc: john stultz, Ulrich Windl, Nishanth Aravamudan, benh,
	Anton Blanchard, frank, lkml

Roman Zippel wrote:
> Hi,
> 
> On Tue, 23 Aug 2005, john stultz wrote:
> 
> 
>>I'm assuming gettimeofday()/clock_gettime() looks something like:
>>   xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift
> 
> 
> Where did you get the ntp_adj from? It's not in my example.
> gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult +
> error) >> shift". The difference between system time and reference 
> time is really important. gettimeofday() returns the system time, NTP 
> controls the reference time and these two are synchronized regularly.
> I didn't see that anywhere in your example.
> 
John,
If I read your example right, the problem is when the NTP adjustment 
changes while the two clocks are out of sync (because of a late tick). 
It would appear that gettimeofday would need to know that the NTP 
adjustment is changing  (and to what).  It would also appear that this 
is known by the ntp code and could be made available to gettimeofday. 
If it is changing due to an NTP call, that system call, itself, 
should/must force synchronization.  So the only case gettimeofday needs 
to worry/know about is that an adjustment is to change at time X to 
value Y.  Also, me thinks there is only one such change that can be 
present at any given time.

Hope this helps...
-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-23 23:54                           ` Roman Zippel
  2005-08-24  0:29                             ` George Anzinger
@ 2005-08-24  6:34                             ` Ulrich Windl
  2005-08-24  9:47                               ` Roman Zippel
  2005-08-24 18:00                             ` john stultz
  2 siblings, 1 reply; 71+ messages in thread
From: Ulrich Windl @ 2005-08-24  6:34 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Nishanth Aravamudan, benh, Anton Blanchard, frank,
	George Anzinger, lkml

On 24 Aug 2005 at 1:54, Roman Zippel wrote:

[...]
> error) >> shift". The difference between system time and reference 
> time is really important. gettimeofday() returns the system time, NTP 
> controls the reference time and these two are synchronized regularly.
[...]

Roman,

I'm having a problem with your wording: NTP _does_ control the "system time" 
(system clock), because it's the only clock it can use. The "reference time" is 
usually remote or elsewhere (multiple sources). Local NTP does not control the 
remote reference time(s).

Regards,
Ulrich


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-24  6:34                             ` Ulrich Windl
@ 2005-08-24  9:47                               ` Roman Zippel
  0 siblings, 0 replies; 71+ messages in thread
From: Roman Zippel @ 2005-08-24  9:47 UTC (permalink / raw)
  To: Ulrich Windl
  Cc: Nishanth Aravamudan, benh, Anton Blanchard, frank,
	George Anzinger, lkml

Hi,

On Wed, 24 Aug 2005, Ulrich Windl wrote:

> I'm having a problem with your wording: NTP _does_ control the "system time" 
> (system clock), because it's the only clock it can use. The "reference time" is 
> usually remote or elsewhere (multiple sources). Local NTP does not control the 
> remote reference time(s).

I'm open to better wording suggestions, but this is from the kernel 
perspective and ntp daemon has as much control over the kernel time as the 
remote server has control over the ntp daemon (and basically also the 
other way around). Every entity has its own idea of time and uses 
something else as reference. The ntp daemon uses the remote server as 
reference time and the kernel gets from a ntp daemon a reference time. The 
kernel can now either just jump in regular intervals to that reference 
time or it modifies the speed of the system time to keep close to it.
It's really the kernel who modifies the system clock based on the 
parameters from the ntp daemon.

bye, Roman

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-23 23:54                           ` Roman Zippel
  2005-08-24  0:29                             ` George Anzinger
  2005-08-24  6:34                             ` Ulrich Windl
@ 2005-08-24 18:00                             ` john stultz
  2005-08-24 18:48                               ` Roman Zippel
  2 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-24 18:00 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ulrich Windl, Nishanth Aravamudan, benh, Anton Blanchard, frank,
	George Anzinger, lkml

On Wed, 2005-08-24 at 01:54 +0200, Roman Zippel wrote:
> Hi,
> 
> On Tue, 23 Aug 2005, john stultz wrote:
> 
> > I'm assuming gettimeofday()/clock_gettime() looks something like:
> >    xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift
> 
> Where did you get the ntp_adj from? It's not in my example.
> gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult +
> error) >> shift". The difference between system time and reference 
> time is really important. gettimeofday() returns the system time, NTP 
> controls the reference time and these two are synchronized regularly.
> I didn't see that anywhere in your example.

Ok, so then to clarify the above (as you mention gettimeofday uses
system_time), would your gettimeofday look something like:

gettiemofday():
	return (system_time + (cycle_offset * mult) + error)>> shift

?

thanks
-john


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-24 18:00                             ` john stultz
@ 2005-08-24 18:48                               ` Roman Zippel
  2005-08-24 19:15                                 ` john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: Roman Zippel @ 2005-08-24 18:48 UTC (permalink / raw)
  To: john stultz
  Cc: Ulrich Windl, Nishanth Aravamudan, benh, Anton Blanchard, frank,
	George Anzinger, lkml

Hi,

On Wed, 24 Aug 2005, john stultz wrote:

> Ok, so then to clarify the above (as you mention gettimeofday uses
> system_time), would your gettimeofday look something like:
> 
> gettiemofday():
> 	return (system_time + (cycle_offset * mult) + error)>> shift
> 
> ?

No.

	reference_time = xtime;
	system_time = xtime + error >> shift;
	gettimeofday = system_time + (cycle_offset * mult) >> shift;

bye, Roman

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-24 18:48                               ` Roman Zippel
@ 2005-08-24 19:15                                 ` john stultz
  2005-08-24 19:49                                   ` Roman Zippel
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-24 19:15 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ulrich Windl, Nishanth Aravamudan, benh, Anton Blanchard, frank,
	George Anzinger, lkml

On Wed, 2005-08-24 at 20:48 +0200, Roman Zippel wrote:
> Hi,
> 
> On Wed, 24 Aug 2005, john stultz wrote:
> 
> > Ok, so then to clarify the above (as you mention gettimeofday uses
> > system_time), would your gettimeofday look something like:
> > 
> > gettiemofday():
> > 	return (system_time + (cycle_offset * mult) + error)>> shift
> > 
> > ?
> 
> No.
> 
> 	reference_time = xtime;
> 	system_time = xtime + error >> shift;
> 	gettimeofday = system_time + (cycle_offset * mult) >> shift;

Eh? In your example code from before you look to be keeping the
system_time and error values in shifted nsec units.

from your example:
>		// at init: system_update = update_cycles * mult;
> 		system_time += system_update;

and:
> 	error = system_time - (xtime.tv_nsec << shift);

This doesn't seem to make sense with the above.  Could you clarify?

thanks
-john


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-24 19:15                                 ` john stultz
@ 2005-08-24 19:49                                   ` Roman Zippel
  2005-08-24 22:40                                     ` john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: Roman Zippel @ 2005-08-24 19:49 UTC (permalink / raw)
  To: john stultz
  Cc: Ulrich Windl, Nishanth Aravamudan, benh, Anton Blanchard, frank,
	George Anzinger, lkml

Hi,

On Wed, 24 Aug 2005, john stultz wrote:

> from your example:
> >		// at init: system_update = update_cycles * mult;
> > 		system_time += system_update;
> 
> and:
> > 	error = system_time - (xtime.tv_nsec << shift);
> 
> This doesn't seem to make sense with the above.  Could you clarify?

The example here doesn't keep the complete system time, just enough to 
compute the difference.

bye, Roman

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-24  0:29                             ` George Anzinger
@ 2005-08-24 20:36                               ` john stultz
  2005-08-24 23:46                                 ` George Anzinger
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-24 20:36 UTC (permalink / raw)
  To: george
  Cc: Roman Zippel, Ulrich Windl, Nishanth Aravamudan, benh,
	Anton Blanchard, frank, lkml

On Tue, 2005-08-23 at 17:29 -0700, George Anzinger wrote:
> Roman Zippel wrote:
> > Hi,
> > 
> > On Tue, 23 Aug 2005, john stultz wrote:
> > 
> > 
> >>I'm assuming gettimeofday()/clock_gettime() looks something like:
> >>   xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift
> > 
> > 
> > Where did you get the ntp_adj from? It's not in my example.
> > gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult +
> > error) >> shift". The difference between system time and reference 
> > time is really important. gettimeofday() returns the system time, NTP 
> > controls the reference time and these two are synchronized regularly.
> > I didn't see that anywhere in your example.
> > 

> If I read your example right, the problem is when the NTP adjustment 
> changes while the two clocks are out of sync (because of a late tick). 

Not quite. The issue that I'm trying to describe is that if, we
inconsistently calculate time intervals in gettimeofday and the timer
interrupt, we have the possibility for time inconsistencies.

The trivial example using the current code would be something like:

Again with my 2 cyc per tick clock, HZ=1000.

gettimeofday():
	xtime + offset_ns

timer_interrupt:
	xtime += tick_length + ntp_adj
	offset_ns = 0

0:  gettimeofday:  0 + 0 = 0 ns
1:  gettimeofday:  0 + 500k ns = 500k ns
2:  gettimeofday:  0 + 1M ns = 1M ns
2:  timer_interrupt:  
2:  gettimeofday:  1M ns + 0 ns = 1M ns
3:  gettimeofday:  1M ns + 500k ns = 1.5M ns
4:  gettimeofday:  1M ns + 1M ns = 2 ns
4:  timer_interrupt (using -500ppm adjustment)
4:  gettimeofday:  1,999,500 ns + 0 ns = 1,999,500 ns



> It would appear that gettimeofday would need to know that the NTP 
> adjustment is changing  (and to what).  It would also appear that this 
> is known by the ntp code and could be made available to gettimeofday. 
> If it is changing due to an NTP call, that system call, itself, 
> should/must force synchronization.  So the only case gettimeofday needs 
> to worry/know about is that an adjustment is to change at time X to 
> value Y.  Also, me thinks there is only one such change that can be 
> present at any given time.

Well, in many arches gettimeofday() works around the above issue by
capping the offset_ns value as such:

gettimeofday:
	xtime + min(offset_ns, tick_len + ntp_adj)

The problem with this is that when we have lost or late ticks, or if we
are using dynamic ticks you have granularity problems.


thanks
-john





^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-24 19:49                                   ` Roman Zippel
@ 2005-08-24 22:40                                     ` john stultz
  2005-08-25  0:45                                       ` Roman Zippel
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-24 22:40 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ulrich Windl, Nishanth Aravamudan, benh, Anton Blanchard, frank,
	George Anzinger, lkml

On Wed, 2005-08-24 at 21:49 +0200, Roman Zippel wrote:
> On Wed, 24 Aug 2005, john stultz wrote:
> 
> > from your example:
> > >		// at init: system_update = update_cycles * mult;
> > > 		system_time += system_update;
> > 
> > and:
> > > 	error = system_time - (xtime.tv_nsec << shift);
> > 
> > This doesn't seem to make sense with the above.  Could you clarify?
> 
> The example here doesn't keep the complete system time, just enough to 
> compute the difference.

Hey Roman, 

Ok, well, I'm still at a loss for understanding how this avoids my
concern about time inconsistencies. However, I don't want to burn any
more of your patience explaining it, so in the hopes making some
productive outcome, I'm going to take a step back, pull the most trivial
and uncontroversial cleanups and fixes in my patches and try to send
them to Andrew one by one.

Hopefully that will give me a chance to spend some time and understand
your suggestions (or maybe allow someone else to express your
suggestions differently) and think of alternate solutions without
feeling like I'm constantly running into walls.

Again, I really do appreciate the time you've spent giving me feedback.

thanks
-john

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-24 20:36                               ` john stultz
@ 2005-08-24 23:46                                 ` George Anzinger
  2005-08-25  0:42                                   ` john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: George Anzinger @ 2005-08-24 23:46 UTC (permalink / raw)
  To: john stultz
  Cc: Roman Zippel, Ulrich Windl, Nishanth Aravamudan, benh,
	Anton Blanchard, frank, lkml

john stultz wrote:
> On Tue, 2005-08-23 at 17:29 -0700, George Anzinger wrote:
> 
>>Roman Zippel wrote:
>>
>>>Hi,
>>>
>>>On Tue, 23 Aug 2005, john stultz wrote:
>>>
>>>
>>>
>>>>I'm assuming gettimeofday()/clock_gettime() looks something like:
>>>>  xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift
>>>
>>>
>>>Where did you get the ntp_adj from? It's not in my example.
>>>gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult +
>>>error) >> shift". The difference between system time and reference 
>>>time is really important. gettimeofday() returns the system time, NTP 
>>>controls the reference time and these two are synchronized regularly.
>>>I didn't see that anywhere in your example.
>>>
> 
> 
>>If I read your example right, the problem is when the NTP adjustment 
>>changes while the two clocks are out of sync (because of a late tick). 
> 
> 
> Not quite. The issue that I'm trying to describe is that if, we
> inconsistently calculate time intervals in gettimeofday and the timer
> interrupt, we have the possibility for time inconsistencies.
> 
> The trivial example using the current code would be something like:
> 
> Again with my 2 cyc per tick clock, HZ=1000.
> 
> gettimeofday():
> 	xtime + offset_ns
> 
> timer_interrupt:
> 	xtime += tick_length + ntp_adj
> 	offset_ns = 0
> 
> 0:  gettimeofday:  0 + 0 = 0 ns
> 1:  gettimeofday:  0 + 500k ns = 500k ns
> 2:  gettimeofday:  0 + 1M ns = 1M ns
> 2:  timer_interrupt:  
> 2:  gettimeofday:  1M ns + 0 ns = 1M ns
> 3:  gettimeofday:  1M ns + 500k ns = 1.5M ns
> 4:  gettimeofday:  1M ns + 1M ns = 2 ns
> 4:  timer_interrupt (using -500ppm adjustment)
> 4:  gettimeofday:  1,999,500 ns + 0 ns = 1,999,500 ns
> 
At point 4 you are introducing a NEW ntp adjustment.  This, I submit, 
needs to actually have been introduced to the system prior to the 
interrupt at point 2 with the first xtime change at point 4.  However, 
gettimeofday() should be aware of it from the interrupt at point 2 and 
be doing corrections from that time forward.  Thus when the point 4 
interrutp happens xtime will be the same at the gettimeofday a ns earlier.

Likewise, gettimeofday() needs to know when to stop apply the correction 
so that if a tick is late, it will apply the correction only for those 
times that it was needed.  This, could be done by figuring the offset 
thusly:

offset = (offset from last tick to end of ntp period * ntp_adj1) + 
(offset from end of ntp period to now)

I suppose it is possible that the latter part of the offset is also 
under a different ntp correction which would mean a "* ntp_adj2" is 
needed.  I would argue that only two terms are needed here regardless of 
how late a tick is.  This is because, I would expect the ntp system call 
to sync the two clocks.  This means in your example, the ntp call would 
have been made at, or prior to the timer interrupt at 2 and this is the 
same edge that gettimeofday is to used to start applying the correction.


> 
> 
> 
>>It would appear that gettimeofday would need to know that the NTP 
>>adjustment is changing  (and to what).  It would also appear that this 
>>is known by the ntp code and could be made available to gettimeofday. 
>>If it is changing due to an NTP call, that system call, itself, 
>>should/must force synchronization.  So the only case gettimeofday needs 
>>to worry/know about is that an adjustment is to change at time X to 
>>value Y.  Also, me thinks there is only one such change that can be 
>>present at any given time.
> 
> 
> Well, in many arches gettimeofday() works around the above issue by
> capping the offset_ns value as such:

I think this may have been done with only usec gettimeofday.  Now that 
we have clock_gettime() returning nsec, we need to be a bit more careful.
> 
> gettimeofday:
> 	xtime + min(offset_ns, tick_len + ntp_adj)
> 
> The problem with this is that when we have lost or late ticks, or if we
> are using dynamic ticks you have granularity problems.
> 
>
-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-24 23:46                                 ` George Anzinger
@ 2005-08-25  0:42                                   ` john stultz
  2005-08-25  1:44                                     ` George Anzinger
  0 siblings, 1 reply; 71+ messages in thread
From: john stultz @ 2005-08-25  0:42 UTC (permalink / raw)
  To: george
  Cc: Roman Zippel, Ulrich Windl, Nishanth Aravamudan, benh,
	Anton Blanchard, frank, lkml

On Wed, 2005-08-24 at 16:46 -0700, George Anzinger wrote:
> john stultz wrote:
> > On Tue, 2005-08-23 at 17:29 -0700, George Anzinger wrote:
> > 
> >>Roman Zippel wrote:
> >>
> >>>Hi,
> >>>
> >>>On Tue, 23 Aug 2005, john stultz wrote:
> >>>
> >>>
> >>>
> >>>>I'm assuming gettimeofday()/clock_gettime() looks something like:
> >>>>  xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift
> >>>
> >>>
> >>>Where did you get the ntp_adj from? It's not in my example.
> >>>gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult +
> >>>error) >> shift". The difference between system time and reference 
> >>>time is really important. gettimeofday() returns the system time, NTP 
> >>>controls the reference time and these two are synchronized regularly.
> >>>I didn't see that anywhere in your example.
> >>>
> > 
> > 
> >>If I read your example right, the problem is when the NTP adjustment 
> >>changes while the two clocks are out of sync (because of a late tick). 
> > 
> > 
> > Not quite. The issue that I'm trying to describe is that if, we
> > inconsistently calculate time intervals in gettimeofday and the timer
> > interrupt, we have the possibility for time inconsistencies.
> > 
> > The trivial example using the current code would be something like:
> > 
> > Again with my 2 cyc per tick clock, HZ=1000.
> > 
> > gettimeofday():
> > 	xtime + offset_ns
> > 
> > timer_interrupt:
> > 	xtime += tick_length + ntp_adj
> > 	offset_ns = 0
> > 
> > 0:  gettimeofday:  0 + 0 = 0 ns
> > 1:  gettimeofday:  0 + 500k ns = 500k ns
> > 2:  gettimeofday:  0 + 1M ns = 1M ns
> > 2:  timer_interrupt:  
> > 2:  gettimeofday:  1M ns + 0 ns = 1M ns
> > 3:  gettimeofday:  1M ns + 500k ns = 1.5M ns
> > 4:  gettimeofday:  1M ns + 1M ns = 2 ns
> > 4:  timer_interrupt (using -500ppm adjustment)
> > 4:  gettimeofday:  1,999,500 ns + 0 ns = 1,999,500 ns
> > 
> At point 4 you are introducing a NEW ntp adjustment.  This, I submit, 
> needs to actually have been introduced to the system prior to the 
> interrupt at point 2 with the first xtime change at point 4.  However, 
> gettimeofday() should be aware of it from the interrupt at point 2 and 
> be doing corrections from that time forward.  Thus when the point 4 
> interrutp happens xtime will be the same at the gettimeofday a ns earlier.

Yes, clearly a forward knowledge of the NTP adjustment is necessary for
gettimeofday(), because after the NTP adjustment has been accumulated
into xtime, there's nothing left for gettimeofday to adjust (its already
been applied). :)


> Likewise, gettimeofday() needs to know when to stop apply the correction 
> so that if a tick is late, it will apply the correction only for those 
> times that it was needed.  This, could be done by figuring the offset 
> thusly:
> 
> offset = (offset from last tick to end of ntp period * ntp_adj1) + 
> (offset from end of ntp period to now)

Well, in my example, the ntp_adjustment is a fixed nanosecond offset, so
it would be added to the nanosecond offset from the last tick (which is
how the current code works). If you are doing scaling (as you have in
the equation above), then the problem goes away, since you can apply the
adjustment consistently through any interval.

> I suppose it is possible that the latter part of the offset is also 
> under a different ntp correction which would mean a "* ntp_adj2" is 
> needed.  

Ok, so your forcing gettimeofday to be interval aware, so its applying
different fixed NTP adjustments to different chunks of the current
interval. The issue of course is if you're using fixed adjustments, is
that you have to have n ntp adjustments for n intervals, or you have to
apply the same ntp adjustment to multiple intervals. 


> I would argue that only two terms are needed here regardless of 
> how late a tick is.  This is because, I would expect the ntp system call 
> to sync the two clocks.  This means in your example, the ntp call would 
> have been made at, or prior to the timer interrupt at 2 and this is the 
> same edge that gettimeofday is to used to start applying the correction.

If you argue that we only need two adjustments, why not argue for only
one? You're saying have one adjustment that you apply for the first
tick's worth of time, and a second adjustment that you apply for the
following N ticks' worth of time in the interval. Why the odd base
case? 


> >>It would appear that gettimeofday would need to know that the NTP 
> >>adjustment is changing  (and to what).  It would also appear that this 
> >>is known by the ntp code and could be made available to gettimeofday. 
> >>If it is changing due to an NTP call, that system call, itself, 
> >>should/must force synchronization.  So the only case gettimeofday needs 
> >>to worry/know about is that an adjustment is to change at time X to 
> >>value Y.  Also, me thinks there is only one such change that can be 
> >>present at any given time.
> > 
> > 
> > Well, in many arches gettimeofday() works around the above issue by
> > capping the offset_ns value as such:
> 
> I think this may have been done with only usec gettimeofday.  Now that 
> we have clock_gettime() returning nsec, we need to be a bit more careful.

Indeed.

thanks
-john


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-24 22:40                                     ` john stultz
@ 2005-08-25  0:45                                       ` Roman Zippel
  2005-08-25 18:08                                         ` john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: Roman Zippel @ 2005-08-25  0:45 UTC (permalink / raw)
  To: john stultz
  Cc: Ulrich Windl, Nishanth Aravamudan, benh, Anton Blanchard, frank,
	George Anzinger, lkml

Hi,

On Wed, 24 Aug 2005, john stultz wrote:

> Ok, well, I'm still at a loss for understanding how this avoids my
> concern about time inconsistencies.

Let's take a simple example to demonstrate the difference between system 
time and reference time.
NTP tells us to update the reference time by 1000 units every tick and a 
single tick consists of 123 cycles, so the initial multiplier is 8. This 
means after 1 tick the system time is 984 and off by -16:

time (ticks)	reference time	system time	mult	error
0		0		0		8	0
1		1000		984		8	-16
2		2000		1968		8	-32
3		3000		2952		8	-48
4		4000		3936		9	-64

the error is now big enough, so we speed up system time:

5		5000		5043		9	43
6		6000		6150		8	150

and slow it down again:

7		7000		7134		8	134
8		8000		8118		8	118
9		9000		9102		8	102
10		10000		10086		8	86
11		11000		11070		8	70
12		12000		12054		8	54
13		13000		13038		8	38
14		14000		14022		8	22
15		15000		15006		8	6
16		16000		15990		8	-10
17		17000		16974		8	-26
18		18000		17958		8	-42
19		19000		18942		8	-58
20		20000		19926		8	-74

let's assume we're late with the update by 10 cycles 
(gettimeofday=19926+10*8=20006), so a change to the mult also requires a 
adjustment of the system time:

20+10		20000		19916		9	-84

so gettimeofday=19916+10*9=20006

21		21000		21023		9	23
22		22000		22130		8	130

now add a single adjustment of 500 to the reference time:

23		23500		23114		11	-386
24		24500		24467		8	-33

A detail which is missing now in my example code is that we actually 
should look ahead to the next update, so that multiplier is immediately 
adjusted and the error above would never exceed 123/2 unless an update is 
delayed.

It's really not that difficult :), it's just important to understand the 
difference between reference time and system time. All the NTP adjustments 
are done to the reference time and we manipulate the speed of the system 
clock to keep it close. The latter has _nothing_ to do with NTP so I don't 
want to see anything called like ntp_adj there.

bye, Roman

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-25  0:42                                   ` john stultz
@ 2005-08-25  1:44                                     ` George Anzinger
  2005-08-25  2:13                                       ` john stultz
  0 siblings, 1 reply; 71+ messages in thread
From: George Anzinger @ 2005-08-25  1:44 UTC (permalink / raw)
  To: john stultz
  Cc: Roman Zippel, Ulrich Windl, Nishanth Aravamudan, benh,
	Anton Blanchard, frank, lkml

john stultz wrote:
> On Wed, 2005-08-24 at 16:46 -0700, George Anzinger wrote:
> 
>>john stultz wrote:
>>
>>>On Tue, 2005-08-23 at 17:29 -0700, George Anzinger wrote:
>>>
>>>
>>>>Roman Zippel wrote:
>>>>
>>>>
>>>>>Hi,
>>>>>
>>>>>On Tue, 23 Aug 2005, john stultz wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>I'm assuming gettimeofday()/clock_gettime() looks something like:
>>>>>> xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift
>>>>>
>>>>>
>>>>>Where did you get the ntp_adj from? It's not in my example.
>>>>>gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult +
>>>>>error) >> shift". The difference between system time and reference 
>>>>>time is really important. gettimeofday() returns the system time, NTP 
>>>>>controls the reference time and these two are synchronized regularly.
>>>>>I didn't see that anywhere in your example.
>>>>>
>>>
>>>
>>>>If I read your example right, the problem is when the NTP adjustment 
>>>>changes while the two clocks are out of sync (because of a late tick). 
>>>
>>>
>>>Not quite. The issue that I'm trying to describe is that if, we
>>>inconsistently calculate time intervals in gettimeofday and the timer
>>>interrupt, we have the possibility for time inconsistencies.
>>>
>>>The trivial example using the current code would be something like:
>>>
>>>Again with my 2 cyc per tick clock, HZ=1000.
>>>
>>>gettimeofday():
>>>	xtime + offset_ns
>>>
>>>timer_interrupt:
>>>	xtime += tick_length + ntp_adj
>>>	offset_ns = 0
>>>
>>>0:  gettimeofday:  0 + 0 = 0 ns
>>>1:  gettimeofday:  0 + 500k ns = 500k ns
>>>2:  gettimeofday:  0 + 1M ns = 1M ns
>>>2:  timer_interrupt:  
>>>2:  gettimeofday:  1M ns + 0 ns = 1M ns
>>>3:  gettimeofday:  1M ns + 500k ns = 1.5M ns
>>>4:  gettimeofday:  1M ns + 1M ns = 2 ns
>>>4:  timer_interrupt (using -500ppm adjustment)
>>>4:  gettimeofday:  1,999,500 ns + 0 ns = 1,999,500 ns
>>>
>>
>>At point 4 you are introducing a NEW ntp adjustment.  This, I submit, 
>>needs to actually have been introduced to the system prior to the 
>>interrupt at point 2 with the first xtime change at point 4.  However, 
>>gettimeofday() should be aware of it from the interrupt at point 2 and 
>>be doing corrections from that time forward.  Thus when the point 4 
>>interrutp happens xtime will be the same at the gettimeofday a ns earlier.
> 
> 
> Yes, clearly a forward knowledge of the NTP adjustment is necessary for
> gettimeofday(), because after the NTP adjustment has been accumulated
> into xtime, there's nothing left for gettimeofday to adjust (its already
> been applied). :)
> 
> 
> 
>>Likewise, gettimeofday() needs to know when to stop apply the correction 
>>so that if a tick is late, it will apply the correction only for those 
>>times that it was needed.  This, could be done by figuring the offset 
>>thusly:
>>
>>offset = (offset from last tick to end of ntp period * ntp_adj1) + 
>>(offset from end of ntp period to now)
> 
> 
> Well, in my example, the ntp_adjustment is a fixed nanosecond offset, so
> it would be added to the nanosecond offset from the last tick (which is
> how the current code works). If you are doing scaling (as you have in
> the equation above), then the problem goes away, since you can apply the
> adjustment consistently through any interval.

Until the end of the correction time...
> 
> 
>>I suppose it is possible that the latter part of the offset is also 
>>under a different ntp correction which would mean a "* ntp_adj2" is 
>>needed.  
> 
> 
> Ok, so your forcing gettimeofday to be interval aware, so its applying
> different fixed NTP adjustments to different chunks of the current
> interval. The issue of course is if you're using fixed adjustments, is
> that you have to have n ntp adjustments for n intervals, or you have to
> apply the same ntp adjustment to multiple intervals. 

Uh, are you saying that one ntpd call can set up several different 
adjustments?  I was assuming that any given call would set up either a 
fixed adjustment for ever or a fixed adjustment to be applied for a 
fixed number of ticks (or until so much correcting was done, which, in 
the end is the same thing at this point in the code).

If ntpd has to come back to change the adjustment, I am assuming that 
some kernel action can be taken at that time to sync the xtime clock and 
the gettimeofday reading of it.  I.e. we would only have to keep track 
of one adjustment with a possible pre specified end time.
> 
> 
> 
>>I would argue that only two terms are needed here regardless of 
>>how late a tick is.  This is because, I would expect the ntp system call 
>>to sync the two clocks.  This means in your example, the ntp call would 
>>have been made at, or prior to the timer interrupt at 2 and this is the 
>>same edge that gettimeofday is to used to start applying the correction.
> 
> 
> If you argue that we only need two adjustments, why not argue for only
> one? You're saying have one adjustment that you apply for the first
> tick's worth of time, and a second adjustment that you apply for the
> following N ticks' worth of time in the interval. Why the odd base
> case? 

Correct me if I am wrong here, but I am assuming that ntpd can ask for 
an adjustment of X amount which the kernel changes into N adjustments of 
X/N amount spread over the next N ticks.  This means that, once we have 
done N ticks, we MUST stop doing the correction.  Life would be easier 
if ntpd only asked for rate changes which are to be applied until the 
next ntpd call (which, as I understand it, it can ALSO do).  Of course, 
the issue of missed ticks kicks in here and is the ONLY reason we need 
to do two corrections, i.e. we need only one ntp correction if we have 
not missed a tick.
> 
> 
> 
>>>>It would appear that gettimeofday would need to know that the NTP 
>>>>adjustment is changing  (and to what).  It would also appear that this 
>>>>is known by the ntp code and could be made available to gettimeofday. 
>>>>If it is changing due to an NTP call, that system call, itself, 
>>>>should/must force synchronization.  So the only case gettimeofday needs 
>>>>to worry/know about is that an adjustment is to change at time X to 
>>>>value Y.  Also, me thinks there is only one such change that can be 
>>>>present at any given time.
>>>
>>>
>>>Well, in many arches gettimeofday() works around the above issue by
>>>capping the offset_ns value as such:
>>
>>I think this may have been done with only usec gettimeofday.  Now that 
>>we have clock_gettime() returning nsec, we need to be a bit more careful.
> 
> 
> Indeed.
> 
> thanks
> -john
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-25  1:44                                     ` George Anzinger
@ 2005-08-25  2:13                                       ` john stultz
  0 siblings, 0 replies; 71+ messages in thread
From: john stultz @ 2005-08-25  2:13 UTC (permalink / raw)
  To: george
  Cc: Roman Zippel, Ulrich Windl, Nishanth Aravamudan, benh,
	Anton Blanchard, frank, lkml

On Wed, 2005-08-24 at 18:44 -0700, George Anzinger wrote:
> > Ok, so your forcing gettimeofday to be interval aware, so its applying
> > different fixed NTP adjustments to different chunks of the current
> > interval. The issue of course is if you're using fixed adjustments, is
> > that you have to have n ntp adjustments for n intervals, or you have to
> > apply the same ntp adjustment to multiple intervals. 
> 
> Uh, are you saying that one ntpd call can set up several different 
> adjustments?

Well, it allows for frequency adjustments, tick adjustments, and offset
adjustments in a single call or just the singleshot (adjtime)
adjustment. However it does not give multiple scaling factors for
different intervals, so you are correct there. 


>   I was assuming that any given call would set up either a 
> fixed adjustment for ever or a fixed adjustment to be applied for a 
> fixed number of ticks (or until so much correcting was done, which, in 
> the end is the same thing at this point in the code).
> 
> If ntpd has to come back to change the adjustment, I am assuming that 
> some kernel action can be taken at that time to sync the xtime clock and 
> the gettimeofday reading of it.  I.e. we would only have to keep track 
> of one adjustment with a possible pre specified end time.

Well, I guess a component of the adjustment would end at a specified
time, that's true. 


> >>I would argue that only two terms are needed here regardless of 
> >>how late a tick is.  This is because, I would expect the ntp system call 
> >>to sync the two clocks.  This means in your example, the ntp call would 
> >>have been made at, or prior to the timer interrupt at 2 and this is the 
> >>same edge that gettimeofday is to used to start applying the correction.
> > 
> > 
> > If you argue that we only need two adjustments, why not argue for only
> > one? You're saying have one adjustment that you apply for the first
> > tick's worth of time, and a second adjustment that you apply for the
> > following N ticks' worth of time in the interval. Why the odd base
> > case? 
> 
> Correct me if I am wrong here, but I am assuming that ntpd can ask for 
> an adjustment of X amount which the kernel changes into N adjustments of 
> X/N amount spread over the next N ticks.  

No, sorry, you are correct there, I was confusing things. 

It may work, and I had considered a similar idea when developing my
solution, but it seemed far too ugly and complicated. But that could
have just been my fault. :)

thanks
-john


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC - 0/9] Generic timekeeping subsystem  (v. B5)
  2005-08-25  0:45                                       ` Roman Zippel
@ 2005-08-25 18:08                                         ` john stultz
  0 siblings, 0 replies; 71+ messages in thread
From: john stultz @ 2005-08-25 18:08 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ulrich Windl, Nishanth Aravamudan, benh, Anton Blanchard, frank,
	George Anzinger, lkml

On Thu, 2005-08-25 at 02:45 +0200, Roman Zippel wrote:
> Hi,
> 
> On Wed, 24 Aug 2005, john stultz wrote:
> 
> > Ok, well, I'm still at a loss for understanding how this avoids my
> > concern about time inconsistencies.
> 
> Let's take a simple example to demonstrate the difference between system 
> time and reference time.

[snip]

> 		17000		16974		8	-26
> 18		18000		17958		8	-42
> 19		19000		18942		8	-58
> 20		20000		19926		8	-74
> 
> let's assume we're late with the update by 10 cycles 
> (gettimeofday=19926+10*8=20006), so a change to the mult also requires a 
> adjustment of the system time:
> 
> 20+10		20000		19916		9	-84
> 
> so gettimeofday=19916+10*9=20006

Hey Roman, 
	Thanks for your patient persistence. The light bulb finally clicked on
for me last night. I'll start playing with the idea and get back to
you. 

thanks again,
-john


^ permalink raw reply	[flat|nested] 71+ messages in thread

end of thread, other threads:[~2005-08-25 18:10 UTC | newest]

Thread overview: 71+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-08-11  1:21 [RFC - 0/13] NTP cleanup work (v. B5) john stultz
2005-08-11  1:23 ` [RFC][PATCH - 1/13] NTP cleanup: Move NTP code into ntp.c john stultz
2005-08-11  1:25   ` [RFC][PATCH - 2/13] NTP cleanup: Move arches to new ntp interfaces john stultz
2005-08-11  1:26     ` [RFC][PATCH - 3/13] NTP cleanup: Remove unused NTP PPS code john stultz
2005-08-11  1:27       ` [RFC][PATCH - 4/13] NTP cleanup: Breakup ntp_adjtimex() john stultz
2005-08-11  1:28         ` [RFC][PATCH - 5/13] NTP cleanup: Break out leapsecond processing john stultz
2005-08-11  1:28           ` [RFC][PATCH - 6/13] NTP cleanup: Clean up ntp_adjtimex() arguement checking john stultz
2005-08-11  1:31             ` [RFC][PATCH - 7/13] NTP cleanup: Cleanup signed shifting logic john stultz
2005-08-11  1:31               ` [RFC][PATCH - 8/13] NTP cleanup: Integrate second_overflow() logic john stultz
2005-08-11  1:33                 ` [RFC][PATCH - 9/13] NTP cleanup: Improve NTP variable names john stultz
2005-08-11  1:33                   ` [RFC][PATCH - 10/13] NTP cleanup: Use ntp_lock instead of xtime_lock john stultz
2005-08-11  1:35                     ` [RFC][PATCH - 11/13] NTP cleanup: Introduce PPM adjustment variables john stultz
2005-08-11  1:36                       ` [RFC][PATCH - 12/13] NTP cleanup: cleanup ntp_advance() adjtime code john stultz
2005-08-11  1:38                         ` [RFC][PATCH - 13/13] NTP cleanup: drop time_phase and time_adj add copyright john stultz
2005-08-16  2:08         ` [RFC][PATCH - 4/13] NTP cleanup: Breakup ntp_adjtimex() john stultz
2005-08-11  2:13 ` [RFC - 0/9] Generic timekeeping subsystem (v. B5) john stultz
2005-08-11  2:14   ` [PATCH 1/9] Timesource management code john stultz
2005-08-11  2:16     ` [PATCH 2/9] Generic timekeeping core subsystem john stultz
2005-08-11  2:18       ` [PATCH 3/9] Generic timekeeping i386 arch specific changes, part 1 john stultz
2005-08-11  2:19         ` [PATCH 4/9] generic timekeeping i386 arch specific changes, part 2 john stultz
2005-08-11  2:20           ` [PATCH 5/9] generic timekeeping i386 arch specific changes, part 3 john stultz
2005-08-11  2:21             ` [PATCH 6/9] generic timekeeping i386 arch specific changes, part 4 john stultz
2005-08-11  2:23               ` [PATCH 7/9] generic timekeeping i386 arch specific changes, part 5 john stultz
2005-08-11  2:24                 ` [PATCH 8/9] generic timekeeping i386 arch specific changes, part 6 john stultz
2005-08-11  2:25                   ` [PATCH 9/9] generic timekeeping i386 specific timesources john stultz
2005-08-11  2:32   ` [RFC - 0/9] Generic timekeeping subsystem (v. B5) Lee Revell
2005-08-11  2:39     ` john stultz
2005-08-11  2:44       ` Lee Revell
2005-08-11  6:17     ` Ulrich Windl
2005-08-11  2:37   ` [RFC] Cumulative NTP cleanujp and generic timekeeping patch (v B5) john stultz
2005-08-15 22:14   ` [RFC - 0/9] Generic timekeeping subsystem (v. B5) Roman Zippel
2005-08-16  0:10     ` john stultz
2005-08-16 18:25       ` Christoph Lameter
2005-08-16 23:48         ` john stultz
2005-08-17  0:14           ` Christoph Lameter
2005-08-17  0:17             ` john stultz
2005-08-17  0:21               ` Christoph Lameter
2005-08-17  6:08         ` Ulrich Windl
2005-08-17 14:07           ` Christoph Lameter
2005-08-17  0:28       ` Roman Zippel
2005-08-17  1:17         ` john stultz
2005-08-17  7:40           ` Ulrich Windl
2005-08-19  0:27           ` Roman Zippel
2005-08-20  2:32             ` john stultz
2005-08-21 23:19               ` Roman Zippel
2005-08-22 18:57                 ` john stultz
2005-08-23 11:30                   ` Roman Zippel
2005-08-23 18:52                     ` john stultz
2005-08-23 20:51                     ` john stultz
2005-08-23 21:34                       ` Roman Zippel
2005-08-23 23:14                         ` john stultz
2005-08-23 23:54                           ` Roman Zippel
2005-08-24  0:29                             ` George Anzinger
2005-08-24 20:36                               ` john stultz
2005-08-24 23:46                                 ` George Anzinger
2005-08-25  0:42                                   ` john stultz
2005-08-25  1:44                                     ` George Anzinger
2005-08-25  2:13                                       ` john stultz
2005-08-24  6:34                             ` Ulrich Windl
2005-08-24  9:47                               ` Roman Zippel
2005-08-24 18:00                             ` john stultz
2005-08-24 18:48                               ` Roman Zippel
2005-08-24 19:15                                 ` john stultz
2005-08-24 19:49                                   ` Roman Zippel
2005-08-24 22:40                                     ` john stultz
2005-08-25  0:45                                       ` Roman Zippel
2005-08-25 18:08                                         ` john stultz
2005-08-17 19:03         ` George Anzinger
2005-08-15 22:12 ` [RFC - 0/13] NTP cleanup work " Roman Zippel
2005-08-15 22:46   ` john stultz
2005-08-17  0:10     ` Roman Zippel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox