public inbox for linux-kernel@vger.kernel.org
* [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
@ 2005-06-18  2:56 john stultz
  2005-06-18  2:58 ` [PATCH 2/6] new timeofday i386 arch specific changes, part 1 " john stultz
  2005-06-18 12:02 ` [PATCH 1/6] new timeofday core subsystem " Roman Zippel
  0 siblings, 2 replies; 32+ messages in thread
From: john stultz @ 2005-06-18  2:56 UTC (permalink / raw)
  To: lkml, Andrew Morton
  Cc: Tim Schmielau, George Anzinger, albert, Ulrich Windl,
	Christoph Lameter, Dominik Brodowski, David Mosberger, Andi Kleen,
	paulus, schwidefsky, keith maanthey, Chris McDermott, Max Asbock,
	mahuja, Nishanth Aravamudan, Darren Hart, Darrick J. Wong,
	Anton Blanchard, donf, mpm, benh, kernel-stuff, frank

Andrew, All,

	With the goal of simplifying, streamlining and consolidating the
time-of-day infrastructure, I propose the following common
implementation across all arches. This allows generic bugs to be fixed
once and reduces code duplication, and since many architectures share
the same time sources, it allows drivers to be written once for
multiple architectures. Additionally, it better delineates the line
between the soft-timer subsystem and the time-of-day subsystem, opening
the door to more flexible and better soft-timer management.

Features of this design:
========================

o Splits time of day management from timer interrupts
o Consolidates a large amount of code
o Generic algorithms which use time-source drivers chosen at runtime
o More consistent and readable code
o Uses nanoseconds as the kernel's base time unit
o Clearly separates the NTP code from the time code

For another description of the rework, see:
http://lwn.net/Articles/120850/ (Many thanks to the LWN team for that
easy-to-understand writeup!)

This patch implements the architecture-independent portion of the new
time-of-day subsystem. Included below are timeofday.c (which contains
the time-of-day management and accessor functions), ntp.c (which
contains the NTP adjustment code, leapsecond processing, and NTP kernel
state machine code), timesource.c (timesource-specific management
functions), the interface definition .h files, the example jiffies
timesource (the lowest-common-denominator time source, mainly useful as
example code) and minimal hooks into arch-independent code.

The patch does nothing without at least minimal architecture-specific
hooks (i386, x86-64 and other architecture examples to follow), and it
can be applied to a tree without affecting the existing code.

Andrew, please consider including this in your tree for testing.

Finally I'd like to thank the following people who have contributed
ideas, criticism, testing and code that has helped shape this work:

	George Anzinger, Nish Aravamudan, Max Asbock, Dominik Brodowski, Darren
Hart, Christoph Lameter, Matt Mackall, Keith Mannthey, Martin
Schwidefsky, Frank Sorenson, Ulrich Windl, Darrick Wong, and any others
whom I've accidentally forgotten.


thanks
-john


Signed-off-by: John Stultz <johnstul@us.ibm.com>

linux-2.6.12-rc6-mm1_timeofday-core_B3.patch
============================================
diff -ruN linux-2.6.12-rc6-mm1/Documentation/kernel-parameters.txt linux-2.6.12-rc6-mm1-tod/Documentation/kernel-parameters.txt
--- linux-2.6.12-rc6-mm1/Documentation/kernel-parameters.txt	2005-06-17 15:56:34.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/Documentation/kernel-parameters.txt	2005-06-17 18:20:15.363903325 -0700
@@ -52,6 +52,7 @@
 	MTD	MTD support is enabled.
 	NET	Appropriate network support is enabled.
 	NUMA	NUMA support is enabled.
+	NEW_TOD The new timeofday code is enabled.
 	NFS	Appropriate NFS support is enabled.
 	OSS	OSS sound support is enabled.
 	PARIDE	The ParIDE subsystem is enabled.
@@ -309,7 +310,7 @@
 			Default value is set via a kernel config option.
 			Value can be changed at runtime via /selinux/checkreqprot.
  
- 	clock=		[BUGS=IA-32, HW] gettimeofday timesource override. 
+ 	clock=		[BUGS=IA-32, HW] gettimeofday timesource override. [Deprecated]
 			Forces specified timesource (if avaliable) to be used
 			when calculating gettimeofday(). If specicified timesource
 			is not avalible, it defaults to PIT. 
@@ -1443,6 +1444,10 @@
 
 	time		Show timing data prefixed to each printk message line
 
+	timesource=		[NEW_TOD] Override the default timesource
+			Override the default timesource and use the timesource
+			with the specified name.
+
 	tipar.timeout=	[HW,PPT]
 			Set communications timeout in tenths of a second
 			(default 15).
diff -ruN linux-2.6.12-rc6-mm1/drivers/char/hangcheck-timer.c linux-2.6.12-rc6-mm1-tod/drivers/char/hangcheck-timer.c
--- linux-2.6.12-rc6-mm1/drivers/char/hangcheck-timer.c	2005-06-17 15:56:34.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/drivers/char/hangcheck-timer.c	2005-06-17 18:20:15.409897610 -0700
@@ -49,6 +49,7 @@
 #include <linux/delay.h>
 #include <asm/uaccess.h>
 #include <linux/sysrq.h>
+#include <linux/timeofday.h>
 
 
 #define VERSION_STR "0.9.0"
@@ -130,8 +131,12 @@
 #endif
 
 #ifdef HAVE_MONOTONIC
+#ifndef CONFIG_NEWTOD
 extern unsigned long long monotonic_clock(void);
 #else
+#define monotonic_clock() do_monotonic_clock()
+#endif
+#else
 static inline unsigned long long monotonic_clock(void)
 {
 # ifdef __s390__
diff -ruN linux-2.6.12-rc6-mm1/drivers/Makefile linux-2.6.12-rc6-mm1-tod/drivers/Makefile
--- linux-2.6.12-rc6-mm1/drivers/Makefile	2005-06-17 15:56:34.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/drivers/Makefile	2005-06-17 18:20:38.168069630 -0700
@@ -70,3 +70,4 @@
 obj-y				+= firmware/
 obj-$(CONFIG_CRYPTO)		+= crypto/
 obj-$(CONFIG_DLM)		+= dlm/
+obj-$(CONFIG_NEWTOD)		+= timesource/
diff -ruN linux-2.6.12-rc6-mm1/drivers/timesource/jiffies.c linux-2.6.12-rc6-mm1-tod/drivers/timesource/jiffies.c
--- linux-2.6.12-rc6-mm1/drivers/timesource/jiffies.c	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.12-rc6-mm1-tod/drivers/timesource/jiffies.c	2005-06-17 18:20:15.000000000 -0700
@@ -0,0 +1,69 @@
+/***********************************************************************
+* linux/drivers/timesource/jiffies.c
+*
+* This file contains the jiffies based time source.
+*
+* Copyright (C) 2004, 2005 IBM, John Stultz (johnstul@us.ibm.com)
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+*
+************************************************************************/
+#include <linux/timesource.h>
+#include <linux/jiffies.h>
+#include <linux/init.h>
+
+/* The jiffies-based timesource is the lowest common
+ * denominator time source, which should function on
+ * all systems. It has the same coarse resolution as
+ * the timer interrupt frequency HZ, and it suffers
+ * inaccuracies caused by missed or lost timer
+ * interrupts, as well as the inability of the timer
+ * interrupt hardware to accurately tick at the
+ * requested HZ value. It is also not recommended
+ * for "tick-less" systems.
+ */
+#define NSEC_PER_JIFFY ((u32)((((u64)NSEC_PER_SEC)<<8)/ACTHZ))
+
+/* Since jiffies uses a simple NSEC_PER_JIFFY multiplier
+ * conversion, the .shift value could be zero. However,
+ * this would make NTP adjustments impossible, as they are
+ * in units of 1/2^.shift. Thus we use JIFFIES_SHIFT to
+ * shift both the numerator and denominator by the same
+ * amount, giving NTP adjustments in units of 1/2^10.
+ */
+#define JIFFIES_SHIFT 10
+
+static cycle_t jiffies_read(void)
+{
+	cycle_t ret = get_jiffies_64();
+	return ret;
+}
+
+struct timesource_t timesource_jiffies = {
+	.name = "jiffies",
+	.priority = 0, /* lowest priority*/
+	.type = TIMESOURCE_FUNCTION,
+	.read_fnct = jiffies_read,
+	.mask = (cycle_t)-1,
+	.mult = NSEC_PER_JIFFY << JIFFIES_SHIFT, /* See above for details */
+	.shift = JIFFIES_SHIFT,
+};
+
+static int __init init_jiffies_timesource(void)
+{
+	register_timesource(&timesource_jiffies);
+	return 0;
+}
+module_init(init_jiffies_timesource);
diff -ruN linux-2.6.12-rc6-mm1/drivers/timesource/Makefile linux-2.6.12-rc6-mm1-tod/drivers/timesource/Makefile
--- linux-2.6.12-rc6-mm1/drivers/timesource/Makefile	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.12-rc6-mm1-tod/drivers/timesource/Makefile	2005-06-17 18:20:15.000000000 -0700
@@ -0,0 +1 @@
+obj-y += jiffies.o
diff -ruN linux-2.6.12-rc6-mm1/include/asm-generic/timeofday.h linux-2.6.12-rc6-mm1-tod/include/asm-generic/timeofday.h
--- linux-2.6.12-rc6-mm1/include/asm-generic/timeofday.h	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.12-rc6-mm1-tod/include/asm-generic/timeofday.h	2005-06-17 18:20:15.000000000 -0700
@@ -0,0 +1,26 @@
+/*  linux/include/asm-generic/timeofday.h
+ *
+ *  This file contains the asm-generic interface
+ *  to the arch specific calls used by the time of day subsystem
+ */
+#ifndef _ASM_GENERIC_TIMEOFDAY_H
+#define _ASM_GENERIC_TIMEOFDAY_H
+#include <linux/types.h>
+#include <linux/time.h>
+#include <linux/timex.h>
+#include <asm/div64.h>
+#ifdef CONFIG_NEWTOD
+
+/* Required externs */
+extern nsec_t read_persistent_clock(void);
+extern void sync_persistent_clock(struct timespec ts);
+
+#ifdef CONFIG_NEWTOD_VSYSCALL
+extern void arch_update_vsyscall_gtod(nsec_t wall_time, cycle_t offset_base,
+				struct timesource_t* timesource, int ntp_adj);
+#else
+#define arch_update_vsyscall_gtod(x,y,z,w) do { } while (0)
+#endif /* CONFIG_NEWTOD_VSYSCALL */
+
+#endif /* CONFIG_NEWTOD */
+#endif
diff -ruN linux-2.6.12-rc6-mm1/include/linux/ntp.h linux-2.6.12-rc6-mm1-tod/include/linux/ntp.h
--- linux-2.6.12-rc6-mm1/include/linux/ntp.h	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.12-rc6-mm1-tod/include/linux/ntp.h	2005-06-17 18:20:15.000000000 -0700
@@ -0,0 +1,19 @@
+/*  linux/include/linux/ntp.h
+ *
+ *  This file contains the NTP state machine accessor functions.
+ */
+
+#ifndef _LINUX_NTP_H
+#define _LINUX_NTP_H
+#include <linux/types.h>
+#include <linux/time.h>
+#include <linux/timex.h>
+
+/* NTP state machine interfaces */
+int ntp_advance(nsec_t value);
+int ntp_adjtimex(struct timex*);
+int ntp_leapsecond(struct timespec now);
+void ntp_clear(void);
+int get_ntp_status(void);
+
+#endif
diff -ruN linux-2.6.12-rc6-mm1/include/linux/time.h linux-2.6.12-rc6-mm1-tod/include/linux/time.h
--- linux-2.6.12-rc6-mm1/include/linux/time.h	2005-06-17 15:56:34.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/include/linux/time.h	2005-06-17 18:20:15.000000000 -0700
@@ -27,6 +27,10 @@
 
 #ifdef __KERNEL__
 
+/* timeofday base types */
+typedef u64 nsec_t;
+typedef u64 cycle_t;
+
 /* Parameters used to convert the timespec values */
 #ifndef USEC_PER_SEC
 #define USEC_PER_SEC (1000000L)
diff -ruN linux-2.6.12-rc6-mm1/include/linux/timeofday.h linux-2.6.12-rc6-mm1-tod/include/linux/timeofday.h
--- linux-2.6.12-rc6-mm1/include/linux/timeofday.h	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.12-rc6-mm1-tod/include/linux/timeofday.h	2005-06-17 18:20:15.000000000 -0700
@@ -0,0 +1,59 @@
+/*  linux/include/linux/timeofday.h
+ *
+ *  This file contains the interface to the time of day subsystem
+ */
+#ifndef _LINUX_TIMEOFDAY_H
+#define _LINUX_TIMEOFDAY_H
+#include <linux/types.h>
+#include <linux/time.h>
+#include <linux/timex.h>
+#include <asm/div64.h>
+
+#ifdef CONFIG_NEWTOD
+/* Public definitions */
+extern nsec_t get_lowres_timestamp(void);
+extern nsec_t get_lowres_timeofday(void);
+extern nsec_t do_monotonic_clock(void);
+
+extern void do_gettimeofday(struct timeval *tv);
+extern void getnstimeofday(struct timespec *ts);
+extern int do_settimeofday(struct timespec *tv);
+extern int do_adjtimex(struct timex *tx);
+
+extern void timeofday_init(void);
+
+/* Inline helper functions */
+static inline struct timeval ns_to_timeval(nsec_t ns)
+{
+	struct timeval tv;
+	tv.tv_sec = div_long_long_rem(ns, NSEC_PER_SEC, &tv.tv_usec);
+	tv.tv_usec = (tv.tv_usec + NSEC_PER_USEC/2) / NSEC_PER_USEC;
+	return tv;
+}
+
+static inline struct timespec ns_to_timespec(nsec_t ns)
+{
+	struct timespec ts;
+	ts.tv_sec = div_long_long_rem(ns, NSEC_PER_SEC, &ts.tv_nsec);
+	return ts;
+}
+
+static inline nsec_t timespec_to_ns(struct timespec* ts)
+{
+	nsec_t ret;
+	ret = ((nsec_t)ts->tv_sec) * NSEC_PER_SEC;
+	ret += ts->tv_nsec;
+	return ret;
+}
+
+static inline nsec_t timeval_to_ns(struct timeval* tv)
+{
+	nsec_t ret;
+	ret = ((nsec_t)tv->tv_sec) * NSEC_PER_SEC;
+	ret += tv->tv_usec * NSEC_PER_USEC;
+	return ret;
+}
+#else /* CONFIG_NEWTOD */
+#define timeofday_init() do { } while (0)
+#endif /* CONFIG_NEWTOD */
+#endif /* _LINUX_TIMEOFDAY_H */
diff -ruN linux-2.6.12-rc6-mm1/include/linux/timesource.h linux-2.6.12-rc6-mm1-tod/include/linux/timesource.h
--- linux-2.6.12-rc6-mm1/include/linux/timesource.h	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.12-rc6-mm1-tod/include/linux/timesource.h	2005-06-17 18:20:15.000000000 -0700
@@ -0,0 +1,172 @@
+/*  linux/include/linux/timesource.h
+ *
+ *  This file contains the structure definitions for timesources.
+ *
+ *  If you are not a timesource, or the time of day code, you should
+ *  not be including this file!
+ */
+#ifndef _LINUX_TIMESOURCE_H
+#define _LINUX_TIMESOURCE_H
+
+#include <linux/types.h>
+#include <linux/time.h>
+#include <linux/timex.h>
+#include <asm/io.h>
+#include <asm/div64.h>
+
+/* struct timesource_t:
+ *      Provides mostly state-free accessors to the underlying hardware.
+ *
+ * name:                ptr to timesource name
+ * priority:            priority value for selection (higher is better)
+ *                      To avoid priority inflation the following
+ *                      list should give you a guide as to how
+ *                      to assign your timesource a priority
+ *                      1-99: Unfit for real use
+ *                          Only available for bootup and testing purposes.
+ *                      100-199: Base level usability.
+ *                          Functional for real use, but not desired.
+ *                      200-299: Good.
+ *                           A correct and usable timesource.
+ *                      300-399: Desired.
+ *                           A reasonably fast and accurate timesource.
+ *                      400-499: Perfect
+ *                           The ideal timesource. A must-use where available.
+ * type:                defines timesource type
+ * @read_fnct:          returns a cycle value
+ * ptr:                 ptr to MMIO'ed counter
+ * mask:                bitmask for two's complement
+ *                      subtraction of non 64 bit counters
+ * mult:                cycle to nanosecond multiplier
+ * shift:               cycle to nanosecond divisor (power of two)
+ * @update_callback:    called when safe to alter timesource values
+ */
+struct timesource_t {
+	char* name;
+	int priority;
+	enum {
+		TIMESOURCE_FUNCTION,
+		TIMESOURCE_CYCLES,
+		TIMESOURCE_MMIO_32,
+		TIMESOURCE_MMIO_64
+	} type;
+	cycle_t (*read_fnct)(void);
+	void __iomem *mmio_ptr;
+	cycle_t mask;
+	u32 mult;
+	u32 shift;
+	void (*update_callback)(void);
+};
+
+
+/* Helper function that converts a khz counter
+ * frequency to a timesource multiplier, given the
+ * timesource shift value
+ */
+static inline u32 timesource_khz2mult(u32 khz, u32 shift_constant)
+{
+	/*  khz = cyc/(Million ns)
+	 *  mult/2^shift  = ns/cyc
+	 *  mult = ns/cyc * 2^shift
+	 *  mult = 1Million/khz * 2^shift
+	 *  mult = 1000000 * 2^shift / khz
+	 *  mult = (1000000<<shift) / khz
+	 */
+	u64 tmp = ((u64)1000000) << shift_constant;
+	tmp += khz/2; /* round for do_div */
+	do_div(tmp, khz);
+	return (u32)tmp;
+}
+
+/* Helper function that converts a hz counter
+ * frequency to a timesource multiplier, given the
+ * timesource shift value
+ */
+static inline u32 timesource_hz2mult(u32 hz, u32 shift_constant)
+{
+	/*  hz = cyc/(Billion ns)
+	 *  mult/2^shift  = ns/cyc
+	 *  mult = ns/cyc * 2^shift
+	 *  mult = 1Billion/hz * 2^shift
+	 *  mult = 1000000000 * 2^shift / hz
+	 *  mult = (1000000000<<shift) / hz
+	 */
+	u64 tmp = ((u64)1000000000) << shift_constant;
+	tmp += hz/2; /* round for do_div */
+	do_div(tmp, hz);
+	return (u32)tmp;
+}
+
+
+#ifndef readq
+/* Provide a way to atomically read a u64 on a 32-bit arch */
+static inline unsigned long long timesource_readq(void __iomem *addr)
+{
+	u32 low, high;
+	/* loop is required to make sure we get an atomic read */
+	do {
+		high = readl(addr+4);
+		low = readl(addr);
+	} while (high != readl(addr+4));
+
+	return low | (((unsigned long long)high) << 32LL);
+}
+#else
+#define timesource_readq(x) readq(x)
+#endif
+
+
+/* read_timesource():
+ *      Uses the timesource to return the current cycle_t value
+ */
+static inline cycle_t read_timesource(struct timesource_t *ts)
+{
+	switch (ts->type) {
+	case TIMESOURCE_MMIO_32:
+		return (cycle_t)readl(ts->mmio_ptr);
+	case TIMESOURCE_MMIO_64:
+		return (cycle_t)timesource_readq(ts->mmio_ptr);
+	case TIMESOURCE_CYCLES:
+		return (cycle_t)get_cycles();
+	default:/* case: TIMESOURCE_FUNCTION */
+		return ts->read_fnct();
+	}
+}
+
+/* cyc2ns():
+ *      Uses the timesource and NTP adjustment interval to
+ *      convert a cycle_t count to nanoseconds.
+ */
+static inline nsec_t cyc2ns(struct timesource_t *ts, int ntp_adj, cycle_t cycles)
+{
+	u64 ret;
+	ret = (u64)cycles;
+	ret *= (ts->mult + ntp_adj);
+	ret >>= ts->shift;
+	return (nsec_t)ret;
+}
+
+/* cyc2ns_rem():
+ *      Uses the timesource and NTP adjustment interval to
+ *      convert a cycle_t count to nanoseconds. Adds in the
+ *      remainder portion, which is stored in ns<<ts->shift
+ *      units, and saves the new remainder off.
+ */
+static inline nsec_t cyc2ns_rem(struct timesource_t *ts, int ntp_adj, cycle_t cycles, u64* rem)
+{
+	u64 ret;
+	ret = (u64)cycles;
+	ret *= (ts->mult + ntp_adj);
+	if (rem) {
+		ret += *rem;
+		*rem = ret & ((1<<ts->shift)-1);
+	}
+	ret >>= ts->shift;
+	return (nsec_t)ret;
+}
+
+/* used to install a new time source */
+void register_timesource(struct timesource_t*);
+void reselect_timesource(void);
+struct timesource_t* get_next_timesource(void);
+#endif
diff -ruN linux-2.6.12-rc6-mm1/include/linux/timex.h linux-2.6.12-rc6-mm1-tod/include/linux/timex.h
--- linux-2.6.12-rc6-mm1/include/linux/timex.h	2005-06-17 15:56:34.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/include/linux/timex.h	2005-06-17 18:20:15.000000000 -0700
@@ -228,6 +228,7 @@
 extern unsigned long tick_nsec;		/* ACTHZ          period (nsec) */
 extern int tickadj;			/* amount of adjustment per tick */
 
+#ifndef CONFIG_NEWTOD
 /*
  * phase-lock loop variables
  */
@@ -314,6 +315,7 @@
 }
 
 #endif /* !CONFIG_TIME_INTERPOLATION */
+#endif /* !CONFIG_NEWTOD */
 
 #endif /* KERNEL */
 
diff -ruN linux-2.6.12-rc6-mm1/init/main.c linux-2.6.12-rc6-mm1-tod/init/main.c
--- linux-2.6.12-rc6-mm1/init/main.c	2005-06-17 15:56:34.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/init/main.c	2005-06-17 18:20:15.000000000 -0700
@@ -47,6 +47,7 @@
 #include <linux/rmap.h>
 #include <linux/mempolicy.h>
 #include <linux/key.h>
+#include <linux/timeofday.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -474,6 +475,7 @@
 	pidhash_init();
 	init_timers();
 	softirq_init();
+	timeofday_init();
 	time_init();
 
 	/*
diff -ruN linux-2.6.12-rc6-mm1/kernel/Makefile linux-2.6.12-rc6-mm1-tod/kernel/Makefile
--- linux-2.6.12-rc6-mm1/kernel/Makefile	2005-06-17 15:56:34.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/kernel/Makefile	2005-06-17 18:20:15.000000000 -0700
@@ -9,6 +9,7 @@
 	    rcupdate.o intermodule.o extable.o params.o posix-timers.o \
 	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o
 
+obj-$(CONFIG_NEWTOD) += timeofday.o timesource.o ntp.o
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
 obj-$(CONFIG_SMP) += cpu.o spinlock.o
diff -ruN linux-2.6.12-rc6-mm1/kernel/ntp.c linux-2.6.12-rc6-mm1-tod/kernel/ntp.c
--- linux-2.6.12-rc6-mm1/kernel/ntp.c	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.12-rc6-mm1-tod/kernel/ntp.c	2005-06-17 18:20:15.000000000 -0700
@@ -0,0 +1,464 @@
+/********************************************************************
+* linux/kernel/ntp.c
+*
+* NTP state machine and time scaling code.
+*
+* Copyright (C) 2004, 2005 IBM, John Stultz (johnstul@us.ibm.com)
+*
+* Portions rewritten from kernel/time.c and kernel/timer.c
+* Please see those files for original copyrights.
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+*
+* Notes:
+*
+* Hopefully you should never have to understand or touch
+* any of the code below. But don't let that keep you from trying!
+*
+* This code is loosely based on David Mills' RFC 1589 and its
+* updates. Please see the following for more details:
+*  http://www.eecis.udel.edu/~mills/database/rfc/rfc1589.txt
+*  http://www.eecis.udel.edu/~mills/database/reports/kern/kernb.pdf
+*
+* NOTE:	To simplify the code, we do not implement any of
+* the PPS code, as the code that uses it never was merged.
+*                             -johnstul@us.ibm.com
+*
+* TODO List:
+*   o Move to using ppb for frequency adjustments
+*   o More documentation
+*   o More testing
+*   o More optimization
+*********************************************************************/
+
+#include <linux/ntp.h>
+#include <linux/errno.h>
+
+#define NTP_DEBUG 0
+
+/* Chapter 5: Kernel Variables [RFC 1589 pg. 28] */
+/* 5.1 Interface Variables */
+static int ntp_status       = STA_UNSYNC;       /* status */
+static long ntp_offset;                         /* usec */
+static long ntp_constant    = 2;                /* ntp magic? */
+static long ntp_maxerror    = NTP_PHASE_LIMIT;  /* usec */
+static long ntp_esterror    = NTP_PHASE_LIMIT;  /* usec */
+static const long ntp_tolerance	= MAXFREQ;      /* shifted ppm */
+static const long ntp_precision	= 1;            /* constant */
+
+/* 5.2 Phase-Lock Loop Variables */
+static long ntp_freq;                           /* shifted ppm */
+static long ntp_reftime;                        /* sec */
+
+/* Extra values */
+static int ntp_state    = TIME_OK;              /* leapsecond state */
+static long ntp_tick    = USEC_PER_SEC/USER_HZ; /* tick length */
+
+static s64 ss_offset_len;   /* SINGLESHOT offset adj interval (nsec)*/
+static long singleshot_adj; /* +/- MAX_SINGLESHOT_ADJ (ppm)*/
+static long tick_adj;       /* tx->tick adjustment (ppm) */
+static long offset_adj;     /* offset adjustment (ppm) */
+
+
+/* lock for the above variables */
+static seqlock_t ntp_lock = SEQLOCK_UNLOCKED;
+
+#define MAX_SINGLESHOT_ADJ 500 /* (ppm) */
+#define SEC_PER_DAY 86400
+
+/* Required to safely shift negative values */
+#define shiftR(x,s) (((x) < 0) ? (-((-(x)) >> (s))) : ((x) >> (s)))
+
+/**
+ * ntp_advance - Periodic hook which increments NTP state machine
+ * interval: nsecond interval value used to increment the state machine
+ *
+ *  Periodic hook which increments NTP state machine by interval.
+ *  Returns the signed PPM adjustment to be used for the next interval.
+ *
+ *  This is ntp_hardclock in the RFC.
+ */
+int ntp_advance(nsec_t interval)
+{
+	static u64 interval_sum = 0;
+	static long ss_adj = 0;
+	unsigned long flags;
+	long ppm_sum;
+
+	write_seqlock_irqsave(&ntp_lock, flags);
+
+	/* decrement singleshot offset interval */
+	ss_offset_len -= interval;
+	if(ss_offset_len < 0) /* make sure it doesn't go negative */
+		ss_offset_len = 0;
+
+	/* Some components of the NTP state machine are advanced
+	 * in full second increments (this is a hold-over from
+	 * the old second_overflow() code)
+	 *
+	 * XXX - I'd prefer to smoothly apply this math at each
+	 * call to ntp_advance() rather than each second.
+	 */
+	interval_sum += interval;
+	while (interval_sum > NSEC_PER_SEC) {
+		long next_adj;
+		interval_sum -= NSEC_PER_SEC;
+
+		/* Bump maxerror by ntp_tolerance */
+		ntp_maxerror += shiftR(ntp_tolerance, SHIFT_USEC);
+		if (ntp_maxerror > NTP_PHASE_LIMIT) {
+			ntp_maxerror = NTP_PHASE_LIMIT;
+			ntp_status |= STA_UNSYNC;
+		}
+
+		/* Calculate offset_adj for the next second */
+		next_adj = ntp_offset;
+		if (!(ntp_status & STA_FLL))
+			next_adj = shiftR(next_adj, SHIFT_KG + ntp_constant);
+		next_adj = min(next_adj, (MAXPHASE / MINSEC) << SHIFT_UPDATE);
+		next_adj = max(next_adj, -(MAXPHASE / MINSEC) << SHIFT_UPDATE);
+		ntp_offset -= next_adj;
+		offset_adj = shiftR(next_adj, SHIFT_UPDATE); /* ppm */
+
+		/* Set ss_adj for the next second */
+		ss_adj = singleshot_adj;
+		singleshot_adj = 0;
+	}
+
+	/* calculate total ppm adjustment for the next interval */
+	ppm_sum = tick_adj;
+	ppm_sum += offset_adj;
+	ppm_sum += shiftR(ntp_freq,SHIFT_USEC);
+	ppm_sum += ss_adj;
+
+#if NTP_DEBUG
+{
+	static int dbg = 0;
+	if(!(dbg++%300000))
+		printk("tick_adj(%ld) + offset_adj(%ld) + ntp_freq(%ld) + ss_adj(%ld) = ppm_sum(%ld)\n", tick_adj, offset_adj, shiftR(ntp_freq, SHIFT_USEC), ss_adj, ppm_sum);
+}
+#endif
+	write_sequnlock_irqrestore(&ntp_lock, flags);
+
+	return ppm_sum;
+}
+
+/**
+ * ntp_hardupdate - Calculates the offset and freq values
+ * offset: current offset
+ * tv: timeval holding the current time
+ *
+ * Private function, called only by ntp_adjtimex while holding ntp_lock
+ *
+ * This function is called when an offset adjustment is requested.
+ * It calculates the offset adjustment and manipulates the
+ * frequency adjustment accordingly.
+ */
+static int ntp_hardupdate(long offset, struct timeval tv)
+{
+	int ret;
+	long current_offset, interval;
+
+	ret = 0;
+	if (!(ntp_status & STA_PLL))
+		return ret;
+
+	current_offset = offset;
+	/* Make sure offset is bounded by MAXPHASE */
+	current_offset = min(current_offset, MAXPHASE);
+	current_offset = max(current_offset, -MAXPHASE);
+
+	ntp_offset = current_offset << SHIFT_UPDATE;
+
+	if ((ntp_status & STA_FREQHOLD) || (ntp_reftime == 0))
+		ntp_reftime = tv.tv_sec;
+
+	/* calculate seconds since last call to hardupdate */
+	interval = tv.tv_sec - ntp_reftime;
+	ntp_reftime = tv.tv_sec;
+
+	if ((ntp_status & STA_FLL) && (interval >= MINSEC)) {
+		long damping, offset_ppm;
+		/* calculate frequency for this interval */
+		offset_ppm = (offset + interval/2) / interval;
+
+		/* calculate damping factor */
+		damping = SHIFT_KH - SHIFT_USEC;
+
+		/* convert to shifted ppm, then apply damping factor */
+		ntp_freq += shiftR(offset_ppm, damping);
+#if NTP_DEBUG
+		printk("ntp->freq change: %ld\n",shiftR(offset_ppm, damping));
+#endif
+
+	} else if ((ntp_status & STA_PLL) && (interval < MAXSEC)) {
+		long damping, offset_ppm;
+		offset_ppm = offset * interval;
+
+		/* calculate damping factor */
+		damping = (2 * ntp_constant) + SHIFT_KF - SHIFT_USEC;
+
+		/* apply damping factor */
+		ntp_freq += shiftR(offset_ppm, damping);
+
+#if NTP_DEBUG
+		printk("ntp->freq change: %ld\n", shiftR(offset_ppm, damping));
+#endif
+	} else { /* interval out of bounds */
+#if NTP_DEBUG
+		printk("ntp_hardupdate(): interval out of bounds: %ld status: 0x%x\n",
+				interval, ntp_status);
+#endif
+		ret = TIME_ERROR;
+	}
+
+	/* bound ntp_freq */
+	if (ntp_freq > ntp_tolerance)
+		ntp_freq = ntp_tolerance;
+	if (ntp_freq < -ntp_tolerance)
+		ntp_freq = -ntp_tolerance;
+
+	return ret;
+}
+
+/**
+ * ntp_adjtimex - Interface to change NTP state machine
+ * @tx: timex value passed to the kernel to be used
+ */
+int ntp_adjtimex(struct timex* tx)
+{
+	long save_offset;
+	int result;
+	unsigned long flags;
+
+	/* Sanity checking */
+
+	/* frequency adjustment limited to +/- MAXFREQ */
+	if ((tx->modes & ADJ_FREQUENCY)
+			&& (abs(tx->freq) > MAXFREQ))
+		return -EINVAL;
+
+	/* maxerror adjustment limited to NTP_PHASE_LIMIT */
+	if ((tx->modes & ADJ_MAXERROR)
+			&& (tx->maxerror < 0
+				|| tx->maxerror >= NTP_PHASE_LIMIT))
+		return -EINVAL;
+
+	/* esterror adjustment limited to NTP_PHASE_LIMIT */
+	if ((tx->modes & ADJ_ESTERROR)
+			&& (tx->esterror < 0
+				|| tx->esterror >= NTP_PHASE_LIMIT))
+		return -EINVAL;
+
+	/* constant adjustment must be positive */
+	if ((tx->modes & ADJ_TIMECONST)
+			&& (tx->constant < 0))
+		return -EINVAL;
+
+	/* Single shot mode can only be used by itself */
+	if (((tx->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT)
+			&& (tx->modes != ADJ_OFFSET_SINGLESHOT))
+		return -EINVAL;
+
+	/* offset adjustment limited to +/- MAXPHASE */
+	if ((tx->modes != ADJ_OFFSET_SINGLESHOT)
+			&& (tx->modes & ADJ_OFFSET)
+			&& (abs(tx->offset)>= MAXPHASE))
+		return -EINVAL;
+
+	/* tick adjustment limited to 10% */
+	if ((tx->modes & ADJ_TICK)
+			&& ((tx->tick < 900000/USER_HZ)
+				||(tx->tick > 11000000/USER_HZ)))
+		return -EINVAL;
+
+#if NTP_DEBUG
+	if(tx->modes) {
+		printk("adjtimex: tx->offset: %ld    tx->freq: %ld\n",
+				tx->offset, tx->freq);
+	}
+#endif
+
+	write_seqlock_irqsave(&ntp_lock, flags);
+
+	result = ntp_state;
+
+	/* For ADJ_OFFSET_SINGLESHOT we must return the old offset */
+	save_offset = shiftR(ntp_offset, SHIFT_UPDATE);
+
+	/* Process input parameters */
+	if (tx->modes & ADJ_STATUS) {
+		ntp_status &=  STA_RONLY;
+		ntp_status |= tx->status & ~STA_RONLY;
+	}
+
+	if (tx->modes & ADJ_FREQUENCY)
+		ntp_freq = tx->freq;
+
+	if (tx->modes & ADJ_MAXERROR)
+		ntp_maxerror = tx->maxerror;
+
+	if (tx->modes & ADJ_ESTERROR)
+		ntp_esterror = tx->esterror;
+
+	if (tx->modes & ADJ_TIMECONST)
+		ntp_constant = tx->constant;
+
+	if (tx->modes & ADJ_OFFSET) {
+		if (tx->modes == ADJ_OFFSET_SINGLESHOT)
+			singleshot_adj = tx->offset;
+		else if (ntp_hardupdate(tx->offset, tx->time))
+			result = TIME_ERROR;
+	}
+
+	if (tx->modes & ADJ_TICK) {
+		/* first calculate usec/user_tick offset */
+		tick_adj = ((USEC_PER_SEC + USER_HZ/2)/USER_HZ) - tx->tick;
+		/* multiply by user_hz to get usec/sec => ppm */
+		tick_adj *= USER_HZ;
+		/* save tx->tick for future calls to adjtimex */
+		ntp_tick = tx->tick;
+	}
+
+	if ((ntp_status & (STA_UNSYNC|STA_CLOCKERR)) != 0 )
+		result = TIME_ERROR;
+
+	/* write kernel state to user timex values*/
+	if ((tx->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT)
+		tx->offset = save_offset;
+	else
+		tx->offset = shiftR(ntp_offset, SHIFT_UPDATE);
+
+	tx->freq = ntp_freq;
+	tx->maxerror = ntp_maxerror;
+	tx->esterror = ntp_esterror;
+	tx->status = ntp_status;
+	tx->constant = ntp_constant;
+	tx->precision = ntp_precision;
+	tx->tolerance = ntp_tolerance;
+
+	/* PPS is not implemented, so these are zero */
+	tx->ppsfreq	= 0;
+	tx->jitter	= 0;
+	tx->shift	= 0;
+	tx->stabil	= 0;
+	tx->jitcnt	= 0;
+	tx->calcnt	= 0;
+	tx->errcnt	= 0;
+	tx->stbcnt	= 0;
+
+	write_sequnlock_irqrestore(&ntp_lock, flags);
+
+	return result;
+}
+
+
+/**
+ * ntp_leapsecond - NTP leapsecond processing code.
+ * now: the current time
+ *
+ * Returns the number of seconds (-1, 0, or 1) that
+ * should be added to the current time to properly
+ * adjust for leapseconds.
+ */
+int ntp_leapsecond(struct timespec now)
+{
+	/*
+	 * Leap second processing. If in leap-insert state at
+	 * the end of the day, the system clock is set back one
+	 * second; if in leap-delete state, the system clock is
+	 * set ahead one second.
+	 */
+	static time_t leaptime = 0;
+
+	switch (ntp_state) {
+	case TIME_OK:
+		if (ntp_status & STA_INS) {
+			ntp_state = TIME_INS;
+			/* calculate end of today (23:59:59)*/
+			leaptime = now.tv_sec + SEC_PER_DAY -
+					(now.tv_sec % SEC_PER_DAY) - 1;
+		}
+		else if (ntp_status & STA_DEL) {
+			ntp_state = TIME_DEL;
+			/* calculate end of today (23:59:59)*/
+			leaptime = now.tv_sec + SEC_PER_DAY -
+					(now.tv_sec % SEC_PER_DAY) - 1;
+		}
+		break;
+
+	case TIME_INS:
+		/* Once we are at (or past) leaptime, insert the second */
+		if (now.tv_sec > leaptime) {
+			ntp_state = TIME_OOP;
+			printk(KERN_NOTICE
+				"Clock: inserting leap second 23:59:60 UTC\n");
+			return -1;
+		}
+		break;
+
+	case TIME_DEL:
+		/* Once we are at (or past) leaptime, delete the second */
+		if (now.tv_sec >= leaptime) {
+			ntp_state = TIME_WAIT;
+			printk(KERN_NOTICE
+				"Clock: deleting leap second 23:59:59 UTC\n");
+			return 1;
+		}
+		break;
+
+	case TIME_OOP:
+		/* Wait for the end of the leap second */
+		if (now.tv_sec > (leaptime + 1))
+			ntp_state = TIME_WAIT;
+		break;
+
+	case TIME_WAIT:
+		if (!(ntp_status & (STA_INS | STA_DEL)))
+			ntp_state = TIME_OK;
+	}
+
+	return 0;
+}
+
+/**
+ * ntp_clear - Clears the NTP state machine.
+ *
+ */
+void ntp_clear(void)
+{
+	unsigned long flags;
+	write_seqlock_irqsave(&ntp_lock, flags);
+
+	ntp_status |= STA_UNSYNC;
+	ntp_maxerror = NTP_PHASE_LIMIT;
+	ntp_esterror = NTP_PHASE_LIMIT;
+	ss_offset_len = 0;
+	singleshot_adj = 0;
+	tick_adj = 0;
+	offset_adj = 0;
+
+	write_sequnlock_irqrestore(&ntp_lock, flags);
+}
+
+/**
+ * get_ntp_status - Returns the NTP status value
+ *
+ */
+int get_ntp_status(void)
+{
+	return ntp_status;
+}
+
diff -ruN linux-2.6.12-rc6-mm1/kernel/time.c linux-2.6.12-rc6-mm1-tod/kernel/time.c
--- linux-2.6.12-rc6-mm1/kernel/time.c	2005-06-17 15:56:34.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/kernel/time.c	2005-06-17 18:20:15.000000000 -0700
@@ -38,6 +38,7 @@
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
+#include <linux/timeofday.h>
 
 /* 
  * The timezone where the local system is located.  Used as a default by some
@@ -128,6 +129,7 @@
  * as real UNIX machines always do it. This avoids all headaches about
  * daylight saving times and warping kernel clocks.
  */
+#ifndef CONFIG_NEWTOD
 inline static void warp_clock(void)
 {
 	write_seqlock_irq(&xtime_lock);
@@ -137,6 +139,18 @@
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
 }
+#else /* !CONFIG_NEWTOD */
+/* XXX - this is somewhat cracked out and should
+ * be checked  -johnstul@us.ibm.com
+ */
+inline static void warp_clock(void)
+{
+	struct timespec ts;
+	getnstimeofday(&ts);
+	ts.tv_sec += sys_tz.tz_minuteswest * 60;
+	do_settimeofday(&ts);
+}
+#endif /* !CONFIG_NEWTOD */
 
 /*
  * In case for some reason the CMOS clock has not already been running
@@ -227,6 +241,7 @@
 /* adjtimex mainly allows reading (and writing, if superuser) of
  * kernel time-keeping variables. used by xntpd.
  */
+#ifndef CONFIG_NEWTOD
 int do_adjtimex(struct timex *txc)
 {
         long ltemp, mtemp, save_adjust;
@@ -410,6 +425,7 @@
 	notify_arch_cmos_timer();
 	return(result);
 }
+#endif /* !CONFIG_NEWTOD */
 
 asmlinkage long sys_adjtimex(struct timex __user *txc_p)
 {
@@ -558,6 +574,7 @@
 
 
 #else
+#ifndef CONFIG_NEWTOD
 /*
  * Simulate gettimeofday using do_gettimeofday which only allows a timeval
  * and therefore only yields usec accuracy
@@ -570,6 +587,7 @@
 	tv->tv_sec = x.tv_sec;
 	tv->tv_nsec = x.tv_usec * NSEC_PER_USEC;
 }
+#endif /* !CONFIG_NEWTOD */
 #endif
 
 #if (BITS_PER_LONG < 64)
diff -ruN linux-2.6.12-rc6-mm1/kernel/timeofday.c linux-2.6.12-rc6-mm1-tod/kernel/timeofday.c
--- linux-2.6.12-rc6-mm1/kernel/timeofday.c	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.12-rc6-mm1-tod/kernel/timeofday.c	2005-06-17 18:20:15.000000000 -0700
@@ -0,0 +1,567 @@
+/*********************************************************************
+* linux/kernel/timeofday.c
+*
+* This file contains the functions which access and manage
+* the system's time of day functionality.
+*
+* Copyright (C) 2003, 2004, 2005 IBM, John Stultz (johnstul@us.ibm.com)
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+*
+* TODO WishList:
+*   o See XXX's below.
+**********************************************************************/
+
+#include <linux/timeofday.h>
+#include <linux/timesource.h>
+#include <linux/ntp.h>
+#include <linux/timex.h>
+#include <linux/timer.h>
+#include <linux/module.h>
+#include <linux/sched.h> /* Needed for capable() */
+#include <linux/sysdev.h>
+#include <linux/jiffies.h>
+#include <asm/timeofday.h>
+
+#define TIME_DBG 0
+#define TIME_DBG_FREQ 60000
+
+/* only run periodic_hook every 50ms */
+#define PERIODIC_INTERVAL_MS 50
+
+/*[Nanosecond based variables]
+ * system_time:
+ *     Monotonically increasing counter of the number of nanoseconds
+ *     since boot.
+ * wall_time_offset:
+ *     Offset added to system_time to provide accurate time-of-day
+ */
+static nsec_t system_time;
+static nsec_t wall_time_offset;
+
+/*[Cycle based variables]
+ * offset_base:
+ *     Value of the timesource at the last timeofday_periodic_hook()
+ *     (adjusted only slightly to account for rounded-off cycles)
+ */
+static cycle_t offset_base;
+
+/*[Time source data]
+ * timesource:
+ *     current timesource pointer
+ */
+static struct timesource_t *timesource;
+
+/*[NTP adjustment]
+ * ntp_adj:
+ *     value of the current ntp adjustment,
+ *     stored in timesource multiplier units.
+ */
+int ntp_adj;
+
+/*[Locks]
+ * system_time_lock:
+ *     generic lock for all locally scoped time values
+ */
+static seqlock_t system_time_lock = SEQLOCK_UNLOCKED;
+
+
+/*[Suspend/Resume info]
+ * time_suspend_state:
+ *     variable that keeps track of suspend state
+ * suspend_start:
+ *     start of the suspend call
+ */
+static enum {
+	TIME_RUNNING,
+	TIME_SUSPENDED
+} time_suspend_state = TIME_RUNNING;
+
+static nsec_t suspend_start;
+
+/* [Soft-Timers]
+ * timeofday_timer:
+ *     soft-timer used to call timeofday_periodic_hook()
+ */
+struct timer_list timeofday_timer;
+
+
+/* [Functions]
+ */
+
+/**
+ * get_lowres_timestamp - Returns a low-res timestamp
+ *
+ * Returns a low-res timestamp with PERIODIC_INTERVAL_MS
+ * granularity (i.e. the value of system_time as
+ * calculated at the last invocation of
+ * timeofday_periodic_hook())
+ */
+nsec_t get_lowres_timestamp(void)
+{
+	nsec_t ret;
+	unsigned long seq;
+	do {
+		seq = read_seqbegin(&system_time_lock);
+
+		ret = system_time;
+
+	} while (read_seqretry(&system_time_lock, seq));
+
+	return ret;
+}
+
+
+/**
+ * get_lowres_timeofday - Returns a low-res time of day
+ *
+ * Returns a low-res time of day, as calculated at the
+ * last invocation of timeofday_periodic_hook().
+ */
+nsec_t get_lowres_timeofday(void)
+{
+	nsec_t ret;
+	unsigned long seq;
+	do {
+		seq = read_seqbegin(&system_time_lock);
+
+		ret = system_time + wall_time_offset;
+
+	} while (read_seqretry(&system_time_lock, seq));
+
+	return ret;
+}
+
+
+/**
+ * update_legacy_time_values - Used to sync legacy time values
+ *
+ * Private function. Used to sync legacy time values to
+ * current timeofday. Assumes we have the system_time_lock.
+ * Hopefully someday this function can be removed.
+ */
+static void update_legacy_time_values(void)
+{
+	unsigned long flags;
+	write_seqlock_irqsave(&xtime_lock, flags);
+	xtime = ns_to_timespec(system_time + wall_time_offset);
+	wall_to_monotonic = ns_to_timespec(wall_time_offset);
+	set_normalized_timespec(&wall_to_monotonic,
+		-wall_to_monotonic.tv_sec, -wall_to_monotonic.tv_nsec);
+	/* We don't update jiffies here because it is its own time domain */
+	write_sequnlock_irqrestore(&xtime_lock, flags);
+}
+
+
+/**
+ * __monotonic_clock - Returns monotonically increasing nanoseconds
+ *
+ * private function, must hold system_time_lock lock when being
+ * called. Returns the monotonically increasing number of
+ * nanoseconds since the system booted (adjusted by NTP scaling)
+ */
+static inline nsec_t __monotonic_clock(void)
+{
+	nsec_t ret, ns_offset;
+	cycle_t now, cycle_delta;
+
+	/* read timesource */
+	now = read_timesource(timesource);
+
+	/* calculate the delta since the last timeofday_periodic_hook */
+	cycle_delta = (now - offset_base) & timesource->mask;
+
+	/* convert to nanoseconds */
+	ns_offset = cyc2ns(timesource, ntp_adj, cycle_delta);
+
+	/* add result to system time */
+	ret = system_time + ns_offset;
+
+	return ret;
+}
+
+
+/**
+ * do_monotonic_clock - Returns monotonically increasing nanoseconds
+ *
+ * Returns the monotonically increasing number of nanoseconds
+ * since the system booted via __monotonic_clock()
+ */
+nsec_t do_monotonic_clock(void)
+{
+	nsec_t ret;
+	unsigned long seq;
+
+	/* atomically read __monotonic_clock() */
+	do {
+		seq = read_seqbegin(&system_time_lock);
+
+		ret = __monotonic_clock();
+
+	} while (read_seqretry(&system_time_lock, seq));
+
+	return ret;
+}
+
+
+/**
+ * __gettimeofday - Returns the timeofday in nsec_t.
+ *
+ * Private function. Returns the timeofday in nsec_t.
+ */
+static inline nsec_t __gettimeofday(void)
+{
+	nsec_t wall, sys;
+	unsigned long seq;
+
+	/* atomically read wall and sys time */
+	do {
+		seq = read_seqbegin(&system_time_lock);
+
+		wall = wall_time_offset;
+		sys = __monotonic_clock();
+
+	} while (read_seqretry(&system_time_lock, seq));
+
+	return wall + sys;
+}
+
+
+/**
+ * getnstimeofday - Returns the time of day in a timespec
+ * @ts: pointer to the timespec to be set
+ *
+ * Returns the time of day in a timespec.
+ * For consistency this should later be
+ * renamed to do_getnstimeofday().
+ */
+void getnstimeofday(struct timespec *ts)
+{
+	*ts = ns_to_timespec(__gettimeofday());
+}
+EXPORT_SYMBOL(getnstimeofday);
+
+
+/**
+ * do_gettimeofday - Returns the time of day in a timeval
+ * @tv: pointer to the timeval to be set
+ *
+ */
+void do_gettimeofday(struct timeval *tv)
+{
+	*tv = ns_to_timeval(__gettimeofday());
+}
+EXPORT_SYMBOL(do_gettimeofday);
+
+
+/**
+ * do_settimeofday - Sets the time of day
+ * @tv: pointer to the timespec that will be used to set the time
+ *
+ */
+int do_settimeofday(struct timespec *tv)
+{
+	unsigned long flags;
+	nsec_t newtime = timespec_to_ns(tv);
+
+	/* atomically adjust wall_time_offset & clear ntp state machine */
+	write_seqlock_irqsave(&system_time_lock, flags);
+
+	wall_time_offset = newtime - __monotonic_clock();
+	ntp_clear();
+
+	update_legacy_time_values();
+
+	arch_update_vsyscall_gtod(system_time + wall_time_offset, offset_base,
+							timesource, ntp_adj);
+
+	write_sequnlock_irqrestore(&system_time_lock, flags);
+
+	/* signal posix-timers about time change */
+	clock_was_set();
+
+	return 0;
+}
+EXPORT_SYMBOL(do_settimeofday);
+
+
+/**
+ * do_adjtimex - interface to the kernel NTP variables
+ * @tx: pointer to the timex value that will be used
+ *
+ * Userspace NTP daemon's interface to the kernel NTP variables
+ */
+int do_adjtimex(struct timex *tx)
+{
+	if (tx->modes && !capable(CAP_SYS_TIME))
+		return -EPERM;
+
+	/* Note: We set tx->time first,
+	 * because ntp_adjtimex uses it
+	 */
+	do_gettimeofday(&tx->time);
+
+	return ntp_adjtimex(tx);
+}
+
+
+/**
+ * timeofday_suspend_hook - allows the timeofday subsystem to be shut down
+ * @dev: unused
+ * @state: unused
+ *
+ * This function allows the timeofday subsystem to
+ * be shut down for a period of time. Useful when
+ * going into suspend/hibernate mode. The code is
+ * very similar to the first half of
+ * timeofday_periodic_hook().
+ */
+static int timeofday_suspend_hook(struct sys_device *dev, u32 state)
+{
+	unsigned long flags;
+
+	write_seqlock_irqsave(&system_time_lock, flags);
+
+	BUG_ON(time_suspend_state != TIME_RUNNING);
+
+	/* Save the suspend start time, then immediately
+	 * read __monotonic_clock(). These two reads should
+	 * happen back to back, because any delay between
+	 * them accumulates as time drift on resume.
+	 */
+	suspend_start = read_persistent_clock();
+	system_time = __monotonic_clock();
+
+	time_suspend_state = TIME_SUSPENDED;
+
+	write_sequnlock_irqrestore(&system_time_lock, flags);
+	return 0;
+}
+
+
+/**
+ * timeofday_resume_hook - Resumes the timeofday subsystem.
+ * @dev: unused
+ *
+ * This function resumes the timeofday subsystem
+ * from a previous call to timeofday_suspend_hook.
+ */
+static int timeofday_resume_hook(struct sys_device *dev)
+{
+	nsec_t now, suspend_time;
+	unsigned long flags;
+
+	write_seqlock_irqsave(&system_time_lock, flags);
+
+	BUG_ON(time_suspend_state != TIME_SUSPENDED);
+
+	/* Read persistent clock to mark the end of
+	 * the suspend interval then rebase the
+	 * offset_base to current timesource value.
+	 * Again, time between these two calls will
+	 * not be accounted for and will show up as
+	 * time drift.
+	 */
+	now = read_persistent_clock();
+	offset_base = read_timesource(timesource);
+
+	suspend_time = now - suspend_start;
+
+	system_time += suspend_time;
+
+	ntp_clear();
+
+	time_suspend_state = TIME_RUNNING;
+
+	update_legacy_time_values();
+
+	write_sequnlock_irqrestore(&system_time_lock, flags);
+
+	/* signal posix-timers about time change */
+	clock_was_set();
+
+	return 0;
+}
+
+/* sysfs resume/suspend bits */
+static struct sysdev_class timeofday_sysclass = {
+	.resume = timeofday_resume_hook,
+	.suspend = timeofday_suspend_hook,
+	set_kset_name("timeofday"),
+};
+static struct sys_device device_timer = {
+	.id	= 0,
+	.cls	= &timeofday_sysclass,
+};
+static int timeofday_init_device(void)
+{
+	int error = sysdev_class_register(&timeofday_sysclass);
+	if (!error)
+		error = sysdev_register(&device_timer);
+	return error;
+}
+device_initcall(timeofday_init_device);
+
+/**
+ * timeofday_periodic_hook - Does periodic update of timekeeping values.
+ * @unused: unused
+ *
+ * Calculates the delta since the last call,
+ * updates system time and clears the offset.
+ *
+ * Called via timeofday_timer.
+ */
+static void timeofday_periodic_hook(unsigned long unused)
+{
+	cycle_t now, cycle_delta;
+	static u64 remainder;
+	nsec_t ns, ns_ntp;
+	long leapsecond;
+	struct timesource_t* next;
+	unsigned long flags;
+	u64 mult_adj;
+	int ppm;
+
+	write_seqlock_irqsave(&system_time_lock, flags);
+
+	/* read time source & calc time since last call*/
+	now = read_timesource(timesource);
+	cycle_delta = (now - offset_base) & timesource->mask;
+
+	/* convert cycles to ntp adjusted ns and save remainder */
+	ns_ntp = cyc2ns_rem(timesource, ntp_adj, cycle_delta, &remainder);
+
+	/* convert cycles to raw ns for ntp advance */
+	ns = cyc2ns(timesource, 0, cycle_delta);
+
+#if TIME_DBG
+	static int dbg = 0;
+	if (!(dbg++ % TIME_DBG_FREQ)) {
+		printk(KERN_INFO "now: %lluc - then: %lluc = delta: %lluc -> %llu ns + %llu shift_ns (ntp_adj: %i)\n",
+			(unsigned long long)now, (unsigned long long)offset_base,
+			(unsigned long long)cycle_delta, (unsigned long long)ns,
+			(unsigned long long)remainder, ntp_adj);
+	}
+#endif
+
+	/* update system_time */
+	system_time += ns_ntp;
+
+	/* reset the offset_base */
+	offset_base = now;
+
+	/* advance the ntp state machine by ns interval*/
+	ppm = ntp_advance(ns);
+
+	/* do ntp leap second processing*/
+	leapsecond = ntp_leapsecond(ns_to_timespec(system_time+wall_time_offset));
+	wall_time_offset += leapsecond * NSEC_PER_SEC;
+
+	/* sync the persistent clock */
+	if (!(get_ntp_status() & STA_UNSYNC))
+		sync_persistent_clock(ns_to_timespec(system_time + wall_time_offset));
+
+	/* if necessary, switch timesources */
+	next = get_next_timesource();
+	if (next != timesource) {
+		/* immediately set new offset_base */
+		offset_base = read_timesource(next);
+		/* swap timesources */
+		timesource = next;
+		printk(KERN_INFO "Time: %s timesource has been installed.\n",
+					timesource->name);
+		ntp_clear();
+		ntp_adj = 0;
+		remainder = 0;
+	}
+
+	/* now is a safe time, so allow timesource to adjust
+	 * itself (for example: to make cpufreq changes).
+	 */
+	if (timesource->update_callback)
+		timesource->update_callback();
+
+
+	/* Convert the signed ppm to timesource multiplier adjustment */
+	mult_adj = abs(ppm);
+	mult_adj = mult_adj * timesource->mult;
+	mult_adj += 1000000/2; /* round for div */
+	do_div(mult_adj, 1000000);
+	if (ppm < 0)
+		ntp_adj = -(int)mult_adj;
+	else
+		ntp_adj = (int)mult_adj;
+
+
+	update_legacy_time_values();
+
+	arch_update_vsyscall_gtod(system_time + wall_time_offset, offset_base,
+							timesource, ntp_adj);
+
+	write_sequnlock_irqrestore(&system_time_lock, flags);
+
+	/* XXX - Do we need to call clock_was_set() here? */
+
+	/* Set us up to go off on the next interval */
+	mod_timer(&timeofday_timer,
+				jiffies + msecs_to_jiffies(PERIODIC_INTERVAL_MS));
+}
+
+
+/**
+ * timeofday_init - Initializes time variables
+ *
+ */
+void __init timeofday_init(void)
+{
+	unsigned long flags;
+#if TIME_DBG
+	printk(KERN_INFO "timeofday_init: Starting up!\n");
+#endif
+	write_seqlock_irqsave(&system_time_lock, flags);
+
+	/* initialize the timesource variable */
+	timesource = get_next_timesource();
+
+	/* clear and initialize offsets */
+	offset_base = read_timesource(timesource);
+	wall_time_offset = read_persistent_clock();
+
+	/* clear NTP scaling factor & state machine */
+	ntp_adj = 0;
+	ntp_clear();
+
+	/* initialize legacy time values */
+	update_legacy_time_values();
+
+	arch_update_vsyscall_gtod(system_time + wall_time_offset, offset_base,
+							timesource, ntp_adj);
+
+	write_sequnlock_irqrestore(&system_time_lock, flags);
+
+	/* Install timeofday_periodic_hook timer */
+	init_timer(&timeofday_timer);
+	timeofday_timer.function = timeofday_periodic_hook;
+	timeofday_timer.expires = jiffies + 1;
+	add_timer(&timeofday_timer);
+
+
+#if TIME_DBG
+	printk(KERN_INFO "timeofday_init: finished!\n");
+#endif
+	return;
+}
diff -ruN linux-2.6.12-rc6-mm1/kernel/timer.c linux-2.6.12-rc6-mm1-tod/kernel/timer.c
--- linux-2.6.12-rc6-mm1/kernel/timer.c	2005-06-17 15:56:34.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/kernel/timer.c	2005-06-17 18:20:15.000000000 -0700
@@ -602,6 +602,7 @@
 int tickadj = 500/HZ ? : 1;		/* microsecs */
 
 
+#ifndef CONFIG_NEWTOD
 /*
  * phase-lock loop variables
  */
@@ -832,6 +833,9 @@
 		}
 	} while (ticks);
 }
+#else /* !CONFIG_NEWTOD */
+#define update_wall_time(x)
+#endif /* !CONFIG_NEWTOD */
 
 /*
  * Called from the timer interrupt handler to charge one tick to the current 
diff -ruN linux-2.6.12-rc6-mm1/kernel/timesource.c linux-2.6.12-rc6-mm1-tod/kernel/timesource.c
--- linux-2.6.12-rc6-mm1/kernel/timesource.c	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.12-rc6-mm1-tod/kernel/timesource.c	2005-06-17 18:20:15.000000000 -0700
@@ -0,0 +1,259 @@
+/*********************************************************************
+* linux/kernel/timesource.c
+*
+* This file contains the functions which manage timesource drivers.
+*
+* Copyright (C) 2004, 2005 IBM, John Stultz (johnstul@us.ibm.com)
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+*
+* TODO WishList:
+*   o Allow timesource drivers to be unregistered
+*   o get rid of timesource_jiffies extern
+**********************************************************************/
+
+#include <linux/timesource.h>
+#include <linux/sysdev.h>
+#include <linux/init.h>
+#include <linux/module.h>
+
+#define MAX_TIMESOURCES 10
+
+
+/* XXX - Would like a better way for initializing curr_timesource */
+extern struct timesource_t timesource_jiffies;
+
+/*[Timesource internal variables]---------
+ * curr_timesource:
+ *     currently selected timesource. Initialized to timesource_jiffies.
+ * next_timesource:
+ *     pending next selected timesource.
+ * timesource_list:
+ *     array of pointers pointing to registered timesources
+ * timesource_list_counter:
+ *     value which counts the number of registered timesources
+ * timesource_lock:
+ *     protects manipulations to curr_timesource and next_timesource
+ *     and the timesource_list
+ */
+static struct timesource_t *curr_timesource = &timesource_jiffies;
+static struct timesource_t *next_timesource;
+static struct timesource_t *timesource_list[MAX_TIMESOURCES];
+static int timesource_list_counter;
+static seqlock_t timesource_lock = SEQLOCK_UNLOCKED;
+
+static char override_name[32];
+
+/**
+ * get_next_timesource - Returns the selected timesource
+ *
+ */
+struct timesource_t* get_next_timesource(void)
+{
+	write_seqlock(&timesource_lock);
+	if (next_timesource) {
+		curr_timesource = next_timesource;
+		next_timesource = NULL;
+	}
+	write_sequnlock(&timesource_lock);
+
+	return curr_timesource;
+}
+
+/**
+ * select_timesource - Finds the best registered timesource.
+ *
+ * Private function. Must have a writelock on timesource_lock
+ * when called.
+ */
+static struct timesource_t* select_timesource(void)
+{
+	struct timesource_t* best = timesource_list[0];
+	int i;
+
+	for (i=0; i < timesource_list_counter; i++) {
+		/* Check for override */
+		if ((override_name[0] != 0) &&
+			(strlen(override_name)
+				== strlen(timesource_list[i]->name)) &&
+			(!strncmp(timesource_list[i]->name, override_name,
+				 strlen(override_name)))) {
+			best = timesource_list[i];
+			break;
+		}
+		/* Pick the highest priority */
+		if (timesource_list[i]->priority > best->priority)
+		 	best = timesource_list[i];
+	}
+	return best;
+}
+
+/**
+ * register_timesource - Used to install new timesources
+ * @t: timesource to be registered
+ *
+ */
+void register_timesource(struct timesource_t* t)
+{
+	char* error_msg = 0;
+	int i;
+	write_seqlock(&timesource_lock);
+
+	/* check if timesource is already registered */
+	for (i=0; i < timesource_list_counter; i++)
+		if (!strncmp(timesource_list[i]->name, t->name, strlen(t->name))){
+			error_msg = "Already registered!";
+			break;
+		}
+
+	/* check that the list isn't full */
+	if (timesource_list_counter >= MAX_TIMESOURCES)
+		error_msg = "Too many timesources!";
+
+	if (!error_msg)
+		timesource_list[timesource_list_counter++] = t;
+	else
+		printk(KERN_WARNING "register_timesource: Cannot register %s. %s\n",
+					t->name, error_msg);
+
+	/* select next timesource */
+	next_timesource = select_timesource();
+
+	write_sequnlock(&timesource_lock);
+}
+EXPORT_SYMBOL(register_timesource);
+
+
+/**
+ * reselect_timesource - Rescan list for next timesource
+ *
+ * A quick helper function to be used if a timesource
+ * changes its priority. Forces the timesource list to
+ * be re-scaned for the best timesource.
+ */
+void reselect_timesource(void)
+{
+	write_seqlock(&timesource_lock);
+	next_timesource = select_timesource();
+	write_sequnlock(&timesource_lock);
+}
+
+/**
+ * sysfs_show_timesources - sysfs interface for listing timesources
+ * @dev: unused
+ * @buf: char buffer to be filled with timesource list
+ *
+ * Provides sysfs interface for listing registered timesources
+ */
+static ssize_t sysfs_show_timesources(struct sys_device *dev, char *buf)
+{
+	int i;
+	char* curr = buf;
+	write_seqlock(&timesource_lock);
+	for(i=0; i < timesource_list_counter; i++) {
+		/* Mark current timesource w/ a star */
+		if (timesource_list[i] == curr_timesource)
+			curr += sprintf(curr, "*");
+		curr += sprintf(curr, "%s ",timesource_list[i]->name);
+	}
+	write_sequnlock(&timesource_lock);
+
+	curr += sprintf(curr, "\n");
+	return curr - buf;
+}
+
+/**
+ * sysfs_override_timesource - interface for manually overriding timesource
+ * @dev: unused
+ * @buf: name of override timesource
+ * @count: length of the supplied name
+ *
+ * Takes input from the sysfs interface for manually
+ * overriding the default timesource selection.
+ */
+static ssize_t sysfs_override_timesource(struct sys_device *dev,
+			const char *buf, size_t count)
+{
+	/* check to avoid underflow later */
+	if (strlen(buf) == 0)
+		return count;
+
+	write_seqlock(&timesource_lock);
+
+	/* copy the name given */
+	strncpy(override_name, buf, strlen(buf)-1);
+	override_name[strlen(buf)-1] = 0;
+
+	/* see if we can find it */
+	next_timesource = select_timesource();
+
+	write_sequnlock(&timesource_lock);
+	return count;
+}
+
+/* Sysfs setup bits:
+ */
+static SYSDEV_ATTR(timesource, 0600, sysfs_show_timesources, sysfs_override_timesource);
+
+static struct sysdev_class timesource_sysclass = {
+	set_kset_name("timesource"),
+};
+
+static struct sys_device device_timesource = {
+	.id	= 0,
+	.cls	= &timesource_sysclass,
+};
+
+static int init_timesource_sysfs(void)
+{
+	int error = sysdev_class_register(&timesource_sysclass);
+	if (!error) {
+		error = sysdev_register(&device_timesource);
+		if (!error)
+			error = sysdev_create_file(&device_timesource, &attr_timesource);
+	}
+	return error;
+}
+device_initcall(init_timesource_sysfs);
+
+
+/**
+ * boot_override_timesource - boot time override
+ * @str: override name
+ *
+ * Takes a timesource= boot argument and uses it
+ * as the timesource override name
+ */
+static int __init boot_override_timesource(char* str)
+{
+	if (str)
+		strlcpy(override_name, str, sizeof(override_name));
+	return 1;
+}
+__setup("timesource=", boot_override_timesource);
+
+/**
+ * boot_override_clock - Compatibility layer for deprecated boot option
+ * @str: override name
+ *
+ * DEPRECATED! Takes a clock= boot argument and uses it
+ * as the timesource override name
+ */
+static int __init boot_override_clock(char* str)
+{
+	printk(KERN_WARNING "clock= boot option is deprecated, use timesource= instead.\n");
+	return boot_override_timesource(str);
+}
+__setup("clock=", boot_override_clock);



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 2/6] new timeofday i386 arch specific changes, part 1 for -mm (v.B3)
  2005-06-18  2:56 [PATCH 1/6] new timeofday core subsystem for -mm (v.B3) john stultz
@ 2005-06-18  2:58 ` john stultz
  2005-06-18  2:59   ` [PATCH 3/6] new timeofday i386 arch specific changes, part 2 " john stultz
  2005-06-18 12:02 ` [PATCH 1/6] new timeofday core subsystem " Roman Zippel
  1 sibling, 1 reply; 32+ messages in thread
From: john stultz @ 2005-06-18  2:58 UTC (permalink / raw)
  To: lkml, Andrew Morton
  Cc: Tim Schmielau, George Anzinger, albert, Ulrich Windl,
	Christoph Lameter, Dominik Brodowski, David Mosberger, Andi Kleen,
	paulus, schwidefsky, keith maanthey, Chris McDermott, Max Asbock,
	mahuja, Nishanth Aravamudan, Darren Hart, Darrick J. Wong,
	Anton Blanchard, donf, mpm, benh, kernel-stuff, frank

Andrew, All,
	To hopefully improve the review-ability of my changes, I've split up my
arch-i386 patch into four chunks. This patch is just a simple cleanup
for the i386 arch in preparation for moving to the new timeofday
infrastructure. It simply moves some code from timer_pit.c to i8259.c.
	
It applies on top of my timeofday-core_B3 patch. This patch is part of
the timeofday-arch-i386 patchset, so without the following parts it is
not expected to compile (although just this one should).
	
Andrew, please consider for inclusion for testing into your tree.

thanks
-john

Signed-off-by: John Stultz <johnstul@us.ibm.com>

linux-2.6.12-rc6-mm1_timeofday-arch-i386-part1_B3.patch
=======================================================
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/i8259.c linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/i8259.c
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/i8259.c	2005-06-17 15:56:27.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/i8259.c	2005-06-17 18:28:05.576479704 -0700
@@ -400,6 +400,46 @@
 	}
 }
 
+void setup_pit_timer(void)
+{
+	extern spinlock_t i8253_lock;
+	unsigned long flags;
+
+	spin_lock_irqsave(&i8253_lock, flags);
+	outb_p(0x34,PIT_MODE);		/* binary, mode 2, LSB/MSB, ch 0 */
+	udelay(10);
+	outb_p(LATCH & 0xff , PIT_CH0);	/* LSB */
+	udelay(10);
+	outb(LATCH >> 8 , PIT_CH0);	/* MSB */
+	spin_unlock_irqrestore(&i8253_lock, flags);
+}
+
+static int timer_resume(struct sys_device *dev)
+{
+	setup_pit_timer();
+	return 0;
+}
+
+static struct sysdev_class timer_sysclass = {
+	set_kset_name("timer_pit"),
+	.resume	= timer_resume,
+};
+
+static struct sys_device device_timer = {
+	.id	= 0,
+	.cls	= &timer_sysclass,
+};
+
+static int __init init_timer_sysfs(void)
+{
+	int error = sysdev_class_register(&timer_sysclass);
+	if (!error)
+		error = sysdev_register(&device_timer);
+	return error;
+}
+
+device_initcall(init_timer_sysfs);
+
 void __init init_IRQ(void)
 {
 	int i;
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer_pit.c linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer_pit.c
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer_pit.c	2005-06-17 15:56:27.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer_pit.c	2005-06-17 18:28:05.586478462 -0700
@@ -163,44 +163,3 @@
 	.init = init_pit, 
 	.opts = &timer_pit,
 };
-
-void setup_pit_timer(void)
-{
-	extern spinlock_t i8253_lock;
-	unsigned long flags;
-
-	spin_lock_irqsave(&i8253_lock, flags);
-	outb_p(0x34,PIT_MODE);		/* binary, mode 2, LSB/MSB, ch 0 */
-	udelay(10);
-	outb_p(LATCH & 0xff , PIT_CH0);	/* LSB */
-	udelay(10);
-	outb(LATCH >> 8 , PIT_CH0);	/* MSB */
-	spin_unlock_irqrestore(&i8253_lock, flags);
-}
-
-static int timer_resume(struct sys_device *dev)
-{
-	setup_pit_timer();
-	return 0;
-}
-
-static struct sysdev_class timer_sysclass = {
-	set_kset_name("timer_pit"),
-	.resume	= timer_resume,
-};
-
-static struct sys_device device_timer = {
-	.id	= 0,
-	.cls	= &timer_sysclass,
-};
-
-static int __init init_timer_sysfs(void)
-{
-	int error = sysdev_class_register(&timer_sysclass);
-	if (!error)
-		error = sysdev_register(&device_timer);
-	return error;
-}
-
-device_initcall(init_timer_sysfs);
-



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 3/6] new timeofday i386 arch specific changes, part 2 for -mm (v.B3)
  2005-06-18  2:58 ` [PATCH 2/6] new timeofday i386 arch specific changes, part 1 " john stultz
@ 2005-06-18  2:59   ` john stultz
  2005-06-18  3:01     ` [PATCH 4/6] new timeofday i386 arch specific changes, part 3 " john stultz
  0 siblings, 1 reply; 32+ messages in thread
From: john stultz @ 2005-06-18  2:59 UTC (permalink / raw)
  To: lkml, Andrew Morton
  Cc: Tim Schmielau, George Anzinger, albert, Ulrich Windl,
	Christoph Lameter, Dominik Brodowski, David Mosberger, Andi Kleen,
	paulus, schwidefsky, keith maanthey, Chris McDermott, Max Asbock,
	mahuja, Nishanth Aravamudan, Darren Hart, Darrick J. Wong,
	Anton Blanchard, donf, mpm, benh, kernel-stuff, frank

Andrew, All,
	To hopefully improve the review-ability of my changes, I've split up my
arch-i386 patch into four chunks. This patch is a cleanup patch for the
i386 arch in preparation for moving to the new timeofday infrastructure.
It moves some code from timer_tsc.c to a new tsc.c file.

It applies on top of my timeofday-arch-i386-part1_B3 patch. This patch
is part of the timeofday-arch-i386 patchset, so without the following parts
it is not expected to compile.
	
Andrew, please consider for inclusion for testing into your tree.

thanks
-john

Signed-off-by: John Stultz <johnstul@us.ibm.com>

linux-2.6.12-rc6-mm1_timeofday-arch-i386-part2_B3.patch
=======================================================
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/Makefile linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/Makefile
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/Makefile	2005-06-17 18:52:02.090991765 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/Makefile	2005-06-17 18:35:08.000000000 -0700
@@ -7,7 +7,7 @@
 obj-y	:= process.o semaphore.o signal.o entry.o traps.o irq.o vm86.o \
 		ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
 		pci-dma.o i386_ksyms.o i387.o dmi_scan.o bootflag.o \
-		doublefault.o quirks.o
+		doublefault.o quirks.o tsc.o
 
 obj-y				+= cpu/
 obj-y				+= timers/
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/common.c linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/common.c
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/common.c	2005-06-17 18:52:02.091991641 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/common.c	2005-06-17 18:35:08.000000000 -0700
@@ -14,66 +14,6 @@
 
 #include "mach_timer.h"
 
-/* ------ Calibrate the TSC -------
- * Return 2^32 * (1 / (TSC clocks per usec)) for do_fast_gettimeoffset().
- * Too much 64-bit arithmetic here to do this cleanly in C, and for
- * accuracy's sake we want to keep the overhead on the CTC speaker (channel 2)
- * output busy loop as low as possible. We avoid reading the CTC registers
- * directly because of the awkward 8-bit access mechanism of the 82C54
- * device.
- */
-
-#define CALIBRATE_TIME	(5 * 1000020/HZ)
-
-unsigned long calibrate_tsc(void)
-{
-	mach_prepare_counter();
-
-	{
-		unsigned long startlow, starthigh;
-		unsigned long endlow, endhigh;
-		unsigned long count;
-
-		rdtsc(startlow,starthigh);
-		mach_countup(&count);
-		rdtsc(endlow,endhigh);
-
-
-		/* Error: ECTCNEVERSET */
-		if (count <= 1)
-			goto bad_ctc;
-
-		/* 64-bit subtract - gcc just messes up with long longs */
-		__asm__("subl %2,%0\n\t"
-			"sbbl %3,%1"
-			:"=a" (endlow), "=d" (endhigh)
-			:"g" (startlow), "g" (starthigh),
-			 "0" (endlow), "1" (endhigh));
-
-		/* Error: ECPUTOOFAST */
-		if (endhigh)
-			goto bad_ctc;
-
-		/* Error: ECPUTOOSLOW */
-		if (endlow <= CALIBRATE_TIME)
-			goto bad_ctc;
-
-		__asm__("divl %2"
-			:"=a" (endlow), "=d" (endhigh)
-			:"r" (endlow), "0" (0), "1" (CALIBRATE_TIME));
-
-		return endlow;
-	}
-
-	/*
-	 * The CTC wasn't reliable: we got a hit on the very first read,
-	 * or the CPU was so fast/slow that the quotient wouldn't fit in
-	 * 32 bits..
-	 */
-bad_ctc:
-	return 0;
-}
-
 #ifdef CONFIG_HPET_TIMER
 /* ------ Calibrate the TSC using HPET -------
  * Return 2^32 * (1 / (TSC clocks per usec)) for getting the CPU freq.
@@ -148,24 +88,3 @@
 }
 
 
-/* calculate cpu_khz */
-void init_cpu_khz(void)
-{
-	if (cpu_has_tsc) {
-		unsigned long tsc_quotient = calibrate_tsc();
-		if (tsc_quotient) {
-			/* report CPU clock rate in Hz.
-			 * The formula is (10^6 * 2^32) / (2^32 * 1 / (clocks/us)) =
-			 * clock/second. Our precision is about 100 ppm.
-			 */
-			{	unsigned long eax=0, edx=1000;
-				__asm__("divl %2"
-		       		:"=a" (cpu_khz), "=d" (edx)
-        	       		:"r" (tsc_quotient),
-	                	"0" (eax), "1" (edx));
-				printk("Detected %lu.%03lu MHz processor.\n", cpu_khz / 1000, cpu_khz % 1000);
-			}
-		}
-	}
-}
-
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer.c linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer.c
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer.c	2005-06-17 15:56:25.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer.c	2005-06-17 18:52:43.243878067 -0700
@@ -64,12 +64,3 @@
 	panic("select_timer: Cannot find a suitable timer\n");
 	return NULL;
 }
-
-int read_current_timer(unsigned long *timer_val)
-{
-	if (cur_timer->read_timer) {
-		*timer_val = cur_timer->read_timer();
-		return 0;
-	}
-	return -1;
-}
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer_tsc.c linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer_tsc.c
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer_tsc.c	2005-06-17 18:52:02.096991020 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer_tsc.c	2005-06-17 18:36:48.000000000 -0700
@@ -31,10 +31,6 @@
 static struct timer_opts timer_tsc;
 #endif
 
-static inline void cpufreq_delayed_get(void);
-
-int tsc_disable __devinitdata = 0;
-
 extern spinlock_t i8253_lock;
 
 static int use_tsc;
@@ -46,34 +42,6 @@
 static unsigned long long monotonic_base;
 static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
 
-/* convert from cycles(64bits) => nanoseconds (64bits)
- *  basic equation:
- *		ns = cycles / (freq / ns_per_sec)
- *		ns = cycles * (ns_per_sec / freq)
- *		ns = cycles * (10^9 / (cpu_mhz * 10^6))
- *		ns = cycles * (10^3 / cpu_mhz)
- *
- *	Then we use scaling math (suggested by george@mvista.com) to get:
- *		ns = cycles * (10^3 * SC / cpu_mhz) / SC
- *		ns = cycles * cyc2ns_scale / SC
- *
- *	And since SC is a constant power of two, we can convert the div
- *  into a shift.   
- *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
- */
-static unsigned long cyc2ns_scale; 
-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
-
-static inline void set_cyc2ns_scale(unsigned long cpu_mhz)
-{
-	cyc2ns_scale = (1000 << CYC2NS_SCALE_FACTOR)/cpu_mhz;
-}
-
-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
-{
-	return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
-}
-
 static int count2; /* counter for mark_offset_tsc() */
 
 /* Cached *multiplier* to convert TSC counts to microseconds.
@@ -131,29 +99,6 @@
 	return base + cycles_2_ns(this_offset - last_offset);
 }
 
-/*
- * Scheduler clock - returns current time in nanosec units.
- */
-unsigned long long sched_clock(void)
-{
-	unsigned long long this_offset;
-
-	/*
-	 * In the NUMA case we dont use the TSC as they are not
-	 * synchronized across all CPUs.
-	 */
-#ifndef CONFIG_NUMA
-	if (!use_tsc)
-#endif
-		/* no locking but a rare wrong value is not a big deal */
-		return jiffies_64 * (1000000000 / HZ);
-
-	/* Read the Time Stamp Counter */
-	rdtscll(this_offset);
-
-	/* return the value in ns */
-	return cycles_2_ns(this_offset);
-}
 
 static void delay_tsc(unsigned long loops)
 {
@@ -218,128 +163,6 @@
 #endif
 
 
-#ifdef CONFIG_CPU_FREQ
-#include <linux/workqueue.h>
-
-static unsigned int cpufreq_delayed_issched = 0;
-static unsigned int cpufreq_init = 0;
-static struct work_struct cpufreq_delayed_get_work;
-
-static void handle_cpufreq_delayed_get(void *v)
-{
-	unsigned int cpu;
-	for_each_online_cpu(cpu) {
-		cpufreq_get(cpu);
-	}
-	cpufreq_delayed_issched = 0;
-}
-
-/* if we notice lost ticks, schedule a call to cpufreq_get() as it tries
- * to verify the CPU frequency the timing core thinks the CPU is running
- * at is still correct.
- */
-static inline void cpufreq_delayed_get(void) 
-{
-	if (cpufreq_init && !cpufreq_delayed_issched) {
-		cpufreq_delayed_issched = 1;
-		printk(KERN_DEBUG "Losing some ticks... checking if CPU frequency changed.\n");
-		schedule_work(&cpufreq_delayed_get_work);
-	}
-}
-
-/* If the CPU frequency is scaled, TSC-based delays will need a different
- * loops_per_jiffy value to function properly.
- */
-
-static unsigned int  ref_freq = 0;
-static unsigned long loops_per_jiffy_ref = 0;
-
-#ifndef CONFIG_SMP
-static unsigned long fast_gettimeoffset_ref = 0;
-static unsigned long cpu_khz_ref = 0;
-#endif
-
-static int
-time_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
-		       void *data)
-{
-	struct cpufreq_freqs *freq = data;
-
-	if (val != CPUFREQ_RESUMECHANGE)
-		write_seqlock_irq(&xtime_lock);
-	if (!ref_freq) {
-		ref_freq = freq->old;
-		loops_per_jiffy_ref = cpu_data[freq->cpu].loops_per_jiffy;
-#ifndef CONFIG_SMP
-		fast_gettimeoffset_ref = fast_gettimeoffset_quotient;
-		cpu_khz_ref = cpu_khz;
-#endif
-	}
-
-	if ((val == CPUFREQ_PRECHANGE  && freq->old < freq->new) ||
-	    (val == CPUFREQ_POSTCHANGE && freq->old > freq->new) ||
-	    (val == CPUFREQ_RESUMECHANGE)) {
-		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
-			cpu_data[freq->cpu].loops_per_jiffy = cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
-#ifndef CONFIG_SMP
-		if (cpu_khz)
-			cpu_khz = cpufreq_scale(cpu_khz_ref, ref_freq, freq->new);
-		if (use_tsc) {
-			if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
-				fast_gettimeoffset_quotient = cpufreq_scale(fast_gettimeoffset_ref, freq->new, ref_freq);
-				set_cyc2ns_scale(cpu_khz/1000);
-			}
-		}
-#endif
-	}
-
-	if (val != CPUFREQ_RESUMECHANGE)
-		write_sequnlock_irq(&xtime_lock);
-
-	return 0;
-}
-
-static struct notifier_block time_cpufreq_notifier_block = {
-	.notifier_call	= time_cpufreq_notifier
-};
-
-
-static int __init cpufreq_tsc(void)
-{
-	int ret;
-	INIT_WORK(&cpufreq_delayed_get_work, handle_cpufreq_delayed_get, NULL);
-	ret = cpufreq_register_notifier(&time_cpufreq_notifier_block,
-					CPUFREQ_TRANSITION_NOTIFIER);
-	if (!ret)
-		cpufreq_init = 1;
-	return ret;
-}
-core_initcall(cpufreq_tsc);
-
-#else /* CONFIG_CPU_FREQ */
-static inline void cpufreq_delayed_get(void) { return; }
-#endif 
-
-int recalibrate_cpu_khz(void)
-{
-#ifndef CONFIG_SMP
-	unsigned long cpu_khz_old = cpu_khz;
-
-	if (cpu_has_tsc) {
-		init_cpu_khz();
-		cpu_data[0].loops_per_jiffy =
-		    cpufreq_scale(cpu_data[0].loops_per_jiffy,
-			          cpu_khz_old,
-				  cpu_khz);
-		return 0;
-	} else
-		return -ENODEV;
-#else
-	return -ENODEV;
-#endif
-}
-EXPORT_SYMBOL(recalibrate_cpu_khz);
-
 static void mark_offset_tsc(void)
 {
 	unsigned long lost,delay;
@@ -543,24 +366,6 @@
 	return -ENODEV;
 }
 
-#ifndef CONFIG_X86_TSC
-/* disable flag for tsc.  Takes effect by clearing the TSC cpu flag
- * in cpu/common.c */
-static int __init tsc_setup(char *str)
-{
-	tsc_disable = 1;
-	return 1;
-}
-#else
-static int __init tsc_setup(char *str)
-{
-	printk(KERN_WARNING "notsc: Kernel compiled with CONFIG_X86_TSC, "
-				"cannot disable TSC.\n");
-	return 1;
-}
-#endif
-__setup("notsc", tsc_setup);
-
 
 
 /************************************************************/
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/tsc.c linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/tsc.c
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/tsc.c	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/tsc.c	2005-06-17 18:52:39.975284226 -0700
@@ -0,0 +1,298 @@
+/*
+ * This code largely moved from arch/i386/kernel/timer/timer_tsc.c
+ * which was originally moved from arch/i386/kernel/time.c.
+ * See comments there for proper credits.
+ */
+
+#include <linux/init.h>
+#include <linux/timex.h>
+#include <linux/cpufreq.h>
+#include <asm/io.h>
+#include "mach_timer.h"
+
+int tsc_disable __initdata = 0;
+#ifndef CONFIG_X86_TSC
+/* disable flag for tsc.  Takes effect by clearing the TSC cpu flag
+ * in cpu/common.c */
+static int __init tsc_setup(char *str)
+{
+	tsc_disable = 1;
+	return 1;
+}
+#else
+static int __init tsc_setup(char *str)
+{
+	printk(KERN_WARNING "notsc: Kernel compiled with CONFIG_X86_TSC, "
+				"cannot disable TSC.\n");
+	return 1;
+}
+#endif
+__setup("notsc", tsc_setup);
+
+
+int read_current_timer(unsigned long *timer_val)
+{
+	if (cur_timer->read_timer) {
+		*timer_val = cur_timer->read_timer();
+		return 0;
+	}
+	return -1;
+}
+
+
+/* convert from cycles(64bits) => nanoseconds (64bits)
+ *  basic equation:
+ *		ns = cycles / (freq / ns_per_sec)
+ *		ns = cycles * (ns_per_sec / freq)
+ *		ns = cycles * (10^9 / (cpu_mhz * 10^6))
+ *		ns = cycles * (10^3 / cpu_mhz)
+ *
+ *	Then we use scaling math (suggested by george@mvista.com) to get:
+ *		ns = cycles * (10^3 * SC / cpu_mhz) / SC
+ *		ns = cycles * cyc2ns_scale / SC
+ *
+ *	And since SC is a constant power of two, we can convert the div
+ *  into a shift.
+ *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
+ */
+static unsigned long cyc2ns_scale;
+#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
+
+static inline void set_cyc2ns_scale(unsigned long cpu_mhz)
+{
+	cyc2ns_scale = (1000 << CYC2NS_SCALE_FACTOR)/cpu_mhz;
+}
+
+static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+	return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
+}
+
+/*
+ * Scheduler clock - returns current time in nanosec units.
+ */
+unsigned long long sched_clock(void)
+{
+	unsigned long long this_offset;
+
+	/*
+	 * In the NUMA case we dont use the TSC as they are not
+	 * synchronized across all CPUs.
+	 */
+#ifndef CONFIG_NUMA
+	if (!use_tsc)
+#endif
+		/* no locking but a rare wrong value is not a big deal */
+		return jiffies_64 * (1000000000 / HZ);
+
+	/* Read the Time Stamp Counter */
+	rdtscll(this_offset);
+
+	/* return the value in ns */
+	return cycles_2_ns(this_offset);
+}
+
+/* ------ Calibrate the TSC -------
+ * Return 2^32 * (1 / (TSC clocks per usec)) for do_fast_gettimeoffset().
+ * Too much 64-bit arithmetic here to do this cleanly in C, and for
+ * accuracy's sake we want to keep the overhead on the CTC speaker (channel 2)
+ * output busy loop as low as possible. We avoid reading the CTC registers
+ * directly because of the awkward 8-bit access mechanism of the 82C54
+ * device.
+ */
+
+#define CALIBRATE_TIME	(5 * 1000020/HZ)
+
+unsigned long calibrate_tsc(void)
+{
+	mach_prepare_counter();
+
+	{
+		unsigned long startlow, starthigh;
+		unsigned long endlow, endhigh;
+		unsigned long count;
+
+		rdtsc(startlow,starthigh);
+		mach_countup(&count);
+		rdtsc(endlow,endhigh);
+
+
+		/* Error: ECTCNEVERSET */
+		if (count <= 1)
+			goto bad_ctc;
+
+		/* 64-bit subtract - gcc just messes up with long longs */
+		__asm__("subl %2,%0\n\t"
+			"sbbl %3,%1"
+			:"=a" (endlow), "=d" (endhigh)
+			:"g" (startlow), "g" (starthigh),
+			 "0" (endlow), "1" (endhigh));
+
+		/* Error: ECPUTOOFAST */
+		if (endhigh)
+			goto bad_ctc;
+
+		/* Error: ECPUTOOSLOW */
+		if (endlow <= CALIBRATE_TIME)
+			goto bad_ctc;
+
+		__asm__("divl %2"
+			:"=a" (endlow), "=d" (endhigh)
+			:"r" (endlow), "0" (0), "1" (CALIBRATE_TIME));
+
+		return endlow;
+	}
+
+	/*
+	 * The CTC wasn't reliable: we got a hit on the very first read,
+	 * or the CPU was so fast/slow that the quotient wouldn't fit in
+	 * 32 bits..
+	 */
+bad_ctc:
+	return 0;
+}
+
+int recalibrate_cpu_khz(void)
+{
+#ifndef CONFIG_SMP
+	unsigned long cpu_khz_old = cpu_khz;
+
+	if (cpu_has_tsc) {
+		init_cpu_khz();
+		cpu_data[0].loops_per_jiffy =
+		    cpufreq_scale(cpu_data[0].loops_per_jiffy,
+			          cpu_khz_old,
+				  cpu_khz);
+		return 0;
+	} else
+		return -ENODEV;
+#else
+	return -ENODEV;
+#endif
+}
+EXPORT_SYMBOL(recalibrate_cpu_khz);
+
+
+/* calculate cpu_khz */
+void init_cpu_khz(void)
+{
+	if (cpu_has_tsc) {
+		unsigned long tsc_quotient = calibrate_tsc();
+		if (tsc_quotient) {
+			/* report CPU clock rate in Hz.
+			 * The formula is (10^6 * 2^32) / (2^32 * 1 / (clocks/us)) =
+			 * clock/second. Our precision is about 100 ppm.
+			 */
+			{	unsigned long eax=0, edx=1000;
+				__asm__("divl %2"
+		       		:"=a" (cpu_khz), "=d" (edx)
+        	       		:"r" (tsc_quotient),
+	                	"0" (eax), "1" (edx));
+				printk("Detected %lu.%03lu MHz processor.\n", cpu_khz / 1000, cpu_khz % 1000);
+			}
+		}
+	}
+}
+
+
+#ifdef CONFIG_CPU_FREQ
+#include <linux/workqueue.h>
+
+static unsigned int cpufreq_delayed_issched = 0;
+static unsigned int cpufreq_init = 0;
+static struct work_struct cpufreq_delayed_get_work;
+
+static void handle_cpufreq_delayed_get(void *v)
+{
+	unsigned int cpu;
+	for_each_online_cpu(cpu) {
+		cpufreq_get(cpu);
+	}
+	cpufreq_delayed_issched = 0;
+}
+
+/* if we notice lost ticks, schedule a call to cpufreq_get() as it tries
+ * to verify the CPU frequency the timing core thinks the CPU is running
+ * at is still correct.
+ */
+void cpufreq_delayed_get(void)
+{
+	if (cpufreq_init && !cpufreq_delayed_issched) {
+		cpufreq_delayed_issched = 1;
+		printk(KERN_DEBUG "Losing some ticks... checking if CPU frequency changed.\n");
+		schedule_work(&cpufreq_delayed_get_work);
+	}
+}
+
+/* If the CPU frequency is scaled, TSC-based delays will need a different
+ * loops_per_jiffy value to function properly.
+ */
+
+static unsigned int  ref_freq = 0;
+static unsigned long loops_per_jiffy_ref = 0;
+
+#ifndef CONFIG_SMP
+static unsigned long fast_gettimeoffset_ref = 0;
+static unsigned long cpu_khz_ref = 0;
+#endif
+
+static int
+time_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
+		       void *data)
+{
+	struct cpufreq_freqs *freq = data;
+
+	if (val != CPUFREQ_RESUMECHANGE)
+		write_seqlock_irq(&xtime_lock);
+	if (!ref_freq) {
+		ref_freq = freq->old;
+		loops_per_jiffy_ref = cpu_data[freq->cpu].loops_per_jiffy;
+#ifndef CONFIG_SMP
+		fast_gettimeoffset_ref = fast_gettimeoffset_quotient;
+		cpu_khz_ref = cpu_khz;
+#endif
+	}
+
+	if ((val == CPUFREQ_PRECHANGE  && freq->old < freq->new) ||
+	    (val == CPUFREQ_POSTCHANGE && freq->old > freq->new) ||
+	    (val == CPUFREQ_RESUMECHANGE)) {
+		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
+			cpu_data[freq->cpu].loops_per_jiffy = cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
+#ifndef CONFIG_SMP
+		if (cpu_khz)
+			cpu_khz = cpufreq_scale(cpu_khz_ref, ref_freq, freq->new);
+		if (use_tsc) {
+			if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
+				fast_gettimeoffset_quotient = cpufreq_scale(fast_gettimeoffset_ref, freq->new, ref_freq);
+				set_cyc2ns_scale(cpu_khz/1000);
+			}
+		}
+#endif
+	}
+
+	if (val != CPUFREQ_RESUMECHANGE)
+		write_sequnlock_irq(&xtime_lock);
+
+	return 0;
+}
+
+static struct notifier_block time_cpufreq_notifier_block = {
+	.notifier_call	= time_cpufreq_notifier
+};
+
+
+static int __init cpufreq_tsc(void)
+{
+	int ret;
+	INIT_WORK(&cpufreq_delayed_get_work, handle_cpufreq_delayed_get, NULL);
+	ret = cpufreq_register_notifier(&time_cpufreq_notifier_block,
+					CPUFREQ_TRANSITION_NOTIFIER);
+	if (!ret)
+		cpufreq_init = 1;
+	return ret;
+}
+core_initcall(cpufreq_tsc);
+
+#else /* CONFIG_CPU_FREQ */
+void cpufreq_delayed_get(void) { return; }
+#endif



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 4/6] new timeofday i386 arch specific changes, part 3 for -mm (v.B3)
  2005-06-18  2:59   ` [PATCH 3/6] new timeofday i386 arch specific changes, part 2 " john stultz
@ 2005-06-18  3:01     ` john stultz
  2005-06-18  3:02       ` [PATCH 5/6] new timeofday i386 arch specific changes, part 4 " john stultz
  0 siblings, 1 reply; 32+ messages in thread
From: john stultz @ 2005-06-18  3:01 UTC (permalink / raw)
  To: lkml, Andrew Morton
  Cc: Tim Schmielau, George Anzinger, albert, Ulrich Windl,
	Christoph Lameter, Dominik Brodowski, David Mosberger, Andi Kleen,
	paulus, schwidefsky, keith maanthey, Chris McDermott, Max Asbock,
	mahuja, Nishanth Aravamudan, Darren Hart, Darrick J. Wong,
	Anton Blanchard, donf, mpm, benh, kernel-stuff, frank

Andrew, All,
	To hopefully improve the reviewability of my changes, I've split up my
arch-i386 patch into four chunks. This patch reworks some of the code in
the new tsc.c file. Additionally, it adds some new interfaces and the
hooks to use them appropriately. This patch also renames some ACPI PM
variables.

It applies on top of my timeofday-arch-i386-part2_B3 patch. This patch
is part of the timeofday-arch-i386 patchset, so without the following
parts it is not expected to compile.
	
Andrew, please consider this for inclusion in your tree for testing.

thanks
-john

Signed-off-by: John Stultz <johnstul@us.ibm.com>

linux-2.6.12-rc6-mm1_timeofday-arch-i386-part3_B3.patch
=======================================================
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/acpi/boot.c linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/acpi/boot.c
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/acpi/boot.c	2005-06-17 19:32:40.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/acpi/boot.c	2005-06-17 19:34:48.010995694 -0700
@@ -607,7 +607,8 @@
 #endif
 
 #ifdef CONFIG_X86_PM_TIMER
-extern u32 pmtmr_ioport;
+u32 acpi_pmtmr_ioport;
+int acpi_pmtmr_buggy;
 #endif
 
 static int __init acpi_parse_fadt(unsigned long phys, unsigned long size)
@@ -638,13 +639,13 @@
 		if (fadt->xpm_tmr_blk.address_space_id != ACPI_ADR_SPACE_SYSTEM_IO)
 			return 0;
 
-		pmtmr_ioport = fadt->xpm_tmr_blk.address;
+		acpi_pmtmr_ioport = fadt->xpm_tmr_blk.address;
 	} else {
 		/* FADT rev. 1 */
-		pmtmr_ioport = fadt->V1_pm_tmr_blk;
+		acpi_pmtmr_ioport = fadt->V1_pm_tmr_blk;
 	}
-	if (pmtmr_ioport)
-		printk(KERN_INFO PREFIX "PM-Timer IO Port: %#x\n", pmtmr_ioport);
+	if (acpi_pmtmr_ioport)
+		printk(KERN_INFO PREFIX "PM-Timer IO Port: %#x\n", acpi_pmtmr_ioport);
 #endif
 	return 0;
 }
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/setup.c linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/setup.c
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/setup.c	2005-06-17 19:32:40.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/setup.c	2005-06-17 19:34:48.012995445 -0700
@@ -1610,6 +1610,7 @@
 	conswitchp = &dummy_con;
 #endif
 #endif
+	tsc_init();
 }
 
 #include "setup_arch_post.h"
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/tsc.c linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/tsc.c
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/tsc.c	2005-06-17 19:32:40.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/tsc.c	2005-06-17 19:35:10.108247154 -0700
@@ -5,11 +5,17 @@
  */
 
 #include <linux/init.h>
-#include <linux/timex.h>
 #include <linux/cpufreq.h>
+#include <asm/tsc.h>
 #include <asm/io.h>
 #include "mach_timer.h"
 
+/* On some systems the TSC frequency does not
+ * change with the cpu frequency. So we need
+ * an extra value to store the TSC freq
+ */
+unsigned long tsc_khz;
+
 int tsc_disable __initdata = 0;
 #ifndef CONFIG_X86_TSC
 /* disable flag for tsc.  Takes effect by clearing the TSC cpu flag
@@ -32,15 +38,43 @@
 
 int read_current_timer(unsigned long *timer_val)
 {
-	if (cur_timer->read_timer) {
-		*timer_val = cur_timer->read_timer();
+	if (!tsc_disable && cpu_khz) {
+		rdtscl(*timer_val);
 		return 0;
 	}
 	return -1;
 }
 
+/* Code to mark and check if the TSC is unstable
+ * due to cpufreq or due to unsynced TSCs
+ */
+static int tsc_unstable;
+int check_tsc_unstable(void)
+{
+	return tsc_unstable;
+}
+
+void mark_tsc_unstable(void)
+{
+	tsc_unstable = 1;
+}
+
+/* Code to compensate for C3 stalls */
+static u64 tsc_c3_offset;
+void tsc_c3_compensate(unsigned long usecs)
+{
+	u64 cycles = (usecs * tsc_khz)/1000;
+	tsc_c3_offset += cycles;
+}
+
+u64 tsc_read_c3_time(void)
+{
+	return tsc_c3_offset;
+}
+
 
-/* convert from cycles(64bits) => nanoseconds (64bits)
+/* Accelerators for sched_clock()
+ * convert from cycles(64bits) => nanoseconds (64bits)
  *  basic equation:
  *		ns = cycles / (freq / ns_per_sec)
  *		ns = cycles * (ns_per_sec / freq)
@@ -80,76 +114,54 @@
 	 * synchronized across all CPUs.
 	 */
 #ifndef CONFIG_NUMA
-	if (!use_tsc)
+	if (!cpu_khz || check_tsc_unstable())
 #endif
 		/* no locking but a rare wrong value is not a big deal */
-		return jiffies_64 * (1000000000 / HZ);
+		return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);
 
 	/* Read the Time Stamp Counter */
 	rdtscll(this_offset);
+	this_offset += tsc_read_c3_time();
 
 	/* return the value in ns */
 	return cycles_2_ns(this_offset);
 }
 
-/* ------ Calibrate the TSC -------
- * Return 2^32 * (1 / (TSC clocks per usec)) for do_fast_gettimeoffset().
- * Too much 64-bit arithmetic here to do this cleanly in C, and for
- * accuracy's sake we want to keep the overhead on the CTC speaker (channel 2)
- * output busy loop as low as possible. We avoid reading the CTC registers
- * directly because of the awkward 8-bit access mechanism of the 82C54
- * device.
- */
-
-#define CALIBRATE_TIME	(5 * 1000020/HZ)
 
-unsigned long calibrate_tsc(void)
+static unsigned long calculate_cpu_khz(void)
 {
-	mach_prepare_counter();
-
-	{
-		unsigned long startlow, starthigh;
-		unsigned long endlow, endhigh;
-		unsigned long count;
-
-		rdtsc(startlow,starthigh);
+	unsigned long long start, end;
+	unsigned long count;
+	u64 delta64;
+	int i;
+	/* run 3 times to ensure the cache is warm */
+	for(i=0; i<3; i++) {
+		mach_prepare_counter();
+		rdtscll(start);
 		mach_countup(&count);
-		rdtsc(endlow,endhigh);
-
-
-		/* Error: ECTCNEVERSET */
-		if (count <= 1)
-			goto bad_ctc;
-
-		/* 64-bit subtract - gcc just messes up with long longs */
-		__asm__("subl %2,%0\n\t"
-			"sbbl %3,%1"
-			:"=a" (endlow), "=d" (endhigh)
-			:"g" (startlow), "g" (starthigh),
-			 "0" (endlow), "1" (endhigh));
-
-		/* Error: ECPUTOOFAST */
-		if (endhigh)
-			goto bad_ctc;
-
-		/* Error: ECPUTOOSLOW */
-		if (endlow <= CALIBRATE_TIME)
-			goto bad_ctc;
-
-		__asm__("divl %2"
-			:"=a" (endlow), "=d" (endhigh)
-			:"r" (endlow), "0" (0), "1" (CALIBRATE_TIME));
-
-		return endlow;
+		rdtscll(end);
 	}
-
-	/*
+	/* Error: ECTCNEVERSET
 	 * The CTC wasn't reliable: we got a hit on the very first read,
 	 * or the CPU was so fast/slow that the quotient wouldn't fit in
 	 * 32 bits..
 	 */
-bad_ctc:
-	return 0;
+	if (count <= 1)
+		return 0;
+
+	delta64 = end - start;
+
+	/* cpu freq too fast */
+	if(delta64 > (1ULL<<32))
+		return 0;
+	/* cpu freq too slow */
+	if (delta64 <= CALIBRATE_TIME_MSEC)
+		return 0;
+
+	delta64 += CALIBRATE_TIME_MSEC/2; /* round for do_div */
+	do_div(delta64,CALIBRATE_TIME_MSEC);
+
+	return (unsigned long)delta64;
 }
 
 int recalibrate_cpu_khz(void)
@@ -158,11 +170,11 @@
 	unsigned long cpu_khz_old = cpu_khz;
 
 	if (cpu_has_tsc) {
-		init_cpu_khz();
+		cpu_khz = calculate_cpu_khz();
+		tsc_khz = cpu_khz;
 		cpu_data[0].loops_per_jiffy =
-		    cpufreq_scale(cpu_data[0].loops_per_jiffy,
-			          cpu_khz_old,
-				  cpu_khz);
+			cpufreq_scale(cpu_data[0].loops_per_jiffy,
+					cpu_khz_old, cpu_khz);
 		return 0;
 	} else
 		return -ENODEV;
@@ -173,25 +185,21 @@
 EXPORT_SYMBOL(recalibrate_cpu_khz);
 
 
-/* calculate cpu_khz */
-void init_cpu_khz(void)
+void tsc_init(void)
 {
-	if (cpu_has_tsc) {
-		unsigned long tsc_quotient = calibrate_tsc();
-		if (tsc_quotient) {
-			/* report CPU clock rate in Hz.
-			 * The formula is (10^6 * 2^32) / (2^32 * 1 / (clocks/us)) =
-			 * clock/second. Our precision is about 100 ppm.
-			 */
-			{	unsigned long eax=0, edx=1000;
-				__asm__("divl %2"
-		       		:"=a" (cpu_khz), "=d" (edx)
-        	       		:"r" (tsc_quotient),
-	                	"0" (eax), "1" (edx));
-				printk("Detected %lu.%03lu MHz processor.\n", cpu_khz / 1000, cpu_khz % 1000);
-			}
-		}
-	}
+	if(!cpu_has_tsc || tsc_disable)
+		return;
+
+	cpu_khz = calculate_cpu_khz();
+	tsc_khz = cpu_khz;
+
+	if (!cpu_khz)
+		return;
+
+	printk("Detected %lu.%03lu MHz processor.\n",
+				cpu_khz / 1000, cpu_khz % 1000);
+
+	set_cyc2ns_scale(cpu_khz/1000);
 }
 
 
@@ -211,15 +219,15 @@
 	cpufreq_delayed_issched = 0;
 }
 
-/* if we notice lost ticks, schedule a call to cpufreq_get() as it tries
+/* if we notice cpufreq oddness, schedule a call to cpufreq_get() as it tries
  * to verify the CPU frequency the timing core thinks the CPU is running
  * at is still correct.
  */
-void cpufreq_delayed_get(void)
+static inline void cpufreq_delayed_get(void)
 {
 	if (cpufreq_init && !cpufreq_delayed_issched) {
 		cpufreq_delayed_issched = 1;
-		printk(KERN_DEBUG "Losing some ticks... checking if CPU frequency changed.\n");
+		printk(KERN_DEBUG "Checking if CPU frequency changed.\n");
 		schedule_work(&cpufreq_delayed_get_work);
 	}
 }
@@ -232,13 +240,11 @@
 static unsigned long loops_per_jiffy_ref = 0;
 
 #ifndef CONFIG_SMP
-static unsigned long fast_gettimeoffset_ref = 0;
 static unsigned long cpu_khz_ref = 0;
 #endif
 
-static int
-time_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
-		       void *data)
+static int time_cpufreq_notifier(struct notifier_block *nb,
+		unsigned long val, void *data)
 {
 	struct cpufreq_freqs *freq = data;
 
@@ -248,7 +254,6 @@
 		ref_freq = freq->old;
 		loops_per_jiffy_ref = cpu_data[freq->cpu].loops_per_jiffy;
 #ifndef CONFIG_SMP
-		fast_gettimeoffset_ref = fast_gettimeoffset_quotient;
 		cpu_khz_ref = cpu_khz;
 #endif
 	}
@@ -258,16 +263,20 @@
 	    (val == CPUFREQ_RESUMECHANGE)) {
 		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
 			cpu_data[freq->cpu].loops_per_jiffy = cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
+
+		if (cpu_khz) {
 #ifndef CONFIG_SMP
-		if (cpu_khz)
 			cpu_khz = cpufreq_scale(cpu_khz_ref, ref_freq, freq->new);
-		if (use_tsc) {
+#endif
 			if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
-				fast_gettimeoffset_quotient = cpufreq_scale(fast_gettimeoffset_ref, freq->new, ref_freq);
+				tsc_khz = cpu_khz;
 				set_cyc2ns_scale(cpu_khz/1000);
+				/* TSC based sched_clock turns
+				 * to junk w/ cpufreq
+				 */
+				mark_tsc_unstable();
 			}
 		}
-#endif
 	}
 
 	if (val != CPUFREQ_RESUMECHANGE)
@@ -289,10 +298,9 @@
 					CPUFREQ_TRANSITION_NOTIFIER);
 	if (!ret)
 		cpufreq_init = 1;
+
 	return ret;
 }
 core_initcall(cpufreq_tsc);
 
-#else /* CONFIG_CPU_FREQ */
-void cpufreq_delayed_get(void) { return; }
 #endif
diff -ruN linux-2.6.12-rc6-mm1/drivers/acpi/processor_idle.c linux-2.6.12-rc6-mm1-tod/drivers/acpi/processor_idle.c
--- linux-2.6.12-rc6-mm1/drivers/acpi/processor_idle.c	2005-06-17 19:32:40.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/drivers/acpi/processor_idle.c	2005-06-17 19:34:48.026993704 -0700
@@ -162,6 +162,7 @@
 	return;
 }
 
+extern void tsc_c3_compensate(unsigned long usecs);
 
 static void acpi_processor_idle (void)
 {
@@ -309,6 +310,10 @@
 		t2 = inl(acpi_fadt.xpm_tmr_blk.address);
 		/* Enable bus master arbitration */
 		acpi_set_register(ACPI_BITREG_ARB_DISABLE, 0, ACPI_MTX_DO_NOT_LOCK);
+
+		/* compensate for TSC pause */
+		tsc_c3_compensate((((t2-t1)&0xFFFFFF)*286)>>10);
+
 		/* Re-enable interrupts */
 		local_irq_enable();
 		/* Compute time (ticks) that we were actually asleep */
diff -ruN linux-2.6.12-rc6-mm1/include/asm-i386/mach-default/mach_timer.h linux-2.6.12-rc6-mm1-tod/include/asm-i386/mach-default/mach_timer.h
--- linux-2.6.12-rc6-mm1/include/asm-i386/mach-default/mach_timer.h	2005-06-17 19:32:40.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/include/asm-i386/mach-default/mach_timer.h	2005-06-17 19:34:48.000000000 -0700
@@ -15,7 +15,9 @@
 #ifndef _MACH_TIMER_H
 #define _MACH_TIMER_H
 
-#define CALIBRATE_LATCH	(5 * LATCH)
+#define CALIBRATE_TIME_MSEC 30 /* 30 msecs */
+#define CALIBRATE_LATCH	\
+	((CLOCK_TICK_RATE * CALIBRATE_TIME_MSEC + 1000/2)/1000)
 
 static inline void mach_prepare_counter(void)
 {
diff -ruN linux-2.6.12-rc6-mm1/include/asm-i386/mach-summit/mach_mpparse.h linux-2.6.12-rc6-mm1-tod/include/asm-i386/mach-summit/mach_mpparse.h
--- linux-2.6.12-rc6-mm1/include/asm-i386/mach-summit/mach_mpparse.h	2005-06-17 19:32:40.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/include/asm-i386/mach-summit/mach_mpparse.h	2005-06-17 19:34:48.000000000 -0700
@@ -30,6 +30,7 @@
 			(!strncmp(productid, "VIGIL SMP", 9) 
 			 || !strncmp(productid, "EXA", 3)
 			 || !strncmp(productid, "RUTHLESS SMP", 12))){
+		mark_tsc_unstable();
 		use_cyclone = 1; /*enable cyclone-timer*/
 		setup_summit();
 		usb_early_handoff = 1;
@@ -44,6 +45,7 @@
 	if (!strncmp(oem_id, "IBM", 3) &&
 	    (!strncmp(oem_table_id, "SERVIGIL", 8)
 	     || !strncmp(oem_table_id, "EXA", 3))){
+		mark_tsc_unstable();
 		use_cyclone = 1; /*enable cyclone-timer*/
 		setup_summit();
 		usb_early_handoff = 1;
diff -ruN linux-2.6.12-rc6-mm1/include/asm-i386/timex.h linux-2.6.12-rc6-mm1-tod/include/asm-i386/timex.h
--- linux-2.6.12-rc6-mm1/include/asm-i386/timex.h	2005-06-17 19:32:40.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/include/asm-i386/timex.h	2005-06-17 19:34:48.000000000 -0700
@@ -8,6 +8,7 @@
 
 #include <linux/config.h>
 #include <asm/processor.h>
+#include <asm/tsc.h>
 
 #ifdef CONFIG_X86_ELAN
 #  define CLOCK_TICK_RATE 1189200 /* AMD Elan has different frequency! */
@@ -16,39 +17,6 @@
 #endif
 
 
-/*
- * Standard way to access the cycle counter on i586+ CPUs.
- * Currently only used on SMP.
- *
- * If you really have a SMP machine with i486 chips or older,
- * compile for that, and this will just always return zero.
- * That's ok, it just means that the nicer scheduling heuristics
- * won't work for you.
- *
- * We only use the low 32 bits, and we'd simply better make sure
- * that we reschedule before that wraps. Scheduling at least every
- * four billion cycles just basically sounds like a good idea,
- * regardless of how fast the machine is. 
- */
-typedef unsigned long long cycles_t;
-
-static inline cycles_t get_cycles (void)
-{
-	unsigned long long ret=0;
-
-#ifndef CONFIG_X86_TSC
-	if (!cpu_has_tsc)
-		return 0;
-#endif
-
-#if defined(CONFIG_X86_GENERIC) || defined(CONFIG_X86_TSC)
-	rdtscll(ret);
-#endif
-	return ret;
-}
-
-extern unsigned long cpu_khz;
-
 extern int read_current_timer(unsigned long *timer_value);
 #define ARCH_HAS_READ_CURRENT_TIMER	1
 
diff -ruN linux-2.6.12-rc6-mm1/include/asm-i386/tsc.h linux-2.6.12-rc6-mm1-tod/include/asm-i386/tsc.h
--- linux-2.6.12-rc6-mm1/include/asm-i386/tsc.h	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.12-rc6-mm1-tod/include/asm-i386/tsc.h	2005-06-17 19:34:48.000000000 -0700
@@ -0,0 +1,50 @@
+/*
+ * linux/include/asm-i386/tsc.h
+ *
+ * i386 TSC related functions
+ */
+#ifndef _ASM_i386_TSC_H
+#define _ASM_i386_TSC_H
+
+#include <linux/config.h>
+#include <asm/processor.h>
+
+/*
+ * Standard way to access the cycle counter on i586+ CPUs.
+ * Currently only used on SMP.
+ *
+ * If you really have a SMP machine with i486 chips or older,
+ * compile for that, and this will just always return zero.
+ * That's ok, it just means that the nicer scheduling heuristics
+ * won't work for you.
+ *
+ * We only use the low 32 bits, and we'd simply better make sure
+ * that we reschedule before that wraps. Scheduling at least every
+ * four billion cycles just basically sounds like a good idea,
+ * regardless of how fast the machine is.
+ */
+typedef unsigned long long cycles_t;
+
+static inline cycles_t get_cycles (void)
+{
+	unsigned long long ret=0;
+
+#ifndef CONFIG_X86_TSC
+	if (!cpu_has_tsc)
+		return 0;
+#endif
+
+#if defined(CONFIG_X86_GENERIC) || defined(CONFIG_X86_TSC)
+	rdtscll(ret);
+#endif
+	return ret;
+}
+
+extern unsigned long cpu_khz;
+extern unsigned long tsc_khz;
+extern void tsc_init(void);
+void tsc_c3_compensate(unsigned long usecs);
+u64 tsc_read_c3_time(void);
+extern int check_tsc_unstable(void);
+extern void mark_tsc_unstable(void);
+#endif
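
As a quick aside (not part of the patch): cycles_t deltas from get_cycles() are
typically turned into nanoseconds with the scaled-integer trick rather than a
64-bit divide in the fast path. A stand-alone sketch, with illustrative names
(the patchset's own conversion lives elsewhere and uses cpu_mhz):

```c
/* Illustrative sketch only -- names are hypothetical, not from the patch.
 * ns = cycles * (10^6 * 2^SC / cpu_khz) >> SC, i.e. precompute the scale
 * once so the hot path is a multiply and a shift instead of a divide.
 */
#define EX_CYC2NS_SCALE_FACTOR 10	/* 2^10, keeps the scale in 32 bits */

static unsigned long example_cyc2ns_scale;

static void example_set_cyc2ns_scale(unsigned long cpu_khz_val)
{
	/* 10^6 ns/ms * 2^10, divided by the CPU clock in kHz */
	example_cyc2ns_scale =
		(1000000UL << EX_CYC2NS_SCALE_FACTOR) / cpu_khz_val;
}

static unsigned long long example_cycles_2_ns(unsigned long long cyc)
{
	return (cyc * example_cyc2ns_scale) >> EX_CYC2NS_SCALE_FACTOR;
}
```

At 1 GHz the precomputed scale is exactly 1024, so one million cycles come out
as one millisecond, as expected.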



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 5/6] new timeofday i386 arch specific changes, part 4 for -mm  (v.B3)
  2005-06-18  3:01     ` [PATCH 4/6] new timeofday i386 arch specific changes, part 3 " john stultz
@ 2005-06-18  3:02       ` john stultz
  2005-06-18  3:04         ` [PATCH 6/6] new timeofday i386 specific timesources " john stultz
  0 siblings, 1 reply; 32+ messages in thread
From: john stultz @ 2005-06-18  3:02 UTC (permalink / raw)
  To: lkml, Andrew Morton
  Cc: Tim Schmielau, George Anzinger, albert, Ulrich Windl,
	Christoph Lameter, Dominik Brodowski, David Mosberger, Andi Kleen,
	paulus, schwidefsky, keith maanthey, Chris McDermott, Max Asbock,
	mahuja, Nishanth Aravamudan, Darren Hart, Darrick J. Wong,
	Anton Blanchard, donf, mpm, benh, kernel-stuff, frank

Andrew, All,
	To hopefully improve the reviewability of my changes, I've split up my
arch-i386 patch into four chunks. This patch converts the i386 arch to
use the new timeofday subsystem and removes the old timers/timer_opts
infrastructure. 
	
It applies on top of my timeofday-arch-i386-part3_B3 patch. This patch
is the last in the timeofday-arch-i386 patchset, so you should be
able to build and boot a kernel after it has been applied. 

Note that this patch does not provide any i386 timesources, so you will
only have the jiffies timesource. To get full replacements for the code
being removed here, the following timeofday-timesources-i386 patch will
need to be applied.
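For anyone new to the timesource concept: a timesource driver boils down to a
function that reads a free-running counter, plus the mask/mult/shift data the
generic code needs to turn counter deltas into nanoseconds. A rough sketch --
the field and function names below are illustrative only, the real interface
is defined by the core patch (PATCH 1/6), not here:

```c
/* Hypothetical model of a timesource driver; names are illustrative,
 * not the actual timeofday API.
 */
typedef unsigned long long cycle_t;

struct example_timesource {
	const char *name;
	int priority;		/* preference for runtime selection */
	cycle_t (*read)(void);	/* read the free-running counter */
	cycle_t mask;		/* counter width, e.g. (1ULL<<32)-1 */
	unsigned long mult;	/* cycle -> ns multiplier */
	int shift;		/* cycle -> ns shift */
};

/* Convert a delta of two counter reads to nanoseconds; the mask makes
 * counter wrap-around come out right for narrow counters.
 */
static unsigned long long ts_delta_to_ns(const struct example_timesource *ts,
					 cycle_t now, cycle_t then)
{
	return (((now - then) & ts->mask) * ts->mult) >> ts->shift;
}
```

Because the delta is masked to the counter width before scaling, a 32-bit
counter that wraps between reads still yields the correct small delta.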

Andrew, please consider including this in your tree for testing.

thanks
-john

Signed-off-by: John Stultz <johnstul@us.ibm.com>

linux-2.6.12-rc6-mm1_timeofday-arch-i386-part4_B3.patch
=======================================================

diff -ruN linux-2.6.12-rc6-mm1/arch/i386/Kconfig linux-2.6.12-rc6-mm1-tod/arch/i386/Kconfig
--- linux-2.6.12-rc6-mm1/arch/i386/Kconfig	2005-06-17 19:22:40.047521801 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/Kconfig	2005-06-17 19:05:21.000000000 -0700
@@ -14,6 +14,10 @@
 	  486, 586, Pentiums, and various instruction-set-compatible chips by
 	  AMD, Cyrix, and others.
 
+config NEWTOD
+	bool
+	default y
+
 config MMU
 	bool
 	default y
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/Makefile linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/Makefile
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/Makefile	2005-06-17 19:22:40.047521801 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/Makefile	2005-06-17 19:05:21.000000000 -0700
@@ -10,7 +10,6 @@
 		doublefault.o quirks.o tsc.o
 
 obj-y				+= cpu/
-obj-y				+= timers/
 obj-$(CONFIG_ACPI_BOOT)		+= acpi/
 obj-$(CONFIG_X86_BIOS_REBOOT)	+= reboot.o
 obj-$(CONFIG_MCA)		+= mca.o
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/time.c linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/time.c
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/time.c	2005-06-17 19:22:40.049521552 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/time.c	2005-06-17 19:07:22.000000000 -0700
@@ -56,6 +56,7 @@
 #include <asm/uaccess.h>
 #include <asm/processor.h>
 #include <asm/timer.h>
+#include <asm/timeofday.h>
 
 #include "mach_time.h"
 
@@ -88,8 +89,6 @@
 DEFINE_SPINLOCK(i8253_lock);
 EXPORT_SYMBOL(i8253_lock);
 
-struct timer_opts *cur_timer = &timer_none;
-
 /*
  * This is a special lock that is owned by the CPU and holds the index
  * register we are working with.  It is required for NMI access to the
@@ -119,102 +118,19 @@
 }
 EXPORT_SYMBOL(rtc_cmos_write);
 
-/*
- * This version of gettimeofday has microsecond resolution
- * and better than microsecond precision on fast x86 machines with TSC.
- */
-void do_gettimeofday(struct timeval *tv)
-{
-	unsigned long seq;
-	unsigned long usec, sec;
-	unsigned long max_ntp_tick;
-
-	do {
-		unsigned long lost;
-
-		seq = read_seqbegin(&xtime_lock);
-
-		usec = cur_timer->get_offset();
-		lost = jiffies - wall_jiffies;
-
-		/*
-		 * If time_adjust is negative then NTP is slowing the clock
-		 * so make sure not to go into next possible interval.
-		 * Better to lose some accuracy than have time go backwards..
-		 */
-		if (unlikely(time_adjust < 0)) {
-			max_ntp_tick = (USEC_PER_SEC / HZ) - tickadj;
-			usec = min(usec, max_ntp_tick);
-
-			if (lost)
-				usec += lost * max_ntp_tick;
-		}
-		else if (unlikely(lost))
-			usec += lost * (USEC_PER_SEC / HZ);
-
-		sec = xtime.tv_sec;
-		usec += (xtime.tv_nsec / 1000);
-	} while (read_seqretry(&xtime_lock, seq));
-
-	while (usec >= 1000000) {
-		usec -= 1000000;
-		sec++;
-	}
-
-	tv->tv_sec = sec;
-	tv->tv_usec = usec;
-}
-
-EXPORT_SYMBOL(do_gettimeofday);
-
-int do_settimeofday(struct timespec *tv)
-{
-	time_t wtm_sec, sec = tv->tv_sec;
-	long wtm_nsec, nsec = tv->tv_nsec;
-
-	if ((unsigned long)tv->tv_nsec >= NSEC_PER_SEC)
-		return -EINVAL;
-
-	write_seqlock_irq(&xtime_lock);
-	/*
-	 * This is revolting. We need to set "xtime" correctly. However, the
-	 * value in this location is the value at the most recent update of
-	 * wall time.  Discover what correction gettimeofday() would have
-	 * made, and then undo it!
-	 */
-	nsec -= cur_timer->get_offset() * NSEC_PER_USEC;
-	nsec -= (jiffies - wall_jiffies) * TICK_NSEC;
-
-	wtm_sec  = wall_to_monotonic.tv_sec + (xtime.tv_sec - sec);
-	wtm_nsec = wall_to_monotonic.tv_nsec + (xtime.tv_nsec - nsec);
-
-	set_normalized_timespec(&xtime, sec, nsec);
-	set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
-
-	time_adjust = 0;		/* stop active adjtime() */
-	time_status |= STA_UNSYNC;
-	time_maxerror = NTP_PHASE_LIMIT;
-	time_esterror = NTP_PHASE_LIMIT;
-	write_sequnlock_irq(&xtime_lock);
-	clock_was_set();
-	return 0;
-}
-
-EXPORT_SYMBOL(do_settimeofday);
-
 static int set_rtc_mmss(unsigned long nowtime)
 {
 	int retval;
-
-	WARN_ON(irqs_disabled());
+	unsigned long flags;
 
 	/* gets recalled with irq locally disabled */
-	spin_lock_irq(&rtc_lock);
+	/* XXX - does irqsave resolve this? -johnstul */
+	spin_lock_irqsave(&rtc_lock, flags);
 	if (efi_enabled)
 		retval = efi_set_rtc_mmss(nowtime);
 	else
 		retval = mach_set_rtc_mmss(nowtime);
-	spin_unlock_irq(&rtc_lock);
+	spin_unlock_irqrestore(&rtc_lock, flags);
 
 	return retval;
 }
@@ -222,16 +138,6 @@
 
 int timer_ack;
 
-/* monotonic_clock(): returns # of nanoseconds passed since time_init()
- *		Note: This function is required to return accurate
- *		time even in the absence of multiple timer ticks.
- */
-unsigned long long monotonic_clock(void)
-{
-	return cur_timer->monotonic_clock();
-}
-EXPORT_SYMBOL(monotonic_clock);
-
 #if defined(CONFIG_SMP) && defined(CONFIG_FRAME_POINTER)
 unsigned long profile_pc(struct pt_regs *regs)
 {
@@ -246,12 +152,21 @@
 #endif
 
 /*
- * timer_interrupt() needs to keep up the real-time clock,
- * as well as call the "do_timer()" routine every clocktick
+ * This is the same as the above, except we _also_ save the current
+ * Time Stamp Counter value at the time of the timer interrupt, so that
+ * we later on can estimate the time of day more exactly.
  */
-static inline void do_timer_interrupt(int irq, void *dev_id,
-					struct pt_regs *regs)
+irqreturn_t timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
 {
+	/*
+	 * Here we are in the timer irq handler. We just have irqs locally
+	 * disabled but we don't know if the timer_bh is running on the other
+	 * CPU. We need to avoid to SMP race with it. NOTE: we don' t need
+	 * the irq version of write_lock because as just said we have irq
+	 * locally disabled. -arca
+	 */
+	write_seqlock(&xtime_lock);
+
 #ifdef CONFIG_X86_IO_APIC
 	if (timer_ack) {
 		/*
@@ -284,27 +199,6 @@
 		irq = inb_p( 0x61 );	/* read the current state */
 		outb_p( irq|0x80, 0x61 );	/* reset the IRQ */
 	}
-}
-
-/*
- * This is the same as the above, except we _also_ save the current
- * Time Stamp Counter value at the time of the timer interrupt, so that
- * we later on can estimate the time of day more exactly.
- */
-irqreturn_t timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
-{
-	/*
-	 * Here we are in the timer irq handler. We just have irqs locally
-	 * disabled but we don't know if the timer_bh is running on the other
-	 * CPU. We need to avoid to SMP race with it. NOTE: we don' t need
-	 * the irq version of write_lock because as just said we have irq
-	 * locally disabled. -arca
-	 */
-	write_seqlock(&xtime_lock);
-
-	cur_timer->mark_offset();
- 
-	do_timer_interrupt(irq, NULL, regs);
 
 	write_sequnlock(&xtime_lock);
 	return IRQ_HANDLED;
@@ -328,55 +222,34 @@
 }
 EXPORT_SYMBOL(get_cmos_time);
 
-static void sync_cmos_clock(unsigned long dummy);
-
-static struct timer_list sync_cmos_timer =
-                                      TIMER_INITIALIZER(sync_cmos_clock, 0, 0);
-
-static void sync_cmos_clock(unsigned long dummy)
+/* arch specific timeofday hooks */
+nsec_t read_persistent_clock(void)
 {
-	struct timeval now, next;
-	int fail = 1;
+	return (nsec_t)get_cmos_time() * NSEC_PER_SEC;
+}
 
+void sync_persistent_clock(struct timespec ts)
+{
+	static unsigned long last_rtc_update;
 	/*
 	 * If we have an externally synchronized Linux clock, then update
 	 * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
 	 * called as close as possible to 500 ms before the new second starts.
-	 * This code is run on a timer.  If the clock is set, that timer
-	 * may not expire at the correct time.  Thus, we adjust...
 	 */
-	if ((time_status & STA_UNSYNC) != 0)
-		/*
-		 * Not synced, exit, do not restart a timer (if one is
-		 * running, let it run out).
-		 */
+	if (ts.tv_sec <= last_rtc_update + 660)
 		return;
 
-	do_gettimeofday(&now);
-	if (now.tv_usec >= USEC_AFTER - ((unsigned) TICK_SIZE) / 2 &&
-	    now.tv_usec <= USEC_BEFORE + ((unsigned) TICK_SIZE) / 2)
-		fail = set_rtc_mmss(now.tv_sec);
-
-	next.tv_usec = USEC_AFTER - now.tv_usec;
-	if (next.tv_usec <= 0)
-		next.tv_usec += USEC_PER_SEC;
-
-	if (!fail)
-		next.tv_sec = 659;
-	else
-		next.tv_sec = 0;
-
-	if (next.tv_usec >= USEC_PER_SEC) {
-		next.tv_sec++;
-		next.tv_usec -= USEC_PER_SEC;
+	if((ts.tv_nsec / 1000) >= USEC_AFTER - ((unsigned) TICK_SIZE) / 2 &&
+		(ts.tv_nsec / 1000) <= USEC_BEFORE + ((unsigned) TICK_SIZE) / 2) {
+		/* horrible...FIXME */
+		if (set_rtc_mmss(ts.tv_sec) == 0)
+			last_rtc_update = ts.tv_sec;
+		else
+			last_rtc_update = ts.tv_sec - 600; /* do it again in 60 s */
 	}
-	mod_timer(&sync_cmos_timer, jiffies + timeval_to_jiffies(&next));
 }
 
-void notify_arch_cmos_timer(void)
-{
-	mod_timer(&sync_cmos_timer, jiffies + 1);
-}
+
 
 static long clock_cmos_diff, sleep_start;
 
@@ -393,7 +266,6 @@
 
 static int timer_resume(struct sys_device *dev)
 {
-	unsigned long flags;
 	unsigned long sec;
 	unsigned long sleep_length;
 
@@ -403,10 +275,6 @@
 #endif
 	sec = get_cmos_time() + clock_cmos_diff;
 	sleep_length = (get_cmos_time() - sleep_start) * HZ;
-	write_seqlock_irqsave(&xtime_lock, flags);
-	xtime.tv_sec = sec;
-	xtime.tv_nsec = 0;
-	write_sequnlock_irqrestore(&xtime_lock, flags);
 	jiffies += sleep_length;
 	wall_jiffies += sleep_length;
 	touch_softlockup_watchdog();
@@ -441,17 +309,10 @@
 /* Duplicate of time_init() below, with hpet_enable part added */
 static void __init hpet_time_init(void)
 {
-	xtime.tv_sec = get_cmos_time();
-	xtime.tv_nsec = (INITIAL_JIFFIES % HZ) * (NSEC_PER_SEC / HZ);
-	set_normalized_timespec(&wall_to_monotonic,
-		-xtime.tv_sec, -xtime.tv_nsec);
-
 	if ((hpet_enable() >= 0) && hpet_use_timer) {
 		printk("Using HPET for base-timer\n");
 	}
 
-	cur_timer = select_timer();
-	printk(KERN_INFO "Using %s for high-res timesource\n",cur_timer->name);
 
 	time_init_hook();
 }
@@ -469,13 +330,5 @@
 		return;
 	}
 #endif
-	xtime.tv_sec = get_cmos_time();
-	xtime.tv_nsec = (INITIAL_JIFFIES % HZ) * (NSEC_PER_SEC / HZ);
-	set_normalized_timespec(&wall_to_monotonic,
-		-xtime.tv_sec, -xtime.tv_nsec);
-
-	cur_timer = select_timer();
-	printk(KERN_INFO "Using %s for high-res timesource\n",cur_timer->name);
-
 	time_init_hook();
 }
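A side note on the write_seqlock(&xtime_lock) retained in timer_interrupt()
above: readers pair it with a lockless retry loop. The following is a
stand-alone, single-threaded model of that protocol with illustrative names
(it is not the kernel's seqlock implementation, which also handles SMP memory
ordering):

```c
/* Minimal model of the seqlock read/write protocol; illustrative only. */
struct example_seq { unsigned sequence; };

/* Writer bumps the count to odd on entry and back to even on exit. */
static void ex_write_seqlock(struct example_seq *s)   { s->sequence++; }
static void ex_write_sequnlock(struct example_seq *s) { s->sequence++; }

static unsigned ex_read_seqbegin(const struct example_seq *s)
{
	return s->sequence & ~1U;	/* odd count => writer in progress */
}

static int ex_read_seqretry(const struct example_seq *s, unsigned start)
{
	return s->sequence != start;	/* retry if a writer ran meanwhile */
}
```

A reader samples the count, copies the protected values, and retries the copy
whenever the count changed -- which is why the interrupt handler only needs
the non-irq write_seqlock variant when irqs are already locally disabled.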
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/common.c linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/common.c
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/common.c	2005-06-17 19:22:40.057520557 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/common.c	1969-12-31 16:00:00.000000000 -0800
@@ -1,90 +0,0 @@
-/*
- *	Common functions used across the timers go here
- */
-
-#include <linux/init.h>
-#include <linux/timex.h>
-#include <linux/errno.h>
-#include <linux/jiffies.h>
-#include <linux/module.h>
-
-#include <asm/io.h>
-#include <asm/timer.h>
-#include <asm/hpet.h>
-
-#include "mach_timer.h"
-
-#ifdef CONFIG_HPET_TIMER
-/* ------ Calibrate the TSC using HPET -------
- * Return 2^32 * (1 / (TSC clocks per usec)) for getting the CPU freq.
- * Second output is parameter 1 (when non NULL)
- * Set 2^32 * (1 / (tsc per HPET clk)) for delay_hpet().
- * calibrate_tsc() calibrates the processor TSC by comparing
- * it to the HPET timer of known frequency.
- * Too much 64-bit arithmetic here to do this cleanly in C
- */
-#define CALIBRATE_CNT_HPET 	(5 * hpet_tick)
-#define CALIBRATE_TIME_HPET 	(5 * KERNEL_TICK_USEC)
-
-unsigned long __init calibrate_tsc_hpet(unsigned long *tsc_hpet_quotient_ptr)
-{
-	unsigned long tsc_startlow, tsc_starthigh;
-	unsigned long tsc_endlow, tsc_endhigh;
-	unsigned long hpet_start, hpet_end;
-	unsigned long result, remain;
-
-	hpet_start = hpet_readl(HPET_COUNTER);
-	rdtsc(tsc_startlow, tsc_starthigh);
-	do {
-		hpet_end = hpet_readl(HPET_COUNTER);
-	} while ((hpet_end - hpet_start) < CALIBRATE_CNT_HPET);
-	rdtsc(tsc_endlow, tsc_endhigh);
-
-	/* 64-bit subtract - gcc just messes up with long longs */
-	__asm__("subl %2,%0\n\t"
-		"sbbl %3,%1"
-		:"=a" (tsc_endlow), "=d" (tsc_endhigh)
-		:"g" (tsc_startlow), "g" (tsc_starthigh),
-		 "0" (tsc_endlow), "1" (tsc_endhigh));
-
-	/* Error: ECPUTOOFAST */
-	if (tsc_endhigh)
-		goto bad_calibration;
-
-	/* Error: ECPUTOOSLOW */
-	if (tsc_endlow <= CALIBRATE_TIME_HPET)
-		goto bad_calibration;
-
-	ASM_DIV64_REG(result, remain, tsc_endlow, 0, CALIBRATE_TIME_HPET);
-	if (remain > (tsc_endlow >> 1))
-		result++; /* rounding the result */
-
-	if (tsc_hpet_quotient_ptr) {
-		unsigned long tsc_hpet_quotient;
-
-		ASM_DIV64_REG(tsc_hpet_quotient, remain, tsc_endlow, 0,
-			CALIBRATE_CNT_HPET);
-		if (remain > (tsc_endlow >> 1))
-			tsc_hpet_quotient++; /* rounding the result */
-		*tsc_hpet_quotient_ptr = tsc_hpet_quotient;
-	}
-
-	return result;
-bad_calibration:
-	/*
-	 * the CPU was so fast/slow that the quotient wouldn't fit in
-	 * 32 bits..
-	 */
-	return 0;
-}
-#endif
-
-
-unsigned long read_timer_tsc(void)
-{
-	unsigned long retval;
-	rdtscl(retval);
-	return retval;
-}
-
-
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/Makefile linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/Makefile
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/Makefile	2005-06-17 19:22:40.057520557 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/Makefile	1969-12-31 16:00:00.000000000 -0800
@@ -1,9 +0,0 @@
-#
-# Makefile for x86 timers
-#
-
-obj-y := timer.o timer_none.o timer_tsc.o timer_pit.o common.o
-
-obj-$(CONFIG_X86_CYCLONE_TIMER)	+= timer_cyclone.o
-obj-$(CONFIG_HPET_TIMER)	+= timer_hpet.o
-obj-$(CONFIG_X86_PM_TIMER)	+= timer_pm.o
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer.c linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer.c
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer.c	2005-06-17 19:22:40.057520557 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer.c	1969-12-31 16:00:00.000000000 -0800
@@ -1,66 +0,0 @@
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/string.h>
-#include <asm/timer.h>
-
-#ifdef CONFIG_HPET_TIMER
-/*
- * HPET memory read is slower than tsc reads, but is more dependable as it
- * always runs at constant frequency and reduces complexity due to
- * cpufreq. So, we prefer HPET timer to tsc based one. Also, we cannot use
- * timer_pit when HPET is active. So, we default to timer_tsc.
- */
-#endif
-/* list of timers, ordered by preference, NULL terminated */
-static struct init_timer_opts* __initdata timers[] = {
-#ifdef CONFIG_X86_CYCLONE_TIMER
-	&timer_cyclone_init,
-#endif
-#ifdef CONFIG_HPET_TIMER
-	&timer_hpet_init,
-#endif
-#ifdef CONFIG_X86_PM_TIMER
-	&timer_pmtmr_init,
-#endif
-	&timer_tsc_init,
-	&timer_pit_init,
-	NULL,
-};
-
-static char clock_override[10] __initdata;
-
-static int __init clock_setup(char* str)
-{
-	if (str)
-		strlcpy(clock_override, str, sizeof(clock_override));
-	return 1;
-}
-__setup("clock=", clock_setup);
-
-
-/* The chosen timesource has been found to be bad.
- * Fall back to a known good timesource (the PIT)
- */
-void clock_fallback(void)
-{
-	cur_timer = &timer_pit;
-}
-
-/* iterates through the list of timers, returning the first 
- * one that initializes successfully.
- */
-struct timer_opts* __init select_timer(void)
-{
-	int i = 0;
-	
-	/* find most preferred working timer */
-	while (timers[i]) {
-		if (timers[i]->init)
-			if (timers[i]->init(clock_override) == 0)
-				return timers[i]->opts;
-		++i;
-	}
-		
-	panic("select_timer: Cannot find a suitable timer\n");
-	return NULL;
-}
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer_cyclone.c linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer_cyclone.c
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer_cyclone.c	2005-06-17 19:22:40.058520433 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer_cyclone.c	1969-12-31 16:00:00.000000000 -0800
@@ -1,259 +0,0 @@
-/*	Cyclone-timer: 
- *		This code implements timer_ops for the cyclone counter found
- *		on IBM x440, x360, and other Summit based systems.
- *
- *	Copyright (C) 2002 IBM, John Stultz (johnstul@us.ibm.com)
- */
-
-
-#include <linux/spinlock.h>
-#include <linux/init.h>
-#include <linux/timex.h>
-#include <linux/errno.h>
-#include <linux/string.h>
-#include <linux/jiffies.h>
-
-#include <asm/timer.h>
-#include <asm/io.h>
-#include <asm/pgtable.h>
-#include <asm/fixmap.h>
-#include "io_ports.h"
-
-extern spinlock_t i8253_lock;
-
-/* Number of usecs that the last interrupt was delayed */
-static int delay_at_last_interrupt;
-
-#define CYCLONE_CBAR_ADDR 0xFEB00CD0
-#define CYCLONE_PMCC_OFFSET 0x51A0
-#define CYCLONE_MPMC_OFFSET 0x51D0
-#define CYCLONE_MPCS_OFFSET 0x51A8
-#define CYCLONE_TIMER_FREQ 100000000
-#define CYCLONE_TIMER_MASK (((u64)1<<40)-1) /* 40 bit mask */
-int use_cyclone = 0;
-
-static u32* volatile cyclone_timer;	/* Cyclone MPMC0 register */
-static u32 last_cyclone_low;
-static u32 last_cyclone_high;
-static unsigned long long monotonic_base;
-static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
-
-/* helper macro to atomically read both cyclone counter registers */
-#define read_cyclone_counter(low,high) \
-	do{ \
-		high = cyclone_timer[1]; low = cyclone_timer[0]; \
-	} while (high != cyclone_timer[1]);
-
-
-static void mark_offset_cyclone(void)
-{
-	unsigned long lost, delay;
-	unsigned long delta = last_cyclone_low;
-	int count;
-	unsigned long long this_offset, last_offset;
-
-	write_seqlock(&monotonic_lock);
-	last_offset = ((unsigned long long)last_cyclone_high<<32)|last_cyclone_low;
-	
-	spin_lock(&i8253_lock);
-	read_cyclone_counter(last_cyclone_low,last_cyclone_high);
-
-	/* read values for delay_at_last_interrupt */
-	outb_p(0x00, 0x43);     /* latch the count ASAP */
-
-	count = inb_p(0x40);    /* read the latched count */
-	count |= inb(0x40) << 8;
-
-	/*
-	 * VIA686a test code... reset the latch if count > max + 1
-	 * from timer_pit.c - cjb
-	 */
-	if (count > LATCH) {
-		outb_p(0x34, PIT_MODE);
-		outb_p(LATCH & 0xff, PIT_CH0);
-		outb(LATCH >> 8, PIT_CH0);
-		count = LATCH - 1;
-	}
-	spin_unlock(&i8253_lock);
-
-	/* lost tick compensation */
-	delta = last_cyclone_low - delta;	
-	delta /= (CYCLONE_TIMER_FREQ/1000000);
-	delta += delay_at_last_interrupt;
-	lost = delta/(1000000/HZ);
-	delay = delta%(1000000/HZ);
-	if (lost >= 2)
-		jiffies_64 += lost-1;
-	
-	/* update the monotonic base value */
-	this_offset = ((unsigned long long)last_cyclone_high<<32)|last_cyclone_low;
-	monotonic_base += (this_offset - last_offset) & CYCLONE_TIMER_MASK;
-	write_sequnlock(&monotonic_lock);
-
-	/* calculate delay_at_last_interrupt */
-	count = ((LATCH-1) - count) * TICK_SIZE;
-	delay_at_last_interrupt = (count + LATCH/2) / LATCH;
-
-
-	/* catch corner case where tick rollover occured 
-	 * between cyclone and pit reads (as noted when 
-	 * usec delta is > 90% # of usecs/tick)
-	 */
-	if (lost && abs(delay - delay_at_last_interrupt) > (900000/HZ))
-		jiffies_64++;
-}
-
-static unsigned long get_offset_cyclone(void)
-{
-	u32 offset;
-
-	if(!cyclone_timer)
-		return delay_at_last_interrupt;
-
-	/* Read the cyclone timer */
-	offset = cyclone_timer[0];
-
-	/* .. relative to previous jiffy */
-	offset = offset - last_cyclone_low;
-
-	/* convert cyclone ticks to microseconds */	
-	/* XXX slow, can we speed this up? */
-	offset = offset/(CYCLONE_TIMER_FREQ/1000000);
-
-	/* our adjusted time offset in microseconds */
-	return delay_at_last_interrupt + offset;
-}
-
-static unsigned long long monotonic_clock_cyclone(void)
-{
-	u32 now_low, now_high;
-	unsigned long long last_offset, this_offset, base;
-	unsigned long long ret;
-	unsigned seq;
-
-	/* atomically read monotonic base & last_offset */
-	do {
-		seq = read_seqbegin(&monotonic_lock);
-		last_offset = ((unsigned long long)last_cyclone_high<<32)|last_cyclone_low;
-		base = monotonic_base;
-	} while (read_seqretry(&monotonic_lock, seq));
-
-
-	/* Read the cyclone counter */
-	read_cyclone_counter(now_low,now_high);
-	this_offset = ((unsigned long long)now_high<<32)|now_low;
-
-	/* convert to nanoseconds */
-	ret = base + ((this_offset - last_offset)&CYCLONE_TIMER_MASK);
-	return ret * (1000000000 / CYCLONE_TIMER_FREQ);
-}
-
-static int __init init_cyclone(char* override)
-{
-	u32* reg;	
-	u32 base;		/* saved cyclone base address */
-	u32 pageaddr;	/* page that contains cyclone_timer register */
-	u32 offset;		/* offset from pageaddr to cyclone_timer register */
-	int i;
-	
-	/* check clock override */
-	if (override[0] && strncmp(override,"cyclone",7))
-			return -ENODEV;
-
-	/*make sure we're on a summit box*/
-	if(!use_cyclone) return -ENODEV; 
-	
-	printk(KERN_INFO "Summit chipset: Starting Cyclone Counter.\n");
-
-	/* find base address */
-	pageaddr = (CYCLONE_CBAR_ADDR)&PAGE_MASK;
-	offset = (CYCLONE_CBAR_ADDR)&(~PAGE_MASK);
-	set_fixmap_nocache(FIX_CYCLONE_TIMER, pageaddr);
-	reg = (u32*)(fix_to_virt(FIX_CYCLONE_TIMER) + offset);
-	if(!reg){
-		printk(KERN_ERR "Summit chipset: Could not find valid CBAR register.\n");
-		return -ENODEV;
-	}
-	base = *reg;	
-	if(!base){
-		printk(KERN_ERR "Summit chipset: Could not find valid CBAR value.\n");
-		return -ENODEV;
-	}
-	
-	/* setup PMCC */
-	pageaddr = (base + CYCLONE_PMCC_OFFSET)&PAGE_MASK;
-	offset = (base + CYCLONE_PMCC_OFFSET)&(~PAGE_MASK);
-	set_fixmap_nocache(FIX_CYCLONE_TIMER, pageaddr);
-	reg = (u32*)(fix_to_virt(FIX_CYCLONE_TIMER) + offset);
-	if(!reg){
-		printk(KERN_ERR "Summit chipset: Could not find valid PMCC register.\n");
-		return -ENODEV;
-	}
-	reg[0] = 0x00000001;
-
-	/* setup MPCS */
-	pageaddr = (base + CYCLONE_MPCS_OFFSET)&PAGE_MASK;
-	offset = (base + CYCLONE_MPCS_OFFSET)&(~PAGE_MASK);
-	set_fixmap_nocache(FIX_CYCLONE_TIMER, pageaddr);
-	reg = (u32*)(fix_to_virt(FIX_CYCLONE_TIMER) + offset);
-	if(!reg){
-		printk(KERN_ERR "Summit chipset: Could not find valid MPCS register.\n");
-		return -ENODEV;
-	}
-	reg[0] = 0x00000001;
-
-	/* map in cyclone_timer */
-	pageaddr = (base + CYCLONE_MPMC_OFFSET)&PAGE_MASK;
-	offset = (base + CYCLONE_MPMC_OFFSET)&(~PAGE_MASK);
-	set_fixmap_nocache(FIX_CYCLONE_TIMER, pageaddr);
-	cyclone_timer = (u32*)(fix_to_virt(FIX_CYCLONE_TIMER) + offset);
-	if(!cyclone_timer){
-		printk(KERN_ERR "Summit chipset: Could not find valid MPMC register.\n");
-		return -ENODEV;
-	}
-
-	/*quick test to make sure its ticking*/
-	for(i=0; i<3; i++){
-		u32 old = cyclone_timer[0];
-		int stall = 100;
-		while(stall--) barrier();
-		if(cyclone_timer[0] == old){
-			printk(KERN_ERR "Summit chipset: Counter not counting! DISABLED\n");
-			cyclone_timer = 0;
-			return -ENODEV;
-		}
-	}
-
-	init_cpu_khz();
-
-	/* Everything looks good! */
-	return 0;
-}
-
-
-static void delay_cyclone(unsigned long loops)
-{
-	unsigned long bclock, now;
-	if(!cyclone_timer)
-		return;
-	bclock = cyclone_timer[0];
-	do {
-		rep_nop();
-		now = cyclone_timer[0];
-	} while ((now-bclock) < loops);
-}
-/************************************************************/
-
-/* cyclone timer_opts struct */
-static struct timer_opts timer_cyclone = {
-	.name = "cyclone",
-	.mark_offset = mark_offset_cyclone, 
-	.get_offset = get_offset_cyclone,
-	.monotonic_clock =	monotonic_clock_cyclone,
-	.delay = delay_cyclone,
-};
-
-struct init_timer_opts __initdata timer_cyclone_init = {
-	.init = init_cyclone,
-	.opts = &timer_cyclone,
-};
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer_hpet.c linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer_hpet.c
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer_hpet.c	2005-06-17 19:22:40.059520309 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer_hpet.c	1969-12-31 16:00:00.000000000 -0800
@@ -1,195 +0,0 @@
-/*
- * This code largely moved from arch/i386/kernel/time.c.
- * See comments there for proper credits.
- */
-
-#include <linux/spinlock.h>
-#include <linux/init.h>
-#include <linux/timex.h>
-#include <linux/errno.h>
-#include <linux/string.h>
-#include <linux/jiffies.h>
-
-#include <asm/timer.h>
-#include <asm/io.h>
-#include <asm/processor.h>
-
-#include "io_ports.h"
-#include "mach_timer.h"
-#include <asm/hpet.h>
-
-static unsigned long hpet_usec_quotient;	/* convert hpet clks to usec */
-static unsigned long tsc_hpet_quotient;		/* convert tsc to hpet clks */
-static unsigned long hpet_last; 	/* hpet counter value at last tick*/
-static unsigned long last_tsc_low;	/* lsb 32 bits of Time Stamp Counter */
-static unsigned long last_tsc_high; 	/* msb 32 bits of Time Stamp Counter */
-static unsigned long long monotonic_base;
-static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
-
-/* convert from cycles(64bits) => nanoseconds (64bits)
- *  basic equation:
- *		ns = cycles / (freq / ns_per_sec)
- *		ns = cycles * (ns_per_sec / freq)
- *		ns = cycles * (10^9 / (cpu_mhz * 10^6))
- *		ns = cycles * (10^3 / cpu_mhz)
- *
- *	Then we use scaling math (suggested by george@mvista.com) to get:
- *		ns = cycles * (10^3 * SC / cpu_mhz) / SC
- *		ns = cycles * cyc2ns_scale / SC
- *
- *	And since SC is a constant power of two, we can convert the div
- *  into a shift.
- *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
- */
-static unsigned long cyc2ns_scale;
-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
-
-static inline void set_cyc2ns_scale(unsigned long cpu_mhz)
-{
-	cyc2ns_scale = (1000 << CYC2NS_SCALE_FACTOR)/cpu_mhz;
-}
-
-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
-{
-	return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
-}
-
-static unsigned long long monotonic_clock_hpet(void)
-{
-	unsigned long long last_offset, this_offset, base;
-	unsigned seq;
-
-	/* atomically read monotonic base & last_offset */
-	do {
-		seq = read_seqbegin(&monotonic_lock);
-		last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
-		base = monotonic_base;
-	} while (read_seqretry(&monotonic_lock, seq));
-
-	/* Read the Time Stamp Counter */
-	rdtscll(this_offset);
-
-	/* return the value in ns */
-	return base + cycles_2_ns(this_offset - last_offset);
-}
-
-static unsigned long get_offset_hpet(void)
-{
-	register unsigned long eax, edx;
-
-	eax = hpet_readl(HPET_COUNTER);
-	eax -= hpet_last;	/* hpet delta */
-	eax = min(hpet_tick, eax);
-	/*
-         * Time offset = (hpet delta) * ( usecs per HPET clock )
-	 *             = (hpet delta) * ( usecs per tick / HPET clocks per tick)
-	 *             = (hpet delta) * ( hpet_usec_quotient ) / (2^32)
-	 *
-	 * Where,
-	 * hpet_usec_quotient = (2^32 * usecs per tick)/HPET clocks per tick
-	 *
-	 * Using a mull instead of a divl saves some cycles in critical path.
-         */
-	ASM_MUL64_REG(eax, edx, hpet_usec_quotient, eax);
-
-	/* our adjusted time offset in microseconds */
-	return edx;
-}
-
-static void mark_offset_hpet(void)
-{
-	unsigned long long this_offset, last_offset;
-	unsigned long offset;
-
-	write_seqlock(&monotonic_lock);
-	last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
-	rdtsc(last_tsc_low, last_tsc_high);
-
-	if (hpet_use_timer)
-		offset = hpet_readl(HPET_T0_CMP) - hpet_tick;
-	else
-		offset = hpet_readl(HPET_COUNTER);
-	if (unlikely(((offset - hpet_last) >= (2*hpet_tick)) && (hpet_last != 0))) {
-		int lost_ticks = ((offset - hpet_last) / hpet_tick) - 1;
-		jiffies_64 += lost_ticks;
-	}
-	hpet_last = offset;
-
-	/* update the monotonic base value */
-	this_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
-	monotonic_base += cycles_2_ns(this_offset - last_offset);
-	write_sequnlock(&monotonic_lock);
-}
-
-static void delay_hpet(unsigned long loops)
-{
-	unsigned long hpet_start, hpet_end;
-	unsigned long eax;
-
-	/* loops is the number of cpu cycles. Convert it to hpet clocks */
-	ASM_MUL64_REG(eax, loops, tsc_hpet_quotient, loops);
-
-	hpet_start = hpet_readl(HPET_COUNTER);
-	do {
-		rep_nop();
-		hpet_end = hpet_readl(HPET_COUNTER);
-	} while ((hpet_end - hpet_start) < (loops));
-}
-
-static int __init init_hpet(char* override)
-{
-	unsigned long result, remain;
-
-	/* check clock override */
-	if (override[0] && strncmp(override,"hpet",4))
-		return -ENODEV;
-
-	if (!is_hpet_enabled())
-		return -ENODEV;
-
-	printk("Using HPET for gettimeofday\n");
-	if (cpu_has_tsc) {
-		unsigned long tsc_quotient = calibrate_tsc_hpet(&tsc_hpet_quotient);
-		if (tsc_quotient) {
-			/* report CPU clock rate in Hz.
-			 * The formula is (10^6 * 2^32) / (2^32 * 1 / (clocks/us)) =
-			 * clock/second. Our precision is about 100 ppm.
-			 */
-			{	unsigned long eax=0, edx=1000;
-				ASM_DIV64_REG(cpu_khz, edx, tsc_quotient,
-						eax, edx);
-				printk("Detected %lu.%03lu MHz processor.\n",
-					cpu_khz / 1000, cpu_khz % 1000);
-			}
-			set_cyc2ns_scale(cpu_khz/1000);
-		}
-	}
-
-	/*
-	 * Math to calculate hpet to usec multiplier
-	 * Look for the comments at get_offset_hpet()
-	 */
-	ASM_DIV64_REG(result, remain, hpet_tick, 0, KERNEL_TICK_USEC);
-	if (remain > (hpet_tick >> 1))
-		result++; /* rounding the result */
-	hpet_usec_quotient = result;
-
-	return 0;
-}
-
-/************************************************************/
-
-/* tsc timer_opts struct */
-static struct timer_opts timer_hpet = {
-	.name = 		"hpet",
-	.mark_offset =		mark_offset_hpet,
-	.get_offset =		get_offset_hpet,
-	.monotonic_clock =	monotonic_clock_hpet,
-	.delay = 		delay_hpet,
-	.read_timer = 		read_timer_tsc,
-};
-
-struct init_timer_opts __initdata timer_hpet_init = {
-	.init =	init_hpet,
-	.opts = &timer_hpet,
-};
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer_none.c linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer_none.c
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer_none.c	2005-06-17 19:22:40.059520309 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer_none.c	1969-12-31 16:00:00.000000000 -0800
@@ -1,39 +0,0 @@
-#include <linux/init.h>
-#include <asm/timer.h>
-
-static void mark_offset_none(void)
-{
-	/* nothing needed */
-}
-
-static unsigned long get_offset_none(void)
-{
-	return 0;
-}
-
-static unsigned long long monotonic_clock_none(void)
-{
-	return 0;
-}
-
-static void delay_none(unsigned long loops)
-{
-	int d0;
-	__asm__ __volatile__(
-		"\tjmp 1f\n"
-		".align 16\n"
-		"1:\tjmp 2f\n"
-		".align 16\n"
-		"2:\tdecl %0\n\tjns 2b"
-		:"=&a" (d0)
-		:"0" (loops));
-}
-
-/* none timer_opts struct */
-struct timer_opts timer_none = {
-	.name = 	"none",
-	.mark_offset =	mark_offset_none, 
-	.get_offset =	get_offset_none,
-	.monotonic_clock =	monotonic_clock_none,
-	.delay = delay_none,
-};
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer_pit.c linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer_pit.c
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer_pit.c	2005-06-17 19:22:40.060520184 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer_pit.c	1969-12-31 16:00:00.000000000 -0800
@@ -1,165 +0,0 @@
-/*
- * This code largely moved from arch/i386/kernel/time.c.
- * See comments there for proper credits.
- */
-
-#include <linux/spinlock.h>
-#include <linux/module.h>
-#include <linux/device.h>
-#include <linux/irq.h>
-#include <linux/sysdev.h>
-#include <linux/timex.h>
-#include <asm/delay.h>
-#include <asm/mpspec.h>
-#include <asm/timer.h>
-#include <asm/smp.h>
-#include <asm/io.h>
-#include <asm/arch_hooks.h>
-
-extern spinlock_t i8259A_lock;
-extern spinlock_t i8253_lock;
-#include "do_timer.h"
-#include "io_ports.h"
-
-static int count_p; /* counter in get_offset_pit() */
-
-static int __init init_pit(char* override)
-{
- 	/* check clock override */
- 	if (override[0] && strncmp(override,"pit",3))
- 		printk(KERN_ERR "Warning: clock= override failed. Defaulting to PIT\n");
- 
-	count_p = LATCH;
-	return 0;
-}
-
-static void mark_offset_pit(void)
-{
-	/* nothing needed */
-}
-
-static unsigned long long monotonic_clock_pit(void)
-{
-	return 0;
-}
-
-static void delay_pit(unsigned long loops)
-{
-	int d0;
-	__asm__ __volatile__(
-		"\tjmp 1f\n"
-		".align 16\n"
-		"1:\tjmp 2f\n"
-		".align 16\n"
-		"2:\tdecl %0\n\tjns 2b"
-		:"=&a" (d0)
-		:"0" (loops));
-}
-
-
-/* This function must be called with xtime_lock held.
- * It was inspired by Steve McCanne's microtime-i386 for BSD.  -- jrs
- * 
- * However, the pc-audio speaker driver changes the divisor so that
- * it gets interrupted rather more often - it loads 64 into the
- * counter rather than 11932! This has an adverse impact on
- * do_gettimeoffset() -- it stops working! What is also not
- * good is that the interval that our timer function gets called
- * is no longer 10.0002 ms, but 9.9767 ms. To get around this
- * would require using a different timing source. Maybe someone
- * could use the RTC - I know that this can interrupt at frequencies
- * ranging from 8192Hz to 2Hz. If I had the energy, I'd somehow fix
- * it so that at startup, the timer code in sched.c would select
- * using either the RTC or the 8253 timer. The decision would be
- * based on whether there was any other device around that needed
- * to trample on the 8253. I'd set up the RTC to interrupt at 1024 Hz,
- * and then do some jiggery to have a version of do_timer that 
- * advanced the clock by 1/1024 s. Every time that reached over 1/100
- * of a second, then do all the old code. If the time was kept correct
- * then do_gettimeoffset could just return 0 - there is no low order
- * divider that can be accessed.
- *
- * Ideally, you would be able to use the RTC for the speaker driver,
- * but it appears that the speaker driver really needs interrupt more
- * often than every 120 us or so.
- *
- * Anyway, this needs more thought....		pjsg (1993-08-28)
- * 
- * If you are really that interested, you should be reading
- * comp.protocols.time.ntp!
- */
-
-static unsigned long get_offset_pit(void)
-{
-	int count;
-	unsigned long flags;
-	static unsigned long jiffies_p = 0;
-
-	/*
-	 * cache volatile jiffies temporarily; we have xtime_lock. 
-	 */
-	unsigned long jiffies_t;
-
-	spin_lock_irqsave(&i8253_lock, flags);
-	/* timer count may underflow right here */
-	outb_p(0x00, PIT_MODE);	/* latch the count ASAP */
-
-	count = inb_p(PIT_CH0);	/* read the latched count */
-
-	/*
-	 * We do this guaranteed double memory access instead of a _p 
-	 * postfix in the previous port access. Wheee, hackady hack
-	 */
- 	jiffies_t = jiffies;
-
-	count |= inb_p(PIT_CH0) << 8;
-	
-        /* VIA686a test code... reset the latch if count > max + 1 */
-        if (count > LATCH) {
-                outb_p(0x34, PIT_MODE);
-                outb_p(LATCH & 0xff, PIT_CH0);
-                outb(LATCH >> 8, PIT_CH0);
-                count = LATCH - 1;
-        }
-	
-	/*
-	 * avoiding timer inconsistencies (they are rare, but they happen)...
-	 * there are two kinds of problems that must be avoided here:
-	 *  1. the timer counter underflows
-	 *  2. hardware problem with the timer, not giving us continuous time,
-	 *     the counter does small "jumps" upwards on some Pentium systems,
-	 *     (see c't 95/10 page 335 for Neptun bug.)
-	 */
-
-	if( jiffies_t == jiffies_p ) {
-		if( count > count_p ) {
-			/* the nutcase */
-			count = do_timer_overflow(count);
-		}
-	} else
-		jiffies_p = jiffies_t;
-
-	count_p = count;
-
-	spin_unlock_irqrestore(&i8253_lock, flags);
-
-	count = ((LATCH-1) - count) * TICK_SIZE;
-	count = (count + LATCH/2) / LATCH;
-
-	return count;
-}
-
-
-/* tsc timer_opts struct */
-struct timer_opts timer_pit = {
-	.name = "pit",
-	.mark_offset = mark_offset_pit, 
-	.get_offset = get_offset_pit,
-	.monotonic_clock = monotonic_clock_pit,
-	.delay = delay_pit,
-};
-
-struct init_timer_opts __initdata timer_pit_init = {
-	.init = init_pit, 
-	.opts = &timer_pit,
-};
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer_pm.c linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer_pm.c
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer_pm.c	2005-06-17 19:22:40.060520184 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer_pm.c	1969-12-31 16:00:00.000000000 -0800
@@ -1,259 +0,0 @@
-/*
- * (C) Dominik Brodowski <linux@brodo.de> 2003
- *
- * Driver to use the Power Management Timer (PMTMR) available in some
- * southbridges as primary timing source for the Linux kernel.
- *
- * Based on parts of linux/drivers/acpi/hardware/hwtimer.c, timer_pit.c,
- * timer_hpet.c, and on Arjan van de Ven's implementation for 2.4.
- *
- * This file is licensed under the GPL v2.
- */
-
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/device.h>
-#include <linux/init.h>
-#include <asm/types.h>
-#include <asm/timer.h>
-#include <asm/smp.h>
-#include <asm/io.h>
-#include <asm/arch_hooks.h>
-
-#include <linux/timex.h>
-#include "mach_timer.h"
-
-/* Number of PMTMR ticks expected during calibration run */
-#define PMTMR_TICKS_PER_SEC 3579545
-#define PMTMR_EXPECTED_RATE \
-  ((CALIBRATE_LATCH * (PMTMR_TICKS_PER_SEC >> 10)) / (CLOCK_TICK_RATE>>10))
-
-
-/* The I/O port the PMTMR resides at.
- * The location is detected during setup_arch(),
- * in arch/i386/acpi/boot.c */
-u32 pmtmr_ioport = 0;
-
-
-/* value of the Power timer at last timer interrupt */
-static u32 offset_tick;
-static u32 offset_delay;
-
-static unsigned long long monotonic_base;
-static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
-
-#define ACPI_PM_MASK 0xFFFFFF /* limit it to 24 bits */
-
-/*helper function to safely read acpi pm timesource*/
-static inline u32 read_pmtmr(void)
-{
-	u32 v1=0,v2=0,v3=0;
-	/* It has been reported that because of various broken
-	 * chipsets (ICH4, PIIX4 and PIIX4E) where the ACPI PM time
-	 * source is not latched, so you must read it multiple
-	 * times to insure a safe value is read.
-	 */
-	do {
-		v1 = inl(pmtmr_ioport);
-		v2 = inl(pmtmr_ioport);
-		v3 = inl(pmtmr_ioport);
-	} while ((v1 > v2 && v1 < v3) || (v2 > v3 && v2 < v1)
-			|| (v3 > v1 && v3 < v2));
-
-	/* mask the output to 24 bits */
-	return v2 & ACPI_PM_MASK;
-}
-
-
-/*
- * Some boards have the PMTMR running way too fast. We check
- * the PMTMR rate against PIT channel 2 to catch these cases.
- */
-static int verify_pmtmr_rate(void)
-{
-	u32 value1, value2;
-	unsigned long count, delta;
-
-	mach_prepare_counter();
-	value1 = read_pmtmr();
-	mach_countup(&count);
-	value2 = read_pmtmr();
-	delta = (value2 - value1) & ACPI_PM_MASK;
-
-	/* Check that the PMTMR delta is within 5% of what we expect */
-	if (delta < (PMTMR_EXPECTED_RATE * 19) / 20 ||
-	    delta > (PMTMR_EXPECTED_RATE * 21) / 20) {
-		printk(KERN_INFO "PM-Timer running at invalid rate: %lu%% of normal - aborting.\n", 100UL * delta / PMTMR_EXPECTED_RATE);
-		return -1;
-	}
-
-	return 0;
-}
-
-
-static int init_pmtmr(char* override)
-{
-	u32 value1, value2;
-	unsigned int i;
-
- 	if (override[0] && strncmp(override,"pmtmr",5))
-		return -ENODEV;
-
-	if (!pmtmr_ioport)
-		return -ENODEV;
-
-	/* we use the TSC for delay_pmtmr, so make sure it exists */
-	if (!cpu_has_tsc)
-		return -ENODEV;
-
-	/* "verify" this timing source */
-	value1 = read_pmtmr();
-	for (i = 0; i < 10000; i++) {
-		value2 = read_pmtmr();
-		if (value2 == value1)
-			continue;
-		if (value2 > value1)
-			goto pm_good;
-		if ((value2 < value1) && ((value2) < 0xFFF))
-			goto pm_good;
-		printk(KERN_INFO "PM-Timer had inconsistent results: 0x%#x, 0x%#x - aborting.\n", value1, value2);
-		return -EINVAL;
-	}
-	printk(KERN_INFO "PM-Timer had no reasonable result: 0x%#x - aborting.\n", value1);
-	return -ENODEV;
-
-pm_good:
-	if (verify_pmtmr_rate() != 0)
-		return -ENODEV;
-
-	init_cpu_khz();
-	return 0;
-}
-
-static inline u32 cyc2us(u32 cycles)
-{
-	/* The Power Management Timer ticks at 3.579545 ticks per microsecond.
-	 * 1 / PM_TIMER_FREQUENCY == 0.27936511 =~ 286/1024 [error: 0.024%]
-	 *
-	 * Even with HZ = 100, delta is at maximum 35796 ticks, so it can
-	 * easily be multiplied with 286 (=0x11E) without having to fear
-	 * u32 overflows.
-	 */
-	cycles *= 286;
-	return (cycles >> 10);
-}
-
-/*
- * this gets called during each timer interrupt
- *   - Called while holding the writer xtime_lock
- */
-static void mark_offset_pmtmr(void)
-{
-	u32 lost, delta, last_offset;
-	static int first_run = 1;
-	last_offset = offset_tick;
-
-	write_seqlock(&monotonic_lock);
-
-	offset_tick = read_pmtmr();
-
-	/* calculate tick interval */
-	delta = (offset_tick - last_offset) & ACPI_PM_MASK;
-
-	/* convert to usecs */
-	delta = cyc2us(delta);
-
-	/* update the monotonic base value */
-	monotonic_base += delta * NSEC_PER_USEC;
-	write_sequnlock(&monotonic_lock);
-
-	/* convert to ticks */
-	delta += offset_delay;
-	lost = delta / (USEC_PER_SEC / HZ);
-	offset_delay = delta % (USEC_PER_SEC / HZ);
-
-
-	/* compensate for lost ticks */
-	if (lost >= 2)
-		jiffies_64 += lost - 1;
-
-	/* don't calculate delay for first run,
-	   or if we've got less then a tick */
-	if (first_run || (lost < 1)) {
-		first_run = 0;
-		offset_delay = 0;
-	}
-}
-
-
-static unsigned long long monotonic_clock_pmtmr(void)
-{
-	u32 last_offset, this_offset;
-	unsigned long long base, ret;
-	unsigned seq;
-
-
-	/* atomically read monotonic base & last_offset */
-	do {
-		seq = read_seqbegin(&monotonic_lock);
-		last_offset = offset_tick;
-		base = monotonic_base;
-	} while (read_seqretry(&monotonic_lock, seq));
-
-	/* Read the pmtmr */
-	this_offset =  read_pmtmr();
-
-	/* convert to nanoseconds */
-	ret = (this_offset - last_offset) & ACPI_PM_MASK;
-	ret = base + (cyc2us(ret) * NSEC_PER_USEC);
-	return ret;
-}
-
-static void delay_pmtmr(unsigned long loops)
-{
-	unsigned long bclock, now;
-
-	rdtscl(bclock);
-	do
-	{
-		rep_nop();
-		rdtscl(now);
-	} while ((now-bclock) < loops);
-}
-
-
-/*
- * get the offset (in microseconds) from the last call to mark_offset()
- *	- Called holding a reader xtime_lock
- */
-static unsigned long get_offset_pmtmr(void)
-{
-	u32 now, offset, delta = 0;
-
-	offset = offset_tick;
-	now = read_pmtmr();
-	delta = (now - offset)&ACPI_PM_MASK;
-
-	return (unsigned long) offset_delay + cyc2us(delta);
-}
-
-
-/* acpi timer_opts struct */
-static struct timer_opts timer_pmtmr = {
-	.name			= "pmtmr",
-	.mark_offset		= mark_offset_pmtmr,
-	.get_offset		= get_offset_pmtmr,
-	.monotonic_clock 	= monotonic_clock_pmtmr,
-	.delay 			= delay_pmtmr,
-	.read_timer 		= read_timer_tsc,
-};
-
-struct init_timer_opts __initdata timer_pmtmr_init = {
-	.init = init_pmtmr,
-	.opts = &timer_pmtmr,
-};
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Dominik Brodowski <linux@brodo.de>");
-MODULE_DESCRIPTION("Power Management Timer (PMTMR) as primary timing source for x86");
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer_tsc.c linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer_tsc.c
--- linux-2.6.12-rc6-mm1/arch/i386/kernel/timers/timer_tsc.c	2005-06-17 19:22:40.061520060 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/kernel/timers/timer_tsc.c	1969-12-31 16:00:00.000000000 -0800
@@ -1,386 +0,0 @@
-/*
- * This code largely moved from arch/i386/kernel/time.c.
- * See comments there for proper credits.
- *
- * 2004-06-25    Jesper Juhl
- *      moved mark_offset_tsc below cpufreq_delayed_get to avoid gcc 3.4
- *      failing to inline.
- */
-
-#include <linux/spinlock.h>
-#include <linux/init.h>
-#include <linux/timex.h>
-#include <linux/errno.h>
-#include <linux/cpufreq.h>
-#include <linux/string.h>
-#include <linux/jiffies.h>
-
-#include <asm/timer.h>
-#include <asm/io.h>
-/* processor.h for distable_tsc flag */
-#include <asm/processor.h>
-
-#include "io_ports.h"
-#include "mach_timer.h"
-
-#include <asm/hpet.h>
-
-#ifdef CONFIG_HPET_TIMER
-static unsigned long hpet_usec_quotient;
-static unsigned long hpet_last;
-static struct timer_opts timer_tsc;
-#endif
-
-extern spinlock_t i8253_lock;
-
-static int use_tsc;
-/* Number of usecs that the last interrupt was delayed */
-static int delay_at_last_interrupt;
-
-static unsigned long last_tsc_low; /* lsb 32 bits of Time Stamp Counter */
-static unsigned long last_tsc_high; /* msb 32 bits of Time Stamp Counter */
-static unsigned long long monotonic_base;
-static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
-
-static int count2; /* counter for mark_offset_tsc() */
-
-/* Cached *multiplier* to convert TSC counts to microseconds.
- * (see the equation below).
- * Equal to 2^32 * (1 / (clocks per usec) ).
- * Initialized in time_init.
- */
-static unsigned long fast_gettimeoffset_quotient;
-
-static unsigned long get_offset_tsc(void)
-{
-	register unsigned long eax, edx;
-
-	/* Read the Time Stamp Counter */
-
-	rdtsc(eax,edx);
-
-	/* .. relative to previous jiffy (32 bits is enough) */
-	eax -= last_tsc_low;	/* tsc_low delta */
-
-	/*
-         * Time offset = (tsc_low delta) * fast_gettimeoffset_quotient
-         *             = (tsc_low delta) * (usecs_per_clock)
-         *             = (tsc_low delta) * (usecs_per_jiffy / clocks_per_jiffy)
-	 *
-	 * Using a mull instead of a divl saves up to 31 clock cycles
-	 * in the critical path.
-         */
-
-	__asm__("mull %2"
-		:"=a" (eax), "=d" (edx)
-		:"rm" (fast_gettimeoffset_quotient),
-		 "0" (eax));
-
-	/* our adjusted time offset in microseconds */
-	return delay_at_last_interrupt + edx;
-}
-
-static unsigned long long monotonic_clock_tsc(void)
-{
-	unsigned long long last_offset, this_offset, base;
-	unsigned seq;
-	
-	/* atomically read monotonic base & last_offset */
-	do {
-		seq = read_seqbegin(&monotonic_lock);
-		last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
-		base = monotonic_base;
-	} while (read_seqretry(&monotonic_lock, seq));
-
-	/* Read the Time Stamp Counter */
-	rdtscll(this_offset);
-
-	/* return the value in ns */
-	return base + cycles_2_ns(this_offset - last_offset);
-}
-
-
-static void delay_tsc(unsigned long loops)
-{
-	unsigned long bclock, now;
-	
-	rdtscl(bclock);
-	do
-	{
-		rep_nop();
-		rdtscl(now);
-	} while ((now-bclock) < loops);
-}
-
-#ifdef CONFIG_HPET_TIMER
-static void mark_offset_tsc_hpet(void)
-{
-	unsigned long long this_offset, last_offset;
- 	unsigned long offset, temp, hpet_current;
-
-	write_seqlock(&monotonic_lock);
-	last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
-	/*
-	 * It is important that these two operations happen almost at
-	 * the same time. We do the RDTSC stuff first, since it's
-	 * faster. To avoid any inconsistencies, we need interrupts
-	 * disabled locally.
-	 */
-	/*
-	 * Interrupts are just disabled locally since the timer irq
-	 * has the SA_INTERRUPT flag set. -arca
-	 */
-	/* read Pentium cycle counter */
-
-	hpet_current = hpet_readl(HPET_COUNTER);
-	rdtsc(last_tsc_low, last_tsc_high);
-
-	/* lost tick compensation */
-	offset = hpet_readl(HPET_T0_CMP) - hpet_tick;
-	if (unlikely(((offset - hpet_last) > hpet_tick) && (hpet_last != 0))) {
-		int lost_ticks = (offset - hpet_last) / hpet_tick;
-		jiffies_64 += lost_ticks;
-	}
-	hpet_last = hpet_current;
-
-	/* update the monotonic base value */
-	this_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
-	monotonic_base += cycles_2_ns(this_offset - last_offset);
-	write_sequnlock(&monotonic_lock);
-
-	/* calculate delay_at_last_interrupt */
-	/*
-	 * Time offset = (hpet delta) * ( usecs per HPET clock )
-	 *             = (hpet delta) * ( usecs per tick / HPET clocks per tick)
-	 *             = (hpet delta) * ( hpet_usec_quotient ) / (2^32)
-	 * Where,
-	 * hpet_usec_quotient = (2^32 * usecs per tick)/HPET clocks per tick
-	 */
-	delay_at_last_interrupt = hpet_current - offset;
-	ASM_MUL64_REG(temp, delay_at_last_interrupt,
-			hpet_usec_quotient, delay_at_last_interrupt);
-}
-#endif
-
-
-static void mark_offset_tsc(void)
-{
-	unsigned long lost,delay;
-	unsigned long delta = last_tsc_low;
-	int count;
-	int countmp;
-	static int count1 = 0;
-	unsigned long long this_offset, last_offset;
-	static int lost_count = 0;
-
-	write_seqlock(&monotonic_lock);
-	last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
-	/*
-	 * It is important that these two operations happen almost at
-	 * the same time. We do the RDTSC stuff first, since it's
-	 * faster. To avoid any inconsistencies, we need interrupts
-	 * disabled locally.
-	 */
-
-	/*
-	 * Interrupts are just disabled locally since the timer irq
-	 * has the SA_INTERRUPT flag set. -arca
-	 */
-
-	/* read Pentium cycle counter */
-
-	rdtsc(last_tsc_low, last_tsc_high);
-
-	spin_lock(&i8253_lock);
-	outb_p(0x00, PIT_MODE);     /* latch the count ASAP */
-
-	count = inb_p(PIT_CH0);    /* read the latched count */
-	count |= inb(PIT_CH0) << 8;
-
-	/*
-	 * VIA686a test code... reset the latch if count > max + 1
-	 * from timer_pit.c - cjb
-	 */
-	if (count > LATCH) {
-		outb_p(0x34, PIT_MODE);
-		outb_p(LATCH & 0xff, PIT_CH0);
-		outb(LATCH >> 8, PIT_CH0);
-		count = LATCH - 1;
-	}
-
-	spin_unlock(&i8253_lock);
-
-	if (pit_latch_buggy) {
-		/* get center value of last 3 time lutch */
-		if ((count2 >= count && count >= count1)
-		    || (count1 >= count && count >= count2)) {
-			count2 = count1; count1 = count;
-		} else if ((count1 >= count2 && count2 >= count)
-			   || (count >= count2 && count2 >= count1)) {
-			countmp = count;count = count2;
-			count2 = count1;count1 = countmp;
-		} else {
-			count2 = count1; count1 = count; count = count1;
-		}
-	}
-
-	/* lost tick compensation */
-	delta = last_tsc_low - delta;
-	{
-		register unsigned long eax, edx;
-		eax = delta;
-		__asm__("mull %2"
-		:"=a" (eax), "=d" (edx)
-		:"rm" (fast_gettimeoffset_quotient),
-		 "0" (eax));
-		delta = edx;
-	}
-	delta += delay_at_last_interrupt;
-	lost = delta/(1000000/HZ);
-	delay = delta%(1000000/HZ);
-	if (lost >= 2) {
-		jiffies_64 += lost-1;
-
-		/* sanity check to ensure we're not always losing ticks */
-		if (lost_count++ > 100) {
-			printk(KERN_WARNING "Losing too many ticks!\n");
-			printk(KERN_WARNING "TSC cannot be used as a timesource.  \n");
-			printk(KERN_WARNING "Possible reasons for this are:\n");
-			printk(KERN_WARNING "  You're running with Speedstep,\n");
-			printk(KERN_WARNING "  You don't have DMA enabled for your hard disk (see hdparm),\n");
-			printk(KERN_WARNING "  Incorrect TSC synchronization on an SMP system (see dmesg).\n");
-			printk(KERN_WARNING "Falling back to a sane timesource now.\n");
-
-			clock_fallback();
-		}
-		/* ... but give the TSC a fair chance */
-		if (lost_count > 25)
-			cpufreq_delayed_get();
-	} else
-		lost_count = 0;
-	/* update the monotonic base value */
-	this_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
-	monotonic_base += cycles_2_ns(this_offset - last_offset);
-	write_sequnlock(&monotonic_lock);
-
-	/* calculate delay_at_last_interrupt */
-	count = ((LATCH-1) - count) * TICK_SIZE;
-	delay_at_last_interrupt = (count + LATCH/2) / LATCH;
-
-	/* catch corner case where tick rollover occured
-	 * between tsc and pit reads (as noted when
-	 * usec delta is > 90% # of usecs/tick)
-	 */
-	if (lost && abs(delay - delay_at_last_interrupt) > (900000/HZ))
-		jiffies_64++;
-}
-
-static int __init init_tsc(char* override)
-{
-
-	/* check clock override */
-	if (override[0] && strncmp(override,"tsc",3)) {
-#ifdef CONFIG_HPET_TIMER
-		if (is_hpet_enabled()) {
-			printk(KERN_ERR "Warning: clock= override failed. Defaulting to tsc\n");
-		} else
-#endif
-		{
-			return -ENODEV;
-		}
-	}
-
-	/*
-	 * If we have APM enabled or the CPU clock speed is variable
-	 * (CPU stops clock on HLT or slows clock to save power)
-	 * then the TSC timestamps may diverge by up to 1 jiffy from
-	 * 'real time' but nothing will break.
-	 * The most frequent case is that the CPU is "woken" from a halt
-	 * state by the timer interrupt itself, so we get 0 error. In the
-	 * rare cases where a driver would "wake" the CPU and request a
-	 * timestamp, the maximum error is < 1 jiffy. But timestamps are
-	 * still perfectly ordered.
-	 * Note that the TSC counter will be reset if APM suspends
-	 * to disk; this won't break the kernel, though, 'cuz we're
-	 * smart.  See arch/i386/kernel/apm.c.
-	 */
- 	/*
- 	 *	Firstly we have to do a CPU check for chips with
- 	 * 	a potentially buggy TSC. At this point we haven't run
- 	 *	the ident/bugs checks so we must run this hook as it
- 	 *	may turn off the TSC flag.
- 	 *
- 	 *	NOTE: this doesn't yet handle SMP 486 machines where only
- 	 *	some CPU's have a TSC. Thats never worked and nobody has
- 	 *	moaned if you have the only one in the world - you fix it!
- 	 */
-
-	count2 = LATCH; /* initialize counter for mark_offset_tsc() */
-
-	if (cpu_has_tsc) {
-		unsigned long tsc_quotient;
-#ifdef CONFIG_HPET_TIMER
-		if (is_hpet_enabled() && hpet_use_timer) {
-			unsigned long result, remain;
-			printk("Using TSC for gettimeofday\n");
-			tsc_quotient = calibrate_tsc_hpet(NULL);
-			timer_tsc.mark_offset = &mark_offset_tsc_hpet;
-			/*
-			 * Math to calculate hpet to usec multiplier
-			 * Look for the comments at get_offset_tsc_hpet()
-			 */
-			ASM_DIV64_REG(result, remain, hpet_tick,
-					0, KERNEL_TICK_USEC);
-			if (remain > (hpet_tick >> 1))
-				result++; /* rounding the result */
-
-			hpet_usec_quotient = result;
-		} else
-#endif
-		{
-			tsc_quotient = calibrate_tsc();
-		}
-
-		if (tsc_quotient) {
-			fast_gettimeoffset_quotient = tsc_quotient;
-			use_tsc = 1;
-			/*
-			 *	We could be more selective here I suspect
-			 *	and just enable this for the next intel chips ?
-			 */
-			/* report CPU clock rate in Hz.
-			 * The formula is (10^6 * 2^32) / (2^32 * 1 / (clocks/us)) =
-			 * clock/second. Our precision is about 100 ppm.
-			 */
-			{	unsigned long eax=0, edx=1000;
-				__asm__("divl %2"
-		       		:"=a" (cpu_khz), "=d" (edx)
-        	       		:"r" (tsc_quotient),
-	                	"0" (eax), "1" (edx));
-				printk("Detected %lu.%03lu MHz processor.\n", cpu_khz / 1000, cpu_khz % 1000);
-			}
-			set_cyc2ns_scale(cpu_khz/1000);
-			return 0;
-		}
-	}
-	return -ENODEV;
-}
-
-
-
-/************************************************************/
-
-/* tsc timer_opts struct */
-static struct timer_opts timer_tsc = {
-	.name = "tsc",
-	.mark_offset = mark_offset_tsc, 
-	.get_offset = get_offset_tsc,
-	.monotonic_clock = monotonic_clock_tsc,
-	.delay = delay_tsc,
-	.read_timer = read_timer_tsc,
-};
-
-struct init_timer_opts __initdata timer_tsc_init = {
-	.init = init_tsc,
-	.opts = &timer_tsc,
-};
diff -ruN linux-2.6.12-rc6-mm1/arch/i386/lib/delay.c linux-2.6.12-rc6-mm1-tod/arch/i386/lib/delay.c
--- linux-2.6.12-rc6-mm1/arch/i386/lib/delay.c	2005-06-17 19:22:40.067519314 -0700
+++ linux-2.6.12-rc6-mm1-tod/arch/i386/lib/delay.c	2005-06-17 19:08:14.000000000 -0700
@@ -14,6 +14,7 @@
 #include <linux/sched.h>
 #include <linux/delay.h>
 #include <linux/module.h>
+#include <linux/timeofday.h>
 #include <asm/processor.h>
 #include <asm/delay.h>
 #include <asm/timer.h>
@@ -22,11 +23,20 @@
 #include <asm/smp.h>
 #endif
 
-extern struct timer_opts* timer;
-
+/* XXX - For now just use a simple loop delay
+ *  This has cpufreq issues, but so did the old method.
+ */
 void __delay(unsigned long loops)
 {
-	cur_timer->delay(loops);
+	int d0;
+	__asm__ __volatile__(
+		"\tjmp 1f\n"
+		".align 16\n"
+		"1:\tjmp 2f\n"
+		".align 16\n"
+		"2:\tdecl %0\n\tjns 2b"
+		:"=&a" (d0)
+		:"0" (loops));
 }
 
 inline void __const_udelay(unsigned long xloops)
diff -ruN linux-2.6.12-rc6-mm1/include/asm-i386/timeofday.h linux-2.6.12-rc6-mm1-tod/include/asm-i386/timeofday.h
--- linux-2.6.12-rc6-mm1/include/asm-i386/timeofday.h	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.12-rc6-mm1-tod/include/asm-i386/timeofday.h	2005-06-17 19:05:24.000000000 -0700
@@ -0,0 +1,4 @@
+#ifndef _ASM_I386_TIMEOFDAY_H
+#define _ASM_I386_TIMEOFDAY_H
+#include <asm-generic/timeofday.h>
+#endif
diff -ruN linux-2.6.12-rc6-mm1/include/asm-i386/timer.h linux-2.6.12-rc6-mm1-tod/include/asm-i386/timer.h
--- linux-2.6.12-rc6-mm1/include/asm-i386/timer.h	2005-06-17 19:22:40.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/include/asm-i386/timer.h	2005-06-17 19:23:21.000000000 -0700
@@ -2,66 +2,11 @@
 #define _ASMi386_TIMER_H
 #include <linux/init.h>
 
-/**
- * struct timer_ops - used to define a timer source
- *
- * @name: name of the timer.
- * @init: Probes and initializes the timer. Takes clock= override 
- *        string as an argument. Returns 0 on success, anything else
- *        on failure.
- * @mark_offset: called by the timer interrupt.
- * @get_offset:  called by gettimeofday(). Returns the number of microseconds
- *               since the last timer interupt.
- * @monotonic_clock: returns the number of nanoseconds since the init of the
- *                   timer.
- * @delay: delays this many clock cycles.
- */
-struct timer_opts {
-	char* name;
-	void (*mark_offset)(void);
-	unsigned long (*get_offset)(void);
-	unsigned long long (*monotonic_clock)(void);
-	void (*delay)(unsigned long);
-	unsigned long (*read_timer)(void);
-};
-
-struct init_timer_opts {
-	int (*init)(char *override);
-	struct timer_opts *opts;
-};
-
 #define TICK_SIZE (tick_nsec / 1000)
-
-extern struct timer_opts* __init select_timer(void);
-extern void clock_fallback(void);
 void setup_pit_timer(void);
-
 /* Modifiers for buggy PIT handling */
-
 extern int pit_latch_buggy;
-
-extern struct timer_opts *cur_timer;
 extern int timer_ack;
-
-/* list of externed timers */
-extern struct timer_opts timer_none;
-extern struct timer_opts timer_pit;
-extern struct init_timer_opts timer_pit_init;
-extern struct init_timer_opts timer_tsc_init;
-#ifdef CONFIG_X86_CYCLONE_TIMER
-extern struct init_timer_opts timer_cyclone_init;
-#endif
-
-extern unsigned long calibrate_tsc(void);
-extern unsigned long read_timer_tsc(void);
-extern void init_cpu_khz(void);
 extern int recalibrate_cpu_khz(void);
-#ifdef CONFIG_HPET_TIMER
-extern struct init_timer_opts timer_hpet_init;
-extern unsigned long calibrate_tsc_hpet(unsigned long *tsc_hpet_quotient_ptr);
-#endif
 
-#ifdef CONFIG_X86_PM_TIMER
-extern struct init_timer_opts timer_pmtmr_init;
-#endif
 #endif




* [PATCH 6/6] new timeofday i386 specific timesources for -mm (v.B3)
  2005-06-18  3:02       ` [PATCH 5/6] new timeofday i386 arch specific changes, part 4 " john stultz
@ 2005-06-18  3:04         ` john stultz
  0 siblings, 0 replies; 32+ messages in thread
From: john stultz @ 2005-06-18  3:04 UTC (permalink / raw)
  To: lkml, Andrew Morton
  Cc: Tim Schmielau, George Anzinger, albert, Ulrich Windl,
	Christoph Lameter, Dominik Brodowski, David Mosberger, Andi Kleen,
	paulus, schwidefsky, keith maanthey, Chris McDermott, Max Asbock,
	mahuja, Nishanth Aravamudan, Darren Hart, Darrick J. Wong,
	Anton Blanchard, donf, mpm, benh, kernel-stuff, frank

Andrew, All,

	This patch implements the time sources for i386 (acpi_pm, cyclone,
hpet, pit, tsc and tsc-interp). The patch should apply on top of the
timeofday-arch-i386-part4_B3.patch.
	
The patch should be fairly straightforward, as it only adds the new
timesources.
	
Andrew, please consider including this in your tree for testing.

thanks
-john

linux-2.6.12-rc6-mm1_timeofday-timesources-i386_B3.patch
========================================================
diff -ruN linux-2.6.12-rc6-mm1/drivers/timesource/acpi_pm.c linux-2.6.12-rc6-mm1-tod/drivers/timesource/acpi_pm.c
--- linux-2.6.12-rc6-mm1/drivers/timesource/acpi_pm.c	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.12-rc6-mm1-tod/drivers/timesource/acpi_pm.c	2005-06-17 19:15:53.000000000 -0700
@@ -0,0 +1,152 @@
+/*
+ * linux/drivers/timesource/acpi_pm.c
+ *
+ * This file contains the ACPI PM based time source.
+ *
+ * This code was largely moved from the i386 timer_pm.c file
+ * which was (C) Dominik Brodowski <linux@brodo.de> 2003
+ * and contained the following comments:
+ *
+ * Driver to use the Power Management Timer (PMTMR) available in some
+ * southbridges as primary timing source for the Linux kernel.
+ *
+ * Based on parts of linux/drivers/acpi/hardware/hwtimer.c, timer_pit.c,
+ * timer_hpet.c, and on Arjan van de Ven's implementation for 2.4.
+ *
+ * This file is licensed under the GPL v2.
+ */
+
+
+#include <linux/timesource.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <asm/io.h>
+
+/* Number of PMTMR ticks expected during calibration run */
+#define PMTMR_TICKS_PER_SEC 3579545
+
+#if (CONFIG_X86 && (!CONFIG_X86_64))
+#include "mach_timer.h"
+#define PMTMR_EXPECTED_RATE ((PMTMR_TICKS_PER_SEC*CALIBRATE_TIME_MSEC)/1000)
+#endif
+
+/* The I/O port the PMTMR resides at.
+ * The location is detected during setup_arch(),
+ * in arch/i386/acpi/boot.c */
+extern u32 acpi_pmtmr_ioport;
+extern int acpi_pmtmr_buggy;
+
+#define ACPI_PM_MASK 0xFFFFFF /* limit it to 24 bits */
+
+
+static inline u32 read_pmtmr(void)
+{
+	/* mask the output to 24 bits */
+	return inl(acpi_pmtmr_ioport) & ACPI_PM_MASK;
+}
+
+static cycle_t acpi_pm_read_verified(void)
+{
+	u32 v1=0,v2=0,v3=0;
+	/* It has been reported that on various broken
+	 * chipsets (ICH4, PIIX4 and PIIX4E) the ACPI PM time
+	 * source is not latched, so you must read it multiple
+	 * times to ensure a safe value is read.
+	 */
+	do {
+		v1 = read_pmtmr();
+		v2 = read_pmtmr();
+		v3 = read_pmtmr();
+	} while ((v1 > v2 && v1 < v3) || (v2 > v3 && v2 < v1)
+			|| (v3 > v1 && v3 < v2));
+
+	return (cycle_t)v2;
+}
+
+
+static cycle_t acpi_pm_read(void)
+{
+	return (cycle_t)read_pmtmr();
+}
+
+struct timesource_t timesource_acpi_pm = {
+	.name = "acpi_pm",
+	.priority = 200,
+	.type = TIMESOURCE_FUNCTION,
+	.read_fnct = acpi_pm_read,
+	.mask = (cycle_t)ACPI_PM_MASK,
+	.mult = 0, /* to be calculated */
+	.shift = 22,
+};
+
+#if (CONFIG_X86 && (!CONFIG_X86_64))
+/*
+ * Some boards have the PMTMR running way too fast. We check
+ * the PMTMR rate against PIT channel 2 to catch these cases.
+ */
+static int __init verify_pmtmr_rate(void)
+{
+	u32 value1, value2;
+	unsigned long count, delta;
+
+	mach_prepare_counter();
+	value1 = read_pmtmr();
+	mach_countup(&count);
+	value2 = read_pmtmr();
+	delta = (value2 - value1) & ACPI_PM_MASK;
+
+	/* Check that the PMTMR delta is within 5% of what we expect */
+	if (delta < (PMTMR_EXPECTED_RATE * 19) / 20 ||
+	    delta > (PMTMR_EXPECTED_RATE * 21) / 20) {
+		printk(KERN_INFO "PM-Timer running at invalid rate: %lu%% of normal - aborting.\n", 100UL * delta / PMTMR_EXPECTED_RATE);
+		return -1;
+	}
+
+	return 0;
+}
+#else
+#define verify_pmtmr_rate() (0)
+#endif
+
+static int __init init_acpi_pm_timesource(void)
+{
+	u32 value1, value2;
+	unsigned int i;
+
+	if (!acpi_pmtmr_ioport)
+		return -ENODEV;
+
+	timesource_acpi_pm.mult = timesource_hz2mult(PMTMR_TICKS_PER_SEC,
+									timesource_acpi_pm.shift);
+
+	/* "verify" this timing source */
+	value1 = read_pmtmr();
+	for (i = 0; i < 10000; i++) {
+		value2 = read_pmtmr();
+		if (value2 == value1)
+			continue;
+		if (value2 > value1)
+			goto pm_good;
+		if ((value2 < value1) && ((value2) < 0xFFF))
+			goto pm_good;
+		printk(KERN_INFO "PM-Timer had inconsistent results: 0x%x, 0x%x - aborting.\n", value1, value2);
+		return -EINVAL;
+	}
+	printk(KERN_INFO "PM-Timer had no reasonable result: 0x%x - aborting.\n", value1);
+	return -ENODEV;
+
+pm_good:
+	if (verify_pmtmr_rate() != 0)
+		return -ENODEV;
+
+	/* check to see if pmtmr is known buggy */
+	if (acpi_pmtmr_buggy) {
+		timesource_acpi_pm.read_fnct = acpi_pm_read_verified;
+		timesource_acpi_pm.priority = 110;
+	}
+
+	register_timesource(&timesource_acpi_pm);
+	return 0;
+}
+
+module_init(init_acpi_pm_timesource);
diff -ruN linux-2.6.12-rc6-mm1/drivers/timesource/cyclone.c linux-2.6.12-rc6-mm1-tod/drivers/timesource/cyclone.c
--- linux-2.6.12-rc6-mm1/drivers/timesource/cyclone.c	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.12-rc6-mm1-tod/drivers/timesource/cyclone.c	2005-06-17 19:15:53.000000000 -0700
@@ -0,0 +1,138 @@
+#include <linux/timesource.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/timex.h>
+#include <linux/init.h>
+
+#include <asm/io.h>
+#include <asm/pgtable.h>
+#include "mach_timer.h"
+
+#define CYCLONE_CBAR_ADDR 0xFEB00CD0		/* base address ptr*/
+#define CYCLONE_PMCC_OFFSET 0x51A0		/* offset to control register */
+#define CYCLONE_MPCS_OFFSET 0x51A8		/* offset to select register */
+#define CYCLONE_MPMC_OFFSET 0x51D0		/* offset to count register */
+#define CYCLONE_TIMER_FREQ 100000000
+#define CYCLONE_TIMER_MASK (0xFFFFFFFF) /* 32 bit mask */
+
+int use_cyclone = 0;
+
+struct timesource_t timesource_cyclone = {
+	.name = "cyclone",
+	.priority = 250,
+	.type = TIMESOURCE_MMIO_32,
+	.mmio_ptr = NULL, /* to be set */
+	.mask = (cycle_t)CYCLONE_TIMER_MASK,
+	.mult = 10,
+	.shift = 0,
+};
+
+static unsigned long __init calibrate_cyclone(void)
+{
+	u64 delta64;
+	unsigned long start, end;
+	unsigned long i, count;
+	unsigned long cyclone_freq_khz;
+
+	/* repeat 3 times to make sure the cache is warm */
+	for(i=0; i < 3; i++) {
+		mach_prepare_counter();
+		start = readl(timesource_cyclone.mmio_ptr);
+		mach_countup(&count);
+		end = readl(timesource_cyclone.mmio_ptr);
+	}
+
+	delta64 = end - start;
+
+	delta64 += CALIBRATE_TIME_MSEC/2; /* round for do_div */
+	do_div(delta64,CALIBRATE_TIME_MSEC);
+
+	cyclone_freq_khz = (unsigned long)delta64;
+
+	printk("calculated cyclone_freq: %lu khz\n", cyclone_freq_khz);
+	return cyclone_freq_khz;
+}
+
+static int __init init_cyclone_timesource(void)
+{
+	unsigned long base;	/* saved value from CBAR */
+	unsigned long offset;
+	u32 __iomem* reg;
+	u32 __iomem* volatile cyclone_timer;	/* Cyclone MPMC0 register */
+	unsigned long khz;
+	int i;
+
+	/*make sure we're on a summit box*/
+	if (!use_cyclone) return -ENODEV;
+
+	printk(KERN_INFO "Summit chipset: Starting Cyclone Counter.\n");
+
+	/* find base address */
+	offset = CYCLONE_CBAR_ADDR;
+	reg = ioremap_nocache(offset, sizeof(*reg));
+	if(!reg){
+		printk(KERN_ERR "Summit chipset: Could not find valid CBAR register.\n");
+		return -ENODEV;
+	}
+	/* even on 64bit systems, this is only 32bits */
+	base = readl(reg);
+	if(!base){
+		printk(KERN_ERR "Summit chipset: Could not find valid CBAR value.\n");
+		return -ENODEV;
+	}
+	iounmap(reg);
+
+	/* setup PMCC */
+	offset = base + CYCLONE_PMCC_OFFSET;
+	reg = ioremap_nocache(offset, sizeof(*reg));
+	if(!reg){
+		printk(KERN_ERR "Summit chipset: Could not find valid PMCC register.\n");
+		return -ENODEV;
+	}
+	writel(0x00000001,reg);
+	iounmap(reg);
+
+	/* setup MPCS */
+	offset = base + CYCLONE_MPCS_OFFSET;
+	reg = ioremap_nocache(offset, sizeof(*reg));
+	if(!reg){
+		printk(KERN_ERR "Summit chipset: Could not find valid MPCS register.\n");
+		return -ENODEV;
+	}
+	writel(0x00000001,reg);
+	iounmap(reg);
+
+	/* map in cyclone_timer */
+	offset = base + CYCLONE_MPMC_OFFSET;
+	cyclone_timer = ioremap_nocache(offset, sizeof(u64));
+	if(!cyclone_timer){
+		printk(KERN_ERR "Summit chipset: Could not find valid MPMC register.\n");
+		return -ENODEV;
+	}
+
+	/* quick test to make sure it's ticking */
+	for(i=0; i<3; i++){
+		u32 old = readl(cyclone_timer);
+		int stall = 100;
+		while(stall--) barrier();
+		if(readl(cyclone_timer) == old){
+			printk(KERN_ERR "Summit chipset: Counter not counting! DISABLED\n");
+			iounmap(cyclone_timer);
+			cyclone_timer = NULL;
+			return -ENODEV;
+		}
+	}
+	timesource_cyclone.mmio_ptr = cyclone_timer;
+
+	/* sort out mult/shift values */
+	khz = calibrate_cyclone();
+	timesource_cyclone.shift = 22;
+	timesource_cyclone.mult = timesource_khz2mult(khz,
+									timesource_cyclone.shift);
+
+	register_timesource(&timesource_cyclone);
+
+	return 0;
+}
+
+module_init(init_cyclone_timesource);
diff -ruN linux-2.6.12-rc6-mm1/drivers/timesource/hpet.c linux-2.6.12-rc6-mm1-tod/drivers/timesource/hpet.c
--- linux-2.6.12-rc6-mm1/drivers/timesource/hpet.c	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.12-rc6-mm1-tod/drivers/timesource/hpet.c	2005-06-17 19:15:53.000000000 -0700
@@ -0,0 +1,59 @@
+#include <linux/timesource.h>
+#include <linux/hpet.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <asm/io.h>
+#include <asm/hpet.h>
+
+#define HPET_MASK (0xFFFFFFFF)
+#define HPET_SHIFT 22
+
+/* FSEC = 10^-15 NSEC = 10^-9 */
+#define FSEC_PER_NSEC 1000000
+
+struct timesource_t timesource_hpet = {
+	.name = "hpet",
+	.priority = 250,
+	.type = TIMESOURCE_MMIO_32,
+	.mmio_ptr = NULL,
+	.mask = (cycle_t)HPET_MASK,
+	.mult = 0, /* set below */
+	.shift = HPET_SHIFT,
+};
+
+static int __init init_hpet_timesource(void)
+{
+	unsigned long hpet_period;
+	void __iomem* hpet_base;
+	u64 tmp;
+
+	if (!hpet_address)
+		return -ENODEV;
+
+	/* calculate the hpet address */
+	hpet_base =
+		(void __iomem*)ioremap_nocache(hpet_address, HPET_MMAP_SIZE);
+	timesource_hpet.mmio_ptr = hpet_base + HPET_COUNTER;
+
+	/* calculate the frequency */
+	hpet_period = readl(hpet_base + HPET_PERIOD);
+
+
+	/* hpet period is in femto seconds per cycle
+	 * so we need to convert this to ns/cyc units
+	 * approximated by mult/2^shift
+	 *
+	 *  fsec/cyc * 1nsec/1000000fsec = nsec/cyc = mult/2^shift
+	 *  fsec/cyc * 1ns/1000000fsec * 2^shift = mult
+	 *  fsec/cyc * 2^shift * 1nsec/1000000fsec = mult
+	 *  (fsec/cyc << shift)/1000000 = mult
+	 *  (hpet_period << shift)/FSEC_PER_NSEC = mult
+	 */
+	tmp = (u64)hpet_period << HPET_SHIFT;
+	do_div(tmp, FSEC_PER_NSEC);
+	timesource_hpet.mult = (u32)tmp;
+
+	register_timesource(&timesource_hpet);
+	return 0;
+}
+module_init(init_hpet_timesource);
diff -ruN linux-2.6.12-rc6-mm1/drivers/timesource/i386_pit.c linux-2.6.12-rc6-mm1-tod/drivers/timesource/i386_pit.c
--- linux-2.6.12-rc6-mm1/drivers/timesource/i386_pit.c	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.12-rc6-mm1-tod/drivers/timesource/i386_pit.c	2005-06-17 19:15:53.000000000 -0700
@@ -0,0 +1,64 @@
+#include <linux/timesource.h>
+#include <linux/jiffies.h>
+#include <linux/init.h>
+/* pit timesource does not build on x86-64 */
+#ifndef CONFIG_X86_64
+#include <asm/io.h>
+#include "io_ports.h"
+
+extern spinlock_t i8253_lock;
+
+/* Since the PIT overflows every tick, it's not very useful
+ * to read by itself. So use jiffies to emulate a free
+ * running counter.
+ */
+
+static cycle_t pit_read(void)
+{
+	unsigned long flags, seq;
+	int count;
+	u64 jifs;
+
+	do {
+		seq = read_seqbegin(&xtime_lock);
+
+		spin_lock_irqsave(&i8253_lock, flags);
+
+		outb_p(0x00, PIT_MODE);	/* latch the count ASAP */
+		count = inb_p(PIT_CH0);	/* read the latched count */
+		count |= inb_p(PIT_CH0) << 8;
+
+		/* VIA686a test code... reset the latch if count > max + 1 */
+		if (count > LATCH) {
+			outb_p(0x34, PIT_MODE);
+			outb_p(LATCH & 0xff, PIT_CH0);
+			outb(LATCH >> 8, PIT_CH0);
+			count = LATCH - 1;
+		}
+		spin_unlock_irqrestore(&i8253_lock, flags);
+		jifs = get_jiffies_64() - INITIAL_JIFFIES;
+	} while (read_seqretry(&xtime_lock, seq));
+
+	count = (LATCH-1) - count;
+
+	return (cycle_t)(jifs * LATCH) + count;
+}
+
+static struct timesource_t timesource_pit = {
+	.name = "pit",
+	.priority = 110,
+	.type = TIMESOURCE_FUNCTION,
+	.read_fnct = pit_read,
+	.mask = (cycle_t)-1,
+	.mult = 0,
+	.shift = 20,
+};
+
+static int __init init_pit_timesource(void)
+{
+	timesource_pit.mult = timesource_hz2mult(CLOCK_TICK_RATE, 20);
+	register_timesource(&timesource_pit);
+	return 0;
+}
+module_init(init_pit_timesource);
+#endif
diff -ruN linux-2.6.12-rc6-mm1/drivers/timesource/Makefile linux-2.6.12-rc6-mm1-tod/drivers/timesource/Makefile
--- linux-2.6.12-rc6-mm1/drivers/timesource/Makefile	2005-06-17 18:27:48.000000000 -0700
+++ linux-2.6.12-rc6-mm1-tod/drivers/timesource/Makefile	2005-06-17 19:15:53.000000000 -0700
@@ -1 +1,8 @@
 obj-y += jiffies.o
+
+obj-$(CONFIG_X86) += tsc.o
+obj-$(CONFIG_X86) += i386_pit.o
+obj-$(CONFIG_X86) += tsc-interp.o
+obj-$(CONFIG_X86_CYCLONE_TIMER) += cyclone.o
+obj-$(CONFIG_X86_PM_TIMER) += acpi_pm.o
+obj-$(CONFIG_HPET_TIMER) += hpet.o
diff -ruN linux-2.6.12-rc6-mm1/drivers/timesource/tsc.c linux-2.6.12-rc6-mm1-tod/drivers/timesource/tsc.c
--- linux-2.6.12-rc6-mm1/drivers/timesource/tsc.c	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.12-rc6-mm1-tod/drivers/timesource/tsc.c	2005-06-17 19:15:53.000000000 -0700
@@ -0,0 +1,83 @@
+/* TODO:
+ *		o better calibration
+ */
+
+#include <linux/timesource.h>
+#include <linux/timex.h>
+#include <linux/init.h>
+
+static unsigned long current_tsc_khz = 0;
+
+static cycle_t read_safe_tsc(void);
+static void tsc_update_callback(void);
+
+static struct timesource_t timesource_safe_tsc = {
+	.name = "c3tsc",
+	.priority = 300,
+	.type = TIMESOURCE_FUNCTION,
+	.read_fnct = read_safe_tsc,
+	.mask = (cycle_t)-1,
+	.mult = 0, /* to be set */
+	.shift = 22,
+	.update_callback = tsc_update_callback,
+};
+
+static struct timesource_t timesource_raw_tsc = {
+	.name = "tsc",
+	.priority = 300,
+	.type = TIMESOURCE_CYCLES,
+	.mask = (cycle_t)-1,
+	.mult = 0, /* to be set */
+	.shift = 22,
+	.update_callback = tsc_update_callback,
+};
+
+static struct timesource_t timesource_tsc;
+
+static cycle_t read_safe_tsc(void)
+{
+	cycle_t ret;
+	rdtscll(ret);
+	return ret + tsc_read_c3_time();
+}
+
+static void tsc_update_callback(void)
+{
+	/* check to see if we should switch to the safe timesource */
+	if (tsc_read_c3_time() &&
+		strncmp(timesource_tsc.name, "c3tsc", 5)) {
+		printk("Falling back to C3 safe TSC\n");
+		timesource_safe_tsc.mult = timesource_tsc.mult;
+		timesource_safe_tsc.priority = timesource_tsc.priority;
+		timesource_tsc = timesource_safe_tsc;
+	}
+
+	if (check_tsc_unstable()) {
+		timesource_tsc.priority = 50;
+		reselect_timesource();
+	}
+	/* only update if tsc_khz has changed */
+	if (current_tsc_khz != tsc_khz){
+		current_tsc_khz = tsc_khz;
+		timesource_tsc.mult = timesource_khz2mult(current_tsc_khz,
+							timesource_tsc.shift);
+	}
+}
+
+static int __init init_tsc_timesource(void)
+{
+
+	timesource_tsc = timesource_raw_tsc;
+
+	/* TSC initialization is done in arch/i386/kernel/tsc.c */
+	if (cpu_has_tsc && tsc_khz) {
+		current_tsc_khz = tsc_khz;
+		timesource_tsc.mult = timesource_khz2mult(current_tsc_khz,
+							timesource_tsc.shift);
+		register_timesource(&timesource_tsc);
+	}
+	return 0;
+}
+
+module_init(init_tsc_timesource);
+
diff -ruN linux-2.6.12-rc6-mm1/drivers/timesource/tsc-interp.c linux-2.6.12-rc6-mm1-tod/drivers/timesource/tsc-interp.c
--- linux-2.6.12-rc6-mm1/drivers/timesource/tsc-interp.c	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.12-rc6-mm1-tod/drivers/timesource/tsc-interp.c	2005-06-17 19:15:53.000000000 -0700
@@ -0,0 +1,112 @@
+/* TSC-Jiffies Interpolation timesource
+	Example interpolation timesource.
+TODO:
+	o per-cpu TSC offsets
+*/
+#include <linux/timesource.h>
+#include <linux/timer.h>
+#include <linux/timex.h>
+#include <linux/init.h>
+#include <linux/jiffies.h>
+#include <linux/threads.h>
+#include <linux/smp.h>
+
+static unsigned long current_tsc_khz = 0;
+
+static seqlock_t tsc_interp_lock = SEQLOCK_UNLOCKED;
+static cycle_t tsc_then;
+static cycle_t jiffies_then;
+struct timer_list tsc_interp_timer;
+
+static unsigned long mult, shift;
+
+#define NSEC_PER_JIFFY ((((unsigned long long)NSEC_PER_SEC)<<8)/ACTHZ)
+#define SHIFT_VAL 22
+
+static cycle_t read_tsc_interp(void);
+static void tsc_interp_update_callback(void);
+
+static struct timesource_t timesource_tsc_interp = {
+	.name = "tsc-interp",
+	.priority = 150,
+	.type = TIMESOURCE_FUNCTION,
+	.read_fnct = read_tsc_interp,
+	.mask = (cycle_t)-1,
+	.mult = 1<<SHIFT_VAL,
+	.shift = SHIFT_VAL,
+	.update_callback = tsc_interp_update_callback,
+};
+
+static void tsc_interp_sync(unsigned long unused)
+{
+	cycle_t tsc_now;
+	u64 jiffies_now;
+
+	do {
+		jiffies_now = get_jiffies_64() - INITIAL_JIFFIES;
+		rdtscll(tsc_now);
+	} while (jiffies_now != (get_jiffies_64() - INITIAL_JIFFIES));
+
+	write_seqlock(&tsc_interp_lock);
+	jiffies_then = jiffies_now;
+	tsc_then = tsc_now;
+	write_sequnlock(&tsc_interp_lock);
+
+	mod_timer(&tsc_interp_timer, jiffies+1);
+}
+
+
+static cycle_t read_tsc_interp(void)
+{
+	cycle_t ret;
+	cycle_t now, then;
+	u64 jiffs_now, jiffs_then;
+	unsigned long seq;
+
+	do {
+		seq = read_seqbegin(&tsc_interp_lock);
+
+		jiffs_now = get_jiffies_64() - INITIAL_JIFFIES;
+		jiffs_then = jiffies_then;
+		then = tsc_then;
+
+	} while (read_seqretry(&tsc_interp_lock, seq));
+
+	rdtscll(now);
+	ret = jiffs_then * NSEC_PER_JIFFY;
+	if (jiffs_then == jiffs_now)
+		ret += min((cycle_t)NSEC_PER_JIFFY,(cycle_t)((now - then)*mult)>> shift);
+	else
+		ret += (jiffs_now - jiffs_then)*NSEC_PER_JIFFY;
+
+	return ret;
+}
+
+static void tsc_interp_update_callback(void)
+{
+	/* only update if tsc_khz has changed */
+	if (current_tsc_khz != tsc_khz){
+		current_tsc_khz = tsc_khz;
+		mult = timesource_khz2mult(current_tsc_khz, shift);
+	}
+}
+
+
+static int __init init_tsc_interp_timesource(void)
+{
+	/* TSC initialization is done in arch/i386/kernel/tsc.c */
+	if (cpu_has_tsc && tsc_khz) {
+		current_tsc_khz = tsc_khz;
+		shift = SHIFT_VAL;
+		mult = timesource_khz2mult(current_tsc_khz, shift);
+		/* setup periodic soft-timer */
+		init_timer(&tsc_interp_timer);
+		tsc_interp_timer.function = tsc_interp_sync;
+		tsc_interp_timer.expires = jiffies;
+		add_timer(&tsc_interp_timer);
+
+		register_timesource(&timesource_tsc_interp);
+	}
+	return 0;
+}
+module_init(init_tsc_interp_timesource);

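As a standalone illustration of the pit_read() trick above (stacking the inverted PIT down-count on top of the 64-bit jiffies count to synthesize a free-running counter), here is the arithmetic in isolation; the LATCH value below is illustrative (a 1193182 Hz PIT with HZ=1000), not taken from this patch:

```c
#include <stdint.h>

/* Illustrative reload value: roughly CLOCK_TICK_RATE / HZ for a
 * 1193182 Hz PIT with HZ=1000; the kernel computes its LATCH similarly. */
#define LATCH 1193

/* The PIT counts down from LATCH-1 within each tick, while jiffies
 * count whole ticks.  Inverting the down-count and adding it to
 * jiffies * LATCH yields a monotonically increasing cycle value,
 * which is what pit_read() returns. */
static uint64_t pit_cycles(uint64_t jifs, int raw_count)
{
	int count = (LATCH - 1) - raw_count;	/* invert to an up-count */

	return jifs * LATCH + (uint64_t)count;
}
```

At the start of a fresh tick (raw_count == LATCH-1) the value is exactly jiffies * LATCH, so successive ticks advance the counter by LATCH cycles each.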


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-18  2:56 [PATCH 1/6] new timeofday core subsystem for -mm (v.B3) john stultz
  2005-06-18  2:58 ` [PATCH 2/6] new timeofday i386 arch specific changes, part 1 " john stultz
@ 2005-06-18 12:02 ` Roman Zippel
  2005-06-20  7:01   ` Ulrich Windl
  2005-06-20 17:09   ` john stultz
  1 sibling, 2 replies; 32+ messages in thread
From: Roman Zippel @ 2005-06-18 12:02 UTC (permalink / raw)
  To: john stultz; +Cc: lkml, Andrew Morton, George Anzinger, Ulrich Windl

Hi,

On Fri, 17 Jun 2005, john stultz wrote:

> o Uses nanoseconds as the kernel's base time unit

Maybe I missed it, but was there ever a conclusive discussion about the 
performance impact this has?
I see lots of new u64 variables. I'm especially interested in how this code 
scales down to small and slow machines, where such precision is absolute 
overkill. How do these patches change current and possibly common time 
operations?

bye, Roman

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-18 12:02 ` [PATCH 1/6] new timeofday core subsystem " Roman Zippel
@ 2005-06-20  7:01   ` Ulrich Windl
  2005-06-20 10:22     ` Roman Zippel
  2005-06-20 17:09   ` john stultz
  1 sibling, 1 reply; 32+ messages in thread
From: Ulrich Windl @ 2005-06-20  7:01 UTC (permalink / raw)
  To: Roman Zippel; +Cc: lkml, Andrew Morton, George Anzinger, Ulrich Windl

On 18 Jun 2005 at 14:02, Roman Zippel wrote:

> Hi,
> 
> On Fri, 17 Jun 2005, john stultz wrote:
> 
> > o Uses nanoseconds as the kernel's base time unit
> 
> Maybe I missed it, but was there ever a conclusive discussion about the 
> performance impact this has?
> I see lots of new u64 variables. I'm especially interested in how this code 
> scales down to small and slow machines, where such precision is absolute 
> overkill. How do these patches change current and possibly common time 
> operations?

Hi all!

I had the impression that for slow and small machines every recent Linux 
distribution is overkill. Whenever I complained, everyone replied "Hard disks are 
cheap, memory is cheap, get a new CPU". I do understand your doubts, however. From 
my personal experience with my PPSkit patches, I found that my old 386SX @ 16MHz 
failed to receive all serial characters when I timestamped each of them using my 
new clock routines. However, a 486 @ 33MHz would do (it had better serial UART 
chips, too). I would not think my code was terribly efficient, because I tried to 
make it "right" first.

And even the 386 had limited support for 64-bit operations. Since the 486, a 
32-bit add is specified as using 1 CPU cycle (most likely in the optimal case), so 
doing one more would not harm that much.

Basically, either the new clock system has to be optional (a maintenance nightmare 
most likely), or you'll have to require a specific amount of performance for the 
latest software. If you cannot fulfill the requirements, you'll have to stick with 
an older release of the software.

Maybe let's try to make it as correct, efficient, and understandable as we can.

Regards,
Ulrich


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-20  7:01   ` Ulrich Windl
@ 2005-06-20 10:22     ` Roman Zippel
  2005-06-20 10:31       ` Ulrich Windl
  0 siblings, 1 reply; 32+ messages in thread
From: Roman Zippel @ 2005-06-20 10:22 UTC (permalink / raw)
  To: Ulrich Windl; +Cc: lkml, Andrew Morton, George Anzinger

Hi,

On Mon, 20 Jun 2005, Ulrich Windl wrote:

> Basically, either the new clock system has to be optional (a maintenance nightmare 
> most likely), or you'll have to require a specific amount of performance for the 
> latest software. If you cannot fulfill the requirements, you'll have to stick with 
> an older release of the software.
> 
> Maybe let's try to make it as good (correct and efficient (and understandable) as 
> good as we can.

If nobody can explain to me the performance impact of the patch, maybe the 
patch isn't so understandable in the first place?
I could also have asked how that code scales up, e.g. how much more work 
has to be done for a thousand Linux images. (AFAICR this question also 
came up in the context of tickless systems).

This patch is really damned hard to read as it changes too many things at 
once. Maybe it does some necessary cleanups, but they are hard to see, as 
they pretty much get lost in all the functional changes.
I'm pretty close to suggesting that this patch be rejected until it clearly 
separates new functionality from cleanups. If the current system is broken, 
fix it first; if the current system is a mess, clean it up first, but 
don't mix these two steps, unless you want to introduce more broken mess.

bye, Roman

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-20 10:22     ` Roman Zippel
@ 2005-06-20 10:31       ` Ulrich Windl
  2005-06-20 10:54         ` Roman Zippel
  2005-06-20 11:04         ` Christoph Hellwig
  0 siblings, 2 replies; 32+ messages in thread
From: Ulrich Windl @ 2005-06-20 10:31 UTC (permalink / raw)
  To: Roman Zippel; +Cc: lkml, Andrew Morton, George Anzinger

On 20 Jun 2005 at 12:22, Roman Zippel wrote:

[...]
> This patch is really damned hard to read as it changes too many things at 
> once. Maybe it does some necessary cleanups, but they are hard see, as 
> they pretty much get lost in all the functional changes.
> I'm pretty close to suggest to reject this patch until it clearly 
> separates new functionality from cleanups. If the current system is broken 

Roman,

it seems you don't like the patch for some personal reasons, and now you are 
trying to find arguments against it. The best method to get at the performance 
implications is trying it (the patched kernel).

> fix it first, if the current system is a mess clean it up first, but 
> don't mix these two steps, unless you want to introduce more broken mess.

If you introduce something new (a higher resolution clock), you should start with 
something clean. Just hacking it in on the first attempt, and then making it 
beautiful in a second attempt, is a waste of time IMHO.

Regards,
Ulrich


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-20 10:31       ` Ulrich Windl
@ 2005-06-20 10:54         ` Roman Zippel
  2005-06-20 11:04         ` Christoph Hellwig
  1 sibling, 0 replies; 32+ messages in thread
From: Roman Zippel @ 2005-06-20 10:54 UTC (permalink / raw)
  To: Ulrich Windl; +Cc: lkml, Andrew Morton, George Anzinger

Hi,

On Mon, 20 Jun 2005, Ulrich Windl wrote:

> it seems you don't like the patch for some personal reasons, and now you are 
> trying to find arguments against it.

Please stop trying such mind reading tricks.

> > fix it first, if the current system is a mess clean it up first, but 
> > don't mix these two steps, unless you want to introduce more broken mess.
> 
> If you introduce something new (higher resolution clock), you should start with 
> something clean. Just hacking it in the first attempt, anf then making it 
> beautiful in a second attempt is a waste of time IMHO.

Well, I'm not really convinced of the quality of the patch, if nobody can 
explain it to me and I don't think my questions were that unreasonable.
We are talking about a core subsystem here, which IMO justifies a little 
more care. Maybe you should follow the fs development a bit for an example of 
how to do even large changes in relatively simple steps.

bye, Roman

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-20 10:31       ` Ulrich Windl
  2005-06-20 10:54         ` Roman Zippel
@ 2005-06-20 11:04         ` Christoph Hellwig
  1 sibling, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2005-06-20 11:04 UTC (permalink / raw)
  To: Ulrich Windl; +Cc: Roman Zippel, lkml, Andrew Morton, George Anzinger

On Mon, Jun 20, 2005 at 12:31:48PM +0200, Ulrich Windl wrote:
> it seems you don't like the patch for some personal reasons, and now you are 
> trying to find arguments against it. The best method to get at the performance 
> implications is trying it (the patched kernel).

Roman is just asking for explanations.  The patch's approach might be really
great, but if no one understands it, that doesn't help.  And it's a very sensitive
part of the kernel, so it needs to be understood very well.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-18 12:02 ` [PATCH 1/6] new timeofday core subsystem " Roman Zippel
  2005-06-20  7:01   ` Ulrich Windl
@ 2005-06-20 17:09   ` john stultz
  2005-06-20 18:10     ` Lee Revell
  2005-06-20 22:05     ` Roman Zippel
  1 sibling, 2 replies; 32+ messages in thread
From: john stultz @ 2005-06-20 17:09 UTC (permalink / raw)
  To: Roman Zippel; +Cc: lkml, Andrew Morton, George Anzinger, Ulrich Windl

On Sat, 2005-06-18 at 14:02 +0200, Roman Zippel wrote:
> On Fri, 17 Jun 2005, john stultz wrote:
> 
> > o Uses nanoseconds as the kernel's base time unit
> 
> Maybe I missed it, but was there ever a conclusive discussion about the 
> performance impact this has?
> I see lots of new u64 variables. I'm especially interested in how this code 
> scales down to small and slow machines, where such precision is absolute 
> overkill. How do these patches change current and possibly common time 
> operations?


Hey Roman, 
	That's a good issue to bring up. With regards to the timeofday
infrastructure, there are two performance concerns (though let me know
if I'm forgetting something):
	1. timer interrupt processing overhead
	2. gettimeofday() syscall performance

On smaller systems, timer interrupt processing is a concern. With the
shift to HZ=1000, we got a number of complaints from folks w/ old 486s
where time would drift due to lost ticks. This would happen when something
(usually IDE in PIO mode) would disable interrupts and they would miss a
ton of timer interrupts. Also, the impact of running the timekeeping code
10x more frequently was seen in a number of cases.

With the new infrastructure, timekeeping is all done via a soft-timer
outside of interrupt context. In fact, the timekeeping soft-timer is
setup to run every 50ms instead of every ms. This should help overall
performance on slower systems using high HZ values.

As for gettimeofday() syscall performance, I once had some numbers, but I
would need to re-create them. I'll see if I can grab a slower box and
give you some hard numbers. The gettimeofday() path is fairly
streamlined and should be pretty straightforward in the patch (see
kernel/timeofday.c), so let me know if you have specific concerns. 

There will probably be a bit of a drop, but I have some ideas for
caching a precomputed timeval in the timekeeping soft-timer if it's a
serious issue.
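The caching idea above is not implemented in the posted patch; purely as a sketch of what it might look like (all names hypothetical, with the update/read protocol modeled on a seqlock):

```c
#include <stdint.h>

/* Hypothetical sketch of caching a precomputed time value in the
 * timekeeping soft-timer; not code from the patch.  An even sequence
 * number means the snapshot is consistent, odd means mid-update. */
struct cached_time {
	uint32_t seq;
	int64_t  sec;
	int32_t  usec;
};

static struct cached_time cache;

/* Called from the periodic timekeeping soft-timer. */
static void update_cache(int64_t sec, int32_t usec)
{
	cache.seq++;		/* odd: update in flight (barriers in real code) */
	cache.sec = sec;
	cache.usec = usec;
	cache.seq++;		/* even: snapshot consistent again */
}

/* Cheap reader: no hardware clock access, just retry across updates. */
static void read_cache(int64_t *sec, int32_t *usec)
{
	uint32_t s;

	do {
		s = cache.seq;
		*sec = cache.sec;
		*usec = cache.usec;
	} while ((s & 1) || s != cache.seq);
}
```

A gettimeofday() built on such a cache would return tick-granularity time at roughly the cost of a few loads, trading resolution for speed.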

thanks
-john


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-20 17:09   ` john stultz
@ 2005-06-20 18:10     ` Lee Revell
  2005-06-20 21:53       ` john stultz
  2005-06-21  6:26       ` Ulrich Windl
  2005-06-20 22:05     ` Roman Zippel
  1 sibling, 2 replies; 32+ messages in thread
From: Lee Revell @ 2005-06-20 18:10 UTC (permalink / raw)
  To: john stultz
  Cc: Roman Zippel, lkml, Andrew Morton, George Anzinger, Ulrich Windl

On Mon, 2005-06-20 at 10:09 -0700, john stultz wrote:
> As for gettimeofday() syscall performance, I once had some numbers, but
> I
> would need to re-create them. I'll see if I can grab a slower box and
> give you some hard numbers. 

I ran some tests lately that showed gettimeofday() to be 50x slower than
rdtsc() on my 600MHz machine.  Many userspace apps that need a cheap
high res timer have to use rdtsc now due to the excessive overhead of
gettimeofday().  It would be more correct if these apps could use
gettimeofday() for various reasons (cpufreq and SMP issues).

So this patch is addressing a real problem.  I'd be interested to see if
the performance is good enough to replace rdtsc in these cases.

Lee


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-20 18:10     ` Lee Revell
@ 2005-06-20 21:53       ` john stultz
  2005-06-20 23:44         ` Lee Revell
  2005-06-21  6:26       ` Ulrich Windl
  1 sibling, 1 reply; 32+ messages in thread
From: john stultz @ 2005-06-20 21:53 UTC (permalink / raw)
  To: Lee Revell
  Cc: Roman Zippel, lkml, Andrew Morton, George Anzinger, Ulrich Windl

On Mon, 2005-06-20 at 14:10 -0400, Lee Revell wrote:
> On Mon, 2005-06-20 at 10:09 -0700, john stultz wrote:
> > As for gettimeofday() syscall performance, I once had some numbers, but
> > I
> > would need to re-create them. I'll see if I can grab a slower box and
> > give you some hard numbers. 
> 
> I ran some tests lately that showed gettimeofday() to be 50x slower than
> rdtsc() on my 600Mhz machine.  Many userspace apps that need a cheap
> high res timer have to use rdtsc now due to the excessive overhead of
> gettimeofday().  It would be more correct if these apps could use
> gettimeofday() for various reasons (cpufreq and SMP issues).

Yea, I would strongly dissuade anyone from using the rdtsc counter for
anything but statistical analysis of code performance. 


> So this patch is addressing a real problem.  I'd be interested to see if
> the performance is good enough to replace rdtsc in these cases.

Yea, honestly I doubt gettimeofday performance will ever be as good as
rdtsc. I mean, that's a single instruction vs. syscall overhead +
hardware clock reading + frequency conversion + NTP adjustment. It's just
not a fair comparison. 

On the other hand, I bet reading a random 64 bits out of memory is also
a bit faster than gettimeofday() as well ;)

I don't mean to promise the world. The point of the patch is not to
improve gettimeofday performance, it is to improve the subsystem so it
is correct and manageable, so that we have the flexibility to make
future improvements (such as High-res Timers, Dynamic Ticks/Variable
system timer, and virtualization needs) without impacting performance.

The best payout for gettimeofday performance will probably be in
vsyscall implementations such as what x86-64 already has. My new
infrastructure also supports this (it had to for x86-64), and I've even
got a proof of concept patch for i386 (see the lkml archives for
details).

thanks
-john


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-20 17:09   ` john stultz
  2005-06-20 18:10     ` Lee Revell
@ 2005-06-20 22:05     ` Roman Zippel
  2005-06-20 23:40       ` Lee Revell
                         ` (2 more replies)
  1 sibling, 3 replies; 32+ messages in thread
From: Roman Zippel @ 2005-06-20 22:05 UTC (permalink / raw)
  To: john stultz; +Cc: lkml, Andrew Morton, George Anzinger, Ulrich Windl

Hi,

On Mon, 20 Jun 2005, john stultz wrote:

> > I see lots of new u64 variables. I'm especially interested in how this code 
> > scales down to small and slow machines, where such precision is absolute 
> > overkill. How do these patches change current and possibly common time 
> > operations?
> 
> 
> Hey Roman, 
> 	That's a good issue to bring up. With regards to the timeofday
> infrastructure, there are two performance concerns (though let me know
> if I'm forgetting something):

You don't really answer the core question: why do you change everything to 
nanoseconds and 64bit values?

> On smaller systems, timer interrupt processing is a concern: with the
> shift to HZ=1000, we got a number of complaints from folks w/ old 486s
> where time would drift due to lost ticks. This would happen when something
> (usually IDE in PIO mode) would disable interrupts and they would miss a
> ton of timer interrupts. Also the impact of running the timekeeping code
> 10x more frequently was seen in a number of cases.
> 
> With the new infrastructure, timekeeping is all done via a soft-timer
> outside of interrupt context. In fact, the timekeeping soft-timer is
> setup to run every 50ms instead of every ms. This should help overall
> performance on slower systems using high HZ values.

With -mm you can now choose the HZ value, so that's not really the 
problem anymore. A lot of archs never even changed to a higher HZ value. 
So I'd still like to know: how does the complexity change compared to the 
old code?

> As for gettimeofday() syscall performance, I once had some numbers, but I
> would need to re-create them. I'll see if I can grab a slower box and
> give you some hard numbers. The gettimeofday() path is fairly
> streamlined and should be pretty straightforward in the patch (see
> kernel/timeofday.c), so let me know if you have specific concerns. 
> 
> There will probably be a bit of a drop, but I have some ideas for
> caching a precomputed timeval in the timekeeping soft-timer if it's a
> serious issue.

Well, AFAICT on slower machines (older and embedded stuff) it's a serious 
issue. The current code calculates the timeval with some simple 32bit 
calculations. Your code introduces the nsec step, which means several 
64bit calculations and suddenly the overhead explodes on some machines.

As m68k maintainer I see no reason to ever switch to your new code, which 
might leave you with the dilemma of having to maintain two versions of the 
timer code. What reason could I have to switch to the new timer code?

I'd have no problem with a little more overhead, like a _few_ more 64bit 
operations per second (and preferably add/shifts), but I'm not really 
enthusiastic about the new code. Why don't you keep the main part 32 bits 
(or long)? What's wrong with using timeval or timespec?

I like the concept of a time source in your patch. m68k already uses a 
number of timer related callbacks into machine specific code. If I could 
replace that with a timer driver, I'd be really happy. OTOH if it requires 
several expensive conversions between different time formats, I'd rather keep 
the current code.

bye, Roman

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-20 22:05     ` Roman Zippel
@ 2005-06-20 23:40       ` Lee Revell
  2005-06-20 23:55       ` john stultz
  2005-06-21  6:42       ` Ulrich Windl
  2 siblings, 0 replies; 32+ messages in thread
From: Lee Revell @ 2005-06-20 23:40 UTC (permalink / raw)
  To: Roman Zippel
  Cc: john stultz, lkml, Andrew Morton, George Anzinger, Ulrich Windl

On Tue, 2005-06-21 at 00:05 +0200, Roman Zippel wrote:
> With -mm you can now choose the HZ value, so that's not really the 
> problem anymore. A lot of archs even never changed to a higher HZ
> value. 

That does not solve anything: going back to HZ=100 is a big user-visible
regression because the resolution of sleep() (and poll, etc) is now 10ms
rather than 1ms.  Many apps have RT constraints between 1ms and 10ms.

Lee



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-20 21:53       ` john stultz
@ 2005-06-20 23:44         ` Lee Revell
  2005-06-21 14:55           ` Chris Friesen
  0 siblings, 1 reply; 32+ messages in thread
From: Lee Revell @ 2005-06-20 23:44 UTC (permalink / raw)
  To: john stultz
  Cc: Roman Zippel, lkml, Andrew Morton, George Anzinger, Ulrich Windl

On Mon, 2005-06-20 at 14:53 -0700, john stultz wrote:
> Yea, honestly I doubt gettimeofday performance will ever be as good as
> rdtsc. I mean, that's a single instruction vs syscall overhead +
> hardware clock reading + frequency conversion + ntp adjustment. It's
> just not a fair comparison.  

Of course not, the patch would have to be magic for that to happen.  

But some user space apps are now *required* to use rdtsc for timing due
to the massive performance difference.  If we only took a 5x or 10x
performance hit vs rdtsc, rather than the current 50x, it might be
enough that user space apps won't have to do this.

Lee


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-20 22:05     ` Roman Zippel
  2005-06-20 23:40       ` Lee Revell
@ 2005-06-20 23:55       ` john stultz
  2005-06-21 15:08         ` Roman Zippel
  2005-06-21  6:42       ` Ulrich Windl
  2 siblings, 1 reply; 32+ messages in thread
From: john stultz @ 2005-06-20 23:55 UTC (permalink / raw)
  To: Roman Zippel; +Cc: lkml, Andrew Morton, George Anzinger, Ulrich Windl

On Tue, 2005-06-21 at 00:05 +0200, Roman Zippel wrote:
> Hi,
> 
> On Mon, 20 Jun 2005, john stultz wrote:
> 
> > > I see lots of new u64 variables. I'm especially interested in how this code 
> > > scales down to small and slow machines, where such precision is absolute 
> > > overkill. How do these patches change current and possibly common time 
> > > operations?
> > 
> > 
> > Hey Roman, 
> > 	That's a good issue to bring up. With regards to the timeofday
> > infrastructure, there are two performance concerns (though let me know
> > if I'm forgetting something):
> 
> You don't really answer the core question, why do you change everything to 
> nanoseconds and 64bit values?


Well, for a reasonable range of timesource frequencies and interval
lengths, the cycle values need to be 64 bits wide, and the mult/shift
operation to convert cycles to time needs to be done with 64 bits.

Since xtime is currently a timespec, we already keep nanosecond
precision using 64 bits. It was then just a question of how complicated
it is to do the manipulations with timespecs vs a flat 64 bits' worth of
nanoseconds. Nanoseconds made the design cleaner, so I went with them,
keeping the option to optimize in the future if real issues arise. 


> > On smaller systems, timer interrupt processing is a concern: with the
> > shift to HZ=1000, we got a number of complaints from folks w/ old 486s
> > where time would drift due to lost ticks. This would happen when something
> > (usually IDE in PIO mode) would disable interrupts and they would miss a
> > ton of timer interrupts. Also the impact of running the timekeeping code
> > 10x more frequently was seen in a number of cases.
> > 
> > With the new infrastructure, timekeeping is all done via a soft-timer
> > outside of interrupt context. In fact, the timekeeping soft-timer is
> > setup to run every 50ms instead of every ms. This should help overall
> > performance on slower systems using high HZ values.
> 
> With -mm you can now choose the HZ value, so that's not really the 
> problem anymore. A lot of archs even never changed to a higher HZ value. 
> So now I still like to know how does the complexity change compared to the 
> old code?

Well, even if it is less of a concern with lower HZ values, my code
should still reduce the interrupt overhead for those who would like to
have higher HZ values to improve latency. (Although until I have
numbers, it's all just talk. :)


> > As for gettimeofday() syscall performance, I once had some numbers, but I
> > would need to re-create them. I'll see if I can grab a slower box and
> > give you some hard numbers. The gettimeofday() path is fairly
> > streamlined and should be pretty straight forward in the patch (see
> > kernel/timeofday.c), so let me know if you have specific concerns. 
> > 
> > There will probably be a bit of a drop, but I have some ideas for
> > caching a precomputed timeval in the timekeeping soft-timer if it's a
> > serious issue.
> 
> Well, AFAICT on slower machines (older and embedded stuff) it's a serious 
> issue. The current code calculates the timeval with some simple 32bit 
> calculations. Your code introduces the nsec step, which means several 
> 64bit calculations and suddenly the overhead explodes on some machines.
> 
> As m68k maintainer I see no reason to ever switch to your new code, which 
> might leave you with the dilemma of having to maintain two versions of the 
> timer code. What reason could I have to switch to the new timer code?
> 
> I'd have no problem with a little more overhead, like a _few_ more 64bit 
> operations per second (and preferably add/shifts), but I'm not really 
> enthusiastic about the new code. Why don't you keep the main part 32 bits 
> (or long)? What's wrong with using timeval or timespec?

Well, I think the only overhead to be worried about is just in
gettimeofday(), but let me get some hard numbers to show that. I've
already implemented some optimized caching for x86-64's vsyscall-gtod
implementation, so let me also try to make an arch generic version and
see if I cannot settle your concerns.

> I like the concept of a time source in your patch. m68k already uses a 
> number of timer related callbacks into machine specific code. If I could 
> replace that with a timer driver, I'd be really happy. OTOH if it requires 
> several expensive conversions between different time formats, I'd rather keep 
> the current code.

Thanks, I really appreciate your review and feedback. I very much want
this to be a solution everyone can be happy (or at least indifferent)
with.

thanks
-john


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-20 18:10     ` Lee Revell
  2005-06-20 21:53       ` john stultz
@ 2005-06-21  6:26       ` Ulrich Windl
  1 sibling, 0 replies; 32+ messages in thread
From: Ulrich Windl @ 2005-06-21  6:26 UTC (permalink / raw)
  To: Lee Revell
  Cc: Roman Zippel, lkml, Andrew Morton, George Anzinger, Ulrich Windl

On 20 Jun 2005 at 14:10, Lee Revell wrote:

> On Mon, 2005-06-20 at 10:09 -0700, john stultz wrote:
> > As for gettimeofday() syscall performance, I once had some numbers, but
> > I
> > would need to re-create them. I'll see if I can grab a slower box and
> > give you some hard numbers. 
> 
> I ran some tests lately that showed gettimeofday() to be 50x slower than
> rdtsc() on my 600Mhz machine.  Many userspace apps that need a cheap

Hello!

Isn't that a kind of absurd comparison? rdtsc is one or two instructions, while 
gettimeofday is a complete syscall with many instructions; besides that, rdtsc 
will never give you the time of day. Are you voting for replacing 
gettimeofday with rdtsc?

> high res timer have to use rdtsc now due to the excessive overhead of
> gettimeofday().  It would be more correct if these apps could use

You can either have it accurate, or you can have it fast. Also, gettimeofday 
works on any UNIX platform; rdtsc does not.

> gettimeofday() for various reasons (cpufreq and SMP issues).

Good point, but you don't get that higher reliability and accuracy for free.

> 
> So this patch is addressing a real problem.  I'd be interested to see if
> the performance is good enough to replace rdtsc in these cases.

In which applications (except maybe Java) do you really need to call 
gettimeofday more than a few thousand times per second? Most likely you are 
working around a different issue, then.

Regards,
Ulrich


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-20 22:05     ` Roman Zippel
  2005-06-20 23:40       ` Lee Revell
  2005-06-20 23:55       ` john stultz
@ 2005-06-21  6:42       ` Ulrich Windl
  2005-06-21 15:13         ` Roman Zippel
  2 siblings, 1 reply; 32+ messages in thread
From: Ulrich Windl @ 2005-06-21  6:42 UTC (permalink / raw)
  To: Roman Zippel; +Cc: lkml, Andrew Morton, George Anzinger, Ulrich Windl

On 21 Jun 2005 at 0:05, Roman Zippel wrote:

> Hi,
> 
> On Mon, 20 Jun 2005, john stultz wrote:
> 
> > > I see lots of new u64 variables. I'm especially interested in how this code 
> > > scales down to small and slow machines, where such precision is absolute 
> > > overkill. How do these patches change current and possibly common time 
> > > operations?
> > 
> > 
> > Hey Roman, 
> > 	That's a good issue to bring up. With regards to the timeofday
> > infrastructure, there are two performance concerns (though let me know
> > if I'm forgetting something):
> 
> You don't really answer the core question, why do you change everything to 
> nanoseconds and 64bit values?

Because just multiplying the microseconds by one thousand doesn't really provide a 
nanosecond clock, maybe?

[...]
> With -mm you can now choose the HZ value, so that's not really the 
> problem anymore. A lot of archs even never changed to a higher HZ value. 

Did you ever do an analysis of how this affected clock quality? See 
comp.protocols.time.ntp for all the complaints about broken kernels (I think 
Redhat had it first, then the others followed).

> So now I still like to know how does the complexity change compared to the 
> old code?

You can have a look at the code. That's the point where you can decide about 
complexity. I haven't looked closely, but I guess it was O(1) before, and it 
is still O(1) now.

[...]
> Well, AFAICT on slower machines (older and embedded stuff) it's a serious 
> issue. The current code calculates the timeval with some simple 32bit 
> calculations. Your code introduces the nsec step, which means several 
> 64bit calculations and suddenly the overhead explodes on some machines.
> 
> As m68k maintainer I see no reason to ever switch to your new code, which 
> might leave you with the dilemma of having to maintain two versions of the 
> timer code. What reason could I have to switch to the new timer code?

I never knew the 68k had such poor performance.

> 
> I'd have no problem with a little more overhead, like a _few_ more 64bit 
> operations per second (and preferably add/shifts), but I'm not really 
> enthusiastic about the new code. Why don't you keep the main part 32 bits 
> (or long)? What's wrong with using timeval or timespec?
> 
> I like the concept of a time source in your patch. m68k already uses a 
> number of timer related callbacks into machine specific code. If I could 
> replace that with a timer driver, I'd be really happy. OTOH if it requires 
> several expensive conversions between different time formats, I'd rather keep 
> the current code.

If everybody kept the current code, we wouldn't have any problems with 
the new code (as I said before). New features unfortunately cost some power.

Regards,
Ulrich


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-20 23:44         ` Lee Revell
@ 2005-06-21 14:55           ` Chris Friesen
  2005-06-21 17:20             ` john stultz
  0 siblings, 1 reply; 32+ messages in thread
From: Chris Friesen @ 2005-06-21 14:55 UTC (permalink / raw)
  To: Lee Revell
  Cc: john stultz, Roman Zippel, lkml, Andrew Morton, George Anzinger,
	Ulrich Windl

Lee Revell wrote:

> But some user space apps are now *required* to use rdtsc for timing due
> to the massive performance difference.  If we only took a 5x or 10x
> performance hit vs rdtsc, rather than the current 50x, it might be
> enough that user space apps won't have to do this.

For my userspace apps I've actually switched to 
clock_gettime(CLOCK_MONOTONIC, &ts);

This at least guarantees that it will never go backwards.

For the experts: Is there a clock exported to userspace that is both 
monotonic and uniform?  Does CLOCK_MONOTONIC give this on linux?


Chris

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-20 23:55       ` john stultz
@ 2005-06-21 15:08         ` Roman Zippel
  2005-06-22  0:57           ` john stultz
  0 siblings, 1 reply; 32+ messages in thread
From: Roman Zippel @ 2005-06-21 15:08 UTC (permalink / raw)
  To: john stultz; +Cc: lkml, Andrew Morton, George Anzinger, Ulrich Windl

Hi,

On Mon, 20 Jun 2005, john stultz wrote:

> > You don't really answer the core question, why do you change everything to 
> > nanoseconds and 64bit values?
> 
> 
> Well, for a reasonable range of timesource frequencies and interval
> lengths, the cycle values need to be 64 bits wide and the mult/shift
> operation to convert cycles to time needs to be done with 64
> bits.
> 
> Since xtime is currently a timespec, we already keep nanosecond
> precision using 64 bits. It was then just a question of how complicated
> it is to do the manipulations with timespecs vs flat 64 bits worth of
> nanoseconds.  Nanoseconds made the design cleaner, so I went with them
> keeping the option to optimize in the future when real issues arose. 

That might be the case, but your patch makes it very hard to verify.
You don't fix the old code, you just drop in a completely new 
implementation, so you have to explain it a bit more.
What exactly is wrong with the old design, and why wasn't it possible to 
fix it incrementally? How exactly is your new design superior?

> > With -mm you can now choose the HZ value, so that's not really the 
> > problem anymore. A lot of archs even never changed to a higher HZ value. 
> > So now I still like to know how does the complexity change compared to the 
> > old code?
> 
> Well, even if it is less of a concern with lower HZ values, my code
> should still reduce the interrupt overhead for those who would like to
> have higher HZ values to improve latency. (Although until I have
> numbers, it's all just talk. :)

For machines where it actually matters, I can only see that calculations 
have gotten more complex and thus slower. You need to provide a little 
more detailed information on why this is necessary.

> Well, I think the only overhead to be worried about is just in
> gettimeofday(), but let me get some hard numbers to show that. I've
> already implemented some optimized caching for x86-64's vsyscall-gtod
> implementation, so let me also try to make an arch generic version and
> see if I cannot settle your concerns.

I don't need any practical numbers; I can already see from the code that 
it's much worse, and unless you eliminate the 64bit calculations from the 
fast path, I don't know what you are trying to optimize.

> > I like the concept of a time source in your patch. m68k already uses a 
> > number of timer related callbacks into machine specific code. If I could 
> > replace that with a timer driver, I'd be really happy. OTOH if it requires 
> > several expensive conversions between different time formats, I'd rather keep 
> > the current code.
> 
> Thanks, I really appreciate your review and feedback. I very much want
> this to be a solution everyone can be happy (or at least indifferent)
> with.

I would seriously suggest you rework the first patch and fix the existing 
code instead, or else explain why the current code is unfixable; but that 
would require actually replacing the old code. Having two 
alternative implementations is really the worst solution.

John, you are _very_ vague here, could you please go into more detail on why 
you made certain design decisions? "Simplicity" can't be the only reason; 
good performance is far more important.

bye, Roman

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-21  6:42       ` Ulrich Windl
@ 2005-06-21 15:13         ` Roman Zippel
  0 siblings, 0 replies; 32+ messages in thread
From: Roman Zippel @ 2005-06-21 15:13 UTC (permalink / raw)
  To: Ulrich Windl; +Cc: lkml, Andrew Morton, George Anzinger

Hi,

On Tue, 21 Jun 2005, Ulrich Windl wrote:

> > You don't really answer the core question, why do you change everything to 
> > nanoseconds and 64bit values?
> 
> Because just multiplying the microseconds by one thousand doesn't really provide a 
> nanosecond clock, maybe?

What are you trying to tell me?

> > With -mm you can now choose the HZ value, so that's not really the 
> > problem anymore. A lot of archs even never changed to a higher HZ value. 
> 
> Did you ever do an analysis how this affected clock quality? See 
> comp.protocols.time.ntp for all the complains about broken kernels (I think Redhat 
> had it first, then the others followed).

So how exactly does this patch fix this?

> > So now I still like to know how does the complexity change compared to the 
> > old code?
> 
> You can have a look at the code. That's the point where you can decide about 
> complexity. I haven't looked closely, but I guess it was O(1) before, and now also 
> is O(1).

You guess or you know?

> > As m68k maintainer I see no reason to ever switch to your new code, which 
> > might leave you with the dilemma of having to maintain two versions of the 
> > timer code. What reason could I have to switch to the new timer code?
> 
> I never knew the 68k has such a poor performance.

Usually it's code that either is efficient or performs badly.

bye, Roman

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-21 14:55           ` Chris Friesen
@ 2005-06-21 17:20             ` john stultz
  0 siblings, 0 replies; 32+ messages in thread
From: john stultz @ 2005-06-21 17:20 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Lee Revell, Roman Zippel, lkml, Andrew Morton, George Anzinger,
	Ulrich Windl

On Tue, 2005-06-21 at 08:55 -0600, Chris Friesen wrote:
> Lee Revell wrote:
> 
> > But some user space apps are now *required* to use rdtsc for timing due
> > to the massive performance difference.  If we only took a 5x or 10x
> > performance hit vs rdtsc, rather than the current 50x, it might be
> > enough that user space apps won't have to do this.
> 
> For my userspace apps I've actually switched to 
> clock_gettime(CLOCK_MONOTONIC, &ts);
> 
> This at least guarantees that it will never go backwards.
> 
> For the experts: Is there a clock exported to userspace that is both 
> monotonic and uniform?  Does CLOCK_MONOTONIC give this on linux?

clock_gettime(CLOCK_MONOTONIC) should be what you're looking for. Since
that still uses do_gettimeofday internally, it is still possible in some
conditions for the current code to have time inconsistencies caused by
interpolation error. This is one of the major reasons for my work.

thanks
-john


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-21 15:08         ` Roman Zippel
@ 2005-06-22  0:57           ` john stultz
  2005-06-22  2:39             ` john stultz
  2005-06-22 19:45             ` Roman Zippel
  0 siblings, 2 replies; 32+ messages in thread
From: john stultz @ 2005-06-22  0:57 UTC (permalink / raw)
  To: Roman Zippel; +Cc: lkml, Andrew Morton, George Anzinger, Ulrich Windl

On Tue, 2005-06-21 at 17:08 +0200, Roman Zippel wrote:
> Hi,
> 
> On Mon, 20 Jun 2005, john stultz wrote:
> 
> > > You don't really answer the core question, why do you change everything to 
> > > nanoseconds and 64bit values?
> > 
> > 
> > Well, for a reasonable range of timesource frequencies and interval
> > lengths, the cycle values need to be 64 bits wide and the mult/shift
> > operation to convert cycles to time needs to be done with 64
> > bits.
> > 
> > Since xtime is currently a timespec, we already keep nanosecond
> > precision using 64 bits. It was then just a question of how complicated
> > it is to do the manipulations with timespecs vs flat 64 bits worth of
> > nanoseconds.  Nanoseconds made the design cleaner, so I went with them
> > keeping the option to optimize in the future when real issues arose. 
> 
> That might be the case, but your patch makes it very hard to verify.
> You don't fix the old code, you just drop in a complete new 
> implementation, so you have to explain it a bit more.
> What's exactly wrong with old design and why wasn't possible to fix it 
> incrementally? How exactly is your new design superior?

That's a fair criticism. Early in the design phase I spent a lot more
time explaining the reasons for doing this, but more recently I've been
focused on the code issues and haven't maintained that documentation.
Since I'm doing an OLS talk on this work, I need to refresh the
documents anyway, so I'll include that with my next submission.

Briefly, the big issues are: 

Correctness: The current timekeeping code is tick based with
interpolation for high-res granularity. There are many causes of
interpolation error, which can cause either time inconsistencies or NTP
skew:
o tick/timesource calibration errors
o NTP adjustments are not done consistently, only at interrupt time
o delayed or lost ticks caused by interrupt starvation, drivers
disabling interrupts for too long, or BIOS SMIs
o fixes like lost-tick compensation break virtualized systems

Flexibility: The various tick-less system projects (Dynamic ticks,
NO_IDLE_HZ, VST) continually need to add hacks to avoid breaking the
time subsystem. George's HRT patches needed to rework a good chunk of
the timeofday subsystem as well. When HZ was changed, time was affected;
when interrupt routing changes, time is affected. It just goes on and on.
Having a timeofday subsystem that is somewhat isolated and doesn't break
when someone sneezes near it will help allow for other improvements.

Maintainability: Most arches do basically the same thing for their
timeofday code, yet time bugs are frequently fixed in only a few arches,
leaving the others prone to the same problems. Also, the collection of
global variables used to keep time and NTP adjustments is getting
unmanageable and is not well understood by everyone, resulting in bugs
when folks access them directly. There is a need for some opaqueness to
those variables. Additionally, the number of timesources is increasing,
as is the number of different architectures that share them. Some cross-
architecture method for sharing timesources is needed. 

The biggest improvement with my rework is correctness. Timekeeping
is no longer tick based, so there is no interpolation (or interpolation
error) in the core algorithm. Delayed or lost timer interrupts do not
affect timekeeping with my code. This also allows tick-less system
projects, or any other projects that deal with timers, to not worry about
affecting time with their changes. It also applies NTP adjustments
smoothly across intervals to avoid time inconsistencies.

My rework also provides an arch-generic timeofday framework, which
manages arch-specific timesource drivers. Using the existing interfaces
as well as a few new functions, it provides full coverage for the
existing users of the timeofday code, keeping the internal timekeeping
variables opaque.


> > > With -mm you can now choose the HZ value, so that's not really the 
> > > problem anymore. A lot of archs even never changed to a higher HZ value. 
> > > So now I still like to know how does the complexity change compared to the 
> > > old code?
> > 
> > Well, even if it is less of a concern with lower HZ values, my code
> > should still reduce the interrupt overhead for those who would like to
> > have higher HZ values to improve latency. (Although until I have
> > numbers, its all just talk. :)
> 
> For machines where it actually matters, I can only see that calculations 
> have gotten more complex and thus slower. You need to provide a little 
> more detailed information on why this is necessary.

Indeed, the periodic timekeeping function is likely more computationally
costly (although I don't have hard numbers on that yet, I will soon);
however, we run it less frequently (50x less, actually) and we do it
outside of interrupt context. I do not believe the periodic timekeeping
path is going to be a performance concern.


> > Well, I think the only overhead to be worried with is just in
> > gettimeofday(), but let me get some hard numbers to show that. I've
> > already implemented some optimized caching for x86-64's vsyscall-gtod
> > implementation, so let me also try to make an arch generic version and
> > see if I cannot settle your concerns.
> 
> I don't need any practical numbers, I can already see from the code, that 
> it's much worse and unless you eliminate the 64bit calculations from the 
> fast path, I don't know what you are trying to optimize.

That's exactly what I'm trying to optimize. By precalculating some of
the 64 bit manipulations, we can remove them from the fast path.

Ok, from my initial tests on my i686 laptop (@600MHz), using the
cheapest timesource available (the TSC), the unoptimized B3 version of
the code I sent out shows a 17% performance hit in gettimeofday(). That
ratio will be even smaller if you use a more expensive timesource. So
starting there, let me see how much I can shave off.

> > > I like the concept of the a time source in your patch. m68k already uses a 
> > > number of timer related callbacks into machine specific code. If I could 
> > > replace that with a timer driver, I'd be really happy. OTOH if it requires 
> > > several expensive conversion between different time formats, I rather keep 
> > > the current code.
> > 
> > Thanks, I really appreciate your review and feedback. I very much want
> > this to be a solution everyone can be happy (or at least indifferent)
> > with.
> 
> I would seriously suggest you rework the first patch and fix the existing 
> code instead or you have to explain why the current code is unfixable, but 
> that would require to actually replace the old code. 

Forgive me for not communicating this well enough; I guess I've been
just a bit too stuck in the code to realize others aren't seeing the
problems I am.

It is my feeling that the current interpolation-based timekeeping is not
easily fixable. In order to move to a non-interpolated method of
timekeeping, each architecture first has to provide some timesource-like
interface that will give us a free running counter. So we provide some
form of timesource interface and some form of generic timeofday code,
and since all the arches won't switch on the same day we'll have to
stage it, and then we're almost to where I am today.

The only other option I see is to let each arch sort it out for itself,
some using interpolation, and some not, maybe using a method similar to
what PPC64 does. Then we still have the duplicated effort of having to work
on each arch and the unique bugs in each. Future changes like the tick-
less projects will have to conditionally work based on which arch has
which method.

> Having two 
> alternative implementations is really the worst solution.

It is not my intention to have two alternative implementations. I am
just trying to stage the transition so every arch doesn't have to move
all at once.


> John, you are _very_ vague here, could you please go into more detail, why 
> you did certain design decisions? "Simplicity" can't be the only reason, 
> good perfomance is far more important.

Performance is more important than simplicity; however, correctness and
flexibility are necessary as well. Hopefully the discussion above will
answer some of your questions. I'll try to make a more thorough writeup
and include it with my next submission.

I really appreciate the feedback and questions. 

thanks
-john


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-22  0:57           ` john stultz
@ 2005-06-22  2:39             ` john stultz
  2005-06-22 19:45             ` Roman Zippel
  1 sibling, 0 replies; 32+ messages in thread
From: john stultz @ 2005-06-22  2:39 UTC (permalink / raw)
  To: Roman Zippel; +Cc: lkml, Andrew Morton, George Anzinger, Ulrich Windl

On Tue, 2005-06-21 at 17:57 -0700, john stultz wrote:
> > I don't need any practical numbers, I can already see from the code, that 
> > it's much worse and unless you eliminate the 64bit calculations from the 
> > fast path, I don't know what you are trying to optimize.
> 
> That's exactly what I'm trying to optimize. By precalculating some of
> the 64 bit manipulations, we can remove them from the fast path.
> 
> Ok, from my initial tests on my i686 laptop (@600Mhz), using the
> cheapest timesource available (the TSC), the unoptimized B3 version of
> the code I sent out shows a 17% performance hit in gettimeofday(). That
> ratio will be even smaller if you use a more expensive timesource. So
> starting there, let me see how much I can shave off.

Just a quick update: with a bit of quick optimization of the current
design, removing most of the 64bit math from the gettimeofday fast path,
I've managed to cut it down to within 2% of the current code, comparing
the c3tsc against the tsc. The c3tsc timesource adds a bit of overhead
that the mainline code doesn't incur, so I'm going to see how a more
comparable TSC vs TSC test goes.

thanks
-john



* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-22  0:57           ` john stultz
  2005-06-22  2:39             ` john stultz
@ 2005-06-22 19:45             ` Roman Zippel
  2005-06-23  0:29               ` john stultz
  1 sibling, 1 reply; 32+ messages in thread
From: Roman Zippel @ 2005-06-22 19:45 UTC (permalink / raw)
  To: john stultz; +Cc: lkml, Andrew Morton, George Anzinger, Ulrich Windl

Hi,

On Tue, 21 Jun 2005, john stultz wrote:

> Briefly the big issues are: 

That's interesting, but it mainly describes your design goals; I'm 
more interested in why you made certain design decisions.
The main problem is that you try to fix all these issues with a single 
patch, and I'd like to know why you don't pick a single issue and fix it 
first.

> o Delayed, or lost ticks caused by interrupt starvation, drivers
> disabling interrupts for too long, BIOS SMIs

These are actually different error sources. Ticks lost due to disabled 
interrupts can't be detected unless you have a second timer source, and 
the generic code doesn't really know about this. If an arch has a second 
time source this is fixable, but I would consider this optional; that 
means adjustments are only done iff this source is available.
The current concept of lost ticks simply means delayed soft interrupt 
handling. IMO this could be a good starting point to fix the timer code, 
by making it possible to call update_wall_time() with a reduced frequency.
If you move in a _later_ step from ticks to 32bit nanoseconds, that would 
still give you a 4 second window, which should be more than enough, so 
e.g. I don't see any reason to use any 64bit math here.

> The biggest improvement with my rework is for correctness. Timekeeping
> is no longer tick based, so there is no interpolation (or interpolation
> error) in the core algorithm.

AFAICS the interpolation is needed because some arches use different time 
sources for the scheduler and timeofday, but I don't see why fixing the 
timer code immediately requires a generic timer source infrastructure.
You have to be more explicit about why it's not possible to fix the 
generic timer code first.

> > For machines where it actually matters, I can only see that calculations 
> > have gotten more complex and thus slower. You need to provide a little 
> > more detailed information on why this is necessary.
> 
> Indeed, the periodic timekeeping function is likely more computationally
> costly (although I don't have hard numbers on that yet, I will soon),
> however we run it less frequently (50x actually) and we do it outside of
> the interrupt context. I do not believe the periodic timekeeping path is
> going to be a performance concern.

Please give me _some_ concrete information, why this code has to be more 
complex than the current code.

> > I don't need any practical numbers, I can already see from the code, that 
> > it's much worse and unless you eliminate the 64bit calculations from the 
> > fast path, I don't know what you are trying to optimize.
> 
> That's exactly what I'm trying to optimize. By precalculating some of
> the 64 bit manipulations, we can remove them from the fast path.

I want to remove all that, why do you need 64bit calculations in there? 
What's wrong with a base xtime + 32bit nanosecond offset?

> Ok, from my initial tests on my i686 laptop (@600Mhz), using the
> cheapest timesource available (the TSC), the unoptimized B3 version of
> the code I sent out shows a 17% performance hit in gettimeofday(). That
> ratio will be even smaller if you use a more expensive timesource. So
> starting there, let me see how much I can shave off.

That's hardly a fair comparison, you cannot use an expensive timesource to 
make your time code look cheap.

> It is my feeling that the current interpolation-based timekeeping is not
> easily fixable. In order to move to a non-interpolated method of
> timekeeping, each architecture first has to provide some timesource-like
> interface that will give us a free running counter.

Why is it not possible to fix the generic time code first, so you can 
later drop the interpolation and use time sources instead?
For all archs where timeofday and the tick have the same source you can 
easily upgrade to the new code; only the rest need to do a bit more to 
update xtime from a different source. This way you break much less code 
and give arch maintainers much less headache.
If you think this is not possible, please give me some more concrete 
information.

bye, Roman


* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-22 19:45             ` Roman Zippel
@ 2005-06-23  0:29               ` john stultz
  2005-06-23 21:59                 ` Roman Zippel
  0 siblings, 1 reply; 32+ messages in thread
From: john stultz @ 2005-06-23  0:29 UTC (permalink / raw)
  To: Roman Zippel; +Cc: lkml, Andrew Morton, George Anzinger, Ulrich Windl

On Wed, 2005-06-22 at 21:45 +0200, Roman Zippel wrote:
> On Tue, 21 Jun 2005, john stultz wrote:
> 
> > Briefly the big issues are: 
> 
> That's interesting, but it mainly describes your design goals, but I'm 
> more interested why you did certain design decisions.
> The main problem is that you try to fix all these issues with a single 
> patch and I'd like to know why you don't pick a single issue and fix it 
> first.

I've actually spent quite a bit of time working to fix these single
issues, and that is why I'm proposing this. The best example is the lost
tick compensation. There I added code in the i386 arch that attempts to
notice when a tick has been lost (two or more ticks' worth of time
passed without an interrupt) and add in the appropriate number of
jiffies. This helped for systems that were losing ticks due to BIOS SMIs
or interrupt starvation. However, it broke some VM systems, because they
effectively disable interrupts for a period of time and then re-play the
interrupts to try to catch up. The first interrupt would trigger the
lost tick detection code, bringing us up to the current time, and then
the following catch-up interrupts would advance time again, causing the
system to go too far forward.

Some other architectures try to handle these situations as well. One
good example is PPC64 (which has greatly influenced my design). For
correctness PPC64 goes as far as not using interpolation by avoiding
almost all of the arch generic time code. It even has its own NTP
adjustment code! 

I have come to believe the current arch-generic tick based timekeeping
is not sustainable. It seems to me that in order to avoid the bugs
customers are seeing, arches are going to have to avoid the current tick
based arch-generic code and implement their own non-interpolated
timekeeping code. That is why I've created this proposal and
implementation instead of just "fixing one issue".

So the question becomes: in order to achieve correctness, should every
architecture implement a full timeofday subsystem of its own? I designed
a system that would work, but instead of making it i386-only and copying
it for x86-64 and whatever else I end up working on, I propose we make
it a common implementation.

Now, my proposal might not currently satisfy everyone's needs for
flexibility and performance in the timekeeping subsystem. But I'd like
to try, and letting me know your issues (such as the use of 64bit math)
helps me work to resolve them, so I really appreciate your feedback. I
really do want this to be something that every arch can use.

I hope it doesn't come to this, but if there is no reasonable way for
every arch to share my proposed common infrastructure, then maybe I need
to add some code so those arches can opt out and maintain their own
timekeeping code. Does that sound reasonable?


> > o Delayed, or lost ticks caused by interrupt starvation, drivers
> > disabling interrupts for too long, BIOS SMIs
> 
> These are actually different error sources. Ticks lost due to disabled 
> interrupts can't be detected unless you have a second timer source, and 
> the generic code doesn't really know about this. If an arch has a second 
> time source this is fixable, but I would consider this optional; that 
> means adjustments are only done iff this source is available.
> The current concept of lost ticks simply means delayed soft interrupt 
> handling. IMO this could be a good starting point to fix the timer code, 
> by making it possible to call update_wall_time() with a reduced frequency.
> If you move in a _later_ step from ticks to 32bit nanoseconds, that would 
> give you still a 4 second window, which should be more than enough, so 
> e.g. I don't see any reason to use any 64bit math here.
> 
> > The biggest improvement with my rework is for correctness. Timekeeping
> > is no longer tick based, so there is no interpolation (or interpolation
> > error) in the core algorithm.
> 
> AFAICS the interpolation is needed because some arches use different time 
> sources for the scheduler and timeofday, but I don't see why fixing the 
> timer code immediately requires a generic timer source infrastructure.
> You have to be more explicit about why it's not possible to fix the 
> generic timer code first.

No, I don't believe high-res time interpolation has anything to do with
the scheduler. It is used by gettimeofday() to give better than
tick-granular time resolution. In fact, the current generic timekeeping
code doesn't even acknowledge that interpolation goes on. It just
believes time is incremented in tick units: every tick, xtime gets
incremented by one tick's worth of time.

So if we were going to fix that, we would have to change the code so
that on every tick, the amount of time that has actually passed since
the last tick is added to xtime. In order for this to occur, we need
some arch-specific accessor to find out how much time actually passed.
This is very similar to what the timesource drivers provide, and once
you have that, it is almost the same as my new code.

> > > For machines where it actually matters, I can only see that calculations 
> > > have gotten more complex and thus slower. You need to provide a little 
> > > more detailed information on why this is necessary.
> > 
> > Indeed, the periodic timekeeping function is likely more computationally
> > costly (although I don't have hard numbers on that yet, I will soon),
> > however we run it less frequently (50x actually) and we do it outside of
> > the interrupt context. I do not believe the periodic timekeeping path is
> > going to be a performance concern.
> 
> Please give me _some_ concrete information, why this code has to be more 
> complex than the current code.

Honestly, it isn't that much more complex. It is, however, consistent.
We increment the timekeeping variables in the exact same way we generate
gettimeofday(), which is what avoids time inconsistencies. Additionally,
NTP adjustments are integrated into how we calculate gettimeofday(), so
there are no inconsistencies caused by applying NTP correction only at
the tick.


> > > I don't need any practical numbers, I can already see from the code, that 
> > > it's much worse and unless you eliminate the 64bit calculations from the 
> > > fast path, I don't know what you are trying to optimize.
> > 
> > That's exactly what I'm trying to optimize. By precalculating some of
> > the 64 bit manipulations, we can remove them from the fast path.
> 
> I want to remove all that, why do you need 64bit calculations in there? 
> What's wrong with a base xtime + 32bit nanosecond offset?


I'm actually doing almost just that! I've cut the cycle_t down to an
unsigned long, and while nanoseconds are still 64bits (in order to have
a precise cyc2ns conversion), we only use 32bits of them for the most
part. 

> > Ok, from my initial tests on my i686 laptop (@600Mhz), using the
> > cheapest timesource available (the TSC), the unoptimized B3 version of
> > the code I sent out shows a 17% performance hit in gettimeofday(). That
> > ratio will be even smaller if you use a more expensive timesource. So
> > starting there, let me see how much I can shave off.
> 
> That's hardly a fair comparison, you cannot use an expensive timesource to 
> make your time code look cheap.

That was not my intent. The numbers I posted use the fastest timesource
available (the TSC). I'm just saying that the 17% performance impact was
as bad as it gets, and currently I'm down to just 2% using the same
timesource, with some of the optimizations I mentioned above.


> > It is my feeling that the current interpolation-based timekeeping is not
> > easily fixable. In order to move to a non-interpolated method of
> > timekeeping, each architecture first has to provide some timesource-like
> > interface that will give us a free running counter.
> 
> Why is it not possible to fix the generic time code first, so you can 
> later drop the interpolation and use time sources instead?

It is my opinion that dropping interpolation is what is needed in order
to fix the generic timekeeping code. 

> For all archs where timeofday and tick has the same source you can easily 
> upgrade to the new code, only the rest needs to do a bit more to update 
> xtime from a different source. This way you break much less code and give 
> arch maintainers much less headache.
> If you think this is not possible, please give me some more concrete 
> information.

I'm not sure I exactly follow what you're proposing. It may be possible
that there is a way to implement my design in a more piecemeal fashion,
and I would very much welcome specific suggestions on how to do so. So
please do explain in a bit more detail.

Once again, thanks for your feedback. It's not my intention to give arch
maintainers a headache (in fact, it is my hope that the new subsystem
will lessen the impact of future changes on arch-specific code).

thanks
-john



* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-23  0:29               ` john stultz
@ 2005-06-23 21:59                 ` Roman Zippel
  2005-06-24  0:33                   ` john stultz
  0 siblings, 1 reply; 32+ messages in thread
From: Roman Zippel @ 2005-06-23 21:59 UTC (permalink / raw)
  To: john stultz; +Cc: lkml, Andrew Morton, George Anzinger, Ulrich Windl

Hi,

On Wed, 22 Jun 2005, john stultz wrote:

> Some other architectures try to handle these situations as well. One
> good example is PPC64 (which has greatly influenced my design). For
> correctness PPC64 goes as far as not using interpolation by avoiding
> almost all of the arch generic time code. It even has its own NTP
> adjustment code! 
> 
> I have come to believe the current arch generic tick based timekeeping
> is not sustainable. It seems to me in order to avoid bugs that customers
> are seeing, arches are going to have to avoid the current tick based
> arch generic code and implement their own non-interpolated based
> timekeeping code. So that is why I've created this proposal and
> implementation instead of just "fixing one issue".

I agree with you that the current time code is a problem for machines like 
ppc64, which basically use two different time sources.

We basically have two timer architectures: timer tick and continuous 
timer. The latter currently has to emulate the timer tick. Your patch 
completely reverses the roles and forces everybody to produce a 
continuous timer, which I think is equally bad, as some simple 
computations become a lot more complex. Why should it not be possible 
to support both equally?

> So the question becomes: in order to achieve correctness, should every
> architecture implement a full timeofday subsystem of its own? I designed
> a system that would work, but instead of making it i386 and copying it
> for x86-64 and whatever else I end up working on, I propose we make it a
> common implementation.

That might result in the worst of both worlds. If I look at the ppc64 
implementation of gettimeofday, it's really nice, and your (current) 
code would make this worse again. So why not leave it to the timer 
source whether it wants to manage a cycle counter or an xtime+offset? 
The common code can provide some helper functions to manage either of 
these. Converting everything to nanoseconds looks like a really bad 
compromise.

In the ppc64 example the main problem is that the generic code adjusts 
for the wrong time source - the scheduler tick, which is derived from 
the cycle counter - so ppc64 has to redo all the work. Your code now 
introduces an abstract concept of nanoseconds which adds extra overhead 
to either timer concept. Why not integrate what ppc64 does into the 
current timer code instead of replacing the current code with something 
else?

For tick based timer sources there is not much more to do than 
increment xtime by a precomputed constant. If I take ppc64 as an 
example for continuous time sources, it does a lot less than your 
timeofday_periodic_hook(). Why is all this needed?
John, what I'd really like to see here is some math or code examples 
which demonstrate how your new code is better compared to the old code. 
Your code makes a _huge_ transformation jump and I'd like you to explain 
some of the steps in between.

bye, Roman


* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-23 21:59                 ` Roman Zippel
@ 2005-06-24  0:33                   ` john stultz
  2005-06-24 10:58                     ` Roman Zippel
  0 siblings, 1 reply; 32+ messages in thread
From: john stultz @ 2005-06-24  0:33 UTC (permalink / raw)
  To: Roman Zippel; +Cc: lkml, Andrew Morton, George Anzinger, Ulrich Windl

On Thu, 2005-06-23 at 23:59 +0200, Roman Zippel wrote:
> Hi,
> 
> On Wed, 22 Jun 2005, john stultz wrote:
> 
> > Some other architectures try to handle these situations as well. One
> > good example is PPC64 (which has greatly influenced my design). For
> > correctness PPC64 goes as far as not using interpolation by avoiding
> > almost all of the arch generic time code. It even has its own NTP
> > adjustment code! 
> > 
> > I have come to believe the current arch generic tick based timekeeping
> > is not sustainable. It seems to me in order to avoid bugs that customers
> > are seeing, arches are going to have to avoid the current tick based
> > arch generic code and implement their own non-interpolated based
> > timekeeping code. So that is why I've created this proposal and
> > implementation instead of just "fixing one issue".
> 
> I agree with you that the current time code is a problem for machines like 
> ppc64, which basically use two different time sources.
> 
> We basically have two timer architectures: timer tick and continuous 
> timer. The latter currently has to emulate the timer tick. Your patch 
> completely reverses the roles and forces everybody to produce a continuous 
> timer, which I think is equally bad, as some simple computations become a 
> lot more complex. Why should it not be possible to support both equally?

Yep, I think this is really the core contention. 

In my design I've reworked the time subsystem to assume time flows
continuously, as provided by the timesource. Systems that do not have a
continuous counter like the PPC timebase can use a similar tick based
interpolation method to provide continuous time. However, this
interpolation is done in the timesource driver instead of in the generic
code.

I do feel my design is more flexible than the current tick based code,
as it can accommodate both methods equally and correctly. However, your
complaint is that it is more computationally expensive for the tick
based systems, and that is a fair criticism.

So what I'm working on now is to get some more detailed numbers and
analysis of the code paths so we can be more specific in our discussion.
Hopefully this will help narrow the concern so I can properly address
it.


> > So the question becomes: in order to achieve correctness, should every
> > architecture implement a full timeofday subsystem of its own? I designed
> > a system that would work, but instead of making it i386 and copying it
> > for x86-64 and whatever else I end up working on, I propose we make it a
> > common implementation.
> 
> That might result in the worst of both worlds. If I look at the ppc64 
> implementation of gettimeofday, it's really nice and your (current) code 
> would make this worse again. 

Could you be more specific about how you feel the ppc64 code is nice and
how my code is worse? My code is actually quite influenced by the ppc64
code, so specific details would help me respond.

> So why not leave it to the timer source, whether 
> it wants to manage a cycle counter or an xtime+offset? The common code can 
> provide some helper functions to manage either of these. Converting 
> everything to nanoseconds looks like a really bad compromise.
> 
> In the ppc64 example the main problem is that the generic code adjusts 
> for the wrong time source - the scheduler tick, which is derived 
> from the cycle counter - so ppc64 has to redo all the work. Your code now 
> introduces an abstract concept of nanoseconds which adds extra overhead to 
> either timer concept. Why not integrate what ppc64 does into the current 
> timer code instead of replacing the current code with something else?

I'm not sure I followed that paragraph. Would you clarify a bit?

Also, I'd not consider the concept of a nanosecond to be abstract at
all. :) In fact, one of the reasons I built the design around
nanoseconds is that it is very clear and understandable as to what it
means. Now, I agree that performance should override clarity, but
clarity should still be a goal. A complaint I've dealt with is that the
current time subsystem is too fragile and confusing, so some cleanups
are in order.

That said, I also agree that if possible cleanups and functional changes
should be separated. I'm looking to see how this might be possible.

> For tick based timer sources there is not much to do than incrementing 
> xtime by precomputed constant. If I take ppc64 as an example for 
> continuous time sources it does a lot less than your 
> timeofday_periodic_hook(). Why is all this needed?

timeofday_periodic_hook() does the following every 50ms:
	o read the timesource, save it as now
	o calculate the amount of NTP-adjusted time that has passed
	o calculate the amount of raw time that has passed
	o increment the system_time by the calculated NTP-adjusted time
	o advance the NTP state machine by the raw time interval
	o do leapsecond processing
	o sync the persistent clock (if appropriate)
	o do a bit of timesource management
	o calculate the NTP adjustment for the current timesource
	o update legacy time variables (xtime, wall_to_monotonic)
	o update vsyscall info (if used)
	o reprogram the soft-timer


ppc64's timer_interrupt code does the following each tick:
	o reads the timebase
	o calculates the amount of time that has passed
	o saves the current offset
	o calls do_timer
		o increments xtime by one tick
		o increments the generic NTP state machine
			o does leapsecond processing
	o overwrites xtime (to avoid interpolation)
	o syncs the persistent clock (if appropriate)
	o calls into their internal NTP code
		o calculates the NTP adjustment for their timebase

Just logically (as opposed to computationally), my code isn't all that
much more complex. I'm a little more explicit with my NTP adjustments
because I wanted to be very careful to do them properly and ensure we do
not get time inconsistencies. And the timesource management is new, but
it is reasonably cheap. Maybe you can point to a specific component of
the above that you dislike?

Computationally, there are two 64bit mult/shifts (in the interval
calculations) and a 64 bit divide that occurs in my code. However this
is only done every 50ms instead of every tick, so I don't believe there
is much of a performance impact.


> John, what I'd really like to see here is some math or code examples, 
> which demonstrate how your new code is better compared to the old code. 
> Your code makes a _huge_ transformation jump and I'd like you to explain 
> some of the steps inbetween.

This discussion is difficult to have over the current large patch. I'm
trying to see where I can split it up into components that will help
narrow the discussion. First up, I'm going to try to submit the NTP
cleanups on their own. Then I can see what else can be moved in.

Thanks again, I'll look forward to your feedback on future releases.
-john






* Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
  2005-06-24  0:33                   ` john stultz
@ 2005-06-24 10:58                     ` Roman Zippel
  0 siblings, 0 replies; 32+ messages in thread
From: Roman Zippel @ 2005-06-24 10:58 UTC (permalink / raw)
  To: john stultz; +Cc: lkml, Andrew Morton, George Anzinger, Ulrich Windl

Hi,

On Thu, 23 Jun 2005, john stultz wrote:

> > We basically have two timer architectures: timer tick and continuous 
> > timer. The latter currently has to emulate the timer tick. Your patch 
> > completely reverses the roles and forces everybody to produce a continuous 
> > timer, which I think is equally bad, as some simple computations become a 
> > lot more complex. Why should it not be possible to support both equally?
> 
> Yep, I think this is really the core contention. 
> 
> In my design I've reworked the time subsystem to assume time flows
> continuously, as provided by the timesource. Systems that do not have
> a continuous counter like the PPC timebase can use a similar tick
> based interpolation method to provide continuous time. However, this
> interpolation is done in the timesource driver instead of in the
> generic code.

By introducing extra overhead, which is difficult to get rid of again.

> > That might result in the worst of both worlds. If I look at the ppc64 
> > implementation of gettimeofday, it's really nice and your (current) code 
> > would make this worse again. 
> 
> Could you be more specific about how you feel the ppc64 is nice and how
> my code is worse? My code is actually quite influenced by the ppc64
> code, so specific details might help me respond.

The ppc64 code converts the timebase directly to a timeval; your code 
puts the nanosecond step in between.

> > So why not leave it to the timer source, whether 
> > it wants to manage a cycle counter or an xtime+offset? The common code can 
> > provide some helper functions to manage either of these. Converting 
> > everything to nanoseconds looks like a really bad compromise.
> > 
> > In the ppc64 example the main problem is that the generic code adjusts 
> > for the wrong time source - the scheduler tick, which is derived 
> > from the cycle counter - so ppc64 has to redo all the work. Your code now 
> > introduces an abstract concept of nanoseconds which adds extra overhead to 
> > either timer concept. Why not integrate what ppc64 does into the current 
> > timer code instead of replacing the current code with something else?
> 
> I'm not sure I followed that paragraph. Would you clarify a bit?

What exactly?

> Computationally, there are two 64bit mult/shifts (in the interval
> calculations) and a 64 bit divide that occurs in my code. However this
> is only done every 50ms instead of every tick, so I don't believe there
> is much of a performance impact.

You forgot the cyc2ns conversions. Compare this now to some simple 32bit 
math executed every 100ms. Just look at all the 32bit archs which use the 
generic do_div and you get an idea of who will not be happy with your new 
code.

Why do you do the adjustment at _every_ call? Why don't you take 
advantage of the fact that you can be called regularly? In a tick based 
system you can be called every n*tick_nsec ns (without even the need to 
read a possibly expensive timer offset); in a cycle counter system you 
can trigger a call when the counter is past a certain value. In either 
case you know the callback function is called, on average, at a regular 
interval, so you can precompute adjustments based on this. Ignoring this 
possibility looks like a huge step back to me.

bye, Roman


end of thread, other threads:[~2005-06-24 11:03 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-06-18  2:56 [PATCH 1/6] new timeofday core subsystem for -mm (v.B3) john stultz
2005-06-18  2:58 ` [PATCH 2/6] new timeofday i386 arch specific changes, part 1 " john stultz
2005-06-18  2:59   ` [PATCH 3/6] new timeofday i386 arch specific changes, part 2 " john stultz
2005-06-18  3:01     ` [PATCH 4/6] new timeofday i386 arch specific changes, part 3 " john stultz
2005-06-18  3:02       ` [PATCH 5/6] new timeofday i386 arch specific changes, part 4 " john stultz
2005-06-18  3:04         ` [PATCH 6/6] new timeofday i386 specific timesources " john stultz
2005-06-18 12:02 ` [PATCH 1/6] new timeofday core subsystem " Roman Zippel
2005-06-20  7:01   ` Ulrich Windl
2005-06-20 10:22     ` Roman Zippel
2005-06-20 10:31       ` Ulrich Windl
2005-06-20 10:54         ` Roman Zippel
2005-06-20 11:04         ` Christoph Hellwig
2005-06-20 17:09   ` john stultz
2005-06-20 18:10     ` Lee Revell
2005-06-20 21:53       ` john stultz
2005-06-20 23:44         ` Lee Revell
2005-06-21 14:55           ` Chris Friesen
2005-06-21 17:20             ` john stultz
2005-06-21  6:26       ` Ulrich Windl
2005-06-20 22:05     ` Roman Zippel
2005-06-20 23:40       ` Lee Revell
2005-06-20 23:55       ` john stultz
2005-06-21 15:08         ` Roman Zippel
2005-06-22  0:57           ` john stultz
2005-06-22  2:39             ` john stultz
2005-06-22 19:45             ` Roman Zippel
2005-06-23  0:29               ` john stultz
2005-06-23 21:59                 ` Roman Zippel
2005-06-24  0:33                   ` john stultz
2005-06-24 10:58                     ` Roman Zippel
2005-06-21  6:42       ` Ulrich Windl
2005-06-21 15:13         ` Roman Zippel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox