* [PATCH] ktimers subsystem 2.6.14-rc2-kt5
@ 2005-09-28 20:43 tglx
From: tglx @ 2005-09-28 20:43 UTC
To: linux-kernel
Cc: mingo, akpm, george, johnstul, paulmck, hch, oleg, zippel,
tim.bird

This is an updated version which contains the following changes:

- Selectable time storage format: union/struct based, scalar (64bit)
- Fixed an endless loop in forward_posix_timer (George Anzinger)
- Fixed a wrong sizeof(x) (George Anzinger)
- Fixed build problems for non x86 architectures

Roman pointed out that the penalty for some architectures
would be quite big when using the nsec_t (64bit) scalar time
storage format. After a long discussion and some more detailed
tests, especially on ARM, it turned out that the scalar format
is unfortunately not suitable everywhere. The tradeoff between
performance and cleanliness seems too big for some architectures.

After several rounds of functional conversions and
cleanups an acceptable compromise between cleanliness and
storage format flexibility was found.

For 64bit architectures the scalar representation is definitely
a win and therefore enabled unconditionally. The code defaults to
the union/struct based implementation on 32bit archs, but can be
switched to the scalar storage format by setting
CONFIG_KTIME_SCALAR=y if there is a benefit for the particular
architecture. The union/struct magic has an advantage over the
struct timespec based format which I first considered using. It
produces better and denser code for most architectures and does no
harm anywhere else. This might change with improvements of
compilers, but then it requires just a replacement of the related
macros / inlines.
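As a rough illustration of the union/struct format, here is a userspace
sketch. It is not the code from the patch: the names ktime_sketch_t,
ktime_sketch_set and ktime_sketch_add are made up for this example, and
the byte order test assumes the GCC/Clang predefined __BYTE_ORDER__
macro. Seconds and nanoseconds share storage with one 64bit value, so an
addition is a single 64bit add followed by a carry fix-up on the
nanosecond half.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000

/* Hypothetical userspace stand-in for the union based ktime_t. */
typedef union {
	int64_t tv64;
	struct {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
		int32_t sec, nsec;	/* sec occupies the high 32 bits */
#else
		int32_t nsec, sec;	/* nsec occupies the low 32 bits */
#endif
	} tv;
} ktime_sketch_t;

/* Build a value from already normalized sec / nsec parts. */
static ktime_sketch_t ktime_sketch_set(int32_t sec, int32_t nsec)
{
	ktime_sketch_t kt;

	kt.tv.sec = sec;
	kt.tv.nsec = nsec;
	return kt;
}

/*
 * Add both halves with one 64bit addition. Two normalized nsec values
 * sum to less than 2^31, so nothing spills into the sec half; only the
 * carry at NSEC_PER_SEC has to be fixed up by hand.
 */
static ktime_sketch_t ktime_sketch_add(ktime_sketch_t a, ktime_sketch_t b)
{
	ktime_sketch_t res;

	res.tv64 = a.tv64 + b.tv64;
	if (res.tv.nsec >= NSEC_PER_SEC) {
		res.tv.nsec -= NSEC_PER_SEC;
		res.tv.sec++;
	}
	return res;
}
```

Adding 1.6s and 2.7s this way yields sec = 4 and nsec = 300000000. With
the scalar format the same operation is a plain 64bit addition of
nanosecond counts, which is why it wins on 64bit machines.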

The code is not harder to understand than the previous
open coded scalar storage based implementation.

The correctness was verified with the posix timer tests from
the HRT project on the forward ported ktimers based high
resolution proof of concept implementation.

For those interested in this topic the patch series is available
at http://www.tglx.de/private/tglx/ktimers/patch-2.6.14-rc2-kt5.patches.tar.bz2

Thanks for review and feedback.

tglx

ktimers separate the "timer API" from the "timeout API".

ktimers are used for:
- nanosleep
- posixtimers
- itimers

The patch contains the base implementation of ktimers and the
conversion of nanosleep, posixtimers and itimers to ktimer users.

The patch does not require other changes to the Linux time(r) core
system.

The implementation was done with the following constraints in mind:
- Not bound to jiffies
- Multiple time sources
- Per CPU timer queues
- Simplification of absolute CLOCK_REALTIME posix timers
- High resolution timer aware
- Allows the timeout API to reschedule the next event
  (for tickless systems)

Ktimers enqueue the timers into a time sorted list, which is implemented
with an rbtree. The rbtree is efficient and already used in other
performance critical parts of the kernel. Insertion is a bit slower than
with the timer wheel, but since the vast majority of these timers
actually expire, that cost has to be weighed against the cascading
penalty of the wheel.
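The ordering idea can be sketched in userspace as follows. This is a
simplified stand-in, not the patch's code: it uses a plain, unbalanced
binary search tree instead of the kernel rbtree, and struct sketch_timer,
sketch_enqueue and sketch_first are hypothetical names. The point is only
that timers are linked by expiry time and the leftmost node is always the
next one to expire.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ktimer: only what ordering needs. */
struct sketch_timer {
	int64_t expires;			/* absolute expiry in ns */
	struct sketch_timer *left, *right;
};

/*
 * Link the timer in expiry order. Equal expiry values go to the right,
 * so timers with the same expiry keep their insertion order. A real
 * rbtree would rebalance after linking; this sketch does not.
 */
static void sketch_enqueue(struct sketch_timer **root,
			   struct sketch_timer *timer)
{
	struct sketch_timer **link = root;

	timer->left = timer->right = NULL;
	while (*link) {
		if (timer->expires < (*link)->expires)
			link = &(*link)->left;
		else
			link = &(*link)->right;
	}
	*link = timer;
}

/* The leftmost node is the timer which expires next. */
static struct sketch_timer *sketch_first(struct sketch_timer *root)
{
	if (!root)
		return NULL;
	while (root->left)
		root = root->left;
	return root;
}
```

Enqueueing timers with expiry 300, 100 and 200 in that order leaves the
timer with expiry 100 as the first one. The rebalancing done by the
rbtree is what keeps insertion at O(log(n)) regardless of the order in
which timers arrive.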

The code supports multiple time sources. Currently implemented are
CLOCK_REALTIME and CLOCK_MONOTONIC. They provide separate timer queues
and support functions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
Index: linux-2.6.14-rc2-rt4/include/linux/calc64.h
===================================================================
--- /dev/null
+++ linux-2.6.14-rc2-rt4/include/linux/calc64.h
@@ -0,0 +1,31 @@
+#ifndef _linux_CALC64_H
+#define _linux_CALC64_H
+
+#include <linux/types.h>
+#include <asm/div64.h>
+
+#ifndef div_long_long_rem
+#define div_long_long_rem(dividend,divisor,remainder) \
+({ \
+ u64 result = dividend; \
+ *remainder = do_div(result,divisor); \
+ result; \
+})
+#endif
+
+static inline long div_long_long_rem_signed(long long dividend,
+ long divisor,
+ long *remainder)
+{
+ long res;
+
+ if (unlikely(dividend < 0)) {
+ res = -div_long_long_rem(-dividend, divisor, remainder);
+ *remainder = -(*remainder);
+ } else {
+ res = div_long_long_rem(dividend, divisor, remainder);
+ }
+ return res;
+}
+
+#endif
Index: linux-2.6.14-rc2-rt4/include/linux/jiffies.h
===================================================================
--- linux-2.6.14-rc2-rt4.orig/include/linux/jiffies.h
+++ linux-2.6.14-rc2-rt4/include/linux/jiffies.h
@@ -1,21 +1,12 @@
#ifndef _LINUX_JIFFIES_H
#define _LINUX_JIFFIES_H
+#include <linux/calc64.h>
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/time.h>
#include <linux/timex.h>
#include <asm/param.h> /* for HZ */
-#include <asm/div64.h>
-
-#ifndef div_long_long_rem
-#define div_long_long_rem(dividend,divisor,remainder) \
-({ \
- u64 result = dividend; \
- *remainder = do_div(result,divisor); \
- result; \
-})
-#endif
/*
* The following defines establish the engineering parameters of the PLL
Index: linux-2.6.14-rc2-rt4/fs/exec.c
===================================================================
--- linux-2.6.14-rc2-rt4.orig/fs/exec.c
+++ linux-2.6.14-rc2-rt4/fs/exec.c
@@ -645,9 +645,10 @@ static inline int de_thread(struct task_
* synchronize with any firing (by calling del_timer_sync)
* before we can safely let the old group leader die.
*/
- sig->real_timer.data = (unsigned long)current;
- if (del_timer_sync(&sig->real_timer))
- add_timer(&sig->real_timer);
+ sig->real_timer.data = current;
+ if (stop_ktimer(&sig->real_timer))
+ start_ktimer(&sig->real_timer, NULL,
+ KTIMER_RESTART|KTIMER_NOCHECK);
}
while (atomic_read(&sig->count) > count) {
sig->group_exit_task = current;
@@ -659,7 +660,7 @@ static inline int de_thread(struct task_
}
sig->group_exit_task = NULL;
sig->notify_count = 0;
- sig->real_timer.data = (unsigned long)current;
+ sig->real_timer.data = current;
spin_unlock_irq(lock);
/*
Index: linux-2.6.14-rc2-rt4/fs/proc/array.c
===================================================================
--- linux-2.6.14-rc2-rt4.orig/fs/proc/array.c
+++ linux-2.6.14-rc2-rt4/fs/proc/array.c
@@ -330,7 +330,7 @@ static int do_task_stat(struct task_stru
unsigned long min_flt = 0, maj_flt = 0;
cputime_t cutime, cstime, utime, stime;
unsigned long rsslim = 0;
- unsigned long it_real_value = 0;
+ DEFINE_KTIME(it_real_value);
struct task_struct *t;
char tcomm[sizeof(task->comm)];
@@ -386,7 +386,7 @@ static int do_task_stat(struct task_stru
utime = cputime_add(utime, task->signal->utime);
stime = cputime_add(stime, task->signal->stime);
}
- it_real_value = task->signal->it_real_value;
+ it_real_value = task->signal->real_timer.expires;
}
ppid = pid_alive(task) ? task->group_leader->real_parent->tgid : 0;
read_unlock(&tasklist_lock);
@@ -435,7 +435,7 @@ static int do_task_stat(struct task_stru
priority,
nice,
num_threads,
- jiffies_to_clock_t(it_real_value),
+ (clock_t) ktime_to_clock_t(it_real_value),
start_time,
vsize,
mm ? get_mm_counter(mm, rss) : 0, /* you might want to shift this left 3 */
Index: linux-2.6.14-rc2-rt4/include/linux/ktimer.h
===================================================================
--- /dev/null
+++ linux-2.6.14-rc2-rt4/include/linux/ktimer.h
@@ -0,0 +1,335 @@
+#ifndef _LINUX_KTIMER_H
+#define _LINUX_KTIMER_H
+
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/rbtree.h>
+#include <linux/time.h>
+#include <linux/wait.h>
+
+/* Timer API */
+
+/*
+ * Select the ktime_t data type
+ */
+#if defined(CONFIG_KTIME_SCALAR) || (BITS_PER_LONG == 64)
+ #define KTIME_IS_SCALAR
+#endif
+
+#ifndef KTIME_IS_SCALAR
+typedef union {
+ s64 tv64;
+ struct {
+#ifdef __BIG_ENDIAN
+ s32 sec, nsec;
+#else
+ s32 nsec, sec;
+#endif
+ } tv;
+} ktime_t;
+
+#else
+
+typedef s64 ktime_t;
+
+#endif
+
+struct ktimer_base;
+
+/*
+ * Timer structure must be initialized by init_ktimer_xxx !
+ */
+struct ktimer {
+ struct rb_node node;
+ struct list_head list;
+ ktime_t expires;
+ ktime_t expired;
+ ktime_t interval;
+ int overrun;
+ unsigned long status;
+ void (*function)(void *);
+ void *data;
+ struct ktimer_base *base;
+};
+
+/*
+ * Timer base struct
+ */
+struct ktimer_base {
+ int index;
+ char *name;
+ spinlock_t lock;
+ struct rb_root active;
+ struct list_head pending;
+ int count;
+ unsigned long resolution;
+ ktime_t (*get_time)(void);
+ struct ktimer *running_timer;
+ wait_queue_head_t wait_for_running_timer;
+};
+
+/*
+ * Values for the mode argument of xxx_ktimer functions
+ */
+enum
+{
+ KTIMER_NOREARM, /* Internal value */
+ KTIMER_ABS, /* Time value is absolute */
+ KTIMER_REL, /* Time value is relative to now */
+ KTIMER_INCR, /* Time value is relative to previous expiry time */
+ KTIMER_FORWARD, /* Timer is rearmed with value. Overruns are accounted */
+ KTIMER_REARM, /* Timer is rearmed with interval. Overruns are accounted */
+ KTIMER_RESTART /* Timer is restarted with the stored expiry value */
+};
+
+/* The timer states */
+enum
+{
+ KTIMER_INACTIVE,
+ KTIMER_PENDING,
+ KTIMER_EXPIRED,
+ KTIMER_EXPIRED_NOQUEUE,
+};
+
+/* Expiry must not be checked when the timer is started */
+#define KTIMER_NOCHECK 0x10000
+
+#define KTIMER_POISON ((void *) 0x00100101)
+
+#define KTIME_ZERO 0LL
+
+#define ktimer_active(t) ((t)->status != KTIMER_INACTIVE)
+#define ktimer_before(t1, t2) (ktime_cmp((t1)->expires, <, (t2)->expires))
+
+#ifndef KTIME_IS_SCALAR
+/*
+ * Helper macros/inlines to get the math with ktime_t right. Uurgh, that's
+ * ugly as hell, but for performance sake we have to use this. The
+ * nsec_t based code was nice and simple. :(
+ *
+ * Be careful when using this stuff. It blows up on you if you don't
+ * get the weirdness right.
+ *
+ * Be especially aware, that negative values are represented in the
+ * form:
+ * tv.sec < 0 and 0 <= tv.nsec < NSEC_PER_SEC
+ *
+ */
+#define DEFINE_KTIME(k) ktime_t k = {.tv64 = 0LL }
+
+#define ktime_cmp(a,op,b) ((a).tv64 op (b).tv64)
+#define ktime_cmp_val(a, op, b) ((a).tv64 op b)
+
+#define ktime_set(s,n) \
+({ \
+ ktime_t __kt; \
+ __kt.tv.sec = s; \
+ __kt.tv.nsec = n; \
+ __kt; \
+})
+
+#define ktime_set_zero(k) k.tv64 = 0LL
+
+#define ktime_set_low_high(l,h) ktime_set(h,l)
+
+#define ktime_get_low(t) (t).tv.nsec
+#define ktime_get_high(t) (t).tv.sec
+
+static inline ktime_t ktime_set_normalized(long sec, long nsec)
+{
+ ktime_t res;
+
+ while (nsec < 0) {
+ nsec += NSEC_PER_SEC;
+ sec--;
+ }
+ while (nsec >= NSEC_PER_SEC) {
+ nsec -= NSEC_PER_SEC;
+ sec++;
+ }
+
+ res.tv.sec = sec;
+ res.tv.nsec = nsec;
+ return res;
+}
+
+static inline ktime_t ktime_sub(ktime_t a, ktime_t b)
+{
+ ktime_t res;
+
+ res.tv64 = a.tv64 - b.tv64;
+ if (res.tv.nsec < 0)
+ res.tv.nsec += NSEC_PER_SEC;
+
+ return res;
+}
+
+static inline ktime_t ktime_add(ktime_t a, ktime_t b)
+{
+ ktime_t res;
+
+ res.tv64 = a.tv64 + b.tv64;
+ if (res.tv.nsec >= NSEC_PER_SEC) {
+ res.tv.nsec -= NSEC_PER_SEC;
+ res.tv.sec++;
+ }
+ return res;
+}
+
+static inline ktime_t ktime_add_ns(ktime_t a, u64 nsec)
+{
+ ktime_t tmp;
+
+ if (likely(nsec < NSEC_PER_SEC)) {
+ tmp.tv64 = nsec;
+ } else {
+ unsigned long rem;
+ rem = do_div(nsec, NSEC_PER_SEC);
+ tmp = ktime_set((long)nsec, rem);
+ }
+ return ktime_add(a,tmp);
+}
+
+#define timespec_to_ktime(ts) \
+({ \
+ ktime_t __kt; \
+ struct timespec __ts = (ts); \
+ __kt.tv.sec = (s32)__ts.tv_sec; \
+ __kt.tv.nsec = (s32)__ts.tv_nsec; \
+ __kt; \
+})
+
+#define ktime_to_timespec(kt) \
+({ \
+ struct timespec __ts; \
+ ktime_t __kt = (kt); \
+ __ts.tv_sec = (time_t)__kt.tv.sec; \
+ __ts.tv_nsec = (long)__kt.tv.nsec; \
+ __ts; \
+})
+
+#define ktime_to_timeval(kt) \
+({ \
+ struct timeval __tv; \
+ ktime_t __kt = (kt); \
+ __tv.tv_sec = (time_t)__kt.tv.sec; \
+ __tv.tv_usec = (long)(__kt.tv.nsec / NSEC_PER_USEC); \
+ __tv; \
+})
+
+#define ktime_to_clock_t(kt) \
+({ \
+ ktime_t __kt = (kt); \
+ u64 nsecs = (u64) __kt.tv.sec * NSEC_PER_SEC; \
+ nsec_to_clock_t(nsecs + (u64) __kt.tv.nsec); \
+})
+
+#define ktime_to_ns(kt) \
+({ \
+ ktime_t __kt = (kt); \
+ (((u64)__kt.tv.sec * NSEC_PER_SEC) + (u64)__kt.tv.nsec);\
+})
+
+#else
+
+/* ktime_t macros when using a 64bit variable */
+
+#define DEFINE_KTIME(kt) ktime_t kt = 0LL
+
+#define ktime_cmp(a,op,b) ((a) op (b))
+#define ktime_cmp_val(a,op,b) ((a) op b)
+
+#define ktime_set(s,n) (((s64) s * NSEC_PER_SEC) + (s64)n)
+#define ktime_set_zero(kt) kt = 0LL
+
+#define ktime_set_low_high(l,h) ((s64)((u64)l) | (((s64) h) << 32))
+
+#define ktime_get_low(t) ((t) & 0xFFFFFFFFLL)
+#define ktime_get_high(t) ((t) >> 32)
+
+#define ktime_sub(a,b) ((a) - (b))
+#define ktime_add(a,b) ((a) + (b))
+#define ktime_add_ns(a,b) ((a) + (b))
+
+#define timespec_to_ktime(ts) ktime_set(ts.tv_sec, ts.tv_nsec)
+
+#define ktime_to_timespec(kt) ns_to_timespec(kt)
+#define ktime_to_timeval(kt) ns_to_timeval(kt)
+
+#define ktime_to_clock_t(kt) nsec_to_clock_t(kt)
+
+#define ktime_to_ns(kt) (kt)
+
+#define ktime_set_normalized(s,n) ktime_set(s,n)
+
+#endif
+
+/* Exported functions */
+extern void fastcall init_ktimer_real(struct ktimer *timer);
+extern void fastcall init_ktimer_mono(struct ktimer *timer);
+extern int modify_ktimer(struct ktimer *timer, ktime_t *tim, int mode);
+extern int start_ktimer(struct ktimer *timer, ktime_t *tim, int mode);
+extern int try_to_stop_ktimer(struct ktimer *timer);
+extern int stop_ktimer(struct ktimer *timer);
+extern ktime_t get_remtime_ktimer(struct ktimer *timer, long fake);
+extern ktime_t get_expiry_ktimer(struct ktimer *timer, ktime_t *now);
+extern void __init init_ktimers(void);
+
+/* Conversion functions with rounding based on resolution */
+extern ktime_t ktimer_convert_timeval(struct ktimer *timer, struct timeval *tv);
+extern ktime_t ktimer_convert_timespec(struct ktimer *timer, struct timespec *ts);
+
+/* Posix timers current quirks */
+extern int get_ktimer_mono_res(clockid_t which_clock, struct timespec *tp);
+extern int get_ktimer_real_res(clockid_t which_clock, struct timespec *tp);
+
+/* nanosleep functions */
+long ktimer_nanosleep_mono(struct timespec *rqtp, struct timespec __user *rmtp, int mode);
+long ktimer_nanosleep_real(struct timespec *rqtp, struct timespec __user *rmtp, int mode);
+
+#if defined(CONFIG_SMP)
+extern void wait_for_ktimer(struct ktimer *timer);
+#else
+#define wait_for_ktimer(t) do {} while (0)
+#endif
+
+#define KTIME_REALTIME_RES (NSEC_PER_SEC/HZ)
+#define KTIME_MONOTONIC_RES (NSEC_PER_SEC/HZ)
+
+static inline void get_ktime_mono_ts(struct timespec *ts)
+{
+ unsigned long seq;
+ struct timespec tomono;
+ do {
+ seq = read_seqbegin(&xtime_lock);
+ getnstimeofday(ts);
+ tomono = wall_to_monotonic;
+ } while (read_seqretry(&xtime_lock, seq));
+
+
+ set_normalized_timespec(ts, ts->tv_sec + tomono.tv_sec,
+ ts->tv_nsec + tomono.tv_nsec);
+
+}
+
+static inline ktime_t do_get_ktime_mono(void)
+{
+ struct timespec now;
+
+ get_ktime_mono_ts(&now);
+ return timespec_to_ktime(now);
+}
+
+#define get_ktime_real_ts(ts) getnstimeofday(ts)
+static inline ktime_t do_get_ktime_real(void)
+{
+ struct timespec now;
+
+ getnstimeofday(&now);
+ return timespec_to_ktime(now);
+}
+
+#define clock_was_set() do { } while (0)
+extern void run_ktimer_queues(void);
+
+#endif
Index: linux-2.6.14-rc2-rt4/include/linux/posix-timers.h
===================================================================
--- linux-2.6.14-rc2-rt4.orig/include/linux/posix-timers.h
+++ linux-2.6.14-rc2-rt4/include/linux/posix-timers.h
@@ -51,10 +51,9 @@ struct k_itimer {
struct sigqueue *sigq; /* signal queue entry. */
union {
struct {
- struct timer_list timer;
- struct list_head abs_timer_entry; /* clock abs_timer_list */
- struct timespec wall_to_prev; /* wall_to_monotonic used when set */
- unsigned long incr; /* interval in jiffies */
+ struct ktimer timer;
+ ktime_t incr;
+ int overrun;
} real;
struct cpu_timer_list cpu;
struct {
@@ -66,10 +65,6 @@ struct k_itimer {
} it;
};
-struct k_clock_abs {
- struct list_head list;
- spinlock_t lock;
-};
struct k_clock {
int res; /* in nano seconds */
int (*clock_getres) (clockid_t which_clock, struct timespec *tp);
@@ -77,7 +72,7 @@ struct k_clock {
int (*clock_set) (clockid_t which_clock, struct timespec * tp);
int (*clock_get) (clockid_t which_clock, struct timespec * tp);
int (*timer_create) (struct k_itimer *timer);
- int (*nsleep) (clockid_t which_clock, int flags, struct timespec *);
+ int (*nsleep) (clockid_t which_clock, int flags, struct timespec *, struct timespec __user *);
int (*timer_set) (struct k_itimer * timr, int flags,
struct itimerspec * new_setting,
struct itimerspec * old_setting);
@@ -91,37 +86,104 @@ void register_posix_clock(clockid_t cloc
/* Error handlers for timer_create, nanosleep and settime */
int do_posix_clock_notimer_create(struct k_itimer *timer);
-int do_posix_clock_nonanosleep(clockid_t, int flags, struct timespec *);
+int do_posix_clock_nonanosleep(clockid_t, int flags, struct timespec *, struct timespec __user *);
int do_posix_clock_nosettime(clockid_t, struct timespec *tp);
/* function to call to trigger timer event */
int posix_timer_event(struct k_itimer *timr, int si_private);
-struct now_struct {
- unsigned long jiffies;
-};
-
-#define posix_get_now(now) (now)->jiffies = jiffies;
-#define posix_time_before(timer, now) \
- time_before((timer)->expires, (now)->jiffies)
-
-#define posix_bump_timer(timr, now) \
- do { \
- long delta, orun; \
- delta = now.jiffies - (timr)->it.real.timer.expires; \
- if (delta >= 0) { \
- orun = 1 + (delta / (timr)->it.real.incr); \
- (timr)->it.real.timer.expires += \
- orun * (timr)->it.real.incr; \
- (timr)->it_overrun += orun; \
- } \
- }while (0)
+#if (BITS_PER_LONG < 64)
+static inline ktime_t forward_posix_timer(struct k_itimer *t, ktime_t now)
+{
+ ktime_t delta = ktime_sub(now, t->it.real.timer.expires);
+ unsigned long orun = 1;
+
+ if (ktime_cmp_val(delta, <, KTIME_ZERO))
+ goto out;
+
+ if (unlikely(ktime_cmp(delta, >, t->it.real.incr))) {
+
+ int sft = 0;
+ u64 div, dclc, inc, dns;
+
+ dclc = dns = ktime_to_ns(delta);
+ div = inc = ktime_to_ns(t->it.real.incr);
+ /* Make sure the divisor is less than 2^32 */
+ while(div >> 32) {
+ sft++;
+ div >>= 1;
+ }
+ dclc >>= sft;
+ do_div(dclc, (unsigned long) div);
+ orun = (unsigned long) dclc;
+ if (likely(!(inc >> 32)))
+ dclc *= (unsigned long) inc;
+ else
+ dclc *= inc;
+ t->it.real.timer.expires = ktime_add_ns(t->it.real.timer.expires,
+ dclc);
+ } else {
+ t->it.real.timer.expires = ktime_add(t->it.real.timer.expires,
+ t->it.real.incr);
+ }
+ /*
+ * Here is the correction for exact. Also covers delta == incr
+ * which is the else clause above.
+ */
+ if (ktime_cmp(t->it.real.timer.expires, <=, now)) {
+ t->it.real.timer.expires = ktime_add(t->it.real.timer.expires,
+ t->it.real.incr);
+ orun++;
+ }
+ t->it_overrun += orun;
+
+ out:
+ return ktime_sub(t->it.real.timer.expires, now);
+}
+#else
+static inline ktime_t forward_posix_timer(struct k_itimer *t, ktime_t now)
+{
+ ktime_t delta = ktime_sub(now, t->it.real.timer.expires);
+ unsigned long orun = 1;
+
+ if (ktime_cmp_val(delta, <, KTIME_ZERO))
+ goto out;
+
+ if (unlikely(ktime_cmp(delta, >, t->it.real.incr))) {
+
+ u64 dns, inc;
+
+ dns = ktime_to_ns(delta);
+ inc = ktime_to_ns(t->it.real.incr);
+
+ orun = dns / inc;
+ t->it.real.timer.expires = ktime_add_ns(t->it.real.timer.expires,
+ orun * inc);
+ } else {
+ t->it.real.timer.expires = ktime_add(t->it.real.timer.expires,
+ t->it.real.incr);
+ }
+ /*
+ * Here is the correction for exact. Also covers delta == incr
+ * which is the else clause above.
+ */
+ if (ktime_cmp(t->it.real.timer.expires, <=, now)) {
+ t->it.real.timer.expires = ktime_add(t->it.real.timer.expires,
+ t->it.real.incr);
+ orun++;
+ }
+ t->it_overrun += orun;
+ out:
+ return ktime_sub(t->it.real.timer.expires, now);
+}
+#endif
int posix_cpu_clock_getres(clockid_t which_clock, struct timespec *);
int posix_cpu_clock_get(clockid_t which_clock, struct timespec *);
int posix_cpu_clock_set(clockid_t which_clock, const struct timespec *tp);
int posix_cpu_timer_create(struct k_itimer *);
-int posix_cpu_nsleep(clockid_t, int, struct timespec *);
+int posix_cpu_nsleep(clockid_t, int, struct timespec *,
+ struct timespec __user *);
int posix_cpu_timer_set(struct k_itimer *, int,
struct itimerspec *, struct itimerspec *);
int posix_cpu_timer_del(struct k_itimer *);
Index: linux-2.6.14-rc2-rt4/include/linux/sched.h
===================================================================
--- linux-2.6.14-rc2-rt4.orig/include/linux/sched.h
+++ linux-2.6.14-rc2-rt4/include/linux/sched.h
@@ -104,6 +104,7 @@ extern unsigned long nr_iowait(void);
#include <linux/param.h>
#include <linux/resource.h>
#include <linux/timer.h>
+#include <linux/ktimer.h>
#include <asm/processor.h>
@@ -346,8 +347,7 @@ struct signal_struct {
struct list_head posix_timers;
/* ITIMER_REAL timer for the process */
- struct timer_list real_timer;
- unsigned long it_real_value, it_real_incr;
+ struct ktimer real_timer;
/* ITIMER_PROF and ITIMER_VIRTUAL timers for the process */
cputime_t it_prof_expires, it_virt_expires;
Index: linux-2.6.14-rc2-rt4/include/linux/timer.h
===================================================================
--- linux-2.6.14-rc2-rt4.orig/include/linux/timer.h
+++ linux-2.6.14-rc2-rt4/include/linux/timer.h
@@ -91,6 +91,6 @@ static inline void add_timer(struct time
extern void init_timers(void);
extern void run_local_timers(void);
-extern void it_real_fn(unsigned long);
+extern void it_real_fn(void *);
#endif
Index: linux-2.6.14-rc2-rt4/init/main.c
===================================================================
--- linux-2.6.14-rc2-rt4.orig/init/main.c
+++ linux-2.6.14-rc2-rt4/init/main.c
@@ -485,6 +485,7 @@ asmlinkage void __init start_kernel(void
init_IRQ();
pidhash_init();
init_timers();
+ init_ktimers();
softirq_init();
time_init();
Index: linux-2.6.14-rc2-rt4/kernel/Makefile
===================================================================
--- linux-2.6.14-rc2-rt4.orig/kernel/Makefile
+++ linux-2.6.14-rc2-rt4/kernel/Makefile
@@ -7,7 +7,8 @@ obj-y = sched.o fork.o exec_domain.o
sysctl.o capability.o ptrace.o timer.o user.o \
signal.o sys.o kmod.o workqueue.o pid.o \
rcupdate.o intermodule.o extable.o params.o posix-timers.o \
- kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o
+ kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o \
+ ktimers.o
obj-$(CONFIG_FUTEX) += futex.o
obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
Index: linux-2.6.14-rc2-rt4/kernel/exit.c
===================================================================
--- linux-2.6.14-rc2-rt4.orig/kernel/exit.c
+++ linux-2.6.14-rc2-rt4/kernel/exit.c
@@ -842,7 +842,7 @@ fastcall NORET_TYPE void do_exit(long co
update_mem_hiwater(tsk);
group_dead = atomic_dec_and_test(&tsk->signal->live);
if (group_dead) {
- del_timer_sync(&tsk->signal->real_timer);
+ stop_ktimer(&tsk->signal->real_timer);
acct_process(code);
}
exit_mm(tsk);
Index: linux-2.6.14-rc2-rt4/kernel/fork.c
===================================================================
--- linux-2.6.14-rc2-rt4.orig/kernel/fork.c
+++ linux-2.6.14-rc2-rt4/kernel/fork.c
@@ -804,10 +804,9 @@ static inline int copy_signal(unsigned l
init_sigpending(&sig->shared_pending);
INIT_LIST_HEAD(&sig->posix_timers);
- sig->it_real_value = sig->it_real_incr = 0;
+ init_ktimer_mono(&sig->real_timer);
sig->real_timer.function = it_real_fn;
- sig->real_timer.data = (unsigned long) tsk;
- init_timer(&sig->real_timer);
+ sig->real_timer.data = tsk;
sig->it_virt_expires = cputime_zero;
sig->it_virt_incr = cputime_zero;
Index: linux-2.6.14-rc2-rt4/kernel/itimer.c
===================================================================
--- linux-2.6.14-rc2-rt4.orig/kernel/itimer.c
+++ linux-2.6.14-rc2-rt4/kernel/itimer.c
@@ -12,36 +12,22 @@
#include <linux/syscalls.h>
#include <linux/time.h>
#include <linux/posix-timers.h>
+#include <linux/ktimer.h>
#include <asm/uaccess.h>
-static unsigned long it_real_value(struct signal_struct *sig)
-{
- unsigned long val = 0;
- if (timer_pending(&sig->real_timer)) {
- val = sig->real_timer.expires - jiffies;
-
- /* look out for negative/zero itimer.. */
- if ((long) val <= 0)
- val = 1;
- }
- return val;
-}
-
int do_getitimer(int which, struct itimerval *value)
{
struct task_struct *tsk = current;
- unsigned long interval, val;
+ ktime_t interval, val;
cputime_t cinterval, cval;
switch (which) {
case ITIMER_REAL:
- spin_lock_irq(&tsk->sighand->siglock);
- interval = tsk->signal->it_real_incr;
- val = it_real_value(tsk->signal);
- spin_unlock_irq(&tsk->sighand->siglock);
- jiffies_to_timeval(val, &value->it_value);
- jiffies_to_timeval(interval, &value->it_interval);
+ interval = tsk->signal->real_timer.interval;
+ val = get_remtime_ktimer(&tsk->signal->real_timer, NSEC_PER_USEC);
+ value->it_value = ktime_to_timeval(val);
+ value->it_interval = ktime_to_timeval(interval);
break;
case ITIMER_VIRTUAL:
read_lock(&tasklist_lock);
@@ -113,59 +99,35 @@ asmlinkage long sys_getitimer(int which,
}
-void it_real_fn(unsigned long __data)
+/*
+ * The timer is automagically restarted, when interval != 0
+ */
+void it_real_fn(void *data)
{
- struct task_struct * p = (struct task_struct *) __data;
- unsigned long inc = p->signal->it_real_incr;
-
- send_group_sig_info(SIGALRM, SEND_SIG_PRIV, p);
-
- /*
- * Now restart the timer if necessary. We don't need any locking
- * here because do_setitimer makes sure we have finished running
- * before it touches anything.
- * Note, we KNOW we are (or should be) at a jiffie edge here so
- * we don't need the +1 stuff. Also, we want to use the prior
- * expire value so as to not "slip" a jiffie if we are late.
- * Deal with requesting a time prior to "now" here rather than
- * in add_timer.
- */
- if (!inc)
- return;
- while (time_before_eq(p->signal->real_timer.expires, jiffies))
- p->signal->real_timer.expires += inc;
- add_timer(&p->signal->real_timer);
+ send_group_sig_info(SIGALRM, SEND_SIG_PRIV, data);
}
int do_setitimer(int which, struct itimerval *value, struct itimerval *ovalue)
{
struct task_struct *tsk = current;
- unsigned long val, interval, expires;
+ struct ktimer *timer;
+ ktime_t expires;
cputime_t cval, cinterval, nval, ninterval;
switch (which) {
case ITIMER_REAL:
-again:
- spin_lock_irq(&tsk->sighand->siglock);
- interval = tsk->signal->it_real_incr;
- val = it_real_value(tsk->signal);
- /* We are sharing ->siglock with it_real_fn() */
- if (try_to_del_timer_sync(&tsk->signal->real_timer) < 0) {
- spin_unlock_irq(&tsk->sighand->siglock);
- goto again;
- }
- tsk->signal->it_real_incr =
- timeval_to_jiffies(&value->it_interval);
- expires = timeval_to_jiffies(&value->it_value);
- if (expires)
- mod_timer(&tsk->signal->real_timer,
- jiffies + 1 + expires);
- spin_unlock_irq(&tsk->sighand->siglock);
+ timer = &tsk->signal->real_timer;
+ stop_ktimer(timer);
if (ovalue) {
- jiffies_to_timeval(val, &ovalue->it_value);
- jiffies_to_timeval(interval,
- &ovalue->it_interval);
- }
+ ovalue->it_value = ktime_to_timeval(
+ get_remtime_ktimer(timer, NSEC_PER_USEC));
+ ovalue->it_interval = ktime_to_timeval(timer->interval);
+ }
+ timer->interval = ktimer_convert_timeval(timer, &value->it_interval);
+ expires = ktimer_convert_timeval(timer, &value->it_value);
+ if (ktime_cmp_val(expires, != , KTIME_ZERO))
+ modify_ktimer(timer, &expires, KTIMER_REL | KTIMER_NOCHECK);
+
break;
case ITIMER_VIRTUAL:
nval = timeval_to_cputime(&value->it_value);
Index: linux-2.6.14-rc2-rt4/kernel/ktimers.c
===================================================================
--- /dev/null
+++ linux-2.6.14-rc2-rt4/kernel/ktimers.c
@@ -0,0 +1,824 @@
+/*
+ * linux/kernel/ktimers.c
+ *
+ * Copyright(C) 2005 Thomas Gleixner <tglx@linutronix.de>
+ *
+ * Kudos to Ingo Molnar for review, criticism, ideas
+ *
+ * Credits:
+ * Lot of ideas and implementation details taken from
+ * timer.c and related code
+ *
+ * Kernel timers
+ *
+ * In contrast to the timeout related API found in kernel/timer.c,
+ * ktimers provide finer resolution and accuracy depending on system
+ * configuration and capabilities.
+ *
+ * These timers are used for
+ * - itimers
+ * - posixtimers
+ * - nanosleep
+ * - precise in kernel timing
+ *
+ * Please do not abuse this API for simple timeouts.
+ *
+ * For licencing details see kernel-base/COPYING
+ *
+ */
+
+#include <linux/cpu.h>
+#include <linux/interrupt.h>
+#include <linux/ktimer.h>
+#include <linux/module.h>
+#include <linux/notifier.h>
+#include <linux/percpu.h>
+#include <linux/syscalls.h>
+
+#include <asm/uaccess.h>
+
+static ktime_t get_ktime_mono(void);
+static ktime_t get_ktime_real(void);
+
+/* The time bases */
+#define MAX_KTIMER_BASES 2
+static DEFINE_PER_CPU(struct ktimer_base, ktimer_bases[MAX_KTIMER_BASES]) =
+{
+ {
+ .index = CLOCK_REALTIME,
+ .name = "Realtime",
+ .get_time = &get_ktime_real,
+ .resolution = KTIME_REALTIME_RES,
+ },
+ {
+ .index = CLOCK_MONOTONIC,
+ .name = "Monotonic",
+ .get_time = &get_ktime_mono,
+ .resolution = KTIME_MONOTONIC_RES,
+ },
+};
+
+/*
+ * The SMP/UP kludge goes here
+ */
+#if defined(CONFIG_SMP)
+
+#define set_running_timer(b,t) b->running_timer = t
+#define wake_up_timer_waiters(b) wake_up(&b->wait_for_running_timer)
+#define ktimer_base_can_change (1)
+/*
+ * Wait for a running timer
+ */
+void wait_for_ktimer(struct ktimer *timer)
+{
+ struct ktimer_base *base = timer->base;
+
+ if (base && base->running_timer == timer)
+ wait_event(base->wait_for_running_timer,
+ base->running_timer != timer);
+}
+
+/*
+ * We are using hashed locking: holding per_cpu(ktimer_bases)[n].lock
+ * means that all timers which are tied to this base via timer->base are
+ * locked, and the base itself is locked too.
+ *
+ * So __run_timers/migrate_timers can safely modify all timers which could
+ * be found on the lists/queues.
+ *
+ * When the timer's base is locked, and the timer removed from list, it is
+ * possible to set timer->base = NULL and drop the lock: the timer remains
+ * locked.
+ */
+static inline struct ktimer_base *lock_ktimer_base(struct ktimer *timer,
+ unsigned long *flags)
+{
+ struct ktimer_base *base;
+
+ for (;;) {
+ base = timer->base;
+ if (likely(base != NULL)) {
+ spin_lock_irqsave(&base->lock, *flags);
+ if (likely(base == timer->base))
+ return base;
+ /* The timer has migrated to another CPU */
+ spin_unlock_irqrestore(&base->lock, *flags);
+ }
+ cpu_relax();
+ }
+}
+
+static inline struct ktimer_base *switch_ktimer_base(struct ktimer *timer,
+ struct ktimer_base *base)
+{
+ int ktidx = base->index;
+ struct ktimer_base *new_base = &__get_cpu_var(ktimer_bases[ktidx]);
+
+ if (base != new_base) {
+ /*
+ * We are trying to schedule the timer on the local CPU.
+ * However we can't change timer's base while it is running,
+ * so we keep it on the same CPU. No hassle vs. reprogramming
+ * the event source in the high resolution case. The softirq
+ * code will take care of this when the timer function has
+ * completed. There is no conflict as we hold the lock until
+ * the timer is enqueued.
+ */
+ if (unlikely(base->running_timer == timer)) {
+ return base;
+ } else {
+ /* See the comment in lock_ktimer_base() */
+ timer->base = NULL;
+ spin_unlock(&base->lock);
+ spin_lock(&new_base->lock);
+ timer->base = new_base;
+ }
+ }
+ return new_base;
+}
+
+/*
+ * Get the timer base unlocked
+ *
+ * Take care of timer->base = NULL in switch_ktimer_base !
+ */
+static inline struct ktimer_base *get_ktimer_base_unlocked(struct ktimer *timer)
+{
+ struct ktimer_base *base;
+ while (!(base = timer->base));
+ return base;
+}
+#else
+
+#define set_running_timer(b,t) do {} while (0)
+#define wake_up_timer_waiters(b) do {} while (0)
+
+static inline struct ktimer_base *lock_ktimer_base(struct ktimer *timer,
+ unsigned long *flags)
+{
+ struct ktimer_base *base;
+
+ base = timer->base;
+ spin_lock_irqsave(&base->lock, *flags);
+ return base;
+}
+
+#define switch_ktimer_base(t, b) b
+
+#define get_ktimer_base_unlocked(t) (t)->base
+#define ktimer_base_can_change (0)
+
+#endif /* !CONFIG_SMP */
+
+/*
+ * Convert timespec to ktime_t with resolution adjustment
+ *
+ * Note: We can access base without locking here, as ktimers can
+ * migrate between CPUs but can not be moved from one clock source to
+ * another. The clock source binding is set at init_ktimer_XXX.
+ */
+ktime_t ktimer_convert_timespec(struct ktimer *timer, struct timespec *ts)
+{
+ struct ktimer_base *base = get_ktimer_base_unlocked(timer);
+ ktime_t t;
+ long rem = ts->tv_nsec % base->resolution;
+
+ t = ktime_set(ts->tv_sec, ts->tv_nsec);
+
+ /* Check, if the value has to be rounded */
+ if (rem)
+ t = ktime_add_ns(t, base->resolution - rem);
+ return t;
+}
+
+/*
+ * Convert timeval to ktime_t with resolution adjustment
+ */
+ktime_t ktimer_convert_timeval(struct ktimer *timer, struct timeval *tv)
+{
+ struct timespec ts;
+
+ ts.tv_sec = tv->tv_sec;
+ ts.tv_nsec = tv->tv_usec * NSEC_PER_USEC;
+
+ return ktimer_convert_timespec(timer, &ts);
+}
+
+/*
+ * Internal function to add or (re)start a timer
+ *
+ * The timer is inserted in expiry order.
+ * Insertion into the red black tree is O(log(n))
+ *
+ */
+static int enqueue_ktimer(struct ktimer *timer, struct ktimer_base *base,
+ ktime_t *tim, int mode)
+{
+ struct rb_node **link = &base->active.rb_node;
+ struct rb_node *parent = NULL;
+ struct ktimer *entry;
+ struct list_head *prev = &base->pending;
+ ktime_t now;
+
+ /* Get current time */
+ now = base->get_time();
+
+ /* Timer expiry mode */
+ switch (mode & ~KTIMER_NOCHECK) {
+ case KTIMER_ABS:
+ timer->expires = *tim;
+ break;
+ case KTIMER_REL:
+ timer->expires = ktime_add(now, *tim);
+ break;
+ case KTIMER_INCR:
+ timer->expires = ktime_add(timer->expires, *tim);
+ break;
+ case KTIMER_FORWARD:
+ while ktime_cmp(timer->expires, <= , now) {
+ timer->expires = ktime_add(timer->expires, *tim);
+ timer->overrun++;
+ }
+ goto nocheck;
+ case KTIMER_REARM:
+ while ktime_cmp(timer->expires, <= , now) {
+ timer->expires = ktime_add(timer->expires, timer->interval);
+ timer->overrun++;
+ }
+ goto nocheck;
+ case KTIMER_RESTART:
+ break;
+ default:
+ BUG();
+ }
+
+ /* Already expired.*/
+ if ktime_cmp(timer->expires, <=, now) {
+ timer->expired = now;
+ /* The caller takes care of expiry */
+ if (!(mode & KTIMER_NOCHECK))
+ return -1;
+ }
+ nocheck:
+
+ while (*link) {
+ parent = *link;
+ entry = rb_entry(parent, struct ktimer, node);
+ /*
+ * We don't care about collisions. Nodes with
+ * the same expiry time stay together.
+ */
+ if (ktimer_before(timer, entry))
+ link = &(*link)->rb_left;
+ else {
+ link = &(*link)->rb_right;
+ prev = &entry->list;
+ }
+ }
+
+ rb_link_node(&timer->node, parent, link);
+ rb_insert_color(&timer->node, &base->active);
+ list_add(&timer->list, prev);
+ timer->status = KTIMER_PENDING;
+ base->count++;
+ return 0;
+}
+
+/*
+ * Internal helper to remove a timer
+ *
+ * The function allows automatic rearming for interval
+ * timers.
+ *
+ */
+static inline void do_remove_ktimer(struct ktimer *timer,
+ struct ktimer_base *base, int rearm)
+{
+ list_del(&timer->list);
+ rb_erase(&timer->node, &base->active);
+ timer->node.rb_parent = KTIMER_POISON;
+ timer->status = KTIMER_INACTIVE;
+ base->count--;
+ BUG_ON(base->count < 0);
+ /* Auto rearm the timer ? */
+ if (rearm && ktime_cmp_val(timer->interval, !=, KTIME_ZERO))
+ enqueue_ktimer(timer, base, NULL, KTIMER_REARM);
+}
+
+/*
+ * Called with base lock held
+ */
+static inline int remove_ktimer(struct ktimer *timer, struct ktimer_base *base)
+{
+ if (ktimer_active(timer)) {
+ do_remove_ktimer(timer, base, KTIMER_NOREARM);
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Internal function to (re)start a timer.
+ */
+static int internal_restart_ktimer(struct ktimer *timer, ktime_t *tim,
+ int mode)
+{
+ struct ktimer_base *base, *new_base;
+ unsigned long flags;
+ int ret;
+
+ BUG_ON(!timer->function);
+
+ base = lock_ktimer_base(timer, &flags);
+
+ /* Remove an active timer from the queue */
+ ret = remove_ktimer(timer, base);
+
+ /* Switch the timer base, if necessary */
+ new_base = switch_ktimer_base(timer, base);
+
+ /*
+ * When the new timer setting is already expired,
+ * let the calling code deal with it.
+ */
+ if (enqueue_ktimer(timer, new_base, tim, mode))
+ ret = -1;
+
+ spin_unlock_irqrestore(&new_base->lock, flags);
+ return ret;
+}
+
+/***
+ * modify_ktimer - modify a running timer
+ * @timer: the timer to be modified
+ * @tim: expiry time (required)
+ * @mode: timer setup mode
+ *
+ */
+int modify_ktimer(struct ktimer *timer, ktime_t *tim, int mode)
+{
+ BUG_ON(!tim || !timer->function);
+ return internal_restart_ktimer(timer, tim, mode);
+}
+
+/***
+ * start_ktimer - start a timer on current CPU
+ * @timer: the timer to be added
+ * @tim: expiry time (optional, if not set in the timer)
+ * @mode: timer setup mode
+ */
+int start_ktimer(struct ktimer *timer, ktime_t *tim, int mode)
+{
+ BUG_ON(ktimer_active(timer) || !timer->function);
+
+ return internal_restart_ktimer(timer, tim, mode);
+}
+
+/***
+ * try_to_stop_ktimer - try to deactivate a timer
+ */
+int try_to_stop_ktimer(struct ktimer *timer)
+{
+ struct ktimer_base *base;
+ unsigned long flags;
+ int ret = -1;
+
+ base = lock_ktimer_base(timer, &flags);
+
+ if (base->running_timer != timer) {
+ ret = remove_ktimer(timer, base);
+ if (ret)
+ timer->expired = base->get_time();
+ }
+
+ spin_unlock_irqrestore(&base->lock, flags);
+
+ return ret;
+
+}
+
+/***
+ * stop_ktimer - deactivate a timer and wait for the handler to finish.
+ * @timer: the timer to be deactivated
+ *
+ */
+int stop_ktimer(struct ktimer *timer)
+{
+ for (;;) {
+ int ret = try_to_stop_ktimer(timer);
+ if (ret >= 0)
+ return ret;
+ wait_for_ktimer(timer);
+ }
+}
+
+/***
+ * get_remtime_ktimer - get remaining time for the timer
+ * @timer: the timer to read
+ * @fake: when fake > 0 a pending, but expired timer
+ * returns fake (itimers need this, uurg)
+ */
+ktime_t get_remtime_ktimer(struct ktimer *timer, long fake)
+{
+ struct ktimer_base *base;
+ unsigned long flags;
+ ktime_t rem;
+
+ base = lock_ktimer_base(timer, &flags);
+ if (ktimer_active(timer)) {
+ rem = ktime_sub(timer->expires, base->get_time());
+ if (fake && ktime_cmp_val(rem, <=, KTIME_ZERO))
+ rem = ktime_set(0, fake);
+ } else {
+ if (!fake)
+ rem = ktime_sub(timer->expires, base->get_time());
+ else
+ ktime_set_zero(rem);
+ }
+ spin_unlock_irqrestore(&base->lock, flags);
+ return rem;
+}
+
+/***
+ * get_expiry_ktimer - get expiry time for the timer
+ * @timer: the timer to read
+ * @now: if != NULL store current base->time
+ */
+ktime_t get_expiry_ktimer(struct ktimer *timer, ktime_t *now)
+{
+ struct ktimer_base *base;
+ unsigned long flags;
+ ktime_t expiry;
+
+ base = lock_ktimer_base(timer, &flags);
+ expiry = timer->expires;
+ if (now)
+ *now = base->get_time();
+ spin_unlock_irqrestore(&base->lock, flags);
+ return expiry;
+}
+
+/*
+ * Functions related to clock sources
+ */
+
+static inline void ktimer_common_init(struct ktimer *timer)
+{
+ memset(timer, 0, sizeof(struct ktimer));
+ timer->node.rb_parent = KTIMER_POISON;
+}
+
+/*
+ * Get monotonic time
+ */
+static ktime_t get_ktime_mono(void)
+{
+ return do_get_ktime_mono();
+}
+
+/***
+ * init_ktimer_mono - initialize a timer on monotonic time
+ * @timer: the timer to be initialized
+ *
+ */
+void fastcall init_ktimer_mono(struct ktimer *timer)
+{
+ ktimer_common_init(timer);
+ timer->base =
+ &per_cpu(ktimer_bases, raw_smp_processor_id())[CLOCK_MONOTONIC];
+}
+
+/***
+ * get_ktimer_mono_res - get the monotonic timer resolution
+ *
+ */
+int get_ktimer_mono_res(clockid_t which_clock, struct timespec *tp)
+{
+ tp->tv_sec = 0;
+ tp->tv_nsec =
+ per_cpu(ktimer_bases, raw_smp_processor_id())[CLOCK_MONOTONIC].resolution;
+ return 0;
+}
+
+/*
+ * Get real time
+ */
+static ktime_t get_ktime_real(void)
+{
+ return do_get_ktime_real();
+}
+
+/***
+ * init_ktimer_real - initialize a timer on real time
+ * @timer: the timer to be initialized
+ *
+ */
+void fastcall init_ktimer_real(struct ktimer *timer)
+{
+ ktimer_common_init(timer);
+ timer->base =
+ &per_cpu(ktimer_bases, raw_smp_processor_id())[CLOCK_REALTIME];
+}
+
+/***
+ * get_ktimer_real_res - get the real timer resolution
+ *
+ */
+int get_ktimer_real_res(clockid_t which_clock, struct timespec *tp)
+{
+ tp->tv_sec = 0;
+ tp->tv_nsec =
+ per_cpu(ktimer_bases, raw_smp_processor_id())[CLOCK_REALTIME].resolution;
+ return 0;
+}
+
+/*
+ * The per base runqueue
+ */
+static inline void run_ktimer_queue(struct ktimer_base *base)
+{
+ ktime_t now = base->get_time();
+
+ spin_lock_irq(&base->lock);
+ while (!list_empty(&base->pending)) {
+ void (*fn)(void *);
+ void *data;
+ struct ktimer *timer = list_entry(base->pending.next,
+ struct ktimer, list);
+ if ktime_cmp(now, <=, timer->expires)
+ break;
+ timer->expired = now;
+ fn = timer->function;
+ data = timer->data;
+ set_running_timer(base, timer);
+ do_remove_ktimer(timer, base, KTIMER_REARM);
+ spin_unlock_irq(&base->lock);
+ fn(data);
+ spin_lock_irq(&base->lock);
+ set_running_timer(base, NULL);
+ }
+ spin_unlock_irq(&base->lock);
+ wake_up_timer_waiters(base);
+}
+
+/*
+ * Called from timer softirq every jiffy
+ */
+void run_ktimer_queues(void)
+{
+ struct ktimer_base *base = __get_cpu_var(ktimer_bases);
+ int i;
+
+ for (i = 0; i < MAX_KTIMER_BASES; i++)
+ run_ktimer_queue(&base[i]);
+}
+
+/*
+ * Functions related to initialization
+ */
+static void __devinit init_ktimers_cpu(int cpu)
+{
+ struct ktimer_base *base = per_cpu(ktimer_bases, cpu);
+ int i;
+
+ for (i = 0; i < MAX_KTIMER_BASES; i++) {
+ spin_lock_init(&base->lock);
+ INIT_LIST_HEAD(&base->pending);
+ init_waitqueue_head(&base->wait_for_running_timer);
+ base++;
+ }
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
+static void migrate_ktimer_list(struct ktimer_base *old_base,
+ struct ktimer_base *new_base)
+{
+ struct ktimer *timer;
+ struct rb_node *node;
+
+ while ((node = rb_first(&old_base->active))) {
+ timer = rb_entry(node, struct ktimer, node);
+ remove_ktimer(timer, old_base);
+ timer->base = new_base;
+ enqueue_ktimer(timer, new_base, NULL, KTIMER_RESTART);
+ }
+}
+
+static void __devinit migrate_ktimers(int cpu)
+{
+ struct ktimer_base *old_base;
+ struct ktimer_base *new_base;
+ int i;
+
+ BUG_ON(cpu_online(cpu));
+ old_base = per_cpu(ktimer_bases, cpu);
+ new_base = get_cpu_var(ktimer_bases);
+
+ local_irq_disable();
+
+ for (i = 0; i < MAX_KTIMER_BASES; i++) {
+
+ spin_lock(&new_base->lock);
+ spin_lock(&old_base->lock);
+
+ if (old_base->running_timer)
+ BUG();
+
+ migrate_ktimer_list(old_base, new_base);
+
+ spin_unlock(&old_base->lock);
+ spin_unlock(&new_base->lock);
+ old_base++;
+ new_base++;
+ }
+
+ local_irq_enable();
+ put_cpu_var(ktimer_bases);
+}
+#endif /* CONFIG_HOTPLUG_CPU */
+
+static int __devinit ktimer_cpu_notify(struct notifier_block *self,
+ unsigned long action, void *hcpu)
+{
+ long cpu = (long)hcpu;
+ switch(action) {
+ case CPU_UP_PREPARE:
+ init_ktimers_cpu(cpu);
+ break;
+#ifdef CONFIG_HOTPLUG_CPU
+ case CPU_DEAD:
+ migrate_ktimers(cpu);
+ break;
+#endif
+ default:
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block __devinitdata ktimers_nb = {
+ .notifier_call = ktimer_cpu_notify,
+};
+
+void __init init_ktimers(void)
+{
+ ktimer_cpu_notify(&ktimers_nb, (unsigned long)CPU_UP_PREPARE,
+ (void *)(long)smp_processor_id());
+ register_cpu_notifier(&ktimers_nb);
+}
+
+/*
+ * system interface related functions
+ */
+static void process_ktimer(void *data)
+{
+ wake_up_process(data);
+}
+
+/**
+ * schedule_ktimer - sleep until timeout
+ * @timer: the timer used for the sleep
+ * @t: timeout value, updated to the absolute expiry time
+ * @state: state to use for sleep
+ * @mode: timer setup mode (absolute or relative timeout)
+ *
+ * Make the current task sleep until the timeout @t has elapsed.
+ *
+ * You can set the task state as follows -
+ *
+ * %TASK_UNINTERRUPTIBLE - at least the timeout @t is guaranteed to
+ * pass before the routine returns. The routine will return 0.
+ *
+ * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
+ * delivered to the current task. In this case the remaining time
+ * will be returned.
+ *
+ * The current task state is guaranteed to be TASK_RUNNING when this
+ * routine returns.
+ *
+ */
+static fastcall ktime_t __sched schedule_ktimer(struct ktimer *timer,
+ ktime_t *t, int state, int mode)
+{
+ timer->data = current;
+ timer->function = process_ktimer;
+
+ current->state = state;
+ if (start_ktimer(timer, t, mode)) {
+ current->state = TASK_RUNNING;
+ goto out;
+ }
+ if (current->state != TASK_RUNNING)
+ schedule();
+ stop_ktimer(timer);
+ out:
+ /* Store the absolute expiry time */
+ *t = timer->expires;
+ /* Return the remaining time */
+ return ktime_sub(timer->expires, timer->expired);
+}
+
+static long __sched nanosleep_restart(struct ktimer *timer,
+ struct restart_block *restart)
+{
+ struct timespec tu;
+ ktime_t t, rem;
+ void *rfn = restart->fn;
+ struct timespec __user *rmtp = (struct timespec __user *) restart->arg2;
+
+ restart->fn = do_no_restart_syscall;
+
+ t = ktime_set_low_high(restart->arg0, restart->arg1);
+
+ rem = schedule_ktimer(timer, &t, TASK_INTERRUPTIBLE, KTIMER_ABS);
+
+ if (ktime_cmp_val(rem, <=, KTIME_ZERO))
+ return 0;
+
+ tu = ktime_to_timespec(rem);
+ if (rmtp && copy_to_user(rmtp, &tu, sizeof(tu)))
+ return -EFAULT;
+
+ restart->fn = rfn;
+ /* The other values in restart are already filled in */
+ return -ERESTART_RESTARTBLOCK;
+}
+
+static long __sched nanosleep_restart_mono(struct restart_block *restart)
+{
+ struct ktimer timer;
+
+ init_ktimer_mono(&timer);
+ return nanosleep_restart(&timer, restart);
+}
+
+static long __sched nanosleep_restart_real(struct restart_block *restart)
+{
+ struct ktimer timer;
+
+ init_ktimer_real(&timer);
+ return nanosleep_restart(&timer, restart);
+}
+
+static long ktimer_nanosleep(struct ktimer *timer, struct timespec *rqtp,
+ struct timespec __user *rmtp, int mode,
+ long (*rfn)(struct restart_block *))
+{
+ struct timespec tu;
+ ktime_t rem, t;
+ struct restart_block *restart;
+
+ t = ktimer_convert_timespec(timer, rqtp);
+
+ /* t is updated to absolute expiry time ! */
+ rem = schedule_ktimer(timer, &t, TASK_INTERRUPTIBLE, mode);
+
+ if (ktime_cmp_val(rem, <=, KTIME_ZERO))
+ return 0;
+
+ tu = ktime_to_timespec(rem);
+
+ if (rmtp && copy_to_user(rmtp, &tu, sizeof(tu)))
+ return -EFAULT;
+
+ restart = &current_thread_info()->restart_block;
+ restart->fn = rfn;
+ restart->arg0 = ktime_get_low(t);
+ restart->arg1 = ktime_get_high(t);
+ restart->arg2 = (unsigned long) rmtp;
+ return -ERESTART_RESTARTBLOCK;
+
+}
+
+long ktimer_nanosleep_mono(struct timespec *rqtp,
+ struct timespec __user *rmtp, int mode)
+{
+ struct ktimer timer;
+
+ init_ktimer_mono(&timer);
+ return ktimer_nanosleep(&timer, rqtp, rmtp, mode, nanosleep_restart_mono);
+}
+
+long ktimer_nanosleep_real(struct timespec *rqtp,
+ struct timespec __user *rmtp, int mode)
+{
+ struct ktimer timer;
+
+ init_ktimer_real(&timer);
+ return ktimer_nanosleep(&timer, rqtp, rmtp, mode, nanosleep_restart_real);
+}
+
+asmlinkage long sys_nanosleep(struct timespec __user *rqtp,
+ struct timespec __user *rmtp)
+{
+ struct timespec tu;
+
+ if (copy_from_user(&tu, rqtp, sizeof(tu)))
+ return -EFAULT;
+
+ if (!timespec_valid(&tu))
+ return -EINVAL;
+
+ return ktimer_nanosleep_mono(&tu, rmtp, KTIMER_REL);
+}
+
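Reviewer note, not part of the patch: the two pieces of arithmetic that kernel/ktimers.c relies on most are the round-up to the clock resolution in ktimer_convert_timespec() and the forward loop that counts overruns in enqueue_ktimer(). The following is only a userspace sketch of both, assuming the scalar (64bit nanosecond) storage format. The names ns_time, round_up_to_resolution() and forward_expiry() are hypothetical and do not exist in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the scalar ktime_t: plain nanoseconds. */
typedef int64_t ns_time;

#define NS_PER_SEC 1000000000LL

/*
 * Round an expiry up to the next multiple of the clock resolution.
 * Like ktimer_convert_timespec(), only the nanosecond part is checked,
 * which assumes that the resolution divides one second evenly.
 */
static ns_time round_up_to_resolution(int64_t sec, long nsec, long resolution)
{
	ns_time t = sec * NS_PER_SEC + nsec;
	long rem;

	assert(resolution > 0);
	rem = nsec % resolution;

	/* Only adjust when the value is not already on a resolution boundary */
	if (rem)
		t += resolution - rem;
	return t;
}

/*
 * Move an expired periodic timer forward in whole intervals until it
 * expires after 'now', counting every interval that was added. This is
 * the shape of the KTIMER_FORWARD loop. The interval must be positive,
 * otherwise the loop would never terminate.
 */
static ns_time forward_expiry(ns_time expires, ns_time interval, ns_time now,
			      int *overrun)
{
	assert(interval > 0);

	while (expires <= now) {
		expires += interval;
		(*overrun)++;
	}
	return expires;
}
```

With a 1 ms resolution, a request of 1.0015 s is rounded up to 1.002 s, and a timer that is already on a boundary is left alone. A timer with expiry 100 and interval 30 that is forwarded at now = 175 ends up at 190 with three counted intervals.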
Index: linux-2.6.14-rc2-rt4/kernel/posix-cpu-timers.c
===================================================================
--- linux-2.6.14-rc2-rt4.orig/kernel/posix-cpu-timers.c
+++ linux-2.6.14-rc2-rt4/kernel/posix-cpu-timers.c
@@ -1394,7 +1394,7 @@ void set_process_cpu_timer(struct task_s
static long posix_cpu_clock_nanosleep_restart(struct restart_block *);
int posix_cpu_nsleep(clockid_t which_clock, int flags,
- struct timespec *rqtp)
+ struct timespec *rqtp, struct timespec __user *rmtp)
{
struct restart_block *restart_block =
&current_thread_info()->restart_block;
@@ -1419,7 +1419,6 @@ int posix_cpu_nsleep(clockid_t which_clo
error = posix_cpu_timer_create(&timer);
timer.it_process = current;
if (!error) {
- struct timespec __user *rmtp;
static struct itimerspec zero_it;
struct itimerspec it = { .it_value = *rqtp,
.it_interval = {} };
@@ -1466,7 +1465,6 @@ int posix_cpu_nsleep(clockid_t which_clo
/*
* Report back to the user the time still remaining.
*/
- rmtp = (struct timespec __user *) restart_block->arg1;
if (rmtp != NULL && !(flags & TIMER_ABSTIME) &&
copy_to_user(rmtp, &it.it_value, sizeof *rmtp))
return -EFAULT;
@@ -1474,6 +1472,7 @@ int posix_cpu_nsleep(clockid_t which_clo
restart_block->fn = posix_cpu_clock_nanosleep_restart;
/* Caller already set restart_block->arg1 */
restart_block->arg0 = which_clock;
+ restart_block->arg1 = (unsigned long) rmtp;
restart_block->arg2 = rqtp->tv_sec;
restart_block->arg3 = rqtp->tv_nsec;
@@ -1487,10 +1486,15 @@ static long
posix_cpu_clock_nanosleep_restart(struct restart_block *restart_block)
{
clockid_t which_clock = restart_block->arg0;
- struct timespec t = { .tv_sec = restart_block->arg2,
- .tv_nsec = restart_block->arg3 };
+ struct timespec __user *rmtp;
+ struct timespec t;
+
+ rmtp = (struct timespec __user *) restart_block->arg1;
+ t.tv_sec = restart_block->arg2;
+ t.tv_nsec = restart_block->arg3;
+
restart_block->fn = do_no_restart_syscall;
- return posix_cpu_nsleep(which_clock, TIMER_ABSTIME, &t);
+ return posix_cpu_nsleep(which_clock, TIMER_ABSTIME, &t, rmtp);
}
@@ -1511,9 +1515,10 @@ static int process_cpu_timer_create(stru
return posix_cpu_timer_create(timer);
}
static int process_cpu_nsleep(clockid_t which_clock, int flags,
- struct timespec *rqtp)
+ struct timespec *rqtp,
+ struct timespec __user *rmtp)
{
- return posix_cpu_nsleep(PROCESS_CLOCK, flags, rqtp);
+ return posix_cpu_nsleep(PROCESS_CLOCK, flags, rqtp, rmtp);
}
static int thread_cpu_clock_getres(clockid_t which_clock, struct timespec *tp)
{
@@ -1529,7 +1534,7 @@ static int thread_cpu_timer_create(struc
return posix_cpu_timer_create(timer);
}
static int thread_cpu_nsleep(clockid_t which_clock, int flags,
- struct timespec *rqtp)
+ struct timespec *rqtp, struct timespec __user *rmtp)
{
return -EINVAL;
}
Index: linux-2.6.14-rc2-rt4/kernel/posix-timers.c
===================================================================
--- linux-2.6.14-rc2-rt4.orig/kernel/posix-timers.c
+++ linux-2.6.14-rc2-rt4/kernel/posix-timers.c
@@ -48,21 +48,6 @@
#include <linux/workqueue.h>
#include <linux/module.h>
-#ifndef div_long_long_rem
-#include <asm/div64.h>
-
-#define div_long_long_rem(dividend,divisor,remainder) ({ \
- u64 result = dividend; \
- *remainder = do_div(result,divisor); \
- result; })
-
-#endif
-#define CLOCK_REALTIME_RES TICK_NSEC /* In nano seconds. */
-
-static inline u64 mpy_l_X_l_ll(unsigned long mpy1,unsigned long mpy2)
-{
- return (u64)mpy1 * mpy2;
-}
/*
* Management arrays for POSIX timers. Timers are kept in slab memory
* Timer ids are allocated by an external routine that keeps track of the
@@ -148,18 +133,18 @@ static DEFINE_SPINLOCK(idr_lock);
*/
static struct k_clock posix_clocks[MAX_CLOCKS];
+
/*
- * We only have one real clock that can be set so we need only one abs list,
- * even if we should want to have several clocks with differing resolutions.
+ * These ones are defined below.
*/
-static struct k_clock_abs abs_list = {.list = LIST_HEAD_INIT(abs_list.list),
- .lock = SPIN_LOCK_UNLOCKED};
+static int common_nsleep(clockid_t, int flags, struct timespec *t,
+ struct timespec __user *rmtp);
+static void common_timer_get(struct k_itimer *, struct itimerspec *);
+static int common_timer_set(struct k_itimer *, int,
+ struct itimerspec *, struct itimerspec *);
+static int common_timer_del(struct k_itimer *timer);
-static void posix_timer_fn(unsigned long);
-static u64 do_posix_clock_monotonic_gettime_parts(
- struct timespec *tp, struct timespec *mo);
-int do_posix_clock_monotonic_gettime(struct timespec *tp);
-static int do_posix_clock_monotonic_get(clockid_t, struct timespec *tp);
+static void posix_timer_fn(void *data);
static struct k_itimer *lock_timer(timer_t timer_id, unsigned long *flags);
@@ -205,21 +190,25 @@ static inline int common_clock_set(clock
static inline int common_timer_create(struct k_itimer *new_timer)
{
- INIT_LIST_HEAD(&new_timer->it.real.abs_timer_entry);
- init_timer(&new_timer->it.real.timer);
- new_timer->it.real.timer.data = (unsigned long) new_timer;
+ return -EINVAL;
+}
+
+static int timer_create_mono(struct k_itimer *new_timer)
+{
+ init_ktimer_mono(&new_timer->it.real.timer);
+ new_timer->it.real.timer.data = new_timer;
+ new_timer->it.real.timer.function = posix_timer_fn;
+ return 0;
+}
+
+static int timer_create_real(struct k_itimer *new_timer)
+{
+ init_ktimer_real(&new_timer->it.real.timer);
+ new_timer->it.real.timer.data = new_timer;
new_timer->it.real.timer.function = posix_timer_fn;
return 0;
}
-/*
- * These ones are defined below.
- */
-static int common_nsleep(clockid_t, int flags, struct timespec *t);
-static void common_timer_get(struct k_itimer *, struct itimerspec *);
-static int common_timer_set(struct k_itimer *, int,
- struct itimerspec *, struct itimerspec *);
-static int common_timer_del(struct k_itimer *timer);
/*
* Return nonzero iff we know a priori this clockid_t value is bogus.
@@ -239,19 +228,44 @@ static inline int invalid_clockid(clocki
return 1;
}
+/*
+ * Get real time for posix timers
+ */
+static int posix_get_ktime_real_ts(clockid_t which_clock, struct timespec *tp)
+{
+ get_ktime_real_ts(tp);
+ return 0;
+}
+
+/*
+ * Get monotonic time for posix timers
+ */
+static int posix_get_ktime_mono_ts(clockid_t which_clock, struct timespec *tp)
+{
+ get_ktime_mono_ts(tp);
+ return 0;
+}
+
+void do_posix_clock_monotonic_gettime(struct timespec *ts)
+{
+ get_ktime_mono_ts(ts);
+}
/*
* Initialize everything, well, just everything in Posix clocks/timers ;)
*/
static __init int init_posix_timers(void)
{
- struct k_clock clock_realtime = {.res = CLOCK_REALTIME_RES,
- .abs_struct = &abs_list
+ struct k_clock clock_realtime = {
+ .clock_getres = get_ktimer_real_res,
+ .clock_get = posix_get_ktime_real_ts,
+ .timer_create = timer_create_real,
};
- struct k_clock clock_monotonic = {.res = CLOCK_REALTIME_RES,
- .abs_struct = NULL,
- .clock_get = do_posix_clock_monotonic_get,
- .clock_set = do_posix_clock_nosettime
+ struct k_clock clock_monotonic = {
+ .clock_getres = get_ktimer_mono_res,
+ .clock_get = posix_get_ktime_mono_ts,
+ .clock_set = do_posix_clock_nosettime,
+ .timer_create = timer_create_mono,
};
register_posix_clock(CLOCK_REALTIME, &clock_realtime);
@@ -265,117 +279,17 @@ static __init int init_posix_timers(void
__initcall(init_posix_timers);
-static void tstojiffie(struct timespec *tp, int res, u64 *jiff)
-{
- long sec = tp->tv_sec;
- long nsec = tp->tv_nsec + res - 1;
-
- if (nsec > NSEC_PER_SEC) {
- sec++;
- nsec -= NSEC_PER_SEC;
- }
-
- /*
- * The scaling constants are defined in <linux/time.h>
- * The difference between there and here is that we do the
- * res rounding and compute a 64-bit result (well so does that
- * but it then throws away the high bits).
- */
- *jiff = (mpy_l_X_l_ll(sec, SEC_CONVERSION) +
- (mpy_l_X_l_ll(nsec, NSEC_CONVERSION) >>
- (NSEC_JIFFIE_SC - SEC_JIFFIE_SC))) >> SEC_JIFFIE_SC;
-}
-
-/*
- * This function adjusts the timer as needed as a result of the clock
- * being set. It should only be called for absolute timers, and then
- * under the abs_list lock. It computes the time difference and sets
- * the new jiffies value in the timer. It also updates the timers
- * reference wall_to_monotonic value. It is complicated by the fact
- * that tstojiffies() only handles positive times and it needs to work
- * with both positive and negative times. Also, for negative offsets,
- * we need to defeat the res round up.
- *
- * Return is true if there is a new time, else false.
- */
-static long add_clockset_delta(struct k_itimer *timr,
- struct timespec *new_wall_to)
-{
- struct timespec delta;
- int sign = 0;
- u64 exp;
-
- set_normalized_timespec(&delta,
- new_wall_to->tv_sec -
- timr->it.real.wall_to_prev.tv_sec,
- new_wall_to->tv_nsec -
- timr->it.real.wall_to_prev.tv_nsec);
- if (likely(!(delta.tv_sec | delta.tv_nsec)))
- return 0;
- if (delta.tv_sec < 0) {
- set_normalized_timespec(&delta,
- -delta.tv_sec,
- 1 - delta.tv_nsec -
- posix_clocks[timr->it_clock].res);
- sign++;
- }
- tstojiffie(&delta, posix_clocks[timr->it_clock].res, &exp);
- timr->it.real.wall_to_prev = *new_wall_to;
- timr->it.real.timer.expires += (sign ? -exp : exp);
- return 1;
-}
-
-static void remove_from_abslist(struct k_itimer *timr)
-{
- if (!list_empty(&timr->it.real.abs_timer_entry)) {
- spin_lock(&abs_list.lock);
- list_del_init(&timr->it.real.abs_timer_entry);
- spin_unlock(&abs_list.lock);
- }
-}
static void schedule_next_timer(struct k_itimer *timr)
{
- struct timespec new_wall_to;
- struct now_struct now;
- unsigned long seq;
-
- /*
- * Set up the timer for the next interval (if there is one).
- * Note: this code uses the abs_timer_lock to protect
- * it.real.wall_to_prev and must hold it until exp is set, not exactly
- * obvious...
-
- * This function is used for CLOCK_REALTIME* and
- * CLOCK_MONOTONIC* timers. If we ever want to handle other
- * CLOCKs, the calling code (do_schedule_next_timer) would need
- * to pull the "clock" info from the timer and dispatch the
- * "other" CLOCKs "next timer" code (which, I suppose should
- * also be added to the k_clock structure).
- */
- if (!timr->it.real.incr)
+ if (ktime_cmp_val(timr->it.real.incr, ==, KTIME_ZERO))
return;
- do {
- seq = read_seqbegin(&xtime_lock);
- new_wall_to = wall_to_monotonic;
- posix_get_now(&now);
- } while (read_seqretry(&xtime_lock, seq));
-
- if (!list_empty(&timr->it.real.abs_timer_entry)) {
- spin_lock(&abs_list.lock);
- add_clockset_delta(timr, &new_wall_to);
-
- posix_bump_timer(timr, now);
-
- spin_unlock(&abs_list.lock);
- } else {
- posix_bump_timer(timr, now);
- }
- timr->it_overrun_last = timr->it_overrun;
- timr->it_overrun = -1;
+ timr->it_overrun_last = timr->it.real.overrun;
+ timr->it.real.overrun = timr->it.real.timer.overrun = -1;
++timr->it_requeue_pending;
- add_timer(&timr->it.real.timer);
+ start_ktimer(&timr->it.real.timer, &timr->it.real.incr, KTIMER_FORWARD);
+ timr->it.real.overrun = timr->it.real.timer.overrun;
}
/*
@@ -413,14 +327,7 @@ int posix_timer_event(struct k_itimer *t
{
memset(&timr->sigq->info, 0, sizeof(siginfo_t));
timr->sigq->info.si_sys_private = si_private;
- /*
- * Send signal to the process that owns this timer.
-
- * This code assumes that all the possible abs_lists share the
- * same lock (there is only one list at this time). If this is
- * not the case, the CLOCK info would need to be used to find
- * the proper abs list lock.
- */
+ /* Send signal to the process that owns this timer.*/
timr->sigq->info.si_signo = timr->it_sigev_signo;
timr->sigq->info.si_errno = 0;
@@ -454,65 +361,28 @@ EXPORT_SYMBOL_GPL(posix_timer_event);
* This code is for CLOCK_REALTIME* and CLOCK_MONOTONIC* timers.
*/
-static void posix_timer_fn(unsigned long __data)
+static void posix_timer_fn(void *data)
{
- struct k_itimer *timr = (struct k_itimer *) __data;
+ struct k_itimer *timr = data;
unsigned long flags;
- unsigned long seq;
- struct timespec delta, new_wall_to;
- u64 exp = 0;
- int do_notify = 1;
+ int si_private = 0;
spin_lock_irqsave(&timr->it_lock, flags);
- if (!list_empty(&timr->it.real.abs_timer_entry)) {
- spin_lock(&abs_list.lock);
- do {
- seq = read_seqbegin(&xtime_lock);
- new_wall_to = wall_to_monotonic;
- } while (read_seqretry(&xtime_lock, seq));
- set_normalized_timespec(&delta,
- new_wall_to.tv_sec -
- timr->it.real.wall_to_prev.tv_sec,
- new_wall_to.tv_nsec -
- timr->it.real.wall_to_prev.tv_nsec);
- if (likely((delta.tv_sec | delta.tv_nsec ) == 0)) {
- /* do nothing, timer is on time */
- } else if (delta.tv_sec < 0) {
- /* do nothing, timer is already late */
- } else {
- /* timer is early due to a clock set */
- tstojiffie(&delta,
- posix_clocks[timr->it_clock].res,
- &exp);
- timr->it.real.wall_to_prev = new_wall_to;
- timr->it.real.timer.expires += exp;
- add_timer(&timr->it.real.timer);
- do_notify = 0;
- }
- spin_unlock(&abs_list.lock);
- }
- if (do_notify) {
- int si_private=0;
+ if (ktime_cmp_val(timr->it.real.incr, !=, KTIME_ZERO))
+ si_private = ++timr->it_requeue_pending;
- if (timr->it.real.incr)
- si_private = ++timr->it_requeue_pending;
- else {
- remove_from_abslist(timr);
- }
+ if (posix_timer_event(timr, si_private))
+ /*
+ * signal was not sent because of sig_ignor
+ * we will not get a call back to restart it AND
+ * it should be restarted.
+ */
+ schedule_next_timer(timr);
- if (posix_timer_event(timr, si_private))
- /*
- * signal was not sent because of sig_ignor
- * we will not get a call back to restart it AND
- * it should be restarted.
- */
- schedule_next_timer(timr);
- }
unlock_timer(timr, flags); /* hold thru abs lock to keep irq off */
}
-
static inline struct task_struct * good_sigevent(sigevent_t * event)
{
struct task_struct *rtn = current->group_leader;
@@ -776,39 +646,40 @@ static struct k_itimer * lock_timer(time
static void
common_timer_get(struct k_itimer *timr, struct itimerspec *cur_setting)
{
- unsigned long expires;
- struct now_struct now;
+ ktime_t expires, now, remaining;
+ struct ktimer *timer = &timr->it.real.timer;
- do
- expires = timr->it.real.timer.expires;
- while ((volatile long) (timr->it.real.timer.expires) != expires);
-
- posix_get_now(&now);
-
- if (expires &&
- ((timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE) &&
- !timr->it.real.incr &&
- posix_time_before(&timr->it.real.timer, &now))
- timr->it.real.timer.expires = expires = 0;
- if (expires) {
- if (timr->it_requeue_pending & REQUEUE_PENDING ||
- (timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE) {
- posix_bump_timer(timr, now);
- expires = timr->it.real.timer.expires;
- }
- else
- if (!timer_pending(&timr->it.real.timer))
- expires = 0;
- if (expires)
- expires -= now.jiffies;
- }
- jiffies_to_timespec(expires, &cur_setting->it_value);
- jiffies_to_timespec(timr->it.real.incr, &cur_setting->it_interval);
-
- if (cur_setting->it_value.tv_sec < 0) {
+ memset(cur_setting, 0, sizeof(struct itimerspec));
+ expires = get_expiry_ktimer(timer, &now);
+ remaining = ktime_sub(expires, now);
+
+ /* Time left ? or timer pending */
+ if (ktime_cmp_val(remaining, >, KTIME_ZERO) || ktimer_active(timer))
+ goto calci;
+ /* interval timer ? */
+ if (ktime_cmp_val(timr->it.real.incr, ==, KTIME_ZERO))
+ return;
+ /*
+ * When a requeue is pending or this is a SIGEV_NONE timer
+ * move the expiry time forward by intervals, so expiry is >
+ * now.
+ * The active (non SIGEV_NONE) rearm should be done
+ * automatically by the ktimer REARM mode. That's the next
+ * iteration. The REQUEUE_PENDING part will go away !
+ */
+ if (timr->it_requeue_pending & REQUEUE_PENDING ||
+ (timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE) {
+ remaining = forward_posix_timer(timr, now);
+ }
+ calci:
+ /* interval timer ? */
+ if (ktime_cmp_val(timr->it.real.incr, !=, KTIME_ZERO))
+ cur_setting->it_interval = ktime_to_timespec(timr->it.real.incr);
+ /* Return 0 only, when the timer is expired and not pending */
+ if (ktime_cmp_val(remaining, <=, KTIME_ZERO))
cur_setting->it_value.tv_nsec = 1;
- cur_setting->it_value.tv_sec = 0;
- }
+ else
+ cur_setting->it_value = ktime_to_timespec(remaining);
}
/* Get the time remaining on a POSIX.1b interval timer. */
@@ -832,6 +703,7 @@ sys_timer_gettime(timer_t timer_id, stru
return 0;
}
+
/*
* Get the number of overruns of a POSIX.1b interval timer. This is to
* be the overrun of the timer last delivered. At the same time we are
@@ -858,84 +730,6 @@ sys_timer_getoverrun(timer_t timer_id)
return overrun;
}
-/*
- * Adjust for absolute time
- *
- * If absolute time is given and it is not CLOCK_MONOTONIC, we need to
- * adjust for the offset between the timer clock (CLOCK_MONOTONIC) and
- * what ever clock he is using.
- *
- * If it is relative time, we need to add the current (CLOCK_MONOTONIC)
- * time to it to get the proper time for the timer.
- */
-static int adjust_abs_time(struct k_clock *clock, struct timespec *tp,
- int abs, u64 *exp, struct timespec *wall_to)
-{
- struct timespec now;
- struct timespec oc = *tp;
- u64 jiffies_64_f;
- int rtn =0;
-
- if (abs) {
- /*
- * The mask pick up the 4 basic clocks
- */
- if (!((clock - &posix_clocks[0]) & ~CLOCKS_MASK)) {
- jiffies_64_f = do_posix_clock_monotonic_gettime_parts(
- &now, wall_to);
- /*
- * If we are doing a MONOTONIC clock
- */
- if((clock - &posix_clocks[0]) & CLOCKS_MONO){
- now.tv_sec += wall_to->tv_sec;
- now.tv_nsec += wall_to->tv_nsec;
- }
- } else {
- /*
- * Not one of the basic clocks
- */
- clock->clock_get(clock - posix_clocks, &now);
- jiffies_64_f = get_jiffies_64();
- }
- /*
- * Take away now to get delta and normalize
- */
- set_normalized_timespec(&oc, oc.tv_sec - now.tv_sec,
- oc.tv_nsec - now.tv_nsec);
- }else{
- jiffies_64_f = get_jiffies_64();
- }
- /*
- * Check if the requested time is prior to now (if so set now)
- */
- if (oc.tv_sec < 0)
- oc.tv_sec = oc.tv_nsec = 0;
-
- if (oc.tv_sec | oc.tv_nsec)
- set_normalized_timespec(&oc, oc.tv_sec,
- oc.tv_nsec + clock->res);
- tstojiffie(&oc, clock->res, exp);
-
- /*
- * Check if the requested time is more than the timer code
- * can handle (if so we error out but return the value too).
- */
- if (*exp > ((u64)MAX_JIFFY_OFFSET))
- /*
- * This is a considered response, not exactly in
- * line with the standard (in fact it is silent on
- * possible overflows). We assume such a large
- * value is ALMOST always a programming error and
- * try not to compound it by setting a really dumb
- * value.
- */
- rtn = -EINVAL;
- /*
- * return the actual jiffies expire time, full 64 bits
- */
- *exp += jiffies_64_f;
- return rtn;
-}
/* Set a POSIX.1b interval timer. */
/* timr->it_lock is taken. */
@@ -943,68 +737,52 @@ static inline int
common_timer_set(struct k_itimer *timr, int flags,
struct itimerspec *new_setting, struct itimerspec *old_setting)
{
- struct k_clock *clock = &posix_clocks[timr->it_clock];
- u64 expire_64;
+ ktime_t expires;
+ int mode;
if (old_setting)
common_timer_get(timr, old_setting);
/* disable the timer */
- timr->it.real.incr = 0;
+ ktime_set_zero(timr->it.real.incr);
/*
* careful here. If smp we could be in the "fire" routine which will
* be spinning as we hold the lock. But this is ONLY an SMP issue.
*/
- if (try_to_del_timer_sync(&timr->it.real.timer) < 0) {
-#ifdef CONFIG_SMP
- /*
- * It can only be active if on an other cpu. Since
- * we have cleared the interval stuff above, it should
- * clear once we release the spin lock. Of course once
- * we do that anything could happen, including the
- * complete melt down of the timer. So return with
- * a "retry" exit status.
- */
+ if (try_to_stop_ktimer(&timr->it.real.timer) < 0)
return TIMER_RETRY;
-#endif
- }
-
- remove_from_abslist(timr);
timr->it_requeue_pending = (timr->it_requeue_pending + 2) &
~REQUEUE_PENDING;
timr->it_overrun_last = 0;
timr->it_overrun = -1;
- /*
- *switch off the timer when it_value is zero
- */
- if (!new_setting->it_value.tv_sec && !new_setting->it_value.tv_nsec) {
- timr->it.real.timer.expires = 0;
+
+ /* switch off the timer when it_value is zero */
+ if (!new_setting->it_value.tv_sec && !new_setting->it_value.tv_nsec)
return 0;
- }
- if (adjust_abs_time(clock,
- &new_setting->it_value, flags & TIMER_ABSTIME,
- &expire_64, &(timr->it.real.wall_to_prev))) {
- return -EINVAL;
- }
- timr->it.real.timer.expires = (unsigned long)expire_64;
- tstojiffie(&new_setting->it_interval, clock->res, &expire_64);
- timr->it.real.incr = (unsigned long)expire_64;
+ mode = flags & TIMER_ABSTIME ? KTIMER_ABS : KTIMER_REL;
- /*
- * We do not even queue SIGEV_NONE timers! But we do put them
- * in the abs list so we can do that right.
+ /* Posix madness. Only absolute CLOCK_REALTIME timers
+ * are affected by clock sets. So we must reinitialize
+ * the timer.
*/
+ if (timr->it_clock == CLOCK_REALTIME && mode == KTIMER_ABS)
+ timer_create_real(timr);
+ else
+ timer_create_mono(timr);
+
+ expires = ktimer_convert_timespec(&timr->it.real.timer,
+ &new_setting->it_value);
+ /* This should be moved to the auto rearm code */
+ timr->it.real.incr = ktimer_convert_timespec(&timr->it.real.timer,
+ &new_setting->it_interval);
+
+ /* SIGEV_NONE timers are not queued ! See common_timer_get */
if (((timr->it_sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE))
- add_timer(&timr->it.real.timer);
+ start_ktimer(&timr->it.real.timer, &expires,
+ mode | KTIMER_NOCHECK);
- if (flags & TIMER_ABSTIME && clock->abs_struct) {
- spin_lock(&clock->abs_struct->lock);
- list_add_tail(&(timr->it.real.abs_timer_entry),
- &(clock->abs_struct->list));
- spin_unlock(&clock->abs_struct->lock);
- }
return 0;
}
@@ -1039,6 +817,7 @@ retry:
unlock_timer(timr, flag);
if (error == TIMER_RETRY) {
+ wait_for_ktimer(&timr->it.real.timer);
rtn = NULL; // We already got the old time...
goto retry;
}
@@ -1052,24 +831,10 @@ retry:
static inline int common_timer_del(struct k_itimer *timer)
{
- timer->it.real.incr = 0;
+ ktime_set_zero(timer->it.real.incr);
- if (try_to_del_timer_sync(&timer->it.real.timer) < 0) {
-#ifdef CONFIG_SMP
- /*
- * It can only be active if on an other cpu. Since
- * we have cleared the interval stuff above, it should
- * clear once we release the spin lock. Of course once
- * we do that anything could happen, including the
- * complete melt down of the timer. So return with
- * a "retry" exit status.
- */
+ if (try_to_stop_ktimer(&timer->it.real.timer) < 0)
return TIMER_RETRY;
-#endif
- }
-
- remove_from_abslist(timer);
-
return 0;
}
@@ -1085,24 +850,17 @@ sys_timer_delete(timer_t timer_id)
struct k_itimer *timer;
long flags;
-#ifdef CONFIG_SMP
- int error;
retry_delete:
-#endif
timer = lock_timer(timer_id, &flags);
if (!timer)
return -EINVAL;
-#ifdef CONFIG_SMP
- error = timer_delete_hook(timer);
-
- if (error == TIMER_RETRY) {
+ if (timer_delete_hook(timer) == TIMER_RETRY) {
unlock_timer(timer, flags);
+ wait_for_ktimer(&timer->it.real.timer);
goto retry_delete;
}
-#else
- timer_delete_hook(timer);
-#endif
+
spin_lock(&current->sighand->siglock);
list_del(&timer->list);
spin_unlock(&current->sighand->siglock);
@@ -1119,6 +877,7 @@ retry_delete:
release_posix_timer(timer, IT_ID_SET);
return 0;
}
+
/*
* return timer owned by the process, used by exit_itimers
*/
@@ -1126,22 +885,14 @@ static inline void itimer_delete(struct
{
unsigned long flags;
-#ifdef CONFIG_SMP
- int error;
retry_delete:
-#endif
spin_lock_irqsave(&timer->it_lock, flags);
-#ifdef CONFIG_SMP
- error = timer_delete_hook(timer);
-
- if (error == TIMER_RETRY) {
+ if (timer_delete_hook(timer) == TIMER_RETRY) {
unlock_timer(timer, flags);
+ wait_for_ktimer(&timer->it.real.timer);
goto retry_delete;
}
-#else
- timer_delete_hook(timer);
-#endif
list_del(&timer->list);
/*
* This keeps any tasks waiting on the spin lock from thinking
@@ -1170,60 +921,7 @@ void exit_itimers(struct signal_struct *
}
}
-/*
- * And now for the "clock" calls
- *
- * These functions are called both from timer functions (with the timer
- * spin_lock_irq() held and from clock calls with no locking. They must
- * use the save flags versions of locks.
- */
-
-/*
- * We do ticks here to avoid the irq lock ( they take sooo long).
- * The seqlock is great here. Since we a reader, we don't really care
- * if we are interrupted since we don't take lock that will stall us or
- * any other cpu. Voila, no irq lock is needed.
- *
- */
-
-static u64 do_posix_clock_monotonic_gettime_parts(
- struct timespec *tp, struct timespec *mo)
-{
- u64 jiff;
- unsigned int seq;
-
- do {
- seq = read_seqbegin(&xtime_lock);
- getnstimeofday(tp);
- *mo = wall_to_monotonic;
- jiff = jiffies_64;
-
- } while(read_seqretry(&xtime_lock, seq));
-
- return jiff;
-}
-
-static int do_posix_clock_monotonic_get(clockid_t clock, struct timespec *tp)
-{
- struct timespec wall_to_mono;
-
- do_posix_clock_monotonic_gettime_parts(tp, &wall_to_mono);
-
- tp->tv_sec += wall_to_mono.tv_sec;
- tp->tv_nsec += wall_to_mono.tv_nsec;
-
- if ((tp->tv_nsec - NSEC_PER_SEC) > 0) {
- tp->tv_nsec -= NSEC_PER_SEC;
- tp->tv_sec++;
- }
- return 0;
-}
-
-int do_posix_clock_monotonic_gettime(struct timespec *tp)
-{
- return do_posix_clock_monotonic_get(CLOCK_MONOTONIC, tp);
-}
-
+/* Not available / possible... functions */
int do_posix_clock_nosettime(clockid_t clockid, struct timespec *tp)
{
return -EINVAL;
@@ -1236,7 +934,8 @@ int do_posix_clock_notimer_create(struct
}
EXPORT_SYMBOL_GPL(do_posix_clock_notimer_create);
-int do_posix_clock_nonanosleep(clockid_t clock, int flags, struct timespec *t)
+int do_posix_clock_nonanosleep(clockid_t clock, int flags, struct timespec *t,
+ struct timespec __user *r)
{
#ifndef ENOTSUP
return -EOPNOTSUPP; /* aka ENOTSUP in userland for POSIX */
@@ -1295,125 +994,34 @@ sys_clock_getres(clockid_t which_clock,
return error;
}
-static void nanosleep_wake_up(unsigned long __data)
-{
- struct task_struct *p = (struct task_struct *) __data;
-
- wake_up_process(p);
-}
-
/*
- * The standard says that an absolute nanosleep call MUST wake up at
- * the requested time in spite of clock settings. Here is what we do:
- * For each nanosleep call that needs it (only absolute and not on
- * CLOCK_MONOTONIC* (as it can not be set)) we thread a little structure
- * into the "nanosleep_abs_list". All we need is the task_struct pointer.
- * When ever the clock is set we just wake up all those tasks. The rest
- * is done by the while loop in clock_nanosleep().
- *
- * On locking, clock_was_set() is called from update_wall_clock which
- * holds (or has held for it) a write_lock_irq( xtime_lock) and is
- * called from the timer bh code. Thus we need the irq save locks.
- *
- * Also, on the call from update_wall_clock, that is done as part of a
- * softirq thing. We don't want to delay the system that much (possibly
- * long list of timers to fix), so we defer that work to keventd.
+ * nanosleep for monotonic and realtime clocks
*/
-
-static DECLARE_WAIT_QUEUE_HEAD(nanosleep_abs_wqueue);
-static DECLARE_WORK(clock_was_set_work, (void(*)(void*))clock_was_set, NULL);
-
-static DECLARE_MUTEX(clock_was_set_lock);
-
-void clock_was_set(void)
+static int common_nsleep(clockid_t which_clock, int flags,
+ struct timespec *tsave, struct timespec __user *rmtp)
{
- struct k_itimer *timr;
- struct timespec new_wall_to;
- LIST_HEAD(cws_list);
- unsigned long seq;
-
+ int mode = flags & TIMER_ABSTIME ? KTIMER_ABS : KTIMER_REL;
- if (unlikely(in_interrupt())) {
- schedule_work(&clock_was_set_work);
- return;
+ switch (which_clock) {
+ case CLOCK_REALTIME:
+ /* Posix madness. Only absolute timers on clock realtime
+ are affected by clock set. */
+ if (mode == KTIMER_ABS)
+ return ktimer_nanosleep_real(tsave, rmtp, mode);
+ case CLOCK_MONOTONIC:
+ return ktimer_nanosleep_mono(tsave, rmtp, mode);
+ default:
+ break;
}
- wake_up_all(&nanosleep_abs_wqueue);
-
- /*
- * Check if there exist TIMER_ABSTIME timers to correct.
- *
- * Notes on locking: This code is run in task context with irq
- * on. We CAN be interrupted! All other usage of the abs list
- * lock is under the timer lock which holds the irq lock as
- * well. We REALLY don't want to scan the whole list with the
- * interrupt system off, AND we would like a sequence lock on
- * this code as well. Since we assume that the clock will not
- * be set often, it seems ok to take and release the irq lock
- * for each timer. In fact add_timer will do this, so this is
- * not an issue. So we know when we are done, we will move the
- * whole list to a new location. Then as we process each entry,
- * we will move it to the actual list again. This way, when our
- * copy is empty, we are done. We are not all that concerned
- * about preemption so we will use a semaphore lock to protect
- * aginst reentry. This way we will not stall another
- * processor. It is possible that this may delay some timers
- * that should have expired, given the new clock, but even this
- * will be minimal as we will always update to the current time,
- * even if it was set by a task that is waiting for entry to
- * this code. Timers that expire too early will be caught by
- * the expire code and restarted.
-
- * Absolute timers that repeat are left in the abs list while
- * waiting for the task to pick up the signal. This means we
- * may find timers that are not in the "add_timer" list, but are
- * in the abs list. We do the same thing for these, save
- * putting them back in the "add_timer" list. (Note, these are
- * left in the abs list mainly to indicate that they are
- * ABSOLUTE timers, a fact that is used by the re-arm code, and
- * for which we have no other flag.)
-
- */
-
- down(&clock_was_set_lock);
- spin_lock_irq(&abs_list.lock);
- list_splice_init(&abs_list.list, &cws_list);
- spin_unlock_irq(&abs_list.lock);
- do {
- do {
- seq = read_seqbegin(&xtime_lock);
- new_wall_to = wall_to_monotonic;
- } while (read_seqretry(&xtime_lock, seq));
-
- spin_lock_irq(&abs_list.lock);
- if (list_empty(&cws_list)) {
- spin_unlock_irq(&abs_list.lock);
- break;
- }
- timr = list_entry(cws_list.next, struct k_itimer,
- it.real.abs_timer_entry);
-
- list_del_init(&timr->it.real.abs_timer_entry);
- if (add_clockset_delta(timr, &new_wall_to) &&
- del_timer(&timr->it.real.timer)) /* timer run yet? */
- add_timer(&timr->it.real.timer);
- list_add(&timr->it.real.abs_timer_entry, &abs_list.list);
- spin_unlock_irq(&abs_list.lock);
- } while (1);
-
- up(&clock_was_set_lock);
+ return -EINVAL;
}
-long clock_nanosleep_restart(struct restart_block *restart_block);
-
asmlinkage long
sys_clock_nanosleep(clockid_t which_clock, int flags,
const struct timespec __user *rqtp,
struct timespec __user *rmtp)
{
struct timespec t;
- struct restart_block *restart_block =
- &(current_thread_info()->restart_block);
- int ret;
if (invalid_clockid(which_clock))
return -EINVAL;
@@ -1421,135 +1029,8 @@ sys_clock_nanosleep(clockid_t which_cloc
if (copy_from_user(&t, rqtp, sizeof (struct timespec)))
return -EFAULT;
- if ((unsigned) t.tv_nsec >= NSEC_PER_SEC || t.tv_sec < 0)
+ if (!timespec_valid(&t))
return -EINVAL;
- /*
- * Do this here as nsleep function does not have the real address.
- */
- restart_block->arg1 = (unsigned long)rmtp;
-
- ret = CLOCK_DISPATCH(which_clock, nsleep, (which_clock, flags, &t));
-
- if ((ret == -ERESTART_RESTARTBLOCK) && rmtp &&
- copy_to_user(rmtp, &t, sizeof (t)))
- return -EFAULT;
- return ret;
-}
-
-
-static int common_nsleep(clockid_t which_clock,
- int flags, struct timespec *tsave)
-{
- struct timespec t, dum;
- struct timer_list new_timer;
- DECLARE_WAITQUEUE(abs_wqueue, current);
- u64 rq_time = (u64)0;
- s64 left;
- int abs;
- struct restart_block *restart_block =
- &current_thread_info()->restart_block;
-
- abs_wqueue.flags = 0;
- init_timer(&new_timer);
- new_timer.expires = 0;
- new_timer.data = (unsigned long) current;
- new_timer.function = nanosleep_wake_up;
- abs = flags & TIMER_ABSTIME;
-
- if (restart_block->fn == clock_nanosleep_restart) {
- /*
- * Interrupted by a non-delivered signal, pick up remaining
- * time and continue. Remaining time is in arg2 & 3.
- */
- restart_block->fn = do_no_restart_syscall;
-
- rq_time = restart_block->arg3;
- rq_time = (rq_time << 32) + restart_block->arg2;
- if (!rq_time)
- return -EINTR;
- left = rq_time - get_jiffies_64();
- if (left <= (s64)0)
- return 0; /* Already passed */
- }
-
- if (abs && (posix_clocks[which_clock].clock_get !=
- posix_clocks[CLOCK_MONOTONIC].clock_get))
- add_wait_queue(&nanosleep_abs_wqueue, &abs_wqueue);
-
- do {
- t = *tsave;
- if (abs || !rq_time) {
- adjust_abs_time(&posix_clocks[which_clock], &t, abs,
- &rq_time, &dum);
- }
-
- left = rq_time - get_jiffies_64();
- if (left >= (s64)MAX_JIFFY_OFFSET)
- left = (s64)MAX_JIFFY_OFFSET;
- if (left < (s64)0)
- break;
-
- new_timer.expires = jiffies + left;
- __set_current_state(TASK_INTERRUPTIBLE);
- add_timer(&new_timer);
-
- schedule();
-
- del_timer_sync(&new_timer);
- left = rq_time - get_jiffies_64();
- } while (left > (s64)0 && !test_thread_flag(TIF_SIGPENDING));
-
- if (abs_wqueue.task_list.next)
- finish_wait(&nanosleep_abs_wqueue, &abs_wqueue);
-
- if (left > (s64)0) {
-
- /*
- * Always restart abs calls from scratch to pick up any
- * clock shifting that happened while we are away.
- */
- if (abs)
- return -ERESTARTNOHAND;
-
- left *= TICK_NSEC;
- tsave->tv_sec = div_long_long_rem(left,
- NSEC_PER_SEC,
- &tsave->tv_nsec);
- /*
- * Restart works by saving the time remaing in
- * arg2 & 3 (it is 64-bits of jiffies). The other
- * info we need is the clock_id (saved in arg0).
- * The sys_call interface needs the users
- * timespec return address which _it_ saves in arg1.
- * Since we have cast the nanosleep call to a clock_nanosleep
- * both can be restarted with the same code.
- */
- restart_block->fn = clock_nanosleep_restart;
- restart_block->arg0 = which_clock;
- /*
- * Caller sets arg1
- */
- restart_block->arg2 = rq_time & 0xffffffffLL;
- restart_block->arg3 = rq_time >> 32;
-
- return -ERESTART_RESTARTBLOCK;
- }
-
- return 0;
-}
-/*
- * This will restart clock_nanosleep.
- */
-long
-clock_nanosleep_restart(struct restart_block *restart_block)
-{
- struct timespec t;
- int ret = common_nsleep(restart_block->arg0, 0, &t);
-
- if ((ret == -ERESTART_RESTARTBLOCK) && restart_block->arg1 &&
- copy_to_user((struct timespec __user *)(restart_block->arg1), &t,
- sizeof (t)))
- return -EFAULT;
- return ret;
+ return CLOCK_DISPATCH(which_clock, nsleep, (which_clock, flags, &t, rmtp));
}
Index: linux-2.6.14-rc2-rt4/kernel/timer.c
===================================================================
--- linux-2.6.14-rc2-rt4.orig/kernel/timer.c
+++ linux-2.6.14-rc2-rt4/kernel/timer.c
@@ -912,6 +912,7 @@ static void run_timer_softirq(struct sof
{
tvec_base_t *base = &__get_cpu_var(tvec_bases);
+ run_ktimer_queues();
if (time_after_eq(jiffies, base->timer_jiffies))
__run_timers(base);
}
@@ -1177,62 +1178,6 @@ asmlinkage long sys_gettid(void)
return current->pid;
}
-static long __sched nanosleep_restart(struct restart_block *restart)
-{
- unsigned long expire = restart->arg0, now = jiffies;
- struct timespec __user *rmtp = (struct timespec __user *) restart->arg1;
- long ret;
-
- /* Did it expire while we handled signals? */
- if (!time_after(expire, now))
- return 0;
-
- expire = schedule_timeout_interruptible(expire - now);
-
- ret = 0;
- if (expire) {
- struct timespec t;
- jiffies_to_timespec(expire, &t);
-
- ret = -ERESTART_RESTARTBLOCK;
- if (rmtp && copy_to_user(rmtp, &t, sizeof(t)))
- ret = -EFAULT;
- /* The 'restart' block is already filled in */
- }
- return ret;
-}
-
-asmlinkage long sys_nanosleep(struct timespec __user *rqtp, struct timespec __user *rmtp)
-{
- struct timespec t;
- unsigned long expire;
- long ret;
-
- if (copy_from_user(&t, rqtp, sizeof(t)))
- return -EFAULT;
-
- if ((t.tv_nsec >= 1000000000L) || (t.tv_nsec < 0) || (t.tv_sec < 0))
- return -EINVAL;
-
- expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec);
- expire = schedule_timeout_interruptible(expire);
-
- ret = 0;
- if (expire) {
- struct restart_block *restart;
- jiffies_to_timespec(expire, &t);
- if (rmtp && copy_to_user(rmtp, &t, sizeof(t)))
- return -EFAULT;
-
- restart = &current_thread_info()->restart_block;
- restart->fn = nanosleep_restart;
- restart->arg0 = jiffies + expire;
- restart->arg1 = (unsigned long) rmtp;
- ret = -ERESTART_RESTARTBLOCK;
- }
- return ret;
-}
-
/*
* sys_sysinfo - fill in sysinfo struct
*/
Index: linux-2.6.14-rc2-rt4/include/linux/time.h
===================================================================
--- linux-2.6.14-rc2-rt4.orig/include/linux/time.h
+++ linux-2.6.14-rc2-rt4/include/linux/time.h
@@ -4,6 +4,7 @@
#include <linux/types.h>
#ifdef __KERNEL__
+#include <linux/calc64.h>
#include <linux/seqlock.h>
#endif
@@ -38,6 +39,11 @@ static __inline__ int timespec_equal(str
return (a->tv_sec == b->tv_sec) && (a->tv_nsec == b->tv_nsec);
}
+#define timespec_valid(ts) \
+(((ts)->tv_sec >= 0) && (((unsigned) (ts)->tv_nsec) < NSEC_PER_SEC))
+
+typedef s64 nsec_t;
+
/* Converts Gregorian date to seconds since 1970-01-01 00:00:00.
* Assumes input in normal date format, i.e. 1980-12-31 23:59:59
* => year=1980, mon=12, day=31, hour=23, min=59, sec=59.
@@ -88,8 +94,7 @@ struct timespec current_kernel_time(void
extern void do_gettimeofday(struct timeval *tv);
extern int do_settimeofday(struct timespec *tv);
extern int do_sys_settimeofday(struct timespec *tv, struct timezone *tz);
-extern void clock_was_set(void); // call when ever the clock is set
-extern int do_posix_clock_monotonic_gettime(struct timespec *tp);
+extern void do_posix_clock_monotonic_gettime(struct timespec *ts);
extern long do_utimes(char __user * filename, struct timeval * times);
struct itimerval;
extern int do_setitimer(int which, struct itimerval *value, struct itimerval *ovalue);
@@ -113,6 +118,40 @@ set_normalized_timespec (struct timespec
ts->tv_nsec = nsec;
}
+static __inline__ nsec_t timespec_to_ns(struct timespec *s)
+{
+ nsec_t res = (nsec_t) s->tv_sec * NSEC_PER_SEC;
+ return res + (nsec_t) s->tv_nsec;
+}
+
+static __inline__ struct timespec ns_to_timespec(nsec_t n)
+{
+ struct timespec ts;
+
+ if (n)
+ ts.tv_sec = div_long_long_rem_signed(n, NSEC_PER_SEC, &ts.tv_nsec);
+ else
+ ts.tv_sec = ts.tv_nsec = 0;
+ return ts;
+}
+
+static __inline__ nsec_t timeval_to_ns(struct timeval *s)
+{
+ nsec_t res = (nsec_t) s->tv_sec * NSEC_PER_SEC;
+ return res + (nsec_t) s->tv_usec * NSEC_PER_USEC;
+}
+
+static __inline__ struct timeval ns_to_timeval(nsec_t n)
+{
+ struct timeval tv;
+ if (n) {
+ tv.tv_sec = div_long_long_rem_signed(n, NSEC_PER_SEC, &tv.tv_usec);
+ tv.tv_usec /= 1000;
+ } else
+ tv.tv_sec = tv.tv_usec = 0;
+ return tv;
+}
+
#endif /* __KERNEL__ */
#define NFDBITS __NFDBITS
@@ -145,23 +184,18 @@ struct itimerval {
/*
* The IDs of the various system clocks (for POSIX.1b interval timers).
*/
-#define CLOCK_REALTIME 0
-#define CLOCK_MONOTONIC 1
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 1
#define CLOCK_PROCESS_CPUTIME_ID 2
#define CLOCK_THREAD_CPUTIME_ID 3
-#define CLOCK_REALTIME_HR 4
-#define CLOCK_MONOTONIC_HR 5
/*
* The IDs of various hardware clocks
*/
-
-
#define CLOCK_SGI_CYCLE 10
#define MAX_CLOCKS 16
-#define CLOCKS_MASK (CLOCK_REALTIME | CLOCK_MONOTONIC | \
- CLOCK_REALTIME_HR | CLOCK_MONOTONIC_HR)
-#define CLOCKS_MONO (CLOCK_MONOTONIC & CLOCK_MONOTONIC_HR)
+#define CLOCKS_MASK (CLOCK_REALTIME | CLOCK_MONOTONIC)
+#define CLOCKS_MONO (CLOCK_MONOTONIC)
/*
* The various flags for setting POSIX.1b interval timers.
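
The time.h hunk above adds timespec_valid(), timespec_to_ns() and
ns_to_timespec(). As a rough illustration of what those helpers compute,
here is a minimal userspace sketch. It is not the kernel code: the sk_
names, the sk_timespec struct and the use of plain C division in place of
div_long_long_rem_signed() are assumptions made so the example compiles on
its own. The validity check also casts to unsigned long rather than
unsigned, so that the full width of tv_nsec is compared; that is a choice
of this sketch, not something the patch does.

```c
#include <assert.h>
#include <stdint.h>

#define SK_NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-ins for the kernel's nsec_t and struct timespec. */
typedef int64_t sk_nsec_t;

struct sk_timespec {
	long long tv_sec;
	long tv_nsec;
};

/*
 * Same idea as the patch's timespec_valid(): after the unsigned cast a
 * negative tv_nsec wraps to a huge value, so a single comparison rejects
 * both negative and too-large nanosecond fields.
 */
static int sk_timespec_valid(const struct sk_timespec *ts)
{
	return ts->tv_sec >= 0 &&
	       (unsigned long)ts->tv_nsec < (unsigned long)SK_NSEC_PER_SEC;
}

/* Seconds and nanoseconds folded into one signed 64-bit nanosecond count. */
static sk_nsec_t sk_timespec_to_ns(const struct sk_timespec *ts)
{
	return (sk_nsec_t)ts->tv_sec * SK_NSEC_PER_SEC + (sk_nsec_t)ts->tv_nsec;
}

/*
 * Inverse conversion. C division truncates toward zero, so a negative
 * input produces a negative (or zero) tv_nsec.
 */
static struct sk_timespec sk_ns_to_timespec(sk_nsec_t n)
{
	struct sk_timespec ts;

	ts.tv_sec = n / SK_NSEC_PER_SEC;
	ts.tv_nsec = (long)(n % SK_NSEC_PER_SEC);
	return ts;
}
```

With truncating division a negative nanosecond count yields a negative
remainder, so a caller that needs tv_nsec in the range 0..NSEC_PER_SEC-1
would have to normalize the result afterwards.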
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-09-28 20:43 [PATCH] ktimers subsystem 2.6.14-rc2-kt5 tglx
@ 2005-09-28 23:59 ` Frank Sorenson
2005-09-29 0:50 ` Frank Sorenson
2005-09-29 1:10 ` john stultz
` (2 subsequent siblings)
3 siblings, 1 reply; 67+ messages in thread
From: Frank Sorenson @ 2005-09-28 23:59 UTC (permalink / raw)
To: tglx
Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck, hch, oleg, zippel,
tim.bird

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

tglx@linutronix.de wrote:
> This is an updated version which contains following changes:
<snip>
> Thanks for review and feedback.
>
> tglx

I get this kernel panic on boot (serial capture) with the latest
git tree (2.6.14-rc2++) plus this version of ktimers:

[4294709.646000] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[4294709.646000] printing eip:
[4294709.646000] c0137578
[4294709.646000] *pde = 00000000
[4294709.646000] Oops: 0000 [#1]
[4294709.646000] PREEMPT
[4294709.646000] Modules linked in: ipw2200 ieee80211 ieee80211_crypt
[4294709.646000] CPU: 0
[4294709.646000] EIP: 0060:[<c0137578>] Not tainted VLI
[4294709.646000] EFLAGS: 00010087 (2.6.14-rc2-fs2)
[4294709.646000] EIP is at enqueue_ktimer+0x168/0x280
[4294709.646000] eax: 00000000 ebx: c051c1b4 ecx: 0000002a edx: 14712508
[4294709.646000] esi: 00000000 edi: f7f42240 ebp: c051c1b8 esp: c05e4f58
[4294709.646000] ds: 007b es: 007b ss: 0068
[4294709.646000] Process swapper (pid: 0, threadinfo=c05e4000 task=c0515bc0)
[4294709.646000] Stack: c051c1b4 00000000 c051c1ac 147a7f90 0000002a 147a7f90 0000002a f7f42240
[4294709.646000] 147a6ff0 0000002a c051c1ac c0137d72 00000005 c05e4000 c051c1b8 c051c1b4
[4294709.646000] c05e4000 c0124380 f7f69a90 147a6ff0 0000002a 00000001 c06136c8 0000000a
[4<0>Kernel panic - not syncing: Fatal exception in interrupt
[4294709.680000]

Frank
- --
Frank Sorenson - KD7TZK
Systems Manager, Computer Science Department
Brigham Young University
frank@tuxrocks.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFDOy5haI0dwg4A47wRAnXcAJ996Yrw2nkjuNThfLCep2GRZ0VjzgCcDIWl
IvIgmrrHG3qB8LNszTPITX8=
=TLMU
-----END PGP SIGNATURE-----

* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-09-28 23:59 ` Frank Sorenson
@ 2005-09-29 0:50 ` Frank Sorenson
2005-09-29 0:56 ` john stultz
0 siblings, 1 reply; 67+ messages in thread
From: Frank Sorenson @ 2005-09-29 0:50 UTC (permalink / raw)
To: Frank Sorenson
Cc: tglx, linux-kernel, mingo, akpm, george, johnstul, paulmck, hch, oleg,
zippel, tim.bird

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Frank Sorenson wrote:
> I get this kernel panic on boot (serial capture) with the latest
> git tree (2.6.14-rc2++) plus this version of ktimers:

Here's a little more information. I've narrowed the panic down to ntpd
startup. Without ntpd, the system seems to run okay, but panics the
moment I startup ntpd.

Hope this helps,

Frank
- --
Frank Sorenson - KD7TZK
Systems Manager, Computer Science Department
Brigham Young University
frank@tuxrocks.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFDOzpSaI0dwg4A47wRAipFAJ0c6/2tif49xVEhDZCH2drgpJXQmACgoY+G
tT9LkOWmS67SyX5Vekrl024=
=f/qY
-----END PGP SIGNATURE-----

* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-09-29 0:50 ` Frank Sorenson
@ 2005-09-29 0:56 ` john stultz
2005-09-29 1:05 ` Frank Sorenson
0 siblings, 1 reply; 67+ messages in thread
From: john stultz @ 2005-09-29 0:56 UTC (permalink / raw)
To: Frank Sorenson
Cc: tglx, linux-kernel, mingo, akpm, george, paulmck, hch, oleg, zippel,
tim.bird

On Wed, 2005-09-28 at 18:50 -0600, Frank Sorenson wrote:
> Frank Sorenson wrote:
> > I get this kernel panic on boot (serial capture) with the latest
> > git tree (2.6.14-rc2++) plus this version of ktimers:
>
> Here's a little more information. I've narrowed the panic down to ntpd
> startup. Without ntpd, the system seems to run okay, but panics the
> moment I startup ntpd.

Are you just testing the ktimers patch or the full set of patches Thomas
is working with (including my code)?

thanks
-john

* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-09-29 0:56 ` john stultz
@ 2005-09-29 1:05 ` Frank Sorenson
0 siblings, 0 replies; 67+ messages in thread
From: Frank Sorenson @ 2005-09-29 1:05 UTC (permalink / raw)
To: john stultz
Cc: tglx, linux-kernel, mingo, akpm, george, paulmck, hch, oleg, zippel,
tim.bird

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

john stultz wrote:
> On Wed, 2005-09-28 at 18:50 -0600, Frank Sorenson wrote:
>
>>Frank Sorenson wrote:
>>
>>>I get this kernel panic on boot (serial capture) with the latest
>>>git tree (2.6.14-rc2++) plus this version of ktimers:
>>
>>Here's a little more information. I've narrowed the panic down to ntpd
>>startup. Without ntpd, the system seems to run okay, but panics the
>>moment I startup ntpd.
>
>
> Are you just testing the ktimers patch or the full set of patches Thomas
> is working with (including my code)?
>
> thanks
> -john

After first testing with other patches, I verified that the panic occurs
without any other patches involved. So, I am just testing this
particular ktimers patch, without any others. Am I correct in my
understanding that this patch is standalone?

Frank
- --
Frank Sorenson - KD7TZK
Systems Manager, Computer Science Department
Brigham Young University
frank@tuxrocks.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFDOz3faI0dwg4A47wRAn+/AKDsu/lRzUhbln8pNoRpfZ2V45D0NgCfQLHF
lK6+uXzWFQQhp8SvqBxPw1M=
=B9oy
-----END PGP SIGNATURE-----

* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-09-28 20:43 [PATCH] ktimers subsystem 2.6.14-rc2-kt5 tglx
2005-09-28 23:59 ` Frank Sorenson
@ 2005-09-29 1:10 ` john stultz
2005-09-29 6:53 ` Thomas Gleixner
2005-09-29 19:57 ` George Anzinger
2005-10-01 1:03 ` Roman Zippel
3 siblings, 1 reply; 67+ messages in thread
From: john stultz @ 2005-09-29 1:10 UTC (permalink / raw)
To: tglx
Cc: linux-kernel, mingo, akpm, george, paulmck, hch, oleg, zippel,
tim.bird

On Wed, 2005-09-28 at 22:43 +0200, tglx@linutronix.de wrote:
> +static int enqueue_ktimer(struct ktimer *timer, struct ktimer_base *base,
> + ktime_t *tim, int mode)
> +{
> + struct rb_node **link = &base->active.rb_node;
> + struct rb_node *parent = NULL;
> + struct ktimer *entry;
> + struct list_head *prev = &base->pending;
> + ktime_t now;
> +
> + /* Get current time */
> + now = base->get_time();
> +
> + /* Timer expiry mode */
> + switch (mode & ~KTIMER_NOCHECK) {
> + case KTIMER_ABS:
> + timer->expires = *tim;
> + break;
> + case KTIMER_REL:
> + timer->expires = ktime_add(now, *tim);
> + break;
> + case KTIMER_INCR:
> + timer->expires = ktime_add(timer->expires, *tim);
> + break;

...

> +static inline void do_remove_ktimer(struct ktimer *timer,
> + struct ktimer_base *base, int rearm)
> +{
> + list_del(&timer->list);
> + rb_erase(&timer->node, &base->active);
> + timer->node.rb_parent = KTIMER_POISON;
> + timer->status = KTIMER_INACTIVE;
> + base->count--;
> + BUG_ON(base->count < 0);
> + /* Auto rearm the timer ? */
> + if (rearm && ktime_cmp_val(timer->interval, !=, KTIME_ZERO))
> + enqueue_ktimer(timer, base, NULL, KTIMER_REARM);
> +}

There's a couple of places like this where you pass NULL as the ktime_t
pointer tim to enqueue_ktimer(). However in enqueue_ktimer, you
dereference tim in a few spots w/o checking for NULL.

I'm guessing this is what Frank is seeing.

thanks
-john

* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-09-29 1:10 ` john stultz
@ 2005-09-29 6:53 ` Thomas Gleixner
2005-09-30 15:58 ` Frank Sorenson
0 siblings, 1 reply; 67+ messages in thread
From: Thomas Gleixner @ 2005-09-29 6:53 UTC (permalink / raw)
To: john stultz
Cc: linux-kernel, mingo, akpm, george, paulmck, hch, oleg, zippel,
tim.bird

On Wed, 2005-09-28 at 18:10 -0700, john stultz wrote:
> > + /* Auto rearm the timer ? */
> > + if (rearm && ktime_cmp_val(timer->interval, !=, KTIME_ZERO))
> > + enqueue_ktimer(timer, base, NULL, KTIMER_REARM);
> > +}
>
>
> There's a couple of places like this where you pass NULL as the ktime_t
> pointer tim to enqueue_ktimer(). However in enqueue_ktimer, you
> dereference tim in a few spots w/o checking for NULL.
>

The KTIMER_REARM case is the broken spot. I fixed this already as it was
oopsing here to, but somehow I messed up with quilt.

tglx

Index: linux-2.6.14-rc2-rt4/kernel/ktimers.c
===================================================================
--- linux-2.6.14-rc2-rt4.orig/kernel/ktimers.c
+++ linux-2.6.14-rc2-rt4/kernel/ktimers.c
@@ -242,7 +242,7 @@ static int enqueue_ktimer(struct ktimer
goto nocheck;
case KTIMER_REARM:
while ktime_cmp(timer->expires, <= , now) {
- timer->expires = ktime_add(timer->expires, *tim);
+ timer->expires = ktime_add(timer->expires, timer->interval);
timer->overrun++;
}
goto nocheck;

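
The fix above makes the KTIMER_REARM case advance the expiry by the
timer's own interval instead of dereferencing the tim argument, which the
rearm caller passes as NULL. A minimal userspace sketch of that idea
follows. The sk_ types, the plain int64_t nanosecond values and the -1
error returns are assumptions for illustration only; the real
enqueue_ktimer() works on ktime_t and goes on to queue the timer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the KTIMER_* expiry modes. */
enum sk_mode { SK_ABS, SK_REL, SK_INCR, SK_REARM };

struct sk_timer {
	int64_t expires;	/* absolute expiry, in ns */
	int64_t interval;	/* rearm period in ns, 0 for one-shot */
	int overrun;		/* periods skipped while catching up */
};

/*
 * Compute the new expiry for the given mode.
 * Returns 0 on success, -1 if the mode needs *tim but tim is NULL, or if
 * a rearm is requested for a timer without a positive interval.
 */
static int sk_set_expiry(struct sk_timer *t, int64_t now,
			 const int64_t *tim, enum sk_mode mode)
{
	if (mode == SK_REARM) {
		/* Rearm never looks at tim; it uses the stored interval. */
		if (t->interval <= 0)
			return -1;
		while (t->expires <= now) {
			t->expires += t->interval;
			t->overrun++;
		}
		return 0;
	}

	/* Every other mode reads *tim, so reject NULL up front. */
	if (tim == NULL)
		return -1;

	switch (mode) {
	case SK_ABS:
		t->expires = *tim;
		break;
	case SK_REL:
		t->expires = now + *tim;
		break;
	case SK_INCR:
		t->expires += *tim;
		break;
	default:
		return -1;
	}
	return 0;
}
```

For example, a timer with expires = 100 and interval = 30 rearmed at
now = 175 steps through 130, 160 and 190, so it ends with expires = 190
and three overruns counted. The explicit NULL check is an addition of this
sketch; the patch relies on callers passing a valid pointer for the
non-rearm modes.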
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-09-29 6:53 ` Thomas Gleixner
@ 2005-09-30 15:58 ` Frank Sorenson
0 siblings, 0 replies; 67+ messages in thread
From: Frank Sorenson @ 2005-09-30 15:58 UTC (permalink / raw)
To: tglx
Cc: john stultz, linux-kernel, mingo, akpm, george, paulmck, hch, oleg,
zippel, tim.bird

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Thomas Gleixner wrote:
> The KTIMER_REARM case is the broken spot. I fixed this already as it was
> oopsing here to, but somehow I messed up with quilt.
>
> tglx

This does indeed fix the panic. Thanks.

Frank
- --
Frank Sorenson - KD7TZK
Systems Manager, Computer Science Department
Brigham Young University
frank@tuxrocks.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFDPWCKaI0dwg4A47wRAmjAAJ0XarfSYFyqAvGKi+uHbXZLg4+fEwCgso39
5hdrQfgzwMDdT9zM+4GkwLk=
=UoVd
-----END PGP SIGNATURE-----

* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-09-28 20:43 [PATCH] ktimers subsystem 2.6.14-rc2-kt5 tglx
2005-09-28 23:59 ` Frank Sorenson
2005-09-29 1:10 ` john stultz
@ 2005-09-29 19:57 ` George Anzinger
2005-10-01 1:03 ` Roman Zippel
3 siblings, 0 replies; 67+ messages in thread
From: George Anzinger @ 2005-09-29 19:57 UTC (permalink / raw)
To: tglx
Cc: linux-kernel, mingo, akpm, johnstul, paulmck, hch, oleg, zippel,
tim.bird

Am I the only one finding "=20\n" and other corruption in this patch?

George
--

tglx@linutronix.de wrote:
> This is an updated version which contains following changes:
>
> - Selectable time storage format: union/struct based, scalar (64bit)
> - Fixed an endless loop in forward_posix_timer (George Anzinger)
> - Fixed a wrong sizeof(x) (George Anzinger)
> - Fixed build problems for non x86 architectures
>
> Roman pointed out that the penalty for some architectures
> would be quite big when using the nsec_t (64bit) scalar time
> storage format. After a long discussion and some more detailed
> tests especially on ARM it turned out that the scalar format
> is unfortunately not suitable everywhere. The tradeoff between
> performance and cleanliness seems too big for some architectures.
>
> After several rounds of functional conversions and
> cleanups an acceptable compromise between cleanliness and
> storage format flexibility was found.
>
> For 64bit architectures the scalar representation is definitely
> a win and therefore enabled unconditionally. The code defaults to
> the union/struct based implementation on 32bit archs, but can be
> switched to the scalar storage format by setting
> CONFIG_KTIME_SCALAR=y if there is a benefit for the particular
> architecture. The union/struct magic has an advantage over the
> struct timespec based format which I considered to use first. It
> produces better and denser code for most architectures and does no
> harm anywhere else.
> This might change with improvements of
> compilers, but then it requires just a replacement of the related
> macros / inlines.
>
> The code is not harder to understand than the previous
> open coded scalar storage based implementation.
>
> The correctness was verified with the posix timer tests from
> the HRT project on the forward ported ktimers based high
> resolution proof of concept implementation.
> For those interested in this topic the patchseries is available
> at http://www.tglx.de/private/tglx/ktimers/patch-2.6.14-rc2-kt5.patches.tar.bz2
>
>
> Thanks for review and feedback.
>
> tglx
>
>
> ktimers separate the "timer API" from the "timeout API".
> ktimers are used for:
> - nanosleep
> - posixtimers
> - itimers
>
>
> The patch contains the base implementation of ktimers and the
> conversion of nanosleep, posixtimers and itimers to ktimer users.
>
> The patch does not require other changes to the Linux time(r) core
> system.
>
> The implementation was done with following constraints in mind:
>
> - Not bound to jiffies
> - Multiple time sources
> - Per CPU timer queues
> - Simplification of absolute CLOCK_REALTIME posix timers
> - High resolution timer aware
> - Allows the timeout API to reschedule the next event
>   (for tickless systems)
>
> Ktimers enqueue the timers into a time sorted list, which is implemented
> with a rbtree, which is efficient and already used in other performance
> critical parts of the kernel. This is a bit slower than the timer wheel,
> but due to the fact that the vast majority of timers is actually
> expiring it has to be weighed against the cascading penalty.
>
> The code supports multiple time sources. Currently implemented are
> CLOCK_REALTIME and CLOCK_MONOTONIC. They provide separate timer queues
> and support functions.
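
The union/struct storage format described in the quoted text can be sketched
in user space as follows. This is a minimal illustration, not the patch
itself: the demo_ names are hypothetical, the byte-order test assumes GCC or
Clang (which define __BYTE_ORDER__), and only non-negative values are
exercised. It shows why one 64-bit add plus a single carry check is enough,
and why ordering needs only one 64-bit comparison when the seconds sit in
the upper half of the word.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000

/*
 * Illustrative copy of the union/struct idea: seconds in the upper half
 * and nanoseconds in the lower half of one 64-bit word.
 * Assumption: the compiler defines __BYTE_ORDER__ (GCC, Clang).
 */
typedef union {
	int64_t tv64;
	struct {
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
		int32_t sec, nsec;
#else
		int32_t nsec, sec;
#endif
	} tv;
} demo_ktime;

/* Build a value from an already normalized (sec, nsec) pair. */
static demo_ktime demo_ktime_set(int32_t sec, int32_t nsec)
{
	demo_ktime kt;

	assert(nsec >= 0 && nsec < DEMO_NSEC_PER_SEC);
	kt.tv.sec = sec;
	kt.tv.nsec = nsec;
	return kt;
}

/*
 * One 64-bit add, then at most one carry from nsec into sec. Two
 * normalized nsec fields sum to less than 2^31, so the low half never
 * overflows into the seconds.
 */
static demo_ktime demo_ktime_add(demo_ktime a, demo_ktime b)
{
	demo_ktime res;

	res.tv64 = a.tv64 + b.tv64;
	if (res.tv.nsec >= DEMO_NSEC_PER_SEC) {
		res.tv.nsec -= DEMO_NSEC_PER_SEC;
		res.tv.sec++;
	}
	return res;
}

/* Ordering is a single 64-bit comparison because sec is the high half. */
static int demo_ktime_before(demo_ktime a, demo_ktime b)
{
	return a.tv64 < b.tv64;
}
```

Adding (1 s, 700000000 ns) and (2 s, 600000000 ns) this way gives
(4 s, 300000000 ns): the raw 64-bit sum holds 3 s and 1300000000 ns, and the
carry check moves one second across.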
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > Signed-off-by: Ingo Molnar <mingo@elte.hu> > > --- > Index: linux-2.6.14-rc2-rt4/include/linux/calc64.h > =================================================================== > --- /dev/null > +++ linux-2.6.14-rc2-rt4/include/linux/calc64.h > @@ -0,0 +1,31 @@ > +#ifndef _linux_CALC64_H > +#define _linux_CALC64_H > + > +#include <linux/types.h> > +#include <asm/div64.h> > + > +#ifndef div_long_long_rem > +#define div_long_long_rem(dividend,divisor,remainder) \ > +({ \ > + u64 result = dividend; \ > + *remainder = do_div(result,divisor); \ > + result; \ > +}) > +#endif > + > +static inline long div_long_long_rem_signed(long long dividend, > + long divisor, > + long *remainder) > +{ > + long res; > + > + if (unlikely(dividend < 0)) { > + res = -div_long_long_rem(-dividend, divisor, remainder); > + *remainder = -(*remainder); > + } else { > + res = div_long_long_rem(dividend, divisor, remainder); > + } > + return res; > +} > + > +#endif > Index: linux-2.6.14-rc2-rt4/include/linux/jiffies.h > =================================================================== > --- linux-2.6.14-rc2-rt4.orig/include/linux/jiffies.h > +++ linux-2.6.14-rc2-rt4/include/linux/jiffies.h > @@ -1,21 +1,12 @@ > #ifndef _LINUX_JIFFIES_H > #define _LINUX_JIFFIES_H > > +#include <linux/calc64.h> > #include <linux/kernel.h> > #include <linux/types.h> > #include <linux/time.h> > #include <linux/timex.h> > #include <asm/param.h> /* for HZ */ > -#include <asm/div64.h> > - > -#ifndef div_long_long_rem > -#define div_long_long_rem(dividend,divisor,remainder) \ > -({ \ > - u64 result = dividend; \ > - *remainder = do_div(result,divisor); \ > - result; \ > -}) > -#endif > > /* > * The following defines establish the engineering parameters of the PLL > Index: linux-2.6.14-rc2-rt4/fs/exec.c > =================================================================== > --- linux-2.6.14-rc2-rt4.orig/fs/exec.c > +++ linux-2.6.14-rc2-rt4/fs/exec.c > @@ -645,9 
+645,10 @@ static inline int de_thread(struct task_ > * synchronize with any firing (by calling del_timer_sync) > * before we can safely let the old group leader die. > */ > - sig->real_timer.data = (unsigned long)current; > - if (del_timer_sync(&sig->real_timer)) > - add_timer(&sig->real_timer); > + sig->real_timer.data = current; > + if (stop_ktimer(&sig->real_timer)) > + start_ktimer(&sig->real_timer, NULL, > + KTIMER_RESTART|KTIMER_NOCHECK); > } > while (atomic_read(&sig->count) > count) { > sig->group_exit_task = current; > @@ -659,7 +660,7 @@ static inline int de_thread(struct task_ > } > sig->group_exit_task = NULL; > sig->notify_count = 0; > - sig->real_timer.data = (unsigned long)current; > + sig->real_timer.data = current; > spin_unlock_irq(lock); > > /* > Index: linux-2.6.14-rc2-rt4/fs/proc/array.c > =================================================================== > --- linux-2.6.14-rc2-rt4.orig/fs/proc/array.c > +++ linux-2.6.14-rc2-rt4/fs/proc/array.c > @@ -330,7 +330,7 @@ static int do_task_stat(struct task_stru > unsigned long min_flt = 0, maj_flt = 0; > cputime_t cutime, cstime, utime, stime; > unsigned long rsslim = 0; > - unsigned long it_real_value = 0; > + DEFINE_KTIME(it_real_value); > struct task_struct *t; > char tcomm[sizeof(task->comm)]; > > @@ -386,7 +386,7 @@ static int do_task_stat(struct task_stru > utime = cputime_add(utime, task->signal->utime); > stime = cputime_add(stime, task->signal->stime); > } > - it_real_value = task->signal->it_real_value; > + it_real_value = task->signal->real_timer.expires; > } > ppid = pid_alive(task) ? task->group_leader->real_parent->tgid : 0; > read_unlock(&tasklist_lock); > @@ -435,7 +435,7 @@ static int do_task_stat(struct task_stru > priority, > nice, > num_threads, > - jiffies_to_clock_t(it_real_value), > + (clock_t) ktime_to_clock_t(it_real_value), > start_time, > vsize, > mm ? 
get_mm_counter(mm, rss) : 0, /* you might want to shift this left 3 */ > Index: linux-2.6.14-rc2-rt4/include/linux/ktimer.h > =================================================================== > --- /dev/null > +++ linux-2.6.14-rc2-rt4/include/linux/ktimer.h > @@ -0,0 +1,335 @@ > +#ifndef _LINUX_KTIMER_H > +#define _LINUX_KTIMER_H > + > +#include <linux/init.h> > +#include <linux/list.h> > +#include <linux/rbtree.h> > +#include <linux/time.h> > +#include <linux/wait.h> > + > +/* Timer API */ > + > +/* > + * Select the ktime_t data type > + */ > +#if defined(CONFIG_KTIME_SCALAR) || (BITS_PER_LONG == 64) > + #define KTIME_IS_SCALAR > +#endif > + > +#ifndef KTIME_IS_SCALAR > +typedef union { > + s64 tv64; > + struct { > +#ifdef __BIG_ENDIAN > + s32 sec, nsec; > +#else > + s32 nsec, sec; > +#endif > + } tv; > +} ktime_t; > + > +#else > + > +typedef s64 ktime_t; > + > +#endif > + > +struct ktimer_base; > + > +/* > + * Timer structure must be initialized by init_ktimer_xxx ! > + */ > +struct ktimer { > + struct rb_node node; > + struct list_head list; > + ktime_t expires; > + ktime_t expired; > + ktime_t interval; > + int overrun; > + unsigned long status; > + void (*function)(void *); > + void *data; > + struct ktimer_base *base; > +}; > + > +/* > + * Timer base struct > + */ > +struct ktimer_base { > + int index; > + char *name; > + spinlock_t lock; > + struct rb_root active; > + struct list_head pending; > + int count; > + unsigned long resolution; > + ktime_t (*get_time)(void); > + struct ktimer *running_timer; > + wait_queue_head_t wait_for_running_timer; > +}; > + > +/* > + * Values for the mode argument of xxx_ktimer functions > + */ > +enum > +{ > + KTIMER_NOREARM, /* Internal value */ > + KTIMER_ABS, /* Time value is absolute */ > + KTIMER_REL, /* Time value is relativ to now */ > + KTIMER_INCR, /* Time value is relativ to previous expiry time */ > + KTIMER_FORWARD, /* Timer is rearmed with value. 
Overruns are accounted */
> +	KTIMER_REARM, /* Timer is rearmed with interval. Overruns are accounted */
> +	KTIMER_RESTART /* Timer is restarted with the stored expiry value */
> +};
> +
> +/* The timer states */
> +enum
> +{
> +	KTIMER_INACTIVE,
> +	KTIMER_PENDING,
> +	KTIMER_EXPIRED,
> +	KTIMER_EXPIRED_NOQUEUE,
> +};
> +
> +/* Expiry must not be checked when the timer is started */
> +#define KTIMER_NOCHECK 0x10000
> +
> +#define KTIMER_POISON ((void *) 0x00100101)
> +
> +#define KTIME_ZERO 0LL
> +
> +#define ktimer_active(t) ((t)->status != KTIMER_INACTIVE)
> +#define ktimer_before(t1, t2) (ktime_cmp((t1)->expires, <, (t2)->expires))
> +
> +#ifndef KTIME_IS_SCALAR
> +/*
> + * Helper macros/inlines to get the math with ktime_t right. Uurgh, that's
> + * ugly as hell, but for performance sake we have to use this. The
> + * nsec_t based code was nice and simple. :(
> + *
> + * Be careful when using this stuff. It blows up on you if you don't
> + * get the weirdness right.
> + *
> + * Be especially aware, that negative values are represented in the
> + * form:
> + * tv.sec < 0 and 0 <= tv.nsec < NSEC_PER_SEC
> + *
> + */
> +#define DEFINE_KTIME(k) ktime_t k = {.tv64 = 0LL }
> +
> +#define ktime_cmp(a,op,b) ((a).tv64 op (b).tv64)
> +#define ktime_cmp_val(a, op, b) ((a).tv64 op b)
> +
> +#define ktime_set(s,n) \
> +({ \
> +	ktime_t __kt; \
> +	__kt.tv.sec = s; \
> +	__kt.tv.nsec = n; \
> +	__kt; \
> +})
> +
> +#define ktime_set_zero(k) k.tv64 = 0LL
> +
> +#define ktime_set_low_high(l,h) ktime_set(h,l)
> +
> +#define ktime_get_low(t) (t).tv.nsec
> +#define ktime_get_high(t) (t).tv.sec
> +
> +static inline ktime_t ktime_set_normalized(long sec, long nsec)
> +{
> +	ktime_t res;
> +
> +	while (nsec < 0) {
> +		nsec += NSEC_PER_SEC;
> +		sec--;
> +	}
> +	while (nsec >= NSEC_PER_SEC) {
> +		nsec -= NSEC_PER_SEC;
> +		sec++;
> +	}
> +
> +	res.tv.sec = sec;
> +	res.tv.nsec = nsec;
> +	return res;
> +}
> +
> +static inline ktime_t ktime_sub(ktime_t a, ktime_t b)
> +{
> +	ktime_t
res; > + > + res.tv64 = a.tv64 - b.tv64; > + if (res.tv.nsec < 0) > + res.tv.nsec += NSEC_PER_SEC; > + > + return res; > +} > + > +static inline ktime_t ktime_add(ktime_t a, ktime_t b) > +{ > + ktime_t res; > + > + res.tv64 = a.tv64 + b.tv64; > + if (res.tv.nsec >= NSEC_PER_SEC) { > + res.tv.nsec -= NSEC_PER_SEC; > + res.tv.sec++; > + } > + return res; > +} > + > +static inline ktime_t ktime_add_ns(ktime_t a, u64 nsec) > +{ > + ktime_t tmp; > + > + if (likely(nsec < NSEC_PER_SEC)) { > + tmp.tv64 = nsec; > + } else { > + unsigned long rem; > + rem = do_div(nsec, NSEC_PER_SEC); > + tmp = ktime_set((long)nsec, rem); > + } > + return ktime_add(a,tmp); > +} > + > +#define timespec_to_ktime(ts) \ > +({ \ > + ktime_t __kt; \ > + struct timespec __ts = (ts); \ > + __kt.tv.sec = (s32)__ts.tv_sec; \ > + __kt.tv.nsec = (s32)__ts.tv_nsec; \ > + __kt; \ > +}) > + > +#define ktime_to_timespec(kt) \ > +({ \ > + struct timespec __ts; \ > + ktime_t __kt = (kt); \ > + __ts.tv_sec = (time_t)__kt.tv.sec; \ > + __ts.tv_nsec = (long)__kt.tv.nsec; \ > + __ts; \ > +}) > + > +#define ktime_to_timeval(kt) \ > +({ \ > + struct timeval __tv; \ > + ktime_t __kt = (kt); \ > + __tv.tv_sec = (time_t)__kt.tv.sec; \ > + __tv.tv_usec = (long)(__kt.tv.nsec / NSEC_PER_USEC); \ > + __tv; \ > +}) > + > +#define ktime_to_clock_t(kt) \ > +({ \ > + ktime_t __kt = (kt); \ > + u64 nsecs = (u64) __kt.tv.sec * NSEC_PER_SEC; \ > + nsec_to_clock_t(nsecs + (u64) __kt.tv.nsec); \ > +}) > + > +#define ktime_to_ns(kt) \ > +({ \ > + ktime_t __kt = (kt); \ > + (((u64)__kt.tv.sec * NSEC_PER_SEC) + (u64)__kt.tv.nsec);\ > +}) > + > +#else > + > +/* ktime_t macros when using a 64bit variable */ > + > +#define DEFINE_KTIME(kt) ktime_t kt = 0LL > + > +#define ktime_cmp(a,op,b) ((a) op (b)) > +#define ktime_cmp_val(a,op,b) ((a) op b) > + > +#define ktime_set(s,n) (((s64) s * NSEC_PER_SEC) + (s64)n) > +#define ktime_set_zero(kt) kt = 0LL > + > +#define ktime_set_low_high(l,h) ((s64)((u64)l) | (((s64) h) << 32)) > + > +#define 
ktime_get_low(t) ((t) & 0xFFFFFFFFLL) > +#define ktime_get_high(t) ((t) >> 32) > + > +#define ktime_sub(a,b) ((a) - (b)) > +#define ktime_add(a,b) ((a) + (b)) > +#define ktime_add_ns(a,b) ((a) + (b)) > + > +#define timespec_to_ktime(ts) ktime_set(ts.tv_sec, ts.tv_nsec) > + > +#define ktime_to_timespec(kt) ns_to_timespec(kt) > +#define ktime_to_timeval(kt) ns_to_timeval(kt) > + > +#define ktime_to_clock_t(kt) nsec_to_clock_t(kt) > + > +#define ktime_to_ns(kt) (kt) > + > +#define ktime_set_normalized(s,n) ktime_set(s,n) > + > +#endif > + > +/* Exported functions */ > +extern void fastcall init_ktimer_real(struct ktimer *timer); > +extern void fastcall init_ktimer_mono(struct ktimer *timer); > +extern int modify_ktimer(struct ktimer *timer, ktime_t *tim, int mode); > +extern int start_ktimer(struct ktimer *timer, ktime_t *tim, int mode); > +extern int try_to_stop_ktimer(struct ktimer *timer); > +extern int stop_ktimer(struct ktimer *timer); > +extern ktime_t get_remtime_ktimer(struct ktimer *timer, long fake); > +extern ktime_t get_expiry_ktimer(struct ktimer *timer, ktime_t *now); > +extern void __init init_ktimers(void); > + > +/* Conversion functions with rounding based on resolution */ > +extern ktime_t ktimer_convert_timeval(struct ktimer *timer, struct timeval *tv); > +extern ktime_t ktimer_convert_timespec(struct ktimer *timer, struct timespec *ts); > + > +/* Posix timers current quirks */ > +extern int get_ktimer_mono_res(clockid_t which_clock, struct timespec *tp); > +extern int get_ktimer_real_res(clockid_t which_clock, struct timespec *tp); > + > +/* nanosleep functions */ > +long ktimer_nanosleep_mono(struct timespec *rqtp, struct timespec __user *rmtp, int mode); > +long ktimer_nanosleep_real(struct timespec *rqtp, struct timespec __user *rmtp, int mode); > + > +#if defined(CONFIG_SMP) > +extern void wait_for_ktimer(struct ktimer *timer); > +#else > +#define wait_for_ktimer(t) do {} while (0) > +#endif > + > +#define KTIME_REALTIME_RES (NSEC_PER_SEC/HZ) > 
+#define KTIME_MONOTONIC_RES (NSEC_PER_SEC/HZ) > + > +static inline void get_ktime_mono_ts(struct timespec *ts) > +{ > + unsigned long seq; > + struct timespec tomono; > + do { > + seq = read_seqbegin(&xtime_lock); > + getnstimeofday(ts); > + tomono = wall_to_monotonic; > + } while (read_seqretry(&xtime_lock, seq)); > + > + > + set_normalized_timespec(ts, ts->tv_sec + tomono.tv_sec, > + ts->tv_nsec + tomono.tv_nsec); > + > +} > + > +static inline ktime_t do_get_ktime_mono(void) > +{ > + struct timespec now; > + > + get_ktime_mono_ts(&now); > + return timespec_to_ktime(now); > +} > + > +#define get_ktime_real_ts(ts) getnstimeofday(ts) > +static inline ktime_t do_get_ktime_real(void) > +{ > + struct timespec now; > + > + getnstimeofday(&now); > + return timespec_to_ktime(now); > +} > + > +#define clock_was_set() do { } while (0) > +extern void run_ktimer_queues(void); > + > +#endif > Index: linux-2.6.14-rc2-rt4/include/linux/posix-timers.h > =================================================================== > --- linux-2.6.14-rc2-rt4.orig/include/linux/posix-timers.h > +++ linux-2.6.14-rc2-rt4/include/linux/posix-timers.h > @@ -51,10 +51,9 @@ struct k_itimer { > struct sigqueue *sigq; /* signal queue entry. 
*/ > union { > struct { > - struct timer_list timer; > - struct list_head abs_timer_entry; /* clock abs_timer_list */ > - struct timespec wall_to_prev; /* wall_to_monotonic used when set */ > - unsigned long incr; /* interval in jiffies */ > + struct ktimer timer; > + ktime_t incr; > + int overrun; > } real; > struct cpu_timer_list cpu; > struct { > @@ -66,10 +65,6 @@ struct k_itimer { > } it; > }; > > -struct k_clock_abs { > - struct list_head list; > - spinlock_t lock; > -}; > struct k_clock { > int res; /* in nano seconds */ > int (*clock_getres) (clockid_t which_clock, struct timespec *tp); > @@ -77,7 +72,7 @@ struct k_clock { > int (*clock_set) (clockid_t which_clock, struct timespec * tp); > int (*clock_get) (clockid_t which_clock, struct timespec * tp); > int (*timer_create) (struct k_itimer *timer); > - int (*nsleep) (clockid_t which_clock, int flags, struct timespec *); > + int (*nsleep) (clockid_t which_clock, int flags, struct timespec *, struct timespec __user *); > int (*timer_set) (struct k_itimer * timr, int flags, > struct itimerspec * new_setting, > struct itimerspec * old_setting); > @@ -91,37 +86,104 @@ void register_posix_clock(clockid_t cloc > > /* Error handlers for timer_create, nanosleep and settime */ > int do_posix_clock_notimer_create(struct k_itimer *timer); > -int do_posix_clock_nonanosleep(clockid_t, int flags, struct timespec *); > +int do_posix_clock_nonanosleep(clockid_t, int flags, struct timespec *, struct timespec __user *); > int do_posix_clock_nosettime(clockid_t, struct timespec *tp); > > /* function to call to trigger timer event */ > int posix_timer_event(struct k_itimer *timr, int si_private); > > -struct now_struct { > - unsigned long jiffies; > -}; > - > -#define posix_get_now(now) (now)->jiffies = jiffies; > -#define posix_time_before(timer, now) \ > - time_before((timer)->expires, (now)->jiffies) > - > -#define posix_bump_timer(timr, now) \ > - do { \ > - long delta, orun; \ > - delta = now.jiffies - 
(timr)->it.real.timer.expires; \ > - if (delta >= 0) { \ > - orun = 1 + (delta / (timr)->it.real.incr); \ > - (timr)->it.real.timer.expires += \ > - orun * (timr)->it.real.incr; \ > - (timr)->it_overrun += orun; \ > - } \ > - }while (0) > +#if (BITS_PER_LONG < 64) > +static inline ktime_t forward_posix_timer(struct k_itimer *t, ktime_t now) > +{ > + ktime_t delta = ktime_sub(now, t->it.real.timer.expires); > + unsigned long orun = 1; > + > + if (ktime_cmp_val(delta, <, KTIME_ZERO)) > + goto out; > + > + if (unlikely(ktime_cmp(delta, >, t->it.real.incr))) { > + > + int sft = 0; > + u64 div, dclc, inc, dns; > + > + dclc = dns = ktime_to_ns(delta); > + div = inc = ktime_to_ns(t->it.real.incr); > + /* Make sure the divisor is less than 2^32 */ > + while(div >> 32) { > + sft++; > + div >>= 1; > + } > + dclc >>= sft; > + do_div(dclc, (unsigned long) div); > + orun = (unsigned long) dclc; > + if (likely(!(inc >> 32))) > + dclc *= (unsigned long) inc; > + else > + dclc *= inc; > + t->it.real.timer.expires = ktime_add_ns(t->it.real.timer.expires, > + dclc); > + } else { > + t->it.real.timer.expires = ktime_add(t->it.real.timer.expires, > + t->it.real.incr); > + } > + /* > + * Here is the correction for exact. Also covers delta == incr > + * which is the else clause above. 
> + */ > + if (ktime_cmp(t->it.real.timer.expires, <=, now)) { > + t->it.real.timer.expires = ktime_add(t->it.real.timer.expires, > + t->it.real.incr); > + orun++; > + } > + t->it_overrun += orun; > + > + out: > + return ktime_sub(t->it.real.timer.expires, now); > +} > +#else > +static inline ktime_t forward_posix_timer(struct k_itimer *t, ktime_t now) > +{ > + ktime_t delta = ktime_sub(now, t->it.real.timer.expires); > + unsigned long orun = 1; > + > + if (ktime_cmp_val(delta, <, KTIME_ZERO)) > + goto out; > + > + if (unlikely(ktime_cmp(delta, >, t->it.real.incr))) { > + > + u64 dns, inc; > + > + dns = ktime_to_ns(delta); > + inc = ktime_to_ns(t->it.real.incr); > + > + orun = dns / inc; > + t->it.real.timer.expires = ktime_add_ns(t->it.real.timer.expires, > + orun * inc); > + } else { > + t->it.real.timer.expires = ktime_add(t->it.real.timer.expires, > + t->it.real.incr); > + } > + /* > + * Here is the correction for exact. Also covers delta == incr > + * which is the else clause above. 
> + */ > + if (ktime_cmp(t->it.real.timer.expires, <=, now)) { > + t->it.real.timer.expires = ktime_add(t->it.real.timer.expires, > + t->it.real.incr); > + orun++; > + } > + t->it_overrun += orun; > + out: > + return ktime_sub(t->it.real.timer.expires, now); > +} > +#endif > > int posix_cpu_clock_getres(clockid_t which_clock, struct timespec *); > int posix_cpu_clock_get(clockid_t which_clock, struct timespec *); > int posix_cpu_clock_set(clockid_t which_clock, const struct timespec *tp); > int posix_cpu_timer_create(struct k_itimer *); > -int posix_cpu_nsleep(clockid_t, int, struct timespec *); > +int posix_cpu_nsleep(clockid_t, int, struct timespec *, > + struct timespec __user *); > int posix_cpu_timer_set(struct k_itimer *, int, > struct itimerspec *, struct itimerspec *); > int posix_cpu_timer_del(struct k_itimer *); > Index: linux-2.6.14-rc2-rt4/include/linux/sched.h > =================================================================== > --- linux-2.6.14-rc2-rt4.orig/include/linux/sched.h > +++ linux-2.6.14-rc2-rt4/include/linux/sched.h > @@ -104,6 +104,7 @@ extern unsigned long nr_iowait(void); > #include <linux/param.h> > #include <linux/resource.h> > #include <linux/timer.h> > +#include <linux/ktimer.h> > > #include <asm/processor.h> > > @@ -346,8 +347,7 @@ struct signal_struct { > struct list_head posix_timers; > > /* ITIMER_REAL timer for the process */ > - struct timer_list real_timer; > - unsigned long it_real_value, it_real_incr; > + struct ktimer real_timer; > > /* ITIMER_PROF and ITIMER_VIRTUAL timers for the process */ > cputime_t it_prof_expires, it_virt_expires; > Index: linux-2.6.14-rc2-rt4/include/linux/timer.h > =================================================================== > --- linux-2.6.14-rc2-rt4.orig/include/linux/timer.h > +++ linux-2.6.14-rc2-rt4/include/linux/timer.h > @@ -91,6 +91,6 @@ static inline void add_timer(struct time > > extern void init_timers(void); > extern void run_local_timers(void); > -extern void 
it_real_fn(unsigned long); > +extern void it_real_fn(void *); > > #endif > Index: linux-2.6.14-rc2-rt4/init/main.c > =================================================================== > --- linux-2.6.14-rc2-rt4.orig/init/main.c > +++ linux-2.6.14-rc2-rt4/init/main.c > @@ -485,6 +485,7 @@ asmlinkage void __init start_kernel(void > init_IRQ(); > pidhash_init(); > init_timers(); > + init_ktimers(); > softirq_init(); > time_init(); > > Index: linux-2.6.14-rc2-rt4/kernel/Makefile > =================================================================== > --- linux-2.6.14-rc2-rt4.orig/kernel/Makefile > +++ linux-2.6.14-rc2-rt4/kernel/Makefile > @@ -7,7 +7,8 @@ obj-y = sched.o fork.o exec_domain.o > sysctl.o capability.o ptrace.o timer.o user.o \ > signal.o sys.o kmod.o workqueue.o pid.o \ > rcupdate.o intermodule.o extable.o params.o posix-timers.o \ > - kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o > + kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o \ > + ktimers.o > > obj-$(CONFIG_FUTEX) += futex.o > obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o > Index: linux-2.6.14-rc2-rt4/kernel/exit.c > =================================================================== > --- linux-2.6.14-rc2-rt4.orig/kernel/exit.c > +++ linux-2.6.14-rc2-rt4/kernel/exit.c > @@ -842,7 +842,7 @@ fastcall NORET_TYPE void do_exit(long co > update_mem_hiwater(tsk); > group_dead = atomic_dec_and_test(&tsk->signal->live); > if (group_dead) { > - del_timer_sync(&tsk->signal->real_timer); > + stop_ktimer(&tsk->signal->real_timer); > acct_process(code); > } > exit_mm(tsk); > Index: linux-2.6.14-rc2-rt4/kernel/fork.c > =================================================================== > --- linux-2.6.14-rc2-rt4.orig/kernel/fork.c > +++ linux-2.6.14-rc2-rt4/kernel/fork.c > @@ -804,10 +804,9 @@ static inline int copy_signal(unsigned l > init_sigpending(&sig->shared_pending); > INIT_LIST_HEAD(&sig->posix_timers); > > - sig->it_real_value = sig->it_real_incr = 0; > + init_ktimer_mono(&sig->real_timer); > 
sig->real_timer.function = it_real_fn; > - sig->real_timer.data = (unsigned long) tsk; > - init_timer(&sig->real_timer); > + sig->real_timer.data = tsk; > > sig->it_virt_expires = cputime_zero; > sig->it_virt_incr = cputime_zero; > Index: linux-2.6.14-rc2-rt4/kernel/itimer.c > =================================================================== > --- linux-2.6.14-rc2-rt4.orig/kernel/itimer.c > +++ linux-2.6.14-rc2-rt4/kernel/itimer.c > @@ -12,36 +12,22 @@ > #include <linux/syscalls.h> > #include <linux/time.h> > #include <linux/posix-timers.h> > +#include <linux/ktimer.h> > > #include <asm/uaccess.h> > > -static unsigned long it_real_value(struct signal_struct *sig) > -{ > - unsigned long val = 0; > - if (timer_pending(&sig->real_timer)) { > - val = sig->real_timer.expires - jiffies; > - > - /* look out for negative/zero itimer.. */ > - if ((long) val <= 0) > - val = 1; > - } > - return val; > -} > - > int do_getitimer(int which, struct itimerval *value) > { > struct task_struct *tsk = current; > - unsigned long interval, val; > + ktime_t interval, val; > cputime_t cinterval, cval; > > switch (which) { > case ITIMER_REAL: > - spin_lock_irq(&tsk->sighand->siglock); > - interval = tsk->signal->it_real_incr; > - val = it_real_value(tsk->signal); > - spin_unlock_irq(&tsk->sighand->siglock); > - jiffies_to_timeval(val, &value->it_value); > - jiffies_to_timeval(interval, &value->it_interval); > + interval = tsk->signal->real_timer.interval; > + val = get_remtime_ktimer(&tsk->signal->real_timer, NSEC_PER_USEC); > + value->it_value = ktime_to_timeval(val); > + value->it_interval = ktime_to_timeval(interval); > break; > case ITIMER_VIRTUAL: > read_lock(&tasklist_lock); > @@ -113,59 +99,35 @@ asmlinkage long sys_getitimer(int which, > } > > > -void it_real_fn(unsigned long __data) > +/* > + * The timer is automagically restarted, when interval != 0 > + */ > +void it_real_fn(void *data) > { > - struct task_struct * p = (struct task_struct *) __data; > - unsigned long inc = 
p->signal->it_real_incr; > - > - send_group_sig_info(SIGALRM, SEND_SIG_PRIV, p); > - > - /* > - * Now restart the timer if necessary. We don't need any locking > - * here because do_setitimer makes sure we have finished running > - * before it touches anything. > - * Note, we KNOW we are (or should be) at a jiffie edge here so > - * we don't need the +1 stuff. Also, we want to use the prior > - * expire value so as to not "slip" a jiffie if we are late. > - * Deal with requesting a time prior to "now" here rather than > - * in add_timer. > - */ > - if (!inc) > - return; > - while (time_before_eq(p->signal->real_timer.expires, jiffies)) > - p->signal->real_timer.expires += inc; > - add_timer(&p->signal->real_timer); > + send_group_sig_info(SIGALRM, SEND_SIG_PRIV, data); > } > > int do_setitimer(int which, struct itimerval *value, struct itimerval *ovalue) > { > struct task_struct *tsk = current; > - unsigned long val, interval, expires; > + struct ktimer *timer; > + ktime_t expires; > cputime_t cval, cinterval, nval, ninterval; > > switch (which) { > case ITIMER_REAL: > -again: > - spin_lock_irq(&tsk->sighand->siglock); > - interval = tsk->signal->it_real_incr; > - val = it_real_value(tsk->signal); > - /* We are sharing ->siglock with it_real_fn() */ > - if (try_to_del_timer_sync(&tsk->signal->real_timer) < 0) { > - spin_unlock_irq(&tsk->sighand->siglock); > - goto again; > - } > - tsk->signal->it_real_incr = > - timeval_to_jiffies(&value->it_interval); > - expires = timeval_to_jiffies(&value->it_value); > - if (expires) > - mod_timer(&tsk->signal->real_timer, > - jiffies + 1 + expires); > - spin_unlock_irq(&tsk->sighand->siglock); > + timer = &tsk->signal->real_timer; > + stop_ktimer(timer); > if (ovalue) { > - jiffies_to_timeval(val, &ovalue->it_value); > - jiffies_to_timeval(interval, > - &ovalue->it_interval); > - } > + ovalue->it_value = ktime_to_timeval( > + get_remtime_ktimer(timer, NSEC_PER_USEC)); > + ovalue->it_interval = ktime_to_timeval(timer->interval); 
> + } > + timer->interval = ktimer_convert_timeval(timer, &value->it_interval); > + expires = ktimer_convert_timeval(timer, &value->it_value); > + if (ktime_cmp_val(expires, != , KTIME_ZERO)) > + modify_ktimer(timer, &expires, KTIMER_REL | KTIMER_NOCHECK); > + > break; > case ITIMER_VIRTUAL: > nval = timeval_to_cputime(&value->it_value); > Index: linux-2.6.14-rc2-rt4/kernel/ktimers.c > =================================================================== > --- /dev/null > +++ linux-2.6.14-rc2-rt4/kernel/ktimers.c > @@ -0,0 +1,824 @@ > +/* > + * linux/kernel/ktimers.c > + * > + * Copyright(C) 2005 Thomas Gleixner <tglx@linutronix.de> > + * > + * Kudos to Ingo Molnar for review, criticism, ideas > + * > + * Credits: > + * Lot of ideas and implementation details taken from > + * timer.c and related code > + * > + * Kernel timers > + * > + * In contrast to the timeout related API found in kernel/timer.c, > + * ktimers provide finer resolution and accuracy depending on system > + * configuration and capabilities. > + * > + * These timers are used for > + * - itimers > + * - posixtimers > + * - nanosleep > + * - precise in kernel timing > + * > + * Please do not abuse this API for simple timeouts. 
> + * > + * For licencing details see kernel-base/COPYING > + * > + */ > + > +#include <linux/cpu.h> > +#include <linux/interrupt.h> > +#include <linux/ktimer.h> > +#include <linux/module.h> > +#include <linux/notifier.h> > +#include <linux/percpu.h> > +#include <linux/syscalls.h> > + > +#include <asm/uaccess.h> > + > +static ktime_t get_ktime_mono(void); > +static ktime_t get_ktime_real(void); > + > +/* The time bases */ > +#define MAX_KTIMER_BASES 2 > +static DEFINE_PER_CPU(struct ktimer_base, ktimer_bases[MAX_KTIMER_BASES]) = > +{ > + { > + .index = CLOCK_REALTIME, > + .name = "Realtime", > + .get_time = &get_ktime_real, > + .resolution = KTIME_REALTIME_RES, > + }, > + { > + .index = CLOCK_MONOTONIC, > + .name = "Monotonic", > + .get_time = &get_ktime_mono, > + .resolution = KTIME_MONOTONIC_RES, > + }, > +}; > + > +/* > + * The SMP/UP kludge goes here > + */ > +#if defined(CONFIG_SMP) > + > +#define set_running_timer(b,t) b->running_timer = t > +#define wake_up_timer_waiters(b) wake_up(&b->wait_for_running_timer) > +#define ktimer_base_can_change (1) > +/* > + * Wait for a running timer > + */ > +void wait_for_ktimer(struct ktimer *timer) > +{ > + struct ktimer_base *base = timer->base; > + > + if (base && base->running_timer == timer) > + wait_event(base->wait_for_running_timer, > + base->running_timer != timer); > +} > + > +/* > + * We are using hashed locking: holding per_cpu(ktimer_bases)[n].lock > + * means that all timers which are tied to this base via timer->base are > + * locked, and the base itself is locked too. > + * > + * So __run_timers/migrate_timers can safely modify all timers which could > + * be found on the lists/queues. > + * > + * When the timer's base is locked, and the timer removed from list, it is > + * possible to set timer->base = NULL and drop the lock: the timer remains > + * locked. 
> + */ > +static inline struct ktimer_base *lock_ktimer_base(struct ktimer *timer, > + unsigned long *flags) > +{ > + struct ktimer_base *base; > + > + for (;;) { > + base = timer->base; > + if (likely(base != NULL)) { > + spin_lock_irqsave(&base->lock, *flags); > + if (likely(base == timer->base)) > + return base; > + /* The timer has migrated to another CPU */ > + spin_unlock_irqrestore(&base->lock, *flags); > + } > + cpu_relax(); > + } > +} > + > +static inline struct ktimer_base *switch_ktimer_base(struct ktimer *timer, > + struct ktimer_base *base) > +{ > + int ktidx = base->index; > + struct ktimer_base *new_base = &__get_cpu_var(ktimer_bases[ktidx]); > + > + if (base != new_base) { > + /* > + * We are trying to schedule the timer on the local CPU. > + * However we can't change timer's base while it is running, > + * so we keep it on the same CPU. No hassle vs. reprogramming > + * the event source in the high resolution case. The softirq > + * code will take care of this when the timer function has > + * completed. There is no conflict as we hold the lock until > + * the timer is enqueued. > + */ > + if (unlikely(base->running_timer == timer)) { > + return base; > + } else { > + /* See the comment in lock_timer_base() */ > + timer->base = NULL; > + spin_unlock(&base->lock); > + spin_lock(&new_base->lock); > + timer->base = new_base; > + } > + } > + return new_base; > +} > + > +/* > + * Get the timer base unlocked > + * > + * Take care of timer->base = NULL in switch_ktimer_base ! 
> + */ > +static inline struct ktimer_base *get_ktimer_base_unlocked(struct ktimer *timer) > +{ > + struct ktimer_base *base; > + while (!(base = timer->base)); > + return base; > +} > +#else > + > +#define set_running_timer(b,t) do {} while (0) > +#define wake_up_timer_waiters(b) do {} while (0) > + > +static inline struct ktimer_base *lock_ktimer_base(struct ktimer *timer, > + unsigned long *flags) > +{ > + struct ktimer_base *base; > + > + base = timer->base; > + spin_lock_irqsave(&base->lock, *flags); > + return base; > +} > + > +#define switch_ktimer_base(t, b) b > + > +#define get_ktimer_base_unlocked(t) (t)->base > +#define ktimer_base_can_change (0) > + > +#endif /* !CONFIG_SMP */ > + > +/* > + * Convert timespec to ktime_t with resolution adjustment > + * > + * Note: We can access base without locking here, as ktimers can > + * migrate between CPUs but can not be moved from one clock source to > + * another. The clock source binding is set at init_ktimer_XXX. > + */ > +ktime_t ktimer_convert_timespec(struct ktimer *timer, struct timespec *ts) > +{ > + struct ktimer_base *base = get_ktimer_base_unlocked(timer); > + ktime_t t; > + long rem = ts->tv_nsec % base->resolution; > + > + t = ktime_set(ts->tv_sec, ts->tv_nsec); > + > + /* Check, if the value has to be rounded */ > + if (rem) > + t = ktime_add_ns(t, base->resolution - rem); > + return t; > +} > + > +/* > + * Convert timeval to ktime_t with resolution adjustment > + */ > +ktime_t ktimer_convert_timeval(struct ktimer *timer, struct timeval *tv) > +{ > + struct timespec ts; > + > + ts.tv_sec = tv->tv_sec; > + ts.tv_nsec = tv->tv_usec * NSEC_PER_USEC; > + > + return ktimer_convert_timespec(timer, &ts); > +} > + > +/* > + * Internal function to add (re)start a timer > + * > + * The timer is inserted in expiry order. 
> + * Insertion into the red black tree is O(log(n)) > + * > + */ > +static int enqueue_ktimer(struct ktimer *timer, struct ktimer_base *base, > + ktime_t *tim, int mode) > +{ > + struct rb_node **link = &base->active.rb_node; > + struct rb_node *parent = NULL; > + struct ktimer *entry; > + struct list_head *prev = &base->pending; > + ktime_t now; > + > + /* Get current time */ > + now = base->get_time(); > + > + /* Timer expiry mode */ > + switch (mode & ~KTIMER_NOCHECK) { > + case KTIMER_ABS: > + timer->expires = *tim; > + break; > + case KTIMER_REL: > + timer->expires = ktime_add(now, *tim); > + break; > + case KTIMER_INCR: > + timer->expires = ktime_add(timer->expires, *tim); > + break; > + case KTIMER_FORWARD: > + while ktime_cmp(timer->expires, <= , now) { > + timer->expires = ktime_add(timer->expires, *tim); > + timer->overrun++; > + } > + goto nocheck; > + case KTIMER_REARM: > + while ktime_cmp(timer->expires, <= , now) { > + timer->expires = ktime_add(timer->expires, *tim); > + timer->overrun++; > + } > + goto nocheck; > + case KTIMER_RESTART: > + break; > + default: > + BUG(); > + } > + > + /* Already expired.*/ > + if ktime_cmp(timer->expires, <=, now) { > + timer->expired = now; > + /* The caller takes care of expiry */ > + if (!(mode & KTIMER_NOCHECK)) > + return -1; > + } > + nocheck: > + > + while (*link) { > + parent = *link; > + entry = rb_entry(parent, struct ktimer, node); > + /* > + * We dont care about collisions. Nodes with > + * the same expiry time stay together. 
> + */ > + if (ktimer_before(timer, entry)) > + link = &(*link)->rb_left; > + else { > + link = &(*link)->rb_right; > + prev = &entry->list; > + } > + } > + > + rb_link_node(&timer->node, parent, link); > + rb_insert_color(&timer->node, &base->active); > + list_add(&timer->list, prev); > + timer->status = KTIMER_PENDING; > + base->count++; > + return 0; > +} > + > +/* > + * Internal helper to remove a timer > + * > + * The function allows automatic rearming for interval > + * timers. > + * > + */ > +static inline void do_remove_ktimer(struct ktimer *timer, > + struct ktimer_base *base, int rearm) > +{ > + list_del(&timer->list); > + rb_erase(&timer->node, &base->active); > + timer->node.rb_parent = KTIMER_POISON; > + timer->status = KTIMER_INACTIVE; > + base->count--; > + BUG_ON(base->count < 0); > + /* Auto rearm the timer ? */ > + if (rearm && ktime_cmp_val(timer->interval, !=, KTIME_ZERO)) > + enqueue_ktimer(timer, base, NULL, KTIMER_REARM); > +} > + > +/* > + * Called with base lock held > + */ > +static inline int remove_ktimer(struct ktimer *timer, struct ktimer_base *base) > +{ > + if (ktimer_active(timer)) { > + do_remove_ktimer(timer, base, KTIMER_NOREARM); > + return 1; > + } > + return 0; > +} > + > +/* > + * Internal function to (re)start a timer. > + */ > +static int internal_restart_ktimer(struct ktimer *timer, ktime_t *tim, > + int mode) > +{ > + struct ktimer_base *base, *new_base; > + unsigned long flags; > + int ret; > + > + BUG_ON(!timer->function); > + > + base = lock_ktimer_base(timer, &flags); > + > + /* Remove an active timer from the queue */ > + ret = remove_ktimer(timer, base); > + > + /* Switch the timer base, if necessary */ > + new_base = switch_ktimer_base(timer, base); > + > + /* > + * When the new timer setting is already expired, > + * let the calling code deal with it. 
> + */ > + if (enqueue_ktimer(timer, new_base, tim, mode)) > + ret = -1; > + > + spin_unlock_irqrestore(&new_base->lock, flags); > + return ret; > +} > + > +/*** > + * modify_ktimer - modify a running timer > + * @timer: the timer to be modified > + * @tim: expiry time (required) > + * @mode: timer setup mode > + * > + */ > +int modify_ktimer(struct ktimer *timer, ktime_t *tim, int mode) > +{ > + BUG_ON(!tim || !timer->function); > + return internal_restart_ktimer(timer, tim, mode); > +} > + > +/*** > + * start_ktimer - start a timer on current CPU > + * @timer: the timer to be added > + * @tim: expiry time (optional, if not set in the timer) > + * @mode: timer setup mode > + */ > +int start_ktimer(struct ktimer *timer, ktime_t *tim, int mode) > +{ > + BUG_ON(ktimer_active(timer) || !timer->function); > + > + return internal_restart_ktimer(timer, tim, mode); > +} > + > +/*** > + * try_to_stop_ktimer - try to deactivate a timer > + */ > +int try_to_stop_ktimer(struct ktimer *timer) > +{ > + struct ktimer_base *base; > + unsigned long flags; > + int ret = -1; > + > + base = lock_ktimer_base(timer, &flags); > + > + if (base->running_timer != timer) { > + ret = remove_ktimer(timer, base); > + if (ret) > + timer->expired = base->get_time(); > + } > + > + spin_unlock_irqrestore(&base->lock, flags); > + > + return ret; > + > +} > + > +/*** > + * stop_timer_sync - deactivate a timer and wait for the handler to finish. 
> + * @timer: the timer to be deactivated > + * > + */ > +int stop_ktimer(struct ktimer *timer) > +{ > + for (;;) { > + int ret = try_to_stop_ktimer(timer); > + if (ret >= 0) > + return ret; > + wait_for_ktimer(timer); > + } > +} > + > +/*** > + * get_remtime_ktimer - get remaining time for the timer > + * @timer: the timer to read > + * @fake: when fake > 0 a pending, but expired timer > + * returns fake (itimers need this, uurg) > + */ > +ktime_t get_remtime_ktimer(struct ktimer *timer, long fake) > +{ > + struct ktimer_base *base; > + unsigned long flags; > + ktime_t rem; > + > + base = lock_ktimer_base(timer, &flags); > + if (ktimer_active(timer)) { > + rem = ktime_sub(timer->expires,base->get_time()); > + if (fake && ktime_cmp_val(rem, <=, KTIME_ZERO)) > + rem = ktime_set(0, fake); > + } else { > + if (!fake) > + rem = ktime_sub(timer->expires,base->get_time()); > + else > + ktime_set_zero(rem); > + } > + spin_unlock_irqrestore(&base->lock, flags); > + return rem; > +} > + > +/*** > + * get_expiry_ktimer - get expiry time for the timer > + * @timer: the timer to read > + * @now: if != NULL store current base->time > + */ > +ktime_t get_expiry_ktimer(struct ktimer *timer, ktime_t *now) > +{ > + struct ktimer_base *base; > + unsigned long flags; > + ktime_t expiry; > + > + base = lock_ktimer_base(timer, &flags); > + expiry = timer->expires; > + if (now) > + *now = base->get_time(); > + spin_unlock_irqrestore(&base->lock, flags); > + return expiry; > +} > + > +/* > + * Functions related to clock sources > + */ > + > +static inline void ktimer_common_init(struct ktimer *timer) > +{ > + memset(timer, 0, sizeof(struct ktimer)); > + timer->node.rb_parent = KTIMER_POISON; > +} > + > +/* > + * Get monotonic time > + */ > +static ktime_t get_ktime_mono(void) > +{ > + return do_get_ktime_mono(); > +} > + > +/*** > + * init_ktimer_mono - initialize a timer on monotonic time > + * @timer: the timer to be initialized > + * > + */ > +void fastcall init_ktimer_mono(struct 
ktimer *timer) > +{ > + ktimer_common_init(timer); > + timer->base = > + &per_cpu(ktimer_bases, raw_smp_processor_id())[CLOCK_MONOTONIC]; > +} > + > +/*** > + * get_ktimer_mono_res - get the monotonic timer resolution > + * > + */ > +int get_ktimer_mono_res(clockid_t which_clock, struct timespec *tp) > +{ > + tp->tv_sec = 0; > + tp->tv_nsec = > + per_cpu(ktimer_bases, raw_smp_processor_id())[CLOCK_MONOTONIC].resolution; > + return 0; > +} > + > +/* > + * Get real time > + */ > +static ktime_t get_ktime_real(void) > +{ > + return do_get_ktime_real(); > +} > + > +/*** > + * init_ktimer_real - initialize a timer on real time > + * @timer: the timer to be initialized > + * > + */ > +void fastcall init_ktimer_real(struct ktimer *timer) > +{ > + ktimer_common_init(timer); > + timer->base = > + &per_cpu(ktimer_bases, raw_smp_processor_id())[CLOCK_REALTIME]; > +} > + > +/*** > + * get_ktimer_real_res - get the real timer resolution > + * > + */ > +int get_ktimer_real_res(clockid_t which_clock, struct timespec *tp) > +{ > + tp->tv_sec = 0; > + tp->tv_nsec = > + per_cpu(ktimer_bases, raw_smp_processor_id())[CLOCK_REALTIME].resolution; > + return 0; > +} > + > +/* > + * The per base runqueue > + */ > +static inline void run_ktimer_queue(struct ktimer_base *base) > +{ > + ktime_t now = base->get_time(); > + > + spin_lock_irq(&base->lock); > + while (!list_empty(&base->pending)) { > + void (*fn)(void *); > + void *data; > + struct ktimer *timer = list_entry(base->pending.next, > + struct ktimer, list); > + if ktime_cmp(now, <=, timer->expires) > + break; > + timer->expired = now; > + fn = timer->function; > + data = timer->data; > + set_running_timer(base, timer); > + do_remove_ktimer(timer, base, KTIMER_REARM); > + spin_unlock_irq(&base->lock); > + fn(data); > + spin_lock_irq(&base->lock); > + set_running_timer(base, NULL); > + } > + spin_unlock_irq(&base->lock); > + wake_up_timer_waiters(base); > +} > + > +/* > + * Called from timer softirq every jiffy > + */ > +void 
run_ktimer_queues(void) > +{ > + struct ktimer_base *base = __get_cpu_var(ktimer_bases); > + int i; > + > + for (i = 0; i < MAX_KTIMER_BASES; i++) > + run_ktimer_queue(&base[i]); > +} > + > +/* > + * Functions related to initialization > + */ > +static void __devinit init_ktimers_cpu(int cpu) > +{ > + struct ktimer_base *base = per_cpu(ktimer_bases, cpu); > + int i; > + > + for (i = 0; i < MAX_KTIMER_BASES; i++) { > + spin_lock_init(&base->lock); > + INIT_LIST_HEAD(&base->pending); > + init_waitqueue_head(&base->wait_for_running_timer); > + base++; > + } > +} > + > +#ifdef CONFIG_HOTPLUG_CPU > +static void migrate_ktimer_list(struct ktimer_base *old_base, > + struct ktimer_base *new_base) > +{ > + struct ktimer *timer; > + struct rb_node *node; > + > + while ((node = rb_first(&old_base->active))) { > + timer = rb_entry(node, struct ktimer, node); > + remove_ktimer(timer, old_base); > + timer->base = new_base; > + enqueue_ktimer(timer, new_base, NULL, KTIMER_RESTART); > + } > +} > + > +static void __devinit migrate_ktimers(int cpu) > +{ > + struct ktimer_base *old_base; > + struct ktimer_base *new_base; > + int i; > + > + BUG_ON(cpu_online(cpu)); > + old_base = per_cpu(ktimer_bases, cpu); > + new_base = get_cpu_var(ktimer_bases); > + > + local_irq_disable(); > + > + for (i = 0; i < MAX_KTIMER_BASES; i++) { > + > + spin_lock(&new_base->lock); > + spin_lock(&old_base->lock); > + > + if (old_base->running_timer) > + BUG(); > + > + migrate_ktimer_list(old_base, new_base); > + > + spin_unlock(&old_base->lock); > + spin_unlock(&new_base->lock); > + old_base++; > + new_base++; > + } > + > + local_irq_enable(); > + &put_cpu_var(ktimer_bases); > +} > +#endif /* CONFIG_HOTPLUG_CPU */ > + > +static int __devinit ktimer_cpu_notify(struct notifier_block *self, > + unsigned long action, void *hcpu) > +{ > + long cpu = (long)hcpu; > + switch(action) { > + case CPU_UP_PREPARE: > + init_ktimers_cpu(cpu); > + break; > +#ifdef CONFIG_HOTPLUG_CPU > + case CPU_DEAD: > + 
migrate_ktimers(cpu); > + break; > +#endif > + default: > + break; > + } > + return NOTIFY_OK; > +} > + > +static struct notifier_block __devinitdata ktimers_nb = { > + .notifier_call = ktimer_cpu_notify, > +}; > + > +void __init init_ktimers(void) > +{ > + ktimer_cpu_notify(&ktimers_nb, (unsigned long)CPU_UP_PREPARE, > + (void *)(long)smp_processor_id()); > + register_cpu_notifier(&ktimers_nb); > +} > + > +/* > + * system interface related functions > + */ > +static void process_ktimer(void *data) > +{ > + wake_up_process(data); > +} > + > +/** > + * schedule_ktimer - sleep until timeout > + * @timeout: timeout value > + * @state: state to use for sleep > + * @rel: timeout value is abs/rel > + * > + * Make the current task sleep until @timeout is > + * elapsed. > + * > + * You can set the task state as follows - > + * > + * %TASK_UNINTERRUPTIBLE - at least @timeout is guaranteed to > + * pass before the routine returns. The routine will return 0 > + * > + * %TASK_INTERRUPTIBLE - the routine may return early if a signal is > + * delivered to the current task. In this case the remaining time > + * will be returned > + * > + * The current task state is guaranteed to be TASK_RUNNING when this > + * routine returns. 
> + * > + */ > +static fastcall ktime_t __sched schedule_ktimer(struct ktimer *timer, > + ktime_t *t, int state, int mode) > +{ > + timer->data = current; > + timer->function = process_ktimer; > + > + current->state = state; > + if (start_ktimer(timer, t, mode)) { > + current->state = TASK_RUNNING; > + goto out; > + } > + if (current->state != TASK_RUNNING) > + schedule(); > + stop_ktimer(timer); > + out: > + /* Store the absolute expiry time */ > + *t = timer->expires; > + /* Return the remaining time */ > + return ktime_sub(timer->expires, timer->expired); > +} > + > +static long __sched nanosleep_restart(struct ktimer *timer, > + struct restart_block *restart) > +{ > + struct timespec tu; > + ktime_t t, rem; > + void *rfn = restart->fn; > + struct timespec __user *rmtp = (struct timespec __user *) restart->arg2; > + > + restart->fn = do_no_restart_syscall; > + > + t = ktime_set_low_high(restart->arg0, restart->arg1); > + > + rem = schedule_ktimer(timer, &t, TASK_INTERRUPTIBLE, KTIMER_ABS); > + > + if (ktime_cmp_val(rem, <=, KTIME_ZERO)) > + return 0; > + > + tu = ktime_to_timespec(rem); > + if (rmtp && copy_to_user(rmtp, &tu, sizeof(tu))) > + return -EFAULT; > + > + restart->fn = rfn; > + /* The other values in restart are already filled in */ > + return -ERESTART_RESTARTBLOCK; > +} > + > +static long __sched nanosleep_restart_mono(struct restart_block *restart) > +{ > + struct ktimer timer; > + > + init_ktimer_mono(&timer); > + return nanosleep_restart(&timer, restart); > +} > + > +static long __sched nanosleep_restart_real(struct restart_block *restart) > +{ > + struct ktimer timer; > + > + init_ktimer_real(&timer); > + return nanosleep_restart(&timer, restart); > +} > + > +static long ktimer_nanosleep(struct ktimer *timer, struct timespec *rqtp, > + struct timespec __user *rmtp, int mode, > + long (*rfn)(struct restart_block *)) > +{ > + struct timespec tu; > + ktime_t rem, t; > + struct restart_block *restart; > + > + t = ktimer_convert_timespec(timer, 
rqtp); > + > + /* t is updated to absolute expiry time ! */ > + rem = schedule_ktimer(timer, &t, TASK_INTERRUPTIBLE, mode); > + > + if (ktime_cmp_val(rem, <=, KTIME_ZERO)) > + return 0; > + > + tu = ktime_to_timespec(rem); > + > + if (rmtp && copy_to_user(rmtp, &tu, sizeof(tu))) > + return -EFAULT; > + > + restart = &current_thread_info()->restart_block; > + restart->fn = rfn; > + restart->arg0 = ktime_get_low(t); > + restart->arg1 = ktime_get_high(t); > + restart->arg2 = (unsigned long) rmtp; > + return -ERESTART_RESTARTBLOCK; > + > +} > + > +long ktimer_nanosleep_mono(struct timespec *rqtp, > + struct timespec __user *rmtp, int mode) > +{ > + struct ktimer timer; > + > + init_ktimer_mono(&timer); > + return ktimer_nanosleep(&timer, rqtp, rmtp, mode, nanosleep_restart_mono); > +} > + > +long ktimer_nanosleep_real(struct timespec *rqtp, > + struct timespec __user *rmtp, int mode) > +{ > + struct ktimer timer; > + > + init_ktimer_real(&timer); > + return ktimer_nanosleep(&timer, rqtp, rmtp, mode, nanosleep_restart_real); > +} > + > +asmlinkage long sys_nanosleep(struct timespec __user *rqtp, > + struct timespec __user *rmtp) > +{ > + struct timespec tu; > + > + if (copy_from_user(&tu, rqtp, sizeof(tu))) > + return -EFAULT; > + > + if (!timespec_valid(&tu)) > + return -EINVAL; > + > + return ktimer_nanosleep_mono(&tu, rmtp, KTIMER_REL); > +} > + > Index: linux-2.6.14-rc2-rt4/kernel/posix-cpu-timers.c > =================================================================== > --- linux-2.6.14-rc2-rt4.orig/kernel/posix-cpu-timers.c > +++ linux-2.6.14-rc2-rt4/kernel/posix-cpu-timers.c > @@ -1394,7 +1394,7 @@ void set_process_cpu_timer(struct task_s > static long posix_cpu_clock_nanosleep_restart(struct restart_block *); > > int posix_cpu_nsleep(clockid_t which_clock, int flags, > - struct timespec *rqtp) > + struct timespec *rqtp, struct timespec __user *rmtp) > { > struct restart_block *restart_block = > &current_thread_info()->restart_block; > @@ -1419,7 +1419,6 @@ int 
posix_cpu_nsleep(clockid_t which_clo > error = posix_cpu_timer_create(&timer); > timer.it_process = current; > if (!error) { > - struct timespec __user *rmtp; > static struct itimerspec zero_it; > struct itimerspec it = { .it_value = *rqtp, > .it_interval = {} }; > @@ -1466,7 +1465,6 @@ int posix_cpu_nsleep(clockid_t which_clo > /* > * Report back to the user the time still remaining. > */ > - rmtp = (struct timespec __user *) restart_block->arg1; > if (rmtp != NULL && !(flags & TIMER_ABSTIME) && > copy_to_user(rmtp, &it.it_value, sizeof *rmtp)) > return -EFAULT; > @@ -1474,6 +1472,7 @@ int posix_cpu_nsleep(clockid_t which_clo > restart_block->fn = posix_cpu_clock_nanosleep_restart; > /* Caller already set restart_block->arg1 */ > restart_block->arg0 = which_clock; > + restart_block->arg1 = (unsigned long) rmtp; > restart_block->arg2 = rqtp->tv_sec; > restart_block->arg3 = rqtp->tv_nsec; > > @@ -1487,10 +1486,15 @@ static long > posix_cpu_clock_nanosleep_restart(struct restart_block *restart_block) > { > clockid_t which_clock = restart_block->arg0; > - struct timespec t = { .tv_sec = restart_block->arg2, > - .tv_nsec = restart_block->arg3 }; > + struct timespec __user *rmtp; > + struct timespec t; > + > + rmtp = (struct timespec __user *) restart_block->arg1; > + t.tv_sec = restart_block->arg2; > + t.tv_nsec = restart_block->arg3; > + > restart_block->fn = do_no_restart_syscall; > - return posix_cpu_nsleep(which_clock, TIMER_ABSTIME, &t); > + return posix_cpu_nsleep(which_clock, TIMER_ABSTIME, &t, rmtp); > } > > > @@ -1511,9 +1515,10 @@ static int process_cpu_timer_create(stru > return posix_cpu_timer_create(timer); > } > static int process_cpu_nsleep(clockid_t which_clock, int flags, > - struct timespec *rqtp) > + struct timespec *rqtp, > + struct timespec __user *rmtp) > { > - return posix_cpu_nsleep(PROCESS_CLOCK, flags, rqtp); > + return posix_cpu_nsleep(PROCESS_CLOCK, flags, rqtp, rmtp); > } > static int thread_cpu_clock_getres(clockid_t which_clock, struct 
timespec *tp) > { > @@ -1529,7 +1534,7 @@ static int thread_cpu_timer_create(struc > return posix_cpu_timer_create(timer); > } > static int thread_cpu_nsleep(clockid_t which_clock, int flags, > - struct timespec *rqtp) > + struct timespec *rqtp, struct timespec __user *rmtp) > { > return -EINVAL; > } > Index: linux-2.6.14-rc2-rt4/kernel/posix-timers.c > =================================================================== > --- linux-2.6.14-rc2-rt4.orig/kernel/posix-timers.c > +++ linux-2.6.14-rc2-rt4/kernel/posix-timers.c > @@ -48,21 +48,6 @@ > #include <linux/workqueue.h> > #include <linux/module.h> > > -#ifndef div_long_long_rem > -#include <asm/div64.h> > - > -#define div_long_long_rem(dividend,divisor,remainder) ({ \ > - u64 result = dividend; \ > - *remainder = do_div(result,divisor); \ > - result; }) > - > -#endif > -#define CLOCK_REALTIME_RES TICK_NSEC /* In nano seconds. */ > - > -static inline u64 mpy_l_X_l_ll(unsigned long mpy1,unsigned long mpy2) > -{ > - return (u64)mpy1 * mpy2; > -} > /* > * Management arrays for POSIX timers. Timers are kept in slab memory > * Timer ids are allocated by an external routine that keeps track of the > @@ -148,18 +133,18 @@ static DEFINE_SPINLOCK(idr_lock); > */ > > static struct k_clock posix_clocks[MAX_CLOCKS]; > + > /* > - * We only have one real clock that can be set so we need only one abs list, > - * even if we should want to have several clocks with differing resolutions. > + * These ones are defined below. 
> */ > -static struct k_clock_abs abs_list = {.list = LIST_HEAD_INIT(abs_list.list), > - .lock = SPIN_LOCK_UNLOCKED}; > +static int common_nsleep(clockid_t, int flags, struct timespec *t, > + struct timespec __user *rmtp); > +static void common_timer_get(struct k_itimer *, struct itimerspec *); > +static int common_timer_set(struct k_itimer *, int, > + struct itimerspec *, struct itimerspec *); > +static int common_timer_del(struct k_itimer *timer); > > -static void posix_timer_fn(unsigned long); > -static u64 do_posix_clock_monotonic_gettime_parts( > - struct timespec *tp, struct timespec *mo); > -int do_posix_clock_monotonic_gettime(struct timespec *tp); > -static int do_posix_clock_monotonic_get(clockid_t, struct timespec *tp); > +static void posix_timer_fn(void *data); > > static struct k_itimer *lock_timer(timer_t timer_id, unsigned long *flags); > > @@ -205,21 +190,25 @@ static inline int common_clock_set(clock > > static inline int common_timer_create(struct k_itimer *new_timer) > { > - INIT_LIST_HEAD(&new_timer->it.real.abs_timer_entry); > - init_timer(&new_timer->it.real.timer); > - new_timer->it.real.timer.data = (unsigned long) new_timer; > + return -EINVAL; > +} > + > +static int timer_create_mono(struct k_itimer *new_timer) > +{ > + init_ktimer_mono(&new_timer->it.real.timer); > + new_timer->it.real.timer.data = new_timer; > + new_timer->it.real.timer.function = posix_timer_fn; > + return 0; > +} > + > +static int timer_create_real(struct k_itimer *new_timer) > +{ > + init_ktimer_real(&new_timer->it.real.timer); > + new_timer->it.real.timer.data = new_timer; > new_timer->it.real.timer.function = posix_timer_fn; > return 0; > } > > -/* > - * These ones are defined below. 
> - */ > -static int common_nsleep(clockid_t, int flags, struct timespec *t); > -static void common_timer_get(struct k_itimer *, struct itimerspec *); > -static int common_timer_set(struct k_itimer *, int, > - struct itimerspec *, struct itimerspec *); > -static int common_timer_del(struct k_itimer *timer); > > /* > * Return nonzero iff we know a priori this clockid_t value is bogus. > @@ -239,19 +228,44 @@ static inline int invalid_clockid(clocki > return 1; > } > > +/* > + * Get real time for posix timers > + */ > +static int posix_get_ktime_real_ts(clockid_t which_clock, struct timespec *tp) > +{ > + get_ktime_real_ts(tp); > + return 0; > +} > + > +/* > + * Get monotonic time for posix timers > + */ > +static int posix_get_ktime_mono_ts(clockid_t which_clock, struct timespec *tp) > +{ > + get_ktime_mono_ts(tp); > + return 0; > +} > + > +void do_posix_clock_monotonic_gettime(struct timespec *ts) > +{ > + get_ktime_mono_ts(ts); > +} > > /* > * Initialize everything, well, just everything in Posix clocks/timers ;) > */ > static __init int init_posix_timers(void) > { > - struct k_clock clock_realtime = {.res = CLOCK_REALTIME_RES, > - .abs_struct = &abs_list > + struct k_clock clock_realtime = { > + .clock_getres = get_ktimer_real_res, > + .clock_get = posix_get_ktime_real_ts, > + .timer_create = timer_create_real, > }; > - struct k_clock clock_monotonic = {.res = CLOCK_REALTIME_RES, > - .abs_struct = NULL, > - .clock_get = do_posix_clock_monotonic_get, > - .clock_set = do_posix_clock_nosettime > + struct k_clock clock_monotonic = { > + .clock_getres = get_ktimer_mono_res, > + .clock_get = posix_get_ktime_mono_ts, > + .clock_set = do_posix_clock_nosettime, > + .timer_create = timer_create_mono, > }; > > register_posix_clock(CLOCK_REALTIME, &clock_realtime); > @@ -265,117 +279,17 @@ static __init int init_posix_timers(void > > __initcall(init_posix_timers); > > -static void tstojiffie(struct timespec *tp, int res, u64 *jiff) > -{ > - long sec = tp->tv_sec; > - long 
nsec = tp->tv_nsec + res - 1; > - > - if (nsec > NSEC_PER_SEC) { > - sec++; > - nsec -= NSEC_PER_SEC; > - } > - > - /* > - * The scaling constants are defined in <linux/time.h> > - * The difference between there and here is that we do the > - * res rounding and compute a 64-bit result (well so does that > - * but it then throws away the high bits). > - */ > - *jiff = (mpy_l_X_l_ll(sec, SEC_CONVERSION) + > - (mpy_l_X_l_ll(nsec, NSEC_CONVERSION) >> > - (NSEC_JIFFIE_SC - SEC_JIFFIE_SC))) >> SEC_JIFFIE_SC; > -} > - > -/* > - * This function adjusts the timer as needed as a result of the clock > - * being set. It should only be called for absolute timers, and then > - * under the abs_list lock. It computes the time difference and sets > - * the new jiffies value in the timer. It also updates the timers > - * reference wall_to_monotonic value. It is complicated by the fact > - * that tstojiffies() only handles positive times and it needs to work > - * with both positive and negative times. Also, for negative offsets, > - * we need to defeat the res round up. > - * > - * Return is true if there is a new time, else false. > - */ > -static long add_clockset_delta(struct k_itimer *timr, > - struct timespec *new_wall_to) > -{ > - struct timespec delta; > - int sign = 0; > - u64 exp; > - > - set_normalized_timespec(&delta, > - new_wall_to->tv_sec - > - timr->it.real.wall_to_prev.tv_sec, > - new_wall_to->tv_nsec - > - timr->it.real.wall_to_prev.tv_nsec); > - if (likely(!(delta.tv_sec | delta.tv_nsec))) > - return 0; > - if (delta.tv_sec < 0) { > - set_normalized_timespec(&delta, > - -delta.tv_sec, > - 1 - delta.tv_nsec - > - posix_clocks[timr->it_clock].res); > - sign++; > - } > - tstojiffie(&delta, posix_clocks[timr->it_clock].res, &exp); > - timr->it.real.wall_to_prev = *new_wall_to; > - timr->it.real.timer.expires += (sign ? 
-exp : exp); > - return 1; > -} > - > -static void remove_from_abslist(struct k_itimer *timr) > -{ > - if (!list_empty(&timr->it.real.abs_timer_entry)) { > - spin_lock(&abs_list.lock); > - list_del_init(&timr->it.real.abs_timer_entry); > - spin_unlock(&abs_list.lock); > - } > -} > > static void schedule_next_timer(struct k_itimer *timr) > { > - struct timespec new_wall_to; > - struct now_struct now; > - unsigned long seq; > - > - /* > - * Set up the timer for the next interval (if there is one). > - * Note: this code uses the abs_timer_lock to protect > - * it.real.wall_to_prev and must hold it until exp is set, not exactly > - * obvious... > - > - * This function is used for CLOCK_REALTIME* and > - * CLOCK_MONOTONIC* timers. If we ever want to handle other > - * CLOCKs, the calling code (do_schedule_next_timer) would need > - * to pull the "clock" info from the timer and dispatch the > - * "other" CLOCKs "next timer" code (which, I suppose should > - * also be added to the k_clock structure). 
> - */ > - if (!timr->it.real.incr) > + if (ktime_cmp_val(timr->it.real.incr, ==, KTIME_ZERO)) > return; > > - do { > - seq = read_seqbegin(&xtime_lock); > - new_wall_to = wall_to_monotonic; > - posix_get_now(&now); > - } while (read_seqretry(&xtime_lock, seq)); > - > - if (!list_empty(&timr->it.real.abs_timer_entry)) { > - spin_lock(&abs_list.lock); > - add_clockset_delta(timr, &new_wall_to); > - > - posix_bump_timer(timr, now); > - > - spin_unlock(&abs_list.lock); > - } else { > - posix_bump_timer(timr, now); > - } > - timr->it_overrun_last = timr->it_overrun; > - timr->it_overrun = -1; > + timr->it_overrun_last = timr->it.real.overrun; > + timr->it.real.overrun = timr->it.real.timer.overrun = -1; > ++timr->it_requeue_pending; > - add_timer(&timr->it.real.timer); > + start_ktimer(&timr->it.real.timer, &timr->it.real.incr, KTIMER_FORWARD); > + timr->it.real.overrun = timr->it.real.timer.overrun; > } > > /* > @@ -413,14 +327,7 @@ int posix_timer_event(struct k_itimer *t > { > memset(&timr->sigq->info, 0, sizeof(siginfo_t)); > timr->sigq->info.si_sys_private = si_private; > - /* > - * Send signal to the process that owns this timer. > - > - * This code assumes that all the possible abs_lists share the > - * same lock (there is only one list at this time). If this is > - * not the case, the CLOCK info would need to be used to find > - * the proper abs list lock. > - */ > + /* Send signal to the process that owns this timer.*/ > > timr->sigq->info.si_signo = timr->it_sigev_signo; > timr->sigq->info.si_errno = 0; > @@ -454,65 +361,28 @@ EXPORT_SYMBOL_GPL(posix_timer_event); > > * This code is for CLOCK_REALTIME* and CLOCK_MONOTONIC* timers. 
> */ > -static void posix_timer_fn(unsigned long __data) > +static void posix_timer_fn(void *data) > { > - struct k_itimer *timr = (struct k_itimer *) __data; > + struct k_itimer *timr = data; > unsigned long flags; > - unsigned long seq; > - struct timespec delta, new_wall_to; > - u64 exp = 0; > - int do_notify = 1; > + int si_private = 0; > > spin_lock_irqsave(&timr->it_lock, flags); > - if (!list_empty(&timr->it.real.abs_timer_entry)) { > - spin_lock(&abs_list.lock); > - do { > - seq = read_seqbegin(&xtime_lock); > - new_wall_to = wall_to_monotonic; > - } while (read_seqretry(&xtime_lock, seq)); > - set_normalized_timespec(&delta, > - new_wall_to.tv_sec - > - timr->it.real.wall_to_prev.tv_sec, > - new_wall_to.tv_nsec - > - timr->it.real.wall_to_prev.tv_nsec); > - if (likely((delta.tv_sec | delta.tv_nsec ) == 0)) { > - /* do nothing, timer is on time */ > - } else if (delta.tv_sec < 0) { > - /* do nothing, timer is already late */ > - } else { > - /* timer is early due to a clock set */ > - tstojiffie(&delta, > - posix_clocks[timr->it_clock].res, > - &exp); > - timr->it.real.wall_to_prev = new_wall_to; > - timr->it.real.timer.expires += exp; > - add_timer(&timr->it.real.timer); > - do_notify = 0; > - } > - spin_unlock(&abs_list.lock); > > - } > - if (do_notify) { > - int si_private=0; > + if (ktime_cmp_val(timr->it.real.incr, !=, KTIME_ZERO)) > + si_private = ++timr->it_requeue_pending; > > - if (timr->it.real.incr) > - si_private = ++timr->it_requeue_pending; > - else { > - remove_from_abslist(timr); > - } > + if (posix_timer_event(timr, si_private)) > + /* > + * signal was not sent because of sig_ignor > + * we will not get a call back to restart it AND > + * it should be restarted. > + */ > + schedule_next_timer(timr); > > - if (posix_timer_event(timr, si_private)) > - /* > - * signal was not sent because of sig_ignor > - * we will not get a call back to restart it AND > - * it should be restarted. 
> - */ > - schedule_next_timer(timr); > - } > unlock_timer(timr, flags); /* hold thru abs lock to keep irq off */ > } > > - > static inline struct task_struct * good_sigevent(sigevent_t * event) > { > struct task_struct *rtn = current->group_leader; > @@ -776,39 +646,40 @@ static struct k_itimer * lock_timer(time > static void > common_timer_get(struct k_itimer *timr, struct itimerspec *cur_setting) > { > - unsigned long expires; > - struct now_struct now; > + ktime_t expires, now, remaining; > + struct ktimer *timer = &timr->it.real.timer; > > - do > - expires = timr->it.real.timer.expires; > - while ((volatile long) (timr->it.real.timer.expires) != expires); > - > - posix_get_now(&now); > - > - if (expires && > - ((timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE) && > - !timr->it.real.incr && > - posix_time_before(&timr->it.real.timer, &now)) > - timr->it.real.timer.expires = expires = 0; > - if (expires) { > - if (timr->it_requeue_pending & REQUEUE_PENDING || > - (timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE) { > - posix_bump_timer(timr, now); > - expires = timr->it.real.timer.expires; > - } > - else > - if (!timer_pending(&timr->it.real.timer)) > - expires = 0; > - if (expires) > - expires -= now.jiffies; > - } > - jiffies_to_timespec(expires, &cur_setting->it_value); > - jiffies_to_timespec(timr->it.real.incr, &cur_setting->it_interval); > - > - if (cur_setting->it_value.tv_sec < 0) { > + memset(cur_setting, 0, sizeof(struct itimerspec)); > + expires = get_expiry_ktimer(timer, &now); > + remaining = ktime_sub(expires, now); > + > + /* Time left ? or timer pending */ > + if (ktime_cmp_val(remaining, >, KTIME_ZERO) || ktimer_active(timer)) > + goto calci; > + /* interval timer ? */ > + if (ktime_cmp_val(timr->it.real.incr, ==, 0)) > + return; > + /* > + * When a requeue is pending or this is a SIGEV_NONE timer > + * move the expiry time forward by intervals, so expiry is > > + * now. 
> + * The active (non SIGEV_NONE) rearm should be done > + * automatically by the ktimer REARM mode. Thats the next > + * iteration. The REQUEUE_PENDING part will go away ! > + */ > + if (timr->it_requeue_pending & REQUEUE_PENDING || > + (timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE) { > + remaining = forward_posix_timer(timr, now); > + } > + calci: > + /* interval timer ? */ > + if (ktime_cmp_val(timr->it.real.incr, !=, KTIME_ZERO)) > + cur_setting->it_interval = ktime_to_timespec(timr->it.real.incr); > + /* Return 0 only, when the timer is expired and not pending */ > + if (ktime_cmp_val(remaining, <=, KTIME_ZERO)) > cur_setting->it_value.tv_nsec = 1; > - cur_setting->it_value.tv_sec = 0; > - } > + else > + cur_setting->it_value = ktime_to_timespec(remaining); > } > > /* Get the time remaining on a POSIX.1b interval timer. */ > @@ -832,6 +703,7 @@ sys_timer_gettime(timer_t timer_id, stru > > return 0; > } > + > /* > * Get the number of overruns of a POSIX.1b interval timer. This is to > * be the overrun of the timer last delivered. At the same time we are > @@ -858,84 +730,6 @@ sys_timer_getoverrun(timer_t timer_id) > > return overrun; > } > -/* > - * Adjust for absolute time > - * > - * If absolute time is given and it is not CLOCK_MONOTONIC, we need to > - * adjust for the offset between the timer clock (CLOCK_MONOTONIC) and > - * what ever clock he is using. > - * > - * If it is relative time, we need to add the current (CLOCK_MONOTONIC) > - * time to it to get the proper time for the timer. 
> - */ > -static int adjust_abs_time(struct k_clock *clock, struct timespec *tp, > - int abs, u64 *exp, struct timespec *wall_to) > -{ > - struct timespec now; > - struct timespec oc = *tp; > - u64 jiffies_64_f; > - int rtn =0; > - > - if (abs) { > - /* > - * The mask pick up the 4 basic clocks > - */ > - if (!((clock - &posix_clocks[0]) & ~CLOCKS_MASK)) { > - jiffies_64_f = do_posix_clock_monotonic_gettime_parts( > - &now, wall_to); > - /* > - * If we are doing a MONOTONIC clock > - */ > - if((clock - &posix_clocks[0]) & CLOCKS_MONO){ > - now.tv_sec += wall_to->tv_sec; > - now.tv_nsec += wall_to->tv_nsec; > - } > - } else { > - /* > - * Not one of the basic clocks > - */ > - clock->clock_get(clock - posix_clocks, &now); > - jiffies_64_f = get_jiffies_64(); > - } > - /* > - * Take away now to get delta and normalize > - */ > - set_normalized_timespec(&oc, oc.tv_sec - now.tv_sec, > - oc.tv_nsec - now.tv_nsec); > - }else{ > - jiffies_64_f = get_jiffies_64(); > - } > - /* > - * Check if the requested time is prior to now (if so set now) > - */ > - if (oc.tv_sec < 0) > - oc.tv_sec = oc.tv_nsec = 0; > - > - if (oc.tv_sec | oc.tv_nsec) > - set_normalized_timespec(&oc, oc.tv_sec, > - oc.tv_nsec + clock->res); > - tstojiffie(&oc, clock->res, exp); > - > - /* > - * Check if the requested time is more than the timer code > - * can handle (if so we error out but return the value too). > - */ > - if (*exp > ((u64)MAX_JIFFY_OFFSET)) > - /* > - * This is a considered response, not exactly in > - * line with the standard (in fact it is silent on > - * possible overflows). We assume such a large > - * value is ALMOST always a programming error and > - * try not to compound it by setting a really dumb > - * value. > - */ > - rtn = -EINVAL; > - /* > - * return the actual jiffies expire time, full 64 bits > - */ > - *exp += jiffies_64_f; > - return rtn; > -} > > /* Set a POSIX.1b interval timer. */ > /* timr->it_lock is taken. 
*/ > @@ -943,68 +737,52 @@ static inline int > common_timer_set(struct k_itimer *timr, int flags, > struct itimerspec *new_setting, struct itimerspec *old_setting) > { > - struct k_clock *clock = &posix_clocks[timr->it_clock]; > - u64 expire_64; > + ktime_t expires; > + int mode; > > if (old_setting) > common_timer_get(timr, old_setting); > > /* disable the timer */ > - timr->it.real.incr = 0; > + ktime_set_zero(timr->it.real.incr); > /* > * careful here. If smp we could be in the "fire" routine which will > * be spinning as we hold the lock. But this is ONLY an SMP issue. > */ > - if (try_to_del_timer_sync(&timr->it.real.timer) < 0) { > -#ifdef CONFIG_SMP > - /* > - * It can only be active if on an other cpu. Since > - * we have cleared the interval stuff above, it should > - * clear once we release the spin lock. Of course once > - * we do that anything could happen, including the > - * complete melt down of the timer. So return with > - * a "retry" exit status. > - */ > + if (try_to_stop_ktimer(&timr->it.real.timer) < 0) > return TIMER_RETRY; > -#endif > - } > - > - remove_from_abslist(timr); > > timr->it_requeue_pending = (timr->it_requeue_pending + 2) & > ~REQUEUE_PENDING; > timr->it_overrun_last = 0; > timr->it_overrun = -1; > - /* > - *switch off the timer when it_value is zero > - */ > - if (!new_setting->it_value.tv_sec && !new_setting->it_value.tv_nsec) { > - timr->it.real.timer.expires = 0; > + > + /* switch off the timer when it_value is zero */ > + if (!new_setting->it_value.tv_sec && !new_setting->it_value.tv_nsec) > return 0; > - } > > - if (adjust_abs_time(clock, > - &new_setting->it_value, flags & TIMER_ABSTIME, > - &expire_64, &(timr->it.real.wall_to_prev))) { > - return -EINVAL; > - } > - timr->it.real.timer.expires = (unsigned long)expire_64; > - tstojiffie(&new_setting->it_interval, clock->res, &expire_64); > - timr->it.real.incr = (unsigned long)expire_64; > + mode = flags & TIMER_ABSTIME ? 
KTIMER_ABS : KTIMER_REL; > > - /* > - * We do not even queue SIGEV_NONE timers! But we do put them > - * in the abs list so we can do that right. > + /* Posix madness. Only absolute CLOCK_REALTIME timers > + * are affected by clock sets. So we must reiniatilize > + * the timer. > */ > + if (timr->it_clock == CLOCK_REALTIME && mode == KTIMER_ABS) > + timer_create_real(timr); > + else > + timer_create_mono(timr); > + > + expires = ktimer_convert_timespec(&timr->it.real.timer, > + &new_setting->it_value); > + /* This should be moved to the auto rearm code */ > + timr->it.real.incr = ktimer_convert_timespec(&timr->it.real.timer, > + &new_setting->it_interval); > + > + /* SIGEV_NONE timers are not queued ! See common_timer_get */ > if (((timr->it_sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE)) > - add_timer(&timr->it.real.timer); > + start_ktimer(&timr->it.real.timer, &expires, > + mode | KTIMER_NOCHECK); > > - if (flags & TIMER_ABSTIME && clock->abs_struct) { > - spin_lock(&clock->abs_struct->lock); > - list_add_tail(&(timr->it.real.abs_timer_entry), > - &(clock->abs_struct->list)); > - spin_unlock(&clock->abs_struct->lock); > - } > return 0; > } > > @@ -1039,6 +817,7 @@ retry: > > unlock_timer(timr, flag); > if (error == TIMER_RETRY) { > + wait_for_ktimer(&timr->it.real.timer); > rtn = NULL; // We already got the old time... > goto retry; > } > @@ -1052,24 +831,10 @@ retry: > > static inline int common_timer_del(struct k_itimer *timer) > { > - timer->it.real.incr = 0; > + ktime_set_zero(timer->it.real.incr); > > - if (try_to_del_timer_sync(&timer->it.real.timer) < 0) { > -#ifdef CONFIG_SMP > - /* > - * It can only be active if on an other cpu. Since > - * we have cleared the interval stuff above, it should > - * clear once we release the spin lock. Of course once > - * we do that anything could happen, including the > - * complete melt down of the timer. So return with > - * a "retry" exit status. 
> - */
> + if (try_to_stop_ktimer(&timer->it.real.timer) < 0)
> return TIMER_RETRY;
> -#endif
> - }
> -
> - remove_from_abslist(timer);
> -
> return 0;
> }
>
> @@ -1085,24 +850,17 @@ sys_timer_delete(timer_t timer_id)
> struct k_itimer *timer;
> long flags;
>
> -#ifdef CONFIG_SMP
> - int error;
> retry_delete:
> -#endif
> timer = lock_timer(timer_id, &flags);
> if (!timer)
> return -EINVAL;
>
> -#ifdef CONFIG_SMP
> - error = timer_delete_hook(timer);
> -
> - if (error == TIMER_RETRY) {
> + if (timer_delete_hook(timer) == TIMER_RETRY) {
> unlock_timer(timer, flags);
> + wait_for_ktimer(&timer->it.real.timer);
> goto retry_delete;
> }
> -#else
> - timer_delete_hook(timer);
> -#endif
> +
> spin_lock(&current->sighand->siglock);
> list_del(&timer->list);
> spin_unlock(&current->sighand->siglock);
> @@ -1119,6 +877,7 @@ retry_delete:
> release_posix_timer(timer, IT_ID_SET);
> return 0;
> }
> +
> /*
> * return timer owned by the process, used by exit_itimers
> */
> @@ -1126,22 +885,14 @@ static inline void itimer_delete(struct
> {
> unsigned long flags;
>
> -#ifdef CONFIG_SMP
> - int error;
> retry_delete:
> -#endif
> spin_lock_irqsave(&timer->it_lock, flags);
>
> -#ifdef CONFIG_SMP
> - error = timer_delete_hook(timer);
> -
> - if (error == TIMER_RETRY) {
> + if (timer_delete_hook(timer) == TIMER_RETRY) {
> unlock_timer(timer, flags);
> + wait_for_ktimer(&timer->it.real.timer);
> goto retry_delete;
> }
> -#else
> - timer_delete_hook(timer);
> -#endif
> list_del(&timer->list);
> /*
> * This keeps any tasks waiting on the spin lock from thinking
> @@ -1170,60 +921,7 @@ void exit_itimers(struct signal_struct *
> }
> }
>
> -/*
> - * And now for the "clock" calls
> - *
> - * These functions are called both from timer functions (with the timer
> - * spin_lock_irq() held and from clock calls with no locking. They must
> - * use the save flags versions of locks.
> - */
> -
> -/*
> - * We do ticks here to avoid the irq lock ( they take sooo long).
> - * The seqlock is great here.
Since we a reader, we don't really care > - * if we are interrupted since we don't take lock that will stall us or > - * any other cpu. Voila, no irq lock is needed. > - * > - */ > - > -static u64 do_posix_clock_monotonic_gettime_parts( > - struct timespec *tp, struct timespec *mo) > -{ > - u64 jiff; > - unsigned int seq; > - > - do { > - seq = read_seqbegin(&xtime_lock); > - getnstimeofday(tp); > - *mo = wall_to_monotonic; > - jiff = jiffies_64; > - > - } while(read_seqretry(&xtime_lock, seq)); > - > - return jiff; > -} > - > -static int do_posix_clock_monotonic_get(clockid_t clock, struct timespec *tp) > -{ > - struct timespec wall_to_mono; > - > - do_posix_clock_monotonic_gettime_parts(tp, &wall_to_mono); > - > - tp->tv_sec += wall_to_mono.tv_sec; > - tp->tv_nsec += wall_to_mono.tv_nsec; > - > - if ((tp->tv_nsec - NSEC_PER_SEC) > 0) { > - tp->tv_nsec -= NSEC_PER_SEC; > - tp->tv_sec++; > - } > - return 0; > -} > - > -int do_posix_clock_monotonic_gettime(struct timespec *tp) > -{ > - return do_posix_clock_monotonic_get(CLOCK_MONOTONIC, tp); > -} > - > +/* Not available / possible... functions */ > int do_posix_clock_nosettime(clockid_t clockid, struct timespec *tp) > { > return -EINVAL; > @@ -1236,7 +934,8 @@ int do_posix_clock_notimer_create(struct > } > EXPORT_SYMBOL_GPL(do_posix_clock_notimer_create); > > -int do_posix_clock_nonanosleep(clockid_t clock, int flags, struct timespec *t) > +int do_posix_clock_nonanosleep(clockid_t clock, int flags, struct timespec *t, > + struct timespec __user *r) > { > #ifndef ENOTSUP > return -EOPNOTSUPP; /* aka ENOTSUP in userland for POSIX */ > @@ -1295,125 +994,34 @@ sys_clock_getres(clockid_t which_clock, > return error; > } > > -static void nanosleep_wake_up(unsigned long __data) > -{ > - struct task_struct *p = (struct task_struct *) __data; > - > - wake_up_process(p); > -} > - > /* > - * The standard says that an absolute nanosleep call MUST wake up at > - * the requested time in spite of clock settings. 
Here is what we do: > - * For each nanosleep call that needs it (only absolute and not on > - * CLOCK_MONOTONIC* (as it can not be set)) we thread a little structure > - * into the "nanosleep_abs_list". All we need is the task_struct pointer. > - * When ever the clock is set we just wake up all those tasks. The rest > - * is done by the while loop in clock_nanosleep(). > - * > - * On locking, clock_was_set() is called from update_wall_clock which > - * holds (or has held for it) a write_lock_irq( xtime_lock) and is > - * called from the timer bh code. Thus we need the irq save locks. > - * > - * Also, on the call from update_wall_clock, that is done as part of a > - * softirq thing. We don't want to delay the system that much (possibly > - * long list of timers to fix), so we defer that work to keventd. > + * nanosleep for monotonic and realtime clocks > */ > - > -static DECLARE_WAIT_QUEUE_HEAD(nanosleep_abs_wqueue); > -static DECLARE_WORK(clock_was_set_work, (void(*)(void*))clock_was_set, NULL); > - > -static DECLARE_MUTEX(clock_was_set_lock); > - > -void clock_was_set(void) > +static int common_nsleep(clockid_t which_clock, int flags, > + struct timespec *tsave, struct timespec __user *rmtp) > { > - struct k_itimer *timr; > - struct timespec new_wall_to; > - LIST_HEAD(cws_list); > - unsigned long seq; > - > + int mode = flags & TIMER_ABSTIME ? KTIMER_ABS : KTIMER_REL; > > - if (unlikely(in_interrupt())) { > - schedule_work(&clock_was_set_work); > - return; > + switch (which_clock) { > + case CLOCK_REALTIME: > + /* Posix madness. Only absolute timers on clock realtime > + are affected by clock set. */ > + if (mode == KTIMER_ABS) > + return ktimer_nanosleep_real(tsave, rmtp, mode); > + case CLOCK_MONOTONIC: > + return ktimer_nanosleep_mono(tsave, rmtp, mode); > + default: > + break; > } > - wake_up_all(&nanosleep_abs_wqueue); > - > - /* > - * Check if there exist TIMER_ABSTIME timers to correct. 
> - * > - * Notes on locking: This code is run in task context with irq > - * on. We CAN be interrupted! All other usage of the abs list > - * lock is under the timer lock which holds the irq lock as > - * well. We REALLY don't want to scan the whole list with the > - * interrupt system off, AND we would like a sequence lock on > - * this code as well. Since we assume that the clock will not > - * be set often, it seems ok to take and release the irq lock > - * for each timer. In fact add_timer will do this, so this is > - * not an issue. So we know when we are done, we will move the > - * whole list to a new location. Then as we process each entry, > - * we will move it to the actual list again. This way, when our > - * copy is empty, we are done. We are not all that concerned > - * about preemption so we will use a semaphore lock to protect > - * aginst reentry. This way we will not stall another > - * processor. It is possible that this may delay some timers > - * that should have expired, given the new clock, but even this > - * will be minimal as we will always update to the current time, > - * even if it was set by a task that is waiting for entry to > - * this code. Timers that expire too early will be caught by > - * the expire code and restarted. > - > - * Absolute timers that repeat are left in the abs list while > - * waiting for the task to pick up the signal. This means we > - * may find timers that are not in the "add_timer" list, but are > - * in the abs list. We do the same thing for these, save > - * putting them back in the "add_timer" list. (Note, these are > - * left in the abs list mainly to indicate that they are > - * ABSOLUTE timers, a fact that is used by the re-arm code, and > - * for which we have no other flag.) 
> - > - */ > - > - down(&clock_was_set_lock); > - spin_lock_irq(&abs_list.lock); > - list_splice_init(&abs_list.list, &cws_list); > - spin_unlock_irq(&abs_list.lock); > - do { > - do { > - seq = read_seqbegin(&xtime_lock); > - new_wall_to = wall_to_monotonic; > - } while (read_seqretry(&xtime_lock, seq)); > - > - spin_lock_irq(&abs_list.lock); > - if (list_empty(&cws_list)) { > - spin_unlock_irq(&abs_list.lock); > - break; > - } > - timr = list_entry(cws_list.next, struct k_itimer, > - it.real.abs_timer_entry); > - > - list_del_init(&timr->it.real.abs_timer_entry); > - if (add_clockset_delta(timr, &new_wall_to) && > - del_timer(&timr->it.real.timer)) /* timer run yet? */ > - add_timer(&timr->it.real.timer); > - list_add(&timr->it.real.abs_timer_entry, &abs_list.list); > - spin_unlock_irq(&abs_list.lock); > - } while (1); > - > - up(&clock_was_set_lock); > + return -EINVAL; > } > > -long clock_nanosleep_restart(struct restart_block *restart_block); > - > asmlinkage long > sys_clock_nanosleep(clockid_t which_clock, int flags, > const struct timespec __user *rqtp, > struct timespec __user *rmtp) > { > struct timespec t; > - struct restart_block *restart_block = > - &(current_thread_info()->restart_block); > - int ret; > > if (invalid_clockid(which_clock)) > return -EINVAL; > @@ -1421,135 +1029,8 @@ sys_clock_nanosleep(clockid_t which_cloc > if (copy_from_user(&t, rqtp, sizeof (struct timespec))) > return -EFAULT; > > - if ((unsigned) t.tv_nsec >= NSEC_PER_SEC || t.tv_sec < 0) > + if (!timespec_valid(&t)) > return -EINVAL; > > - /* > - * Do this here as nsleep function does not have the real address. 
> - */
> - restart_block->arg1 = (unsigned long)rmtp;
> -
> - ret = CLOCK_DISPATCH(which_clock, nsleep, (which_clock, flags, &t));
> -
> - if ((ret == -ERESTART_RESTARTBLOCK) && rmtp &&
> - copy_to_user(rmtp, &t, sizeof (t)))
> - return -EFAULT;
> - return ret;
> -}
> -
> -
> -static int common_nsleep(clockid_t which_clock,
> - int flags, struct timespec *tsave)
> -{
> - struct timespec t, dum;
> - struct timer_list new_timer;
> - DECLARE_WAITQUEUE(abs_wqueue, current);
> - u64 rq_time = (u64)0;
> - s64 left;
> - int abs;
> - struct restart_block *restart_block =
> - &current_thread_info()->restart_block;
> -
> - abs_wqueue.flags = 0;
> - init_timer(&new_timer);
> - new_timer.expires = 0;
> - new_timer.data = (unsigned long) current;
> - new_timer.function = nanosleep_wake_up;
> - abs = flags & TIMER_ABSTIME;
> -
> - if (restart_block->fn == clock_nanosleep_restart) {
> - /*
> - * Interrupted by a non-delivered signal, pick up remaining
> - * time and continue. Remaining time is in arg2 & 3.
> - */
> - restart_block->fn = do_no_restart_syscall;
> -
> - rq_time = restart_block->arg3;
> - rq_time = (rq_time << 32) + restart_block->arg2;
> - if (!rq_time)
> - return -EINTR;
> - left = rq_time - get_jiffies_64();
> - if (left <= (s64)0)
> - return 0; /* Already passed */
> - }
> -
> - if (abs && (posix_clocks[which_clock].clock_get !=
> - posix_clocks[CLOCK_MONOTONIC].clock_get))
> - add_wait_queue(&nanosleep_abs_wqueue, &abs_wqueue);
> -
> - do {
> - t = *tsave;
> - if (abs || !rq_time) {
> - adjust_abs_time(&posix_clocks[which_clock], &t, abs,
> - &rq_time, &dum);
> - }
> -
> - left = rq_time - get_jiffies_64();
> - if (left >= (s64)MAX_JIFFY_OFFSET)
> - left = (s64)MAX_JIFFY_OFFSET;
> - if (left < (s64)0)
> - break;
> -
> - new_timer.expires = jiffies + left;
> - __set_current_state(TASK_INTERRUPTIBLE);
> - add_timer(&new_timer);
> -
> - schedule();
> -
> - del_timer_sync(&new_timer);
> - left = rq_time - get_jiffies_64();
> - } while (left > (s64)0 &&
!test_thread_flag(TIF_SIGPENDING)); > - > - if (abs_wqueue.task_list.next) > - finish_wait(&nanosleep_abs_wqueue, &abs_wqueue); > - > - if (left > (s64)0) { > - > - /* > - * Always restart abs calls from scratch to pick up any > - * clock shifting that happened while we are away. > - */ > - if (abs) > - return -ERESTARTNOHAND; > - > - left *= TICK_NSEC; > - tsave->tv_sec = div_long_long_rem(left, > - NSEC_PER_SEC, > - &tsave->tv_nsec); > - /* > - * Restart works by saving the time remaing in > - * arg2 & 3 (it is 64-bits of jiffies). The other > - * info we need is the clock_id (saved in arg0). > - * The sys_call interface needs the users > - * timespec return address which _it_ saves in arg1. > - * Since we have cast the nanosleep call to a clock_nanosleep > - * both can be restarted with the same code. > - */ > - restart_block->fn = clock_nanosleep_restart; > - restart_block->arg0 = which_clock; > - /* > - * Caller sets arg1 > - */ > - restart_block->arg2 = rq_time & 0xffffffffLL; > - restart_block->arg3 = rq_time >> 32; > - > - return -ERESTART_RESTARTBLOCK; > - } > - > - return 0; > -} > -/* > - * This will restart clock_nanosleep. 
> - */ > -long > -clock_nanosleep_restart(struct restart_block *restart_block) > -{ > - struct timespec t; > - int ret = common_nsleep(restart_block->arg0, 0, &t); > - > - if ((ret == -ERESTART_RESTARTBLOCK) && restart_block->arg1 && > - copy_to_user((struct timespec __user *)(restart_block->arg1), &t, > - sizeof (t))) > - return -EFAULT; > - return ret; > + return CLOCK_DISPATCH(which_clock, nsleep, (which_clock, flags, &t, rmtp)); > } > Index: linux-2.6.14-rc2-rt4/kernel/timer.c > =================================================================== > --- linux-2.6.14-rc2-rt4.orig/kernel/timer.c > +++ linux-2.6.14-rc2-rt4/kernel/timer.c > @@ -912,6 +912,7 @@ static void run_timer_softirq(struct sof > { > tvec_base_t *base = &__get_cpu_var(tvec_bases); > > + run_ktimer_queues(); > if (time_after_eq(jiffies, base->timer_jiffies)) > __run_timers(base); > } > @@ -1177,62 +1178,6 @@ asmlinkage long sys_gettid(void) > return current->pid; > } > > -static long __sched nanosleep_restart(struct restart_block *restart) > -{ > - unsigned long expire = restart->arg0, now = jiffies; > - struct timespec __user *rmtp = (struct timespec __user *) restart->arg1; > - long ret; > - > - /* Did it expire while we handled signals? 
*/
> - if (!time_after(expire, now))
> - return 0;
> -
> - expire = schedule_timeout_interruptible(expire - now);
> -
> - ret = 0;
> - if (expire) {
> - struct timespec t;
> - jiffies_to_timespec(expire, &t);
> -
> - ret = -ERESTART_RESTARTBLOCK;
> - if (rmtp && copy_to_user(rmtp, &t, sizeof(t)))
> - ret = -EFAULT;
> - /* The 'restart' block is already filled in */
> - }
> - return ret;
> -}
> -
> -asmlinkage long sys_nanosleep(struct timespec __user *rqtp, struct timespec __user *rmtp)
> -{
> - struct timespec t;
> - unsigned long expire;
> - long ret;
> -
> - if (copy_from_user(&t, rqtp, sizeof(t)))
> - return -EFAULT;
> -
> - if ((t.tv_nsec >= 1000000000L) || (t.tv_nsec < 0) || (t.tv_sec < 0))
> - return -EINVAL;
> -
> - expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec);
> - expire = schedule_timeout_interruptible(expire);
> -
> - ret = 0;
> - if (expire) {
> - struct restart_block *restart;
> - jiffies_to_timespec(expire, &t);
> - if (rmtp && copy_to_user(rmtp, &t, sizeof(t)))
> - return -EFAULT;
> -
> - restart = &current_thread_info()->restart_block;
> - restart->fn = nanosleep_restart;
> - restart->arg0 = jiffies + expire;
> - restart->arg1 = (unsigned long) rmtp;
> - ret = -ERESTART_RESTARTBLOCK;
> - }
> - return ret;
> -}
> -
> /*
> * sys_sysinfo - fill in sysinfo struct
> */
> Index: linux-2.6.14-rc2-rt4/include/linux/time.h
> ===================================================================
> --- linux-2.6.14-rc2-rt4.orig/include/linux/time.h
> +++ linux-2.6.14-rc2-rt4/include/linux/time.h
> @@ -4,6 +4,7 @@
> #include <linux/types.h>
>
> #ifdef __KERNEL__
> +#include <linux/calc64.h>
> #include <linux/seqlock.h>
> #endif
>
> @@ -38,6 +39,11 @@ static __inline__ int timespec_equal(str
> return (a->tv_sec == b->tv_sec) && (a->tv_nsec == b->tv_nsec);
> }
>
> +#define timespec_valid(ts) \
> +(((ts)->tv_sec >= 0) && (((unsigned) (ts)->tv_nsec) < NSEC_PER_SEC))
> +
> +typedef s64 nsec_t;
> +
> /* Converts Gregorian date to seconds since 1970-01-01
00:00:00. > * Assumes input in normal date format, i.e. 1980-12-31 23:59:59 > * => year=1980, mon=12, day=31, hour=23, min=59, sec=59. > @@ -88,8 +94,7 @@ struct timespec current_kernel_time(void > extern void do_gettimeofday(struct timeval *tv); > extern int do_settimeofday(struct timespec *tv); > extern int do_sys_settimeofday(struct timespec *tv, struct timezone *tz); > -extern void clock_was_set(void); // call when ever the clock is set > -extern int do_posix_clock_monotonic_gettime(struct timespec *tp); > +extern void do_posix_clock_monotonic_gettime(struct timespec *ts); > extern long do_utimes(char __user * filename, struct timeval * times); > struct itimerval; > extern int do_setitimer(int which, struct itimerval *value, struct itimerval *ovalue); > @@ -113,6 +118,40 @@ set_normalized_timespec (struct timespec > ts->tv_nsec = nsec; > } > > +static __inline__ nsec_t timespec_to_ns(struct timespec *s) > +{ > + nsec_t res = (nsec_t) s->tv_sec * NSEC_PER_SEC; > + return res + (nsec_t) s->tv_nsec; > +} > + > +static __inline__ struct timespec ns_to_timespec(nsec_t n) > +{ > + struct timespec ts; > + > + if (n) > + ts.tv_sec = div_long_long_rem_signed(n, NSEC_PER_SEC, &ts.tv_nsec); > + else > + ts.tv_sec = ts.tv_nsec = 0; > + return ts; > +} > + > +static __inline__ nsec_t timeval_to_ns(struct timeval *s) > +{ > + nsec_t res = (nsec_t) s->tv_sec * NSEC_PER_SEC; > + return res + (nsec_t) s->tv_usec * NSEC_PER_USEC; > +} > + > +static __inline__ struct timeval ns_to_timeval(nsec_t n) > +{ > + struct timeval tv; > + if (n) { > + tv.tv_sec = div_long_long_rem_signed(n, NSEC_PER_SEC, &tv.tv_usec); > + tv.tv_usec /= 1000; > + } else > + tv.tv_sec = tv.tv_usec = 0; > + return tv; > +} > + > #endif /* __KERNEL__ */ > > #define NFDBITS __NFDBITS > @@ -145,23 +184,18 @@ struct itimerval { > /* > * The IDs of the various system clocks (for POSIX.1b interval timers). 
> */
> -#define CLOCK_REALTIME 0
> -#define CLOCK_MONOTONIC 1
> +#define CLOCK_REALTIME 0
> +#define CLOCK_MONOTONIC 1
> #define CLOCK_PROCESS_CPUTIME_ID 2
> #define CLOCK_THREAD_CPUTIME_ID 3
> -#define CLOCK_REALTIME_HR 4
> -#define CLOCK_MONOTONIC_HR 5
>
> /*
> * The IDs of various hardware clocks
> */
> -
> -
> #define CLOCK_SGI_CYCLE 10
> #define MAX_CLOCKS 16
> -#define CLOCKS_MASK (CLOCK_REALTIME | CLOCK_MONOTONIC | \
> - CLOCK_REALTIME_HR | CLOCK_MONOTONIC_HR)
> -#define CLOCKS_MONO (CLOCK_MONOTONIC & CLOCK_MONOTONIC_HR)
> +#define CLOCKS_MASK (CLOCK_REALTIME | CLOCK_MONOTONIC)
> +#define CLOCKS_MONO (CLOCK_MONOTONIC)
>
> /*
> * The various flags for setting POSIX.1b interval timers.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
-- 
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-09-28 20:43 [PATCH] ktimers subsystem 2.6.14-rc2-kt5 tglx
` (2 preceding siblings ...)
2005-09-29 19:57 ` George Anzinger
@ 2005-10-01 1:03 ` Roman Zippel
2005-10-01 11:22 ` Ingo Molnar
` (2 more replies)
3 siblings, 3 replies; 67+ messages in thread
From: Roman Zippel @ 2005-10-01 1:03 UTC (permalink / raw)
To: tglx
Cc: linux-kernel, mingo, Andrew Morton, george, johnstul, paulmck,
Christoph Hellwig, oleg, tim.bird

Hi,

On Wed, 28 Sep 2005 tglx@linutronix.de wrote:

Your patch introduces some whitespace damage, search for "^\+ " in your
patch.

> ktimers seperate the "timer API" from the "timeout API".

I'm not really happy with these names: timeouts are what timers do, so
the names say nothing about what the difference is. Calling them
"process timer" and "kernel timer" would reflect their main usage,
although that also means ptimer would be the more correct abbreviation.

> +#ifndef KTIME_IS_SCALAR
> +typedef union {
> + s64 tv64;
> + struct {
> +#ifdef __BIG_ENDIAN
> + s32 sec, nsec;
> +#else
> + s32 nsec, sec;
> +#endif
> + } tv;
> +} ktime_t;
> +
> +#else
> +
> +typedef s64 ktime_t;
> +
> +#endif

Making the union unconditional would make tv64 always available and a
lot of macros unnecessary.

> +struct ktimer {
> + struct rb_node node;
> + struct list_head list;
> + ktime_t expires;
> + ktime_t expired;
> + ktime_t interval;
> + int overrun;
> + unsigned long status;
> + void (*function)(void *);
> + void *data;
> + struct ktimer_base *base;
> +};

This structure is rather large and I think a lot can be avoided.
- list: AFAICT it's only used by run_ktimer_queue() to get the first
pending entry. This can also be done by keeping track of the first
entry in the base structure (useful in other places as well).
- expired: can be replaced by base->last_expired (may also be useful in
other places)
- status: the only user is ktimer_active(); the same test can be done by
testing node.rb_parent.
- interval/overrun: this is only needed by itimers and I think it's
possible to leave it there. The main change would be to let 'function'
return a value indicating whether to rearm the timer or not (which
implies that expires has been updated).

> +#define DEFINE_KTIME(k) ktime_t k = {.tv64 = 0LL }
> +
> +#define ktime_cmp(a,op,b) ((a).tv64 op (b).tv64)
> +#define ktime_cmp_val(a, op, b) ((a).tv64 op b)

A union ktime would especially avoid this.

> +static inline ktime_t ktime_sub(ktime_t a, ktime_t b)
> +{
> + ktime_t res;
> +
> + res.tv64 = a.tv64 - b.tv64;
> + if (res.tv.nsec < 0)
> + res.tv.nsec += NSEC_PER_SEC;
> +
> + return res;
> +}
> +
> +static inline ktime_t ktime_add(ktime_t a, ktime_t b)
> +{
> + ktime_t res;
> +
> + res.tv64 = a.tv64 + b.tv64;
> + if (res.tv.nsec >= NSEC_PER_SEC) {
> + res.tv.nsec -= NSEC_PER_SEC;
> + res.tv.sec++;
> + }
> + return res;
> +}

Not using 64-bit math here allows gcc to generate better code. For
example, with the 64-bit subtraction gcc has to add another test for
"nsec < 0", because the condition code is already used for the
overflow; adding the "sec--" instead is IMO faster (i.e. less likely).

> +/* The time bases */
> +#define MAX_KTIMER_BASES 2
> +static DEFINE_PER_CPU(struct ktimer_base, ktimer_bases[MAX_KTIMER_BASES]) =

Do you have any numbers (besides maybe microbenchmarks) that show a
real advantage by using per cpu data? What kind of usage do you expect
here?
The other thing is that this assumes, that all time sources are
programmable per cpu, otherwise it will be more complicated for a time
source to run the timers for every cpu, I don't know how safe that
assumption is. Changing the array of structures into an array of
pointers to the structures would allow to switch between percpu bases
and a single base.
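
For readers following the ktime_add()/ktime_sub() remark, here is a minimal userspace sketch of the non-64-bit variant that seems to be meant: seconds and nanoseconds are handled as two separate 32-bit operations, with an explicit carry ("sec++") or borrow ("sec--"). The names `pair_t`, `pair_add` and `pair_sub` are hypothetical and not part of the patch, and the sketch assumes both inputs are already normalized (0 <= nsec < NSEC_PER_SEC).

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for the sec/nsec half of the ktime_t union. */
typedef struct {
	int32_t sec;
	int32_t nsec;	/* assumed normalized: 0 <= nsec < NSEC_PER_SEC */
} pair_t;

/* Add field by field; carry into sec when nsec overflows one second. */
static pair_t pair_add(pair_t a, pair_t b)
{
	pair_t res = { a.sec + b.sec, a.nsec + b.nsec };

	if (res.nsec >= NSEC_PER_SEC) {
		res.nsec -= NSEC_PER_SEC;
		res.sec++;
	}
	return res;
}

/* Subtract field by field; borrow from sec when nsec goes negative. */
static pair_t pair_sub(pair_t a, pair_t b)
{
	pair_t res = { a.sec - b.sec, a.nsec - b.nsec };

	if (res.nsec < 0) {
		res.nsec += NSEC_PER_SEC;
		res.sec--;
	}
	return res;
}
```

With two normalized inputs the nsec sum stays below 2 * NSEC_PER_SEC, so it fits in a signed 32-bit field and a single correction step is enough. Whether this actually compiles to better code than the 64-bit form is the open question in the thread; the sketch only shows the arithmetic.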

> +ktime_t ktimer_convert_timespec(struct ktimer *timer, struct timespec *ts)
> +{
> + struct ktimer_base *base = get_ktimer_base_unlocked(timer);
> + ktime_t t;
> + long rem = ts->tv_nsec % base->resolution;
> +
> + t = ktime_set(ts->tv_sec, ts->tv_nsec);
> +
> + /* Check, if the value has to be rounded */
> + if (rem)
> + t = ktime_add_ns(t, base->resolution - rem);
> + return t;
> +}

Could you explain a little the resolution handling in your patch?
If I read SUS correctly, clock resolution and timer resolution don't
have to be the same: the first is returned by clock_getres() and the
latter is only documented somewhere (and AFAICT our implementation
always returned the wrong value).
IMO this also means we don't have to make the rounding that
complicated. Actually it could be done automatically by the timer, e.g.
interval timers are reprogrammed at (now + interval) and the timer
resolution will automatically round it up.

> +static int enqueue_ktimer(struct ktimer *timer, struct ktimer_base *base,
> + ktime_t *tim, int mode)
> +{
> + struct rb_node **link = &base->active.rb_node;
> + struct rb_node *parent = NULL;
> + struct ktimer *entry;
> + struct list_head *prev = &base->pending;
> + ktime_t now;
> +
> + /* Get current time */
> + now = base->get_time();

As get_time() is not necessarily cheap, it can be avoided for
nonrelative timers by comparing the expiry time with the first pending
timer. Maintaining a pointer to the first timer here avoids the timer
list and gives a simple check whether the time source needs any
reprogramming later.

> + if ktime_cmp(timer->expires, <=, now) {
> + timer->expired = now;
> + /* The caller takes care of expiry */
> + if (!(mode & KTIMER_NOCHECK))
> + return -1;

I think KTIMER_NOFAIL would be a better name. For a while that had me
confused, as you actually do check the value, but you don't fail it and
enqueue it anyway.

bye, Roman
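
The rounding questioned above can be pictured with a minimal userspace sketch. It works on a plain nanosecond count instead of ktime_t, and `round_up_ns` is a hypothetical name, not a function from the patch. It assumes a non-negative value and a positive resolution.

```c
#include <stdint.h>

/*
 * Round a non-negative nanosecond value up to the next multiple of
 * `resolution` (assumed > 0). Values already on a multiple are
 * returned unchanged.
 */
static int64_t round_up_ns(int64_t ns, int64_t resolution)
{
	int64_t rem = ns % resolution;

	if (rem)
		ns += resolution - rem;
	return ns;
}
```

Note one difference from the quoted ktimer_convert_timespec(): the patch takes the remainder of tv_nsec only, which gives the same result as long as the resolution divides one second evenly, whereas the sketch rounds the total nanosecond count.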
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-01 1:03 ` Roman Zippel
@ 2005-10-01 11:22 ` Ingo Molnar
2005-10-04 1:59 ` George Anzinger
2005-10-10 12:42 ` Roman Zippel
2005-10-01 12:05 ` Thomas Gleixner
2005-10-04 1:55 ` George Anzinger
2 siblings, 2 replies; 67+ messages in thread
From: Ingo Molnar @ 2005-10-01 11:22 UTC (permalink / raw)
To: Roman Zippel
Cc: tglx, linux-kernel, Andrew Morton, george, johnstul, paulmck,
Christoph Hellwig, oleg, tim.bird

* Roman Zippel <zippel@linux-m68k.org> wrote:

> > +/* The time bases */
> > +#define MAX_KTIMER_BASES 2
> > +static DEFINE_PER_CPU(struct ktimer_base, ktimer_bases[MAX_KTIMER_BASES]) =
>
> Do you have any numbers (besides maybe microbenchmarks) that show a
> real advantage by using per cpu data? What kind of usage do you expect
> here?

it has countless advantages, and these days we basically only design
per-CPU data structures within the kernel, unless some limitation (such
as API or hw property) forces us to do otherwise. So i turn around the
question: what would be your reason for _not_ doing this clean per-CPU
design for SMP systems?

> The other thing is that this assumes, that all time sources are
> programmable per cpu, otherwise it will be more complicated for a time
> source to run the timers for every cpu, I don't know how safe that
> assumption is. Changing the array of structures into an array of
> pointers to the structures would allow to switch between percpu bases
> and a single base.

yeah, and that's an assumption that simplifies things on SMP
significantly. PIT on SMP systems for HRT is so gross that it's not
funny. If anyone wants to revive that notion, please do a separate patch
and make the case convincing enough ...

	Ingo

^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-01 11:22 ` Ingo Molnar
@ 2005-10-04 1:59 ` George Anzinger
2005-10-04 5:51 ` Ingo Molnar
2005-10-10 12:42 ` Roman Zippel
1 sibling, 1 reply; 67+ messages in thread
From: George Anzinger @ 2005-10-04 1:59 UTC (permalink / raw)
To: Ingo Molnar
Cc: Roman Zippel, tglx, linux-kernel, Andrew Morton, johnstul,
paulmck, Christoph Hellwig, oleg, tim.bird

Ingo Molnar wrote:
> * Roman Zippel <zippel@linux-m68k.org> wrote:
>
>
>>The other thing is that this assumes, that all time sources are
>>programmable per cpu, otherwise it will be more complicated for a time
>>source to run the timers for every cpu, I don't know how safe that
>>assumption is. Changing the array of structures into an array of
>>pointers to the structures would allow to switch between percpu bases
>>and a single base.
>
>
> yeah, and that's an assumption that simplifies things on SMP
> significantly. PIT on SMP systems for HRT is so gross that it's not
> funny. If anyone wants to revive that notion, please do a separate patch
> and make the case convincing enough ...
>
Let's not talk about PIT, but a lot of SMP platforms do NOT have per-CPU
timers. For those, having per-CPU lists to handle the timers does not
seem reasonable.
--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/

^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-04 1:59 ` George Anzinger
@ 2005-10-04 5:51 ` Ingo Molnar
0 siblings, 0 replies; 67+ messages in thread
From: Ingo Molnar @ 2005-10-04 5:51 UTC (permalink / raw)
To: George Anzinger
Cc: Roman Zippel, tglx, linux-kernel, Andrew Morton, johnstul,
paulmck, Christoph Hellwig, oleg, tim.bird

* George Anzinger <george@mvista.com> wrote:

> > yeah, and that's an assumption that simplifies things on SMP
> > significantly. PIT on SMP systems for HRT is so gross that it's not
> > funny. If anyone wants to revive that notion, please do a separate
> > patch and make the case convincing enough ...
>
> Lets not talk about PIT, but, a lot of SMP platforms do NOT have per
> cpu timers. For those, it would seem having per cpu lists to handle
> the timer is not really reasonable.

frankly, such systems are rare, and are an afterthought at most. Think
about it: 8 CPUs and only one hres timer source? It can neither work
nor scale well.

i agree that they might eventually be handled (although i think we
shouldn't bother, all sane SMP designs have per-CPU timers), but we
definitely won't design for them. What such an architecture has to do is
to provide the proper do_hr_timer_int() and arch_hrtimer_reprogram()
semantics, via locking around that timer source (naturally), and via
cross-CPU calls - as if they were per-CPU timers.

	Ingo

^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-01 11:22 ` Ingo Molnar
2005-10-04 1:59 ` George Anzinger
@ 2005-10-10 12:42 ` Roman Zippel
2005-10-10 14:04 ` Ingo Molnar
1 sibling, 1 reply; 67+ messages in thread
From: Roman Zippel @ 2005-10-10 12:42 UTC (permalink / raw)
To: Ingo Molnar
Cc: tglx, linux-kernel, Andrew Morton, george, johnstul, paulmck,
Christoph Hellwig, oleg, tim.bird

Hi,

On Sat, 1 Oct 2005, Ingo Molnar wrote:

> > Do you have any numbers (besides maybe microbenchmarks) that show a
> > real advantage by using per cpu data? What kind of usage do you expect
> > here?
>
> it has countless advantages, and these days we basically only design
> per-CPU data structures within the kernel, unless some limitation (such
> as API or hw property) forces us to do otherwise. So i turn around the
> question: what would be your reason for _not_ doing this clean per-CPU
> design for SMP systems?

Did I say I'm against it? No, I was just hoping someone had put some more
thought into it than just "all the other kids are doing it". I was just
curious how well it really scales compared to the simple version, e.g.
what happens if most timers end up on a single cpu or what happens if we
want to start the timer on a different cpu. Is this so wrong that you
have to go into attack mode? :(

> > The other thing is that this assumes, that all time sources are
> > programmable per cpu, otherwise it will be more complicated for a time
> > source to run the timers for every cpu, I don't know how safe that
> > assumption is. Changing the array of structures into an array of
> > pointers to the structures would allow to switch between percpu bases
> > and a single base.
>
> yeah, and that's an assumption that simplifies things on SMP
> significantly. PIT on SMP systems for HRT is so gross that it's not
> funny. If anyone wants to revive that notion, please do a separate patch
> and make the case convincing enough ...

Why do you use "PIT on SMP" as an extreme example to reject the general
concept completely? This explains neither why such a (simple) SMP design
shouldn't exist, nor why my suggestion is such a big problem.

bye, Roman

^ permalink raw reply [flat|nested] 67+ messages in thread
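The suggestion discussed above, changing the array of structures into an
array of pointers so that the code can switch between per-CPU bases and a
single base, could look roughly like the following user-space sketch. All
names here (struct timer_base, NR_CPUS_SKETCH, init_bases, get_base) are
hypothetical and chosen only for illustration; the real patch uses
DEFINE_PER_CPU and none of the locking questions raised in the thread are
addressed.

```c
#include <stddef.h>

#define NR_CPUS_SKETCH   4	/* assumption: fixed CPU count for the sketch */
#define MAX_KTIMER_BASES 2

/* Hypothetical, heavily reduced stand-in for struct ktimer_base. */
struct timer_base {
	int index;		/* which clock this base serves */
};

/* Backing storage: one shared set and one set per CPU. */
static struct timer_base shared_bases[MAX_KTIMER_BASES];
static struct timer_base percpu_bases[NR_CPUS_SKETCH][MAX_KTIMER_BASES];

/* The indirection: every CPU slot points at the base it should use. */
static struct timer_base *cpu_base[NR_CPUS_SKETCH][MAX_KTIMER_BASES];

/* Point all CPUs either at their own bases or at the shared ones. */
static void init_bases(int per_cpu_capable)
{
	int cpu, i;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		for (i = 0; i < MAX_KTIMER_BASES; i++) {
			struct timer_base *b = per_cpu_capable ?
				&percpu_bases[cpu][i] : &shared_bases[i];

			b->index = i;
			cpu_base[cpu][i] = b;
		}
	}
}

/* Callers only go through the pointer and never see which layout is used. */
static struct timer_base *get_base(int cpu, int index)
{
	return cpu_base[cpu][index];
}
```

With init_bases(0) every CPU resolves to the same base for a given clock;
with init_bases(1) each CPU gets its own. The extra pointer load on every
lookup is the price of that flexibility, and the sketch says nothing about
whether it is worth paying.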
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-10 12:42 ` Roman Zippel
@ 2005-10-10 14:04 ` Ingo Molnar
0 siblings, 0 replies; 67+ messages in thread
From: Ingo Molnar @ 2005-10-10 14:04 UTC (permalink / raw)
To: Roman Zippel
Cc: tglx, linux-kernel, Andrew Morton, george, johnstul, paulmck,
Christoph Hellwig, oleg, tim.bird

* Roman Zippel <zippel@linux-m68k.org> wrote:

> > > Do you have any numbers (besides maybe microbenchmarks) that show a
> > > real advantage by using per cpu data? What kind of usage do you expect
> > > here?
> >
> > it has countless advantages, and these days we basically only design
> > per-CPU data structures within the kernel, unless some limitation (such
> > as API or hw property) forces us to do otherwise. So i turn around the
> > question: what would be your reason for _not_ doing this clean per-CPU
> > design for SMP systems?
>
> Did I say I'm against it? No, I was just hoping someone put some more
> thought into it than just "all the other kids are doing it". I was
> just curious how well it really scales compared to the simple version,
> e.g. what happens if most timer end up on a single cpu or what happens
> if we want to start the timer on a different cpu. Is this so wrong
> that you have to go into attack mode? :(

[ sorry, and i didn't go into 'attack mode'. I believe you'll distinctly
  notice when i do that :-) ]

just think NUMA, and the generic advantages of PER_CPU become obvious.
(via PER_CPU the different data structures indexed can properly end up
on another domain's RAM, and can thus improve caching characteristics.)

	Ingo

^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-01 1:03 ` Roman Zippel
2005-10-01 11:22 ` Ingo Molnar
@ 2005-10-01 12:05 ` Thomas Gleixner
2005-10-10 17:22 ` Roman Zippel
2005-10-04 1:55 ` George Anzinger
2 siblings, 1 reply; 67+ messages in thread
From: Thomas Gleixner @ 2005-10-01 12:05 UTC (permalink / raw)
To: Roman Zippel
Cc: linux-kernel, mingo, Andrew Morton, george, johnstul, paulmck,
Christoph Hellwig, oleg, tim.bird
[-- Attachment #1: Type: text/plain, Size: 8003 bytes --]

Roman,

On Sat, 2005-10-01 at 03:03 +0200, Roman Zippel wrote:
> Your patch introduces some whitespace damage, search for "^\+ " in your
> patch.

Ok.

> > ktimers seperate the "timer API" from the "timeout API".
> I'm not really happy with these names, timeouts are what timers do, so
> these names don't tell at all, what the difference is.

There is a clear distinction between timers and timeouts.

From IT-dictionary:

"Timeout is a specified period of time that will be allowed to elapse in
a system before a specified event is to take place, unless another
specified event occurs first; in either case, the period is terminated
when either event takes place."

"A timer is a specialized type of clock. A timer can be used to control
the sequence of an event or process."

> Calling them "process timer" and "kernel timer" would include their main
> usage, although that also means ptimer were the more correct abbreviation.

As said before, I think the distinction between timers and timeouts
makes perfect sense, and ktimers are _not_ restricted to process
timers.

> > +#ifndef KTIME_IS_SCALAR
> > +typedef union {
> > +	s64 tv64;
> > +	struct {
> > +#ifdef __BIG_ENDIAN
> > +		s32 sec, nsec;
> > +#else
> > +		s32 nsec, sec;
> > +#endif
> > +	} tv;
> > +} ktime_t;
> > +
> > +#else
> > +
> > +typedef s64 ktime_t;
> > +
> > +#endif
>
> Making the union unconditional, would make tv64 always available and a lot
> of macros unnessary.

The nsec,sec storage format is essentially different from the scalar
storage format and has to be handled differently.

The above gives a clear distinction between the scalar and sec/nsec based
cases, so you cannot mess up without noticing.

I prefer having a lot more macros / inlines around rather than tracking
down _one_ single bug caused by an implementation that is not clearly
separated.

> > +struct ktimer {
> > +	struct rb_node node;
> > +	struct list_head list;
> > +	ktime_t expires;
> > +	ktime_t expired;
> > +	ktime_t interval;
> > +	int overrun;
> > +	unsigned long status;
> > +	void (*function)(void *);
> > +	void *data;
> > +	struct ktimer_base *base;
> > +};
>
> This structure is rather large and I think a lot can be avoided.
> - list: AFAICT it's only used by run_ktimer_queue() to get the first
> pending entry. This can also be done by keeping track of the first entry
> in the base structure (useful in other places as well).

You are right that the list is not necessary for the plain integration
into the current system, but it is necessary once you start to upgrade
to high resolution timers.

> - expired: can be replaced by base->last_expired (may also be useful in
> other places)

How does base->last_expired give per-timer expired information? And where
would it be useful?

> - status: only user is ktimer_active(), the same test can be done by
> testing node.rb_parent.

Uurg. Been there and discarded the idea, because it's ugly and clashes
with further extensibility requirements, e.g. high resolution timers,
where we have more than two states.

Having status information bound to arbitrary pointers is trading a
variable against flexibility, cleanliness and maintainability.

> - interval/overrun: this is only needed by itimers and I think it's
> possible to leave it there. Main change would be to let 'function' return
> a value indicating whether to rearm the timer or not (this includes
> expires is updated).

It is also used by the posix timer code, and I plan to do another round
of simplification there as well.

This implementation was chosen to be flexible and easily extensible for
use cases like high resolution timers.

I do not want to end up with a next round of discussion there about
either introducing tons of new ifdefs and macros or redesigning the code
base another time.

As others have stated too, we have to weigh the tradeoff between
simplicity, flexibility and maintainability vs. size and performance
impacts.

Performance is definitely an important issue and was accepted and
addressed. The size tradeoff in question is not a valid argument to give
up a clear, flexible and maintainable design.

> > +#define DEFINE_KTIME(k) ktime_t k = {.tv64 = 0LL }
> > +
> > +#define ktime_cmp(a,op,b) ((a).tv64 op (b).tv64)
> > +#define ktime_cmp_val(a, op, b) ((a).tv64 op b)
>
> A union ktime would especially avoid this.

See above

> > +static inline ktime_t ktime_sub(ktime_t a, ktime_t b)
> > +{
> > +	ktime_t res;
> > +
> > +	res.tv64 = a.tv64 - b.tv64;
> > +	if (res.tv.nsec < 0)
> > +		res.tv.nsec += NSEC_PER_SEC;
> > +
> > +	return res;
> > +}
> > +
> > +static inline ktime_t ktime_add(ktime_t a, ktime_t b)
> > +{
> > +	ktime_t res;
> > +
> > +	res.tv64 = a.tv64 + b.tv64;
> > +	if (res.tv.nsec >= NSEC_PER_SEC) {
> > +		res.tv.nsec -= NSEC_PER_SEC;
> > +		res.tv.sec++;
> > +	}
> > +	return res;
> > +}
>
> Not using 64bit math here allows gcc to generate better code, e.g. gcc
> has to add another test for "nsec < 0" because the condition code is
> already used for the overflow, adding the "sec--" instead is IMO faster
> (i.e. less likely).

i686
DOADD32 00000048
DOADD64 0000002a
DOSUB32 00000060
DOSUB64 0000002f
arm
DOADD32 0000004c
DOADD64 0000004c
DOSUB32 00000040
DOSUB64 00000038
m68k
DOADD32 0000003c
DOADD64 0000002e
DOSUB32 00000036
DOSUB64 00000028
powerpc
DOADD32 00000040
DOADD64 00000044
DOSUB32 00000044
DOSUB64 00000044

Please do not tell me that size does not matter. :)

I attached the assembler dumps, so you can have a look yourself. I did
these tests during the implementation and decided on the results rather
than on assumptions about gcc.

> Could you explain a little the resolution handling behind in your patch?
> If I read SUS correctly clock resolution and timer resolution don't have
> to be the same, the first is returned by clock_getres() and the latter
> only documented somewhere (and AFAICT our implementation always returned
> the wrong value).

As far as I understand SUS, timer resolution is equal to clock resolution
and the timer value/interval is rounded up to the resolution.

> IMO this also means we can don't have to make the rounding that
> complicated. Actually it could be done automatically by the timer, e.g.
> interval timer are reprogrammed at (now + interval) and the timer
> resolution will automatically round it up.

Reprogramming interval timers by now + interval is completely wrong.
Reprogramming has to be timer->expires + interval and nothing else.

Doing real rounding in the reprogramming code would have a performance
impact.

> > +	/* Get current time */
> > +	now = base->get_time();
>
> As get_time() is not necessarily cheap, it can be avoided for nonrelative
> timers by comparing it with the first pending timer. Maintaining a pointer
> to the first timer here, avoids the timer list and is a simple check
> whether the time source needs any reprogramming later.

Would you please care to read the complete related code to find out why
this does not work.
This is totally unrelated to reprogramming of the time event source in
the HRT case.

...
	case KTIMER_FORWARD:
		while ktime_cmp(timer->expires, <= , now) {
...
	case KTIMER_REARM:
		while ktime_cmp(timer->expires, <= , now) {
			timer->expires = ktime_add(timer->expires,

and of course the expiry check below.

> > +	if ktime_cmp(timer->expires, <=, now) {
> > +		timer->expired = now;
> > +		/* The caller takes care of expiry */
> > +		if (!(mode & KTIMER_NOCHECK))
> > +			return -1;
>
> I think KTIMER_NOFAIL would be better name, for a while that had me
> confused, as you actually do check the value, but you don't fail it and
> enqueue it anyway.

It does not fail. It returns in the case that the timer is already
expired. The NOCHECK flag is used to skip the check.

tglx

[-- Attachment #2: testarchs.dmp --]
[-- Type: text/plain, Size: 18161 bytes --]

DOADD32 ktime.o: file format elf32-i386 Disassembly of section .text: 00000000 <ktime_ops>: 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 83 ec 0c sub $0xc,%esp 6: 89 1c 24 mov %ebx,(%esp) 9: 8b 45 14 mov 0x14(%ebp),%eax c: 8b 5d 10 mov 0x10(%ebp),%ebx f: 89 74 24 04 mov %esi,0x4(%esp) 13: 8b 55 18 mov 0x18(%ebp),%edx 16: 89 7c 24 08 mov %edi,0x8(%esp) 1a: 8d 34 03 lea (%ebx,%eax,1),%esi 1d: 81 fe ff c9 9a 3b cmp $0x3b9ac9ff,%esi 23: 8d 3c 13 lea (%ebx,%edx,1),%edi 26: 7e 07 jle 2f <ktime_ops+0x2f> 28: 81 ee 00 ca 9a 3b sub $0x3b9aca00,%esi 2e: 47 inc %edi 2f: 8b 45 08 mov 0x8(%ebp),%eax 32: 89 30 mov %esi,(%eax) 34: 89 78 04 mov %edi,0x4(%eax) 37: 8b 1c 24 mov (%esp),%ebx 3a: 8b 74 24 04 mov 0x4(%esp),%esi 3e: 8b 7c 24 08 mov 0x8(%esp),%edi 42: 89 ec mov %ebp,%esp 44: 5d pop %ebp 45: c2 04 00 ret $0x4 ------------------------------------------------------------------------- DOADD64 ktime.o: file format elf32-i386 Disassembly of section .text: 00000000 <ktime_ops>: 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 8b 55 14 mov 0x14(%ebp),%edx 6: 03 55 0c add 0xc(%ebp),%edx 9: 8b 4d 18 mov 0x18(%ebp),%ecx c: 8b 45 08 mov 0x8(%ebp),%eax f: 13 4d 10
adc 0x10(%ebp),%ecx 12: 81 fa ff c9 9a 3b cmp $0x3b9ac9ff,%edx 18: 7e 07 jle 21 <ktime_ops+0x21> 1a: 81 ea 00 ca 9a 3b sub $0x3b9aca00,%edx 20: 41 inc %ecx 21: 89 10 mov %edx,(%eax) 23: 89 48 04 mov %ecx,0x4(%eax) 26: 5d pop %ebp 27: c2 04 00 ret $0x4 ------------------------------------------------------------------------- DOSUB32 ktime.o: file format elf32-i386 Disassembly of section .text: 00000000 <ktime_ops>: 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 83 ec 0c sub $0xc,%esp 6: 89 74 24 04 mov %esi,0x4(%esp) a: 89 7c 24 08 mov %edi,0x8(%esp) e: 89 1c 24 mov %ebx,(%esp) 11: 8b 5d 10 mov 0x10(%ebp),%ebx 14: 8b 45 14 mov 0x14(%ebp),%eax 17: 8b 55 18 mov 0x18(%ebp),%edx 1a: 89 de mov %ebx,%esi 1c: 89 df mov %ebx,%edi 1e: 29 c6 sub %eax,%esi 20: 29 d7 sub %edx,%edi 22: 85 f6 test %esi,%esi 24: 78 1a js 40 <ktime_ops+0x40> 26: 8b 45 08 mov 0x8(%ebp),%eax 29: 89 30 mov %esi,(%eax) 2b: 89 78 04 mov %edi,0x4(%eax) 2e: 8b 1c 24 mov (%esp),%ebx 31: 8b 74 24 04 mov 0x4(%esp),%esi 35: 8b 7c 24 08 mov 0x8(%esp),%edi 39: 89 ec mov %ebp,%esp 3b: 5d pop %ebp 3c: c2 04 00 ret $0x4 3f: 90 nop 40: 8b 45 08 mov 0x8(%ebp),%eax 43: 81 c6 00 ca 9a 3b add $0x3b9aca00,%esi 49: 4f dec %edi 4a: 89 30 mov %esi,(%eax) 4c: 89 78 04 mov %edi,0x4(%eax) 4f: 8b 1c 24 mov (%esp),%ebx 52: 8b 74 24 04 mov 0x4(%esp),%esi 56: 8b 7c 24 08 mov 0x8(%esp),%edi 5a: 89 ec mov %ebp,%esp 5c: 5d pop %ebp 5d: c2 04 00 ret $0x4 ------------------------------------------------------------------------- DOSUB64 ktime.o: file format elf32-i386 Disassembly of section .text: 00000000 <ktime_ops>: 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 8b 55 0c mov 0xc(%ebp),%edx 6: 2b 55 14 sub 0x14(%ebp),%edx 9: 8b 4d 10 mov 0x10(%ebp),%ecx c: 8b 45 08 mov 0x8(%ebp),%eax f: 1b 4d 18 sbb 0x18(%ebp),%ecx 12: 85 d2 test %edx,%edx 14: 78 0a js 20 <ktime_ops+0x20> 16: 89 10 mov %edx,(%eax) 18: 89 48 04 mov %ecx,0x4(%eax) 1b: 5d pop %ebp 1c: c2 04 00 ret $0x4 1f: 90 nop 20: 89 48 04 mov %ecx,0x4(%eax) 23: 81 c2 00 ca 9a 3b add 
$0x3b9aca00,%edx 29: 89 10 mov %edx,(%eax) 2b: 5d pop %ebp 2c: c2 04 00 ret $0x4 ------------------------------------------------------------------------- DOADD32 ktime.o: file format elf32-littlearm Disassembly of section .text: 00000000 <ktime_ops>: 0: e24dd004 sub sp, sp, #4 ; 0x4 4: e92d4070 stmdb sp!, {r4, r5, r6, lr} 8: e3e0c331 mvn ip, #-1006632960 ; 0xc4000000 c: e24cc865 sub ip, ip, #6619136 ; 0x650000 10: e24ccc36 sub ip, ip, #13824 ; 0x3600 14: e58d3010 str r3, [sp, #16] 18: e28de010 add lr, sp, #16 ; 0x10 1c: e89e0018 ldmia lr, {r3, r4} 20: e0825003 add r5, r2, r3 24: e155000c cmp r5, ip 28: c2855331 addgt r5, r5, #-1006632960 ; 0xc4000000 2c: e0826004 add r6, r2, r4 30: c2855865 addgt r5, r5, #6619136 ; 0x650000 34: c2855c36 addgt r5, r5, #13824 ; 0x3600 38: c2866001 addgt r6, r6, #1 ; 0x1 3c: e8800060 stmia r0, {r5, r6} 40: e8bd4070 ldmia sp!, {r4, r5, r6, lr} 44: e28dd004 add sp, sp, #4 ; 0x4 48: e1a0f00e mov pc, lr ------------------------------------------------------------------------- DOADD64 ktime.o: file format elf32-littlearm Disassembly of section .text: 00000000 <ktime_ops>: 0: e24dd004 sub sp, sp, #4 ; 0x4 4: e92d4030 stmdb sp!, {r4, r5, lr} 8: e58d300c str r3, [sp, #12] c: e28de010 add lr, sp, #16 ; 0x10 10: e3e03331 mvn r3, #-1006632960 ; 0xc4000000 14: e2433865 sub r3, r3, #6619136 ; 0x650000 18: e2433c36 sub r3, r3, #13824 ; 0x3600 1c: e81e0030 ldmda lr, {r4, r5} 20: e0944001 adds r4, r4, r1 24: e0a55002 adc r5, r5, r2 28: e1540003 cmp r4, r3 2c: c2844331 addgt r4, r4, #-1006632960 ; 0xc4000000 30: c2844865 addgt r4, r4, #6619136 ; 0x650000 34: c2844c36 addgt r4, r4, #13824 ; 0x3600 38: c2855001 addgt r5, r5, #1 ; 0x1 3c: e8800030 stmia r0, {r4, r5} 40: e8bd4030 ldmia sp!, {r4, r5, lr} 44: e28dd004 add sp, sp, #4 ; 0x4 48: e1a0f00e mov pc, lr ------------------------------------------------------------------------- DOSUB32 ktime.o: file format elf32-littlearm Disassembly of section .text: 00000000 <ktime_ops>: 0: e24dd004 sub sp, sp, #4 
; 0x4 4: e92d4070 stmdb sp!, {r4, r5, r6, lr} 8: e58d3010 str r3, [sp, #16] c: e28de010 add lr, sp, #16 ; 0x10 10: e89e0018 ldmia lr, {r3, r4} 14: e0635002 rsb r5, r3, r2 18: e3550000 cmp r5, #0 ; 0x0 1c: b28555ee addlt r5, r5, #998244352 ; 0x3b800000 20: e0646002 rsb r6, r4, r2 24: b285596b addlt r5, r5, #1753088 ; 0x1ac000 28: b2855c0a addlt r5, r5, #2560 ; 0xa00 2c: b2466001 sublt r6, r6, #1 ; 0x1 30: e8800060 stmia r0, {r5, r6} 34: e8bd4070 ldmia sp!, {r4, r5, r6, lr} 38: e28dd004 add sp, sp, #4 ; 0x4 3c: e1a0f00e mov pc, lr ------------------------------------------------------------------------- DOSUB64 ktime.o: file format elf32-littlearm Disassembly of section .text: 00000000 <ktime_ops>: 0: e24dd004 sub sp, sp, #4 ; 0x4 4: e52d4004 str r4, [sp, #-4]! 8: e58d3004 str r3, [sp, #4] c: e99d0018 ldmib sp, {r3, r4} 10: e0511003 subs r1, r1, r3 14: e0c22004 sbc r2, r2, r4 18: e3510000 cmp r1, #0 ; 0x0 1c: b28115ee addlt r1, r1, #998244352 ; 0x3b800000 20: b281196b addlt r1, r1, #1753088 ; 0x1ac000 24: b2811c0a addlt r1, r1, #2560 ; 0xa00 28: e8800006 stmia r0, {r1, r2} 2c: e8bd0010 ldmia sp!, {r4} 30: e28dd004 add sp, sp, #4 ; 0x4 34: e1a0f00e mov pc, lr ------------------------------------------------------------------------- DOADD32 ktime.o: file format elf32-m68k Disassembly of section .text: 00000000 <ktime_ops>: 0: 4e56 0000 linkw %fp,#0 4: 2f03 movel %d3,%sp@- 6: 2f02 movel %d2,%sp@- 8: 206e 0008 moveal %fp@(8),%a0 c: 226e 000c moveal %fp@(12),%a1 10: 202e 0010 movel %fp@(16),%d0 14: 222e 0014 movel %fp@(20),%d1 18: 2408 movel %a0,%d2 1a: d480 addl %d0,%d2 1c: 2608 movel %a0,%d3 1e: d681 addl %d1,%d3 20: 0c83 3b9a c9ff cmpil #999999999,%d3 26: 6f08 bles 30 <ktime_ops+0x30> 28: 0683 c465 3600 addil #-1000000000,%d3 2e: 5282 addql #1,%d2 30: 2002 movel %d2,%d0 32: 2203 movel %d3,%d1 34: 241f movel %sp@+,%d2 36: 261f movel %sp@+,%d3 38: 4e5e unlk %fp 3a: 4e75 rts ------------------------------------------------------------------------- DOADD64 ktime.o: file 
format elf32-m68k Disassembly of section .text: 00000000 <ktime_ops>: 0: 4e56 0000 linkw %fp,#0 4: 2f02 movel %d2,%sp@- 6: 202e 0008 movel %fp@(8),%d0 a: 222e 000c movel %fp@(12),%d1 e: 242e 0010 movel %fp@(16),%d2 12: d2ae 0014 addl %fp@(20),%d1 16: d182 addxl %d2,%d0 18: 0c81 3b9a c9ff cmpil #999999999,%d1 1e: 6f08 bles 28 <ktime_ops+0x28> 20: 0681 c465 3600 addil #-1000000000,%d1 26: 5280 addql #1,%d0 28: 241f movel %sp@+,%d2 2a: 4e5e unlk %fp 2c: 4e75 rts ------------------------------------------------------------------------- DOSUB32 ktime.o: file format elf32-m68k Disassembly of section .text: 00000000 <ktime_ops>: 0: 4e56 0000 linkw %fp,#0 4: 2f03 movel %d3,%sp@- 6: 2f02 movel %d2,%sp@- 8: 202e 0008 movel %fp@(8),%d0 c: 222e 000c movel %fp@(12),%d1 10: 206e 0010 moveal %fp@(16),%a0 14: 226e 0014 moveal %fp@(20),%a1 18: 2400 movel %d0,%d2 1a: 9488 subl %a0,%d2 1c: 9089 subl %a1,%d0 1e: 2600 movel %d0,%d3 20: 6c08 bges 2a <ktime_ops+0x2a> 22: 0683 3b9a ca00 addil #1000000000,%d3 28: 5382 subql #1,%d2 2a: 2002 movel %d2,%d0 2c: 2203 movel %d3,%d1 2e: 241f movel %sp@+,%d2 30: 261f movel %sp@+,%d3 32: 4e5e unlk %fp 34: 4e75 rts ------------------------------------------------------------------------- DOSUB64 ktime.o: file format elf32-m68k Disassembly of section .text: 00000000 <ktime_ops>: 0: 4e56 0000 linkw %fp,#0 4: 2f02 movel %d2,%sp@- 6: 202e 0008 movel %fp@(8),%d0 a: 222e 000c movel %fp@(12),%d1 e: 242e 0010 movel %fp@(16),%d2 12: 92ae 0014 subl %fp@(20),%d1 16: 9182 subxl %d2,%d0 18: 4a81 tstl %d1 1a: 6c06 bges 22 <ktime_ops+0x22> 1c: 0681 3b9a ca00 addil #1000000000,%d1 22: 241f movel %sp@+,%d2 24: 4e5e unlk %fp 26: 4e75 rts ------------------------------------------------------------------------- DOADD32 ktime.o: file format elf32-powerpc Disassembly of section .text: 00000000 <ktime_ops>: 0: 81 64 00 00 lwz r11,0(r4) 4: 3c 00 3b 9a lis r0,15258 8: 81 45 00 04 lwz r10,4(r5) c: 60 00 c9 ff ori r0,r0,51711 10: 81 25 00 00 lwz r9,0(r5) 14: 7d 0b 52 14 add 
r8,r11,r10 18: 7f 88 00 00 cmpw cr7,r8,r0 1c: 7c eb 4a 14 add r7,r11,r9 20: 3d 68 c4 65 addis r11,r8,-15259 24: 7c 69 1b 78 mr r9,r3 28: 40 9d 00 0c ble- cr7,34 <ktime_ops+0x34> 2c: 39 0b 36 00 addi r8,r11,13824 30: 38 e7 00 01 addi r7,r7,1 34: 90 e9 00 00 stw r7,0(r9) 38: 91 09 00 04 stw r8,4(r9) 3c: 4e 80 00 20 blr ------------------------------------------------------------------------- DOADD64 ktime.o: file format elf32-powerpc Disassembly of section .text: 00000000 <ktime_ops>: 0: 81 25 00 00 lwz r9,0(r5) 4: 3c 00 3b 9a lis r0,15258 8: 81 45 00 04 lwz r10,4(r5) c: 60 00 c9 ff ori r0,r0,51711 10: 81 64 00 00 lwz r11,0(r4) 14: 81 84 00 04 lwz r12,4(r4) 18: 7d 8c 50 14 addc r12,r12,r10 1c: 7d 6b 49 14 adde r11,r11,r9 20: 7c 69 1b 78 mr r9,r3 24: 7f 8c 00 00 cmpw cr7,r12,r0 28: 3d 4c c4 65 addis r10,r12,-15259 2c: 40 9d 00 0c ble- cr7,38 <ktime_ops+0x38> 30: 39 8a 36 00 addi r12,r10,13824 34: 39 6b 00 01 addi r11,r11,1 38: 91 69 00 00 stw r11,0(r9) 3c: 91 89 00 04 stw r12,4(r9) 40: 4e 80 00 20 blr ------------------------------------------------------------------------- DOSUB32 ktime.o: file format elf32-powerpc Disassembly of section .text: 00000000 <ktime_ops>: 0: 81 24 00 00 lwz r9,0(r4) 4: 81 85 00 04 lwz r12,4(r5) 8: 81 65 00 00 lwz r11,0(r5) c: 7d 0c 48 50 subf r8,r12,r9 10: 2f 88 00 00 cmpwi cr7,r8,0 14: 7c eb 48 50 subf r7,r11,r9 18: 3d 68 3b 9b addis r11,r8,15259 1c: 7c 69 1b 78 mr r9,r3 20: 41 9c 00 10 blt- cr7,30 <ktime_ops+0x30> 24: 90 e9 00 00 stw r7,0(r9) 28: 91 09 00 04 stw r8,4(r9) 2c: 4e 80 00 20 blr 30: 39 0b ca 00 addi r8,r11,-13824 34: 38 e7 ff ff addi r7,r7,-1 38: 90 e9 00 00 stw r7,0(r9) 3c: 91 09 00 04 stw r8,4(r9) 40: 4e 80 00 20 blr ------------------------------------------------------------------------- DOSUB64 ktime.o: file format elf32-powerpc Disassembly of section .text: 00000000 <ktime_ops>: 0: 81 65 00 00 lwz r11,0(r5) 4: 7c 68 1b 78 mr r8,r3 8: 81 24 00 00 lwz r9,0(r4) c: 81 44 00 04 lwz r10,4(r4) 10: 81 85 00 04 lwz r12,4(r5) 14: 
7d 4c 50 10 subfc r10,r12,r10 18: 7d 2b 49 10 subfe r9,r11,r9 1c: 2f 8a 00 00 cmpwi cr7,r10,0 20: 3d 6a 3b 9b addis r11,r10,15259 24: 41 9c 00 10 blt- cr7,34 <ktime_ops+0x34> 28: 91 28 00 00 stw r9,0(r8) 2c: 91 48 00 04 stw r10,4(r8) 30: 4e 80 00 20 blr 34: 39 4b ca 00 addi r10,r11,-13824 38: 91 28 00 00 stw r9,0(r8) 3c: 91 48 00 04 stw r10,4(r8) 40: 4e 80 00 20 blr -------------------------------------------------------------------------

^ permalink raw reply [flat|nested] 67+ messages in thread
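The point made in the message above, that an interval timer has to be
rearmed at timer->expires + interval rather than at now + interval, can be
illustrated with a small user-space sketch. The struct and function names
below are hypothetical and use a plain 64bit nanosecond value instead of
ktime_t; periods_added only counts how many intervals this sketch stepped
over and is not meant to mirror the patch's overrun accounting.

```c
#include <stdint.h>

typedef int64_t ns_t;	/* hypothetical scalar time in nanoseconds */

struct sketch_timer {
	ns_t expires;		/* absolute expiry time */
	ns_t interval;		/* period, must be > 0 to forward */
	int periods_added;	/* intervals stepped over by forward_timer() */
};

/*
 * Move an expired periodic timer past 'now' in whole intervals.
 * Every new expiry stays on the grid defined by the first expiry.
 */
static void forward_timer(struct sketch_timer *t, ns_t now)
{
	if (t->interval <= 0)	/* guard against an endless loop */
		return;

	while (t->expires <= now) {
		t->expires += t->interval;
		t->periods_added++;
	}
}

/*
 * The rejected alternative: restart relative to the current time.
 * Any latency in noticing the expiry is added to every later period.
 */
static void rearm_from_now(struct sketch_timer *t, ns_t now)
{
	t->expires = now + t->interval;
}
```

With a first expiry at 1000, an interval of 1000 and the expiry noticed at
3500, forward_timer() lands on 4000 and stays on the original grid, while
rearm_from_now() lands on 4500 and the timer has drifted by 500.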
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-01 12:05 ` Thomas Gleixner
@ 2005-10-10 17:22 ` Roman Zippel
2005-10-11 7:42 ` Thomas Gleixner
0 siblings, 1 reply; 67+ messages in thread
From: Roman Zippel @ 2005-10-10 17:22 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-kernel, mingo, Andrew Morton, george, johnstul, paulmck,
Christoph Hellwig, oleg, tim.bird

Hi,

On Sat, 1 Oct 2005, Thomas Gleixner wrote:

> > > ktimers seperate the "timer API" from the "timeout API".
> > I'm not really happy with these names, timeouts are what timers do, so
> > these names don't tell at all, what the difference is.
>
> There is a clear distinction between timers and timeouts.
>
> >From IT-dictonary:
>
> "Timeout is a specified period of time that will be allowed to elapse in
> a system before a specified event is to take place, unless another
> specified event occurs first; in either case, the period is terminated
> when either event takes place."
>
> "A timer is a specialized type of clock. A timer can be used to control
> the sequence of an event or process."

IOW a timer uses timeouts to control a sequence of events, it's still
part of the same thing, which makes "timer API" and "timeout API" very
confusing.

> > Calling them "process timer" and "kernel timer" would include their main
> > usage, although that also means ptimer were the more correct abbreviation.
>
> As said before I think the disctinction between timers and timeouts
> makes perfectly sense and ktimers are _not_ restricted to process
> timers.

"main usage" != "restricted to"

> > > +#ifndef KTIME_IS_SCALAR
> > > +typedef union {
> > > +	s64 tv64;
> > > +	struct {
> > > +#ifdef __BIG_ENDIAN
> > > +		s32 sec, nsec;
> > > +#else
> > > +		s32 nsec, sec;
> > > +#endif
> > > +	} tv;
> > > +} ktime_t;
> > > +
> > > +#else
> > > +
> > > +typedef s64 ktime_t;
> > > +
> > > +#endif
> >
> > Making the union unconditional, would make tv64 always available and a lot
> > of macros unnessary.
> > nsec,sec storage format is essentially different to the scalar storage > format and has to be handled different. > > The above gives a clear distinction between scalar and sec/nsec based > cases. So you cannot mess up without notice. There are enough macros to do this anyway. There are a number of operations which are identical. Separating them artifically makes everything only more complicated. > > > +struct ktimer { > > > + struct rb_node node; > > > + struct list_head list; > > > + ktime_t expires; > > > + ktime_t expired; > > > + ktime_t interval; > > > + int overrun; > > > + unsigned long status; > > > + void (*function)(void *); > > > + void *data; > > > + struct ktimer_base *base; > > > +}; > > > > This structure is rather large and I think a lot can be avoided. > > - list: AFAICT it's only used by run_ktimer_queue() to get the first > > pending entry. This can also be done by keeping track of the first entry > > in the base structure (useful in other places as well). > > You are right that the list is not necessary for the plain integration > into the current system, but it is necessary once you start to upgrade > to high resolution timers. Could you please specifiy these requirements? > > - expired: can be replaced by base->last_expired (may also be useful in > > other places) > > How gives base->last_expired a per timer expired information? And where > would it be useful ? If a callback needs that information, it can it get from there. > > - status: only user is ktimer_active(), the same test can be done by > > testing node.rb_parent. > > Uurg. Been there and discarded the idea, because its ugly and clashes > with further extensibilty requirements e.g. high resolution timers, > where we have more than two states. > > Having status information bound to arbitrary pointers is trading a > variable against flexibility, cleanliness and maintainability. 
If you want to introduce more states later, it requires changing _one_
macro, so I don't really see the problem.

> > - interval/overrun: this is only needed by itimers and I think it's
> >   possible to leave it there. Main change would be to let 'function' return
> >   a value indicating whether to rearm the timer or not (this includes
> >   expires is updated).
>
> It is also used by the posix timer code and I plan to do another round
> of simplification also there.

Please explain.

> I do not want to end up with a next round of discussion there about
> either introducing tons of new ifdefs, macros or redesigning the code
> base another time.

I don't really see why this should be an excuse to introduce now more
complex code than really necessary. If that extra complexity can't stand
on its own, please introduce it as soon as it becomes necessary.
I like most of the patch, but I would prefer to do a simple implementation/
cleanup first and then build anything more complex on top of it. If you
need another complete redesign for this, then you likely do something
wrong already now.

> > Not using 64bit math here allows gcc to generate better code, e.g. gcc
> > has to add another test for "nsec < 0" because the condition code is
> > already used for the overflow, adding the "sec--" instead is IMO faster
> > (i.e. less likely).
>
> i686
> DOADD32 00000048
> DOADD64 0000002a
> DOSUB32 00000060
> DOSUB64 0000002f
> arm
> DOADD32 0000004c
> DOADD64 0000004c
> DOSUB32 00000040
> DOSUB64 00000038
> m68k
> DOADD32 0000003c
> DOADD64 0000002e
> DOSUB32 00000036
> DOSUB64 00000028
> powerpc
> DOADD32 00000040
> DOADD64 00000044
> DOSUB32 00000044
> DOSUB64 00000044
>
> Please do not tell me that size does not matter. :)
>
> I attached the assembler dumps, so you can have a look yourself. I did
> these tests during the implementation and decided on the results rather
> than on assumptions about gcc.

Did you look at the generated code?
Most of it is function prologue/epilogue, which is quite unimportant for
inline functions. The other thing I forgot to mention last time is that
passing values by reference instead of value also makes a difference. For
m68k I actually got smaller code this way (mostly because addx/subx are
limited in their addressing modes). In the other cases I'm actually
surprised gcc doesn't use the previous result from the sub and adds
another test. The remaining difference comes from how gcc deals with
structure vs. integral values, which could use some improvement,
especially the add case should have produced nearly identical results.

Anyway, this point wasn't that important, it's only microoptimizations and
at least having the option to change it later (after more tests) is fine
with me.

> > Could you explain a little the resolution handling behind in your patch?
> > If I read SUS correctly clock resolution and timer resolution don't have
> > to be the same, the first is returned by clock_getres() and the latter
> > only documented somewhere (and AFAICT our implementation always returned
> > the wrong value).
>
> As far as I understand SUS timer resolution is equal to clock resolution
> and the timer value/interval is rounded up to the resolution.

Please check the rationale about clocks and timers. It talks about clocks
and timer services based on them and their resolutions can be different.

> > IMO this also means we don't have to make the rounding that
> > complicated. Actually it could be done automatically by the timer, e.g.
> > interval timer are reprogrammed at (now + interval) and the timer
> > resolution will automatically round it up.
>
> Reprogramming interval timers by now + interval is completely wrong.
> Reprogramming has to be
> timer->expires + interval and nothing else.

Where do you get the requirement for an explicit rounding from?
The point is that the timer should not expire early, but there is more
than one way to do this and it can be done implicitly using the timer
resolution.

> > > +	/* Get current time */
> > > +	now = base->get_time();
> >
> > As get_time() is not necessarily cheap, it can be avoided for nonrelative
> > timers by comparing it with the first pending timer. Maintaining a pointer
> > to the first timer here, avoids the timer list and is a simple check
> > whether the time source needs any reprogramming later.
>
> Would you please care to read the complete related code to find out why
> this does not work. This is totally unrelated to reprogramming of the
> time event source in the HRT case.

You saw that I restricted this to "nonrelative timers"?

> > > +	if ktime_cmp(timer->expires, <=, now) {
> > > +		timer->expired = now;
> > > +		/* The caller takes care of expiry */
> > > +		if (!(mode & KTIMER_NOCHECK))
> > > +			return -1;
> >
> > I think KTIMER_NOFAIL would be better name, for a while that had me
> > confused, as you actually do check the value, but you don't fail it and
> > enqueue it anyway.
>
> It does not fail. It returns in the case that the timer is already
> expired. The NOCHECK flag is used to skip the check.

It returns with a failure value!? The NOCHECK name is ambiguous about what
should not be checked, the NOFAIL name is more clear that the caller
doesn't need to check the return value, because the function won't fail.

bye, Roman
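
The union layout and the sec/nsec subtraction argued about in the message
above can be tried out in userspace. What follows is only a minimal sketch
under stated assumptions: kt_t, kt_set(), kt_sub() and kt_before() are
hypothetical names chosen for illustration (they are not the patch's
macros), int32_t/int64_t stand in for the kernel's s32/s64, and endianness
is taken from the GCC/Clang predefined macro __BYTE_ORDER__ rather than the
kernel's __BIG_ENDIAN. It shows why sec is placed in the upper half of
tv64: a single 64-bit compare then orders normalized sec/nsec pairs.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000

/* Illustrative userspace copy of the quoted union; kt_t is a hypothetical
 * name. The member order is chosen so that sec always lands in the upper
 * 32 bits of tv64. */
typedef union {
	int64_t tv64;
	struct {
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
		int32_t sec, nsec;
#else
		int32_t nsec, sec;
#endif
	} tv;
} kt_t;

/* Build a value from a normalized pair (0 <= nsec < NSEC_PER_SEC). */
static kt_t kt_set(int32_t sec, int32_t nsec)
{
	kt_t t;

	t.tv.sec = sec;
	t.tv.nsec = nsec;
	return t;
}

/* sec/nsec subtraction: borrow one second when nsec goes negative. */
static kt_t kt_sub(kt_t a, kt_t b)
{
	kt_t r;

	r.tv.sec = a.tv.sec - b.tv.sec;
	r.tv.nsec = a.tv.nsec - b.tv.nsec;
	if (r.tv.nsec < 0) {
		r.tv.nsec += NSEC_PER_SEC;
		r.tv.sec--;
	}
	return r;
}

/* With sec in the upper half, one signed 64-bit compare orders two
 * normalized values. */
static int kt_before(kt_t a, kt_t b)
{
	return a.tv64 < b.tv64;
}
```

For example, kt_sub(kt_set(5, 100), kt_set(3, 200)) yields sec = 1 and
nsec = 999999900, and kt_before(kt_set(1, 999999999), kt_set(2, 0)) is
true. Whether this or the scalar form produces better code on a given
architecture is exactly the open question in the thread; the sketch takes
no position on it.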
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-10 17:22 ` Roman Zippel
@ 2005-10-11 7:42 ` Thomas Gleixner
2005-10-12 22:36 ` Roman Zippel
0 siblings, 1 reply; 67+ messages in thread
From: Thomas Gleixner @ 2005-10-11 7:42 UTC (permalink / raw)
To: Roman Zippel
Cc: linux-kernel, mingo, Andrew Morton, george, johnstul, paulmck,
Christoph Hellwig, oleg, tim.bird

On Mon, 2005-10-10 at 19:22 +0200, Roman Zippel wrote:
> > The above gives a clear distinction between scalar and sec/nsec based
> > cases. So you cannot mess up without notice.
>
> There are enough macros to do this anyway. There are a number of
> operations which are identical. Separating them artifically makes
> everything only more complicated.

I don't see a distinct set of macros around which is providing all the
functionality.

> > As far as I understand SUS timer resolution is equal to clock resolution
> > and the timer value/interval is rounded up to the resolution.
>
> Please check the rationale about clocks and timers. It talks about clocks
> and timer services based on them and their resolutions can be different.

clock_settime():
... Time values that are between two consecutive non-negative integer
multiples of the resolution of the specified clock shall be truncated
down to the smaller multiple of the resolution.

timer_settime():
...Time values that are between two consecutive non-negative integer
multiples of the resolution of the specified timer shall be rounded up
to the larger multiple of the resolution. Quantization error shall not
cause the timer to expire earlier than the rounded time value.

> > Reprogramming interval timers by now + interval is completely wrong.
> > Reprogramming has to be
> > timer->expires + interval and nothing else.
>
> Where do get the requirement for an explicit rounding from?
> The point is that the timer should not expire early, but there is more
> than one way to do this and can be done implicitly using the timer
> resolution.

See above.

tglx
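
The two passages quoted above differ only in rounding direction:
clock_settime() truncates down to a multiple of the clock resolution, and
timer_settime() rounds up to a multiple of the timer resolution. A minimal
sketch of that arithmetic follows, assuming non-negative nanosecond values
and a positive resolution; the function names are hypothetical and nothing
here is taken from the patch.

```c
#include <stdint.h>

/* clock_settime() wording: truncate down to the smaller multiple of res.
 * Assumes ns >= 0 and res > 0. */
static int64_t round_down_to_res(int64_t ns, int64_t res)
{
	return ns - (ns % res);
}

/* timer_settime() wording: round up to the larger multiple of res, so
 * that quantization never makes the timer expire early.
 * Assumes ns >= 0 and res > 0. */
static int64_t round_up_to_res(int64_t ns, int64_t res)
{
	int64_t rem = ns % res;

	return rem ? ns + (res - rem) : ns;
}
```

With a resolution of 4000000 nsec (the tick length at HZ=250), a value of
10000001 nsec truncates to 8000000 and rounds up to 12000000, while an
exact multiple such as 8000000 is left unchanged by both.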
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-11 7:42 ` Thomas Gleixner
@ 2005-10-12 22:36 ` Roman Zippel
2005-10-12 23:46 ` George Anzinger
0 siblings, 1 reply; 67+ messages in thread
From: Roman Zippel @ 2005-10-12 22:36 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-kernel, mingo, Andrew Morton, george, johnstul, paulmck,
Christoph Hellwig, oleg, tim.bird

Hi,

On Tue, 11 Oct 2005, Thomas Gleixner wrote:

> > > As far as I understand SUS timer resolution is equal to clock resolution
> > > and the timer value/interval is rounded up to the resolution.
> >
> > Please check the rationale about clocks and timers. It talks about clocks
> > and timer services based on them and their resolutions can be different.
>
> clock_settime():
> ... Time values that are between two consecutive non-negative integer
> multiples of the resolution of the specified clock shall be truncated
> down to the smaller multiple of the resolution.
>
> timer_settime():
> ...Time values that are between two consecutive non-negative integer
> multiples of the resolution of the specified timer shall be rounded up
> to the larger multiple of the resolution. Quantization error shall not
> cause the timer to expire earlier than the rounded time value.

Where does it say anything about their resolution being equal?

> > > Reprogramming interval timers by now + interval is completely wrong.
> > > Reprogramming has to be
> > > timer->expires + interval and nothing else.
> >
> > Where do get the requirement for an explicit rounding from?
> > The point is that the timer should not expire early, but there is more
> > than one way to do this and can be done implicitly using the timer
> > resolution.
>
> See above.

I know it, and the above is an _interface_ description, but what leads you
to the conclusion that your _implementation_ is the only correct one?

Thomas, are you even interested in discussing this? Do you just expect
that everyone accepts your patch and is happy?
So far it's difficult enough to get you to explain your design, but a
serious discussion also requires looking at the possible alternatives.
It's quite possible I'm wrong, but you have to try a little harder at
explaining why.

bye, Roman
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-12 22:36 ` Roman Zippel
@ 2005-10-12 23:46 ` George Anzinger
2005-10-16 16:34 ` Roman Zippel
0 siblings, 1 reply; 67+ messages in thread
From: George Anzinger @ 2005-10-12 23:46 UTC (permalink / raw)
To: Roman Zippel
Cc: Thomas Gleixner, linux-kernel, mingo, Andrew Morton, johnstul,
paulmck, Christoph Hellwig, oleg, tim.bird

Roman Zippel wrote:
> Hi,
>
> On Tue, 11 Oct 2005, Thomas Gleixner wrote:
>
>
>>>>As far as I understand SUS timer resolution is equal to clock resolution
>>>>and the timer value/interval is rounded up to the resolution.
>>>
>>>Please check the rationale about clocks and timers. It talks about clocks
>>>and timer services based on them and their resolutions can be different.

Well, yes and no. Under timer_settime() it talks about ticks and
resolution being the inverse of the tick rate. AND it does imply that
timers on a given CLOCK will have that clock's resolution as returned by
clock_getres(). This is fine as far as it goes. In practical systems we
almost always have a much higher resolution for the clock_gettime() and
gettimeofday() than the tick rate. What the standard does not seem to
want to do is to admit that a clock may have the ability to be read at a
better resolution than its tick rate. For this reason, the usual practice
is to return the "timer" resolution for clock_getres() and to return clock
values with as much resolution as possible. In no case should the actual
clock resolution be less than what clock_getres() returns.

>>
>>clock_settime():
>>... Time values that are between two consecutive non-negative integer
>>multiples of the resolution of the specified clock shall be truncated
>>down to the smaller multiple of the resolution.
>>
>>timer_settime():
>>...Time values that are between two consecutive non-negative integer
>>multiples of the resolution of the specified timer shall be rounded up
>>to the larger multiple of the resolution.
>>Quantization error shall not
>>cause the timer to expire earlier than the rounded time value.

Here the standard uses "resolution of the specified timer" but the only
way, in the standard, to associate a resolution with a timer is via the
CLOCK used.

>
>
> Where does it say anything about their resolution being equal?

So the timer's resolution is the same as the CLOCK's resolution as returned
by clock_getres() but, as I said above, the usual practice is to return
clock values (via clock_gettime or gettimeofday) with higher resolution.

>
>
>>>>Reprogramming interval timers by now + interval is completely wrong.
>>>>Reprogramming has to be
>>>>timer->expires + interval and nothing else.
>>>
>>>Where do get the requirement for an explicit rounding from?
>>>The point is that the timer should not expire early, but there is more
>>>than one way to do this and can be done implicitly using the timer
>>>resolution.
>>
>>See above.

The standard requires that timer expiry times and interval times be
rounded up to the next "resolution" value. For the first or initial time
of a repeating timer we, usually, have to add an additional "resolution"
to account for starting the timer at some point between ticks. For the
interval on repeating timers, we know that the interval is starting at the
last expiry time and thus do not need to account for the between tick
start time.

> ~
-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
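
The rule described in the message above (round up, add one extra
resolution for the first expiry, re-arm from the previous expiry rather
than from the current time) can be written down roughly as follows. This
is a sketch under assumptions, not code from the patch or from the HRT
project: all names are hypothetical, all values are non-negative
nanosecond counts, and the tick grid is assumed to be a fixed multiple of
res.

```c
#include <stdint.h>

/* Round a non-negative duration up to a multiple of res (res > 0). */
static int64_t round_up(int64_t ns, int64_t res)
{
	return ((ns + res - 1) / res) * res;
}

/* First expiry of a relative timer armed at 'now'. The timer starts
 * somewhere between two ticks, so one extra resolution is added to make
 * sure it cannot fire before 'delay' has really elapsed. */
static int64_t first_expiry(int64_t now, int64_t delay, int64_t res)
{
	return now + round_up(delay, res) + res;
}

/* Periodic re-arm. The new expiry is derived from the previous expiry,
 * not from the time the callback happened to run, so a late callback
 * does not accumulate drift and no extra resolution is needed. */
static int64_t next_expiry(int64_t expires, int64_t interval, int64_t res)
{
	return expires + round_up(interval, res);
}
```

With res = 1000, now = 12345 and a requested delay/interval of 2500, the
first expiry is 12345 + 3000 + 1000 = 16345 and the following one is
16345 + 3000 = 19345, regardless of how late the first callback ran.
Re-arming at (now + interval) instead would shift every later expiry by
that lateness, which is the behaviour objected to earlier in the thread.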
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-12 23:46 ` George Anzinger
@ 2005-10-16 16:34 ` Roman Zippel
2005-10-16 19:26 ` Thomas Gleixner
0 siblings, 1 reply; 67+ messages in thread
From: Roman Zippel @ 2005-10-16 16:34 UTC (permalink / raw)
To: George Anzinger
Cc: Thomas Gleixner, linux-kernel, mingo, Andrew Morton, johnstul,
paulmck, Christoph Hellwig, oleg, tim.bird

Hi,

On Wed, 12 Oct 2005, George Anzinger wrote:

> > > > > As far as I understand SUS timer resolution is equal to clock
> > > > > resolution
> > > > > and the timer value/interval is rounded up to the resolution.
> > > >
> > > > Please check the rationale about clocks and timers. It talks about
> > > > clocks and timer services based on them and their resolutions can be
> > > > different.
>
> Well, yes and no. Under timer_settime() it talks about ticks and resolution
> being the inverse of the tick rate. AND it does imply that timers on a given
> CLOCK will have that clocks resolution as returned by clock_getres(). This is
> fine as far as it goes. In practical systems we almost always have a much
> higher resolution for the clock_gettime() and gettimeofday() than the tick
> rate. What the standard does not seem to want to do is to admit that a clock
> may have the ability to be read at a better resolution than its tick rate.

The interesting question is what resolution CLOCK_REALTIME really has.
This paragraph in timer_settime() doesn't mention CLOCK_REALTIME and
AFAICT historically the resolution of e.g. gettimeofday() was really in
the msec range.
IMO there is a far more interesting sentence under clock_getres(): "If
the time argument of clock_settime() is not a multiple of res, then the
value is truncated to a multiple of res."

This is relatively obvious for hardware clocks, e.g. we could define a
CLOCK_JIFFIES with a resolution of TICK_NSEC or CLOCK_PIT with a
resolution of 838 nsec.
The conversion from the actual clock value to/from timespec automatically
takes care of any truncation/rounding.
CLOCK_REALTIME is now a bit special as it doesn't map directly to a
hardware clock, it also includes adjustments and these are done in nsec
resolution (actually even fractions of that in the NTP code). In 2.6 we
don't truncate the value anywhere and maintain it as a nsec value,
therefore the resolution of CLOCK_REALTIME should really be 1 nsec (and
1 usec under 2.4). OTOH the precision with which the clock can be read is
a different matter and depends on the hardware clock CLOCK_REALTIME is
derived from.

It would really help if we could agree on what clock resolution really
means (especially for CLOCK_REALTIME). For hardware clocks the resolution
is defined by the conversion factor from clock cycles to timespec, but
CLOCK_REALTIME is a virtual clock, so is its resolution the precision
with which the clock can be read or written? clock_getres() specifically
mentions clock_settime()...

How we define what timer resolution means depends on this. Currently we
convert the timespec value from/into a jiffies value, so I guess the
resolution is really TICK_NSEC, as it's the resolution at which we
maintain the timer value. Thomas's patch now changes this and we keep a
nsec value, but doesn't that mean the resolution of the timer becomes
1 nsec? It's basically the same question as above, is the timer
resolution the precision at which we maintain the values, the precision
with which the timer can be read or the precision with which the timer
can be programmed?

The spec is not really clear and Thomas' refusal to explain his design
decision is also not really helpful. :-(
He sets the timer resolution to (NSEC_PER_SEC/HZ) which matches no value
above and this way he basically creates another virtual timer, which has
only little to do with the real kernel timer tick.
I'm open to other interpretations and I think it's important to get to
some agreement, _before_ we start to change interfaces.

bye, Roman
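
What a given system actually reports can be checked from userspace with
the POSIX call discussed above. The sketch below is only an illustration
under assumptions: clock_resolution_ns() and tick_res_ns() are
hypothetical helper names, and the 1193182 Hz figure used in the note
after the code is the conventional i8254 PIT input clock, which is an
assumption here and is not stated in the thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L	/* for clock_getres() under strict -std= modes */
#endif
#include <stdint.h>
#include <time.h>

/* Resolution reported by clock_getres() for the given clock, in nsec,
 * or -1 if the call fails. */
static int64_t clock_resolution_ns(clockid_t clk)
{
	struct timespec res;

	if (clock_getres(clk, &res) != 0)
		return -1;
	return (int64_t)res.tv_sec * 1000000000LL + res.tv_nsec;
}

/* Resolution of a clock ticking at 'hz' cycles per second, in whole
 * nsec (truncated). Assumes hz > 0. */
static int64_t tick_res_ns(uint32_t hz)
{
	return 1000000000LL / hz;
}
```

tick_res_ns(1193182) gives 838, matching the "CLOCK_PIT with a resolution
of 838 nsec" example, and tick_res_ns(HZ) is the NSEC_PER_SEC/HZ value the
patch reports. What clock_resolution_ns(CLOCK_REALTIME) returns depends on
the kernel in use, which is the point under discussion. On older glibc
versions the program may need to be linked with -lrt.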
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-16 16:34 ` Roman Zippel
@ 2005-10-16 19:26 ` Thomas Gleixner
2005-10-16 23:03 ` Roman Zippel
0 siblings, 1 reply; 67+ messages in thread
From: Thomas Gleixner @ 2005-10-16 19:26 UTC (permalink / raw)
To: Roman Zippel
Cc: George Anzinger, linux-kernel, mingo, Andrew Morton, johnstul,
paulmck, Christoph Hellwig, oleg, tim.bird

On Sun, 2005-10-16 at 18:34 +0200, Roman Zippel wrote:
> The spec is not really clear and Thomas' refusal to explain his design
> decision is also not really helpful. :-(

I did explain, why I did the rounding in the way it is implemented. If
you define the fact that I have a different interpretation of SUS than
you as refusal, then we can stop this thread right here.

> He sets the timer resolution to (NSEC_PER_SEC/HZ) which matches no value
> above and this way he basically creates another virtual timer, which has
> only little to do with the real kernel timer tick.

As George explained already we return the resolution of the timer as the
value which can be assumed to be the resolution of the event source,
which drives the timer, because that seems to be the only interesting
value for an application programmer. The theoretical resolution of a
jiffie based timer system is NSEC_PER_SEC/HZ.

So why is NSEC_PER_SEC/HZ creating a virtual timer ? Because the ntp
adjusted resolution per tick is 1% off ?

I really don't see any sense in returning changing resolution values
every 5 minutes due to NTP adjustments. I imagine the happiness of
application programmers which actually do calculations based on such a
resolution value.

And in the logical consequence you would have to save the original
userspace timespec value including the time when the timer is set up and
redo the rounding and calculation every time NTP changes the
NSEC_PER_TICK value for _all_ timers which are related to
CLOCK_MONOTONIC and CLOCK_REALTIME.

The code does not introduce a virtual timer at all.
It uses the ntp adjusted time reference and guarantees that the timer
does not go off early. Usually it expires with the next tick - of course
system load can delay it further.

tglx
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-16 19:26 ` Thomas Gleixner
@ 2005-10-16 23:03 ` Roman Zippel
2005-10-17 7:59 ` Ingo Molnar
0 siblings, 1 reply; 67+ messages in thread
From: Roman Zippel @ 2005-10-16 23:03 UTC (permalink / raw)
To: Thomas Gleixner
Cc: George Anzinger, linux-kernel, mingo, Andrew Morton, johnstul,
paulmck, Christoph Hellwig, oleg, tim.bird

Hi,

On Sun, 16 Oct 2005, Thomas Gleixner wrote:

> > The spec is not really clear and Thomas' refusal to explain his design
> > decision is also not really helpful. :-(
>
> I did explain, why I did the rounding in the way it is implemented. If
> you define the fact that I have a different interpretation of SUS than
> you as refusal, then we can stop this thread right here.

I have no problem with you having a different opinion, I have a problem
with your childish behaviour. :-(
You completely ignore the rest of my mail, trying to establish some base
definitions, which would help to figure out the options we have based on
the spec. You instead just insist on your interpretation without going
into any detail.

> > He sets the timer resolution to (NSEC_PER_SEC/HZ) which matches no value
> > above and this way he basically creates another virtual timer, which has
> > only little to do with the real kernel timer tick.
>
> As George explained already we return the resolution of the timer as the
> value which can be assumed to be the resolution of the event source,
> which drives the timer, because that seems to be the only interesting
> value for an application programmer. The theoretical resolution of a
> jiffie based timer system is NSEC_PER_SEC/HZ.

You still don't explain how you get to this conclusion based on the
spec. Instead you redefine it now to useful assumptions for application
programmers who can't read the spec...
You still completely leave unanswered the question of the possibility of
different resolutions.
We can still discuss what resolution to return with clock_getres(), but
first we have to establish what kind of resolution we're dealing with
here.

> I really don't see any sense in returning changing resolution values
> every 5 minutes due to NTP adjustments. I imagine the happiness of
> application programmers which actually do calculations based on such a
> resolution value.

Why are they doing this kind of calculations based on this value?
We can discuss returning a reasonable value for these applications, but I
don't see how these assumptions should control how the kernel works.

> And in the logical consequence you would have to save the original
> userspace timespec value including the time when the timer is set up and
> redo the rounding and calculation every time NTP changes the
> NSEC_PER_TICK value for _all_ timers which are related to
> CLOCK_MONOTONIC and CLOCK_REALTIME.

The rounding is done based on your interpretation of the spec, which you
refuse to discuss. AFAICT the spec leaves enough room to avoid this
rounding completely.

bye, Roman
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-16 23:03 ` Roman Zippel
@ 2005-10-17 7:59 ` Ingo Molnar
2005-10-17 8:26 ` Steven Rostedt
2005-10-17 9:29 ` Roman Zippel
0 siblings, 2 replies; 67+ messages in thread
From: Ingo Molnar @ 2005-10-17 7:59 UTC (permalink / raw)
To: Roman Zippel
Cc: Thomas Gleixner, George Anzinger, linux-kernel, Andrew Morton,
johnstul, paulmck, Christoph Hellwig, oleg, tim.bird

* Roman Zippel <zippel@linux-m68k.org> wrote:

> > > The spec is not really clear and Thomas' refusal to explain his design
> > > decision is also not really helpful. :-(
> >
> > I did explain, why I did the rounding in the way it is implemented. If
> > you define the fact that I have a different interpretation of SUS than
> > you as refusal, then we can stop this thread right here.
>
> I have no problem with you having a different opinion, I have a problem
> with your childish behaviour. :-(

Roman, IMO Thomas has been more than reasonable in replying to you - i'd
have stopped replying to you after the first couple of mails, and we are
at mail round 10 now! Thomas is being very patient with you. You are
being difficult, and IMO you are wasting his and others' time.

the thing is that Thomas has advanced the whole issue of timeouts and
timekeeping by leaps and bounds and he has written thousands of lines of
new and excellent code for a kernel subsystem that has seen little
activity for many years, before John got involved. One of Thomas'
accomplishments is a timer/time design that allows the enabling of HRT
timers via an _18 lines_ architecture patch. (!)

on the other hand, i have yet to see a single line of code from you and
have yet to receive a single bugreport from you. (!)

so for me as a patch integrator and upstream maintainer the equation is
very simple, and i am not nearly as tolerant as Thomas: shut up Roman
already and show us the code!

really, start sending in patches. Testreports. Useful feedback. Those we
can judge by their merits.
Talk is cheap. The time subsystem has been dormant for years, and it has
had more than enough talk already.

the moment you express yourself via patches we'll know that 1) you
understand what we have done so far 2) you have useful ideas of what
should be done differently 3) you have the coder capability to implement
and test those ideas. Patches won't be ignored, i can assure you. Get the
patches rolling!

	Ingo
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-17 7:59 ` Ingo Molnar
@ 2005-10-17 8:26 ` Steven Rostedt
2005-10-17 9:29 ` Roman Zippel
1 sibling, 0 replies; 67+ messages in thread
From: Steven Rostedt @ 2005-10-17 8:26 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: Ingo Molnar, linux-kernel

Trivial "stupid" patch.

MAKE Makefile HAVE -kt($num)!!!!

This will help with ketchup :-)

-- Steve

Index: linux-2.6.14-rc4-kt2/Makefile
===================================================================
--- linux-2.6.14-rc4-kt2.orig/Makefile	2005-10-17 10:14:26.000000000 +0200
+++ linux-2.6.14-rc4-kt2/Makefile	2005-10-17 10:15:12.000000000 +0200
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 14
-EXTRAVERSION =-rc4
+EXTRAVERSION =-rc4-kt2
 NAME=Affluent Albatross
 
 # *DOCUMENTATION*
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-17 7:59 ` Ingo Molnar
2005-10-17 8:26 ` Steven Rostedt
@ 2005-10-17 9:29 ` Roman Zippel
2005-10-17 9:41 ` Ingo Molnar
2005-10-17 9:54 ` Steven Rostedt
1 sibling, 2 replies; 67+ messages in thread
From: Roman Zippel @ 2005-10-17 9:29 UTC (permalink / raw)
To: Ingo Molnar
Cc: Thomas Gleixner, George Anzinger, linux-kernel, Andrew Morton,
johnstul, paulmck, Christoph Hellwig, oleg, tim.bird

Hi,

On Mon, 17 Oct 2005, Ingo Molnar wrote:

> the thing is that Thomas has advanced the whole issue of timeouts and
> timekeeping by leaps and bounds and he has written thousands of lines of
> new and excellent code for a kernel subsystem that has seen little
> activity for many years, before John got involved. One of Thomas'
> accomplishments is a timer/time design that allows the enabling of HRT
> timers via an _18 lines_ architecture patch. (!)

Did I say these patches were bad in general? All I'm asking for is an
explanation for a few design decisions to understand the patch and its
behaviour better and evaluate alternative solutions. Neither of you has
shown any real interest in this so far.

> the moment you express yourself via patches we'll know that 1) you
> understand what we have done so far 2) you have useful ideas of what
> should be done differently 3) you have the coder capability to implement
> and test those ideas. Patches wont be ignored, i can assure you. Get the
> patches rolling!

This "shut up and show code" attitude is sometimes quite funny, but it's
no real threat to me. I hoped to avoid this and solve this in a more
civilized way. Of course I'll understand the issues better afterwards,
but you could as easily just tell me. It will waste my time, which I
could spend on other projects, and it will put Andrew in the unfortunate
position of deciding which patch to accept. Is this really what you want?

bye, Roman
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-17 9:29 ` Roman Zippel
@ 2005-10-17 9:41 ` Ingo Molnar
2005-10-17 9:56 ` Andrew Morton
2005-10-17 16:33 ` Roman Zippel
2005-10-17 9:54 ` Steven Rostedt
1 sibling, 2 replies; 67+ messages in thread
From: Ingo Molnar @ 2005-10-17 9:41 UTC (permalink / raw)
To: Roman Zippel
Cc: Thomas Gleixner, George Anzinger, linux-kernel, Andrew Morton,
johnstul, paulmck, Christoph Hellwig, oleg, tim.bird

* Roman Zippel <zippel@linux-m68k.org> wrote:

> > the moment you express yourself via patches we'll know that 1) you
> > understand what we have done so far 2) you have useful ideas of what
> > should be done differently 3) you have the coder capability to implement
> > and test those ideas. Patches wont be ignored, i can assure you. Get the
> > patches rolling!
>
> This "shut up and show code" attitude is sometimes quite funny, but
> it's no real threat to me. I hoped to avoid this and solve this more
> civilized. Of course I'll understand the issues better afterwards, but
> you could as easily just tell me. [...]

if a dozen mails weren't enough then one more probably won't make a
difference, especially with your last mail calling Thomas's behavior
"childish" - when all he did was to try to explain his reasons to you as
patiently as possible! Thomas is not obliged to teach you or bear with
you - it is his own free choice. (But if you want to discuss this
personal angle any further please take the public lists (and other
people) off the Cc: list, it's getting very off-topic.)

Thomas's stuff is now fully integrated into the -rt tree and it works
excellently. I have measured a 12 usecs worst-case HR timer-delivery
latency (using cyclictest). _That_ is the thing i care about.

> [...] It will waste my time, I could spend on other projects and it
> will put Andrew in the unfortunate position to decide, which patch to
> accept. [...]

yes, please, put Andrew (and me too) into that unfortunate position!
Please, pretty please, get on with the patches!

	Ingo
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-17 9:41 ` Ingo Molnar
@ 2005-10-17 9:56 ` Andrew Morton
2005-10-17 11:00 ` Ingo Molnar
2005-10-17 16:25 ` Roman Zippel
2005-10-17 16:33 ` Roman Zippel
1 sibling, 2 replies; 67+ messages in thread
From: Andrew Morton @ 2005-10-17 9:56 UTC (permalink / raw)
To: Ingo Molnar
Cc: zippel, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg,
tim.bird

Ingo Molnar <mingo@elte.hu> wrote:
>
> > [...] It will waste my time, I could spend on other projects and it
> > will put Andrew in the unfortunate position to decide, which patch to
> > accept. [...]
>
> yes, please, put Andrew (and me too) into that unfortunate position!
> Please, pretty please, get on with the patches!

I'm with Roman on this one - the old "show me the code" trick which people
use to quash other people's objections is rather poor form - we should
simply address the objections as raised.

That being said, I'll confess that I've largely ignored this discussion in
the hope that things would get sorted out. Seems that this won't be
happening and as Roman's opinions carry weight I do intend to solicit a
(brief!) summary of his objections from him when the patch comes round
again. Sorry.
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-17 9:56 ` Andrew Morton @ 2005-10-17 11:00 ` Ingo Molnar 2005-10-17 16:25 ` Roman Zippel 1 sibling, 0 replies; 67+ messages in thread From: Ingo Molnar @ 2005-10-17 11:00 UTC (permalink / raw) To: Andrew Morton Cc: zippel, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg, tim.bird * Andrew Morton <akpm@osdl.org> wrote: > Ingo Molnar <mingo@elte.hu> wrote: > > > > > [...] It will waste my time, I could spend on other projects and it > > > will put Andrew in the unfortunate position to decide, which patch to > > > accept. [...] > > > > yes, please, put Andrew (and me too) into that unfortunate position! > > Please, pretty please, get on with the patches! > > I'm with Roman on this one - the old "show me the code" trick which > people use to quash other people's objections is rather poor form - we > should simply address the objections as raised. > > That being said, I'll confess that I've largely ignored this > discussion in the hope that things would get sorted out. Seems that > this won't be happening and as Roman's opinions carry weight I do > intend to solicit a (brief!) summary of his objections from him when > the patch comes round again. Sorry. Fine with me. A brief summary of technical objections (without any personal attacks) is all we wanted to have to begin with. "Show me the code" was my last-ditch attempt to move this seemingly unmovable discussion from a communication channel where the chemistry doesnt seem to work out to a more objective format. Ingo ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-17 9:56 ` Andrew Morton
2005-10-17 11:00 ` Ingo Molnar
@ 2005-10-17 16:25 ` Roman Zippel
2005-10-17 16:49 ` Tim Bird
2005-10-17 20:55 ` Thomas Gleixner
1 sibling, 2 replies; 67+ messages in thread
From: Roman Zippel @ 2005-10-17 16:25 UTC (permalink / raw)
To: Andrew Morton
Cc: Ingo Molnar, tglx, george, linux-kernel, johnstul, paulmck, hch,
oleg, tim.bird

Hi,

On Mon, 17 Oct 2005, Andrew Morton wrote:

> That being said, I'll confess that I've largely ignored this discussion in
> the hope that things would get sorted out. Seems that this won't be
> happening and as Roman's opinions carry weight I do intend to solicit a
> (brief!) summary of his objections from him when the patch comes round
> again. Sorry.

It's rather simple:
- "timer API" vs "timeout API": I got absolutely no acknowledgement that
this might be a little confusing and in consequence "process timer" may be
a better name.
- I pointed out various (IMO) unnecessary complexities, which were rather
quickly brushed off e.g. with a need for further (not closer specified)
cleanups.
- resolution handling: at what resolution should/does the kernel work and
what do we report to user space. The spec allows multiple interpretations
and I have a hard time to get at least one coherent interpretation out of
Thomas.

Maybe I'm the only one who found Thomas' answers a little superficial, but
as this is a central kernel subsystem I think it deserves a closer look
and every time I tried to poke a little deeper I got nothing.

bye, Roman

^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-17 16:25 ` Roman Zippel @ 2005-10-17 16:49 ` Tim Bird 2005-10-17 17:26 ` Steven Rostedt 2005-10-17 18:49 ` Roman Zippel 2005-10-17 20:55 ` Thomas Gleixner 1 sibling, 2 replies; 67+ messages in thread From: Tim Bird @ 2005-10-17 16:49 UTC (permalink / raw) To: Roman Zippel Cc: Andrew Morton, Ingo Molnar, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg Roman Zippel wrote: > On Mon, 17 Oct 2005, Andrew Morton wrote: >>That being said, I'll confess that I've largely ignored this discussion in >>the hope that things would get sorted out. Seems that this won't be >>happening and as Roman's opinions carry weight I do intend to solicit a >>(brief!) summary of his objections from him when the patch comes round >>again. Sorry. > > > It's rather simple: > - "timer API" vs "timeout API": I got absolutely no acknowlegement that > this might be a little confusing and in consequence "process timer" may be > a better name. I agree with Thomas on this one. Maybe "timer" and "timeout" are too close, but I think they are the most descriptive names. - timeout is something used for a timeout. Timeouts only actually expire infrequently, so they have a host of attributes associated with that characteristic. - timer is something used to time something. They almost always expire as part of their normal behaviour. In the ktimer code they have a host of attributes related to this characteristic. Thomas answered the suggestion to use "process timer" as an alternative name, but I didn't see a reply after that from Roman (I may have missed it.) > - I pointed out various (IMO) unnecessary complexities, which were rather > quickly brushed off e.g. with a need for further (not closer specified) > cleanups. This is rather vague. It is rather easy to raise hypothetical issues. From what I've seen, Thomas has gone to great lengths to address specific issues raised. 
For example, he actually compiled code on 4 different platforms to get the REAL size of the assembly fragments, in order to address your concern about CONJECTURED size problems. > - resolution handling: at what resolution should/does the kernel work and > what do we report to user space. The spec allows multiple interpretations > and I have a hard time to get at least one coherent interpretation out of > Thomas. Huh? I thought Thomas' last answer was pretty clear. > > Maybe I'm the only one who found Thomas answers a little superficial, but > as this is a central kernel subsystem I think it deserves a closer look > and everytime I tried to poke a little deeper I got nothing. No one minds poking deep. But frankly, I find hypothetical arguments to be less useful than reality-backed ones. I would rather not hear reasoning about a resolution issue - I'd like to numbers, if possible, about the degradation of performance, if that's the issue. If it's confusion about the API, then maybe we just need clear statements that "X API provides resolution at Y level (from one of: hardware, tick, something else). Regards, -- Tim ============================= Tim Bird Architecture Group Chair, CE Linux Forum Senior Staff Engineer, Sony Electronics ============================= ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-17 16:49 ` Tim Bird @ 2005-10-17 17:26 ` Steven Rostedt 2005-10-17 18:49 ` Roman Zippel 1 sibling, 0 replies; 67+ messages in thread From: Steven Rostedt @ 2005-10-17 17:26 UTC (permalink / raw) To: Tim Bird Cc: Roman Zippel, Andrew Morton, Ingo Molnar, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg On Mon, 17 Oct 2005, Tim Bird wrote: > > > > > > It's rather simple: > > - "timer API" vs "timeout API": I got absolutely no acknowlegement that > > this might be a little confusing and in consequence "process timer" may be > > a better name. > > I agree with Thomas on this one. Maybe "timer" and "timeout" are too > close, but I think they are the most descriptive names. > - timeout is something used for a timeout. Timeouts only actually > expire infrequently, so they have a host of attributes associated > with that characteristic. > - timer is something used to time something. They almost always > expire as part of their normal behaviour. In the ktimer code they > have a host of attributes related to this characteristic. > > Thomas answered the suggestion to use "process timer" as an alternative > name, but I didn't see a reply after that from Roman (I may have missed it.) > I can add to this. After this was brought up, I did a little non-scientific survey. I walked around and asked various engineers here at my customer's site, what it meant to them if I had two types of timer APIs, one for "timers" and one for "timeouts". All 100% of 8 people that I asked (not a lot, but still), had no confusion with what they meant. I asked them to explain what these names mean to them, and every one said basically, timeouts are for situations that are for things that lasted too long, and timers and for things where they want to be notified of an event that takes place at some time. 
They all agreed with me that timeouts were for exceptions and not expected to be triggered, and timers were the other way around and should always be triggered. Not only that, I also asked if these timers would make sense if we called them "kernel" timers and "process" timers. These names confused them because they use both timers in their kernel modules. That convinced me enough to think that Thomas' naming convention is not confusing. -- Steve ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-17 16:49 ` Tim Bird 2005-10-17 17:26 ` Steven Rostedt @ 2005-10-17 18:49 ` Roman Zippel 2005-10-17 19:19 ` Tim Bird 2005-10-17 20:09 ` Ingo Molnar 1 sibling, 2 replies; 67+ messages in thread From: Roman Zippel @ 2005-10-17 18:49 UTC (permalink / raw) To: Tim Bird Cc: Andrew Morton, Ingo Molnar, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg Hi, On Mon, 17 Oct 2005, Tim Bird wrote: > > It's rather simple: > > - "timer API" vs "timeout API": I got absolutely no acknowlegement that this > > might be a little confusing and in consequence "process timer" may be a > > better name. > > I agree with Thomas on this one. Maybe "timer" and "timeout" are too close, > but I think they are the most descriptive names. > - timeout is something used for a timeout. Timeouts only actually > expire infrequently, so they have a host of attributes associated > with that characteristic. > - timer is something used to time something. They almost always > expire as part of their normal behaviour. In the ktimer code they > have a host of attributes related to this characteristic. There is of course a difference, but is it big enough that they deserve different APIs? Just look into <linux/timer.h> it doesn't mention timeout once, but according to Thomas that's our "timeout API". Look at the description of mod_timer() in timer.c: "modify a timer's timeout". It seems I'm not only one who thinks that both are closely related. > Thomas answered the suggestion to use "process timer" as an alternative name, > but I didn't see a reply after that from Roman (I may have missed it.) It was short and painless: } > > Calling them "process timer" and "kernel timer" would include their main } > > usage, although that also means ptimer were the more correct abbreviation. } > } > As said before I think the disctinction between timers and timeouts } > makes perfectly sense and ktimers are _not_ restricted to process } > timers. 
} } "main usage" != "restricted to" IOW I didn't say that "process timer" are restricted to processes, but it's their intended main usage. "kernel timer" are OTOH the first choice for any internal kernel time issues (which are not just timeouts). > > - I pointed out various (IMO) unnecessary complexities, which were rather > > quickly brushed off e.g. with a need for further (not closer specified) > > cleanups. > > This is rather vague. It is rather easy to raise hypothetical > issues. From what I've seen, Thomas has gone to great lengths to > address specific issues raised. For example, he actually compiled > code on 4 different platforms to get the REAL size of the assembly > fragments, in order to address your concern about CONJECTURED size > problems. This was the _only_ issue where he got into any detail, but I also mentioned later that this one of the minor issues. Above was about the size of the ktimer structure and interval timer. > > - resolution handling: at what resolution should/does the kernel work and > > what do we report to user space. The spec allows multiple interpretations > > and I have a hard time to get at least one coherent interpretation out of > > Thomas. > > Huh? I thought Thomas' last answer was pretty clear. Then I must have missed something. Earlier he just quotes something from SUS without any explanation. His last answer was just about user expectations without any connection to the different resolutions at the kernel side I described in the mail before. bye, Roman ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-17 18:49 ` Roman Zippel @ 2005-10-17 19:19 ` Tim Bird 2005-10-17 19:48 ` Roman Zippel 2005-10-17 20:09 ` Ingo Molnar 1 sibling, 1 reply; 67+ messages in thread From: Tim Bird @ 2005-10-17 19:19 UTC (permalink / raw) To: Roman Zippel Cc: Bird, Tim, Andrew Morton, Ingo Molnar, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg Roman Zippel wrote: > } > > Calling them "process timer" and "kernel timer" would include > their main > } > > usage, although that also means ptimer were the more correct > abbreviation. > } > > } > As said before I think the disctinction between timers and timeouts > } > makes perfectly sense and ktimers are _not_ restricted to process > } > timers. > } > } "main usage" != "restricted to" > > IOW I didn't say that "process timer" are restricted to processes, but > it's their intended main usage. "kernel timer" are OTOH the first choice > > for any internal kernel time issues (which are not just timeouts). Maybe for a more experienced kernel person such as yourself, this distinction make sense. But "process timer" and "kernel timer" don't carry much semantic value for me. They seem to convey an arbitrary expectation of usage patterns. Maybe they match the current usage patterns in the kernel, but I'd prefer naming based on functionality or behaviour of the API. > There is of course a difference, but is it big enough that they deserve > different APIs? IMHO yes. I think having separate APIs will eventually be beneficial to allow better handling of resolution manipulation in the future. For example, timeouts are likely to need less resolution, and it may be valuable to adjust the resolution of timeouts to support coalescing timeouts for better tickless operation. (Driving towards better power management performance for embedded devices.) > Just look into <linux/timer.h> it doesn't mention timeout > once, but according to Thomas that's our "timeout API". 
Look at the > description of mod_timer() in timer.c: "modify a timer's timeout". > It seems I'm not only one who thinks that both are closely related. I'm not sure if you are arguing for renaming the old API. I would be in favor of this (from an abstract perspective, to clarify the usage in the kernel), but it might be too big a change right now. Regards, -- Tim ============================= Tim Bird Architecture Group Chair, CE Linux Forum Senior Staff Engineer, Sony Electronics ============================= ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-17 19:19 ` Tim Bird @ 2005-10-17 19:48 ` Roman Zippel 2005-10-17 20:13 ` Ingo Molnar 0 siblings, 1 reply; 67+ messages in thread From: Roman Zippel @ 2005-10-17 19:48 UTC (permalink / raw) To: Tim Bird Cc: Andrew Morton, Ingo Molnar, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg Hi, On Mon, 17 Oct 2005, Tim Bird wrote: > Maybe for a more experienced kernel person such as > yourself, this distinction make sense. But > "process timer" and "kernel timer" don't carry much > semantic value for me. They seem to convey an > arbitrary expectation of usage patterns. Maybe > they match the current usage patterns in the kernel, > but I'd prefer naming based on functionality or > behaviour of the API. Let's say you want to implement a watchdog timer for a driver, which runs about every second to do something. Now if you have the choice between "timer API" vs. "timeout API" and "kernel timer" vs. "process timer", what would you choose based on the name? bye, Roman ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-17 19:48 ` Roman Zippel @ 2005-10-17 20:13 ` Ingo Molnar 2005-10-17 20:31 ` Roman Zippel 0 siblings, 1 reply; 67+ messages in thread From: Ingo Molnar @ 2005-10-17 20:13 UTC (permalink / raw) To: Roman Zippel Cc: Tim Bird, Andrew Morton, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg * Roman Zippel <zippel@linux-m68k.org> wrote: > > Maybe for a more experienced kernel person such as > > yourself, this distinction make sense. But > > "process timer" and "kernel timer" don't carry much > > semantic value for me. They seem to convey an > > arbitrary expectation of usage patterns. Maybe > > they match the current usage patterns in the kernel, > > but I'd prefer naming based on functionality or > > behaviour of the API. > > Let's say you want to implement a watchdog timer for a driver, which > runs about every second to do something. Now if you have the choice > between "timer API" vs. "timeout API" and "kernel timer" vs. "process > timer", what would you choose based on the name? why you insist on ktimers being 'process timers'? They are totally separate entities, not limited to any process notion. One of their first practical use happens to be POSIX process timers (both itimers and ptimers) via them, but no way are ktimers only 'process timers'. They are very generic timers, usable for any kernel purpose. so to answer your question: it is totally possible for a watchdog mechanism to use ktimers. In fact it would be desirable from a robustness POV too: e.g. we dont want a watchdog from being overload-able via too many timeouts in the timer wheel ... Ingo ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-17 20:13 ` Ingo Molnar @ 2005-10-17 20:31 ` Roman Zippel 2005-10-18 8:46 ` Ingo Molnar 0 siblings, 1 reply; 67+ messages in thread From: Roman Zippel @ 2005-10-17 20:31 UTC (permalink / raw) To: Ingo Molnar Cc: Tim Bird, Andrew Morton, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg Hi, On Mon, 17 Oct 2005, Ingo Molnar wrote: > why you insist on ktimers being 'process timers'? Because they are optimized for process usage. OTOH kernel usage is more than just "timeouts". > so to answer your question: it is totally possible for a watchdog > mechanism to use ktimers. In fact it would be desirable from a > robustness POV too: "possible" and "desirable" is still different from "preferable", as they involve a higher cost. > e.g. we dont want a watchdog from being > overload-able via too many timeouts in the timer wheel ... Please explain. bye, Roman ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-17 20:31 ` Roman Zippel
@ 2005-10-18 8:46 ` Ingo Molnar
2005-10-18 23:52 ` Tim Bird
2005-10-19 1:58 ` Roman Zippel
0 siblings, 2 replies; 67+ messages in thread
From: Ingo Molnar @ 2005-10-18 8:46 UTC (permalink / raw)
To: Roman Zippel
Cc: Tim Bird, Andrew Morton, tglx, george, linux-kernel, johnstul,
paulmck, hch, oleg

* Roman Zippel <zippel@linux-m68k.org> wrote:

> On Mon, 17 Oct 2005, Ingo Molnar wrote:
>
> > why you insist on ktimers being 'process timers'?
>
> Because they are optimized for process usage. OTOH kernel usage is
> more than just "timeouts".

you have cut out the rest of what i wrote in the paragraph, which IMO
answers your question:

> > They are totally separate entities, not limited to any process
> > notion. One of their first practical use happens to be POSIX process
> > timers (both itimers and ptimers) via them, but no way are ktimers
> > only 'process timers'. They are very generic timers, usable for any
> > kernel purpose.

so i can only repeat that ktimers is a generic timer subsystem, with a
focus on _actually delivering a timer event_.

and no, ktimers are not "optimized for process usage" (or tied to
whatever other process notion, as i said before), they are optimized
for:

 - the delivery of time related events

as contrasted to the timeout-API (a.k.a. "timer wheel") code in
kernel/timer.c that is optimized towards:

 - the fast adding/removal of timers

without too much focus on robust and deterministic delivery of events.

these two concepts are conflicting, and i claim that a (sane) data
structure that maximally fulfills both sets of requirements does not
exist, mathematically. (to repeat, the requirements are: 'fast
add/remove' and 'fast+deterministic expiry')

at this point i'd really suggest for readers to lean back and think
about the mathematical foundations of timer data structures for a bit,
with a focus on the tradeoffs that the timer wheel data structure has,
vs. the tradeoffs of the rbtree data structure that ktimers has.

My claim is that if you _know_ that a timer will expire most likely, you
want it to order at insertion time - i.e. you want to have a tree
structure. If you _know_ that a timer will most likely _not_ expire,
then you can avoid the tree overhead by 'delaying' the decision of
sorting timers, to the point in the future where we really are forced to
do so.

The result of this mathematical paradox is that we end up with two data
structures: one is the timer wheel (kernel/timer.c) for
timeout/exception related use; the other one is ktimers
(kernel/ktimers.c), for expiry oriented use.

> > so to answer your question: it is totally possible for a watchdog
> > mechanism to use ktimers. In fact it would be desirable from a
> > robustness POV too:
>
> "possible" and "desirable" is still different from "preferable", as
> they involve a higher cost.

[ in my answer above you are free to substitute "preferable" with
"desirable" - i do mean it as it reads in plain English. ]

> > e.g. we dont want a watchdog from being
> > overload-able via too many timeouts in the timer wheel ...
>
> Please explain.

e.g. on busy networked servers (i.e. ones that do have a need for
watchdogs) the timer wheel often includes large numbers of timeouts,
99.9% of which never expire. If they do expire en masse for whatever
reason, then we can get into overload mode: a million timers might have
to expire before we get to process the watchdog event and act upon it.
This can delay the watchdog event significantly, which delay might (or
might not) matter to the watchdog application.

in short: the timer wheel was not designed with determinism in mind (nor
should 'simple timeouts' care about determinism). Watchdogs are
preferably (and desirably) implemented via the most deterministic timer
mechanism that the kernel offers: ktimers in this particular case.

Ingo

^ permalink raw reply [flat|nested] 67+ messages in thread
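[ Editorial sketch, not part of the original mail. The "delay the sorting
decision" argument above can be illustrated with a toy two-level wheel in
user-space C. Everything here is hypothetical (the names toy_add(),
toy_tick(), toy_demo(), the slot counts, the 40-tick timeout); it is not
the kernel/timer.c code, only a minimal model assuming timeouts shorter
than FINE_SLOTS * COARSE_SLOTS ticks. Far-away timers are dropped unsorted
into a coarse slot and are only placed into their exact tick slot when
time reaches that coarse slot, so a timeout that is cancelled early never
pays any sorting cost. ]

```c
#include <assert.h>
#include <stddef.h>

#define FINE_SLOTS   8     /* one slot per tick */
#define COARSE_SLOTS 8     /* one slot per FINE_SLOTS ticks */
#define MAX_TIMERS   1024

struct node { struct node *prev, *next; };

struct toy_timer {
    struct node node;       /* must stay first: node* is cast back to timer* */
    unsigned long expires;  /* absolute tick */
};

static struct node fine[FINE_SLOTS], coarse[COARSE_SLOTS];
static struct toy_timer pool[MAX_TIMERS];
static unsigned long now, cascade_moves;

static void list_init(struct node *h) { h->prev = h->next = h; }

static void list_add_tail(struct node *n, struct node *h)
{
    n->prev = h->prev; n->next = h;
    h->prev->next = n; h->prev = n;
}

static void list_del(struct node *n)
{
    n->prev->next = n->next; n->next->prev = n->prev;
    n->prev = n->next = n;
}

/* O(1) add: near timers get an exact slot, far ones an unsorted coarse slot. */
static void toy_add(struct toy_timer *t, unsigned long delta)
{
    assert(delta > 0 && delta < FINE_SLOTS * COARSE_SLOTS);
    t->expires = now + delta;
    if (delta < FINE_SLOTS)
        list_add_tail(&t->node, &fine[t->expires % FINE_SLOTS]);
    else
        list_add_tail(&t->node,
                      &coarse[(t->expires / FINE_SLOTS) % COARSE_SLOTS]);
}

/* Advance one tick; a coarse slot is sorted only when time reaches it. */
static int toy_tick(void)
{
    int fired = 0;
    struct node *h;

    now++;
    if (now % FINE_SLOTS == 0) {
        h = &coarse[(now / FINE_SLOTS) % COARSE_SLOTS];
        while (h->next != h) {
            struct toy_timer *t = (struct toy_timer *)h->next;
            list_del(&t->node);
            list_add_tail(&t->node, &fine[t->expires % FINE_SLOTS]);
            cascade_moves++;            /* the deferred sorting work */
        }
    }
    h = &fine[now % FINE_SLOTS];
    while (h->next != h) {              /* expire everything due this tick */
        list_del(h->next);
        fired++;
    }
    return fired;
}

/* Add `total` timeouts 40 ticks out, cancel `cancelled` of them at once,
 * run 48 ticks, and report how many timers were ever sorted. */
static unsigned long toy_demo(int total, int cancelled)
{
    int i;

    assert(total <= MAX_TIMERS && cancelled <= total);
    now = cascade_moves = 0;
    for (i = 0; i < FINE_SLOTS; i++) list_init(&fine[i]);
    for (i = 0; i < COARSE_SLOTS; i++) list_init(&coarse[i]);
    for (i = 0; i < total; i++) toy_add(&pool[i], 40);
    for (i = 0; i < cancelled; i++) list_del(&pool[i].node);
    for (i = 0; i < 48; i++) toy_tick();
    return cascade_moves;
}
```

[ In this model toy_demo(1000, 999) sorts a single timer, while
toy_demo(1000, 0) sorts all 1000 in one tick. That second case is the
"expire en masse" situation described above, where the burst of deferred
work lands in front of whatever else is due at that moment. ]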
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-18 8:46 ` Ingo Molnar
@ 2005-10-18 23:52 ` Tim Bird
2005-10-19 0:03 ` George Anzinger
2005-10-19 1:58 ` Roman Zippel
1 sibling, 1 reply; 67+ messages in thread
From: Tim Bird @ 2005-10-18 23:52 UTC (permalink / raw)
To: Ingo Molnar
Cc: Roman Zippel, Bird, Tim, Andrew Morton, tglx, george,
linux-kernel, johnstul, paulmck, hch, oleg

Ingo Molnar wrote:
> My claim is that if you _know_ that a timer will expire most likely, you
> want it to order at insertion time - i.e. you want to have a tree
> structure. If you _know_ that a timer will most likely _not_ expire,
> then you can avoid the tree overhead by 'delaying' the decision of
> sorting timers, to the point in the future where we really are forced to
> do so.
>
> The result of this mathematical paradox is that we end up with two data
> structures: one is the timer wheel (kernel/timer.c) for
> timeout/exception related use; the other one is ktimers
> (kernel/ktimers.c), for expiry oriented use.

I'd like to make an observation on another
difference between the wheel and the rbtree. Note that
the wheel implementation inherently coalesces timeouts
that are near each other, due to its relatively
low resolution (at tick granularity - which is
still pretty low resolution on embedded hardware -
usually 10 milliseconds.)

One concern I have with the rbtree is that this
automatic coalescing is lost, and there may be
unanticipated overhead in the move to support
high resolution timers.

Whether some form of coalescing should be
preserved for timers, even when the system
supports higher resolution, will be a
function of the number of timers and their
intended use. I don't see any support for that
in the current patch, but maybe I'm missing
something.

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================

^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-18 23:52 ` Tim Bird
@ 2005-10-19 0:03 ` George Anzinger
0 siblings, 0 replies; 67+ messages in thread
From: George Anzinger @ 2005-10-19 0:03 UTC (permalink / raw)
To: Tim Bird
Cc: Ingo Molnar, Roman Zippel, Andrew Morton, tglx, linux-kernel,
johnstul, paulmck, hch, oleg

Tim Bird wrote:
> Ingo Molnar wrote:
>
>>My claim is that if you _know_ that a timer will expire most likely, you
>>want it to order at insertion time - i.e. you want to have a tree
>>structure. If you _know_ that a timer will most likely _not_ expire,
>>then you can avoid the tree overhead by 'delaying' the decision of
>>sorting timers, to the point in the future where we really are forced to
>>do so.
>>
>>The result of this mathematical paradox is that we end up with two data
>>structures: one is the timer wheel (kernel/timer.c) for
>>timeout/exception related use; the other one is ktimers
>>(kernel/ktimers.c), for expiry oriented use.
>
>
> I'd like to make an observation on another
> difference between the wheel and the rbtree. Note that
> the wheel implementation inherently coalesces timeouts
> that are near each other, due to its relatively
> low resolution (at tick granularity - which is
> still pretty low resolution on embedded hardware -
> usually 10 milliseconds.)
>
> One concern I have with the rbtree is that this
> automatic coalescing is lost, and there may be
> unanticipated overhead in the move to support
> high resolution timers.

I think the coalescing is really done by the resolution rounding. There
will always be the list removal overhead, but short of a duplex tree
(i.e. one entry per time with dup times linked from the first (Ug)) you
will always have that. What you want to coalesce is the interrupt
overhead, not the list overhead, the former being MUCH larger. The
difference here is that we don't see the resolution reflected in the
tree structure, but that, I think, is good.
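[ Editorial sketch, not part of the original mail. The point that
coalescing comes from resolution rounding, and that it is the interrupt
count rather than the list work that gets coalesced, can be shown with a
few lines of user-space C. The helper names round_up_expiry() and
count_interrupts() and the nanosecond units are assumptions made for
illustration only; they are not taken from the ktimers patch. ]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round an absolute expiry time (ns) up to the next multiple of the
 * timer resolution. Assumes expires_ns + res_ns does not overflow. */
static uint64_t round_up_expiry(uint64_t expires_ns, uint64_t res_ns)
{
    assert(res_ns > 0);
    return ((expires_ns + res_ns - 1) / res_ns) * res_ns;
}

/* Given expiry times sorted in ascending order, count how many distinct
 * interrupt times are needed once each expiry is rounded to res_ns.
 * Every timer is still removed individually; only interrupts merge. */
static size_t count_interrupts(const uint64_t *sorted_ns, size_t n,
                               uint64_t res_ns)
{
    size_t irqs = 0;
    uint64_t last = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t when = round_up_expiry(sorted_ns[i], res_ns);

        if (i == 0 || when != last) {
            irqs++;
            last = when;
        }
    }
    return irqs;
}
```

[ For five expiries at 1000100, 1000900, 1001500, 1999999 and 2000001 ns,
a 1 ms resolution needs two interrupts, a 1 us resolution needs four, and
a 1 ns resolution needs five, while the number of list removals stays at
five in every case. ]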
> > Whether some form of coalescing should be > preserved for timers, even when the system > supports higher resolution, will be a > function of the number of timers and their > intended use. I don't see any support for that > in the current patch, but maybe I'm missing > something. > > ============================= ~ -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-18 8:46 ` Ingo Molnar 2005-10-18 23:52 ` Tim Bird @ 2005-10-19 1:58 ` Roman Zippel 2005-10-19 6:46 ` Ingo Molnar ` (3 more replies) 1 sibling, 4 replies; 67+ messages in thread From: Roman Zippel @ 2005-10-19 1:58 UTC (permalink / raw) To: Ingo Molnar Cc: Tim Bird, Andrew Morton, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg Hi, On Tue, 18 Oct 2005, Ingo Molnar wrote: > > Because they are optimized for process usage. OTOH kernel usage is > > more than just "timeouts". > > you have cut out the rest of what i write in the paragraph, which IMO > answers your question: > > > > They are totally separate entities, not limited to any process > > > notion. One of their first practical use happens to be POSIX process > > > timers (both itimers and ptimers) via them, but no way are ktimers > > > only 'process timers'. They are very generic timers, usable for any > > > kernel purpose. > > so i can only repeat that ktimers is a generic timer subsystem, with a > focus on _actually delivering a timer event_. It doesn't answer it at all. The new timer system is definitively not "usable for any kernel purpose", it has certain properties, which makes it only applicable under certain conditions. > and no, ktimers are not "optimized for process usage" (or tied to > whatever other process notion, as i said before), they are optimized > for: > > - the delivery of time related events > > as contrasted to the timeout-API (a'ka "timer wheel") code in > kernel/timers.c that is optimized towards: > > - the fast adding/removal of timers > > without too much focus on robust and deterministic delivery of events. You forgot the main property of high resolution, which implies a higher maintainance cost. Whether the timer event is delivered or not is completely unimportant, as at some point the event has to be removed anyway, so that optimizing a timer for (non)delivery is complete nonsense. 
> these two concepts are conflicting, and i claim that a (sane) data > structure that maximally fulfills both sets of requirements does not > exist, mathematically. (to repeat, the requirements are: 'fast > add/remove' and 'fast+deterministic expiry') to repeat: low resolution/overhead vs high resolution. Both are hopefully deterministic (only at different resolutions) or we have serious bug at hand. > > > e.g. we dont want a watchdog from being > > > overload-able via too many timeouts in the timer wheel ... > > > > Please explain. > > e.g. on busy networked servers (i.e. ones that do have a need for > watchdogs) the timer wheel often includes large numbers of timeouts, > 99.9% of which never expire. If they do expire en masse for whatever > reason, then we can get into overload mode: a million timers might have > to expire before we get to process the watchdog event and act upon it. > This can delay the watchdog event significantly, which delay might (or > might not) matter to the watchdog application. I already mentioned earlier that it's possible to reduce the timer load by using a watchdog timer to filter most of these events, so that you get into the interesting situation that most kernel timer actually do expire and suddenly you easily can have more "timers" than "timeouts". bye, Roman ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-19 1:58 ` Roman Zippel @ 2005-10-19 6:46 ` Ingo Molnar 2005-10-19 10:49 ` kernel/timer.c design (was: Re: ktimers subsystem) Ingo Molnar ` (2 subsequent siblings) 3 siblings, 0 replies; 67+ messages in thread From: Ingo Molnar @ 2005-10-19 6:46 UTC (permalink / raw) To: Roman Zippel Cc: Tim Bird, Andrew Morton, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg * Roman Zippel <zippel@linux-m68k.org> wrote: > > and no, ktimers are not "optimized for process usage" (or tied to > > whatever other process notion, as i said before), they are optimized > > for: > > > > - the delivery of time related events > > > > as contrasted to the timeout-API (a'ka "timer wheel") code in > > kernel/timers.c that is optimized towards: > > > > - the fast adding/removal of timers > > > > without too much focus on robust and deterministic delivery of events. > > You forgot the main property of high resolution, which implies a > higher maintainance cost. what did i forget? I did not mention "high resolution" anywhere. And what precisely do you mean by "higher maintainance cost"? Ingo ^ permalink raw reply [flat|nested] 67+ messages in thread
* kernel/timer.c design (was: Re: ktimers subsystem) 2005-10-19 1:58 ` Roman Zippel 2005-10-19 6:46 ` Ingo Molnar @ 2005-10-19 10:49 ` Ingo Molnar 2005-10-19 17:48 ` kernel/timer.c design Tim Bird ` (2 more replies) 2005-10-19 11:40 ` [PATCH] ktimers subsystem 2.6.14-rc2-kt5 Ingo Molnar 2005-10-19 11:58 ` Ingo Molnar 3 siblings, 3 replies; 67+ messages in thread From: Ingo Molnar @ 2005-10-19 10:49 UTC (permalink / raw) To: Roman Zippel Cc: Tim Bird, Andrew Morton, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg * Roman Zippel <zippel@linux-m68k.org> wrote: > Whether the timer event is delivered or not is completely unimportant, > as at some point the event has to be removed anyway, so that > optimizing a timer for (non)delivery is complete nonsense. completely wrong! To explain this, let me first give you an introduction to the design goals and implementation/optimization details of the upstream kernel/timer.c code: The current design has remained largely unchanged since Finn Arne Gangstad implemented timer wheels in 1997. The code implements 'struct timer_list' objects, which can be 'added' via add_timer() to 'expire' in N jiffies, and can be 'removed' via del_timer() before expiry. If timers are not removed before expiry then they will expire, at which point the kernel has to call timer->fn(timer->data). Time has a granularity of 1/HZ and timeouts are 32 bits. [ sidenote: there are other details, like timer modification and other API variants, SMP scalability and other issues - in that sense this writeup is simplified, but the essence of the algorithms is still the same. ] since timers can be added in arbitrary time order (a timer that will expire sooner can be added after a timer has been added that will expire later, etc.), the kernel has to have timers sorted when they expire. Note: there is no requirement to sort timers _before_ expiry! 
the initial Linux timer implementation did not (have to) bother about the 'millions of timers' workloads yet, so it went for the simplest model: it has put all timers into a doubly-linked list, and sorted timers at insertion time, which made addition O(N). It also had an O(N) removal function, only expiry was O(1). [ the name 'struct timer_list' originates from this linked-list model, and this name has survived 15 years. The reason for the O(N) removal overhead of the original implementation was that it maintained a 'next timer will expire in N jiffies' value for every timer on the list, which the kernel could have used to implement dynamic timer ticks. We never ended up using that particular aspect of the implementation, and future timer implementations removed that property altogether. ] one could implement a add:O(N)/del:O(1)/exp:O(1) algorithm for sorted linked lists, the original implementation was suboptimal in doing a O(N) del_timer(). one could also implement a add:O(1)/del:O(1)/exp:O(N) algorithm via an unsorted linked list. In any case, if there's only a single list then either insertion or expiry has to carry the O(N) linear sorting overhead. another canonical 'computer science' way of dealing with timers is to put them into a binary tree that sorts by expiry-time: this means that at add_timer() time we have to insert the timer into the binary tree (O(log(N)) overhead), removal and expiry is O(1). the fastest theoretical timer algorithm is to have a linear array of lists [timer buckets] for every future jiffy (and a running index to represent the current jiffy): then adding a timer is a simple add_list() for the array entry indexed by the target timeout. Removing a timer is a simple list_del(), and expiring the timer is a matter of advancing the 'current time' index by one and expiring all (if any) timers that are in the next slot. Thus adding, removing and expiring a timer has constant O(1) overhead, and the worst-case behavior is constant bounded too. 
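To make the 'linear array of timer buckets' model concrete, here is a minimal userspace sketch. It is not kernel code: it assumes a small horizon of HORIZON ticks instead of all 2^32 jiffies, and every name in it (toy_timer, toy_array, toy_add, toy_del, toy_tick, toy_array_demo) is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define HORIZON 1024UL   /* assumed maximum timeout, in ticks */

struct toy_timer {
	struct toy_timer *prev, *next;   /* links within one bucket */
	void (*fn)(unsigned long);
	unsigned long data;
};

struct toy_array {
	unsigned long now;                  /* running 'current jiffy' index */
	struct toy_timer bucket[HORIZON];   /* one list head per future tick */
};

static unsigned long toy_sum;           /* updated by the demo callback */

static void toy_count(unsigned long data)
{
	toy_sum += data;
}

static void toy_init(struct toy_array *a)
{
	a->now = 0;
	for (unsigned long i = 0; i < HORIZON; i++)
		a->bucket[i].prev = a->bucket[i].next = &a->bucket[i];
}

/* O(1): link the timer into the bucket of its expiry tick. */
static void toy_add(struct toy_array *a, struct toy_timer *t,
		    unsigned long delta)
{
	struct toy_timer *head;

	assert(delta >= 1 && delta <= HORIZON);
	head = &a->bucket[(a->now + delta) % HORIZON];
	t->prev = head->prev;
	t->next = head;
	head->prev->next = t;
	head->prev = t;
}

/* O(1): unlink the timer from whatever bucket it is in. */
static void toy_del(struct toy_timer *t)
{
	t->prev->next = t->next;
	t->next->prev = t->prev;
	t->prev = t->next = t;
}

/* Advance time by one tick and expire everything in the new slot. */
static int toy_tick(struct toy_array *a)
{
	struct toy_timer *head;
	int expired = 0;

	a->now++;
	head = &a->bucket[a->now % HORIZON];
	while (head->next != head) {
		struct toy_timer *t = head->next;

		toy_del(t);
		t->fn(t->data);
		expired++;
	}
	return expired;
}

/* Two timers are added, one is removed early; returns the first tick
 * at which anything expired. */
int toy_array_demo(void)
{
	static struct toy_array a;
	struct toy_timer t1 = { .fn = toy_count, .data = 1 };
	struct toy_timer t2 = { .fn = toy_count, .data = 10 };
	int first = 0;

	toy_sum = 0;
	toy_init(&a);
	toy_add(&a, &t1, 3);
	toy_add(&a, &t2, 5);
	toy_del(&t2);
	for (int tick = 1; tick <= 10; tick++)
		if (toy_tick(&a) && !first)
			first = tick;
	return first;
}
```

Every operation touches only one bucket, which is where the constant bound comes from. The price is the bucket array itself, which grows linearly with the largest timeout the structure has to represent.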
what makes this algorithm impossible in practice is its huge RAM footprint: tens of gigabytes of RAM to represent all ~2^32 jiffies. (Some OSs still do this, at the price of restricting either timer granularity, or the maximum possible timeout) it can be proven that under our assumptions this 'linear array of time' approach is the best fully O(1) algorithm [with constant worst-case behavior as well], so whatever other solution we choose to significantly reduce the RAM footprint, it wont be fully O(1). we've seen two practical approaches so far: the 'historical Linux implementation' which was add:O(N)/del:O(N)/exp:O(1), and the 'timer tree' solution which is add:O(log(N))/del:O(1)/exp:O(1). but the current Linux kernel uses a third algorithm: the timer wheels. This is a variant of the simple 'array of future jiffies' model, but instead of representing every future jiffy in a bucket, it categorizes future jiffies into a 'logarithmic array of arrays' where the arrays represent buckets with larger and larger 'scope/granularity': the further a jiffy is in the future, the more jiffies belong to the same single bucket. In practice it's done by categorizing all future jiffies into 5 groups: 1..256, 257..16384, 16385..1048576, 1048577..67108864, 67108865..4294967295 the first category consists of 256 buckets (each bucket representing a single jiffy), the second category consists of 64 buckets equally divided (each bucket represents 256 subsequent jiffies), the third category consists of 64 buckets too (each bucket representing 256*64 == 16384 jiffies), the fourth category consists of 64 buckets too (each bucket representing 256*64*64 == 1048576 jiffies), the fifth category consists of 64 buckets too (each bucket representing 67108864 jiffies). the buckets of each category are put into a per-category fixed-size array, called the "timer vector" - named tv1, tv2, tv3, tv4 and tv5. 
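The five categories above can be sketched as a lookup from "ticks into the future" to a (timer vector, bucket) pair. This is a userspace illustration under stated assumptions, not the kernel's code: the macro names follow the 8+6+6+6+6 split described in the text, wheel_slot and wheel_slot_for() are hypothetical, and deltas are counted from zero here, so each boundary sits one lower than in the 1-based ranges listed above.

```c
#include <assert.h>
#include <stdint.h>

#define TVR_BITS 8                     /* tv1: 256 buckets */
#define TVN_BITS 6                     /* tv2..tv5: 64 buckets each */
#define TVR_SIZE (1u << TVR_BITS)
#define TVN_SIZE (1u << TVN_BITS)
#define TVR_MASK (TVR_SIZE - 1)
#define TVN_MASK (TVN_SIZE - 1)

struct wheel_slot {
	int level;            /* 1..5, standing for tv1..tv5 */
	unsigned int index;   /* bucket within that timer vector */
};

/* Total number of buckets across all five timer vectors. */
unsigned int wheel_total_buckets(void)
{
	return TVR_SIZE + 4 * TVN_SIZE;
}

/* Pick the category from the distance to expiry, then the bucket from
 * the matching bit field of the absolute expiry time. */
struct wheel_slot wheel_slot_for(uint32_t now, uint32_t expires)
{
	uint32_t delta = expires - now;   /* ticks into the future */
	struct wheel_slot s;

	if (delta < TVR_SIZE) {
		s.level = 1;
		s.index = expires & TVR_MASK;
	} else if (delta < (1u << (TVR_BITS + TVN_BITS))) {
		s.level = 2;
		s.index = (expires >> TVR_BITS) & TVN_MASK;
	} else if (delta < (1u << (TVR_BITS + 2 * TVN_BITS))) {
		s.level = 3;
		s.index = (expires >> (TVR_BITS + TVN_BITS)) & TVN_MASK;
	} else if (delta < (1u << (TVR_BITS + 3 * TVN_BITS))) {
		s.level = 4;
		s.index = (expires >> (TVR_BITS + 2 * TVN_BITS)) & TVN_MASK;
	} else {
		s.level = 5;
		s.index = (expires >> (TVR_BITS + 3 * TVN_BITS)) & TVN_MASK;
	}
	assert(s.index < TVR_SIZE);       /* sanity: never out of range */
	return s;
}
```

With now == 0, timeouts of 260 and 265 ticks both map to the same tv2 bucket, which is the "partially sorted" property: only the high bits of the expiry time are used until the timer is cascaded.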
as you can see, we only used 256+64+64+64+64 == 512 buckets, but we've managed to map all 4294967295 future jiffies to these buckets! In other words: we've split up the 32 bits of 'timeout' value into 8+6+6+6+6 bits. [ you might ask: why dont we use an even number of buckets such as 8+8+8+8, which would simplify the code? The reason is mostly RAM footprint optimizations: an 8+8+8+8 splitup gives a total of 256+256+256+256 == 1024 buckets, which was considered a bit too high back when this code was designed. In fact, in recent 2.6 kernels, if CONFIG_BASE_SMALL is specified then we use a 6+4+4+4+4 splitup and round down the remaining 10 bits, which gives an embedded-friendly RAM footprint of 128 buckets. The 'splitup' is under constant revision and we might switch to the simpler (and slightly faster) 8+8+8+8 model in the future, for servers. ] how do we insert timers? In add_timer() we can calculate their "target category" in constant overhead (with at most 5 comparisons), and put the timer into that bucket. Note: unless it's in the first category, timers with different timeout values can end up in the same bucket. E.g. timers expiring at jiffy 260 and 265 will be both put into the first bucket of category 2. This means that timers in these buckets are 'partially sorted': they are only sorted in their highest bits, initially. So add_timer() is O(1) overhead. removal is simple: we remove the timer from the bucket, which is a list_del(), so O(1) overhead too. we knew that there's no free lunch, right? The main complication is how we do expiry. The first 256 jiffies are not a problem, because they are represented by the first array of buckets, so the expiry code only has to check whether there are any timers to be expired in that bucket. Expiry overhead is O(1) for these steps. But at jiffy 257 we do something special: the expiry code 'cascades' the first bucket of the second array 'down into' the first 256 buckets. 
It does it the hard way: walks the list of timers in that bucket (if any), and removes them from that list and inserts them into one of the first 256 buckets (depending on what the timeout value of that timer is). Then the expiry code goes back to bucket 1, and expires the timers there (if any). The expiry code keeps a persistent running index for every category, and if that index overflows back to 1, it increments the next category's index by one and 'cascades down' timers from that bucket into the previous category. in other words: what happens is that we sort timers "piecemeal", first we order by the highest bits of their timeout value, then we sort by the lower bits too - in the end they are fully sorted. If all timers expire and are never removed then we have still won relative to the fully-sorted-list approach: all timers will end up fully sorted, and average per-timer expiry overhead is still O(1)! But the expiry worst-case is not constant-bounded: it is O(N). One cost is the burstiness of processing: a single step of cascading can take many timers to be processed (if they happen to be in that same bucket), and no timers may expire while we do that processing. (The average cost is still O(1), because we process every timer at most 5 times.) Another cost is that we touch (and dirty) the timers again and again during their lifetime, bringing them into cache multiple times. But there's a hidden win as well from this approach: if a timer is removed before it expires, we've saved the remaining cascading steps! This happens surprisingly often: on a busy networked server, the majority of the timers never expire, and are removed before they have to be cascaded even once. in other words: we 'lazy sort' timers, and we push most of the sorting overhead as far into the future as possible, in the hope that the need to sort them goes away because they get removed before they expire.
(and even if we wanted, we couldn't sort earlier in this model, due to the RAM footprint limits) with all these details in mind, let's go back to Roman's assertion: > Whether the timer event is delivered or not is completely unimportant, > as at some point the event has to be removed anyway, so that > optimizing a timer for (non)delivery is complete nonsense. it is very much crucial whether a timer event is delivered. Think about the 'millions of network timers' case: most of them are removed before being cascaded even once! By removing early we might not have to propagate and sort the timer in any way: it is added to a bucket and soon removed from the same bucket. Ingo ^ permalink raw reply [flat|nested] 67+ messages in thread
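The 'lazy sort' effect described in this message can be demonstrated with a two-level toy wheel. This is a simplified userspace sketch and not the kernel/timer.c implementation: it models only tv1 (256 buckets) and tv2 (64 buckets), and all names (wtimer, wheel, wheel_add, wheel_del, wheel_tick, wheel_demo, demo_wheel) are hypothetical. Two timers are armed for tick 300; one is removed at tick 100, the other is left to expire.

```c
#include <assert.h>

#define TV1_SIZE 256u   /* one tick per bucket */
#define TV2_SIZE 64u    /* 256 ticks per bucket */

struct wtimer {
	struct wtimer *prev, *next;
	unsigned long expires;    /* absolute expiry tick */
	long fired_at;            /* tick at which it fired, -1 if never */
};

struct wheel {
	unsigned long now;        /* next tick to be processed */
	unsigned long cascaded;   /* timers moved from tv2 down into tv1 */
	unsigned long fired;      /* timers that actually expired */
	struct wtimer tv1[TV1_SIZE];
	struct wtimer tv2[TV2_SIZE];
};

static struct wheel demo_wheel;

static void wheel_init(struct wheel *w)
{
	unsigned int i;

	w->now = w->cascaded = w->fired = 0;
	for (i = 0; i < TV1_SIZE; i++)
		w->tv1[i].prev = w->tv1[i].next = &w->tv1[i];
	for (i = 0; i < TV2_SIZE; i++)
		w->tv2[i].prev = w->tv2[i].next = &w->tv2[i];
}

/* O(1) removal: unlink from whichever bucket holds the timer. */
static void wheel_del(struct wtimer *t)
{
	t->prev->next = t->next;
	t->next->prev = t->prev;
	t->prev = t->next = t;
}

/* O(1) insertion: choose the bucket from the distance to expiry. */
static void wheel_add(struct wheel *w, struct wtimer *t)
{
	unsigned long delta = t->expires - w->now;
	struct wtimer *head;

	assert(delta < TV1_SIZE * TV2_SIZE);   /* two levels only in this toy */
	if (delta < TV1_SIZE)
		head = &w->tv1[t->expires % TV1_SIZE];
	else
		head = &w->tv2[(t->expires / TV1_SIZE) % TV2_SIZE];
	t->prev = head->prev;
	t->next = head;
	head->prev->next = t;
	head->prev = t;
}

/* Process one tick: cascade when tv1 wraps, then expire the tv1 bucket. */
static void wheel_tick(struct wheel *w)
{
	struct wtimer *head;

	if (w->now % TV1_SIZE == 0) {
		head = &w->tv2[(w->now / TV1_SIZE) % TV2_SIZE];
		while (head->next != head) {
			struct wtimer *t = head->next;

			wheel_del(t);
			wheel_add(w, t);   /* now lands in tv1 */
			w->cascaded++;
		}
	}
	head = &w->tv1[w->now % TV1_SIZE];
	while (head->next != head) {
		struct wtimer *t = head->next;

		wheel_del(t);
		t->fired_at = (long)w->now;
		w->fired++;
	}
	w->now++;
}

/* Returns the tick at which the surviving timer fired. */
long wheel_demo(void)
{
	struct wtimer a = { .expires = 300, .fired_at = -1 };
	struct wtimer b = { .expires = 300, .fired_at = -1 };

	wheel_init(&demo_wheel);
	wheel_add(&demo_wheel, &a);
	wheel_add(&demo_wheel, &b);
	while (demo_wheel.now < 100)
		wheel_tick(&demo_wheel);
	wheel_del(&b);                 /* removed early: never cascaded */
	while (demo_wheel.now < 400)
		wheel_tick(&demo_wheel);
	assert(b.fired_at == -1);
	return a.fired_at;
}
```

In this run only the timer that survives until tick 256 is cascaded. The one removed at tick 100 costs one list insertion and one list removal and is never sorted any further, which is the saving the message attributes to early removal.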
* Re: kernel/timer.c design 2005-10-19 10:49 ` kernel/timer.c design (was: Re: ktimers subsystem) Ingo Molnar @ 2005-10-19 17:48 ` Tim Bird 2005-10-19 18:00 ` Tim Bird 2005-10-19 22:12 ` kernel/timer.c design (was: Re: ktimers subsystem) Roman Zippel 2 siblings, 0 replies; 67+ messages in thread From: Tim Bird @ 2005-10-19 17:48 UTC (permalink / raw) To: Ingo Molnar Cc: Roman Zippel, Andrew Morton, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg Ingo, Thanks for the excellent description of the timer wheel implementation. Ingo Molnar wrote: > One cost is the burstiness of processing: a single step of cascading can > take many timers to be processed (if they happen to be in that same > bucket)... > But there's a hidden win as well from this approach: if a timer is > removed before it expires, we've saved the remaining cascading steps! > This happens surprisingly often: on a busy networked server, the > majority of the timers never expire, and are removed before they have to > be cascaded even once. Unfortunately, this means that the actual costs of the wheel implementation vary depending on the relationship between HZ, the average timeout duration, and the bucket mappings (which, as you say, can be adjusted for size reasons.) This is one of the downsides of the wheel implementation. It's very difficult to tell in advance whether a particular timer load will cascade or not, making the costs (although bounded) unexpectedly variable. One solution (even suggested by Linus) for high resolution timers was to increase HZ and skip timer ticks. Unfortunately, this has a dramatic effect on the cost of cascading, and on the maximum duration available for timers. (By increasing HZ, you push more timers to higher tiers in the wheel, which means you potentially end up cascading them more often, even when they are removed before expiry.) These types of unexpected consequences are one good reason for avoiding use of the wheel for high res timers. 
============================= Tim Bird Architecture Group Chair, CE Linux Forum Senior Staff Engineer, Sony Electronics ============================= ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: kernel/timer.c design 2005-10-19 10:49 ` kernel/timer.c design (was: Re: ktimers subsystem) Ingo Molnar 2005-10-19 17:48 ` kernel/timer.c design Tim Bird @ 2005-10-19 18:00 ` Tim Bird 2005-10-19 19:04 ` Thomas Gleixner 2005-10-19 22:12 ` kernel/timer.c design (was: Re: ktimers subsystem) Roman Zippel 2 siblings, 1 reply; 67+ messages in thread From: Tim Bird @ 2005-10-19 18:00 UTC (permalink / raw) To: Ingo Molnar Cc: Roman Zippel, Andrew Morton, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg Ingo, Thanks for the excellent description of the timer wheel implementation. Ingo Molnar wrote: > One cost is the burstiness of processing: a single step of cascading can > take many timers to be processed (if they happen to be in that same > bucket)... > But there's a hidden win as well from this approach: if a timer is > removed before it expires, we've saved the remaining cascading steps! > This happens surprisingly often: on a busy networked server, the > majority of the timers never expire, and are removed before they have to > be cascaded even once. Unfortunately, this means that the actual costs of the wheel implementation vary depending on the relationship between HZ, the average timeout duration, and the bucket mappings (which, as you say, can be adjusted for size reasons.) This is one of the downsides of the wheel implementation. It's very difficult to tell in advance whether a particular timer load will cascade or not, making the costs (although bounded) unexpectedly variable. One solution (even suggested by Linus) for high resolution timers was to increase HZ and skip timer ticks. Unfortunately, this has a dramatic effect on the cost of cascading, and on the maximum duration available for timers. (By increasing HZ, you push more timers to higher tiers in the wheel, which means you potentially end up cascading them more often, even when they are removed before expiry.) 
These types of unexpected consequences are one good reason for avoiding use of the wheel for high res timers. ============================= Tim Bird Architecture Group Chair, CE Linux Forum Senior Staff Engineer, Sony Electronics ============================= ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: kernel/timer.c design
2005-10-19 18:00 ` Tim Bird
@ 2005-10-19 19:04 ` Thomas Gleixner
0 siblings, 0 replies; 67+ messages in thread
From: Thomas Gleixner @ 2005-10-19 19:04 UTC (permalink / raw)
To: Tim Bird
Cc: Ingo Molnar, Roman Zippel, Andrew Morton, george, linux-kernel,
johnstul, paulmck, hch, oleg

On Wed, 2005-10-19 at 11:00 -0700, Tim Bird wrote:
> > But there's a hidden win as well from this approach: if a timer is
> > removed before it expires, we've saved the remaining cascading steps!
> > This happens surprisingly often: on a busy networked server, the
> > majority of the timers never expire, and are removed before they have to
> > be cascaded even once.
>
> Unfortunately, this means that the actual costs of the wheel
> implementation vary depending on the relationship between HZ,
> the average timeout duration, and the bucket mappings (which,
> as you say, can be adjusted for size reasons.) This is one of
> the downsides of the wheel implementation. It's very difficult
> to tell in advance whether a particular timer load
> will cascade or not, making the costs (although bounded)
> unexpectedly variable.

Thats exactly the problem we described earlier in the ktimer discussion:

Changing HZ from 100 to 1000 while keeping the primary wheel size
unchanged caused increased cascading load.

HZ      CONFIG_BASE_SMALL=n     CONFIG_BASE_SMALL=y
100     2560 ms                 640 ms
250     1024 ms                 256 ms
1000     256 ms                  64 ms

A lot of timeouts are in the range of 500ms. While the HZ=100 and HZ=250
settings keep them in the primary wheel either until expiry or early
removal, HZ=1000 and CONFIG_BASE_SMALL with HZ > 100 make cascading more
likely when the system load goes up. Thats hard to balance for sure.

tglx

^ permalink raw reply [flat|nested] 67+ messages in thread
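The figures in the table above follow from simple arithmetic: the time covered by the primary wheel is its bucket count multiplied by the length of one jiffy. A small sketch, assuming 256 buckets for CONFIG_BASE_SMALL=n and 64 for CONFIG_BASE_SMALL=y as described earlier in the thread (the function names tv1_span_ms() and stays_in_tv1() are hypothetical):

```c
#include <assert.h>

/* Time span covered by the primary wheel, in milliseconds. */
unsigned int tv1_span_ms(unsigned int hz, unsigned int tv1_buckets)
{
	assert(hz > 0);
	return tv1_buckets * 1000u / hz;
}

/* Non-zero if a timeout of timeout_ms is short enough to be placed
 * directly in the primary wheel, i.e. it never needs cascading. */
int stays_in_tv1(unsigned int timeout_ms, unsigned int hz,
		 unsigned int tv1_buckets)
{
	return timeout_ms < tv1_span_ms(hz, tv1_buckets);
}
```

For the 500 ms timeouts mentioned above, stays_in_tv1(500, 250, 256) is true while stays_in_tv1(500, 1000, 256) and stays_in_tv1(500, 250, 64) are false, which matches the observation that HZ=1000, or CONFIG_BASE_SMALL with HZ > 100, pushes such timers out of the primary wheel.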
* Re: kernel/timer.c design (was: Re: ktimers subsystem) 2005-10-19 10:49 ` kernel/timer.c design (was: Re: ktimers subsystem) Ingo Molnar 2005-10-19 17:48 ` kernel/timer.c design Tim Bird 2005-10-19 18:00 ` Tim Bird @ 2005-10-19 22:12 ` Roman Zippel 2 siblings, 0 replies; 67+ messages in thread From: Roman Zippel @ 2005-10-19 22:12 UTC (permalink / raw) To: Ingo Molnar Cc: Tim Bird, Andrew Morton, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg Hi, On Wed, 19 Oct 2005, Ingo Molnar wrote: > > Whether the timer event is delivered or not is completely unimportant, > > as at some point the event has to be removed anyway, so that > > optimizing a timer for (non)delivery is complete nonsense. > > completely wrong! To explain this, let me first give you an introduction > to the design goals and implementation/optimization details of the > upstream kernel/timer.c code: I indeed made a mistake, thanks for pointing it out so elaborately. I'd like to mention something else here. It's rather bad style to start with "completely wrong!" and then continue to gloat with "let me give you an introduction", unless you intentionally want to insult me. Usually I would just ignore this, as it can happen to anyone, but I can find this style too often in your mails lately with the most obvious example of your "shut up or show code" comment. You're more busy trying to prove me wrong than addressing the actual issue. It never was my intention to discuss the kernel timer design (the one in timer.c you describe here), the original issue was and still is that "timer API" is a too generic term and you actually proved my point by using the terms timer and their timeout values very consistently in your description. It's possible I read this wrong, in that case I apologize already in advance, but please rethink the attitude you're showing, otherwise I'll reduce our conversation to a minimum. 
You certainly have the more detailed knowledge in this area, but you don't have to show it off like this. bye, Roman ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-19 1:58 ` Roman Zippel 2005-10-19 6:46 ` Ingo Molnar 2005-10-19 10:49 ` kernel/timer.c design (was: Re: ktimers subsystem) Ingo Molnar @ 2005-10-19 11:40 ` Ingo Molnar 2005-10-19 11:58 ` Ingo Molnar 3 siblings, 0 replies; 67+ messages in thread From: Ingo Molnar @ 2005-10-19 11:40 UTC (permalink / raw) To: Roman Zippel Cc: Tim Bird, Andrew Morton, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg * Roman Zippel <zippel@linux-m68k.org> wrote: > > so i can only repeat that ktimers is a generic timer subsystem, with a > > focus on _actually delivering a timer event_. > > It doesn't answer it at all. The new timer system is definitively not > "usable for any kernel purpose", it has certain properties, which > makes it only applicable under certain conditions. what "certain properties" and under what "certain conditions"? Please provide specifics to prove your point. I repeat for the third time: ktimers is a generic timer subsystem, with a focus on timer event delivery. Ingo ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-19 1:58 ` Roman Zippel ` (2 preceding siblings ...) 2005-10-19 11:40 ` [PATCH] ktimers subsystem 2.6.14-rc2-kt5 Ingo Molnar @ 2005-10-19 11:58 ` Ingo Molnar 2005-10-19 22:24 ` Roman Zippel 3 siblings, 1 reply; 67+ messages in thread From: Ingo Molnar @ 2005-10-19 11:58 UTC (permalink / raw) To: Roman Zippel Cc: Tim Bird, Andrew Morton, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg * Roman Zippel <zippel@linux-m68k.org> wrote: > > > > e.g. we dont want a watchdog from being > > > > overload-able via too many timeouts in the timer wheel ... > > > > > > Please explain. > > > > e.g. on busy networked servers (i.e. ones that do have a need for > > watchdogs) the timer wheel often includes large numbers of timeouts, > > 99.9% of which never expire. If they do expire en masse for whatever > > reason, then we can get into overload mode: a million timers might have > > to expire before we get to process the watchdog event and act upon it. > > This can delay the watchdog event significantly, which delay might (or > > might not) matter to the watchdog application. > > I already mentioned earlier that it's possible to reduce the timer > load by using a watchdog timer to filter most of these events, so that > you get into the interesting situation that most kernel timer actually > do expire and suddenly you easily can have more "timers" than > "timeouts". this sentence does not parse at all, for me. Here's the effort i made trying to decipher it: Firstly, you mention 'watchdog' without clarifying whether it's the exemplary watchdog we were talking about above, or whether it's some other, new mechanism. The former makes no sense (what does the watchdog timer in a random driver have to do with the millions of network timers i was talking about, and how could it be used to filter anything?), the latter you don't explain. 
Secondly, the above sentence is the first time in the ktimer discussion that you ever mentioned the word 'filter', and you never mentioned the word 'watchdog' outside of the example we were discussing, so i'm curious about the source of the above "I already mentioned earlier" statement. When earlier? Which email? Frankly, the whole paragraph reads as if from another planet, i see the words but the content seems totally out of context and makes no sense to me. So i cannot even agree or disagree with anything you said in that sentence, because the sentence simply does not parse. Please enlighten me! Ingo ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-19 11:58 ` Ingo Molnar @ 2005-10-19 22:24 ` Roman Zippel 0 siblings, 0 replies; 67+ messages in thread From: Roman Zippel @ 2005-10-19 22:24 UTC (permalink / raw) To: Ingo Molnar Cc: Tim Bird, Andrew Morton, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg Hi, On Wed, 19 Oct 2005, Ingo Molnar wrote: > Secondly, the above sentence is the first time in the ktimer discussion > that you ever mentioned the word 'filter', and you never mentioned the > word 'watchdog' outside of the example we were discussing, so i'm > curious about the source of the above "I already mentioned earlier" > statement. When earlier? Which email? http://marc.theaimsgroup.com/?l=linux-kernel&m=112752984710746&w=2 bye, Roman ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-17 18:49 ` Roman Zippel 2005-10-17 19:19 ` Tim Bird @ 2005-10-17 20:09 ` Ingo Molnar 1 sibling, 0 replies; 67+ messages in thread From: Ingo Molnar @ 2005-10-17 20:09 UTC (permalink / raw) To: Roman Zippel Cc: Tim Bird, Andrew Morton, tglx, george, linux-kernel, johnstul, paulmck, hch, oleg * Roman Zippel <zippel@linux-m68k.org> wrote: > > > It's rather simple: > > > - "timer API" vs "timeout API": I got absolutely no acknowlegement that this > > > might be a little confusing and in consequence "process timer" may be a > > > better name. > > > > I agree with Thomas on this one. Maybe "timer" and "timeout" are too close, > > but I think they are the most descriptive names. > > - timeout is something used for a timeout. Timeouts only actually > > expire infrequently, so they have a host of attributes associated > > with that characteristic. > > - timer is something used to time something. They almost always > > expire as part of their normal behaviour. In the ktimer code they > > have a host of attributes related to this characteristic. > > There is of course a difference, but is it big enough that they > deserve different APIs? Just look into <linux/timer.h> it doesn't > mention timeout once, but according to Thomas that's our "timeout > API". Look at the description of mod_timer() in timer.c: "modify a > timer's timeout". It seems I'm not only one who thinks that both are > closely related. this is one more area where there's no good substitute from 'walking the walk', i.e. getting yourself dirty with actual code. I have been involved with the following variants which were part of the -rt tree: - we implemented both timeouts and timers with the same timeout-optimized framework [i.e. with the 'wheel'] - it sucked. - timers and timeouts with a timer-optimized framework [i.e. with a binary tree] sucks too, due to the tree overhead. 
- we in fact tried another variant too: a hybrid method where timers and timeouts lived in the timer wheel and some time before (hr) timers were about to time out they were put into a separate hr-list. This hybrid solution sucked too. so then we tried a separate API and subsystem for both of them, and voila, many of the uglinesses went away, and things became robust. Ingo ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-17 16:25 ` Roman Zippel
2005-10-17 16:49 ` Tim Bird
@ 2005-10-17 20:55 ` Thomas Gleixner
2005-10-18 0:07 ` Roman Zippel
1 sibling, 1 reply; 67+ messages in thread
From: Thomas Gleixner @ 2005-10-17 20:55 UTC (permalink / raw)
To: Roman Zippel
Cc: Andrew Morton, Ingo Molnar, george, linux-kernel, johnstul,
paulmck, hch, oleg, tim.bird

On Mon, 2005-10-17 at 18:25 +0200, Roman Zippel wrote:
> It's rather simple:
> - "timer API" vs "timeout API": I got absolutely no acknowledgement that
> this might be a little confusing and in consequence "process timer" may be
> a better name.

Not only me, also a lot of other people do _not_ find it confusing and I
explained why it is a clear technical distinction. I also explained why
I think that process_timers is too restrictive IMO.

I accept that you find it confusing, but I understand neither what kind
of acknowledgement you want nor how you deduce my obligation for
acknowledging whatever.

> - I pointed out various (IMO) unnecessary complexities, which were rather
> quickly brushed off e.g. with a need for further (not closer specified)
> cleanups.

The so-called complexities are not various. You complained about exactly
5 members of the ktimer structure.

- list, expired, status, interval, overrun

which are superfluous in your opinion.

Again an explanation for each:

list:
allows fast access to the time sorted list without walking the rbtree
and is a preliminary for the extension to high resolution timers.
-----------
expired:
The field was added for simplification of some delta calculations in the
return path. e.g. nanosleep in the expired case to avoid the extra call
to get the current time. Also quite useful for debugging.
-----------
status:
A simple field, which stores at the moment 2 states and is necessary for
extensions to high resolution timers too, as we have more states there.
The suggested usage of the rbnode.parent pointer is wrong IMO as the
overloading of arbitrary pointers for status information is a kind of
pseudo optimization which in fact reduces maintainability and
clarity for the win of a 32bit variable.
-----------
interval, overrun:
Interval holds the converted interval value for itimers. The overrun
member is used by the rearm code so the caller can figure out the number
of missed events.

The cleanup I pointed out for the posix timer interval timers is pretty
obvious. It makes use of interval and overrun and removes two members of
the posix timer structure.
-----------

The size of the ktimer structure is a matter of micro optimizations in
the same way as the macros/inlines are.

Calling the pure existence of some struct members complexity is an
exaggeration and contradicts your own request for a simple and clear
design.

The implementation was done clear and simple from the very beginning and
I really dont understand why the preparation for further extensions in
the first place is bad.

Doing a design with the final goal in mind is much cleaner than doing
micro optimizations in the first place and afterwards working around
them when you apply extensions.

> - resolution handling: at what resolution should/does the kernel work and
> what do we report to user space. The spec allows multiple interpretations
> and I have a hard time to get at least one coherent interpretation out of
> Thomas.

I interpret the spec in the way I do for the following reasons:

1. It is _usual practice_ to return the "timer" resolution for
clock_res() and to return clock values with as much resolution as
possible. In no case should the actual clock resolution be less than
what clock_res() returns.
- George Anzinger in this thread. Similar opinions can be found via
Google. I came to the same conclusion and saw no reason to repeat
George's statement. I thought a simple pointer would be sufficient.

2. The rounding to the resolution value is explicitly required by the
standard.

3. It makes a lot of sense to do what (1.) describes, due to the fact
that we actually want to restrict the timer resolution to avoid
interrupt and reprogramming floods in very short intervals. This in fact
is the default behaviour in a jiffy driven environment. Pretending a
real nsec resolution and doing no rounding at all is violating (2.).

From an application programmer's view it makes sense to return the timer
resolution so he actually can adjust the program behaviour on the
provided resolution and not rely on wild guess assumptions about what
might be there. Applications need to be able to verify whether the
system can handle the required intervals or not.

tglx

^ permalink raw reply [flat|nested] 67+ messages in thread
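For readers following the resolution argument, this is roughly how a userspace program can query the reported resolution and apply the round-up rule from point (2.). It is a sketch under two assumptions: that "clock_res()" above refers to the POSIX clock_getres() call, and that the rounding in question is the rule that a timer value lying between two multiples of the resolution is rounded up to the larger multiple. The helper names clock_res_ns() and round_up_to_res() are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <time.h>

/* Reported resolution of a clock in nanoseconds, or -1 on error. */
long long clock_res_ns(clockid_t id)
{
	struct timespec ts;

	if (clock_getres(id, &ts) != 0)
		return -1;
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Round a non-negative interval up to the next multiple of the
 * resolution; exact multiples are left unchanged. */
long long round_up_to_res(long long ns, long long res_ns)
{
	assert(ns >= 0 && res_ns > 0);
	return (ns + res_ns - 1) / res_ns * res_ns;
}
```

An application could compare its required interval against clock_res_ns(CLOCK_REALTIME) to decide whether the system can honour it. With a 1 ms resolution, for example, a request of 1.5 ms would become 2 ms under this rule, while a request of exactly 2 ms stays at 2 ms.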
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-17 20:55 ` Thomas Gleixner @ 2005-10-18 0:07 ` Roman Zippel 2005-10-18 1:03 ` George Anzinger 0 siblings, 1 reply; 67+ messages in thread From: Roman Zippel @ 2005-10-18 0:07 UTC (permalink / raw) To: Thomas Gleixner Cc: Andrew Morton, Ingo Molnar, george, linux-kernel, johnstul, paulmck, hch, oleg, tim.bird Hi, On Mon, 17 Oct 2005, Thomas Gleixner wrote: > On Mon, 2005-10-17 at 18:25 +0200, Roman Zippel wrote: > > It's rather simple: > > - "timer API" vs "timeout API": I got absolutely no acknowlegement that > > this might be a little confusing and in consequence "process timer" may be > > a better name. > > Not only me, also a lot of other people do _not_ find it confusing and I > explained why it is a clear technical distcinction. I also explained why > I think that process_timers is too restrictive IMO. People don't find it confusing, exactly because it gives them the wrong idea about it, neither "API" is restricted to just timeouts or timer. I don't insist on the term "process timer", but I'd really like to find something better than ktimer. We already have kernel timer API, which is the primary API for kernel usage (for both timer and timeouts). > list: > allows fast access to the time sorted list without walking the rbtree > and is a preliminary for the extension to high resolution timers. Only access to first element is needed, which can be cached in the base. Please explain the second part. > expired: > The field was added for simplification of some delta calculations in the > return path. e.g. nanosleep in the expired case to avoid the extra call > to get the current time. Also quite useful for debugging. The return path can also get it from the base. > status: > A simple field, which stores at the moment 2 states and is necessary for > extensions to high resolution timers too, as we have more states there. 
> The suggested usage of the rbnode.parent pointer is wrong IMO as the > overloading of arbitrary pointers for status information is a kind of > pseudo optimization which is reducing in fact maintainability and > clarity for a the win of a 32bit variable. Testing a pointer is not "arbitrary", we do it all the time in the kernel. > interval, overrun: > Interval holds the converted interval value for itimers. The overrun > member is used by the rearm code so the caller can figure out the number > of missed events. > > The cleanup I pointed out for the posix timer interval timers is pretty > obvious. It makes use of interval and overrun and removes two members of > the posix timer structure. Where I think it's possible to separate the timer from the interval functionality to get a simpler timer base implementation. > The size of the ktimer structure is a matter of micro optimizations in > the same way as the macros/inlines are. Not really, these fields have to be initialized and maintained, which quickly goes beyond "micro optimizations". > Calling the pure existence of some struct members complexity is an > exaggeration and contradicts your own request for a simple and clear > design. That's not all what I had it in mind regarding complexity, I just started with the more simpler parts and never got to the more complex part. > Doing a design with the final goal in mind is much cleaner than doing > micro optimizations in the first place and afterwards working around > them when you apply extensions. This is fine, but then you should explain them, I'm not a mind reader, so that I can guess what you're planning. > > - resolution handling: at what resolution should/does the kernel work and > > what do we report to user space. The spec allows multiple interpretations > > and I have a hard time to get at least one coherent interpretation out of > > Thomas. > > I interpret the spec in the way I do for following reasons: > > 1. 
It is _usual practice_ to return the "timer" resolution for > clock_res() and to return clock values with as much resolution as > possible. In no case should the actual clock resolution be less than > what clock_res() returns. > - George Anzinger in this thread. Similar opinions can be found via > Google. I came to the same conclusion and saw no reason to repeat > Georges statement. I thought a simple pointer would be sufficient. In this case you don't interpret the spec, you ignore the spec. (I'll leave it open whether that's a good or bad thing.) > 2. The rounding to the resolution value is explicitly required by the > standard. It doesn't explicitly specify which resolution (see the previous mail). It doesn't explicitly specify how this rounding has to be implemented. > 3. It makes a lot of sense to do what (1.) describes, due to the fact > that we actually want to restrict the timer resolution to avoid > interrupt and reprogramming floods in very short intervals. This in fact > is the default behaviour in a jiffy driven environment. Pretending a > real nsec resolution and doing no rounding at all is violating (2.). > >From an application programmers view it makes sense to return the timer > resolution so he actually can adjust the program behaviour on the > provided resolution and not rely on wild guess assumptions about what > might be there. Applications need to be able to verify whether the > system can handle the required intervals or not. A portable application simply cannot make this assumption. Anyway, it's rather confusing how you ignore the spec, when "it makes a lot of sense" and OTOH how you can stick to the spec. I honestly don't know how to argue on this basis, where the spec can be arbitrarily redefined based on undocumented assumptions. bye, Roman ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-18 0:07 ` Roman Zippel @ 2005-10-18 1:03 ` George Anzinger 2005-10-19 1:26 ` Roman Zippel 0 siblings, 1 reply; 67+ messages in thread From: George Anzinger @ 2005-10-18 1:03 UTC (permalink / raw) To: Roman Zippel Cc: Thomas Gleixner, Andrew Morton, Ingo Molnar, linux-kernel, johnstul, paulmck, hch, oleg, tim.bird Roman Zippel wrote: > Hi, > > On Mon, 17 Oct 2005, Thomas Gleixner wrote: > > >>On Mon, 2005-10-17 at 18:25 +0200, Roman Zippel wrote: >> ~ >>interval, overrun: >>Interval holds the converted interval value for itimers. The overrun >>member is used by the rearm code so the caller can figure out the number >>of missed events. >> >>The cleanup I pointed out for the posix timer interval timers is pretty >>obvious. It makes use of interval and overrun and removes two members of >>the posix timer structure. > > > Where I think it's possible to separate the timer from the interval > functionality to get a simpler timer base implementation. They are required fields for the POSIX timer. I think you are saying that they should be there and not in the ktime struct, which is part of the POSIX timer struct. Is that right? Along this line, I have a bit of a problem with the ktimer code doing the timer repeat stuff. This is NOT used by POSIX timers because we want to wait for the user to pick up the signal before starting the next interval. This is key to avoiding timer storms and I would think that puting the repeat stuff in ktimer code opens it to the possibility of other users starting a timer storm via this. I think the itimer code should also use the signal call back to start the next interval, and for the same reason. > > ~ >>>- resolution handling: at what resolution should/does the kernel work and >>>what do we report to user space. The spec allows multiple interpretations >>>and I have a hard time to get at least one coherent interpretation out of >>>Thomas. 
>> >>I interpret the spec in the way I do for following reasons: >> >>1. It is _usual practice_ to return the "timer" resolution for >>clock_res() and to return clock values with as much resolution as >>possible. In no case should the actual clock resolution be less than >>what clock_res() returns. >>- George Anzinger in this thread. Similar opinions can be found via >>Google. I came to the same conclusion and saw no reason to repeat >>Georges statement. I thought a simple pointer would be sufficient. > > > In this case you don't interpret the spec, you ignore the spec. (I'll > leave it open whether that's a good or bad thing.) Eh? Granted we don't truncate the time on settime, but how else is it ignored? > > >>2. The rounding to the resolution value is explicitly required by the >>standard. > > > It doesn't explicitly specify which resolution (see the previous mail). > It doesn't explicitly specify how this rounding has to be implemented. In the timer_settime() call there is only one possible resolution refered to, that of the specified clock. The standard says(http://www.opengroup.org/onlinepubs/009695399/functions/timer_settime.html): Time values that are between two consecutive non-negative integer multiples of the resolution of the specified timer shall be rounded up to the larger multiple of the resolution. Quantization error shall not cause the timer to expire earlier than the rounded time value. This says a) round to the next resolution, and b) don't allow the resulting timer to expire early. The implication is that timers are to expire on resolution boundries so we need to round such that the expire happens _after_ the rounded time. Am I missing something here? The assumption, that I think you made, that we can let the hardware do the rounding runs contrary to one of the main reasons for resolution, i.e. to group timers so that we can reduce the system overhead. 
Just because we have timer hardware with microsecond resolution is not
reason enough to offer it to the user, as handling an interrupt every
microsecond is way too much overhead, and, in most cases, the user doesn't
even want such a fine resolution.

>
>
>>3. It makes a lot of sense to do what (1.) describes, due to the fact
>>that we actually want to restrict the timer resolution to avoid
>>interrupt and reprogramming floods in very short intervals. This in fact
>>is the default behaviour in a jiffy driven environment. Pretending a
>>real nsec resolution and doing no rounding at all is violating (2.).
>>From an application programmers view it makes sense to return the timer
>>resolution so he actually can adjust the program behaviour on the
>>provided resolution and not rely on wild guess assumptions about what
>>might be there. Applications need to be able to verify whether the
>>system can handle the required intervals or not.
>
>
> A portable application simply cannot make this assumption.

POSIX clocks and timers are part of the REAL TIME POSIX extension. Arguing
that real time apps need to be portable is, I think, rather beside the
point. At the same time, if rounding follows the rules, one can set up a
timer_settime()/timer_gettime() sequence to get the resolution; even with
the itimer one can do this. So resolution is available to the user in one
way or another. What he does with it is up to him, but at least some RT
apps set up a timer to expire early and, after expiry, busy wait until the
"appointed" time. Knowing the resolution helps to know how to set this
up...
~
--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/

^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-18 1:03 ` George Anzinger @ 2005-10-19 1:26 ` Roman Zippel 2005-10-19 2:52 ` George Anzinger 0 siblings, 1 reply; 67+ messages in thread From: Roman Zippel @ 2005-10-19 1:26 UTC (permalink / raw) To: George Anzinger Cc: Thomas Gleixner, Andrew Morton, Ingo Molnar, linux-kernel, johnstul, paulmck, hch, oleg, tim.bird Hi, On Mon, 17 Oct 2005, George Anzinger wrote: > > Where I think it's possible to separate the timer from the interval > > functionality to get a simpler timer base implementation. > > They are required fields for the POSIX timer. I think you are saying that > they should be there and not in the ktime struct, which is part of the POSIX > timer struct. Is that right? Basically, yes. I would take some simpler steps in creating the new timer system. Thomas' patch introduces multiple concepts at once, which are hard to digest via a simple review. As it looks right now I have to take the patch apart myself and split it into simpler patches. > > > 2. The rounding to the resolution value is explicitly required by the > > > standard. > > > > > > It doesn't explicitly specify which resolution (see the previous mail). > > It doesn't explicitly specify how this rounding has to be implemented. > > In the timer_settime() call there is only one possible resolution refered to, > that of the specified clock. The standard > says(http://www.opengroup.org/onlinepubs/009695399/functions/timer_settime.html): > > Time values that are between two consecutive non-negative integer multiples of > the resolution of the specified timer shall be rounded up to the larger > multiple of the resolution. Quantization error shall not cause the timer to > expire earlier than the rounded time value. > > This says a) round to the next resolution, and b) don't allow the resulting > timer to expire early. 
The implication is that timers are to expire on > resolution boundries so we need to round such that the expire happens _after_ > the rounded time. > > Am I missing something here? In short: rounding errors. Above says IOW if we have a clock with a freqency f and a resolution with r=10^9/f, we have to round time t up so that it becomes a integer multiple i of r, so that once the counter reaches the value i all timer with upto a time value of i*r are expired. If we now simply ignore the resolution fraction, we get a rounded value which is quickly far away from the real value (with a worst case of r-1 nsec). This means an explicit rounding is likely only to make things worse and any rounding is better done as part of the conversion from/to timespec to/from the counter value according to the above rules and even this conversion should be avoided as much as possible to minimize rounding errors. > The assumption, that I think you made, that we can let the hardware do the > rounding runs contrary to one of the main reasons for resolution, i.e. to > group timers so that we can reduce the system overhead. Just because we have > timer hardware with microsecond resolution is not reason enough to offer it to > the user as handling an interrupt every micro second is way too much overhead, > and, in most cases, the user doesn't even want to such a fine resolution. This just means that we have two values to describe a timer, the clock resolution describes the precision with which the timer can be programmed and the timer precision which describes the maximum frequency of timer expiry. I think both values are of interest to user applications, so my current preference is to actually export them both properly instead of overloading the clock_getres() interface. 
The spec allows both resolutions:

"an implementation (is required) to document the resolution supported for
timers and nanosleep() if they differ from the supported clock resolution"

This means that unfortunately only one can be determined at runtime via
standard means, so if we are going to create nonportable interfaces, we
should do it at least properly.

bye, Roman

^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-19 1:26 ` Roman Zippel @ 2005-10-19 2:52 ` George Anzinger 2005-10-21 16:22 ` Roman Zippel 0 siblings, 1 reply; 67+ messages in thread From: George Anzinger @ 2005-10-19 2:52 UTC (permalink / raw) To: Roman Zippel Cc: Thomas Gleixner, Andrew Morton, Ingo Molnar, linux-kernel, johnstul, paulmck, hch, oleg, tim.bird Roman Zippel wrote: > Hi, > > On Mon, 17 Oct 2005, George Anzinger wrote: > ~ > >>>>2. The rounding to the resolution value is explicitly required by the >>>>standard. >>> >>> >>>It doesn't explicitly specify which resolution (see the previous mail). >>>It doesn't explicitly specify how this rounding has to be implemented. >> >>In the timer_settime() call there is only one possible resolution referred to, >>that of the specified clock. The standard >>says(http://www.opengroup.org/onlinepubs/009695399/functions/timer_settime.html): >> >>Time values that are between two consecutive non-negative integer multiples of >>the resolution of the specified timer shall be rounded up to the larger >>multiple of the resolution. Quantization error shall not cause the timer to >>expire earlier than the rounded time value. >> >>This says a) round to the next resolution, and b) don't allow the resulting >>timer to expire early. The implication is that timers are to expire on >>resolution boundaries so we need to round such that the expire happens _after_ >>the rounded time. >> >>Am I missing something here? > > > In short: rounding errors. > > Above says IOW if we have a clock with a frequency f and a resolution with > r=10^9/f, we have to round time t up so that it becomes a integer multiple > i of r, so that once the counter reaches the value i all timer with up to a > time value of i*r are expired. > > If we now simply ignore the resolution fraction, we get a rounded value > which is quickly far away from the real value (with a worst case of r-1 > nsec). 
This means an explicit rounding is likely only to make things > worse and any rounding is better done as part of the conversion from/to > timespec to/from the counter value according to the above rules and even > this conversion should be avoided as much as possible to minimize rounding > errors. I think the rounding errors you are talking about would require us to define the clock period in something finer than nanoseconds. The usual practice is to work with a resolution specified in nanoseconds (which is the same units the user hands us). We then only worry about the last "resolution" or so of the elapsed time, rather than going back to the beginning of time. The math becomes harder when converting to a particular timer with resolution in the nanosecond area, as, for example, the TSC. Here we use what I call "scaled math" to both improve resolution and accuracy and to avoid the evil div instruction. It is rather easy to get accuracy down to a few parts per billion. I really don't think the math, however, is the issue here. Rather I think you would like to turn the hardware resolution into the resolution we use and send to the user. This, I think, is not quite the right way to go. Suppose, for example, we have a timer that will do micro second resolution. To provide this to the user implies that he is free to ask for timers that expire every micro second. Today, this is not really a wise thing to do as we would soon use all the cpu cycles doing interrupt overhead. So we define a resolution, say 100 micro seconds, and set things up that way. This means we, at most, need to handle timer interrupts once every 100 usecs (still not really wise, put possible with some of todays hardware). Now, if the timer we use actually has a resolution of 1.33333 usec, do we want to use a multiple of this as our resolution? Not really, folks would just get confused. We can just tell them it is 100usec and do the math. 
The errors introduced by this are, at most, 1.3333 usec, and they are NOT cumulative, as long as we do the math for each expiry. (If we try to compute a LATCH to use to get 100 usec periods, we will accumulate errors, so why do that?) A jitter of 1.3333 usec is well under the radar, being lost in the interrupt overhead. > > >>The assumption, that I think you made, that we can let the hardware do the >>rounding runs contrary to one of the main reasons for resolution, i.e. to >>group timers so that we can reduce the system overhead. Just because we have >>timer hardware with microsecond resolution is not reason enough to offer it to >>the user as handling an interrupt every micro second is way too much overhead, >>and, in most cases, the user doesn't even want to such a fine resolution. > > > This just means that we have two values to describe a timer, the clock > resolution describes the precision with which the timer can be programmed > and the timer precision which describes the maximum frequency of timer > expiry. I think both values are of interest to user applications, so my > current preference is to actually export them both properly instead of > overloading the clock_getres() interface. But, as I say above, we don't want to export the hardware detail, but an abstraction we build on top of it. Suppose we don't want to provide 100 usec timers except where really needed. We could provide a different abstraction that has, say 10 ms resolution. We could then set things up so that the user gets this all most all the time, say by define CLOCK_REALTIME with this resolution. We then might define CLOCK_REALTIME_HR to have a resolution of 100 usec. The user who needs it will realize that it has higher overhead (else why would we make it a bit harder to get to), and use it only when he needs the resolution it provides. There is no reason that both of these "clocks" can not use the same underlying code and hardware. At the same time they do not have to. 
>
> The spec allows both resolutions:
>
> "an implementation (is required) to document the resolution supported for
> timers and nanosleep() if they differ from the supported clock resolution"

What we want to do, and what is done by others, is to define different
clocks which carry their resolution to the timers used on them. This is a
little orthogonal to the standard, but seems to be a reasonable extension.

>
> This means that unfortunately only one can be determined at runtime via
> standard means, so if we are going to create non portable interfaces, we
> should do it at least properly.
>
> bye, Roman

--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/

^ permalink raw reply [flat|nested] 67+ messages in thread
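The "scaled math" George mentions in this message is not spelled out in the thread. A minimal sketch of one common way to do it follows, assuming a precomputed fixed-point multiplier and a shift in place of a division per conversion. The names scaled_mult(), cycles_to_ns() and SCALE_SHIFT are hypothetical and are not taken from the HRT or ktimers code; the 750 kHz counter merely approximates the 1.3333 usec hardware period used as an example above.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define SCALE_SHIFT  22 /* hypothetical choice; more bits give more accuracy */

/* Computed once per clock source: nanoseconds per cycle, scaled by 2^SCALE_SHIFT. */
static uint64_t scaled_mult(uint64_t freq_hz)
{
    assert(freq_hz > 0);
    return (NSEC_PER_SEC << SCALE_SHIFT) / freq_hz;
}

/* Hot path: one multiply and one shift, no division. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t mult)
{
    /* The 64-bit product must not overflow, so this is only for short deltas. */
    assert(mult > 0 && cycles <= UINT64_MAX / mult);
    return (cycles * mult) >> SCALE_SHIFT;
}
```

With a 750 kHz counter, cycles_to_ns(75, scaled_mult(750000)) yields 99999 rather than 100000, because the multiplier is truncated. The error stays in the sub-nanosecond-per-conversion range as long as the delta is short enough not to overflow the product.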
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-19 2:52 ` George Anzinger @ 2005-10-21 16:22 ` Roman Zippel 2005-10-23 18:17 ` George Anzinger 0 siblings, 1 reply; 67+ messages in thread From: Roman Zippel @ 2005-10-21 16:22 UTC (permalink / raw) To: George Anzinger Cc: Thomas Gleixner, Andrew Morton, Ingo Molnar, linux-kernel, johnstul, paulmck, hch, oleg, tim.bird Hi, On Tue, 18 Oct 2005, George Anzinger wrote: > > Above says IOW if we have a clock with a frequency f and a resolution with > > r=10^9/f, we have to round time t up so that it becomes a integer multiple i > > of r, so that once the counter reaches the value i all timer with up to a > > time value of i*r are expired. You don't specifically disagree, so I can assume you agree that this a valid interpretation of the spec? (I'm asking because it's important for the design of the timer system.) > > If we now simply ignore the resolution fraction, we get a rounded value > > which is quickly far away from the real value (with a worst case of r-1 > > nsec). This means an explicit rounding is likely only to make things worse > > and any rounding is better done as part of the conversion from/to timespec > > to/from the counter value according to the above rules and even this > > conversion should be avoided as much as possible to minimize rounding > > errors. > > I think the rounding errors you are talking about would require us to define > the clock period in something finer than nanoseconds. No, you don't have to, all you have to do is to make sure that "Quantization error shall not cause the timer to expire earlier than the rounded time value." IOW at the time the timer expires, the expiry time must not be greater than clock_gettime(). > Rather I think you would like to turn the hardware resolution into the > resolution we use and send to the user. This, I think, is not quite the right > way to go. Suppose, for example, we have a timer that will do micro second > resolution. 
To provide this to the user implies that he is free to ask for > timers that expire every micro second. Today, this is not really a wise thing > to do as we would soon use all the cpu cycles doing interrupt overhead. So we > define a resolution, say 100 micro seconds, and set things up that way. This > means we, at most, need to handle timer interrupts once every 100 usecs (still > not really wise, put possible with some of todays hardware). > > Now, if the timer we use actually has a resolution of 1.33333 usec, do we want > to use a multiple of this as our resolution? Not really, folks would just get > confused. We can just tell them it is 100usec and do the math. The errors > introduced by this are, at most, 1.3333 usec, and they are NOT cumulative, as > long as we do the math for each expiry. (If we try to compute a LATCH to use > to get 100 usec periods, we will accumulate errors, so why do that?) A jitter > of 1.3333 usec is well under the radar, being lost in the interrupt overhead. No, the error is worse, although I specifically talk about the rounding done in Thomas' patch, I'm not sure we're really talking about the same thing. I didn't mean the error caused by jitter, in this case I'd actually agree with you. He sets the resolution right now to (NSEC_PER_SEC/HZ) and uses this value to explicit round the time values. For example a timer is set to the value 1.1ms and is rounded to 2ms. The timer tick now actually expires at 1.2ms and could expire the timer, but it's instead expired at 2.2ms and user space sees an error of 1.1ms. A similiar error even exists with interval timer, e.g. an interval timer is set to 0.9ms and rounded to 1ms. If the clock now expires a little too early the timer will expire repeatedly one tick too late. In general due to this rounding and normal clock skew an extra error is added with an average value of half the timer resolution. 
> But, as I say above, we don't want to export the hardware detail, but an > abstraction we build on top of it. Suppose we don't want to provide 100 usec > timers except where really needed. We could provide a different abstraction > that has, say 10 ms resolution. We could then set things up so that the user > gets this all most all the time, say by define CLOCK_REALTIME with this > resolution. We then might define CLOCK_REALTIME_HR to have a resolution of > 100 usec. The user who needs it will realize that it has higher overhead > (else why would we make it a bit harder to get to), and use it only when he > needs the resolution it provides. > > There is no reason that both of these "clocks" can not use the same underlying > code and hardware. At the same time they do not have to. I don't have a problem with this at all. I think it's fine to leave clock_getres(CLOCK_REALTIME) at a save value. > > The spec allows both resolutions: > > > > "an implementation (is required) to document the resolution supported for > > timers and nanosleep() if they differ from the supported clock resolution" > > What we want to do, and what is done by others, is to define different clocks > which carry their resolution to the timers used on them. This is a little > orthogonal to the standard, but seems to be a reasonable extension. Could you please give me an example for "others"? I don't think I have a problem with this. My point is to better define "reasonable extension" or to be more specific what user expectations are reasonable. What is needed by applications and where exactly do we draw the line, when it comes to extra complexity in the timer design. I wouldn't a priori exclude the possibility to break some user applications, which have unreasonable expectations. We did this in the past (e.g. sched_yield()), we simply fixed the applications and moved on, but this requires more information about what applications expect about high resolution timer. 
George, many thanks for trying to understand me and helping me to get a
better understanding of the issues. I see more misunderstandings than
disagreements, so I'd be really grateful if you could help me later to
translate this into something Thomas and Ingo can understand. :-)

bye, Roman

^ permalink raw reply [flat|nested] 67+ messages in thread
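Roman's 1.1 ms example from this message can be replayed with a small sketch. It assumes a 1 ms resolution and a periodic tick that fires 0.2 ms past each millisecond boundary, which are the numbers used in the text. The helper names are illustrative only and do not come from the ktimers patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round t_ns up to the next multiple of res_ns (timer_settime() rule). */
static uint64_t round_up_to_resolution(uint64_t t_ns, uint64_t res_ns)
{
    assert(res_ns > 0);
    uint64_t rem = t_ns % res_ns;
    return rem ? t_ns + (res_ns - rem) : t_ns;
}

/* Time of the first periodic tick that is not earlier than t_ns.
 * Ticks fire at phase_ns, phase_ns + period_ns, phase_ns + 2 * period_ns, ... */
static uint64_t first_tick_at_or_after(uint64_t t_ns, uint64_t phase_ns,
                                       uint64_t period_ns)
{
    assert(period_ns > 0);
    if (t_ns <= phase_ns)
        return phase_ns;
    uint64_t n = (t_ns - phase_ns + period_ns - 1) / period_ns;
    return phase_ns + n * period_ns;
}
```

For a request of 1100000 ns, the first tick at or after the unrounded value is at 1200000 ns. After explicit rounding to 2000000 ns the first eligible tick is at 2200000 ns, which is the 1.1 ms error described above.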
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5 2005-10-21 16:22 ` Roman Zippel @ 2005-10-23 18:17 ` George Anzinger 2005-10-27 20:23 ` Roman Zippel 0 siblings, 1 reply; 67+ messages in thread From: George Anzinger @ 2005-10-23 18:17 UTC (permalink / raw) To: Roman Zippel Cc: Thomas Gleixner, Andrew Morton, Ingo Molnar, linux-kernel, johnstul, paulmck, hch, oleg, tim.bird Roman Zippel wrote: > Hi, > > On Tue, 18 Oct 2005, George Anzinger wrote: > > >>>Above says IOW if we have a clock with a frequency f and a resolution with >>>r=10^9/f, we have to round time t up so that it becomes a integer multiple i >>>of r, so that once the counter reaches the value i all timer with up to a >>>time value of i*r are expired. > > > You don't specifically disagree, so I can assume you agree that this a > valid interpretation of the spec? > (I'm asking because it's important for the design of the timer system.) I agree with the proviso that we can define such a clock as an abstraction of a clock with a better resolution. I.e. we can provide clocks with lesser resolution than the physical clock has. > > >>>If we now simply ignore the resolution fraction, we get a rounded value >>>which is quickly far away from the real value (with a worst case of r-1 >>>nsec). This means an explicit rounding is likely only to make things worse >>>and any rounding is better done as part of the conversion from/to timespec >>>to/from the counter value according to the above rules and even this >>>conversion should be avoided as much as possible to minimize rounding >>>errors. >> >>I think the rounding errors you are talking about would require us to define >>the clock period in something finer than nanoseconds. > > > No, you don't have to, all you have to do is to make sure that > "Quantization error shall not cause the timer to expire earlier than the > rounded time value." IOW at the time the timer expires, the expiry time > must not be greater than clock_gettime(). 
That should be "less than", but yes. The comment I was making is that the math is not that hard to get right. > > >>Rather I think you would like to turn the hardware resolution into the >>resolution we use and send to the user. This, I think, is not quite the right >>way to go. Suppose, for example, we have a timer that will do micro second >>resolution. To provide this to the user implies that he is free to ask for >>timers that expire every micro second. Today, this is not really a wise thing >>to do as we would soon use all the cpu cycles doing interrupt overhead. So we >>define a resolution, say 100 micro seconds, and set things up that way. This >>means we, at most, need to handle timer interrupts once every 100 usecs (still >>not really wise, put possible with some of todays hardware). >> >>Now, if the timer we use actually has a resolution of 1.33333 usec, do we want >>to use a multiple of this as our resolution? Not really, folks would just get >>confused. We can just tell them it is 100usec and do the math. The errors >>introduced by this are, at most, 1.3333 usec, and they are NOT cumulative, as >>long as we do the math for each expiry. (If we try to compute a LATCH to use >>to get 100 usec periods, we will accumulate errors, so why do that?) A jitter >>of 1.3333 usec is well under the radar, being lost in the interrupt overhead. > > > No, the error is worse, although I specifically talk about the rounding > done in Thomas' patch, I'm not sure we're really talking about the same > thing. I didn't mean the error caused by jitter, in this case I'd > actually agree with you. > > He sets the resolution right now to (NSEC_PER_SEC/HZ) and uses this value > to explicit round the time values. For example a timer is set to the value > 1.1ms and is rounded to 2ms. The timer tick now actually expires at 1.2ms > and could expire the timer, but it's instead expired at 2.2ms and user > space sees an error of 1.1ms. 
> A similiar error even exists with interval timer, e.g. an interval timer > is set to 0.9ms and rounded to 1ms. If the clock now expires a little > too early the timer will expire repeatedly one tick too late. > > In general due to this rounding and normal clock skew an extra error is > added with an average value of half the timer resolution. I admit I have not looked, in detail, at this part of ktimers, however, assuming that the clock ticks at HZ then the normal error to be expected is and average of 1/2 the resolution with a max of 1 resolution. This is AFTER the rounding to the next resolution, so we can expect the expiry to be any where from 0 to 2*resolution-1. (up to resolution-1 from rounding, and up to one resolution from clock skew). This the way I and every one I have worked with understand the standard. In your example, consider a request for 0.1ms rounded to 1ms.... > > >>But, as I say above, we don't want to export the hardware detail, but an >>abstraction we build on top of it. Suppose we don't want to provide 100 usec >>timers except where really needed. We could provide a different abstraction >>that has, say 10 ms resolution. We could then set things up so that the user >>gets this all most all the time, say by define CLOCK_REALTIME with this >>resolution. We then might define CLOCK_REALTIME_HR to have a resolution of >>100 usec. The user who needs it will realize that it has higher overhead >>(else why would we make it a bit harder to get to), and use it only when he >>needs the resolution it provides. >> >>There is no reason that both of these "clocks" can not use the same underlying >>code and hardware. At the same time they do not have to. > > > I don't have a problem with this at all. I think it's fine to leave > clock_getres(CLOCK_REALTIME) at a save value. 
> > >>>The spec allows both resolutions: >>> >>>"an implementation (is required) to document the resolution supported for >>>timers and nanosleep() if they differ from the supported clock resolution" >> >>What we want to do, and what is done by others, is to define different clocks >>which carry their resolution to the timers used on them. This is a little >>orthogonal to the standard, but seems to be a reasonable extension. > > > Could you please give me an example for "others"? Well, I know that HP in the HPRT system (likely long gone by now) did it this way. That was back prior to 1997. The system was based on the HP PA risc arch which has a system timer based on a cycle counter (rather like the PPC, but different). > I don't think I have a problem with this. My point is to better define > "reasonable extension" or to be more specific what user expectations are > reasonable. What is needed by applications and where exactly do we draw > the line, when it comes to extra complexity in the timer design. > > I wouldn't a priori exclude the possibility to break some user > applications, which have unreasonable expectations. We did this in the > past (e.g. sched_yield()), we simply fixed the applications and moved on, > but this requires more information about what applications expect about > high resolution timer. We have had good acceptance of the HRT patch in our customer base. As far as I know we have not gotten any feed back on just what resolution they want or use. We allow them to define it (within reason) at configure time. I have been recommending, for the x86, nothing better than about 100usec but this is based on my machine being able to handle an interrupt in about that amount of time. We don't require alignment of the resolution with the actual hardware resolution as at these levels the interrupt jitter smooths over any issues in this area. 
A comment here: in some of your math examples you seem to be implying that
we are going to use a particular resolution from the beginning of time to
compute expiry time. In fact, we start from "now" as defined by a, possibly
corrected, system clock. Once we have the rounded expiry time we use full
resolution math to figure how to fit that into the timing services. So, in
fact, the resolution comes into play only over 1 to 2 resolutions of the
requested time. In other words, errors do not accumulate since we always
mark to the corrected clock.

>
> George, many thanks for trying to understand me and helping me to get a
> better understanding of the issues. I see more misunderstandings than
> disagreements, so I'd be really grateful, if you can help me later to
> translate this into something Thomas and Ingo can understand. :-)
>

No problem. Do be advised I will be out most of next week through the end
of Oct.

--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/

^ permalink raw reply [flat|nested] 67+ messages in thread
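George's point that errors do not accumulate when the math is redone for each expiry, while a fixed LATCH-style reload value does drift, can be illustrated with a rough sketch. The hardware rate used here (1193182 Hz, the i8253 PIT input clock) is an assumption chosen for illustration, since the thread only speaks of a timer with a period of about 1.3333 usec. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define HW_HZ        1193182ULL /* assumed hardware counter rate */
#define RES_NS       100000ULL  /* advertised resolution: 100 usec */

/* Hardware tick for the k-th expiry, recomputed from the absolute time
 * k * RES_NS and rounded up so the timer never fires early. */
static uint64_t expiry_tick_exact(uint64_t k)
{
    assert(k <= 100000000ULL); /* keeps the 64-bit product from overflowing */
    uint64_t tmp = k * RES_NS * HW_HZ;
    return tmp / NSEC_PER_SEC + (tmp % NSEC_PER_SEC != 0);
}

/* Same expiry using a fixed per-period reload value (LATCH style). */
static uint64_t expiry_tick_latch(uint64_t k)
{
    uint64_t latch = (RES_NS * HW_HZ + NSEC_PER_SEC / 2) / NSEC_PER_SEC;
    return k * latch;
}

/* Signed difference in ns between the time of a hardware tick and the
 * ideal expiry time k * RES_NS. */
static int64_t error_ns(uint64_t tick, uint64_t k)
{
    assert(tick <= UINT64_MAX / NSEC_PER_SEC);
    return (int64_t)(tick * NSEC_PER_SEC / HW_HZ) - (int64_t)(k * RES_NS);
}
```

Under these assumptions the recomputed expiry is always late by less than one hardware tick (about 838 ns), however large k becomes. The fixed reload value of 119 ticks is about 0.27 usec short per period, so after 10000 periods (one second) it is more than 2 ms early.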
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-23 18:17 ` George Anzinger
@ 2005-10-27 20:23 ` Roman Zippel
2005-10-28 4:52 ` Steven Rostedt
0 siblings, 1 reply; 67+ messages in thread
From: Roman Zippel @ 2005-10-27 20:23 UTC (permalink / raw)
To: George Anzinger
Cc: Thomas Gleixner, Andrew Morton, Ingo Molnar, linux-kernel,
johnstul, paulmck, hch, oleg, tim.bird

Hi,

On Sun, 23 Oct 2005, George Anzinger wrote:

> > > > Above says IOW if we have a clock with a frequency f and a resolution with
> > > > r=10^9/f, we have to round time t up so that it becomes a integer multiple i
> > > > of r, so that once the counter reaches the value i all timer with up to a
> > > > time value of i*r are expired.
> > > >
> > You don't specifically disagree, so I can assume you agree that this a valid
> > interpretation of the spec?
> > (I'm asking because it's important for the design of the timer system.)
>
> I agree with the proviso that we can define such a clock as an abstraction of
> a clock with a better resolution.  I.e. we can provide clocks with lesser
> resolution than the physical clock has.

I had a different aspect in mind: at what resolution are we doing the
calculations? Let's say we have a clock with a frequency of 300Hz, now we
could program the timer like this:

        tmp = time * 300;
        clock = tmp / 10^9 + (tmp % 10^9 != 0);

This rounds the time to the next clock count and as soon as the clock
reaches this count the timer is expired. OTOH we could also do this at the
timer interrupt:

        tmp = clock * 10^9;
        time = tmp / 300 + (tmp % 300 != 0);

and use time to expire all timers up to this time. In either case the
behaviour is exactly the same.
The problem is now that we can export the real resolution only as an
integer value. What consequences does this have for the kernel timer
implementation? Something like the above must be done anyway, so what's
the point in doing an extra rounding step?
For example if we set a timer to expire at 999999990ns, the next
interrupt is at 1000000000ns, but rounding it to 3333333ns means the
expiry time changes to 1003333233ns and the timer expires one clock tick
later. Which application seriously expects this kind of behaviour?

> I admit I have not looked, in detail, at this part of ktimers, however,
> assuming that the clock ticks at HZ then the normal error to be expected is
> and average of 1/2 the resolution with a max of 1 resolution.  This is AFTER
> the rounding to the next resolution, so we can expect the expiry to be any
> where from 0 to 2*resolution-1. (up to resolution-1 from rounding, and up to
> one resolution from clock skew).  This the way I and every one I have worked
> with understand the standard.

For relative timers I agree that the error can be twice the resolution.
First the value read from the clock must be rounded up and then we still
have to wait till the next clock tick.
OTOH for absolute timers we don't need the first step, we just have to
wait until the clock reaches this time. Why should we add an extra error
to it, if we can avoid it?
The spec actually says "a timer expiration signal is requested when the
associated clock reaches or exceeds the specified time." The clock
resolution causes the actual expiration time automatically to be a rounded
value of the requested value.

Next question would be what happens if timer and clock resolution differ?
For example if the clock has a resolution of 1us and the timer runs every
1ms. For relative timers this would mean we can keep the error within
1.001ms and for absolute timers within 1ms. Do we really have to force an
error larger than really necessary?

Interesting is now that Thomas doesn't take the clock resolution into
account at all. Let's say clock and timer resolution are 1ms (or HZ=1000).
If we program a normal kernel timer, we do something like this:

        timer->expires = jiffies + 1 + usecs_to_jiffies(timeout);

Thomas does now basically this:

        timer->expires = jiffies * res + round(timeout, res);

IOW if the clock resolution is larger than the interrupt delay, the timer
may expire early.

> We have had good acceptance of the HRT patch in our customer base.  As far as
> I know we have not gotten any feed back on just what resolution they want or
> use.  We allow them to define it (within reason) at configure time.  I have
> been recommending, for the x86, nothing better than about 100usec but this is
> based on my machine being able to handle an interrupt in about that amount of
> time.
>
> We don't require alignment of the resolution with the actual hardware
> resolution as at these levels the interrupt jitter smooths over any issues in
> this area.

I expected as much, so users who do care make sure the timer resolution
is good enough. In this case I would also expect that they are interested
in keeping the error as small as possible.

> A comment here, in some of your math examples you seem to be
> implying that we are going to use a particular resolution from the begining of
> time to compute expiry time.  In fact, we start from "now" as defined by a,
> possibly corrected, system clock.  Once we have the rounded expiry time we use
> full resolution math to figure how to fit that into the timing services.  So,
> infact, the resolution comes into play only over 1 to 2 resolutions of the
> requested time.  In other words, errors do not accumulate since we always mark
> to the corrected clock.

I didn't imply that, I tried to keep focus on the model as described in
the spec. I think we should keep the focus on the behaviour this model
describes, no user cares how the kernel implements the spec, just that the
visible behaviour matches the spec.

SUS rationale specifically says "The interfaces also allow flexibility in
the implementation of the functions. For example, ..." (in "Relationship
of Timers to Clocks"), i.e. there is not one true implementation and so I
think it's very well worth it to explore our options.
Reducing the whole design to a single number (the resolution returned by
clock_getres()) would be IMO very shortsighted. We could very well allow
the user to define his own timer based on various parameters, so he can
adjust the timer to his needs.

bye, Roman

^ permalink raw reply [flat|nested] 67+ messages in thread
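The 300Hz arithmetic in the message above can be checked with a few lines
of C. This is only a minimal sketch and not code from the ktimers patch:
the helper names ns_to_ticks_ceil(), ticks_to_ns_ceil() and
round_up_to_res() are hypothetical, "10^9" is taken to mean ten to the
ninth power (not C's XOR), and the products are assumed to fit in 64 bits.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: nanoseconds to clock ticks at hz, rounded up.
 * Mirrors "tmp = time * 300; clock = tmp / 10^9 + (tmp % 10^9 != 0)". */
uint64_t ns_to_ticks_ceil(uint64_t ns, uint64_t hz)
{
        uint64_t tmp = ns * hz;         /* assumes no 64-bit overflow */

        return tmp / NSEC_PER_SEC + (tmp % NSEC_PER_SEC != 0);
}

/* Hypothetical helper: clock ticks at hz to nanoseconds, rounded up.
 * Mirrors "tmp = clock * 10^9; time = tmp / 300 + (tmp % 300 != 0)". */
uint64_t ticks_to_ns_ceil(uint64_t ticks, uint64_t hz)
{
        uint64_t tmp = ticks * NSEC_PER_SEC;

        return tmp / hz + (tmp % hz != 0);
}

/* The "extra rounding step": round ns up to a multiple of an
 * integer resolution given in nanoseconds. */
uint64_t round_up_to_res(uint64_t ns, uint64_t res_ns)
{
        return (ns + res_ns - 1) / res_ns * res_ns;
}
```

With these helpers a timer requested for 999999990ns maps directly to
tick 300 (the interrupt at 1000000000ns). Rounding it first to the integer
resolution 3333333ns gives 1003333233ns, which maps to tick 301, i.e. one
tick later, as described in the message.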
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-27 20:23 ` Roman Zippel
@ 2005-10-28 4:52 ` Steven Rostedt
2005-10-28 16:06 ` Roman Zippel
0 siblings, 1 reply; 67+ messages in thread
From: Steven Rostedt @ 2005-10-28 4:52 UTC (permalink / raw)
To: Roman Zippel
Cc: tim.bird, oleg, hch, paulmck, johnstul, linux-kernel,
Ingo Molnar, Andrew Morton, Thomas Gleixner, George Anzinger

On Thu, 2005-10-27 at 22:23 +0200, Roman Zippel wrote:
>
> Next question would be what happens if timer and clock resolution differs?
> For example if the clock has a resolution of 1us and the timer runs every
> 1ms. For relative timer this would mean we can keep the error within
> 1.001ms and for absolute timer within 1ms. Do we really have to force an
> error larger than really necessary?
>
> Interesting is now that Thomas doesn't take the clock resolution into
> account at all. Let's say clock and timer resolution are 1ms (or HZ=1000).
> If we program a normal kernel timer, we do something like this:
>
>         timer->expires = jiffies + 1 + usecs_to_jiffies(timeout);
>
> Thomas does now basically this:
>
>         timer->expires = jiffies * res + round(timeout, res);
>
> IOW if the clock resolution is larger than the interrupt delay, the timer
> may expire early.

Roman, I think I know what you are trying to say here.  Although it took
me several readings of what you wrote and then really just looking at
Thomas' code.

It's the old problem with:

  1          2          3          4
  +----------+----------+----------+---------->>
         ^                 ^
         |                 |
       Start              End

Asking for 2 ms (with both clock and res the same at 1ms).  We start the
clock at 1 but it really is 1.7 and we get the interrupt and return at 3
but really 3.2, so instead of receiving a wait of 2ms, we return with
3.2 - 1.7 = 1.5ms

Currently, this is not a problem when the clock is at a higher
frequency, (like the tsc).  So the base->get_time works now since the
clock is at a higher frequency, but if the get_time returned jiffies,
this would fail.  And the clock used is also much faster than the delay
it takes to get back to the calling process (which is much more than a
nanosecond today).

Is that what you were trying to say Roman?

Interesting though, I tried to force this scenario, by changing the
base->get_time to return jiffies.  I have a jitter test and ran this
several times, and I could never get it to expire early.  I even changed
HZ back to 100.

Then I looked at run_ktimer_queue.  And here we have the compare:

        timer = list_entry(base->pending.next, struct ktimer, list);
        if (ktime_cmp(now, <=, timer->expires))
                break;

So, the timer does _not_ get processed if it is after or _equal_ to the
current time.  So although the timer may go off early, the expired queue
does not get executed.  So the above example would not go off at 3.2,
but some time in the 4 category.

So the function will _not_ be executed early, although this could mean
that the timer could actually go off early (in the HRT case), but I
haven't taken a look there.  That is to say the interrupt goes off
early, not the function being executed.

-- Steve

^ permalink raw reply [flat|nested] 67+ messages in thread
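The effect of the "now <= expires" comparison quoted above can be
reproduced in a small model. The sketch below is an illustration under
stated assumptions, not code from the patch: coarse_now() and
callback_time() are hypothetical names, the clock is assumed to be
tick-granular (as when get_time returns jiffies), times are plain
microsecond integers, and the interrupt is assumed to arrive exactly on
each tick boundary with no delivery delay.

```c
#include <stdint.h>

/* Tick-granular clock: a read returns the time of the last tick. */
uint64_t coarse_now(uint64_t real_us, uint64_t tick_us)
{
        return real_us / tick_us * tick_us;
}

/* Returns the tick time (in us) at which the callback first runs.
 * strict != 0: timer stays pending while now <= expires (the check
 *              quoted from run_ktimer_queue).
 * strict == 0: timer stays pending while now <  expires (hypothetical
 *              variant, shown only for comparison). */
uint64_t callback_time(uint64_t start_real_us, uint64_t timeout_us,
                       uint64_t tick_us, int strict)
{
        uint64_t now = coarse_now(start_real_us, tick_us);
        uint64_t expires = now + timeout_us;

        for (;;) {
                now += tick_us;         /* next timer interrupt */
                if (strict ? now <= expires : now < expires)
                        continue;       /* still pending */
                return now;
        }
}
```

For the example in the message (armed at a real time of 1.7ms, 2ms
timeout, 1ms tick) the "<=" check runs the callback at tick 4, a real
delay of 2.3ms, while the "<" variant would run it at tick 3 after only
1.3ms.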
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-28 4:52 ` Steven Rostedt
@ 2005-10-28 16:06 ` Roman Zippel
0 siblings, 0 replies; 67+ messages in thread
From: Roman Zippel @ 2005-10-28 16:06 UTC (permalink / raw)
To: Steven Rostedt
Cc: tim.bird, oleg, Christoph Hellwig, paulmck, johnstul,
linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
George Anzinger

Hi,

On Fri, 28 Oct 2005, Steven Rostedt wrote:

> Roman, I think I know what you are trying to say here.  Although it took
> me several readings of what you wrote and then really just looking at
> Thomas' code.

Thanks for the effort. :-)
I know I'm sometimes a bit difficult to understand, which makes it easier
to simply flame me for a silly mistake.

> It's the old problem with:
>
>   1          2          3          4
>   +----------+----------+----------+---------->>
>          ^                 ^
>          |                 |
>        Start              End
>
> Asking for 2 ms (with both clock and res the same at 1ms).  We start the
> clock at 1 but it really is 1.7 and we get the interrupt and return at 3
> but really 3.2, so instead of receiving a wait of 2ms, we return with
> 3.2 - 1.7 = 1.5ms
>
> Currently, this is not a problem when the clock is at a higher
> frequency, (like the tsc).  So the base->get_time works now since the
> clock is at a higher frequency, but if the get_time returned jiffies,
> this would fail.  And the clock used is also much faster that the delay
> it takes to get back to the calling process (which is much more than a
> nanosecond today).
>
> Is that what you were trying to say Roman?

Yes.

> Interesting though, I tried to force this scenario, by changing the
> base->get_time to return jiffies.  I have a jitter test and ran this
> several times, and I could never get it to expire early.  I even changed
> HZ back to 100.
>
> Then I looked at run_ktimer_queue.  And here we have the compare:
>
>         timer = list_entry(base->pending.next, struct ktimer, list);
>         if (ktime_cmp(now, <=, timer->expires))
>                 break;
>
> So, the timer does _not_ get processed if it is after or _equal_ to the
> current time.  So although the timer may go off early, the expired queue
> does not get executed.  So the above example would not go off at 3.2,
> but some time in the 4 category.
>
> So the function will _not_ be executed early, although this could mean
> that the timer could actually go off early (in the HRT case), but I
> haven't taken a look there.  That is to say the interrupt goes off
> early, not the function being executed.

You're correct. I missed that comparison, so if clock resolution and timer
resolution are equal, this indeed works.
It still goes wrong if the resolutions are different. get_time() normally
wouldn't use jiffies but xtime. Thomas uses a fixed resolution, but xtime
updates are not constant. On my machine here with HZ=250 the resolution
would be 4000000ns, but xtime is updated by about 4000150ns every tick.
This means the timer value is rounded to full 4ms, but this is not enough
to get it past the next tick.
In the other more common case, where clock resolution is smaller than the
timer resolution, this means the delay may be larger than really
necessary.

bye, Roman

^ permalink raw reply [flat|nested] 67+ messages in thread
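The HZ=250 case from the message above can be put into numbers with a
short sketch. It is a simplified model under loud assumptions, not the
kernel's xtime handling: ticks_until_run() is a hypothetical name, the
clock is assumed to advance by a fixed step_ns on every tick, the timer is
armed at clock value 0, and it stays pending while now <= expires (the
comparison discussed in the thread).

```c
#include <stdint.h>

/* Number of ticks until a timer is processed when the clock advances
 * by step_ns per tick and the timeout is rounded up to a multiple of
 * res_ns before it is added to the current clock value. */
unsigned ticks_until_run(uint64_t step_ns, uint64_t res_ns,
                         uint64_t timeout_ns)
{
        uint64_t now = 0;       /* clock value when the timer is armed */
        uint64_t expires = now + (timeout_ns + res_ns - 1) / res_ns * res_ns;
        unsigned ticks = 0;

        while (now <= expires) {        /* pending while now <= expires */
                now += step_ns;         /* clock update at the next tick */
                ticks++;
        }
        return ticks;
}
```

With step_ns equal to the 4000000ns resolution, a 4ms timeout is processed
on the second tick, so at least one full tick period always elapses. With
the 4000150ns step reported for xtime, the same timeout is already
processed on the first tick, and if the timer was armed shortly before
that tick the real delay can be much shorter than 4ms.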
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-17 9:41 ` Ingo Molnar
2005-10-17 9:56 ` Andrew Morton
@ 2005-10-17 16:33 ` Roman Zippel
2005-10-17 16:39 ` Ingo Molnar
1 sibling, 1 reply; 67+ messages in thread
From: Roman Zippel @ 2005-10-17 16:33 UTC (permalink / raw)
To: Ingo Molnar
Cc: Thomas Gleixner, George Anzinger, linux-kernel, Andrew Morton,
johnstul, paulmck, Christoph Hellwig, oleg, tim.bird

Hi,

On Mon, 17 Oct 2005, Ingo Molnar wrote:

> if a dozen mails werent enough then one more probably wont make a
> difference,

Just for the record: in this thread I got exactly three answers from
Thomas. I don't know where you got the other nine mails from, maybe you
could forward them to me, as they seem to contain the "patient
explanations" I'm missing.

bye, Roman

^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-17 16:33 ` Roman Zippel
@ 2005-10-17 16:39 ` Ingo Molnar
2005-10-17 16:54 ` Roman Zippel
0 siblings, 1 reply; 67+ messages in thread
From: Ingo Molnar @ 2005-10-17 16:39 UTC (permalink / raw)
To: Roman Zippel
Cc: Thomas Gleixner, George Anzinger, linux-kernel, Andrew Morton,
johnstul, paulmck, Christoph Hellwig, oleg, tim.bird

* Roman Zippel <zippel@linux-m68k.org> wrote:

> Hi,
>
> On Mon, 17 Oct 2005, Ingo Molnar wrote:
>
> > if a dozen mails werent enough then one more probably wont make a
> > difference,
>
> Just for the record: in this thread I got exactly three answers from
> Thomas. I don't know where you got the other nine mails from, maybe
> you could forward them to me, as they seem to contain the "patient
> explanations" I'm missing.

here are all the replies from Thomas, regarding ktimers:

 12359 * Sep 22 Thomas Gleixner ( 319) Re: [ANNOUNCE] ktimers subsystem
 12362 * Sep 23 Thomas Gleixner (  49) Re: [ANNOUNCE] ktimers subsystem
 12363 * Sep 23 Thomas Gleixner ( 235) Re: [ANNOUNCE] ktimers subsystem
 12367 * Sep 24 Thomas Gleixner ( 214) Re: [ANNOUNCE] ktimers subsystem
 12368 * Sep 25 Thomas Gleixner (  25) Re: [ANNOUNCE] ktimers subsystem
 12369 * Sep 25 Thomas Gleixner (  17) Re: [ANNOUNCE] ktimers subsystem
 12370 * Sep 25 Thomas Gleixner (  10) Re: [ANNOUNCE] ktimers subsystem
 12387 * Oct 01 Thomas Gleixner ( 817) Re: [PATCH] ktimers subsystem 2.6.14-rc
 12419 * Oct 11 Thomas Gleixner (  41) Re: [PATCH] ktimers subsystem 2.6.14-rc
 12434 * Oct 16 Thomas Gleixner (  40) Re: [PATCH] ktimers subsystem 2.6.14-rc

some of them very large and detailed.

	Ingo

^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-17 16:39 ` Ingo Molnar
@ 2005-10-17 16:54 ` Roman Zippel
2005-10-17 17:35 ` Ingo Molnar
0 siblings, 1 reply; 67+ messages in thread
From: Roman Zippel @ 2005-10-17 16:54 UTC (permalink / raw)
To: Ingo Molnar
Cc: Thomas Gleixner, George Anzinger, linux-kernel, Andrew Morton,
johnstul, paulmck, Christoph Hellwig, oleg, tim.bird

Hi,

On Mon, 17 Oct 2005, Ingo Molnar wrote:

> here are all the replies from Thomas, regarding ktimers:
>
>  12359 * Sep 22 Thomas Gleixner ( 319) Re: [ANNOUNCE] ktimers subsystem
>  12362 * Sep 23 Thomas Gleixner (  49) Re: [ANNOUNCE] ktimers subsystem
>  12363 * Sep 23 Thomas Gleixner ( 235) Re: [ANNOUNCE] ktimers subsystem
>  12367 * Sep 24 Thomas Gleixner ( 214) Re: [ANNOUNCE] ktimers subsystem
>  12368 * Sep 25 Thomas Gleixner (  25) Re: [ANNOUNCE] ktimers subsystem
>  12369 * Sep 25 Thomas Gleixner (  17) Re: [ANNOUNCE] ktimers subsystem
>  12370 * Sep 25 Thomas Gleixner (  10) Re: [ANNOUNCE] ktimers subsystem

Different thread and not directly related to issues with the patch.

>  12387 * Oct 01 Thomas Gleixner ( 817) Re: [PATCH] ktimers subsystem 2.6.14-rc
>  12419 * Oct 11 Thomas Gleixner (  41) Re: [PATCH] ktimers subsystem 2.6.14-rc
>  12434 * Oct 16 Thomas Gleixner (  40) Re: [PATCH] ktimers subsystem 2.6.14-rc

That's the only mails related to the patch.

bye, Roman

^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-17 16:54 ` Roman Zippel
@ 2005-10-17 17:35 ` Ingo Molnar
0 siblings, 0 replies; 67+ messages in thread
From: Ingo Molnar @ 2005-10-17 17:35 UTC (permalink / raw)
To: Roman Zippel
Cc: Thomas Gleixner, George Anzinger, linux-kernel, Andrew Morton,
johnstul, paulmck, Christoph Hellwig, oleg, tim.bird

* Roman Zippel <zippel@linux-m68k.org> wrote:

> > > Just for the record: in this thread I got exactly three answers
> > > from Thomas. I don't know where you got the other nine mails from,
> > > maybe you could forward them to me, as they seem to contain the
> > > "patient explanations" I'm missing.
> > >
> > here are all the replies from Thomas, regarding ktimers:
> >
> >  12359 * Sep 22 Thomas Gleixner ( 319) Re: [ANNOUNCE] ktimers subsystem
> >  12362 * Sep 23 Thomas Gleixner (  49) Re: [ANNOUNCE] ktimers subsystem
> >  12363 * Sep 23 Thomas Gleixner ( 235) Re: [ANNOUNCE] ktimers subsystem
> >  12367 * Sep 24 Thomas Gleixner ( 214) Re: [ANNOUNCE] ktimers subsystem
> >  12368 * Sep 25 Thomas Gleixner (  25) Re: [ANNOUNCE] ktimers subsystem
> >  12369 * Sep 25 Thomas Gleixner (  17) Re: [ANNOUNCE] ktimers subsystem
> >  12370 * Sep 25 Thomas Gleixner (  10) Re: [ANNOUNCE] ktimers subsystem
>
> Different thread and not directly related to issues with the patch.

ugh, what were they about then, poetry? Ah i think i know what you mean:
these were about a PREVIOUS VERSION of the patch, and hence they fell
off the face of the earth, regardless of their content, right? What a
tricky little definition of "Thomas replied only 3 times" ...

> >  12387 * Oct 01 Thomas Gleixner ( 817) Re: [PATCH] ktimers subsystem 2.6.14-rc
> >  12419 * Oct 11 Thomas Gleixner (  41) Re: [PATCH] ktimers subsystem 2.6.14-rc
> >  12434 * Oct 16 Thomas Gleixner (  40) Re: [PATCH] ktimers subsystem 2.6.14-rc
>
> That's the only mails related to the patch.

your latest mail with the list of 'open' issues seems to contradict your
assertion that the above 3 mails from Thomas were "the only mails
related to the patch". E.g.:

 ' - "timer API" vs "timeout API": I got absolutely no acknowlegement
     that this might be a little confusing and in consequence "process
     timer" may be a better name. '

was raised and discussed in the first chunk of mails just as well.

	Ingo

^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-17 9:29 ` Roman Zippel
2005-10-17 9:41 ` Ingo Molnar
@ 2005-10-17 9:54 ` Steven Rostedt
1 sibling, 0 replies; 67+ messages in thread
From: Steven Rostedt @ 2005-10-17 9:54 UTC (permalink / raw)
To: Roman Zippel
Cc: Ingo Molnar, Thomas Gleixner, George Anzinger, linux-kernel,
Andrew Morton, johnstul, paulmck, Christoph Hellwig, oleg,
tim.bird

On Mon, 17 Oct 2005, Roman Zippel wrote:

> Hi,
>
> On Mon, 17 Oct 2005, Ingo Molnar wrote:
>
> > the thing is that Thomas has advanced the whole issue of timeouts and
> > timekeeping by leaps and bounds and he has written thousands of lines of
> > new and excellent code for a kernel subsystem that has seen little
> > activity for many years, before John got involved. One of Thomas'
> > accomplishments is a timer/time design that allows the enabling of HRT
> > timers via an _18 lines_ architecture patch. (!)
>
> Did I say these patches were bad in general? All I'm asking for is an
> explanation for a few design decisions to understand the patch and its
> behaviour better and evaluate alternative solutions.
> Neither of you have shown any real interest in this so far.
>

Well, for me anyway, the best way I have with understanding one's
decisions in their code design _is_ to start playing with the code.  Try
it the way you want and you might realize things don't work so well, and
then you might understand why Thomas did it his way.  There's several
times where I thought I could write something better, and after playing
with it, the problems start to arise where I then become "enlightened"
by the decisions others have made.

> > the moment you express yourself via patches we'll know that 1) you
> > understand what we have done so far 2) you have useful ideas of what
> > should be done differently 3) you have the coder capability to implement
> > and test those ideas. Patches wont be ignored, i can assure you. Get the
> > patches rolling!
>
> This "shut up and show code" attitude is sometimes quite funny, but it's
> no real threat to me. I hoped to avoid this and solve this more civilized.
> Of course I'll understand the issues better afterwards, but you could as
> easily just tell me. It will waste my time, I could spend on other
> projects and it will put Andrew in the unfortunate position to decide,
> which patch to accept.
> Is this really what you want?
>

I think what Ingo is saying, is to modify Thomas' code and show where it
is failing, instead of just talking about it.  You can ask "why" he did
something, but I think Thomas gave you enough in his answers.  If you
are still not satisfied, then that is the time to start playing with the
code and find the problems, fix them and show us that "yes" your way is
better.  Don't just ask why Thomas did it one way without a patch that
changes it to show us why he shouldn't have.

-- Steve

^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5
2005-10-01 1:03 ` Roman Zippel
2005-10-01 11:22 ` Ingo Molnar
2005-10-01 12:05 ` Thomas Gleixner
@ 2005-10-04 1:55 ` George Anzinger
2 siblings, 0 replies; 67+ messages in thread
From: George Anzinger @ 2005-10-04 1:55 UTC (permalink / raw)
To: Roman Zippel
Cc: tglx, linux-kernel, mingo, Andrew Morton, johnstul, paulmck,
Christoph Hellwig, oleg, tim.bird

Roman Zippel wrote:
>
>
> Could you explain a little the resolution handling behind in your patch?
> If I read SUS correctly clock resolution and timer resolution don't have
> to be the same, the first is returned by clock_getres() and the latter
> only documented somewhere (and AFAICT our implementation always returned
> the wrong value).
> IMO this also means we can don't have to make the rounding that
> complicated. Actually it could be done automatically by the timer, e.g.
> interval timer are reprogrammed at (now + interval) and the timer
> resolution will automatically round it up.

As I understand it the resolution should apply to timers assigned to the
given clock.  I assume most clock reads will return the best resolution
possible, but we can only know what that is (in user land) by looking at
a series of clock reads and making an educated guess (if indeed we care).

For timers, on the other hand, resolution serves two purposes:

a) it tells the user/application what to expect and allows him to take
evasive action (such as asking for the timer to expire a "res" amount
sooner) to get what he wants/needs.

b) for the kernel, it allows timers to be grouped such that we can limit
the number of interrupts we need to service to handle timers.  Some of
this might be possible by relying on the hardware, but a lot of hardware
may actually be able to handle nanosecond resolution.  At that point you
end up grouping by latency and getting to the point where, for no good
reason, you have the possibility of timer storms.  For no good reason,
i.e. the user really doesn't need or want that level of resolution,
being happy with, for example 10 microseconds or some such.  This is why,
in the HRT patch, the same can be said of the new ability to set HZ at
configure time.
>
>
-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/

^ permalink raw reply [flat|nested] 67+ messages in thread
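The grouping effect described in the message above (a coarser timer
resolution means fewer interrupts to service) can be illustrated with a
short sketch. It is an illustration only and not how the HRT patch or
ktimers organise their timer lists: count_interrupts() is a hypothetical
name, expiry times are plain nanosecond values sorted in ascending order,
and one interrupt is assumed per distinct rounded expiry time.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the distinct expiry instants left after rounding each expiry
 * up to a multiple of res_ns.  expiry_ns[] must be sorted ascending. */
size_t count_interrupts(const uint64_t *expiry_ns, size_t n,
                        uint64_t res_ns)
{
        size_t count = 0;
        uint64_t last = 0;

        for (size_t i = 0; i < n; i++) {
                uint64_t rounded =
                        (expiry_ns[i] + res_ns - 1) / res_ns * res_ns;

                if (count == 0 || rounded != last) {
                        count++;        /* a new interrupt is needed */
                        last = rounded;
                }
        }
        return count;
}
```

Five timers expiring at 1001, 1003, 1007, 1012 and 1019ns need five
interrupts at 1ns resolution, but only two (at 1010ns and 1020ns) at 10ns
resolution.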