* [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit machines
@ 2009-08-18 17:45 Jon Hunter
From: Jon Hunter @ 2009-08-18 17:45 UTC (permalink / raw)
To: linux-kernel; +Cc: Thomas Gleixner, John Stultz, Jon Hunter
From: Jon Hunter <jon-hunter@ti.com>
This is a resend of the patch series shown here:
http://www.spinics.net/lists/kernel/msg891029.html
This patch series has been rebased onto the linux-2.6-tip timers/core
branch, per request from Thomas Gleixner.
This patch series ensures that a wrap of the clocksource will not be
missed when the kernel sleeps for longer periods, and it allows 32-bit
machines to sleep for longer than 2.15 seconds.
Jon Hunter (2):
Dynamic Tick: Prevent clocksource wrapping during idle
Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15
seconds
 include/linux/clockchips.h  |    6 ++--
 include/linux/clocksource.h |    2 +
 include/linux/time.h        |    1 +
 kernel/hrtimer.c            |    2 +-
 kernel/time/clockevents.c   |   10 ++++----
 kernel/time/clocksource.c   |   47 +++++++++++++++++++++++++++++++++++
 kernel/time/tick-oneshot.c  |    2 +-
 kernel/time/tick-sched.c    |   57 ++++++++++++++++++++++++++++++++----------
 kernel/time/timekeeping.c   |   11 ++++++++
 kernel/time/timer_list.c    |    4 +-
 10 files changed, 116 insertions(+), 26 deletions(-)
* [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle
From: Jon Hunter @ 2009-08-18 17:45 UTC (permalink / raw)
To: linux-kernel; +Cc: Thomas Gleixner, John Stultz, Jon Hunter

From: Jon Hunter <jon-hunter@ti.com>

The dynamic tick allows the kernel to sleep for periods longer than a
single tick. This patch prevents the kernel from sleeping for a period
longer than the maximum time that the current clocksource can count,
ensuring that the kernel will not lose track of time. It adds a function
called "clocksource_max_deferment()" that calculates the maximum time
the kernel can sleep for a given clocksource, and a function called
"timekeeping_max_deferment()" that returns the maximum time the kernel
can sleep for the current clocksource.
Signed-off-by: Jon Hunter <jon-hunter@ti.com>
---
 include/linux/clocksource.h |    2 +
 include/linux/time.h        |    1 +
 kernel/time/clocksource.c   |   47 +++++++++++++++++++++++++++++++++++
 kernel/time/tick-sched.c    |   57 ++++++++++++++++++++++++++++++++----------
 kernel/time/timekeeping.c   |   11 ++++++++
 5 files changed, 104 insertions(+), 14 deletions(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 9ea40ff..09ed7f1 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc,
  *			subtraction of non 64 bit counters
  * @mult:		cycle to nanosecond multiplier
  * @shift:		cycle to nanosecond divisor (power of two)
+ * @max_idle_ns:	max idle time permitted by the clocksource (nsecs)
  * @flags:		flags describing special properties
  * @vread:		vsyscall based read
  * @resume:		resume function for the clocksource, if necessary
@@ -168,6 +169,7 @@ struct clocksource {
 	cycle_t mask;
 	u32 mult;
 	u32 shift;
+	s64 max_idle_ns;
 	unsigned long flags;
 	cycle_t (*vread)(void);
 	void (*resume)(void);
diff --git a/include/linux/time.h b/include/linux/time.h
index f505988..e68a480 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -146,6 +146,7 @@ extern void monotonic_to_bootbased(struct timespec *ts);
 extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
 
 extern int timekeeping_valid_for_hres(void);
+extern s64 timekeeping_max_deferment(void);
 extern void update_wall_time(void);
 extern void update_xtime_cache(u64 nsec);
 extern void timekeeping_leap_insert(int leapsecond);
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 02dc22d..7fffe54 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -369,6 +369,50 @@ void clocksource_touch_watchdog(void)
 	clocksource_resume_watchdog();
 }
 
+/**
+ * clocksource_max_deferment - Returns max time the clocksource can be deferred
+ * @cs:         Pointer to clocksource
+ *
+ */
+static s64 clocksource_max_deferment(struct clocksource *cs)
+{
+	s64 max_nsecs;
+	u64 max_cycles;
+
+	/*
+	 * Calculate the maximum number of cycles that we can pass to the
+	 * cyc2ns function without overflowing a 64-bit signed result. The
+	 * maximum number of cycles is equal to ULLONG_MAX/cs->mult which
+	 * is equivalent to the below.
+	 * max_cycles < (2^63)/cs->mult
+	 * max_cycles < 2^(log2((2^63)/cs->mult))
+	 * max_cycles < 2^(log2(2^63) - log2(cs->mult))
+	 * max_cycles < 2^(63 - log2(cs->mult))
+	 * max_cycles < 1 << (63 - log2(cs->mult))
+	 * Please note that we add 1 to the result of the log2 to account for
+	 * any rounding errors, ensure the above inequality is satisfied and
+	 * no overflow will occur.
+	 */
+	max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1));
+
+	/*
+	 * The actual maximum number of cycles we can defer the clocksource is
+	 * determined by the minimum of max_cycles and cs->mask.
+	 */
+	max_cycles = min(max_cycles, cs->mask);
+	max_nsecs = clocksource_cyc2ns(max_cycles, cs->mult, cs->shift);
+
+	/*
+	 * To ensure that the clocksource does not wrap whilst we are idle,
+	 * limit the time the clocksource can be deferred by 12.5%. Please
+	 * note a margin of 12.5% is used because this can be computed with
+	 * a shift, versus say 10% which would require division.
+	 */
+	max_nsecs = max_nsecs - (max_nsecs >> 5);
+
+	return max_nsecs;
+}
+
 #ifdef CONFIG_GENERIC_TIME
 
 static int finished_booting;
@@ -461,6 +505,9 @@ static void clocksource_enqueue(struct clocksource *cs)
  */
 int clocksource_register(struct clocksource *cs)
 {
+	/* calculate max idle time permitted for this clocksource */
+	cs->max_idle_ns = clocksource_max_deferment(cs);
+
 	mutex_lock(&clocksource_mutex);
 	clocksource_enqueue(cs);
 	clocksource_select();
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index e0f59a2..7a98e90 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle)
 	ktime_t last_update, expires, now;
 	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
 	int cpu;
+	s64 time_delta, max_time_delta;
 
 	local_irq_save(flags);
 
@@ -270,6 +271,18 @@ void tick_nohz_stop_sched_tick(int inidle)
 		seq = read_seqbegin(&xtime_lock);
 		last_update = last_jiffies_update;
 		last_jiffies = jiffies;
+
+		/*
+		 * On SMP we really should only care for the CPU which
+		 * has the do_timer duty assigned. All other CPUs can
+		 * sleep as long as they want.
+		 */
+		if (cpu == tick_do_timer_cpu ||
+		    tick_do_timer_cpu == TICK_DO_TIMER_NONE)
+			max_time_delta = timekeeping_max_deferment();
+		else
+			max_time_delta = KTIME_MAX;
+
 	} while (read_seqretry(&xtime_lock, seq));
 
 	/* Get the next timer wheel timer */
@@ -289,11 +302,30 @@ void tick_nohz_stop_sched_tick(int inidle)
 	if ((long)delta_jiffies >= 1) {
 
 		/*
-		 * calculate the expiry time for the next timer wheel
-		 * timer
-		 */
-		expires = ktime_add_ns(last_update, tick_period.tv64 *
-				       delta_jiffies);
+		 * calculate the expiry time for the next timer wheel
+		 * timer. delta_jiffies >= NEXT_TIMER_MAX_DELTA signals
+		 * that there is no timer pending or at least extremely
+		 * far into the future (12 days for HZ=1000). In this
+		 * case we set the expiry to the end of time.
+		 */
+		if (likely(delta_jiffies < NEXT_TIMER_MAX_DELTA)) {
+
+			/*
+			 * Calculate the time delta for the next timer event.
+			 * If the time delta exceeds the maximum time delta
+			 * permitted by the current clocksource then adjust
+			 * the time delta accordingly to ensure the
+			 * clocksource does not wrap.
+			 */
+			time_delta = tick_period.tv64 * delta_jiffies;
+
+			if (time_delta > max_time_delta)
+				time_delta = max_time_delta;
+
+			expires = ktime_add_ns(last_update, time_delta);
+		} else {
+			expires.tv64 = KTIME_MAX;
+		}
 
 		/*
 		 * If this cpu is the one which updates jiffies, then
@@ -337,22 +369,19 @@ void tick_nohz_stop_sched_tick(int inidle)
 
 		ts->idle_sleeps++;
 
+		/* Mark expires */
+		ts->idle_expires = expires;
+
 		/*
-		 * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
-		 * there is no timer pending or at least extremly far
-		 * into the future (12 days for HZ=1000). In this case
-		 * we simply stop the tick timer:
+		 * If the expiration time == KTIME_MAX, then
+		 * in this case we simply stop the tick timer.
 		 */
-		if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
-			ts->idle_expires.tv64 = KTIME_MAX;
+		if (unlikely(expires.tv64 == KTIME_MAX)) {
 			if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
 				hrtimer_cancel(&ts->sched_timer);
 			goto out;
 		}
 
-		/* Mark expiries */
-		ts->idle_expires = expires;
-
 		if (ts->nohz_mode == NOHZ_MODE_HIGHRES) {
 			hrtimer_start(&ts->sched_timer, expires,
 				      HRTIMER_MODE_ABS_PINNED);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 15e06de..2e57251 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -487,6 +487,17 @@ int timekeeping_valid_for_hres(void)
 }
 
 /**
+ * timekeeping_max_deferment - Returns max time the clocksource can be deferred
+ *
+ * IMPORTANT: Caller must observe xtime_lock via read_seqbegin/read_seqretry
+ * to ensure that the clocksource does not change!
+ */
+s64 timekeeping_max_deferment(void)
+{
+	return timekeeper.clock->max_idle_ns;
+}
+
+/**
  * read_persistent_clock - Return time from the persistent clock.
  *
  * Weak dummy function for arches that do not yet support it.
-- 
1.6.0.4
* [PATCH 2/2] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds
From: Jon Hunter @ 2009-08-18 17:45 UTC (permalink / raw)
To: linux-kernel; +Cc: Thomas Gleixner, John Stultz, Jon Hunter

From: Jon Hunter <jon-hunter@ti.com>

In the dynamic tick code, "max_delta_ns" (a member of the
"clock_event_device" structure) represents the maximum sleep time that
can occur between timer events, in nanoseconds. "max_delta_ns" is
declared as an unsigned long, which is a 32-bit integer on 32-bit
machines and a 64-bit integer on 64-bit machines (when gcc's -m64
option is used). Its value is set by calling "clockevent_delta2ns()",
which returns at most LONG_MAX. On a 32-bit machine LONG_MAX is equal
to 0x7fffffff, which in nanoseconds equates to ~2.15 seconds. Hence,
the maximum sleep time for a 32-bit machine is ~2.15 seconds, whereas
for a 64-bit machine it is many years.

This patch changes the type of max_delta_ns to "unsigned long long"
instead of "unsigned long" so that the variable is a 64-bit type on
both 32-bit and 64-bit machines. It also changes the maximum value
returned by clockevent_delta2ns() to LLONG_MAX. Hence this allows a
32-bit machine to sleep for longer than ~2.15 seconds.

Please note that this patch also changes "min_delta_ns" to "unsigned
long long" too; although this is probably unnecessary, it keeps the
patch simpler.
Signed-off-by: Jon Hunter <jon-hunter@ti.com>
---
 include/linux/clockchips.h |    6 +++---
 kernel/hrtimer.c           |    2 +-
 kernel/time/clockevents.c  |   10 +++++-----
 kernel/time/tick-oneshot.c |    2 +-
 kernel/time/timer_list.c   |    4 ++--
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 3a1dbba..8154bc6 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -77,8 +77,8 @@ enum clock_event_nofitiers {
 struct clock_event_device {
 	const char		*name;
 	unsigned int		features;
-	unsigned long		max_delta_ns;
-	unsigned long		min_delta_ns;
+	unsigned long long	max_delta_ns;
+	unsigned long long	min_delta_ns;
 	unsigned long		mult;
 	int			shift;
 	int			rating;
@@ -116,7 +116,7 @@ static inline unsigned long div_sc(unsigned long ticks, unsigned long nsec,
 }
 
 /* Clock event layer functions */
-extern unsigned long clockevent_delta2ns(unsigned long latch,
+extern unsigned long long clockevent_delta2ns(unsigned long latch,
 					 struct clock_event_device *evt);
 extern void clockevents_register_device(struct clock_event_device *dev);
 
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index e2f91ec..2043e78 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1198,7 +1198,7 @@ hrtimer_interrupt_hanging(struct clock_event_device *dev,
 	force_clock_reprogram = 1;
 	dev->min_delta_ns = (unsigned long)try_time.tv64 * 3;
 	printk(KERN_WARNING "hrtimer: interrupt too slow, "
-		"forcing clock min delta to %lu ns\n", dev->min_delta_ns);
+		"forcing clock min delta to %llu ns\n", dev->min_delta_ns);
 }
 /*
  * High resolution timer interrupt
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index a6dcd67..6db410f 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -37,10 +37,10 @@ static DEFINE_SPINLOCK(clockevents_lock);
 *
 * Math helper, returns latch value converted to nanoseconds (bound checked)
 */
-unsigned long clockevent_delta2ns(unsigned long latch,
+unsigned long long clockevent_delta2ns(unsigned long latch,
 				  struct clock_event_device *evt)
 {
-	u64 clc = ((u64) latch << evt->shift);
+	unsigned long long clc = ((unsigned long long) latch << evt->shift);
 
 	if (unlikely(!evt->mult)) {
 		evt->mult = 1;
@@ -50,10 +50,10 @@ unsigned long clockevent_delta2ns(unsigned long latch,
 	do_div(clc, evt->mult);
 	if (clc < 1000)
 		clc = 1000;
-	if (clc > LONG_MAX)
-		clc = LONG_MAX;
+	if (clc > LLONG_MAX)
+		clc = LLONG_MAX;
 
-	return (unsigned long) clc;
+	return clc;
 }
 EXPORT_SYMBOL_GPL(clockevent_delta2ns);
 
diff --git a/kernel/time/tick-oneshot.c b/kernel/time/tick-oneshot.c
index a96c0e2..327d4ed 100644
--- a/kernel/time/tick-oneshot.c
+++ b/kernel/time/tick-oneshot.c
@@ -50,7 +50,7 @@ int tick_dev_program_event(struct clock_event_device *dev, ktime_t expires,
 			dev->min_delta_ns += dev->min_delta_ns >> 1;
 
 			printk(KERN_WARNING
-			       "CE: %s increasing min_delta_ns to %lu nsec\n",
+			       "CE: %s increasing min_delta_ns to %llu nsec\n",
 			       dev->name ? dev->name : "?",
 			       dev->min_delta_ns << 1);
 
diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index a999b92..3bf30b4 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -204,8 +204,8 @@ print_tickdevice(struct seq_file *m, struct tick_device *td, int cpu)
 		return;
 	}
 	SEQ_printf(m, "%s\n", dev->name);
-	SEQ_printf(m, " max_delta_ns:   %lu\n", dev->max_delta_ns);
-	SEQ_printf(m, " min_delta_ns:   %lu\n", dev->min_delta_ns);
+	SEQ_printf(m, " max_delta_ns:   %llu\n", dev->max_delta_ns);
+	SEQ_printf(m, " min_delta_ns:   %llu\n", dev->min_delta_ns);
 	SEQ_printf(m, " mult:           %lu\n", dev->mult);
 	SEQ_printf(m, " shift:          %d\n", dev->shift);
 	SEQ_printf(m, " mode:           %d\n", dev->mode);
-- 
1.6.0.4
* Re: [PATCH 2/2] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds
From: Thomas Gleixner @ 2009-08-18 19:26 UTC (permalink / raw)
To: Jon Hunter; +Cc: linux-kernel, John Stultz

On Tue, 18 Aug 2009, Jon Hunter wrote:
> diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
> index 3a1dbba..8154bc6 100644
> --- a/include/linux/clockchips.h
> +++ b/include/linux/clockchips.h
> @@ -77,8 +77,8 @@ enum clock_event_nofitiers {
>  struct clock_event_device {
>  	const char		*name;
>  	unsigned int		features;
> -	unsigned long		max_delta_ns;
> -	unsigned long		min_delta_ns;
> +	unsigned long long	max_delta_ns;
> +	unsigned long long	min_delta_ns;

Can we please use u64 for this ?

Thanks,

	tglx
* Re: [PATCH 2/2] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds
From: Jon Hunter @ 2009-08-18 20:52 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: linux-kernel, John Stultz

Thomas Gleixner wrote:
> On Tue, 18 Aug 2009, Jon Hunter wrote:
>> diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
>> index 3a1dbba..8154bc6 100644
>> --- a/include/linux/clockchips.h
>> +++ b/include/linux/clockchips.h
>> @@ -77,8 +77,8 @@ enum clock_event_nofitiers {
>>  struct clock_event_device {
>>  	const char		*name;
>>  	unsigned int		features;
>> -	unsigned long		max_delta_ns;
>> -	unsigned long		min_delta_ns;
>> +	unsigned long long	max_delta_ns;
>> +	unsigned long long	min_delta_ns;
>
> Can we please use u64 for this ?

John brought this up as well; there was some discussion about it a while
back. I received feedback that u64 was a different type between ppc64 and
x86-64, which was causing problems with printk. The above variables are
also used with printk in the kernel today. See the following email:

http://marc.info/?l=linux-kernel&m=124041426203283&w=2

I am not sure if this is still the case, so it may be safer to stick with
long long for now. Let me know your thoughts.

Cheers
Jon
* [tip:timers/core] nohz: Allow 32-bit machines to sleep for more than 2.15 seconds
From: tip-bot for Jon Hunter @ 2009-11-13 19:50 UTC (permalink / raw)
To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, johnstul, jon-hunter, tglx

Commit-ID:  97813f2fe77804a4464564c75ba8d8826377feea
Gitweb:     http://git.kernel.org/tip/97813f2fe77804a4464564c75ba8d8826377feea
Author:     Jon Hunter <jon-hunter@ti.com>
AuthorDate: Tue, 18 Aug 2009 12:45:11 -0500
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 13 Nov 2009 20:46:24 +0100

nohz: Allow 32-bit machines to sleep for more than 2.15 seconds

In the dynamic tick code, "max_delta_ns" (member of the
"clock_event_device" structure) represents the maximum sleep time
that can occur between timer events in nanoseconds.

The variable, "max_delta_ns", is defined as an unsigned long
which is a 32-bit integer for 32-bit machines and a 64-bit integer
for 64-bit machines (if -m64 option is used for gcc). The value of
max_delta_ns is set by calling the function "clockevent_delta2ns()"
which returns a maximum value of LONG_MAX. For a 32-bit machine
LONG_MAX is equal to 0x7fffffff and in nanoseconds this equates to
~2.15 seconds. Hence, the maximum sleep time for a 32-bit machine
is ~2.15 seconds, whereas for a 64-bit machine it will be many
years.

This patch changes the type of max_delta_ns to be "u64" instead of
"unsigned long" so that this variable is a 64-bit type for both 32-bit
and 64-bit machines. It also changes the maximum value returned by
clockevent_delta2ns() to KTIME_MAX. Hence this allows a 32-bit
machine to sleep for longer than ~2.15 seconds.

Please note that this patch also changes "min_delta_ns" to be "u64"
too and although this is unnecessary, it makes the patch simpler as
it avoids having to fix up all callers of clockevent_delta2ns().

[ tglx: changed "unsigned long long" to u64 as we use this data type
        through out the time code ]

Signed-off-by: Jon Hunter <jon-hunter@ti.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1250617512-23567-3-git-send-email-jon-hunter@ti.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/clockchips.h |    8 ++++----
 kernel/hrtimer.c           |    3 ++-
 kernel/time/clockevents.c  |   11 +++++------
 kernel/time/tick-oneshot.c |    4 ++--
 kernel/time/timer_list.c   |    6 ++++--
 5 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 4d438b0..0cf725b 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -77,8 +77,8 @@ enum clock_event_nofitiers {
 struct clock_event_device {
 	const char		*name;
 	unsigned int		features;
-	unsigned long		max_delta_ns;
-	unsigned long		min_delta_ns;
+	u64			max_delta_ns;
+	u64			min_delta_ns;
 	u32			mult;
 	u32			shift;
 	int			rating;
@@ -116,8 +116,8 @@ static inline unsigned long div_sc(unsigned long ticks, unsigned long nsec,
 }
 
 /* Clock event layer functions */
-extern unsigned long clockevent_delta2ns(unsigned long latch,
-					 struct clock_event_device *evt);
+extern u64 clockevent_delta2ns(unsigned long latch,
+			       struct clock_event_device *evt);
 extern void clockevents_register_device(struct clock_event_device *dev);
 
 extern void clockevents_exchange_device(struct clock_event_device *old,
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 6d70204..c215b74 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1240,7 +1240,8 @@ hrtimer_interrupt_hanging(struct clock_event_device *dev,
 	force_clock_reprogram = 1;
 	dev->min_delta_ns = (unsigned long)try_time.tv64 * 3;
 	printk(KERN_WARNING "hrtimer: interrupt too slow, "
-		"forcing clock min delta to %lu ns\n", dev->min_delta_ns);
+		"forcing clock min delta to %llu ns\n",
+		(unsigned long long) dev->min_delta_ns);
 }
 /*
  * High resolution timer interrupt
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 620b58a..05e8aee 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -37,10 +37,9 @@ static DEFINE_SPINLOCK(clockevents_lock);
 *
 * Math helper, returns latch value converted to nanoseconds (bound checked)
 */
-unsigned long clockevent_delta2ns(unsigned long latch,
-				  struct clock_event_device *evt)
+u64 clockevent_delta2ns(unsigned long latch, struct clock_event_device *evt)
 {
-	u64 clc = ((u64) latch << evt->shift);
+	u64 clc = (u64) latch << evt->shift;
 
 	if (unlikely(!evt->mult)) {
 		evt->mult = 1;
@@ -50,10 +49,10 @@ unsigned long clockevent_delta2ns(unsigned long latch,
 	do_div(clc, evt->mult);
 	if (clc < 1000)
 		clc = 1000;
-	if (clc > LONG_MAX)
-		clc = LONG_MAX;
+	if (clc > KTIME_MAX)
+		clc = KTIME_MAX;
 
-	return (unsigned long) clc;
+	return clc;
 }
 EXPORT_SYMBOL_GPL(clockevent_delta2ns);
 
diff --git a/kernel/time/tick-oneshot.c b/kernel/time/tick-oneshot.c
index a96c0e2..0a8a213 100644
--- a/kernel/time/tick-oneshot.c
+++ b/kernel/time/tick-oneshot.c
@@ -50,9 +50,9 @@ int tick_dev_program_event(struct clock_event_device *dev, ktime_t expires,
 			dev->min_delta_ns += dev->min_delta_ns >> 1;
 
 			printk(KERN_WARNING
-			       "CE: %s increasing min_delta_ns to %lu nsec\n",
+			       "CE: %s increasing min_delta_ns to %llu nsec\n",
 			       dev->name ? dev->name : "?",
-			       dev->min_delta_ns << 1);
+			       (unsigned long long) dev->min_delta_ns << 1);
 
 			i = 0;
 		}
diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index fa00da1..665c76e 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -204,8 +204,10 @@ print_tickdevice(struct seq_file *m, struct tick_device *td, int cpu)
 		return;
 	}
 	SEQ_printf(m, "%s\n", dev->name);
-	SEQ_printf(m, " max_delta_ns:   %lu\n", dev->max_delta_ns);
-	SEQ_printf(m, " min_delta_ns:   %lu\n", dev->min_delta_ns);
+	SEQ_printf(m, " max_delta_ns:   %llu\n",
+		   (unsigned long long) dev->max_delta_ns);
+	SEQ_printf(m, " min_delta_ns:   %llu\n",
+		   (unsigned long long) dev->min_delta_ns);
 	SEQ_printf(m, " mult:           %u\n", dev->mult);
 	SEQ_printf(m, " shift:          %u\n", dev->shift);
 	SEQ_printf(m, " mode:           %d\n", dev->mode);
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle
From: Thomas Gleixner @ 2009-08-18 19:25 UTC (permalink / raw)
To: Jon Hunter; +Cc: linux-kernel, John Stultz

On Tue, 18 Aug 2009, Jon Hunter wrote:

> From: Jon Hunter <jon-hunter@ti.com>
>
> The dynamic tick allows the kernel to sleep for periods longer
> than a single tick. This patch prevents the kernel from
> sleeping for a period longer than the maximum time that the
> current clocksource can count. This ensures that the kernel will
> not lose track of time. This patch adds a function called
> "clocksource_max_deferment()" that calculates the maximum time the
> kernel can sleep for a given clocksource and a function called
> "timekeeping_max_deferment()" that returns the maximum time the
> kernel can sleep for the current clocksource.
>
> Signed-off-by: Jon Hunter <jon-hunter@ti.com>
> ---
>  include/linux/clocksource.h |    2 +
>  include/linux/time.h        |    1 +
>  kernel/time/clocksource.c   |   47 +++++++++++++++++++++++++++++++++++
>  kernel/time/tick-sched.c    |   57 ++++++++++++++++++++++++++++++++----------
>  kernel/time/timekeeping.c   |   11 ++++++++
>  5 files changed, 104 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index 9ea40ff..09ed7f1 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc,
>   *			subtraction of non 64 bit counters
>   * @mult:		cycle to nanosecond multiplier
>   * @shift:		cycle to nanosecond divisor (power of two)
> + * @max_idle_ns:	max idle time permitted by the clocksource (nsecs)
>   * @flags:		flags describing special properties
>   * @vread:		vsyscall based read
>   * @resume:		resume function for the clocksource, if necessary
> @@ -168,6 +169,7 @@ struct clocksource {
>  	cycle_t mask;
>  	u32 mult;
>  	u32 shift;
> +	s64 max_idle_ns;

I don't think we should move this to the clocksource. That should go
into the new struct timekeeper and initialized when a clocksource is
selected for timekeeping.

> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index e0f59a2..7a98e90 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>  	ktime_t last_update, expires, now;
>  	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
>  	int cpu;
> +	s64 time_delta, max_time_delta;
>
>  	local_irq_save(flags);
>
> @@ -270,6 +271,18 @@ void tick_nohz_stop_sched_tick(int inidle)
>  		seq = read_seqbegin(&xtime_lock);
>  		last_update = last_jiffies_update;
>  		last_jiffies = jiffies;
> +
> +		/*
> +		 * On SMP we really should only care for the CPU which
> +		 * has the do_timer duty assigned. All other CPUs can
> +		 * sleep as long as they want.
> +		 */
> +		if (cpu == tick_do_timer_cpu ||
> +		    tick_do_timer_cpu == TICK_DO_TIMER_NONE)
> +			max_time_delta = timekeeping_max_deferment();
> +		else
> +			max_time_delta = KTIME_MAX;
> +

Is it worth the extra check instead of always using
timekeeping_max_deferment() ?

>  	} while (read_seqretry(&xtime_lock, seq));
>
>  	/* Get the next timer wheel timer */
> @@ -289,11 +302,30 @@ void tick_nohz_stop_sched_tick(int inidle)
>  	if ((long)delta_jiffies >= 1) {
>
>  		/*
> -		 * calculate the expiry time for the next timer wheel
> -		 * timer
> -		 */
> -		expires = ktime_add_ns(last_update, tick_period.tv64 *
> -				       delta_jiffies);
> +		 * calculate the expiry time for the next timer wheel
> +		 * timer. delta_jiffies >= NEXT_TIMER_MAX_DELTA signals
> +		 * that there is no timer pending or at least extremely
> +		 * far into the future (12 days for HZ=1000). In this
> +		 * case we set the expiry to the end of time.
> +		 */
> +		if (likely(delta_jiffies < NEXT_TIMER_MAX_DELTA)) {
> +
> +			/*
> +			 * Calculate the time delta for the next timer event.
> +			 * If the time delta exceeds the maximum time delta
> +			 * permitted by the current clocksource then adjust
> +			 * the time delta accordingly to ensure the
> +			 * clocksource does not wrap.
> +			 */
> +			time_delta = tick_period.tv64 * delta_jiffies;
> +
> +			if (time_delta > max_time_delta)
> +				time_delta = max_time_delta;
> +
> +			expires = ktime_add_ns(last_update, time_delta);
> +		} else {
> +			expires.tv64 = KTIME_MAX;
> +		}

This looks incorrect. You set expires to KTIME_MAX when no timer is
pending, but that defeats the purpose of this patch. When we hit this
code path and the next interrupt comes in after the timekeeping
clocksource wrapped we are bust.

Thanks,

	tglx
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-08-18 19:25 ` [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle Thomas Gleixner @ 2009-08-18 20:42 ` Jon Hunter 0 siblings, 0 replies; 29+ messages in thread From: Jon Hunter @ 2009-08-18 20:42 UTC (permalink / raw) To: Thomas Gleixner; +Cc: john stultz, linux-kernel@vger.kernel.org Thomas Gleixner wrote: > On Tue, 18 Aug 2009, Jon Hunter wrote: > >> From: Jon Hunter <jon-hunter@ti.com> >> >> The dynamic tick allows the kernel to sleep for periods longer >> than a single tick. This patch prevents that the kernel from >> sleeping for a period longer than the maximum time that the >> current clocksource can count. This ensures that the kernel will >> not lose track of time. This patch adds a function called >> "clocksource_max_deferment()" that calculates the maximum time the >> kernel can sleep for a given clocksource and function called >> "timekeeping_max_deferment()" that returns maximum time the kernel >> can sleep for the current clocksource. 
>> >> Signed-off-by: Jon Hunter <jon-hunter@ti.com> >> --- >> include/linux/clocksource.h | 2 + >> include/linux/time.h | 1 + >> kernel/time/clocksource.c | 47 +++++++++++++++++++++++++++++++++++ >> kernel/time/tick-sched.c | 57 ++++++++++++++++++++++++++++++++---------- >> kernel/time/timekeeping.c | 11 ++++++++ >> 5 files changed, 104 insertions(+), 14 deletions(-) >> >> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h >> index 9ea40ff..09ed7f1 100644 >> --- a/include/linux/clocksource.h >> +++ b/include/linux/clocksource.h >> @@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc, >> * subtraction of non 64 bit counters >> * @mult: cycle to nanosecond multiplier >> * @shift: cycle to nanosecond divisor (power of two) >> + * @max_idle_ns: max idle time permitted by the clocksource (nsecs) >> * @flags: flags describing special properties >> * @vread: vsyscall based read >> * @resume: resume function for the clocksource, if necessary >> @@ -168,6 +169,7 @@ struct clocksource { >> cycle_t mask; >> u32 mult; >> u32 shift; >> + s64 max_idle_ns; > > I don't think we should move this to the clocksource. That should go > into the new struct timekeeper and initialized when a clocksource is > selected for timekeeping. > >> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c >> index e0f59a2..7a98e90 100644 >> --- a/kernel/time/tick-sched.c >> +++ b/kernel/time/tick-sched.c >> @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle) >> ktime_t last_update, expires, now; >> struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev; >> int cpu; >> + s64 time_delta, max_time_delta; >> >> local_irq_save(flags); >> >> @@ -270,6 +271,18 @@ void tick_nohz_stop_sched_tick(int inidle) >> seq = read_seqbegin(&xtime_lock); >> last_update = last_jiffies_update; >> last_jiffies = jiffies; >> + >> + /* >> + * On SMP we really should only care for the CPU which >> + * has the do_timer duty assigned. 
All other CPUs can >> + * sleep as long as they want. >> + */ >> + if (cpu == tick_do_timer_cpu || >> + tick_do_timer_cpu == TICK_DO_TIMER_NONE) >> + max_time_delta = timekeeping_max_deferment(); >> + else >> + max_time_delta = KTIME_MAX; >> + > > Is it worth the extra check instead of always using > timekeeping_max_deferment() ? > >> } while (read_seqretry(&xtime_lock, seq)); >> >> /* Get the next timer wheel timer */ >> @@ -289,11 +302,30 @@ void tick_nohz_stop_sched_tick(int inidle) >> if ((long)delta_jiffies >= 1) { >> >> /* >> - * calculate the expiry time for the next timer wheel >> - * timer >> - */ >> - expires = ktime_add_ns(last_update, tick_period.tv64 * >> - delta_jiffies); >> + * calculate the expiry time for the next timer wheel >> + * timer. delta_jiffies >= NEXT_TIMER_MAX_DELTA signals >> + * that there is no timer pending or at least extremely >> + * far into the future (12 days for HZ=1000). In this >> + * case we set the expiry to the end of time. >> + */ >> + if (likely(delta_jiffies < NEXT_TIMER_MAX_DELTA)) { >> + >> + /* >> + * Calculate the time delta for the next timer event. >> + * If the time delta exceeds the maximum time delta >> + * permitted by the current clocksource then adjust >> + * the time delta accordingly to ensure the >> + * clocksource does not wrap. >> + */ >> + time_delta = tick_period.tv64 * delta_jiffies; >> + >> + if (time_delta > max_time_delta) >> + time_delta = max_time_delta; >> + >> + expires = ktime_add_ns(last_update, time_delta); >> + } else { >> + expires.tv64 = KTIME_MAX; >> + } > > This looks incorrect. You set expires to KTIME_MAX when no timer is > pending, but that defeats the purpose of this patch. When we hit this > code path and the next interrupt comes in after the timekeeping > clocksource wrapped we are bust. Right, so this is a bit of a grey area for me. 
When I first started looking at this I was questioning the purpose of the following code that exists today in the tick_nohz_stop_sched_tick() function:

	/*
	 * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
	 * there is no timer pending or at least extremly far
	 * into the future (12 days for HZ=1000). In this case
	 * we simply stop the tick timer:
	 */
	if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
		ts->idle_expires.tv64 = KTIME_MAX;
		if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
			hrtimer_cancel(&ts->sched_timer);
		goto out;
	}

The above code checks whether delta_jiffies is greater than or equal to NEXT_TIMER_MAX_DELTA and, if so, sets expires to KTIME_MAX and disables the timer. I had questioned this a few months ago, but I don't think that John and I knew the history here. So, for right or wrong, I left this code alone. The above patch still does the same thing if delta_jiffies is indeed greater than NEXT_TIMER_MAX_DELTA. If you agree that this code is not needed and that, in the case where we have no timers, we should simply make the next timer event always occur max_time_delta ns later, then I can re-work it to do this.

Thanks
Jon

^ permalink raw reply [flat|nested] 29+ messages in thread
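Thomas's objection above (that falling back to KTIME_MAX when no timer is pending defeats the wrap protection) can be illustrated with a small stand-alone sketch. This is not kernel code: clamped_expiry() and its nanosecond arguments are hypothetical stand-ins that only model the clamping behaviour being discussed.

```c
#include <stdint.h>

/*
 * Illustrative sketch only: clamp the next programmed event so it is
 * never further out than the clocksource can count without wrapping.
 * With this rule, the "no pending timer" case simply degenerates to
 * sleeping for max_time_delta_ns, so no KTIME_MAX special case is
 * needed.
 */
static int64_t clamped_expiry(int64_t last_update_ns, int64_t time_delta_ns,
			      int64_t max_time_delta_ns)
{
	if (time_delta_ns > max_time_delta_ns)
		time_delta_ns = max_time_delta_ns;
	return last_update_ns + time_delta_ns;
}
```

Under this scheme the kernel wakes at least once per maximum deferment period even when idle, which is exactly the bound needed to keep the timekeeping clocksource from wrapping unnoticed.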
* [tip:timers/core] nohz: Prevent clocksource wrapping during idle 2009-08-18 17:45 ` [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle Jon Hunter 2009-08-18 17:45 ` [PATCH 2/2] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds Jon Hunter 2009-08-18 19:25 ` [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle Thomas Gleixner @ 2009-11-13 19:49 ` tip-bot for Jon Hunter 2 siblings, 0 replies; 29+ messages in thread From: tip-bot for Jon Hunter @ 2009-11-13 19:49 UTC (permalink / raw) To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, johnstul, jon-hunter, tglx Commit-ID: 98962465ed9e6ea99c38e0af63fe1dcb5a79dc25 Gitweb: http://git.kernel.org/tip/98962465ed9e6ea99c38e0af63fe1dcb5a79dc25 Author: Jon Hunter <jon-hunter@ti.com> AuthorDate: Tue, 18 Aug 2009 12:45:10 -0500 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Fri, 13 Nov 2009 20:46:24 +0100 nohz: Prevent clocksource wrapping during idle The dynamic tick allows the kernel to sleep for periods longer than a single tick, but it does not limit the sleep time currently. In the worst case the kernel could sleep longer than the wrap around time of the time keeping clock source which would result in losing track of time. Prevent this by limiting it to the safe maximum sleep time of the current time keeping clock source. The value is calculated when the clock source is registered. 
[ tglx: simplified the code a bit and massaged the commit msg ] Signed-off-by: Jon Hunter <jon-hunter@ti.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1250617512-23567-2-git-send-email-jon-hunter@ti.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- include/linux/clocksource.h | 2 + include/linux/time.h | 1 + kernel/time/clocksource.c | 44 ++++++++++++++++++++++++++++++++++++ kernel/time/tick-sched.c | 52 +++++++++++++++++++++++++++++++----------- kernel/time/timekeeping.c | 11 +++++++++ 5 files changed, 96 insertions(+), 14 deletions(-) diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index f57f882..279c547 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc, * subtraction of non 64 bit counters * @mult: cycle to nanosecond multiplier * @shift: cycle to nanosecond divisor (power of two) + * @max_idle_ns: max idle time permitted by the clocksource (nsecs) * @flags: flags describing special properties * @vread: vsyscall based read * @resume: resume function for the clocksource, if necessary @@ -168,6 +169,7 @@ struct clocksource { cycle_t mask; u32 mult; u32 shift; + u64 max_idle_ns; unsigned long flags; cycle_t (*vread)(void); void (*resume)(void); diff --git a/include/linux/time.h b/include/linux/time.h index fe04e5e..6e026e4 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -148,6 +148,7 @@ extern void monotonic_to_bootbased(struct timespec *ts); extern struct timespec timespec_trunc(struct timespec t, unsigned gran); extern int timekeeping_valid_for_hres(void); +extern u64 timekeeping_max_deferment(void); extern void update_wall_time(void); extern void update_xtime_cache(u64 nsec); extern void timekeeping_leap_insert(int leapsecond); diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 407c089..b65b242 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -469,6 
+469,47 @@ void clocksource_touch_watchdog(void) #ifdef CONFIG_GENERIC_TIME /** + * clocksource_max_deferment - Returns max time the clocksource can be deferred + * @cs: Pointer to clocksource + * + */ +static u64 clocksource_max_deferment(struct clocksource *cs) +{ + u64 max_nsecs, max_cycles; + + /* + * Calculate the maximum number of cycles that we can pass to the + * cyc2ns function without overflowing a 64-bit signed result. The + * maximum number of cycles is equal to ULLONG_MAX/cs->mult which + * is equivalent to the below. + * max_cycles < (2^63)/cs->mult + * max_cycles < 2^(log2((2^63)/cs->mult)) + * max_cycles < 2^(log2(2^63) - log2(cs->mult)) + * max_cycles < 2^(63 - log2(cs->mult)) + * max_cycles < 1 << (63 - log2(cs->mult)) + * Please note that we add 1 to the result of the log2 to account for + * any rounding errors, ensure the above inequality is satisfied and + * no overflow will occur. + */ + max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1)); + + /* + * The actual maximum number of cycles we can defer the clocksource is + * determined by the minimum of max_cycles and cs->mask. + */ + max_cycles = min_t(u64, max_cycles, (u64) cs->mask); + max_nsecs = clocksource_cyc2ns(max_cycles, cs->mult, cs->shift); + + /* + * To ensure that the clocksource does not wrap whilst we are idle, + * limit the time the clocksource can be deferred by 12.5%. Please + * note a margin of 12.5% is used because this can be computed with + * a shift, versus say 10% which would require division. + */ + return max_nsecs - (max_nsecs >> 5); +} + +/** * clocksource_select - Select the best clocksource available * * Private function. Must hold clocksource_mutex when called. 
@@ -564,6 +605,9 @@ static void clocksource_enqueue(struct clocksource *cs) */ int clocksource_register(struct clocksource *cs) { + /* calculate max idle time permitted for this clocksource */ + cs->max_idle_ns = clocksource_max_deferment(cs); + mutex_lock(&clocksource_mutex); clocksource_enqueue(cs); clocksource_select(); diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index c65ba0f..a80b464 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -208,6 +208,7 @@ void tick_nohz_stop_sched_tick(int inidle) struct tick_sched *ts; ktime_t last_update, expires, now; struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev; + u64 time_delta; int cpu; local_irq_save(flags); @@ -262,6 +263,17 @@ void tick_nohz_stop_sched_tick(int inidle) seq = read_seqbegin(&xtime_lock); last_update = last_jiffies_update; last_jiffies = jiffies; + + /* + * On SMP we really should only care for the CPU which + * has the do_timer duty assigned. All other CPUs can + * sleep as long as they want. + */ + if (cpu == tick_do_timer_cpu || + tick_do_timer_cpu == TICK_DO_TIMER_NONE) + time_delta = timekeeping_max_deferment(); + else + time_delta = KTIME_MAX; } while (read_seqretry(&xtime_lock, seq)); if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu) || @@ -284,11 +296,26 @@ void tick_nohz_stop_sched_tick(int inidle) if ((long)delta_jiffies >= 1) { /* - * calculate the expiry time for the next timer wheel - * timer - */ - expires = ktime_add_ns(last_update, tick_period.tv64 * - delta_jiffies); + * calculate the expiry time for the next timer wheel + * timer. delta_jiffies >= NEXT_TIMER_MAX_DELTA signals + * that there is no timer pending or at least extremely + * far into the future (12 days for HZ=1000). In this + * case we set the expiry to the end of time. + */ + if (likely(delta_jiffies < NEXT_TIMER_MAX_DELTA)) { + /* + * Calculate the time delta for the next timer event. 
+ * If the time delta exceeds the maximum time delta + * permitted by the current clocksource then adjust + * the time delta accordingly to ensure the + * clocksource does not wrap. + */ + time_delta = min_t(u64, time_delta, + tick_period.tv64 * delta_jiffies); + expires = ktime_add_ns(last_update, time_delta); + } else { + expires.tv64 = KTIME_MAX; + } /* * If this cpu is the one which updates jiffies, then @@ -332,22 +359,19 @@ void tick_nohz_stop_sched_tick(int inidle) ts->idle_sleeps++; + /* Mark expires */ + ts->idle_expires = expires; + /* - * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that - * there is no timer pending or at least extremly far - * into the future (12 days for HZ=1000). In this case - * we simply stop the tick timer: + * If the expiration time == KTIME_MAX, then + * in this case we simply stop the tick timer. */ - if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) { - ts->idle_expires.tv64 = KTIME_MAX; + if (unlikely(expires.tv64 == KTIME_MAX)) { if (ts->nohz_mode == NOHZ_MODE_HIGHRES) hrtimer_cancel(&ts->sched_timer); goto out; } - /* Mark expiries */ - ts->idle_expires = expires; - if (ts->nohz_mode == NOHZ_MODE_HIGHRES) { hrtimer_start(&ts->sched_timer, expires, HRTIMER_MODE_ABS_PINNED); diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 96b3f0d..5d4d423 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -478,6 +478,17 @@ int timekeeping_valid_for_hres(void) } /** + * timekeeping_max_deferment - Returns max time the clocksource can be deferred + * + * Caller must observe xtime_lock via read_seqbegin/read_seqretry to + * ensure that the clocksource does not change! + */ +u64 timekeeping_max_deferment(void) +{ + return timekeeper.clock->max_idle_ns; +} + +/** * read_persistent_clock - Return time from the persistent clock. * * Weak dummy function for arches that do not yet support it. ^ permalink raw reply related [flat|nested] 29+ messages in thread
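The arithmetic behind clocksource_max_deferment() in the commit above can be checked outside the kernel. The sketch below is a hedged, stand-alone model: cyc2ns() and ilog2_u32() are minimal local stand-ins for the kernel helpers of similar names, and the example clocksource values used in the usage note (mult = 2^24, shift = 24, a 32-bit mask) are assumptions for illustration, not values from the patch. Note that a true 12.5% margin corresponds to subtracting max_nsecs >> 3.

```c
#include <stdint.h>

/* Minimal stand-in for the kernel's cycle-to-nanosecond conversion. */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/* Minimal stand-in for the kernel's ilog2(): floor(log2(v)). */
static int ilog2_u32(uint32_t v)
{
	int log = -1;

	while (v) {
		v >>= 1;
		log++;
	}
	return log;
}

static uint64_t max_deferment_ns(uint32_t mult, uint32_t shift, uint64_t mask)
{
	/*
	 * Largest cycle count whose product with mult stays below 2^63;
	 * the extra +1 on the log2 absorbs rounding, as in the patch.
	 */
	uint64_t max_cycles = 1ULL << (63 - (ilog2_u32(mult) + 1));
	uint64_t max_nsecs;

	/* The counter width also bounds how far we can defer. */
	if (max_cycles > mask)
		max_cycles = mask;
	max_nsecs = cyc2ns(max_cycles, mult, shift);

	/* A 12.5% safety margin, computed with a shift (>> 3) rather
	 * than a division. */
	return max_nsecs - (max_nsecs >> 3);
}
```

For a 32-bit counter ticking at 1 GHz (mult = 2^24, shift = 24, mask = 0xffffffff) the counter wraps after roughly 4.29 s, and the margin limits the idle time to about 3.76 s.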
* Re: [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit machines 2009-08-18 17:45 [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit machines Jon Hunter 2009-08-18 17:45 ` [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle Jon Hunter @ 2009-11-11 20:43 ` john stultz 2009-11-11 20:57 ` Jon Hunter 1 sibling, 1 reply; 29+ messages in thread From: john stultz @ 2009-11-11 20:43 UTC (permalink / raw) To: Jon Hunter; +Cc: linux-kernel, Thomas Gleixner On Tue, Aug 18, 2009 at 9:45 AM, Jon Hunter <jon-hunter@ti.com> wrote: > From: Jon Hunter <jon-hunter@ti.com> > > This is a resend of the patch series shown here: > http://www.spinics.net/lists/kernel/msg891029.html > > This patch series has been rebase on the linux-2.6-tip timers/core branch per > request from Thomas Gleixner. > > This patch series ensures that the wrapping of the clocksource will not be > missed if the kernel sleeps for longer periods and allows 32-bit machines to > sleep for longer than 2.15 seconds. > > Jon Hunter (2): > Dynamic Tick: Prevent clocksource wrapping during idle > Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 > seconds I could have sworn this was in mainline by now, but I recently was looking for the code and can't find it there or in -tip either. Thomas, are they just hiding somewhere I can't find? Jon, you've been terribly patient and great about resubmitting these patches over and over. If I'm not just being crazy and missing these patches in front of my nose, are you still willing to submit them again? I think they'll be quite useful as folks start pushing the NOHZ idle times out. thanks -john ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit machines 2009-11-11 20:43 ` [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit machines john stultz @ 2009-11-11 20:57 ` Jon Hunter 2009-11-11 22:37 ` john stultz 0 siblings, 1 reply; 29+ messages in thread From: Jon Hunter @ 2009-11-11 20:57 UTC (permalink / raw) To: john stultz; +Cc: linux-kernel, Thomas Gleixner john stultz wrote: > I could have sworn this was in mainline by now, but I recently was > looking for the code and can't find it there or in -tip either. > > Thomas, are they just hiding somewhere I can't find? > > Jon, you've been terribly patient and great about resubmitting these > patches over and over. If I'm not just being crazy and missing these > patches in front of my nose, are you still willing to submit them > again? I think they'll be quite useful as folks start pushing the NOHZ > idle times out. Absolutely! It is still on my to-do list, but unfortunately, I got busy with a couple other things. With regard to the last patch set I submitted for this, Thomas had an issue with one of the patches. I understand the concern, but I am not sure which would be the preferred way to handle this. See the below thread: http://marc.info/?l=linux-kernel&m=125062817124381&w=2 If you or Thomas have any feedback on this, I could re-work the patch against the latest kernel tree. Cheers Jon ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit machines 2009-11-11 20:57 ` Jon Hunter @ 2009-11-11 22:37 ` john stultz 0 siblings, 0 replies; 29+ messages in thread From: john stultz @ 2009-11-11 22:37 UTC (permalink / raw) To: Jon Hunter; +Cc: linux-kernel, Thomas Gleixner On Wed, 2009-11-11 at 14:57 -0600, Jon Hunter wrote: > john stultz wrote: > > I could have sworn this was in mainline by now, but I recently was > > looking for the code and can't find it there or in -tip either. > > > > Thomas, are they just hiding somewhere I can't find? > > > > Jon, you've been terribly patient and great about resubmitting these > > patches over and over. If I'm not just being crazy and missing these > > patches in front of my nose, are you still willing to submit them > > again? I think they'll be quite useful as folks start pushing the NOHZ > > idle times out. > > Absolutely! It is still on my to-do list, but unfortunately, I got busy > with a couple other things. > > With regard to the last patch set I submitted for this, Thomas had an > issue with one of the patches. I understand the concern, but I am not > sure which would be the preferred way to handle this. See the below thread: > > http://marc.info/?l=linux-kernel&m=125062817124381&w=2 > > If you or Thomas have any feedback on this, I could re-work the patch > against the latest kernel tree. Ok. I think Thomas is right there, setting the expiration to max_time_delta makes the most sense. Honestly I suspect we don't ever hit that case in the current code (no timers for 12 days), so it's probably an untested code path as it stands. thanks -john ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit
@ 2009-07-28 0:00 Jon Hunter
2009-07-28 0:00 ` [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle Jon Hunter
0 siblings, 1 reply; 29+ messages in thread
From: Jon Hunter @ 2009-07-28 0:00 UTC (permalink / raw)
To: linux-kernel; +Cc: Thomas Gleixner, John Stultz, Jon Hunter
From: Jon Hunter <jon-hunter@ti.com>
This is a resend of the patch series shown here:
http://www.spinics.net/lists/kernel/msg891029.html
This patch series has been updated based on the feedback received and
rebased against the current kernel.
This patch series ensures that the wrapping of the clocksource will not be
missed if the kernel sleeps for longer periods and allows 32-bit machines to
sleep for longer than 2.15 seconds.
Jon Hunter (2):
Dynamic Tick: Prevent clocksource wrapping during idle
Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15
seconds
include/linux/clockchips.h | 6 ++--
include/linux/clocksource.h | 2 +
include/linux/time.h | 1 +
kernel/hrtimer.c | 2 +-
kernel/time/clockevents.c | 10 ++++----
kernel/time/clocksource.c | 47 +++++++++++++++++++++++++++++++++++
kernel/time/tick-oneshot.c | 2 +-
kernel/time/tick-sched.c | 57 ++++++++++++++++++++++++++++++++----------
kernel/time/timekeeping.c | 11 ++++++++
kernel/time/timer_list.c | 4 +-
10 files changed, 116 insertions(+), 26 deletions(-)
^ permalink raw reply [flat|nested] 29+ messages in thread* [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-07-28 0:00 [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit Jon Hunter @ 2009-07-28 0:00 ` Jon Hunter 0 siblings, 0 replies; 29+ messages in thread From: Jon Hunter @ 2009-07-28 0:00 UTC (permalink / raw) To: linux-kernel; +Cc: Thomas Gleixner, John Stultz, Jon Hunter From: Jon Hunter <jon-hunter@ti.com> The dynamic tick allows the kernel to sleep for periods longer than a single tick. This patch prevents the kernel from sleeping for a period longer than the maximum time that the current clocksource can count. This ensures that the kernel will not lose track of time. This patch adds a function called "clocksource_max_deferment()" that calculates the maximum time the kernel can sleep for a given clocksource and a function called "timekeeping_max_deferment()" that returns the maximum time the kernel can sleep for the current clocksource. Signed-off-by: Jon Hunter <jon-hunter@ti.com> --- include/linux/clocksource.h | 2 + include/linux/time.h | 1 + kernel/time/clocksource.c | 47 +++++++++++++++++++++++++++++++++++ kernel/time/tick-sched.c | 57 ++++++++++++++++++++++++++++++++---------- kernel/time/timekeeping.c | 11 ++++++++ 5 files changed, 104 insertions(+), 14 deletions(-) diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index c56457c..5528090 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc, * @mult: cycle to nanosecond multiplier (adjusted by NTP) * @mult_orig: cycle to nanosecond multiplier (unadjusted by NTP) * @shift: cycle to nanosecond divisor (power of two) + * @max_idle_ns: max idle time permitted by the clocksource (nsecs) * @flags: flags describing special properties * @vread: vsyscall based read * @resume: resume function for the clocksource, if necessary @@ -171,6 +172,7 @@ struct
clocksource { u32 mult; u32 mult_orig; u32 shift; + s64 max_idle_ns; unsigned long flags; cycle_t (*vread)(void); void (*resume)(void); diff --git a/include/linux/time.h b/include/linux/time.h index ea16c1a..ddcff53 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -145,6 +145,7 @@ extern void monotonic_to_bootbased(struct timespec *ts); extern struct timespec timespec_trunc(struct timespec t, unsigned gran); extern int timekeeping_valid_for_hres(void); +extern s64 timekeeping_max_deferment(void); extern void update_wall_time(void); extern void update_xtime_cache(u64 nsec); diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 7466cb8..fa28f29 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -321,6 +321,50 @@ void clocksource_touch_watchdog(void) } /** + * clocksource_max_deferment - Returns max time the clocksource can be deferred + * @cs: Pointer to clocksource + * + */ +static s64 clocksource_max_deferment(struct clocksource *cs) +{ + s64 max_nsecs; + u64 max_cycles; + + /* + * Calculate the maximum number of cycles that we can pass to the + * cyc2ns function without overflowing a 64-bit signed result. The + * maximum number of cycles is equal to ULLONG_MAX/cs->mult which + * is equivalent to the below. + * max_cycles < (2^63)/cs->mult + * max_cycles < 2^(log2((2^63)/cs->mult)) + * max_cycles < 2^(log2(2^63) - log2(cs->mult)) + * max_cycles < 2^(63 - log2(cs->mult)) + * max_cycles < 1 << (63 - log2(cs->mult)) + * Please note that we add 1 to the result of the log2 to account for + * any rounding errors, ensure the above inequality is satisfied and + * no overflow will occur. + */ + max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1)); + + /* + * The actual maximum number of cycles we can defer the clocksource is + * determined by the minimum of max_cycles and cs->mask. 
+ */ + max_cycles = min(max_cycles, cs->mask); + max_nsecs = cyc2ns(cs, max_cycles); + + /* + * To ensure that the clocksource does not wrap whilst we are idle, + * limit the time the clocksource can be deferred by 12.5%. Please + * note a margin of 12.5% is used because this can be computed with + * a shift, versus say 10% which would require division. + */ + max_nsecs = max_nsecs - (max_nsecs >> 5); + + return max_nsecs; +} + +/** * clocksource_get_next - Returns the selected clocksource * */ @@ -402,6 +446,9 @@ int clocksource_register(struct clocksource *c) unsigned long flags; int ret; + /* calculate max idle time permitted for this clocksource */ + c->max_idle_ns = clocksource_max_deferment(c); + spin_lock_irqsave(&clocksource_lock, flags); ret = clocksource_enqueue(c); if (!ret) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index e0f59a2..7a98e90 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle) ktime_t last_update, expires, now; struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev; int cpu; + s64 time_delta, max_time_delta; local_irq_save(flags); @@ -270,6 +271,18 @@ void tick_nohz_stop_sched_tick(int inidle) seq = read_seqbegin(&xtime_lock); last_update = last_jiffies_update; last_jiffies = jiffies; + + /* + * On SMP we really should only care for the CPU which + * has the do_timer duty assigned. All other CPUs can + * sleep as long as they want. 
+ */ + if (cpu == tick_do_timer_cpu || + tick_do_timer_cpu == TICK_DO_TIMER_NONE) + max_time_delta = timekeeping_max_deferment(); + else + max_time_delta = KTIME_MAX; + } while (read_seqretry(&xtime_lock, seq)); /* Get the next timer wheel timer */ @@ -289,11 +302,30 @@ void tick_nohz_stop_sched_tick(int inidle) if ((long)delta_jiffies >= 1) { /* - * calculate the expiry time for the next timer wheel - * timer - */ - expires = ktime_add_ns(last_update, tick_period.tv64 * - delta_jiffies); + * calculate the expiry time for the next timer wheel + * timer. delta_jiffies >= NEXT_TIMER_MAX_DELTA signals + * that there is no timer pending or at least extremely + * far into the future (12 days for HZ=1000). In this + * case we set the expiry to the end of time. + */ + if (likely(delta_jiffies < NEXT_TIMER_MAX_DELTA)) { + + /* + * Calculate the time delta for the next timer event. + * If the time delta exceeds the maximum time delta + * permitted by the current clocksource then adjust + * the time delta accordingly to ensure the + * clocksource does not wrap. + */ + time_delta = tick_period.tv64 * delta_jiffies; + + if (time_delta > max_time_delta) + time_delta = max_time_delta; + + expires = ktime_add_ns(last_update, time_delta); + } else { + expires.tv64 = KTIME_MAX; + } /* * If this cpu is the one which updates jiffies, then @@ -337,22 +369,19 @@ void tick_nohz_stop_sched_tick(int inidle) ts->idle_sleeps++; + /* Mark expires */ + ts->idle_expires = expires; + /* - * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that - * there is no timer pending or at least extremly far - * into the future (12 days for HZ=1000). In this case - * we simply stop the tick timer: + * If the expiration time == KTIME_MAX, then + * in this case we simply stop the tick timer. 
*/ - if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) { - ts->idle_expires.tv64 = KTIME_MAX; + if (unlikely(expires.tv64 == KTIME_MAX)) { if (ts->nohz_mode == NOHZ_MODE_HIGHRES) hrtimer_cancel(&ts->sched_timer); goto out; } - /* Mark expiries */ - ts->idle_expires = expires; - if (ts->nohz_mode == NOHZ_MODE_HIGHRES) { hrtimer_start(&ts->sched_timer, expires, HRTIMER_MODE_ABS_PINNED); diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index e8c77d9..cd1b110 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -278,6 +278,17 @@ int timekeeping_valid_for_hres(void) } /** + * timekeeping_max_deferment - Returns max time the clocksource can be deferred + * + * IMPORTANT: Caller must observe xtime_lock via read_seqbegin/read_seqretry + * to ensure that the clocksource does not change! + */ +s64 timekeeping_max_deferment(void) +{ + return clock->max_idle_ns; +} + +/** * read_persistent_clock - Return time in seconds from the persistent clock. * * Weak dummy function for arches that do not yet support it. -- 1.6.0.4 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle
@ 2009-05-27 14:49 Jon Hunter
2009-05-27 16:01 ` Thomas Gleixner
` (2 more replies)
0 siblings, 3 replies; 29+ messages in thread
From: Jon Hunter @ 2009-05-27 14:49 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org; +Cc: john stultz, Thomas Gleixner, Ingo Molnar
The dynamic tick allows the kernel to sleep for periods longer than a
single tick. This patch prevents the kernel from sleeping for a
period longer than the maximum time that the current clocksource can
count. This ensures that the kernel will not lose track of time. This
patch adds a new function called "timekeeping_max_deferment()" that
calculates the maximum time the kernel can sleep for a given clocksource.
Signed-off-by: Jon Hunter <jon-hunter@ti.com>
---
include/linux/time.h | 1 +
kernel/time/tick-sched.c | 36 +++++++++++++++++++++++----------
kernel/time/timekeeping.c | 47 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 73 insertions(+), 11 deletions(-)
diff --git a/include/linux/time.h b/include/linux/time.h
index 242f624..090be07 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts);
extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
extern int timekeeping_valid_for_hres(void);
+extern s64 timekeeping_max_deferment(void);
extern void update_wall_time(void);
extern void update_xtime_cache(u64 nsec);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d3f1ef4..f0155ae 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle)
ktime_t last_update, expires, now;
struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
int cpu;
+ s64 time_delta, max_time_delta;
local_irq_save(flags);
@@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle)
seq = read_seqbegin(&xtime_lock);
last_update = last_jiffies_update;
last_jiffies = jiffies;
+ max_time_delta = timekeeping_max_deferment();
} while (read_seqretry(&xtime_lock, seq));
/* Get the next timer wheel timer */
@@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle)
if ((long)delta_jiffies >= 1) {
/*
- * calculate the expiry time for the next timer wheel
- * timer
- */
- expires = ktime_add_ns(last_update, tick_period.tv64 *
- delta_jiffies);
+ * Calculate the time delta for the next timer event.
+ * If the time delta exceeds the maximum time delta
+ * permitted by the current clocksource then adjust
+ * the time delta accordingly to ensure the
+ * clocksource does not wrap.
+ */
+ time_delta = tick_period.tv64 * delta_jiffies;
+
+ if (time_delta > max_time_delta)
+ time_delta = max_time_delta;
+
+ /*
+ * calculate the expiry time for the next timer wheel
+ * timer
+ */
+ expires = ktime_add_ns(last_update, time_delta);
/*
* If this cpu is the one which updates jiffies, then
@@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle)
if (cpu == tick_do_timer_cpu)
tick_do_timer_cpu = TICK_DO_TIMER_NONE;
- if (delta_jiffies > 1)
+ if (time_delta > tick_period.tv64)
cpumask_set_cpu(cpu, nohz_cpu_mask);
/* Skip reprogram of event if its not changed */
@@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle)
ts->idle_sleeps++;
/*
- * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
- * there is no timer pending or at least extremly far
- * into the future (12 days for HZ=1000). In this case
- * we simply stop the tick timer:
+ * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA)
+ * signals that there is no timer pending or at least
+ * extremely far into the future (12 days for HZ=1000).
+ * In this case we simply stop the tick timer:
*/
- if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
+ if (unlikely(time_delta >=
+ (tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) {
ts->idle_expires.tv64 = KTIME_MAX;
if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
hrtimer_cancel(&ts->sched_timer);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 687dff4..608fc6f 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -271,6 +271,53 @@ int timekeeping_valid_for_hres(void)
}
/**
+ * timekeeping_max_deferment - Returns max time the clocksource can be deferred
+ *
+ * IMPORTANT: Must be called with xtime_lock held!
+ */
+s64 timekeeping_max_deferment(void)
+{
+ s64 max_nsecs;
+ u64 max_cycles;
+
+ /*
+ * Calculate the maximum number of cycles that we can pass to the
+ * cyc2ns function without overflowing a 64-bit signed result. The
+ * maximum number of cycles is equal to ULLONG_MAX/clock->mult which
+ * is equivalent to the below.
+ * max_cycles < (2^63)/clock->mult
+ * max_cycles < 2^(log2((2^63)/clock->mult))
+ * max_cycles < 2^(log2(2^63) - log2(clock->mult))
+ * max_cycles < 2^(63 - log2(clock->mult))
+ * max_cycles < 1 << (63 - log2(clock->mult))
+ * Please note that we add 1 to the result of the log2 to account for
+ * any rounding errors, ensure the above inequality is satisfied and
+ * no overflow will occur.
+ */
+ max_cycles = 1ULL << (63 - (ilog2(clock->mult) + 1));
+
+ /*
+ * The actual maximum number of cycles we can defer the clocksource is
+ * determined by the minimum of max_cycles and clock->mask.
+ */
+ max_cycles = min(max_cycles, clock->mask);
+ max_nsecs = cyc2ns(clock, max_cycles);
+
+ /*
+ * To ensure that the clocksource does not wrap whilst we are idle,
+ * limit the time the clocksource can be deferred by 6.25%. Please
+ * note a margin of 6.25% is used because this can be computed with
+ * a shift, versus say 5% which would require division.
+ */
+ max_nsecs = max_nsecs - (max_nsecs >> 4);
+
+ if (max_nsecs < 0)
+ max_nsecs = 0;
+
+ return max_nsecs;
+}
+
+/**
* read_persistent_clock - Return time in seconds from the persistent clock.
*
* Weak dummy function for arches that do not yet support it.
--
1.6.1
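The bound derived in the patch comment above can be sanity-checked outside the kernel. The following userspace sketch re-implements the arithmetic only; ilog2_u32() is a hypothetical stand-in for the kernel's ilog2() helper, and nothing here is kernel code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch of the overflow bound computed in
 * timekeeping_max_deferment() above.  ilog2_u32() is a stand-in for
 * the kernel's ilog2() (floor of log2).
 */
static unsigned int ilog2_u32(uint32_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

static uint64_t max_cycles_bound(uint32_t mult)
{
	/*
	 * mult < 2^(ilog2(mult) + 1), so choosing
	 * max_cycles = 2^(63 - (ilog2(mult) + 1)) guarantees
	 * max_cycles * mult < 2^63, i.e. the product fits in a
	 * signed 64-bit result.
	 */
	return 1ULL << (63 - (ilog2_u32(mult) + 1));
}
```

For any mult, multiplying the returned cycle count by mult stays at or below INT64_MAX, which is exactly the property cyc2ns() needs.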
^ permalink raw reply related [flat|nested] 29+ messages in thread* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-27 14:49 Jon Hunter @ 2009-05-27 16:01 ` Thomas Gleixner 2009-05-27 20:20 ` john stultz 2009-05-27 18:15 ` john stultz 2009-05-27 20:54 ` Alok Kataria 2 siblings, 1 reply; 29+ messages in thread From: Thomas Gleixner @ 2009-05-27 16:01 UTC (permalink / raw) To: Jon Hunter; +Cc: linux-kernel@vger.kernel.org, john stultz, Ingo Molnar On Wed, 27 May 2009, Jon Hunter wrote: > /** > + * timekeeping_max_deferment - Returns max time the clocksource can be > deferred > + * > + * IMPORTANT: Must be called with xtime_lock held! No, that would mean that xtime_lock needs to be write locked. And we definitely do not want that. The caller needs to observe xtime_lock via read_seqbegin / read_seqretry because clock might change. > + */ > +s64 timekeeping_max_deferment(void) > +{ > + s64 max_nsecs; > + u64 max_cycles; > + > + /* > + * Calculate the maximum number of cycles that we can pass to the > + * cyc2ns function without overflowing a 64-bit signed result. The > + * maximum number of cycles is equal to ULLONG_MAX/clock->mult which > + * is equivalent to the below. > + * max_cycles < (2^63)/clock->mult > + * max_cycles < 2^(log2((2^63)/clock->mult)) > + * max_cycles < 2^(log2(2^63) - log2(clock->mult)) > + * max_cycles < 2^(63 - log2(clock->mult)) > + * max_cycles < 1 << (63 - log2(clock->mult)) > + * Please note that we add 1 to the result of the log2 to account for > + * any rounding errors, ensure the above inequality is satisfied and > + * no overflow will occur. > + */ > + max_cycles = 1ULL << (63 - (ilog2(clock->mult) + 1)); > + > + /* > + * The actual maximum number of cycles we can defer the clocksource is > + * determined by the minimum of max_cycles and clock->mask. > + */ > + max_cycles = min(max_cycles, clock->mask); > + max_nsecs = cyc2ns(clock, max_cycles); Why do you want to recalculate the whole stuff over and over ? 
That computation can be done when the clock source is initialized or any fundamental change of the clock parameters happens. Stick that value into the clocksource struct and just read it out. > + /* > + * To ensure that the clocksource does not wrap whilst we are idle, > + * limit the time the clocksource can be deferred by 6.25%. Please > + * note a margin of 6.25% is used because this can be computed with > + * a shift, versus say 5% which would require division. > + */ > + max_nsecs = max_nsecs - (max_nsecs >> 4); > + > + if (max_nsecs < 0) > + max_nsecs = 0; How does "max_nsecs = max_nsecs - (max_nsecs >> 4)" ever become negative ? Thanks, tglx ^ permalink raw reply [flat|nested] 29+ messages in thread
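One arithmetic note on the shift-based margin quoted above: a right shift by n removes a 1/2^n fraction, so `x - (x >> 4)` removes the 6.25% the comment describes, `x - (x >> 3)` would remove 12.5%, and `x - (x >> 5)` removes only 3.125%. A minimal sketch, where apply_margin() is an illustrative helper rather than a kernel function:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Trim a 1/2^shift fraction from a nanosecond value, the way the
 * patch trims its safety margin with a shift instead of a division:
 *   shift 4 -> 6.25%  removed
 *   shift 3 -> 12.5%  removed
 *   shift 5 -> 3.125% removed
 */
static int64_t apply_margin(int64_t max_nsecs, unsigned int shift)
{
	return max_nsecs - (max_nsecs >> shift);
}
```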
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-27 16:01 ` Thomas Gleixner @ 2009-05-27 20:20 ` john stultz 2009-05-27 20:32 ` Thomas Gleixner 0 siblings, 1 reply; 29+ messages in thread From: john stultz @ 2009-05-27 20:20 UTC (permalink / raw) To: Thomas Gleixner; +Cc: Jon Hunter, linux-kernel@vger.kernel.org, Ingo Molnar On Wed, 2009-05-27 at 18:01 +0200, Thomas Gleixner wrote: > On Wed, 27 May 2009, Jon Hunter wrote: > > + */ > > +s64 timekeeping_max_deferment(void) > > +{ > > + s64 max_nsecs; > > + u64 max_cycles; > > + > > + /* > > + * Calculate the maximum number of cycles that we can pass to the > > + * cyc2ns function without overflowing a 64-bit signed result. The > > + * maximum number of cycles is equal to ULLONG_MAX/clock->mult which > > + * is equivalent to the below. > > + * max_cycles < (2^63)/clock->mult > > + * max_cycles < 2^(log2((2^63)/clock->mult)) > > + * max_cycles < 2^(log2(2^63) - log2(clock->mult)) > > + * max_cycles < 2^(63 - log2(clock->mult)) > > + * max_cycles < 1 << (63 - log2(clock->mult)) > > + * Please note that we add 1 to the result of the log2 to account for > > + * any rounding errors, ensure the above inequality is satisfied and > > + * no overflow will occur. > > + */ > > + max_cycles = 1ULL << (63 - (ilog2(clock->mult) + 1)); > > + > > + /* > > + * The actual maximum number of cycles we can defer the clocksource is > > + * determined by the minimum of max_cycles and clock->mask. > > + */ > > + max_cycles = min(max_cycles, clock->mask); > > + max_nsecs = cyc2ns(clock, max_cycles); > > Why do you want to recalculate the whole stuff over and over ? > > That computation can be done when the clock source is initialized or > any fundamental change of the clock parameters happens. > > Stick that value into the clocksource struct and just read it out. Sigh. I was hoping to avoid hanging another bit of junk off of the clocksource struct. 
But I guess we could compute that value on registration and keep it around. Changes to mult could affect things, but should be well within the 6% safety net we give ourselves. > > + /* > > + * To ensure that the clocksource does not wrap whilst we are idle, > > + * limit the time the clocksource can be deferred by 6.25%. Please > > + * note a margin of 6.25% is used because this can be computed with > > + * a shift, versus say 5% which would require division. > > + */ > > + max_nsecs = max_nsecs - (max_nsecs >> 4); > > + > > + if (max_nsecs < 0) > > + max_nsecs = 0; > > How does "max_nsecs = max_nsecs - (max_nsecs >> 4)" ever become > negative ? Fair point. Now we've limited the overflow case, we shouldn't trip negative values. thanks -john ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-27 20:20 ` john stultz @ 2009-05-27 20:32 ` Thomas Gleixner 2009-05-28 20:21 ` Jon Hunter 0 siblings, 1 reply; 29+ messages in thread From: Thomas Gleixner @ 2009-05-27 20:32 UTC (permalink / raw) To: john stultz; +Cc: Jon Hunter, linux-kernel@vger.kernel.org, Ingo Molnar On Wed, 27 May 2009, john stultz wrote: > > Why do you want to recalculate the whole stuff over and over ? > > > > That computation can be done when the clock source is initialized or > > any fundamental change of the clock parameters happens. > > > > Stick that value into the clocksource struct and just read it out. > > Sigh. > > I was hoping to avoid hanging another bit of junk off of the clocksource > struct. Sure, but buying that 8 bytes with Einsteinian insanity is nuts. > But I guess we could compute that value on registration and keep it > around. Changes to mult could effect things, but should be well within > the 6% safety net we give ourselves. That was my thought as well, but even if we have to go to 12% it's way better than doing repeated nonsense on the way to idle. Thanks, tglx ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-27 20:32 ` Thomas Gleixner @ 2009-05-28 20:21 ` Jon Hunter 2009-05-28 20:36 ` Thomas Gleixner 0 siblings, 1 reply; 29+ messages in thread From: Jon Hunter @ 2009-05-28 20:21 UTC (permalink / raw) To: Thomas Gleixner; +Cc: john stultz, linux-kernel@vger.kernel.org, Ingo Molnar Thomas Gleixner wrote: > On Wed, 27 May 2009, john stultz wrote: >>> Why do you want to recalculate the whole stuff over and over ? >>> >>> That computation can be done when the clock source is initialized or >>> any fundamental change of the clock parameters happens. >>> >>> Stick that value into the clocksource struct and just read it out. >> Sigh. >> >> I was hoping to avoid hanging another bit of junk off of the clocksource >> struct. > > Sure, but buying that 8 bytes with Einsteinian insanity is nuts. Ok, I have re-worked the patch to avoid computing the value over and over and just use the original mult value to calculate the max deferment. This patch calculates the value on registering the clocksource and stores the value in the clocksource struct. >> But I guess we could compute that value on registration and keep it >> around. Changes to mult could effect things, but should be well within >> the 6% safety net we give ourselves. > > That was my thought as well, but even if we have to go to 12% it's way > better than doing repeated nonsense on the way to idle. For now I have modified the patch to go to a 12.5% margin to be on the safe side. Let me know your thoughts on the below. Cheers Jon The dynamic tick allows the kernel to sleep for periods longer than a single tick. This patch prevents the kernel from sleeping for a period longer than the maximum time that the current clocksource can count. This ensures that the kernel will not lose track of time. 
This patch adds a function called "clocksource_max_deferment()" that calculates the maximum time the kernel can sleep for a given clocksource and function called "timekeeping_max_deferment()" that returns maximum time the kernel can sleep for the current clocksource. Signed-off-by: Jon Hunter <jon-hunter@ti.com> --- include/linux/clocksource.h | 46 +++++++++++++++++++++++++++++++++++++++++++ include/linux/time.h | 1 + kernel/time/clocksource.c | 3 ++ kernel/time/tick-sched.c | 36 +++++++++++++++++++++++---------- kernel/time/timekeeping.c | 11 ++++++++++ 5 files changed, 86 insertions(+), 11 deletions(-) diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 5a40d14..b0d676e 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc, * @mult: cycle to nanosecond multiplier (adjusted by NTP) * @mult_orig: cycle to nanosecond multiplier (unadjusted by NTP) * @shift: cycle to nanosecond divisor (power of two) + * @max_idle_ns: max idle time permitted by the clocksource (nsecs) * @flags: flags describing special properties * @vread: vsyscall based read * @resume: resume function for the clocksource, if necessary @@ -171,6 +172,7 @@ struct clocksource { u32 mult; u32 mult_orig; u32 shift; + s64 max_idle_ns; unsigned long flags; cycle_t (*vread)(void); void (*resume)(void); @@ -322,6 +324,50 @@ static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles) } /** + * clocksource_max_deferment - Returns max time the clocksource can be deferred + * @cs: Pointer to clocksource + * + */ +static inline s64 clocksource_max_deferment(struct clocksource *cs) +{ + s64 max_nsecs; + u64 max_cycles; + + /* + * Calculate the maximum number of cycles that we can pass to the + * cyc2ns function without overflowing a 64-bit signed result. The + * maximum number of cycles is equal to ULLONG_MAX/cs->mult which + * is equivalent to the below. 
+ * max_cycles < (2^63)/cs->mult + * max_cycles < 2^(log2((2^63)/cs->mult)) + * max_cycles < 2^(log2(2^63) - log2(cs->mult)) + * max_cycles < 2^(63 - log2(cs->mult)) + * max_cycles < 1 << (63 - log2(cs->mult)) + * Please note that we add 1 to the result of the log2 to account for + * any rounding errors, ensure the above inequality is satisfied and + * no overflow will occur. + */ + max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1)); + + /* + * The actual maximum number of cycles we can defer the clocksource is + * determined by the minimum of max_cycles and cs->mask. + */ + max_cycles = min(max_cycles, cs->mask); + max_nsecs = cyc2ns(cs, max_cycles); + + /* + * To ensure that the clocksource does not wrap whilst we are idle, + * limit the time the clocksource can be deferred by 12.5%. Please + * note a margin of 12.5% is used because this can be computed with + * a shift, versus say 10% which would require division. + */ + max_nsecs = max_nsecs - (max_nsecs >> 5); + + return max_nsecs; +} + +/** * clocksource_calculate_interval - Calculates a clocksource interval struct * * @c: Pointer to clocksource. 
diff --git a/include/linux/time.h b/include/linux/time.h index 242f624..090be07 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts); extern struct timespec timespec_trunc(struct timespec t, unsigned gran); extern int timekeeping_valid_for_hres(void); +extern s64 timekeeping_max_deferment(void); extern void update_wall_time(void); extern void update_xtime_cache(u64 nsec); diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index ecfd7b5..0d98dc2 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -405,6 +405,9 @@ int clocksource_register(struct clocksource *c) /* save mult_orig on registration */ c->mult_orig = c->mult; + /* calculate max idle time permitted for this clocksource */ + c->max_idle_ns = clocksource_max_deferment(c); + spin_lock_irqsave(&clocksource_lock, flags); ret = clocksource_enqueue(c); if (!ret) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index d3f1ef4..f0155ae 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle) ktime_t last_update, expires, now; struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev; int cpu; + s64 time_delta, max_time_delta; local_irq_save(flags); @@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle) seq = read_seqbegin(&xtime_lock); last_update = last_jiffies_update; last_jiffies = jiffies; + max_time_delta = timekeeping_max_deferment(); } while (read_seqretry(&xtime_lock, seq)); /* Get the next timer wheel timer */ @@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle) if ((long)delta_jiffies >= 1) { /* - * calculate the expiry time for the next timer wheel - * timer - */ - expires = ktime_add_ns(last_update, tick_period.tv64 * - delta_jiffies); + * Calculate the time delta for the next timer event. 
+ * If the time delta exceeds the maximum time delta + * permitted by the current clocksource then adjust + * the time delta accordingly to ensure the + * clocksource does not wrap. + */ + time_delta = tick_period.tv64 * delta_jiffies; + + if (time_delta > max_time_delta) + time_delta = max_time_delta; + + /* + * calculate the expiry time for the next timer wheel + * timer + */ + expires = ktime_add_ns(last_update, time_delta); /* * If this cpu is the one which updates jiffies, then @@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle) if (cpu == tick_do_timer_cpu) tick_do_timer_cpu = TICK_DO_TIMER_NONE; - if (delta_jiffies > 1) + if (time_delta > tick_period.tv64) cpumask_set_cpu(cpu, nohz_cpu_mask); /* Skip reprogram of event if its not changed */ @@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle) ts->idle_sleeps++; /* - * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that - * there is no timer pending or at least extremly far - * into the future (12 days for HZ=1000). In this case - * we simply stop the tick timer: + * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA) + * signals that there is no timer pending or at least + * extremely far into the future (12 days for HZ=1000). + * In this case we simply stop the tick timer: */ - if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) { + if (unlikely(time_delta >= + (tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) { ts->idle_expires.tv64 = KTIME_MAX; if (ts->nohz_mode == NOHZ_MODE_HIGHRES) hrtimer_cancel(&ts->sched_timer); diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 687dff4..659cae3 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -271,6 +271,17 @@ int timekeeping_valid_for_hres(void) } /** + * timekeeping_max_deferment - Returns max time the clocksource can be deferred + * + * IMPORTANT: Caller must observe xtime_lock via read_seqbegin/read_seqretry + * to ensure that the clocksource does not change! 
+ */ +s64 timekeeping_max_deferment(void) +{ + return clock->max_idle_ns; +} + +/** * read_persistent_clock - Return time in seconds from the persistent clock. * * Weak dummy function for arches that do not yet support it. -- 1.6.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
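The read_seqbegin()/read_seqretry() pattern that this version adopts (and that Thomas asked for earlier in the thread) can be modelled in a few lines. This is a hedged, single-threaded illustration of the control flow only; the real kernel primitives additionally spin while a writer is active and insert memory barriers:

```c
#include <assert.h>

/*
 * Minimal model of the seqlock read pattern: the reader retries
 * whenever a writer was active at the start (odd sequence) or the
 * sequence count changed during the read.
 */
struct seqlock_model {
	unsigned int sequence;	/* even: idle, odd: writer active */
	long value;		/* stands in for clock->max_idle_ns */
};

static unsigned int model_read_begin(const struct seqlock_model *sl)
{
	/* the kernel version spins here while a writer is active */
	return sl->sequence;
}

static int model_read_retry(const struct seqlock_model *sl, unsigned int start)
{
	return (start & 1) || sl->sequence != start;
}

static long read_consistent(const struct seqlock_model *sl)
{
	unsigned int seq;
	long v;

	do {
		seq = model_read_begin(sl);
		v = sl->value;
	} while (model_read_retry(sl, seq));
	return v;
}
```

This is the same shape as the do/while loop in tick_nohz_stop_sched_tick() above, which now samples max_time_delta inside the retry loop so a concurrent clocksource change cannot be missed.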
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-28 20:21 ` Jon Hunter @ 2009-05-28 20:36 ` Thomas Gleixner 2009-05-28 21:10 ` Jon Hunter 0 siblings, 1 reply; 29+ messages in thread From: Thomas Gleixner @ 2009-05-28 20:36 UTC (permalink / raw) To: Jon Hunter; +Cc: john stultz, linux-kernel@vger.kernel.org, Ingo Molnar On Thu, 28 May 2009, Jon Hunter wrote: > /** > + * clocksource_max_deferment - Returns max time the clocksource can be > deferred > + * @cs: Pointer to clocksource > + * > + */ > +static inline s64 clocksource_max_deferment(struct clocksource *cs) Please make this a real function. There is no reason to stick this into a header file. The only user is clocksource.c anyway, so please put it there as a static function and let the compiler decide what to do with it. Otherwise, I'm happy with it. Thanks, tglx ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-28 20:36 ` Thomas Gleixner @ 2009-05-28 21:10 ` Jon Hunter 2009-05-28 21:43 ` John Stultz ` (2 more replies) 0 siblings, 3 replies; 29+ messages in thread From: Jon Hunter @ 2009-05-28 21:10 UTC (permalink / raw) To: Thomas Gleixner; +Cc: john stultz, linux-kernel@vger.kernel.org, Ingo Molnar Thomas Gleixner wrote: > Please make this a real function. There is no reason to stick this > into a header file. The only user is clocksource.c anyway, so please > put it there as a static function and let the compiler decide what > to do with it. No problem. Please see below. Let me know if this is ok and there is anything else. Cheers Jon The dynamic tick allows the kernel to sleep for periods longer than a single tick. This patch prevents the kernel from sleeping for a period longer than the maximum time that the current clocksource can count. This ensures that the kernel will not lose track of time. This patch adds a function called "clocksource_max_deferment()" that calculates the maximum time the kernel can sleep for a given clocksource and a function called "timekeeping_max_deferment()" that returns the maximum time the kernel can sleep for the current clocksource. 
Signed-off-by: Jon Hunter <jon-hunter@ti.com> --- include/linux/clocksource.h | 2 + include/linux/time.h | 1 + kernel/time/clocksource.c | 47 +++++++++++++++++++++++++++++++++++++++++++ kernel/time/tick-sched.c | 36 ++++++++++++++++++++++---------- kernel/time/timekeeping.c | 11 ++++++++++ 5 files changed, 86 insertions(+), 11 deletions(-) diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 5a40d14..465af22 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc, * @mult: cycle to nanosecond multiplier (adjusted by NTP) * @mult_orig: cycle to nanosecond multiplier (unadjusted by NTP) * @shift: cycle to nanosecond divisor (power of two) + * @max_idle_ns: max idle time permitted by the clocksource (nsecs) * @flags: flags describing special properties * @vread: vsyscall based read * @resume: resume function for the clocksource, if necessary @@ -171,6 +172,7 @@ struct clocksource { u32 mult; u32 mult_orig; u32 shift; + s64 max_idle_ns; unsigned long flags; cycle_t (*vread)(void); void (*resume)(void); diff --git a/include/linux/time.h b/include/linux/time.h index 242f624..090be07 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts); extern struct timespec timespec_trunc(struct timespec t, unsigned gran); extern int timekeeping_valid_for_hres(void); +extern s64 timekeeping_max_deferment(void); extern void update_wall_time(void); extern void update_xtime_cache(u64 nsec); diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index ecfd7b5..18d2b9f 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -321,6 +321,50 @@ void clocksource_touch_watchdog(void) } /** + * clocksource_max_deferment - Returns max time the clocksource can be deferred + * @cs: Pointer to clocksource + * + */ +static s64 clocksource_max_deferment(struct clocksource 
*cs) +{ + s64 max_nsecs; + u64 max_cycles; + + /* + * Calculate the maximum number of cycles that we can pass to the + * cyc2ns function without overflowing a 64-bit signed result. The + * maximum number of cycles is equal to ULLONG_MAX/cs->mult which + * is equivalent to the below. + * max_cycles < (2^63)/cs->mult + * max_cycles < 2^(log2((2^63)/cs->mult)) + * max_cycles < 2^(log2(2^63) - log2(cs->mult)) + * max_cycles < 2^(63 - log2(cs->mult)) + * max_cycles < 1 << (63 - log2(cs->mult)) + * Please note that we add 1 to the result of the log2 to account for + * any rounding errors, ensure the above inequality is satisfied and + * no overflow will occur. + */ + max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1)); + + /* + * The actual maximum number of cycles we can defer the clocksource is + * determined by the minimum of max_cycles and cs->mask. + */ + max_cycles = min(max_cycles, cs->mask); + max_nsecs = cyc2ns(cs, max_cycles); + + /* + * To ensure that the clocksource does not wrap whilst we are idle, + * limit the time the clocksource can be deferred by 12.5%. Please + * note a margin of 12.5% is used because this can be computed with + * a shift, versus say 10% which would require division. 
+ */ + max_nsecs = max_nsecs - (max_nsecs >> 5); + + return max_nsecs; +} + +/** * clocksource_get_next - Returns the selected clocksource * */ @@ -405,6 +449,9 @@ int clocksource_register(struct clocksource *c) /* save mult_orig on registration */ c->mult_orig = c->mult; + /* calculate max idle time permitted for this clocksource */ + c->max_idle_ns = clocksource_max_deferment(c); + spin_lock_irqsave(&clocksource_lock, flags); ret = clocksource_enqueue(c); if (!ret) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index d3f1ef4..f0155ae 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle) ktime_t last_update, expires, now; struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev; int cpu; + s64 time_delta, max_time_delta; local_irq_save(flags); @@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle) seq = read_seqbegin(&xtime_lock); last_update = last_jiffies_update; last_jiffies = jiffies; + max_time_delta = timekeeping_max_deferment(); } while (read_seqretry(&xtime_lock, seq)); /* Get the next timer wheel timer */ @@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle) if ((long)delta_jiffies >= 1) { /* - * calculate the expiry time for the next timer wheel - * timer - */ - expires = ktime_add_ns(last_update, tick_period.tv64 * - delta_jiffies); + * Calculate the time delta for the next timer event. + * If the time delta exceeds the maximum time delta + * permitted by the current clocksource then adjust + * the time delta accordingly to ensure the + * clocksource does not wrap. 
+ */ + time_delta = tick_period.tv64 * delta_jiffies; + + if (time_delta > max_time_delta) + time_delta = max_time_delta; + + /* + * calculate the expiry time for the next timer wheel + * timer + */ + expires = ktime_add_ns(last_update, time_delta); /* * If this cpu is the one which updates jiffies, then @@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle) if (cpu == tick_do_timer_cpu) tick_do_timer_cpu = TICK_DO_TIMER_NONE; - if (delta_jiffies > 1) + if (time_delta > tick_period.tv64) cpumask_set_cpu(cpu, nohz_cpu_mask); /* Skip reprogram of event if its not changed */ @@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle) ts->idle_sleeps++; /* - * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that - * there is no timer pending or at least extremly far - * into the future (12 days for HZ=1000). In this case - * we simply stop the tick timer: + * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA) + * signals that there is no timer pending or at least + * extremely far into the future (12 days for HZ=1000). + * In this case we simply stop the tick timer: */ - if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) { + if (unlikely(time_delta >= + (tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) { ts->idle_expires.tv64 = KTIME_MAX; if (ts->nohz_mode == NOHZ_MODE_HIGHRES) hrtimer_cancel(&ts->sched_timer); diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 687dff4..659cae3 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -271,6 +271,17 @@ int timekeeping_valid_for_hres(void) } /** + * timekeeping_max_deferment - Returns max time the clocksource can be deferred + * + * IMPORTANT: Caller must observe xtime_lock via read_seqbegin/read_seqretry + * to ensure that the clocksource does not change! + */ +s64 timekeeping_max_deferment(void) +{ + return clock->max_idle_ns; +} + +/** * read_persistent_clock - Return time in seconds from the persistent clock. 
* * Weak dummy function for arches that do not yet support it. -- 1.6.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
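Stripped of kernel context, the tick-sched change in the patch above reduces to a simple clamp: compute the requested sleep length from the timer wheel as before, then cap it at the clocksource limit. A sketch with illustrative names (clamp_sleep_ns() is not a kernel function):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative reduction of the tick_nohz_stop_sched_tick() change:
 * the requested sleep length (in ns) is capped at the clocksource's
 * max deferment so the free-running counter cannot wrap while the
 * CPU is idle.
 */
static int64_t clamp_sleep_ns(int64_t tick_period_ns, long delta_jiffies,
			      int64_t max_time_delta)
{
	int64_t time_delta = tick_period_ns * delta_jiffies;

	if (time_delta > max_time_delta)
		time_delta = max_time_delta;
	return time_delta;
}
```

For example, with HZ=1000 (a 1 ms tick) and a clocksource limited to about 2.15 s, a 5000-jiffy idle request is cut back to the clocksource limit, while a 1000-jiffy request passes through unchanged.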
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-28 21:10 ` Jon Hunter @ 2009-05-28 21:43 ` John Stultz 2009-05-28 22:16 ` Thomas Gleixner 2009-05-30 1:00 ` Jon Hunter 2 siblings, 0 replies; 29+ messages in thread From: John Stultz @ 2009-05-28 21:43 UTC (permalink / raw) To: Jon Hunter; +Cc: Thomas Gleixner, linux-kernel@vger.kernel.org, Ingo Molnar On Thu, 2009-05-28 at 16:10 -0500, Jon Hunter wrote: > Thomas Gleixner wrote: > > Please make this a real function. There is no reason to stick this > > into a header file. The only user is clocksource.c anyway, so please > > put it there as a static function and let the compiler decide what > > to do with it. > > No problem. Please see below. Let me know if this is ok and there is > anything else. > > Cheers > Jon > > The dynamic tick allows the kernel to sleep for periods longer > than a single tick. This patch prevents that the kernel from > sleeping for a period longer than the maximum time that the > current clocksource can count. This ensures that the kernel will > not lose track of time. This patch adds a function called > "clocksource_max_deferment()" that calculates the maximum time the > kernel can sleep for a given clocksource and function called > "timekeeping_max_deferment()" that returns maximum time the kernel > can sleep for the current clocksource. > > Signed-off-by: Jon Hunter <jon-hunter@ti.com> Thanks for putting up with my apparent misdirections and going around and around on this. 
:) Acked-by: John Stultz <johnstul@us.ibm.com> > --- > include/linux/clocksource.h | 2 + > include/linux/time.h | 1 + > kernel/time/clocksource.c | 47 > +++++++++++++++++++++++++++++++++++++++++++ > kernel/time/tick-sched.c | 36 ++++++++++++++++++++++---------- > kernel/time/timekeeping.c | 11 ++++++++++ > 5 files changed, 86 insertions(+), 11 deletions(-) > > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h > index 5a40d14..465af22 100644 > --- a/include/linux/clocksource.h > +++ b/include/linux/clocksource.h > @@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc, > * @mult: cycle to nanosecond multiplier (adjusted by NTP) > * @mult_orig: cycle to nanosecond multiplier (unadjusted by NTP) > * @shift: cycle to nanosecond divisor (power of two) > + * @max_idle_ns: max idle time permitted by the clocksource (nsecs) > * @flags: flags describing special properties > * @vread: vsyscall based read > * @resume: resume function for the clocksource, if necessary > @@ -171,6 +172,7 @@ struct clocksource { > u32 mult; > u32 mult_orig; > u32 shift; > + s64 max_idle_ns; > unsigned long flags; > cycle_t (*vread)(void); > void (*resume)(void); > diff --git a/include/linux/time.h b/include/linux/time.h > index 242f624..090be07 100644 > --- a/include/linux/time.h > +++ b/include/linux/time.h > @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts); > > extern struct timespec timespec_trunc(struct timespec t, unsigned gran); > extern int timekeeping_valid_for_hres(void); > +extern s64 timekeeping_max_deferment(void); > extern void update_wall_time(void); > extern void update_xtime_cache(u64 nsec); > > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c > index ecfd7b5..18d2b9f 100644 > --- a/kernel/time/clocksource.c > +++ b/kernel/time/clocksource.c > @@ -321,6 +321,50 @@ void clocksource_touch_watchdog(void) > } > > /** > + * clocksource_max_deferment - Returns max time the clocksource can be > 
deferred > + * @cs: Pointer to clocksource > + * > + */ > +static s64 clocksource_max_deferment(struct clocksource *cs) > +{ > + s64 max_nsecs; > + u64 max_cycles; > + > + /* > + * Calculate the maximum number of cycles that we can pass to the > + * cyc2ns function without overflowing a 64-bit signed result. The > + * maximum number of cycles is equal to ULLONG_MAX/cs->mult which > + * is equivalent to the below. > + * max_cycles < (2^63)/cs->mult > + * max_cycles < 2^(log2((2^63)/cs->mult)) > + * max_cycles < 2^(log2(2^63) - log2(cs->mult)) > + * max_cycles < 2^(63 - log2(cs->mult)) > + * max_cycles < 1 << (63 - log2(cs->mult)) > + * Please note that we add 1 to the result of the log2 to account for > + * any rounding errors, ensure the above inequality is satisfied and > + * no overflow will occur. > + */ > + max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1)); > + > + /* > + * The actual maximum number of cycles we can defer the clocksource is > + * determined by the minimum of max_cycles and cs->mask. > + */ > + max_cycles = min(max_cycles, cs->mask); > + max_nsecs = cyc2ns(cs, max_cycles); > + > + /* > + * To ensure that the clocksource does not wrap whilst we are idle, > + * limit the time the clocksource can be deferred by 12.5%. Please > + * note a margin of 12.5% is used because this can be computed with > + * a shift, versus say 10% which would require division. 
> + */ > + max_nsecs = max_nsecs - (max_nsecs >> 5); > + > + return max_nsecs; > +} > + > +/** > * clocksource_get_next - Returns the selected clocksource > * > */ > @@ -405,6 +449,9 @@ int clocksource_register(struct clocksource *c) > /* save mult_orig on registration */ > c->mult_orig = c->mult; > > + /* calculate max idle time permitted for this clocksource */ > + c->max_idle_ns = clocksource_max_deferment(c); > + > spin_lock_irqsave(&clocksource_lock, flags); > ret = clocksource_enqueue(c); > if (!ret) > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > index d3f1ef4..f0155ae 100644 > --- a/kernel/time/tick-sched.c > +++ b/kernel/time/tick-sched.c > @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle) > ktime_t last_update, expires, now; > struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev; > int cpu; > + s64 time_delta, max_time_delta; > > local_irq_save(flags); > > @@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle) > seq = read_seqbegin(&xtime_lock); > last_update = last_jiffies_update; > last_jiffies = jiffies; > + max_time_delta = timekeeping_max_deferment(); > } while (read_seqretry(&xtime_lock, seq)); > > /* Get the next timer wheel timer */ > @@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle) > if ((long)delta_jiffies >= 1) { > > /* > - * calculate the expiry time for the next timer wheel > - * timer > - */ > - expires = ktime_add_ns(last_update, tick_period.tv64 * > - delta_jiffies); > + * Calculate the time delta for the next timer event. > + * If the time delta exceeds the maximum time delta > + * permitted by the current clocksource then adjust > + * the time delta accordingly to ensure the > + * clocksource does not wrap. 
> + */ > + time_delta = tick_period.tv64 * delta_jiffies; > + > + if (time_delta > max_time_delta) > + time_delta = max_time_delta; > + > + /* > + * calculate the expiry time for the next timer wheel > + * timer > + */ > + expires = ktime_add_ns(last_update, time_delta); > > /* > * If this cpu is the one which updates jiffies, then > @@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle) > if (cpu == tick_do_timer_cpu) > tick_do_timer_cpu = TICK_DO_TIMER_NONE; > > - if (delta_jiffies > 1) > + if (time_delta > tick_period.tv64) > cpumask_set_cpu(cpu, nohz_cpu_mask); > > /* Skip reprogram of event if its not changed */ > @@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle) > ts->idle_sleeps++; > > /* > - * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that > - * there is no timer pending or at least extremly far > - * into the future (12 days for HZ=1000). In this case > - * we simply stop the tick timer: > + * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA) > + * signals that there is no timer pending or at least > + * extremely far into the future (12 days for HZ=1000). > + * In this case we simply stop the tick timer: > */ > - if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) { > + if (unlikely(time_delta >= > + (tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) { > ts->idle_expires.tv64 = KTIME_MAX; > if (ts->nohz_mode == NOHZ_MODE_HIGHRES) > hrtimer_cancel(&ts->sched_timer); > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c > index 687dff4..659cae3 100644 > --- a/kernel/time/timekeeping.c > +++ b/kernel/time/timekeeping.c > @@ -271,6 +271,17 @@ int timekeeping_valid_for_hres(void) > } > > /** > + * timekeeping_max_deferment - Returns max time the clocksource can be > deferred > + * > + * IMPORTANT: Caller must observe xtime_lock via > read_seqbegin/read_seqretry > + * to ensure that the clocksource does not change! 
> + */ > +s64 timekeeping_max_deferment(void) > +{ > + return clock->max_idle_ns; > +} > + > +/** > * read_persistent_clock - Return time in seconds from the persistent > clock. > * > * Weak dummy function for arches that do not yet support it. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-28 21:10 ` Jon Hunter 2009-05-28 21:43 ` John Stultz @ 2009-05-28 22:16 ` Thomas Gleixner 2009-05-29 19:43 ` Jon Hunter 2009-05-30 1:00 ` Jon Hunter 2 siblings, 1 reply; 29+ messages in thread From: Thomas Gleixner @ 2009-05-28 22:16 UTC (permalink / raw) To: Jon Hunter; +Cc: john stultz, linux-kernel@vger.kernel.org, Ingo Molnar On Thu, 28 May 2009, Jon Hunter wrote: > Thomas Gleixner wrote: > > Please make this a real function. There is no reason to stick this > > into a header file. The only user is clocksource.c anyway, so please > > put it there as a static function and let the compiler decide what > > to do with it. > > No problem. Please see below. Let me know if this is ok and there is anything > else. Looks good now. > /** > + * timekeeping_max_deferment - Returns max time the clocksource can be > deferred > + * > + * IMPORTANT: Caller must observe xtime_lock via read_seqbegin/read_seqretry > + * to ensure that the clocksource does not change! > + */ Just nitpicking here. For the intended use case this is irrelevant. On UP this is called from an irq disabled section, so nothing is going to change the clock source. On SMP it does not matter if CPU A goes to sleep with the old clock source and CPU B changes the clock source while A is idle. When B goes idle it will take the change into account. But that leads me to an interesting observation: On SMP we really should only care for the CPU which has the do_timer duty assigned. All other CPUs can sleep as long as they want. When that CPU goes idle and drops the do_timer duty it needs to look at max_deferment, but the others can sleep as long as they want. So the rule would be: if (cpu == tick_do_timer_cpu || tick_do_timer_cpu == TICK_DO_TIMER_NONE) check_max_deferment(); else sleep_as_long_as_you_want; Could you add that perhaps? 
> +s64 timekeeping_max_deferment(void) > +{ > + return clock->max_idle_ns; > +} > + Thanks for your patience, tglx ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-28 22:16 ` Thomas Gleixner @ 2009-05-29 19:43 ` Jon Hunter 0 siblings, 0 replies; 29+ messages in thread From: Jon Hunter @ 2009-05-29 19:43 UTC (permalink / raw) To: Thomas Gleixner; +Cc: john stultz, linux-kernel@vger.kernel.org, Ingo Molnar Thomas Gleixner wrote: > On Thu, 28 May 2009, Jon Hunter wrote: >> /** >> + * timekeeping_max_deferment - Returns max time the clocksource can be >> deferred >> + * >> + * IMPORTANT: Caller must observe xtime_lock via read_seqbegin/read_seqretry >> + * to ensure that the clocksource does not change! >> + */ > > Just nitpicking here. For the intended use case this is irrelevant. > > On UP this is called from an irq disabled section, so nothing is > going to change the clock source. > > On SMP it does not matter if CPU A goes to sleep with the old clock > source and CPU B changes the clock source while A is idle. When B > goes idle it will take the change into account. Ok, understood. Let me know if you would like me to remove the comment above. I wanted to make sure that if someone were to use this function elsewhere (can't think of why right now) they would not overlook this. > But that leads me to an interesting observation: > > On SMP we really should only care for the CPU which has the do_timer > duty assigned. All other CPUs can sleep as long as they want. When > that CPU goes idle and drops the do_timer duty it needs to look at > max_deferement, but the others can sleep as long as they want. > > So the rule would be: > > if (cpu == tick_do_timer_cpu || tick_do_timer_cpu == TICK_DO_TIMER_NONE) > check_max_deferment(); > else > sleep_as_long_as_you_want; > > Could you add that perhaps ? Absolutely. Please see below and let me know if this is ok. > Thanks for your patience, No problem. Thanks for the feedback. Cheers Jon The dynamic tick allows the kernel to sleep for periods longer than a single tick. 
This patch prevents that the kernel from sleeping for a period longer than the maximum time that the current clocksource can count. This ensures that the kernel will not lose track of time. This patch adds a function called "clocksource_max_deferment()" that calculates the maximum time the kernel can sleep for a given clocksource and function called "timekeeping_max_deferment()" that returns maximum time the kernel can sleep for the current clocksource. Signed-off-by: Jon Hunter <jon-hunter@ti.com> --- include/linux/clocksource.h | 2 + include/linux/time.h | 1 + kernel/time/clocksource.c | 47 +++++++++++++++++++++++++++++++++++++++++++ kernel/time/tick-sched.c | 47 ++++++++++++++++++++++++++++++++---------- kernel/time/timekeeping.c | 11 ++++++++++ 5 files changed, 97 insertions(+), 11 deletions(-) diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 5a40d14..465af22 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc, * @mult: cycle to nanosecond multiplier (adjusted by NTP) * @mult_orig: cycle to nanosecond multiplier (unadjusted by NTP) * @shift: cycle to nanosecond divisor (power of two) + * @max_idle_ns: max idle time permitted by the clocksource (nsecs) * @flags: flags describing special properties * @vread: vsyscall based read * @resume: resume function for the clocksource, if necessary @@ -171,6 +172,7 @@ struct clocksource { u32 mult; u32 mult_orig; u32 shift; + s64 max_idle_ns; unsigned long flags; cycle_t (*vread)(void); void (*resume)(void); diff --git a/include/linux/time.h b/include/linux/time.h index 242f624..090be07 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts); extern struct timespec timespec_trunc(struct timespec t, unsigned gran); extern int timekeeping_valid_for_hres(void); +extern s64 timekeeping_max_deferment(void); extern void 
update_wall_time(void); extern void update_xtime_cache(u64 nsec); diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index ecfd7b5..18d2b9f 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -321,6 +321,50 @@ void clocksource_touch_watchdog(void) } /** + * clocksource_max_deferment - Returns max time the clocksource can be deferred + * @cs: Pointer to clocksource + * + */ +static s64 clocksource_max_deferment(struct clocksource *cs) +{ + s64 max_nsecs; + u64 max_cycles; + + /* + * Calculate the maximum number of cycles that we can pass to the + * cyc2ns function without overflowing a 64-bit signed result. The + * maximum number of cycles is equal to ULLONG_MAX/cs->mult which + * is equivalent to the below. + * max_cycles < (2^63)/cs->mult + * max_cycles < 2^(log2((2^63)/cs->mult)) + * max_cycles < 2^(log2(2^63) - log2(cs->mult)) + * max_cycles < 2^(63 - log2(cs->mult)) + * max_cycles < 1 << (63 - log2(cs->mult)) + * Please note that we add 1 to the result of the log2 to account for + * any rounding errors, ensure the above inequality is satisfied and + * no overflow will occur. + */ + max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1)); + + /* + * The actual maximum number of cycles we can defer the clocksource is + * determined by the minimum of max_cycles and cs->mask. + */ + max_cycles = min(max_cycles, cs->mask); + max_nsecs = cyc2ns(cs, max_cycles); + + /* + * To ensure that the clocksource does not wrap whilst we are idle, + * limit the time the clocksource can be deferred by 12.5%. Please + * note a margin of 12.5% is used because this can be computed with + * a shift, versus say 10% which would require division. 
+ */ + max_nsecs = max_nsecs - (max_nsecs >> 5); + + return max_nsecs; +} + +/** * clocksource_get_next - Returns the selected clocksource * */ @@ -405,6 +449,9 @@ int clocksource_register(struct clocksource *c) /* save mult_orig on registration */ c->mult_orig = c->mult; + /* calculate max idle time permitted for this clocksource */ + c->max_idle_ns = clocksource_max_deferment(c); + spin_lock_irqsave(&clocksource_lock, flags); ret = clocksource_enqueue(c); if (!ret) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index d3f1ef4..318cf8a 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle) ktime_t last_update, expires, now; struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev; int cpu; + s64 time_delta, max_time_delta; local_irq_save(flags); @@ -264,6 +265,18 @@ void tick_nohz_stop_sched_tick(int inidle) seq = read_seqbegin(&xtime_lock); last_update = last_jiffies_update; last_jiffies = jiffies; + + /* + * On SMP we really should only care for the CPU which + * has the do_timer duty assigned. All other CPUs can + * sleep as long as they want. + */ + if (cpu == tick_do_timer_cpu || + tick_do_timer_cpu == TICK_DO_TIMER_NONE) + max_time_delta = timekeeping_max_deferment(); + else + max_time_delta = KTIME_MAX; + } while (read_seqretry(&xtime_lock, seq)); /* Get the next timer wheel timer */ @@ -283,11 +296,22 @@ void tick_nohz_stop_sched_tick(int inidle) if ((long)delta_jiffies >= 1) { /* - * calculate the expiry time for the next timer wheel - * timer - */ - expires = ktime_add_ns(last_update, tick_period.tv64 * - delta_jiffies); + * Calculate the time delta for the next timer event. + * If the time delta exceeds the maximum time delta + * permitted by the current clocksource then adjust + * the time delta accordingly to ensure the + * clocksource does not wrap. 
+ */ + time_delta = tick_period.tv64 * delta_jiffies; + + if (time_delta > max_time_delta) + time_delta = max_time_delta; + + /* + * calculate the expiry time for the next timer wheel + * timer + */ + expires = ktime_add_ns(last_update, time_delta); /* * If this cpu is the one which updates jiffies, then @@ -300,7 +324,7 @@ void tick_nohz_stop_sched_tick(int inidle) if (cpu == tick_do_timer_cpu) tick_do_timer_cpu = TICK_DO_TIMER_NONE; - if (delta_jiffies > 1) + if (time_delta > tick_period.tv64) cpumask_set_cpu(cpu, nohz_cpu_mask); /* Skip reprogram of event if its not changed */ @@ -332,12 +356,13 @@ void tick_nohz_stop_sched_tick(int inidle) ts->idle_sleeps++; /* - * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that - * there is no timer pending or at least extremly far - * into the future (12 days for HZ=1000). In this case - * we simply stop the tick timer: + * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA) + * signals that there is no timer pending or at least + * extremely far into the future (12 days for HZ=1000). + * In this case we simply stop the tick timer: */ - if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) { + if (unlikely(time_delta >= + (tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) { ts->idle_expires.tv64 = KTIME_MAX; if (ts->nohz_mode == NOHZ_MODE_HIGHRES) hrtimer_cancel(&ts->sched_timer); diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 687dff4..659cae3 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -271,6 +271,17 @@ int timekeeping_valid_for_hres(void) } /** + * timekeeping_max_deferment - Returns max time the clocksource can be deferred + * + * IMPORTANT: Caller must observe xtime_lock via read_seqbegin/read_seqretry + * to ensure that the clocksource does not change! + */ +s64 timekeeping_max_deferment(void) +{ + return clock->max_idle_ns; +} + +/** * read_persistent_clock - Return time in seconds from the persistent clock. 
* * Weak dummy function for arches that do not yet support it. -- 1.6.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-28 21:10 ` Jon Hunter 2009-05-28 21:43 ` John Stultz 2009-05-28 22:16 ` Thomas Gleixner @ 2009-05-30 1:00 ` Jon Hunter 2009-06-04 19:29 ` Jon Hunter 2 siblings, 1 reply; 29+ messages in thread From: Jon Hunter @ 2009-05-30 1:00 UTC (permalink / raw) To: Thomas Gleixner; +Cc: john stultz, linux-kernel@vger.kernel.org, Ingo Molnar Jon Hunter wrote: > + * Calculate the time delta for the next timer event. > + * If the time delta exceeds the maximum time delta > + * permitted by the current clocksource then adjust > + * the time delta accordingly to ensure the > + * clocksource does not wrap. > + */ > + time_delta = tick_period.tv64 * delta_jiffies; Thinking about this more, although it is very unlikely, for 64-bit machines there is a chance that the above multiply could overflow if delta_jiffies is very large. tick_period.tv64 should always be less than NSEC_PER_SEC and so you would need delta_jiffies to be greater than 2^32 to cause overflow. On a 32-bit machine an unsigned long will not be greater than 2^32 as it is only 32 bits wide, but this would be possible on 64-bit machines. So to be safe we should make sure that delta_jiffies is not greater than NEXT_TIMER_MAX_DELTA (2^30 - 1) before doing the multiply. If you think that this is a valid concern, then I can re-work and re-post. Sorry for not catching this before. Jon ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-30 1:00 ` Jon Hunter @ 2009-06-04 19:29 ` Jon Hunter 2009-06-25 19:10 ` Jon Hunter 0 siblings, 1 reply; 29+ messages in thread From: Jon Hunter @ 2009-06-04 19:29 UTC (permalink / raw) To: Thomas Gleixner; +Cc: john stultz, linux-kernel@vger.kernel.org, Ingo Molnar Jon Hunter wrote: > Jon Hunter wrote: >> + * Calculate the time delta for the next timer event. >> + * If the time delta exceeds the maximum time delta >> + * permitted by the current clocksource then adjust >> + * the time delta accordingly to ensure the >> + * clocksource does not wrap. >> + */ >> + time_delta = tick_period.tv64 * delta_jiffies; > > Thinking about this more, although it is very unlikely, for 64-bit > machines there is a chance that the above multiply could overflow if > delta_jiffies is very large. > > tick_period.tv64 should always be less than NSEC_PER_SEC and so you > would need delta_jiffies to be greater than 2^32 to cause overflow. On a > 32-bit machine an unsigned long will not be greater than 2^32 as it is > only 32-bits but this would be possible on a 64-bit machines. > > So to be safe we should make sure that delta_jiffies is not greater than > NEXT_TIMER_MAX_DELTA (2^30 - 1) before doing the multiply. If you > think that this is a valid concern, then I can re-work and re-post. > Sorry for not catching this before. To ensure that there are no overflows in the above calculation, I re-worked this patch a little. The below should be equivalent to the current code, just re-organised a little. Let me know if this would be acceptable or not. Cheers Jon The dynamic tick allows the kernel to sleep for periods longer than a single tick. This patch prevents the kernel from sleeping for a period longer than the maximum time that the current clocksource can count. This ensures that the kernel will not lose track of time. 
This patch adds a function called "clocksource_max_deferment()" that calculates the maximum time the kernel can sleep for a given clocksource and function called "timekeeping_max_deferment()" that returns maximum time the kernel can sleep for the current clocksource. Signed-off-by: Jon Hunter <jon-hunter@ti.com> --- include/linux/clocksource.h | 2 + include/linux/time.h | 1 + kernel/time/clocksource.c | 47 +++++++++++++++++++++++++++++++++++ kernel/time/tick-sched.c | 57 ++++++++++++++++++++++++++++++++---------- kernel/time/timekeeping.c | 11 ++++++++ 5 files changed, 104 insertions(+), 14 deletions(-) diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 5a40d14..465af22 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc, * @mult: cycle to nanosecond multiplier (adjusted by NTP) * @mult_orig: cycle to nanosecond multiplier (unadjusted by NTP) * @shift: cycle to nanosecond divisor (power of two) + * @max_idle_ns: max idle time permitted by the clocksource (nsecs) * @flags: flags describing special properties * @vread: vsyscall based read * @resume: resume function for the clocksource, if necessary @@ -171,6 +172,7 @@ struct clocksource { u32 mult; u32 mult_orig; u32 shift; + s64 max_idle_ns; unsigned long flags; cycle_t (*vread)(void); void (*resume)(void); diff --git a/include/linux/time.h b/include/linux/time.h index 242f624..090be07 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts); extern struct timespec timespec_trunc(struct timespec t, unsigned gran); extern int timekeeping_valid_for_hres(void); +extern s64 timekeeping_max_deferment(void); extern void update_wall_time(void); extern void update_xtime_cache(u64 nsec); diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index ecfd7b5..18d2b9f 100644 --- a/kernel/time/clocksource.c +++ 
b/kernel/time/clocksource.c @@ -321,6 +321,50 @@ void clocksource_touch_watchdog(void) } /** + * clocksource_max_deferment - Returns max time the clocksource can be deferred + * @cs: Pointer to clocksource + * + */ +static s64 clocksource_max_deferment(struct clocksource *cs) +{ + s64 max_nsecs; + u64 max_cycles; + + /* + * Calculate the maximum number of cycles that we can pass to the + * cyc2ns function without overflowing a 64-bit signed result. The + * maximum number of cycles is equal to ULLONG_MAX/cs->mult which + * is equivalent to the below. + * max_cycles < (2^63)/cs->mult + * max_cycles < 2^(log2((2^63)/cs->mult)) + * max_cycles < 2^(log2(2^63) - log2(cs->mult)) + * max_cycles < 2^(63 - log2(cs->mult)) + * max_cycles < 1 << (63 - log2(cs->mult)) + * Please note that we add 1 to the result of the log2 to account for + * any rounding errors, ensure the above inequality is satisfied and + * no overflow will occur. + */ + max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1)); + + /* + * The actual maximum number of cycles we can defer the clocksource is + * determined by the minimum of max_cycles and cs->mask. + */ + max_cycles = min(max_cycles, cs->mask); + max_nsecs = cyc2ns(cs, max_cycles); + + /* + * To ensure that the clocksource does not wrap whilst we are idle, + * limit the time the clocksource can be deferred by 12.5%. Please + * note a margin of 12.5% is used because this can be computed with + * a shift, versus say 10% which would require division. 
+ */ + max_nsecs = max_nsecs - (max_nsecs >> 5); + + return max_nsecs; +} + +/** * clocksource_get_next - Returns the selected clocksource * */ @@ -405,6 +449,9 @@ int clocksource_register(struct clocksource *c) /* save mult_orig on registration */ c->mult_orig = c->mult; + /* calculate max idle time permitted for this clocksource */ + c->max_idle_ns = clocksource_max_deferment(c); + spin_lock_irqsave(&clocksource_lock, flags); ret = clocksource_enqueue(c); if (!ret) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index d3f1ef4..9988e5e 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle) ktime_t last_update, expires, now; struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev; int cpu; + s64 time_delta, max_time_delta; local_irq_save(flags); @@ -264,6 +265,18 @@ void tick_nohz_stop_sched_tick(int inidle) seq = read_seqbegin(&xtime_lock); last_update = last_jiffies_update; last_jiffies = jiffies; + + /* + * On SMP we really should only care for the CPU which + * has the do_timer duty assigned. All other CPUs can + * sleep as long as they want. + */ + if (cpu == tick_do_timer_cpu || + tick_do_timer_cpu == TICK_DO_TIMER_NONE) + max_time_delta = timekeeping_max_deferment(); + else + max_time_delta = KTIME_MAX; + } while (read_seqretry(&xtime_lock, seq)); /* Get the next timer wheel timer */ @@ -283,11 +296,30 @@ void tick_nohz_stop_sched_tick(int inidle) if ((long)delta_jiffies >= 1) { /* - * calculate the expiry time for the next timer wheel - * timer - */ - expires = ktime_add_ns(last_update, tick_period.tv64 * - delta_jiffies); + * calculate the expiry time for the next timer wheel + * timer. delta_jiffies >= NEXT_TIMER_MAX_DELTA signals + * that there is no timer pending or at least extremely + * far into the future (12 days for HZ=1000). In this + * case we set the expiry to the end of time. 
+ */ + if (likely(delta_jiffies < NEXT_TIMER_MAX_DELTA)) { + + /* + * Calculate the time delta for the next timer event. + * If the time delta exceeds the maximum time delta + * permitted by the current clocksource then adjust + * the time delta accordingly to ensure the + * clocksource does not wrap. + */ + time_delta = tick_period.tv64 * delta_jiffies; + + if (time_delta > max_time_delta) + time_delta = max_time_delta; + + expires = ktime_add_ns(last_update, time_delta); + } else { + expires.tv64 = KTIME_MAX; + } /* * If this cpu is the one which updates jiffies, then @@ -331,22 +363,19 @@ void tick_nohz_stop_sched_tick(int inidle) ts->idle_sleeps++; + /* Mark expires */ + ts->idle_expires = expires; + /* - * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that - * there is no timer pending or at least extremly far - * into the future (12 days for HZ=1000). In this case - * we simply stop the tick timer: + * If the expiration time == KTIME_MAX, then + * in this case we simply stop the tick timer. */ - if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) { - ts->idle_expires.tv64 = KTIME_MAX; + if (unlikely(expires.tv64 == KTIME_MAX)) { if (ts->nohz_mode == NOHZ_MODE_HIGHRES) hrtimer_cancel(&ts->sched_timer); goto out; } - /* Mark expiries */ - ts->idle_expires = expires; - if (ts->nohz_mode == NOHZ_MODE_HIGHRES) { hrtimer_start(&ts->sched_timer, expires, HRTIMER_MODE_ABS); diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 687dff4..659cae3 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -271,6 +271,17 @@ int timekeeping_valid_for_hres(void) } /** + * timekeeping_max_deferment - Returns max time the clocksource can be deferred + * + * IMPORTANT: Caller must observe xtime_lock via read_seqbegin/read_seqretry + * to ensure that the clocksource does not change! + */ +s64 timekeeping_max_deferment(void) +{ + return clock->max_idle_ns; +} + +/** * read_persistent_clock - Return time in seconds from the persistent clock. 
* * Weak dummy function for arches that do not yet support it. -- 1.6.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-06-04 19:29 ` Jon Hunter @ 2009-06-25 19:10 ` Jon Hunter 0 siblings, 0 replies; 29+ messages in thread From: Jon Hunter @ 2009-06-25 19:10 UTC (permalink / raw) To: Thomas Gleixner; +Cc: john stultz, linux-kernel@vger.kernel.org, Ingo Molnar Jon Hunter wrote: > Jon Hunter wrote: >> Jon Hunter wrote: >>> + * Calculate the time delta for the next timer event. >>> + * If the time delta exceeds the maximum time delta >>> + * permitted by the current clocksource then adjust >>> + * the time delta accordingly to ensure the >>> + * clocksource does not wrap. >>> + */ >>> + time_delta = tick_period.tv64 * delta_jiffies; >> Thinking about this more, although it is very unlikely, for 64-bit >> machines there is a chance that the above multiply could overflow if >> delta_jiffies is very large. >> >> tick_period.tv64 should always be less than NSEC_PER_SEC and so you >> would need delta_jiffies to be greater than 2^32 to cause overflow. On a >> 32-bit machine an unsigned long will not be greater than 2^32 as it is >> only 32-bits but this would be possible on a 64-bit machines. >> >> So to be safe we should make sure that delta_jiffies is not greater than >> NEXT_TIMER_MAX_DELTA (2^30 - 1) before doing the multiply. If you >> think that this is a valid concern, then I can re-work and re-post. >> Sorry for not catching this before. > > With regard to the above, to ensure that there are no overflows with the > above calculation, I re-worked this patch a little. The below should be > equivalent to the current code, just re-organised a little. Let me know > if this would be acceptable or not. Hi Thomas, John, Did you guys have chance to review this? Let me know if you have any further comments/feedback. Cheers Jon ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-27 14:49 Jon Hunter 2009-05-27 16:01 ` Thomas Gleixner @ 2009-05-27 18:15 ` john stultz 2009-05-27 20:54 ` Alok Kataria 2 siblings, 0 replies; 29+ messages in thread From: john stultz @ 2009-05-27 18:15 UTC (permalink / raw) To: Jon Hunter; +Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar On Wed, 2009-05-27 at 09:49 -0500, Jon Hunter wrote: > The dynamic tick allows the kernel to sleep for periods longer than a > single tick. This patch prevents that the kernel from sleeping for a > period longer than the maximum time that the current clocksource can > count. This ensures that the kernel will not lose track of time. This > patch adds a new function called "timekeeping_max_deferment()" that > calculates the maximum time the kernel can sleep for a given clocksource. > > Signed-off-by: Jon Hunter <jon-hunter@ti.com> Acked-by: John Stultz <johnstul@us.ibm.com> > --- > include/linux/time.h | 1 + > kernel/time/tick-sched.c | 36 +++++++++++++++++++++++---------- > kernel/time/timekeeping.c | 47 > +++++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 73 insertions(+), 11 deletions(-) > > diff --git a/include/linux/time.h b/include/linux/time.h > index 242f624..090be07 100644 > --- a/include/linux/time.h > +++ b/include/linux/time.h > @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts); > > extern struct timespec timespec_trunc(struct timespec t, unsigned gran); > extern int timekeeping_valid_for_hres(void); > +extern s64 timekeeping_max_deferment(void); > extern void update_wall_time(void); > extern void update_xtime_cache(u64 nsec); > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > index d3f1ef4..f0155ae 100644 > --- a/kernel/time/tick-sched.c > +++ b/kernel/time/tick-sched.c > @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle) > ktime_t last_update, expires, now; > struct clock_event_device *dev = 
__get_cpu_var(tick_cpu_device).evtdev; > int cpu; > + s64 time_delta, max_time_delta; > > local_irq_save(flags); > > @@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle) > seq = read_seqbegin(&xtime_lock); > last_update = last_jiffies_update; > last_jiffies = jiffies; > + max_time_delta = timekeeping_max_deferment(); > } while (read_seqretry(&xtime_lock, seq)); > > /* Get the next timer wheel timer */ > @@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle) > if ((long)delta_jiffies >= 1) { > > /* > - * calculate the expiry time for the next timer wheel > - * timer > - */ > - expires = ktime_add_ns(last_update, tick_period.tv64 * > - delta_jiffies); > + * Calculate the time delta for the next timer event. > + * If the time delta exceeds the maximum time delta > + * permitted by the current clocksource then adjust > + * the time delta accordingly to ensure the > + * clocksource does not wrap. > + */ > + time_delta = tick_period.tv64 * delta_jiffies; > + > + if (time_delta > max_time_delta) > + time_delta = max_time_delta; > + > + /* > + * calculate the expiry time for the next timer wheel > + * timer > + */ > + expires = ktime_add_ns(last_update, time_delta); > > /* > * If this cpu is the one which updates jiffies, then > @@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle) > if (cpu == tick_do_timer_cpu) > tick_do_timer_cpu = TICK_DO_TIMER_NONE; > > - if (delta_jiffies > 1) > + if (time_delta > tick_period.tv64) > cpumask_set_cpu(cpu, nohz_cpu_mask); > > /* Skip reprogram of event if its not changed */ > @@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle) > ts->idle_sleeps++; > > /* > - * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that > - * there is no timer pending or at least extremly far > - * into the future (12 days for HZ=1000). 
In this case > - * we simply stop the tick timer: > + * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA) > + * signals that there is no timer pending or at least > + * extremely far into the future (12 days for HZ=1000). > + * In this case we simply stop the tick timer: > */ > - if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) { > + if (unlikely(time_delta >= > + (tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) { > ts->idle_expires.tv64 = KTIME_MAX; > if (ts->nohz_mode == NOHZ_MODE_HIGHRES) > hrtimer_cancel(&ts->sched_timer); > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c > index 687dff4..608fc6f 100644 > --- a/kernel/time/timekeeping.c > +++ b/kernel/time/timekeeping.c > @@ -271,6 +271,53 @@ int timekeeping_valid_for_hres(void) > } > > /** > + * timekeeping_max_deferment - Returns max time the clocksource can be > deferred > + * > + * IMPORTANT: Must be called with xtime_lock held! > + */ > +s64 timekeeping_max_deferment(void) > +{ > + s64 max_nsecs; > + u64 max_cycles; > + > + /* > + * Calculate the maximum number of cycles that we can pass to the > + * cyc2ns function without overflowing a 64-bit signed result. The > + * maximum number of cycles is equal to ULLONG_MAX/clock->mult which > + * is equivalent to the below. > + * max_cycles < (2^63)/clock->mult > + * max_cycles < 2^(log2((2^63)/clock->mult)) > + * max_cycles < 2^(log2(2^63) - log2(clock->mult)) > + * max_cycles < 2^(63 - log2(clock->mult)) > + * max_cycles < 1 << (63 - log2(clock->mult)) > + * Please note that we add 1 to the result of the log2 to account for > + * any rounding errors, ensure the above inequality is satisfied and > + * no overflow will occur. > + */ > + max_cycles = 1ULL << (63 - (ilog2(clock->mult) + 1)); > + > + /* > + * The actual maximum number of cycles we can defer the clocksource is > + * determined by the minimum of max_cycles and clock->mask. 
> + */ > + max_cycles = min(max_cycles, clock->mask); > + max_nsecs = cyc2ns(clock, max_cycles); > + > + /* > + * To ensure that the clocksource does not wrap whilst we are idle, > + * limit the time the clocksource can be deferred by 6.25%. Please > + * note a margin of 6.25% is used because this can be computed with > + * a shift, versus say 5% which would require division. > + */ > + max_nsecs = max_nsecs - (max_nsecs >> 4); > + > + if (max_nsecs < 0) > + max_nsecs = 0; > + > + return max_nsecs; > +} > + > +/** > * read_persistent_clock - Return time in seconds from the persistent > clock. > * > * Weak dummy function for arches that do not yet support it. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-27 14:49 Jon Hunter 2009-05-27 16:01 ` Thomas Gleixner 2009-05-27 18:15 ` john stultz @ 2009-05-27 20:54 ` Alok Kataria 2009-05-27 21:12 ` Thomas Gleixner 2 siblings, 1 reply; 29+ messages in thread From: Alok Kataria @ 2009-05-27 20:54 UTC (permalink / raw) To: Jon Hunter Cc: linux-kernel@vger.kernel.org, john stultz, Thomas Gleixner, Ingo Molnar, akataria On Wed, May 27, 2009 at 7:49 AM, Jon Hunter <jon-hunter@ti.com> wrote: > > The dynamic tick allows the kernel to sleep for periods longer than a single > tick. This patch prevents that the kernel from sleeping for a period longer > than the maximum time that the current clocksource can count. This ensures > that the kernel will not lose track of time. This patch adds a new function > called "timekeeping_max_deferment()" that calculates the maximum time the > kernel can sleep for a given clocksource. > From the patch description I understand that this will avoid wrapping around for only the *current* clocksource. What happens if, say, TSC is the clocksource and ACPI_PM is being used as the watchdog clocksource? In that case timekeeping_max_deferment will give TSC's max allowed sleep value (which is greater than ACPI_PM's), i.e. we could still sleep beyond ACPI_PM's wrap-around threshold, which may result in us marking TSC as unusable as a clocksource. That could still result in incorrect timekeeping, right? 
Thanks,
Alok

> Signed-off-by: Jon Hunter <jon-hunter@ti.com>
> ---
>  include/linux/time.h      |    1 +
>  kernel/time/tick-sched.c  |   36 +++++++++++++++++++++++----------
>  kernel/time/timekeeping.c |   47 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 73 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/time.h b/include/linux/time.h
> index 242f624..090be07 100644
> --- a/include/linux/time.h
> +++ b/include/linux/time.h
> @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts);
>
>  extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
>  extern int timekeeping_valid_for_hres(void);
> +extern s64 timekeeping_max_deferment(void);
>  extern void update_wall_time(void);
>  extern void update_xtime_cache(u64 nsec);
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index d3f1ef4..f0155ae 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>  	ktime_t last_update, expires, now;
>  	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
>  	int cpu;
> +	s64 time_delta, max_time_delta;
>
>  	local_irq_save(flags);
>
> @@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>  		seq = read_seqbegin(&xtime_lock);
>  		last_update = last_jiffies_update;
>  		last_jiffies = jiffies;
> +		max_time_delta = timekeeping_max_deferment();
>  	} while (read_seqretry(&xtime_lock, seq));
>
>  	/* Get the next timer wheel timer */
> @@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle)
>  	if ((long)delta_jiffies >= 1) {
>
>  		/*
> -		 * calculate the expiry time for the next timer wheel
> -		 * timer
> -		 */
> -		expires = ktime_add_ns(last_update, tick_period.tv64 *
> -				       delta_jiffies);
> +		 * Calculate the time delta for the next timer event.
> +		 * If the time delta exceeds the maximum time delta
> +		 * permitted by the current clocksource then adjust
> +		 * the time delta accordingly to ensure the
> +		 * clocksource does not wrap.
> +		 */
> +		time_delta = tick_period.tv64 * delta_jiffies;
> +
> +		if (time_delta > max_time_delta)
> +			time_delta = max_time_delta;
> +
> +		/*
> +		 * calculate the expiry time for the next timer wheel
> +		 * timer
> +		 */
> +		expires = ktime_add_ns(last_update, time_delta);
>
>  		/*
>  		 * If this cpu is the one which updates jiffies, then
> @@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>  		if (cpu == tick_do_timer_cpu)
>  			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
>
> -		if (delta_jiffies > 1)
> +		if (time_delta > tick_period.tv64)
>  			cpumask_set_cpu(cpu, nohz_cpu_mask);
>
>  		/* Skip reprogram of event if its not changed */
> @@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle)
>  		ts->idle_sleeps++;
>
>  		/*
> -		 * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
> -		 * there is no timer pending or at least extremly far
> -		 * into the future (12 days for HZ=1000). In this case
> -		 * we simply stop the tick timer:
> +		 * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA)
> +		 * signals that there is no timer pending or at least
> +		 * extremely far into the future (12 days for HZ=1000).
> +		 * In this case we simply stop the tick timer:
>  		 */
> -		if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
> +		if (unlikely(time_delta >=
> +			     (tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) {
>  			ts->idle_expires.tv64 = KTIME_MAX;
>  			if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
>  				hrtimer_cancel(&ts->sched_timer);
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 687dff4..608fc6f 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -271,6 +271,53 @@ int timekeeping_valid_for_hres(void)
>  }
>
>  /**
> + * timekeeping_max_deferment - Returns max time the clocksource can be deferred
> + *
> + * IMPORTANT: Must be called with xtime_lock held!
> + */
> +s64 timekeeping_max_deferment(void)
> +{
> +	s64 max_nsecs;
> +	u64 max_cycles;
> +
> +	/*
> +	 * Calculate the maximum number of cycles that we can pass to the
> +	 * cyc2ns function without overflowing a 64-bit signed result. The
> +	 * maximum number of cycles is equal to ULLONG_MAX/clock->mult which
> +	 * is equivalent to the below.
> +	 * max_cycles < (2^63)/clock->mult
> +	 * max_cycles < 2^(log2((2^63)/clock->mult))
> +	 * max_cycles < 2^(log2(2^63) - log2(clock->mult))
> +	 * max_cycles < 2^(63 - log2(clock->mult))
> +	 * max_cycles < 1 << (63 - log2(clock->mult))
> +	 * Please note that we add 1 to the result of the log2 to account for
> +	 * any rounding errors, ensure the above inequality is satisfied and
> +	 * no overflow will occur.
> +	 */
> +	max_cycles = 1ULL << (63 - (ilog2(clock->mult) + 1));
> +
> +	/*
> +	 * The actual maximum number of cycles we can defer the clocksource is
> +	 * determined by the minimum of max_cycles and clock->mask.
> +	 */
> +	max_cycles = min(max_cycles, clock->mask);
> +	max_nsecs = cyc2ns(clock, max_cycles);
> +
> +	/*
> +	 * To ensure that the clocksource does not wrap whilst we are idle,
> +	 * limit the time the clocksource can be deferred by 6.25%. Please
> +	 * note a margin of 6.25% is used because this can be computed with
> +	 * a shift, versus say 5% which would require division.
> +	 */
> +	max_nsecs = max_nsecs - (max_nsecs >> 4);
> +
> +	if (max_nsecs < 0)
> +		max_nsecs = 0;
> +
> +	return max_nsecs;
> +}
> +
> +/**
>  * read_persistent_clock - Return time in seconds from the persistent clock.
>  *
>  * Weak dummy function for arches that do not yet support it.
> --
> 1.6.1
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle
  2009-05-27 20:54 ` Alok Kataria
@ 2009-05-27 21:12   ` Thomas Gleixner
  0 siblings, 0 replies; 29+ messages in thread
From: Thomas Gleixner @ 2009-05-27 21:12 UTC (permalink / raw)
To: Alok Kataria
Cc: Jon Hunter, linux-kernel@vger.kernel.org, john stultz,
    Ingo Molnar, akataria

On Wed, 27 May 2009, Alok Kataria wrote:
> On Wed, May 27, 2009 at 7:49 AM, Jon Hunter <jon-hunter@ti.com> wrote:
> >
> > The dynamic tick allows the kernel to sleep for periods longer than a single
> > tick. This patch prevents the kernel from sleeping for a period longer
> > than the maximum time that the current clocksource can count. This ensures
> > that the kernel will not lose track of time. This patch adds a new function
> > called "timekeeping_max_deferment()" that calculates the maximum time the
> > kernel can sleep for a given clocksource.
>
> From the patch description I understand that this will avoid wrapping
> around for only the *current* clocksource. What happens if, say, TSC is
> the clocksource and ACPI_PM is being used as the watchdog clocksource?
> In that case timekeeping_max_deferment() will give TSC's maximum allowed
> sleep value (which is greater than ACPI_PM's), i.e. we could still sleep
> beyond ACPI_PM's wrap-around threshold, which may result in us marking
> TSC as unusable as a clocksource. That could still result in incorrect
> timekeeping, right?

No, because the watchdog timer takes care of that. It wakes up in time.

Thanks,

	tglx
end of thread, other threads:[~2009-11-13 19:50 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-18 17:45 [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit machines Jon Hunter
2009-08-18 17:45 ` [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle Jon Hunter
2009-08-18 17:45   ` [PATCH 2/2] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds Jon Hunter
2009-08-18 19:26     ` Thomas Gleixner
2009-08-18 20:52       ` Jon Hunter
2009-11-13 19:50     ` [tip:timers/core] nohz: " tip-bot for Jon Hunter
2009-08-18 19:25   ` [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle Thomas Gleixner
2009-08-18 20:42     ` Jon Hunter
2009-11-13 19:49   ` [tip:timers/core] nohz: " tip-bot for Jon Hunter
2009-11-11 20:43 ` [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit machines john stultz
2009-11-11 20:57   ` Jon Hunter
2009-11-11 22:37     ` john stultz
-- strict thread matches above, loose matches on Subject: below --
2009-07-28  0:00 [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit Jon Hunter
2009-07-28  0:00 ` [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle Jon Hunter
2009-05-27 14:49 Jon Hunter
2009-05-27 16:01 ` Thomas Gleixner
2009-05-27 20:20   ` john stultz
2009-05-27 20:32     ` Thomas Gleixner
2009-05-28 20:21       ` Jon Hunter
2009-05-28 20:36         ` Thomas Gleixner
2009-05-28 21:10           ` Jon Hunter
2009-05-28 21:43             ` John Stultz
2009-05-28 22:16             ` Thomas Gleixner
2009-05-29 19:43               ` Jon Hunter
2009-05-30  1:00                 ` Jon Hunter
2009-06-04 19:29                   ` Jon Hunter
2009-06-25 19:10                     ` Jon Hunter
2009-05-27 18:15 ` john stultz
2009-05-27 20:54 ` Alok Kataria
2009-05-27 21:12   ` Thomas Gleixner