* [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit machines
@ 2009-08-18 17:45 Jon Hunter
From: Jon Hunter @ 2009-08-18 17:45 UTC (permalink / raw)
To: linux-kernel; +Cc: Thomas Gleixner, John Stultz, Jon Hunter
From: Jon Hunter <jon-hunter@ti.com>
This is a resend of the patch series shown here:
http://www.spinics.net/lists/kernel/msg891029.html
This patch series has been rebased onto the linux-2.6-tip timers/core
branch, per request from Thomas Gleixner.
This patch series ensures that a wrap of the clocksource will not be
missed when the kernel sleeps for longer periods, and it allows 32-bit
machines to sleep for longer than 2.15 seconds.
Jon Hunter (2):
Dynamic Tick: Prevent clocksource wrapping during idle
Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15
seconds
 include/linux/clockchips.h  |    6 ++--
 include/linux/clocksource.h |    2 +
 include/linux/time.h        |    1 +
 kernel/hrtimer.c            |    2 +-
 kernel/time/clockevents.c   |   10 ++++----
 kernel/time/clocksource.c   |   47 +++++++++++++++++++++++++++++++++++
 kernel/time/tick-oneshot.c  |    2 +-
 kernel/time/tick-sched.c    |   57 ++++++++++++++++++++++++++++++++----------
 kernel/time/timekeeping.c   |   11 ++++++++
 kernel/time/timer_list.c    |    4 +-
 10 files changed, 116 insertions(+), 26 deletions(-)
* [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle
From: Jon Hunter @ 2009-08-18 17:45 UTC (permalink / raw)
To: linux-kernel; +Cc: Thomas Gleixner, John Stultz, Jon Hunter

From: Jon Hunter <jon-hunter@ti.com>

The dynamic tick allows the kernel to sleep for periods longer than a
single tick. This patch prevents the kernel from sleeping for a period
longer than the maximum time that the current clocksource can count,
ensuring that the kernel will not lose track of time. It adds a function
called "clocksource_max_deferment()" that calculates the maximum time
the kernel can sleep for a given clocksource, and a function called
"timekeeping_max_deferment()" that returns the maximum time the kernel
can sleep for the current clocksource.
Signed-off-by: Jon Hunter <jon-hunter@ti.com>
---
 include/linux/clocksource.h |    2 +
 include/linux/time.h        |    1 +
 kernel/time/clocksource.c   |   47 +++++++++++++++++++++++++++++++++++
 kernel/time/tick-sched.c    |   57 ++++++++++++++++++++++++++++++++----------
 kernel/time/timekeeping.c   |   11 ++++++++
 5 files changed, 104 insertions(+), 14 deletions(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 9ea40ff..09ed7f1 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc,
  *			subtraction of non 64 bit counters
  * @mult:		cycle to nanosecond multiplier
  * @shift:		cycle to nanosecond divisor (power of two)
+ * @max_idle_ns:	max idle time permitted by the clocksource (nsecs)
  * @flags:		flags describing special properties
  * @vread:		vsyscall based read
  * @resume:		resume function for the clocksource, if necessary
@@ -168,6 +169,7 @@ struct clocksource {
 	cycle_t mask;
 	u32 mult;
 	u32 shift;
+	s64 max_idle_ns;
 	unsigned long flags;
 	cycle_t (*vread)(void);
 	void (*resume)(void);
diff --git a/include/linux/time.h b/include/linux/time.h
index f505988..e68a480 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -146,6 +146,7 @@ extern void monotonic_to_bootbased(struct timespec *ts);
 extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
 
 extern int timekeeping_valid_for_hres(void);
+extern s64 timekeeping_max_deferment(void);
 extern void update_wall_time(void);
 extern void update_xtime_cache(u64 nsec);
 extern void timekeeping_leap_insert(int leapsecond);
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 02dc22d..7fffe54 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -369,6 +369,50 @@ void clocksource_touch_watchdog(void)
 	clocksource_resume_watchdog();
 }
 
+/**
+ * clocksource_max_deferment - Returns max time the clocksource can be deferred
+ * @cs:         Pointer to clocksource
+ *
+ */
+static s64 clocksource_max_deferment(struct clocksource *cs)
+{
+	s64 max_nsecs;
+	u64 max_cycles;
+
+	/*
+	 * Calculate the maximum number of cycles that we can pass to the
+	 * cyc2ns function without overflowing a 64-bit signed result. The
+	 * maximum number of cycles is equal to ULLONG_MAX/cs->mult which
+	 * is equivalent to the below.
+	 * max_cycles < (2^63)/cs->mult
+	 * max_cycles < 2^(log2((2^63)/cs->mult))
+	 * max_cycles < 2^(log2(2^63) - log2(cs->mult))
+	 * max_cycles < 2^(63 - log2(cs->mult))
+	 * max_cycles < 1 << (63 - log2(cs->mult))
+	 * Please note that we add 1 to the result of the log2 to account for
+	 * any rounding errors, ensure the above inequality is satisfied and
+	 * no overflow will occur.
+	 */
+	max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1));
+
+	/*
+	 * The actual maximum number of cycles we can defer the clocksource is
+	 * determined by the minimum of max_cycles and cs->mask.
+	 */
+	max_cycles = min(max_cycles, cs->mask);
+	max_nsecs = clocksource_cyc2ns(max_cycles, cs->mult, cs->shift);
+
+	/*
+	 * To ensure that the clocksource does not wrap whilst we are idle,
+	 * limit the time the clocksource can be deferred by 12.5%. Please
+	 * note a margin of 12.5% is used because this can be computed with
+	 * a shift, versus say 10% which would require division.
+	 */
+	max_nsecs = max_nsecs - (max_nsecs >> 5);
+
+	return max_nsecs;
+}
+
 #ifdef CONFIG_GENERIC_TIME
 
 static int finished_booting;
@@ -461,6 +505,9 @@ static void clocksource_enqueue(struct clocksource *cs)
  */
 int clocksource_register(struct clocksource *cs)
 {
+	/* calculate max idle time permitted for this clocksource */
+	cs->max_idle_ns = clocksource_max_deferment(cs);
+
 	mutex_lock(&clocksource_mutex);
 	clocksource_enqueue(cs);
 	clocksource_select();
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index e0f59a2..7a98e90 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle)
 	ktime_t last_update, expires, now;
 	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
 	int cpu;
+	s64 time_delta, max_time_delta;
 
 	local_irq_save(flags);
 
@@ -270,6 +271,18 @@ void tick_nohz_stop_sched_tick(int inidle)
 		seq = read_seqbegin(&xtime_lock);
 		last_update = last_jiffies_update;
 		last_jiffies = jiffies;
+
+		/*
+		 * On SMP we really should only care for the CPU which
+		 * has the do_timer duty assigned. All other CPUs can
+		 * sleep as long as they want.
+		 */
+		if (cpu == tick_do_timer_cpu ||
+		    tick_do_timer_cpu == TICK_DO_TIMER_NONE)
+			max_time_delta = timekeeping_max_deferment();
+		else
+			max_time_delta = KTIME_MAX;
+
 	} while (read_seqretry(&xtime_lock, seq));
 
 	/* Get the next timer wheel timer */
@@ -289,11 +302,30 @@ void tick_nohz_stop_sched_tick(int inidle)
 	if ((long)delta_jiffies >= 1) {
 
 		/*
-		 * calculate the expiry time for the next timer wheel
-		 * timer
-		 */
-		expires = ktime_add_ns(last_update, tick_period.tv64 *
-				       delta_jiffies);
+		 * calculate the expiry time for the next timer wheel
+		 * timer. delta_jiffies >= NEXT_TIMER_MAX_DELTA signals
+		 * that there is no timer pending or at least extremely
+		 * far into the future (12 days for HZ=1000). In this
+		 * case we set the expiry to the end of time.
+		 */
+		if (likely(delta_jiffies < NEXT_TIMER_MAX_DELTA)) {
+
+			/*
+			 * Calculate the time delta for the next timer event.
+			 * If the time delta exceeds the maximum time delta
+			 * permitted by the current clocksource then adjust
+			 * the time delta accordingly to ensure the
+			 * clocksource does not wrap.
+			 */
+			time_delta = tick_period.tv64 * delta_jiffies;
+
+			if (time_delta > max_time_delta)
+				time_delta = max_time_delta;
+
+			expires = ktime_add_ns(last_update, time_delta);
+		} else {
+			expires.tv64 = KTIME_MAX;
+		}
 
 		/*
 		 * If this cpu is the one which updates jiffies, then
@@ -337,22 +369,19 @@ void tick_nohz_stop_sched_tick(int inidle)
 
 		ts->idle_sleeps++;
 
+		/* Mark expires */
+		ts->idle_expires = expires;
+
 		/*
-		 * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
-		 * there is no timer pending or at least extremly far
-		 * into the future (12 days for HZ=1000). In this case
-		 * we simply stop the tick timer:
+		 * If the expiration time == KTIME_MAX, then
+		 * in this case we simply stop the tick timer.
 		 */
-		if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
-			ts->idle_expires.tv64 = KTIME_MAX;
+		if (unlikely(expires.tv64 == KTIME_MAX)) {
 			if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
 				hrtimer_cancel(&ts->sched_timer);
 			goto out;
 		}
 
-		/* Mark expiries */
-		ts->idle_expires = expires;
-
 		if (ts->nohz_mode == NOHZ_MODE_HIGHRES) {
 			hrtimer_start(&ts->sched_timer, expires,
 				      HRTIMER_MODE_ABS_PINNED);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 15e06de..2e57251 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -487,6 +487,17 @@ int timekeeping_valid_for_hres(void)
 }
 
 /**
+ * timekeeping_max_deferment - Returns max time the clocksource can be deferred
+ *
+ * IMPORTANT: Caller must observe xtime_lock via read_seqbegin/read_seqretry
+ * to ensure that the clocksource does not change!
+ */
+s64 timekeeping_max_deferment(void)
+{
+	return timekeeper.clock->max_idle_ns;
+}
+
+/**
  * read_persistent_clock - Return time from the persistent clock.
  *
  * Weak dummy function for arches that do not yet support it.
-- 
1.6.0.4
* [PATCH 2/2] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds
From: Jon Hunter @ 2009-08-18 17:45 UTC (permalink / raw)
To: linux-kernel; +Cc: Thomas Gleixner, John Stultz, Jon Hunter

From: Jon Hunter <jon-hunter@ti.com>

In the dynamic tick code, "max_delta_ns" (a member of the
"clock_event_device" structure) represents the maximum sleep time that
can occur between timer events, in nanoseconds. "max_delta_ns" is
declared as an unsigned long, which is a 32-bit integer on 32-bit
machines and a 64-bit integer on 64-bit machines (when gcc's -m64
option is used). Its value is set by calling "clockevent_delta2ns()",
which returns at most LONG_MAX. On a 32-bit machine LONG_MAX is equal
to 0x7fffffff, which in nanoseconds equates to ~2.15 seconds. Hence,
the maximum sleep time for a 32-bit machine is ~2.15 seconds, whereas
for a 64-bit machine it is many years.

This patch changes the type of max_delta_ns to "unsigned long long"
instead of "unsigned long" so that the variable is a 64-bit type on
both 32-bit and 64-bit machines. It also changes the maximum value
returned by clockevent_delta2ns() to LLONG_MAX. Hence this allows a
32-bit machine to sleep for longer than ~2.15 seconds.

Please note that this patch also changes "min_delta_ns" to "unsigned
long long" too; although this is probably unnecessary, it keeps the
patch simpler.
Signed-off-by: Jon Hunter <jon-hunter@ti.com>
---
 include/linux/clockchips.h |    6 +++---
 kernel/hrtimer.c           |    2 +-
 kernel/time/clockevents.c  |   10 +++++-----
 kernel/time/tick-oneshot.c |    2 +-
 kernel/time/timer_list.c   |    4 ++--
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 3a1dbba..8154bc6 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -77,8 +77,8 @@ enum clock_event_nofitiers {
 struct clock_event_device {
 	const char		*name;
 	unsigned int		features;
-	unsigned long		max_delta_ns;
-	unsigned long		min_delta_ns;
+	unsigned long long	max_delta_ns;
+	unsigned long long	min_delta_ns;
 	unsigned long		mult;
 	int			shift;
 	int			rating;
@@ -116,7 +116,7 @@ static inline unsigned long div_sc(unsigned long ticks, unsigned long nsec,
 }
 
 /* Clock event layer functions */
-extern unsigned long clockevent_delta2ns(unsigned long latch,
+extern unsigned long long clockevent_delta2ns(unsigned long latch,
 					 struct clock_event_device *evt);
 extern void clockevents_register_device(struct clock_event_device *dev);
 
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index e2f91ec..2043e78 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1198,7 +1198,7 @@ hrtimer_interrupt_hanging(struct clock_event_device *dev,
 	force_clock_reprogram = 1;
 	dev->min_delta_ns = (unsigned long)try_time.tv64 * 3;
 	printk(KERN_WARNING "hrtimer: interrupt too slow, "
-		"forcing clock min delta to %lu ns\n", dev->min_delta_ns);
+		"forcing clock min delta to %llu ns\n", dev->min_delta_ns);
 }
 /*
  * High resolution timer interrupt
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index a6dcd67..6db410f 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -37,10 +37,10 @@ static DEFINE_SPINLOCK(clockevents_lock);
 *
 * Math helper, returns latch value converted to nanoseconds (bound checked)
 */
-unsigned long clockevent_delta2ns(unsigned long latch,
+unsigned long long clockevent_delta2ns(unsigned long latch,
 				  struct clock_event_device *evt)
 {
-	u64 clc = ((u64) latch << evt->shift);
+	unsigned long long clc = ((unsigned long long) latch << evt->shift);
 
 	if (unlikely(!evt->mult)) {
 		evt->mult = 1;
@@ -50,10 +50,10 @@ unsigned long clockevent_delta2ns(unsigned long latch,
 	do_div(clc, evt->mult);
 	if (clc < 1000)
 		clc = 1000;
-	if (clc > LONG_MAX)
-		clc = LONG_MAX;
+	if (clc > LLONG_MAX)
+		clc = LLONG_MAX;
 
-	return (unsigned long) clc;
+	return clc;
 }
 EXPORT_SYMBOL_GPL(clockevent_delta2ns);
 
diff --git a/kernel/time/tick-oneshot.c b/kernel/time/tick-oneshot.c
index a96c0e2..327d4ed 100644
--- a/kernel/time/tick-oneshot.c
+++ b/kernel/time/tick-oneshot.c
@@ -50,7 +50,7 @@ int tick_dev_program_event(struct clock_event_device *dev, ktime_t expires,
 			dev->min_delta_ns += dev->min_delta_ns >> 1;
 
 			printk(KERN_WARNING
-			       "CE: %s increasing min_delta_ns to %lu nsec\n",
+			       "CE: %s increasing min_delta_ns to %llu nsec\n",
 			       dev->name ? dev->name : "?",
 			       dev->min_delta_ns << 1);
 
diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index a999b92..3bf30b4 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -204,8 +204,8 @@ print_tickdevice(struct seq_file *m, struct tick_device *td, int cpu)
 		return;
 	}
 	SEQ_printf(m, "%s\n", dev->name);
-	SEQ_printf(m, " max_delta_ns:   %lu\n", dev->max_delta_ns);
-	SEQ_printf(m, " min_delta_ns:   %lu\n", dev->min_delta_ns);
+	SEQ_printf(m, " max_delta_ns:   %llu\n", dev->max_delta_ns);
+	SEQ_printf(m, " min_delta_ns:   %llu\n", dev->min_delta_ns);
 	SEQ_printf(m, " mult:           %lu\n", dev->mult);
 	SEQ_printf(m, " shift:          %d\n", dev->shift);
 	SEQ_printf(m, " mode:           %d\n", dev->mode);
-- 
1.6.0.4
* Re: [PATCH 2/2] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds
From: Thomas Gleixner @ 2009-08-18 19:26 UTC (permalink / raw)
To: Jon Hunter; +Cc: linux-kernel, John Stultz

On Tue, 18 Aug 2009, Jon Hunter wrote:
> diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
> index 3a1dbba..8154bc6 100644
> --- a/include/linux/clockchips.h
> +++ b/include/linux/clockchips.h
> @@ -77,8 +77,8 @@ enum clock_event_nofitiers {
>  struct clock_event_device {
>  	const char		*name;
>  	unsigned int		features;
> -	unsigned long		max_delta_ns;
> -	unsigned long		min_delta_ns;
> +	unsigned long long	max_delta_ns;
> +	unsigned long long	min_delta_ns;

Can we please use u64 for this ?

Thanks,

	tglx
* Re: [PATCH 2/2] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds
From: Jon Hunter @ 2009-08-18 20:52 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: linux-kernel, John Stultz

Thomas Gleixner wrote:
> On Tue, 18 Aug 2009, Jon Hunter wrote:
>> diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
>> index 3a1dbba..8154bc6 100644
>> --- a/include/linux/clockchips.h
>> +++ b/include/linux/clockchips.h
>> @@ -77,8 +77,8 @@ enum clock_event_nofitiers {
>>  struct clock_event_device {
>>  	const char		*name;
>>  	unsigned int		features;
>> -	unsigned long		max_delta_ns;
>> -	unsigned long		min_delta_ns;
>> +	unsigned long long	max_delta_ns;
>> +	unsigned long long	min_delta_ns;
>
> Can we please use u64 for this ?

John brought this up as well; there was some discussion about it a while
back. I received feedback that u64 was a different type between ppc64 and
x86-64, which was causing problems with printk. The above variables are
also used with printk in the kernel today. See the following email:

http://marc.info/?l=linux-kernel&m=124041426203283&w=2

I am not sure if this is still the case, so it may be safer to stick with
long long for now. Let me know your thoughts.

Cheers
Jon
* [tip:timers/core] nohz: Allow 32-bit machines to sleep for more than 2.15 seconds
From: tip-bot for Jon Hunter @ 2009-11-13 19:50 UTC (permalink / raw)
To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, johnstul, jon-hunter, tglx

Commit-ID:  97813f2fe77804a4464564c75ba8d8826377feea
Gitweb:     http://git.kernel.org/tip/97813f2fe77804a4464564c75ba8d8826377feea
Author:     Jon Hunter <jon-hunter@ti.com>
AuthorDate: Tue, 18 Aug 2009 12:45:11 -0500
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 13 Nov 2009 20:46:24 +0100

nohz: Allow 32-bit machines to sleep for more than 2.15 seconds

In the dynamic tick code, "max_delta_ns" (member of the
"clock_event_device" structure) represents the maximum sleep time
that can occur between timer events in nanoseconds.

The variable, "max_delta_ns", is defined as an unsigned long
which is a 32-bit integer for 32-bit machines and a 64-bit integer
for 64-bit machines (if -m64 option is used for gcc). The value of
max_delta_ns is set by calling the function "clockevent_delta2ns()"
which returns a maximum value of LONG_MAX. For a 32-bit machine
LONG_MAX is equal to 0x7fffffff and in nanoseconds this equates to
~2.15 seconds. Hence, the maximum sleep time for a 32-bit machine
is ~2.15 seconds, whereas for a 64-bit machine it will be many
years.

This patch changes the type of max_delta_ns to be "u64" instead of
"unsigned long" so that this variable is a 64-bit type for both 32-bit
and 64-bit machines. It also changes the maximum value returned by
clockevent_delta2ns() to KTIME_MAX. Hence this allows a 32-bit
machine to sleep for longer than ~2.15 seconds.

Please note that this patch also changes "min_delta_ns" to be "u64"
too and although this is unnecessary, it makes the patch simpler as
it avoids having to fix up all callers of clockevent_delta2ns().

[ tglx: changed "unsigned long long" to u64 as we use this data type
        through out the time code ]

Signed-off-by: Jon Hunter <jon-hunter@ti.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1250617512-23567-3-git-send-email-jon-hunter@ti.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/clockchips.h |    8 ++++----
 kernel/hrtimer.c           |    3 ++-
 kernel/time/clockevents.c  |   11 +++++------
 kernel/time/tick-oneshot.c |    4 ++--
 kernel/time/timer_list.c   |    6 ++++--
 5 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 4d438b0..0cf725b 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -77,8 +77,8 @@ enum clock_event_nofitiers {
 struct clock_event_device {
 	const char		*name;
 	unsigned int		features;
-	unsigned long		max_delta_ns;
-	unsigned long		min_delta_ns;
+	u64			max_delta_ns;
+	u64			min_delta_ns;
 	u32			mult;
 	u32			shift;
 	int			rating;
@@ -116,8 +116,8 @@ static inline unsigned long div_sc(unsigned long ticks, unsigned long nsec,
 }
 
 /* Clock event layer functions */
-extern unsigned long clockevent_delta2ns(unsigned long latch,
-					 struct clock_event_device *evt);
+extern u64 clockevent_delta2ns(unsigned long latch,
+			       struct clock_event_device *evt);
 extern void clockevents_register_device(struct clock_event_device *dev);
 
 extern void clockevents_exchange_device(struct clock_event_device *old,
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 6d70204..c215b74 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1240,7 +1240,8 @@ hrtimer_interrupt_hanging(struct clock_event_device *dev,
 	force_clock_reprogram = 1;
 	dev->min_delta_ns = (unsigned long)try_time.tv64 * 3;
 	printk(KERN_WARNING "hrtimer: interrupt too slow, "
-		"forcing clock min delta to %lu ns\n", dev->min_delta_ns);
+		"forcing clock min delta to %llu ns\n",
+		(unsigned long long) dev->min_delta_ns);
 }
 /*
  * High resolution timer interrupt
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 620b58a..05e8aee 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -37,10 +37,9 @@ static DEFINE_SPINLOCK(clockevents_lock);
 *
 * Math helper, returns latch value converted to nanoseconds (bound checked)
 */
-unsigned long clockevent_delta2ns(unsigned long latch,
-				  struct clock_event_device *evt)
+u64 clockevent_delta2ns(unsigned long latch, struct clock_event_device *evt)
 {
-	u64 clc = ((u64) latch << evt->shift);
+	u64 clc = (u64) latch << evt->shift;
 
 	if (unlikely(!evt->mult)) {
 		evt->mult = 1;
@@ -50,10 +49,10 @@ unsigned long clockevent_delta2ns(unsigned long latch,
 	do_div(clc, evt->mult);
 	if (clc < 1000)
 		clc = 1000;
-	if (clc > LONG_MAX)
-		clc = LONG_MAX;
+	if (clc > KTIME_MAX)
+		clc = KTIME_MAX;
 
-	return (unsigned long) clc;
+	return clc;
 }
 EXPORT_SYMBOL_GPL(clockevent_delta2ns);
 
diff --git a/kernel/time/tick-oneshot.c b/kernel/time/tick-oneshot.c
index a96c0e2..0a8a213 100644
--- a/kernel/time/tick-oneshot.c
+++ b/kernel/time/tick-oneshot.c
@@ -50,9 +50,9 @@ int tick_dev_program_event(struct clock_event_device *dev, ktime_t expires,
 			dev->min_delta_ns += dev->min_delta_ns >> 1;
 
 			printk(KERN_WARNING
-			       "CE: %s increasing min_delta_ns to %lu nsec\n",
+			       "CE: %s increasing min_delta_ns to %llu nsec\n",
 			       dev->name ? dev->name : "?",
-			       dev->min_delta_ns << 1);
+			       (unsigned long long) dev->min_delta_ns << 1);
 
 			i = 0;
 		}
diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index fa00da1..665c76e 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -204,8 +204,10 @@ print_tickdevice(struct seq_file *m, struct tick_device *td, int cpu)
 		return;
 	}
 	SEQ_printf(m, "%s\n", dev->name);
-	SEQ_printf(m, " max_delta_ns:   %lu\n", dev->max_delta_ns);
-	SEQ_printf(m, " min_delta_ns:   %lu\n", dev->min_delta_ns);
+	SEQ_printf(m, " max_delta_ns:   %llu\n",
+		   (unsigned long long) dev->max_delta_ns);
+	SEQ_printf(m, " min_delta_ns:   %llu\n",
+		   (unsigned long long) dev->min_delta_ns);
 	SEQ_printf(m, " mult:           %u\n", dev->mult);
 	SEQ_printf(m, " shift:          %u\n", dev->shift);
 	SEQ_printf(m, " mode:           %d\n", dev->mode);
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle
From: Thomas Gleixner @ 2009-08-18 19:25 UTC (permalink / raw)
To: Jon Hunter; +Cc: linux-kernel, John Stultz

On Tue, 18 Aug 2009, Jon Hunter wrote:

> From: Jon Hunter <jon-hunter@ti.com>
>
> The dynamic tick allows the kernel to sleep for periods longer
> than a single tick. This patch prevents the kernel from
> sleeping for a period longer than the maximum time that the
> current clocksource can count. This ensures that the kernel will
> not lose track of time. This patch adds a function called
> "clocksource_max_deferment()" that calculates the maximum time the
> kernel can sleep for a given clocksource and a function called
> "timekeeping_max_deferment()" that returns the maximum time the
> kernel can sleep for the current clocksource.
>
> Signed-off-by: Jon Hunter <jon-hunter@ti.com>
> ---
>  include/linux/clocksource.h |    2 +
>  include/linux/time.h        |    1 +
>  kernel/time/clocksource.c   |   47 +++++++++++++++++++++++++++++++++++
>  kernel/time/tick-sched.c    |   57 ++++++++++++++++++++++++++++++++----------
>  kernel/time/timekeeping.c   |   11 ++++++++
>  5 files changed, 104 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index 9ea40ff..09ed7f1 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc,
>   *			subtraction of non 64 bit counters
>   * @mult:		cycle to nanosecond multiplier
>   * @shift:		cycle to nanosecond divisor (power of two)
> + * @max_idle_ns:	max idle time permitted by the clocksource (nsecs)
>   * @flags:		flags describing special properties
>   * @vread:		vsyscall based read
>   * @resume:		resume function for the clocksource, if necessary
> @@ -168,6 +169,7 @@ struct clocksource {
>  	cycle_t mask;
>  	u32 mult;
>  	u32 shift;
> +	s64 max_idle_ns;

I don't think we should move this to the clocksource. That should go
into the new struct timekeeper and initialized when a clocksource is
selected for timekeeping.

> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index e0f59a2..7a98e90 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>  	ktime_t last_update, expires, now;
>  	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
>  	int cpu;
> +	s64 time_delta, max_time_delta;
>
>  	local_irq_save(flags);
>
> @@ -270,6 +271,18 @@ void tick_nohz_stop_sched_tick(int inidle)
>  		seq = read_seqbegin(&xtime_lock);
>  		last_update = last_jiffies_update;
>  		last_jiffies = jiffies;
> +
> +		/*
> +		 * On SMP we really should only care for the CPU which
> +		 * has the do_timer duty assigned. All other CPUs can
> +		 * sleep as long as they want.
> +		 */
> +		if (cpu == tick_do_timer_cpu ||
> +		    tick_do_timer_cpu == TICK_DO_TIMER_NONE)
> +			max_time_delta = timekeeping_max_deferment();
> +		else
> +			max_time_delta = KTIME_MAX;
> +

Is it worth the extra check instead of always using
timekeeping_max_deferment() ?

>  	} while (read_seqretry(&xtime_lock, seq));
>
>  	/* Get the next timer wheel timer */
> @@ -289,11 +302,30 @@ void tick_nohz_stop_sched_tick(int inidle)
>  	if ((long)delta_jiffies >= 1) {
>
>  		/*
> -		 * calculate the expiry time for the next timer wheel
> -		 * timer
> -		 */
> -		expires = ktime_add_ns(last_update, tick_period.tv64 *
> -				       delta_jiffies);
> +		 * calculate the expiry time for the next timer wheel
> +		 * timer. delta_jiffies >= NEXT_TIMER_MAX_DELTA signals
> +		 * that there is no timer pending or at least extremely
> +		 * far into the future (12 days for HZ=1000). In this
> +		 * case we set the expiry to the end of time.
> +		 */
> +		if (likely(delta_jiffies < NEXT_TIMER_MAX_DELTA)) {
> +
> +			/*
> +			 * Calculate the time delta for the next timer event.
> +			 * If the time delta exceeds the maximum time delta
> +			 * permitted by the current clocksource then adjust
> +			 * the time delta accordingly to ensure the
> +			 * clocksource does not wrap.
> +			 */
> +			time_delta = tick_period.tv64 * delta_jiffies;
> +
> +			if (time_delta > max_time_delta)
> +				time_delta = max_time_delta;
> +
> +			expires = ktime_add_ns(last_update, time_delta);
> +		} else {
> +			expires.tv64 = KTIME_MAX;
> +		}

This looks incorrect. You set expires to KTIME_MAX when no timer is
pending, but that defeats the purpose of this patch. When we hit this
code path and the next interrupt comes in after the timekeeping
clocksource wrapped we are bust.

Thanks,

	tglx
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-08-18 19:25 ` [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle Thomas Gleixner @ 2009-08-18 20:42 ` Jon Hunter 0 siblings, 0 replies; 29+ messages in thread From: Jon Hunter @ 2009-08-18 20:42 UTC (permalink / raw) To: Thomas Gleixner; +Cc: john stultz, linux-kernel@vger.kernel.org Thomas Gleixner wrote: > On Tue, 18 Aug 2009, Jon Hunter wrote: > >> From: Jon Hunter <jon-hunter@ti.com> >> >> The dynamic tick allows the kernel to sleep for periods longer >> than a single tick. This patch prevents that the kernel from >> sleeping for a period longer than the maximum time that the >> current clocksource can count. This ensures that the kernel will >> not lose track of time. This patch adds a function called >> "clocksource_max_deferment()" that calculates the maximum time the >> kernel can sleep for a given clocksource and function called >> "timekeeping_max_deferment()" that returns maximum time the kernel >> can sleep for the current clocksource. 
>> >> Signed-off-by: Jon Hunter <jon-hunter@ti.com> >> --- >> include/linux/clocksource.h | 2 + >> include/linux/time.h | 1 + >> kernel/time/clocksource.c | 47 +++++++++++++++++++++++++++++++++++ >> kernel/time/tick-sched.c | 57 ++++++++++++++++++++++++++++++++---------- >> kernel/time/timekeeping.c | 11 ++++++++ >> 5 files changed, 104 insertions(+), 14 deletions(-) >> >> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h >> index 9ea40ff..09ed7f1 100644 >> --- a/include/linux/clocksource.h >> +++ b/include/linux/clocksource.h >> @@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc, >> * subtraction of non 64 bit counters >> * @mult: cycle to nanosecond multiplier >> * @shift: cycle to nanosecond divisor (power of two) >> + * @max_idle_ns: max idle time permitted by the clocksource (nsecs) >> * @flags: flags describing special properties >> * @vread: vsyscall based read >> * @resume: resume function for the clocksource, if necessary >> @@ -168,6 +169,7 @@ struct clocksource { >> cycle_t mask; >> u32 mult; >> u32 shift; >> + s64 max_idle_ns; > > I don't think we should move this to the clocksource. That should go > into the new struct timekeeper and initialized when a clocksource is > selected for timekeeping. > >> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c >> index e0f59a2..7a98e90 100644 >> --- a/kernel/time/tick-sched.c >> +++ b/kernel/time/tick-sched.c >> @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle) >> ktime_t last_update, expires, now; >> struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev; >> int cpu; >> + s64 time_delta, max_time_delta; >> >> local_irq_save(flags); >> >> @@ -270,6 +271,18 @@ void tick_nohz_stop_sched_tick(int inidle) >> seq = read_seqbegin(&xtime_lock); >> last_update = last_jiffies_update; >> last_jiffies = jiffies; >> + >> + /* >> + * On SMP we really should only care for the CPU which >> + * has the do_timer duty assigned. 
All other CPUs can >> + * sleep as long as they want. >> + */ >> + if (cpu == tick_do_timer_cpu || >> + tick_do_timer_cpu == TICK_DO_TIMER_NONE) >> + max_time_delta = timekeeping_max_deferment(); >> + else >> + max_time_delta = KTIME_MAX; >> + > > Is it worth the extra check instead of always using > timekeeping_max_deferment() ? > >> } while (read_seqretry(&xtime_lock, seq)); >> >> /* Get the next timer wheel timer */ >> @@ -289,11 +302,30 @@ void tick_nohz_stop_sched_tick(int inidle) >> if ((long)delta_jiffies >= 1) { >> >> /* >> - * calculate the expiry time for the next timer wheel >> - * timer >> - */ >> - expires = ktime_add_ns(last_update, tick_period.tv64 * >> - delta_jiffies); >> + * calculate the expiry time for the next timer wheel >> + * timer. delta_jiffies >= NEXT_TIMER_MAX_DELTA signals >> + * that there is no timer pending or at least extremely >> + * far into the future (12 days for HZ=1000). In this >> + * case we set the expiry to the end of time. >> + */ >> + if (likely(delta_jiffies < NEXT_TIMER_MAX_DELTA)) { >> + >> + /* >> + * Calculate the time delta for the next timer event. >> + * If the time delta exceeds the maximum time delta >> + * permitted by the current clocksource then adjust >> + * the time delta accordingly to ensure the >> + * clocksource does not wrap. >> + */ >> + time_delta = tick_period.tv64 * delta_jiffies; >> + >> + if (time_delta > max_time_delta) >> + time_delta = max_time_delta; >> + >> + expires = ktime_add_ns(last_update, time_delta); >> + } else { >> + expires.tv64 = KTIME_MAX; >> + } > > This looks incorrect. You set expires to KTIME_MAX when no timer is > pending, but that defeats the purpose of this patch. When we hit this > code path and the next interrupt comes in after the timekeeping > clocksource wrapped we are bust. Right, so this is a bit of a grey area for me. 
When I first started looking at this I was questioning the purpose of the following code that exists today in the tick_nohz_stop_sched_tick() function:

	/*
	 * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
	 * there is no timer pending or at least extremly far
	 * into the future (12 days for HZ=1000). In this case
	 * we simply stop the tick timer:
	 */
	if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
		ts->idle_expires.tv64 = KTIME_MAX;
		if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
			hrtimer_cancel(&ts->sched_timer);
		goto out;
	}

The above code checks whether delta_jiffies is greater than or equal to NEXT_TIMER_MAX_DELTA and, if so, sets expires to KTIME_MAX and disables the timer. I had questioned this a few months ago, but I don't think that John and I knew the history here. So, for right or wrong, I left this code alone. The above patch still does the same thing if delta_jiffies is indeed greater than NEXT_TIMER_MAX_DELTA. If you agree that this code is not needed and that, in the case where we have no timers, we should simply make the next timer event always occur max_time_delta ns later, then I can re-work it to do this.

Thanks
Jon

^ permalink raw reply [flat|nested] 29+ messages in thread
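Thomas's objection above (that falling back to KTIME_MAX when no timer is pending defeats the wrap protection) can be illustrated with a small stand-alone sketch. This is not kernel code: clamped_expiry() and its nanosecond arguments are hypothetical stand-ins that only model the clamping behaviour being discussed.

```c
#include <stdint.h>

/*
 * Illustrative sketch only: clamp the next programmed event so it is
 * never further out than the clocksource can count without wrapping.
 * With this rule, the "no pending timer" case simply degenerates to
 * sleeping for max_time_delta_ns, so no KTIME_MAX special case is
 * needed.
 */
static int64_t clamped_expiry(int64_t last_update_ns, int64_t time_delta_ns,
			      int64_t max_time_delta_ns)
{
	if (time_delta_ns > max_time_delta_ns)
		time_delta_ns = max_time_delta_ns;
	return last_update_ns + time_delta_ns;
}
```

Under this scheme the kernel wakes at least once per maximum deferment period even when idle, which is exactly the bound needed to keep the timekeeping clocksource from wrapping unnoticed.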
* [tip:timers/core] nohz: Prevent clocksource wrapping during idle 2009-08-18 17:45 ` [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle Jon Hunter 2009-08-18 17:45 ` [PATCH 2/2] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds Jon Hunter 2009-08-18 19:25 ` [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle Thomas Gleixner @ 2009-11-13 19:49 ` tip-bot for Jon Hunter 2 siblings, 0 replies; 29+ messages in thread From: tip-bot for Jon Hunter @ 2009-11-13 19:49 UTC (permalink / raw) To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, johnstul, jon-hunter, tglx Commit-ID: 98962465ed9e6ea99c38e0af63fe1dcb5a79dc25 Gitweb: http://git.kernel.org/tip/98962465ed9e6ea99c38e0af63fe1dcb5a79dc25 Author: Jon Hunter <jon-hunter@ti.com> AuthorDate: Tue, 18 Aug 2009 12:45:10 -0500 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Fri, 13 Nov 2009 20:46:24 +0100 nohz: Prevent clocksource wrapping during idle The dynamic tick allows the kernel to sleep for periods longer than a single tick, but it does not limit the sleep time currently. In the worst case the kernel could sleep longer than the wrap around time of the time keeping clock source which would result in losing track of time. Prevent this by limiting it to the safe maximum sleep time of the current time keeping clock source. The value is calculated when the clock source is registered. 
[ tglx: simplified the code a bit and massaged the commit msg ] Signed-off-by: Jon Hunter <jon-hunter@ti.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1250617512-23567-2-git-send-email-jon-hunter@ti.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- include/linux/clocksource.h | 2 + include/linux/time.h | 1 + kernel/time/clocksource.c | 44 ++++++++++++++++++++++++++++++++++++ kernel/time/tick-sched.c | 52 +++++++++++++++++++++++++++++++----------- kernel/time/timekeeping.c | 11 +++++++++ 5 files changed, 96 insertions(+), 14 deletions(-) diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index f57f882..279c547 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc, * subtraction of non 64 bit counters * @mult: cycle to nanosecond multiplier * @shift: cycle to nanosecond divisor (power of two) + * @max_idle_ns: max idle time permitted by the clocksource (nsecs) * @flags: flags describing special properties * @vread: vsyscall based read * @resume: resume function for the clocksource, if necessary @@ -168,6 +169,7 @@ struct clocksource { cycle_t mask; u32 mult; u32 shift; + u64 max_idle_ns; unsigned long flags; cycle_t (*vread)(void); void (*resume)(void); diff --git a/include/linux/time.h b/include/linux/time.h index fe04e5e..6e026e4 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -148,6 +148,7 @@ extern void monotonic_to_bootbased(struct timespec *ts); extern struct timespec timespec_trunc(struct timespec t, unsigned gran); extern int timekeeping_valid_for_hres(void); +extern u64 timekeeping_max_deferment(void); extern void update_wall_time(void); extern void update_xtime_cache(u64 nsec); extern void timekeeping_leap_insert(int leapsecond); diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 407c089..b65b242 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -469,6 
+469,47 @@ void clocksource_touch_watchdog(void) #ifdef CONFIG_GENERIC_TIME /** + * clocksource_max_deferment - Returns max time the clocksource can be deferred + * @cs: Pointer to clocksource + * + */ +static u64 clocksource_max_deferment(struct clocksource *cs) +{ + u64 max_nsecs, max_cycles; + + /* + * Calculate the maximum number of cycles that we can pass to the + * cyc2ns function without overflowing a 64-bit signed result. The + * maximum number of cycles is equal to ULLONG_MAX/cs->mult which + * is equivalent to the below. + * max_cycles < (2^63)/cs->mult + * max_cycles < 2^(log2((2^63)/cs->mult)) + * max_cycles < 2^(log2(2^63) - log2(cs->mult)) + * max_cycles < 2^(63 - log2(cs->mult)) + * max_cycles < 1 << (63 - log2(cs->mult)) + * Please note that we add 1 to the result of the log2 to account for + * any rounding errors, ensure the above inequality is satisfied and + * no overflow will occur. + */ + max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1)); + + /* + * The actual maximum number of cycles we can defer the clocksource is + * determined by the minimum of max_cycles and cs->mask. + */ + max_cycles = min_t(u64, max_cycles, (u64) cs->mask); + max_nsecs = clocksource_cyc2ns(max_cycles, cs->mult, cs->shift); + + /* + * To ensure that the clocksource does not wrap whilst we are idle, + * limit the time the clocksource can be deferred by 12.5%. Please + * note a margin of 12.5% is used because this can be computed with + * a shift, versus say 10% which would require division. + */ + return max_nsecs - (max_nsecs >> 5); +} + +/** * clocksource_select - Select the best clocksource available * * Private function. Must hold clocksource_mutex when called. 
@@ -564,6 +605,9 @@ static void clocksource_enqueue(struct clocksource *cs) */ int clocksource_register(struct clocksource *cs) { + /* calculate max idle time permitted for this clocksource */ + cs->max_idle_ns = clocksource_max_deferment(cs); + mutex_lock(&clocksource_mutex); clocksource_enqueue(cs); clocksource_select(); diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index c65ba0f..a80b464 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -208,6 +208,7 @@ void tick_nohz_stop_sched_tick(int inidle) struct tick_sched *ts; ktime_t last_update, expires, now; struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev; + u64 time_delta; int cpu; local_irq_save(flags); @@ -262,6 +263,17 @@ void tick_nohz_stop_sched_tick(int inidle) seq = read_seqbegin(&xtime_lock); last_update = last_jiffies_update; last_jiffies = jiffies; + + /* + * On SMP we really should only care for the CPU which + * has the do_timer duty assigned. All other CPUs can + * sleep as long as they want. + */ + if (cpu == tick_do_timer_cpu || + tick_do_timer_cpu == TICK_DO_TIMER_NONE) + time_delta = timekeeping_max_deferment(); + else + time_delta = KTIME_MAX; } while (read_seqretry(&xtime_lock, seq)); if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu) || @@ -284,11 +296,26 @@ void tick_nohz_stop_sched_tick(int inidle) if ((long)delta_jiffies >= 1) { /* - * calculate the expiry time for the next timer wheel - * timer - */ - expires = ktime_add_ns(last_update, tick_period.tv64 * - delta_jiffies); + * calculate the expiry time for the next timer wheel + * timer. delta_jiffies >= NEXT_TIMER_MAX_DELTA signals + * that there is no timer pending or at least extremely + * far into the future (12 days for HZ=1000). In this + * case we set the expiry to the end of time. + */ + if (likely(delta_jiffies < NEXT_TIMER_MAX_DELTA)) { + /* + * Calculate the time delta for the next timer event. 
+ * If the time delta exceeds the maximum time delta + * permitted by the current clocksource then adjust + * the time delta accordingly to ensure the + * clocksource does not wrap. + */ + time_delta = min_t(u64, time_delta, + tick_period.tv64 * delta_jiffies); + expires = ktime_add_ns(last_update, time_delta); + } else { + expires.tv64 = KTIME_MAX; + } /* * If this cpu is the one which updates jiffies, then @@ -332,22 +359,19 @@ void tick_nohz_stop_sched_tick(int inidle) ts->idle_sleeps++; + /* Mark expires */ + ts->idle_expires = expires; + /* - * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that - * there is no timer pending or at least extremly far - * into the future (12 days for HZ=1000). In this case - * we simply stop the tick timer: + * If the expiration time == KTIME_MAX, then + * in this case we simply stop the tick timer. */ - if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) { - ts->idle_expires.tv64 = KTIME_MAX; + if (unlikely(expires.tv64 == KTIME_MAX)) { if (ts->nohz_mode == NOHZ_MODE_HIGHRES) hrtimer_cancel(&ts->sched_timer); goto out; } - /* Mark expiries */ - ts->idle_expires = expires; - if (ts->nohz_mode == NOHZ_MODE_HIGHRES) { hrtimer_start(&ts->sched_timer, expires, HRTIMER_MODE_ABS_PINNED); diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 96b3f0d..5d4d423 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -478,6 +478,17 @@ int timekeeping_valid_for_hres(void) } /** + * timekeeping_max_deferment - Returns max time the clocksource can be deferred + * + * Caller must observe xtime_lock via read_seqbegin/read_seqretry to + * ensure that the clocksource does not change! + */ +u64 timekeeping_max_deferment(void) +{ + return timekeeper.clock->max_idle_ns; +} + +/** * read_persistent_clock - Return time from the persistent clock. * * Weak dummy function for arches that do not yet support it. ^ permalink raw reply related [flat|nested] 29+ messages in thread
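The arithmetic behind clocksource_max_deferment() in the commit above can be checked outside the kernel. The sketch below is a hedged, stand-alone model: cyc2ns() and ilog2_u32() are minimal local stand-ins for the kernel helpers of similar names, and the example clocksource values used in the usage note (mult = 2^24, shift = 24, a 32-bit mask) are assumptions for illustration, not values from the patch. Note that a true 12.5% margin corresponds to subtracting max_nsecs >> 3.

```c
#include <stdint.h>

/* Minimal stand-in for the kernel's cycle-to-nanosecond conversion. */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/* Minimal stand-in for the kernel's ilog2(): floor(log2(v)). */
static int ilog2_u32(uint32_t v)
{
	int log = -1;

	while (v) {
		v >>= 1;
		log++;
	}
	return log;
}

static uint64_t max_deferment_ns(uint32_t mult, uint32_t shift, uint64_t mask)
{
	/*
	 * Largest cycle count whose product with mult stays below 2^63;
	 * the extra +1 on the log2 absorbs rounding, as in the patch.
	 */
	uint64_t max_cycles = 1ULL << (63 - (ilog2_u32(mult) + 1));
	uint64_t max_nsecs;

	/* The counter width also bounds how far we can defer. */
	if (max_cycles > mask)
		max_cycles = mask;
	max_nsecs = cyc2ns(max_cycles, mult, shift);

	/* A 12.5% safety margin, computed with a shift (>> 3) rather
	 * than a division. */
	return max_nsecs - (max_nsecs >> 3);
}
```

For a 32-bit counter ticking at 1 GHz (mult = 2^24, shift = 24, mask = 0xffffffff) the counter wraps after roughly 4.29 s, and the margin limits the idle time to about 3.76 s.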
* Re: [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit machines 2009-08-18 17:45 [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit machines Jon Hunter 2009-08-18 17:45 ` [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle Jon Hunter @ 2009-11-11 20:43 ` john stultz 2009-11-11 20:57 ` Jon Hunter 1 sibling, 1 reply; 29+ messages in thread From: john stultz @ 2009-11-11 20:43 UTC (permalink / raw) To: Jon Hunter; +Cc: linux-kernel, Thomas Gleixner On Tue, Aug 18, 2009 at 9:45 AM, Jon Hunter <jon-hunter@ti.com> wrote: > From: Jon Hunter <jon-hunter@ti.com> > > This is a resend of the patch series shown here: > http://www.spinics.net/lists/kernel/msg891029.html > > This patch series has been rebase on the linux-2.6-tip timers/core branch per > request from Thomas Gleixner. > > This patch series ensures that the wrapping of the clocksource will not be > missed if the kernel sleeps for longer periods and allows 32-bit machines to > sleep for longer than 2.15 seconds. > > Jon Hunter (2): > Dynamic Tick: Prevent clocksource wrapping during idle > Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 > seconds I could have sworn this was in mainline by now, but I recently was looking for the code and can't find it there or in -tip either. Thomas, are they just hiding somewhere I can't find? Jon, you've been terribly patient and great about resubmitting these patches over and over. If I'm not just being crazy and missing these patches in front of my nose, are you still willing to submit them again? I think they'll be quite useful as folks start pushing the NOHZ idle times out. thanks -john ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit machines 2009-11-11 20:43 ` [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit machines john stultz @ 2009-11-11 20:57 ` Jon Hunter 2009-11-11 22:37 ` john stultz 0 siblings, 1 reply; 29+ messages in thread From: Jon Hunter @ 2009-11-11 20:57 UTC (permalink / raw) To: john stultz; +Cc: linux-kernel, Thomas Gleixner john stultz wrote: > I could have sworn this was in mainline by now, but I recently was > looking for the code and can't find it there or in -tip either. > > Thomas, are they just hiding somewhere I can't find? > > Jon, you've been terribly patient and great about resubmitting these > patches over and over. If I'm not just being crazy and missing these > patches in front of my nose, are you still willing to submit them > again? I think they'll be quite useful as folks start pushing the NOHZ > idle times out. Absolutely! It is still on my to-do list, but unfortunately, I got busy with a couple other things. With regard to the last patch set I submitted for this, Thomas had an issue with one of the patches. I understand the concern, but I am not sure which would be the preferred way to handle this. See the below thread: http://marc.info/?l=linux-kernel&m=125062817124381&w=2 If you or Thomas have any feedback on this, I could re-work the patch against the latest kernel tree. Cheers Jon ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit machines 2009-11-11 20:57 ` Jon Hunter @ 2009-11-11 22:37 ` john stultz 0 siblings, 0 replies; 29+ messages in thread From: john stultz @ 2009-11-11 22:37 UTC (permalink / raw) To: Jon Hunter; +Cc: linux-kernel, Thomas Gleixner On Wed, 2009-11-11 at 14:57 -0600, Jon Hunter wrote: > john stultz wrote: > > I could have sworn this was in mainline by now, but I recently was > > looking for the code and can't find it there or in -tip either. > > > > Thomas, are they just hiding somewhere I can't find? > > > > Jon, you've been terribly patient and great about resubmitting these > > patches over and over. If I'm not just being crazy and missing these > > patches in front of my nose, are you still willing to submit them > > again? I think they'll be quite useful as folks start pushing the NOHZ > > idle times out. > > Absolutely! It is still on my to-do list, but unfortunately, I got busy > with a couple other things. > > With regard to the last patch set I submitted for this, Thomas had an > issue with one of the patches. I understand the concern, but I am not > sure which would be the preferred way to handle this. See the below thread: > > http://marc.info/?l=linux-kernel&m=125062817124381&w=2 > > If you or Thomas have any feedback on this, I could re-work the patch > against the latest kernel tree. Ok. I think Thomas is right there, setting the expiration to max_time_delta makes the most sense. Honestly I suspect we don't ever hit that case in the current code (no timers for 12 days), so it's probably an untested code path as it stands. thanks -john ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit
@ 2009-07-28 0:00 Jon Hunter
2009-07-28 0:00 ` [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle Jon Hunter
0 siblings, 1 reply; 29+ messages in thread
From: Jon Hunter @ 2009-07-28 0:00 UTC (permalink / raw)
To: linux-kernel; +Cc: Thomas Gleixner, John Stultz, Jon Hunter
From: Jon Hunter <jon-hunter@ti.com>
This is a resend of the patch series shown here:
http://www.spinics.net/lists/kernel/msg891029.html
This patch series has been updated based on the feedback received and
rebased against the current kernel.
This patch series ensures that the wrapping of the clocksource will not be
missed if the kernel sleeps for longer periods and allows 32-bit machines to
sleep for longer than 2.15 seconds.
Jon Hunter (2):
Dynamic Tick: Prevent clocksource wrapping during idle
Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15
seconds
include/linux/clockchips.h | 6 ++--
include/linux/clocksource.h | 2 +
include/linux/time.h | 1 +
kernel/hrtimer.c | 2 +-
kernel/time/clockevents.c | 10 ++++----
kernel/time/clocksource.c | 47 +++++++++++++++++++++++++++++++++++
kernel/time/tick-oneshot.c | 2 +-
kernel/time/tick-sched.c | 57 ++++++++++++++++++++++++++++++++----------
kernel/time/timekeeping.c | 11 ++++++++
kernel/time/timer_list.c | 4 +-
10 files changed, 116 insertions(+), 26 deletions(-)
^ permalink raw reply [flat|nested] 29+ messages in thread* [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-07-28 0:00 [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit Jon Hunter @ 2009-07-28 0:00 ` Jon Hunter 0 siblings, 0 replies; 29+ messages in thread From: Jon Hunter @ 2009-07-28 0:00 UTC (permalink / raw) To: linux-kernel; +Cc: Thomas Gleixner, John Stultz, Jon Hunter From: Jon Hunter <jon-hunter@ti.com> The dynamic tick allows the kernel to sleep for periods longer than a single tick. This patch prevents the kernel from sleeping for a period longer than the maximum time that the current clocksource can count. This ensures that the kernel will not lose track of time. This patch adds a function called "clocksource_max_deferment()" that calculates the maximum time the kernel can sleep for a given clocksource and a function called "timekeeping_max_deferment()" that returns the maximum time the kernel can sleep for the current clocksource. Signed-off-by: Jon Hunter <jon-hunter@ti.com> --- include/linux/clocksource.h | 2 + include/linux/time.h | 1 + kernel/time/clocksource.c | 47 +++++++++++++++++++++++++++++++++++ kernel/time/tick-sched.c | 57 ++++++++++++++++++++++++++++++++---------- kernel/time/timekeeping.c | 11 ++++++++ 5 files changed, 104 insertions(+), 14 deletions(-) diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index c56457c..5528090 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc, * @mult: cycle to nanosecond multiplier (adjusted by NTP) * @mult_orig: cycle to nanosecond multiplier (unadjusted by NTP) * @shift: cycle to nanosecond divisor (power of two) + * @max_idle_ns: max idle time permitted by the clocksource (nsecs) * @flags: flags describing special properties * @vread: vsyscall based read * @resume: resume function for the clocksource, if necessary @@ -171,6 +172,7 @@ struct
clocksource { u32 mult; u32 mult_orig; u32 shift; + s64 max_idle_ns; unsigned long flags; cycle_t (*vread)(void); void (*resume)(void); diff --git a/include/linux/time.h b/include/linux/time.h index ea16c1a..ddcff53 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -145,6 +145,7 @@ extern void monotonic_to_bootbased(struct timespec *ts); extern struct timespec timespec_trunc(struct timespec t, unsigned gran); extern int timekeeping_valid_for_hres(void); +extern s64 timekeeping_max_deferment(void); extern void update_wall_time(void); extern void update_xtime_cache(u64 nsec); diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 7466cb8..fa28f29 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -321,6 +321,50 @@ void clocksource_touch_watchdog(void) } /** + * clocksource_max_deferment - Returns max time the clocksource can be deferred + * @cs: Pointer to clocksource + * + */ +static s64 clocksource_max_deferment(struct clocksource *cs) +{ + s64 max_nsecs; + u64 max_cycles; + + /* + * Calculate the maximum number of cycles that we can pass to the + * cyc2ns function without overflowing a 64-bit signed result. The + * maximum number of cycles is equal to ULLONG_MAX/cs->mult which + * is equivalent to the below. + * max_cycles < (2^63)/cs->mult + * max_cycles < 2^(log2((2^63)/cs->mult)) + * max_cycles < 2^(log2(2^63) - log2(cs->mult)) + * max_cycles < 2^(63 - log2(cs->mult)) + * max_cycles < 1 << (63 - log2(cs->mult)) + * Please note that we add 1 to the result of the log2 to account for + * any rounding errors, ensure the above inequality is satisfied and + * no overflow will occur. + */ + max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1)); + + /* + * The actual maximum number of cycles we can defer the clocksource is + * determined by the minimum of max_cycles and cs->mask. 
+ */ + max_cycles = min(max_cycles, cs->mask); + max_nsecs = cyc2ns(cs, max_cycles); + + /* + * To ensure that the clocksource does not wrap whilst we are idle, + * limit the time the clocksource can be deferred by 12.5%. Please + * note a margin of 12.5% is used because this can be computed with + * a shift, versus say 10% which would require division. + */ + max_nsecs = max_nsecs - (max_nsecs >> 5); + + return max_nsecs; +} + +/** * clocksource_get_next - Returns the selected clocksource * */ @@ -402,6 +446,9 @@ int clocksource_register(struct clocksource *c) unsigned long flags; int ret; + /* calculate max idle time permitted for this clocksource */ + c->max_idle_ns = clocksource_max_deferment(c); + spin_lock_irqsave(&clocksource_lock, flags); ret = clocksource_enqueue(c); if (!ret) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index e0f59a2..7a98e90 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle) ktime_t last_update, expires, now; struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev; int cpu; + s64 time_delta, max_time_delta; local_irq_save(flags); @@ -270,6 +271,18 @@ void tick_nohz_stop_sched_tick(int inidle) seq = read_seqbegin(&xtime_lock); last_update = last_jiffies_update; last_jiffies = jiffies; + + /* + * On SMP we really should only care for the CPU which + * has the do_timer duty assigned. All other CPUs can + * sleep as long as they want. 
+ */ + if (cpu == tick_do_timer_cpu || + tick_do_timer_cpu == TICK_DO_TIMER_NONE) + max_time_delta = timekeeping_max_deferment(); + else + max_time_delta = KTIME_MAX; + } while (read_seqretry(&xtime_lock, seq)); /* Get the next timer wheel timer */ @@ -289,11 +302,30 @@ void tick_nohz_stop_sched_tick(int inidle) if ((long)delta_jiffies >= 1) { /* - * calculate the expiry time for the next timer wheel - * timer - */ - expires = ktime_add_ns(last_update, tick_period.tv64 * - delta_jiffies); + * calculate the expiry time for the next timer wheel + * timer. delta_jiffies >= NEXT_TIMER_MAX_DELTA signals + * that there is no timer pending or at least extremely + * far into the future (12 days for HZ=1000). In this + * case we set the expiry to the end of time. + */ + if (likely(delta_jiffies < NEXT_TIMER_MAX_DELTA)) { + + /* + * Calculate the time delta for the next timer event. + * If the time delta exceeds the maximum time delta + * permitted by the current clocksource then adjust + * the time delta accordingly to ensure the + * clocksource does not wrap. + */ + time_delta = tick_period.tv64 * delta_jiffies; + + if (time_delta > max_time_delta) + time_delta = max_time_delta; + + expires = ktime_add_ns(last_update, time_delta); + } else { + expires.tv64 = KTIME_MAX; + } /* * If this cpu is the one which updates jiffies, then @@ -337,22 +369,19 @@ void tick_nohz_stop_sched_tick(int inidle) ts->idle_sleeps++; + /* Mark expires */ + ts->idle_expires = expires; + /* - * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that - * there is no timer pending or at least extremly far - * into the future (12 days for HZ=1000). In this case - * we simply stop the tick timer: + * If the expiration time == KTIME_MAX, then + * in this case we simply stop the tick timer. 
*/ - if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) { - ts->idle_expires.tv64 = KTIME_MAX; + if (unlikely(expires.tv64 == KTIME_MAX)) { if (ts->nohz_mode == NOHZ_MODE_HIGHRES) hrtimer_cancel(&ts->sched_timer); goto out; } - /* Mark expiries */ - ts->idle_expires = expires; - if (ts->nohz_mode == NOHZ_MODE_HIGHRES) { hrtimer_start(&ts->sched_timer, expires, HRTIMER_MODE_ABS_PINNED); diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index e8c77d9..cd1b110 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -278,6 +278,17 @@ int timekeeping_valid_for_hres(void) } /** + * timekeeping_max_deferment - Returns max time the clocksource can be deferred + * + * IMPORTANT: Caller must observe xtime_lock via read_seqbegin/read_seqretry + * to ensure that the clocksource does not change! + */ +s64 timekeeping_max_deferment(void) +{ + return clock->max_idle_ns; +} + +/** * read_persistent_clock - Return time in seconds from the persistent clock. * * Weak dummy function for arches that do not yet support it. -- 1.6.0.4 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle
@ 2009-05-27 14:49 Jon Hunter
2009-05-27 16:01 ` Thomas Gleixner
` (2 more replies)
0 siblings, 3 replies; 29+ messages in thread
From: Jon Hunter @ 2009-05-27 14:49 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org; +Cc: john stultz, Thomas Gleixner, Ingo Molnar
The dynamic tick allows the kernel to sleep for periods longer than a
single tick. This patch prevents the kernel from sleeping for a
period longer than the maximum time that the current clocksource can
count. This ensures that the kernel will not lose track of time. This
patch adds a new function called "timekeeping_max_deferment()" that
calculates the maximum time the kernel can sleep for a given clocksource.
Signed-off-by: Jon Hunter <jon-hunter@ti.com>
---
include/linux/time.h | 1 +
kernel/time/tick-sched.c | 36 +++++++++++++++++++++++----------
kernel/time/timekeeping.c | 47 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 73 insertions(+), 11 deletions(-)
diff --git a/include/linux/time.h b/include/linux/time.h
index 242f624..090be07 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts);
extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
extern int timekeeping_valid_for_hres(void);
+extern s64 timekeeping_max_deferment(void);
extern void update_wall_time(void);
extern void update_xtime_cache(u64 nsec);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d3f1ef4..f0155ae 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle)
ktime_t last_update, expires, now;
struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
int cpu;
+ s64 time_delta, max_time_delta;
local_irq_save(flags);
@@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle)
seq = read_seqbegin(&xtime_lock);
last_update = last_jiffies_update;
last_jiffies = jiffies;
+ max_time_delta = timekeeping_max_deferment();
} while (read_seqretry(&xtime_lock, seq));
/* Get the next timer wheel timer */
@@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle)
if ((long)delta_jiffies >= 1) {
/*
- * calculate the expiry time for the next timer wheel
- * timer
- */
- expires = ktime_add_ns(last_update, tick_period.tv64 *
- delta_jiffies);
+ * Calculate the time delta for the next timer event.
+ * If the time delta exceeds the maximum time delta
+ * permitted by the current clocksource then adjust
+ * the time delta accordingly to ensure the
+ * clocksource does not wrap.
+ */
+ time_delta = tick_period.tv64 * delta_jiffies;
+
+ if (time_delta > max_time_delta)
+ time_delta = max_time_delta;
+
+ /*
+ * calculate the expiry time for the next timer wheel
+ * timer
+ */
+ expires = ktime_add_ns(last_update, time_delta);
/*
* If this cpu is the one which updates jiffies, then
@@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle)
if (cpu == tick_do_timer_cpu)
tick_do_timer_cpu = TICK_DO_TIMER_NONE;
- if (delta_jiffies > 1)
+ if (time_delta > tick_period.tv64)
cpumask_set_cpu(cpu, nohz_cpu_mask);
/* Skip reprogram of event if its not changed */
@@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle)
ts->idle_sleeps++;
/*
- * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
- * there is no timer pending or at least extremly far
- * into the future (12 days for HZ=1000). In this case
- * we simply stop the tick timer:
+ * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA)
+ * signals that there is no timer pending or at least
+ * extremely far into the future (12 days for HZ=1000).
+ * In this case we simply stop the tick timer:
*/
- if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
+ if (unlikely(time_delta >=
+ (tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) {
ts->idle_expires.tv64 = KTIME_MAX;
if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
hrtimer_cancel(&ts->sched_timer);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 687dff4..608fc6f 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -271,6 +271,53 @@ int timekeeping_valid_for_hres(void)
}
/**
+ * timekeeping_max_deferment - Returns max time the clocksource can be deferred
+ *
+ * IMPORTANT: Must be called with xtime_lock held!
+ */
+s64 timekeeping_max_deferment(void)
+{
+ s64 max_nsecs;
+ u64 max_cycles;
+
+ /*
+ * Calculate the maximum number of cycles that we can pass to the
+ * cyc2ns function without overflowing a 64-bit signed result. The
+ * maximum number of cycles is equal to ULLONG_MAX/clock->mult which
+ * is equivalent to the below.
+ * max_cycles < (2^63)/clock->mult
+ * max_cycles < 2^(log2((2^63)/clock->mult))
+ * max_cycles < 2^(log2(2^63) - log2(clock->mult))
+ * max_cycles < 2^(63 - log2(clock->mult))
+ * max_cycles < 1 << (63 - log2(clock->mult))
+ * Please note that we add 1 to the result of the log2 to account for
+ * any rounding errors, ensure the above inequality is satisfied and
+ * no overflow will occur.
+ */
+ max_cycles = 1ULL << (63 - (ilog2(clock->mult) + 1));
+
+ /*
+ * The actual maximum number of cycles we can defer the clocksource is
+ * determined by the minimum of max_cycles and clock->mask.
+ */
+ max_cycles = min(max_cycles, clock->mask);
+ max_nsecs = cyc2ns(clock, max_cycles);
+
+ /*
+ * To ensure that the clocksource does not wrap whilst we are idle,
+ * limit the time the clocksource can be deferred by 6.25%. Please
+ * note a margin of 6.25% is used because this can be computed with
+ * a shift, versus say 5% which would require division.
+ */
+ max_nsecs = max_nsecs - (max_nsecs >> 4);
+
+ if (max_nsecs < 0)
+ max_nsecs = 0;
+
+ return max_nsecs;
+}
+
+/**
* read_persistent_clock - Return time in seconds from the persistent clock.
*
* Weak dummy function for arches that do not yet support it.
--
1.6.1
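The bound derived in the patch comment above can be sanity-checked outside the kernel. The following userspace sketch re-implements the arithmetic only; ilog2_u32() is a hypothetical stand-in for the kernel's ilog2() helper, and nothing here is kernel code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch of the overflow bound computed in
 * timekeeping_max_deferment() above.  ilog2_u32() is a stand-in for
 * the kernel's ilog2() (floor of log2).
 */
static unsigned int ilog2_u32(uint32_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

static uint64_t max_cycles_bound(uint32_t mult)
{
	/*
	 * mult < 2^(ilog2(mult) + 1), so choosing
	 * max_cycles = 2^(63 - (ilog2(mult) + 1)) guarantees
	 * max_cycles * mult < 2^63, i.e. the product fits in a
	 * signed 64-bit result.
	 */
	return 1ULL << (63 - (ilog2_u32(mult) + 1));
}
```

For any mult, multiplying the returned cycle count by mult stays at or below INT64_MAX, which is exactly the property cyc2ns() needs.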
^ permalink raw reply related [flat|nested] 29+ messages in thread* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-27 14:49 Jon Hunter @ 2009-05-27 16:01 ` Thomas Gleixner 2009-05-27 20:20 ` john stultz 2009-05-27 18:15 ` john stultz 2009-05-27 20:54 ` Alok Kataria 2 siblings, 1 reply; 29+ messages in thread From: Thomas Gleixner @ 2009-05-27 16:01 UTC (permalink / raw) To: Jon Hunter; +Cc: linux-kernel@vger.kernel.org, john stultz, Ingo Molnar On Wed, 27 May 2009, Jon Hunter wrote: > /** > + * timekeeping_max_deferment - Returns max time the clocksource can be > deferred > + * > + * IMPORTANT: Must be called with xtime_lock held! No, that would mean that xtime_lock needs to be write locked. And we definitely do not want that. The caller needs to observe xtime_lock via read_seqbegin / read_seqretry because clock might change. > + */ > +s64 timekeeping_max_deferment(void) > +{ > + s64 max_nsecs; > + u64 max_cycles; > + > + /* > + * Calculate the maximum number of cycles that we can pass to the > + * cyc2ns function without overflowing a 64-bit signed result. The > + * maximum number of cycles is equal to ULLONG_MAX/clock->mult which > + * is equivalent to the below. > + * max_cycles < (2^63)/clock->mult > + * max_cycles < 2^(log2((2^63)/clock->mult)) > + * max_cycles < 2^(log2(2^63) - log2(clock->mult)) > + * max_cycles < 2^(63 - log2(clock->mult)) > + * max_cycles < 1 << (63 - log2(clock->mult)) > + * Please note that we add 1 to the result of the log2 to account for > + * any rounding errors, ensure the above inequality is satisfied and > + * no overflow will occur. > + */ > + max_cycles = 1ULL << (63 - (ilog2(clock->mult) + 1)); > + > + /* > + * The actual maximum number of cycles we can defer the clocksource is > + * determined by the minimum of max_cycles and clock->mask. > + */ > + max_cycles = min(max_cycles, clock->mask); > + max_nsecs = cyc2ns(clock, max_cycles); Why do you want to recalculate the whole stuff over and over ? 
That computation can be done when the clock source is initialized or any fundamental change of the clock parameters happens. Stick that value into the clocksource struct and just read it out. > + /* > + * To ensure that the clocksource does not wrap whilst we are idle, > + * limit the time the clocksource can be deferred by 6.25%. Please > + * note a margin of 6.25% is used because this can be computed with > + * a shift, versus say 5% which would require division. > + */ > + max_nsecs = max_nsecs - (max_nsecs >> 4); > + > + if (max_nsecs < 0) > + max_nsecs = 0; How does "max_nsecs = max_nsecs - (max_nsecs >> 4)" ever become negative ? Thanks, tglx ^ permalink raw reply [flat|nested] 29+ messages in thread
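One arithmetic note on the shift-based margin quoted above: a right shift by n removes a 1/2^n fraction, so `x - (x >> 4)` removes the 6.25% the comment describes, `x - (x >> 3)` would remove 12.5%, and `x - (x >> 5)` removes only 3.125%. A minimal sketch, where apply_margin() is an illustrative helper rather than a kernel function:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Trim a 1/2^shift fraction from a nanosecond value, the way the
 * patch trims its safety margin with a shift instead of a division:
 *   shift 4 -> 6.25%  removed
 *   shift 3 -> 12.5%  removed
 *   shift 5 -> 3.125% removed
 */
static int64_t apply_margin(int64_t max_nsecs, unsigned int shift)
{
	return max_nsecs - (max_nsecs >> shift);
}
```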
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-27 16:01 ` Thomas Gleixner @ 2009-05-27 20:20 ` john stultz 2009-05-27 20:32 ` Thomas Gleixner 0 siblings, 1 reply; 29+ messages in thread From: john stultz @ 2009-05-27 20:20 UTC (permalink / raw) To: Thomas Gleixner; +Cc: Jon Hunter, linux-kernel@vger.kernel.org, Ingo Molnar On Wed, 2009-05-27 at 18:01 +0200, Thomas Gleixner wrote: > On Wed, 27 May 2009, Jon Hunter wrote: > > + */ > > +s64 timekeeping_max_deferment(void) > > +{ > > + s64 max_nsecs; > > + u64 max_cycles; > > + > > + /* > > + * Calculate the maximum number of cycles that we can pass to the > > + * cyc2ns function without overflowing a 64-bit signed result. The > > + * maximum number of cycles is equal to ULLONG_MAX/clock->mult which > > + * is equivalent to the below. > > + * max_cycles < (2^63)/clock->mult > > + * max_cycles < 2^(log2((2^63)/clock->mult)) > > + * max_cycles < 2^(log2(2^63) - log2(clock->mult)) > > + * max_cycles < 2^(63 - log2(clock->mult)) > > + * max_cycles < 1 << (63 - log2(clock->mult)) > > + * Please note that we add 1 to the result of the log2 to account for > > + * any rounding errors, ensure the above inequality is satisfied and > > + * no overflow will occur. > > + */ > > + max_cycles = 1ULL << (63 - (ilog2(clock->mult) + 1)); > > + > > + /* > > + * The actual maximum number of cycles we can defer the clocksource is > > + * determined by the minimum of max_cycles and clock->mask. > > + */ > > + max_cycles = min(max_cycles, clock->mask); > > + max_nsecs = cyc2ns(clock, max_cycles); > > Why do you want to recalculate the whole stuff over and over ? > > That computation can be done when the clock source is initialized or > any fundamental change of the clock parameters happens. > > Stick that value into the clocksource struct and just read it out. Sigh. I was hoping to avoid hanging another bit of junk off of the clocksource struct. 
But I guess we could compute that value on registration and keep it around. Changes to mult could affect things, but should be well within the 6% safety net we give ourselves. > > + /* > > + * To ensure that the clocksource does not wrap whilst we are idle, > > + * limit the time the clocksource can be deferred by 6.25%. Please > > + * note a margin of 6.25% is used because this can be computed with > > + * a shift, versus say 5% which would require division. > > + */ > > + max_nsecs = max_nsecs - (max_nsecs >> 4); > > + > > + if (max_nsecs < 0) > > + max_nsecs = 0; > > How does "max_nsecs = max_nsecs - (max_nsecs >> 4)" ever become > negative ? Fair point. Now we've limited the overflow case, we shouldn't trip negative values. thanks -john ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-27 20:20 ` john stultz @ 2009-05-27 20:32 ` Thomas Gleixner 2009-05-28 20:21 ` Jon Hunter 0 siblings, 1 reply; 29+ messages in thread From: Thomas Gleixner @ 2009-05-27 20:32 UTC (permalink / raw) To: john stultz; +Cc: Jon Hunter, linux-kernel@vger.kernel.org, Ingo Molnar On Wed, 27 May 2009, john stultz wrote: > > Why do you want to recalculate the whole stuff over and over ? > > > > That computation can be done when the clock source is initialized or > > any fundamental change of the clock parameters happens. > > > > Stick that value into the clocksource struct and just read it out. > > Sigh. > > I was hoping to avoid hanging another bit of junk off of the clocksource > struct. Sure, but buying that 8 bytes with Einsteinian insanity is nuts. > But I guess we could compute that value on registration and keep it > around. Changes to mult could effect things, but should be well within > the 6% safety net we give ourselves. That was my thought as well, but even if we have to go to 12% it's way better than doing repeated nonsense on the way to idle. Thanks, tglx ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-27 20:32 ` Thomas Gleixner @ 2009-05-28 20:21 ` Jon Hunter 2009-05-28 20:36 ` Thomas Gleixner 0 siblings, 1 reply; 29+ messages in thread From: Jon Hunter @ 2009-05-28 20:21 UTC (permalink / raw) To: Thomas Gleixner; +Cc: john stultz, linux-kernel@vger.kernel.org, Ingo Molnar Thomas Gleixner wrote: > On Wed, 27 May 2009, john stultz wrote: >>> Why do you want to recalculate the whole stuff over and over ? >>> >>> That computation can be done when the clock source is initialized or >>> any fundamental change of the clock parameters happens. >>> >>> Stick that value into the clocksource struct and just read it out. >> Sigh. >> >> I was hoping to avoid hanging another bit of junk off of the clocksource >> struct. > > Sure, but buying that 8 bytes with Einsteinian insanity is nuts. Ok, I have re-worked the patch to avoid computing the value over and over and just use the original mult value to calculate the max deferment. This patch calculates the value on registering the clocksource and stores the value in the clocksource struct. >> But I guess we could compute that value on registration and keep it >> around. Changes to mult could effect things, but should be well within >> the 6% safety net we give ourselves. > > That was my thought as well, but even if we have to go to 12% it's way > better than doing repeated nonsense on the way to idle. For now I have modified the patch to go to a 12.5% margin to be on the safe side. Let me know your thoughts on the below. Cheers Jon The dynamic tick allows the kernel to sleep for periods longer than a single tick. This patch prevents the kernel from sleeping for a period longer than the maximum time that the current clocksource can count. This ensures that the kernel will not lose track of time. 
This patch adds a function called "clocksource_max_deferment()" that calculates the maximum time the kernel can sleep for a given clocksource and function called "timekeeping_max_deferment()" that returns maximum time the kernel can sleep for the current clocksource. Signed-off-by: Jon Hunter <jon-hunter@ti.com> --- include/linux/clocksource.h | 46 +++++++++++++++++++++++++++++++++++++++++++ include/linux/time.h | 1 + kernel/time/clocksource.c | 3 ++ kernel/time/tick-sched.c | 36 +++++++++++++++++++++++---------- kernel/time/timekeeping.c | 11 ++++++++++ 5 files changed, 86 insertions(+), 11 deletions(-) diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 5a40d14..b0d676e 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc, * @mult: cycle to nanosecond multiplier (adjusted by NTP) * @mult_orig: cycle to nanosecond multiplier (unadjusted by NTP) * @shift: cycle to nanosecond divisor (power of two) + * @max_idle_ns: max idle time permitted by the clocksource (nsecs) * @flags: flags describing special properties * @vread: vsyscall based read * @resume: resume function for the clocksource, if necessary @@ -171,6 +172,7 @@ struct clocksource { u32 mult; u32 mult_orig; u32 shift; + s64 max_idle_ns; unsigned long flags; cycle_t (*vread)(void); void (*resume)(void); @@ -322,6 +324,50 @@ static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles) } /** + * clocksource_max_deferment - Returns max time the clocksource can be deferred + * @cs: Pointer to clocksource + * + */ +static inline s64 clocksource_max_deferment(struct clocksource *cs) +{ + s64 max_nsecs; + u64 max_cycles; + + /* + * Calculate the maximum number of cycles that we can pass to the + * cyc2ns function without overflowing a 64-bit signed result. The + * maximum number of cycles is equal to ULLONG_MAX/cs->mult which + * is equivalent to the below. 
+ * max_cycles < (2^63)/cs->mult + * max_cycles < 2^(log2((2^63)/cs->mult)) + * max_cycles < 2^(log2(2^63) - log2(cs->mult)) + * max_cycles < 2^(63 - log2(cs->mult)) + * max_cycles < 1 << (63 - log2(cs->mult)) + * Please note that we add 1 to the result of the log2 to account for + * any rounding errors, ensure the above inequality is satisfied and + * no overflow will occur. + */ + max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1)); + + /* + * The actual maximum number of cycles we can defer the clocksource is + * determined by the minimum of max_cycles and cs->mask. + */ + max_cycles = min(max_cycles, cs->mask); + max_nsecs = cyc2ns(cs, max_cycles); + + /* + * To ensure that the clocksource does not wrap whilst we are idle, + * limit the time the clocksource can be deferred by 12.5%. Please + * note a margin of 12.5% is used because this can be computed with + * a shift, versus say 10% which would require division. + */ + max_nsecs = max_nsecs - (max_nsecs >> 5); + + return max_nsecs; +} + +/** * clocksource_calculate_interval - Calculates a clocksource interval struct * * @c: Pointer to clocksource. 
diff --git a/include/linux/time.h b/include/linux/time.h index 242f624..090be07 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts); extern struct timespec timespec_trunc(struct timespec t, unsigned gran); extern int timekeeping_valid_for_hres(void); +extern s64 timekeeping_max_deferment(void); extern void update_wall_time(void); extern void update_xtime_cache(u64 nsec); diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index ecfd7b5..0d98dc2 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -405,6 +405,9 @@ int clocksource_register(struct clocksource *c) /* save mult_orig on registration */ c->mult_orig = c->mult; + /* calculate max idle time permitted for this clocksource */ + c->max_idle_ns = clocksource_max_deferment(c); + spin_lock_irqsave(&clocksource_lock, flags); ret = clocksource_enqueue(c); if (!ret) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index d3f1ef4..f0155ae 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle) ktime_t last_update, expires, now; struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev; int cpu; + s64 time_delta, max_time_delta; local_irq_save(flags); @@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle) seq = read_seqbegin(&xtime_lock); last_update = last_jiffies_update; last_jiffies = jiffies; + max_time_delta = timekeeping_max_deferment(); } while (read_seqretry(&xtime_lock, seq)); /* Get the next timer wheel timer */ @@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle) if ((long)delta_jiffies >= 1) { /* - * calculate the expiry time for the next timer wheel - * timer - */ - expires = ktime_add_ns(last_update, tick_period.tv64 * - delta_jiffies); + * Calculate the time delta for the next timer event. 
+ * If the time delta exceeds the maximum time delta + * permitted by the current clocksource then adjust + * the time delta accordingly to ensure the + * clocksource does not wrap. + */ + time_delta = tick_period.tv64 * delta_jiffies; + + if (time_delta > max_time_delta) + time_delta = max_time_delta; + + /* + * calculate the expiry time for the next timer wheel + * timer + */ + expires = ktime_add_ns(last_update, time_delta); /* * If this cpu is the one which updates jiffies, then @@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle) if (cpu == tick_do_timer_cpu) tick_do_timer_cpu = TICK_DO_TIMER_NONE; - if (delta_jiffies > 1) + if (time_delta > tick_period.tv64) cpumask_set_cpu(cpu, nohz_cpu_mask); /* Skip reprogram of event if its not changed */ @@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle) ts->idle_sleeps++; /* - * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that - * there is no timer pending or at least extremly far - * into the future (12 days for HZ=1000). In this case - * we simply stop the tick timer: + * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA) + * signals that there is no timer pending or at least + * extremely far into the future (12 days for HZ=1000). + * In this case we simply stop the tick timer: */ - if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) { + if (unlikely(time_delta >= + (tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) { ts->idle_expires.tv64 = KTIME_MAX; if (ts->nohz_mode == NOHZ_MODE_HIGHRES) hrtimer_cancel(&ts->sched_timer); diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 687dff4..659cae3 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -271,6 +271,17 @@ int timekeeping_valid_for_hres(void) } /** + * timekeeping_max_deferment - Returns max time the clocksource can be deferred + * + * IMPORTANT: Caller must observe xtime_lock via read_seqbegin/read_seqretry + * to ensure that the clocksource does not change! 
+ */ +s64 timekeeping_max_deferment(void) +{ + return clock->max_idle_ns; +} + +/** * read_persistent_clock - Return time in seconds from the persistent clock. * * Weak dummy function for arches that do not yet support it. -- 1.6.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
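The read_seqbegin()/read_seqretry() pattern that this version adopts (and that Thomas asked for earlier in the thread) can be modelled in a few lines. This is a hedged, single-threaded illustration of the control flow only; the real kernel primitives additionally spin while a writer is active and insert memory barriers:

```c
#include <assert.h>

/*
 * Minimal model of the seqlock read pattern: the reader retries
 * whenever a writer was active at the start (odd sequence) or the
 * sequence count changed during the read.
 */
struct seqlock_model {
	unsigned int sequence;	/* even: idle, odd: writer active */
	long value;		/* stands in for clock->max_idle_ns */
};

static unsigned int model_read_begin(const struct seqlock_model *sl)
{
	/* the kernel version spins here while a writer is active */
	return sl->sequence;
}

static int model_read_retry(const struct seqlock_model *sl, unsigned int start)
{
	return (start & 1) || sl->sequence != start;
}

static long read_consistent(const struct seqlock_model *sl)
{
	unsigned int seq;
	long v;

	do {
		seq = model_read_begin(sl);
		v = sl->value;
	} while (model_read_retry(sl, seq));
	return v;
}
```

This is the same shape as the do/while loop in tick_nohz_stop_sched_tick() above, which now samples max_time_delta inside the retry loop so a concurrent clocksource change cannot be missed.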
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-28 20:21 ` Jon Hunter @ 2009-05-28 20:36 ` Thomas Gleixner 2009-05-28 21:10 ` Jon Hunter 0 siblings, 1 reply; 29+ messages in thread From: Thomas Gleixner @ 2009-05-28 20:36 UTC (permalink / raw) To: Jon Hunter; +Cc: john stultz, linux-kernel@vger.kernel.org, Ingo Molnar On Thu, 28 May 2009, Jon Hunter wrote: > /** > + * clocksource_max_deferment - Returns max time the clocksource can be > deferred > + * @cs: Pointer to clocksource > + * > + */ > +static inline s64 clocksource_max_deferment(struct clocksource *cs) Please make this a real function. There is no reason to stick this into a header file. The only user is clocksource.c anyway, so please put it there as a static function and let the compiler decide what to do with it. Otherwise, I'm happy with it. Thanks, tglx ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-28 20:36 ` Thomas Gleixner @ 2009-05-28 21:10 ` Jon Hunter 2009-05-28 21:43 ` John Stultz ` (2 more replies) 0 siblings, 3 replies; 29+ messages in thread From: Jon Hunter @ 2009-05-28 21:10 UTC (permalink / raw) To: Thomas Gleixner; +Cc: john stultz, linux-kernel@vger.kernel.org, Ingo Molnar Thomas Gleixner wrote: > Please make this a real function. There is no reason to stick this > into a header file. The only user is clocksource.c anyway, so please > put it there as a static function and let the compiler decide what > to do with it. No problem. Please see below. Let me know if this is ok and there is anything else. Cheers Jon The dynamic tick allows the kernel to sleep for periods longer than a single tick. This patch prevents the kernel from sleeping for a period longer than the maximum time that the current clocksource can count. This ensures that the kernel will not lose track of time. This patch adds a function called "clocksource_max_deferment()" that calculates the maximum time the kernel can sleep for a given clocksource and a function called "timekeeping_max_deferment()" that returns the maximum time the kernel can sleep for the current clocksource. 
Signed-off-by: Jon Hunter <jon-hunter@ti.com> --- include/linux/clocksource.h | 2 + include/linux/time.h | 1 + kernel/time/clocksource.c | 47 +++++++++++++++++++++++++++++++++++++++++++ kernel/time/tick-sched.c | 36 ++++++++++++++++++++++---------- kernel/time/timekeeping.c | 11 ++++++++++ 5 files changed, 86 insertions(+), 11 deletions(-) diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 5a40d14..465af22 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc, * @mult: cycle to nanosecond multiplier (adjusted by NTP) * @mult_orig: cycle to nanosecond multiplier (unadjusted by NTP) * @shift: cycle to nanosecond divisor (power of two) + * @max_idle_ns: max idle time permitted by the clocksource (nsecs) * @flags: flags describing special properties * @vread: vsyscall based read * @resume: resume function for the clocksource, if necessary @@ -171,6 +172,7 @@ struct clocksource { u32 mult; u32 mult_orig; u32 shift; + s64 max_idle_ns; unsigned long flags; cycle_t (*vread)(void); void (*resume)(void); diff --git a/include/linux/time.h b/include/linux/time.h index 242f624..090be07 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts); extern struct timespec timespec_trunc(struct timespec t, unsigned gran); extern int timekeeping_valid_for_hres(void); +extern s64 timekeeping_max_deferment(void); extern void update_wall_time(void); extern void update_xtime_cache(u64 nsec); diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index ecfd7b5..18d2b9f 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -321,6 +321,50 @@ void clocksource_touch_watchdog(void) } /** + * clocksource_max_deferment - Returns max time the clocksource can be deferred + * @cs: Pointer to clocksource + * + */ +static s64 clocksource_max_deferment(struct clocksource 
*cs) +{ + s64 max_nsecs; + u64 max_cycles; + + /* + * Calculate the maximum number of cycles that we can pass to the + * cyc2ns function without overflowing a 64-bit signed result. The + * maximum number of cycles is equal to ULLONG_MAX/cs->mult which + * is equivalent to the below. + * max_cycles < (2^63)/cs->mult + * max_cycles < 2^(log2((2^63)/cs->mult)) + * max_cycles < 2^(log2(2^63) - log2(cs->mult)) + * max_cycles < 2^(63 - log2(cs->mult)) + * max_cycles < 1 << (63 - log2(cs->mult)) + * Please note that we add 1 to the result of the log2 to account for + * any rounding errors, ensure the above inequality is satisfied and + * no overflow will occur. + */ + max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1)); + + /* + * The actual maximum number of cycles we can defer the clocksource is + * determined by the minimum of max_cycles and cs->mask. + */ + max_cycles = min(max_cycles, cs->mask); + max_nsecs = cyc2ns(cs, max_cycles); + + /* + * To ensure that the clocksource does not wrap whilst we are idle, + * limit the time the clocksource can be deferred by 12.5%. Please + * note a margin of 12.5% is used because this can be computed with + * a shift, versus say 10% which would require division. 
+ */ + max_nsecs = max_nsecs - (max_nsecs >> 5); + + return max_nsecs; +} + +/** * clocksource_get_next - Returns the selected clocksource * */ @@ -405,6 +449,9 @@ int clocksource_register(struct clocksource *c) /* save mult_orig on registration */ c->mult_orig = c->mult; + /* calculate max idle time permitted for this clocksource */ + c->max_idle_ns = clocksource_max_deferment(c); + spin_lock_irqsave(&clocksource_lock, flags); ret = clocksource_enqueue(c); if (!ret) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index d3f1ef4..f0155ae 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle) ktime_t last_update, expires, now; struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev; int cpu; + s64 time_delta, max_time_delta; local_irq_save(flags); @@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle) seq = read_seqbegin(&xtime_lock); last_update = last_jiffies_update; last_jiffies = jiffies; + max_time_delta = timekeeping_max_deferment(); } while (read_seqretry(&xtime_lock, seq)); /* Get the next timer wheel timer */ @@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle) if ((long)delta_jiffies >= 1) { /* - * calculate the expiry time for the next timer wheel - * timer - */ - expires = ktime_add_ns(last_update, tick_period.tv64 * - delta_jiffies); + * Calculate the time delta for the next timer event. + * If the time delta exceeds the maximum time delta + * permitted by the current clocksource then adjust + * the time delta accordingly to ensure the + * clocksource does not wrap. 
+ */ + time_delta = tick_period.tv64 * delta_jiffies; + + if (time_delta > max_time_delta) + time_delta = max_time_delta; + + /* + * calculate the expiry time for the next timer wheel + * timer + */ + expires = ktime_add_ns(last_update, time_delta); /* * If this cpu is the one which updates jiffies, then @@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle) if (cpu == tick_do_timer_cpu) tick_do_timer_cpu = TICK_DO_TIMER_NONE; - if (delta_jiffies > 1) + if (time_delta > tick_period.tv64) cpumask_set_cpu(cpu, nohz_cpu_mask); /* Skip reprogram of event if its not changed */ @@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle) ts->idle_sleeps++; /* - * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that - * there is no timer pending or at least extremly far - * into the future (12 days for HZ=1000). In this case - * we simply stop the tick timer: + * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA) + * signals that there is no timer pending or at least + * extremely far into the future (12 days for HZ=1000). + * In this case we simply stop the tick timer: */ - if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) { + if (unlikely(time_delta >= + (tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) { ts->idle_expires.tv64 = KTIME_MAX; if (ts->nohz_mode == NOHZ_MODE_HIGHRES) hrtimer_cancel(&ts->sched_timer); diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 687dff4..659cae3 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -271,6 +271,17 @@ int timekeeping_valid_for_hres(void) } /** + * timekeeping_max_deferment - Returns max time the clocksource can be deferred + * + * IMPORTANT: Caller must observe xtime_lock via read_seqbegin/read_seqretry + * to ensure that the clocksource does not change! + */ +s64 timekeeping_max_deferment(void) +{ + return clock->max_idle_ns; +} + +/** * read_persistent_clock - Return time in seconds from the persistent clock. 
* * Weak dummy function for arches that do not yet support it. -- 1.6.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
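Stripped of kernel context, the tick-sched change in the patch above reduces to a simple clamp: compute the requested sleep length from the timer wheel as before, then cap it at the clocksource limit. A sketch with illustrative names (clamp_sleep_ns() is not a kernel function):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative reduction of the tick_nohz_stop_sched_tick() change:
 * the requested sleep length (in ns) is capped at the clocksource's
 * max deferment so the free-running counter cannot wrap while the
 * CPU is idle.
 */
static int64_t clamp_sleep_ns(int64_t tick_period_ns, long delta_jiffies,
			      int64_t max_time_delta)
{
	int64_t time_delta = tick_period_ns * delta_jiffies;

	if (time_delta > max_time_delta)
		time_delta = max_time_delta;
	return time_delta;
}
```

For example, with HZ=1000 (a 1 ms tick) and a clocksource limited to about 2.15 s, a 5000-jiffy idle request is cut back to the clocksource limit, while a 1000-jiffy request passes through unchanged.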
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-28 21:10 ` Jon Hunter @ 2009-05-28 21:43 ` John Stultz 2009-05-28 22:16 ` Thomas Gleixner 2009-05-30 1:00 ` Jon Hunter 2 siblings, 0 replies; 29+ messages in thread From: John Stultz @ 2009-05-28 21:43 UTC (permalink / raw) To: Jon Hunter; +Cc: Thomas Gleixner, linux-kernel@vger.kernel.org, Ingo Molnar On Thu, 2009-05-28 at 16:10 -0500, Jon Hunter wrote: > Thomas Gleixner wrote: > > Please make this a real function. There is no reason to stick this > > into a header file. The only user is clocksource.c anyway, so please > > put it there as a static function and let the compiler decide what > > to do with it. > > No problem. Please see below. Let me know if this is ok and there is > anything else. > > Cheers > Jon > > The dynamic tick allows the kernel to sleep for periods longer > than a single tick. This patch prevents that the kernel from > sleeping for a period longer than the maximum time that the > current clocksource can count. This ensures that the kernel will > not lose track of time. This patch adds a function called > "clocksource_max_deferment()" that calculates the maximum time the > kernel can sleep for a given clocksource and function called > "timekeeping_max_deferment()" that returns maximum time the kernel > can sleep for the current clocksource. > > Signed-off-by: Jon Hunter <jon-hunter@ti.com> Thanks for putting up with my apparent misdirections and going around and around on this. 
:) Acked-by: John Stultz <johnstul@us.ibm.com> > --- > include/linux/clocksource.h | 2 + > include/linux/time.h | 1 + > kernel/time/clocksource.c | 47 > +++++++++++++++++++++++++++++++++++++++++++ > kernel/time/tick-sched.c | 36 ++++++++++++++++++++++---------- > kernel/time/timekeeping.c | 11 ++++++++++ > 5 files changed, 86 insertions(+), 11 deletions(-) > > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h > index 5a40d14..465af22 100644 > --- a/include/linux/clocksource.h > +++ b/include/linux/clocksource.h > @@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc, > * @mult: cycle to nanosecond multiplier (adjusted by NTP) > * @mult_orig: cycle to nanosecond multiplier (unadjusted by NTP) > * @shift: cycle to nanosecond divisor (power of two) > + * @max_idle_ns: max idle time permitted by the clocksource (nsecs) > * @flags: flags describing special properties > * @vread: vsyscall based read > * @resume: resume function for the clocksource, if necessary > @@ -171,6 +172,7 @@ struct clocksource { > u32 mult; > u32 mult_orig; > u32 shift; > + s64 max_idle_ns; > unsigned long flags; > cycle_t (*vread)(void); > void (*resume)(void); > diff --git a/include/linux/time.h b/include/linux/time.h > index 242f624..090be07 100644 > --- a/include/linux/time.h > +++ b/include/linux/time.h > @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts); > > extern struct timespec timespec_trunc(struct timespec t, unsigned gran); > extern int timekeeping_valid_for_hres(void); > +extern s64 timekeeping_max_deferment(void); > extern void update_wall_time(void); > extern void update_xtime_cache(u64 nsec); > > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c > index ecfd7b5..18d2b9f 100644 > --- a/kernel/time/clocksource.c > +++ b/kernel/time/clocksource.c > @@ -321,6 +321,50 @@ void clocksource_touch_watchdog(void) > } > > /** > + * clocksource_max_deferment - Returns max time the clocksource can be > 
deferred > + * @cs: Pointer to clocksource > + * > + */ > +static s64 clocksource_max_deferment(struct clocksource *cs) > +{ > + s64 max_nsecs; > + u64 max_cycles; > + > + /* > + * Calculate the maximum number of cycles that we can pass to the > + * cyc2ns function without overflowing a 64-bit signed result. The > + * maximum number of cycles is equal to ULLONG_MAX/cs->mult which > + * is equivalent to the below. > + * max_cycles < (2^63)/cs->mult > + * max_cycles < 2^(log2((2^63)/cs->mult)) > + * max_cycles < 2^(log2(2^63) - log2(cs->mult)) > + * max_cycles < 2^(63 - log2(cs->mult)) > + * max_cycles < 1 << (63 - log2(cs->mult)) > + * Please note that we add 1 to the result of the log2 to account for > + * any rounding errors, ensure the above inequality is satisfied and > + * no overflow will occur. > + */ > + max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1)); > + > + /* > + * The actual maximum number of cycles we can defer the clocksource is > + * determined by the minimum of max_cycles and cs->mask. > + */ > + max_cycles = min(max_cycles, cs->mask); > + max_nsecs = cyc2ns(cs, max_cycles); > + > + /* > + * To ensure that the clocksource does not wrap whilst we are idle, > + * limit the time the clocksource can be deferred by 12.5%. Please > + * note a margin of 12.5% is used because this can be computed with > + * a shift, versus say 10% which would require division. 
> + */ > + max_nsecs = max_nsecs - (max_nsecs >> 5); > + > + return max_nsecs; > +} > + > +/** > * clocksource_get_next - Returns the selected clocksource > * > */ > @@ -405,6 +449,9 @@ int clocksource_register(struct clocksource *c) > /* save mult_orig on registration */ > c->mult_orig = c->mult; > > + /* calculate max idle time permitted for this clocksource */ > + c->max_idle_ns = clocksource_max_deferment(c); > + > spin_lock_irqsave(&clocksource_lock, flags); > ret = clocksource_enqueue(c); > if (!ret) > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > index d3f1ef4..f0155ae 100644 > --- a/kernel/time/tick-sched.c > +++ b/kernel/time/tick-sched.c > @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle) > ktime_t last_update, expires, now; > struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev; > int cpu; > + s64 time_delta, max_time_delta; > > local_irq_save(flags); > > @@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle) > seq = read_seqbegin(&xtime_lock); > last_update = last_jiffies_update; > last_jiffies = jiffies; > + max_time_delta = timekeeping_max_deferment(); > } while (read_seqretry(&xtime_lock, seq)); > > /* Get the next timer wheel timer */ > @@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle) > if ((long)delta_jiffies >= 1) { > > /* > - * calculate the expiry time for the next timer wheel > - * timer > - */ > - expires = ktime_add_ns(last_update, tick_period.tv64 * > - delta_jiffies); > + * Calculate the time delta for the next timer event. > + * If the time delta exceeds the maximum time delta > + * permitted by the current clocksource then adjust > + * the time delta accordingly to ensure the > + * clocksource does not wrap. 
> + */ > + time_delta = tick_period.tv64 * delta_jiffies; > + > + if (time_delta > max_time_delta) > + time_delta = max_time_delta; > + > + /* > + * calculate the expiry time for the next timer wheel > + * timer > + */ > + expires = ktime_add_ns(last_update, time_delta); > > /* > * If this cpu is the one which updates jiffies, then > @@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle) > if (cpu == tick_do_timer_cpu) > tick_do_timer_cpu = TICK_DO_TIMER_NONE; > > - if (delta_jiffies > 1) > + if (time_delta > tick_period.tv64) > cpumask_set_cpu(cpu, nohz_cpu_mask); > > /* Skip reprogram of event if its not changed */ > @@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle) > ts->idle_sleeps++; > > /* > - * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that > - * there is no timer pending or at least extremly far > - * into the future (12 days for HZ=1000). In this case > - * we simply stop the tick timer: > + * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA) > + * signals that there is no timer pending or at least > + * extremely far into the future (12 days for HZ=1000). > + * In this case we simply stop the tick timer: > */ > - if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) { > + if (unlikely(time_delta >= > + (tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) { > ts->idle_expires.tv64 = KTIME_MAX; > if (ts->nohz_mode == NOHZ_MODE_HIGHRES) > hrtimer_cancel(&ts->sched_timer); > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c > index 687dff4..659cae3 100644 > --- a/kernel/time/timekeeping.c > +++ b/kernel/time/timekeeping.c > @@ -271,6 +271,17 @@ int timekeeping_valid_for_hres(void) > } > > /** > + * timekeeping_max_deferment - Returns max time the clocksource can be > deferred > + * > + * IMPORTANT: Caller must observe xtime_lock via > read_seqbegin/read_seqretry > + * to ensure that the clocksource does not change! 
> + */ > +s64 timekeeping_max_deferment(void) > +{ > + return clock->max_idle_ns; > +} > + > +/** > * read_persistent_clock - Return time in seconds from the persistent > clock. > * > * Weak dummy function for arches that do not yet support it. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-28 21:10 ` Jon Hunter 2009-05-28 21:43 ` John Stultz @ 2009-05-28 22:16 ` Thomas Gleixner 2009-05-29 19:43 ` Jon Hunter 2009-05-30 1:00 ` Jon Hunter 2 siblings, 1 reply; 29+ messages in thread From: Thomas Gleixner @ 2009-05-28 22:16 UTC (permalink / raw) To: Jon Hunter; +Cc: john stultz, linux-kernel@vger.kernel.org, Ingo Molnar On Thu, 28 May 2009, Jon Hunter wrote: > Thomas Gleixner wrote: > > Please make this a real function. There is no reason to stick this > > into a header file. The only user is clocksource.c anyway, so please > > put it there as a static function and let the compiler decide what > > to do with it. > > No problem. Please see below. Let me know if this is ok and there is anything > else. Looks good now. > /** > + * timekeeping_max_deferment - Returns max time the clocksource can be > deferred > + * > + * IMPORTANT: Caller must observe xtime_lock via read_seqbegin/read_seqretry > + * to ensure that the clocksource does not change! > + */ Just nitpicking here. For the intended use case this is irrelevant. On UP this is called from an irq disabled section, so nothing is going to change the clock source. On SMP it does not matter if CPU A goes to sleep with the old clock source and CPU B changes the clock source while A is idle. When B goes idle it will take the change into account. But that leads me to an interesting observation: On SMP we really should only care for the CPU which has the do_timer duty assigned. All other CPUs can sleep as long as they want. When that CPU goes idle and drops the do_timer duty it needs to look at max_deferment, but the others can sleep as long as they want. So the rule would be: if (cpu == tick_do_timer_cpu || tick_do_timer_cpu == TICK_DO_TIMER_NONE) check_max_deferment(); else sleep_as_long_as_you_want; Could you add that perhaps? 
> +s64 timekeeping_max_deferment(void) > +{ > + return clock->max_idle_ns; > +} > + Thanks for your patience, tglx ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-28 22:16 ` Thomas Gleixner @ 2009-05-29 19:43 ` Jon Hunter 0 siblings, 0 replies; 29+ messages in thread From: Jon Hunter @ 2009-05-29 19:43 UTC (permalink / raw) To: Thomas Gleixner; +Cc: john stultz, linux-kernel@vger.kernel.org, Ingo Molnar Thomas Gleixner wrote: > On Thu, 28 May 2009, Jon Hunter wrote: >> /** >> + * timekeeping_max_deferment - Returns max time the clocksource can be >> deferred >> + * >> + * IMPORTANT: Caller must observe xtime_lock via read_seqbegin/read_seqretry >> + * to ensure that the clocksource does not change! >> + */ > > Just nitpicking here. For the intended use case this is irrelevant. > > On UP this is called from an irq disabled section, so nothing is > going to change the clock source. > > On SMP it does not matter if CPU A goes to sleep with the old clock > source and CPU B changes the clock source while A is idle. When B > goes idle it will take the change into account. Ok, understood. Let me know if you would like me to remove the comment above. I wanted to make sure that if someone were to use this function elsewhere (can't think of why right now) they would not overlook this. > But that leads me to an interesting observation: > > On SMP we really should only care for the CPU which has the do_timer > duty assigned. All other CPUs can sleep as long as they want. When > that CPU goes idle and drops the do_timer duty it needs to look at > max_deferement, but the others can sleep as long as they want. > > So the rule would be: > > if (cpu == tick_do_timer_cpu || tick_do_timer_cpu == TICK_DO_TIMER_NONE) > check_max_deferment(); > else > sleep_as_long_as_you_want; > > Could you add that perhaps ? Absolutely. Please see below and let me know if this is ok. > Thanks for your patience, No problem. Thanks for the feedback. Cheers Jon The dynamic tick allows the kernel to sleep for periods longer than a single tick. 
This patch prevents that the kernel from sleeping for a period longer than the maximum time that the current clocksource can count. This ensures that the kernel will not lose track of time. This patch adds a function called "clocksource_max_deferment()" that calculates the maximum time the kernel can sleep for a given clocksource and function called "timekeeping_max_deferment()" that returns maximum time the kernel can sleep for the current clocksource. Signed-off-by: Jon Hunter <jon-hunter@ti.com> --- include/linux/clocksource.h | 2 + include/linux/time.h | 1 + kernel/time/clocksource.c | 47 +++++++++++++++++++++++++++++++++++++++++++ kernel/time/tick-sched.c | 47 ++++++++++++++++++++++++++++++++---------- kernel/time/timekeeping.c | 11 ++++++++++ 5 files changed, 97 insertions(+), 11 deletions(-) diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 5a40d14..465af22 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc, * @mult: cycle to nanosecond multiplier (adjusted by NTP) * @mult_orig: cycle to nanosecond multiplier (unadjusted by NTP) * @shift: cycle to nanosecond divisor (power of two) + * @max_idle_ns: max idle time permitted by the clocksource (nsecs) * @flags: flags describing special properties * @vread: vsyscall based read * @resume: resume function for the clocksource, if necessary @@ -171,6 +172,7 @@ struct clocksource { u32 mult; u32 mult_orig; u32 shift; + s64 max_idle_ns; unsigned long flags; cycle_t (*vread)(void); void (*resume)(void); diff --git a/include/linux/time.h b/include/linux/time.h index 242f624..090be07 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts); extern struct timespec timespec_trunc(struct timespec t, unsigned gran); extern int timekeeping_valid_for_hres(void); +extern s64 timekeeping_max_deferment(void); extern void 
update_wall_time(void); extern void update_xtime_cache(u64 nsec); diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index ecfd7b5..18d2b9f 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -321,6 +321,50 @@ void clocksource_touch_watchdog(void) } /** + * clocksource_max_deferment - Returns max time the clocksource can be deferred + * @cs: Pointer to clocksource + * + */ +static s64 clocksource_max_deferment(struct clocksource *cs) +{ + s64 max_nsecs; + u64 max_cycles; + + /* + * Calculate the maximum number of cycles that we can pass to the + * cyc2ns function without overflowing a 64-bit signed result. The + * maximum number of cycles is equal to ULLONG_MAX/cs->mult which + * is equivalent to the below. + * max_cycles < (2^63)/cs->mult + * max_cycles < 2^(log2((2^63)/cs->mult)) + * max_cycles < 2^(log2(2^63) - log2(cs->mult)) + * max_cycles < 2^(63 - log2(cs->mult)) + * max_cycles < 1 << (63 - log2(cs->mult)) + * Please note that we add 1 to the result of the log2 to account for + * any rounding errors, ensure the above inequality is satisfied and + * no overflow will occur. + */ + max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1)); + + /* + * The actual maximum number of cycles we can defer the clocksource is + * determined by the minimum of max_cycles and cs->mask. + */ + max_cycles = min(max_cycles, cs->mask); + max_nsecs = cyc2ns(cs, max_cycles); + + /* + * To ensure that the clocksource does not wrap whilst we are idle, + * limit the time the clocksource can be deferred by 12.5%. Please + * note a margin of 12.5% is used because this can be computed with + * a shift, versus say 10% which would require division. 
+ */ + max_nsecs = max_nsecs - (max_nsecs >> 5); + + return max_nsecs; +} + +/** * clocksource_get_next - Returns the selected clocksource * */ @@ -405,6 +449,9 @@ int clocksource_register(struct clocksource *c) /* save mult_orig on registration */ c->mult_orig = c->mult; + /* calculate max idle time permitted for this clocksource */ + c->max_idle_ns = clocksource_max_deferment(c); + spin_lock_irqsave(&clocksource_lock, flags); ret = clocksource_enqueue(c); if (!ret) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index d3f1ef4..318cf8a 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle) ktime_t last_update, expires, now; struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev; int cpu; + s64 time_delta, max_time_delta; local_irq_save(flags); @@ -264,6 +265,18 @@ void tick_nohz_stop_sched_tick(int inidle) seq = read_seqbegin(&xtime_lock); last_update = last_jiffies_update; last_jiffies = jiffies; + + /* + * On SMP we really should only care for the CPU which + * has the do_timer duty assigned. All other CPUs can + * sleep as long as they want. + */ + if (cpu == tick_do_timer_cpu || + tick_do_timer_cpu == TICK_DO_TIMER_NONE) + max_time_delta = timekeeping_max_deferment(); + else + max_time_delta = KTIME_MAX; + } while (read_seqretry(&xtime_lock, seq)); /* Get the next timer wheel timer */ @@ -283,11 +296,22 @@ void tick_nohz_stop_sched_tick(int inidle) if ((long)delta_jiffies >= 1) { /* - * calculate the expiry time for the next timer wheel - * timer - */ - expires = ktime_add_ns(last_update, tick_period.tv64 * - delta_jiffies); + * Calculate the time delta for the next timer event. + * If the time delta exceeds the maximum time delta + * permitted by the current clocksource then adjust + * the time delta accordingly to ensure the + * clocksource does not wrap. 
+ */ + time_delta = tick_period.tv64 * delta_jiffies; + + if (time_delta > max_time_delta) + time_delta = max_time_delta; + + /* + * calculate the expiry time for the next timer wheel + * timer + */ + expires = ktime_add_ns(last_update, time_delta); /* * If this cpu is the one which updates jiffies, then @@ -300,7 +324,7 @@ void tick_nohz_stop_sched_tick(int inidle) if (cpu == tick_do_timer_cpu) tick_do_timer_cpu = TICK_DO_TIMER_NONE; - if (delta_jiffies > 1) + if (time_delta > tick_period.tv64) cpumask_set_cpu(cpu, nohz_cpu_mask); /* Skip reprogram of event if its not changed */ @@ -332,12 +356,13 @@ void tick_nohz_stop_sched_tick(int inidle) ts->idle_sleeps++; /* - * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that - * there is no timer pending or at least extremly far - * into the future (12 days for HZ=1000). In this case - * we simply stop the tick timer: + * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA) + * signals that there is no timer pending or at least + * extremely far into the future (12 days for HZ=1000). + * In this case we simply stop the tick timer: */ - if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) { + if (unlikely(time_delta >= + (tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) { ts->idle_expires.tv64 = KTIME_MAX; if (ts->nohz_mode == NOHZ_MODE_HIGHRES) hrtimer_cancel(&ts->sched_timer); diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 687dff4..659cae3 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -271,6 +271,17 @@ int timekeeping_valid_for_hres(void) } /** + * timekeeping_max_deferment - Returns max time the clocksource can be deferred + * + * IMPORTANT: Caller must observe xtime_lock via read_seqbegin/read_seqretry + * to ensure that the clocksource does not change! + */ +s64 timekeeping_max_deferment(void) +{ + return clock->max_idle_ns; +} + +/** * read_persistent_clock - Return time in seconds from the persistent clock. 
* * Weak dummy function for arches that do not yet support it. -- 1.6.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-28 21:10 ` Jon Hunter 2009-05-28 21:43 ` John Stultz 2009-05-28 22:16 ` Thomas Gleixner @ 2009-05-30 1:00 ` Jon Hunter 2009-06-04 19:29 ` Jon Hunter 2 siblings, 1 reply; 29+ messages in thread From: Jon Hunter @ 2009-05-30 1:00 UTC (permalink / raw) To: Thomas Gleixner; +Cc: john stultz, linux-kernel@vger.kernel.org, Ingo Molnar Jon Hunter wrote: > + * Calculate the time delta for the next timer event. > + * If the time delta exceeds the maximum time delta > + * permitted by the current clocksource then adjust > + * the time delta accordingly to ensure the > + * clocksource does not wrap. > + */ > + time_delta = tick_period.tv64 * delta_jiffies; Thinking about this more, although it is very unlikely, for 64-bit machines there is a chance that the above multiply could overflow if delta_jiffies is very large. tick_period.tv64 should always be less than NSEC_PER_SEC and so you would need delta_jiffies to be greater than 2^32 to cause overflow. On a 32-bit machine an unsigned long will not be greater than 2^32 as it is only 32 bits wide, but this would be possible on 64-bit machines. So to be safe we should make sure that delta_jiffies is not greater than NEXT_TIMER_MAX_DELTA (2^30 - 1) before doing the multiply. If you think that this is a valid concern, then I can re-work and re-post. Sorry for not catching this before. Jon ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-30 1:00 ` Jon Hunter @ 2009-06-04 19:29 ` Jon Hunter 2009-06-25 19:10 ` Jon Hunter 0 siblings, 1 reply; 29+ messages in thread From: Jon Hunter @ 2009-06-04 19:29 UTC (permalink / raw) To: Thomas Gleixner; +Cc: john stultz, linux-kernel@vger.kernel.org, Ingo Molnar Jon Hunter wrote: > Jon Hunter wrote: >> + * Calculate the time delta for the next timer event. >> + * If the time delta exceeds the maximum time delta >> + * permitted by the current clocksource then adjust >> + * the time delta accordingly to ensure the >> + * clocksource does not wrap. >> + */ >> + time_delta = tick_period.tv64 * delta_jiffies; > > Thinking about this more, although it is very unlikely, for 64-bit > machines there is a chance that the above multiply could overflow if > delta_jiffies is very large. > > tick_period.tv64 should always be less than NSEC_PER_SEC and so you > would need delta_jiffies to be greater than 2^32 to cause overflow. On a > 32-bit machine an unsigned long will not be greater than 2^32 as it is > only 32-bits but this would be possible on a 64-bit machines. > > So to be safe we should make sure that delta_jiffies is not greater than > NEXT_TIMER_MAX_DELTA (2^30 - 1) before doing the multiply. If you > think that this is a valid concern, then I can re-work and re-post. > Sorry for not catching this before. To ensure that there are no overflows in the above calculation, I re-worked this patch a little. The below should be equivalent to the current code, just re-organised a little. Let me know if this would be acceptable or not. Cheers Jon The dynamic tick allows the kernel to sleep for periods longer than a single tick. This patch prevents the kernel from sleeping for a period longer than the maximum time that the current clocksource can count. This ensures that the kernel will not lose track of time. 
This patch adds a function called "clocksource_max_deferment()" that calculates the maximum time the kernel can sleep for a given clocksource and function called "timekeeping_max_deferment()" that returns maximum time the kernel can sleep for the current clocksource. Signed-off-by: Jon Hunter <jon-hunter@ti.com> --- include/linux/clocksource.h | 2 + include/linux/time.h | 1 + kernel/time/clocksource.c | 47 +++++++++++++++++++++++++++++++++++ kernel/time/tick-sched.c | 57 ++++++++++++++++++++++++++++++++---------- kernel/time/timekeeping.c | 11 ++++++++ 5 files changed, 104 insertions(+), 14 deletions(-) diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 5a40d14..465af22 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc, * @mult: cycle to nanosecond multiplier (adjusted by NTP) * @mult_orig: cycle to nanosecond multiplier (unadjusted by NTP) * @shift: cycle to nanosecond divisor (power of two) + * @max_idle_ns: max idle time permitted by the clocksource (nsecs) * @flags: flags describing special properties * @vread: vsyscall based read * @resume: resume function for the clocksource, if necessary @@ -171,6 +172,7 @@ struct clocksource { u32 mult; u32 mult_orig; u32 shift; + s64 max_idle_ns; unsigned long flags; cycle_t (*vread)(void); void (*resume)(void); diff --git a/include/linux/time.h b/include/linux/time.h index 242f624..090be07 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts); extern struct timespec timespec_trunc(struct timespec t, unsigned gran); extern int timekeeping_valid_for_hres(void); +extern s64 timekeeping_max_deferment(void); extern void update_wall_time(void); extern void update_xtime_cache(u64 nsec); diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index ecfd7b5..18d2b9f 100644 --- a/kernel/time/clocksource.c +++ 
b/kernel/time/clocksource.c @@ -321,6 +321,50 @@ void clocksource_touch_watchdog(void) } /** + * clocksource_max_deferment - Returns max time the clocksource can be deferred + * @cs: Pointer to clocksource + * + */ +static s64 clocksource_max_deferment(struct clocksource *cs) +{ + s64 max_nsecs; + u64 max_cycles; + + /* + * Calculate the maximum number of cycles that we can pass to the + * cyc2ns function without overflowing a 64-bit signed result. The + * maximum number of cycles is equal to ULLONG_MAX/cs->mult which + * is equivalent to the below. + * max_cycles < (2^63)/cs->mult + * max_cycles < 2^(log2((2^63)/cs->mult)) + * max_cycles < 2^(log2(2^63) - log2(cs->mult)) + * max_cycles < 2^(63 - log2(cs->mult)) + * max_cycles < 1 << (63 - log2(cs->mult)) + * Please note that we add 1 to the result of the log2 to account for + * any rounding errors, ensure the above inequality is satisfied and + * no overflow will occur. + */ + max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1)); + + /* + * The actual maximum number of cycles we can defer the clocksource is + * determined by the minimum of max_cycles and cs->mask. + */ + max_cycles = min(max_cycles, cs->mask); + max_nsecs = cyc2ns(cs, max_cycles); + + /* + * To ensure that the clocksource does not wrap whilst we are idle, + * limit the time the clocksource can be deferred by 12.5%. Please + * note a margin of 12.5% is used because this can be computed with + * a shift, versus say 10% which would require division. 
+ */ + max_nsecs = max_nsecs - (max_nsecs >> 5); + + return max_nsecs; +} + +/** * clocksource_get_next - Returns the selected clocksource * */ @@ -405,6 +449,9 @@ int clocksource_register(struct clocksource *c) /* save mult_orig on registration */ c->mult_orig = c->mult; + /* calculate max idle time permitted for this clocksource */ + c->max_idle_ns = clocksource_max_deferment(c); + spin_lock_irqsave(&clocksource_lock, flags); ret = clocksource_enqueue(c); if (!ret) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index d3f1ef4..9988e5e 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle) ktime_t last_update, expires, now; struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev; int cpu; + s64 time_delta, max_time_delta; local_irq_save(flags); @@ -264,6 +265,18 @@ void tick_nohz_stop_sched_tick(int inidle) seq = read_seqbegin(&xtime_lock); last_update = last_jiffies_update; last_jiffies = jiffies; + + /* + * On SMP we really should only care for the CPU which + * has the do_timer duty assigned. All other CPUs can + * sleep as long as they want. + */ + if (cpu == tick_do_timer_cpu || + tick_do_timer_cpu == TICK_DO_TIMER_NONE) + max_time_delta = timekeeping_max_deferment(); + else + max_time_delta = KTIME_MAX; + } while (read_seqretry(&xtime_lock, seq)); /* Get the next timer wheel timer */ @@ -283,11 +296,30 @@ void tick_nohz_stop_sched_tick(int inidle) if ((long)delta_jiffies >= 1) { /* - * calculate the expiry time for the next timer wheel - * timer - */ - expires = ktime_add_ns(last_update, tick_period.tv64 * - delta_jiffies); + * calculate the expiry time for the next timer wheel + * timer. delta_jiffies >= NEXT_TIMER_MAX_DELTA signals + * that there is no timer pending or at least extremely + * far into the future (12 days for HZ=1000). In this + * case we set the expiry to the end of time. 
+ */ + if (likely(delta_jiffies < NEXT_TIMER_MAX_DELTA)) { + + /* + * Calculate the time delta for the next timer event. + * If the time delta exceeds the maximum time delta + * permitted by the current clocksource then adjust + * the time delta accordingly to ensure the + * clocksource does not wrap. + */ + time_delta = tick_period.tv64 * delta_jiffies; + + if (time_delta > max_time_delta) + time_delta = max_time_delta; + + expires = ktime_add_ns(last_update, time_delta); + } else { + expires.tv64 = KTIME_MAX; + } /* * If this cpu is the one which updates jiffies, then @@ -331,22 +363,19 @@ void tick_nohz_stop_sched_tick(int inidle) ts->idle_sleeps++; + /* Mark expires */ + ts->idle_expires = expires; + /* - * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that - * there is no timer pending or at least extremly far - * into the future (12 days for HZ=1000). In this case - * we simply stop the tick timer: + * If the expiration time == KTIME_MAX, then + * in this case we simply stop the tick timer. */ - if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) { - ts->idle_expires.tv64 = KTIME_MAX; + if (unlikely(expires.tv64 == KTIME_MAX)) { if (ts->nohz_mode == NOHZ_MODE_HIGHRES) hrtimer_cancel(&ts->sched_timer); goto out; } - /* Mark expiries */ - ts->idle_expires = expires; - if (ts->nohz_mode == NOHZ_MODE_HIGHRES) { hrtimer_start(&ts->sched_timer, expires, HRTIMER_MODE_ABS); diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 687dff4..659cae3 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -271,6 +271,17 @@ int timekeeping_valid_for_hres(void) } /** + * timekeeping_max_deferment - Returns max time the clocksource can be deferred + * + * IMPORTANT: Caller must observe xtime_lock via read_seqbegin/read_seqretry + * to ensure that the clocksource does not change! + */ +s64 timekeeping_max_deferment(void) +{ + return clock->max_idle_ns; +} + +/** * read_persistent_clock - Return time in seconds from the persistent clock. 
* * Weak dummy function for arches that do not yet support it. -- 1.6.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-06-04 19:29 ` Jon Hunter @ 2009-06-25 19:10 ` Jon Hunter 0 siblings, 0 replies; 29+ messages in thread From: Jon Hunter @ 2009-06-25 19:10 UTC (permalink / raw) To: Thomas Gleixner; +Cc: john stultz, linux-kernel@vger.kernel.org, Ingo Molnar Jon Hunter wrote: > Jon Hunter wrote: >> Jon Hunter wrote: >>> + * Calculate the time delta for the next timer event. >>> + * If the time delta exceeds the maximum time delta >>> + * permitted by the current clocksource then adjust >>> + * the time delta accordingly to ensure the >>> + * clocksource does not wrap. >>> + */ >>> + time_delta = tick_period.tv64 * delta_jiffies; >> Thinking about this more, although it is very unlikely, for 64-bit >> machines there is a chance that the above multiply could overflow if >> delta_jiffies is very large. >> >> tick_period.tv64 should always be less than NSEC_PER_SEC and so you >> would need delta_jiffies to be greater than 2^32 to cause overflow. On a >> 32-bit machine an unsigned long will not be greater than 2^32 as it is >> only 32-bits but this would be possible on a 64-bit machines. >> >> So to be safe we should make sure that delta_jiffies is not greater than >> NEXT_TIMER_MAX_DELTA (2^30 - 1) before doing the multiply. If you >> think that this is a valid concern, then I can re-work and re-post. >> Sorry for not catching this before. > > With regard to the above, to ensure that there are no overflows with the > above calculation, I re-worked this patch a little. The below should be > equivalent to the current code, just re-organised a little. Let me know > if this would be acceptable or not. Hi Thomas, John, Did you guys have chance to review this? Let me know if you have any further comments/feedback. Cheers Jon ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-27 14:49 Jon Hunter 2009-05-27 16:01 ` Thomas Gleixner @ 2009-05-27 18:15 ` john stultz 2009-05-27 20:54 ` Alok Kataria 2 siblings, 0 replies; 29+ messages in thread From: john stultz @ 2009-05-27 18:15 UTC (permalink / raw) To: Jon Hunter; +Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar On Wed, 2009-05-27 at 09:49 -0500, Jon Hunter wrote: > The dynamic tick allows the kernel to sleep for periods longer than a > single tick. This patch prevents that the kernel from sleeping for a > period longer than the maximum time that the current clocksource can > count. This ensures that the kernel will not lose track of time. This > patch adds a new function called "timekeeping_max_deferment()" that > calculates the maximum time the kernel can sleep for a given clocksource. > > Signed-off-by: Jon Hunter <jon-hunter@ti.com> Acked-by: John Stultz <johnstul@us.ibm.com> > --- > include/linux/time.h | 1 + > kernel/time/tick-sched.c | 36 +++++++++++++++++++++++---------- > kernel/time/timekeeping.c | 47 > +++++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 73 insertions(+), 11 deletions(-) > > diff --git a/include/linux/time.h b/include/linux/time.h > index 242f624..090be07 100644 > --- a/include/linux/time.h > +++ b/include/linux/time.h > @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts); > > extern struct timespec timespec_trunc(struct timespec t, unsigned gran); > extern int timekeeping_valid_for_hres(void); > +extern s64 timekeeping_max_deferment(void); > extern void update_wall_time(void); > extern void update_xtime_cache(u64 nsec); > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > index d3f1ef4..f0155ae 100644 > --- a/kernel/time/tick-sched.c > +++ b/kernel/time/tick-sched.c > @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle) > ktime_t last_update, expires, now; > struct clock_event_device *dev = 
__get_cpu_var(tick_cpu_device).evtdev; > int cpu; > + s64 time_delta, max_time_delta; > > local_irq_save(flags); > > @@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle) > seq = read_seqbegin(&xtime_lock); > last_update = last_jiffies_update; > last_jiffies = jiffies; > + max_time_delta = timekeeping_max_deferment(); > } while (read_seqretry(&xtime_lock, seq)); > > /* Get the next timer wheel timer */ > @@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle) > if ((long)delta_jiffies >= 1) { > > /* > - * calculate the expiry time for the next timer wheel > - * timer > - */ > - expires = ktime_add_ns(last_update, tick_period.tv64 * > - delta_jiffies); > + * Calculate the time delta for the next timer event. > + * If the time delta exceeds the maximum time delta > + * permitted by the current clocksource then adjust > + * the time delta accordingly to ensure the > + * clocksource does not wrap. > + */ > + time_delta = tick_period.tv64 * delta_jiffies; > + > + if (time_delta > max_time_delta) > + time_delta = max_time_delta; > + > + /* > + * calculate the expiry time for the next timer wheel > + * timer > + */ > + expires = ktime_add_ns(last_update, time_delta); > > /* > * If this cpu is the one which updates jiffies, then > @@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle) > if (cpu == tick_do_timer_cpu) > tick_do_timer_cpu = TICK_DO_TIMER_NONE; > > - if (delta_jiffies > 1) > + if (time_delta > tick_period.tv64) > cpumask_set_cpu(cpu, nohz_cpu_mask); > > /* Skip reprogram of event if its not changed */ > @@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle) > ts->idle_sleeps++; > > /* > - * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that > - * there is no timer pending or at least extremly far > - * into the future (12 days for HZ=1000). 
In this case > - * we simply stop the tick timer: > + * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA) > + * signals that there is no timer pending or at least > + * extremely far into the future (12 days for HZ=1000). > + * In this case we simply stop the tick timer: > */ > - if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) { > + if (unlikely(time_delta >= > + (tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) { > ts->idle_expires.tv64 = KTIME_MAX; > if (ts->nohz_mode == NOHZ_MODE_HIGHRES) > hrtimer_cancel(&ts->sched_timer); > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c > index 687dff4..608fc6f 100644 > --- a/kernel/time/timekeeping.c > +++ b/kernel/time/timekeeping.c > @@ -271,6 +271,53 @@ int timekeeping_valid_for_hres(void) > } > > /** > + * timekeeping_max_deferment - Returns max time the clocksource can be > deferred > + * > + * IMPORTANT: Must be called with xtime_lock held! > + */ > +s64 timekeeping_max_deferment(void) > +{ > + s64 max_nsecs; > + u64 max_cycles; > + > + /* > + * Calculate the maximum number of cycles that we can pass to the > + * cyc2ns function without overflowing a 64-bit signed result. The > + * maximum number of cycles is equal to ULLONG_MAX/clock->mult which > + * is equivalent to the below. > + * max_cycles < (2^63)/clock->mult > + * max_cycles < 2^(log2((2^63)/clock->mult)) > + * max_cycles < 2^(log2(2^63) - log2(clock->mult)) > + * max_cycles < 2^(63 - log2(clock->mult)) > + * max_cycles < 1 << (63 - log2(clock->mult)) > + * Please note that we add 1 to the result of the log2 to account for > + * any rounding errors, ensure the above inequality is satisfied and > + * no overflow will occur. > + */ > + max_cycles = 1ULL << (63 - (ilog2(clock->mult) + 1)); > + > + /* > + * The actual maximum number of cycles we can defer the clocksource is > + * determined by the minimum of max_cycles and clock->mask. 
> + */ > + max_cycles = min(max_cycles, clock->mask); > + max_nsecs = cyc2ns(clock, max_cycles); > + > + /* > + * To ensure that the clocksource does not wrap whilst we are idle, > + * limit the time the clocksource can be deferred by 6.25%. Please > + * note a margin of 6.25% is used because this can be computed with > + * a shift, versus say 5% which would require division. > + */ > + max_nsecs = max_nsecs - (max_nsecs >> 4); > + > + if (max_nsecs < 0) > + max_nsecs = 0; > + > + return max_nsecs; > +} > + > +/** > * read_persistent_clock - Return time in seconds from the persistent > clock. > * > * Weak dummy function for arches that do not yet support it. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle 2009-05-27 14:49 Jon Hunter 2009-05-27 16:01 ` Thomas Gleixner 2009-05-27 18:15 ` john stultz @ 2009-05-27 20:54 ` Alok Kataria 2009-05-27 21:12 ` Thomas Gleixner 2 siblings, 1 reply; 29+ messages in thread From: Alok Kataria @ 2009-05-27 20:54 UTC (permalink / raw) To: Jon Hunter Cc: linux-kernel@vger.kernel.org, john stultz, Thomas Gleixner, Ingo Molnar, akataria On Wed, May 27, 2009 at 7:49 AM, Jon Hunter <jon-hunter@ti.com> wrote: > > The dynamic tick allows the kernel to sleep for periods longer than a single > tick. This patch prevents that the kernel from sleeping for a period longer > than the maximum time that the current clocksource can count. This ensures > that the kernel will not lose track of time. This patch adds a new function > called "timekeeping_max_deferment()" that calculates the maximum time the > kernel can sleep for a given clocksource. > From the patch description I understand that this will avoid wrapping around for only the *current* clocksource. What happens if, say, TSC is the clocksource and ACPI_PM is being used as the watchdog clocksource? In that case timekeeping_max_deferment will give TSC's max allowed sleep value (which is greater than ACPI_PM's), i.e. we could still sleep beyond ACPI_PM's wrap-around threshold, which may result in us marking TSC as unusable as a clocksource. That could still result in incorrect timekeeping, right? 
Thanks,
Alok

> Signed-off-by: Jon Hunter <jon-hunter@ti.com>
> ---
>  include/linux/time.h      |    1 +
>  kernel/time/tick-sched.c  |   36 +++++++++++++++++++++++----------
>  kernel/time/timekeeping.c |   47 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 73 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/time.h b/include/linux/time.h
> index 242f624..090be07 100644
> --- a/include/linux/time.h
> +++ b/include/linux/time.h
> @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts);
>
>  extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
>  extern int timekeeping_valid_for_hres(void);
> +extern s64 timekeeping_max_deferment(void);
>  extern void update_wall_time(void);
>  extern void update_xtime_cache(u64 nsec);
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index d3f1ef4..f0155ae 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>  	ktime_t last_update, expires, now;
>  	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
>  	int cpu;
> +	s64 time_delta, max_time_delta;
>
>  	local_irq_save(flags);
>
> @@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>  		seq = read_seqbegin(&xtime_lock);
>  		last_update = last_jiffies_update;
>  		last_jiffies = jiffies;
> +		max_time_delta = timekeeping_max_deferment();
>  	} while (read_seqretry(&xtime_lock, seq));
>
>  	/* Get the next timer wheel timer */
> @@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle)
>  	if ((long)delta_jiffies >= 1) {
>
>  		/*
> -		 * calculate the expiry time for the next timer wheel
> -		 * timer
> -		 */
> -		expires = ktime_add_ns(last_update, tick_period.tv64 *
> -				       delta_jiffies);
> +		 * Calculate the time delta for the next timer event.
> +		 * If the time delta exceeds the maximum time delta
> +		 * permitted by the current clocksource then adjust
> +		 * the time delta accordingly to ensure the
> +		 * clocksource does not wrap.
> +		 */
> +		time_delta = tick_period.tv64 * delta_jiffies;
> +
> +		if (time_delta > max_time_delta)
> +			time_delta = max_time_delta;
> +
> +		/*
> +		 * calculate the expiry time for the next timer wheel
> +		 * timer
> +		 */
> +		expires = ktime_add_ns(last_update, time_delta);
>
>  		/*
>  		 * If this cpu is the one which updates jiffies, then
> @@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>  		if (cpu == tick_do_timer_cpu)
>  			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
>
> -		if (delta_jiffies > 1)
> +		if (time_delta > tick_period.tv64)
>  			cpumask_set_cpu(cpu, nohz_cpu_mask);
>
>  		/* Skip reprogram of event if its not changed */
> @@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle)
>  		ts->idle_sleeps++;
>
>  		/*
> -		 * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
> -		 * there is no timer pending or at least extremly far
> -		 * into the future (12 days for HZ=1000). In this case
> -		 * we simply stop the tick timer:
> +		 * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA)
> +		 * signals that there is no timer pending or at least
> +		 * extremely far into the future (12 days for HZ=1000).
> +		 * In this case we simply stop the tick timer:
>  		 */
> -		if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
> +		if (unlikely(time_delta >=
> +			     (tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) {
>  			ts->idle_expires.tv64 = KTIME_MAX;
>  			if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
>  				hrtimer_cancel(&ts->sched_timer);
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 687dff4..608fc6f 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -271,6 +271,53 @@ int timekeeping_valid_for_hres(void)
>  }
>
>  /**
> + * timekeeping_max_deferment - Returns max time the clocksource can be deferred
> + *
> + * IMPORTANT: Must be called with xtime_lock held!
> + */
> +s64 timekeeping_max_deferment(void)
> +{
> +	s64 max_nsecs;
> +	u64 max_cycles;
> +
> +	/*
> +	 * Calculate the maximum number of cycles that we can pass to the
> +	 * cyc2ns function without overflowing a 64-bit signed result. The
> +	 * maximum number of cycles is equal to ULLONG_MAX/clock->mult which
> +	 * is equivalent to the below.
> +	 * max_cycles < (2^63)/clock->mult
> +	 * max_cycles < 2^(log2((2^63)/clock->mult))
> +	 * max_cycles < 2^(log2(2^63) - log2(clock->mult))
> +	 * max_cycles < 2^(63 - log2(clock->mult))
> +	 * max_cycles < 1 << (63 - log2(clock->mult))
> +	 * Please note that we add 1 to the result of the log2 to account for
> +	 * any rounding errors, ensure the above inequality is satisfied and
> +	 * no overflow will occur.
> +	 */
> +	max_cycles = 1ULL << (63 - (ilog2(clock->mult) + 1));
> +
> +	/*
> +	 * The actual maximum number of cycles we can defer the clocksource is
> +	 * determined by the minimum of max_cycles and clock->mask.
> +	 */
> +	max_cycles = min(max_cycles, clock->mask);
> +	max_nsecs = cyc2ns(clock, max_cycles);
> +
> +	/*
> +	 * To ensure that the clocksource does not wrap whilst we are idle,
> +	 * limit the time the clocksource can be deferred by 6.25%. Please
> +	 * note a margin of 6.25% is used because this can be computed with
> +	 * a shift, versus say 5% which would require division.
> +	 */
> +	max_nsecs = max_nsecs - (max_nsecs >> 4);
> +
> +	if (max_nsecs < 0)
> +		max_nsecs = 0;
> +
> +	return max_nsecs;
> +}
> +
> +/**
>  * read_persistent_clock - Return time in seconds from the persistent clock.
>  *
>  * Weak dummy function for arches that do not yet support it.
> --
> 1.6.1
* Re: [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle
  2009-05-27 20:54 ` Alok Kataria
@ 2009-05-27 21:12   ` Thomas Gleixner
  0 siblings, 0 replies; 29+ messages in thread
From: Thomas Gleixner @ 2009-05-27 21:12 UTC (permalink / raw)
To: Alok Kataria
Cc: Jon Hunter, linux-kernel@vger.kernel.org, john stultz,
    Ingo Molnar, akataria

On Wed, 27 May 2009, Alok Kataria wrote:
> On Wed, May 27, 2009 at 7:49 AM, Jon Hunter <jon-hunter@ti.com> wrote:
> >
> > The dynamic tick allows the kernel to sleep for periods longer than a single
> > tick. This patch prevents the kernel from sleeping for a period longer
> > than the maximum time that the current clocksource can count. This ensures
> > that the kernel will not lose track of time. This patch adds a new function
> > called "timekeeping_max_deferment()" that calculates the maximum time the
> > kernel can sleep for a given clocksource.
>
> From the patch description I understand that this will avoid wrapping
> around for only the *current* clocksource. What happens if, say, TSC is
> the clocksource and ACPI_PM is being used as the watchdog clocksource?
> In that case timekeeping_max_deferment() will give TSC's maximum allowed
> sleep value (which is greater than ACPI_PM's), i.e. we could still sleep
> beyond ACPI_PM's wrap-around threshold, which may result in us marking
> TSC as unusable as a clocksource. That could still result in incorrect
> timekeeping, right?

No, because the watchdog timer takes care of that. It wakes up in time.

Thanks,

	tglx
end of thread, other threads:[~2009-11-13 19:50 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-18 17:45 [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit machines Jon Hunter
2009-08-18 17:45 ` [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle Jon Hunter
2009-08-18 17:45   ` [PATCH 2/2] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds Jon Hunter
2009-08-18 19:26     ` Thomas Gleixner
2009-08-18 20:52       ` Jon Hunter
2009-11-13 19:50     ` [tip:timers/core] nohz: " tip-bot for Jon Hunter
2009-08-18 19:25   ` [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle Thomas Gleixner
2009-08-18 20:42     ` Jon Hunter
2009-11-13 19:49   ` [tip:timers/core] nohz: " tip-bot for Jon Hunter
2009-11-11 20:43 ` [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit machines john stultz
2009-11-11 20:57   ` Jon Hunter
2009-11-11 22:37     ` john stultz
-- strict thread matches above, loose matches on Subject: below --
2009-07-28  0:00 [PATCH 0/2] Dynamic Tick: Enabling longer sleep times on 32-bit Jon Hunter
2009-07-28  0:00 ` [PATCH 1/2] Dynamic Tick: Prevent clocksource wrapping during idle Jon Hunter
2009-05-27 14:49 Jon Hunter
2009-05-27 16:01 ` Thomas Gleixner
2009-05-27 20:20   ` john stultz
2009-05-27 20:32     ` Thomas Gleixner
2009-05-28 20:21       ` Jon Hunter
2009-05-28 20:36         ` Thomas Gleixner
2009-05-28 21:10           ` Jon Hunter
2009-05-28 21:43             ` John Stultz
2009-05-28 22:16             ` Thomas Gleixner
2009-05-29 19:43               ` Jon Hunter
2009-05-30  1:00                 ` Jon Hunter
2009-06-04 19:29                   ` Jon Hunter
2009-06-25 19:10                     ` Jon Hunter
2009-05-27 18:15 ` john stultz
2009-05-27 20:54 ` Alok Kataria
2009-05-27 21:12   ` Thomas Gleixner