Re: [PATCH] clocksource: Avoid selecting mult values that might overflow when adjusted

All of lore.kernel.org
 help / color / mirror / Atom feed

From: John Stultz <john.stultz@linaro.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Yong Zhang <yong.zhang0@gmail.com>,
	David Daney <ddaney.cavm@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH] clocksource: Avoid selecting mult values that might overflow when adjusted
Date: Mon, 07 Nov 2011 19:09:00 -0800	[thread overview]
Message-ID: <1320721740.10668.23.camel@work-vm> (raw)
In-Reply-To: <20111103211000.GA17895@elte.hu>

On Thu, 2011-11-03 at 22:10 +0100, Ingo Molnar wrote:
> * John Stultz <john.stultz@linaro.org> wrote:
> 
> > For some frequqencies, the clocks_calc_mult_shift() function will
> > unfortunately select mult values very close to 0xffffffff.  This
> > has the potential to overflow when NTP adjusts the clock, adding
> > to the mult value.
> > 
> > This patch adds a clocksource.maxadj value, which provides
> > an approximation of an 11% adjustment(NTP limits adjustments to
> > 500ppm and the tick adjustment is limited to 10%), which could
> > be made to the clocksource.mult value. This is then used to both
> > check that the current mult value won't overflow/underflow, as
> > well as warning us if the timekeeping_adjust() code pushes over
> > that 11% boundary.
> > 
> > CC: Yong Zhang <yong.zhang0@gmail.com>
> > CC: David Daney <ddaney.cavm@gmail.com>
> > CC: Thomas Gleixner <tglx@linutronix.de>
> > Reported-by: Chen Jie <chenj@lemote.com>
> > Reported-by: zhangfx <zhangfx@lemote.com>
> > Signed-off-by: John Stultz <john.stultz@linaro.org>
> > ---
> >  include/linux/clocksource.h |    3 +-
> >  kernel/time/clocksource.c   |   53 ++++++++++++++++++++++++++++++++++--------
> >  kernel/time/timekeeping.c   |    3 ++
> >  3 files changed, 48 insertions(+), 11 deletions(-)
> 
> This patch (included in tip:timers/urgent) causes the following boot 
> warning x86:
> 
> [    0.000000] Fast TSC calibration using PIT
> [    0.000000] ------------[ cut here ]------------
> [    0.000000] WARNING: at kernel/time/timekeeping.c:855 do_timer+0x47f/0x4c0()
> [    0.000000] Hardware name: System Product Name
> [    0.000000] Adjusting more then 11%
> [    0.000000] Modules linked in:
> [    0.000000] Pid: 0, comm: swapper Not tainted 3.1.0-tip+ #161792
> [    0.000000] Call Trace:
> [    0.000000]  <IRQ>  [<ffffffff81042d0a>] warn_slowpath_common+0x7a/0xb0
> [    0.000000]  [<ffffffff81042de1>] warn_slowpath_fmt+0x41/0x50
> [    0.000000]  [<ffffffff8106e78f>] do_timer+0x47f/0x4c0
> [    0.000000]  [<ffffffff81073953>] tick_periodic+0x63/0x80
> [    0.000000]  [<ffffffff810739f1>] tick_handle_periodic+0x21/0x70
> [    0.000000]  [<ffffffff810046d8>] timer_interrupt+0x18/0x20
> [    0.000000]  [<ffffffff8109ff9e>] handle_irq_event_percpu+0x5e/0x220
> [    0.000000]  [<ffffffff810a019b>] handle_irq_event+0x3b/0x60
> [    0.000000]  [<ffffffff810a295c>] handle_level_irq+0x6c/0xd0
> [    0.000000]  [<ffffffff81003f34>] handle_irq+0x44/0xa0
> 
> Full bootlog and config attached.
> 
> i've excluded it from tip:master for now.

Thanks again for the bug report. I was able to reproduce it using the
jiffies clocksource. Looking at the code after a weekend of decent
sleep, I see the max_adjustment calculation is simply wrong (it was
proportional to the shift, not the mult - but adjustments are made on
mult).

This version simplifies the calculation and improves warn-on messages so
we also catch any overflow potential on clocksources that don't use the
clocksource_register_hz/khz interfaces.

Ingo: could you give it a whirl on your test box and verify it doesn't
have any trouble? 

Yong: Can you also give this a test run to make sure you don't see any
problems?

thanks
-john

>From 82c5b70fc5074b6bb6e05514afb6e9c73c740422 Mon Sep 17 00:00:00 2001
From: John Stultz <john.stultz@linaro.org>
Date: Mon, 31 Oct 2011 17:06:35 -0400
Subject: [PATCH] clocksource: Avoid selecting mult values that might overflow when adjusted

For some frequqencies, the clocks_calc_mult_shift() function will
unfortunately select mult values very close to 0xffffffff.  This
has the potential to overflow when NTP adjusts the clock, adding
to the mult value.

This patch adds a clocksource.maxadj value, which provides
an approximation of an 11% adjustment(NTP limits adjustments to
500ppm and the tick adjustment is limited to 10%), which could
be made to the clocksource.mult value. This is then used to both
check that the current mult value won't overflow/underflow, as
well as warning us if the timekeeping_adjust() code pushes over
that 11% boundary.

v2: Fix max_adjustment calculation, and improve WARN_ONCE
messages.

CC: Yong Zhang <yong.zhang0@gmail.com>
CC: David Daney <ddaney.cavm@gmail.com>
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Chen Jie <chenj@lemote.com>
Reported-by: zhangfx <zhangfx@lemote.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/clocksource.h |    3 +-
 kernel/time/clocksource.c   |   58 +++++++++++++++++++++++++++++++++++-------
 kernel/time/timekeeping.c   |    6 ++++
 3 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 139c4db..c86c940 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -156,6 +156,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc,
  * @mult:		cycle to nanosecond multiplier
  * @shift:		cycle to nanosecond divisor (power of two)
  * @max_idle_ns:	max idle time permitted by the clocksource (nsecs)
+ * @maxadj		maximum adjustment value to mult (~11%)
  * @flags:		flags describing special properties
  * @archdata:		arch-specific data
  * @suspend:		suspend function for the clocksource, if necessary
@@ -172,7 +173,7 @@ struct clocksource {
 	u32 mult;
 	u32 shift;
 	u64 max_idle_ns;
-
+	u32 maxadj;
 #ifdef CONFIG_ARCH_CLOCKSOURCE_DATA
 	struct arch_clocksource_data archdata;
 #endif
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index cf52fda..cfc65e1 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -492,6 +492,22 @@ void clocksource_touch_watchdog(void)
 }
 
 /**
+ * clocksource_max_adjustment- Returns max adjustment amount
+ * @cs:         Pointer to clocksource
+ *
+ */
+static u32 clocksource_max_adjustment(struct clocksource *cs)
+{
+	u64 ret;
+	/*
+	 * We won't try to correct for more then 11% adjustments (110,000 ppm),
+	 */
+	ret = (u64)cs->mult * 11;
+	do_div(ret,100);
+	return (u32)ret;
+}
+
+/**
  * clocksource_max_deferment - Returns max time the clocksource can be deferred
  * @cs:         Pointer to clocksource
  *
@@ -503,25 +519,28 @@ static u64 clocksource_max_deferment(struct clocksource *cs)
 	/*
 	 * Calculate the maximum number of cycles that we can pass to the
 	 * cyc2ns function without overflowing a 64-bit signed result. The
-	 * maximum number of cycles is equal to ULLONG_MAX/cs->mult which
-	 * is equivalent to the below.
-	 * max_cycles < (2^63)/cs->mult
-	 * max_cycles < 2^(log2((2^63)/cs->mult))
-	 * max_cycles < 2^(log2(2^63) - log2(cs->mult))
-	 * max_cycles < 2^(63 - log2(cs->mult))
-	 * max_cycles < 1 << (63 - log2(cs->mult))
+	 * maximum number of cycles is equal to ULLONG_MAX/(cs->mult+cs->maxadj)
+	 * which is equivalent to the below.
+	 * max_cycles < (2^63)/(cs->mult + cs->maxadj)
+	 * max_cycles < 2^(log2((2^63)/(cs->mult + cs->maxadj)))
+	 * max_cycles < 2^(log2(2^63) - log2(cs->mult + cs->maxadj))
+	 * max_cycles < 2^(63 - log2(cs->mult + cs->maxadj))
+	 * max_cycles < 1 << (63 - log2(cs->mult + cs->maxadj))
 	 * Please note that we add 1 to the result of the log2 to account for
 	 * any rounding errors, ensure the above inequality is satisfied and
 	 * no overflow will occur.
 	 */
-	max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1));
+	max_cycles = 1ULL << (63 - (ilog2(cs->mult + cs->maxadj) + 1));
 
 	/*
 	 * The actual maximum number of cycles we can defer the clocksource is
 	 * determined by the minimum of max_cycles and cs->mask.
+	 * Note: Here we subtract the maxadj to make sure we don't sleep for
+	 * too long if there's a large negative adjustment.
 	 */
 	max_cycles = min_t(u64, max_cycles, (u64) cs->mask);
-	max_nsecs = clocksource_cyc2ns(max_cycles, cs->mult, cs->shift);
+	max_nsecs = clocksource_cyc2ns(max_cycles, cs->mult - cs->maxadj,
+					cs->shift);
 
 	/*
 	 * To ensure that the clocksource does not wrap whilst we are idle,
@@ -640,7 +659,6 @@ static void clocksource_enqueue(struct clocksource *cs)
 void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 {
 	u64 sec;
-
 	/*
 	 * Calc the maximum number of seconds which we can run before
 	 * wrapping around. For clocksources which have a mask > 32bit
@@ -661,6 +679,20 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 
 	clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
 			       NSEC_PER_SEC / scale, sec * scale);
+
+	/*
+	 * for clocksources that have large mults, to avoid overflow.
+	 * Since mult may be adjusted by ntp, add an safety extra margin
+	 *
+	 */
+	cs->maxadj = clocksource_max_adjustment(cs);
+	while ((cs->mult + cs->maxadj < cs->mult)
+		|| (cs->mult - cs->maxadj > cs->mult)) {
+		cs->mult >>= 1;
+		cs->shift--;
+		cs->maxadj = clocksource_max_adjustment(cs);
+	}
+
 	cs->max_idle_ns = clocksource_max_deferment(cs);
 }
 EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
@@ -701,6 +733,12 @@ EXPORT_SYMBOL_GPL(__clocksource_register_scale);
  */
 int clocksource_register(struct clocksource *cs)
 {
+	/* calculate max adjustment for given mult/shift */
+	cs->maxadj = clocksource_max_adjustment(cs);
+	WARN_ONCE(cs->mult + cs->maxadj < cs->mult,
+		"Clocksource %s might overflow on 11%% adjustment\n",
+		cs->name);
+
 	/* calculate max idle time permitted for this clocksource */
 	cs->max_idle_ns = clocksource_max_deferment(cs);
 
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 2b021b0e..2c04610 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -820,6 +820,12 @@ static void timekeeping_adjust(s64 offset)
 	} else
 		return;
 
+	WARN_ONCE(timekeeper.mult+adj >
+			timekeeper.clock->mult + timekeeper.clock->maxadj,
+			"Adjusting %s more then 11%% (%ld vs %ld)\n",
+			timekeeper.clock->name, (long)timekeeper.mult+adj,
+			(long)timekeeper.clock->mult +
+				timekeeper.clock->maxadj);
 	timekeeper.mult += adj;
 	timekeeper.xtime_interval += interval;
 	timekeeper.xtime_nsec -= offset;
-- 
1.7.3.2.146.gca209

next prev parent reply	other threads:[~2011-11-08  3:09 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-11-02 20:01 [PATCH] clocksource: Avoid selecting mult values that might overflow when adjusted John Stultz
2011-11-03  3:10 ` Yong Zhang
2011-11-03  9:36   ` Américo Wang
2011-11-04  2:16     ` Yong Zhang
2011-11-03 12:05 ` Thomas Gleixner
2011-11-03 13:10   ` John Stultz
2011-11-03 13:26     ` Thomas Gleixner
2011-11-03 14:01       ` John Stultz
2011-11-03 14:09         ` John Stultz
2011-11-03 14:49           ` Thomas Gleixner
2011-11-03 14:52             ` Thomas Gleixner
2011-11-03 15:14               ` John Stultz
2011-11-03 21:10 ` Ingo Molnar
2011-11-04 13:11   ` John Stultz
2011-11-04 15:20     ` Ingo Molnar
2011-11-08  3:09   ` John Stultz [this message]
2011-11-08  3:11     ` Yong Zhang
2011-11-08  5:02     ` Yong Zhang
2011-11-08 21:39       ` John Stultz
2011-11-09  1:46         ` Yong Zhang
2011-11-10 15:05           ` Ingo Molnar
  -- strict thread matches above, loose matches on Subject: below --
2011-11-09  2:08 John Stultz

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:139c4db dfblob:c86c940 dfblob:cf52fda dfblob:cfc65e1
dfblob:2b021b0 dfblob:2c04610 )
 OR (
bs:"clocksource: Avoid selecting mult values that might overflow when adjusted" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1320721740.10668.23.camel@work-vm \
    --to=john.stultz@linaro.org \
    --cc=ddaney.cavm@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=tglx@linutronix.de \
    --cc=yong.zhang0@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.