linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] Fix for leapsecond caused hrtimer/futex issue
@ 2012-07-05 19:12 John Stultz
  2012-07-05 19:12 ` [PATCH 1/3] hrtimer: Fix clock_was_set so it is safe to call from irq context John Stultz
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: John Stultz @ 2012-07-05 19:12 UTC (permalink / raw)
  To: Linux Kernel; +Cc: John Stultz, Prarit Bhargava, stable, Thomas Gleixner

Thomas:
	Prarit's and my testing over the last few days has gone fine,
and it's been quiet otherwise, so I wanted to go ahead and submit this
for inclusion.

As widely reported on the internet, many Linux systems experienced
futex-related load spikes after the leapsecond was inserted
(usually connected to MySQL, Firefox, Thunderbird, Java, etc).

An apparent workaround for this issue is running:
$ date -s "`date`"

Credit: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix


This issue stemmed from the timekeeping subsystem not notifying
the hrtimer subsystem that the leapsecond occurred, causing
CLOCK_REALTIME hrtimers to fire one second early, and
sub-second CLOCK_REALTIME hrtimer timeouts to fire immediately
(causing the load spikes).
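
To make the failure mode concrete, here is a minimal userspace sketch
of the arithmetic, not kernel code. It assumes a timer base that caches
the monotonic-to-realtime offset and compares "monotonic + cached
offset" against a timer's absolute expiry. All names (model_base,
model_expired, model_arm_relative) are hypothetical and exist only for
this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical model of a timer base; this is not a kernel structure. */
struct model_base {
	int64_t offs_real_ns;	/* cached monotonic -> realtime offset */
};

/* Realtime as this base believes it to be at monotonic time mono_ns. */
static int64_t model_base_realtime(const struct model_base *b, int64_t mono_ns)
{
	return mono_ns + b->offs_real_ns;
}

/*
 * A realtime timer counts as expired once the base's view of realtime
 * has reached the timer's absolute expiry.
 */
static bool model_expired(const struct model_base *b, int64_t mono_ns,
			  int64_t expires_real_ns)
{
	return model_base_realtime(b, mono_ns) >= expires_real_ns;
}

/*
 * Arm a relative timeout the way a sleeper would: take the true
 * realtime (monotonic + up-to-date offset) and add the delay.
 */
static int64_t model_arm_relative(int64_t true_offs_ns, int64_t mono_ns,
				  int64_t delta_ns)
{
	return mono_ns + true_offs_ns + delta_ns;
}
```

In this model, inserting a leapsecond lowers the true offset by one
second. A base still holding the old offset believes realtime is one
second further along than it is, so a timeout armed for 0.5s from now
is already "expired" when checked, and a 1s timeout is due immediately
rather than a second later. A caller that re-arms on early wakeup then
spins.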


To address this issue I'm proposing we do three things:
1) Fix the clock_was_set() call to remove the limitation that kept
us from calling it from update_wall_time().

2) Call clock_was_set() when we add/remove a leapsecond.

3) Change hrtimer_interrupt to update the hrtimer base offset values.
This third item provides additional robustness should the
clock_was_set() notification (done via a timer if we're in_atomic)
be delayed significantly.
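
The three steps above can be sketched as a small userspace model. This
is only an illustration under simplifying assumptions (single-threaded,
no locking, a plain flag standing in for the deferred notification);
it is not the code in these patches, and every model_* name below is
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct model_timekeeper {
	int64_t offs_real_ns;	/* authoritative monotonic -> realtime offset */
	bool notify_pending;	/* deferred "clock was set" notification */
};

struct model_cpu_base {
	int64_t offs_real_ns;	/* per-CPU cached copy used for expiry checks */
};

/*
 * Steps 1 and 2: inserting a leapsecond steps realtime back by one
 * second and requests a notification. The notification is only marked
 * pending here, modelling a caller that cannot notify other CPUs
 * directly from its current context.
 */
static void model_insert_leap_second(struct model_timekeeper *tk)
{
	tk->offs_real_ns -= NSEC_PER_SEC;
	tk->notify_pending = true;
}

/* The deferred notification: push the new offset to every CPU base. */
static void model_run_deferred_notify(struct model_timekeeper *tk,
				      struct model_cpu_base *bases, int nr)
{
	int i;

	if (!tk->notify_pending)
		return;
	for (i = 0; i < nr; i++)
		bases[i].offs_real_ns = tk->offs_real_ns;
	tk->notify_pending = false;
}

/*
 * Step 3: the interrupt path refreshes its own cached offset before
 * checking expiry, so a late notification cannot cause early expiry.
 * Returns true if a timer with the given absolute realtime expiry is due.
 */
static bool model_timer_interrupt(const struct model_timekeeper *tk,
				  struct model_cpu_base *base,
				  int64_t mono_ns, int64_t expires_real_ns)
{
	base->offs_real_ns = tk->offs_real_ns;
	return mono_ns + base->offs_real_ns >= expires_real_ns;
}
```

In this model, a CPU that takes a timer interrupt before the deferred
notification has run still evaluates expiry against the new offset,
which is the extra robustness item 3 is after.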


NOTE: There have been some reports of a hard hang right at or before
the leapsecond. I've not been able to reproduce or diagnose
this, so this fix likely does not address the reported hard
hangs (unless they end up being connected to the futex/hrtimer
issue). Please email lkml and me if you experienced this.

Big thanks to Prarit for shaking out a few issues in the earlier
version of this patch set, as well as the extra effort testing over
the Holiday!

Also, I've already got backports generated for -stable that I'm
testing, and I'll be submitting them once I have upstream commit ids
for these patches.

thanks
-john

CC: Prarit Bhargava <prarit@redhat.com>
CC: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>


John Stultz (3):
  hrtimer: Fix clock_was_set so it is safe to call from irq context
  time: Fix leapsecond triggered hrtimer/futex load spike issue
  hrtimer: Update hrtimer base offsets each hrtimer_interrupt

 include/linux/hrtimer.h   |    3 +++
 kernel/hrtimer.c          |   31 +++++++++++++++++++++++++++----
 kernel/time/timekeeping.c |   38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 68 insertions(+), 4 deletions(-)

-- 
1.7.9.5


* [PATCH 6/6] hrtimer: Update hrtimer base offsets each hrtimer_interrupt
@ 2012-07-10 22:43 John Stultz
  2012-07-11 21:45 ` [tip:timers/urgent] " tip-bot for John Stultz
  0 siblings, 1 reply; 8+ messages in thread
From: John Stultz @ 2012-07-10 22:43 UTC (permalink / raw)
  To: Linux Kernel
  Cc: John Stultz, Ingo Molnar, Peter Zijlstra, Prarit Bhargava,
	Thomas Gleixner, stable

The update of the hrtimer base offsets on all cpus cannot be made
atomically from within the region where timekeeper.lock is held and
interrupts are disabled, as smp function calls are not allowed there.

clock_was_set(), which enforces the update on all cpus, is called
either from preemptible process context in case of do_settimeofday()
or from the softirq context when the offset modification happened in
the timer interrupt itself due to a leap second.

In both cases there is a race window for an hrtimer interrupt between
dropping timekeeper lock, enabling interrupts and clock_was_set()
issuing the updates. Any interrupt which arrives in that window will
see the new time but operate on stale offsets.

So we need to make sure that an hrtimer interrupt always sees a
consistent state of time and offsets.

ktime_get_update_offsets() allows us to get the current monotonic time
and update the per cpu hrtimer base offsets from hrtimer_interrupt()
to capture a consistent state of monotonic time and the offsets. The
function replaces the existing ktime_get() calls in hrtimer_interrupt().

The overhead of the new function vs. ktime_get() is minimal as it just
adds two store operations.

This ensures that any changes to realtime or boottime offsets are
noticed and stored into the per-cpu hrtimer base structures, prior to
any hrtimer expiration and guarantees that timers are not expired early.

CC: Ingo Molnar <mingo@kernel.org>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Prarit Bhargava <prarit@redhat.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: stable@vger.kernel.org
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
---
 kernel/hrtimer.c |   28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 14a260a..669522b 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -657,6 +657,14 @@ static inline int hrtimer_enqueue_reprogram(struct hrtimer *timer,
 	return 0;
 }
 
+static inline ktime_t hrtimer_update_base(struct hrtimer_cpu_base *base)
+{
+	ktime_t *offs_real = &base->clock_base[HRTIMER_BASE_REALTIME].offset;
+	ktime_t *offs_boot = &base->clock_base[HRTIMER_BASE_BOOTTIME].offset;
+
+	return ktime_get_update_offsets(offs_real, offs_boot);
+}
+
 /*
  * Retrigger next event is called after clock was set
  *
@@ -665,22 +673,12 @@ static inline int hrtimer_enqueue_reprogram(struct hrtimer *timer,
 static void retrigger_next_event(void *arg)
 {
 	struct hrtimer_cpu_base *base = &__get_cpu_var(hrtimer_bases);
-	struct timespec realtime_offset, xtim, wtm, sleep;
 
 	if (!hrtimer_hres_active())
 		return;
 
-	/* Optimized out for !HIGH_RES */
-	get_xtime_and_monotonic_and_sleep_offset(&xtim, &wtm, &sleep);
-	set_normalized_timespec(&realtime_offset, -wtm.tv_sec, -wtm.tv_nsec);
-
-	/* Adjust CLOCK_REALTIME offset */
 	raw_spin_lock(&base->lock);
-	base->clock_base[HRTIMER_BASE_REALTIME].offset =
-		timespec_to_ktime(realtime_offset);
-	base->clock_base[HRTIMER_BASE_BOOTTIME].offset =
-		timespec_to_ktime(sleep);
-
+	hrtimer_update_base(base);
 	hrtimer_force_reprogram(base, 0);
 	raw_spin_unlock(&base->lock);
 }
@@ -710,7 +708,6 @@ static int hrtimer_switch_to_hres(void)
 		base->clock_base[i].resolution = KTIME_HIGH_RES;
 
 	tick_setup_sched_timer();
-
 	/* "Retrigger" the interrupt to get things going */
 	retrigger_next_event(NULL);
 	local_irq_restore(flags);
@@ -1264,7 +1261,7 @@ void hrtimer_interrupt(struct clock_event_device *dev)
 	dev->next_event.tv64 = KTIME_MAX;
 
 	raw_spin_lock(&cpu_base->lock);
-	entry_time = now = ktime_get();
+	entry_time = now = hrtimer_update_base(cpu_base);
 retry:
 	expires_next.tv64 = KTIME_MAX;
 	/*
@@ -1342,9 +1339,12 @@ retry:
 	 * We need to prevent that we loop forever in the hrtimer
 	 * interrupt routine. We give it 3 attempts to avoid
 	 * overreacting on some spurious event.
+	 *
+	 * Acquire base lock for updating the offsets and retrieving
+	 * the current time.
 	 */
 	raw_spin_lock(&cpu_base->lock);
-	now = ktime_get();
+	now = hrtimer_update_base(cpu_base);
 	cpu_base->nr_retries++;
 	if (++retries < 3)
 		goto retry;
-- 
1.7.9.5



end of thread, other threads:[~2012-07-11 21:45 UTC | newest]

Thread overview: 8+ messages
2012-07-05 19:12 [PATCH 0/3] Fix for leapsecond caused hrtimer/futex issue John Stultz
2012-07-05 19:12 ` [PATCH 1/3] hrtimer: Fix clock_was_set so it is safe to call from irq context John Stultz
2012-07-09  9:43   ` [tip:timers/urgent] " tip-bot for John Stultz
2012-07-05 19:12 ` [PATCH 2/3] time: Fix leapsecond triggered hrtimer/futex load spike issue John Stultz
2012-07-09  9:43   ` [tip:timers/urgent] time: Fix leapsecond triggered hrtimer/ futex " tip-bot for John Stultz
2012-07-05 19:12 ` [PATCH 3/3] hrtimer: Update hrtimer base offsets each hrtimer_interrupt John Stultz
2012-07-09  9:44   ` [tip:timers/urgent] " tip-bot for John Stultz
  -- strict thread matches above, loose matches on Subject: below --
2012-07-10 22:43 [PATCH 6/6] " John Stultz
2012-07-11 21:45 ` [tip:timers/urgent] " tip-bot for John Stultz
