[PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3)

* [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3)
@ 2012-07-03  2:16 John Stultz
  2012-07-03  2:16 ` [PATCH 1/3] [RFC] hrtimer: Fix clock_was_set so it is safe to call from atomic John Stultz
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: John Stultz @ 2012-07-03  2:16 UTC (permalink / raw)
  To: Linux Kernel; +Cc: John Stultz, Prarit Bhargava, stable, Thomas Gleixner

As widely reported on the internet, many Linux systems after
the leapsecond was inserted are experiencing futex related load
spikes (usually connected to MySQL, Firefox, Thunderbird, Java, etc).

An apparent workaround for this issue is running:
$ date -s "`date`"

Credit: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix

To address this issue I'm proposing we do three things:
1) Fix the clock_was_set() call to remove the limitation that kept
us from calling it from update_wall_time().

2) Call clock_was_set() when we add/remove a leapsecond.

3) Change hrtimer_interrupt to update the hrtimer base offset values.
This third item provides additional robustness should the
clock_was_set() notification (done via a timer if we're in_atomic)
be delayed significantly.

This third item is new and tries to better address the fact that
the hrtimer code caches its sense of time separately from the
timekeeping core. This is necessary for performance reasons, as
hrtimer code is a very hot path, but opens up races between when
the time offsets have changed and when the hrtimer code updates
its bases on each cpu. By updating the base offsets prior to
doing any expiration, we ensure no timers are expired early.

Close review, however, would be appreciated.

I'm fairly happy with this set of changes, so if there's no
objections, I'd propose merging these for 3.5, and I'll
start generating backports for -stable (unfortunately
these won't apply trivially to 3.3 and prior kernels).

I'm also looking to see if we can consolidate the per-cpu base
offset values, so they are not per-cpu and are protected by their
own lock, allowing us to update them quickly from atomic context, 
even while holding the timekeeper.lock (currently I believe there's
the risk of having an ABBA deadlock between the base.lock and the
timekeeper.lock if we try to update the base offsets under
the timekeeper lock). However, this will potentially be a more
significant change and wouldn't be appropriate for backporting,
so I want to get these three changes to fix the issue merged first.

NOTE: Some reports have been of a hard hang right at or before
the leapsecond. I've not been able to reproduce or diagnose
this, so this fix does not likely address the reported hard
hangs (unless they end up being connected to the futex/hrtimer
issue). Please email lkml and me if you experienced this.

TODOs:
* Collect feedback & acks
* Submit for merging.
* Generate backports for pre-v3.4 kernels

v2:
* Address the issue w/ calling clock_was_set from atomic context,
pointed out by Prarit and Ben.
* Rework fix so it's simpler.

v3:
* Change from using a work item to a timer for scheduling the
do_clock_was_set() call sooner.
* Add hrtimer_interrupt base offset updating

CC: Prarit Bhargava <prarit@redhat.com>
CC: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>

John Stultz (3):
  [RFC] hrtimer: Fix clock_was_set so it is safe to call from atomic
  [RFC] time: Fix leapsecond triggered hrtimer/futex load spike issue
  [RFC] hrtimer: Update hrtimer base offsets each hrtimer_interrupt

 include/linux/hrtimer.h   |    3 +++
 kernel/hrtimer.c          |   33 +++++++++++++++++++++++++++++----
 kernel/time/timekeeping.c |   39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 71 insertions(+), 4 deletions(-)

-- 
1.7.9.5

* [PATCH 1/3] [RFC] hrtimer: Fix clock_was_set so it is safe to call from atomic
  2012-07-03  2:16 [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3) John Stultz
@ 2012-07-03  2:16 ` John Stultz
  2012-07-03  2:16 ` [PATCH 2/3] [RFC] time: Fix leapsecond triggered hrtimer/futex load spike issue John Stultz
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: John Stultz @ 2012-07-03  2:16 UTC (permalink / raw)
  To: Linux Kernel; +Cc: John Stultz, Prarit Bhargava, stable, Thomas Gleixner

NOTE: This is a prerequisite patch that's required to
address the widely observed leap-second related futex/hrtimer
issues.

Currently clock_was_set() is unsafe to be called from atomic
context, as it calls on_each_cpu(). This causes problems when
we need to adjust the time from update_wall_time().

To fix this, if clock_was_set() is called while we're in_atomic(),
we schedule a timer to fire immediately after we're
out of interrupt context to then notify the hrtimer
subsystem.

CC: Prarit Bhargava <prarit@redhat.com>
CC: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
---
 kernel/hrtimer.c |   19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index ae34bf5..393fd4d 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -746,7 +746,7 @@ static inline void retrigger_next_event(void *arg) { }
  * resolution timer interrupts. On UP we just disable interrupts and
  * call the high resolution interrupt code.
  */
-void clock_was_set(void)
+static void do_clock_was_set(unsigned long data)
 {
 #ifdef CONFIG_HIGH_RES_TIMERS
 	/* Retrigger the CPU local events everywhere */
@@ -755,6 +755,21 @@ void clock_was_set(void)
 	timerfd_clock_was_set();
 }
 
+static struct timer_list clock_was_set_timer;
+
+void clock_was_set(void)
+{
+	/*
+	 * We can't call on_each_cpu() from atomic context,
+	 * so if we're in_atomic, schedule the clock_was_set
+	 * via a timer_list timer for right after.
+	 */
+	if (in_atomic())
+		mod_timer(&clock_was_set_timer, jiffies);
+	else
+		do_clock_was_set(0);
+}
+
 /*
  * During resume we might have to reprogram the high resolution timer
  * interrupt (on the local CPU):
@@ -1152,6 +1167,8 @@ static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
 	base = hrtimer_clockid_to_base(clock_id);
 	timer->base = &cpu_base->clock_base[base];
 	timerqueue_init(&timer->node);
+	init_timer(&clock_was_set_timer);
+	clock_was_set_timer.function = do_clock_was_set;
 
 #ifdef CONFIG_TIMER_STATS
 	timer->start_site = NULL;
-- 
1.7.9.5


* [PATCH 2/3] [RFC] time: Fix leapsecond triggered hrtimer/futex load spike issue
  2012-07-03  2:16 [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3) John Stultz
  2012-07-03  2:16 ` [PATCH 1/3] [RFC] hrtimer: Fix clock_was_set so it is safe to call from atomic John Stultz
@ 2012-07-03  2:16 ` John Stultz
  2012-07-03  2:16 ` [PATCH 3/3] [RFC] hrtimer: Update hrtimer base offsets each hrtimer_interrupt John Stultz
  2012-07-03  6:09 ` [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3) John Stultz
  3 siblings, 0 replies; 10+ messages in thread
From: John Stultz @ 2012-07-03  2:16 UTC (permalink / raw)
  To: Linux Kernel; +Cc: John Stultz, Prarit Bhargava, stable, Thomas Gleixner

As widely reported on the internet, some Linux systems after
the leapsecond was inserted are experiencing futex related load
spikes (usually connected to MySQL, Firefox, Thunderbird, Java, etc).

An apparent workaround for this issue is running:
$ date -s "`date`"

Credit: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix

I believe this issue is due to the leapsecond being added without
calling clock_was_set() to notify the hrtimer subsystem of the
change.

The workaround works because it forces a clock_was_set()
call from settimeofday().

This fix adds the required clock_was_set() calls to where
we adjust for leapseconds.

NOTE: This fix *depends* on the previous fix, which allows
clock_was_set to be called from atomic context. Do not try
to apply just this patch.

CC: Prarit Bhargava <prarit@redhat.com>
CC: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
---
 kernel/time/timekeeping.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6f46a00..cc2991d 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -963,6 +963,8 @@ static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
 		leap = second_overflow(timekeeper.xtime.tv_sec);
 		timekeeper.xtime.tv_sec += leap;
 		timekeeper.wall_to_monotonic.tv_sec -= leap;
+		if (leap)
+			clock_was_set();
 	}
 
 	/* Accumulate raw time */
@@ -1079,6 +1081,8 @@ static void update_wall_time(void)
 		leap = second_overflow(timekeeper.xtime.tv_sec);
 		timekeeper.xtime.tv_sec += leap;
 		timekeeper.wall_to_monotonic.tv_sec -= leap;
+		if (leap)
+			clock_was_set();
 	}
 
 	timekeeping_update(false);
-- 
1.7.9.5


* [PATCH 3/3] [RFC] hrtimer: Update hrtimer base offsets each hrtimer_interrupt
  2012-07-03  2:16 [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3) John Stultz
  2012-07-03  2:16 ` [PATCH 1/3] [RFC] hrtimer: Fix clock_was_set so it is safe to call from atomic John Stultz
  2012-07-03  2:16 ` [PATCH 2/3] [RFC] time: Fix leapsecond triggered hrtimer/futex load spike issue John Stultz
@ 2012-07-03  2:16 ` John Stultz
  2012-07-03  6:09 ` [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3) John Stultz
  3 siblings, 0 replies; 10+ messages in thread
From: John Stultz @ 2012-07-03  2:16 UTC (permalink / raw)
  To: Linux Kernel; +Cc: John Stultz, Prarit Bhargava, stable, Thomas Gleixner

This patch introduces a new function which captures the
CLOCK_MONOTONIC time, along with the CLOCK_REALTIME and
CLOCK_BOOTTIME offsets at the same moment. This new function
is then used in place of ktime_get() when hrtimer_interrupt()
is expiring timers.

This ensures that any changes to realtime or boottime offsets
are noticed and stored into the per-cpu hrtimer base structures,
prior to doing any hrtimer expiration. This should ensure that
timers are not expired early if the offsets change under us.

This is useful in the case where clock_was_set() is called from
atomic context and has to schedule the hrtimer base offset update
via a timer, as it provides extra robustness in the face of any
possible timer delay.

CC: Prarit Bhargava <prarit@redhat.com>
CC: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
---
 include/linux/hrtimer.h   |    3 +++
 kernel/hrtimer.c          |   14 +++++++++++---
 kernel/time/timekeeping.c |   35 +++++++++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index fd0dc30..f6b2a74 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -320,6 +320,9 @@ extern ktime_t ktime_get(void);
 extern ktime_t ktime_get_real(void);
 extern ktime_t ktime_get_boottime(void);
 extern ktime_t ktime_get_monotonic_offset(void);
+extern void ktime_get_and_real_and_sleep_offset(ktime_t *monotonic,
+						ktime_t *real_offset,
+						ktime_t *sleep_offset);
 
 DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
 
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 393fd4d..0e78b3e 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1260,18 +1260,26 @@ static void __run_hrtimer(struct hrtimer *timer, ktime_t *now)
 void hrtimer_interrupt(struct clock_event_device *dev)
 {
 	struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
-	ktime_t expires_next, now, entry_time, delta;
+	ktime_t expires_next, now, entry_time, delta, real_offset, sleep_offset;
 	int i, retries = 0;
 
 	BUG_ON(!cpu_base->hres_active);
 	cpu_base->nr_events++;
 	dev->next_event.tv64 = KTIME_MAX;
 
-	entry_time = now = ktime_get();
+
+	ktime_get_and_real_and_sleep_offset(&now, &real_offset, &sleep_offset);
+
+	entry_time = now;
 retry:
 	expires_next.tv64 = KTIME_MAX;
 
 	raw_spin_lock(&cpu_base->lock);
+
+	/* Update base offsets, to avoid early wakeups */
+	cpu_base->clock_base[HRTIMER_BASE_REALTIME].offset = real_offset;
+	cpu_base->clock_base[HRTIMER_BASE_BOOTTIME].offset = sleep_offset;
+
 	/*
 	 * We set expires_next to KTIME_MAX here with cpu_base->lock
 	 * held to prevent that a timer is enqueued in our queue via
@@ -1348,7 +1356,7 @@ retry:
 	 * interrupt routine. We give it 3 attempts to avoid
 	 * overreacting on some spurious event.
 	 */
-	now = ktime_get();
+	ktime_get_and_real_and_sleep_offset(&now, &real_offset, &sleep_offset);
 	cpu_base->nr_retries++;
 	if (++retries < 3)
 		goto retry;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cc2991d..c2ba132 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1250,6 +1250,41 @@ void get_xtime_and_monotonic_and_sleep_offset(struct timespec *xtim,
 	} while (read_seqretry(&timekeeper.lock, seq));
 }
 
+
+void ktime_get_and_real_and_sleep_offset(ktime_t *monotonic,
+						ktime_t *real_offset,
+						ktime_t *sleep_offset)
+{
+	unsigned long seq;
+	struct timespec wtom, sleep;
+	u64 secs, nsecs;
+
+	do {
+		seq = read_seqbegin(&timekeeper.lock);
+
+		secs = timekeeper.xtime.tv_sec +
+				timekeeper.wall_to_monotonic.tv_sec;
+		nsecs = timekeeper.xtime.tv_nsec +
+				timekeeper.wall_to_monotonic.tv_nsec;
+		nsecs += timekeeping_get_ns();
+		/* If arch requires, add in gettimeoffset() */
+		nsecs += arch_gettimeoffset();
+
+		wtom = timekeeper.wall_to_monotonic;
+		sleep = timekeeper.total_sleep_time;
+	} while (read_seqretry(&timekeeper.lock, seq));
+
+
+	*monotonic = ktime_add_ns(ktime_set(secs, 0), nsecs);
+	set_normalized_timespec(&wtom, -wtom.tv_sec, -wtom.tv_nsec);
+	*real_offset =	timespec_to_ktime(wtom);
+	*sleep_offset = timespec_to_ktime(sleep);
+}
+
+
+
+
+
 /**
  * ktime_get_monotonic_offset() - get wall_to_monotonic in ktime_t format
  */
-- 
1.7.9.5


* Re: [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3)
  2012-07-03  2:16 [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3) John Stultz
                   ` (2 preceding siblings ...)
  2012-07-03  2:16 ` [PATCH 3/3] [RFC] hrtimer: Update hrtimer base offsets each hrtimer_interrupt John Stultz
@ 2012-07-03  6:09 ` John Stultz
  2012-07-03 15:27   ` Prarit Bhargava
  3 siblings, 1 reply; 10+ messages in thread
From: John Stultz @ 2012-07-03  6:09 UTC (permalink / raw)
  To: John Stultz; +Cc: Linux Kernel, Prarit Bhargava, stable, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 2151 bytes --]

On 07/02/2012 07:16 PM, John Stultz wrote:
> NOTE: Some reports have been of a hard hang right at or before
> the leapsecond. I've not been able to reproduce or diagnose
> this, so this fix does not likely address the reported hard
> hangs (unless they end up being connected to the futex/hrtimer
> issue). Please email lkml and me if you experienced this.

As noted above, I've seen some sporadic reports of hard hangs.
Some seem connected to the hrtimer problem, where ksoftirqd seems to go
crazy and cause NMI watchdog lockups, but others are less clear.

I wanted to try to provide a way to stress both the kernel's leapsecond 
code as well as provide a way for folks to be able to test their 
application's robustness in the face of leapsecond inconsistencies.

Attached is my first attempt at such a test.

It is designed to be run on a server, where it will schedule a 
leapsecond every day at midnight GMT.  So every day, while it runs, the 
server will see a leapsecond.  This allows the leapsecond code, as well
as any suspected timer-related lockups that might happen when the
leapsecond is scheduled, to be stressed.

The test also outputs time samples right before, during and after the 
leapsecond is applied, so you can watch it happen.

Also, since once a day is a fairly low frequency, if you pass "-s" to
the test, it will jump the system time forward to 10 seconds before
the scheduled leapsecond for that day, allowing a leapsecond to
occur every ~13 seconds. This mode may cause application disruption, as
it also causes the system to advance a day every ~13 seconds.

The test additionally will note if it observes the hrtimer early 
expiration problem that was widely seen over the weekend.

Hopefully this will provide a mechanism to test and maintain the 
kernel's correct behaviour for these rare events, as well as allowing 
folks to get more comfortable with leapsecond behaviour and test how it 
might impact their applications.

If anyone who observed a hard hang is able to use this to reproduce the 
problem, I'd greatly like to hear about it.

Build instructions are in the test file.

thanks
-john

[-- Attachment #2: leap-a-day.c --]
[-- Type: text/x-csrc, Size: 5225 bytes --]

/* Leap second stress test
 *              by: john stultz (johnstul@us.ibm.com)
 *              (C) Copyright IBM 2012
 *              Licensed under the GPLv2
 *
 *  This test signals the kernel to insert a leap second
 *  every day at midnight GMT. This allows for stressing the
 *  kernel's leap-second behavior, as well as how well applications
 *  handle the leap-second discontinuity.
 *
 *  Usage: leap-a-day [-s]
 *
 *  Options:
 *	-s:	Each iteration, set the date to 10 seconds before midnight GMT.
 *		This speeds up the number of leapsecond transitions tested,
 *		but because it calls settimeofday frequently, advancing the
 *		time by 24 hours every ~16 seconds, it may cause application
 *		disruption.
 *
 *  Other notes: Disabling NTP prior to running this is advised, as the two
 *		 may conflict in their commands to the kernel.
 *
 *  To build:
 *	$ gcc leap-a-day.c -o leap-a-day -lrt
 */

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <sys/timex.h>
#include <string.h>
#include <signal.h>

#define NSEC_PER_SEC 1000000000ULL

/* returns 1 if a <= b, 0 otherwise */
static inline int in_order(struct timespec a, struct timespec b)
{
        if(a.tv_sec < b.tv_sec)
                return 1;
        if(a.tv_sec > b.tv_sec)
                return 0;
        if(a.tv_nsec > b.tv_nsec)
                return 0;
        return 1;
}

struct timespec timespec_add(struct timespec ts, unsigned long long ns)
{
	ts.tv_nsec += ns;
	while(ts.tv_nsec >= NSEC_PER_SEC) {
		ts.tv_nsec -= NSEC_PER_SEC;
		ts.tv_sec++;
	}
	return ts;
}

char* time_state_str(int state)
{
	switch (state) {
		case TIME_OK:	return "TIME_OK";
		case TIME_INS:	return "TIME_INS";
		case TIME_DEL:	return "TIME_DEL";
		case TIME_OOP:	return "TIME_OOP";
		case TIME_WAIT:	return "TIME_WAIT";
		case TIME_BAD:	return "TIME_BAD";
	}
	return "ERROR"; 
}

/* clear NTP time_status & time_state */
void clear_time_state(void)
{
	struct timex tx;
	int ret;

	/*
	 * XXX - The fact we have to call this twice seems
	 * to point to a slight issue in the kernel's ntp state
	 * management. Needs to be investigated further.
	 */

	tx.modes = ADJ_STATUS;
	tx.status = STA_PLL;
	ret = adjtimex(&tx);

	tx.modes = ADJ_STATUS;
	tx.status = 0;
	ret = adjtimex(&tx);
}

/* Make sure we cleanup on ctrl-c */
void handler(int unused)
{
	clear_time_state();
	exit(0);
}

/* Test for known hrtimer failure */
void test_hrtimer_failure(void)
{
	struct timespec now, target;

	clock_gettime(CLOCK_REALTIME, &now);
	target = timespec_add(now, NSEC_PER_SEC/2);
	clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &target, NULL);
	clock_gettime(CLOCK_REALTIME, &now);

	if (!in_order(target, now)) {
		printf("Note: hrtimer early expiration failure observed.\n");
	}

}

int main(int argc, char** argv) 
{
	struct timeval tv;
	struct timex tx;
	struct timespec ts;
	int settime = 0;

	signal(SIGINT, handler);
	signal(SIGTERM, handler);	/* SIGKILL cannot be caught */
	printf("This runs continuously. Press ctrl-c to stop\n");

	/* Process arguments */
	if (argc > 1) {
		if (!strncmp(argv[1], "-s", 2)) {
			printf("Setting time to speed up testing\n");
			settime = 1;
		} else {
			printf("Usage: %s [-s]\n", argv[0]);
			printf("	-s: Set time to right before leap second each iteration\n");
		}
	}

	printf("\n");
	while (1) {
		int ret;
		time_t now, next_leap;
		/* Get the current time */
		gettimeofday(&tv, NULL);

		/* Calculate the next possible leap second 23:59:60 GMT */
		tv.tv_sec += 86400 - (tv.tv_sec % 86400);
		next_leap = ts.tv_sec = tv.tv_sec;

		if (settime) {
			tv.tv_sec -= 10;
			settimeofday(&tv, NULL);
			printf("Setting time to %s", ctime(&ts.tv_sec));
		}

		/* Reset NTP time state */
		clear_time_state();

		/* Set the leap second insert flag */
		tx.modes = ADJ_STATUS;
		tx.status = STA_INS;
		ret = adjtimex(&tx);
		if (ret) {
			printf("Error: Problem setting STA_INS!: %s\n",
							time_state_str(ret));
			return -1;
		}

		/* Validate STA_INS was set */
		ret = adjtimex(&tx);
		if (tx.status != STA_INS) {
			printf("Error: STA_INS not set!: %s\n",
							time_state_str(ret));
			return -1;
		}

		printf("Scheduling leap second for %s", ctime(&next_leap));

		/* Wake up 3 seconds before leap */
		ts.tv_sec -= 3;
		if(clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &ts, NULL))
			printf("Something woke us up, returning to sleep\n");

		/* Validate STA_INS is still set */
		ret = adjtimex(&tx);
		if (tx.status != STA_INS) {
			printf("Something cleared STA_INS, setting it again.\n");
			tx.modes = ADJ_STATUS;
			tx.status = STA_INS;
			ret = adjtimex(&tx);
		}

		/* Check adjtimex output every half second */
		now = tx.time.tv_sec;
		while (now < next_leap+2) {
			char buf[26];
			ret = adjtimex(&tx);

			ctime_r(&tx.time.tv_sec, buf);
			buf[strlen(buf)-1] = 0; /*remove trailing\n */

			printf("%s + %6ld us\t%s\n",
					buf,
					tx.time.tv_usec, 
					time_state_str(ret));
			now = tx.time.tv_sec;
			/* Sleep for another half second */
			ts.tv_sec = 0;
			ts.tv_nsec = NSEC_PER_SEC/2;
			clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);			
		}

		/* Note if kernel has known hrtimer failure */
		test_hrtimer_failure();

		printf("Leap complete\n\n");
	}

	clear_time_state();
	return 0;
}

* Re: [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3)
  2012-07-03  6:09 ` [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3) John Stultz
@ 2012-07-03 15:27   ` Prarit Bhargava
  2012-07-03 16:02     ` John Stultz
  2012-07-04  0:19     ` John Stultz
  0 siblings, 2 replies; 10+ messages in thread
From: Prarit Bhargava @ 2012-07-03 15:27 UTC (permalink / raw)
  To: John Stultz; +Cc: Linux Kernel, stable, Thomas Gleixner



On 07/03/2012 02:09 AM, John Stultz wrote:
> On 07/02/2012 07:16 PM, John Stultz wrote:
>> NOTE: Some reports have been of a hard hang right at or before
>> the leapsecond. I've not been able to reproduce or diagnose
>> this, so this fix does not likely address the reported hard
>> hangs (unless they end up being connected to the futex/hrtimer
>> issue). Please email lkml and me if you experienced this.
> 
> As noted above, I've seen some sporadic reports of hard hangs. Some seem
> connected to the hrtimer problem, where ksoftirqd seems to go crazy and cause NMI
> watchdog lockups, but others are less clear.
> 
> I wanted to try to provide a way to stress both the kernel's leapsecond code as
> well as provide a way for folks to be able to test their application's
> robustness in the face of leapsecond inconsistencies.
> 
> Attached is my first attempt at such a test.
> 
> It is designed to be run on a server, where it will schedule a leapsecond every
> day at midnight GMT.  So every day, while it runs, the server will see a
> leapsecond.  This allows the leapsecond code, as well as any suspected timer-
> related lockups that might happen when the leapsecond is scheduled, to be stressed.
> 
> The test also outputs time samples right before, during and after the leapsecond
> is applied, so you can watch it happen.
> 
> Also, since once a day is a fairly low frequency, if you pass "-s" to the test,
> it will jump the system time forward to 10 seconds before the scheduled
> leapsecond for that day, allowing a leapsecond to occur every ~13 seconds. This
> mode may cause application disruption, as it also causes the system to advance a
> day every ~13 seconds.
> 
> The test additionally will note if it observes the hrtimer early expiration
> problem that was widely seen over the weekend.
> 
> Hopefully this will provide a mechanism to test and maintain the kernel's
> correct behaviour for these rare events, as well as allowing folks to get more
> comfortable with leapsecond behaviour and test how it might impact their
> applications.
> 
> If anyone who observed a hard hang is able to use this to reproduce the problem,
> I'd greatly like to hear about it.

> 
> Build instructions are in the test file.

Thanks John -- I moved to using this for testing and hit the following
softlockup when running latest + your patchset:

[ 1084.433362] BUG: soft lockup - CPU#17 stuck for 22s! [leap-a-day:1275]
[ 1084.440700] Modules linked in: nfs nfs_acl auth_rpcgss fscache lockd sunrpc
kvm_intel ixgbe coretemp kvm igb ptp pps_core mdio ioatdma lpc_ich crc32c_intel
joydev mfd_core i2c_i801 ghash_clmulni_intel tpm_tis wmi dca sb_edac microcode
edac_core pcspkr tpm tpm_bios hid_generic isci libsas scsi_transport_sas mgag200
i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan]
[ 1084.479183] CPU 17
[ 1084.481568] Modules linked in: nfs nfs_acl auth_rpcgss fscache lockd sunrpc
kvm_intel ixgbe coretemp kvm igb ptp pps_core mdio ioatdma lpc_ich crc32c_intel
joydev mfd_core i2c_i801 ghash_clmulni_intel tpm_tis wmi dca sb_edac microcode
edac_core pcspkr tpm tpm_bios hid_generic isci libsas scsi_transport_sas mgag200
i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan]
[ 1084.520061]
[ 1084.521740] Pid: 1275, comm: leap-a-day Not tainted 3.5.0-rc4+ #1 Intel
Corporation S2600CP/S2600CP
[ 1084.531860] RIP: 0010:[<ffffffff810b3d57>]  [<ffffffff810b3d57>]
smp_call_function_many+0x1f7/0x260
[ 1084.541962] RSP: 0018:ffff88042769fdf8  EFLAGS: 00000202
[ 1084.547891] RAX: 0000000000000080 RBX: 0000000000000292 RCX: 0000000000000020
[ 1084.555858] RDX: 0000000000000080 RSI: 0000000000000080 RDI: 0000000000000292
[ 1084.563826] RBP: ffff88042769fe48 R08: ffffffff81cd7200 R09: 0000000000000080
[ 1084.571790] R10: ffff88042f7342f0 R11: 0000000000000216 R12: ffffffff8137cd43
[ 1084.579758] R13: ffff88042769fd88 R14: 0000000000000292 R15: ffff88042769fda8
[ 1084.587727] FS:  00007fba8d48b740(0000) GS:ffff88042f720000(0000)
knlGS:0000000000000000
[ 1084.596758] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1084.603174] CR2: 0000003d72e18c48 CR3: 0000000415d66000 CR4: 00000000000407e0
[ 1084.611141] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1084.619120] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1084.627092] Process leap-a-day (pid: 1275, threadinfo ffff88042769e000, task
ffff880425fd1720)
[ 1084.636694] Stack:
[ 1084.638950]  0000000000000003 0100000000000019 0000000000000000
ffffffff8107e960
[ 1084.647211]  ffff88042769fe58 ffff88042769ff58 ffffffff8107e960
0000000000000000
[ 1084.655479]  0000000000000000 0000000000000000 ffff88042769fe58
ffffffff810b3f12
[ 1084.663723] Call Trace:
[ 1084.666466]  [<ffffffff8107e960>] ? hrtimer_wakeup+0x30/0x30
[ 1084.672784]  [<ffffffff8107e960>] ? hrtimer_wakeup+0x30/0x30
[ 1084.679107]  [<ffffffff810b3f12>] smp_call_function+0x22/0x30
[ 1084.685530]  [<ffffffff810b3f78>] on_each_cpu+0x28/0x70
[ 1084.691371]  [<ffffffff8107ec1c>] do_clock_was_set+0x1c/0x30
[ 1084.697691]  [<ffffffff8107f005>] clock_was_set+0x55/0x60
[ 1084.703732]  [<ffffffff810a6a23>] do_settimeofday+0xd3/0xe0
[ 1084.709971]  [<ffffffff8105f4e5>] do_sys_settimeofday+0xb5/0x110
[ 1084.716677]  [<ffffffff8105f5c3>] sys_settimeofday+0x83/0xb0
[ 1084.723012]  [<ffffffff8160f129>] system_call_fastpath+0x16/0x1b
[ 1084.729782] Code: f7 ff 15 95 89 b6 00 80 7d bf 00 0f 84 9c fe ff ff 41 f6 47
20 01 0f 84 91 fe ff ff 0f 1f 84 00 00 00 00 00 f3 90 41 f6 47 20 01 <75> f7 e9
7b fe ff ff 66 90 4c 89 e2 4c 89 ee 89 df e8 53 8b 21

I'm taking a look now ... I'm not sure I believe the hrtimer_wakeup() calls on
the stack.

P.


> 
> thanks
> -john

* Re: [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3)
  2012-07-03 15:27   ` Prarit Bhargava
@ 2012-07-03 16:02     ` John Stultz
  2012-07-04  0:19     ` John Stultz
  1 sibling, 0 replies; 10+ messages in thread
From: John Stultz @ 2012-07-03 16:02 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: Linux Kernel, stable, Thomas Gleixner

On 07/03/2012 08:27 AM, Prarit Bhargava wrote:
>
> On 07/03/2012 02:09 AM, John Stultz wrote:
>> On 07/02/2012 07:16 PM, John Stultz wrote:
>>> NOTE: Some reports have been of a hard hang right at or before
>>> the leapsecond. I've not been able to reproduce or diagnose
>>> this, so this fix does not likely address the reported hard
>>> hangs (unless they end up being connected to the futex/hrtimer
>>> issue). Please email lkml and me if you experienced this.
>> As noted above, I've seen some sporadic reports of hard hangs. Some seem
>> connected to the hrtimer problem, where ksoftirqd seems to go crazy and cause NMI
>> watchdog lockups, but others are less clear.
>>
>> I wanted to try to provide a way to stress both the kernel's leapsecond code as
>> well as provide a way for folks to be able to test their application's
>> robustness in the face of leapsecond inconsistencies.
>>
>> Attached is my first attempt at such a test.
>>
>> It is designed to be run on a server, where it will schedule a leapsecond every
>> day at midnight GMT.  So every day, while it runs, the server will see a
>> leapsecond.  This allows the leapsecond code, as well as any suspected timer-
>> related lockups that might happen when the leapsecond is scheduled, to be stressed.
>>
>> The test also outputs time samples right before, during and after the leapsecond
>> is applied, so you can watch it happen.
>>
>> Also, since once a day is a fairly low frequency, if you pass "-s" to the test,
>> it will jump the system time forward to 10 seconds before the scheduled
>> leapsecond for that day, allowing a leapsecond to occur every ~13 seconds. This
>> mode may cause application disruption, as it also causes the system to advance a
>> day every ~13 seconds.
>>
>> The test additionally will note if it observes the hrtimer early expiration
>> problem that was widely seen over the weekend.
>>
>> Hopefully this will provide a mechanism to test and maintain the kernel's
>> correct behaviour for these rare events, as well as allowing folks to get more
>> comfortable with leapsecond behaviour and test how it might impact their
>> applications.
>>
>> If anyone who observed a hard hang is able to use this to reproduce the problem,
>> I'd greatly like to hear about it.
>> Build instructions are in the test file.
> Thanks John -- I moved to using this for testing and hit the following
> softlockup when running latest + your patchset:
>
> [ 1084.433362] BUG: soft lockup - CPU#17 stuck for 22s! [leap-a-day:1275]
> [ 1084.440700] Modules linked in: nfs nfs_acl auth_rpcgss fscache lockd sunrpc
> kvm_intel ixgbe coretemp kvm igb ptp pps_core mdio ioatdma lpc_ich crc32c_intel
> joydev mfd_core i2c_i801 ghash_clmulni_intel tpm_tis wmi dca sb_edac microcode
> edac_core pcspkr tpm tpm_bios hid_generic isci libsas scsi_transport_sas mgag200
> i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan]
> [ 1084.479183] CPU 17
> [ 1084.481568] Modules linked in: nfs nfs_acl auth_rpcgss fscache lockd sunrpc
> kvm_intel ixgbe coretemp kvm igb ptp pps_core mdio ioatdma lpc_ich crc32c_intel
> joydev mfd_core i2c_i801 ghash_clmulni_intel tpm_tis wmi dca sb_edac microcode
> edac_core pcspkr tpm tpm_bios hid_generic isci libsas scsi_transport_sas mgag200
> i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan]
> [ 1084.520061]
> [ 1084.521740] Pid: 1275, comm: leap-a-day Not tainted 3.5.0-rc4+ #1 Intel
> Corporation S2600CP/S2600CP
> [ 1084.531860] RIP: 0010:[<ffffffff810b3d57>]  [<ffffffff810b3d57>]
> smp_call_function_many+0x1f7/0x260

Hrmm. Can you run:
         $ gdb  --eval-command "list *0xffffffff810b3d57" ./vmlinux

In the root kernel source directory where you saw this?

> [ 1084.541962] RSP: 0018:ffff88042769fdf8  EFLAGS: 00000202
> [ 1084.547891] RAX: 0000000000000080 RBX: 0000000000000292 RCX: 0000000000000020
> [ 1084.555858] RDX: 0000000000000080 RSI: 0000000000000080 RDI: 0000000000000292
> [ 1084.563826] RBP: ffff88042769fe48 R08: ffffffff81cd7200 R09: 0000000000000080
> [ 1084.571790] R10: ffff88042f7342f0 R11: 0000000000000216 R12: ffffffff8137cd43
> [ 1084.579758] R13: ffff88042769fd88 R14: 0000000000000292 R15: ffff88042769fda8
> [ 1084.587727] FS:  00007fba8d48b740(0000) GS:ffff88042f720000(0000)
> knlGS:0000000000000000
> [ 1084.596758] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1084.603174] CR2: 0000003d72e18c48 CR3: 0000000415d66000 CR4: 00000000000407e0
> [ 1084.611141] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1084.619120] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 1084.627092] Process leap-a-day (pid: 1275, threadinfo ffff88042769e000, task
> ffff880425fd1720)
> [ 1084.636694] Stack:
> [ 1084.638950]  0000000000000003 0100000000000019 0000000000000000
> ffffffff8107e960
> [ 1084.647211]  ffff88042769fe58 ffff88042769ff58 ffffffff8107e960
> 0000000000000000
> [ 1084.655479]  0000000000000000 0000000000000000 ffff88042769fe58
> ffffffff810b3f12
> [ 1084.663723] Call Trace:
> [ 1084.666466]  [<ffffffff8107e960>] ? hrtimer_wakeup+0x30/0x30
> [ 1084.672784]  [<ffffffff8107e960>] ? hrtimer_wakeup+0x30/0x30
> [ 1084.679107]  [<ffffffff810b3f12>] smp_call_function+0x22/0x30
> [ 1084.685530]  [<ffffffff810b3f78>] on_each_cpu+0x28/0x70
> [ 1084.691371]  [<ffffffff8107ec1c>] do_clock_was_set+0x1c/0x30
> [ 1084.697691]  [<ffffffff8107f005>] clock_was_set+0x55/0x60
> [ 1084.703732]  [<ffffffff810a6a23>] do_settimeofday+0xd3/0xe0
> [ 1084.709971]  [<ffffffff8105f4e5>] do_sys_settimeofday+0xb5/0x110
> [ 1084.716677]  [<ffffffff8105f5c3>] sys_settimeofday+0x83/0xb0
> [ 1084.723012]  [<ffffffff8160f129>] system_call_fastpath+0x16/0x1b
> [ 1084.729782] Code: f7 ff 15 95 89 b6 00 80 7d bf 00 0f 84 9c fe ff ff 41 f6 47
> 20 01 0f 84 91 fe ff ff 0f 1f 84 00 00 00 00 00 f3 90 41 f6 47 20 01 <75> f7 e9
> 7b fe ff ff 66 90 4c 89 e2 4c 89 ee 89 df e8 53 8b 21
>
> I'm taking a look now ... I'm not sure I believe the hrtimer_wakeup() calls on
> the stack.

Hrm.  Can you sysrq-t the box to see what the other cores are doing?

thanks
-john



* Re: [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3)
  2012-07-03 15:27   ` Prarit Bhargava
  2012-07-03 16:02     ` John Stultz
@ 2012-07-04  0:19     ` John Stultz
  1 sibling, 0 replies; 10+ messages in thread
From: John Stultz @ 2012-07-04  0:19 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: Linux Kernel, stable, Thomas Gleixner

On 07/03/2012 08:27 AM, Prarit Bhargava wrote:
> Thanks John -- I moved to using this for testing and hit the following
> softlockup when running latest + your patchset:
>
> [ 1084.433362] BUG: soft lockup - CPU#17 stuck for 22s! [leap-a-day:1275]
[snip]
> [ 1084.531860] RIP: 0010:[<ffffffff810b3d57>]  [<ffffffff810b3d57>]
> smp_call_function_many+0x1f7/0x260
[snip]
> [ 1084.663723] Call Trace:
> [ 1084.666466]  [<ffffffff8107e960>] ? hrtimer_wakeup+0x30/0x30
> [ 1084.672784]  [<ffffffff8107e960>] ? hrtimer_wakeup+0x30/0x30
> [ 1084.679107]  [<ffffffff810b3f12>] smp_call_function+0x22/0x30
> [ 1084.685530]  [<ffffffff810b3f78>] on_each_cpu+0x28/0x70
> [ 1084.691371]  [<ffffffff8107ec1c>] do_clock_was_set+0x1c/0x30
> [ 1084.697691]  [<ffffffff8107f005>] clock_was_set+0x55/0x60
> [ 1084.703732]  [<ffffffff810a6a23>] do_settimeofday+0xd3/0xe0
> [ 1084.709971]  [<ffffffff8105f4e5>] do_sys_settimeofday+0xb5/0x110
> [ 1084.716677]  [<ffffffff8105f5c3>] sys_settimeofday+0x83/0xb0
> [ 1084.723012]  [<ffffffff8160f129>] system_call_fastpath+0x16/0x1b
> [ 1084.729782] Code: f7 ff 15 95 89 b6 00 80 7d bf 00 0f 84 9c fe ff ff 41 f6 47
> 20 01 0f 84 91 fe ff ff 0f 1f 84 00 00 00 00 00 f3 90 41 f6 47 20 01 <75> f7 e9
> 7b fe ff ff 66 90 4c 89 e2 4c 89 ee 89 df e8 53 8b 21
>
> I'm taking a look now ... I'm not sure I believe the hrtimer_wakeup() calls on
> the stack.
I worked with Prarit and Thomas today to try to chase this down.

Prarit was also seeing "BUG at kernel/timer.c:1091!" problems, and once
he sent me his config I was able to reproduce the problem. Thomas
suggested enabling debugobjects and that quickly pointed out the
think-o: I had mistaken __hrtimer_init() for the hrtimer subsystem
initialization, rather than what gets called to initialize every hrtimer.
So when my patch initialized the clock_was_set_timer there, we ended up
potentially re-initializing that timer while it was enqueued, which can
cause the cpu it's enqueued on to lock up with irqs off, which then gums
up the smp_call_function().

The obvious fix is to initialize the clock_was_set_timer when we define it.
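
To illustrate the class of bug (this is only a userspace toy, not the
kernel code; struct toy_timer and every toy_* name below are made up
for the example), re-running an init routine on a node that is still
linked into a queue leaves the queue pointing at a node whose own links
have been wiped:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer; NOT the kernel's struct timer_list. */
struct toy_timer {
	struct toy_timer *next, *prev;
	int pending;
};

/* The queue head is a circular sentinel node. */
static void toy_queue_init(struct toy_timer *head)
{
	head->next = head->prev = head;
	head->pending = 0;
}

/* Per-timer init: wipes the links, so it must never run on an enqueued timer. */
static void toy_timer_init(struct toy_timer *t)
{
	t->next = t->prev = NULL;
	t->pending = 0;
}

static void toy_enqueue(struct toy_timer *head, struct toy_timer *t)
{
	assert(!t->pending);
	t->next = head->next;
	t->prev = head;
	head->next->prev = t;
	head->next = t;
	t->pending = 1;
}

static void toy_dequeue(struct toy_timer *t)
{
	t->prev->next = t->next;
	t->next->prev = t->prev;
	t->next = t->prev = NULL;
	t->pending = 0;
}

/* Walk the queue once; return 1 if every forward link has a matching back link. */
static int toy_queue_consistent(const struct toy_timer *head)
{
	const struct toy_timer *n = head;

	do {
		if (!n->next || n->next->prev != n)
			return 0;
		n = n->next;
	} while (n != head);
	return 1;
}

/* The think-o: the timer gets initialized again while it is still enqueued. */
int toy_reinit_while_enqueued(void)
{
	struct toy_timer head, t;

	toy_queue_init(&head);
	toy_timer_init(&t);
	toy_enqueue(&head, &t);
	toy_timer_init(&t);	/* wipes links the queue still relies on */
	return toy_queue_consistent(&head);
}

/* The fix in spirit: initialize exactly once, where the timer is defined. */
static struct toy_timer toy_clock_was_set_timer = { NULL, NULL, 0 };

int toy_init_at_definition(void)
{
	struct toy_timer head;
	int ok;

	toy_queue_init(&head);
	toy_enqueue(&head, &toy_clock_was_set_timer);
	ok = toy_queue_consistent(&head);
	toy_dequeue(&toy_clock_was_set_timer);
	return ok;
}
```

In the toy, toy_reinit_while_enqueued() reports an inconsistent queue,
while toy_init_at_definition() does not, since no init call can race
with an enqueued timer.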

Thanks to Prarit for testing and noticing the problem and to Thomas for
suggesting how to isolate it!

I'm going to continue testing for a bit longer and then will send out
the revised patchset. Hopefully I can collect some acks tomorrow and
get it merged later Thursday (I'd like Prarit to get a chance to test
the patch Thursday before pushing it).

thanks
-john


* [PATCH 0/3][RFC] Fix for leapsecond caused futex issue (v4)
@ 2012-07-04  6:21 John Stultz
  2012-07-04  6:21 ` [PATCH 2/3] [RFC] time: Fix leapsecond triggered hrtimer/futex load spike issue John Stultz
  0 siblings, 1 reply; 10+ messages in thread
From: John Stultz @ 2012-07-04  6:21 UTC (permalink / raw)
  To: Linux Kernel; +Cc: John Stultz, Prarit Bhargava, stable, Thomas Gleixner, linux

Ok, made a few tweaks to address issues caught by Prarit's and my
testing. This has run for a number of hours now w/ my leap-a-day.c
test on a few machines.

I'd really appreciate any extra testing, review, or acks at this point.
I'm targeting mid-late Thursday (to give folks in the US a chance to
review & test) as a point when I'll submit this upstream if no other
issues are found.

As widely reported on the internet, many Linux systems after
the leapsecond was inserted are experiencing futex related load
spikes (usually connected to MySQL, Firefox, Thunderbird, Java, etc).

An apparent workaround for this issue is running:
$ date -s "`date`"

Credit: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix

To address this issue I'm proposing we do three things:
1) Fix the clock_was_set() call to remove the limitation that kept
us from calling it from update_wall_time().

2) Call clock_was_set() when we add/remove a leapsecond.

3) Change hrtimer_interrupt to update the hrtimer base offset values.
This third item provides additional robustness should the
clock_was_set() notification (done via a timer if we're in_atomic)
be delayed significantly.

This third item is new and tries to better address the fact that
the hrtimer code caches its sense of time separately from the
timekeeping core. This is necessary for performance reasons, as
hrtimer code is a very hot path, but opens up races between when
the time offsets have changed and when the hrtimer code updates
its bases on each cpu. By updating the base offsets prior to
doing any expiration, we ensure no timers are expired early.
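
As a rough userspace sketch of that race (not the kernel code; the
toy_* types and functions are hypothetical and only model a cached
realtime offset), a base that still holds the pre-leapsecond offset
thinks CLOCK_REALTIME is one second ahead of where it really is, and so
expires a realtime timer early. Refreshing the cached offset before
checking expiry avoids that:

```c
#include <stdint.h>

typedef int64_t toy_ns_t;

#define TOY_NSEC_PER_SEC 1000000000LL

/* Authoritative offset: realtime = monotonic + offs_real. */
struct toy_timekeeper {
	toy_ns_t offs_real;
};

/* Cached copy of the offset, as a per-cpu timer base would keep it. */
struct toy_hrtimer_base {
	toy_ns_t offs_real;
};

/* Inserting a leapsecond steps realtime back by one second. */
static void toy_insert_leapsecond(struct toy_timekeeper *tk)
{
	tk->offs_real -= TOY_NSEC_PER_SEC;
}

/* What a clock_was_set()-style notification, or a refresh on each interrupt, does. */
static void toy_sync_base(struct toy_hrtimer_base *b,
			  const struct toy_timekeeper *tk)
{
	b->offs_real = tk->offs_real;
}

/* Would the base expire a realtime timer at monotonic time mono_now? */
static int toy_would_expire(const struct toy_hrtimer_base *b,
			    toy_ns_t mono_now, toy_ns_t expires_real)
{
	return mono_now + b->offs_real >= expires_real;
}

/*
 * Returns 1 if the timer is expired before true realtime reaches its
 * expiry, 0 otherwise. sync_before_expiry selects whether the cached
 * offset is refreshed prior to the expiry check.
 */
int toy_expires_early(int sync_before_expiry)
{
	struct toy_timekeeper tk = { 1000 * TOY_NSEC_PER_SEC };
	struct toy_hrtimer_base base;
	toy_ns_t mono_now = 50 * TOY_NSEC_PER_SEC;
	toy_ns_t expires_real;

	toy_sync_base(&base, &tk);	/* base starts out correct */
	toy_insert_leapsecond(&tk);	/* true realtime is now 1049s */

	/* Timer due half a second after the true current realtime. */
	expires_real = mono_now + tk.offs_real + TOY_NSEC_PER_SEC / 2;

	if (sync_before_expiry)
		toy_sync_base(&base, &tk);

	return toy_would_expire(&base, mono_now, expires_real) &&
	       mono_now + tk.offs_real < expires_real;
}
```

With the stale offset the toy timer fires half a second early; a caller
that then re-arms for the same absolute time would fire again
immediately, which is roughly the shape of the load spike.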

Close review, however, would be appreciated.

I'm fairly happy with this set of changes, so if there's no
objections, I'd propose merging these for 3.5, and I'll
start generating backports for -stable (unfortunately
these won't apply trivially to 3.3 and prior kernels).

I'm also looking to see if we can consolidate the per-cpu base
offset values, so they are not per-cpu and are protected by their
own lock, allowing us to update them quickly from atomic context, 
even while holding the timekeeper.lock (currently I believe there's
the risk of having an ABBA deadlock between the base.lock and the
timekeeper.lock if we try to update the base offsets under
the timekeeper lock). However this will be potentially a more
significant change and wouldn't be appropriate for backporting,
so I want to get these three changes to fix the issue merged first.

NOTE: Some reports have been of a hard hang right at or before
the leapsecond. I've not been able to reproduce or diagnose
this, so this fix likely does not address the reported hard
hangs (unless they end up being connected to the futex/hrtimer
issue). Please email lkml and me if you experienced this.

TODOs:
* Collect feedback & acks
* Submit for merging.
* Generate backports for pre-v3.4 kernels

v2:
* Address the issue w/ calling clock_was_set from atomic context,
pointed out by Prarit and Ben.
* Rework fix so it's simpler.

v3:
* Change from using a work item to a timer for scheduling the
do_clock_was_set() call sooner.
* Add hrtimer_interrupt base offset updating

v4:
* Fix clock_was_set_timer initialization bug found by Prarit
* Switch from in_atomic() to irqs_disabled(), since in_atomic()
  isn't a sufficient check prior to calling smp_call_function()

CC: Prarit Bhargava <prarit@redhat.com>
CC: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
CC: linux@openhuawei.org

John Stultz (3):
  [RFC] hrtimer: Fix clock_was_set so it is safe to call from irq
    context
  [RFC] time: Fix leapsecond triggered hrtimer/futex load spike issue
  [RFC] hrtimer: Update hrtimer base offsets each hrtimer_interrupt

 include/linux/hrtimer.h   |    3 +++
 kernel/hrtimer.c          |   31 +++++++++++++++++++++++++++----
 kernel/time/timekeeping.c |   38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 68 insertions(+), 4 deletions(-)

-- 
1.7.9.5


* [PATCH 2/3] [RFC] time: Fix leapsecond triggered hrtimer/futex load spike issue
  2012-07-04  6:21 [PATCH 0/3][RFC] Fix for leapsecond caused futex issue (v4) John Stultz
@ 2012-07-04  6:21 ` John Stultz
  2012-07-05 14:29   ` Prarit Bhargava
  0 siblings, 1 reply; 10+ messages in thread
From: John Stultz @ 2012-07-04  6:21 UTC (permalink / raw)
  To: Linux Kernel; +Cc: John Stultz, Prarit Bhargava, stable, Thomas Gleixner, linux

As widely reported on the internet, some Linux systems after
the leapsecond was inserted are experiencing futex related load
spikes (usually connected to MySQL, Firefox, Thunderbird, Java, etc).

An apparent workaround for this issue is running:
$ date -s "`date`"

Credit: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix

I believe this issue is due to the leapsecond being added without
calling clock_was_set() to notify the hrtimer subsystem of the
change.

The workaround functions as it forces a clock_was_set()
call from settimeofday().

This fix adds the required clock_was_set() calls to where
we adjust for leapseconds.

NOTE: This fix *depends* on the previous fix, which allows
clock_was_set to be called from atomic context. Do not try
to apply just this patch.

CC: Prarit Bhargava <prarit@redhat.com>
CC: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
CC: linux@openhuawei.org
Reported-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
---
 kernel/time/timekeeping.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6f46a00..cc2991d 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -963,6 +963,8 @@ static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
 		leap = second_overflow(timekeeper.xtime.tv_sec);
 		timekeeper.xtime.tv_sec += leap;
 		timekeeper.wall_to_monotonic.tv_sec -= leap;
+		if (leap)
+			clock_was_set();
 	}
 
 	/* Accumulate raw time */
@@ -1079,6 +1081,8 @@ static void update_wall_time(void)
 		leap = second_overflow(timekeeper.xtime.tv_sec);
 		timekeeper.xtime.tv_sec += leap;
 		timekeeper.wall_to_monotonic.tv_sec -= leap;
+		if (leap)
+			clock_was_set();
 	}
 
 	timekeeping_update(false);
-- 
1.7.9.5



* Re: [PATCH 2/3] [RFC] time: Fix leapsecond triggered hrtimer/futex load spike issue
  2012-07-04  6:21 ` [PATCH 2/3] [RFC] time: Fix leapsecond triggered hrtimer/futex load spike issue John Stultz
@ 2012-07-05 14:29   ` Prarit Bhargava
  0 siblings, 0 replies; 10+ messages in thread
From: Prarit Bhargava @ 2012-07-05 14:29 UTC (permalink / raw)
  To: John Stultz; +Cc: Linux Kernel, stable, Thomas Gleixner, linux



On 07/04/2012 02:21 AM, John Stultz wrote:
> As widely reported on the internet, some Linux systems after
> the leapsecond was inserted are experiencing futex related load
> spikes (usually connected to MySQL, Firefox, Thunderbird, Java, etc).
> 
> An apparent workaround for this issue is running:
> $ date -s "`date`"
> 
> Credit: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix
> 
> I believe this issue is due to the leapsecond being added without
> calling clock_was_set() to notify the hrtimer subsystem of the
> change.
> 
> The workaround functions as it forces a clock_was_set()
> call from settimeofday().
> 
> This fix adds the required clock_was_set() calls to where
> we adjust for leapseconds.
> 
> NOTE: This fix *depends* on the previous fix, which allows
> clock_was_set to be called from atomic context. Do not try
> to apply just this patch.
> 
> CC: Prarit Bhargava <prarit@redhat.com>
> CC: stable@vger.kernel.org
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: linux@openhuawei.org
> Reported-by: Jan Engelhardt <jengelh@inai.de>
> Signed-off-by: John Stultz <johnstul@us.ibm.com>


Acked-by: Prarit Bhargava <prarit@redhat.com>

P.


