All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg KH <gregkh@linuxfoundation.org>,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk, Sasha Levin <levinsasha928@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Prarit Bhargava <prarit@redhat.com>,
	John Stultz <john.stultz@linaro.org>
Subject: [ 13/23] ntp: Fix leap-second hrtimer livelock
Date: Tue, 17 Jul 2012 17:12:06 -0700	[thread overview]
Message-ID: <20120717232330.302715762@linuxfoundation.org> (raw)
In-Reply-To: <20120717232329.276003806@linuxfoundation.org>

From: Greg KH <gregkh@linuxfoundation.org>

3.0-stable review patch.  If anyone has any objections, please let me know.

------------------


From: John Stultz <john.stultz@linaro.org>

This is a backport of 6b43ae8a619d17c4935c3320d2ef9e92bdeed05d

This should have been backported when it was commited, but I
mistook the problem as requiring the ntp_lock changes
that landed in 3.4 in order for it to occur.

Unfortunately the same issue can happen (with only one cpu)
as follows:
do_adjtimex()
 write_seqlock_irq(&xtime_lock);
  process_adjtimex_modes()
   process_adj_status()
    ntp_start_leap_timer()
     hrtimer_start()
      hrtimer_reprogram()
       tick_program_event()
        clockevents_program_event()
         ktime_get()
          seq = req_seqbegin(xtime_lock); [DEADLOCK]

This deadlock will no always occur, as it requires the
leap_timer to force a hrtimer_reprogram which only happens
if its set and there's no sooner timer to expire.

NOTE: This patch, being faithful to the original commit,
introduces a bug (we don't update wall_to_monotonic),
which will be resovled by backporting a following fix.

Original commit message below:

Since commit 7dffa3c673fbcf835cd7be80bb4aec8ad3f51168 the ntp
subsystem has used an hrtimer for triggering the leapsecond
adjustment. However, this can cause a potential livelock.

Thomas diagnosed this as the following pattern:
CPU 0                                                    CPU 1
do_adjtimex()
  spin_lock_irq(&ntp_lock);
    process_adjtimex_modes();				 timer_interrupt()
      process_adj_status();                                do_timer()
        ntp_start_leap_timer();                             write_lock(&xtime_lock);
          hrtimer_start();                                  update_wall_time();
             hrtimer_reprogram();                            ntp_tick_length()
               tick_program_event()                            spin_lock(&ntp_lock);
                 clockevents_program_event()
		   ktime_get()
                     seq = req_seqbegin(xtime_lock);

This patch tries to avoid the problem by reverting back to not using
an hrtimer to inject leapseconds, and instead we handle the leapsecond
processing in the second_overflow() function.

The downside to this change is that on systems that support highres
timers, the leap second processing will occur on a HZ tick boundary,
(ie: ~1-10ms, depending on HZ)  after the leap second instead of
possibly sooner (~34us in my tests w/ x86_64 lapic).

This patch applies on top of tip/timers/core.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Diagnoised-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/timex.h     |    2 
 kernel/time/ntp.c         |  122 +++++++++++++++-------------------------------
 kernel/time/timekeeping.c |   18 ++----
 3 files changed, 48 insertions(+), 94 deletions(-)

--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -266,7 +266,7 @@ static inline int ntp_synced(void)
 /* Returns how long ticks are at present, in ns / 2^NTP_SCALE_SHIFT. */
 extern u64 tick_length;
 
-extern void second_overflow(void);
+extern int second_overflow(unsigned long secs);
 extern void update_ntp_one_tick(void);
 extern int do_adjtimex(struct timex *);
 extern void hardpps(const struct timespec *, const struct timespec *);
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -31,8 +31,6 @@ unsigned long			tick_nsec;
 u64				tick_length;
 static u64			tick_length_base;
 
-static struct hrtimer		leap_timer;
-
 #define MAX_TICKADJ		500LL		/* usecs */
 #define MAX_TICKADJ_SCALED \
 	(((MAX_TICKADJ * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ)
@@ -350,60 +348,60 @@ void ntp_clear(void)
 }
 
 /*
- * Leap second processing. If in leap-insert state at the end of the
- * day, the system clock is set back one second; if in leap-delete
- * state, the system clock is set ahead one second.
+ * this routine handles the overflow of the microsecond field
+ *
+ * The tricky bits of code to handle the accurate clock support
+ * were provided by Dave Mills (Mills@UDEL.EDU) of NTP fame.
+ * They were originally developed for SUN and DEC kernels.
+ * All the kudos should go to Dave for this stuff.
+ *
+ * Also handles leap second processing, and returns leap offset
  */
-static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer)
+int second_overflow(unsigned long secs)
 {
-	enum hrtimer_restart res = HRTIMER_NORESTART;
-
-	write_seqlock(&xtime_lock);
+	int leap = 0;
+	s64 delta;
 
+	/*
+	 * Leap second processing. If in leap-insert state at the end of the
+	 * day, the system clock is set back one second; if in leap-delete
+	 * state, the system clock is set ahead one second.
+	 */
 	switch (time_state) {
 	case TIME_OK:
+		if (time_status & STA_INS)
+			time_state = TIME_INS;
+		else if (time_status & STA_DEL)
+			time_state = TIME_DEL;
 		break;
 	case TIME_INS:
-		timekeeping_leap_insert(-1);
-		time_state = TIME_OOP;
-		printk(KERN_NOTICE
-			"Clock: inserting leap second 23:59:60 UTC\n");
-		hrtimer_add_expires_ns(&leap_timer, NSEC_PER_SEC);
-		res = HRTIMER_RESTART;
+		if (secs % 86400 == 0) {
+			leap = -1;
+			time_state = TIME_OOP;
+			printk(KERN_NOTICE
+				"Clock: inserting leap second 23:59:60 UTC\n");
+		}
 		break;
 	case TIME_DEL:
-		timekeeping_leap_insert(1);
-		time_tai--;
-		time_state = TIME_WAIT;
-		printk(KERN_NOTICE
-			"Clock: deleting leap second 23:59:59 UTC\n");
+		if ((secs + 1) % 86400 == 0) {
+			leap = 1;
+			time_tai--;
+			time_state = TIME_WAIT;
+			printk(KERN_NOTICE
+				"Clock: deleting leap second 23:59:59 UTC\n");
+		}
 		break;
 	case TIME_OOP:
 		time_tai++;
 		time_state = TIME_WAIT;
-		/* fall through */
+		break;
+
 	case TIME_WAIT:
 		if (!(time_status & (STA_INS | STA_DEL)))
 			time_state = TIME_OK;
 		break;
 	}
 
-	write_sequnlock(&xtime_lock);
-
-	return res;
-}
-
-/*
- * this routine handles the overflow of the microsecond field
- *
- * The tricky bits of code to handle the accurate clock support
- * were provided by Dave Mills (Mills@UDEL.EDU) of NTP fame.
- * They were originally developed for SUN and DEC kernels.
- * All the kudos should go to Dave for this stuff.
- */
-void second_overflow(void)
-{
-	s64 delta;
 
 	/* Bump the maxerror field */
 	time_maxerror += MAXFREQ / NSEC_PER_USEC;
@@ -423,23 +421,25 @@ void second_overflow(void)
 	pps_dec_valid();
 
 	if (!time_adjust)
-		return;
+		goto out;
 
 	if (time_adjust > MAX_TICKADJ) {
 		time_adjust -= MAX_TICKADJ;
 		tick_length += MAX_TICKADJ_SCALED;
-		return;
+		goto out;
 	}
 
 	if (time_adjust < -MAX_TICKADJ) {
 		time_adjust += MAX_TICKADJ;
 		tick_length -= MAX_TICKADJ_SCALED;
-		return;
+		goto out;
 	}
 
 	tick_length += (s64)(time_adjust * NSEC_PER_USEC / NTP_INTERVAL_FREQ)
 							 << NTP_SCALE_SHIFT;
 	time_adjust = 0;
+out:
+	return leap;
 }
 
 #ifdef CONFIG_GENERIC_CMOS_UPDATE
@@ -501,27 +501,6 @@ static void notify_cmos_timer(void)
 static inline void notify_cmos_timer(void) { }
 #endif
 
-/*
- * Start the leap seconds timer:
- */
-static inline void ntp_start_leap_timer(struct timespec *ts)
-{
-	long now = ts->tv_sec;
-
-	if (time_status & STA_INS) {
-		time_state = TIME_INS;
-		now += 86400 - now % 86400;
-		hrtimer_start(&leap_timer, ktime_set(now, 0), HRTIMER_MODE_ABS);
-
-		return;
-	}
-
-	if (time_status & STA_DEL) {
-		time_state = TIME_DEL;
-		now += 86400 - (now + 1) % 86400;
-		hrtimer_start(&leap_timer, ktime_set(now, 0), HRTIMER_MODE_ABS);
-	}
-}
 
 /*
  * Propagate a new txc->status value into the NTP state:
@@ -546,22 +525,6 @@ static inline void process_adj_status(st
 	time_status &= STA_RONLY;
 	time_status |= txc->status & ~STA_RONLY;
 
-	switch (time_state) {
-	case TIME_OK:
-		ntp_start_leap_timer(ts);
-		break;
-	case TIME_INS:
-	case TIME_DEL:
-		time_state = TIME_OK;
-		ntp_start_leap_timer(ts);
-	case TIME_WAIT:
-		if (!(time_status & (STA_INS | STA_DEL)))
-			time_state = TIME_OK;
-		break;
-	case TIME_OOP:
-		hrtimer_restart(&leap_timer);
-		break;
-	}
 }
 /*
  * Called with the xtime lock held, so we can access and modify
@@ -643,9 +606,6 @@ int do_adjtimex(struct timex *txc)
 		    (txc->tick <  900000/USER_HZ ||
 		     txc->tick > 1100000/USER_HZ))
 			return -EINVAL;
-
-		if (txc->modes & ADJ_STATUS && time_state != TIME_OK)
-			hrtimer_cancel(&leap_timer);
 	}
 
 	if (txc->modes & ADJ_SETOFFSET) {
@@ -967,6 +927,4 @@ __setup("ntp_tick_adj=", ntp_tick_adj_se
 void __init ntp_init(void)
 {
 	ntp_clear();
-	hrtimer_init(&leap_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
-	leap_timer.function = ntp_leap_second;
 }
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -169,15 +169,6 @@ static struct timespec raw_time;
 /* flag for if timekeeping is suspended */
 int __read_mostly timekeeping_suspended;
 
-/* must hold xtime_lock */
-void timekeeping_leap_insert(int leapsecond)
-{
-	xtime.tv_sec += leapsecond;
-	wall_to_monotonic.tv_sec -= leapsecond;
-	update_vsyscall(&xtime, &wall_to_monotonic, timekeeper.clock,
-			timekeeper.mult);
-}
-
 /**
  * timekeeping_forward_now - update clock to the current time
  *
@@ -828,9 +819,11 @@ static cycle_t logarithmic_accumulation(
 
 	timekeeper.xtime_nsec += timekeeper.xtime_interval << shift;
 	while (timekeeper.xtime_nsec >= nsecps) {
+		int leap;
 		timekeeper.xtime_nsec -= nsecps;
 		xtime.tv_sec++;
-		second_overflow();
+		leap = second_overflow(xtime.tv_sec);
+		xtime.tv_sec += leap;
 	}
 
 	/* Accumulate raw time */
@@ -936,9 +929,12 @@ static void update_wall_time(void)
 	 * xtime.tv_nsec isn't larger then NSEC_PER_SEC
 	 */
 	if (unlikely(xtime.tv_nsec >= NSEC_PER_SEC)) {
+		int leap;
 		xtime.tv_nsec -= NSEC_PER_SEC;
 		xtime.tv_sec++;
-		second_overflow();
+		leap = second_overflow(xtime.tv_sec);
+		xtime.tv_sec += leap;
+
 	}
 
 	/* check to see if there is a new clocksource to use */



  parent reply	other threads:[~2012-07-18  0:13 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-17 23:24 [ 00/23] 3.0.38-stable review Greg KH
2012-07-18  0:02 ` [ 01/37] Remove easily user-triggerable BUG from generic_setlease Greg Kroah-Hartman
2012-07-18  0:02   ` [ 02/37] media: cx231xx: dont DMA to random addresses Greg Kroah-Hartman
2012-07-18  0:02   ` [ 03/37] hwmon: (it87) Preserve configuration register bits on init Greg Kroah-Hartman
2012-07-18  0:02   ` [ 04/37] block: fix infinite loop in __getblk_slow Greg Kroah-Hartman
2012-07-18  0:02   ` [ 05/37] media: dvb-core: Release semaphore on error path dvb_register_device() Greg Kroah-Hartman
2012-07-18  0:02   ` [ 06/37] mtd: nandsim: dont open code a do_div helper Greg Kroah-Hartman
2012-07-18  0:02   ` [ 07/37] md/raid1: fix use-after-free bug in RAID1 data-check code Greg Kroah-Hartman
2012-07-18  0:02   ` [ 08/37] ARM: SAMSUNG: fix race in s3c_adc_start for ADC Greg Kroah-Hartman
2012-07-18  0:02   ` [ 09/37] ACPICA: Fix possible fault in return package object repair code Greg Kroah-Hartman
2012-07-18  0:02   ` [ 10/37] cpufreq / ACPI: Fix not loading acpi-cpufreq driver regression Greg Kroah-Hartman
2012-07-18  0:02   ` [ 11/37] sched/nohz: Rewrite and fix load-avg computation -- again Greg Kroah-Hartman
2012-07-18  0:16     ` Jonathan Nieder
2012-07-20 17:04       ` Peter Zijlstra
2012-07-20 17:13         ` Jonathan Nieder
2012-07-20 17:25           ` Peter Zijlstra
2012-07-21 16:02             ` Doug Smythies
2012-07-18  0:02   ` [ 12/37] intel_ips: blacklist HP ProBook laptops Greg Kroah-Hartman
2012-07-18  0:02   ` [ 13/37] fifo: Do not restart open() if it already found a partner Greg Kroah-Hartman
2012-07-18  0:02   ` [ 14/37] rt2x00usb: fix indexes ordering on RX queue kick Greg Kroah-Hartman
2012-07-18  0:02   ` [ 15/37] e1000e: Correct link check logic for 82571 serdes Greg Kroah-Hartman
2012-07-18  0:02   ` [ 16/37] iwlegacy: always monitor for stuck queues Greg Kroah-Hartman
2012-07-18  0:02   ` [ 17/37] iwlegacy: dont mess up the SCD when removing a key Greg Kroah-Hartman
2012-07-18  0:02   ` [ 18/37] rpmsg: fix dependency on initialization order Greg Kroah-Hartman
2012-07-18  0:02   ` [ 19/37] mac80211: destroy assoc_data correctly if assoc fails Greg Kroah-Hartman
2012-07-18  0:02   ` [ 20/37] stmmac: Fix for nfs hang on multiple reboot Greg Kroah-Hartman
2012-07-18  0:02   ` [ 21/37] bonding: debugfs and network namespaces are incompatible Greg Kroah-Hartman
2012-07-18  0:02   ` [ 22/37] bonding: Manage /proc/net/bonding/ entries from the netdev events Greg Kroah-Hartman
2012-07-18  0:03   ` [ 23/37] Input: bcm5974 - Add support for 2012 MacBook Pro Retina Greg Kroah-Hartman
2012-07-18  0:03   ` [ 24/37] Input: xpad - handle all variations of Mad Catz Beat Pad Greg Kroah-Hartman
2012-07-18  0:03   ` [ 25/37] Input: xpad - add signature for Razer Onza Tournament Edition Greg Kroah-Hartman
2012-07-18  0:03   ` [ 26/37] Input: xpad - add Andamiro Pump It Up pad Greg Kroah-Hartman
2012-07-18  0:03   ` [ 27/37] HID: add support for 2012 MacBook Pro Retina Greg Kroah-Hartman
2012-07-18  0:03   ` [ 28/37] clk: Check parent for NULL in clk_change_rate Greg Kroah-Hartman
2012-07-18  0:03   ` [ 29/37] cfg80211: check iface combinations only when iface is running Greg Kroah-Hartman
2012-07-18  0:03   ` [ 30/37] hrtimer: Provide clock_was_set_delayed() Greg Kroah-Hartman
2012-07-18  0:03   ` [ 31/37] timekeeping: Fix leapsecond triggered load spike issue Greg Kroah-Hartman
2012-07-18  0:03   ` [ 32/37] timekeeping: Maintain ktime_t based offsets for hrtimers Greg Kroah-Hartman
2012-07-18  0:03   ` [ 33/37] hrtimers: Move lock held region in hrtimer_interrupt() Greg Kroah-Hartman
2012-07-18  0:03   ` [ 34/37] timekeeping: Provide hrtimer update function Greg Kroah-Hartman
2012-07-18  0:03   ` [ 35/37] hrtimer: Update hrtimer base offsets each hrtimer_interrupt Greg Kroah-Hartman
2012-07-18  0:03   ` [ 37/37] NFC: Export nfc.h to userland Greg Kroah-Hartman
2012-07-18  0:14   ` [ 01/37] Remove easily user-triggerable BUG from generic_setlease Greg KH
2012-07-18 13:36     ` Nick Bowler
2012-07-18 17:58     ` formail doing weird things (was: [ 01/37] Remove easily user-triggerable BUG from generic_setlease) Roland Eggner
2012-07-18  0:11 ` [ 01/23] hwmon: (it87) Preserve configuration register bits on init Greg Kroah-Hartman
2012-07-18  0:11   ` [ 02/23] block: fix infinite loop in __getblk_slow Greg Kroah-Hartman
2012-07-18  0:11   ` [ 03/23] media: dvb-core: Release semaphore on error path dvb_register_device() Greg Kroah-Hartman
2012-07-18  0:11   ` [ 04/23] mtd: nandsim: dont open code a do_div helper Greg Kroah-Hartman
2012-07-18  0:11   ` [ 05/23] ARM: SAMSUNG: fix race in s3c_adc_start for ADC Greg Kroah-Hartman
2012-07-18  0:11   ` [ 06/23] intel_ips: blacklist HP ProBook laptops Greg Kroah-Hartman
2012-07-18  0:12   ` [ 07/23] fifo: Do not restart open() if it already found a partner Greg Kroah-Hartman
2012-07-18  0:12   ` [ 08/23] rt2x00usb: fix indexes ordering on RX queue kick Greg Kroah-Hartman
2012-07-18  0:12   ` [ 09/23] e1000e: Correct link check logic for 82571 serdes Greg Kroah-Hartman
2012-07-18  0:12   ` [ 10/23] Input: xpad - add Andamiro Pump It Up pad Greg Kroah-Hartman
2012-07-18  0:12   ` [ 11/23] tcp: drop SYN+FIN messages Greg Kroah-Hartman
2012-07-18  0:12   ` [ 12/23] cfg80211: check iface combinations only when iface is running Greg Kroah-Hartman
2012-07-18  0:12   ` Greg Kroah-Hartman [this message]
2012-07-18  0:12   ` [ 14/23] ntp: Correct TAI offset during leap second Greg Kroah-Hartman
2012-07-18  0:12   ` [ 15/23] timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond Greg Kroah-Hartman
2012-07-18  0:12   ` [ 16/23] time: Move common updates to a function Greg Kroah-Hartman
2012-07-18  0:12   ` [ 17/23] hrtimer: Provide clock_was_set_delayed() Greg Kroah-Hartman
2012-07-18  0:12   ` [ 18/23] timekeeping: Fix leapsecond triggered load spike issue Greg Kroah-Hartman
2012-07-18  0:12   ` [ 19/23] timekeeping: Maintain ktime_t based offsets for hrtimers Greg Kroah-Hartman
2012-07-18  0:12   ` [ 20/23] hrtimers: Move lock held region in hrtimer_interrupt() Greg Kroah-Hartman
2012-07-18  0:12   ` [ 21/23] timekeeping: Provide hrtimer update function Greg Kroah-Hartman
2012-07-18  0:12   ` [ 22/23] hrtimer: Update hrtimer base offsets each hrtimer_interrupt Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120717232330.302715762@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=john.stultz@linaro.org \
    --cc=levinsasha928@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=prarit@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.