stable.vger.kernel.org archive mirror
* [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3)
@ 2012-07-03  2:16 John Stultz
  2012-07-03  2:16 ` [PATCH 1/3] [RFC] hrtimer: Fix clock_was_set so it is safe to call from atomic John Stultz
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: John Stultz @ 2012-07-03  2:16 UTC
  To: Linux Kernel; +Cc: John Stultz, Prarit Bhargava, stable, Thomas Gleixner

As widely reported on the internet, many Linux systems after
the leapsecond was inserted are experiencing futex related load
spikes (usually connected to MySQL, Firefox, Thunderbird, Java, etc).

An apparent workaround for this issue is running:
$ date -s "`date`"

Credit: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix
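
For illustration, the workaround above amounts to setting CLOCK_REALTIME
to the value it already has. The likely reason it helps (an assumption,
not stated above) is that any settime call makes the kernel notify the
hrtimer code that the clock was set. A minimal C sketch of the same
operation follows; reset_clock_to_itself() is a made-up name, and the
call needs CAP_SYS_TIME (typically root) to succeed:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper (not part of the patch set): write CLOCK_REALTIME
 * back to itself, similar in effect to the date -s workaround.
 * Returns 0 on success or a negative errno value, for example -EPERM
 * when the caller lacks CAP_SYS_TIME.
 */
int reset_clock_to_itself(void)
{
	struct timespec now;

	if (clock_gettime(CLOCK_REALTIME, &now) != 0)
		return -errno;

	/* Sanity check: a valid timespec keeps tv_nsec below one second. */
	assert(now.tv_nsec >= 0 && now.tv_nsec < 1000000000L);

	/* Assumption: this write is what triggers the kernel-side update. */
	if (clock_settime(CLOCK_REALTIME, &now) != 0)
		return -errno;

	return 0;
}
```

Unlike the shell version, this does not round the time to whole seconds,
but the clock still moves back by the few microseconds that pass between
the read and the write.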


To address this issue I'm proposing we do three things:
1) Fix the clock_was_set() call to remove the limitation that kept
us from calling it from update_wall_time().

2) Call clock_was_set() when we add/remove a leapsecond.

3) Change hrtimer_interrupt to update the hrtimer base offset values.
This third item provides additional robustness should the
clock_was_set() notification (done via a timer if we're in_atomic)
be delayed significantly.


This third item is new and tries to better address the fact that
the hrtimer code caches its sense of time separately from the
timekeeping core. This is necessary for performance reasons, as
hrtimer code is a very hot path, but opens up races between when
the time offsets have changed and when the hrtimer code updates
its bases on each cpu. By updating the base offsets prior to
doing any expiration, we ensure no timers are expired early.
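
The race described above can be sketched with a small userspace model.
This is not kernel code: the structs, function names and numbers are all
made up for illustration. It only shows why an expiry check against a
stale cached offset fires a CLOCK_REALTIME timer early after a leap
second, and why re-reading the offset before the check avoids that:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Userspace model only; every name below is hypothetical, not kernel API. */

/* Authoritative clock state, standing in for the timekeeping core. */
struct tk_model {
	int64_t mono_ns;      /* monotonic time */
	int64_t offs_real_ns; /* realtime = monotonic + this offset */
};

/* One cpu's hrtimer base, holding a cached copy of the offset. */
struct base_model {
	int64_t cached_offs_real_ns;
};

/* Inserting a leap second steps realtime back by one second. */
static void insert_leap_second(struct tk_model *tk)
{
	tk->offs_real_ns -= NSEC_PER_SEC;
}

/*
 * Model of the expiry check for one absolute CLOCK_REALTIME timer.
 * With refresh_first set, the cached offset is re-read from the
 * timekeeper before the comparison (the idea behind item 3).
 */
static bool timer_would_expire(struct base_model *base,
			       const struct tk_model *tk,
			       int64_t expires_real_ns, bool refresh_first)
{
	if (refresh_first)
		base->cached_offs_real_ns = tk->offs_real_ns;

	return tk->mono_ns + base->cached_offs_real_ns >= expires_real_ns;
}

/* Returns true if the stale base fires early and the refreshed one does not. */
bool leap_second_demo(void)
{
	struct tk_model tk = { .mono_ns = 100 * NSEC_PER_SEC,
			       .offs_real_ns = 1000 * NSEC_PER_SEC };
	struct base_model base = { .cached_offs_real_ns = tk.offs_real_ns };
	/* Timer due half a second after the current realtime (1100.5s). */
	int64_t expires = tk.mono_ns + tk.offs_real_ns + NSEC_PER_SEC / 2;

	insert_leap_second(&tk);             /* true realtime is now 1099s */
	tk.mono_ns += 7 * NSEC_PER_SEC / 10; /* 0.7s later: realtime 1099.7s */

	bool stale = timer_would_expire(&base, &tk, expires, false);
	bool fresh = timer_would_expire(&base, &tk, expires, true);

	/* By the authoritative clock the timer is not due yet. */
	assert(tk.mono_ns + tk.offs_real_ns < expires);
	return stale && !fresh;
}
```

In the model the stale base believes it is 1100.7s and expires the timer,
while the real time is 1099.7s; a task that re-arms such a timer would
see it fire again immediately, which is one plausible reading of the
reported load spikes.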

Close review, however, would be appreciated.

I'm fairly happy with this set of changes, so if there are no
objections, I'd propose merging these for 3.5, and I'll
start generating backports for -stable (unfortunately
these won't apply trivially to 3.3 and prior kernels).

I'm also looking to see if we can consolidate the per-cpu base
offset values, so they are not per-cpu and are protected by their
own lock, allowing us to update them quickly from atomic context, 
even while holding the timekeeper.lock (currently I believe there's
the risk of having an ABBA deadlock between the base.lock and the
timekeeper.lock if we try to update the base offsets under
the timekeeper.lock). However this will be potentially a more
significant change and wouldn't be appropriate for backporting,
so I want to get these three changes to fix the issue merged first.
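
As a rough illustration of the ABBA concern, here is a single-threaded
userspace model of two lock holders that take the same two locks in
opposite order. The lock type, the helper names and the assignment of
paths to cpus are assumptions made for the sketch; the real call paths
in the hrtimer and timekeeping code are not reproduced here:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; all names are illustrative, not kernel API. */
enum { NOBODY = -1 };

struct lock_model {
	int owner; /* id of the cpu holding the lock, or NOBODY */
};

static bool try_lock(struct lock_model *l, int cpu)
{
	if (l->owner != NOBODY)
		return false; /* a real spinlock would spin here */
	l->owner = cpu;
	return true;
}

static void unlock(struct lock_model *l, int cpu)
{
	assert(l->owner == cpu);
	l->owner = NOBODY;
}

/* Returns true if both cpus end up waiting on each other (ABBA). */
bool abba_demo(void)
{
	struct lock_model base_lock = { NOBODY };
	struct lock_model timekeeper_lock = { NOBODY };
	bool got;

	/* cpu 0: takes base.lock first, will want timekeeper.lock next. */
	got = try_lock(&base_lock, 0);
	assert(got);

	/* cpu 1: takes timekeeper.lock first, will want base.lock next. */
	got = try_lock(&timekeeper_lock, 1);
	assert(got);

	/* Each cpu now needs the lock the other one holds. */
	bool cpu0_stuck = !try_lock(&timekeeper_lock, 0);
	bool cpu1_stuck = !try_lock(&base_lock, 1);
	bool deadlock = cpu0_stuck && cpu1_stuck;

	unlock(&base_lock, 0);
	unlock(&timekeeper_lock, 1);
	return deadlock;
}
```

Giving the offsets their own lock, as proposed above, would avoid this
shape because the offset update would no longer need base.lock while
timekeeper.lock is held.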


NOTE: There have been some reports of a hard hang right at or before
the leapsecond. I've not been able to reproduce or diagnose
this, so this fix likely does not address the reported hard
hangs (unless they end up being connected to the futex/hrtimer
issue). Please email lkml and me if you experienced this.


TODOs:
* Collect feedback & acks
* Submit for merging.
* Generate backports for pre-v3.4 kernels


v2:
* Address the issue w/ calling clock_was_set from atomic context,
pointed out by Prarit and Ben.
* Rework fix so it's simpler.

v3:
* Change from using a work item to a timer for scheduling the
do_clock_was_set() call sooner.
* Add hrtimer_interrupt base offset updating



CC: Prarit Bhargava <prarit@redhat.com>
CC: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>

John Stultz (3):
  [RFC] hrtimer: Fix clock_was_set so it is safe to call from atomic
  [RFC] time: Fix leapsecond triggered hrtimer/futex load spike issue
  [RFC] hrtimer: Update hrtimer base offsets each hrtimer_interrupt

 include/linux/hrtimer.h   |    3 +++
 kernel/hrtimer.c          |   33 +++++++++++++++++++++++++++++----
 kernel/time/timekeeping.c |   39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 71 insertions(+), 4 deletions(-)

-- 
1.7.9.5


* [PATCH 0/3][RFC] Fix for leapsecond caused futex issue (v4)
@ 2012-07-04  6:21 John Stultz
  2012-07-04  6:21 ` [PATCH 2/3] [RFC] time: Fix leapsecond triggered hrtimer/futex load spike issue John Stultz
  0 siblings, 1 reply; 10+ messages in thread
From: John Stultz @ 2012-07-04  6:21 UTC
  To: Linux Kernel; +Cc: John Stultz, Prarit Bhargava, stable, Thomas Gleixner, linux

Ok, made a few tweaks to address issues caught by Prarit's and my
testing. This has run for a number of hours now w/ my leap-a-day.c
test on a few machines.

I'd really appreciate any extra testing, review, or acks at this point.
I'm targeting mid-late Thursday (to give folks in the US a chance to
review & test) as a point when I'll submit this upstream if no other
issues are found.


As widely reported on the internet, many Linux systems after
the leapsecond was inserted are experiencing futex related load
spikes (usually connected to MySQL, Firefox, Thunderbird, Java, etc).

An apparent workaround for this issue is running:
$ date -s "`date`"

Credit: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix


To address this issue I'm proposing we do three things:
1) Fix the clock_was_set() call to remove the limitation that kept
us from calling it from update_wall_time().

2) Call clock_was_set() when we add/remove a leapsecond.

3) Change hrtimer_interrupt to update the hrtimer base offset values.
This third item provides additional robustness should the
clock_was_set() notification (done via a timer if we're in_atomic)
be delayed significantly.


This third item is new and tries to better address the fact that
the hrtimer code caches its sense of time separately from the
timekeeping core. This is necessary for performance reasons, as
hrtimer code is a very hot path, but opens up races between when
the time offsets have changed and when the hrtimer code updates
its bases on each cpu. By updating the base offsets prior to
doing any expiration, we ensure no timers are expired early.

Close review, however, would be appreciated.

I'm fairly happy with this set of changes, so if there are no
objections, I'd propose merging these for 3.5, and I'll
start generating backports for -stable (unfortunately
these won't apply trivially to 3.3 and prior kernels).

I'm also looking to see if we can consolidate the per-cpu base
offset values, so they are not per-cpu and are protected by their
own lock, allowing us to update them quickly from atomic context, 
even while holding the timekeeper.lock (currently I believe there's
the risk of having an ABBA deadlock between the base.lock and the
timekeeper.lock if we try to update the base offsets under
the timekeeper.lock). However this will be potentially a more
significant change and wouldn't be appropriate for backporting,
so I want to get these three changes to fix the issue merged first.


NOTE: There have been some reports of a hard hang right at or before
the leapsecond. I've not been able to reproduce or diagnose
this, so this fix likely does not address the reported hard
hangs (unless they end up being connected to the futex/hrtimer
issue). Please email lkml and me if you experienced this.


TODOs:
* Collect feedback & acks
* Submit for merging.
* Generate backports for pre-v3.4 kernels


v2:
* Address the issue w/ calling clock_was_set from atomic context,
pointed out by Prarit and Ben.
* Rework fix so it's simpler.

v3:
* Change from using a work item to a timer for scheduling the
do_clock_was_set() call sooner.
* Add hrtimer_interrupt base offset updating

v4:
* Fix clock_was_set_timer initialization bug found by Prarit
* Switch from in_atomic() to irqs_disabled(), since in_atomic()
  isn't a sufficient check prior to calling smp_call_function()
  

CC: Prarit Bhargava <prarit@redhat.com>
CC: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
CC: linux@openhuawei.org

John Stultz (3):
  [RFC] hrtimer: Fix clock_was_set so it is safe to call from irq
    context
  [RFC] time: Fix leapsecond triggered hrtimer/futex load spike issue
  [RFC] hrtimer: Update hrtimer base offsets each hrtimer_interrupt

 include/linux/hrtimer.h   |    3 +++
 kernel/hrtimer.c          |   31 +++++++++++++++++++++++++++----
 kernel/time/timekeeping.c |   38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 68 insertions(+), 4 deletions(-)

-- 
1.7.9.5



end of thread, other threads:[~2012-07-05 14:29 UTC | newest]

Thread overview: 10+ messages
2012-07-03  2:16 [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3) John Stultz
2012-07-03  2:16 ` [PATCH 1/3] [RFC] hrtimer: Fix clock_was_set so it is safe to call from atomic John Stultz
2012-07-03  2:16 ` [PATCH 2/3] [RFC] time: Fix leapsecond triggered hrtimer/futex load spike issue John Stultz
2012-07-03  2:16 ` [PATCH 3/3] [RFC] hrtimer: Update hrtimer base offsets each hrtimer_interrupt John Stultz
2012-07-03  6:09 ` [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3) John Stultz
2012-07-03 15:27   ` Prarit Bhargava
2012-07-03 16:02     ` John Stultz
2012-07-04  0:19     ` John Stultz
  -- strict thread matches above, loose matches on Subject: below --
2012-07-04  6:21 [PATCH 0/3][RFC] Fix for leapsecond caused futex issue (v4) John Stultz
2012-07-04  6:21 ` [PATCH 2/3] [RFC] time: Fix leapsecond triggered hrtimer/futex load spike issue John Stultz
2012-07-05 14:29   ` Prarit Bhargava
