stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Stultz <johnstul@us.ibm.com>
To: Prarit Bhargava <prarit@redhat.com>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>,
	stable@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3)
Date: Tue, 03 Jul 2012 17:19:54 -0700	[thread overview]
Message-ID: <4FF38C2A.9080301@us.ibm.com> (raw)
In-Reply-To: <4FF30F48.3030702@redhat.com>

On 07/03/2012 08:27 AM, Prarit Bhargava wrote:
> Thanks John -- I moved to using this for testing and hit the following
> softlockup when running latest + your patchset:
>
> [ 1084.433362] BUG: soft lockup - CPU#17 stuck for 22s! [leap-a-day:1275]^M
[snip]
> [ 1084.531860] RIP: 0010:[<ffffffff810b3d57>]  [<ffffffff810b3d57>]
> smp_call_function_many+0x1f7/0x260^M
[snip]
> [ 1084.663723] Call Trace:^M
> [ 1084.666466]  [<ffffffff8107e960>] ? hrtimer_wakeup+0x30/0x30^M
> [ 1084.672784]  [<ffffffff8107e960>] ? hrtimer_wakeup+0x30/0x30^M
> [ 1084.679107]  [<ffffffff810b3f12>] smp_call_function+0x22/0x30^M
> [ 1084.685530]  [<ffffffff810b3f78>] on_each_cpu+0x28/0x70^M
> [ 1084.691371]  [<ffffffff8107ec1c>] do_clock_was_set+0x1c/0x30^M
> [ 1084.697691]  [<ffffffff8107f005>] clock_was_set+0x55/0x60^M
> [ 1084.703732]  [<ffffffff810a6a23>] do_settimeofday+0xd3/0xe0^M
> [ 1084.709971]  [<ffffffff8105f4e5>] do_sys_settimeofday+0xb5/0x110^M
> [ 1084.716677]  [<ffffffff8105f5c3>] sys_settimeofday+0x83/0xb0^M
> [ 1084.723012]  [<ffffffff8160f129>] system_call_fastpath+0x16/0x1b^M
> [ 1084.729782] Code: f7 ff 15 95 89 b6 00 80 7d bf 00 0f 84 9c fe ff ff 41 f6 47
> 20 01 0f 84 91 fe ff ff 0f 1f 84 00 00 00 00 00 f3 90 41 f6 47 20 01 <75> f7 e9
> 7b fe ff ff 66 90 4c 89 e2 4c 89 ee 89 df e8 53 8b 21 ^M
>
> I'm taking a look now ... I'm not sure I believe the hrtimer_wakeup() calls on
> the stack.
I worked with Prarit and Thomas today to try to chase this down.

Prarit was also seeing "BUG at kernel/timer.c:1091!" problems, and once 
he sent me his config I was able to reproduce the problem. Thomas 
suggested enabling debugobjects and that quickly pointed out the 
think-o: I had mistook __hrtimer_init() as the hrtimer subsystem 
initialization, rather then what gets to initialize every hrtimer. So 
when in my patch I initialized the clock_was_set_timer there, we end up 
potentially re-initializing that timer while it is enqueued, which can 
cause the cpu its enqueued on to lockup with irqs off, which then gums 
up the smp_call_function().

The obvious fix is to initialize the clock_was_set_timer when we define it.

Thanks for Prarit for testing and noticing the problem and Thomas for 
suggesting how to isolate it!

I'm going to continue testing for a bit longer and then will send out 
the revised patchset. Hopefully I can collect some acks tomorrow and 
hopefully try to get it merged later Thursday  (I'd like for Prarit to 
get a chance to test the patch thurs before pushing it).

thanks
-john


      parent reply	other threads:[~2012-07-04  0:19 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-03  2:16 [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3) John Stultz
2012-07-03  2:16 ` [PATCH 1/3] [RFC] hrtimer: Fix clock_was_set so it is safe to call from atomic John Stultz
2012-07-03  2:16 ` [PATCH 2/3] [RFC] time: Fix leapsecond triggered hrtimer/futex load spike issue John Stultz
2012-07-03  2:16 ` [PATCH 3/3] [RFC] hrtimer: Update hrtimer base offsets each hrtimer_interrupt John Stultz
2012-07-03  6:09 ` [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3) John Stultz
2012-07-03 15:27   ` Prarit Bhargava
2012-07-03 16:02     ` John Stultz
2012-07-04  0:19     ` John Stultz [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FF38C2A.9080301@us.ibm.com \
    --to=johnstul@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=prarit@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).