linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Stultz <john.stultz@linaro.org>
To: "Falauto, Gerlando" <Gerlando.Falauto@keymile.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Richard Cochran <richardcochran@gmail.com>,
	Prarit Bhargava <prarit@redhat.com>,
	"Brunck, Holger" <Holger.Brunck@keymile.com>,
	"Longchamp, Valentin" <Valentin.Longchamp@keymile.com>,
	"Bigler, Stefan" <Stefan.Bigler@keymile.com>
Subject: Re: kernel deadlock
Date: Thu, 29 Aug 2013 16:45:22 -0700	[thread overview]
Message-ID: <521FDD12.7050000@linaro.org> (raw)
In-Reply-To: <E92455D61233364C88CCD4A6DAB2E2850E09518110@SRVDEHAN1MX1.keymile.net>

On 08/29/2013 01:56 PM, Falauto, Gerlando wrote:
> Hi everyone,
>
> I ran into the deadlock situation reported at the bottom.
> Actually, on my latest 3.10 kernel for some reason I don't get the
> report (the kernel just hangs for some reason), so it took me quite some
> time to track it down.
>
> Once I figured the trigger to the machine hanging was adjtimex(), I
> reverted everything (between 3.9 to 3.10) that was touching
> kernel/time/timekeeping/timekeeping.c and kernel/time/ntp.c, I double
> checked that indeed the problem was not happening anymore, and finally
> started bisecting, landing on the following offending commit.
> THEN, and ONLY THEN, did I get the &%""ç+"% deadlock report.
>
> Do you guys have any ideas what could be wrong and how to fix it?

Thanks for the report!

What exactly is your process for reproducing the issue?


> [ INFO: inconsistent lock state ]
> 3.10.0-04864-g346ecc9-dirty #16 Not tainted
> ---------------------------------
> inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> SAKEY/738 [HC0[0]:SC0[0]:HE1:SE1] takes:
>   (timekeeper_lock){?.-...}, at: [<c004a3e4>] do_adjtimex+0x64/0xbc
> {IN-HARDIRQ-W} state was registered at:
>    [<c0055138>] __lock_acquire+0xabc/0x1bb8
>    [<c0056838>] lock_acquire+0xa8/0x15c
>    [<c04c14ec>] _raw_spin_lock_irqsave+0x50/0x64
>    [<c00497a4>] do_timer+0x2c/0xa54
>    [<c004e7f4>] tick_periodic+0x74/0x9c
>    [<c004e834>] tick_handle_periodic+0x18/0x7c
>    [<c001349c>] orion_timer_interrupt+0x24/0x34
>    [<c0069c2c>] handle_irq_event_percpu+0x5c/0x300
>    [<c0069f0c>] handle_irq_event+0x3c/0x5c
>    [<c006c194>] handle_level_irq+0x8c/0xe8
>    [<c0069574>] generic_handle_irq+0x30/0x4c
>    [<c000951c>] handle_IRQ+0x30/0x84
>    [<c04c2178>] __irq_svc+0x38/0xa0
>    [<c06cf15c>] calibrate_delay+0x350/0x4e4
>    [<c06986e0>] start_kernel+0x23c/0x2c4
>    [<0000803c>] 0x803c
> irq event stamp: 32358
> hardirqs last  enabled at (32357): [<c0008c64>] ret_fast_syscall+0x24/0x44
> hardirqs last disabled at (32358): [<c04c14bc>]
> _raw_spin_lock_irqsave+0x20/0x64
> softirqs last  enabled at (32160): [<c001e234>] __do_softirq+0x1b8/0x308
> softirqs last disabled at (32137): [<c001e77c>] irq_exit+0xa0/0xd8
>
> other info that might help us debug this:
>   Possible unsafe locking scenario:
>
>         CPU0
>         ----
>    lock(timekeeper_lock);
>    <Interrupt>
>      lock(timekeeper_lock);
>
>   *** DEADLOCK ***
>
> 1 lock held by SAKEY/738:
>   #0:  (timekeeper_lock){?.-...}, at: [<c004a3e4>] do_adjtimex+0x64/0xbc
>
> stack backtrace:
> CPU: 0 PID: 738 Comm: SAKEY Not tainted 3.10.0-04864-g346ecc9-dirty #16
> [<c000d67c>] (unwind_backtrace+0x0/0xf0) from [<c000b530>]
> (show_stack+0x10/0x14)
> [<c000b530>] (show_stack+0x10/0x14) from [<c04ba07c>]
> (print_usage_bug.part.27+0x218/0x280)
> [<c04ba07c>] (print_usage_bug.part.27+0x218/0x280) from [<c0053058>]
> (mark_lock+0x538/0x6bc)
> [<c0053058>] (mark_lock+0x538/0x6bc) from [<c005326c>]
> (mark_held_locks+0x90/0x124)
> [<c005326c>] (mark_held_locks+0x90/0x124) from [<c00533a8>]
> (trace_hardirqs_on_caller+0xa8/0x23c)
> [<c00533a8>] (trace_hardirqs_on_caller+0xa8/0x23c) from [<c04c1c60>]
> (_raw_spin_unlock_irq+0x24/0x5c)
> [<c04c1c60>] (_raw_spin_unlock_irq+0x24/0x5c) from [<c004ac8c>]
> (__do_adjtimex+0x17c/0x65c)
> [<c004ac8c>] (__do_adjtimex+0x17c/0x65c) from [<c004a404>]
> (do_adjtimex+0x84/0xbc)
> [<c004a404>] (do_adjtimex+0x84/0xbc) from [<c001d62c>]
> (SyS_adjtimex+0x50/0xa8)
> [<c001d62c>] (SyS_adjtimex+0x50/0xa8) from [<c0008c40>]
> (ret_fast_syscall+0x0/0x44)

Hrmm. So I'm a little confused by the report, as we hold the write lock
on the timekeeper_lock disabling irqs, so I'm not sure I see how the irq
could trigger to cause the deadlock. In fact, all the timekeeper_lock
users save irqs.

Hrmm. I dunno. :(

Thomas, you have a guess?

Let me know how you trigger it and I'll see if I can't reproduce it myself.

thanks
-john




  reply	other threads:[~2013-08-29 23:45 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <521F6D06.1040107@keymile.com>
2013-08-29 20:56 ` kernel deadlock Falauto, Gerlando
2013-08-29 23:45   ` John Stultz [this message]
2013-08-30 23:04     ` Gerlando Falauto
2013-08-30 23:10       ` John Stultz
2013-08-31  0:48         ` Stephen Boyd
2013-08-31  8:11           ` Gerlando Falauto
2013-09-03 14:57         ` Gerlando Falauto
2013-09-03 17:26           ` John Stultz
2013-09-04  8:11             ` Gerlando Falauto
2013-09-09 20:29               ` John Stultz
2013-09-10  7:29                 ` Ingo Molnar
2013-09-10 16:23                   ` John Stultz
2013-09-10  7:56                 ` Peter Zijlstra
2013-09-10  8:59                 ` Lin Ming
2013-09-10 16:38                   ` John Stultz
2013-09-11  0:37                     ` Lin Ming
2013-09-09 10:08             ` Peter Zijlstra
2013-09-12 14:42               ` Gerlando Falauto
2003-07-30 20:00 Kernel deadlock Paul Douglas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=521FDD12.7050000@linaro.org \
    --to=john.stultz@linaro.org \
    --cc=Gerlando.Falauto@keymile.com \
    --cc=Holger.Brunck@keymile.com \
    --cc=Stefan.Bigler@keymile.com \
    --cc=Valentin.Longchamp@keymile.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=prarit@redhat.com \
    --cc=richardcochran@gmail.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).