All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Stultz <john.stultz@linaro.org>
To: "Falauto, Gerlando" <Gerlando.Falauto@keymile.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Richard Cochran <richardcochran@gmail.com>,
	Prarit Bhargava <prarit@redhat.com>,
	"Brunck, Holger" <Holger.Brunck@keymile.com>,
	"Longchamp, Valentin" <Valentin.Longchamp@keymile.com>,
	"Bigler, Stefan" <Stefan.Bigler@keymile.com>
Subject: Re: kernel deadlock
Date: Thu, 29 Aug 2013 16:45:22 -0700	[thread overview]
Message-ID: <521FDD12.7050000@linaro.org> (raw)
In-Reply-To: <E92455D61233364C88CCD4A6DAB2E2850E09518110@SRVDEHAN1MX1.keymile.net>

On 08/29/2013 01:56 PM, Falauto, Gerlando wrote:
> Hi everyone,
>
> I ran into the deadlock situation reported at the bottom.
> Actually, on my latest 3.10 kernel for some reason I don't get the
> report (the kernel just hangs for some reason), so it took me quite some
> time to track it down.
>
> Once I figured the trigger to the machine hanging was adjtimex(), I
> reverted everything (between 3.9 to 3.10) that was touching
> kernel/time/timekeeping/timekeeping.c and kernel/time/ntp.c, I double
> checked that indeed the problem was not happening anymore, and finally
> started bisecting, landing on the following offending commit.
> THEN, and ONLY THEN, did I get the &%""ç+"% deadlock report.
>
> Do you guys have any ideas what could be wrong and how to fix it?

Thanks for the report!

What exactly is your process for reproducing the issue?


> [ INFO: inconsistent lock state ]
> 3.10.0-04864-g346ecc9-dirty #16 Not tainted
> ---------------------------------
> inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> SAKEY/738 [HC0[0]:SC0[0]:HE1:SE1] takes:
>   (timekeeper_lock){?.-...}, at: [<c004a3e4>] do_adjtimex+0x64/0xbc
> {IN-HARDIRQ-W} state was registered at:
>    [<c0055138>] __lock_acquire+0xabc/0x1bb8
>    [<c0056838>] lock_acquire+0xa8/0x15c
>    [<c04c14ec>] _raw_spin_lock_irqsave+0x50/0x64
>    [<c00497a4>] do_timer+0x2c/0xa54
>    [<c004e7f4>] tick_periodic+0x74/0x9c
>    [<c004e834>] tick_handle_periodic+0x18/0x7c
>    [<c001349c>] orion_timer_interrupt+0x24/0x34
>    [<c0069c2c>] handle_irq_event_percpu+0x5c/0x300
>    [<c0069f0c>] handle_irq_event+0x3c/0x5c
>    [<c006c194>] handle_level_irq+0x8c/0xe8
>    [<c0069574>] generic_handle_irq+0x30/0x4c
>    [<c000951c>] handle_IRQ+0x30/0x84
>    [<c04c2178>] __irq_svc+0x38/0xa0
>    [<c06cf15c>] calibrate_delay+0x350/0x4e4
>    [<c06986e0>] start_kernel+0x23c/0x2c4
>    [<0000803c>] 0x803c
> irq event stamp: 32358
> hardirqs last  enabled at (32357): [<c0008c64>] ret_fast_syscall+0x24/0x44
> hardirqs last disabled at (32358): [<c04c14bc>]
> _raw_spin_lock_irqsave+0x20/0x64
> softirqs last  enabled at (32160): [<c001e234>] __do_softirq+0x1b8/0x308
> softirqs last disabled at (32137): [<c001e77c>] irq_exit+0xa0/0xd8
>
> other info that might help us debug this:
>   Possible unsafe locking scenario:
>
>         CPU0
>         ----
>    lock(timekeeper_lock);
>    <Interrupt>
>      lock(timekeeper_lock);
>
>   *** DEADLOCK ***
>
> 1 lock held by SAKEY/738:
>   #0:  (timekeeper_lock){?.-...}, at: [<c004a3e4>] do_adjtimex+0x64/0xbc
>
> stack backtrace:
> CPU: 0 PID: 738 Comm: SAKEY Not tainted 3.10.0-04864-g346ecc9-dirty #16
> [<c000d67c>] (unwind_backtrace+0x0/0xf0) from [<c000b530>]
> (show_stack+0x10/0x14)
> [<c000b530>] (show_stack+0x10/0x14) from [<c04ba07c>]
> (print_usage_bug.part.27+0x218/0x280)
> [<c04ba07c>] (print_usage_bug.part.27+0x218/0x280) from [<c0053058>]
> (mark_lock+0x538/0x6bc)
> [<c0053058>] (mark_lock+0x538/0x6bc) from [<c005326c>]
> (mark_held_locks+0x90/0x124)
> [<c005326c>] (mark_held_locks+0x90/0x124) from [<c00533a8>]
> (trace_hardirqs_on_caller+0xa8/0x23c)
> [<c00533a8>] (trace_hardirqs_on_caller+0xa8/0x23c) from [<c04c1c60>]
> (_raw_spin_unlock_irq+0x24/0x5c)
> [<c04c1c60>] (_raw_spin_unlock_irq+0x24/0x5c) from [<c004ac8c>]
> (__do_adjtimex+0x17c/0x65c)
> [<c004ac8c>] (__do_adjtimex+0x17c/0x65c) from [<c004a404>]
> (do_adjtimex+0x84/0xbc)
> [<c004a404>] (do_adjtimex+0x84/0xbc) from [<c001d62c>]
> (SyS_adjtimex+0x50/0xa8)
> [<c001d62c>] (SyS_adjtimex+0x50/0xa8) from [<c0008c40>]
> (ret_fast_syscall+0x0/0x44)

Hrmm. So I'm a little confused by the report, as we hold the write lock
on the timekeeper_lock disabling irqs, so I'm not sure I see how the irq
could trigger to cause the deadlock. In fact, all the timekeeper_lock
users save irqs.

Hrmm. I dunno. :(

Thomas, you have a guess?

Let me know how you trigger it and I'll see if I can't reproduce it myself.

thanks
-john




  reply	other threads:[~2013-08-29 23:45 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <521F6D06.1040107@keymile.com>
2013-08-29 20:56 ` kernel deadlock Falauto, Gerlando
2013-08-29 23:45   ` John Stultz [this message]
2013-08-30 23:04     ` Gerlando Falauto
2013-08-30 23:10       ` John Stultz
2013-08-31  0:48         ` Stephen Boyd
2013-08-31  8:11           ` Gerlando Falauto
2013-09-03 14:57         ` Gerlando Falauto
2013-09-03 17:26           ` John Stultz
2013-09-04  8:11             ` Gerlando Falauto
2013-09-09 20:29               ` John Stultz
2013-09-10  7:29                 ` Ingo Molnar
2013-09-10 16:23                   ` John Stultz
2013-09-10  7:56                 ` Peter Zijlstra
2013-09-10  8:59                 ` Lin Ming
2013-09-10 16:38                   ` John Stultz
2013-09-11  0:37                     ` Lin Ming
2013-09-09 10:08             ` Peter Zijlstra
2013-09-12 14:42               ` Gerlando Falauto
2003-07-30 20:00 Kernel deadlock Paul Douglas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=521FDD12.7050000@linaro.org \
    --to=john.stultz@linaro.org \
    --cc=Gerlando.Falauto@keymile.com \
    --cc=Holger.Brunck@keymile.com \
    --cc=Stefan.Bigler@keymile.com \
    --cc=Valentin.Longchamp@keymile.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=prarit@redhat.com \
    --cc=richardcochran@gmail.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.