Kernel deadlock

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Kernel deadlock
@ 2003-07-30 20:00 Paul Douglas
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Douglas @ 2003-07-30 20:00 UTC (permalink / raw)
  To: linux-kernel

Hi Everyone,

I'm not sure is this issue have been identified and dealt with... I've been
searching through the mailing list archives with no luck.
I'm a neophyte when it comes to the linux kernel as a result I don't 
subscribe
to the kernel mailing list so please cc me on any responses...

I'm load testing my PPP based server and it deadlocks when I stress it out.
The way the server is architected each new incoming connection causes 
the server
to spawn a child process to handle that session. This child process then 
spawns a pppd process
and opens a pty to connect it to the pppd process. When I run my load 
test I have 300 sessions
up and am pinging with 1500 byte packets over each session.
It looks like due to the load on the system ppp keep alive echo requests 
timeout for the some of the sessions
and as they do the pppd process terminates.  This is turn causes child 
process to close it's end of the pty. After
a while, generally a few minutes, as one of the sessions terminates  the 
machine locks up and all logging stops.

Here's an example of a *.debug syslog... notice that there is no 
activity in the log after 17:45:59 until the
system is power cycled.

Jul 25 17:45:50 gondor pppd[2554]: LCP terminated by peer (Peer not 
responding)
Jul 25 17:45:55 gondor pppd[2554]: Connection terminated.
Jul 25 17:45:55 gondor pppd[2554]: Connect time 94.7 minutes.
Jul 25 17:45:55 gondor pppd[2554]: Sent 5205988 bytes, received 5277223 
bytes.
Jul 25 17:45:59 gondor svr-chld: SVRC: 2553: Select Error: 0
Jul 25 17:45:59 gondor svr-chld: SVRC: 2553: Read error
Jul 25 17:45:59 gondor svr-chld: SVRC: 2553: ^I^ICALLING QUEUE CLOSE
Jul 25 17:45:59 gondor svr-chld: SVRC: 2553: Shutting down socket
Jul 25 17:45:59 gondor pppd[2554]: Hangup (SIGHUP)
Jul 25 17:45:59 gondor pppd[2554]: ioctl(TIOCSETD, N_TTY): Interrupted 
system call
Jul 25 17:45:59 gondor pppd[2554]: Exit.
Jul 25 17:45:59 gondor svr: SVR: A child has died:; psvpns_chld_reap: 1
Jul 25 17:45:59 gondor svr: SVR: Received a disconnect message id:1142
Jul 25 17:45:59 gondor svr-chld: SVRC: 2553: Select Error: 0
Jul 25 17:45:59 gondor svr-chld: SVRC: 2553: Read error
Jul 25 17:45:59 gondor svr-chld: SVRC: 2553: ^I^ICALLING QUEUE CLOSE
Jul 25 17:45:59 gondor svr-chld: SVRC: 2553: Shutting down socket
Jul 25 17:45:59 gondor pppd[2554]: Hangup (SIGHUP)
Jul 25 17:45:59 gondor pppd[2554]: ioctl(TIOCSETD, N_TTY): Interrupted 
system call
Jul 25 17:45:59 gondor pppd[2554]: Exit.
Jul 25 18:02:39 gondor syslogd 1.4-0: restart.
Jul 25 18:02:39 gondor kernel: klogd 1.4-0, log source = /proc/kmsg 
started.
Jul 25 18:02:39 gondor kernel: Inspecting /boot/System.map
Jul 25 18:02:39 gondor kernel: Loaded 15287 symbols from /boot/System.map.


By enabling the NMI I'm able to get some information on the console when 
the nmi_watchdog breaks the
deadlock.. Here's what I get...

NMI Watchdog detected LOCKUP on CPU0, eip c01090ca, registers:
CPU: 0
EIP: 0010:[<c01090ca>] Not tainted
EFLAGS: 00000002
eax: 00000001 ebx: df998b80 ecx: cc7f8000 edx: 00000000
esi: 00000014 edi: 00000000 ebp: cc7f9e0c esp: cc7f9dc8
ds: 0018 es: 0018 ss: 0018
Process svr-chld (pid: 2957, stackpage=cc7f9000)
Stack: d1ec62a0 c01ac3eb ce3e8000 c0315a80 00000014 df998b80 00000280 
c0109318
00000014 cc7f9e0c df998b80 00000000 00000002 c03195e0 fffffffd 00000000
c010bea8 00000002 cc7f8000 00000000 c03195e0 fffffffd 00000000 00000000
Call Trace: [<c01ac3eb>] [<c0109318>] [<c010bea8>] [<c01e0018>] 
[<c011dee3>]
[<c01e9a8c>] [<c01eb3ee>] [<c01ac290>] [<c01a82b2>] [<c01a8363>] 
[<c01ac640>]
[<c01a4f94>] [<c01a615c>] [<c0238288>] [<c0207690>] [<c01090f9>] 
[<c0242e66>]
[<c0153943>] [<c01a65f2>] [<c013ce88>] [<c013b19e>] [<c013b246>] 
[<c010771f>]

Code: 75 f4 8b 43 04 be 01 00 00 00 a9 00 00 00 20 75 08 fb 8d 74
console shuts up ...

After passing this through ksymoops...

ksymoops 2.4.9 on i686 2.4.21_0pre5.6. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.21_0pre5.6/ (default)
-m /boot/System.map (specified)

Error (regular_file): read_ksyms stat /proc/ksyms failed
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
NMI Watchdog detected LOCKUP on CPU0, eip c01090ca, registers:
CPU: 0
EIP: 0010:[<c01090ca>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00000002
eax: 00000001 ebx: df998b80 ecx: cc7f8000 edx: 00000000
esi: 00000014 edi: 00000000 ebp: cc7f9e0c esp: cc7f9dc8
ds: 0018 es: 0018 ss: 0018
Process svr-chld (pid: 2957, stackpage=cc7f9000)
Stack: d1ec62a0 c01ac3eb ce3e8000 c0315a80 00000014 df998b80 00000280 
c0109318
00000014 cc7f9e0c df998b80 00000000 00000002 c03195e0 fffffffd 00000000
c010bea8 00000002 cc7f8000 00000000 c03195e0 fffffffd 00000000 00000000
Call Trace: [<c01ac3eb>] [<c0109318>] [<c010bea8>] [<c01e0018>] 
[<c011dee3>]
[<c01e9a8c>] [<c01eb3ee>] [<c01ac290>] [<c01a82b2>] [<c01a8363>] 
[<c01ac640>]
[<c01a4f94>] [<c01a615c>] [<c0238288>] [<c0207690>] [<c01090f9>] 
[<c0242e66>]
[<c0153943>] [<c01a65f2>] [<c013ce88>] [<c013b19e>] [<c013b246>] 
[<c010771f>]
Code: 75 f4 8b 43 04 be 01 00 00 00 a9 00 00 00 20 75 08 fb 8d 74


 >>EIP; c01090ca <handle_IRQ_event+3a/a0> <=====

Trace; c01ac3eb <pty_write+14b/150>
Trace; c0109318 <do_IRQ+98/f0>
Trace; c010bea8 <call_do_IRQ+5/d>
Trace; c01e0018 <tg3_interrupt+f8/160>
Trace; c011dee3 <do_softirq+63/e0>
Trace; c01e9a8c <.text.lock.ppp_generic+181/4e5>
Trace; c01eb3ee <ppp_asynctty_wakeup+5e/70>
Trace; c01ac290 <pty_unthrottle+60/70>
Trace; c01a82b2 <check_unthrottle+32/40>
Trace; c01a8363 <n_tty_flush_buffer+13/60>
Trace; c01ac640 <pty_flush_buffer+70/80>
Trace; c01a4f94 <do_tty_hangup+324/380>
Trace; c01a615c <release_dev+60c/660>
Trace; c0238288 <tcp_close+108/7e0>
Trace; c0207690 <ide_dma_intr+0/c0>
Trace; c01090f9 <handle_IRQ_event+69/a0>
Trace; c0242e66 <tcp_send_fin+146/200>
Trace; c0153943 <clear_inode+13/c0>
Trace; c01a65f2 <tty_release+32/70>
Trace; c013ce88 <fput+118/140>
Trace; c013b19e <filp_close+8e/d0>
Trace; c013b246 <sys_close+66/80>
Trace; c010771f <system_call+33/38>

Code; c01090ca <handle_IRQ_event+3a/a0>
00000000 <_EIP>:
Code; c01090ca <handle_IRQ_event+3a/a0> <=====
0: 75 f4 jne fffffff6 <_EIP+0xfffffff6> c01090c0 
<handle_IRQ_event+30/a0> <=====
Code; c01090cc <handle_IRQ_event+3c/a0>
2: 8b 43 04 mov 0x4(%ebx),%eax
Code; c01090cf <handle_IRQ_event+3f/a0>
5: be 01 00 00 00 mov $0x1,%esi
Code; c01090d4 <handle_IRQ_event+44/a0>
a: a9 00 00 00 20 test $0x20000000,%eax
Code; c01090d9 <handle_IRQ_event+49/a0>
f: 75 08 jne 19 <_EIP+0x19> c01090e3 <handle_IRQ_event+53/a0>
Code; c01090db <handle_IRQ_event+4b/a0>
11: fb sti
Code; c01090dc <handle_IRQ_event+4c/a0>
12: 8d 74 00 00 lea 0x0(%eax,%eax,1),%esi


The version of the kernel is Linux 2.4.21_0pre5.6 #1 SMP
It's running on a single P3 machine.
The PPP version is 2.4.1-6mdk.

Any help or insight into this problem would be really apprecaited...
Thanks much,
Paul



^ permalink raw reply	[flat|nested] 19+ messages in thread

[parent not found: <521F6D06.1040107@keymile.com>]

* kernel deadlock
       [not found] <521F6D06.1040107@keymile.com>
@ 2013-08-29 20:56 ` Falauto, Gerlando
  2013-08-29 23:45   ` John Stultz
  0 siblings, 1 reply; 19+ messages in thread
From: Falauto, Gerlando @ 2013-08-29 20:56 UTC (permalink / raw)
  To: Falauto, Gerlando, John Stultz, linux-kernel@vger.kernel.org,
	Thomas Gleixner, Richard Cochran, Prarit Bhargava
  Cc: Brunck, Holger, Longchamp, Valentin, Bigler, Stefan

Hi everyone,

I ran into the deadlock situation reported at the bottom.
Actually, on my latest 3.10 kernel for some reason I don't get the
report (the kernel just hangs for some reason), so it took me quite some
time to track it down.

Once I figured the trigger to the machine hanging was adjtimex(), I
reverted everything (between 3.9 to 3.10) that was touching
kernel/time/timekeeping/timekeeping.c and kernel/time/ntp.c, I double
checked that indeed the problem was not happening anymore, and finally
started bisecting, landing on the following offending commit.
THEN, and ONLY THEN, did I get the &%""ç+"% deadlock report.

Do you guys have any ideas what could be wrong and how to fix it?

Thank you,
Gerlando

commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40
Author: John Stultz <john.stultz@linaro.org>
Date:   Fri Mar 22 11:37:28 2013 -0700

     timekeeping: Hold timekeepering locks in do_adjtimex and hardpps

     In moving the NTP state to be protected by the timekeeping locks,
     be sure to acquire the timekeeping locks prior to calling
     ntp functions.

     Cc: Thomas Gleixner <tglx@linutronix.de>
     Cc: Richard Cochran <richardcochran@gmail.com>
     Cc: Prarit Bhargava <prarit@redhat.com>
     Signed-off-by: John Stultz <john.stultz@linaro.org>

=================================
[ INFO: inconsistent lock state ]
3.10.0-04864-g346ecc9-dirty #16 Not tainted
---------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
SAKEY/738 [HC0[0]:SC0[0]:HE1:SE1] takes:
  (timekeeper_lock){?.-...}, at: [<c004a3e4>] do_adjtimex+0x64/0xbc
{IN-HARDIRQ-W} state was registered at:
   [<c0055138>] __lock_acquire+0xabc/0x1bb8
   [<c0056838>] lock_acquire+0xa8/0x15c
   [<c04c14ec>] _raw_spin_lock_irqsave+0x50/0x64
   [<c00497a4>] do_timer+0x2c/0xa54
   [<c004e7f4>] tick_periodic+0x74/0x9c
   [<c004e834>] tick_handle_periodic+0x18/0x7c
   [<c001349c>] orion_timer_interrupt+0x24/0x34
   [<c0069c2c>] handle_irq_event_percpu+0x5c/0x300
   [<c0069f0c>] handle_irq_event+0x3c/0x5c
   [<c006c194>] handle_level_irq+0x8c/0xe8
   [<c0069574>] generic_handle_irq+0x30/0x4c
   [<c000951c>] handle_IRQ+0x30/0x84
   [<c04c2178>] __irq_svc+0x38/0xa0
   [<c06cf15c>] calibrate_delay+0x350/0x4e4
   [<c06986e0>] start_kernel+0x23c/0x2c4
   [<0000803c>] 0x803c
irq event stamp: 32358
hardirqs last  enabled at (32357): [<c0008c64>] ret_fast_syscall+0x24/0x44
hardirqs last disabled at (32358): [<c04c14bc>]
_raw_spin_lock_irqsave+0x20/0x64
softirqs last  enabled at (32160): [<c001e234>] __do_softirq+0x1b8/0x308
softirqs last disabled at (32137): [<c001e77c>] irq_exit+0xa0/0xd8

other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(timekeeper_lock);
   <Interrupt>
     lock(timekeeper_lock);

  *** DEADLOCK ***

1 lock held by SAKEY/738:
  #0:  (timekeeper_lock){?.-...}, at: [<c004a3e4>] do_adjtimex+0x64/0xbc

stack backtrace:
CPU: 0 PID: 738 Comm: SAKEY Not tainted 3.10.0-04864-g346ecc9-dirty #16
[<c000d67c>] (unwind_backtrace+0x0/0xf0) from [<c000b530>]
(show_stack+0x10/0x14)
[<c000b530>] (show_stack+0x10/0x14) from [<c04ba07c>]
(print_usage_bug.part.27+0x218/0x280)
[<c04ba07c>] (print_usage_bug.part.27+0x218/0x280) from [<c0053058>]
(mark_lock+0x538/0x6bc)
[<c0053058>] (mark_lock+0x538/0x6bc) from [<c005326c>]
(mark_held_locks+0x90/0x124)
[<c005326c>] (mark_held_locks+0x90/0x124) from [<c00533a8>]
(trace_hardirqs_on_caller+0xa8/0x23c)
[<c00533a8>] (trace_hardirqs_on_caller+0xa8/0x23c) from [<c04c1c60>]
(_raw_spin_unlock_irq+0x24/0x5c)
[<c04c1c60>] (_raw_spin_unlock_irq+0x24/0x5c) from [<c004ac8c>]
(__do_adjtimex+0x17c/0x65c)
[<c004ac8c>] (__do_adjtimex+0x17c/0x65c) from [<c004a404>]
(do_adjtimex+0x84/0xbc)
[<c004a404>] (do_adjtimex+0x84/0xbc) from [<c001d62c>]
(SyS_adjtimex+0x50/0xa8)
[<c001d62c>] (SyS_adjtimex+0x50/0xa8) from [<c0008c40>]
(ret_fast_syscall+0x0/0x44)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kernel deadlock
  2013-08-29 20:56 ` kernel deadlock Falauto, Gerlando
@ 2013-08-29 23:45   ` John Stultz
  2013-08-30 23:04     ` Gerlando Falauto
  0 siblings, 1 reply; 19+ messages in thread
From: John Stultz @ 2013-08-29 23:45 UTC (permalink / raw)
  To: Falauto, Gerlando
  Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, Richard Cochran,
	Prarit Bhargava, Brunck, Holger, Longchamp, Valentin,
	Bigler, Stefan

On 08/29/2013 01:56 PM, Falauto, Gerlando wrote:
> Hi everyone,
>
> I ran into the deadlock situation reported at the bottom.
> Actually, on my latest 3.10 kernel for some reason I don't get the
> report (the kernel just hangs for some reason), so it took me quite some
> time to track it down.
>
> Once I figured the trigger to the machine hanging was adjtimex(), I
> reverted everything (between 3.9 to 3.10) that was touching
> kernel/time/timekeeping/timekeeping.c and kernel/time/ntp.c, I double
> checked that indeed the problem was not happening anymore, and finally
> started bisecting, landing on the following offending commit.
> THEN, and ONLY THEN, did I get the &%""ç+"% deadlock report.
>
> Do you guys have any ideas what could be wrong and how to fix it?

Thanks for the report!

What exactly is your process for reproducing the issue?


> [ INFO: inconsistent lock state ]
> 3.10.0-04864-g346ecc9-dirty #16 Not tainted
> ---------------------------------
> inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> SAKEY/738 [HC0[0]:SC0[0]:HE1:SE1] takes:
>   (timekeeper_lock){?.-...}, at: [<c004a3e4>] do_adjtimex+0x64/0xbc
> {IN-HARDIRQ-W} state was registered at:
>    [<c0055138>] __lock_acquire+0xabc/0x1bb8
>    [<c0056838>] lock_acquire+0xa8/0x15c
>    [<c04c14ec>] _raw_spin_lock_irqsave+0x50/0x64
>    [<c00497a4>] do_timer+0x2c/0xa54
>    [<c004e7f4>] tick_periodic+0x74/0x9c
>    [<c004e834>] tick_handle_periodic+0x18/0x7c
>    [<c001349c>] orion_timer_interrupt+0x24/0x34
>    [<c0069c2c>] handle_irq_event_percpu+0x5c/0x300
>    [<c0069f0c>] handle_irq_event+0x3c/0x5c
>    [<c006c194>] handle_level_irq+0x8c/0xe8
>    [<c0069574>] generic_handle_irq+0x30/0x4c
>    [<c000951c>] handle_IRQ+0x30/0x84
>    [<c04c2178>] __irq_svc+0x38/0xa0
>    [<c06cf15c>] calibrate_delay+0x350/0x4e4
>    [<c06986e0>] start_kernel+0x23c/0x2c4
>    [<0000803c>] 0x803c
> irq event stamp: 32358
> hardirqs last  enabled at (32357): [<c0008c64>] ret_fast_syscall+0x24/0x44
> hardirqs last disabled at (32358): [<c04c14bc>]
> _raw_spin_lock_irqsave+0x20/0x64
> softirqs last  enabled at (32160): [<c001e234>] __do_softirq+0x1b8/0x308
> softirqs last disabled at (32137): [<c001e77c>] irq_exit+0xa0/0xd8
>
> other info that might help us debug this:
>   Possible unsafe locking scenario:
>
>         CPU0
>         ----
>    lock(timekeeper_lock);
>    <Interrupt>
>      lock(timekeeper_lock);
>
>   *** DEADLOCK ***
>
> 1 lock held by SAKEY/738:
>   #0:  (timekeeper_lock){?.-...}, at: [<c004a3e4>] do_adjtimex+0x64/0xbc
>
> stack backtrace:
> CPU: 0 PID: 738 Comm: SAKEY Not tainted 3.10.0-04864-g346ecc9-dirty #16
> [<c000d67c>] (unwind_backtrace+0x0/0xf0) from [<c000b530>]
> (show_stack+0x10/0x14)
> [<c000b530>] (show_stack+0x10/0x14) from [<c04ba07c>]
> (print_usage_bug.part.27+0x218/0x280)
> [<c04ba07c>] (print_usage_bug.part.27+0x218/0x280) from [<c0053058>]
> (mark_lock+0x538/0x6bc)
> [<c0053058>] (mark_lock+0x538/0x6bc) from [<c005326c>]
> (mark_held_locks+0x90/0x124)
> [<c005326c>] (mark_held_locks+0x90/0x124) from [<c00533a8>]
> (trace_hardirqs_on_caller+0xa8/0x23c)
> [<c00533a8>] (trace_hardirqs_on_caller+0xa8/0x23c) from [<c04c1c60>]
> (_raw_spin_unlock_irq+0x24/0x5c)
> [<c04c1c60>] (_raw_spin_unlock_irq+0x24/0x5c) from [<c004ac8c>]
> (__do_adjtimex+0x17c/0x65c)
> [<c004ac8c>] (__do_adjtimex+0x17c/0x65c) from [<c004a404>]
> (do_adjtimex+0x84/0xbc)
> [<c004a404>] (do_adjtimex+0x84/0xbc) from [<c001d62c>]
> (SyS_adjtimex+0x50/0xa8)
> [<c001d62c>] (SyS_adjtimex+0x50/0xa8) from [<c0008c40>]
> (ret_fast_syscall+0x0/0x44)

Hrmm. So I'm a little confused by the report, as we hold the write lock
on the timekeeper_lock disabling irqs, so I'm not sure I see how the irq
could trigger to cause the deadlock. In fact, all the timekeeper_lock
users save irqs.

Hrmm. I dunno. :(

Thomas, you have a guess?

Let me know how you trigger it and I'll see if I can't reproduce it myself.

thanks
-john




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kernel deadlock
  2013-08-29 23:45   ` John Stultz
@ 2013-08-30 23:04     ` Gerlando Falauto
  2013-08-30 23:10       ` John Stultz
  0 siblings, 1 reply; 19+ messages in thread
From: Gerlando Falauto @ 2013-08-30 23:04 UTC (permalink / raw)
  To: John Stultz
  Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, Richard Cochran,
	Prarit Bhargava, Brunck, Holger, Longchamp, Valentin,
	Bigler, Stefan

Hi,

sorry, it took me a while to narrow it down...

On 08/30/2013 01:45 AM, John Stultz wrote:
> On 08/29/2013 01:56 PM, Falauto, Gerlando wrote:
>> Hi everyone,
>>
>> I ran into the deadlock situation reported at the bottom.
>> Actually, on my latest 3.10 kernel for some reason I don't get the
>> report (the kernel just hangs for some reason), so it took me quite some
>> time to track it down.
>>
>> Once I figured the trigger to the machine hanging was adjtimex(), I
>> reverted everything (between 3.9 to 3.10) that was touching
>> kernel/time/timekeeping/timekeeping.c and kernel/time/ntp.c, I double
>> checked that indeed the problem was not happening anymore, and finally
>> started bisecting, landing on the following offending commit.
>> THEN, and ONLY THEN, did I get the &%""ç+"% deadlock report.
>>
>> Do you guys have any ideas what could be wrong and how to fix it?
>
> Thanks for the report!
>
> What exactly is your process for reproducing the issue?

Now (well, now...), it's quite easy.

Three ingredients:

1) Kernel 3.10

2) Enable HRTICK

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 99399f8..294e3ca 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -41,7 +41,7 @@ SCHED_FEAT(WAKEUP_PREEMPTION, true)
   */
  SCHED_FEAT(ARCH_POWER, true)

-SCHED_FEAT(HRTICK, false)
+SCHED_FEAT(HRTICK, true)
  SCHED_FEAT(DOUBLE_TICK, false)
  SCHED_FEAT(LB_BIAS, true)

3) Run the following:

#include <stdio.h>
#include <sys/timex.h>

int main(void)
{
     int i;

     for (i = 0 ; ; i++) {
	struct timex adj = {};
	printf("%d\r", i);
	fflush(stdout);
	adjtimex(&adj);
     }
     return 0;
}

Notice how:
1) The original issue (with a bit more complicated scenario) was seen on 
ARM and PowerPC platforms
2) Under the above test conditions (on ARM) I *don't* get any deadlock 
report printed, the machine just hangs
3) The offending commit (below) I had found through a weird (manual) 
process of reverting and re-reverting (where some commits could have 
been reverted out of order), so I'm not 100% sure you'd come to the same 
conclusions.

commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40
Author: John Stultz <john.stultz@linaro.org>
Date:   Fri Mar 22 11:37:28 2013 -0700

      timekeeping: Hold timekeepering locks in do_adjtimex and hardpps

I'm not able to perform any further testing at this very moment, but if 
needed, I can try bisecting again sometime next week, so to make an even 
more reliable statement.

Thank you!
Gerlando

>
>
>> [ INFO: inconsistent lock state ]
>> 3.10.0-04864-g346ecc9-dirty #16 Not tainted
>> ---------------------------------
>> inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
>> SAKEY/738 [HC0[0]:SC0[0]:HE1:SE1] takes:
>>    (timekeeper_lock){?.-...}, at: [<c004a3e4>] do_adjtimex+0x64/0xbc
>> {IN-HARDIRQ-W} state was registered at:
>>     [<c0055138>] __lock_acquire+0xabc/0x1bb8
>>     [<c0056838>] lock_acquire+0xa8/0x15c
>>     [<c04c14ec>] _raw_spin_lock_irqsave+0x50/0x64
>>     [<c00497a4>] do_timer+0x2c/0xa54
>>     [<c004e7f4>] tick_periodic+0x74/0x9c
>>     [<c004e834>] tick_handle_periodic+0x18/0x7c
>>     [<c001349c>] orion_timer_interrupt+0x24/0x34
>>     [<c0069c2c>] handle_irq_event_percpu+0x5c/0x300
>>     [<c0069f0c>] handle_irq_event+0x3c/0x5c
>>     [<c006c194>] handle_level_irq+0x8c/0xe8
>>     [<c0069574>] generic_handle_irq+0x30/0x4c
>>     [<c000951c>] handle_IRQ+0x30/0x84
>>     [<c04c2178>] __irq_svc+0x38/0xa0
>>     [<c06cf15c>] calibrate_delay+0x350/0x4e4
>>     [<c06986e0>] start_kernel+0x23c/0x2c4
>>     [<0000803c>] 0x803c
>> irq event stamp: 32358
>> hardirqs last  enabled at (32357): [<c0008c64>] ret_fast_syscall+0x24/0x44
>> hardirqs last disabled at (32358): [<c04c14bc>]
>> _raw_spin_lock_irqsave+0x20/0x64
>> softirqs last  enabled at (32160): [<c001e234>] __do_softirq+0x1b8/0x308
>> softirqs last disabled at (32137): [<c001e77c>] irq_exit+0xa0/0xd8
>>
>> other info that might help us debug this:
>>    Possible unsafe locking scenario:
>>
>>          CPU0
>>          ----
>>     lock(timekeeper_lock);
>>     <Interrupt>
>>       lock(timekeeper_lock);
>>
>>    *** DEADLOCK ***
>>
>> 1 lock held by SAKEY/738:
>>    #0:  (timekeeper_lock){?.-...}, at: [<c004a3e4>] do_adjtimex+0x64/0xbc
>>
>> stack backtrace:
>> CPU: 0 PID: 738 Comm: SAKEY Not tainted 3.10.0-04864-g346ecc9-dirty #16
>> [<c000d67c>] (unwind_backtrace+0x0/0xf0) from [<c000b530>]
>> (show_stack+0x10/0x14)
>> [<c000b530>] (show_stack+0x10/0x14) from [<c04ba07c>]
>> (print_usage_bug.part.27+0x218/0x280)
>> [<c04ba07c>] (print_usage_bug.part.27+0x218/0x280) from [<c0053058>]
>> (mark_lock+0x538/0x6bc)
>> [<c0053058>] (mark_lock+0x538/0x6bc) from [<c005326c>]
>> (mark_held_locks+0x90/0x124)
>> [<c005326c>] (mark_held_locks+0x90/0x124) from [<c00533a8>]
>> (trace_hardirqs_on_caller+0xa8/0x23c)
>> [<c00533a8>] (trace_hardirqs_on_caller+0xa8/0x23c) from [<c04c1c60>]
>> (_raw_spin_unlock_irq+0x24/0x5c)
>> [<c04c1c60>] (_raw_spin_unlock_irq+0x24/0x5c) from [<c004ac8c>]
>> (__do_adjtimex+0x17c/0x65c)
>> [<c004ac8c>] (__do_adjtimex+0x17c/0x65c) from [<c004a404>]
>> (do_adjtimex+0x84/0xbc)
>> [<c004a404>] (do_adjtimex+0x84/0xbc) from [<c001d62c>]
>> (SyS_adjtimex+0x50/0xa8)
>> [<c001d62c>] (SyS_adjtimex+0x50/0xa8) from [<c0008c40>]
>> (ret_fast_syscall+0x0/0x44)
>
> Hrmm. So I'm a little confused by the report, as we hold the write lock
> on the timekeeper_lock disabling irqs, so I'm not sure I see how the irq
> could trigger to cause the deadlock. In fact, all the timekeeper_lock
> users save irqs.
>
> Hrmm. I dunno. :(
>
> Thomas, you have a guess?
>
> Let me know how you trigger it and I'll see if I can't reproduce it myself.
>
> thanks
> -john
>
>
>


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: kernel deadlock
  2013-08-30 23:04     ` Gerlando Falauto
@ 2013-08-30 23:10       ` John Stultz
  2013-08-31  0:48         ` Stephen Boyd
  2013-09-03 14:57         ` Gerlando Falauto
  0 siblings, 2 replies; 19+ messages in thread
From: John Stultz @ 2013-08-30 23:10 UTC (permalink / raw)
  To: Gerlando Falauto
  Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, Richard Cochran,
	Prarit Bhargava, Brunck, Holger, Longchamp, Valentin,
	Bigler, Stefan

On 08/30/2013 04:04 PM, Gerlando Falauto wrote:
> Hi,
>
> sorry, it took me a while to narrow it down...
>
> On 08/30/2013 01:45 AM, John Stultz wrote:
>> On 08/29/2013 01:56 PM, Falauto, Gerlando wrote:
>>> Hi everyone,
>>>
>>> I ran into the deadlock situation reported at the bottom.
>>> Actually, on my latest 3.10 kernel for some reason I don't get the
>>> report (the kernel just hangs for some reason), so it took me quite
>>> some
>>> time to track it down.
>>>
>>> Once I figured the trigger to the machine hanging was adjtimex(), I
>>> reverted everything (between 3.9 to 3.10) that was touching
>>> kernel/time/timekeeping/timekeeping.c and kernel/time/ntp.c, I double
>>> checked that indeed the problem was not happening anymore, and finally
>>> started bisecting, landing on the following offending commit.
>>> THEN, and ONLY THEN, did I get the &%""ç+"% deadlock report.
>>>
>>> Do you guys have any ideas what could be wrong and how to fix it?
>>
>> Thanks for the report!
>>
>> What exactly is your process for reproducing the issue?
>
> Now (well, now...), it's quite easy.
>
> Three ingredients:
>
> 1) Kernel 3.10
>
> 2) Enable HRTICK
>
> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> index 99399f8..294e3ca 100644
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -41,7 +41,7 @@ SCHED_FEAT(WAKEUP_PREEMPTION, true)
>   */
>  SCHED_FEAT(ARCH_POWER, true)
>
> -SCHED_FEAT(HRTICK, false)
> +SCHED_FEAT(HRTICK, true)
>  SCHED_FEAT(DOUBLE_TICK, false)
>  SCHED_FEAT(LB_BIAS, true)
>
> 3) Run the following:
>
> #include <stdio.h>
> #include <sys/timex.h>
>
> int main(void)
> {
>     int i;
>
>     for (i = 0 ; ; i++) {
>     struct timex adj = {};
>     printf("%d\r", i);
>     fflush(stdout);
>     adjtimex(&adj);
>     }
>     return 0;
> }
>
> Notice how:
> 1) The original issue (with a bit more complicated scenario) was seen
> on ARM and PowerPC platforms
> 2) Under the above test conditions (on ARM) I *don't* get any deadlock
> report printed, the machine just hangs
> 3) The offending commit (below) I had found through a weird (manual)
> process of reverting and re-reverting (where some commits could have
> been reverted out of order), so I'm not 100% sure you'd come to the
> same conclusions.
>
> commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40
> Author: John Stultz <john.stultz@linaro.org>
> Date:   Fri Mar 22 11:37:28 2013 -0700
>
>      timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
>
> I'm not able to perform any further testing at this very moment, but
> if needed, I can try bisecting again sometime next week, so to make an
> even more reliable statement.
>

Thanks so much for the details! I'll take a shot at reproducing this and
will let you know what comes of it.

thanks
-john


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kernel deadlock
  2013-08-30 23:10       ` John Stultz
@ 2013-08-31  0:48         ` Stephen Boyd
  2013-08-31  8:11           ` Gerlando Falauto
  2013-09-03 14:57         ` Gerlando Falauto
  1 sibling, 1 reply; 19+ messages in thread
From: Stephen Boyd @ 2013-08-31  0:48 UTC (permalink / raw)
  To: John Stultz, Gerlando Falauto
  Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, Richard Cochran,
	Prarit Bhargava, Brunck, Holger, Longchamp, Valentin,
	Bigler, Stefan

On 08/30/13 16:10, John Stultz wrote:
> On 08/30/2013 04:04 PM, Gerlando Falauto wrote:
>> Hi,
>>
>> sorry, it took me a while to narrow it down...
>>
>> On 08/30/2013 01:45 AM, John Stultz wrote:
>>> On 08/29/2013 01:56 PM, Falauto, Gerlando wrote:
>>>> Hi everyone,
>>>>
>>>> I ran into the deadlock situation reported at the bottom.
>>>> Actually, on my latest 3.10 kernel for some reason I don't get the
>>>> report (the kernel just hangs for some reason), so it took me quite
>>>> some
>>>> time to track it down.
>>>>
>>>> Once I figured the trigger to the machine hanging was adjtimex(), I
>>>> reverted everything (between 3.9 to 3.10) that was touching
>>>> kernel/time/timekeeping/timekeeping.c and kernel/time/ntp.c, I double
>>>> checked that indeed the problem was not happening anymore, and finally
>>>> started bisecting, landing on the following offending commit.
>>>> THEN, and ONLY THEN, did I get the &%""ç+"% deadlock report.
>>>>
>>>> Do you guys have any ideas what could be wrong and how to fix it?
>>> Thanks for the report!
>>>
>>> What exactly is your process for reproducing the issue?
>> Now (well, now...), it's quite easy.
>>
>> Three ingredients:
>>
>> 1) Kernel 3.10
>>
>> 2) Enable HRTICK
>>
>> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
>> index 99399f8..294e3ca 100644
>> --- a/kernel/sched/features.h
>> +++ b/kernel/sched/features.h
>> @@ -41,7 +41,7 @@ SCHED_FEAT(WAKEUP_PREEMPTION, true)
>>   */
>>  SCHED_FEAT(ARCH_POWER, true)
>>
>> -SCHED_FEAT(HRTICK, false)
>> +SCHED_FEAT(HRTICK, true)
>>  SCHED_FEAT(DOUBLE_TICK, false)
>>  SCHED_FEAT(LB_BIAS, true)
>>
>> 3) Run the following:
>>
>> #include <stdio.h>
>> #include <sys/timex.h>
>>
>> int main(void)
>> {
>>     int i;
>>
>>     for (i = 0 ; ; i++) {
>>     struct timex adj = {};
>>     printf("%d\r", i);
>>     fflush(stdout);
>>     adjtimex(&adj);
>>     }
>>     return 0;
>> }
>>
>> Notice how:
>> 1) The original issue (with a bit more complicated scenario) was seen
>> on ARM and PowerPC platforms
>> 2) Under the above test conditions (on ARM) I *don't* get any deadlock
>> report printed, the machine just hangs
>> 3) The offending commit (below) I had found through a weird (manual)
>> process of reverting and re-reverting (where some commits could have
>> been reverted out of order), so I'm not 100% sure you'd come to the
>> same conclusions.
>>
>> commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40
>> Author: John Stultz <john.stultz@linaro.org>
>> Date:   Fri Mar 22 11:37:28 2013 -0700
>>
>>      timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
>>
>> I'm not able to perform any further testing at this very moment, but
>> if needed, I can try bisecting again sometime next week, so to make an
>> even more reliable statement.
>>
>>

Just curious. Do you have this patch from 3.11 applied to your 3.10
kernel tree?

commit 971ee28cbd1ccd87b3164facd9359a534c1d2892
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Fri Jun 28 11:18:53 2013 +0200

    sched: Fix HRTICK
   
    David reported that the HRTICK sched feature was borken; which was
enough
    motivation for me to finally fix it ;-)
   
    We should not allow hrtimer code to do softirq wakeups while holding
schedul
er
    locks. The hrtimer code only needs this when we accidentally try to
program
an
    expired time. We don't much care about those anyway since we have
the regula
r
    tick to fall back to.
   
    Reported-by: David Ahern <dsahern@gmail.com>
    Tested-by: David Ahern <dsahern@gmail.com>
    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Link:
http://lkml.kernel.org/r/20130628091853.GE29209@dyad.programming.kicks
-ass.net
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kernel deadlock
  2013-08-31  0:48         ` Stephen Boyd
@ 2013-08-31  8:11           ` Gerlando Falauto
  0 siblings, 0 replies; 19+ messages in thread
From: Gerlando Falauto @ 2013-08-31  8:11 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: John Stultz, linux-kernel@vger.kernel.org, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Brunck, Holger,
	Longchamp, Valentin, Bigler, Stefan

Hi Stephen,
 >
> Just curious. Do you have this patch from 3.11 applied to your 3.10
> kernel tree?

Nope, I didn't. But I applied it, and it doesn't seem to make a 
difference, unfortunately. :-(

Thanks for your help anyway!
Gerlando

>
> commit 971ee28cbd1ccd87b3164facd9359a534c1d2892
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Fri Jun 28 11:18:53 2013 +0200
>
>      sched: Fix HRTICK
>
>      David reported that the HRTICK sched feature was borken; which was
> enough
>      motivation for me to finally fix it ;-)
>
>      We should not allow hrtimer code to do softirq wakeups while holding
> schedul
> er
>      locks. The hrtimer code only needs this when we accidentally try to
> program
> an
>      expired time. We don't much care about those anyway since we have
> the regula
> r
>      tick to fall back to.
>
>      Reported-by: David Ahern <dsahern@gmail.com>
>      Tested-by: David Ahern <dsahern@gmail.com>
>      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
>      Link:
> http://lkml.kernel.org/r/20130628091853.GE29209@dyad.programming.kicks
> -ass.net
>      Signed-off-by: Ingo Molnar <mingo@kernel.org>
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kernel deadlock
  2013-08-30 23:10       ` John Stultz
  2013-08-31  0:48         ` Stephen Boyd
@ 2013-09-03 14:57         ` Gerlando Falauto
  2013-09-03 17:26           ` John Stultz
  1 sibling, 1 reply; 19+ messages in thread
From: Gerlando Falauto @ 2013-09-03 14:57 UTC (permalink / raw)
  To: John Stultz
  Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, Richard Cochran,
	Prarit Bhargava, Brunck, Holger, Longchamp, Valentin,
	Bigler, Stefan

Hi,

I tried again from scratch, so let me recap the whole situation, so we 
can all view it from the same standpoint. This should make the problem 
easier to see and reproduce.

I can confirm that running a stock 3.10 kernel with HRTICK enabled:

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 99399f8..294e3ca 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -41,7 +41,7 @@ SCHED_FEAT(WAKEUP_PREEMPTION, true)
   */
  SCHED_FEAT(ARCH_POWER, true)

-SCHED_FEAT(HRTICK, false)
+SCHED_FEAT(HRTICK, true)
  SCHED_FEAT(DOUBLE_TICK, false)
  SCHED_FEAT(LB_BIAS, true)

makes the following program (and the whole board, as a matter of fact) 
hang with no further notice:

#include <stdio.h>
#include <sys/timex.h>

int main(void)
{
     int i;

     for (i = 0 ; ; i++) {
         struct timex adj = {};
         printf("%d\r", i);
         fflush(stdout);
         adjtimex(&adj);
     }
     return 0;
}

If I then revert everything up to (and including) the offending commit
(mind the '~'):

$ git log --oneline ...06c017f~ -- kernel/time/timekeeping.c 
kernel/time/ntp.c | cut -f1 -d' ' | xargs git revert

The problem disappears.

If I then cherry-pick again the offending commit:

$ git cherry-pick 06c017f; git log -1

commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40
Author: John Stultz <john.stultz@linaro.org>
Date:   Fri Mar 22 11:37:28 2013 -0700

      timekeeping: Hold timekeepering locks in do_adjtimex and hardpps

      In moving the NTP state to be protected by the timekeeping locks,
      be sure to acquire the timekeeping locks prior to calling
      ntp functions.

      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>

I get the following deadlock report:

================================ cut ===============================

=================================
[ INFO: inconsistent lock state ]
3.10.0-00018-gd915798 #3 Not tainted
---------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
a.out/574 [HC0[0]:SC0[0]:HE1:SE1] takes:
  (timekeeper_lock){?.-...}, at: [<c0049d00>] do_adjtimex+0x94/0xf4
{IN-HARDIRQ-W} state was registered at:
   [<c00548fc>] __lock_acquire+0xabc/0x1bb8
   [<c0055ffc>] lock_acquire+0xa8/0x15c
   [<c0432c1c>] _raw_spin_lock_irqsave+0x50/0x64
   [<c0049090>] do_timer+0x2c/0xa54
   [<c004dfa8>] tick_periodic+0x74/0x9c
   [<c004dfe8>] tick_handle_periodic+0x18/0x7c
   [<c0012da8>] orion_timer_interrupt+0x24/0x34
   [<c00693b8>] handle_irq_event_percpu+0x5c/0x300
   [<c0069698>] handle_irq_event+0x3c/0x5c
   [<c006b920>] handle_level_irq+0x8c/0xe8
   [<c0068d30>] generic_handle_irq+0x28/0x44
   [<c000951c>] handle_IRQ+0x30/0x84
   [<c04338b8>] __irq_svc+0x38/0xa0
   [<c0617824>] calibrate_delay+0x350/0x4e4
   [<c05e26e0>] start_kernel+0x23c/0x2c4
   [<0000803c>] 0x803c
irq event stamp: 2840
hardirqs last  enabled at (2839): [<c0008ccc>] no_work_pending+0x8/0x28
hardirqs last disabled at (2840): [<c0432bec>] 
_raw_spin_lock_irqsave+0x20/0x64
softirqs last  enabled at (2098): [<c039e908>] rpc_wake_up_first+0x6c/0x15c
softirqs last disabled at (2096): [<c0432e58>] _raw_spin_lock_bh+0x14/0x54

other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(timekeeper_lock);
   <Interrupt>
     lock(timekeeper_lock);

  *** DEADLOCK ***

1 lock held by a.out/574:
  #0:  (timekeeper_lock){?.-...}, at: [<c0049d00>] do_adjtimex+0x94/0xf4

stack backtrace:
CPU: 0 PID: 574 Comm: a.out Not tainted 3.10.0-00018-gd915798 #3
[<c000d67c>] (unwind_backtrace+0x0/0xf0) from [<c000b530>] 
(show_stack+0x10/0x14)
[<c000b530>] (show_stack+0x10/0x14) from [<c042c274>] 
(print_usage_bug.part.27+0x218/0x280)
[<c042c274>] (print_usage_bug.part.27+0x218/0x280) from [<c005281c>] 
(mark_lock+0x538/0x6bc)
[<c005281c>] (mark_lock+0x538/0x6bc) from [<c0052a30>] 
(mark_held_locks+0x90/0x124)
[<c0052a30>] (mark_held_locks+0x90/0x124) from [<c0052b6c>] 
(trace_hardirqs_on_caller+0xa8/0x23c)
[<c0052b6c>] (trace_hardirqs_on_caller+0xa8/0x23c) from [<c0433390>] 
(_raw_spin_unlock_irq+0x24/0x5c)
[<c0433390>] (_raw_spin_unlock_irq+0x24/0x5c) from [<c004a490>] 
(__do_adjtimex+0xf0/0x580)
[<c004a490>] (__do_adjtimex+0xf0/0x580) from [<c0049d20>] 
(do_adjtimex+0xb4/0xf4)
[<c0049d20>] (do_adjtimex+0xb4/0xf4) from [<c001cf38>] 
(SyS_adjtimex+0x50/0xa8)
[<c001cf38>] (SyS_adjtimex+0x50/0xa8) from [<c0008c40>] 
(ret_fast_syscall+0x0/0x44)

================================ cut ===============================

And as soon as I also cherry-pick (notice there is another commit in 
between, which seems not to be relevant on this matter):

$ git cherry-pick a076b2146fabb0894cae5e0189a8ba3f1502d737; git show

commit a076b2146fabb0894cae5e0189a8ba3f1502d737
Author: John Stultz <john.stultz@linaro.org>
Date:   Fri Mar 22 11:52:03 2013 -0700

     ntp: Remove ntp_lock, using the timekeeping locks to protect ntp state

     In order to properly handle the NTP state in future changes to the
     timekeeping lock management, this patch moves the management of
     all of the ntp state under the timekeeping locks.

     This allows us to remove the ntp_lock.

     Cc: Thomas Gleixner <tglx@linutronix.de>
     Cc: Richard Cochran <richardcochran@gmail.com>
     Cc: Prarit Bhargava <prarit@redhat.com>
     Signed-off-by: John Stultz <john.stultz@linaro.org>


I end up in the situation where the system hangs completely and NO 
deadlock report whatsoever is output.

So it looks like 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40 introduces the 
deadlock, while a076b2146fabb0894cae5e0189a8ba3f1502d737 cares to hide 
the report.

Notice how I tested the above on an ARM board; on PowerPC I get similar 
results, although I am not able to see the deadlock report under any 
circumstances (enabling CONFIG_PROVE_LOCKING, which is the flag that 
triggers the deadlock report, causes the kernel to hang at startup even 
on a vanilla 3.10 kernel).

John, could you please confirm whether you're at least able to reproduce 
it somehow?

Thank you,
Gerlando

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: kernel deadlock
  2013-09-03 14:57         ` Gerlando Falauto
@ 2013-09-03 17:26           ` John Stultz
  2013-09-04  8:11             ` Gerlando Falauto
  2013-09-09 10:08             ` Peter Zijlstra
  0 siblings, 2 replies; 19+ messages in thread
From: John Stultz @ 2013-09-03 17:26 UTC (permalink / raw)
  To: Gerlando Falauto
  Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, Richard Cochran,
	Prarit Bhargava, Brunck, Holger, Longchamp, Valentin,
	Bigler, Stefan, Peter Zijlstra

On 09/03/2013 07:57 AM, Gerlando Falauto wrote:
> Hi,
>
> I tried again from scratch, so let me recap the whole situation, so we
> can all view it from the same standpoint. This should make the problem
> easier to see and reproduce.
>
> I can confirm that running a stock 3.10 kernel with HRTICK enabled:
[snip]
> makes the following program (and the whole board, as a matter of fact)
> hang with no further notice:
>
[snip]
> If I then revert everything up to (and including) the offending commit
> (mind the '~'):
>
> $ git log --oneline ...06c017f~ -- kernel/time/timekeeping.c
> kernel/time/ntp.c | cut -f1 -d' ' | xargs git revert
>
> The problem disappears.
>
> If I then cherry-pick again the offending commit:
>
> $ git cherry-pick 06c017f; git log -1
>
> commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40
> Author: John Stultz <john.stultz@linaro.org>
> Date:   Fri Mar 22 11:37:28 2013 -0700
>
>      timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
>
[snip]
> And as soon as I also cherry-pick (notice there is another commit in
> between, which seems not to be relevant on this matter):
>
> $ git cherry-pick a076b2146fabb0894cae5e0189a8ba3f1502d737; git show
>
> commit a076b2146fabb0894cae5e0189a8ba3f1502d737
> Author: John Stultz <john.stultz@linaro.org>
> Date:   Fri Mar 22 11:52:03 2013 -0700
>
>     ntp: Remove ntp_lock, using the timekeeping locks to protect ntp
> state
>
[snip]
>
> I end up in the situation where the system hangs completely and NO
> deadlock report whatsoever is output.
>
> So it looks like 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40 introduces
> the deadlock, while a076b2146fabb0894cae5e0189a8ba3f1502d737 cares to
> hide the report.
>
> Notice how I tested the above on an ARM board; on PowerPC I get
> similar results, although I am not able to see the deadlock report
> under any circumstances (enabling CONFIG_PROVE_LOCKING, which is the
> flag that triggers the deadlock report, causes the kernel to hang at
> startup even on a vanilla 3.10 kernel).
>
> John, could you please confirm whether you're at least able to
> reproduce it somehow?

Thanks again for the detailed notes. I've so far been unable to
reproduce this, but I'm still looking at it.

Enabling the SCHED_FEAT(HRTICK, true) bit tends to cause lots of issues
on the various hardware I have, tripping the lockdep warnings on various
other issues:

[    4.224487] ======================================================
[    4.230987] [ INFO: possible circular locking dependency detected ]
[    4.237579] 3.9.0-rc2-00148-g06c017f-dirty #8 Not tainted
[    4.243255] -------------------------------------------------------
[    4.249847] kworker/0:1H/1329 is trying to acquire lock:
[    4.255432]  (&p->pi_lock){-.-.-.}, at: [<c006d184>]
try_to_wake_up+0x30/0x3c
[    4.263092]
[    4.263092] but task is already holding lock:
[    4.269226]  (&rq->lock){-.-.-.}, at: [<c0721f18>] __schedule+0xb8/0x7d0
[    4.276306]
[    4.276306] which lock already depends on the new lock.

and

[    2.360591] =============================================
[    2.363072] [ INFO: possible recursive locking detected ]
[    2.364882] 3.9.0-rc2+ #9 Not tainted
[    2.365486] ---------------------------------------------
[    2.366345] blkid/999 is trying to acquire lock:
[    2.367092]  (&p->pi_lock){-.-.-.}, at: [<ffffffff810c763c>]
try_to_wake_up+0x2c/0x300
[    2.367275]
[    2.367275] but task is already holding lock:
[    2.367275]  (&p->pi_lock){-.-.-.}, at: [<ffffffff810c763c>]
try_to_wake_up+0x2c/0x300
[    2.367275]
[    2.367275] other info that might help us debug this:
[    2.367275]  Possible unsafe locking scenario:
[    2.367275]

These warnings disable lockdep, so it may just be I'm not getting a
chance to see the warning you describe.


That said, the one system that doesn't throw those lockdep warnings,
doesn't seem to hit your lockdep warning either, and I've not been able
to trigger any hang using the test code you provided.

Just in case there's something else here at play, could you send me your
.config that you're using?

thanks
-john


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kernel deadlock
  2013-09-03 17:26           ` John Stultz
@ 2013-09-04  8:11             ` Gerlando Falauto
  2013-09-09 20:29               ` John Stultz
  2013-09-09 10:08             ` Peter Zijlstra
  1 sibling, 1 reply; 19+ messages in thread
From: Gerlando Falauto @ 2013-09-04  8:11 UTC (permalink / raw)
  To: John Stultz
  Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, Richard Cochran,
	Prarit Bhargava, Brunck, Holger, Longchamp, Valentin,
	Bigler, Stefan, Peter Zijlstra

Hi John,

On 09/03/2013 07:26 PM, John Stultz wrote:
> On 09/03/2013 07:57 AM, Gerlando Falauto wrote:
>> Hi,
>>
>> I tried again from scratch, so let me recap the whole situation, so we
>> can all view it from the same standpoint. This should make the problem
>> easier to see and reproduce.
>>
>> I can confirm that running a stock 3.10 kernel with HRTICK enabled:
> [snip]
>> makes the following program (and the whole board, as a matter of fact)
>> hang with no further notice:
>>
> [snip]
>> If I then revert everything up to (and including) the offending commit
>> (mind the '~'):
>>
>> $ git log --oneline ...06c017f~ -- kernel/time/timekeeping.c
>> kernel/time/ntp.c | cut -f1 -d' ' | xargs git revert
>>
>> The problem disappears.
>>
>> If I then cherry-pick again the offending commit:
>>
>> $ git cherry-pick 06c017f; git log -1
>>
>> commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40
>> Author: John Stultz <john.stultz@linaro.org>
>> Date:   Fri Mar 22 11:37:28 2013 -0700
>>
>>       timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
>>
> [snip]
>> And as soon as I also cherry-pick (notice there is another commit in
>> between, which seems not to be relevant on this matter):
>>
>> $ git cherry-pick a076b2146fabb0894cae5e0189a8ba3f1502d737; git show
>>
>> commit a076b2146fabb0894cae5e0189a8ba3f1502d737
>> Author: John Stultz <john.stultz@linaro.org>
>> Date:   Fri Mar 22 11:52:03 2013 -0700
>>
>>      ntp: Remove ntp_lock, using the timekeeping locks to protect ntp
>> state
>>
> [snip]
>>
>> I end up in the situation where the system hangs completely and NO
>> deadlock report whatsoever is output.
>>
>> So it looks like 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40 introduces
>> the deadlock, while a076b2146fabb0894cae5e0189a8ba3f1502d737 cares to
>> hide the report.
>>
>> Notice how I tested the above on an ARM board; on PowerPC I get
>> similar results, although I am not able to see the deadlock report
>> under any circumstances (enabling CONFIG_PROVE_LOCKING, which is the
>> flag that triggers the deadlock report, causes the kernel to hang at
>> startup even on a vanilla 3.10 kernel).
>>
>> John, could you please confirm whether you're at least able to
>> reproduce it somehow?
>
> Thanks again for the detailed notes. I've so far been unable to
> reproduce this, but I'm still looking at it.
>
> Enabling the SCHED_FEAT(HRTICK, true) bit tends to cause lots of issues
> on the various hardware I have, tripping the lockdep warnings on various
> other issues:
>
> [    4.224487] ======================================================
> [    4.230987] [ INFO: possible circular locking dependency detected ]
> [    4.237579] 3.9.0-rc2-00148-g06c017f-dirty #8 Not tainted
> [    4.243255] -------------------------------------------------------
> [    4.249847] kworker/0:1H/1329 is trying to acquire lock:
> [    4.255432]  (&p->pi_lock){-.-.-.}, at: [<c006d184>]
> try_to_wake_up+0x30/0x3c
> [    4.263092]
> [    4.263092] but task is already holding lock:
> [    4.269226]  (&rq->lock){-.-.-.}, at: [<c0721f18>] __schedule+0xb8/0x7d0
> [    4.276306]
> [    4.276306] which lock already depends on the new lock.
>
> and
>
> [    2.360591] =============================================
> [    2.363072] [ INFO: possible recursive locking detected ]
> [    2.364882] 3.9.0-rc2+ #9 Not tainted
> [    2.365486] ---------------------------------------------
> [    2.366345] blkid/999 is trying to acquire lock:
> [    2.367092]  (&p->pi_lock){-.-.-.}, at: [<ffffffff810c763c>]
> try_to_wake_up+0x2c/0x300
> [    2.367275]
> [    2.367275] but task is already holding lock:
> [    2.367275]  (&p->pi_lock){-.-.-.}, at: [<ffffffff810c763c>]
> try_to_wake_up+0x2c/0x300
> [    2.367275]
> [    2.367275] other info that might help us debug this:
> [    2.367275]  Possible unsafe locking scenario:
> [    2.367275]
>
> These warnings disable lockdep, so it may just be I'm not getting a
> chance to see the warning you describe.
>
>
> That said, the one system that doesn't throw those lockdep warnings,
> doesn't seem to hit your lockdep warning either, and I've not been able
> to trigger any hang using the test code you provided.
>

So by the sounds of it, it is more of an issue with HRTICK than with 
your changes...

 > Just in case there's something else here at play, could you send me your
 > .config that you're using?
 >

Yes, it's definitely possible that there's something else.
Even though my rate of failure (2 out of 2 boards tested) had me 
thinking this problem would have been quite common instead...

Here is my km_kirkwood_defconfig (hope that's the right way to send it):

# CONFIG_ARM_PATCH_PHYS_VIRT is not set
CONFIG_PHYS_OFFSET=0x0
CONFIG_SYSVIPC=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_LOG_BUF_SHIFT=19
CONFIG_CGROUPS=y
CONFIG_CGROUP_SCHED=y
CONFIG_NAMESPACES=y
CONFIG_EMBEDDED=y
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
CONFIG_KPROBES=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_BLK_DEV_BSG is not set
CONFIG_ARCH_KIRKWOOD=y
CONFIG_MACH_D2NET_V2=y
CONFIG_MACH_DOCKSTAR=y
CONFIG_MACH_ESATA_SHEEVAPLUG=y
CONFIG_MACH_GURUPLUG=y
CONFIG_MACH_INETSPACE_V2=y
CONFIG_MACH_MV88F6281GTW_GE=y
CONFIG_MACH_NET2BIG_V2=y
CONFIG_MACH_NET5BIG_V2=y
CONFIG_MACH_NETSPACE_MAX_V2=y
CONFIG_MACH_NETSPACE_V2=y
CONFIG_MACH_OPENRD_BASE=y
CONFIG_MACH_OPENRD_CLIENT=y
CONFIG_MACH_OPENRD_ULTIMATE=y
CONFIG_MACH_RD88F6192_NAS=y
CONFIG_MACH_RD88F6281=y
CONFIG_MACH_SHEEVAPLUG=y
CONFIG_MACH_T5325=y
CONFIG_MACH_TS219=y
CONFIG_MACH_TS41X=y
CONFIG_MACH_DLINK_KIRKWOOD_DT=y
CONFIG_MACH_DOCKSTAR_DT=y
CONFIG_MACH_DREAMPLUG_DT=y
CONFIG_MACH_GOFLEXNET_DT=y
CONFIG_MACH_IB62X0_DT=y
CONFIG_MACH_ICONNECT_DT=y
CONFIG_MACH_INETSPACE_V2_DT=y
CONFIG_MACH_IOMEGA_IX2_200_DT=y
CONFIG_MACH_KM_KIRKWOOD_DT=y
CONFIG_MACH_LSXL_DT=y
CONFIG_MACH_MPLCEC4_DT=y
CONFIG_MACH_NETSPACE_LITE_V2_DT=y
CONFIG_MACH_NETSPACE_MAX_V2_DT=y
CONFIG_MACH_NETSPACE_MINI_V2_DT=y
CONFIG_MACH_NETSPACE_V2_DT=y
CONFIG_MACH_OPENBLOCKS_A6_DT=y
CONFIG_MACH_TOPKICK_DT=y
CONFIG_MACH_TS219_DT=y
# CONFIG_CPU_FEROCEON_OLD_ID is not set
CONFIG_CACHE_FEROCEON_L2_WRITETHROUGH=y
CONFIG_PREEMPT=y
CONFIG_AEABI=y
# CONFIG_OABI_COMPAT is not set
CONFIG_ZBOOT_ROM_TEXT=0x0
CONFIG_ZBOOT_ROM_BSS=0x0
CONFIG_CPU_IDLE=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_IP_PNP_BOOTP=y
# CONFIG_IPV6 is not set
CONFIG_NET_PKTGEN=m
CONFIG_CFG80211=y
CONFIG_MAC80211=y
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_MTD=y
CONFIG_MTD_CMDLINE_PARTS=y
CONFIG_MTD_BLOCK=y
CONFIG_MTD_CFI=y
CONFIG_MTD_JEDECPROBE=y
CONFIG_MTD_CFI_ADV_OPTIONS=y
CONFIG_MTD_CFI_GEOMETRY=y
# CONFIG_MTD_MAP_BANK_WIDTH_4 is not set
CONFIG_MTD_CFI_INTELEXT=y
CONFIG_MTD_CFI_STAA=y
CONFIG_MTD_PHYSMAP=y
CONFIG_MTD_M25P80=y
CONFIG_MTD_NAND=y
CONFIG_MTD_NAND_ORION=y
CONFIG_BLK_DEV_LOOP=y
# CONFIG_SCSI_PROC_FS is not set
CONFIG_BLK_DEV_SD=y
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=m
CONFIG_ATA=y
CONFIG_SATA_AHCI=y
CONFIG_SATA_MV=y
CONFIG_NETDEVICES=y
CONFIG_MII=y
CONFIG_NET_DSA_MV88E6123_61_65=y
CONFIG_MV643XX_ETH=y
CONFIG_MARVELL_PHY=y
CONFIG_LIBERTAS=y
CONFIG_LIBERTAS_SDIO=y
CONFIG_INPUT_EVDEV=y
CONFIG_KEYBOARD_GPIO=y
# CONFIG_INPUT_MOUSE is not set
CONFIG_LEGACY_PTY_COUNT=16
# CONFIG_DEVKMEM is not set
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_RUNTIME_UARTS=2
CONFIG_SERIAL_OF_PLATFORM=y
# CONFIG_HW_RANDOM is not set
CONFIG_I2C=y
# CONFIG_I2C_COMPAT is not set
CONFIG_I2C_CHARDEV=y
CONFIG_I2C_MV64XXX=y
CONFIG_SPI=y
CONFIG_SPI_ORION=y
CONFIG_GPIO_SYSFS=y
# CONFIG_HWMON is not set
CONFIG_WATCHDOG=y
CONFIG_ORION_WATCHDOG=y
CONFIG_HID_DRAGONRISE=y
CONFIG_HID_GYRATION=y
CONFIG_HID_TWINHAN=y
CONFIG_HID_NTRIG=y
CONFIG_HID_PANTHERLORD=y
CONFIG_HID_PETALYNX=y
CONFIG_HID_SAMSUNG=y
CONFIG_HID_SONY=y
CONFIG_HID_SUNPLUS=y
CONFIG_HID_GREENASIA=y
CONFIG_HID_SMARTJOYPLUS=y
CONFIG_HID_TOPSEED=y
CONFIG_HID_THRUSTMASTER=y
CONFIG_HID_ZEROPLUS=y
CONFIG_USB=y
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_PRINTER=m
CONFIG_USB_STORAGE=y
CONFIG_USB_STORAGE_DATAFAB=y
CONFIG_USB_STORAGE_FREECOM=y
CONFIG_USB_STORAGE_SDDR09=y
CONFIG_USB_STORAGE_SDDR55=y
CONFIG_USB_STORAGE_JUMPSHOT=y
CONFIG_MMC=y
CONFIG_SDIO_UART=y
CONFIG_MMC_MVSDIO=y
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_GPIO=y
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=y
CONFIG_LEDS_TRIGGER_HEARTBEAT=y
CONFIG_LEDS_TRIGGER_DEFAULT_ON=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_S35390A=y
CONFIG_RTC_DRV_MV=y
CONFIG_DMADEVICES=y
CONFIG_MV_XOR=y
CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_ISO9660_FS=m
CONFIG_JOLIET=y
CONFIG_UDF_FS=m
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_TMPFS=y
CONFIG_JFFS2_FS=y
CONFIG_CRAMFS=y
CONFIG_NFS_FS=y
CONFIG_ROOT_NFS=y
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_CODEPAGE_850=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_2=y
CONFIG_NLS_UTF8=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_FS=y
# CONFIG_SCHED_DEBUG is not set
# CONFIG_DEBUG_PREEMPT is not set
CONFIG_DEBUG_INFO=y
# CONFIG_FTRACE is not set
CONFIG_DEBUG_USER=y
CONFIG_DEBUG_LL=y
CONFIG_CRYPTO_CBC=m
CONFIG_CRYPTO_PCBC=m
# CONFIG_CRYPTO_ANSI_CPRNG is not set
CONFIG_CRYPTO_DEV_MV_CESA=y
CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
CONFIG_LIBCRC32C=y

What kind of systems are you using? ARM boards right?
I guess it would be nice for the two of us to work on the same board.
I could try reproducing it on a TK71 or on a Dreamplug, though that's 
going to take quite some time.
Before I even try, I'd like to know whether you would have either of 
those available.
As I said, we might be looking at an issue that's caused by something 
completely unrelated (which is not even turned on by default, and has 
several other issues too). So I'm not sure we should keep bothering.

Thanks,
Gerlando

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kernel deadlock
  2013-09-04  8:11             ` Gerlando Falauto
@ 2013-09-09 20:29               ` John Stultz
  2013-09-10  7:29                 ` Ingo Molnar
                                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: John Stultz @ 2013-09-09 20:29 UTC (permalink / raw)
  To: Gerlando Falauto
  Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, Richard Cochran,
	Prarit Bhargava, Brunck, Holger, Longchamp, Valentin,
	Bigler, Stefan, Peter Zijlstra, Ingo Molnar

On 09/04/2013 01:11 AM, Gerlando Falauto wrote:
> Hi John,
>
> On 09/03/2013 07:26 PM, John Stultz wrote:
>> On 09/03/2013 07:57 AM, Gerlando Falauto wrote:
>>> Hi,
>>>
>>> I tried again from scratch, so let me recap the whole situation, so we
>>> can all view it from the same standpoint. This should make the problem
>>> easier to see and reproduce.
>>>
>>> I can confirm that running a stock 3.10 kernel with HRTICK enabled:
>> [snip]
>>> makes the following program (and the whole board, as a matter of fact)
>>> hang with no further notice:
>>>
>> [snip]
>>> If I then revert everything up to (and including) the offending commit
>>> (mind the '~'):
>>>
>>> $ git log --oneline ...06c017f~ -- kernel/time/timekeeping.c
>>> kernel/time/ntp.c | cut -f1 -d' ' | xargs git revert
>>>
>>> The problem disappears.
>>>
>>> If I then cherry-pick again the offending commit:
>>>
>>> $ git cherry-pick 06c017f; git log -1
>>>
>>> commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40
>>> Author: John Stultz <john.stultz@linaro.org>
>>> Date:   Fri Mar 22 11:37:28 2013 -0700
>>>
>>>       timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
>>>
>> [snip]
>>> And as soon as I also cherry-pick (notice there is another commit in
>>> between, which seems not to be relevant on this matter):
>>>
>>> $ git cherry-pick a076b2146fabb0894cae5e0189a8ba3f1502d737; git show
>>>
>>> commit a076b2146fabb0894cae5e0189a8ba3f1502d737
>>> Author: John Stultz <john.stultz@linaro.org>
>>> Date:   Fri Mar 22 11:52:03 2013 -0700
>>>
>>>      ntp: Remove ntp_lock, using the timekeeping locks to protect ntp
>>> state
>>>
>> [snip]
>>>
>>> I end up in the situation where the system hangs completely and NO
>>> deadlock report whatsoever is output.
>>>
>>> So it looks like 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40 introduces
>>> the deadlock, while a076b2146fabb0894cae5e0189a8ba3f1502d737 cares to
>>> hide the report.
>>>
>>> Notice how I tested the above on an ARM board; on PowerPC I get
>>> similar results, although I am not able to see the deadlock report
>>> under any circumstances (enabling CONFIG_PROVE_LOCKING, which is the
>>> flag that triggers the deadlock report, causes the kernel to hang at
>>> startup even on a vanilla 3.10 kernel).
>>>
>>> John, could you please confirm whether you're at least able to
>>> reproduce it somehow?
>>
>> Thanks again for the detailed notes. I've so far been unable to
>> reproduce this, but I'm still looking at it.

So I think I've managed to finally reproduce this and hunt it down.

With Peter's "sched: Fix HRTICK" patch and HRTICK enabled, I found I
could trigger a hard hang at boot on my x86_64 kvm system. sysrq didn't
function, so I checked out info cpus and that pointed to both cpus being
in ktime_get() and ktime_get_update_offsets(), which suggested a
seqcount deadlock (basically calling something that reads the seqlock
while we hold the write on it).

Unfortunately the seqlock/seqcount infrastructure doesn't support
lockdep, so I added some debug code to take and release the
timekeeper_lock in every function that does a read on the timekeeper_seq.

This popped up with:
[   15.846386] ======================================================
[   15.847174] [ INFO: possible circular locking dependency detected ]
[   15.847959] 3.10.0+ #30 Not tainted
[   15.848402] -------------------------------------------------------
[   15.849182] ntpd/4062 is trying to acquire lock:
[   15.849765]  (&(&pool->lock)->rlock){..-...}, at:
[<ffffffff810aa9b5>] __queue_work+0x145/0x480
[   15.850051]
[   15.850051] but task is already holding lock:
[   15.850051]  (timekeeper_lock){-.-.-.}, at: [<ffffffff810df6df>]
do_adjtimex+0x7f/0x100
[   15.850051]
[   15.850051] which lock already depends on the new lock.
[   15.850051]
[   15.850051]
[   15.850051] the existing dependency chain (in reverse order) is:
[   15.850051]
-> #2 (timekeeper_lock){-.-.-.}:
[   15.850051]        [<ffffffff810eee7e>] lock_acquire+0x9e/0x1e0
[   15.850051]        [<ffffffff81900983>] _raw_spin_lock_irqsave+0x53/0x90
[   15.850051]        [<ffffffff810dde62>] ktime_get+0x32/0x110
[   15.850051]        [<ffffffff810c0cd0>] hrtick_start+0x20/0x90
[   15.850051]        [<ffffffff810c8fb3>] hrtick_start_fair+0x83/0xd0
[   15.850051]        [<ffffffff810c94ce>] hrtick_update+0x5e/0x60
[   15.850051]        [<ffffffff810cc5a1>] enqueue_task_fair+0x961/0xfd0
[   15.850051]        [<ffffffff810c0c11>] enqueue_task+0x61/0x70
[   15.850051]        [<ffffffff810c101e>] activate_task+0x1e/0x20
[   15.850051]        [<ffffffff818fedf9>] __schedule+0x799/0xa10
[   15.850051]        [<ffffffff818ff094>] schedule+0x24/0x70
[   15.850051]        [<ffffffff818fc84c>] schedule_timeout+0x16c/0x2c0
[   15.850051]        [<ffffffff818fe60d>] io_schedule_timeout+0x9d/0xf0
[   15.850051]        [<ffffffff818ff74f>] wait_for_completion_io+0xaf/0x120
[   15.850051]        [<ffffffff813468ec>] blk_execute_rq+0x8c/0xe0
[   15.850051]        [<ffffffff8157b23f>] scsi_execute+0xdf/0x170
[   15.850051]        [<ffffffff8157b365>] scsi_execute_req_flags+0x95/0x110
[   15.850051]        [<ffffffff8158c3b4>] sr_check_events+0xa4/0x2e0
[   15.850051]        [<ffffffff815e0393>] cdrom_check_events+0x13/0x30
[   15.850051]        [<ffffffff8158bde4>] sr_block_check_events+0x34/0x50
[   15.850051]        [<ffffffff81349dd7>] disk_check_events+0x57/0x180
[   15.850051]        [<ffffffff81349f17>] disk_events_workfn+0x17/0x20
[   15.850051]        [<ffffffff810ab777>] process_one_work+0x1e7/0x660
[   15.850051]        [<ffffffff810ac069>] worker_thread+0x119/0x370
[   15.850051]        [<ffffffff810b4376>] kthread+0xd6/0xe0
[   15.850051]        [<ffffffff81901fac>] ret_from_fork+0x7c/0xb0
[   15.850051]
-> #1 (&p->pi_lock){-.-.-.}:
[   15.850051]        [<ffffffff810eee7e>] lock_acquire+0x9e/0x1e0
[   15.850051]        [<ffffffff81900983>] _raw_spin_lock_irqsave+0x53/0x90
[   15.850051]        [<ffffffff810c436c>] try_to_wake_up+0x2c/0x300
[   15.850051]        [<ffffffff810c467e>] wake_up_process+0x1e/0x40
[   15.850051]        [<ffffffff810a89e2>] start_worker+0x22/0x30
[   15.850051]        [<ffffffff810a9d98>] create_and_start_worker+0x48/0x90
[   15.850051]        [<ffffffff820d89a8>] init_workqueues+0x1b8/0x380
[   15.850051]        [<ffffffff820c0e94>] do_one_initcall+0x7b/0x114
[   15.850051]        [<ffffffff820c0fc6>] kernel_init_freeable+0x99/0x1c3
[   15.850051]        [<ffffffff818e2b59>] kernel_init+0x9/0xf0
[   15.850051]        [<ffffffff81901fac>] ret_from_fork+0x7c/0xb0
[   15.850051]
-> #0 (&(&pool->lock)->rlock){..-...}:
[   15.850051]        [<ffffffff810ed20e>] __lock_acquire+0x1dce/0x1ee0
[   15.850051]        [<ffffffff810eee7e>] lock_acquire+0x9e/0x1e0
[   15.850051]        [<ffffffff819008f1>] _raw_spin_lock+0x41/0x80
[   15.850051]        [<ffffffff810aa9b5>] __queue_work+0x145/0x480
[   15.850051]        [<ffffffff810aadba>] __queue_delayed_work+0xaa/0x1a0
[   15.850051]        [<ffffffff810ac4de>] queue_delayed_work_on+0x8e/0xa0
[   15.850051]        [<ffffffff810dfe68>] __do_adjtimex+0x158/0x560
[   15.850051]        [<ffffffff810df703>] do_adjtimex+0xa3/0x100
[   15.850051]        [<ffffffff81090f7b>] SYSC_adjtimex+0x3b/0x80
[   15.850051]        [<ffffffff810912b9>] SyS_adjtimex+0x9/0x10
[   15.850051]        [<ffffffff81902264>] tracesys+0xdd/0xe2
[   15.850051]
[   15.850051] other info that might help us debug this:
[   15.850051]
[   15.850051] Chain exists of:
  &(&pool->lock)->rlock --> &p->pi_lock --> timekeeper_lock
[   15.850051]  Possible unsafe locking scenario:
[   15.850051]
[   15.850051]        CPU0                    CPU1
[   15.850051]        ----                    ----
[   15.850051]   lock(timekeeper_lock);
[   15.850051]                                lock(&p->pi_lock);
[   15.850051]                                lock(timekeeper_lock);
[   15.850051]   lock(&(&pool->lock)->rlock);
[   15.850051]
[   15.850051]  *** DEADLOCK ***
[   15.850051]
[   15.850051] 1 lock held by ntpd/4062:
[   15.850051]  #0:  (timekeeper_lock){-.-.-.}, at: [<ffffffff810df6df>]
do_adjtimex+0x7f/0x100
[   15.850051]
[   15.850051] stack backtrace:
[   15.850051] CPU: 0 PID: 4062 Comm: ntpd Not tainted 3.10.0+ #30
[   15.850051] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   15.850051]  ffffffff82354260 ffff880000239b78 ffffffff818f6cfe
ffff880000239bb8
[   15.850051]  ffffffff818f2f51 ffff880000239c40 ffff88000002ea78
ffff88000002ea40
[   15.850051]  ffff88000002e310 0000000000000000 0000000000000001
ffff880000239ca8
[   15.850051] Call Trace:
[   15.850051]  [<ffffffff818f6cfe>] dump_stack+0x19/0x1b
[   15.850051]  [<ffffffff818f2f51>] print_circular_bug+0x1f9/0x208
[   15.850051]  [<ffffffff810ed20e>] __lock_acquire+0x1dce/0x1ee0
[   15.850051]  [<ffffffff810e9c2d>] ? trace_hardirqs_off+0xd/0x10
[   15.850051]  [<ffffffff810c6fad>] ? sched_clock_local+0x1d/0x90
[   15.850051]  [<ffffffff810eee7e>] lock_acquire+0x9e/0x1e0
[   15.850051]  [<ffffffff810aa9b5>] ? __queue_work+0x145/0x480
[   15.850051]  [<ffffffff819008f1>] _raw_spin_lock+0x41/0x80
[   15.850051]  [<ffffffff810aa9b5>] ? __queue_work+0x145/0x480
[   15.850051]  [<ffffffff810aa9b5>] __queue_work+0x145/0x480
[   15.850051]  [<ffffffff810aadba>] __queue_delayed_work+0xaa/0x1a0
[   15.850051]  [<ffffffff810ac4de>] queue_delayed_work_on+0x8e/0xa0
[   15.850051]  [<ffffffff810dfe68>] __do_adjtimex+0x158/0x560
[   15.850051]  [<ffffffff810df703>] do_adjtimex+0xa3/0x100
[   15.850051]  [<ffffffff81090f7b>] SYSC_adjtimex+0x3b/0x80
[   15.850051]  [<ffffffff810912b9>] SyS_adjtimex+0x9/0x10
[   15.850051]  [<ffffffff81902264>] tracesys+0xdd/0xe2



So yea, thanks to the lockdep code we can see there's a three-lock deadlock.

And as you discovered, it connected to adjtimex and the timekeeping
locking changes. Specifically that we're now calling
schedule_delayed_work() under a timekeeper_seq write, and that somehow
ends up calling back to a timekeeping accessor function.

Now, I'm still in the dark as to why HRTICK exposes this, but seems like
the following patch should resolve the issue (and quiets the lockdep
warnings in my testing). Let me know how it works for you!

Ingo: This makes me think we really should have some lockdep smarts
added to seqlock/seqcount structures. Is there something already
discovered thats preventing this, or has this just not yet been tried?

thanks
-john

>From 09885df9734aa678ccb61dbef356fea4faeff16c Mon Sep 17 00:00:00 2001
From: John Stultz <john.stultz@linaro.org>
Date: Mon, 9 Sep 2013 13:07:18 -0700
Subject: [PATCH] timekeepeing: Fix HRTICK related deadlock from ntp lock
 changes

It was reported that when HRTICK is enabled, its possible to
trigger system deadlocks. These were hard to reproduce, as
HRTICK has been broken in the past, but seemed to be connected
to the timekeeping_seq lock.

Since seqlock/seqcount's aren't supported w/ lockdep, I added
some extra spinlock based locking and triggered the following
lockdep output:

[   15.849182] ntpd/4062 is trying to acquire lock:
[   15.849765]  (&(&pool->lock)->rlock){..-...}, at: [<ffffffff810aa9b5>] __queue_work+0x145/0x480
[   15.850051]
[   15.850051] but task is already holding lock:
[   15.850051]  (timekeeper_lock){-.-.-.}, at: [<ffffffff810df6df>] do_adjtimex+0x7f/0x100

<snip>

[   15.850051] Chain exists of:
  &(&pool->lock)->rlock --> &p->pi_lock --> timekeeper_lock
[   15.850051]  Possible unsafe locking scenario:
[   15.850051]
[   15.850051]        CPU0                    CPU1
[   15.850051]        ----                    ----
[   15.850051]   lock(timekeeper_lock);
[   15.850051]                                lock(&p->pi_lock);
[   15.850051]                                lock(timekeeper_lock);
[   15.850051]   lock(&(&pool->lock)->rlock);
[   15.850051]
[   15.850051]  *** DEADLOCK ***

This patch avoids this deadlock, by moving the call to
schedule_delayed_work() outside of the timekeeper lock
critical section.

Reported-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/timex.h     | 1 +
 kernel/time/ntp.c         | 6 ++----
 kernel/time/timekeeping.c | 2 ++
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/timex.h b/include/linux/timex.h
index b3726e6..dd3edd7 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -141,6 +141,7 @@ extern int do_adjtimex(struct timex *);
 extern void hardpps(const struct timespec *, const struct timespec *);
 
 int read_current_timer(unsigned long *timer_val);
+void ntp_notify_cmos_timer(void);
 
 /* The clock frequency of the i8253/i8254 PIT */
 #define PIT_TICK_RATE 1193182ul
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 8f5b3b9..bb22151 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -516,13 +516,13 @@ static void sync_cmos_clock(struct work_struct *work)
 	schedule_delayed_work(&sync_cmos_work, timespec_to_jiffies(&next));
 }
 
-static void notify_cmos_timer(void)
+void ntp_notify_cmos_timer(void)
 {
 	schedule_delayed_work(&sync_cmos_work, 0);
 }
 
 #else
-static inline void notify_cmos_timer(void) { }
+void ntp_notify_cmos_timer(void) { }
 #endif
 
 
@@ -687,8 +687,6 @@ int __do_adjtimex(struct timex *txc, struct timespec *ts, s32 *time_tai)
 	if (!(time_status & STA_NANO))
 		txc->time.tv_usec /= NSEC_PER_USEC;
 
-	notify_cmos_timer();
-
 	return result;
 }
 
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index baeeb5c..fcc261c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1682,6 +1682,8 @@ int do_adjtimex(struct timex *txc)
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
+	ntp_notify_cmos_timer();
+
 	return ret;
 }
 
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: kernel deadlock
  2013-09-09 20:29               ` John Stultz
@ 2013-09-10  7:29                 ` Ingo Molnar
  2013-09-10 16:23                   ` John Stultz
  2013-09-10  7:56                 ` Peter Zijlstra
  2013-09-10  8:59                 ` Lin Ming
  2 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2013-09-10  7:29 UTC (permalink / raw)
  To: John Stultz
  Cc: Gerlando Falauto, linux-kernel@vger.kernel.org, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Brunck, Holger,
	Longchamp, Valentin, Bigler, Stefan, Peter Zijlstra


* John Stultz <john.stultz@linaro.org> wrote:

> Now, I'm still in the dark as to why HRTICK exposes this, but seems like 
> the following patch should resolve the issue (and quiets the lockdep 
> warnings in my testing). Let me know how it works for you!
> 
> Ingo: This makes me think we really should have some lockdep smarts 
> added to seqlock/seqcount structures. Is there something already 
> discovered thats preventing this, or has this just not yet been tried?

There's no fundamental reason to not do it, it's a very good idea!

> From 09885df9734aa678ccb61dbef356fea4faeff16c Mon Sep 17 00:00:00 2001
> From: John Stultz <john.stultz@linaro.org>
> Date: Mon, 9 Sep 2013 13:07:18 -0700
> Subject: [PATCH] timekeepeing: Fix HRTICK related deadlock from ntp lock changes

Cool fix! I think I've seen this deadlock in various incarnations but was 
never able to track it down. Do you have a guesstimate about roughly when 
it was introduced? Is it an ancient bug?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kernel deadlock
  2013-09-10  7:29                 ` Ingo Molnar
@ 2013-09-10 16:23                   ` John Stultz
  0 siblings, 0 replies; 19+ messages in thread
From: John Stultz @ 2013-09-10 16:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gerlando Falauto, linux-kernel@vger.kernel.org, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Brunck, Holger,
	Longchamp, Valentin, Bigler, Stefan, Peter Zijlstra

On 09/10/2013 12:29 AM, Ingo Molnar wrote:
> * John Stultz <john.stultz@linaro.org> wrote:
>
>> Now, I'm still in the dark as to why HRTICK exposes this, but seems like 
>> the following patch should resolve the issue (and quiets the lockdep 
>> warnings in my testing). Let me know how it works for you!
>>
>> Ingo: This makes me think we really should have some lockdep smarts 
>> added to seqlock/seqcount structures. Is there something already 
>> discovered thats preventing this, or has this just not yet been tried?
> There's no fundamental reason to not do it, it's a very good idea!
>
>> From 09885df9734aa678ccb61dbef356fea4faeff16c Mon Sep 17 00:00:00 2001
>> From: John Stultz <john.stultz@linaro.org>
>> Date: Mon, 9 Sep 2013 13:07:18 -0700
>> Subject: [PATCH] timekeepeing: Fix HRTICK related deadlock from ntp lock changes
> Cool fix! I think I've seen this deadlock in various incarnations but was 
> never able to track it down. Do you have a guesstimate about roughly when 
> it was introduced? Is it an ancient bug?

3.10, I believe. So no not ancient, but we've seen and resolved similar
issues in the past (the old printk while holding xtime_lock deadlock).

Hopefully the lockdep enablement helps sort these out for good.

thanks
-john


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kernel deadlock
  2013-09-09 20:29               ` John Stultz
  2013-09-10  7:29                 ` Ingo Molnar
@ 2013-09-10  7:56                 ` Peter Zijlstra
  2013-09-10  8:59                 ` Lin Ming
  2 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2013-09-10  7:56 UTC (permalink / raw)
  To: John Stultz
  Cc: Gerlando Falauto, linux-kernel@vger.kernel.org, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Brunck, Holger,
	Longchamp, Valentin, Bigler, Stefan, Ingo Molnar

On Mon, Sep 09, 2013 at 01:29:47PM -0700, John Stultz wrote:
> Ingo: This makes me think we really should have some lockdep smarts
> added to seqlock/seqcount structures. Is there something already
> discovered thats preventing this, or has this just not yet been tried?

The only reason this hasn't been done in the past is because of the VDSO
I believe. We use a seqcount_t from userspace.

We either have to open-code the VDSO version or create a variant for
that specific single use case.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kernel deadlock
  2013-09-09 20:29               ` John Stultz
  2013-09-10  7:29                 ` Ingo Molnar
  2013-09-10  7:56                 ` Peter Zijlstra
@ 2013-09-10  8:59                 ` Lin Ming
  2013-09-10 16:38                   ` John Stultz
  2 siblings, 1 reply; 19+ messages in thread
From: Lin Ming @ 2013-09-10  8:59 UTC (permalink / raw)
  To: John Stultz
  Cc: Gerlando Falauto, linux-kernel@vger.kernel.org, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Brunck, Holger,
	Longchamp, Valentin, Bigler, Stefan, Peter Zijlstra, Ingo Molnar

 On Tue, Sep 10, 2013 at 4:29 AM, John Stultz <john.stultz@linaro.org> wrote:

[snip]

>
> So I think I've managed to finally reproduce this and hunt it down.
>
> With Peter's "sched: Fix HRTICK" patch and HRTICK enabled, I found I
> could trigger a hard hang at boot on my x86_64 kvm system. sysrq didn't
> function, so I checked out info cpus and that pointed to both cpus being

Hi,

Is "info cpus" a command of kvm/qemu? That's very helpful.
I can reproduce this bug, but there is no any output.
How did you find out that both cpus being in ktime_get() and
ktime_get_update_offsets().

> in ktime_get() and ktime_get_update_offsets(), which suggested a
> seqcount deadlock (basically calling something that reads the seqlock
> while we hold the write on it).

HRTICK enabled, then I can reproduce this simply with,

while [ 1 ] ;
    adjtimex -t 9999
done

And your patch fixed it.

Thanks,
Lin Ming

>
> Unfortunately the seqlock/seqcount infrastructure doesn't support
> lockdep, so I added some debug code to take and release the
> timekeeper_lock in every function that does a read on the timekeeper_seq.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kernel deadlock
  2013-09-10  8:59                 ` Lin Ming
@ 2013-09-10 16:38                   ` John Stultz
  2013-09-11  0:37                     ` Lin Ming
  0 siblings, 1 reply; 19+ messages in thread
From: John Stultz @ 2013-09-10 16:38 UTC (permalink / raw)
  To: Lin Ming
  Cc: Gerlando Falauto, linux-kernel@vger.kernel.org, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Brunck, Holger,
	Longchamp, Valentin, Bigler, Stefan, Peter Zijlstra, Ingo Molnar

On 09/10/2013 01:59 AM, Lin Ming wrote:
>  On Tue, Sep 10, 2013 at 4:29 AM, John Stultz <john.stultz@linaro.org> wrote:
>
> [snip]
>
>> So I think I've managed to finally reproduce this and hunt it down.
>>
>> With Peter's "sched: Fix HRTICK" patch and HRTICK enabled, I found I
>> could trigger a hard hang at boot on my x86_64 kvm system. sysrq didn't
>> function, so I checked out info cpus and that pointed to both cpus being
> Hi,
>
> Is "info cpus" a command of kvm/qemu? That's very helpful.
Yes. If you switch into the qemu monitor, you can use "info cpus" to
show the current instruction pointers

> I can reproduce this bug, but there is no any output.
> How did you find out that both cpus being in ktime_get() and
> ktime_get_update_offsets().

Once I had the instruction pointers, I ran gdb on the vmlinux and used
"list *<address>" to show where it was stuck.

>> in ktime_get() and ktime_get_update_offsets(), which suggested a
>> seqcount deadlock (basically calling something that reads the seqlock
>> while we hold the write on it).
> HRTICK enabled, then I can reproduce this simply with,
>
> while [ 1 ] ;
>     adjtimex -t 9999
> done
>
> And your patch fixed it.

Great! Can I add a Tested-by: from you on the patch?

thanks
-john


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kernel deadlock
  2013-09-10 16:38                   ` John Stultz
@ 2013-09-11  0:37                     ` Lin Ming
  0 siblings, 0 replies; 19+ messages in thread
From: Lin Ming @ 2013-09-11  0:37 UTC (permalink / raw)
  To: John Stultz
  Cc: Gerlando Falauto, linux-kernel@vger.kernel.org, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Brunck, Holger,
	Longchamp, Valentin, Bigler, Stefan, Peter Zijlstra, Ingo Molnar

On Wed, Sep 11, 2013 at 12:38 AM, John Stultz <john.stultz@linaro.org> wrote:
> On 09/10/2013 01:59 AM, Lin Ming wrote:
>>  On Tue, Sep 10, 2013 at 4:29 AM, John Stultz <john.stultz@linaro.org> wrote:
>>
>> [snip]
>>
>>> So I think I've managed to finally reproduce this and hunt it down.
>>>
>>> With Peter's "sched: Fix HRTICK" patch and HRTICK enabled, I found I
>>> could trigger a hard hang at boot on my x86_64 kvm system. sysrq didn't
>>> function, so I checked out info cpus and that pointed to both cpus being
>> Hi,
>>
>> Is "info cpus" a command of kvm/qemu? That's very helpful.
> Yes. If you switch into the qemu monitor, you can use "info cpus" to
> show the current instruction pointers
>
>> I can reproduce this bug, but there is no any output.
>> How did you find out that both cpus being in ktime_get() and
>> ktime_get_update_offsets().
>
> Once I had the instruction pointers, I ran gdb on the vmlinux and used
> "list *<address>" to show where it was stuck.
>
>>> in ktime_get() and ktime_get_update_offsets(), which suggested a
>>> seqcount deadlock (basically calling something that reads the seqlock
>>> while we hold the write on it).
>> HRTICK enabled, then I can reproduce this simply with,
>>
>> while [ 1 ] ;
>>     adjtimex -t 9999
>> done
>>
>> And your patch fixed it.
>
> Great! Can I add a Tested-by: from you on the patch?

Sure.
Tested-by: Lin Ming <minggr@gmail.com>

Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kernel deadlock
  2013-09-03 17:26           ` John Stultz
  2013-09-04  8:11             ` Gerlando Falauto
@ 2013-09-09 10:08             ` Peter Zijlstra
  2013-09-12 14:42               ` Gerlando Falauto
  1 sibling, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2013-09-09 10:08 UTC (permalink / raw)
  To: John Stultz
  Cc: Gerlando Falauto, linux-kernel@vger.kernel.org, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Brunck, Holger,
	Longchamp, Valentin, Bigler, Stefan

On Tue, Sep 03, 2013 at 10:26:19AM -0700, John Stultz wrote:
> Enabling the SCHED_FEAT(HRTICK, true) bit tends to cause lots of issues
> on the various hardware I have, tripping the lockdep warnings on various
> other issues:

Does whatever kernel you guys are running have this commit:

---
commit 971ee28cbd1ccd87b3164facd9359a534c1d2892
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Fri Jun 28 11:18:53 2013 +0200

    sched: Fix HRTICK
    
    David reported that the HRTICK sched feature was borken; which was enough
    motivation for me to finally fix it ;-)
    
    We should not allow hrtimer code to do softirq wakeups while holding scheduler
    locks. The hrtimer code only needs this when we accidentally try to program an
    expired time. We don't much care about those anyway since we have the regular
    tick to fall back to.
    
    Reported-by: David Ahern <dsahern@gmail.com>
    Tested-by: David Ahern <dsahern@gmail.com>
    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Link: http://lkml.kernel.org/r/20130628091853.GE29209@dyad.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9b1f2e5..0d8eb45 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -370,13 +370,6 @@ static struct rq *this_rq_lock(void)
 #ifdef CONFIG_SCHED_HRTICK
 /*
  * Use HR-timers to deliver accurate preemption points.
- *
- * Its all a bit involved since we cannot program an hrt while holding the
- * rq->lock. So what we do is store a state in in rq->hrtick_* and ask for a
- * reschedule event.
- *
- * When we get rescheduled we reprogram the hrtick_timer outside of the
- * rq->lock.
  */
 
 static void hrtick_clear(struct rq *rq)
@@ -404,6 +397,15 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer)
 }
 
 #ifdef CONFIG_SMP
+
+static int __hrtick_restart(struct rq *rq)
+{
+	struct hrtimer *timer = &rq->hrtick_timer;
+	ktime_t time = hrtimer_get_softexpires(timer);
+
+	return __hrtimer_start_range_ns(timer, time, 0, HRTIMER_MODE_ABS_PINNED, 0);
+}
+
 /*
  * called from hardirq (IPI) context
  */
@@ -412,7 +414,7 @@ static void __hrtick_start(void *arg)
 	struct rq *rq = arg;
 
 	raw_spin_lock(&rq->lock);
-	hrtimer_restart(&rq->hrtick_timer);
+	__hrtick_restart(rq);
 	rq->hrtick_csd_pending = 0;
 	raw_spin_unlock(&rq->lock);
 }
@@ -430,7 +432,7 @@ void hrtick_start(struct rq *rq, u64 delay)
 	hrtimer_set_expires(timer, time);
 
 	if (rq == this_rq()) {
-		hrtimer_restart(timer);
+		__hrtick_restart(rq);
 	} else if (!rq->hrtick_csd_pending) {
 		__smp_call_function_single(cpu_of(rq), &rq->hrtick_csd, 0);
 		rq->hrtick_csd_pending = 1;

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: kernel deadlock
  2013-09-09 10:08             ` Peter Zijlstra
@ 2013-09-12 14:42               ` Gerlando Falauto
  0 siblings, 0 replies; 19+ messages in thread
From: Gerlando Falauto @ 2013-09-12 14:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: John Stultz, linux-kernel@vger.kernel.org, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Brunck, Holger,
	Longchamp, Valentin, Bigler, Stefan, sboyd

Hi Peter,

sorry for the slow response...

On 09/09/2013 12:08 PM, Peter Zijlstra wrote:
> On Tue, Sep 03, 2013 at 10:26:19AM -0700, John Stultz wrote:
>> Enabling the SCHED_FEAT(HRTICK, true) bit tends to cause lots of issues
>> on the various hardware I have, tripping the lockdep warnings on various
>> other issues:
>
> Does whatever kernel you guys are running have this commit:

That didn't seem to make a difference (it had already been pointed out 
by Stephen Boyd).
In any case, latest patch from John fixes the problem... Hooray! :-)

Thanks again John!
Gerlando


>
> ---
> commit 971ee28cbd1ccd87b3164facd9359a534c1d2892
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Fri Jun 28 11:18:53 2013 +0200
>
>      sched: Fix HRTICK
>
>      David reported that the HRTICK sched feature was borken; which was enough
>      motivation for me to finally fix it ;-)
>
>      We should not allow hrtimer code to do softirq wakeups while holding scheduler
>      locks. The hrtimer code only needs this when we accidentally try to program an
>      expired time. We don't much care about those anyway since we have the regular
>      tick to fall back to.
>
>      Reported-by: David Ahern <dsahern@gmail.com>
>      Tested-by: David Ahern <dsahern@gmail.com>
>      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
>      Link: http://lkml.kernel.org/r/20130628091853.GE29209@dyad.programming.kicks-ass.net
>      Signed-off-by: Ingo Molnar <mingo@kernel.org>
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9b1f2e5..0d8eb45 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -370,13 +370,6 @@ static struct rq *this_rq_lock(void)
>   #ifdef CONFIG_SCHED_HRTICK
>   /*
>    * Use HR-timers to deliver accurate preemption points.
> - *
> - * Its all a bit involved since we cannot program an hrt while holding the
> - * rq->lock. So what we do is store a state in in rq->hrtick_* and ask for a
> - * reschedule event.
> - *
> - * When we get rescheduled we reprogram the hrtick_timer outside of the
> - * rq->lock.
>    */
>
>   static void hrtick_clear(struct rq *rq)
> @@ -404,6 +397,15 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer)
>   }
>
>   #ifdef CONFIG_SMP
> +
> +static int __hrtick_restart(struct rq *rq)
> +{
> +	struct hrtimer *timer = &rq->hrtick_timer;
> +	ktime_t time = hrtimer_get_softexpires(timer);
> +
> +	return __hrtimer_start_range_ns(timer, time, 0, HRTIMER_MODE_ABS_PINNED, 0);
> +}
> +
>   /*
>    * called from hardirq (IPI) context
>    */
> @@ -412,7 +414,7 @@ static void __hrtick_start(void *arg)
>   	struct rq *rq = arg;
>
>   	raw_spin_lock(&rq->lock);
> -	hrtimer_restart(&rq->hrtick_timer);
> +	__hrtick_restart(rq);
>   	rq->hrtick_csd_pending = 0;
>   	raw_spin_unlock(&rq->lock);
>   }
> @@ -430,7 +432,7 @@ void hrtick_start(struct rq *rq, u64 delay)
>   	hrtimer_set_expires(timer, time);
>
>   	if (rq == this_rq()) {
> -		hrtimer_restart(timer);
> +		__hrtick_restart(rq);
>   	} else if (!rq->hrtick_csd_pending) {
>   		__smp_call_function_single(cpu_of(rq), &rq->hrtick_csd, 0);
>   		rq->hrtick_csd_pending = 1;
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2013-09-12 14:42 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-07-30 20:00 Kernel deadlock Paul Douglas
     [not found] <521F6D06.1040107@keymile.com>
2013-08-29 20:56 ` kernel deadlock Falauto, Gerlando
2013-08-29 23:45   ` John Stultz
2013-08-30 23:04     ` Gerlando Falauto
2013-08-30 23:10       ` John Stultz
2013-08-31  0:48         ` Stephen Boyd
2013-08-31  8:11           ` Gerlando Falauto
2013-09-03 14:57         ` Gerlando Falauto
2013-09-03 17:26           ` John Stultz
2013-09-04  8:11             ` Gerlando Falauto
2013-09-09 20:29               ` John Stultz
2013-09-10  7:29                 ` Ingo Molnar
2013-09-10 16:23                   ` John Stultz
2013-09-10  7:56                 ` Peter Zijlstra
2013-09-10  8:59                 ` Lin Ming
2013-09-10 16:38                   ` John Stultz
2013-09-11  0:37                     ` Lin Ming
2013-09-09 10:08             ` Peter Zijlstra
2013-09-12 14:42               ` Gerlando Falauto

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).