[RT] RCU splat on 2.6.33.4-rt20

linux-rt-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RT] RCU splat on 2.6.33.4-rt20
@ 2010-05-20  2:19 Clark Williams
  2010-05-20  3:22 ` Paul E. McKenney
  0 siblings, 1 reply; 3+ messages in thread
From: Clark Williams @ 2010-05-20  2:19 UTC (permalink / raw)
  To: Thomas Gleixner, Paul E. McKenney; +Cc: RT

[-- Attachment #1: Type: text/plain, Size: 6261 bytes --]

Thomas/Paul,

Resumed my laptop (T60, 64-bit kernel) and after about five
minutes of doing nothing but surfing with firefox, got the following
RCU stall notification from a 2.6.33.4-rt20 kernel:

INFO: RCU detected CPU 1 stall (t=0 jiffies)
sending NMI to all CPUs:
NMI backtrace for cpu 1
CPU 1 
Pid: 0, comm: swapper Not tainted 2.6.33.4-rt20 #37        /       
RIP: 0010:[<ffffffff8120f188>]  [<ffffffff8120f188>]
__const_udelay+0x37/0x44 RSP: 0018:ffff88000a303dc8  EFLAGS: 00000002
RAX: 0000000001062560 RBX: 0000000000000001 RCX: ffff88000a300000
RDX: 000000001dba61d0 RSI: 0000000000000002 RDI: 0000000000418958
RBP: ffff88000a303dc8 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88000a30b8b0
R13: ffffffff81a7c800 R14: ffff88000a303f48 R15: ffff88000a30b5e0
FS:  0000000000000000(0000) GS:ffff88000a300000(0000)
knlGS:0000000000000000 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f5fedd4b000 CR3: 0000000001a56000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff8800bf1d8000, task
ffff8800bf1d67c0) Stack:
 ffff88000a303de8 ffffffff810185aa 000000000000088e ffffffff81a7c800
<0> ffff88000a303e38 ffffffff81092650 ffff8800bf1d9dc8 0000000000000001
<0> ffff88000a303e58 0000000000000001 0000000000000001 0000000000000000
Call Trace:
 <IRQ> 
 [<ffffffff810185aa>] arch_trigger_all_cpu_backtrace+0x56/0x62
 [<ffffffff81092650>] __rcu_pending+0x6e/0x2d3
 [<ffffffff81092909>] rcu_check_callbacks+0x54/0x15b
 [<ffffffff8104972e>] update_process_times+0x46/0x5b
 [<ffffffff810656ed>] tick_sched_timer+0x7a/0xa4
 [<ffffffff8105b7ca>] __run_hrtimer+0x130/0x1bd
 [<ffffffff81065673>] ? tick_sched_timer+0x0/0xa4
 [<ffffffff8105bb9f>] hrtimer_interrupt+0xdc/0x1dd
 [<ffffffff8143bcb4>] smp_apic_timer_interrupt+0x89/0x9c
 [<ffffffff81003593>] apic_timer_interrupt+0x13/0x20
 <EOI> 
 [<ffffffff81275cd8>] ? acpi_idle_enter_bm+0x2b0/0x2db
 [<ffffffff81275cce>] ? acpi_idle_enter_bm+0x2a6/0x2db
 [<ffffffff81370a67>] cpuidle_idle_call+0x96/0x11f
 [<ffffffff81001da9>] cpu_idle+0xab/0xee
 [<ffffffff81430232>] start_secondary+0x1a5/0x1a9
Code: 00 ff 00 00 48 8d 04 bd 00 00 00 00 65 8b 0c 25 78 ac 00 00 48 63
c9 48 8b 0c cd 60 23 af 81 48 69 94 0a 98 00 00 00 fa 00 00 00 <f7> e2
48 8d 7a 01 e8 ad ff ff ff c9 c3 55 48 89 e5 0f 1f 44 00 Call Trace:
<#MC>  <<EOE>> Pid: 0, comm: swapper Not tainted 2.6.33.4-rt20 #37 Call
Trace: <NMI>  [<ffffffff81001a9c>] ? show_regs+0x2b/0x30
 [<ffffffff814380bd>] nmi_watchdog_tick+0xc2/0x1a6
 [<ffffffff814376c3>] do_nmi+0xb5/0x25b
 [<ffffffff814371aa>] nmi+0x1a/0x20
 [<ffffffff8120f188>] ? __const_udelay+0x37/0x44
 <<EOE>>  <IRQ>  [<ffffffff810185aa>]
arch_trigger_all_cpu_backtrace+0x56/0x62 [<ffffffff81092650>]
__rcu_pending+0x6e/0x2d3 [<ffffffff81092909>]
rcu_check_callbacks+0x54/0x15b [<ffffffff8104972e>]
update_process_times+0x46/0x5b [<ffffffff810656ed>]
tick_sched_timer+0x7a/0xa4 [<ffffffff8105b7ca>]
__run_hrtimer+0x130/0x1bd [<ffffffff81065673>] ?
tick_sched_timer+0x0/0xa4 [<ffffffff8105bb9f>]
hrtimer_interrupt+0xdc/0x1dd [<ffffffff8143bcb4>]
smp_apic_timer_interrupt+0x89/0x9c [<ffffffff81003593>]
apic_timer_interrupt+0x13/0x20 <EOI>  [<ffffffff81275cd8>] ?
acpi_idle_enter_bm+0x2b0/0x2db [<ffffffff81275cce>] ?
acpi_idle_enter_bm+0x2a6/0x2db [<ffffffff81370a67>]
cpuidle_idle_call+0x96/0x11f [<ffffffff81001da9>] cpu_idle+0xab/0xee
 [<ffffffff81430232>] start_secondary+0x1a5/0x1a9
NMI backtrace for cpu 0
CPU 0 
Pid: 31, comm: events/0 Not tainted 2.6.33.4-rt20 #37        /       
RIP: 0010:[<ffffffff813c124d>]  [<ffffffff813c124d>]
rt_worker_func+0xd3/0x2de RSP: 0018:ffff8800bf1ddd80  EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000002770 RCX: ffff8800b8a13b80
RDX: 0000000000002770 RSI: ffff8800bf1ddfd8 RDI: ffff8800b893a760
RBP: ffff8800bf1dddf0 R08: ffff8800bf1dc000 R09: ffff8800bf1da800
R10: ffff88000a210000 R11: 0000000000000001 R12: 0000000000006666
R13: 0000000000000000 R14: 00000000000062b0 R15: 0000000000013b80
FS:  0000000000000000(0000) GS:ffff88000a200000(0000)
knlGS:0000000000000000 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fbe40a86000 CR3: 0000000001a56000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process events/0 (pid: 31, threadinfo ffff8800bf1dc000, task
ffff8800bf1da800) Stack:
 ffff8800bf1dddc0 ffff8800b8a135a0 0000000000000000 ffff8800bf1ddfd8
<0> 0000000000002760 00000000000493e0 0000000000000000 00000000000493e0
<0> ffff8800bf1dc000 ffff88000a212ac0 ffff8800bf1da800 ffff8800bf1dc000
Call Trace:
 [<ffffffff81053f0f>] worker_thread+0x206/0x2a3
 [<ffffffff813c117a>] ? rt_worker_func+0x0/0x2de
 [<ffffffff81058360>] ? autoremove_wake_function+0x0/0x39
 [<ffffffff81053d09>] ? worker_thread+0x0/0x2a3
 [<ffffffff81053d09>] ? worker_thread+0x0/0x2a3
 [<ffffffff81057fbd>] kthread+0x7f/0x87
 [<ffffffff810039d4>] kernel_thread_helper+0x4/0x10
 [<ffffffff81057f3e>] ? kthread+0x0/0x87
 [<ffffffff810039d0>] ? kernel_thread_helper+0x0/0x10
Code: 00 00 00 00 4c 89 f9 48 03 0d 98 63 73 00 a8 08 74 15 48 89 55 90
48 89 4d 98 e8 67 3c 07 00 48 8b 4d 98 48 8b 55 90 48 83 39 00 <0f> 84
86 01 00 00 48 63 45 c8 48 89 4d 98 48 89 45 b8 81 e2 ff Call Trace:
<#MC>  <<EOE>> Pid: 31, comm: events/0 Not tainted 2.6.33.4-rt20 #37
Call Trace: <NMI>  [<ffffffff81001a9c>] ? show_regs+0x2b/0x30
 [<ffffffff814380bd>] nmi_watchdog_tick+0xc2/0x1a6
 [<ffffffff814376c3>] do_nmi+0xb5/0x25b
 [<ffffffff814371aa>] nmi+0x1a/0x20
 [<ffffffff813c124d>] ? rt_worker_func+0xd3/0x2de
 <<EOE>>  [<ffffffff81053f0f>] worker_thread+0x206/0x2a3
 [<ffffffff813c117a>] ? rt_worker_func+0x0/0x2de
 [<ffffffff81058360>] ? autoremove_wake_function+0x0/0x39
 [<ffffffff81053d09>] ? worker_thread+0x0/0x2a3
 [<ffffffff81053d09>] ? worker_thread+0x0/0x2a3
 [<ffffffff81057fbd>] kthread+0x7f/0x87
 [<ffffffff810039d4>] kernel_thread_helper+0x4/0x10
 [<ffffffff81057f3e>] ? kthread+0x0/0x87
 [<ffffffff810039d0>] ? kernel_thread_helper+0x0/0x10

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RT] RCU splat on 2.6.33.4-rt20
  2010-05-20  2:19 [RT] RCU splat on 2.6.33.4-rt20 Clark Williams
@ 2010-05-20  3:22 ` Paul E. McKenney
  2010-05-20 13:59   ` Clark Williams
  0 siblings, 1 reply; 3+ messages in thread
From: Paul E. McKenney @ 2010-05-20  3:22 UTC (permalink / raw)
  To: Clark Williams; +Cc: Thomas Gleixner, RT

On Wed, May 19, 2010 at 09:19:11PM -0500, Clark Williams wrote:
> Thomas/Paul,
> 
> Resumed my laptop (T60, 64-bit kernel) and after about five
> minutes of doing nothing but surfing with firefox, got the following
> RCU stall notification from a 2.6.33.4-rt20 kernel:
> 
> INFO: RCU detected CPU 1 stall (t=0 jiffies)

Hmmm...  "t=0 jiffies".  I wonder if the hibernation/resumption confused
the stall-warning code.  My first thought was that an RCU grace period
had spanned the hibernation, but that would not be consistent with it
waiting five minutes after resumption to splat.

I will look into this.

							Thanx, Paul

> sending NMI to all CPUs:
> NMI backtrace for cpu 1
> CPU 1 
> Pid: 0, comm: swapper Not tainted 2.6.33.4-rt20 #37        /       
> RIP: 0010:[<ffffffff8120f188>]  [<ffffffff8120f188>]
> __const_udelay+0x37/0x44 RSP: 0018:ffff88000a303dc8  EFLAGS: 00000002
> RAX: 0000000001062560 RBX: 0000000000000001 RCX: ffff88000a300000
> RDX: 000000001dba61d0 RSI: 0000000000000002 RDI: 0000000000418958
> RBP: ffff88000a303dc8 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88000a30b8b0
> R13: ffffffff81a7c800 R14: ffff88000a303f48 R15: ffff88000a30b5e0
> FS:  0000000000000000(0000) GS:ffff88000a300000(0000)
> knlGS:0000000000000000 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f5fedd4b000 CR3: 0000000001a56000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffff8800bf1d8000, task
> ffff8800bf1d67c0) Stack:
>  ffff88000a303de8 ffffffff810185aa 000000000000088e ffffffff81a7c800
> <0> ffff88000a303e38 ffffffff81092650 ffff8800bf1d9dc8 0000000000000001
> <0> ffff88000a303e58 0000000000000001 0000000000000001 0000000000000000
> Call Trace:
>  <IRQ> 
>  [<ffffffff810185aa>] arch_trigger_all_cpu_backtrace+0x56/0x62
>  [<ffffffff81092650>] __rcu_pending+0x6e/0x2d3
>  [<ffffffff81092909>] rcu_check_callbacks+0x54/0x15b
>  [<ffffffff8104972e>] update_process_times+0x46/0x5b
>  [<ffffffff810656ed>] tick_sched_timer+0x7a/0xa4
>  [<ffffffff8105b7ca>] __run_hrtimer+0x130/0x1bd
>  [<ffffffff81065673>] ? tick_sched_timer+0x0/0xa4
>  [<ffffffff8105bb9f>] hrtimer_interrupt+0xdc/0x1dd
>  [<ffffffff8143bcb4>] smp_apic_timer_interrupt+0x89/0x9c
>  [<ffffffff81003593>] apic_timer_interrupt+0x13/0x20
>  <EOI> 
>  [<ffffffff81275cd8>] ? acpi_idle_enter_bm+0x2b0/0x2db
>  [<ffffffff81275cce>] ? acpi_idle_enter_bm+0x2a6/0x2db
>  [<ffffffff81370a67>] cpuidle_idle_call+0x96/0x11f
>  [<ffffffff81001da9>] cpu_idle+0xab/0xee
>  [<ffffffff81430232>] start_secondary+0x1a5/0x1a9
> Code: 00 ff 00 00 48 8d 04 bd 00 00 00 00 65 8b 0c 25 78 ac 00 00 48 63
> c9 48 8b 0c cd 60 23 af 81 48 69 94 0a 98 00 00 00 fa 00 00 00 <f7> e2
> 48 8d 7a 01 e8 ad ff ff ff c9 c3 55 48 89 e5 0f 1f 44 00 Call Trace:
> <#MC>  <<EOE>> Pid: 0, comm: swapper Not tainted 2.6.33.4-rt20 #37 Call
> Trace: <NMI>  [<ffffffff81001a9c>] ? show_regs+0x2b/0x30
>  [<ffffffff814380bd>] nmi_watchdog_tick+0xc2/0x1a6
>  [<ffffffff814376c3>] do_nmi+0xb5/0x25b
>  [<ffffffff814371aa>] nmi+0x1a/0x20
>  [<ffffffff8120f188>] ? __const_udelay+0x37/0x44
>  <<EOE>>  <IRQ>  [<ffffffff810185aa>]
> arch_trigger_all_cpu_backtrace+0x56/0x62 [<ffffffff81092650>]
> __rcu_pending+0x6e/0x2d3 [<ffffffff81092909>]
> rcu_check_callbacks+0x54/0x15b [<ffffffff8104972e>]
> update_process_times+0x46/0x5b [<ffffffff810656ed>]
> tick_sched_timer+0x7a/0xa4 [<ffffffff8105b7ca>]
> __run_hrtimer+0x130/0x1bd [<ffffffff81065673>] ?
> tick_sched_timer+0x0/0xa4 [<ffffffff8105bb9f>]
> hrtimer_interrupt+0xdc/0x1dd [<ffffffff8143bcb4>]
> smp_apic_timer_interrupt+0x89/0x9c [<ffffffff81003593>]
> apic_timer_interrupt+0x13/0x20 <EOI>  [<ffffffff81275cd8>] ?
> acpi_idle_enter_bm+0x2b0/0x2db [<ffffffff81275cce>] ?
> acpi_idle_enter_bm+0x2a6/0x2db [<ffffffff81370a67>]
> cpuidle_idle_call+0x96/0x11f [<ffffffff81001da9>] cpu_idle+0xab/0xee
>  [<ffffffff81430232>] start_secondary+0x1a5/0x1a9
> NMI backtrace for cpu 0
> CPU 0 
> Pid: 31, comm: events/0 Not tainted 2.6.33.4-rt20 #37        /       
> RIP: 0010:[<ffffffff813c124d>]  [<ffffffff813c124d>]
> rt_worker_func+0xd3/0x2de RSP: 0018:ffff8800bf1ddd80  EFLAGS: 00000246
> RAX: 0000000000000000 RBX: 0000000000002770 RCX: ffff8800b8a13b80
> RDX: 0000000000002770 RSI: ffff8800bf1ddfd8 RDI: ffff8800b893a760
> RBP: ffff8800bf1dddf0 R08: ffff8800bf1dc000 R09: ffff8800bf1da800
> R10: ffff88000a210000 R11: 0000000000000001 R12: 0000000000006666
> R13: 0000000000000000 R14: 00000000000062b0 R15: 0000000000013b80
> FS:  0000000000000000(0000) GS:ffff88000a200000(0000)
> knlGS:0000000000000000 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007fbe40a86000 CR3: 0000000001a56000 CR4: 00000000000006f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process events/0 (pid: 31, threadinfo ffff8800bf1dc000, task
> ffff8800bf1da800) Stack:
>  ffff8800bf1dddc0 ffff8800b8a135a0 0000000000000000 ffff8800bf1ddfd8
> <0> 0000000000002760 00000000000493e0 0000000000000000 00000000000493e0
> <0> ffff8800bf1dc000 ffff88000a212ac0 ffff8800bf1da800 ffff8800bf1dc000
> Call Trace:
>  [<ffffffff81053f0f>] worker_thread+0x206/0x2a3
>  [<ffffffff813c117a>] ? rt_worker_func+0x0/0x2de
>  [<ffffffff81058360>] ? autoremove_wake_function+0x0/0x39
>  [<ffffffff81053d09>] ? worker_thread+0x0/0x2a3
>  [<ffffffff81053d09>] ? worker_thread+0x0/0x2a3
>  [<ffffffff81057fbd>] kthread+0x7f/0x87
>  [<ffffffff810039d4>] kernel_thread_helper+0x4/0x10
>  [<ffffffff81057f3e>] ? kthread+0x0/0x87
>  [<ffffffff810039d0>] ? kernel_thread_helper+0x0/0x10
> Code: 00 00 00 00 4c 89 f9 48 03 0d 98 63 73 00 a8 08 74 15 48 89 55 90
> 48 89 4d 98 e8 67 3c 07 00 48 8b 4d 98 48 8b 55 90 48 83 39 00 <0f> 84
> 86 01 00 00 48 63 45 c8 48 89 4d 98 48 89 45 b8 81 e2 ff Call Trace:
> <#MC>  <<EOE>> Pid: 31, comm: events/0 Not tainted 2.6.33.4-rt20 #37
> Call Trace: <NMI>  [<ffffffff81001a9c>] ? show_regs+0x2b/0x30
>  [<ffffffff814380bd>] nmi_watchdog_tick+0xc2/0x1a6
>  [<ffffffff814376c3>] do_nmi+0xb5/0x25b
>  [<ffffffff814371aa>] nmi+0x1a/0x20
>  [<ffffffff813c124d>] ? rt_worker_func+0xd3/0x2de
>  <<EOE>>  [<ffffffff81053f0f>] worker_thread+0x206/0x2a3
>  [<ffffffff813c117a>] ? rt_worker_func+0x0/0x2de
>  [<ffffffff81058360>] ? autoremove_wake_function+0x0/0x39
>  [<ffffffff81053d09>] ? worker_thread+0x0/0x2a3
>  [<ffffffff81053d09>] ? worker_thread+0x0/0x2a3
>  [<ffffffff81057fbd>] kthread+0x7f/0x87
>  [<ffffffff810039d4>] kernel_thread_helper+0x4/0x10
>  [<ffffffff81057f3e>] ? kthread+0x0/0x87
>  [<ffffffff810039d0>] ? kernel_thread_helper+0x0/0x10



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RT] RCU splat on 2.6.33.4-rt20
  2010-05-20  3:22 ` Paul E. McKenney
@ 2010-05-20 13:59   ` Clark Williams
  0 siblings, 0 replies; 3+ messages in thread
From: Clark Williams @ 2010-05-20 13:59 UTC (permalink / raw)
  To: paulmck; +Cc: Thomas Gleixner, RT

[-- Attachment #1: Type: text/plain, Size: 1152 bytes --]

On Wed, 19 May 2010 20:22:46 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Wed, May 19, 2010 at 09:19:11PM -0500, Clark Williams wrote:
> > Thomas/Paul,
> > 
> > Resumed my laptop (T60, 64-bit kernel) and after about five
> > minutes of doing nothing but surfing with firefox, got the following
> > RCU stall notification from a 2.6.33.4-rt20 kernel:
> > 
> > INFO: RCU detected CPU 1 stall (t=0 jiffies)
> 
> Hmmm...  "t=0 jiffies".  I wonder if the hibernation/resumption confused
> the stall-warning code.  My first thought was that an RCU grace period
> had spanned the hibernation, but that would not be consistent with it
> waiting five minutes after resumption to splat.
> 
> I will look into this.
> 

Don't take that "five minutes" as anything approaching some degree of
accuracy. It could have been quicker; I was distracted and not even
touching the keyboard when a loud beep sounded and I had to poke around
a bit to realize that an NMI backtrace had triggered.

Depending on your timeout values, I think that it's quite reasonable to
suspect that an RCU spanned the suspend/resume.

Clark

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2010-05-20 13:59 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-20  2:19 [RT] RCU splat on 2.6.33.4-rt20 Clark Williams
2010-05-20  3:22 ` Paul E. McKenney
2010-05-20 13:59   ` Clark Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).