RCU stall on panda

linux-arm-kernel.lists.infradead.org archive mirror

* RCU stall on panda
@ 2014-05-05  9:39 Alex Shi
  2014-05-05 18:06 ` Paul E. McKenney
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Shi @ 2014-05-05  9:39 UTC (permalink / raw)
  To: linux-arm-kernel

I keep seeing the RCU stall problem on the Panda board, from the 3.10 kernel through the latest upstream kernel.
A Google search shows that someone reported it before: https://lkml.org/lkml/2012/9/20/519

Is it a hardware issue or a real software problem?

[   95.519653] INFO: rcu_sched self-detected stall on CPU
[   95.519866]  1: (1 GPs behind) idle=2e7/1/0 softirq=4404/4405
[   95.526489] INFO: rcu_sched detected stalls on CPUs/tasks:
[   95.526489]  1: (1 GPs behind) idle=2e7/1/0 softirq=4404/4405
[   95.526489]  (detected by 0, t=4229 jiffies, g=800, c=799, q=440)
[   95.526519] Task dump for CPU 1:
[   95.526519] swapper/1       R running      0     0      1 0x00000000
[   95.559844]   (t=4229 jiffies g=800 c=799 q=440)
[   95.564727] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15.0-rc4 #93
[   95.571502] [<c00133fd>] (unwind_backtrace) from [<c001076d>] (show_stack+0x11/0x14)
[   95.579711] [<c001076d>] (show_stack) from [<c0570465>] (dump_stack+0x75/0x88)
[   95.587371] [<c0570465>] (dump_stack) from [<c0084383>] (rcu_check_callbacks+0x353/0x79c)
[   95.596038] [<c0084383>] (rcu_check_callbacks) from [<c003e99f>] (update_process_times+0x33/0x4c)
[   95.605438] [<c003e99f>] (update_process_times) from [<c008e5a3>] (tick_sched_handle.isra.18+0x1f/0x48)
[   95.615386] [<c008e5a3>] (tick_sched_handle.isra.18) from [<c008e609>] (tick_sched_timer+0x3d/0x5c)
[   95.624969] [<c008e609>] (tick_sched_timer) from [<c0051a23>] (__run_hrtimer+0x67/0x310)
[   95.633544] [<c0051a23>] (__run_hrtimer) from [<c00525fd>] (hrtimer_interrupt+0xe1/0x214)
[   95.642211] [<c00525fd>] (hrtimer_interrupt) from [<c008cecb>] (tick_receive_broadcast+0x1f/0x30)
[   95.651611] [<c008cecb>] (tick_receive_broadcast) from [<c0011e4f>] (handle_IPI+0xb3/0x120)
[   95.660461] [<c0011e4f>] (handle_IPI) from [<c00085e5>] (gic_handle_irq+0x51/0x54)
[   95.668487] [<c00085e5>] (gic_handle_irq) from [<c057603f>] (__irq_svc+0x3f/0x64)
[   95.676391] Exception stack(0xee0dbf10 to 0xee0dbf58)
[   95.681762] bf00:                                     00000001 00000001 00000000 ee0d8c40
[   95.690429] bf20: 3c6bd296 00000016 3c6f8c43 00000016 eefab540 c08e0c84 00000000 c0fc7114
[   95.699066] bf40: 00000010 ee0dbf58 c006ef4d c0443890 40000033 ffffffff
[   95.706085] [<c057603f>] (__irq_svc) from [<c0443890>] (cpuidle_enter_state+0xc0/0xc4)
[   95.714477] [<c0443890>] (cpuidle_enter_state) from [<c0444d11>] (cpuidle_enter_state_coupled+0xe1/0x290)
[   95.724639] [<c0444d11>] (cpuidle_enter_state_coupled) from [<c0067cd1>] (cpu_startup_entry+0x1a5/0x494)
[   95.734680] [<c0067cd1>] (cpu_startup_entry) from [<80008685>] (0x80008685)
[   95.742095] BUG: soft lockup - CPU#1 stuck for 40s! [swapper/1:0]
[   95.748535] Modules linked in:
[   95.751770] irq event stamp: 128730
[   95.755462] hardirqs last  enabled at (128727): [<c044388f>] cpuidle_enter_state+0xbf/0xc4
[   95.764221] hardirqs last disabled at (128728): [<c0576033>] __irq_svc+0x33/0x64
[   95.772064] softirqs last  enabled at (128730): [<c00386cd>] irq_enter+0x59/0x60
[   95.779907] softirqs last disabled at (128729): [<c00386ba>] irq_enter+0x46/0x60
[   95.787750]


My RCU- and idle-related kernel config is below:

CONFIG_TREE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_FANOUT=32
CONFIG_RCU_FANOUT_LEAF=16
CONFIG_TREE_RCU_TRACE=y
CONFIG_PROVE_RCU=y
CONFIG_PROVE_RCU_REPEATEDLY=y
CONFIG_SPARSE_RCU_POINTER=y
CONFIG_RCU_CPU_STALL_TIMEOUT=21
CONFIG_RCU_CPU_STALL_INFO=y
CONFIG_RCU_TRACE=y
alexs@alex-panda:~$ cat /proc/config.gz | gunzip | grep IDLE
CONFIG_NO_HZ_IDLE=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_GENERIC_IDLE_POLL_SETUP=y
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y
CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED=y
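
For reference, the same filtering can be done in one pass for both option families. This is only a minimal Python sketch: it assumes the kernel exposes /proc/config.gz (CONFIG_IKCONFIG_PROC), and the function name `config_options` is made up for illustration.

```python
import gzip
import os
import re


def config_options(path="/proc/config.gz", pattern=r"RCU|IDLE"):
    """Return enabled CONFIG_ lines from a gzipped kernel config that match pattern."""
    matcher = re.compile(pattern)
    with gzip.open(path, "rt") as config:
        return [
            line.strip()
            for line in config
            # Lines such as "# CONFIG_FOO is not set" are comments and are skipped.
            if line.startswith("CONFIG_") and matcher.search(line)
        ]


if __name__ == "__main__" and os.path.exists("/proc/config.gz"):
    print("\n".join(config_options()))
```

Passing a different `pattern` (for example `r"NO_HZ"`) narrows or widens the selection without changing the rest.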

-- 
Thanks
    Alex


* RCU stall on panda
  2014-05-05  9:39 Alex Shi
@ 2014-05-05 18:06 ` Paul E. McKenney
  2014-05-12 21:21   ` Tony Lindgren
  0 siblings, 1 reply; 15+ messages in thread
From: Paul E. McKenney @ 2014-05-05 18:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, May 05, 2014 at 05:39:43PM +0800, Alex Shi wrote:
> I keep seeing the RCU stall problem on panda board from 3.10 kernel to latest upstream kernel
> and google find some one report it before: https://lkml.org/lkml/2012/9/20/519
> 
> Is it the hardware issue or a real software problem?

I cannot distinguish between hardware and software from the trace below,
but given that you are also seeing a soft lockup, either way you do
appear to have a real problem as opposed to an RCU CPU stall warning
false positive.

							Thanx, Paul
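
As a reading aid for the stall summary line in the trace, here is a minimal Python sketch that pulls out its fields. The helper name `parse_stall` and the default `hz=100` are assumptions for illustration only: CONFIG_HZ is not stated anywhere in this thread, so the seconds value is only as good as that guess. In stall warnings of this kernel generation, `g` and `c` appear to be the current and last-completed grace-period numbers and `q` the number of queued callbacks.

```python
import re

# Matches the "(detected by N, t=... jiffies, g=..., c=..., q=...)" summary line.
STALL_RE = re.compile(
    r"\(detected by (?P<detector>\d+), t=(?P<t>\d+) jiffies, "
    r"g=(?P<g>\d+), c=(?P<c>\d+), q=(?P<q>\d+)\)"
)


def parse_stall(line, hz=100):
    """Return the numeric fields of a stall summary line, or None if absent.

    hz is an assumed tick rate used only to convert t from jiffies to seconds.
    """
    match = STALL_RE.search(line)
    if match is None:
        return None
    fields = {name: int(value) for name, value in match.groupdict().items()}
    fields["seconds"] = fields["t"] / hz
    return fields


line = "[   95.526489]  (detected by 0, t=4229 jiffies, g=800, c=799, q=440)"
print(parse_stall(line))
```

With the real CONFIG_HZ passed as `hz`, the `seconds` field can be compared against CONFIG_RCU_CPU_STALL_TIMEOUT and the duration in the soft lockup message.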

> [   95.519653] INFO: rcu_sched self-detected stall on CPU
> [   95.519866]  1: (1 GPs behind) idle=2e7/1/0 softirq=4404/4405
> [   95.526489] INFO: rcu_sched detected stalls on CPUs/tasks:
> [   95.526489]  1: (1 GPs behind) idle=2e7/1/0 softirq=4404/4405
> [   95.526489]  (detected by 0, t=4229 jiffies, g=800, c=799, q=440)
> [   95.526519] Task dump for CPU 1:
> [   95.526519] swapper/1       R running      0     0      1 0x00000000
> [   95.559844]   (t=4229 jiffies g=800 c=799 q=440)
> [   95.564727] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15.0-rc4 #93
> [   95.571502] [<c00133fd>] (unwind_backtrace) from [<c001076d>] (show_stack+0x11/0x14)
> [   95.579711] [<c001076d>] (show_stack) from [<c0570465>] (dump_stack+0x75/0x88)
> [   95.587371] [<c0570465>] (dump_stack) from [<c0084383>] (rcu_check_callbacks+0x353/0x79c)
> [   95.596038] [<c0084383>] (rcu_check_callbacks) from [<c003e99f>] (update_process_times+0x33/0x4c)
> [   95.605438] [<c003e99f>] (update_process_times) from [<c008e5a3>] (tick_sched_handle.isra.18+0x1f/0x48)
> [   95.615386] [<c008e5a3>] (tick_sched_handle.isra.18) from [<c008e609>] (tick_sched_timer+0x3d/0x5c)
> [   95.624969] [<c008e609>] (tick_sched_timer) from [<c0051a23>] (__run_hrtimer+0x67/0x310)
> [   95.633544] [<c0051a23>] (__run_hrtimer) from [<c00525fd>] (hrtimer_interrupt+0xe1/0x214)
> [   95.642211] [<c00525fd>] (hrtimer_interrupt) from [<c008cecb>] (tick_receive_broadcast+0x1f/0x30)
> [   95.651611] [<c008cecb>] (tick_receive_broadcast) from [<c0011e4f>] (handle_IPI+0xb3/0x120)
> [   95.660461] [<c0011e4f>] (handle_IPI) from [<c00085e5>] (gic_handle_irq+0x51/0x54)
> [   95.668487] [<c00085e5>] (gic_handle_irq) from [<c057603f>] (__irq_svc+0x3f/0x64)
> [   95.676391] Exception stack(0xee0dbf10 to 0xee0dbf58)
> [   95.681762] bf00:                                     00000001 00000001 00000000 ee0d8c40
> [   95.690429] bf20: 3c6bd296 00000016 3c6f8c43 00000016 eefab540 c08e0c84 00000000 c0fc7114
> [   95.699066] bf40: 00000010 ee0dbf58 c006ef4d c0443890 40000033 ffffffff
> [   95.706085] [<c057603f>] (__irq_svc) from [<c0443890>] (cpuidle_enter_state+0xc0/0xc4)
> [   95.714477] [<c0443890>] (cpuidle_enter_state) from [<c0444d11>] (cpuidle_enter_state_coupled+0xe1/0x290)
> [   95.724639] [<c0444d11>] (cpuidle_enter_state_coupled) from [<c0067cd1>] (cpu_startup_entry+0x1a5/0x494)
> [   95.734680] [<c0067cd1>] (cpu_startup_entry) from [<80008685>] (0x80008685)
> [   95.742095] BUG: soft lockup - CPU#1 stuck for 40s! [swapper/1:0]
> [   95.748535] Modules linked in:
> [   95.751770] irq event stamp: 128730
> [   95.755462] hardirqs last  enabled at (128727): [<c044388f>] cpuidle_enter_state+0xbf/0xc4
> [   95.764221] hardirqs last disabled at (128728): [<c0576033>] __irq_svc+0x33/0x64
> [   95.772064] softirqs last  enabled at (128730): [<c00386cd>] irq_enter+0x59/0x60
> [   95.779907] softirqs last disabled at (128729): [<c00386ba>] irq_enter+0x46/0x60
> [   95.787750]
> 
> 
> my RCU and IDLE related kernel config as blow:
> 
> CONFIG_TREE_RCU=y
> CONFIG_RCU_STALL_COMMON=y
> CONFIG_RCU_FANOUT=32
> CONFIG_RCU_FANOUT_LEAF=16
> CONFIG_TREE_RCU_TRACE=y
> CONFIG_PROVE_RCU=y
> CONFIG_PROVE_RCU_REPEATEDLY=y
> CONFIG_SPARSE_RCU_POINTER=y
> CONFIG_RCU_CPU_STALL_TIMEOUT=21
> CONFIG_RCU_CPU_STALL_INFO=y
> CONFIG_RCU_TRACE=y
> alexs at alex-panda:~$ cat /proc/config.gz | gunzip | grep IDLE
> CONFIG_NO_HZ_IDLE=y
> CONFIG_GENERIC_SMP_IDLE_THREAD=y
> CONFIG_GENERIC_IDLE_POLL_SETUP=y
> CONFIG_CPU_IDLE=y
> CONFIG_CPU_IDLE_GOV_LADDER=y
> CONFIG_CPU_IDLE_GOV_MENU=y
> CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED=y
> 
> -- 
> Thanks
>     Alex
> 


* RCU stall on panda
  2014-05-05 18:06 ` Paul E. McKenney
@ 2014-05-12 21:21   ` Tony Lindgren
  2014-05-13  6:36     ` Alex Shi
  0 siblings, 1 reply; 15+ messages in thread
From: Tony Lindgren @ 2014-05-12 21:21 UTC (permalink / raw)
  To: linux-arm-kernel

* Paul E. McKenney <paulmck@linux.vnet.ibm.com> [140505 11:11]:
> On Mon, May 05, 2014 at 05:39:43PM +0800, Alex Shi wrote:
> > I keep seeing the RCU stall problem on panda board from 3.10 kernel to latest upstream kernel
> > and google find some one report it before: https://lkml.org/lkml/2012/9/20/519
> > 
> > Is it the hardware issue or a real software problem?
> 
> I cannot distinguish between hardware and software from the trace below,
> but given that you are also seeing a soft lockup, either way you do
> appear to have a real problem as opposed to an RCU CPU stall warning
> false positive.

Looks like you have CPU_IDLE enabled on panda. Hangs with current
linux-next with CPU_IDLE enabled are being discussed on the linux-omap
list in the thread "omap4-panda-es boot issues with v3.15-rc4".

I've seen occasional system hangs, and I've also noticed that doing
ctrl-a-f h or ctrl-a-f l for sysrq backtrace can unlock the system
producing similar errors to the below.

Regards,

Tony
 
> > [   95.519653] INFO: rcu_sched self-detected stall on CPU
> > [   95.519866]  1: (1 GPs behind) idle=2e7/1/0 softirq=4404/4405
> > [   95.526489] INFO: rcu_sched detected stalls on CPUs/tasks:
> > [   95.526489]  1: (1 GPs behind) idle=2e7/1/0 softirq=4404/4405
> > [   95.526489]  (detected by 0, t=4229 jiffies, g=800, c=799, q=440)
> > [   95.526519] Task dump for CPU 1:
> > [   95.526519] swapper/1       R running      0     0      1 0x00000000
> > [   95.559844]   (t=4229 jiffies g=800 c=799 q=440)
> > [   95.564727] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15.0-rc4 #93
> > [   95.571502] [<c00133fd>] (unwind_backtrace) from [<c001076d>] (show_stack+0x11/0x14)
> > [   95.579711] [<c001076d>] (show_stack) from [<c0570465>] (dump_stack+0x75/0x88)
> > [   95.587371] [<c0570465>] (dump_stack) from [<c0084383>] (rcu_check_callbacks+0x353/0x79c)
> > [   95.596038] [<c0084383>] (rcu_check_callbacks) from [<c003e99f>] (update_process_times+0x33/0x4c)
> > [   95.605438] [<c003e99f>] (update_process_times) from [<c008e5a3>] (tick_sched_handle.isra.18+0x1f/0x48)
> > [   95.615386] [<c008e5a3>] (tick_sched_handle.isra.18) from [<c008e609>] (tick_sched_timer+0x3d/0x5c)
> > [   95.624969] [<c008e609>] (tick_sched_timer) from [<c0051a23>] (__run_hrtimer+0x67/0x310)
> > [   95.633544] [<c0051a23>] (__run_hrtimer) from [<c00525fd>] (hrtimer_interrupt+0xe1/0x214)
> > [   95.642211] [<c00525fd>] (hrtimer_interrupt) from [<c008cecb>] (tick_receive_broadcast+0x1f/0x30)
> > [   95.651611] [<c008cecb>] (tick_receive_broadcast) from [<c0011e4f>] (handle_IPI+0xb3/0x120)
> > [   95.660461] [<c0011e4f>] (handle_IPI) from [<c00085e5>] (gic_handle_irq+0x51/0x54)
> > [   95.668487] [<c00085e5>] (gic_handle_irq) from [<c057603f>] (__irq_svc+0x3f/0x64)
> > [   95.676391] Exception stack(0xee0dbf10 to 0xee0dbf58)
> > [   95.681762] bf00:                                     00000001 00000001 00000000 ee0d8c40
> > [   95.690429] bf20: 3c6bd296 00000016 3c6f8c43 00000016 eefab540 c08e0c84 00000000 c0fc7114
> > [   95.699066] bf40: 00000010 ee0dbf58 c006ef4d c0443890 40000033 ffffffff
> > [   95.706085] [<c057603f>] (__irq_svc) from [<c0443890>] (cpuidle_enter_state+0xc0/0xc4)
> > [   95.714477] [<c0443890>] (cpuidle_enter_state) from [<c0444d11>] (cpuidle_enter_state_coupled+0xe1/0x290)
> > [   95.724639] [<c0444d11>] (cpuidle_enter_state_coupled) from [<c0067cd1>] (cpu_startup_entry+0x1a5/0x494)
> > [   95.734680] [<c0067cd1>] (cpu_startup_entry) from [<80008685>] (0x80008685)
> > [   95.742095] BUG: soft lockup - CPU#1 stuck for 40s! [swapper/1:0]
> > [   95.748535] Modules linked in:
> > [   95.751770] irq event stamp: 128730
> > [   95.755462] hardirqs last  enabled at (128727): [<c044388f>] cpuidle_enter_state+0xbf/0xc4
> > [   95.764221] hardirqs last disabled at (128728): [<c0576033>] __irq_svc+0x33/0x64
> > [   95.772064] softirqs last  enabled at (128730): [<c00386cd>] irq_enter+0x59/0x60
> > [   95.779907] softirqs last disabled at (128729): [<c00386ba>] irq_enter+0x46/0x60
> > [   95.787750]
> > 
> > 
> > my RCU and IDLE related kernel config as blow:
> > 
> > CONFIG_TREE_RCU=y
> > CONFIG_RCU_STALL_COMMON=y
> > CONFIG_RCU_FANOUT=32
> > CONFIG_RCU_FANOUT_LEAF=16
> > CONFIG_TREE_RCU_TRACE=y
> > CONFIG_PROVE_RCU=y
> > CONFIG_PROVE_RCU_REPEATEDLY=y
> > CONFIG_SPARSE_RCU_POINTER=y
> > CONFIG_RCU_CPU_STALL_TIMEOUT=21
> > CONFIG_RCU_CPU_STALL_INFO=y
> > CONFIG_RCU_TRACE=y
> > alexs at alex-panda:~$ cat /proc/config.gz | gunzip | grep IDLE
> > CONFIG_NO_HZ_IDLE=y
> > CONFIG_GENERIC_SMP_IDLE_THREAD=y
> > CONFIG_GENERIC_IDLE_POLL_SETUP=y
> > CONFIG_CPU_IDLE=y
> > CONFIG_CPU_IDLE_GOV_LADDER=y
> > CONFIG_CPU_IDLE_GOV_MENU=y
> > CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED=y
> > 
> > -- 
> > Thanks
> >     Alex
> > 


* RCU stall on panda
  2014-05-12 21:21   ` Tony Lindgren
@ 2014-05-13  6:36     ` Alex Shi
  2014-05-13 15:32       ` Tony Lindgren
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Shi @ 2014-05-13  6:36 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/13/2014 05:21 AM, Tony Lindgren wrote:
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> [140505 11:11]:
>> On Mon, May 05, 2014 at 05:39:43PM +0800, Alex Shi wrote:
>>> I keep seeing the RCU stall problem on panda board from 3.10 kernel to latest upstream kernel
>>> and google find some one report it before: https://lkml.org/lkml/2012/9/20/519
>>>
>>> Is it the hardware issue or a real software problem?
>>
>> I cannot distinguish between hardware and software from the trace below,
>> but given that you are also seeing a soft lockup, either way you do
>> appear to have a real problem as opposed to an RCU CPU stall warning
>> false positive.
> 
> Looks like you have CPU_IDLE enabled on panda. Hangs with current linux
> next with CPU_IDLE are currently being discussed on the linux-omap list
> in thread "omap4-panda-es boot issues with v3.15-rc4"
> 
> I've seen occasional system hangs, and I've also noticed that doing
> ctrl-a-f h or ctrl-a-f l for sysrq backtrace can unlock the system
> producing similar errors to the below.
> 

Thanks a lot for the info.
In fact, the oops persists in upstream kernels from 3.10 through the latest.

> Regards,
> 
> Tony
>  

-- 
Thanks
    Alex


* RCU stall on panda
  2014-05-13  6:36     ` Alex Shi
@ 2014-05-13 15:32       ` Tony Lindgren
  2014-05-15  8:44         ` Alex Shi
  0 siblings, 1 reply; 15+ messages in thread
From: Tony Lindgren @ 2014-05-13 15:32 UTC (permalink / raw)
  To: linux-arm-kernel

* Alex Shi <alex.shi@linaro.org> [140512 23:37]:
> On 05/13/2014 05:21 AM, Tony Lindgren wrote:
> > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> [140505 11:11]:
> >> On Mon, May 05, 2014 at 05:39:43PM +0800, Alex Shi wrote:
> >>> I keep seeing the RCU stall problem on panda board from 3.10 kernel to latest upstream kernel
> >>> and google find some one report it before: https://lkml.org/lkml/2012/9/20/519
> >>>
> >>> Is it the hardware issue or a real software problem?
> >>
> >> I cannot distinguish between hardware and software from the trace below,
> >> but given that you are also seeing a soft lockup, either way you do
> >> appear to have a real problem as opposed to an RCU CPU stall warning
> >> false positive.
> > 
> > Looks like you have CPU_IDLE enabled on panda. Hangs with current linux
> > next with CPU_IDLE are currently being discussed on the linux-omap list
> > in thread "omap4-panda-es boot issues with v3.15-rc4"
> > 
> > I've seen occasional system hangs, and I've also noticed that doing
> > ctrl-a-f h or ctrl-a-f l for sysrq backtrace can unlock the system
> > producing similar errors to the below.
> > 
> 
> Thanks a lot for the info.
> In fact, the oops keeps in upstream kernel from 3.10 to latest.

Care to test if the revert of commit cb7094 Santosh posted as
"[PATCH] ARM: OMAP4: Fix the boot regression with CPU_IDLE enabled"
solves the problem for you?

Regards,

Tony


* RCU stall on panda
  2014-05-13 15:32       ` Tony Lindgren
@ 2014-05-15  8:44         ` Alex Shi
  2014-05-15  9:05           ` Daniel Lezcano
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Shi @ 2014-05-15  8:44 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/13/2014 11:32 PM, Tony Lindgren wrote:
> * Alex Shi <alex.shi@linaro.org> [140512 23:37]:
>> On 05/13/2014 05:21 AM, Tony Lindgren wrote:
>>> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> [140505 11:11]:
>>>> On Mon, May 05, 2014 at 05:39:43PM +0800, Alex Shi wrote:
>>>>> I keep seeing the RCU stall problem on panda board from 3.10 kernel to latest upstream kernel
>>>>> and google find some one report it before: https://lkml.org/lkml/2012/9/20/519
>>>>>
>>>>> Is it the hardware issue or a real software problem?
>>>>
>>>> I cannot distinguish between hardware and software from the trace below,
>>>> but given that you are also seeing a soft lockup, either way you do
>>>> appear to have a real problem as opposed to an RCU CPU stall warning
>>>> false positive.
>>>
>>> Looks like you have CPU_IDLE enabled on panda. Hangs with current linux
>>> next with CPU_IDLE are currently being discussed on the linux-omap list
>>> in thread "omap4-panda-es boot issues with v3.15-rc4"
>>>
>>> I've seen occasional system hangs, and I've also noticed that doing
>>> ctrl-a-f h or ctrl-a-f l for sysrq backtrace can unlock the system
>>> producing similar errors to the below.
>>>
>>
>> Thanks a lot for the info.
>> In fact, the oops keeps in upstream kernel from 3.10 to latest.
> 
> Care to test if the revert of commit cb7094 Santosh posted as
> "[PATCH] ARM: OMAP4: Fix the boot regression with CPU_IDLE enabled"
> solves the problem for you?
> 

After applying this patch, the system may hang in idle. :(

> Regards,
> 
> Tony
> 


-- 
Thanks
    Alex


* RCU stall on panda
  2014-05-15  8:44         ` Alex Shi
@ 2014-05-15  9:05           ` Daniel Lezcano
  2014-05-15 13:26             ` Alex Shi
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Lezcano @ 2014-05-15  9:05 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/15/2014 10:44 AM, Alex Shi wrote:
> On 05/13/2014 11:32 PM, Tony Lindgren wrote:
>> * Alex Shi <alex.shi@linaro.org> [140512 23:37]:
>>> On 05/13/2014 05:21 AM, Tony Lindgren wrote:
>>>> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> [140505 11:11]:
>>>>> On Mon, May 05, 2014 at 05:39:43PM +0800, Alex Shi wrote:
>>>>>> I keep seeing the RCU stall problem on panda board from 3.10 kernel to latest upstream kernel
>>>>>> and google find some one report it before: https://lkml.org/lkml/2012/9/20/519
>>>>>>
>>>>>> Is it the hardware issue or a real software problem?
>>>>>
>>>>> I cannot distinguish between hardware and software from the trace below,
>>>>> but given that you are also seeing a soft lockup, either way you do
>>>>> appear to have a real problem as opposed to an RCU CPU stall warning
>>>>> false positive.
>>>>
>>>> Looks like you have CPU_IDLE enabled on panda. Hangs with current linux
>>>> next with CPU_IDLE are currently being discussed on the linux-omap list
>>>> in thread "omap4-panda-es boot issues with v3.15-rc4"
>>>>
>>>> I've seen occasional system hangs, and I've also noticed that doing
>>>> ctrl-a-f h or ctrl-a-f l for sysrq backtrace can unlock the system
>>>> producing similar errors to the below.
>>>>
>>>
>>> Thanks a lot for the info.
>>> In fact, the oops keeps in upstream kernel from 3.10 to latest.
>>
>> Care to test if the revert of commit cb7094 Santosh posted as
>> "[PATCH] ARM: OMAP4: Fix the boot regression with CPU_IDLE enabled"
>> solves the problem for you?
>>
>
> After enable this patch, system maybe hang in idle. :(

Hi Alex,

Do you mean that even with this revert applied, the board hangs in idle?




-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


* RCU stall on panda
  2014-05-15  9:05           ` Daniel Lezcano
@ 2014-05-15 13:26             ` Alex Shi
  2014-05-15 18:32               ` Tony Lindgren
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Shi @ 2014-05-15 13:26 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/15/2014 05:05 PM, Daniel Lezcano wrote:
>>>
>>
>> After enable this patch, system maybe hang in idle. :(
> 
> Hi Alex,
> 
> do you mean even with this revert applied, the board hangs in idle ?
> 

Yes.
My board is a Panda ES. Without this revert, it works.

-- 
Thanks
    Alex


* RCU stall on panda
  2014-05-15 13:26             ` Alex Shi
@ 2014-05-15 18:32               ` Tony Lindgren
  2014-05-15 18:36                 ` Santosh Shilimkar
  0 siblings, 1 reply; 15+ messages in thread
From: Tony Lindgren @ 2014-05-15 18:32 UTC (permalink / raw)
  To: linux-arm-kernel

* Alex Shi <alex.shi@linaro.org> [140515 06:27]:
> On 05/15/2014 05:05 PM, Daniel Lezcano wrote:
> >>>
> >>
> >> After enable this patch, system maybe hang in idle. :(
> > 
> > Hi Alex,
> > 
> > do you mean even with this revert applied, the board hangs in idle ?
> > 
> 
> yes.
> My board is panda ES. without this revert, it works.

Care to specify what linux version you are testing against?

Does it hang in idle always immediately on booting?

Or does the serial console first hang with sysrq still
working (ctrl-a h in minicom for help) with device
eventually locking up hard?

Regards,

Tony


* RCU stall on panda
  2014-05-15 18:32               ` Tony Lindgren
@ 2014-05-15 18:36                 ` Santosh Shilimkar
  2014-05-16  7:41                   ` Alex Shi
  0 siblings, 1 reply; 15+ messages in thread
From: Santosh Shilimkar @ 2014-05-15 18:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 15 May 2014 02:32 PM, Tony Lindgren wrote:
> * Alex Shi <alex.shi@linaro.org> [140515 06:27]:
>> On 05/15/2014 05:05 PM, Daniel Lezcano wrote:
>>>>>
>>>>
>>>> After enable this patch, system maybe hang in idle. :(
>>>
>>> Hi Alex,
>>>
>>> do you mean even with this revert applied, the board hangs in idle ?
>>>
>>
>> yes.
>> My board is panda ES. without this revert, it works.
> 
> Care to specify what linux version you are testing against?
> 
> Does it hang in idle always immediately on booting?
> 
> Or does the serial console first hang with sysrq still
> working (ctrl-a h in minicom for help) with device
> eventually locking up hard?
>
Alex, I just posted an updated patch on the other thread.
Attaching it here again for your reference. Please try
it out and see if you still get a hang.

Regards,
Santosh

From bb3b82cc5645b83bedf1343d03cc956f27f6fc83 Mon Sep 17 00:00:00 2001
From: Santosh Shilimkar <santosh.shilimkar@ti.com>
Date: Mon, 12 May 2014 17:37:59 -0400
Subject: [PATCH] ARM: OMAP4: Fix the boot regression with CPU_IDLE enabled

On the OMAP4 panda board, there have been several bug reports about boot
hangs and lock-ups with CPU_IDLE enabled. The root cause of the issue
is missing interrupts while in idle state. Commit cb7094e8 {cpuidle / omap4 :
use CPUIDLE_FLAG_TIMER_STOP flag} moved the broadcast notifiers to common
code for the right reasons, but on OMAP4, which suffers from a nasty ROM code
bug with GIC, commit ff999b8a {ARM: OMAP4460: Workaround for ROM bug ..},
we lose interrupts, which leads to issues like lock-ups, hangs etc.

This patch reverts commit cb7094 {cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP
flag} and 54769d6 {cpuidle: OMAP4: remove timer broadcast initialization} to
avoid the issue. With this change, the mentioned issues are fixed on OMAP4
panda boards. We no longer lose interrupts, which was the cause
of the regression.

Cc: Roger Quadros <rogerq@ti.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Reported-tested-by: Roger Quadros <rogerq@ti.com>
Reported-tested-by: Kevin Hilman <khilman@linaro.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 arch/arm/mach-omap2/cpuidle44xx.c |   25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c
index 01fc710..2498ab0 100644
--- a/arch/arm/mach-omap2/cpuidle44xx.c
+++ b/arch/arm/mach-omap2/cpuidle44xx.c
@@ -14,6 +14,7 @@
 #include <linux/cpuidle.h>
 #include <linux/cpu_pm.h>
 #include <linux/export.h>
+#include <linux/clockchips.h>
 
 #include <asm/cpuidle.h>
 #include <asm/proc-fns.h>
@@ -83,6 +84,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
 {
 	struct idle_statedata *cx = state_ptr + index;
 	u32 mpuss_can_lose_context = 0;
+	int cpu_id = smp_processor_id();
 
 	/*
 	 * CPU0 has to wait and stay ON until CPU1 is OFF state.
@@ -110,6 +112,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
 	mpuss_can_lose_context = (cx->mpu_state == PWRDM_POWER_RET) &&
 				 (cx->mpu_logic_state == PWRDM_POWER_OFF);
 
+	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id);
+
 	/*
 	 * Call idle CPU PM enter notifier chain so that
 	 * VFP and per CPU interrupt context is saved.
@@ -165,6 +169,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
 	if (dev->cpu == 0 && mpuss_can_lose_context)
 		cpu_cluster_pm_exit();
 
+	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id);
+
 fail:
 	cpuidle_coupled_parallel_barrier(dev, &abort_barrier);
 	cpu_done[dev->cpu] = false;
@@ -172,6 +178,16 @@ fail:
 	return index;
 }
 
+/*
+ * For each cpu, setup the broadcast timer because local timers
+ * stops for the states above C1.
+ */
+static void omap_setup_broadcast_timer(void *arg)
+{
+	int cpu = smp_processor_id();
+	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ON, &cpu);
+}
+
 static struct cpuidle_driver omap4_idle_driver = {
 	.name				= "omap4_idle",
 	.owner				= THIS_MODULE,
@@ -189,8 +205,7 @@ static struct cpuidle_driver omap4_idle_driver = {
 			/* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */
 			.exit_latency = 328 + 440,
 			.target_residency = 960,
-			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
-			         CPUIDLE_FLAG_TIMER_STOP,
+			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
 			.enter = omap_enter_idle_coupled,
 			.name = "C2",
 			.desc = "CPUx OFF, MPUSS CSWR",
@@ -199,8 +214,7 @@ static struct cpuidle_driver omap4_idle_driver = {
 			/* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */
 			.exit_latency = 460 + 518,
 			.target_residency = 1100,
-			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
-			         CPUIDLE_FLAG_TIMER_STOP,
+			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
 			.enter = omap_enter_idle_coupled,
 			.name = "C3",
 			.desc = "CPUx OFF, MPUSS OSWR",
@@ -231,5 +245,8 @@ int __init omap4_idle_init(void)
 	if (!cpu_clkdm[0] || !cpu_clkdm[1])
 		return -ENODEV;
 
+	/* Configure the broadcast timer on each cpu */
+	on_each_cpu(omap_setup_broadcast_timer, NULL, 1);
+
 	return cpuidle_register(&omap4_idle_driver, cpu_online_mask);
 }
-- 
1.7.9.5
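
For readers who want the ordering restored by the patch in isolation, here
is a minimal user-space sketch. It is not kernel code: model_trace,
model_record() and model_enter_idle() are hypothetical names invented for
illustration, and the sketch models only the order of the calls (broadcast
enter, low-power entry, broadcast exit), not the ROM/GIC bug itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical events standing in for the steps in the idle path. */
enum model_event {
	EV_BROADCAST_ENTER,	/* like CLOCK_EVT_NOTIFY_BROADCAST_ENTER */
	EV_LOW_POWER,		/* CPU enters the low-power state */
	EV_BROADCAST_EXIT	/* like CLOCK_EVT_NOTIFY_BROADCAST_EXIT */
};

#define MODEL_TRACE_MAX 8

/* Records the order in which the modeled steps happen. */
struct model_trace {
	enum model_event ev[MODEL_TRACE_MAX];
	size_t len;
};

static void model_record(struct model_trace *t, enum model_event e)
{
	assert(t->len < MODEL_TRACE_MAX);
	t->ev[t->len++] = e;
}

/*
 * Hypothetical stand-in for the driver's idle entry function.
 * Only the bracketing of the low-power step is modeled.
 */
static int model_enter_idle(struct model_trace *t, int index)
{
	model_record(t, EV_BROADCAST_ENTER);	/* hand wakeups to the broadcast timer */
	model_record(t, EV_LOW_POWER);		/* local timer may stop here */
	model_record(t, EV_BROADCAST_EXIT);	/* local timer takes over again */
	return index;
}
```

Calling model_enter_idle() on a zero-initialized trace should leave three
events in it, with the enter and exit events bracketing the low-power step.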

* RCU stall on panda
  2014-05-15 18:36                 ` Santosh Shilimkar
@ 2014-05-16  7:41                   ` Alex Shi
  2014-05-16 13:37                     ` Santosh Shilimkar
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Shi @ 2014-05-16  7:41 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/16/2014 02:36 AM, Santosh Shilimkar wrote:
>>> >> yes.
>>> >> My board is panda ES. without this revert, it works.
>> > 
>> > Care to specify what linux version you are testing against?
>> > 
>> > Does it hang in idle always immediately on booting?
>> > 
>> > Or does the serial console first hang with sysrq still
>> > working (ctrl-a h in minicom for help) with device
>> > eventually locking up hard?
>> >
> I just posted an updated patch Alex on other thread.
> Attaching here again for your reference. Please try
> it out and see if the you still get a hang.

It does not hang this time.

But I am not sure whether it solves my problem, since the RCU stall is not
easy to reproduce in a short time.

-- 
Thanks
    Alex

* RCU stall on panda
  2014-05-16  7:41                   ` Alex Shi
@ 2014-05-16 13:37                     ` Santosh Shilimkar
  2014-05-22  8:59                       ` Alex Shi
  0 siblings, 1 reply; 15+ messages in thread
From: Santosh Shilimkar @ 2014-05-16 13:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Friday 16 May 2014 03:41 AM, Alex Shi wrote:
> On 05/16/2014 02:36 AM, Santosh Shilimkar wrote:
>>>>>> yes.
>>>>>> My board is panda ES. without this revert, it works.
>>>>
>>>> Care to specify what linux version you are testing against?
>>>>
>>>> Does it hang in idle always immediately on booting?
>>>>
>>>> Or does the serial console first hang with sysrq still
>>>> working (ctrl-a h in minicom for help) with device
>>>> eventually locking up hard?
>>>>
>> I just posted an updated patch Alex on other thread.
>> Attaching here again for your reference. Please try
>> it out and see if the you still get a hang.
> 
> it does not hang this time.
>
This is good news and exactly what I expected.
 
> but I am not sure it can solve my problem, since RCU stall is not easy
> to reproduce in short time.
> 
You may want to run the system longer if you can. I suspect the RCU stall
was also a side effect of the missing interrupts.
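
For a long soak run, a saved serial console log can be checked for stall
reports afterwards. Below is a minimal sketch, assuming the log contains
the same "self-detected stall on CPU" and "detected stalls on CPUs/tasks"
strings as the report that started this thread. The function names are
hypothetical, and lines longer than the buffer are read in pieces, so a
match straddling such a split would be missed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if a kernel log line looks like an RCU stall report. */
static bool is_rcu_stall_line(const char *line)
{
	assert(line != NULL);
	return strstr(line, "self-detected stall on CPU") != NULL ||
	       strstr(line, "detected stalls on CPUs/tasks") != NULL;
}

/* Count stall reports in a stream, e.g. a saved serial console log. */
static unsigned long count_rcu_stalls(FILE *log)
{
	char buf[512];
	unsigned long hits = 0;

	while (fgets(buf, sizeof(buf), log) != NULL)
		if (is_rcu_stall_line(buf))
			hits++;
	return hits;
}
```

A count of zero after a long run is only weak evidence, since the stall
was hard to reproduce in the first place.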

Regards,
Santosh

* RCU stall on panda
  2014-05-16 13:37                     ` Santosh Shilimkar
@ 2014-05-22  8:59                       ` Alex Shi
  2014-05-22 13:36                         ` Santosh Shilimkar
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Shi @ 2014-05-22  8:59 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/16/2014 09:37 PM, Santosh Shilimkar wrote:
> On Friday 16 May 2014 03:41 AM, Alex Shi wrote:
>> On 05/16/2014 02:36 AM, Santosh Shilimkar wrote:
>>>>>>> yes.
>>>>>>> My board is panda ES. without this revert, it works.
>>>>>
>>>>> Care to specify what linux version you are testing against?
>>>>>
>>>>> Does it hang in idle always immediately on booting?
>>>>>
>>>>> Or does the serial console first hang with sysrq still
>>>>> working (ctrl-a h in minicom for help) with device
>>>>> eventually locking up hard?
>>>>>
>>> I just posted an updated patch Alex on other thread.
>>> Attaching here again for your reference. Please try
>>> it out and see if the you still get a hang.
>>
>> it does not hang this time.
>>
> This is good news and exactly what I expected.
>  
>> but I am not sure it can solve my problem, since RCU stall is not easy
>> to reproduce in short time.
>>
> You may want to run the system longer if you can. I suspect the RCU stall
> was also side effect of missing interrupts.

Sure. It does remove the RCU stall on my panda board.

> 
> Regards,
> Santosh
> 


-- 
Thanks
    Alex

* RCU stall on panda
  2014-05-22  8:59                       ` Alex Shi
@ 2014-05-22 13:36                         ` Santosh Shilimkar
  0 siblings, 0 replies; 15+ messages in thread
From: Santosh Shilimkar @ 2014-05-22 13:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 22 May 2014 04:59 AM, Alex Shi wrote:
> On 05/16/2014 09:37 PM, Santosh Shilimkar wrote:
>> On Friday 16 May 2014 03:41 AM, Alex Shi wrote:
>>> On 05/16/2014 02:36 AM, Santosh Shilimkar wrote:
>>>>>>>> yes.
>>>>>>>> My board is panda ES. without this revert, it works.
>>>>>>
>>>>>> Care to specify what linux version you are testing against?
>>>>>>
>>>>>> Does it hang in idle always immediately on booting?
>>>>>>
>>>>>> Or does the serial console first hang with sysrq still
>>>>>> working (ctrl-a h in minicom for help) with device
>>>>>> eventually locking up hard?
>>>>>>
>>>> I just posted an updated patch Alex on other thread.
>>>> Attaching here again for your reference. Please try
>>>> it out and see if the you still get a hang.
>>>
>>> it does not hang this time.
>>>
>> This is good news and exactly what I expected.
>>  
>>> but I am not sure it can solve my problem, since RCU stall is not easy
>>> to reproduce in short time.
>>>
>> You may want to run the system longer if you can. I suspect the RCU stall
>> was also side effect of missing interrupts.
> 
> Sure. it do remove the RCU stall on my panda board.
> 
Thanks for confirming. Tony has already sent the fix upstream, so it should
most likely show up in the next rc.

end of thread, other threads:[~2014-05-22 13:36 UTC | newest]

Thread overview: 15+ messages
2014-05-05  9:38 RCU stall on panda Alex Shi
  -- strict thread matches above, loose matches on Subject: below --
2014-05-05  9:39 Alex Shi
2014-05-05 18:06 ` Paul E. McKenney
2014-05-12 21:21   ` Tony Lindgren
2014-05-13  6:36     ` Alex Shi
2014-05-13 15:32       ` Tony Lindgren
2014-05-15  8:44         ` Alex Shi
2014-05-15  9:05           ` Daniel Lezcano
2014-05-15 13:26             ` Alex Shi
2014-05-15 18:32               ` Tony Lindgren
2014-05-15 18:36                 ` Santosh Shilimkar
2014-05-16  7:41                   ` Alex Shi
2014-05-16 13:37                     ` Santosh Shilimkar
2014-05-22  8:59                       ` Alex Shi
2014-05-22 13:36                         ` Santosh Shilimkar