From: Tony Lindgren <tony@atomide.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Alex Shi <alex.shi@linaro.org>,
	"naresh.kamboju@linaro.org Kamboju" <naresh.kamboju@linaro.org>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Linaro Kernel <linaro-kernel@lists.linaro.org>,
	LAK <linux-arm-kernel@lists.infradead.org>,
	Mark Brown <broonie@linaro.org>,
	linux-omap@vger.kernel.org
Subject: Re: RCU stall on panda
Date: Mon, 12 May 2014 14:21:03 -0700
Message-ID: <20140512212102.GF5668@atomide.com>
In-Reply-To: <20140505180617.GM8754@linux.vnet.ibm.com>

* Paul E. McKenney <paulmck@linux.vnet.ibm.com> [140505 11:11]:
> On Mon, May 05, 2014 at 05:39:43PM +0800, Alex Shi wrote:
> > I keep seeing the RCU stall problem on the panda board, from the 3.10 kernel to the latest upstream kernel,
> > and Google finds that someone reported it before: https://lkml.org/lkml/2012/9/20/519
> > 
> > Is it a hardware issue or a real software problem?
> 
> I cannot distinguish between hardware and software from the trace below,
> but given that you are also seeing a soft lockup, either way you do
> appear to have a real problem as opposed to an RCU CPU stall warning
> false positive.

Looks like you have CPU_IDLE enabled on panda. Hangs with current
linux-next with CPU_IDLE are currently being discussed on the linux-omap
list in the thread "omap4-panda-es boot issues with v3.15-rc4".

I've seen occasional system hangs, and I've also noticed that doing
ctrl-a-f h or ctrl-a-f l for a sysrq backtrace can unlock the system,
producing errors similar to those below.

Regards,

Tony
 
> > [   95.519653] INFO: rcu_sched self-detected stall on CPU
> > [   95.519866]  1: (1 GPs behind) idle=2e7/1/0 softirq=4404/4405
> > [   95.526489] INFO: rcu_sched detected stalls on CPUs/tasks:
> > [   95.526489]  1: (1 GPs behind) idle=2e7/1/0 softirq=4404/4405
> > [   95.526489]  (detected by 0, t=4229 jiffies, g=800, c=799, q=440)
> > [   95.526519] Task dump for CPU 1:
> > [   95.526519] swapper/1       R running      0     0      1 0x00000000
> > [   95.559844]   (t=4229 jiffies g=800 c=799 q=440)
> > [   95.564727] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15.0-rc4 #93
> > [   95.571502] [<c00133fd>] (unwind_backtrace) from [<c001076d>] (show_stack+0x11/0x14)
> > [   95.579711] [<c001076d>] (show_stack) from [<c0570465>] (dump_stack+0x75/0x88)
> > [   95.587371] [<c0570465>] (dump_stack) from [<c0084383>] (rcu_check_callbacks+0x353/0x79c)
> > [   95.596038] [<c0084383>] (rcu_check_callbacks) from [<c003e99f>] (update_process_times+0x33/0x4c)
> > [   95.605438] [<c003e99f>] (update_process_times) from [<c008e5a3>] (tick_sched_handle.isra.18+0x1f/0x48)
> > [   95.615386] [<c008e5a3>] (tick_sched_handle.isra.18) from [<c008e609>] (tick_sched_timer+0x3d/0x5c)
> > [   95.624969] [<c008e609>] (tick_sched_timer) from [<c0051a23>] (__run_hrtimer+0x67/0x310)
> > [   95.633544] [<c0051a23>] (__run_hrtimer) from [<c00525fd>] (hrtimer_interrupt+0xe1/0x214)
> > [   95.642211] [<c00525fd>] (hrtimer_interrupt) from [<c008cecb>] (tick_receive_broadcast+0x1f/0x30)
> > [   95.651611] [<c008cecb>] (tick_receive_broadcast) from [<c0011e4f>] (handle_IPI+0xb3/0x120)
> > [   95.660461] [<c0011e4f>] (handle_IPI) from [<c00085e5>] (gic_handle_irq+0x51/0x54)
> > [   95.668487] [<c00085e5>] (gic_handle_irq) from [<c057603f>] (__irq_svc+0x3f/0x64)
> > [   95.676391] Exception stack(0xee0dbf10 to 0xee0dbf58)
> > [   95.681762] bf00:                                     00000001 00000001 00000000 ee0d8c40
> > [   95.690429] bf20: 3c6bd296 00000016 3c6f8c43 00000016 eefab540 c08e0c84 00000000 c0fc7114
> > [   95.699066] bf40: 00000010 ee0dbf58 c006ef4d c0443890 40000033 ffffffff
> > [   95.706085] [<c057603f>] (__irq_svc) from [<c0443890>] (cpuidle_enter_state+0xc0/0xc4)
> > [   95.714477] [<c0443890>] (cpuidle_enter_state) from [<c0444d11>] (cpuidle_enter_state_coupled+0xe1/0x290)
> > [   95.724639] [<c0444d11>] (cpuidle_enter_state_coupled) from [<c0067cd1>] (cpu_startup_entry+0x1a5/0x494)
> > [   95.734680] [<c0067cd1>] (cpu_startup_entry) from [<80008685>] (0x80008685)
> > [   95.742095] BUG: soft lockup - CPU#1 stuck for 40s! [swapper/1:0]
> > [   95.748535] Modules linked in:
> > [   95.751770] irq event stamp: 128730
> > [   95.755462] hardirqs last  enabled at (128727): [<c044388f>] cpuidle_enter_state+0xbf/0xc4
> > [   95.764221] hardirqs last disabled at (128728): [<c0576033>] __irq_svc+0x33/0x64
> > [   95.772064] softirqs last  enabled at (128730): [<c00386cd>] irq_enter+0x59/0x60
> > [   95.779907] softirqs last disabled at (128729): [<c00386ba>] irq_enter+0x46/0x60
> > [   95.787750]
> > 
> > 
> > My RCU- and IDLE-related kernel config is as below:
> > 
> > CONFIG_TREE_RCU=y
> > CONFIG_RCU_STALL_COMMON=y
> > CONFIG_RCU_FANOUT=32
> > CONFIG_RCU_FANOUT_LEAF=16
> > CONFIG_TREE_RCU_TRACE=y
> > CONFIG_PROVE_RCU=y
> > CONFIG_PROVE_RCU_REPEATEDLY=y
> > CONFIG_SPARSE_RCU_POINTER=y
> > CONFIG_RCU_CPU_STALL_TIMEOUT=21
> > CONFIG_RCU_CPU_STALL_INFO=y
> > CONFIG_RCU_TRACE=y
> > alexs@alex-panda:~$ cat /proc/config.gz | gunzip | grep IDLE
> > CONFIG_NO_HZ_IDLE=y
> > CONFIG_GENERIC_SMP_IDLE_THREAD=y
> > CONFIG_GENERIC_IDLE_POLL_SETUP=y
> > CONFIG_CPU_IDLE=y
> > CONFIG_CPU_IDLE_GOV_LADDER=y
> > CONFIG_CPU_IDLE_GOV_MENU=y
> > CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED=y
> > 
> > -- 
> > Thanks
> >     Alex
> > 
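
When comparing stall reports like the one quoted above across kernels, it can help to pull the numeric fields out of the "detected by" summary line. The following is only a sketch, not something posted in this thread: the function names are made up for illustration, and the HZ value needed to turn jiffies into seconds is an assumption that has to be supplied by the caller, since the quoted config excerpt does not show it.

```python
import re

# Matches the summary line of an rcu_sched stall report, for example:
#   (detected by 0, t=4229 jiffies, g=800, c=799, q=440)
STALL_RE = re.compile(
    r"\(detected by (?P<detector>\d+), t=(?P<t>\d+) jiffies, "
    r"g=(?P<g>\d+), c=(?P<c>\d+), q=(?P<q>\d+)\)"
)


def parse_stall_summary(line):
    """Return the numeric fields of a stall summary line as ints, or None.

    Hypothetical helper: the keys simply mirror the labels in the log line.
    """
    match = STALL_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def jiffies_to_seconds(jiffies, hz):
    """Convert jiffies to seconds. hz must come from the kernel config
    (CONFIG_HZ); it is NOT given in the quoted report."""
    return jiffies / hz


# The summary line from the report, including the stray "^M" from the serial log.
line = "[   95.526489]  (detected by 0, t=4229 jiffies, g=800, c=799, q=440)^M"
fields = parse_stall_summary(line)
print(fields)
```

The regex uses `search` rather than `match`, so the dmesg timestamp and any trailing carriage-return debris are ignored. The self-detected variant of the line (without "detected by" and without commas) is deliberately not handled here.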
