From: "Jim Schutt" <jaschut@sandia.gov>
To: paulmck@linux.vnet.ibm.com
Cc: linux-kernel@vger.kernel.org
Subject: Re: mcelog stalls on 2.6.39-rc5
Date: Thu, 28 Apr 2011 08:06:17 -0600
Message-ID: <4DB97459.3040301@sandia.gov>
In-Reply-To: <20110427230333.GD2135@linux.vnet.ibm.com>

Paul E. McKenney wrote:
> On Wed, Apr 27, 2011 at 01:59:01PM -0600, Jim Schutt wrote:
>> Hi,
>>
>> Testing 2.6.39-rc5 is giving me the following stall:
>>
>>     [ 5767.731001] INFO: rcu_sched_state detected stall on CPU 1 (t=60001 jiffies)
>>     [ 5767.732001] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1} (detected by 0, t=60002 jiffies)
>>     [ 5947.763001] INFO: rcu_sched_state detected stall on CPU 1 (t=240032 jiffies)
>>     [ 5947.764001] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1} (detected by 0, t=240034 jiffies)
>>     [ 6024.489362] libceph: mon0 172.17.40.34:6789 socket closed
>>     [ 6121.281139] INFO: task mcelog:6513 blocked for more than 120 seconds.
>>     [ 6121.287575] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>     [ 6121.295397]  ffff880177aefce8 0000000000000082 ffffffff810339b6 ffff880226d35a40
>>     [ 6121.302840]  ffff88018226c3b0 ffff88018226c3b0 0000000000011e80 ffff880226d35a40
>>     [ 6121.310284]  ffff88018226c760 ffff880177aefe80 ffff880177aefd18 ffffffff813af047
>>     [ 6121.317725] Call Trace:
>>     [ 6121.320176]  [<ffffffff810339b6>] ? calc_load_account_idle+0xe/0x1d
>>     [ 6121.326437]  [<ffffffff813af047>] schedule+0x159/0x193
>>     [ 6121.331569]  [<ffffffff813af449>] schedule_timeout+0x36/0xe2
>>     [ 6121.337223]  [<ffffffff810ad9eb>] ? trace_hardirqs_on+0x9/0x20
>>     [ 6121.343047]  [<ffffffff813aee2b>] do_wait_for_common+0x97/0xe3
>>     [ 6121.348967]  [<ffffffff8103e8fa>] ? try_to_wake_up+0x200/0x200
>>     [ 6121.354794]  [<ffffffff8107a4bf>] ? __raw_spin_lock_irq+0x17/0x2f
>>     [ 6121.360878]  [<ffffffff813af2a9>] wait_for_common+0x36/0x4d
>>     [ 6121.366441]  [<ffffffff813af378>] wait_for_completion+0x1d/0x1f
>>     [ 6121.372356]  [<ffffffff8109ae6d>] synchronize_sched+0x40/0x49
>>     [ 6121.378096]  [<ffffffff810635b8>] ? find_get_pid+0x1b/0x1b
>>     [ 6121.383574]  [<ffffffff81015fe5>] mce_read+0x17f/0x25d
>>     [ 6121.388707]  [<ffffffff81111af5>] ? rw_verify_area+0xac/0xdb
>>     [ 6121.394358]  [<ffffffff811121f1>] vfs_read+0xa9/0xe1
>>     [ 6121.399317]  [<ffffffff8111248d>] sys_read+0x4c/0x70
>>     [ 6121.404278]  [<ffffffff813b6deb>] system_call_fastpath+0x16/0x1b
>>     [ 6127.795001] INFO: rcu_sched_state detected stall on CPU 1 (t=420064 jiffies)
>>     [ 6127.796001] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1} (detected by 0, t=420066 jiffies)
>>     [ 6241.410171] INFO: task mcelog:6513 blocked for more than 120 seconds.
>>
>> Reverting commit a4dd99250dc makes the stalls go away:
>>
>>     rcu: create new rcu_access_index() and use in mce
>>
>>     The MCE subsystem needs to sample an RCU-protected index outside of
>>     any protection for that index.  If this was a pointer, we would use
>>     rcu_access_pointer(), but there is no corresponding rcu_access_index().
>>     This commit therefore creates an rcu_access_index() and applies it
>>     to MCE.
>>
>>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>>     Tested-by: Zdenek Kabelac <zkabelac@redhat.com>
> 
> Wow!  This is just instructions, just wrapping the access in
> ACCESS_ONCE().
> 
> Was the original RCU CPU stall repeatable?

Yes.  I saw it on two different machines.
Both were running as Ceph clients, doing sustained
streaming writes, if that helps any.

I can attempt to repeat with any extra debugging
you'd like me to try.

-- Jim

> 
> 							Thanx, Paul
> 
>> Here's some bits from my config that might be relevant:
>>
>> # RCU Subsystem
>> CONFIG_TREE_RCU=y
>> # CONFIG_PREEMPT_RCU is not set
>> # CONFIG_RCU_TRACE is not set
>> CONFIG_RCU_FANOUT=64
>> # CONFIG_RCU_FANOUT_EXACT is not set
>> # CONFIG_RCU_FAST_NO_HZ is not set
>> # CONFIG_TREE_RCU_TRACE is not set
>> CONFIG_PREEMPT_NOTIFIERS=y
>> # CONFIG_PREEMPT_NONE is not set
>> CONFIG_PREEMPT_VOLUNTARY=y
>> # CONFIG_PREEMPT is not set
>> CONFIG_X86_MCE=y
>> CONFIG_X86_MCE_INTEL=y
>> CONFIG_X86_MCE_AMD=y
>> CONFIG_X86_MCE_THRESHOLD=y
>> # CONFIG_X86_MCE_INJECT is not set
>> CONFIG_EDAC_DECODE_MCE=y
>> # CONFIG_EDAC_MCE_INJ is not set
>> CONFIG_EDAC_MCE=y
>> # CONFIG_SPARSE_RCU_POINTER is not set
>> # CONFIG_RCU_TORTURE_TEST is not set
>> CONFIG_RCU_CPU_STALL_DETECTOR=y
>> CONFIG_RCU_CPU_STALL_TIMEOUT=60
>> CONFIG_RCU_CPU_STALL_DETECTOR_RUNNABLE=y
>>
>> Please let me know if there is anything I can do to help sort this out.
>>
>> -- Jim
>>
> 
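
For readers following the ACCESS_ONCE() remark above: a minimal userspace sketch of the idea, not the actual MCE code. The macro has the same shape as the kernel's ACCESS_ONCE() of that era as far as I recall it; struct demo_log, LOG_LEN and the demo_log_* helpers are hypothetical names made up for illustration. It assumes GCC or Clang for __typeof__.

```c
#include <assert.h>
#include <stddef.h>

/* Force exactly one load of x by going through a volatile-qualified lvalue.
 * Assumption: GNU C __typeof__ is available (GCC or Clang). */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

#define LOG_LEN 32              /* hypothetical buffer size */

struct demo_log {               /* hypothetical stand-in for a shared log */
    unsigned next;              /* index of the next free slot; writers advance it */
    int entry[LOG_LEN];
};

/* Sample the shared index once and clamp it to the buffer size.
 * Callers must use only the returned copy, never re-read log->next. */
static unsigned demo_log_sample_next(struct demo_log *log)
{
    assert(log != NULL);
    unsigned next = ACCESS_ONCE(log->next);
    return next < LOG_LEN ? next : LOG_LEN;
}

/* Sum the entries below the sampled index. */
static long demo_log_sum(struct demo_log *log)
{
    unsigned i, next = demo_log_sample_next(log);
    long sum = 0;

    for (i = 0; i < next; i++)
        sum += log->entry[i];
    return sum;
}
```

The volatile access only stops the compiler from re-reading or merging the load; it adds no locking and no grace-period wait, which is why a change of this kind would not be expected to cause a stall by itself.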




Thread overview: 5+ messages
2011-04-27 19:59 mcelog stalls on 2.6.39-rc5 Jim Schutt
2011-04-27 23:03 ` Paul E. McKenney
2011-04-28 14:06   ` Jim Schutt [this message]
2011-04-28 16:06     ` Jim Schutt
2011-04-29  0:31       ` Paul E. McKenney
