From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Aaron Ma <mapengyu@gmail.com>
Cc: linux-rt-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: RCU stall and the system boot hang with nfsroot
Date: Thu, 31 Dec 2015 11:49:49 -0800
Message-ID: <20151231194949.GA20877@linux.vnet.ibm.com>
In-Reply-To: <20151230174145.GN4054@linux.vnet.ibm.com>

On Wed, Dec 30, 2015 at 09:41:45AM -0800, Paul E. McKenney wrote:
> On Wed, Dec 30, 2015 at 03:03:33PM +0800, Aaron Ma wrote:
> > On Wed, Dec 30, 2015 at 7:42 AM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:

[ . . . ]

> > cfg80211: Calling CRDA to update world regulatory domain
> > cfg80211: Calling CRDA to update world regulatory domain
> > cfg80211: Exceeded CRDA call max attempts. Not calling CRDA
> > INFO: rcu_preempt detected stalls on CPUs/tasks:
> >     71: (0 ticks this GP) idle=1ac/0/0 softirq=0/0 fqs=0
> >     (detected by 62, t=26002 jiffies, g=3735, c=3734, q=366014)
> > Task dump for CPU 71:
> > swapper/71      R  running task        0     0      1 0x00200000
> >  ffffffff81492587 ffff8804633cbe58 ffffffff814f21d7 0000000000000004
> >  0000000000000004 ffffe8fffb405310 ffffffff820dc5c0 ffff8804633cbea8
> >  ffffffff8181db85 0000000000000000 0000000000000000 0000000000000046
> > Call Trace:
> >  [<ffffffff81492587>] ? debug_smp_processor_id+0x17/0x20
> >  [<ffffffff814f21d7>] ? intel_idle+0x137/0x140
> >  [<ffffffff8181db85>] ? cpuidle_enter_state+0x65/0x3e0
> >  [<ffffffff8181df37>] ? cpuidle_enter+0x17/0x20
> >  [<ffffffff810a849d>] ? cpu_startup_entry+0x33d/0x630
> >  [<ffffffff8103ceae>] ? start_secondary+0x12e/0x140
> > rcu_preempt kthread starved for 26002 jiffies!
> > rcu_check_gp_kthread_starvation --->show task:
> > rcu_preempt     S ffff880456413c68     0     8      2 0x00000000
> >  ffff880456413c68 ffff8804564025d0 000000000000d7a0 ffff880456b18000
> >  ffff8804564025d0 ffff880456413c38 ffffffff81492587 ffff880456413c58
> >  ffff880456414000 ffff8804564025d0 ffff880456413cb8 ffff880869dce500
> > Call Trace:
> >  [<ffffffff81492587>] ? debug_smp_processor_id+0x17/0x20
> >  [<ffffffff81b5ce9f>] schedule+0x3f/0xd0
> >  [<ffffffff81b5ef19>] schedule_timeout+0x189/0x3f0
> >  [<ffffffff810a7904>] ? swait_prepare+0x24/0x90
> >  [<ffffffff810e8e60>] ? timer_cpu_notify+0x190/0x190
> >  [<ffffffff810a793b>] ? swait_prepare+0x5b/0x90
> >  [<ffffffff810de3f8>] rcu_gp_kthread+0x8a8/0x2190
> >  [<ffffffff810b275d>] ? trace_hardirqs_on+0xd/0x10
> >  [<ffffffff81b5c18f>] ? __schedule+0x4af/0x1180
> >  [<ffffffff810ddb50>] ? call_rcu_sched+0x20/0x20
> >  [<ffffffff8107f844>] kthread+0xe4/0x100
> >  [<ffffffff810b275d>] ? trace_hardirqs_on+0xd/0x10
> >  [<ffffffff8107f760>] ? kthread_create_on_node+0x240/0x240
> >  [<ffffffff81b61562>] ret_from_fork+0x42/0x70
> >  [<ffffffff8107f760>] ? kthread_create_on_node+0x240/0x240
> > rcu_check_gp_kthread_starvation --->end
> > 
> > It seems to be waiting in rcu_gp_kthread. No task should be blocked,
> > right? If so, why does swait_event_interruptible_timeout not wake up?
> > The timeout is CONFIG_HZ=1000.
> 
> Given that this happens at boot, perhaps ftrace is a good next step.
> The thought would be to enable ftrace via the kernel boot parameters
> for the timers.
> 
> And how often does this problem occur?

And does the following diagnostic patch help?  Its expected behavior
would be to turn a hard hang into something that recovered in a few
minutes, while giving a few stall-warning splats.

							Thanx, Paul

------------------------------------------------------------------------

commit 7798a5efb2acabfa3ca788dd9b5b118eb1bff443
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Thu Dec 31 08:48:36 2015 -0800

    rcu: Awaken grace-period kthread when stalled
    
    Recent kernels can fail to awaken the grace-period kthread for
    quiescent-state forcing.  This commit is a crude hack that does
    a wakeup any time a stall is detected.
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4b3de6718f7c..51da7ef3561f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1225,8 +1225,10 @@ static void rcu_check_gp_kthread_starvation(struct rcu_state *rsp)
 		       rsp->gp_flags,
 		       gp_state_getname(rsp->gp_state), rsp->gp_state,
 		       rsp->gp_kthread ? rsp->gp_kthread->state : ~0);
-		if (rsp->gp_kthread)
+		if (rsp->gp_kthread) {
 			sched_show_task(rsp->gp_kthread);
+			wake_up_process(rsp->gp_kthread);
+		}
 	}
 }
 
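
To make the intent of the patch concrete, here is a minimal user-space sketch in C of the same idea: a sleeper whose timer wakeup is lost stays asleep until the stall detector wakes it explicitly. It is a single-threaded tick simulation, not kernel code. The names gp_sim and sim_run(), and the tick counts (a 3-tick sleep, a 21-tick stall timeout, a 100-tick run), are assumptions made up for illustration; only the wake-on-stall step corresponds to the wake_up_process() call added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simulation constants, chosen only for illustration. */
#define SLEEP_TICKS    3UL	/* timed sleep requested by the simulated kthread */
#define STALL_TIMEOUT 21UL	/* ticks between stall checks */
#define MAX_TICKS    100UL	/* length of the simulated run */

enum task_state { TASK_SLEEPING, TASK_RUNNABLE };

struct gp_sim {
	enum task_state state;		/* stands in for the kthread's state */
	unsigned long wake_at;		/* tick at which the sleep timer should fire */
	unsigned long next_stall_check;	/* tick of the next stall check */
	unsigned long stalls;		/* number of stall warnings so far */
};

/*
 * Simulate one grace-period wait.  Returns the tick at which the sleeper
 * became runnable, or -1 if it never did within MAX_TICKS.  The number of
 * stall warnings seen is stored through stalls_out.
 */
static long sim_run(bool timer_lost, bool wake_on_stall,
		    unsigned long *stalls_out)
{
	struct gp_sim s = {
		.state = TASK_SLEEPING,
		.wake_at = SLEEP_TICKS,
		.next_stall_check = STALL_TIMEOUT,
		.stalls = 0,
	};
	unsigned long now;

	assert(stalls_out != NULL);

	for (now = 1; now <= MAX_TICKS; now++) {
		/* Normal path: the timer wakes the sleeper, unless the wakeup is lost. */
		if (!timer_lost && now >= s.wake_at)
			s.state = TASK_RUNNABLE;

		/* Stall detector: the sleeper has been starved for too long. */
		if (s.state == TASK_SLEEPING && now >= s.next_stall_check) {
			s.stalls++;
			s.next_stall_check = now + STALL_TIMEOUT;
			if (wake_on_stall)
				s.state = TASK_RUNNABLE; /* analogue of wake_up_process() */
		}

		if (s.state == TASK_RUNNABLE) {
			*stalls_out = s.stalls;
			return (long)now;
		}
	}

	*stalls_out = s.stalls;
	return -1;
}
```

With timer_lost set and wake_on_stall clear, sim_run() returns -1 after four stall warnings, which models the hard hang. With wake_on_stall set, it returns at tick 21 after a single warning, which models the recovery the patch is expected to give.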

