From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Wang YanQing <udknight@gmail.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Bug report for RCU stalled warning [3.10.69]
Date: Sat, 14 Oct 2017 05:51:16 -0700
Message-ID: <20171014125116.GA8791@linux.vnet.ibm.com>
In-Reply-To: <20171012203824.GK3521@linux.vnet.ibm.com>

On Thu, Oct 12, 2017 at 01:38:24PM -0700, Paul E. McKenney wrote:
> [ Adding LKML on CC so that others can find this. ]
> 
> On Wed, Oct 11, 2017 at 12:21:39PM +0800, Wang YanQing wrote:
> > Hi, Paul McKenney.
> > 
> > I have received many reports of machines that stopped responding. After
> > rebooting and inspecting the logs, all of them show RCU stall warnings,
> > but I can't figure out how to fix the problem. The painful point is that
> > I can't update the kernel, so I need to fix it in 3.10. I have attached
> > four messages that come from different CPUs and boards (so I guess it is
> > a bug rather than a hardware fault). Any suggestion is welcome.
> 
> The first step is of course to report this to your distro, as they are
> the ones who do the care and feeding of such old kernels.  Please include
> the information below in that report, as it might help your distro find
> and fix the problem.
> 
> It looks like the stalled CPU is idle, and that the activity resulting
> from the stall-warning message gets things going again.  Callbacks are
> being processed, so no OOM.  But you are getting the splat every 60
> seconds.  The system has only two CPUs, and is x86.
> 
> If you cannot upgrade the kernel, my ability to help is limited.  And the
> diagnostics printed with the v3.10 CPU stall warnings are also quite
> limited.  However, there are some things you could try as workarounds:
> 
> 1.	Check to make sure that the rcu_sched kthread is getting
> 	the CPU time that it needs.  Preventing this kthread from
> 	running would create exactly this output, assuming that
> 	the stall warning got it going again temporarily.
> 
> 2.	It looks like the disturbance of the RCU CPU stall warning
> 	is getting things going again.  Try artificially providing
> 	this disturbance, for example, by running a usermode program
> 	or script that runs on each CPU in turn, then sleeps for
> 	(say) five seconds.
> 
> 3.	If you can reconfigure your kernel, try building with
> 	CONFIG_RCU_FAST_NO_HZ=n.

And if you can reconfigure your kernel, then in v3.10, building with
CONFIG_RCU_CPU_STALL_INFO and CONFIG_RCU_CPU_STALL_VERBOSE will provide
more information on the CPUs and tasks stalling the grace period.

							Thanx, Paul

> 4.	Was the system running reliably on some earlier version?
> 	If so, consider reverting back to that version, and include
> 	the version information in your report to your distro.  If
> 	your distro provides individual patches, you should consider
> 	bisecting so as to locate the offending patch.
> 
> Good luck with it!
> 
> 							Thanx, Paul
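
Workaround 1 above (checking that the rcu_sched kthread gets CPU time) can
be approximated from user space. The following is a minimal sketch and not
part of the original thread. It assumes that Python is installed on the
affected machine, that the RCU kthreads have names beginning with "rcu_"
(for example rcu_sched), and that fields 14 and 15 of /proc/<pid>/stat are
utime and stime in clock ticks, as described in proc(5). The function names
parse_stat and find_threads are made up for illustration.

```python
import os

def parse_stat(stat_line):
    """Return (comm, state, cpu_ticks) from one /proc/<pid>/stat line."""
    # comm is parenthesized and may itself contain spaces or parentheses,
    # so locate it by the first "(" and the last ")".
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    comm = stat_line[lparen + 1:rparen]
    rest = stat_line[rparen + 2:].split()
    state = rest[0]                            # field 3
    cpu_ticks = int(rest[11]) + int(rest[12])  # fields 14 (utime) + 15 (stime)
    return comm, state, cpu_ticks

def find_threads(prefix="rcu_"):
    """List (pid, comm, state, cpu_ticks) for tasks whose name starts with prefix."""
    found = []
    if not os.path.isdir("/proc"):
        return found
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/stat" % entry) as f:
                comm, state, ticks = parse_stat(f.read())
        except (IOError, OSError, ValueError, IndexError):
            continue  # task exited or line was unreadable; skip it
        if comm.startswith(prefix):
            found.append((int(entry), comm, state, ticks))
    return found

if __name__ == "__main__":
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    for pid, comm, state, ticks in sorted(find_threads()):
        print("%6d %-16s state=%s cpu=%.2fs" % (pid, comm, state, ticks / float(hz)))
```

Running this twice, some seconds apart, while the stall warnings are
appearing shows whether the kthread's accumulated CPU time advances. A
kthread that stays in state R (runnable) without accumulating CPU time
would be consistent with it being prevented from running, though this
alone does not prove it.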

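Workaround 2 above (a usermode program that runs on each CPU in turn and
then sleeps for about five seconds) might look like the sketch below. It is
an illustration only and not from the original thread. It assumes Linux and
Python 3.3 or later, which provide os.sched_getaffinity() and
os.sched_setaffinity(). The names visit_each_cpu and run are hypothetical,
and whether this disturbance is enough to get grace periods moving again is
exactly what the experiment is meant to find out.

```python
import os
import time

def visit_each_cpu(cpus=None):
    """Pin this process to each CPU in turn; return the CPUs visited."""
    original = os.sched_getaffinity(0)  # CPUs we are currently allowed on
    cpus = sorted(original) if cpus is None else list(cpus)
    visited = []
    try:
        for cpu in cpus:
            # Restricting the affinity mask to one CPU makes the kernel
            # migrate this process there before the call returns.
            os.sched_setaffinity(0, {cpu})
            visited.append(cpu)
    finally:
        os.sched_setaffinity(0, original)  # restore the original mask
    return visited

def run(interval=5.0, rounds=None):
    """Visit every CPU, sleep for interval seconds, and repeat.

    With rounds=None the loop continues until interrupted.
    Returns the number of rounds completed.
    """
    done = 0
    while rounds is None or done < rounds:
        visit_each_cpu()
        done += 1
        time.sleep(interval)
    return done
```

Calling run() with no arguments loops until interrupted, using the
five-second sleep suggested above; run(rounds=1, interval=0) makes a single
pass, which is handy for checking that the affinity calls are permitted on
the system in question.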