* Re: Bug report for RCU stalled warning [3.10.69]
       [not found] <20171011042139.GA5038@udknight>
@ 2017-10-12 20:38 ` Paul E. McKenney
  2017-10-14 12:51   ` Paul E. McKenney
From: Paul E. McKenney @ 2017-10-12 20:38 UTC
To: Wang YanQing; +Cc: linux-kernel

[ Adding LKML on CC so that others can find this. ]

On Wed, Oct 11, 2017 at 12:21:39PM +0800, Wang YanQing wrote:
> Hi, Paul McKenney.
>
> I have received many machine-stopped-response reports, after reboot and
> inspect message, all of them show RCU stalled, but I can't figure out
> how to fix it. I can't update the kernel, it is the painful point, so I
> need to fix it in 3.10. I have attached four messages come from different
> cpu and boards (so I guess it is a BUG instead of hardware fault), any
> suggestion is welcome.

The first step is of course to report this to your distro, as they are
the ones who do the care and feeding of such old kernels.  Please include
the information below in that report, as it might help your distro find
and fix the problem.

It looks like the stalled CPU is idle, and that the activity resulting
from the stall-warning message gets things going again.  Callbacks are
being processed, so no OOM.  But you are getting the splat every 60
seconds.  The system has only two CPUs, and is x86.

If you cannot upgrade the kernel, my ability to help is limited.  And the
diagnostics printed with the v3.10 CPU stall warnings are also quite
limited.  However, there are some things you could try as workarounds:

1.	Check to make sure that the rcu_sched kthread is getting
	the CPU time that it needs.  Preventing this kthread from
	running would create exactly this output, assuming that
	the stall warning got it going again temporarily.

2.	It looks like the disturbance of the RCU CPU stall warning
	is getting things going again.  Try artificially providing
	this disturbance, for example, by running a usermode program
	or script that runs on each CPU in turn, then sleeps for
	(say) five seconds.

3.	If you can reconfigure your kernel, try building with
	CONFIG_RCU_FAST_NO_HZ=n.

4.	Was the system running reliably on some earlier version?
	If so, consider reverting back to that version, and include
	the version information in your report to your distro.  If
	your distro provides individual patches, you should consider
	bisecting so as to locate the offending patch.

Good luck with it!

							Thanx, Paul
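
Workaround 2 in the message above could be sketched roughly as follows.
This is only an illustration, not something from the thread: the function
names disturb_cpus_once() and disturb_loop() are hypothetical, and the
sketch assumes a Linux system with Python 3.3 or later, where
os.sched_getaffinity() and os.sched_setaffinity() are available. Whether
this actually suppresses the stall on an affected v3.10 system is untested.

```python
import os
import time


def disturb_cpus_once(cpus=None):
    """Run briefly on each CPU in turn and return the CPUs visited.

    Assumption: pinning this process to a single CPU makes the scheduler
    migrate it there, which is the "disturbance" the workaround asks for.
    """
    original = os.sched_getaffinity(0)  # CPUs this process may run on
    if cpus is None:
        cpus = sorted(original)
    visited = []
    try:
        for cpu in cpus:
            # Restrict ourselves to one CPU; we are now running on it.
            os.sched_setaffinity(0, {cpu})
            visited.append(cpu)
    finally:
        # Always restore the original affinity mask.
        os.sched_setaffinity(0, original)
    return visited


def disturb_loop(interval_s=5.0, rounds=None):
    """Visit every CPU, sleep interval_s seconds, and repeat.

    rounds=None means loop until interrupted; a number limits the loop.
    Returns the number of rounds completed.
    """
    done = 0
    while rounds is None or done < rounds:
        disturb_cpus_once()
        done += 1
        time.sleep(interval_s)
    return done
```

To run it continuously with the five-second sleep suggested above, call
disturb_loop() at the end of the script and stop it with Ctrl-C.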
* Re: Bug report for RCU stalled warning [3.10.69]
  2017-10-12 20:38 ` Bug report for RCU stalled warning [3.10.69] Paul E. McKenney
@ 2017-10-14 12:51   ` Paul E. McKenney
From: Paul E. McKenney @ 2017-10-14 12:51 UTC
To: Wang YanQing; +Cc: linux-kernel

On Thu, Oct 12, 2017 at 01:38:24PM -0700, Paul E. McKenney wrote:
> [ Adding LKML on CC so that others can find this. ]
>
> On Wed, Oct 11, 2017 at 12:21:39PM +0800, Wang YanQing wrote:
> > Hi, Paul McKenney.
> >
> > I have received many machine-stopped-response reports, after reboot and
> > inspect message, all of them show RCU stalled, but I can't figure out
> > how to fix it. I can't update the kernel, it is the painful point, so I
> > need to fix it in 3.10. I have attached four messages come from different
> > cpu and boards (so I guess it is a BUG instead of hardware fault), any
> > suggestion is welcome.
>
> The first step is of course to report this to your distro, as they are
> the ones who do the care and feeding of such old kernels.  Please include
> the information below in that report, as it might help your distro find
> and fix the problem.
>
> It looks like the stalled CPU is idle, and that the activity resulting
> from the stall-warning message gets things going again.  Callbacks are
> being processed, so no OOM.  But you are getting the splat every 60
> seconds.  The system has only two CPUs, and is x86.
>
> If you cannot upgrade the kernel, my ability to help is limited.  And the
> diagnostics printed with the v3.10 CPU stall warnings are also quite
> limited.  However, there are some things you could try as workarounds:
>
> 1.	Check to make sure that the rcu_sched kthread is getting
> 	the CPU time that it needs.  Preventing this kthread from
> 	running would create exactly this output, assuming that
> 	the stall warning got it going again temporarily.
>
> 2.	It looks like the disturbance of the RCU CPU stall warning
> 	is getting things going again.  Try artificially providing
> 	this disturbance, for example, by running a usermode program
> 	or script that runs on each CPU in turn, then sleeps for
> 	(say) five seconds.
>
> 3.	If you can reconfigure your kernel, try building with
> 	CONFIG_RCU_FAST_NO_HZ=n.

And if you can reconfigure the kernel, in v3.10, building with
CONFIG_RCU_CPU_STALL_INFO and CONFIG_RCU_CPU_STALL_VERBOSE will provide
more information on the CPUs and tasks stalling the grace period.

							Thanx, Paul

> 4.	Was the system running reliably on some earlier version?
> 	If so, consider reverting back to that version, and include
> 	the version information in your report to your distro.  If
> 	your distro provides individual patches, you should consider
> 	bisecting so as to locate the offending patch.
>
> Good luck with it!
>
> 							Thanx, Paul
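
Before rebuilding, it may help to see how the three options named in this
thread are set in the running kernel's configuration. The sketch below is
only an illustration: parse_kconfig() and read_running_config() are
hypothetical helper names, and the two config locations it tries
(/proc/config.gz, which needs CONFIG_IKCONFIG_PROC, and
/boot/config-<release>, a common distro convention) are assumptions that
may not hold on a given system.

```python
import gzip
import os

# The three options mentioned in the thread.
OPTIONS = (
    "CONFIG_RCU_FAST_NO_HZ",
    "CONFIG_RCU_CPU_STALL_INFO",
    "CONFIG_RCU_CPU_STALL_VERBOSE",
)


def parse_kconfig(text, options=OPTIONS):
    """Map each option to its value, "n" if not set, or "absent"."""
    state = {opt: "absent" for opt in options}
    for raw in text.splitlines():
        line = raw.strip()
        for opt in options:
            if line.startswith(opt + "="):
                state[opt] = line.split("=", 1)[1]
            elif line == "# " + opt + " is not set":
                state[opt] = "n"
    return state


def read_running_config():
    """Return the running kernel's config text, or None if not found."""
    candidates = ["/proc/config.gz", "/boot/config-" + os.uname().release]
    for path in candidates:
        try:
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rt") as f:
                return f.read()
        except OSError:
            continue  # missing or unreadable; try the next location
    return None


if __name__ == "__main__":
    text = read_running_config()
    if text is None:
        print("kernel config not found in the assumed locations")
    else:
        for opt, value in parse_kconfig(text).items():
            print(opt, "=", value)
```

An option reported as "absent" simply does not appear in the config file
that was read, which can also mean the option does not exist in that
kernel version.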