From: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
To: Tom Putzeys <tom.putzeys@be.atlascopco.com>
Cc: "linux-rt-users@vger.kernel.org" <linux-rt-users@vger.kernel.org>
Subject: Re: System lockup causes reboot but no panic and no kernel crash dump
Date: Mon, 4 Feb 2019 11:56:30 -0200
Message-ID: <20190204135630.GA3847@uudg.org>
In-Reply-To: <AM0PR03MB4804EC083BE1023391C829B1BB910@AM0PR03MB4804.eurprd03.prod.outlook.com>
On Thu, Jan 31, 2019 at 12:26:48AM +0000, Tom Putzeys wrote:
> Hi all,
>
> I am trying to debug a series of random system freezes / lockups by making use of the kernel lockup detector / NMI watchdog to trigger a kernel crash dump when a system lockup occurs.
>
> We are running the 4.14.93-rt kernel on a quad-core x86_64 Intel Atom SMP machine.
>
> The lockup detector is fully enabled and configured to trigger a panic when a hard or soft lockup occurs:
> CONFIG_HAVE_HARDLOCKUP_DETECTOR_PERF=y
> CONFIG_LOCKUP_DETECTOR=y
> CONFIG_SOFTLOCKUP_DETECTOR=y
> CONFIG_HARDLOCKUP_DETECTOR_PERF=y
> CONFIG_HARDLOCKUP_CHECK_TIMESTAMP=y
> CONFIG_HARDLOCKUP_DETECTOR=y
> CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
> CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=1
> CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
> CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=1
>
> I also set the correct sysctl variables:
> kernel.panic = 1
> kernel.panic_on_oops = 1
> kernel.unknown_nmi_panic = 1
> kernel.panic_on_unrecovered_nmi = 1
> kernel.panic_on_io_nmi = 1
> kernel.softlockup_panic = 1
> kernel.hung_task_panic = 1
>
> I also enabled the NMI watchdog via the kernel cmdline (nmi_watchdog=1).
>
> I configured my system to generate a kernel crash dump using kdump /
> kexec when a panic occurs. When I trigger a manual kernel panic via
> /proc/sysrq-trigger, the crash dump mechanism works perfectly. I see a
> switch to my dump-capture kernel and ramdisk.
>
> The problem: when a real-life lockup or system freeze occurs, the system
> just reboots without generating a crash dump. There is no switch to the
> dump-capture kernel. AFAIK, there is no panic. I find nothing in the logs
> and nothing appears on the console.
>
> To replicate the problem: I wrote a small program that runs an infinite
> nop while loop. When running this program on all 4 cores with max.
> real-time priority (SCHED_FIFO) to hog the CPU, I get a complete system
> lockup (no keyboard input, no serial console, no ping reply). This freeze
> then triggers a reboot (I guess when the watchdog kicks in) but no crash
> dump or no visible kernel panic.
>
> I find it strange that the RT throttling mechanism does not prevent a
> freeze in this case (we did not disable it), but apart from that, I guess
> my hog application should be detected as a hung task and cause a panic.
With regard to RT throttling, you probably need to run this command:

# echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features

In short, a CPU about to hit the point where the RT throttling mechanism
would kick in can borrow "RT time" from other CPUs that are not running
RT tasks. That is the default behavior, and the command above disables
that feature.
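
To confirm that the feature really is off after the write, the contents of
sched_features can be inspected. The following is only a minimal sketch: the
helper name rt_share_state is hypothetical, and it assumes the file lists
feature names separated by spaces, with a NO_ prefix on disabled ones.

```shell
# Hypothetical helper (not a kernel or distro tool): report whether RT
# runtime sharing is on, given the contents of sched_features as a string.
# Assumes space-separated feature names, with NO_ marking disabled ones.
rt_share_state() {
    case " $1 " in
        *" NO_RT_RUNTIME_SHARE "*) echo disabled ;;
        *" RT_RUNTIME_SHARE "*)    echo enabled ;;
        *)                         echo unknown ;;
    esac
}

# On a live system (needs root and a mounted debugfs):
#   rt_share_state "$(cat /sys/kernel/debug/sched_features)"
```

If this prints "unknown", the running kernel may not expose the feature under
that name.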
You could also, for testing purposes, increase the slice of CPU time reserved
for non-RT tasks. You could try 10% instead of the usual 5%:

# echo 1000000 > /proc/sys/kernel/sched_rt_period_us
# echo 900000 > /proc/sys/kernel/sched_rt_runtime_us
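
The percentages follow from the two values: whatever part of each period is
not granted to RT tasks is left for everything else. As a rough sketch of that
arithmetic (the helper name non_rt_percent is hypothetical, and the result is
truncated by integer division):

```shell
# Hypothetical helper: percentage of each period left for non-RT tasks,
# given sched_rt_period_us and sched_rt_runtime_us.
non_rt_percent() {
    period_us=$1
    runtime_us=$2
    # A runtime of -1 disables RT throttling, so nothing is reserved.
    if [ "$runtime_us" -lt 0 ]; then
        echo 0
        return
    fi
    echo $(( (period_us - runtime_us) * 100 / period_us ))
}

non_rt_percent 1000000 950000   # usual defaults: prints 5
non_rt_percent 1000000 900000   # values suggested above: prints 10
```

On a running system the two arguments could be read from
/proc/sys/kernel/sched_rt_period_us and /proc/sys/kernel/sched_rt_runtime_us.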
Luis