From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755996AbbJAHhL (ORCPT ); Thu, 1 Oct 2015 03:37:11 -0400
Received: from mx1.redhat.com ([209.132.183.28]:47919 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1756058AbbJAHhG (ORCPT ); Thu, 1 Oct 2015 03:37:06 -0400
Date: Thu, 1 Oct 2015 08:37:03 +0100
From: Aaron Tomlin
To: Jiri Kosina
Cc: Andrew Morton, Ulrich Obergfell, Don Zickus,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] watchdog: perform all-CPU backtrace in case of hard lockup
Message-ID: <20151001073703.GR25024@atomlin.usersys.redhat.com>
References:
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="wA9WyeW1yVBM2Q32"
Content-Disposition: inline
In-Reply-To:
X-PGP-Key: http://pgp.mit.edu/pks/lookup?search=atomlin%40redhat.com
X-PGP-Fingerprint: 7906 84EB FA8A 9638 8D1E 6E9B E2DE 9658 19CC 77D6
User-Agent: Mutt/1.5.23.1-rc1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

--wA9WyeW1yVBM2Q32
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri 2015-09-25 13:15 +0200, Jiri Kosina wrote:
> From: Jiri Kosina
>
> In many cases of hardlockup reports, it's actually not possible to know
> why it triggered, because the CPU that got stuck is usually waiting on a
> resource (with IRQs disabled) that some other CPU is holding.
>
> IOW, we are often looking at the stacktrace of the victim and not the
> actual offender.
>
> Introduce a sysctl / cmdline parameter that makes it possible to have
> the hardlockup detector perform an all-CPU backtrace.
>
> Signed-off-by: Jiri Kosina
> ---
>  Documentation/kernel-parameters.txt |  5 +++++
>  Documentation/sysctl/kernel.txt     | 12 ++++++++++++
>  include/linux/nmi.h                 |  1 +
>  kernel/sysctl.c                     |  9 +++++++++
>  kernel/watchdog.c                   | 33 ++++++++++++++++++++++++++++-----
>  5 files changed, 55 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 22a4b68..b4af96e 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -1246,6 +1246,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  			Format: such that (rxsize & ~0x1fffc0) == 0.
>  			Default: 1024
>
> +	hardlockup_all_cpu_backtrace=
> +			[KNL] Should the hard-lockup detector generate
> +			backtraces on all cpus.
> +			Format:
> +
>  	hashdist=	[KNL,NUMA] Large hashes allocated during boot
>  			are distributed across NUMA nodes. Defaults on
>  			for 64-bit NUMA, off otherwise.
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index 6fccb69..af70d15 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -33,6 +33,7 @@ show up in /proc/sys/kernel:
>  - domainname
>  - hostname
>  - hotplug
> +- hardlockup_all_cpu_backtrace
>  - hung_task_panic
>  - hung_task_check_count
>  - hung_task_timeout_secs
> @@ -293,6 +294,17 @@ domain names are in general different. For a detailed discussion
>  see the hostname(1) man page.
>
>  ==============================================================
> +hardlockup_all_cpu_backtrace:
> +
> +This value controls the hard lockup detector behavior when a hard
> +lockup condition is detected as to whether or not to gather further
> +debug information. If enabled, arch-specific all-CPU stack dumping
> +will be initiated.
> +
> +0: do nothing. This is the default behavior.
> +
> +1: on detection capture more debug information.
> +==============================================================
>
>  hotplug:
>
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 78488e0..7ec5b86 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -73,6 +73,7 @@ extern int watchdog_user_enabled;
>  extern int watchdog_thresh;
>  extern unsigned long *watchdog_cpumask_bits;
>  extern int sysctl_softlockup_all_cpu_backtrace;
> +extern int sysctl_hardlockup_all_cpu_backtrace;
>  struct ctl_table;
>  extern int proc_watchdog(struct ctl_table *, int ,
>  			 void __user *, size_t *, loff_t *);
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index e69201d..efb0370 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -897,6 +897,15 @@ static struct ctl_table kern_table[] = {
>  		.extra1		= &zero,
>  		.extra2		= &one,
>  	},
> +	{
> +		.procname	= "hardlockup_all_cpu_backtrace",
> +		.data		= &sysctl_hardlockup_all_cpu_backtrace,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= &zero,
> +		.extra2		= &one,
> +	},
>  #endif /* CONFIG_SMP */
>  #endif
>  #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 64ed1c3..202999c 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -57,8 +57,10 @@ int __read_mostly watchdog_thresh = 10;
>
>  #ifdef CONFIG_SMP
>  int __read_mostly sysctl_softlockup_all_cpu_backtrace;
> +int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
>  #else
>  #define sysctl_softlockup_all_cpu_backtrace 0
> +#define sysctl_hardlockup_all_cpu_backtrace 0
>  #endif
>  static struct cpumask watchdog_cpumask __read_mostly;
>  unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
> @@ -112,6 +114,7 @@ static unsigned long soft_lockup_nmi_warn;
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  static int hardlockup_panic =
>  			CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
> +static unsigned long hardlockup_allcpu_dumped;
>  /*
>   * We may not want to enable hard lockup detection by default in all cases,
>   * for example when running the kernel as a guest on a hypervisor. In these
> @@ -173,6 +176,13 @@ static int __init softlockup_all_cpu_backtrace_setup(char *str)
>  	return 1;
>  }
>  __setup("softlockup_all_cpu_backtrace=", softlockup_all_cpu_backtrace_setup);
> +static int __init hardlockup_all_cpu_backtrace_setup(char *str)
> +{
> +	sysctl_hardlockup_all_cpu_backtrace =
> +		!!simple_strtol(str, NULL, 0);
> +	return 1;
> +}
> +__setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>  #endif
>
>  /*
> @@ -318,17 +328,30 @@ static void watchdog_overflow_callback(struct perf_event *event,
>  	 */
>  	if (is_hardlockup()) {
>  		int this_cpu = smp_processor_id();
> +		struct pt_regs *regs = get_irq_regs();
>
>  		/* only print hardlockups once */
>  		if (__this_cpu_read(hard_watchdog_warn) == true)
>  			return;
>
> -		if (hardlockup_panic)
> -			panic("Watchdog detected hard LOCKUP on cpu %d",
> -			      this_cpu);
> +		pr_emerg("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> +		print_modules();
> +		print_irqtrace_events(current);
> +		if (regs)
> +			show_regs(regs);
>  		else
> -			WARN(1, "Watchdog detected hard LOCKUP on cpu %d",
> -			     this_cpu);
> +			dump_stack();
> +
> +		/*
> +		 * Perform all-CPU dump only once to avoid multiple hardlockups
> +		 * generating interleaving traces
> +		 */
> +		if (sysctl_hardlockup_all_cpu_backtrace &&
> +				!test_and_set_bit(0, &hardlockup_allcpu_dumped))
> +			trigger_allbutself_cpu_backtrace();

How does this play when 'softlockup_all_cpu_backtrace' is enabled too?

> +
> +		if (hardlockup_panic)
> +			panic("Hard LOCKUP");
>
>  		__this_cpu_write(hard_watchdog_warn, true);
>  		return;

This does indeed appear similar to commit ed235875 ("kernel/watchdog.c:
print traces for all cpus on lockup detection") in Linus' tree, albeit
for the hardlockup detector.

Looks fine to me. Thanks!

Reviewed-by: Aaron Tomlin

--wA9WyeW1yVBM2Q32
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBAgAGBQJWDOKfAAoJEOLellgZzHfWfNIP/ilCcSm2X4QP9Rbs7lePqr9+
N8YNGiggBQKMrRfBq6wa2FPCPKZVxbZHI3QsDBY05CWjPdWdEcB/pChDjADSI7O8
1u4LfpsK5VJR5gpJpvTGT1IZbPdyfVcPfHYx6P0/cAgeZUIF2x5ivPyOC0kI5LO+
7OLLCQKVEYmx0OQR2s73ZPFygEl5u5Claj0FzrBbkALyyvthOT7HOwMQsBNNuBPG
VmVPmUEllgIVDt1bOfkKM+uRhEH9ZhxXzcvcIbmglh/Tm4OJoaPpUdK94KbeJmde
RuD8gMdgYtiV59Bjha+ElAI0pLuJNZoIgzOAexjtPckztsjC7faDgn5qQsVb/ygh
pOeF7nVyLUZXVN9ovCjIyR/3m8TqlW9FpyA1UPE81zT08oM6qfQrmnkbd+I8epkc
+GPRj+ZJT7osn0fi5Olxa2wNHk326K5Aa+UQn1Kt+Gsvs3PYkigcKz/cLdbhHqq3
rCY3QbGTOl8L66XpKHOYLeyCapT4QAh+j0XmJ132E1Pru3d82Bseul06/c+ee2zo
5wR80KdMowvcDopmXbzitRER7aDJCmb+oaMM9NfKX/VCjLrGEBvgUw0Y6mlqqov/
OPtRzlKkS2n6u6stHPsjxi14dVGH3n1INkyxwWyYlThG8z5Ue8+z7Pjs+urhZ2cG
MYJMzREsMSGA5DFem2S/
=z1Wy
-----END PGP SIGNATURE-----

--wA9WyeW1yVBM2Q32--