linux-perf-users.vger.kernel.org archive mirror
From: Randy Dunlap <rdunlap@infradead.org>
To: Douglas Anderson <dianders@chromium.org>,
	Petr Mladek <pmladek@suse.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>,
	Daniel Thompson <daniel.thompson@linaro.org>,
	Stephen Boyd <swboyd@chromium.org>, Chen-Yu Tsai <wens@csie.org>,
	linux-arm-kernel@lists.infradead.org,
	kgdb-bugreport@lists.sourceforge.net,
	Marc Zyngier <maz@kernel.org>,
	linux-perf-users@vger.kernel.org,
	Mark Rutland <mark.rutland@arm.com>,
	Masayoshi Mizuma <msys.mizuma@gmail.com>,
	Will Deacon <will@kernel.org>,
	ito-yuichi@fujitsu.com, Sumit Garg <sumit.garg@linaro.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Colin Cross <ccross@android.com>,
	Matthias Kaehlcke <mka@chromium.org>,
	Guenter Roeck <groeck@chromium.org>,
	Tzung-Bi Shih <tzungbi@chromium.org>,
	Alexander Potapenko <glider@google.com>,
	AngeloGioacchino Del Regno 
	<angelogioacchino.delregno@collabora.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Geert Uytterhoeven <geert+renesas@glider.be>,
	Ingo Molnar <mingo@kernel.org>,
	John Ogness <john.ogness@linutronix.de>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	Juergen Gross <jgross@suse.com>,
	Kees Cook <keescook@chromium.org>,
	Laurent Dufour <ldufour@linux.ibm.com>,
	Liam Howlett <liam.howlett@oracle.com>,
	Marco Elver <elver@google.com>,
	Matthias Brugger <matthias.bgg@gmail.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Miguel Ojeda <ojeda@kernel.org>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Sami Tolvanen <samitolvanen@google.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Zhaoyang Huang <zhaoyang.huang@unisoc.com>,
	Zhen Lei <thunder.leizhen@huawei.com>,
	linux-kernel@vger.kernel.org, linux-mediatek@lists.infradead.org
Subject: Re: [PATCH] hardlockup: detect hard lockups using secondary (buddy) cpus
Date: Fri, 21 Apr 2023 16:59:27 -0700
Message-ID: <d54fe26d-0f11-e422-d5f3-841c663b9d6f@infradead.org>
In-Reply-To: <20230421155255.1.I6bf789d21d0c3d75d382e7e51a804a7a51315f2c@changeid>

Hi--

On 4/21/23 15:53, Douglas Anderson wrote:
> From: Colin Cross <ccross@android.com>
> 
> Implement a hardlockup detector that can be enabled on SMP systems
> that don't have an arch provided one or one implemented atop perf by

Is that                            one or more?

> using interrupts on other cpus. Each cpu will use its softlockup
> hrtimer to check that the next cpu is processing hrtimer interrupts by
> verifying that a counter is increasing.
> 
> NOTE: unlike the other hard lockup detectors, the buddy one can't
> easily provide a backtrace on the CPU that locked up. It relies on
> some other mechanism in the system to get information about the locked
> up CPUs. This could be support for NMI backtraces like [1], it could
> be a mechanism for printing the PC of locked CPUs like [2], or it
> could be something else.
> 
> This style of hardlockup detector originated in some downstream
> Android trees and has been rebased on / carried in ChromeOS trees for
> quite a long time for use on arm and arm64 boards. Historically on
> these boards we've leveraged mechanism [2] to get information about
> hung CPUs, but we could move to [1].
> 
> NOTE: the buddy system is not really useful to enable on any
> architectures that have a better mechanism. On arm64 folks have been
> trying to get a better mechanism for years and there have even been
> recent posts of patches adding support [3]. However, nothing about the
> buddy system is tied to arm64 and several archs (even arm32, where it
> was originally developed) could find it useful.
> 
> [1] https://lore.kernel.org/r/20230419225604.21204-1-dianders@chromium.org
> [2] https://issuetracker.google.com/172213129
> [3] https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.chen@mediatek.com/
> 
> Signed-off-by: Colin Cross <ccross@android.com>
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> Signed-off-by: Guenter Roeck <groeck@chromium.org>
> Signed-off-by: Tzung-Bi Shih <tzungbi@chromium.org>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> This patch has been rebased in ChromeOS kernel trees many times, and
> each time someone had to do work on it they added their
> Signed-off-by. I've included those here. I've also left the author as
> Colin Cross since the core code is still his.
> 
> I'll also note that the CC list is pretty giant, but that's what
> get_maintainers came up with (plus a few other folks I thought would
> be interested). As far as I can tell, there's no true MAINTAINER
> listed for the existing watchdog code. Assuming people don't hate
> this, maybe it would go through Andrew Morton's tree?
> 
>  include/linux/nmi.h         |  18 ++++-
>  kernel/Makefile             |   1 +
>  kernel/watchdog.c           |  24 ++++--
>  kernel/watchdog_buddy_cpu.c | 141 ++++++++++++++++++++++++++++++++++++
>  lib/Kconfig.debug           |  19 ++++-
>  5 files changed, 192 insertions(+), 11 deletions(-)
>  create mode 100644 kernel/watchdog_buddy_cpu.c
> 

> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 39d1d93164bd..9eb86bc9f5ee 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1036,6 +1036,9 @@ config HARDLOCKUP_DETECTOR_PERF
>  config HARDLOCKUP_CHECK_TIMESTAMP
>  	bool
>  
> +config HARDLOCKUP_DETECTOR_CORE
> +	bool
> +
>  #
>  # arch/ can define HAVE_HARDLOCKUP_DETECTOR_ARCH to provide their own hard
>  # lockup detector rather than the perf based detector.
> @@ -1045,6 +1048,7 @@ config HARDLOCKUP_DETECTOR
>  	depends on DEBUG_KERNEL && !S390
>  	depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
>  	select LOCKUP_DETECTOR
> +	select HARDLOCKUP_DETECTOR_CORE
>  	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF
>  	help
>  	  Say Y here to enable the kernel to act as a watchdog to detect
> @@ -1055,9 +1059,22 @@ config HARDLOCKUP_DETECTOR
>  	  chance to run.  The current stack trace is displayed upon detection
>  	  and the system will stay locked up.
>  
> +config HARDLOCKUP_DETECTOR_BUDDY_CPU
> +	bool "Buddy CPU hardlockup detector"
> +	depends on DEBUG_KERNEL && SMP
> +	depends on !HARDLOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
> +	depends on !S390
> +	select HARDLOCKUP_DETECTOR_CORE
> +	select SOFTLOCKUP_DETECTOR
> +	help
> +	  Say Y here to enable a hardlockup detector where CPUs check
> +	  each other for lockup. Each cpu uses its softlockup hrtimer

Preferably                            CPU

> +	  to check that the next cpu is processing hrtimer interrupts by

and                              CPU

> +	  verifying that a counter is increasing.
> +
>  config BOOTPARAM_HARDLOCKUP_PANIC
>  	bool "Panic (Reboot) On Hard Lockups"
> -	depends on HARDLOCKUP_DETECTOR
> +	depends on HARDLOCKUP_DETECTOR_CORE
>  	help
>  	  Say Y here to enable the kernel to panic on "hard lockups",
>  	  which are bugs that cause the kernel to loop in kernel

-- 
~Randy

Thread overview: 9+ messages
2023-04-21 22:53 [PATCH] hardlockup: detect hard lockups using secondary (buddy) cpus Douglas Anderson
2023-04-21 23:59 ` Randy Dunlap [this message]
2023-04-22  1:19 ` Ian Rogers
2023-04-24 15:23   ` Doug Anderson
2023-05-07 17:12     ` Andi Kleen
2023-04-24 12:53 ` Daniel Thompson
2023-04-24 15:41   ` Doug Anderson
2023-04-25  4:58     ` Chen-Yu Tsai
2023-04-25 15:26       ` Doug Anderson
