Re: [PATCH v4 07/17] watchdog/hardlockup: Move perf hardlockup checking/panic to common watchdog.c

linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Nicholas Piggin" <npiggin@gmail.com>
To: "Douglas Anderson" <dianders@chromium.org>,
	"Petr Mladek" <pmladek@suse.com>,
	"Andrew Morton" <akpm@linux-foundation.org>
Cc: "Sumit Garg" <sumit.garg@linaro.org>,
	"Mark Rutland" <mark.rutland@arm.com>,
	"Matthias Kaehlcke" <mka@chromium.org>,
	"Stephane Eranian" <eranian@google.com>,
	"Stephen Boyd" <swboyd@chromium.org>, <ricardo.neri@intel.com>,
	"Tzung-Bi Shih" <tzungbi@chromium.org>,
	"Lecopzer Chen" <lecopzer.chen@mediatek.com>,
	<kgdb-bugreport@lists.sourceforge.net>,
	"Masayoshi Mizuma" <msys.mizuma@gmail.com>,
	"Guenter Roeck" <groeck@chromium.org>,
	"Pingfan Liu" <kernelfans@gmail.com>,
	"Andi Kleen" <ak@linux.intel.com>,
	"Ian Rogers" <irogers@google.com>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-perf-users@vger.kernel.org>, <ito-yuichi@fujitsu.com>,
	"Randy Dunlap" <rdunlap@infradead.org>,
	"Chen-Yu Tsai" <wens@csie.org>, <christophe.leroy@csgroup.eu>,
	<davem@davemloft.net>, <sparclinux@vger.kernel.org>,
	<mpe@ellerman.id.au>, "Will Deacon" <will@kernel.org>,
	<ravi.v.shankar@intel.com>, <linuxppc-dev@lists.ozlabs.org>,
	"Marc Zyngier" <maz@kernel.org>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Daniel Thompson" <daniel.thompson@linaro.org>
Subject: Re: [PATCH v4 07/17] watchdog/hardlockup: Move perf hardlockup checking/panic to common watchdog.c
Date: Fri, 05 May 2023 12:58:22 +1000	[thread overview]
Message-ID: <CSE09YL4X0XY.1GAQWAFOOEK42@wheely> (raw)
In-Reply-To: <20230504151100.v4.7.Id4133d3183e798122dc3b6205e7852601f289071@changeid>

On Fri May 5, 2023 at 8:13 AM AEST, Douglas Anderson wrote:
> The perf hardlockup detector works by looking at interrupt counts and
> seeing if they change from run to run. The interrupt counts are
> managed by the common watchdog code via its watchdog_timer_fn().
>
> Currently the API between the perf detector and the common code is a
> function: is_hardlockup(). When the hard lockup detector sees that
> function return true then it handles printing out debug info and
> inducing a panic if necessary.
>
> Let's change the API a little bit in preparation for the buddy
> hardlockup detector. The buddy hardlockup detector wants to print

I think the name change is a gratuitous. Especially since it's now
static.

watchdog_hardlockup_ is a pretty long prefix too, hardlockup_ 
should be enough?

Seems okay otherwise though.

Thanks,
Nick

> nearly the same debug info and have nearly the same panic
> behavior. That means we want to move all that code to the common
> file. For now, the code in the common file will only be there if the
> perf hardlockup detector is enabled, but eventually it will be
> selected by a common config.
>
> Right now, this _just_ moves the code from the perf detector file to
> the common file and changes the names. It doesn't make the changes
> that the buddy hardlockup detector will need and doesn't do any style
> cleanups. A future patch will do cleanup to make it more obvious what
> changed.
>
> With the above, we no longer have any callers of is_hardlockup()
> outside of the "watchdog.c" file, so we can remove it from the header,
> make it static, move it to the same "#ifdef" block as our new
> watchdog_hardlockup_check(), and rename it to make it obvious it's
> just for hardlockup detectors. While doing this, it can be noted that
> even if no hardlockup detectors were configured the existing code used
> to still have the code for counting/checking "hrtimer_interrupts" even
> if the perf hardlockup detector wasn't configured. We didn't need to
> do that, so move all the "hrtimer_interrupts" counting to only be
> there if the perf hardlockup detector is configured as well.
>
> This change is expected to be a no-op.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
> Changes in v4:
> - ("Move perf hardlockup checking/panic ...") new for v4.
>
>  include/linux/nmi.h    |  5 ++-
>  kernel/watchdog.c      | 92 +++++++++++++++++++++++++++++++++---------
>  kernel/watchdog_perf.c | 42 +------------------
>  3 files changed, 78 insertions(+), 61 deletions(-)
>
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 35d09d70f394..c6cb9bc5dc80 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -15,7 +15,6 @@
>  void lockup_detector_init(void);
>  void lockup_detector_soft_poweroff(void);
>  void lockup_detector_cleanup(void);
> -bool is_hardlockup(void);
>  
>  extern int watchdog_user_enabled;
>  extern int nmi_watchdog_user_enabled;
> @@ -88,6 +87,10 @@ extern unsigned int hardlockup_panic;
>  static inline void hardlockup_detector_disable(void) {}
>  #endif
>  
> +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> +void watchdog_hardlockup_check(struct pt_regs *regs);
> +#endif
> +
>  #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
>  # define NMI_WATCHDOG_SYSCTL_PERM	0644
>  #else
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index c705a18b26bf..2d319cdf64b9 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -85,6 +85,78 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
>  
>  #endif /* CONFIG_HARDLOCKUP_DETECTOR */
>  
> +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> +
> +static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
> +static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
> +static DEFINE_PER_CPU(bool, hard_watchdog_warn);
> +static unsigned long hardlockup_allcpu_dumped;
> +
> +static bool watchdog_hardlockup_is_lockedup(void)
> +{
> +	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
> +
> +	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
> +		return true;
> +
> +	__this_cpu_write(hrtimer_interrupts_saved, hrint);
> +	return false;
> +}
> +
> +static void watchdog_hardlockup_interrupt_count(void)
> +{
> +	__this_cpu_inc(hrtimer_interrupts);
> +}
> +
> +void watchdog_hardlockup_check(struct pt_regs *regs)
> +{
> +	/* check for a hardlockup
> +	 * This is done by making sure our timer interrupt
> +	 * is incrementing.  The timer interrupt should have
> +	 * fired multiple times before we overflow'd.  If it hasn't
> +	 * then this is a good indication the cpu is stuck
> +	 */
> +	if (watchdog_hardlockup_is_lockedup()) {
> +		int this_cpu = smp_processor_id();
> +
> +		/* only print hardlockups once */
> +		if (__this_cpu_read(hard_watchdog_warn) == true)
> +			return;
> +
> +		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n",
> +			 this_cpu);
> +		print_modules();
> +		print_irqtrace_events(current);
> +		if (regs)
> +			show_regs(regs);
> +		else
> +			dump_stack();
> +
> +		/*
> +		 * Perform all-CPU dump only once to avoid multiple hardlockups
> +		 * generating interleaving traces
> +		 */
> +		if (sysctl_hardlockup_all_cpu_backtrace &&
> +				!test_and_set_bit(0, &hardlockup_allcpu_dumped))
> +			trigger_allbutself_cpu_backtrace();
> +
> +		if (hardlockup_panic)
> +			nmi_panic(regs, "Hard LOCKUP");
> +
> +		__this_cpu_write(hard_watchdog_warn, true);
> +		return;
> +	}
> +
> +	__this_cpu_write(hard_watchdog_warn, false);
> +	return;
> +}
> +
> +#else /* CONFIG_HARDLOCKUP_DETECTOR_PERF */
> +
> +static inline void watchdog_hardlockup_interrupt_count(void) { }
> +
> +#endif /* !CONFIG_HARDLOCKUP_DETECTOR_PERF */
> +
>  /*
>   * These functions can be overridden if an architecture implements its
>   * own hardlockup detector.
> @@ -176,8 +248,6 @@ static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
>  static DEFINE_PER_CPU(unsigned long, watchdog_report_ts);
>  static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer);
>  static DEFINE_PER_CPU(bool, softlockup_touch_sync);
> -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
> -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
>  static unsigned long soft_lockup_nmi_warn;
>  
>  static int __init nowatchdog_setup(char *str)
> @@ -312,22 +382,6 @@ static int is_softlockup(unsigned long touch_ts,
>  }
>  
>  /* watchdog detector functions */
> -bool is_hardlockup(void)
> -{
> -	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
> -
> -	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
> -		return true;
> -
> -	__this_cpu_write(hrtimer_interrupts_saved, hrint);
> -	return false;
> -}
> -
> -static void watchdog_interrupt_count(void)
> -{
> -	__this_cpu_inc(hrtimer_interrupts);
> -}
> -
>  static DEFINE_PER_CPU(struct completion, softlockup_completion);
>  static DEFINE_PER_CPU(struct cpu_stop_work, softlockup_stop_work);
>  
> @@ -359,7 +413,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  		return HRTIMER_NORESTART;
>  
>  	/* kick the hardlockup detector */
> -	watchdog_interrupt_count();
> +	watchdog_hardlockup_interrupt_count();
>  
>  	/* kick the softlockup detector */
>  	if (completion_done(this_cpu_ptr(&softlockup_completion))) {
> diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
> index c3d8ceb149da..5f3651b87ee7 100644
> --- a/kernel/watchdog_perf.c
> +++ b/kernel/watchdog_perf.c
> @@ -20,13 +20,11 @@
>  #include <asm/irq_regs.h>
>  #include <linux/perf_event.h>
>  
> -static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
>  static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
>  static DEFINE_PER_CPU(struct perf_event *, dead_event);
>  static struct cpumask dead_events_mask;
>  
> -static unsigned long hardlockup_allcpu_dumped;
>  static atomic_t watchdog_cpus = ATOMIC_INIT(0);
>  
>  notrace void arch_touch_nmi_watchdog(void)
> @@ -122,45 +120,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
>  	if (!watchdog_check_timestamp())
>  		return;
>  
> -	/* check for a hardlockup
> -	 * This is done by making sure our timer interrupt
> -	 * is incrementing.  The timer interrupt should have
> -	 * fired multiple times before we overflow'd.  If it hasn't
> -	 * then this is a good indication the cpu is stuck
> -	 */
> -	if (is_hardlockup()) {
> -		int this_cpu = smp_processor_id();
> -
> -		/* only print hardlockups once */
> -		if (__this_cpu_read(hard_watchdog_warn) == true)
> -			return;
> -
> -		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n",
> -			 this_cpu);
> -		print_modules();
> -		print_irqtrace_events(current);
> -		if (regs)
> -			show_regs(regs);
> -		else
> -			dump_stack();
> -
> -		/*
> -		 * Perform all-CPU dump only once to avoid multiple hardlockups
> -		 * generating interleaving traces
> -		 */
> -		if (sysctl_hardlockup_all_cpu_backtrace &&
> -				!test_and_set_bit(0, &hardlockup_allcpu_dumped))
> -			trigger_allbutself_cpu_backtrace();
> -
> -		if (hardlockup_panic)
> -			nmi_panic(regs, "Hard LOCKUP");
> -
> -		__this_cpu_write(hard_watchdog_warn, true);
> -		return;
> -	}
> -
> -	__this_cpu_write(hard_watchdog_warn, false);
> -	return;
> +	watchdog_hardlockup_check(regs);
>  }
>  
>  static int hardlockup_detector_event_create(void)
> -- 
> 2.40.1.521.gf1e218fcd8-goog

next prev parent reply	other threads:[~2023-05-05  2:58 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-04 22:13 [PATCH v4 00/17] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
2023-05-04 22:13 ` [PATCH v4 01/17] watchdog/perf: Define dummy watchdog_update_hrtimer_threshold() on correct config Douglas Anderson
2023-05-05  2:43   ` Nicholas Piggin
2023-05-11  8:39     ` Petr Mladek
2023-05-04 22:13 ` [PATCH v4 02/17] watchdog: remove WATCHDOG_DEFAULT Douglas Anderson
2023-05-04 22:13 ` [PATCH v4 03/17] watchdog/hardlockup: change watchdog_nmi_enable() to void Douglas Anderson
2023-05-05  2:45   ` Nicholas Piggin
2023-05-04 22:13 ` [PATCH v4 04/17] watchdog/perf: Ensure CPU-bound context when creating hardlockup detector event Douglas Anderson
2023-05-04 22:13 ` [PATCH v4 05/17] watchdog/hardlockup: Rename touch_nmi_watchdog() to touch_hardlockup_watchdog() Douglas Anderson
2023-05-05  2:51   ` Nicholas Piggin
2023-05-05 16:37     ` Doug Anderson
2023-05-08  1:34       ` Nicholas Piggin
2023-05-08 15:56         ` Doug Anderson
2023-05-11  9:24       ` Petr Mladek
2023-05-04 22:13 ` [PATCH v4 06/17] watchdog/perf: Rename watchdog_hld.c to watchdog_perf.c Douglas Anderson
2023-05-05  2:53   ` Nicholas Piggin
2023-05-11 10:09   ` Petr Mladek
2023-05-04 22:13 ` [PATCH v4 07/17] watchdog/hardlockup: Move perf hardlockup checking/panic to common watchdog.c Douglas Anderson
2023-05-05  2:58   ` Nicholas Piggin [this message]
2023-05-05 16:37     ` Doug Anderson
2023-05-11 12:03       ` Petr Mladek
2023-05-04 22:13 ` [PATCH v4 08/17] watchdog/hardlockup: Style changes to watchdog_hardlockup_check() / ..._is_lockedup() Douglas Anderson
2023-05-05  3:01   ` Nicholas Piggin
2023-05-05 16:38     ` Doug Anderson
2023-05-11 12:45       ` Petr Mladek
2023-05-04 22:13 ` [PATCH v4 09/17] watchdog/hardlockup: Add a "cpu" param to watchdog_hardlockup_check() Douglas Anderson
2023-05-11 14:14   ` Petr Mladek
2023-05-19 17:21     ` Doug Anderson
2023-05-04 22:13 ` [PATCH v4 10/17] watchdog/hardlockup: Move perf hardlockup watchdog petting to watchdog.c Douglas Anderson
2023-05-11 15:46   ` Petr Mladek
2023-05-19 17:22     ` Doug Anderson
2023-05-04 22:13 ` [PATCH v4 11/17] watchdog/hardlockup: Rename some "NMI watchdog" constants/function Douglas Anderson
2023-05-05  3:06   ` Nicholas Piggin
2023-05-05 16:38     ` Doug Anderson
2023-05-12 11:21     ` Petr Mladek
2023-05-04 22:13 ` [PATCH v4 12/17] watchdog/hardlockup: Have the perf hardlockup use __weak functions more cleanly Douglas Anderson
2023-05-12 11:55   ` Petr Mladek
2023-05-04 22:13 ` [PATCH v4 13/17] watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs Douglas Anderson
2023-05-05  2:35   ` Nicholas Piggin
2023-05-05 16:35     ` Doug Anderson
2023-05-08  1:04       ` Nicholas Piggin
2023-05-08 15:52         ` Doug Anderson
2023-05-19 17:23           ` Doug Anderson
2023-05-04 22:13 ` [PATCH v4 14/17] watchdog/perf: Add a weak function for an arch to detect if perf can use NMIs Douglas Anderson
2023-05-04 22:13 ` [PATCH v4 15/17] watchdog/perf: Adapt the watchdog_perf interface for async model Douglas Anderson
2023-05-04 22:13 ` [PATCH v4 16/17] arm64: add hw_nmi_get_sample_period for preparation of lockup detector Douglas Anderson
2023-05-04 22:13 ` [PATCH v4 17/17] arm64: Enable perf events based hard " Douglas Anderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CSE09YL4X0XY.1GAQWAFOOEK42@wheely \
    --to=npiggin@gmail.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=catalin.marinas@arm.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=daniel.thompson@linaro.org \
    --cc=davem@davemloft.net \
    --cc=dianders@chromium.org \
    --cc=eranian@google.com \
    --cc=groeck@chromium.org \
    --cc=irogers@google.com \
    --cc=ito-yuichi@fujitsu.com \
    --cc=kernelfans@gmail.com \
    --cc=kgdb-bugreport@lists.sourceforge.net \
    --cc=lecopzer.chen@mediatek.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=mka@chromium.org \
    --cc=mpe@ellerman.id.au \
    --cc=msys.mizuma@gmail.com \
    --cc=pmladek@suse.com \
    --cc=ravi.v.shankar@intel.com \
    --cc=rdunlap@infradead.org \
    --cc=ricardo.neri@intel.com \
    --cc=sparclinux@vger.kernel.org \
    --cc=sumit.garg@linaro.org \
    --cc=swboyd@chromium.org \
    --cc=tzungbi@chromium.org \
    --cc=wens@csie.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).