linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: Douglas Anderson <dianders@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Sumit Garg <sumit.garg@linaro.org>,
	Daniel Thompson <daniel.thompson@linaro.org>,
	Ian Rogers <irogers@google.com>,
	ravi.v.shankar@intel.com, Marc Zyngier <maz@kernel.org>,
	linux-perf-users@vger.kernel.org,
	Stephane Eranian <eranian@google.com>,
	kgdb-bugreport@lists.sourceforge.net, ito-yuichi@fujitsu.com,
	linux-arm-kernel@lists.infradead.org,
	Stephen Boyd <swboyd@chromium.org>,
	Masayoshi Mizuma <msys.mizuma@gmail.com>,
	ricardo.neri@intel.com,
	Lecopzer Chen <lecopzer.chen@mediatek.com>,
	Chen-Yu Tsai <wens@csie.org>, Andi Kleen <ak@linux.intel.com>,
	Colin Cross <ccross@android.com>,
	Matthias Kaehlcke <mka@chromium.org>,
	Guenter Roeck <groeck@chromium.org>,
	Tzung-Bi Shih <tzungbi@chromium.org>,
	Alexander Potapenko <glider@google.com>,
	AngeloGioacchino Del Regno 
	<angelogioacchino.delregno@collabora.com>,
	Geert Uytterhoeven <geert+renesas@glider.be>,
	Juergen Gross <jgross@suse.com>,
	Kees Cook <keescook@chromium.org>,
	Laurent Dufour <ldufour@linux.ibm.com>,
	Liam Howlett <liam.howlett@oracle.com>,
	Masahiro Yamada <masahiroy@kernel.org>,
	Matthias Brugger <matthias.bgg@gmail.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Miguel Ojeda <ojeda@kernel.org>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Sami Tolvanen <samitolvanen@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Zhaoyang Huang <zhaoyang.huang@unisoc.com>,
	Zhen Lei <thunder.leizhen@huawei.com>,
	linux-kernel@vger.kernel.org, linux-mediatek@lists.infradead.org
Subject: cpu hotplug : was: Re: [PATCH v3] hardlockup: detect hard lockups using secondary (buddy) CPUs
Date: Tue, 2 May 2023 17:23:45 +0200	[thread overview]
Message-ID: <ZFEqynvf5nqkzEvQ@alley> (raw)
In-Reply-To: <20230501082341.v3.1.I6bf789d21d0c3d75d382e7e51a804a7a51315f2c@changeid>

On Mon 2023-05-01 08:24:46, Douglas Anderson wrote:
> From: Colin Cross <ccross@android.com>
> 
> Implement a hardlockup detector that doesn't doesn't need any extra
> arch-specific support code to detect lockups. Instead of using
> something arch-specific we will use the buddy system, where each CPU
> watches out for another one. Specifically, each CPU will use its
> softlockup hrtimer to check that the next CPU is processing hrtimer
> interrupts by verifying that a counter is increasing.
> 
> --- /dev/null
> +++ b/kernel/watchdog_buddy_cpu.c
> +int watchdog_nmi_enable(unsigned int cpu)
> +{
> +	/*
> +	 * The new CPU will be marked online before the first hrtimer interrupt
> +	 * runs on it.

It does not need to be the first hrtimer interrupt. The CPU might have
been offlined/onlined repeatedly. The counter might have any value.

> +      * If another CPU tests for a hardlockup on the new CPU
> +	 * before it has run its first hrtimer, it will get a false positive.
> +	 * Touch the watchdog on the new CPU to delay the first check for at
> +	 * least 3 sampling periods to guarantee one hrtimer has run on the new
> +	 * CPU.
> +	 */
> +	per_cpu(watchdog_touch, cpu) = true;

We should touch also the next_cpu:

	/*
	 * We are going to check the next CPU. Our watchdog_hrtimer
	 * need not be zero if the CPU has already been online earlier.
	 * Touch the watchdog on the next CPU to avoid false positive
	 * if we try to check it in less then 3 interrupts.
	 */
	next_cpu = watchdog_next_cpu(cpu);
	if (next_cpu < nr_cpu_ids)
		per_cpu(watchdog_touch, next_cpu) = true;

Alternative would be to clear watchdog_hrtimer. But it would kind-of
affect also the softlockup detector.


> +	/* Match with smp_rmb() in watchdog_check_hardlockup() */
> +	smp_wmb();
> +	cpumask_set_cpu(cpu, &watchdog_cpus);
> +	return 0;
> +}
> +
> +void watchdog_nmi_disable(unsigned int cpu)
> +{
> +	unsigned int next_cpu = watchdog_next_cpu(cpu);
> +
> +	/*
> +	 * Offlining this CPU will cause the CPU before this one to start
> +	 * checking the one after this one. If this CPU just finished checking
> +	 * the next CPU and updating hrtimer_interrupts_saved, and then the
> +	 * previous CPU checks it within one sample period, it will trigger a
> +	 * false positive. Touch the watchdog on the next CPU to prevent it.
> +	 */
> +	if (next_cpu < nr_cpu_ids)
> +		per_cpu(watchdog_touch, next_cpu) = true;
> +	/* Match with smp_rmb() in watchdog_check_hardlockup() */
> +	smp_wmb();
> +	cpumask_clear_cpu(cpu, &watchdog_cpus);
> +}
> +

Best Regards,
Petr

  reply	other threads:[~2023-05-02 15:23 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-01 15:24 [PATCH v3] hardlockup: detect hard lockups using secondary (buddy) CPUs Douglas Anderson
2023-05-02 15:23 ` Petr Mladek [this message]
2023-05-04 22:16   ` cpu hotplug : was: " Doug Anderson
2023-05-02 15:26 ` shared code: " Petr Mladek
2023-05-04 22:29   ` Doug Anderson
2023-05-04 22:38 ` Doug Anderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZFEqynvf5nqkzEvQ@alley \
    --to=pmladek@suse.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=angelogioacchino.delregno@collabora.com \
    --cc=catalin.marinas@arm.com \
    --cc=ccross@android.com \
    --cc=daniel.thompson@linaro.org \
    --cc=dianders@chromium.org \
    --cc=eranian@google.com \
    --cc=geert+renesas@glider.be \
    --cc=glider@google.com \
    --cc=groeck@chromium.org \
    --cc=irogers@google.com \
    --cc=ito-yuichi@fujitsu.com \
    --cc=jgross@suse.com \
    --cc=keescook@chromium.org \
    --cc=kgdb-bugreport@lists.sourceforge.net \
    --cc=ldufour@linux.ibm.com \
    --cc=lecopzer.chen@mediatek.com \
    --cc=liam.howlett@oracle.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mediatek@lists.infradead.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=masahiroy@kernel.org \
    --cc=matthias.bgg@gmail.com \
    --cc=maz@kernel.org \
    --cc=mka@chromium.org \
    --cc=mpe@ellerman.id.au \
    --cc=msys.mizuma@gmail.com \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=ojeda@kernel.org \
    --cc=paulmck@kernel.org \
    --cc=ravi.v.shankar@intel.com \
    --cc=rdunlap@infradead.org \
    --cc=ricardo.neri@intel.com \
    --cc=samitolvanen@google.com \
    --cc=sumit.garg@linaro.org \
    --cc=swboyd@chromium.org \
    --cc=thunder.leizhen@huawei.com \
    --cc=tzungbi@chromium.org \
    --cc=vbabka@suse.cz \
    --cc=wens@csie.org \
    --cc=will@kernel.org \
    --cc=zhaoyang.huang@unisoc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).