From: Petr Mladek <pmladek@suse.com>
To: Douglas Anderson <dianders@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Mark Rutland <mark.rutland@arm.com>,
Randy Dunlap <rdunlap@infradead.org>,
Will Deacon <will@kernel.org>,
Catalin Marinas <catalin.marinas@arm.com>,
Sumit Garg <sumit.garg@linaro.org>,
Daniel Thompson <daniel.thompson@linaro.org>,
Ian Rogers <irogers@google.com>,
ravi.v.shankar@intel.com, Marc Zyngier <maz@kernel.org>,
linux-perf-users@vger.kernel.org,
Stephane Eranian <eranian@google.com>,
kgdb-bugreport@lists.sourceforge.net, ito-yuichi@fujitsu.com,
linux-arm-kernel@lists.infradead.org,
Stephen Boyd <swboyd@chromium.org>,
Masayoshi Mizuma <msys.mizuma@gmail.com>,
ricardo.neri@intel.com,
Lecopzer Chen <lecopzer.chen@mediatek.com>,
Chen-Yu Tsai <wens@csie.org>, Andi Kleen <ak@linux.intel.com>,
Colin Cross <ccross@android.com>,
Matthias Kaehlcke <mka@chromium.org>,
Guenter Roeck <groeck@chromium.org>,
Tzung-Bi Shih <tzungbi@chromium.org>,
Alexander Potapenko <glider@google.com>,
AngeloGioacchino Del Regno
<angelogioacchino.delregno@collabora.com>,
Geert Uytterhoeven <geert+renesas@glider.be>,
Juergen Gross <jgross@suse.com>,
Kees Cook <keescook@chromium.org>,
Laurent Dufour <ldufour@linux.ibm.com>,
Liam Howlett <liam.howlett@oracle.com>,
Masahiro Yamada <masahiroy@kernel.org>,
Matthias Brugger <matthias.bgg@gmail.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Miguel Ojeda <ojeda@kernel.org>,
Nathan Chancellor <nathan@kernel.org>,
Nick Desaulniers <ndesaulniers@google.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Sami Tolvanen <samitolvanen@google.com>,
Vlastimil Babka <vbabka@suse.cz>,
Zhaoyang Huang <zhaoyang.huang@unisoc.com>,
Zhen Lei <thunder.leizhen@huawei.com>,
linux-kernel@vger.kernel.org, linux-mediatek@lists.infradead.org
Subject: cpu hotplug : was: Re: [PATCH v3] hardlockup: detect hard lockups using secondary (buddy) CPUs
Date: Tue, 2 May 2023 17:23:45 +0200
Message-ID: <ZFEqynvf5nqkzEvQ@alley>
In-Reply-To: <20230501082341.v3.1.I6bf789d21d0c3d75d382e7e51a804a7a51315f2c@changeid>
On Mon 2023-05-01 08:24:46, Douglas Anderson wrote:
> From: Colin Cross <ccross@android.com>
>
> Implement a hardlockup detector that doesn't need any extra
> arch-specific support code to detect lockups. Instead of using
> something arch-specific we will use the buddy system, where each CPU
> watches out for another one. Specifically, each CPU will use its
> softlockup hrtimer to check that the next CPU is processing hrtimer
> interrupts by verifying that a counter is increasing.
>
> --- /dev/null
> +++ b/kernel/watchdog_buddy_cpu.c
> +int watchdog_nmi_enable(unsigned int cpu)
> +{
> + /*
> + * The new CPU will be marked online before the first hrtimer interrupt
> + * runs on it.

This need not be the first hrtimer interrupt. The CPU might have been
offlined and onlined repeatedly, so the counter might have any value.

> + * If another CPU tests for a hardlockup on the new CPU
> + * before it has run its first hrtimer, it will get a false positive.
> + * Touch the watchdog on the new CPU to delay the first check for at
> + * least 3 sampling periods to guarantee one hrtimer has run on the new
> + * CPU.
> + */
> + per_cpu(watchdog_touch, cpu) = true;

We should also touch next_cpu:

	/*
	 * We are going to check the next CPU. Our watchdog_hrtimer
	 * need not be zero if the CPU has already been online earlier.
	 * Touch the watchdog on the next CPU to avoid a false positive
	 * if we try to check it in less than 3 interrupts.
	 */
	next_cpu = watchdog_next_cpu(cpu);
	if (next_cpu < nr_cpu_ids)
		per_cpu(watchdog_touch, next_cpu) = true;

An alternative would be to clear watchdog_hrtimer, but that would also
affect the softlockup detector to some extent.

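
To illustrate the race, here is a small userspace model. It is only a sketch: the arrays stand in for the per-CPU variables, and tick_and_check(), reonline_reports_lockup() and NR_CPUS = 4 are made-up names for this example. It assumes the checker looks at its buddy on every third hrtimer tick and that a touch makes it skip exactly one check.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical size for this model */

/* Userspace stand-ins for the per-CPU variables used by the patch. */
static unsigned long hrtimer_interrupts[NR_CPUS];
static unsigned long hrtimer_interrupts_saved[NR_CPUS];
static bool watchdog_touch[NR_CPUS];

/* Assumed shape of the buddy check: every third tick, look at next_cpu. */
static bool tick_and_check(unsigned int cpu, unsigned int next_cpu)
{
	assert(cpu < NR_CPUS && next_cpu < NR_CPUS);

	hrtimer_interrupts[cpu]++;
	if (hrtimer_interrupts[cpu] % 3 != 0)
		return false;

	if (watchdog_touch[next_cpu]) {
		/* A touch makes the checker skip exactly one check. */
		watchdog_touch[next_cpu] = false;
		return false;
	}

	if (hrtimer_interrupts_saved[next_cpu] == hrtimer_interrupts[next_cpu])
		return true; /* no progress since the last check */

	hrtimer_interrupts_saved[next_cpu] = hrtimer_interrupts[next_cpu];
	return false;
}

/*
 * CPU 1 comes back online with a stale counter of 5 and starts checking
 * CPU 2, whose saved count was refreshed by the previous checker a moment ago.
 */
static bool reonline_reports_lockup(bool touch_next_cpu)
{
	hrtimer_interrupts[1] = 5;        /* left over from an earlier online period */
	hrtimer_interrupts[2] = 10;
	hrtimer_interrupts_saved[2] = 10; /* previous checker just sampled CPU 2 */
	watchdog_touch[2] = touch_next_cpu;

	/* First tick on CPU 1: 6 % 3 == 0, so it checks CPU 2 immediately. */
	return tick_and_check(1, 2);
}
```

In this model reonline_reports_lockup(false) reports a lockup although CPU 2 is healthy, while reonline_reports_lockup(true) skips that first check, which gives CPU 2 time to run its hrtimer before it is looked at again.
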
> + /* Match with smp_rmb() in watchdog_check_hardlockup() */
> + smp_wmb();
> + cpumask_set_cpu(cpu, &watchdog_cpus);
> + return 0;
> +}
> +
> +void watchdog_nmi_disable(unsigned int cpu)
> +{
> + unsigned int next_cpu = watchdog_next_cpu(cpu);
> +
> + /*
> + * Offlining this CPU will cause the CPU before this one to start
> + * checking the one after this one. If this CPU just finished checking
> + * the next CPU and updating hrtimer_interrupts_saved, and then the
> + * previous CPU checks it within one sample period, it will trigger a
> + * false positive. Touch the watchdog on the next CPU to prevent it.
> + */
> + if (next_cpu < nr_cpu_ids)
> + per_cpu(watchdog_touch, next_cpu) = true;
> + /* Match with smp_rmb() in watchdog_check_hardlockup() */
> + smp_wmb();
> + cpumask_clear_cpu(cpu, &watchdog_cpus);
> +}
> +
Best Regards,
Petr