public inbox for linux-arm-kernel@lists.infradead.org
 help / color / mirror / Atom feed
From: Yicong Yang <yangyicong@huawei.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: <yangyicong@hisilicon.com>, <catalin.marinas@arm.com>,
	<will@kernel.org>, <dianders@chromium.org>,
	<sumit.garg@linaro.org>, <kernelfans@gmail.com>,
	<lecopzer.chen@mediatek.com>, <tglx@linutronix.de>,
	<song@kernel.org>, <linux-arm-kernel@lists.infradead.org>,
	<jonathan.cameron@huawei.com>, <zhanjie9@hisilicon.com>,
	<prime.zeng@hisilicon.com>, <linuxarm@huawei.com>
Subject: Re: [PATCH v2 0/2] Update the watchdog period according to real CPU frequency
Date: Tue, 13 May 2025 14:51:04 +0800	[thread overview]
Message-ID: <3d9ef0c0-11b3-6ce2-5a48-5a01dbc9bf58@huawei.com> (raw)
In-Reply-To: <20250512160612.c10464075df3c7842b13da11@linux-foundation.org>

On 2025/5/13 7:06, Andrew Morton wrote:
> On Mon, 12 May 2025 21:09:17 +0800 Yicong Yang <yangyicong@huawei.com> wrote:
> 
>> From: Yicong Yang <yangyicong@hisilicon.com>
>>
>> watchdog perf needs architecture to provide method for converting the watchdog
>> thresh to counter period. For arm64 we're using the max CPU frequency for
>> doing the conversion which is from cpufreq driver. But some cpufreq driver
>> are registered lately, for example cppc_cpufreq will be registered at late
>> initcall which is after the initialization of watchdog perf (initialized in
>> armv8_pmuv3 of device initcall). In such case the period of watchdog will not
>> be accurate enough. Fix this by registering a cpufreq notifier and update the
>> watchdog period once the cpufreq driver is initialized.
>>
> 
> Thanks.  Thoughts.
> 
> 1: What is the impact of this change?  Is the current code causing
>    problems?  If so, what are they?  How is the end-user experience
>    improved by this change?  Important info!
> 

This will make NMI watchdog (hardlockup detector) work more accurately. HARDLOCKUP_DETECTOR_PERF
is driven by the PMU sample interrupts of cpu cycle event with period related to the watchdog
threshold.  User will set the watchdog threshold in seconds (e.g. 10s by default) and for
HARDLOCKUP_DETECTOR_PERF we need to convert the seconds to cycles to setup the PMU counter.
The coversion method is provided by the arhitecture by implementing hw_nmi_get_sample_period().
For arm64 it's using max_cpufreq to do the conversion:
cycle_event_period = threshold(s) * max_cpufreq(hz)

Since arm64 doesn't have an arthictectural way to get the max_cpufreq, we use cpufreq_driver
to get it. If cpufreq_driver is not available, currently we use a safe max_cpufreq as 5GHz.
Without this patchset, if the cpufreq_driver is initialized after the hardlockup detector we'll
use 5GHz for calculating the event period. It's my case here as described in the coverletter.
That means in the default case (10s threshold) if the real max_cpufreq is 2.5GHz, the NMI watchdog
is actually working in a 20s period. With this patchset the period can be calibrated after
the cpufreq driver is initialized, much more accurate.

> 2: As far as I can tell, this patchset impacts arm64 only.  Do you
>    think that other architectures should implement this?

It's highly depends on the architecure's implementation of their HARDLOCKUP_DETECTOR_PERF (the
implementation of hw_nmi_get_sample_period()). If other architectures can gain the max_cpufreq
without cpufreq driver they don't have this problems (seems x86 implement in another way
and can gain the cpufreq by some arhitectural way).

> 
> 3: As far as I can tell, this patchset affects all cpufreq drivers
>    which use late_initcall() (on arm64, of course).  Is this correct?
> 

cpufreq drivers are not touched. we registered a notifier block to the cpufreq framework
so the changes to the cpufreq will notify us to see whether it's needed to modify the
event period.


> 4: It is asserted that we should use the *maximum* possible CPU
>    frequency for this calculation.  Why?  I assume this is because we
>    care about the minimum watchdog period?
> 

I think it may because it's impossbile to use the current frequency for calculating the counter
period as it may change at any time. Using maximum frequency will make the real period as close
as possible to the expected watchdog thresh in a simplest way.

> Can I assume that the ARM maintainers will be handling this?
> .

I think yes since it only affects the arm64's implementation of HARDLOCKUP_DETECTOR_PERF.

Thanks.



  reply	other threads:[~2025-05-13  6:53 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-12 13:09 [PATCH v2 0/2] Update the watchdog period according to real CPU frequency Yicong Yang
2025-05-12 13:09 ` [PATCH v2 1/2] watchdog/perf: Provide function for adjusting the event period Yicong Yang
2025-05-12 23:07   ` Andrew Morton
2025-05-13  7:02     ` Yicong Yang
2025-05-12 13:09 ` [PATCH v2 2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh Yicong Yang
2025-05-12 23:06 ` [PATCH v2 0/2] Update the watchdog period according to real CPU frequency Andrew Morton
2025-05-13  6:51   ` Yicong Yang [this message]
2025-06-27 15:26   ` Will Deacon
2025-06-27 18:56     ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3d9ef0c0-11b3-6ce2-5a48-5a01dbc9bf58@huawei.com \
    --to=yangyicong@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=catalin.marinas@arm.com \
    --cc=dianders@chromium.org \
    --cc=jonathan.cameron@huawei.com \
    --cc=kernelfans@gmail.com \
    --cc=lecopzer.chen@mediatek.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linuxarm@huawei.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=song@kernel.org \
    --cc=sumit.garg@linaro.org \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    --cc=yangyicong@hisilicon.com \
    --cc=zhanjie9@hisilicon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox