* [RESEND PATCH v2 0/2] Update the watchdog period according to real CPU frequency
@ 2025-06-19 11:48 Yicong Yang
2025-06-19 11:48 ` [RESEND PATCH v2 1/2] watchdog/perf: Provide function for adjusting the event period Yicong Yang
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Yicong Yang @ 2025-06-19 11:48 UTC (permalink / raw)
To: catalin.marinas, will, akpm, dianders, sumit.garg, kernelfans,
lecopzer.chen, tglx, song, linux-arm-kernel
Cc: jonathan.cameron, zhanjie9, prime.zeng, yangyicong, linuxarm
From: Yicong Yang <yangyicong@hisilicon.com>
watchdog perf needs architecture to provide method for converting the watchdog
thresh to counter period. For arm64 we're using the max CPU frequency for
doing the conversion which is from cpufreq driver. But some cpufreq driver
are registered lately, for example cppc_cpufreq will be registered at late
initcall which is after the initialization of watchdog perf (initialized in
armv8_pmuv3 of device initcall). In such case the period of watchdog will not
be accurate enough. Fix this by registering a cpufreq notifier and update the
watchdog period once the cpufreq driver is initialized.
Attach some Q&A raised by Andrew last time:
[1] https://lore.kernel.org/linux-arm-kernel/20250512160612.c10464075df3c7842b13da11@linux-foundation.org/
Q: What is the impact of this change? Is the current code causing problems? If so,
what are they? How is the end-user experience improved by this change? Important info!
A: This will make NMI watchdog (hardlockup detector) work more accurately. HARDLOCKUP_DETECTOR_PERF
is driven by the PMU sample interrupts of cpu cycle event with period related to the watchdog
threshold. User will set the watchdog threshold in seconds (e.g. 10s by default) and for
HARDLOCKUP_DETECTOR_PERF we need to convert the seconds to cycles to setup the PMU counter.
The coversion method is provided by the arhitecture by implementing hw_nmi_get_sample_period().
For arm64 it's using max_cpufreq to do the conversion:
cycle_event_period = threshold(s) * max_cpufreq(hz)
Since arm64 doesn't have an arthictectural way to get the max_cpufreq, we use cpufreq_driver
to get it. If cpufreq_driver is not available, currently we use a safe max_cpufreq as 5GHz.
Without this patchset, if the cpufreq_driver is initialized after the hardlockup detector we'll
use 5GHz for calculating the event period. It's my case here as described in the coverletter.
That means in the default case (10s threshold) if the real max_cpufreq is 2.5GHz, the NMI watchdog
is actually working in a 20s period. With this patchset the period can be calibrated after
the cpufreq driver is initialized, much more accurate.
Q: As far as I can tell, this patchset impacts arm64 only. Do you think that other architectures
should implement this?
A: It's highly depends on the architecure's implementation of their HARDLOCKUP_DETECTOR_PERF (the
implementation of hw_nmi_get_sample_period()). If other architectures can gain the max_cpufreq
without cpufreq driver they don't have this problems (seems x86 implement in another way
and can gain the cpufreq by some arhitectural way).
Q: As far as I can tell, this patchset affects all cpufreq drivers which use late_initcall()
(on arm64, of course). Is this correct?
A: cpufreq drivers are not touched. we registered a notifier block to the cpufreq framework
so the changes to the cpufreq will notify us to see whether it's needed to modify the
event period.
Q: It is asserted that we should use the *maximum* possible CPU frequency for this calculation.
Why? I assume this is because we care about the minimum watchdog period?
A: Because it's impossbile to use the current frequency for calculating the counter period as
it may change at any time. Using maximum frequency will make the real period as close as
possible to the expected watchdog thresh in a simplest way.
Change since v1:
- Handle the theoretical race condition and other comments from Doug, thanks
Link: https://lore.kernel.org/all/20250307021811.46981-1-yangyicong@huawei.com/
Yicong Yang (2):
watchdog/perf: Provide function for adjusting the event period
arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh
arch/arm64/kernel/watchdog_hld.c | 58 ++++++++++++++++++++++++++++++++
include/linux/nmi.h | 2 ++
kernel/watchdog_perf.c | 23 +++++++++++++
3 files changed, 83 insertions(+)
--
2.24.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [RESEND PATCH v2 1/2] watchdog/perf: Provide function for adjusting the event period
2025-06-19 11:48 [RESEND PATCH v2 0/2] Update the watchdog period according to real CPU frequency Yicong Yang
@ 2025-06-19 11:48 ` Yicong Yang
2025-06-25 22:46 ` Doug Anderson
2025-06-27 15:35 ` Will Deacon
2025-06-19 11:48 ` [RESEND PATCH v2 2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh Yicong Yang
2025-06-23 23:30 ` [RESEND PATCH v2 0/2] Update the watchdog period according to real CPU frequency Andrew Morton
2 siblings, 2 replies; 12+ messages in thread
From: Yicong Yang @ 2025-06-19 11:48 UTC (permalink / raw)
To: catalin.marinas, will, akpm, dianders, sumit.garg, kernelfans,
lecopzer.chen, tglx, song, linux-arm-kernel
Cc: jonathan.cameron, zhanjie9, prime.zeng, yangyicong, linuxarm
From: Yicong Yang <yangyicong@hisilicon.com>
Architecture's using perf events for hard lockup detection needs to
convert the watchdog_thresh to the event's period, some architecture
for example arm64 perform this conversion using the CPU's maximum
frequency which will be acquired by cpufreq. However by the time
the lockup detector's initialized the cpufreq driver may not be
initialized, thus launch a watchdog with inaccurate period. Provide
a function hardlockup_detector_perf_adjust_period() to allowing
adjust the event period. Then architecture can update with more
accurate period if cpufreq is initialized.
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
include/linux/nmi.h | 2 ++
kernel/watchdog_perf.c | 23 +++++++++++++++++++++++
2 files changed, 25 insertions(+)
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index e78fa535f61d..0a6d8a8d2d5b 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -103,10 +103,12 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
extern void hardlockup_detector_perf_stop(void);
extern void hardlockup_detector_perf_restart(void);
extern void hardlockup_config_perf_event(const char *str);
+extern void hardlockup_detector_perf_adjust_period(int cpu, u64 period);
#else
static inline void hardlockup_detector_perf_stop(void) { }
static inline void hardlockup_detector_perf_restart(void) { }
static inline void hardlockup_config_perf_event(const char *str) { }
+static inline void hardlockup_detector_perf_adjust_period(int cpu, u64 period) { }
#endif
void watchdog_hardlockup_stop(void);
diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index 75af12ff774e..41c299e63fab 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -186,6 +186,29 @@ void watchdog_hardlockup_disable(unsigned int cpu)
}
}
+/**
+ * hardlockup_detector_perf_adjust_period - Adjust the event period due
+ * to cpu frequency change
+ * @cpu: The CPU whose event period will be adjusted
+ * @period: The target period to be set
+ */
+void hardlockup_detector_perf_adjust_period(int cpu, u64 period)
+{
+ struct perf_event *event = per_cpu(watchdog_ev, cpu);
+
+ if (!(watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED))
+ return;
+
+ if (!event)
+ return;
+
+ if (event->attr.sample_period == period)
+ return;
+
+ if (perf_event_period(event, period))
+ pr_err("failed to change period to %llu\n", period);
+}
+
/**
* hardlockup_detector_perf_stop - Globally stop watchdog events
*
--
2.24.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RESEND PATCH v2 2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh
2025-06-19 11:48 [RESEND PATCH v2 0/2] Update the watchdog period according to real CPU frequency Yicong Yang
2025-06-19 11:48 ` [RESEND PATCH v2 1/2] watchdog/perf: Provide function for adjusting the event period Yicong Yang
@ 2025-06-19 11:48 ` Yicong Yang
2025-06-24 2:32 ` Jie Zhan
` (2 more replies)
2025-06-23 23:30 ` [RESEND PATCH v2 0/2] Update the watchdog period according to real CPU frequency Andrew Morton
2 siblings, 3 replies; 12+ messages in thread
From: Yicong Yang @ 2025-06-19 11:48 UTC (permalink / raw)
To: catalin.marinas, will, akpm, dianders, sumit.garg, kernelfans,
lecopzer.chen, tglx, song, linux-arm-kernel
Cc: jonathan.cameron, zhanjie9, prime.zeng, yangyicong, linuxarm
From: Yicong Yang <yangyicong@hisilicon.com>
arm64 depends on the cpufreq driver to gain the maximum cpu frequency
to convert the watchdog_thresh to perf event period. cpufreq drivers
like cppc_cpufreq will be initialized lately after the initializing of
the hard lockup detector so just use a safe cpufreq which will be
inaccurency. Use a cpufreq notifier to adjust the event's period to
a more accurate one.
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
arch/arm64/kernel/watchdog_hld.c | 58 ++++++++++++++++++++++++++++++++
1 file changed, 58 insertions(+)
diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c
index dcd25322127c..e55548cb26df 100644
--- a/arch/arm64/kernel/watchdog_hld.c
+++ b/arch/arm64/kernel/watchdog_hld.c
@@ -34,3 +34,61 @@ bool __init arch_perf_nmi_is_available(void)
*/
return arm_pmu_irq_is_nmi();
}
+
+static int watchdog_perf_update_period(void *data)
+{
+ int cpu = raw_smp_processor_id();
+ u64 max_cpu_freq, new_period;
+
+ max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL;
+ if (!max_cpu_freq)
+ return 0;
+
+ new_period = watchdog_thresh * max_cpu_freq;
+ hardlockup_detector_perf_adjust_period(cpu, new_period);
+
+ return 0;
+}
+
+static int watchdog_freq_notifier_callback(struct notifier_block *nb,
+ unsigned long val, void *data)
+{
+ struct cpufreq_policy *policy = data;
+ int cpu;
+
+ if (val != CPUFREQ_CREATE_POLICY)
+ return NOTIFY_DONE;
+
+ /*
+ * Let each online CPU related to the policy update the period by their
+ * own. This will serialize with the framework on start/stop the lockup
+ * detector (softlockup_{start,stop}_all) and avoid potential race
+ * condition. Otherwise we may have below theoretical race condition:
+ * (core 0/1 share the same policy)
+ * [core 0] [core 1]
+ * hardlockup_detector_event_create()
+ * hw_nmi_get_sample_period()
+ * (cpufreq registered, notifier callback invoked)
+ * watchdog_freq_notifier_callback()
+ * watchdog_perf_update_period()
+ * (since core 1's event's not yet created,
+ * the period is not set)
+ * perf_event_create_kernel_counter()
+ * (event's period is SAFE_MAX_CPU_FREQ)
+ */
+ for_each_cpu(cpu, policy->cpus)
+ smp_call_on_cpu(cpu, watchdog_perf_update_period, NULL, false);
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block watchdog_freq_notifier = {
+ .notifier_call = watchdog_freq_notifier_callback,
+};
+
+static int __init init_watchdog_freq_notifier(void)
+{
+ return cpufreq_register_notifier(&watchdog_freq_notifier,
+ CPUFREQ_POLICY_NOTIFIER);
+}
+core_initcall(init_watchdog_freq_notifier);
--
2.24.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RESEND PATCH v2 0/2] Update the watchdog period according to real CPU frequency
2025-06-19 11:48 [RESEND PATCH v2 0/2] Update the watchdog period according to real CPU frequency Yicong Yang
2025-06-19 11:48 ` [RESEND PATCH v2 1/2] watchdog/perf: Provide function for adjusting the event period Yicong Yang
2025-06-19 11:48 ` [RESEND PATCH v2 2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh Yicong Yang
@ 2025-06-23 23:30 ` Andrew Morton
2025-06-24 10:55 ` Yicong Yang
2 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2025-06-23 23:30 UTC (permalink / raw)
To: Yicong Yang
Cc: catalin.marinas, will, dianders, sumit.garg, kernelfans,
lecopzer.chen, tglx, song, linux-arm-kernel, jonathan.cameron,
zhanjie9, prime.zeng, yangyicong, linuxarm, Rafael J. Wysocki,
Viresh Kumar, linux-pm
On Thu, 19 Jun 2025 19:48:03 +0800 Yicong Yang <yangyicong@huawei.com> wrote:
> watchdog perf needs architecture to provide method for converting the watchdog
> thresh to counter period. For arm64 we're using the max CPU frequency for
> doing the conversion which is from cpufreq driver. But some cpufreq driver
> are registered lately, for example cppc_cpufreq will be registered at late
> initcall which is after the initialization of watchdog perf (initialized in
> armv8_pmuv3 of device initcall). In such case the period of watchdog will not
> be accurate enough. Fix this by registering a cpufreq notifier and update the
> watchdog period once the cpufreq driver is initialized.
Great, thanks. I'll add these to mm.git for testing.
> arch/arm64/kernel/watchdog_hld.c | 58 ++++++++++++++++++++++++++++++++
> include/linux/nmi.h | 2 ++
> kernel/watchdog_perf.c | 23 +++++++++++++
This is not exactly my comfort zone so additional review would be
appreciated, please. Even though it doesn't touch drivers/cpufreq,
perhaps the developers over there could take a look?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RESEND PATCH v2 2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh
2025-06-19 11:48 ` [RESEND PATCH v2 2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh Yicong Yang
@ 2025-06-24 2:32 ` Jie Zhan
2025-06-25 22:46 ` Doug Anderson
2025-06-27 15:35 ` Will Deacon
2 siblings, 0 replies; 12+ messages in thread
From: Jie Zhan @ 2025-06-24 2:32 UTC (permalink / raw)
To: Yicong Yang, catalin.marinas, will, akpm, dianders, sumit.garg,
kernelfans, lecopzer.chen, tglx, song, linux-arm-kernel
Cc: jonathan.cameron, prime.zeng, yangyicong, linuxarm
On 19/06/2025 19:48, Yicong Yang wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
>
> arm64 depends on the cpufreq driver to gain the maximum cpu frequency
> to convert the watchdog_thresh to perf event period. cpufreq drivers
> like cppc_cpufreq will be initialized lately after the initializing of
> the hard lockup detector so just use a safe cpufreq which will be
> inaccurency. Use a cpufreq notifier to adjust the event's period to
> a more accurate one.
>
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
> arch/arm64/kernel/watchdog_hld.c | 58 ++++++++++++++++++++++++++++++++
> 1 file changed, 58 insertions(+)
In general, this makes the watchdog period closer to the expected.
The actual period might be longer if the cpu maxfreq is lowered down later,
or shorter if boost is turned on later.
LGTM as it's anyhow better than a fixed 5GHz.
Reviewed-by: Jie Zhan <zhanjie9@hisilicon.com>
>
> diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c
> index dcd25322127c..e55548cb26df 100644
> --- a/arch/arm64/kernel/watchdog_hld.c
> +++ b/arch/arm64/kernel/watchdog_hld.c
> @@ -34,3 +34,61 @@ bool __init arch_perf_nmi_is_available(void)
> */
> return arm_pmu_irq_is_nmi();
> }
> +
> +static int watchdog_perf_update_period(void *data)
> +{
> + int cpu = raw_smp_processor_id();
> + u64 max_cpu_freq, new_period;
> +
> + max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL;
> + if (!max_cpu_freq)
> + return 0;
> +
> + new_period = watchdog_thresh * max_cpu_freq;
> + hardlockup_detector_perf_adjust_period(cpu, new_period);
> +
> + return 0;
> +}
> +
> +static int watchdog_freq_notifier_callback(struct notifier_block *nb,
> + unsigned long val, void *data)
> +{
> + struct cpufreq_policy *policy = data;
> + int cpu;
> +
> + if (val != CPUFREQ_CREATE_POLICY)
> + return NOTIFY_DONE;
> +
> + /*
> + * Let each online CPU related to the policy update the period by their
> + * own. This will serialize with the framework on start/stop the lockup
> + * detector (softlockup_{start,stop}_all) and avoid potential race
> + * condition. Otherwise we may have below theoretical race condition:
> + * (core 0/1 share the same policy)
> + * [core 0] [core 1]
> + * hardlockup_detector_event_create()
> + * hw_nmi_get_sample_period()
> + * (cpufreq registered, notifier callback invoked)
> + * watchdog_freq_notifier_callback()
> + * watchdog_perf_update_period()
> + * (since core 1's event's not yet created,
> + * the period is not set)
> + * perf_event_create_kernel_counter()
> + * (event's period is SAFE_MAX_CPU_FREQ)
> + */
> + for_each_cpu(cpu, policy->cpus)
> + smp_call_on_cpu(cpu, watchdog_perf_update_period, NULL, false);
> +
> + return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block watchdog_freq_notifier = {
> + .notifier_call = watchdog_freq_notifier_callback,
> +};
> +
> +static int __init init_watchdog_freq_notifier(void)
> +{
> + return cpufreq_register_notifier(&watchdog_freq_notifier,
> + CPUFREQ_POLICY_NOTIFIER);
> +}
> +core_initcall(init_watchdog_freq_notifier);
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RESEND PATCH v2 0/2] Update the watchdog period according to real CPU frequency
2025-06-23 23:30 ` [RESEND PATCH v2 0/2] Update the watchdog period according to real CPU frequency Andrew Morton
@ 2025-06-24 10:55 ` Yicong Yang
0 siblings, 0 replies; 12+ messages in thread
From: Yicong Yang @ 2025-06-24 10:55 UTC (permalink / raw)
To: Andrew Morton
Cc: yangyicong, catalin.marinas, will, dianders, sumit.garg,
kernelfans, lecopzer.chen, tglx, song, linux-arm-kernel,
jonathan.cameron, zhanjie9, prime.zeng, linuxarm,
Rafael J. Wysocki, Viresh Kumar, linux-pm
On 2025/6/24 7:30, Andrew Morton wrote:
> On Thu, 19 Jun 2025 19:48:03 +0800 Yicong Yang <yangyicong@huawei.com> wrote:
>
>> watchdog perf needs architecture to provide method for converting the watchdog
>> thresh to counter period. For arm64 we're using the max CPU frequency for
>> doing the conversion which is from cpufreq driver. But some cpufreq driver
>> are registered lately, for example cppc_cpufreq will be registered at late
>> initcall which is after the initialization of watchdog perf (initialized in
>> armv8_pmuv3 of device initcall). In such case the period of watchdog will not
>> be accurate enough. Fix this by registering a cpufreq notifier and update the
>> watchdog period once the cpufreq driver is initialized.
>
> Great, thanks. I'll add these to mm.git for testing.
>
thanks.
>> arch/arm64/kernel/watchdog_hld.c | 58 ++++++++++++++++++++++++++++++++
>> include/linux/nmi.h | 2 ++
>> kernel/watchdog_perf.c | 23 +++++++++++++
>
> This is not exactly my comfort zone so additional review would be
> appreciated, please. Even though it doesn't touch drivers/cpufreq,
> perhaps the developers over there could take a look?
> .
sure. I've already cc'ed them.
thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RESEND PATCH v2 1/2] watchdog/perf: Provide function for adjusting the event period
2025-06-19 11:48 ` [RESEND PATCH v2 1/2] watchdog/perf: Provide function for adjusting the event period Yicong Yang
@ 2025-06-25 22:46 ` Doug Anderson
2025-06-27 15:35 ` Will Deacon
1 sibling, 0 replies; 12+ messages in thread
From: Doug Anderson @ 2025-06-25 22:46 UTC (permalink / raw)
To: Yicong Yang
Cc: catalin.marinas, will, akpm, sumit.garg, kernelfans,
lecopzer.chen, tglx, song, linux-arm-kernel, jonathan.cameron,
zhanjie9, prime.zeng, yangyicong, linuxarm
Hi,
On Thu, Jun 19, 2025 at 4:48 AM Yicong Yang <yangyicong@huawei.com> wrote:
>
> From: Yicong Yang <yangyicong@hisilicon.com>
>
> Architecture's using perf events for hard lockup detection needs to
> convert the watchdog_thresh to the event's period, some architecture
> for example arm64 perform this conversion using the CPU's maximum
> frequency which will be acquired by cpufreq. However by the time
> the lockup detector's initialized the cpufreq driver may not be
> initialized, thus launch a watchdog with inaccurate period. Provide
> a function hardlockup_detector_perf_adjust_period() to allowing
> adjust the event period. Then architecture can update with more
> accurate period if cpufreq is initialized.
>
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
> include/linux/nmi.h | 2 ++
> kernel/watchdog_perf.c | 23 +++++++++++++++++++++++
> 2 files changed, 25 insertions(+)
I looked at v1 a long time ago and my nit has been fixed, so:
Reviewed-by: Douglas Anderson <dianders@chromium.org>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RESEND PATCH v2 2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh
2025-06-19 11:48 ` [RESEND PATCH v2 2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh Yicong Yang
2025-06-24 2:32 ` Jie Zhan
@ 2025-06-25 22:46 ` Doug Anderson
2025-06-27 15:35 ` Will Deacon
2 siblings, 0 replies; 12+ messages in thread
From: Doug Anderson @ 2025-06-25 22:46 UTC (permalink / raw)
To: Yicong Yang
Cc: catalin.marinas, will, akpm, sumit.garg, kernelfans,
lecopzer.chen, tglx, song, linux-arm-kernel, jonathan.cameron,
zhanjie9, prime.zeng, yangyicong, linuxarm
Hi,
On Thu, Jun 19, 2025 at 4:48 AM Yicong Yang <yangyicong@huawei.com> wrote:
>
> From: Yicong Yang <yangyicong@hisilicon.com>
>
> arm64 depends on the cpufreq driver to gain the maximum cpu frequency
> to convert the watchdog_thresh to perf event period. cpufreq drivers
> like cppc_cpufreq will be initialized lately after the initializing of
> the hard lockup detector so just use a safe cpufreq which will be
> inaccurency. Use a cpufreq notifier to adjust the event's period to
> a more accurate one.
>
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
> arch/arm64/kernel/watchdog_hld.c | 58 ++++++++++++++++++++++++++++++++
> 1 file changed, 58 insertions(+)
I haven't had time to go through carefully and look for every possible
race condition. ...but at least this does attempt to get a bit safer
than v1 was. I also agree that it seems hard to believe this could
make things worse than not having this patch. Thus, I'm OK with:
Reviewed-by: Douglas Anderson <dianders@chromium.org>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RESEND PATCH v2 1/2] watchdog/perf: Provide function for adjusting the event period
2025-06-19 11:48 ` [RESEND PATCH v2 1/2] watchdog/perf: Provide function for adjusting the event period Yicong Yang
2025-06-25 22:46 ` Doug Anderson
@ 2025-06-27 15:35 ` Will Deacon
2025-06-30 8:07 ` Yicong Yang
1 sibling, 1 reply; 12+ messages in thread
From: Will Deacon @ 2025-06-27 15:35 UTC (permalink / raw)
To: Yicong Yang
Cc: catalin.marinas, akpm, dianders, sumit.garg, kernelfans,
lecopzer.chen, tglx, song, linux-arm-kernel, jonathan.cameron,
zhanjie9, prime.zeng, yangyicong, linuxarm
On Thu, Jun 19, 2025 at 07:48:04PM +0800, Yicong Yang wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
>
> Architecture's using perf events for hard lockup detection needs to
> convert the watchdog_thresh to the event's period, some architecture
> for example arm64 perform this conversion using the CPU's maximum
> frequency which will be acquired by cpufreq. However by the time
> the lockup detector's initialized the cpufreq driver may not be
> initialized, thus launch a watchdog with inaccurate period. Provide
> a function hardlockup_detector_perf_adjust_period() to allowing
> adjust the event period. Then architecture can update with more
> accurate period if cpufreq is initialized.
>
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
> include/linux/nmi.h | 2 ++
> kernel/watchdog_perf.c | 23 +++++++++++++++++++++++
> 2 files changed, 25 insertions(+)
>
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index e78fa535f61d..0a6d8a8d2d5b 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -103,10 +103,12 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
> extern void hardlockup_detector_perf_stop(void);
> extern void hardlockup_detector_perf_restart(void);
> extern void hardlockup_config_perf_event(const char *str);
> +extern void hardlockup_detector_perf_adjust_period(int cpu, u64 period);
> #else
> static inline void hardlockup_detector_perf_stop(void) { }
> static inline void hardlockup_detector_perf_restart(void) { }
> static inline void hardlockup_config_perf_event(const char *str) { }
> +static inline void hardlockup_detector_perf_adjust_period(int cpu, u64 period) { }
> #endif
>
> void watchdog_hardlockup_stop(void);
> diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
> index 75af12ff774e..41c299e63fab 100644
> --- a/kernel/watchdog_perf.c
> +++ b/kernel/watchdog_perf.c
> @@ -186,6 +186,29 @@ void watchdog_hardlockup_disable(unsigned int cpu)
> }
> }
>
> +/**
> + * hardlockup_detector_perf_adjust_period - Adjust the event period due
> + * to cpu frequency change
> + * @cpu: The CPU whose event period will be adjusted
> + * @period: The target period to be set
> + */
> +void hardlockup_detector_perf_adjust_period(int cpu, u64 period)
> +{
> + struct perf_event *event = per_cpu(watchdog_ev, cpu);
Given that this is always called in the context of the current CPU (and
has to be to avoid the race you describe in the next patch), why not
drop the 'cpu' argument and use this_cpu_ptr() here instead?
Will
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RESEND PATCH v2 2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh
2025-06-19 11:48 ` [RESEND PATCH v2 2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh Yicong Yang
2025-06-24 2:32 ` Jie Zhan
2025-06-25 22:46 ` Doug Anderson
@ 2025-06-27 15:35 ` Will Deacon
2025-06-30 8:38 ` Yicong Yang
2 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2025-06-27 15:35 UTC (permalink / raw)
To: Yicong Yang
Cc: catalin.marinas, akpm, dianders, sumit.garg, kernelfans,
lecopzer.chen, tglx, song, linux-arm-kernel, jonathan.cameron,
zhanjie9, prime.zeng, yangyicong, linuxarm
On Thu, Jun 19, 2025 at 07:48:05PM +0800, Yicong Yang wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
>
> arm64 depends on the cpufreq driver to gain the maximum cpu frequency
> to convert the watchdog_thresh to perf event period. cpufreq drivers
> like cppc_cpufreq will be initialized lately after the initializing of
> the hard lockup detector so just use a safe cpufreq which will be
> inaccurency. Use a cpufreq notifier to adjust the event's period to
> a more accurate one.
>
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
> arch/arm64/kernel/watchdog_hld.c | 58 ++++++++++++++++++++++++++++++++
> 1 file changed, 58 insertions(+)
>
> diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c
> index dcd25322127c..e55548cb26df 100644
> --- a/arch/arm64/kernel/watchdog_hld.c
> +++ b/arch/arm64/kernel/watchdog_hld.c
> @@ -34,3 +34,61 @@ bool __init arch_perf_nmi_is_available(void)
> */
> return arm_pmu_irq_is_nmi();
> }
> +
> +static int watchdog_perf_update_period(void *data)
> +{
> + int cpu = raw_smp_processor_id();
Why raw?
Will
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RESEND PATCH v2 1/2] watchdog/perf: Provide function for adjusting the event period
2025-06-27 15:35 ` Will Deacon
@ 2025-06-30 8:07 ` Yicong Yang
0 siblings, 0 replies; 12+ messages in thread
From: Yicong Yang @ 2025-06-30 8:07 UTC (permalink / raw)
To: Will Deacon
Cc: yangyicong, catalin.marinas, akpm, dianders, sumit.garg,
kernelfans, lecopzer.chen, tglx, song, linux-arm-kernel,
jonathan.cameron, zhanjie9, prime.zeng, linuxarm
On 2025/6/27 23:35, Will Deacon wrote:
> On Thu, Jun 19, 2025 at 07:48:04PM +0800, Yicong Yang wrote:
>> From: Yicong Yang <yangyicong@hisilicon.com>
>>
>> Architecture's using perf events for hard lockup detection needs to
>> convert the watchdog_thresh to the event's period, some architecture
>> for example arm64 perform this conversion using the CPU's maximum
>> frequency which will be acquired by cpufreq. However by the time
>> the lockup detector's initialized the cpufreq driver may not be
>> initialized, thus launch a watchdog with inaccurate period. Provide
>> a function hardlockup_detector_perf_adjust_period() to allowing
>> adjust the event period. Then architecture can update with more
>> accurate period if cpufreq is initialized.
>>
>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>> ---
>> include/linux/nmi.h | 2 ++
>> kernel/watchdog_perf.c | 23 +++++++++++++++++++++++
>> 2 files changed, 25 insertions(+)
>>
>> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
>> index e78fa535f61d..0a6d8a8d2d5b 100644
>> --- a/include/linux/nmi.h
>> +++ b/include/linux/nmi.h
>> @@ -103,10 +103,12 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
>> extern void hardlockup_detector_perf_stop(void);
>> extern void hardlockup_detector_perf_restart(void);
>> extern void hardlockup_config_perf_event(const char *str);
>> +extern void hardlockup_detector_perf_adjust_period(int cpu, u64 period);
>> #else
>> static inline void hardlockup_detector_perf_stop(void) { }
>> static inline void hardlockup_detector_perf_restart(void) { }
>> static inline void hardlockup_config_perf_event(const char *str) { }
>> +static inline void hardlockup_detector_perf_adjust_period(int cpu, u64 period) { }
>> #endif
>>
>> void watchdog_hardlockup_stop(void);
>> diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
>> index 75af12ff774e..41c299e63fab 100644
>> --- a/kernel/watchdog_perf.c
>> +++ b/kernel/watchdog_perf.c
>> @@ -186,6 +186,29 @@ void watchdog_hardlockup_disable(unsigned int cpu)
>> }
>> }
>>
>> +/**
>> + * hardlockup_detector_perf_adjust_period - Adjust the event period due
>> + * to cpu frequency change
>> + * @cpu: The CPU whose event period will be adjusted
>> + * @period: The target period to be set
>> + */
>> +void hardlockup_detector_perf_adjust_period(int cpu, u64 period)
>> +{
>> + struct perf_event *event = per_cpu(watchdog_ev, cpu);
>
> Given that this is always called in the context of the current CPU (and
> has to be to avoid the race you describe in the next patch), why not
> drop the 'cpu' argument and use this_cpu_ptr() here instead?
>
work for me. In v1 this may be called in a remote cpu and I forget to change the
prototype. will drop it, we only have one user to call this on the local cpu.
thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RESEND PATCH v2 2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh
2025-06-27 15:35 ` Will Deacon
@ 2025-06-30 8:38 ` Yicong Yang
0 siblings, 0 replies; 12+ messages in thread
From: Yicong Yang @ 2025-06-30 8:38 UTC (permalink / raw)
To: Will Deacon
Cc: yangyicong, catalin.marinas, akpm, dianders, sumit.garg,
kernelfans, lecopzer.chen, tglx, song, linux-arm-kernel,
jonathan.cameron, zhanjie9, prime.zeng, linuxarm
On 2025/6/27 23:35, Will Deacon wrote:
> On Thu, Jun 19, 2025 at 07:48:05PM +0800, Yicong Yang wrote:
>> From: Yicong Yang <yangyicong@hisilicon.com>
>>
>> arm64 depends on the cpufreq driver to gain the maximum cpu frequency
>> to convert the watchdog_thresh to perf event period. cpufreq drivers
>> like cppc_cpufreq will be initialized lately after the initializing of
>> the hard lockup detector so just use a safe cpufreq which will be
>> inaccurency. Use a cpufreq notifier to adjust the event's period to
>> a more accurate one.
>>
>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>> ---
>> arch/arm64/kernel/watchdog_hld.c | 58 ++++++++++++++++++++++++++++++++
>> 1 file changed, 58 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c
>> index dcd25322127c..e55548cb26df 100644
>> --- a/arch/arm64/kernel/watchdog_hld.c
>> +++ b/arch/arm64/kernel/watchdog_hld.c
>> @@ -34,3 +34,61 @@ bool __init arch_perf_nmi_is_available(void)
>> */
>> return arm_pmu_irq_is_nmi();
>> }
>> +
>> +static int watchdog_perf_update_period(void *data)
>> +{
>> + int cpu = raw_smp_processor_id();
>
> Why raw?
>
smp_processor_id() also works for me, since here's running in a per-cpu worker from
smp_call_on_cpu() so we won't violate the check in smp_processor_id() when
CONFIG_DEBUG_PREEMPT=y. I might think the raw_smp_processora_id() is okay since we're
known to run on the local cpu so the unstable raw_smp_processor_id() is fine.
thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-06-30 8:43 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-19 11:48 [RESEND PATCH v2 0/2] Update the watchdog period according to real CPU frequency Yicong Yang
2025-06-19 11:48 ` [RESEND PATCH v2 1/2] watchdog/perf: Provide function for adjusting the event period Yicong Yang
2025-06-25 22:46 ` Doug Anderson
2025-06-27 15:35 ` Will Deacon
2025-06-30 8:07 ` Yicong Yang
2025-06-19 11:48 ` [RESEND PATCH v2 2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh Yicong Yang
2025-06-24 2:32 ` Jie Zhan
2025-06-25 22:46 ` Doug Anderson
2025-06-27 15:35 ` Will Deacon
2025-06-30 8:38 ` Yicong Yang
2025-06-23 23:30 ` [RESEND PATCH v2 0/2] Update the watchdog period according to real CPU frequency Andrew Morton
2025-06-24 10:55 ` Yicong Yang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).