linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2] Update the watchdog period according to real CPU frequency
@ 2025-07-01 11:02 Yicong Yang
  2025-07-01 11:02 ` [PATCH v3 1/2] watchdog/perf: Provide function for adjusting the event period Yicong Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Yicong Yang @ 2025-07-01 11:02 UTC (permalink / raw)
  To: catalin.marinas, will, akpm, dianders, kernelfans, tglx, song,
	linux-arm-kernel
  Cc: jonathan.cameron, zhanjie9, prime.zeng, yangyicong, linuxarm

From: Yicong Yang <yangyicong@hisilicon.com>

watchdog perf needs architecture to provide method for converting the watchdog
thresh to counter period. For arm64 we're using the max CPU frequency for
doing the conversion which is from cpufreq driver. But some cpufreq driver
are registered lately, for example cppc_cpufreq will be registered at late
initcall which is after the initialization of watchdog perf (initialized in
armv8_pmuv3 of device initcall). In such case the period of watchdog will not
be accurate enough. Fix this by registering a cpufreq notifier and update the
watchdog period once the cpufreq driver is initialized.

Attach some Q&A raised by Andrew last time:
[1] https://lore.kernel.org/linux-arm-kernel/20250512160612.c10464075df3c7842b13da11@linux-foundation.org/

Q: What is the impact of this change?  Is the current code causing problems?  If so,
   what are they?  How is the end-user experience improved by this change? Important info!
A: This will make NMI watchdog (hardlockup detector) work more accurately. HARDLOCKUP_DETECTOR_PERF
   is driven by the PMU sample interrupts of cpu cycle event with period related to the watchdog
   threshold.  User will set the watchdog threshold in seconds (e.g. 10s by default) and for
   HARDLOCKUP_DETECTOR_PERF we need to convert the seconds to cycles to setup the PMU counter.
   The coversion method is provided by the arhitecture by implementing hw_nmi_get_sample_period().
   For arm64 it's using max_cpufreq to do the conversion:

       cycle_event_period = threshold(s) * max_cpufreq(hz)

   Since arm64 doesn't have an arthictectural way to get the max_cpufreq, we use cpufreq_driver
   to get it. If cpufreq_driver is not available, currently we use a safe max_cpufreq as 5GHz.
   Without this patchset, if the cpufreq_driver is initialized after the hardlockup detector we'll
   use 5GHz for calculating the event period. It's my case here as described in the coverletter.
   That means in the default case (10s threshold) if the real max_cpufreq is 2.5GHz, the NMI watchdog
   is actually working in a 20s period. With this patchset the period can be calibrated after
   the cpufreq driver is initialized, much more accurate.

Q: As far as I can tell, this patchset impacts arm64 only.  Do you think that other architectures
   should implement this?
A: It's highly depends on the architecure's implementation of their HARDLOCKUP_DETECTOR_PERF (the
   implementation of hw_nmi_get_sample_period()). If other architectures can gain the max_cpufreq
   without cpufreq driver they don't have this problems (seems x86 implement in another way
   and can gain the cpufreq by some arhitectural way).

Q: As far as I can tell, this patchset affects all cpufreq drivers which use late_initcall()
   (on arm64, of course).  Is this correct?
A: cpufreq drivers are not touched. we registered a notifier block to the cpufreq framework
   so the changes to the cpufreq will notify us to see whether it's needed to modify the
   event period.

Q: It is asserted that we should use the *maximum* possible CPU frequency for this calculation.
   Why?  I assume this is because we care about the minimum watchdog period?
A: Because it's impossbile to use the current frequency for calculating the counter period as
   it may change at any time. Using maximum frequency will make the real period as close as
   possible to the expected watchdog thresh in a simplest way.

Change since v2:
- Address the comments from Will and collect tags for Doug and Jie, thanks
Link: https://lore.kernel.org/linux-arm-kernel/20250619114805.16968-1-yangyicong@huawei.com/#t

Change since v1:
- Handle the theoretical race condition and other comments from Doug, thanks
Link: https://lore.kernel.org/all/20250307021811.46981-1-yangyicong@huawei.com/

Yicong Yang (2):
  watchdog/perf: Provide function for adjusting the event period
  arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh

 arch/arm64/kernel/watchdog_hld.c | 58 ++++++++++++++++++++++++++++++++
 include/linux/nmi.h              |  2 ++
 kernel/watchdog_perf.c           | 22 ++++++++++++
 3 files changed, 82 insertions(+)

-- 
2.24.0



^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v3 1/2] watchdog/perf: Provide function for adjusting the event period
  2025-07-01 11:02 [PATCH v3 0/2] Update the watchdog period according to real CPU frequency Yicong Yang
@ 2025-07-01 11:02 ` Yicong Yang
  2025-07-01 11:02 ` [PATCH v3 2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh Yicong Yang
  2025-07-04 17:44 ` [PATCH v3 0/2] Update the watchdog period according to real CPU frequency Will Deacon
  2 siblings, 0 replies; 4+ messages in thread
From: Yicong Yang @ 2025-07-01 11:02 UTC (permalink / raw)
  To: catalin.marinas, will, akpm, dianders, kernelfans, tglx, song,
	linux-arm-kernel
  Cc: jonathan.cameron, zhanjie9, prime.zeng, yangyicong, linuxarm

From: Yicong Yang <yangyicong@hisilicon.com>

Architecture's using perf events for hard lockup detection needs to
convert the watchdog_thresh to the event's period, some architecture
for example arm64 perform this conversion using the CPU's maximum
frequency which will be acquired by cpufreq. However by the time
the lockup detector's initialized the cpufreq driver may not be
initialized, thus launch a watchdog with inaccurate period. Provide
a function hardlockup_detector_perf_adjust_period() to allowing
adjust the event period. Then architecture can update with more
accurate period if cpufreq is initialized.

Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
 include/linux/nmi.h    |  2 ++
 kernel/watchdog_perf.c | 22 ++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index e78fa535f61d..cf3c6ab408aa 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -103,10 +103,12 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
 extern void hardlockup_config_perf_event(const char *str);
+extern void hardlockup_detector_perf_adjust_period(u64 period);
 #else
 static inline void hardlockup_detector_perf_stop(void) { }
 static inline void hardlockup_detector_perf_restart(void) { }
 static inline void hardlockup_config_perf_event(const char *str) { }
+static inline void hardlockup_detector_perf_adjust_period(u64 period) { }
 #endif
 
 void watchdog_hardlockup_stop(void);
diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index 75af12ff774e..9c58f5b4381d 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -186,6 +186,28 @@ void watchdog_hardlockup_disable(unsigned int cpu)
 	}
 }
 
+/**
+ * hardlockup_detector_perf_adjust_period - Adjust the event period due
+ *                                          to current cpu frequency change
+ * @period: The target period to be set
+ */
+void hardlockup_detector_perf_adjust_period(u64 period)
+{
+	struct perf_event *event = this_cpu_read(watchdog_ev);
+
+	if (!(watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED))
+		return;
+
+	if (!event)
+		return;
+
+	if (event->attr.sample_period == period)
+		return;
+
+	if (perf_event_period(event, period))
+		pr_err("failed to change period to %llu\n", period);
+}
+
 /**
  * hardlockup_detector_perf_stop - Globally stop watchdog events
  *
-- 
2.24.0



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v3 2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh
  2025-07-01 11:02 [PATCH v3 0/2] Update the watchdog period according to real CPU frequency Yicong Yang
  2025-07-01 11:02 ` [PATCH v3 1/2] watchdog/perf: Provide function for adjusting the event period Yicong Yang
@ 2025-07-01 11:02 ` Yicong Yang
  2025-07-04 17:44 ` [PATCH v3 0/2] Update the watchdog period according to real CPU frequency Will Deacon
  2 siblings, 0 replies; 4+ messages in thread
From: Yicong Yang @ 2025-07-01 11:02 UTC (permalink / raw)
  To: catalin.marinas, will, akpm, dianders, kernelfans, tglx, song,
	linux-arm-kernel
  Cc: jonathan.cameron, zhanjie9, prime.zeng, yangyicong, linuxarm

From: Yicong Yang <yangyicong@hisilicon.com>

arm64 depends on the cpufreq driver to gain the maximum cpu frequency
to convert the watchdog_thresh to perf event period. cpufreq drivers
like cppc_cpufreq will be initialized lately after the initializing of
the hard lockup detector so just use a safe cpufreq which will be
inaccurency. Use a cpufreq notifier to adjust the event's period to
a more accurate one.

Reviewed-by: Jie Zhan <zhanjie9@hisilicon.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
 arch/arm64/kernel/watchdog_hld.c | 58 ++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c
index dcd25322127c..3093037dcb7b 100644
--- a/arch/arm64/kernel/watchdog_hld.c
+++ b/arch/arm64/kernel/watchdog_hld.c
@@ -34,3 +34,61 @@ bool __init arch_perf_nmi_is_available(void)
 	 */
 	return arm_pmu_irq_is_nmi();
 }
+
+static int watchdog_perf_update_period(void *data)
+{
+	int cpu = smp_processor_id();
+	u64 max_cpu_freq, new_period;
+
+	max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL;
+	if (!max_cpu_freq)
+		return 0;
+
+	new_period = watchdog_thresh * max_cpu_freq;
+	hardlockup_detector_perf_adjust_period(new_period);
+
+	return 0;
+}
+
+static int watchdog_freq_notifier_callback(struct notifier_block *nb,
+					   unsigned long val, void *data)
+{
+	struct cpufreq_policy *policy = data;
+	int cpu;
+
+	if (val != CPUFREQ_CREATE_POLICY)
+		return NOTIFY_DONE;
+
+	/*
+	 * Let each online CPU related to the policy update the period by their
+	 * own. This will serialize with the framework on start/stop the lockup
+	 * detector (softlockup_{start,stop}_all) and avoid potential race
+	 * condition. Otherwise we may have below theoretical race condition:
+	 * (core 0/1 share the same policy)
+	 * [core 0]                      [core 1]
+	 *                               hardlockup_detector_event_create()
+	 *                                 hw_nmi_get_sample_period()
+	 * (cpufreq registered, notifier callback invoked)
+	 * watchdog_freq_notifier_callback()
+	 *   watchdog_perf_update_period()
+	 *   (since core 1's event's not yet created,
+	 *    the period is not set)
+	 *                                 perf_event_create_kernel_counter()
+	 *                                 (event's period is SAFE_MAX_CPU_FREQ)
+	 */
+	for_each_cpu(cpu, policy->cpus)
+		smp_call_on_cpu(cpu, watchdog_perf_update_period, NULL, false);
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block watchdog_freq_notifier = {
+	.notifier_call = watchdog_freq_notifier_callback,
+};
+
+static int __init init_watchdog_freq_notifier(void)
+{
+	return cpufreq_register_notifier(&watchdog_freq_notifier,
+					 CPUFREQ_POLICY_NOTIFIER);
+}
+core_initcall(init_watchdog_freq_notifier);
-- 
2.24.0



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v3 0/2] Update the watchdog period according to real CPU frequency
  2025-07-01 11:02 [PATCH v3 0/2] Update the watchdog period according to real CPU frequency Yicong Yang
  2025-07-01 11:02 ` [PATCH v3 1/2] watchdog/perf: Provide function for adjusting the event period Yicong Yang
  2025-07-01 11:02 ` [PATCH v3 2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh Yicong Yang
@ 2025-07-04 17:44 ` Will Deacon
  2 siblings, 0 replies; 4+ messages in thread
From: Will Deacon @ 2025-07-04 17:44 UTC (permalink / raw)
  To: catalin.marinas, akpm, dianders, kernelfans, tglx, song,
	linux-arm-kernel, Yicong Yang
  Cc: kernel-team, Will Deacon, jonathan.cameron, zhanjie9, prime.zeng,
	yangyicong, linuxarm

On Tue, 01 Jul 2025 19:02:12 +0800, Yicong Yang wrote:
> watchdog perf needs architecture to provide method for converting the watchdog
> thresh to counter period. For arm64 we're using the max CPU frequency for
> doing the conversion which is from cpufreq driver. But some cpufreq driver
> are registered lately, for example cppc_cpufreq will be registered at late
> initcall which is after the initialization of watchdog perf (initialized in
> armv8_pmuv3 of device initcall). In such case the period of watchdog will not
> be accurate enough. Fix this by registering a cpufreq notifier and update the
> watchdog period once the cpufreq driver is initialized.
> 
> [...]

Applied to will (for-next/perf), thanks!

[1/2] watchdog/perf: Provide function for adjusting the event period
      https://git.kernel.org/will/c/60bc47b5a0b1
[2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh
      https://git.kernel.org/will/c/7a884442aeb6

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-07-04 18:22 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-01 11:02 [PATCH v3 0/2] Update the watchdog period according to real CPU frequency Yicong Yang
2025-07-01 11:02 ` [PATCH v3 1/2] watchdog/perf: Provide function for adjusting the event period Yicong Yang
2025-07-01 11:02 ` [PATCH v3 2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh Yicong Yang
2025-07-04 17:44 ` [PATCH v3 0/2] Update the watchdog period according to real CPU frequency Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).