* [PATCH] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race
@ 2026-01-22  4:27 Qiliang Yuan
  2026-01-22  5:24 ` [PATCH v2] " Qiliang Yuan
  0 siblings, 1 reply; 14+ messages in thread
From: Qiliang Yuan @ 2026-01-22  4:27 UTC
  To: Li Huafei, Ingo Molnar, Andrew Morton
  Cc: Thorsten Blum, Yicong Yang, Jinchao Wang, linux-kernel,
	Qiliang Yuan, Shouxin Sun, Junnan Zhang, Qiliang Yuan

During the early initialization of the hardlockup detector, the
hardlockup_detector_perf_init() function probes for PMU hardware
availability. It originally used hardlockup_detector_event_create(),
which interacts with the per-cpu 'watchdog_ev' variable.

If the initializing task migrates to another CPU during this probe
phase, two issues arise:
1. The 'watchdog_ev' pointer on the original CPU is set but not
   cleared, leaving a stale pointer to a freed perf event.
2. The 'watchdog_ev' pointer on the new CPU might be incorrectly
   cleared.

This race condition was observed in console logs (captured by adding
debug printks):

[23.038376] hardlockup_detector_perf_init 313 cur_cpu=2
...
[23.076385] hardlockup_detector_event_create 203 cpu(cur)=2 set watchdog_ev
...
[23.095788] perf_event_release_kernel 4623 cur_cpu=2
...
[23.116963] lockup_detector_reconfigure 577 cur_cpu=3

The log shows the task started on CPU 2, set watchdog_ev on CPU 2,
released the event on CPU 2, but then migrated to CPU 3 before the
cleanup logic (which would clear watchdog_ev) could run. This left
watchdog_ev on CPU 2 pointing to a freed event.
Later, when the watchdog is enabled/disabled on CPU 2, this stale
pointer leads to a Use-After-Free (UAF) in perf_event_disable(), as
detected by KASAN:

[26.539140] ==================================================================
[26.540732] BUG: KASAN: use-after-free in perf_event_ctx_lock_nested.isra.72+0x6b/0x140
[26.542442] Read of size 8 at addr ff110006b360d718 by task kworker/2:1/94
[26.543954]
[26.544744] CPU: 2 PID: 94 Comm: kworker/2:1 Not tainted 4.19.90-debugkasan #11
[26.546505] Hardware name: GoStack Foundation OpenStack Nova, BIOS 1.16.3-3.ctl3 04/01/2014
[26.548256] Workqueue: events smp_call_on_cpu_callback
[26.549267] Call Trace:
[26.549936]  dump_stack+0x8b/0xbb
[26.550731]  print_address_description+0x6a/0x270
[26.551688]  kasan_report+0x179/0x2c0
[26.552519]  ? perf_event_ctx_lock_nested.isra.72+0x6b/0x140
[26.553654]  ? watchdog_disable+0x80/0x80
[26.553657]  perf_event_ctx_lock_nested.isra.72+0x6b/0x140
[26.556951]  ? dump_stack+0xa0/0xbb
[26.564006]  ? watchdog_disable+0x80/0x80
[26.564886]  perf_event_disable+0xa/0x30
[26.565746]  hardlockup_detector_perf_disable+0x1b/0x60
[26.566776]  watchdog_disable+0x51/0x80
[26.567624]  softlockup_stop_fn+0x11/0x20
[26.568499]  smp_call_on_cpu_callback+0x5b/0xb0
[26.569443]  process_one_work+0x389/0x770
[26.570311]  worker_thread+0x57/0x5a0
[26.571124]  ? process_one_work+0x770/0x770
[26.572031]  kthread+0x1ae/0x1d0
[26.572810]  ? kthread_create_worker_on_cpu+0xc0/0xc0
[26.573821]  ret_from_fork+0x1f/0x40
[26.574638]
[26.575178] Allocated by task 1:
[26.575990]  kasan_kmalloc+0xa0/0xd0
[26.576814]  kmem_cache_alloc_trace+0xf3/0x1e0
[26.577732]  perf_event_alloc.part.89+0xb5/0x12b0
[26.578700]  perf_event_create_kernel_counter+0x1e/0x1d0
[26.579728]  hardlockup_detector_event_create+0x4e/0xc0
[26.580744]  hardlockup_detector_perf_init+0x2f/0x60
[26.581746]  lockup_detector_init+0x85/0xdc
[26.582645]  kernel_init_freeable+0x34d/0x40e
[26.583568]  kernel_init+0xf/0x130
[26.584428]  ret_from_fork+0x1f/0x40
[26.584429]
[26.584430] Freed by task 0:
[26.584433]  __kasan_slab_free+0x130/0x180
[26.584436]  kfree+0x90/0x1a0
[26.589641]  rcu_process_callbacks+0x2cb/0x6e0
[26.590935]  __do_softirq+0x119/0x3a2
[26.591965]
[26.592630] The buggy address belongs to the object at ff110006b360d500
[26.592630]  which belongs to the cache kmalloc-2048 of size 2048
[26.592633] The buggy address is located 536 bytes inside of
[26.592633]  2048-byte region [ff110006b360d500, ff110006b360dd00)
[26.592634] The buggy address belongs to the page:
[26.592637] page:ffd400001acd8200 count:1 mapcount:0 mapping:ff11000107c0e800 index:0x0 compound_mapcount: 0
[26.600959] flags: 0x17ffffc0010200(slab|head)
[26.601891] raw: 0017ffffc0010200 dead000000000100 dead000000000200 ff11000107c0e800
[26.603541] raw: 0000000000000000 00000000800f000f 00000001ffffffff 0000000000000000
[26.605546] page dumped because: kasan: bad access detected
[26.606788]
[26.607351] Memory state around the buggy address:
[26.608556]  ff110006b360d600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[26.610565]  ff110006b360d680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[26.610567] >ff110006b360d700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[26.610568]                          ^
[26.610570]  ff110006b360d780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[26.610573]  ff110006b360d800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[26.618955]
==================================================================

Fix this by making the probe logic stateless. Use a local variable for
the perf event and avoid accessing the per-cpu 'watchdog_ev' during
initialization. This ensures that the probe event is always properly
released regardless of task migration, and no stale global state is
left behind.

Signed-off-by: Shouxin Sun <sunshx@chinatelecom.cn>
Signed-off-by: Junnan Zhang <zhangjn11@chinatelecom.cn>
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
---
 kernel/watchdog_perf.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index d3ca70e3c256..5066be7bba03 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -264,18 +264,38 @@ bool __weak __init arch_perf_nmi_is_available(void)
 int __init watchdog_hardlockup_probe(void)
 {
 	int ret;
+	struct perf_event_attr *wd_attr = &wd_hw_attr;
+	struct perf_event *evt;
+	unsigned int cpu;
 
 	if (!arch_perf_nmi_is_available())
 		return -ENODEV;
 
-	ret = hardlockup_detector_event_create();
+	/*
+	 * Test hardware PMU availability. Avoid using
+	 * hardlockup_detector_event_create() to prevent migration-related
+	 * stale pointers in the per-cpu watchdog_ev during early probe.
+	 */
+	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
+	if (!wd_attr->sample_period)
+		return -EINVAL;
 
-	if (ret) {
+	/*
+	 * Use raw_smp_processor_id() for probing in preemptible init code.
+	 * Migration after reading ID is acceptable as counter creation on
+	 * the old CPU is sufficient for the probe.
+	 */
+	cpu = raw_smp_processor_id();
+	evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
+					       watchdog_overflow_callback, NULL);
+	if (IS_ERR(evt)) {
 		pr_info("Perf NMI watchdog permanently disabled\n");
+		ret = PTR_ERR(evt);
 	} else {
-		perf_event_release_kernel(this_cpu_read(watchdog_ev));
-		this_cpu_write(watchdog_ev, NULL);
+		perf_event_release_kernel(evt);
+		ret = 0;
 	}
+
 	return ret;
 }

-- 
2.51.0
* [PATCH v2] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race
  2026-01-22  4:27 [PATCH] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race Qiliang Yuan
@ 2026-01-22  5:24 ` Qiliang Yuan
  2026-01-22 21:59   ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Qiliang Yuan @ 2026-01-22  5:24 UTC
  To: realwujing, akpm, lihuafei1, mingo
  Cc: linux-kernel, sunshx, thorsten.blum, wangjinchao600, yangyicong,
	yuanql9, zhangjn11, stable

During the early initialization of the hardlockup detector, the
hardlockup_detector_perf_init() function probes for PMU hardware
availability. It originally used hardlockup_detector_event_create(),
which interacts with the per-cpu 'watchdog_ev' variable.

If the initializing task migrates to another CPU during this probe
phase, two issues arise:
1. The 'watchdog_ev' pointer on the original CPU is set but not
   cleared, leaving a stale pointer to a freed perf event.
2. The 'watchdog_ev' pointer on the new CPU might be incorrectly
   cleared.

This race condition was observed in console logs (captured by adding
debug printks):

[23.038376] hardlockup_detector_perf_init 313 cur_cpu=2
...
[23.076385] hardlockup_detector_event_create 203 cpu(cur)=2 set watchdog_ev
...
[23.095788] perf_event_release_kernel 4623 cur_cpu=2
...
[23.116963] lockup_detector_reconfigure 577 cur_cpu=3

The log shows the task started on CPU 2, set watchdog_ev on CPU 2,
released the event on CPU 2, but then migrated to CPU 3 before the
cleanup logic (which would clear watchdog_ev) could run. This left
watchdog_ev on CPU 2 pointing to a freed event.
Later, when the watchdog is enabled/disabled on CPU 2, this stale
pointer leads to a Use-After-Free (UAF) in perf_event_disable(), as
detected by KASAN:

[26.539140] ==================================================================
[26.540732] BUG: KASAN: use-after-free in perf_event_ctx_lock_nested.isra.72+0x6b/0x140
[26.542442] Read of size 8 at addr ff110006b360d718 by task kworker/2:1/94
[26.543954]
[26.544744] CPU: 2 PID: 94 Comm: kworker/2:1 Not tainted 4.19.90-debugkasan #11
[26.546505] Hardware name: GoStack Foundation OpenStack Nova, BIOS 1.16.3-3.ctl3 04/01/2014
[26.548256] Workqueue: events smp_call_on_cpu_callback
[26.549267] Call Trace:
[26.549936]  dump_stack+0x8b/0xbb
[26.550731]  print_address_description+0x6a/0x270
[26.551688]  kasan_report+0x179/0x2c0
[26.552519]  ? perf_event_ctx_lock_nested.isra.72+0x6b/0x140
[26.553654]  ? watchdog_disable+0x80/0x80
[26.553657]  perf_event_ctx_lock_nested.isra.72+0x6b/0x140
[26.556951]  ? dump_stack+0xa0/0xbb
[26.564006]  ? watchdog_disable+0x80/0x80
[26.564886]  perf_event_disable+0xa/0x30
[26.565746]  hardlockup_detector_perf_disable+0x1b/0x60
[26.566776]  watchdog_disable+0x51/0x80
[26.567624]  softlockup_stop_fn+0x11/0x20
[26.568499]  smp_call_on_cpu_callback+0x5b/0xb0
[26.569443]  process_one_work+0x389/0x770
[26.570311]  worker_thread+0x57/0x5a0
[26.571124]  ? process_one_work+0x770/0x770
[26.572031]  kthread+0x1ae/0x1d0
[26.572810]  ? kthread_create_worker_on_cpu+0xc0/0xc0
[26.573821]  ret_from_fork+0x1f/0x40
[26.574638]
[26.575178] Allocated by task 1:
[26.575990]  kasan_kmalloc+0xa0/0xd0
[26.576814]  kmem_cache_alloc_trace+0xf3/0x1e0
[26.577732]  perf_event_alloc.part.89+0xb5/0x12b0
[26.578700]  perf_event_create_kernel_counter+0x1e/0x1d0
[26.579728]  hardlockup_detector_event_create+0x4e/0xc0
[26.580744]  hardlockup_detector_perf_init+0x2f/0x60
[26.581746]  lockup_detector_init+0x85/0xdc
[26.582645]  kernel_init_freeable+0x34d/0x40e
[26.583568]  kernel_init+0xf/0x130
[26.584428]  ret_from_fork+0x1f/0x40
[26.584429]
[26.584430] Freed by task 0:
[26.584433]  __kasan_slab_free+0x130/0x180
[26.584436]  kfree+0x90/0x1a0
[26.589641]  rcu_process_callbacks+0x2cb/0x6e0
[26.590935]  __do_softirq+0x119/0x3a2
[26.591965]
[26.592630] The buggy address belongs to the object at ff110006b360d500
[26.592630]  which belongs to the cache kmalloc-2048 of size 2048
[26.592633] The buggy address is located 536 bytes inside of
[26.592633]  2048-byte region [ff110006b360d500, ff110006b360dd00)
[26.592634] The buggy address belongs to the page:
[26.592637] page:ffd400001acd8200 count:1 mapcount:0 mapping:ff11000107c0e800 index:0x0 compound_mapcount: 0
[26.600959] flags: 0x17ffffc0010200(slab|head)
[26.601891] raw: 0017ffffc0010200 dead000000000100 dead000000000200 ff11000107c0e800
[26.603541] raw: 0000000000000000 00000000800f000f 00000001ffffffff 0000000000000000
[26.605546] page dumped because: kasan: bad access detected
[26.606788]
[26.607351] Memory state around the buggy address:
[26.608556]  ff110006b360d600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[26.610565]  ff110006b360d680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[26.610567] >ff110006b360d700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[26.610568]                          ^
[26.610570]  ff110006b360d780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[26.610573]  ff110006b360d800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[26.618955]
==================================================================

Fix this by making the probe logic stateless. Use a local variable for
the perf event and avoid accessing the per-cpu 'watchdog_ev' during
initialization. This ensures that the probe event is always properly
released regardless of task migration, and no stale global state is
left behind.

Cc: stable@vger.kernel.org
Signed-off-by: Shouxin Sun <sunshx@chinatelecom.cn>
Signed-off-by: Junnan Zhang <zhangjn11@chinatelecom.cn>
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
---
v2:
  - Add Cc: stable@vger.kernel.org tag.
---
 kernel/watchdog_perf.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index d3ca70e3c256..5066be7bba03 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -264,18 +264,38 @@ bool __weak __init arch_perf_nmi_is_available(void)
 int __init watchdog_hardlockup_probe(void)
 {
 	int ret;
+	struct perf_event_attr *wd_attr = &wd_hw_attr;
+	struct perf_event *evt;
+	unsigned int cpu;
 
 	if (!arch_perf_nmi_is_available())
 		return -ENODEV;
 
-	ret = hardlockup_detector_event_create();
+	/*
+	 * Test hardware PMU availability. Avoid using
+	 * hardlockup_detector_event_create() to prevent migration-related
+	 * stale pointers in the per-cpu watchdog_ev during early probe.
+	 */
+	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
+	if (!wd_attr->sample_period)
+		return -EINVAL;
 
-	if (ret) {
+	/*
+	 * Use raw_smp_processor_id() for probing in preemptible init code.
+	 * Migration after reading ID is acceptable as counter creation on
+	 * the old CPU is sufficient for the probe.
+	 */
+	cpu = raw_smp_processor_id();
+	evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
+					       watchdog_overflow_callback, NULL);
+	if (IS_ERR(evt)) {
 		pr_info("Perf NMI watchdog permanently disabled\n");
+		ret = PTR_ERR(evt);
 	} else {
-		perf_event_release_kernel(this_cpu_read(watchdog_ev));
-		this_cpu_write(watchdog_ev, NULL);
+		perf_event_release_kernel(evt);
+		ret = 0;
 	}
+
 	return ret;
 }

-- 
2.51.0
* Re: [PATCH v2] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race
  2026-01-22  5:24 ` [PATCH v2] " Qiliang Yuan
@ 2026-01-22 21:59   ` Andrew Morton
  2026-01-23  2:39     ` Doug Anderson
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2026-01-22 21:59 UTC
  To: Qiliang Yuan
  Cc: lihuafei1, mingo, linux-kernel, sunshx, thorsten.blum,
	wangjinchao600, yangyicong, yuanql9, zhangjn11, stable, Song Liu,
	Douglas Anderson

On Thu, 22 Jan 2026 00:24:42 -0500 Qiliang Yuan <realwujing@gmail.com> wrote:

> During the early initialization of the hardlockup detector, the
> hardlockup_detector_perf_init() function probes for PMU hardware
> availability. It originally used hardlockup_detector_event_create(),
> which interacts with the per-cpu 'watchdog_ev' variable.

Thanks.

For a -stable backport it's desirable to have a Fixes: target. But it
appears this is very old code?

Also, I'm not sure who best to ask to help review this change. I'll
add a few cc's here.

> [...]
* Re: [PATCH v2] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race
  2026-01-22 21:59   ` Andrew Morton
@ 2026-01-23  2:39     ` Doug Anderson
  2026-01-23  6:34       ` [PATCH v3] " Qiliang Yuan
  0 siblings, 1 reply; 14+ messages in thread
From: Doug Anderson @ 2026-01-23  2:39 UTC
  To: Andrew Morton
  Cc: Qiliang Yuan, lihuafei1, mingo, linux-kernel, sunshx,
	thorsten.blum, wangjinchao600, yangyicong, yuanql9, zhangjn11,
	stable, Song Liu

Hi,

On Thu, Jan 22, 2026 at 1:59 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 22 Jan 2026 00:24:42 -0500 Qiliang Yuan <realwujing@gmail.com> wrote:
>
> > During the early initialization of the hardlockup detector, the
> > hardlockup_detector_perf_init() function probes for PMU hardware
> > availability. It originally used hardlockup_detector_event_create(),
> > which interacts with the per-cpu 'watchdog_ev' variable.
>
> Thanks.
>
> For a -stable backport it's desirable to have a Fixes: target. But it
> appears this is very old code?
>
> Also, I'm not sure who best to ask to help review this change. I'll
> add a few cc's here.

I'm nowhere near an expert on the perf system or the perf-specific
bits of the hardlockup detector, but I took a quick look...

I guess my first question is: why didn't the
"WARN_ON(!is_percpu_thread());" in hardlockup_detector_event_create()
hit in this case?

I guess my second question is: your new code doesn't seem to use
"fallback_wd_hw_attr" if there is an error. Is that important?

My last thought is: why not just move the "this_cpu_write(watchdog_ev,
evt)" out of hardlockup_detector_event_create() and into
watchdog_hardlockup_enable()? You can just return evt from
hardlockup_detector_event_create(), right? Then you can keep using
hardlockup_detector_event_create() and share the code...

Full disclosure: I don't know this code and I looked at it quickly. If
something I said sounds stupid, please call me out on it.

-Doug
* [PATCH v3] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race
  2026-01-23  2:39     ` Doug Anderson
@ 2026-01-23  6:34       ` Qiliang Yuan
  2026-01-24  0:01         ` Doug Anderson
  0 siblings, 1 reply; 14+ messages in thread
From: Qiliang Yuan @ 2026-01-23  6:34 UTC
  To: dianders, akpm
  Cc: lihuafei1, linux-kernel, mingo, realwujing, song, stable, sunshx,
	thorsten.blum, wangjinchao600, yangyicong, yuanql9, zhangjn11,
	mm-commits

During the early initialization of the hardlockup detector, the
hardlockup_detector_perf_init() function probes for PMU hardware
availability. It originally used hardlockup_detector_event_create(),
which interacts with the per-cpu 'watchdog_ev' variable.

If the initializing task migrates to another CPU during this probe
phase, two issues arise:
1. The 'watchdog_ev' pointer on the original CPU is set but not
   cleared, leaving a stale pointer to a freed perf event.
2. The 'watchdog_ev' pointer on the new CPU might be incorrectly
   cleared.

This race condition was observed in console logs (captured by adding
debug printks):

[23.038376] hardlockup_detector_perf_init 313 cur_cpu=2
...
[23.076385] hardlockup_detector_event_create 203 cpu(cur)=2 set watchdog_ev
...
[23.095788] perf_event_release_kernel 4623 cur_cpu=2
...
[23.116963] lockup_detector_reconfigure 577 cur_cpu=3

The log shows the task started on CPU 2, set watchdog_ev on CPU 2,
released the event on CPU 2, but then migrated to CPU 3 before the
cleanup logic (which would clear watchdog_ev) could run. This left
watchdog_ev on CPU 2 pointing to a freed event.
Later, when the watchdog is enabled/disabled on CPU 2, this stale
pointer leads to a Use-After-Free (UAF) in perf_event_disable(), as
detected by KASAN:

[26.539140] ==================================================================
[26.540732] BUG: KASAN: use-after-free in perf_event_ctx_lock_nested.isra.72+0x6b/0x140
[26.542442] Read of size 8 at addr ff110006b360d718 by task kworker/2:1/94
[26.543954]
[26.544744] CPU: 2 PID: 94 Comm: kworker/2:1 Not tainted 4.19.90-debugkasan #11
[26.546505] Hardware name: GoStack Foundation OpenStack Nova, BIOS 1.16.3-3.ctl3 04/01/2014
[26.548256] Workqueue: events smp_call_on_cpu_callback
[26.549267] Call Trace:
[26.549936]  dump_stack+0x8b/0xbb
[26.550731]  print_address_description+0x6a/0x270
[26.551688]  kasan_report+0x179/0x2c0
[26.552519]  ? perf_event_ctx_lock_nested.isra.72+0x6b/0x140
[26.553654]  ? watchdog_disable+0x80/0x80
[26.553657]  perf_event_ctx_lock_nested.isra.72+0x6b/0x140
[26.556951]  ? dump_stack+0xa0/0xbb
[26.564006]  ? watchdog_disable+0x80/0x80
[26.564886]  perf_event_disable+0xa/0x30
[26.565746]  hardlockup_detector_perf_disable+0x1b/0x60
[26.566776]  watchdog_disable+0x51/0x80
[26.567624]  softlockup_stop_fn+0x11/0x20
[26.568499]  smp_call_on_cpu_callback+0x5b/0xb0
[26.569443]  process_one_work+0x389/0x770
[26.570311]  worker_thread+0x57/0x5a0
[26.571124]  ? process_one_work+0x770/0x770
[26.572031]  kthread+0x1ae/0x1d0
[26.572810]  ? kthread_create_worker_on_cpu+0xc0/0xc0
[26.573821]  ret_from_fork+0x1f/0x40
[26.574638]
[26.575178] Allocated by task 1:
[26.575990]  kasan_kmalloc+0xa0/0xd0
[26.576814]  kmem_cache_alloc_trace+0xf3/0x1e0
[26.577732]  perf_event_alloc.part.89+0xb5/0x12b0
[26.578700]  perf_event_create_kernel_counter+0x1e/0x1d0
[26.579728]  hardlockup_detector_event_create+0x4e/0xc0
[26.580744]  hardlockup_detector_perf_init+0x2f/0x60
[26.581746]  lockup_detector_init+0x85/0xdc
[26.582645]  kernel_init_freeable+0x34d/0x40e
[26.583568]  kernel_init+0xf/0x130
[26.584428]  ret_from_fork+0x1f/0x40
[26.584429]
[26.584430] Freed by task 0:
[26.584433]  __kasan_slab_free+0x130/0x180
[26.584436]  kfree+0x90/0x1a0
[26.589641]  rcu_process_callbacks+0x2cb/0x6e0
[26.590935]  __do_softirq+0x119/0x3a2
[26.591965]
[26.592630] The buggy address belongs to the object at ff110006b360d500
[26.592630]  which belongs to the cache kmalloc-2048 of size 2048
[26.592633] The buggy address is located 536 bytes inside of
[26.592633]  2048-byte region [ff110006b360d500, ff110006b360dd00)
[26.592634] The buggy address belongs to the page:
[26.592637] page:ffd400001acd8200 count:1 mapcount:0 mapping:ff11000107c0e800 index:0x0 compound_mapcount: 0
[26.600959] flags: 0x17ffffc0010200(slab|head)
[26.601891] raw: 0017ffffc0010200 dead000000000100 dead000000000200 ff11000107c0e800
[26.603541] raw: 0000000000000000 00000000800f000f 00000001ffffffff 0000000000000000
[26.605546] page dumped because: kasan: bad access detected
[26.606788]
[26.607351] Memory state around the buggy address:
[26.608556]  ff110006b360d600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[26.610565]  ff110006b360d680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[26.610567] >ff110006b360d700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[26.610568]                          ^
[26.610570]  ff110006b360d780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[26.610573]  ff110006b360d800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[26.618955]
==================================================================

Fix this by refactoring hardlockup_detector_event_create() to return
the created perf event instead of directly assigning it to the per-cpu
variable. This allows the probe logic to reuse the creation code
(including fallback logic) without affecting the global state, ensuring
that task migration during probe no longer leaves stale pointers in
'watchdog_ev'.

Signed-off-by: Shouxin Sun <sunshx@chinatelecom.cn>
Signed-off-by: Junnan Zhang <zhangjn11@chinatelecom.cn>
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
Cc: Song Liu <song@kernel.org>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Jinchao Wang <wangjinchao600@gmail.com>
Cc: <stable@vger.kernel.org>
---
v3: Refactor creation logic to return event pointer; restores PMU cycle
    fallback and unifies paths.
v2: Add Cc: <stable@vger.kernel.org>.
v1: Avoid 'watchdog_ev' in probe path by manually creating and
    releasing a local perf event.

 kernel/watchdog_perf.c | 51 ++++++++++++++++++++++++------------------
 1 file changed, 29 insertions(+), 22 deletions(-)

diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index d3ca70e3c256..d045b92bc514 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -118,18 +118,11 @@ static void watchdog_overflow_callback(struct perf_event *event,
 	watchdog_hardlockup_check(smp_processor_id(), regs);
 }
 
-static int hardlockup_detector_event_create(void)
+static struct perf_event *hardlockup_detector_event_create(unsigned int cpu)
 {
-	unsigned int cpu;
 	struct perf_event_attr *wd_attr;
 	struct perf_event *evt;
 
-	/*
-	 * Preemption is not disabled because memory will be allocated.
-	 * Ensure CPU-locality by calling this in per-CPU kthread.
-	 */
-	WARN_ON(!is_percpu_thread());
-	cpu = raw_smp_processor_id();
 	wd_attr = &wd_hw_attr;
 	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
 
@@ -143,14 +136,7 @@ static int hardlockup_detector_event_create(void)
 						       watchdog_overflow_callback, NULL);
 	}
 
-	if (IS_ERR(evt)) {
-		pr_debug("Perf event create on CPU %d failed with %ld\n", cpu,
-			 PTR_ERR(evt));
-		return PTR_ERR(evt);
-	}
-	WARN_ONCE(this_cpu_read(watchdog_ev), "unexpected watchdog_ev leak");
-	this_cpu_write(watchdog_ev, evt);
-	return 0;
+	return evt;
 }
 
 /**
@@ -159,17 +145,26 @@ static int hardlockup_detector_event_create(void)
  */
 void watchdog_hardlockup_enable(unsigned int cpu)
 {
+	struct perf_event *evt;
+
 	WARN_ON_ONCE(cpu != smp_processor_id());
 
-	if (hardlockup_detector_event_create())
+	evt = hardlockup_detector_event_create(cpu);
+	if (IS_ERR(evt)) {
+		pr_debug("Perf event create on CPU %d failed with %ld\n", cpu,
+			 PTR_ERR(evt));
 		return;
+	}
 
 	/* use original value for check */
 	if (!atomic_fetch_inc(&watchdog_cpus))
 		pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
 
+	WARN_ONCE(this_cpu_read(watchdog_ev), "unexpected watchdog_ev leak");
+	this_cpu_write(watchdog_ev, evt);
+
 	watchdog_init_timestamp();
-	perf_event_enable(this_cpu_read(watchdog_ev));
+	perf_event_enable(evt);
 }
 
 /**
@@ -263,19 +258,31 @@ bool __weak __init arch_perf_nmi_is_available(void)
  */
 int __init watchdog_hardlockup_probe(void)
 {
+	struct perf_event *evt;
+	unsigned int cpu;
 	int ret;
 
 	if (!arch_perf_nmi_is_available())
 		return -ENODEV;
 
-	ret = hardlockup_detector_event_create();
+	if (!hw_nmi_get_sample_period(watchdog_thresh))
+		return -EINVAL;
 
-	if (ret) {
+	/*
+	 * Test hardware PMU availability by creating a temporary perf event.
+	 * Allow migration during the check as any successfully created per-cpu
+	 * event validates PMU support. The event is released immediately.
+	 */
+	cpu = raw_smp_processor_id();
+	evt = hardlockup_detector_event_create(cpu);
+	if (IS_ERR(evt)) {
 		pr_info("Perf NMI watchdog permanently disabled\n");
+		ret = PTR_ERR(evt);
 	} else {
-		perf_event_release_kernel(this_cpu_read(watchdog_ev));
-		this_cpu_write(watchdog_ev, NULL);
+		perf_event_release_kernel(evt);
+		ret = 0;
 	}
+
 	return ret;
 }

-- 
2.51.0
* Re: [PATCH v3] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race 2026-01-23 6:34 ` [PATCH v3] " Qiliang Yuan @ 2026-01-24 0:01 ` Doug Anderson 2026-01-24 6:57 ` Qiliang Yuan 2026-01-24 7:08 ` Qiliang Yuan 0 siblings, 2 replies; 14+ messages in thread From: Doug Anderson @ 2026-01-24 0:01 UTC (permalink / raw) To: Qiliang Yuan Cc: akpm, lihuafei1, linux-kernel, mingo, song, stable, sunshx, thorsten.blum, wangjinchao600, yangyicong, yuanql9, zhangjn11, mm-commits Hi, On Thu, Jan 22, 2026 at 10:34 PM Qiliang Yuan <realwujing@gmail.com> wrote: > > During the early initialization of the hardlockup detector, the > hardlockup_detector_perf_init() function probes for PMU hardware availability. > It originally used hardlockup_detector_event_create(), which interacts with > the per-cpu 'watchdog_ev' variable. > > If the initializing task migrates to another CPU during this probe phase, > two issues arise: > 1. The 'watchdog_ev' pointer on the original CPU is set but not cleared, > leaving a stale pointer to a freed perf event. > 2. The 'watchdog_ev' pointer on the new CPU might be incorrectly cleared. > > This race condition was observed in console logs (captured by adding debug printks): > > [23.038376] hardlockup_detector_perf_init 313 cur_cpu=2 Wait a second... The above function hasn't existed for 2.5 years. It was removed in commit d9b3629ade8e ("watchdog/hardlockup: have the perf hardlockup use __weak functions more cleanly"). All that's left in the ToT kernel referencing that function is an old comment... Oh, and I guess I can see below that your stack traces are on 4.19, which is ancient! Things have changed a bit in the meantime. Are you certain that the problem still reproduces on ToT? 
> Signed-off-by: Shouxin Sun <sunshx@chinatelecom.cn> > Signed-off-by: Junnan Zhang <zhangjn11@chinatelecom.cn> > Signed-off-by: Qiliang Yuan <realwujing@gmail.com> > Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn> > Cc: Song Liu <song@kernel.org> > Cc: Douglas Anderson <dianders@chromium.org> > Cc: Jinchao Wang <wangjinchao600@gmail.com> > Cc: Wang Jinchao <wangjinchao600@gmail.com> > Cc: <stable@vger.kernel.org> Probably want a "Fixes" tag? If I had to guess, maybe? Fixes: 930d8f8dbab9 ("watchdog/perf: adapt the watchdog_perf interface for async model") Why? I think before that the init function could only be called directly from the kernel init code and before smp_init(). After that, a worker could call it, which is the case where preemption could have been enabled. Does my logic sound correct? Can you confirm that you're only seeing the problem when the retry hits? In other words when called from lockup_detector_delay_init()? Oh, though if you're on 4.19 then I'm not sure what to think... > @@ -118,18 +118,11 @@ static void watchdog_overflow_callback(struct perf_event *event, > watchdog_hardlockup_check(smp_processor_id(), regs); > } > > -static int hardlockup_detector_event_create(void) > +static struct perf_event *hardlockup_detector_event_create(unsigned int cpu) > { > - unsigned int cpu; > struct perf_event_attr *wd_attr; > struct perf_event *evt; > > - /* > - * Preemption is not disabled because memory will be allocated. > - * Ensure CPU-locality by calling this in per-CPU kthread. > - */ > - WARN_ON(!is_percpu_thread()); I'm still a bit confused why this warning didn't trigger previously. Do you know why? 
> @@ -263,19 +258,31 @@ bool __weak __init arch_perf_nmi_is_available(void) > */ > int __init watchdog_hardlockup_probe(void) > { > + struct perf_event *evt; > + unsigned int cpu; > int ret; > > if (!arch_perf_nmi_is_available()) > return -ENODEV; > > - ret = hardlockup_detector_event_create(); > + if (!hw_nmi_get_sample_period(watchdog_thresh)) > + return -EINVAL; > > - if (ret) { > + /* > + * Test hardware PMU availability by creating a temporary perf event. > + * Allow migration during the check as any successfully created per-cpu > + * event validates PMU support. The event is released immediately. I guess it's implied by the "Allow migration during the check", but I might even word it more strongly and say something like "The cpu we use here is arbitrary, so we don't disable preemption and use raw_smp_processor_id() to get a CPU." I guess that should be OK. Hopefully the arbitrary CPU that you pick doesn't go offline during this function. I don't know "perf" well, but I could imagine that it might be upset if you tried to create a perf event for a CPU that has gone offline. I guess you could be paranoid and surround this with cpu_hotplug_disable() / cpu_hotplug_enable()? I guess overall thoughts: the problem you're describing does seem real, but the fact that your reports are from an ancient 4.19 kernel make me concerned about whether you really tested all the cases on a new kernel... -Doug ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race 2026-01-24 0:01 ` Doug Anderson @ 2026-01-24 6:57 ` Qiliang Yuan 2026-01-24 23:36 ` Doug Anderson 2026-01-24 7:08 ` Qiliang Yuan 1 sibling, 1 reply; 14+ messages in thread From: Qiliang Yuan @ 2026-01-24 6:57 UTC (permalink / raw) To: dianders Cc: akpm, lihuafei1, linux-kernel, mingo, mm-commits, realwujing, song, stable, sunshx, thorsten.blum, wangjinchao600, yangyicong, yuanql9, zhangjn11 Thanks for the detailed review! > Wait a second... The above function hasn't existed for 2.5 years. It > was removed in commit d9b3629ade8e ("watchdog/hardlockup: have the > perf hardlockup use __weak functions more cleanly"). All that's left > in the ToT kernel referencing that function is an old comment... > > Oh, and I guess I can see below that your stack traces are on 4.19, > which is ancient! Things have changed a bit in the meantime. Are you > certain that the problem still reproduces on ToT? The function hardlockup_detector_perf_init() was renamed to watchdog_hardlockup_probe() in commit d9b3629ade8e ("watchdog/hardlockup: have the perf hardlockup use __weak functions more cleanly"). Additionally, the source file was moved from kernel/watchdog_hld.c to kernel/watchdog_perf.c in commit 6ea0d04211a7. The v3 commit message inadvertently retained legacy terminology from the 4.19 kernel; this will be updated in V4 to reflect current ToT naming. The core logic remains the same: the race condition persists despite the renaming and cleanup of the __weak function logic. Regarding ToT reproducibility: while the KASAN report originated from 4.19, the underlying logic is still problematic in ToT. In watchdog_hardlockup_probe(), the call to hardlockup_detector_event_create() still writes to the per-cpu watchdog_ev. Task migration between event creation and the subsequent perf_event_release_kernel() leaves a stale pointer in the watchdog_ev of the original CPU. > Probably want a "Fixes" tag? 
If I had to guess, maybe? > > Fixes: 930d8f8dbab9 ("watchdog/perf: adapt the watchdog_perf interface > for async model") Commit 930d8f8dbab9 introduced the async initialization which allows preemption/migration during the probe phase. This tag will be included in V4. > I'm still a bit confused why this warning didn't trigger previously. > Do you know why? In 4.19, hardlockup_detector_event_create() did not include the WARN_ON(!is_percpu_thread()) check, which was added in later versions. In ToT, this warning is expected to trigger if watchdog_hardlockup_probe() is called from a non-per-cpu-bound thread (such as kernel_init). This further justifies refactoring the creation logic to be CPU-agnostic for probing. > I guess it's implied by the "Allow migration during the check", but I > might even word it more strongly and say something like "The cpu we > use here is arbitrary, so we don't disable preemption and use > raw_smp_processor_id() to get a CPU." > > I guess that should be OK. Hopefully the arbitrary CPU that you pick > doesn't go offline during this function. I don't know "perf" well, but > I could imagine that it might be upset if you tried to create a perf > event for a CPU that has gone offline. I guess you could be paranoid > and surround this with cpu_hotplug_disable() / cpu_hotplug_enable()? The point is well-taken. While unlikely during early boot, adding cpu_hotplug_disable() ensures robustness. V4 will be submitted with the following changes: 1. Clarified commit message (retaining 4.19 logs while explaining the renaming to watchdog_hardlockup_probe). 2. Inclusion of the "Fixes" tag. 3. Addition of cpu_hotplug_disable() around the probe. 4. Refined comments. Best regards, Qiliang ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race 2026-01-24 6:57 ` Qiliang Yuan @ 2026-01-24 23:36 ` Doug Anderson 2026-01-26 3:30 ` Qiliang Yuan 0 siblings, 1 reply; 14+ messages in thread From: Doug Anderson @ 2026-01-24 23:36 UTC (permalink / raw) To: Qiliang Yuan Cc: akpm, lihuafei1, linux-kernel, mingo, mm-commits, song, stable, sunshx, thorsten.blum, wangjinchao600, yangyicong, yuanql9, zhangjn11 Hi, On Fri, Jan 23, 2026 at 10:57 PM Qiliang Yuan <realwujing@gmail.com> wrote: > > Thanks for the detailed review! > > > Wait a second... The above function hasn't existed for 2.5 years. It > > was removed in commit d9b3629ade8e ("watchdog/hardlockup: have the > > perf hardlockup use __weak functions more cleanly"). All that's left > > in the ToT kernel referencing that function is an old comment... > > > > Oh, and I guess I can see below that your stack traces are on 4.19, > > which is ancient! Things have changed a bit in the meantime. Are you > > certain that the problem still reproduces on ToT? > > The function hardlockup_detector_perf_init() was renamed to > watchdog_hardlockup_probe() in commit d9b3629ade8e ("watchdog/hardlockup: > have the perf hardlockup use __weak functions more cleanly"). > Additionally, the source file was moved from kernel/watchdog_hld.c to > kernel/watchdog_perf.c in commit 6ea0d04211a7. The v3 commit message > inadvertently retained legacy terminology from the 4.19 kernel; this will > be updated in V4 to reflect current ToT naming. > > The core logic remains the same: the race condition persists despite the > renaming and cleanup of the __weak function logic. > > Regarding ToT reproducibility: while the KASAN report originated from > 4.19, the underlying logic is still problematic in ToT. In > watchdog_hardlockup_probe(), the call to > hardlockup_detector_event_create() still writes to the per-cpu > watchdog_ev. 
Task migration between event creation and the subsequent > perf_event_release_kernel() leaves a stale pointer in the watchdog_ev of > the original CPU. > > > Probably want a "Fixes" tag? If I had to guess, maybe? > > > > Fixes: 930d8f8dbab9 ("watchdog/perf: adapt the watchdog_perf interface > > for async model") > > Commit 930d8f8dbab9 introduced the async initialization which allows > preemption/migration during the probe phase. This tag will be included in > V4. The part that doesn't make a lot of sense to me, though, is that v4.19 also doesn't have commit 930d8f8dbab9 ("watchdog/perf: adapt the watchdog_perf interface for async model"), which is where we are saying the problem was introduced. ...so in v4.19 I think: * hardlockup_detector_perf_init() is only called from watchdog_nmi_probe() * watchdog_nmi_probe() is only called from lockup_detector_init() * lockup_detector_init() is only called from kernel_init_freeable() right before smp_init() Thus I'm super confused about how you could have seen the problem on v4.19. Maybe your v4.19 kernel has some backported patches that makes this possible? While I'm not saying that the v4 patch you just posted is incorrect, I'm just trying to make sure that: 1. We actually understand the problem you were seeing. 2. We are identifying the correct "Fixes" commit. > > I'm still a bit confused why this warning didn't trigger previously. > > Do you know why? > > In 4.19, hardlockup_detector_event_create() did not include the > WARN_ON(!is_percpu_thread()) check, which was added in later versions. In > ToT, this warning is expected to trigger if watchdog_hardlockup_probe() > is called from a non-per-cpu-bound thread (such as kernel_init). This > further justifies refactoring the creation logic to be CPU-agnostic for > probing. OK, fair enough. ...but I'm a bit curious why nobody else saw this WARN_ON(). I'm also curious if you have tested the hardlockup detector on newer kernels, or if all of your work has been done on 4.19. 
If all your work has been done on 4.19, do we need to find someone to test your patch on a newer kernel and make sure it works OK? If you've tested on a newer kernel, did the hardlockup detector init from the kernel's early-init code, or the retry code? -Doug ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race 2026-01-24 23:36 ` Doug Anderson @ 2026-01-26 3:30 ` Qiliang Yuan 2026-01-27 1:14 ` Doug Anderson 0 siblings, 1 reply; 14+ messages in thread From: Qiliang Yuan @ 2026-01-26 3:30 UTC (permalink / raw) To: dianders Cc: akpm, lihuafei1, linux-kernel, mingo, mm-commits, realwujing, song, stable, sunshx, thorsten.blum, wangjinchao600, yangyicong, yuanql9, zhangjn11 Hi Doug, Thanks for your further questions and for digging into the 4.19 vs ToT differences. On Sat, 24 Jan 2026 15:36:01 Doug Anderson <dianders@chromium.org> wrote: > The part that doesn't make a lot of sense to me, though, is that v4.19 > also doesn't have commit 930d8f8dbab9 ("watchdog/perf: adapt the > watchdog_perf interface for async model"), which is where we are > saying the problem was introduced. > > ...so in v4.19 I think: > * hardlockup_detector_perf_init() is only called from watchdog_nmi_probe() > * watchdog_nmi_probe() is only called from lockup_detector_init() > * lockup_detector_init() is only called from kernel_init_freeable() > right before smp_init() > > Thus I'm super confused about how you could have seen the problem on > v4.19. Maybe your v4.19 kernel has some backported patches that makes > this possible? You caught it! Here is the context for the differences: 1. Mainline (ToT): - `lockup_detector_init()` is always called before `smp_init()` (pre-SMP phase). - Risk source: The asynchronous retry path (`lockup_detector_delay_init`) introduced by 930d8f8dbab9, which runs in a workqueue (post-SMP) context and triggers the UAF. 2. openEuler (4.19/5.10): - Local `euler inclusion` patches moved `lockup_detector_init()` after `do_basic_setup()` (post-SMP phase). - Risk source: The initial probe occurs directly in a post-SMP environment, exposing the race condition. 
For openEuler (4.19/5.10) kernel, the call stack looks like this: kernel_init() -> kernel_init_freeable() -> lockup_detector_init() <-- Called after smp_init() -> watchdog_nmi_probe() -> hardlockup_detector_perf_init() -> hardlockup_detector_event_create() In mainline (ToT), the initial probe (safe) call stack is: kernel_init() -> kernel_init_freeable() -> lockup_detector_init() <-- Called before smp_init() -> watchdog_hardlockup_probe() -> hardlockup_detector_event_create() However, the asynchronous retry mechanism (commit 930d8f8dbab9) executes the probe logic in a post-SMP, preemptible context. For the mainline (ToT) retry path (at risk), the call stack is: kworker thread -> process_one_work() -> lockup_detector_delay_init() -> watchdog_hardlockup_probe() -> hardlockup_detector_event_create() Thus, `930d8f8dbab9` remains the correct "Fixes" target for ToT. > OK, fair enough. ...but I'm a bit curious why nobody else saw this > WARN_ON(). I'm also curious if you have tested the hardlockup detector > on newer kernels, or if all of your work has been done on 4.19. If all > your work has been done on 4.19, do we need to find someone to test > your patch on a newer kernel and make sure it works OK? If you've > tested on a newer kernel, did the hardlockup detector init from the > kernel's early-init code, or the retry code? In newer kernels, when the probe fails initially and falls back to the retry workqueue (or even during early init if preemption is enabled), the `WARN_ON(!is_percpu_thread())` in `hardlockup_detector_event_create()` does indeed trigger because `watchdog_hardlockup_probe()` is called from a non-bound context. I have verified this patch on the openEuler 4.19 kernel. During our stress testing, where we start dozens of VMs simultaneously to create high resource contention, the UAF was consistently reproducible without this fix and is now confirmed resolved. 
During our stress testing, where we start dozens of VMs simultaneously to create high resource contention, the UAF was consistently reproducible without this fix and is now confirmed resolved. The v4 patch addresses this by refactoring the creation logic to be stateless and adding `cpu_hotplug_disable()` to ensure the probed CPU stays alive. I'll wait for your further thoughts on v4: https://lore.kernel.org/all/20260124070814.806828-1-realwujing@gmail.com/ Best regards, Qiliang ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race 2026-01-26 3:30 ` Qiliang Yuan @ 2026-01-27 1:14 ` Doug Anderson 2026-01-27 2:16 ` [PATCH v4] " Qiliang Yuan 0 siblings, 1 reply; 14+ messages in thread From: Doug Anderson @ 2026-01-27 1:14 UTC (permalink / raw) To: Qiliang Yuan Cc: akpm, lihuafei1, linux-kernel, mingo, mm-commits, song, stable, sunshx, thorsten.blum, wangjinchao600, yangyicong, yuanql9, zhangjn11 Hi, On Sun, Jan 25, 2026 at 7:30 PM Qiliang Yuan <realwujing@gmail.com> wrote: > > Hi Doug, > > Thanks for your further questions and for digging into the 4.19 vs ToT > differences. > > On Sat, 24 Jan 2026 15:36:01 Doug Anderson <dianders@chromium.org> wrote: > > The part that doesn't make a lot of sense to me, though, is that v4.19 > > also doesn't have commit 930d8f8dbab9 ("watchdog/perf: adapt the > > watchdog_perf interface for async model"), which is where we are > > saying the problem was introduced. > > > > ...so in v4.19 I think: > > * hardlockup_detector_perf_init() is only called from watchdog_nmi_probe() > > * watchdog_nmi_probe() is only called from lockup_detector_init() > > * lockup_detector_init() is only called from kernel_init_freeable() > > right before smp_init() > > > > Thus I'm super confused about how you could have seen the problem on > > v4.19. Maybe your v4.19 kernel has some backported patches that makes > > this possible? > > You caught it! Here is the context for the differences: > > 1. Mainline (ToT): > - `lockup_detector_init()` is always called before `smp_init()` > (pre-SMP phase). > - Risk source: The asynchronous retry path (`lockup_detector_delay_init`) > introduced by 930d8f8dbab9, which runs in a workqueue (post-SMP) > context and triggers the UAF. > > 2. openEuler (4.19/5.10): > - Local `euler inclusion` patches moved `lockup_detector_init()` after > `do_basic_setup()` (post-SMP phase). 
> - Risk source: The initial probe occurs directly in a post-SMP > environment, exposing the race condition. > > For openEuler (4.19/5.10) kernel, the call stack looks like this: > kernel_init() > -> kernel_init_freeable() > -> lockup_detector_init() <-- Called after smp_init() > -> watchdog_nmi_probe() > -> hardlockup_detector_perf_init() > -> hardlockup_detector_event_create() > > In mainline (ToT), the initial probe (safe) call stack is: > kernel_init() > -> kernel_init_freeable() > -> lockup_detector_init() <-- Called before smp_init() > -> watchdog_hardlockup_probe() > -> hardlockup_detector_event_create() > > However, the asynchronous retry mechanism (commit 930d8f8dbab9) executes the > probe logic in a post-SMP, preemptible context. > > For the mainline (ToT) retry path (at risk), the call stack is: > kworker thread > -> process_one_work() > -> lockup_detector_delay_init() > -> watchdog_hardlockup_probe() > -> hardlockup_detector_event_create() > > Thus, `930d8f8dbab9` remains the correct "Fixes" target for ToT. OK, at least I'm not crazy! That does indeed explain why things seemed so wonky... > > OK, fair enough. ...but I'm a bit curious why nobody else saw this > > WARN_ON(). I'm also curious if you have tested the hardlockup detector > > on newer kernels, or if all of your work has been done on 4.19. If all > > your work has been done on 4.19, do we need to find someone to test > > your patch on a newer kernel and make sure it works OK? If you've > > tested on a newer kernel, did the hardlockup detector init from the > > kernel's early-init code, or the retry code? > > In newer kernels, when the probe fails initially and falls > back to the retry workqueue (or even during early init if preemption is > enabled), the `WARN_ON(!is_percpu_thread())` in > `hardlockup_detector_event_create()` does indeed trigger because > `watchdog_hardlockup_probe()` is called from a non-bound context. > > I have verified this patch on the openEuler 4.19 kernel. 
During our stress > testing, where we start dozens of VMs simultaneously to create high resource > contention, the UAF was consistently reproducible without this fix and is now > confirmed resolved. > > The v4 patch addresses this by refactoring the creation logic to be stateless > and adding `cpu_hotplug_disable()` to ensure the probed CPU stays alive. OK, so I think the answer is: you haven't actually seen the problem (or the WARN_ON) on a mainline kernel, only on the openEuler 4.19 kernel... ...actually, I looked and now think the problem doesn't exist on a mainline kernel. Specifically, when we run lockup_detector_retry_init() we call schedule_work() to do the work. That schedules work on the "system_percpu_wq". While the work ends up being queued with "WORK_CPU_UNBOUND", I believe that we still end up running on a thread that's bound to just one CPU in the end. This is presumably why nobody has reported "WARN_ON(!is_percpu_thread())" actually hitting on mainline. Given the above, it sounds to me like the problem you're having is with a downstream kernel and upstream is actually fine. Did I understand that correctly? If that's the case, we'd definitely want to at least change the description and presumably _remove_ the Fixes tag? I actually still think the code looks nicer after your CL and (maybe?) we could even remove the whole schedule_work() for running this code? Maybe it was only added to deal with this exact problem? ...but the CL description would definitely need to be updated. > I'll wait for your further thoughts on v4: > https://lore.kernel.org/all/20260124070814.806828-1-realwujing@gmail.com/ Sure. At the very least the CL description would need to be updated (assuming my understanding is correct), but for now let's avoid forking the conversation and resolve things here? -Doug ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v4] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race 2026-01-27 1:14 ` Doug Anderson @ 2026-01-27 2:16 ` Qiliang Yuan 2026-01-27 21:37 ` Doug Anderson 0 siblings, 1 reply; 14+ messages in thread From: Qiliang Yuan @ 2026-01-27 2:16 UTC (permalink / raw) To: dianders Cc: akpm, lihuafei1, linux-kernel, mingo, mm-commits, realwujing, song, stable, sunshx, thorsten.blum, wangjinchao600, yangyicong, yuanql9, zhangjn11 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset=y, Size: 3192 bytes --] Hi Doug, Thanks for your insightful follow-up! It's great to have the openEuler vs. Mainline timing differences clarified—it definitely explains why we hit this so reliably in our downstream environment. On Mon, Jan 26, 2026 at 5:14 PM Doug Anderson <dianders@chromium.org> wrote: > OK, so I think the answer is: you haven't actually seen the problem > (or the WARN_ON) on a mainline kernel, only on the openEuler 4.19 > kernel... > > ...actually, I looked and now think the problem doesn't exist on a > mainline kernel. Specificaly, when we run lockup_detector_retry_init() > we call schedule_work() to do the work. That schedules work on the > "system_percpu_wq". While the work ends up being queued with > "WORK_CPU_UNBOUND", I believe that we still end up running on a thread > that's bound to just one CPU in the end. This is presumably why > nobody has reported that "WARN_ON(!is_percpu_thread())" actually > hitting on mainline. You are right that in the latest mainline, schedule_work() has been updated to use 'system_percpu_wq'. However, in many LTS kernels (including 4.19), schedule_work() still submits to 'system_wq', which lacks the per-cpu guarantee. More importantly, even on 'system_percpu_wq', the worker threads do not carry the PF_PERCPU_THREAD flag. 
is_percpu_thread() specifically checks (current->flags & PF_PERCPU_THREAD), which is reserved for kthreads specifically pinned via kthread_create_on_cpu(). Therefore, the WARN_ON(!is_percpu_thread()) in hardlockup_detector_event_create() is still violated in the retry path even on mainline. The UAF risk stems from the fact that preemption is enabled during the probe. If the worker thread (even if on a per-cpu wq) is preempted or if the logic assumes the task cannot migrate (which is_percpu_thread usually guarantees), we have a logical gap. By making the probe path stateless and using cpu_hotplug_disable(), we eliminate this dependency entirely. > If that's the case, we'd definitely want to at least change the > description and presumably _remove_ the Fixes tag? I actually still > think the code looks nicer after your CL and (maybe?) we could even > remove the whole schedule_work() for running this code? Maybe it was > only added to deal with this exact problem? ...but the CL description > would definitely need to be updated. The schedule_work() in lockup_detector_retry_init() (added by 930d8f8dbab9) is necessary for platforms where the PMU or other dependencies aren't ready during early init. I agree that the commit description should be updated to clarify that while the issue was caught in a downstream kernel with shifted init timings, it identifies a latent race condition in the mainline retry path. Regarding the 'Fixes' tag, since 930d8f8dbab9 introduced the asynchronous retry path which calls the probe logic from a non-percpu-thread context, it still seems like the appropriate target for the "root cause" of the vulnerability. I'll refactor the commit message in V5 to better reflect this context and remove the emphasis on ToT being "broken" out-of-the-box (since early init is indeed safe there). How does that sound to you? Best regards, Qiliang ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v4] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race 2026-01-27 2:16 ` [PATCH v4] " Qiliang Yuan @ 2026-01-27 21:37 ` Doug Anderson 2026-01-28 2:37 ` Qiliang Yuan 0 siblings, 1 reply; 14+ messages in thread From: Doug Anderson @ 2026-01-27 21:37 UTC (permalink / raw) To: Qiliang Yuan Cc: akpm, lihuafei1, linux-kernel, mingo, mm-commits, song, stable, sunshx, thorsten.blum, wangjinchao600, yangyicong, yuanql9, zhangjn11 Hi, On Mon, Jan 26, 2026 at 6:17 PM Qiliang Yuan <realwujing@gmail.com> wrote: > > Hi Doug, > > Thanks for your insightful follow-up! It's great to have the openEuler vs. Mainline > timing differences clarified—it definitely explains why we hit this so reliably > in our downstream environment. > > On Mon, Jan 26, 2026 at 5:14 PM Doug Anderson <dianders@chromium.org> wrote: > > OK, so I think the answer is: you haven't actually seen the problem > > (or the WARN_ON) on a mainline kernel, only on the openEuler 4.19 > > kernel... > > > > ...actually, I looked and now think the problem doesn't exist on a > > mainline kernel. Specificaly, when we run lockup_detector_retry_init() > > we call schedule_work() to do the work. That schedules work on the > > "system_percpu_wq". While the work ends up being queued with > > "WORK_CPU_UNBOUND", I believe that we still end up running on a thread > > that's bound to just one CPU in the end. This is presumably why > > nobody has reported that "WARN_ON(!is_percpu_thread())" actually > > hitting on mainline. > > You are right that in the latest mainline, schedule_work() has been updated > to use 'system_percpu_wq'. However, in many LTS kernels (including 4.19), > schedule_work() still submits to 'system_wq', which lacks the per-cpu > guarantee. Really, it matters what schedule_work() does on anyone who happens to have commit 930d8f8dbab9 ("watchdog/perf: adapt the watchdog_perf interface for async model")... 
While I can sympathize with supporting older kernels and doing backports, we have to focus on supporting the mainline kernel here. If we're claiming that we're fixing a bug (and even your newest CL says it's fixing a UAF and has a Fixes tag) then the bug has to actually be there. > More importantly, even on 'system_percpu_wq', the worker threads do not > carry the PF_PERCPU_THREAD flag. is_percpu_thread() specifically checks > (current->flags & PF_PERCPU_THREAD), which is reserved for kthreads > specifically pinned via kthread_create_on_cpu(). I think we need to keep the focus on mainline or at least the kernel as of commit 930d8f8dbab9. The grep for "PF_PERCPU_THREAD" has no hits in either. In both cases, it is: return (current->flags & PF_NO_SETAFFINITY) && (current->nr_cpus_allowed == 1); > Therefore, the > WARN_ON(!is_percpu_thread()) in hardlockup_detector_event_create() is > still violated in the retry path even on mainline. To ask directly: have you seen this WARN_ON in mainline, or is this all speculative? I'm going to assert that the WARN_ON is _not_ seen on mainline and wasn't there as of commit 930d8f8dbab9. Specifically, the same set of patches that added the "retry" for the hardlockup detector had the WARN_ON(). It feels highly unlikely the WARN_ON was firing at that point in time. You can see the whole series of patches at: https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.chen@mediatek.com/ ...yes, I ended up rebasing them and included them when I landed the buddy lockup detector where they landed, but they should have been equivalent to Lecopzer's patches. > The UAF risk stems from the fact that preemption is enabled during the > probe. If the worker thread (even if on a per-cpu wq) is preempted or > if the logic assumes the task cannot migrate (which is_percpu_thread > usually guarantees), we have a logical gap. By making the probe path > stateless and using cpu_hotplug_disable(), we eliminate this dependency > entirely. 
> > > If that's the case, we'd definitely want to at least change the > > description and presumably _remove_ the Fixes tag? I actually still > > think the code looks nicer after your CL and (maybe?) we could even > > remove the whole schedule_work() for running this code? Maybe it was > > only added to deal with this exact problem? ...but the CL description > > would definitely need to be updated. > > The schedule_work() in lockup_detector_retry_init() (added by 930d8f8dbab9) > is necessary for platforms where the PMU or other dependencies aren't ready > during early init. > > I agree that the commit description should be updated to clarify that > while the issue was caught in a downstream kernel with shifted init timings, > it identifies a latent race condition in the mainline retry path. > > Regarding the 'Fixes' tag, since 930d8f8dbab9 introduced the asynchronous > retry path which calls the probe logic from a non-percpu-thread context, > it still seems like the appropriate target for the "root cause" of the > vulnerability. > > I'll refactor the commit message in V5 to better reflect this context > and remove the emphasis on ToT being "broken" out-of-the-box (since early > init is indeed safe there). > > How does that sound to you? I'm still not convinced that there was ever a UAF in mainline nor that this actually "Fixes" anything in mainline. I do agree that the code is better by not having it write the per-cpu variable at probe time, but unless you can say that you've actually tested _on mainline_ and demonstrated that the WARN_ON() is truly hitting _on mainline_ by providing a printout of it happening _on mainline_ or somehow shown the UAF actually happening _on mainline_ then we simply can't claim that this is a Fix. Although I supposed I'd also be OK with doing any of the above on any pure upstream kernel after commit 930d8f8dbab9, as well. -Doug ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v4] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race
  2026-01-27 21:37 ` Doug Anderson
@ 2026-01-28  2:37 ` Qiliang Yuan
  0 siblings, 0 replies; 14+ messages in thread
From: Qiliang Yuan @ 2026-01-28 2:37 UTC (permalink / raw)
To: dianders
Cc: akpm, lihuafei1, linux-kernel, mingo, mm-commits, realwujing,
    song, stable, sunshx, thorsten.blum, wangjinchao600, yangyicong,
    yuanql9, zhangjn11

Hi Doug,

Thanks for your detailed feedback and for the patient explanation
regarding the mainline workqueue behavior.

On Tue, 27 Jan 2026 13:37:28 Doug Anderson <dianders@chromium.org> wrote:
> Really, it matters what schedule_work() does on anyone who happens to have
> commit 930d8f8dbab9 ("watchdog/perf: adapt the watchdog_perf interface
> for async model")... we have to focus on supporting the mainline kernel here.

I completely agree that the focus must be on the mainline kernel. I've
since checked and confirmed that in mainline, schedule_work() is
redirected to system_percpu_wq (via include/linux/workqueue.h), which
provides the necessary CPU affinity.

> To ask directly: have you seen this WARN_ON in mainline, or is this
> all speculative?

To be direct: no, I haven't seen this WARN_ON on a pure mainline
kernel. As you suspected, the issue was identified in a downstream
4.19-based kernel with different initialization timings and workqueue
behavior. My assumption that it would also affect mainline was indeed
speculative and based on an incomplete understanding of
include/linux/sched.h's is_percpu_thread() implementation on modern
kernels.

> I'm still not convinced that there was ever a UAF in mainline nor that
> this actually "Fixes" anything in mainline. I do agree that the code
> is better by not having it write the per-cpu variable at probe time

Since the risk is not currently manifested in mainline, I have
refactored the patch as a "cleanup and robustness improvement" as you
suggested. This removes the fragile implicit dependency on the caller's
context and makes the probe stateless.

I have sent v6 with these changes. Please ignore v5 and review v6
instead.

v6 changes:
- Retitled to "simplify perf event probe and remove per-cpu dependency".
- Removed the "Fixes:" tag and "Cc: stable".
- Rewrote the commit message in the imperative mood.
- Updated the description to clarify that it addresses code brittleness
  rather than a confirmed mainline bug.

v6 link: https://lore.kernel.org/all/20260127025814.1200345-1-realwujing@gmail.com/

Best regards,
Qiliang

^ permalink raw reply	[flat|nested] 14+ messages in thread
* [PATCH v4] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race
  2026-01-24  0:01 ` Doug Anderson
  2026-01-24  6:57 ` Qiliang Yuan
@ 2026-01-24  7:08 ` Qiliang Yuan
  1 sibling, 0 replies; 14+ messages in thread
From: Qiliang Yuan @ 2026-01-24 7:08 UTC (permalink / raw)
To: dianders
Cc: akpm, lihuafei1, linux-kernel, mingo, mm-commits, realwujing,
    song, stable, sunshx, thorsten.blum, wangjinchao600, yangyicong,
    yuanql9, zhangjn11, linux-watchdog

Original analysis on Linux 4.19 showed a race condition in the
hardlockup detector's initialization phase.

Specifically, during the early probe phase,
hardlockup_detector_perf_init() (renamed to watchdog_hardlockup_probe()
in newer kernels via commit d9b3629ade8e) interacted with the per-cpu
'watchdog_ev' variable. If the initializing task migrates to another
CPU during this probe phase, two issues arise:

1. The 'watchdog_ev' pointer on the original CPU is set but not
   cleared, leaving a stale pointer to a freed perf event.
2. The 'watchdog_ev' pointer on the new CPU might be incorrectly
   cleared.

Note: Although the logs below reference hardlockup_detector_perf_init(),
the same logic persists in the current watchdog_hardlockup_probe()
implementation.

This race condition was observed in console logs:

[23.038376] hardlockup_detector_perf_init 313 cur_cpu=2
...
[23.076385] hardlockup_detector_event_create 203 cpu(cur)=2 set watchdog_ev
...
[23.095788] perf_event_release_kernel 4623 cur_cpu=2
...
[23.116963] lockup_detector_reconfigure 577 cur_cpu=3

The log shows the task started on CPU 2, set watchdog_ev on CPU 2,
released the event on CPU 2, but then migrated to CPU 3 before the
cleanup logic could run. This left watchdog_ev on CPU 2 pointing to a
freed event, resulting in a UAF when later accessed:

[26.540732] BUG: KASAN: use-after-free in perf_event_ctx_lock_nested.isra.72+0x6b/0x140
[26.542442] Read of size 8 at addr ff110006b360d718 by task kworker/2:1/94

Fix this by refactoring hardlockup_detector_event_create() to return
the created perf event instead of directly assigning it to the per-cpu
variable. In the probe function, use an arbitrary CPU but ensure it
remains online via cpu_hotplug_disable() during the check.

Fixes: 930d8f8dbab9 ("watchdog/perf: adapt the watchdog_perf interface for async model")
Signed-off-by: Shouxin Sun <sunshx@chinatelecom.cn>
Signed-off-by: Junnan Zhang <zhangjn11@chinatelecom.cn>
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
Cc: Song Liu <song@kernel.org>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Jinchao Wang <wangjinchao600@gmail.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Cc: <stable@vger.kernel.org>
---
v4:
- Add cpu_hotplug_disable() in watchdog_hardlockup_probe() to ensure
  the sampled CPU remains online during probing.
- Update commit message to explain the relevance of 4.19 logs even
  though functions were renamed in modern kernels.
v3:
- Refactor hardlockup_detector_event_create() to return the event
  pointer instead of directly assigning to per-cpu variables to fix
  the UAF.
- Restore PMU cycle fallback and unify the enable/probe paths.
v2:
- Add Cc: <stable@vger.kernel.org>.
v1:
- Avoid 'watchdog_ev' in probe path by manually creating and releasing
  a local perf event.
 kernel/watchdog_perf.c | 56 +++++++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 22 deletions(-)

diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index d3ca70e3c256..887b61c65c1b 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -17,6 +17,7 @@
 #include <linux/atomic.h>
 #include <linux/module.h>
 #include <linux/sched/debug.h>
+#include <linux/cpu.h>

 #include <asm/irq_regs.h>
 #include <linux/perf_event.h>
@@ -118,18 +119,11 @@ static void watchdog_overflow_callback(struct perf_event *event,
 	watchdog_hardlockup_check(smp_processor_id(), regs);
 }

-static int hardlockup_detector_event_create(void)
+static struct perf_event *hardlockup_detector_event_create(unsigned int cpu)
 {
-	unsigned int cpu;
 	struct perf_event_attr *wd_attr;
 	struct perf_event *evt;

-	/*
-	 * Preemption is not disabled because memory will be allocated.
-	 * Ensure CPU-locality by calling this in per-CPU kthread.
-	 */
-	WARN_ON(!is_percpu_thread());
-	cpu = raw_smp_processor_id();
 	wd_attr = &wd_hw_attr;
 	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);

@@ -143,14 +137,7 @@ static int hardlockup_detector_event_create(void)
 						       watchdog_overflow_callback, NULL);
 	}

-	if (IS_ERR(evt)) {
-		pr_debug("Perf event create on CPU %d failed with %ld\n", cpu,
-			 PTR_ERR(evt));
-		return PTR_ERR(evt);
-	}
-	WARN_ONCE(this_cpu_read(watchdog_ev), "unexpected watchdog_ev leak");
-	this_cpu_write(watchdog_ev, evt);
-	return 0;
+	return evt;
 }

 /**
@@ -159,17 +146,26 @@ static int hardlockup_detector_event_create(void)
  */
 void watchdog_hardlockup_enable(unsigned int cpu)
 {
+	struct perf_event *evt;
+
 	WARN_ON_ONCE(cpu != smp_processor_id());

-	if (hardlockup_detector_event_create())
+	evt = hardlockup_detector_event_create(cpu);
+	if (IS_ERR(evt)) {
+		pr_debug("Perf event create on CPU %d failed with %ld\n", cpu,
+			 PTR_ERR(evt));
 		return;
+	}

 	/* use original value for check */
 	if (!atomic_fetch_inc(&watchdog_cpus))
 		pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");

+	WARN_ONCE(this_cpu_read(watchdog_ev), "unexpected watchdog_ev leak");
+	this_cpu_write(watchdog_ev, evt);
+
 	watchdog_init_timestamp();
-	perf_event_enable(this_cpu_read(watchdog_ev));
+	perf_event_enable(evt);
 }

 /**
@@ -263,19 +259,35 @@ bool __weak __init arch_perf_nmi_is_available(void)
  */
 int __init watchdog_hardlockup_probe(void)
 {
+	struct perf_event *evt;
+	unsigned int cpu;
 	int ret;

 	if (!arch_perf_nmi_is_available())
 		return -ENODEV;

-	ret = hardlockup_detector_event_create();
+	if (!hw_nmi_get_sample_period(watchdog_thresh))
+		return -EINVAL;

-	if (ret) {
+	/*
+	 * Test hardware PMU availability by creating a temporary perf event.
+	 * The requested CPU is arbitrary; preemption is not disabled, so
+	 * raw_smp_processor_id() is used. Surround with cpu_hotplug_disable()
+	 * to ensure the arbitrarily chosen CPU remains online during the check.
+	 * The event is released immediately.
+	 */
+	cpu_hotplug_disable();
+	cpu = raw_smp_processor_id();
+	evt = hardlockup_detector_event_create(cpu);
+	if (IS_ERR(evt)) {
 		pr_info("Perf NMI watchdog permanently disabled\n");
+		ret = PTR_ERR(evt);
 	} else {
-		perf_event_release_kernel(this_cpu_read(watchdog_ev));
-		this_cpu_write(watchdog_ev, NULL);
+		perf_event_release_kernel(evt);
+		ret = 0;
 	}
+	cpu_hotplug_enable();
+
 	return ret;
 }
--
2.51.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread
end of thread, other threads:[~2026-01-28  2:38 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-22  4:27 [PATCH] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race Qiliang Yuan
2026-01-22  5:24 ` [PATCH v2] " Qiliang Yuan
2026-01-22 21:59 ` Andrew Morton
2026-01-23  2:39 ` Doug Anderson
2026-01-23  6:34 ` [PATCH v3] " Qiliang Yuan
2026-01-24  0:01 ` Doug Anderson
2026-01-24  6:57 ` Qiliang Yuan
2026-01-24 23:36 ` Doug Anderson
2026-01-26  3:30 ` Qiliang Yuan
2026-01-27  1:14 ` Doug Anderson
2026-01-27  2:16 ` [PATCH v4] " Qiliang Yuan
2026-01-27 21:37 ` Doug Anderson
2026-01-28  2:37 ` Qiliang Yuan
2026-01-24  7:08 ` Qiliang Yuan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox