From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6789930DD1D;
	Thu, 22 Jan 2026 21:59:53 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1769119194; cv=none; b=hJD20mNRKPN2R/cXHU7aKKkj02M9UhGJHbipQN+kno7rFD5LxB39jeTb7pD2UrazGVUSe61RA00pf5PxW3PoD58a0wRyIO3mT4wsehmbCN9tIOlaRZEXlzwmv7F6ktXZae9zg6mbznLmbwRLmC0FJzVoK+t/8YHe93Spv/7rGHw=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1769119194; c=relaxed/simple;
	bh=t8y7aUQcU/qiqgYFpCKSJRRYf13Se/dbEuith7oxcEk=;
	h=Date:From:To:Cc:Subject:Message-Id:In-Reply-To:References:
	 Mime-Version:Content-Type; b=IyZVp41yuPl0ZQduUVR8NehcWjrbD8vpXZA4iXSOKaDZQEoUkzg1BQUddbSrfFtdT9XcQ2k+ZgeCCqSEdEk4OKNAeKSanpZKFDrOXWQRr6pow0J+Ce/pZ7f6Z1Xa7ZXBqdgmxOfVFTEpoKQj1PySDOFW7A/i1ArBuiSXA80KN2I=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux-foundation.org header.i=@linux-foundation.org header.b=GqmdQDdW; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=linux-foundation.org header.i=@linux-foundation.org header.b="GqmdQDdW"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 849FFC116D0;
	Thu, 22 Jan 2026 21:59:52 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org;
	s=korg; t=1769119193;
	bh=t8y7aUQcU/qiqgYFpCKSJRRYf13Se/dbEuith7oxcEk=;
	h=Date:From:To:Cc:Subject:In-Reply-To:References:From;
	b=GqmdQDdWpUV1+NT/kFUTO7fq9v+oatrAe0T1+aQUisPlqvNiSXIWP6557agpcvHBV
	 74polR+JjBtuzM+yPqkyF2KK4mhM4uyG7N3eQUBjD3zoKnLU5toVqPs8IpV8Kc4eob
	 uX5lAEjvz/8e69dKGQSd9ICkCiH+DJOTUSWB020k=
Date: Thu, 22 Jan 2026 13:59:51 -0800
From: Andrew Morton <akpm@linux-foundation.org>
To: Qiliang Yuan <realwujing@gmail.com>
Cc: lihuafei1@huawei.com, mingo@kernel.org, linux-kernel@vger.kernel.org,
 sunshx@chinatelecom.cn, thorsten.blum@linux.dev, wangjinchao600@gmail.com,
 yangyicong@hisilicon.com, yuanql9@chinatelecom.cn,
 zhangjn11@chinatelecom.cn, stable@vger.kernel.org, Song Liu
 <song@kernel.org>, Douglas Anderson <dianders@chromium.org>
Subject: Re: [PATCH v2] watchdog/hardlockup: Fix UAF in perf event cleanup
 due to migration race
Message-Id: <20260122135951.68ca60cf6ca3d90314306552@linux-foundation.org>
In-Reply-To: <20260122052442.667394-1-realwujing@gmail.com>
References: <20260122042717.657231-1-realwujing@gmail.com>
	<20260122052442.667394-1-realwujing@gmail.com>
X-Mailer: Sylpheed 3.7.0 (GTK+ 2.24.33; x86_64-pc-linux-gnu)
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Thu, 22 Jan 2026 00:24:42 -0500 Qiliang Yuan <realwujing@gmail.com> wrote:

> During the early initialization of the hardlockup detector, the
> hardlockup_detector_perf_init() function probes for PMU hardware availability.
> It originally used hardlockup_detector_event_create(), which interacts with
> the per-cpu 'watchdog_ev' variable.

Thanks.

For a -stable backport it's desirable to have a Fixes: target.  But it
appears this is very old code?

Also, I'm not sure who best to ask to help review this change.  I'll
add a few cc's here.

[full email retained...]

> If the initializing task migrates to another CPU during this probe phase,
> two issues arise:
> 1. The 'watchdog_ev' pointer on the original CPU is set but not cleared,
>    leaving a stale pointer to a freed perf event.
> 2. The 'watchdog_ev' pointer on the new CPU might be incorrectly cleared.
> 
> This race condition was observed in console logs (captured by adding debug printks):
> 
> [23.038376] hardlockup_detector_perf_init 313 cur_cpu=2
> ...
> [23.076385] hardlockup_detector_event_create 203 cpu(cur)=2 set watchdog_ev
> ...
> [23.095788] perf_event_release_kernel 4623 cur_cpu=2
> ...
> [23.116963] lockup_detector_reconfigure 577 cur_cpu=3
> 
> The log shows the task started on CPU 2, set watchdog_ev on CPU 2,
> released the event on CPU 2, but then migrated to CPU 3 before the
> cleanup logic (which would clear watchdog_ev) could run. This left
> watchdog_ev on CPU 2 pointing to a freed event.
> 
> Later, when the watchdog is enabled/disabled on CPU 2, this stale pointer
> leads to a Use-After-Free (UAF) in perf_event_disable(), as detected by KASAN:
> [26.539140] ==================================================================
> [26.540732] BUG: KASAN: use-after-free in perf_event_ctx_lock_nested.isra.72+0x6b/0x140
> [26.542442] Read of size 8 at addr ff110006b360d718 by task kworker/2:1/94
> [26.543954]
> [26.544744] CPU: 2 PID: 94 Comm: kworker/2:1 Not tainted 4.19.90-debugkasan #11
> [26.546505] Hardware name: GoStack Foundation OpenStack Nova, BIOS 1.16.3-3.ctl3 04/01/2014
> [26.548256] Workqueue: events smp_call_on_cpu_callback
> [26.549267] Call Trace:
> [26.549936]  dump_stack+0x8b/0xbb
> [26.550731]  print_address_description+0x6a/0x270
> [26.551688]  kasan_report+0x179/0x2c0
> [26.552519]  ? perf_event_ctx_lock_nested.isra.72+0x6b/0x140
> [26.553654]  ? watchdog_disable+0x80/0x80
> [26.553657]  perf_event_ctx_lock_nested.isra.72+0x6b/0x140
> [26.556951]  ? dump_stack+0xa0/0xbb
> [26.564006]  ? watchdog_disable+0x80/0x80
> [26.564886]  perf_event_disable+0xa/0x30
> [26.565746]  hardlockup_detector_perf_disable+0x1b/0x60
> [26.566776]  watchdog_disable+0x51/0x80
> [26.567624]  softlockup_stop_fn+0x11/0x20
> [26.568499]  smp_call_on_cpu_callback+0x5b/0xb0
> [26.569443]  process_one_work+0x389/0x770
> [26.570311]  worker_thread+0x57/0x5a0
> [26.571124]  ? process_one_work+0x770/0x770
> [26.572031]  kthread+0x1ae/0x1d0
> [26.572810]  ? kthread_create_worker_on_cpu+0xc0/0xc0
> [26.573821]  ret_from_fork+0x1f/0x40
> [26.574638]
> [26.575178] Allocated by task 1:
> [26.575990]  kasan_kmalloc+0xa0/0xd0
> [26.576814]  kmem_cache_alloc_trace+0xf3/0x1e0
> [26.577732]  perf_event_alloc.part.89+0xb5/0x12b0
> [26.578700]  perf_event_create_kernel_counter+0x1e/0x1d0
> [26.579728]  hardlockup_detector_event_create+0x4e/0xc0
> [26.580744]  hardlockup_detector_perf_init+0x2f/0x60
> [26.581746]  lockup_detector_init+0x85/0xdc
> [26.582645]  kernel_init_freeable+0x34d/0x40e
> [26.583568]  kernel_init+0xf/0x130
> [26.584428]  ret_from_fork+0x1f/0x40
> [26.584429]
> [26.584430] Freed by task 0:
> [26.584433]  __kasan_slab_free+0x130/0x180
> [26.584436]  kfree+0x90/0x1a0
> [26.589641]  rcu_process_callbacks+0x2cb/0x6e0
> [26.590935]  __do_softirq+0x119/0x3a2
> [26.591965]
> [26.592630] The buggy address belongs to the object at ff110006b360d500
> [26.592630]  which belongs to the cache kmalloc-2048 of size 2048
> [26.592633] The buggy address is located 536 bytes inside of
> [26.592633]  2048-byte region [ff110006b360d500, ff110006b360dd00)
> [26.592634] The buggy address belongs to the page:
> [26.592637] page:ffd400001acd8200 count:1 mapcount:0 mapping:ff11000107c0e800 index:0x0 compound_mapcount: 0
> [26.600959] flags: 0x17ffffc0010200(slab|head)
> [26.601891] raw: 0017ffffc0010200 dead000000000100 dead000000000200 ff11000107c0e800
> [26.603541] raw: 0000000000000000 00000000800f000f 00000001ffffffff 0000000000000000
> [26.605546] page dumped because: kasan: bad access detected
> [26.606788]
> [26.607351] Memory state around the buggy address:
> [26.608556]  ff110006b360d600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [26.610565]  ff110006b360d680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [26.610567] >ff110006b360d700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [26.610568]                             ^
> [26.610570]  ff110006b360d780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [26.610573]  ff110006b360d800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [26.618955] ==================================================================
> 
> Fix this by making the probe logic stateless. Use a local variable for the
> perf event and avoid accessing the per-cpu 'watchdog_ev' during initialization.
> This ensures that the probe event is always properly released regardless of
> task migration, and no stale global state is left behind.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Shouxin Sun <sunshx@chinatelecom.cn>
> Signed-off-by: Junnan Zhang <zhangjn11@chinatelecom.cn>
> Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
> Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
> ---
> v2:
> - Add Cc: stable@vger.kernel.org tag.
> ---
>  kernel/watchdog_perf.c | 28 ++++++++++++++++++++++++----
>  1 file changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
> index d3ca70e3c256..5066be7bba03 100644
> --- a/kernel/watchdog_perf.c
> +++ b/kernel/watchdog_perf.c
> @@ -264,18 +264,38 @@ bool __weak __init arch_perf_nmi_is_available(void)
>  int __init watchdog_hardlockup_probe(void)
>  {
>  	int ret;
> +	struct perf_event_attr *wd_attr = &wd_hw_attr;
> +	struct perf_event *evt;
> +	unsigned int cpu;
>  
>  	if (!arch_perf_nmi_is_available())
>  		return -ENODEV;
>  
> -	ret = hardlockup_detector_event_create();
> +	/*
> +	 * Test hardware PMU availability. Avoid using
> +	 * hardlockup_detector_event_create() to prevent migration-related
> +	 * stale pointers in the per-cpu watchdog_ev during early probe.
> +	 */
> +	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
> +	if (!wd_attr->sample_period)
> +		return -EINVAL;
>  
> -	if (ret) {
> +	/*
> +	 * Use raw_smp_processor_id() for probing in preemptible init code.
> +	 * Migration after reading ID is acceptable as counter creation on
> +	 * the old CPU is sufficient for the probe.
> +	 */
> +	cpu = raw_smp_processor_id();
> +	evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
> +					       watchdog_overflow_callback, NULL);
> +	if (IS_ERR(evt)) {
>  		pr_info("Perf NMI watchdog permanently disabled\n");
> +		ret = PTR_ERR(evt);
>  	} else {
> -		perf_event_release_kernel(this_cpu_read(watchdog_ev));
> -		this_cpu_write(watchdog_ev, NULL);
> +		perf_event_release_kernel(evt);
> +		ret = 0;
>  	}
> +
>  	return ret;
>  }
>  
> -- 
> 2.51.0