From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from out-171.mta1.migadu.com (out-171.mta1.migadu.com [95.215.58.171])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7E4C3318B8F
	for <linux-kernel@vger.kernel.org>; Tue,  3 Feb 2026 11:01:13 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.171
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1770116475; cv=none; b=qpAU51qfZPiPrpCyCopX+jgiuZmzqZGAmY3ISnO835SncwxtVGE1mJAkFpvPoNVPXYoHPTmIvzfkmCiP97KX7abQMO0hlyKwk4uZKo1CV6Ojlxl0pWiuodajJEMmYdPEIq7daVb68Obtw53Rzv5sEGbTCicbH7b67Sb7m8bbA0U=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1770116475; c=relaxed/simple;
	bh=5rsP0mkRVAsLfr6lj3Ko33DsHP732qBDzGkPOffSh3A=;
	h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From:
	 In-Reply-To:Content-Type; b=pHkl3E2spnBBykZaTWqPbJOz5pERiXOKwEOp2mDjBod9cznZDyO/ZhUvonNl9g0PHmnPC35ijzXokqYrqgNFJIcvIQ5k8W0iESk6dIkILhZHw5G6d+iTrQQkYmjxlQo+5gRhiMHniSs/WTNTOaBqERzxIsyXh/C+W2oLPpei7qY=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=IbpbLA6B; arc=none smtp.client-ip=95.215.58.171
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="IbpbLA6B"
Message-ID: <c4846caa-3979-4207-96bf-76d2b4ae20cc@linux.dev>
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
	t=1770116471;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=EkNXyjSvllkhiX+Yl/TG0ClVHSZgyvMSl7O3AUU5XwY=;
	b=IbpbLA6BeVYxMzt8oiPqK8c3EoamrYHr9kC13j2rRccSfiMB44hez6sL3Dtvy6zYrATe87
	nvKIBDMQpwuiFw4x0g9gO4oeQA7PPvTTM/0jtmqCgKPCJ69Dk6Bmj62TopJyVKyBRRaiwM
	BuXpMiwJ3olBcCu6dRCHXrm1o/oiglo=
Date: Tue, 3 Feb 2026 19:01:01 +0800
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Subject: Re: [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise
 detection count
Content-Language: en-US
To: Petr Mladek <pmladek@suse.com>
Cc: Aaron Tomlin <atomlin@atomlin.com>, neelx@suse.com, sean@ashe.io,
 akpm@linux-foundation.org, mproche@gmail.com, chjohnst@gmail.com,
 nick.lange@gmail.com, linux-kernel@vger.kernel.org, mhiramat@kernel.org,
 joel.granados@kernel.org, gregkh@linuxfoundation.org
References: <20260125135848.3356585-1-atomlin@atomlin.com>
 <20260125135848.3356585-2-atomlin@atomlin.com>
 <f239e00f-4282-408d-b172-0f9885f4b01b@linux.dev>
 <45c17b93-a6f1-45e0-8b25-20665a281949@linux.dev>
 <aYG5-GP50UtDCk7s@pathway.suse.cz>
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
From: Lance Yang <lance.yang@linux.dev>
In-Reply-To: <aYG5-GP50UtDCk7s@pathway.suse.cz>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Migadu-Flow: FLOW_OUT


On 2026/2/3 17:03, Petr Mladek wrote:
> On Tue 2026-02-03 11:08:33, Lance Yang wrote:
>> On 2026/2/3 11:05, Lance Yang wrote:
>>> On 2026/1/25 21:58, Aaron Tomlin wrote:
>>>> The check_hung_task() function currently conflates two distinct
>>>> responsibilities: validating whether a task is hung and handling the
>>>> subsequent reporting (printing warnings, triggering panics, or
>>>> tracepoints).
>>>>
>>>> This patch refactors the logic by introducing hung_task_info(), a
>>>> function dedicated solely to reporting. The actual detection check,
>>>> task_is_hung(), is hoisted into the primary loop within
>>>> check_hung_uninterruptible_tasks(). This separation clearly decouples
>>>> the mechanism of detection from the policy of reporting.
>>>>
>>>> Furthermore, to facilitate future support for concurrent hung task
>>>> detection, the global sysctl_hung_task_detect_count variable is
>>>> converted from unsigned long to atomic_long_t. Consequently, the
>>>> counting logic is updated to accumulate the number of hung tasks locally
>>>> (this_round_count) during the iteration. The global counter is then
>>>> updated atomically via atomic_long_cmpxchg_relaxed() once the loop
>>>> concludes, rather than incrementally during the scan.
>>>>
>>>> These changes are strictly preparatory and introduce no functional
>>>> change to the system's runtime behaviour.
>>>>
>>>> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
>>>> ---
>>>>    kernel/hung_task.c | 58 ++++++++++++++++++++++++++--------------------
>>>>    1 file changed, 33 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>>>> index d2254c91450b..df10830ed9ef 100644
>>>> --- a/kernel/hung_task.c
>>>> +++ b/kernel/hung_task.c
>>>> @@ -36,7 +36,7 @@ static int __read_mostly
>>>> sysctl_hung_task_check_count = PID_MAX_LIMIT;
>>>>    /*
>>>>     * Total number of tasks detected as hung since boot:
>>>>     */
>>>> -static unsigned long __read_mostly sysctl_hung_task_detect_count;
>>>> +static atomic_long_t sysctl_hung_task_detect_count =
>>>> ATOMIC_LONG_INIT(0);
>>>>    /*
>>>>     * Limit number of tasks checked in a batch.
>>>> @@ -223,31 +223,29 @@ static inline void debug_show_blocker(struct
>>>> task_struct *task, unsigned long ti
>>>>    }
>>>>    #endif
>>>> -static void check_hung_task(struct task_struct *t, unsigned long
>>>> timeout,
>>>> -        unsigned long prev_detect_count)
>>>> +/**
>>>> + * hung_task_info - Print diagnostic details for a hung task
>>>> + * @t: Pointer to the detected hung task.
>>>> + * @timeout: Timeout threshold for detecting hung tasks
>>>> + * @this_round_count: Count of hung tasks detected in the current
>>>> iteration
>>>> + *
>>>> + * Print structured information about the specified hung task, if
>>>> warnings
>>>> + * are enabled or if the panic batch threshold is exceeded.
>>>> + */
>>>> +static void hung_task_info(struct task_struct *t, unsigned long timeout,
>>>> +               unsigned long this_round_count)
>>>>    {
>>>> -    unsigned long total_hung_task;
>>>> -
>>>> -    if (!task_is_hung(t, timeout))
>>>> -        return;
>>>> -
>>>> -    /*
>>>> -     * This counter tracks the total number of tasks detected as hung
>>>> -     * since boot.
>>>> -     */
>>>> -    sysctl_hung_task_detect_count++;
>>>
>>> Previously, the global detect count updated immediately when a hung task
>>> was found. BUT now, it only updates after the full scan finishes ...
>>>
>>> Ideally, the count should update as soon as possible, so that userspace
>>> can react in time :)
>>>
>>> For example, by migrating critical containers away from the node before
>>> the situation gets worse - something we already do.
>>
>> Sorry, I should have said that earlier - just realized it ...
> 
> Better late then sorry ;-)

;P

> 
> That said, is the delay really critical? I guess that the userspace
> checks the counter in regular intervals (seconds or tens of seconds).
> Or is there any way to get a notification immediately?

Just rely on polling the counter every 0.x seconds.

I don't think that the full scan would take many seconds, but reporting
(e.g. pr_err) could be slow somehow ...

> 
> Anyway, I thought how the counting and barriers might work when
> we update the global counter immediately. And I came up with
> the following:

Nice! That should be doing the right thing.

> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 350093de0535..8bc043fbe89c 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -302,15 +302,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>   	int max_count = sysctl_hung_task_check_count;
>   	unsigned long last_break = jiffies;
>   	struct task_struct *g, *t;
> -	unsigned long total_count, this_round_count;
> +	unsigned long this_round_count;
>   	int need_warning = sysctl_hung_task_warnings;
>   	unsigned long si_mask = hung_task_si_mask;
>   
> -	/*
> -	 * The counter might get reset. Remember the initial value.
> -	 * Acquire prevents reordering task checks before this point.
> -	 */
> -	total_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
>   	/*
>   	 * If the system crashed already then all bets are off,
>   	 * do not report extra hung tasks:
> @@ -330,6 +325,13 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>   		}
>   
>   		if (task_is_hung(t, timeout)) {
> +			/*
> +			 * Increment the global counter so that userspace could
> +			 * start migrating tasks ASAP. But count the current
> +			 * round separately because userspace could reset
> +			 * the global counter at any time.
> +			 */
> +			atomic_long_inc(&sysctl_hung_task_detect_count);


Atomic increment with relaxed ordering, which is good enough and works 
well, IIUC.


>   			this_round_count++;
>   			hung_task_info(t, timeout, this_round_count);
>   		}
> @@ -340,15 +342,6 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>   	if (!this_round_count)
>   		return;
>   
> -	/*
> -	 * Do not count this round when the global counter has been reset
> -	 * during this check. Release ensures we see all hang details
> -	 * recorded during the scan.
> -	 */
> -	atomic_long_cmpxchg_release(&sysctl_hung_task_detect_count,
> -				    total_count, total_count +
> -				    this_round_count);
> -
>   	if (need_warning || hung_task_call_panic) {
>   		si_mask |= SYS_INFO_LOCKS;
>   
> 
> I am not sure of the comment above the increment is needed.
> Well, it might help anyone to understand the motivation without
> digging in the git log history.

Looks good to me. Could you send it as a follow-up patch?

Cheers,
Lance