From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-171.mta1.migadu.com (out-171.mta1.migadu.com [95.215.58.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7E4C3318B8F for ; Tue, 3 Feb 2026 11:01:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.171 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770116475; cv=none; b=qpAU51qfZPiPrpCyCopX+jgiuZmzqZGAmY3ISnO835SncwxtVGE1mJAkFpvPoNVPXYoHPTmIvzfkmCiP97KX7abQMO0hlyKwk4uZKo1CV6Ojlxl0pWiuodajJEMmYdPEIq7daVb68Obtw53Rzv5sEGbTCicbH7b67Sb7m8bbA0U= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770116475; c=relaxed/simple; bh=5rsP0mkRVAsLfr6lj3Ko33DsHP732qBDzGkPOffSh3A=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=pHkl3E2spnBBykZaTWqPbJOz5pERiXOKwEOp2mDjBod9cznZDyO/ZhUvonNl9g0PHmnPC35ijzXokqYrqgNFJIcvIQ5k8W0iESk6dIkILhZHw5G6d+iTrQQkYmjxlQo+5gRhiMHniSs/WTNTOaBqERzxIsyXh/C+W2oLPpei7qY= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=IbpbLA6B; arc=none smtp.client-ip=95.215.58.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="IbpbLA6B" Message-ID: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1770116471; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=EkNXyjSvllkhiX+Yl/TG0ClVHSZgyvMSl7O3AUU5XwY=; b=IbpbLA6BeVYxMzt8oiPqK8c3EoamrYHr9kC13j2rRccSfiMB44hez6sL3Dtvy6zYrATe87 nvKIBDMQpwuiFw4x0g9gO4oeQA7PPvTTM/0jtmqCgKPCJ69Dk6Bmj62TopJyVKyBRRaiwM BuXpMiwJ3olBcCu6dRCHXrm1o/oiglo= Date: Tue, 3 Feb 2026 19:01:01 +0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Subject: Re: [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count Content-Language: en-US To: Petr Mladek Cc: Aaron Tomlin , neelx@suse.com, sean@ashe.io, akpm@linux-foundation.org, mproche@gmail.com, chjohnst@gmail.com, nick.lange@gmail.com, linux-kernel@vger.kernel.org, mhiramat@kernel.org, joel.granados@kernel.org, gregkh@linuxfoundation.org References: <20260125135848.3356585-1-atomlin@atomlin.com> <20260125135848.3356585-2-atomlin@atomlin.com> <45c17b93-a6f1-45e0-8b25-20665a281949@linux.dev> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Lance Yang In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT On 2026/2/3 17:03, Petr Mladek wrote: > On Tue 2026-02-03 11:08:33, Lance Yang wrote: >> On 2026/2/3 11:05, Lance Yang wrote: >>> On 2026/1/25 21:58, Aaron Tomlin wrote: >>>> The check_hung_task() function currently conflates two distinct >>>> responsibilities: validating whether a task is hung and handling the >>>> subsequent reporting (printing warnings, triggering panics, or >>>> tracepoints). >>>> >>>> This patch refactors the logic by introducing hung_task_info(), a >>>> function dedicated solely to reporting. The actual detection check, >>>> task_is_hung(), is hoisted into the primary loop within >>>> check_hung_uninterruptible_tasks(). This separation clearly decouples >>>> the mechanism of detection from the policy of reporting. >>>> >>>> Furthermore, to facilitate future support for concurrent hung task >>>> detection, the global sysctl_hung_task_detect_count variable is >>>> converted from unsigned long to atomic_long_t. Consequently, the >>>> counting logic is updated to accumulate the number of hung tasks locally >>>> (this_round_count) during the iteration. The global counter is then >>>> updated atomically via atomic_long_cmpxchg_relaxed() once the loop >>>> concludes, rather than incrementally during the scan. >>>> >>>> These changes are strictly preparatory and introduce no functional >>>> change to the system's runtime behaviour. >>>> >>>> Signed-off-by: Aaron Tomlin >>>> --- >>>>   kernel/hung_task.c | 58 ++++++++++++++++++++++++++-------------------- >>>>   1 file changed, 33 insertions(+), 25 deletions(-) >>>> >>>> diff --git a/kernel/hung_task.c b/kernel/hung_task.c >>>> index d2254c91450b..df10830ed9ef 100644 >>>> --- a/kernel/hung_task.c >>>> +++ b/kernel/hung_task.c >>>> @@ -36,7 +36,7 @@ static int __read_mostly >>>> sysctl_hung_task_check_count = PID_MAX_LIMIT; >>>>   /* >>>>    * Total number of tasks detected as hung since boot: >>>>    */ >>>> -static unsigned long __read_mostly sysctl_hung_task_detect_count; >>>> +static atomic_long_t sysctl_hung_task_detect_count = >>>> ATOMIC_LONG_INIT(0); >>>>   /* >>>>    * Limit number of tasks checked in a batch. >>>> @@ -223,31 +223,29 @@ static inline void debug_show_blocker(struct >>>> task_struct *task, unsigned long ti >>>>   } >>>>   #endif >>>> -static void check_hung_task(struct task_struct *t, unsigned long >>>> timeout, >>>> -        unsigned long prev_detect_count) >>>> +/** >>>> + * hung_task_info - Print diagnostic details for a hung task >>>> + * @t: Pointer to the detected hung task. >>>> + * @timeout: Timeout threshold for detecting hung tasks >>>> + * @this_round_count: Count of hung tasks detected in the current >>>> iteration >>>> + * >>>> + * Print structured information about the specified hung task, if >>>> warnings >>>> + * are enabled or if the panic batch threshold is exceeded. >>>> + */ >>>> +static void hung_task_info(struct task_struct *t, unsigned long timeout, >>>> +               unsigned long this_round_count) >>>>   { >>>> -    unsigned long total_hung_task; >>>> - >>>> -    if (!task_is_hung(t, timeout)) >>>> -        return; >>>> - >>>> -    /* >>>> -     * This counter tracks the total number of tasks detected as hung >>>> -     * since boot. >>>> -     */ >>>> -    sysctl_hung_task_detect_count++; >>> >>> Previously, the global detect count updated immediately when a hung task >>> was found. BUT now, it only updates after the full scan finishes ... >>> >>> Ideally, the count should update as soon as possible, so that userspace >>> can react in time :) >>> >>> For example, by migrating critical containers away from the node before >>> the situation gets worse - something we already do. >> >> Sorry, I should have said that earlier - just realized it ... > > Better late then sorry ;-) ;P > > That said, is the delay really critical? I guess that the userspace > checks the counter in regular intervals (seconds or tens of seconds). > Or is there any way to get a notification immediately? Just rely on polling the counter every 0.x seconds. I don't think that the full scan would take many seconds, but reporting (e.g. pr_err) could be slow somehow ... > > Anyway, I thought how the counting and barriers might work when > we update the global counter immediately. And I came up with > the following: Nice! That should be doing the right thing. > > diff --git a/kernel/hung_task.c b/kernel/hung_task.c > index 350093de0535..8bc043fbe89c 100644 > --- a/kernel/hung_task.c > +++ b/kernel/hung_task.c > @@ -302,15 +302,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout) > int max_count = sysctl_hung_task_check_count; > unsigned long last_break = jiffies; > struct task_struct *g, *t; > - unsigned long total_count, this_round_count; > + unsigned long this_round_count; > int need_warning = sysctl_hung_task_warnings; > unsigned long si_mask = hung_task_si_mask; > > - /* > - * The counter might get reset. Remember the initial value. > - * Acquire prevents reordering task checks before this point. > - */ > - total_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count); > /* > * If the system crashed already then all bets are off, > * do not report extra hung tasks: > @@ -330,6 +325,13 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout) > } > > if (task_is_hung(t, timeout)) { > + /* > + * Increment the global counter so that userspace could > + * start migrating tasks ASAP. But count the current > + * round separately because userspace could reset > + * the global counter at any time. > + */ > + atomic_long_inc(&sysctl_hung_task_detect_count); Atomic increment with relaxed ordering, which is good enough and works well, IIUC. > this_round_count++; > hung_task_info(t, timeout, this_round_count); > } > @@ -340,15 +342,6 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout) > if (!this_round_count) > return; > > - /* > - * Do not count this round when the global counter has been reset > - * during this check. Release ensures we see all hang details > - * recorded during the scan. > - */ > - atomic_long_cmpxchg_release(&sysctl_hung_task_detect_count, > - total_count, total_count + > - this_round_count); > - > if (need_warning || hung_task_call_panic) { > si_mask |= SYS_INFO_LOCKS; > > > I am not sure of the comment above the increment is needed. > Well, it might help anyone to understand the motivation without > digging in the git log history. Looks good to me. Could you send it as a follow-up patch? Cheers, Lance