public inbox for linux-kernel@vger.kernel.org
From: Petr Mladek <pmladek@suse.com>
To: Aaron Tomlin <atomlin@atomlin.com>
Cc: akpm@linux-foundation.org, lance.yang@linux.dev,
	mhiramat@kernel.org, gregkh@linuxfoundation.org,
	joel.granados@kernel.org, neelx@suse.com, sean@ashe.io,
	mproche@gmail.com, chjohnst@gmail.com, nick.lange@gmail.com,
	linux-kernel@vger.kernel.org
Subject: Re: [v7 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
Date: Mon, 2 Feb 2026 14:26:14 +0100
Message-ID: <aYCl9iTmr175xvwN@pathway>
In-Reply-To: <20260125135848.3356585-3-atomlin@atomlin.com>

On Sun 2026-01-25 08:58:48, Aaron Tomlin wrote:
> Currently, the hung_task_detect_count sysctl provides a cumulative count
> of hung tasks since boot. In long-running, high-availability
> environments, this counter may lose its utility if it cannot be reset
> once an incident has been resolved. Furthermore, the previous
> implementation relied upon implicit ordering, which could not strictly
> guarantee that diagnostic metadata published by one CPU was visible to
> the panic logic on another.
> 
> This patch introduces the capability to reset the detection count by
> writing "0" to the hung_task_detect_count sysctl. The proc_handler logic
> has been updated to validate this input and atomically reset the
> counter.
> 
> The synchronisation of sysctl_hung_task_detect_count relies upon a
> transactional model to ensure the integrity of the detection counter
> against concurrent resets from userspace. The application of
> atomic_long_read_acquire() and atomic_long_cmpxchg_release() is correct
> and provides the following guarantees:
> 
>     1. Prevention of Load-Store Reordering via Acquire Semantics
>        By utilising atomic_long_read_acquire() to snapshot the counter
>        before initiating the task traversal, we establish a strict
>        memory barrier. This prevents the compiler or hardware from
>        reordering the initial load to a point later in the scan. Without
>        this "acquire" barrier, a delayed load could potentially read a
>        "0" value resulting from a userspace reset that occurred
>        mid-scan. This would lead to the subsequent cmpxchg succeeding
>        erroneously, thereby overwriting the user's reset with stale
>        increment data.
> 
>     2. Atomicity of the "Commit" Phase via Release Semantics
>        The atomic_long_cmpxchg_release() serves as the transaction's commit
>        point. The "release" barrier ensures that all diagnostic
>        recordings and task-state observations made during the scan are
>        globally visible before the counter is incremented.
> 
>     3. Race Condition Resolution
>        This pairing effectively detects any "out-of-band" reset of the
>        counter. If
>        sysctl_hung_task_detect_count is modified via the procfs
>        interface during the scan, the final cmpxchg will detect the
>        discrepancy between the current value and the "acquire" snapshot.
>        Consequently, the update will fail, ensuring that a reset command
>        from the administrator is prioritised over a scan that may have
>        been invalidated by that very reset.
> 
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
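
The snapshot/commit model described above can be sketched in userspace C
with C11 atomics. This is only an illustration, not the code from the
patch: detect_count, scan_begin(), scan_commit() and detect_count_write()
are hypothetical names, and the C11 acquire load and release
compare-exchange stand in for atomic_long_read_acquire() and
atomic_long_cmpxchg_release().

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for sysctl_hung_task_detect_count. */
static atomic_long detect_count;

/* Snapshot the counter with acquire ordering before the scan starts. */
static long scan_begin(void)
{
	return atomic_load_explicit(&detect_count, memory_order_acquire);
}

/*
 * Commit point: add `found` only if the counter still equals the
 * snapshot. Returns false and leaves the counter untouched when it
 * was changed during the scan (for example by a reset).
 */
static bool scan_commit(long snapshot, long found)
{
	long desired = snapshot + found;

	if (found == 0)
		return true;	/* nothing to publish */

	return atomic_compare_exchange_strong_explicit(&detect_count,
			&snapshot, desired,
			memory_order_release, memory_order_relaxed);
}

/*
 * Hypothetical write handler: accept only "0" (with an optional
 * trailing newline) and reset the counter; reject anything else.
 */
static int detect_count_write(const char *buf)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(buf, &end, 10);
	if (end == buf || errno)
		return -EINVAL;
	if (*end == '\n')
		end++;
	if (*end != '\0' || val != 0)
		return -EINVAL;

	atomic_store_explicit(&detect_count, 0, memory_order_release);
	return 0;
}
```

In this sketch, a scan that snapshots 3 and then sees a reset to 0 fails
its commit, so the reset wins. A scan that snapshots 0 and sees a reset
to 0 still commits, because the compare-exchange cannot tell the two
zeros apart.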

LGTM, feel free to use:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr
