From: Petr Mladek <pmladek@suse.com>
To: Aaron Tomlin <atomlin@atomlin.com>
Cc: akpm@linux-foundation.org, lance.yang@linux.dev,
	mhiramat@kernel.org, gregkh@linuxfoundation.org,
	joel.granados@kernel.org, neelx@suse.com, sean@ashe.io,
	mproche@gmail.com, chjohnst@gmail.com, nick.lange@gmail.com,
	linux-kernel@vger.kernel.org
Subject: Re: [v7 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
Date: Mon, 2 Feb 2026 14:26:14 +0100
Message-ID: <aYCl9iTmr175xvwN@pathway>
In-Reply-To: <20260125135848.3356585-3-atomlin@atomlin.com>

On Sun 2026-01-25 08:58:48, Aaron Tomlin wrote:
> Currently, the hung_task_detect_count sysctl provides a cumulative count
> of hung tasks since boot. In long-running, high-availability
> environments, this counter may lose its utility if it cannot be reset
> once an incident has been resolved. Furthermore, the previous
> implementation relied upon implicit ordering, which could not strictly
> guarantee that diagnostic metadata published by one CPU was visible to
> the panic logic on another.
> 
> This patch introduces the capability to reset the detection count by
> writing "0" to the hung_task_detect_count sysctl. The proc_handler logic
> has been updated to validate this input and atomically reset the
> counter.
> 
> The synchronisation of sysctl_hung_task_detect_count relies upon a
> transactional model to ensure the integrity of the detection counter
> against concurrent resets from userspace. The application of
> atomic_long_read_acquire() and atomic_long_cmpxchg_release() is correct
> and provides the following guarantees:
> 
>     1. Prevention of Load-Store Reordering via Acquire Semantics: By
>        utilising atomic_long_read_acquire() to snapshot the counter
>        before initiating the task traversal, we establish a strict
>        memory barrier. This prevents the compiler or hardware from
>        reordering the initial load to a point later in the scan. Without
>        this "acquire" barrier, a delayed load could potentially read a
>        "0" value resulting from a userspace reset that occurred
>        mid-scan. This would lead to the subsequent cmpxchg succeeding
>        erroneously, thereby overwriting the user's reset with stale
>        increment data.
> 
>     2. Atomicity of the "Commit" Phase via Release Semantics: The
>        atomic_long_cmpxchg_release() serves as the transaction's commit
>        point. The "release" barrier ensures that all diagnostic
>        recordings and task-state observations made during the scan are
>        globally visible before the counter is incremented.
> 
>     3. Race Condition Resolution: This pairing effectively detects any
>        "out-of-band" reset of the counter. If
>        sysctl_hung_task_detect_count is modified via the procfs
>        interface during the scan, the final cmpxchg will detect the
>        discrepancy between the current value and the "acquire" snapshot.
>        Consequently, the update will fail, ensuring that a reset command
>        from the administrator is prioritised over a scan that may have
>        been invalidated by that very reset.
> 
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>

LGTM, feel free to use:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr
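
For readers following the thread, the snapshot-and-commit scheme described
in the quoted commit message can be sketched in userspace C11 roughly as
follows. This is only an illustration under stated assumptions, not the
code from the patch: detect_count, reset_count() and scan_and_commit() are
hypothetical names, and the C11 acquire load and release compare-exchange
merely stand in for the kernel's atomic_long_read_acquire() and
atomic_long_cmpxchg_release().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for sysctl_hung_task_detect_count. */
static atomic_long detect_count;

/*
 * Models a userspace write to the sysctl: only "0" is accepted,
 * and it resets the counter atomically.
 */
static bool reset_count(long val)
{
	if (val != 0)
		return false;	/* any other value is rejected */
	atomic_store_explicit(&detect_count, 0, memory_order_release);
	return true;
}

/*
 * Models one detector pass.
 *   found          - number of hung tasks seen during this pass
 *   reset_mid_scan - simulate an administrator reset during the pass
 * Returns true if the increment was committed.
 */
static bool scan_and_commit(long found, bool reset_mid_scan)
{
	/* Snapshot the counter before the scan (acquire load). */
	long snap = atomic_load_explicit(&detect_count,
					 memory_order_acquire);

	/* ... the task traversal would happen here ... */
	if (reset_mid_scan)
		reset_count(0);

	if (found == 0)
		return false;	/* nothing to commit */

	/*
	 * Commit point: succeeds only if the counter still equals the
	 * snapshot. A reset that changed the value in the meantime makes
	 * this fail, so the reset wins. Note that a reset is invisible
	 * here if the snapshot was already 0.
	 */
	return atomic_compare_exchange_strong_explicit(&detect_count,
						       &snap, snap + found,
						       memory_order_release,
						       memory_order_relaxed);
}
```

In this sketch, a pass that finds two hung tasks with no interference
commits and leaves the counter at 2, while a later pass that is
interrupted by a reset fails to commit and leaves the counter at 0.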
