From: Lance Yang <lance.yang@linux.dev>
To: atomlin@atomlin.com
Cc: lance.yang@linux.dev, akpm@linux-foundation.org,
	mhiramat@kernel.org, pmladek@suse.com,
	linux-kernel@vger.kernel.org, david.laight.linux@gmail.com,
	neelx@suse.com, sean@ashe.io, chjohnst@gmail.com, steve@abita.co,
	mproche@gmail.com, nick.lange@gmail.com
Subject: Re: [PATCH v2] hung_task: Add per-round stack trace deduplication
Date: Sun, 21 Jun 2026 13:17:18 +0800
Message-ID: <20260621051718.64919-1-lance.yang@linux.dev>
In-Reply-To: <ou2kjpz7ojgu7xb2bv6hwkzpr7mqodh5oxji5fl4zdkw775zko@aaskidk2i6ka>


On Sat, Jun 20, 2026 at 01:54:56PM -0400, Aaron Tomlin wrote:
[...]
>On Sat, Jun 20, 2026 at 11:37:15AM +0800, Lance Yang wrote:
>> Hi Aaron,
>>
>> On Fri, Jun 19, 2026 at 09:35:59PM -0400, Aaron Tomlin wrote:
>> >Currently, when multiple tasks hang in the exact same location (e.g.,
>> >severe contention for a mutex), khungtaskd indiscriminately
>> >reports every single instance. This wastes ring buffer space with
>> >identical stack traces up to the defined warning limit (i.e.,
>> >kernel.hung_task_warnings), obscuring the root cause without providing
>> >any additional diagnostic value.
>> >
>> >Introduce a lightweight, hash-based stack trace deduplicator for
>> >khungtaskd to ensure only unique stack traces are reported during
>> >a single detection interval.
>> >
>> >Technical details of the implementation:
>> > - Uses a 12-bit hash table (4096 slots), consuming just 16 KB of
>> >   static memory to prevent cache thrashing during massive hangs.
>> >
>> > - Operates purely serially within the single khungtaskd thread,
>> >   requiring zero atomic operations or concurrent locking overhead.
>> >
>> > - Flushes the lossy cache via memset() at the beginning of each
>> >   detection round. This ensures the immediate "thundering herd" of
>> >   duplicates is suppressed, but guarantees the system will not
>> >   permanently suppress identical hangs that occur in future rounds.
>> >
>> > - Introduces a new sysctl, kernel.hung_task_dedup, which defaults to 1
>> >   (enabled). The sysctl is locally cached at the outset of each
>> >   interval to prevent tearing caused by concurrent userspace toggling.
>> >
>>
>> Thanks for working on this, but ... guess I'll be the bad guy here, not
>> convinced this should go in ...
>>
>> When khungtaskd fires, something is already wrong, no? I don't see why it
>> should grow a new sysctl, a stack hash table, and extra filtering logic
>> just to hide part of the report ...
>>
>> Emm ... do you have real cases where duplicate hung-task stacks caused
>> serious pain?
>>
>> If many tasks hang at once, usually one root cause, not a bunch of
>> different bugs. At least from what I've seen, any one of those stacks is
>> enough to start debugging ...
>>
>> We already have hung_task_detect_count and trace_sched_process_hang() for
>> basic counting/observability. Even if hung_task_warnings is finite and
>> the warning budget runs out, we still don't lose detections: counter gets
>> bumped and tracepoint fires before printk output is gated :)
>>
>> If someone wants stack grouping, I'd rather leave that to a tool than add
>> another policy knob to khungtaskd. Once it lands, maintainers have to
>> carry it forever. Not every nice-to-have feature is worth that cost, IMHO
>>
>> And if someone really wants more hung-task stacks in the log, we already
>> have hung_task_warnings for that. Raise it, or set it to -1.
>>
>> Also, looking at the v1 thread, I don't think the concerns there have
>> really settled yet ... If nobody replies, maybe give it a week before
>> sending a new version.
>
>Hi Lance,
>
>Thank you for taking the time to review the patch and for your candour.

I think your reply still misses most of my concerns ...

>You raise an entirely fair point regarding maintainability; every new
>control knob indeed carries a permanent cost for the maintainers, and I
>respect your caution.

Yeah, that matters, IMHO.

>To answer your question regarding real world pain: the primary issue is not
>merely visual clutter, but the premature exhaustion of the warning budget
>and the preservation of the kernel ring buffer during cascading failures.

Right, but that still sounds like a very specific case.

When khungtaskd fires, something is already wrong, no?

Even with one identical stack, per-round dedup only helps inside one
scan. The same stack can still come back in later rounds and burn through
hung_task_warnings anyway.
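
To make that arithmetic concrete, here is a minimal userspace sketch, not code
from the patch. It assumes a 4096-slot table of 32-bit hashes (which gives the
16 KB quoted in the patch description) that is cleared at the start of every
round. The names seen_this_round() and simulate() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEDUP_BITS  12
#define DEDUP_SLOTS (1u << DEDUP_BITS)	/* 4096 slots */

/* 4096 slots * 4 bytes per hash = 16 KB, the size quoted in the patch description. */
static uint32_t dedup_table[DEDUP_SLOTS];

/*
 * Hypothetical helper: returns 1 if this stack hash was already recorded in
 * the current round. Lossy by design: a colliding hash overwrites the slot.
 * A hash of 0 is never treated as seen, because 0 marks an empty slot.
 */
static int seen_this_round(uint32_t hash)
{
	uint32_t *slot = &dedup_table[hash & (DEDUP_SLOTS - 1)];

	if (hash != 0 && *slot == hash)
		return 1;
	*slot = hash;
	return 0;
}

/*
 * Model `rounds` scans in which `tasks` tasks hang with one identical stack.
 * Returns what is left of the warning budget afterwards.
 */
static int simulate(int rounds, int tasks, int budget, int dedup)
{
	const uint32_t hash = 0xdeadbeefu;	/* stands in for one stack's hash */

	assert(rounds >= 0 && tasks >= 0 && budget >= 0);

	for (int r = 0; r < rounds; r++) {
		/* Per-round flush, as in the patch description. */
		memset(dedup_table, 0, sizeof(dedup_table));

		for (int t = 0; t < tasks; t++) {
			if (dedup && seen_this_round(hash))
				continue;	/* duplicate suppressed */
			if (budget > 0)
				budget--;	/* one report printed */
		}
	}
	return budget;
}
```

Under these assumptions simulate(1, 10, 10, 1) leaves 9 of the 10 warnings,
but simulate(10, 10, 10, 1) leaves 0: ten rounds of the same stack still
drain the budget.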

And under heavy contention, I would not expect only one stack anyway.
Different tasks can hang behind different locks or different callers, and
those stacks can still burn through the warning budget.

>In our production environments, we typically leave
>kernel.hung_task_warnings at its default value of 10. If a severe lock
>contention occurs, a single bottleneck can easily cause 10 tasks to hang
>simultaneously with the exact same stack trace. Under the current logic,

Not sure I buy this premise :)

Same bottleneck does not necessarily mean exactly the same stack.
Different callers can block on the same lock, and exact-stack dedup won't
help there.
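
A rough illustration of that point, as a sketch only: the addresses are made
up, and FNV-1a is an assumed stand-in for whatever hash the patch uses. Two
tasks block on the same lock through the same slow path, but arrive from
different callers, so their exact-stack hashes differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over the bytes of each return address; an assumed hash, chosen for brevity. */
static uint32_t stack_hash(const uint64_t *frames, size_t n)
{
	uint32_t h = 2166136261u;

	assert(frames != NULL);
	for (size_t i = 0; i < n; i++) {
		for (int b = 0; b < 8; b++) {
			h ^= (uint32_t)((frames[i] >> (8 * b)) & 0xff);
			h *= 16777619u;
		}
	}
	return h;
}

/* Made-up addresses: both stacks share the lock slow path, only the caller differs. */
static const uint64_t via_caller_a[] = {
	0xffffffff81000100ULL,	/* schedule */
	0xffffffff81000200ULL,	/* mutex slow path */
	0xffffffff81001000ULL,	/* caller A */
};
static const uint64_t via_caller_b[] = {
	0xffffffff81000100ULL,	/* schedule */
	0xffffffff81000200ULL,	/* mutex slow path */
	0xffffffff81002000ULL,	/* caller B */
};

/* Returns 1 if exact-stack dedup would fold the two stacks together. */
static int folded_together(void)
{
	return stack_hash(via_caller_a, 3) == stack_hash(via_caller_b, 3);
}
```

Here folded_together() returns 0: one bottleneck, two distinct hashes, and
both reports still count against the warning budget.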

At least from cases I've looked at, I can't really recall seeing this
exact pattern often enough to justify a new khungtaskd knob.

>those 10 identical traces will completely exhaust the warning budget.
>Consequently, the kernel is left entirely blind to any subsequent or
>completely unrelated deadlocks that might be occurring concurrently, as all
>further reports are silenced.

I don't think "entirely blind" is accurate.

hung_task_warnings *only* gates printk. We still bump
hung_task_detect_count and hit trace_sched_process_hang() before that
gate.
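
The ordering can be modelled in a few lines of userspace C. This is a
simplified sketch, not the actual kernel/hung_task.c code: struct hang_stats,
report_hang() and simulate_hangs() are illustrative names, and the only
behaviour assumed is "count and trace first, then gate the print", with -1
meaning an unlimited budget.

```c
#include <assert.h>

/* Simplified model; field names are illustrative, not kernel symbols. */
struct hang_stats {
	unsigned long detected;	/* stands in for hung_task_detect_count */
	unsigned long traced;	/* stands in for trace_sched_process_hang() hits */
	unsigned long printed;	/* reports that reached printk */
	int warnings;		/* stands in for hung_task_warnings; -1 = unlimited */
};

static void report_hang(struct hang_stats *s)
{
	/* Counting and tracing happen first, regardless of the budget. */
	s->detected++;
	s->traced++;

	/* Only the printed report is gated. */
	if (s->warnings == 0)
		return;
	if (s->warnings > 0)
		s->warnings--;
	s->printed++;
}

/* Feed `n` hung tasks through the model with the given starting budget. */
static struct hang_stats simulate_hangs(int n, int warnings)
{
	struct hang_stats s = { 0, 0, 0, warnings };

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		report_hang(&s);
	return s;
}
```

With 25 hangs and a budget of 10, this model prints 10 reports but still
records 25 detections and 25 trace events. With a budget of -1 it prints all
25.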

>Furthermore, dumping a full stack trace for every duplicate rapidly injects
>several lines of identical noise into dmesg. We have found that this
>sudden burst frequently rolls the circular ring buffer.
>
>Userspace tooling is unfortunately unable to group or analyse logs that
>have already been evicted before the tool could read them, nor can it
>recover traces the kernel silently dropped due to an exhausted budget.
>
>The deduplicator acts as a telemetry filter, ensuring that the limited
>warning budget is spent strictly on unique traces rather than redundant
>noise, thereby preserving the history of the crash and ensuring secondary
>failures are not obscured.
>
>I wanted to clarify the exact operational context and the limitation of
>relying on userspace. Please let me know if this operational context alters
>your perspective at all.
[...]

Aaron, you've done good work in khungtaskd, and some of it is upstream
already. I do appreciate that!

But this one feels different. Useful locally, maybe, but not something
the kernel should carry forever.

Anyway, I'll stop here. Still a nack from my side.

Thanks, Lance
