From: Lance Yang
To: atomlin@atomlin.com
Cc: akpm@linux-foundation.org, lance.yang@linux.dev, mhiramat@kernel.org,
 pmladek@suse.com, linux-kernel@vger.kernel.org,
 david.laight.linux@gmail.com, neelx@suse.com, sean@ashe.io,
 chjohnst@gmail.com, steve@abita.co, mproche@gmail.com,
 nick.lange@gmail.com
Subject: Re: [PATCH v2] hung_task: Add per-round stack trace deduplication
Date: Sat, 20 Jun 2026 11:37:15 +0800
Message-Id: <20260620033715.71108-1-lance.yang@linux.dev>
In-Reply-To: <20260620013559.1537893-1-atomlin@atomlin.com>
References: <20260620013559.1537893-1-atomlin@atomlin.com>

Hi Aaron,

On Fri, Jun 19, 2026 at 09:35:59PM -0400, Aaron Tomlin wrote:
>Currently, when multiple tasks hang in the exact same location (e.g.,
>such as severe contention for a mutex), khungtaskd indiscriminately
>reports every single instance. This wastes ring buffer space with
>identical stack traces up to the defined warning limit (i.e.,
>kernel.hung_task_warnings), obscuring the root cause without providing
>any additional diagnostic value.
>
>Introduce a lightweight, hash-based stack trace deduplicator for
>khungtaskd to ensure only unique stack traces are reported during
>a single detection interval.
>
>Technical details of the implementation:
> - Uses a 12-bit hash table (4096 slots), consuming just 16 KB of
>   static memory to prevent cache thrashing during massive hangs.
>
> - Operates purely serially within the single khungtaskd thread,
>   requiring zero atomic operations or concurrent locking overhead.
>
> - Flushes the lossy cache via memset() at the beginning of each
>   detection round. This ensures the immediate "thundering herd" of
>   duplicates is suppressed, but guarantees the system will not
>   permanently suppress identical hangs that occur in future rounds.
>
> - Introduces a new sysctl, kernel.hung_task_dedup, which defaults to 1
>   (enabled). The sysctl is locally cached at the outset of each
>   interval to prevent tearing caused by concurrent userspace toggling.
>

Thanks for working on this, but ... guess I'll be the bad guy here, I'm
not convinced this should go in ...

When khungtaskd fires, something is already wrong, no? I don't see why
it should grow a new sysctl, a stack hash table, and extra filtering
logic just to hide part of the report ...

Emm ... do you have real cases where duplicate hung-task stacks caused
serious pain?

If many tasks hang at once, there is usually one root cause, not a bunch
of different bugs. At least from what I've seen, any one of those stacks
is enough to start debugging ...

We already have hung_task_detect_count and trace_sched_process_hang()
for basic counting/observability. Even if hung_task_warnings is finite
and the warning budget runs out, we still don't lose detections: the
counter gets bumped and the tracepoint fires before printk output is
gated :)

If someone wants stack grouping, I'd rather leave that to a tool than
add another policy knob to khungtaskd.

Once it lands, maintainers have to carry it forever. Not every
nice-to-have feature is worth that cost, IMHO.

And if someone really wants more hung-task stacks in the log, we already
have hung_task_warnings for that. Raise it, or set it to -1.

Also, looking at the v1 thread, I don't think the concerns there have
really settled yet ... If nobody replies, maybe give it a week before
sending a new version.

Thanks,
Lance
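
For anyone following the thread without the patch at hand, the scheme
described in the quoted changelog (a 12-bit lossy table that is cleared
with memset() at the start of each round) can be modelled in userspace
roughly as below. This is only a sketch under assumptions: the names
dedup_table, stack_hash(), dedup_round_start() and dedup_seen() are
hypothetical, FNV-1a is an assumed stand-in for whatever hash the patch
actually uses, and none of this is taken from the patch itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEDUP_BITS  12
#define DEDUP_SLOTS (1u << DEDUP_BITS)	/* 4096 slots */

/* 4096 * sizeof(uint32_t) = 16 KB, the size quoted in the changelog. */
static uint32_t dedup_table[DEDUP_SLOTS];

/* Hypothetical hash: FNV-1a over the bytes of each return address. */
static uint32_t stack_hash(const uintptr_t *entries, size_t n)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < n; i++) {
		uintptr_t v = entries[i];

		for (size_t b = 0; b < sizeof(v); b++) {
			h ^= (uint8_t)(v >> (8 * b));
			h *= 16777619u;
		}
	}
	return h ? h : 1;	/* 0 is reserved to mean "empty slot" */
}

/* Called once at the beginning of each detection round. */
static void dedup_round_start(void)
{
	memset(dedup_table, 0, sizeof(dedup_table));
}

/*
 * Returns 1 if this stack was already recorded in the current round,
 * 0 otherwise (and records it). The table is lossy: a different hash
 * landing in the same slot simply overwrites the previous entry.
 */
static int dedup_seen(const uintptr_t *entries, size_t n)
{
	uint32_t h = stack_hash(entries, n);
	uint32_t *slot = &dedup_table[h & (DEDUP_SLOTS - 1)];

	if (*slot == h)
		return 1;
	*slot = h;
	return 0;
}
```

A caller would invoke dedup_round_start() once per round and skip the
stack dump whenever dedup_seen() returns 1. No locking appears in the
sketch because, as the changelog says, everything runs serially in a
single thread.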