From: Aaron Tomlin
To: akpm@linux-foundation.org, lance.yang@linux.dev, mhiramat@kernel.org, pmladek@suse.com
Cc: linux-kernel@vger.kernel.org, david.laight.linux@gmail.com, atomlin@atomlin.com, neelx@suse.com, sean@ashe.io, chjohnst@gmail.com, steve@abita.co, mproche@gmail.com, nick.lange@gmail.com
Subject: [PATCH v3] hung_task: deduplicate identical hang reports
Date: Sun, 21 Jun 2026 17:37:56 -0400
Message-ID: <20260621213756.43225-1-atomlin@atomlin.com>
X-Mailer: git-send-email 2.51.0
MIME-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 8bit
X-Mailing-List: linux-kernel@vger.kernel.org

Currently, during severe lock contention, multiple tasks can hang while
waiting on the exact same resource. The khungtaskd kthread reports every
single instance with a full stack trace. This can wrap the kernel ring
buffer and prematurely exhaust the kernel.hung_task_warnings budget.
Consequently, subsequent, unrelated deadlocks go unreported.

To preserve the warning budget and ring buffer without sacrificing
observability, introduce a deduplicator based on the Wait Channel (wchan)
and per-task state:

  1. Implement a lightweight, stack-allocated 64-slot wchan hash map.
     Tasks blocked on the exact same wchan during a single scan are
     recognised as sharing the same bottleneck, so the contention is
     deduplicated even when the callers arrive via entirely different
     call stacks.

  2. Introduce a hung_task_reported bit-field in task_struct. If a task
     remains hung across multiple intervals, khungtaskd recognises that
     it has already been reported. The bit is cleared, without locks or
     atomics, as soon as khungtaskd observes that the task's context
     switch count has changed.

  3. For duplicate tasks, the single-line "INFO: task ..." message is
     still printed and the trace_sched_process_hang() tracepoint still
     fires. Only the calls to sched_show_task() and debug_show_blocker()
     are skipped; a concise suppression notice is printed instead.

Signed-off-by: Aaron Tomlin
--
Changes since v2:
 - Replaced the per-round cache flush with a task_struct bit-field for
   persistent cross-scan tracking, mitigating delayed budget exhaustion
 - Abandoned exact-stack hashing in favour of Wait Channel hashing
 - Transitioned from jhash() to hash_long() to optimise single-pointer
   hashing, and relocated the hash map to the local stack
 - Linked to v2: https://lore.kernel.org/lkml/20260620013559.1537893-1-atomlin@atomlin.com/

Changes since v1:
 - Preserve "INFO:" headers for all hung tasks; suppress only the stack
   dumps for duplicates (Masami Hiramatsu)
 - Print a clear notification when a trace is explicitly suppressed
 - Add #ifdef CONFIG_STACKTRACE guards to prevent Kconfig build errors
 - Optimise overhead by unwinding the stack only if a warning is actually
   going to be printed
 - Linked to v1: https://lore.kernel.org/lkml/20260617184841.1447955-1-atomlin@atomlin.com/
---
 include/linux/sched.h |  3 +++
 kernel/hung_task.c    | 32 ++++++++++++++++++++++++++++----
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b3204a15d512..e76cf221cc78 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1046,6 +1046,9 @@ struct task_struct {
 	/* Used by page_owner=on to detect recursion in page tracking. */
 	unsigned in_page_owner:1;
 #endif
+#ifdef CONFIG_DETECT_HUNG_TASK
+	unsigned hung_task_reported:1;
+#endif
 #ifdef CONFIG_EVENTFD
 	/* Recursion prevention for eventfd_signal() */
 	unsigned in_eventfd:1;
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 6fcc94ce4ca9..5dcce0e7041b 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -125,6 +126,7 @@ static bool task_is_hung(struct task_struct *t, unsigned long timeout)
 	if (switch_count != t->last_switch_count) {
 		t->last_switch_count = switch_count;
 		t->last_switch_time = jiffies;
+		t->hung_task_reported = 0;
 		return false;
 	}
 	if (time_is_after_jiffies(t->last_switch_time + timeout * HZ))
@@ -228,12 +230,14 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
  * @t: Pointer to the detected hung task.
  * @timeout: Timeout threshold for detecting hung tasks
  * @this_round_count: Count of hung tasks detected in the current iteration
+ * @skip_show_task: Indicating if stack trace should be skipped
  *
  * Print structured information about the specified hung task, if warnings
  * are enabled or if the panic batch threshold is exceeded.
  */
 static void hung_task_info(struct task_struct *t, unsigned long timeout,
-			   unsigned long this_round_count)
+			   unsigned long this_round_count,
+			   unsigned int skip_show_task)
 {
 	trace_sched_process_hang(t);
 
@@ -261,8 +265,12 @@ static void hung_task_info(struct task_struct *t, unsigned long timeout,
 		pr_err(" Blocked by coredump.\n");
 	pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
 		" disables this message.\n");
-	sched_show_task(t);
-	debug_show_blocker(t, timeout);
+	if (!skip_show_task) {
+		sched_show_task(t);
+		debug_show_blocker(t, timeout);
+	} else {
+		pr_err(" Stack trace suppressed. Already reported or duplicate wchan\n");
+	}
 	if (!sysctl_hung_task_warnings)
 		pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
 
@@ -306,6 +314,9 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	unsigned long this_round_count;
 	int need_warning = sysctl_hung_task_warnings;
 	unsigned long si_mask = hung_task_si_mask;
+	unsigned long wchan, wchan_hash[64] = { 0 };
+	unsigned int hash;
+	unsigned int skip_show_task;
 
 	/*
 	 * If the system crashed already then all bets are off,
@@ -326,6 +337,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 		}
 
 		if (task_is_hung(t, timeout)) {
+			skip_show_task = t->hung_task_reported;
 			/*
 			 * Increment the global counter so that userspace could
 			 * start migrating tasks ASAP. But count the current
@@ -334,7 +346,19 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 			 */
 			atomic_long_inc(&sysctl_hung_task_detect_count);
 			this_round_count++;
-			hung_task_info(t, timeout, this_round_count);
+
+			wchan = get_wchan(t);
+			if (wchan) {
+				hash = hash_long(wchan, 6);
+				if (wchan_hash[hash] == wchan)
+					skip_show_task = 1;
+				else
+					wchan_hash[hash] = wchan;
+			}
+
+			hung_task_info(t, timeout, this_round_count,
+				       skip_show_task);
+			t->hung_task_reported = 1;
 		}
 	}
  unlock:
-- 
2.51.0
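
For readers who want to experiment with the deduplication logic outside
the kernel, the following is a minimal user-space sketch of the two
mechanisms described above: the per-scan wchan table and the per-task
"already reported" bit. It is not part of the patch. The names
struct fake_task, struct wchan_table, wchan_slot(), wchan_seen(),
task_made_progress() and scan_hung_task() are hypothetical, and
wchan_slot() is only assumed to approximate the 64-bit behaviour of
hash_long(wchan, 6) via a multiplicative hash.

```c
#include <stdbool.h>
#include <stdint.h>

#define WCHAN_BITS  6
#define WCHAN_SLOTS (1u << WCHAN_BITS)	/* 64 slots, as in the patch */

/* Hypothetical stand-in for a task; only the fields the sketch needs. */
struct fake_task {
	uint64_t wchan;			/* address the task sleeps on, 0 if unknown */
	unsigned long switch_count;	/* bumped whenever the task runs */
	unsigned long last_switch_count;
	unsigned int reported : 1;	/* models hung_task_reported */
};

/* One table per scan; zero-initialise it before each scan. */
struct wchan_table {
	uint64_t slot[WCHAN_SLOTS];
};

/* Assumed approximation of hash_long(): multiply, keep the top bits. */
static unsigned int wchan_slot(uint64_t wchan)
{
	return (unsigned int)((wchan * 0x61C8864680B583EBull) >>
			      (64 - WCHAN_BITS));
}

/* Returns true if this wchan was already recorded during this scan. */
static bool wchan_seen(struct wchan_table *tbl, uint64_t wchan)
{
	unsigned int idx;

	if (!wchan)
		return false;	/* unknown wait channel: never deduplicated */

	idx = wchan_slot(wchan);
	if (tbl->slot[idx] == wchan)
		return true;

	/* A colliding entry is overwritten; worst case is an extra dump. */
	tbl->slot[idx] = wchan;
	return false;
}

/* Clears the reported bit once the switch count is seen to have moved. */
static bool task_made_progress(struct fake_task *t)
{
	if (t->switch_count != t->last_switch_count) {
		t->last_switch_count = t->switch_count;
		t->reported = 0;
		return true;
	}
	return false;
}

/* For a task judged hung: returns true if the stack dump is skipped. */
static bool scan_hung_task(struct wchan_table *tbl, struct fake_task *t)
{
	bool skip = t->reported;

	if (wchan_seen(tbl, t->wchan))
		skip = true;

	t->reported = 1;
	return skip;
}
```

In this sketch a fresh, zeroed struct wchan_table corresponds to one
pass of khungtaskd. Because the table is direct-mapped with no
chaining, two different wait channels that hash to the same slot evict
each other, which can only produce an additional stack dump and never
hides a distinct one.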