From: Lance Yang
Date: Tue, 14 Oct 2025 19:40:58 +0800
Subject: Re: [External Mail] Re: [PATCH][v3] hung_task: Panic after fixed number of hung tasks
To: "Li,Rongqing", Petr Mladek
Cc: "wireguard@lists.zx2c4.com", "linux-arm-kernel@lists.infradead.org",
 "Liam R . Howlett", "linux-doc@vger.kernel.org", David Hildenbrand,
 Randy Dunlap, Stanislav Fomichev, "linux-aspeed@lists.ozlabs.org",
 Andrew Jeffery, Joel Stanley, Russell King, Lorenzo Stoakes, Shuah Khan,
 Steven Rostedt, Jonathan Corbet, Joel Granados, Andrew Morton, Phil Auld,
 "linux-kernel@vger.kernel.org", "linux-kselftest@vger.kernel.org",
 Masami Hiramatsu, Jakub Kicinski, Pawan Gupta, Simon Horman,
 Anshuman Khandual, Florian Westphal, "netdev@vger.kernel.org", Kees Cook,
 Arnd Bergmann, "Paul E . McKenney", Feng Tang, "Jason A . Donenfeld"
Message-ID: <096168a6-8687-4dae-a774-0741d3e5a891@linux.dev>
In-Reply-To: <38af4922ca44433fa7cd168f7c520dc9@baidu.com>
References: <20251012115035.2169-1-lirongqing@baidu.com>
 <588c1935-835f-4cab-9679-f31c1e903a9a@linux.dev>
 <3acdcd15-7e52-4a9a-9492-a434ed609dcc@linux.dev>
 <38af4922ca44433fa7cd168f7c520dc9@baidu.com>
List-Id: Development discussion of WireGuard

On 2025/10/14 19:18, Li,Rongqing wrote:
>>>>> Currently, when 'hung_task_panic' is enabled, the kernel panics
>>>>> immediately upon detecting the first hung task. However, some hung
>>>>> tasks are transient and the system can recover, while others are
>>>>> persistent and may accumulate progressively.
>>>
>>> My understanding is that this patch wanted to do:
>>>
>>>    + report even temporary stalls
>>>    + panic only when the stall was much longer and likely persistent
>>>
>>> Which might make some sense. But the code does something else.
>>
>> Cool. Sounds good to me!
>>
>>>
>>>>> --- a/kernel/hung_task.c
>>>>> +++ b/kernel/hung_task.c
>>>>> @@ -229,9 +232,11 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
>>>>>  	 */
>>>>>  	sysctl_hung_task_detect_count++;
>>>>> +	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
>>>>>  	trace_sched_process_hang(t);
>>>>> -	if (sysctl_hung_task_panic) {
>>>>> +	if (sysctl_hung_task_panic &&
>>>>> +	    (total_hung_task >= sysctl_hung_task_panic)) {
>>>>>  		console_verbose();
>>>>>  		hung_task_show_lock = true;
>>>>>  		hung_task_call_panic = true;
>>>
>>> I would expect that this patch added another counter, similar to
>>> sysctl_hung_task_detect_count. It would be incremented only once per
>>> check when a hung task was detected. And it would be cleared (reset)
>>> when no hung task was found.
>>
>> Much cleaner. We could add an internal counter for that, yeah. No need to
>> expose it to userspace ;)
>>
>> Petr's suggestion seems to align better with the goal of panicking on
>> persistent hangs, IMHO. Panic after N consecutive checks with hung tasks.
>>
>> @RongQing does that work for you?
>
>
> In my opinion, a single task hang is not a critical issue. Fatal hangs,
> such as those caused by I/O hangs, network card failures, or hangs while
> holding locks, will inevitably lead to multiple tasks being hung. In such
> scenarios, users cannot even log in to the machine, making it extremely
> difficult to investigate the root cause. Therefore, I believe the current
> approach is sound. What's your opinion?

Thanks! I'm fine with either approach. Let's hear what the other folks
think ;)
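
To make the difference between the two policies discussed above concrete, here is a minimal userspace sketch in C. It is not kernel code and is not taken from the patch: the function names and the hung_per_check array are hypothetical, and it assumes that prev_detect_count in the quoted diff is a snapshot taken at the start of each scan, so that total_hung_task counts the hung tasks found within one scan.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model only; none of these names exist in kernel/hung_task.c.
 * hung_per_check[i] is the number of hung tasks seen in the i-th scan.
 * Both functions return the index of the scan that would trigger a panic,
 * or -1 if no scan does. A threshold of 0 disables panicking, mirroring
 * the "sysctl_hung_task_panic &&" test in the quoted diff.
 */

/* Policy A (assumed reading of the patch): panic once a single scan
 * has found at least `threshold` hung tasks. */
int panic_on_task_count(const unsigned int *hung_per_check, size_t n,
			unsigned int threshold)
{
	assert(hung_per_check != NULL || n == 0);

	if (threshold == 0)
		return -1;

	for (size_t i = 0; i < n; i++) {
		if (hung_per_check[i] >= threshold)
			return (int)i;
	}
	return -1;
}

/* Policy B (the suggestion in the thread): keep an internal counter that
 * is incremented once per scan in which any hung task was found and is
 * reset when a scan finds none; panic after `threshold` such scans in a
 * row. */
int panic_on_consecutive_checks(const unsigned int *hung_per_check, size_t n,
				unsigned int threshold)
{
	unsigned int streak = 0;

	assert(hung_per_check != NULL || n == 0);

	if (threshold == 0)
		return -1;

	for (size_t i = 0; i < n; i++) {
		if (hung_per_check[i] > 0)
			streak++;
		else
			streak = 0;	/* a clean scan clears the counter */

		if (streak >= threshold)
			return (int)i;
	}
	return -1;
}
```

With a threshold of 3, scans that find {1, 1, 1, 1} hung tasks never trip the first policy but trip the second on the third scan, while a single scan that finds 4 hung tasks does the opposite. That is the trade-off being weighed in this thread: many tasks hung at once versus one hang that persists across checks.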