From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <83ac6ac0-a7c5-4475-8800-0beefa117164@linux.dev>
Date: Mon, 21 Jul 2025 14:19:10 +0800
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] hung_task: add warning counter to blocked task report
To: Ye Liu
Cc: Ye Liu, linux-kernel@vger.kernel.org, Andrew Morton, Zi Li
References: <20250721031755.1418556-1-ye.liu@linux.dev> <0d15cf75-abbd-446d-86fa-49ea251f7a82@linux.dev> <582cf973-1290-493c-b821-f23480e75014@linux.dev>
From: Lance Yang
In-Reply-To: <582cf973-1290-493c-b821-f23480e75014@linux.dev>

On 2025/7/21 13:45, Ye Liu wrote:
>
>
> On 2025/7/21 12:56, Lance Yang wrote:
>> Hi Ye,
>>
>> Thanks for your patch!
>>
>> On 2025/7/21 11:17, Ye Liu wrote:
>>> From: Ye Liu
>>>
>>> Add a warning counter to each hung task message to make it easier
>>> to analyze and locate issues in the logs.
>>>
>>> Signed-off-by: Ye Liu
>>> ---
>>>   kernel/hung_task.c | 6 ++++--
>>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>>> index 8708a1205f82..9e5f86148d47 100644
>>> --- a/kernel/hung_task.c
>>> +++ b/kernel/hung_task.c
>>> @@ -58,6 +58,7 @@ EXPORT_SYMBOL_GPL(sysctl_hung_task_timeout_secs);
>>>   static unsigned long __read_mostly sysctl_hung_task_check_interval_secs;
>>>
>>>   static int __read_mostly sysctl_hung_task_warnings = 10;
>>> +static int hung_task_warning_count;
>>>
>>>   static int __read_mostly did_panic;
>>>   static bool hung_task_show_lock;
>>> @@ -232,8 +233,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
>>>       if (sysctl_hung_task_warnings || hung_task_call_panic) {
>>>           if (sysctl_hung_task_warnings > 0)
>>>               sysctl_hung_task_warnings--;
>>> -        pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
>>> -               t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
>>> +        pr_err("INFO: task %s:%d blocked for more than %ld seconds. [Warning #%d]\n",
>>> +               t->comm, t->pid, (jiffies - t->last_switch_time) / HZ,
>>> +               ++hung_task_warning_count);
>>>           pr_err("      %s %s %.*s\n",
>>>               print_tainted(), init_utsname()->release,
>>>               (int)strcspn(init_utsname()->version, " "),
>>
>> A quick thought on this: we already have the hung_task_detect_count
>> counter, which tracks the total number of hung tasks detected since
>> boot ;)
>>
>> While this patch adds a counter inline with the warning message, the
>> existing counter already provides a way to know how many hung task
>> events have occurred.
>>
>> Could you elaborate on the specific benefit of printing this count
>> directly in the log, compared to checking the global hung_task_detect_count?
>>
>> Also, if the goal is to give each warning a unique sequence number,
>> I think the dmesg timestamp prefix serves the same purpose ;)
>>
>> Thanks,
>> Lance
>
> Sorry for not noticing sysctl_hung_task_detect_count.
> I just thought adding it directly to the warning message would make the
> log easier to read and more intuitive than relying on timestamps.
>
> If accepted, I will send V2, like this:

Let's step back and consider the practical use case. When we are
troubleshooting hung task issues in a production log, what information
do we actually use?

Typically, we look for:
1) The timestamp, to correlate with other system events
2) The task name and PID (%s:%d)
3) The kernel stack trace that follows, to see where it's stuck

So, my question is: in what specific troubleshooting scenario would
knowing the sequence number, like [#N], provide actionable information
that the above data points do not?

Unless there's a compelling use case I'm missing, I'd prefer to keep
the code as it is ;)

Thanks,
Lance

>
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 8708a1205f82..231afdb68bb2 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -232,8 +232,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
>      if (sysctl_hung_task_warnings || hung_task_call_panic) {
>          if (sysctl_hung_task_warnings > 0)
>              sysctl_hung_task_warnings--;
> -        pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
> -               t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
> +        pr_err("INFO: task %s:%d blocked for more than %ld seconds. [#%ld]\n",
> +               t->comm, t->pid, (jiffies - t->last_switch_time) / HZ,
> +               sysctl_hung_task_detect_count);
>          pr_err("      %s %s %.*s\n",
>              print_tainted(), init_utsname()->release,
>              (int)strcspn(init_utsname()->version, " "),
>
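
As a footnote to the point about the existing counter: it can be sampled
from userspace and lined up against the dmesg timestamps, without touching
the warning message. Below is a minimal sketch, assuming the running kernel
exposes the counter as /proc/sys/kernel/hung_task_detect_count (that path is
an assumption about the target kernel; the file will be absent otherwise).
The helper names parse_detect_count() and read_detect_count() are made up
for illustration and are not kernel or libc interfaces.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed location of the counter; only present if the kernel provides it. */
#define HUNG_TASK_DETECT_COUNT_PATH "/proc/sys/kernel/hung_task_detect_count"

/*
 * Parse a decimal counter such as "3\n" (hypothetical helper).
 * Returns 0 on success, -1 if the text is not a plain unsigned number.
 */
static int parse_detect_count(const char *text, unsigned long *out)
{
	char *end;
	unsigned long value;

	/* Reject signs and leading whitespace, which strtoul() would accept. */
	if (text[0] < '0' || text[0] > '9')
		return -1;

	errno = 0;
	value = strtoul(text, &end, 10);
	if (errno != 0)
		return -1;
	/* Allow only an optional trailing newline after the digits. */
	if (*end != '\0' && *end != '\n')
		return -1;

	*out = value;
	return 0;
}

/*
 * Read the counter from the given path (hypothetical helper).
 * Returns 0 on success, -1 if the file is missing or malformed.
 */
static int read_detect_count(const char *path, unsigned long *out)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_detect_count(buf, out);
}
```

Calling read_detect_count(HUNG_TASK_DETECT_COUNT_PATH, &n) before and after
an incident window gives the number of detections in between, which can then
be matched to the timestamped "blocked for more than" lines in the log.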