Re: [PATCH] hung_task: configurable hung-task stacktrace loglevel

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Petr Mladek <pmladek@suse.com>
To: Tomasz Figa <tfiga@chromium.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	John Ogness <john.ogness@linutronix.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] hung_task: configurable hung-task stacktrace loglevel
Date: Fri, 2 May 2025 17:05:27 +0200	[thread overview]
Message-ID: <aBTfN5cSrPvHHvCS@localhost.localdomain> (raw)
In-Reply-To: <CAAFQd5BeJnYXZt06WVFBWu8cvCmXWTe_tH8Ly3ywTNRCjxXCMA@mail.gmail.com>

On Wed 2025-04-30 17:42:51, Tomasz Figa wrote:
> On Sat, Apr 26, 2025 at 12:32 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Fri 2025-04-25 15:58:46, Tomasz Figa wrote:
> > > Hi Petr,
> > >
> > > On Thu, Apr 24, 2025 at 7:59 PM Petr Mladek <pmladek@suse.com> wrote:
> > > >
> > > > On Thu 2025-04-24 16:02:43, Sergey Senozhatsky wrote:
> > > > > Currently, hung-task watchdog uses two different loglevels
> > > > > to report hung-tasks: a) KERN_INFO for all the important task
> > > > > information (e.g. sched_show_task()) and b)  KERN_ERR for the
> > > > > rest.
> > > >
> > > > IMHO, the two different loglevels make sense. The KERN_ERR
> > > > message seems to inform about that a task gets blocked for too long.
> > > > And KERN_INFO is used for an extra debug information.
> > > >
> > >
> > > > If the problem is matching all related lines. Then a solution
> > > > would be printing some help lines around the report, similar
> > > > to
> > > >
> > > >     ------------[ cut here ]------------
> > > >
> > > > in include/asm-generic/bug.h
> > > >
> > > > Plus, it would be needed to filter out messages from other CPUs.
> > > > CONFIG_PRINTK_CALLER should help with this.
> > >
> > > I'm not really in love with that idea - it would make things so much
> > > more complicated, despite already having the right tool to
> > > differentiate between the importance of various logs - after all the
> > > log level is exactly that.
> >
> > Honestly, the more I think about it the more I like the prefix/postfix
> > lines + the caller_id. I am afraid that manipulating log levels is a
> > lost fight  because different people might have different opinion
> > about how various messages are important.
> 
> The problem with the special lines is that it completely breaks any
> line-based processing in a data pipeline. For a piece of
> infrastructure that needs to deal with thousands of reports, on an
> on-demand basis, that would mean quite a bit of sequential work done
> instead of doing it in parallel and taking much more time to answer
> users' queries.
> 
> That could be worked around, though, if we could prefix each line
> separately with some special tag in addition to log level, timestamp
> and caller, though. Borrowing from Sergey's earlier example:
> 
> <3>[  125.297687][  T140][E] INFO: task zsh:470 blocked for more than
> 61 seconds.
> <3>[  125.302321][  T140][E]       Not tainted
> 6.15.0-rc3-next-20250424-00001-g258d8df78c77-dirty #154
> <3>[  125.309333][  T140][E] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> <6>[  125.315040][  T140][E] task:zsh             state:D stack:0
> pid:470   tgid:470   ppid:430    task_flags:0x400100 flags:0x00004002
> <6>[  125.320594][  T140][E] Call Trace:
> <6>[  125.322327][  T140][E]  <TASK>
> <6>[  125.323852][  T140][E]  __schedule+0x13b4/0x2120
> <6>[  125.325459][  T140][E]  ? schedule+0xdc/0x280
> <6>[  125.327100][  T140][E]  schedule+0xdc/0x280
> <6>[  125.328590][  T140][E]  schedule_preempt_disabled+0x10/0x20
> <6>[  125.330589][  T140][E]  __mutex_lock+0x698/0x1200
> <6>[  125.332291][  T140][E]  ? __mutex_lock+0x485/0x1200
> <6>[  125.334074][  T140][E]  mutex_lock+0x81/0x90
> <6>[  125.335113][  T140][E]  drop_caches_sysctl_handler+0x3e/0x140
> <6>[  125.336665][  T140][E]  proc_sys_call_handler+0x327/0x4f0
> <6>[  125.338069][  T140][E]  vfs_write+0x794/0xb60
> <6>[  125.339216][  T140][E]  ? proc_sys_read+0x10/0x10
> <6>[  125.340568][  T140][E]  ksys_write+0xb8/0x170
> <6>[  125.341701][  T140][E]  do_syscall_64+0xd0/0x1a0
> <6>[  125.343009][  T140][E]  ? arch_exit_to_user_mode_prepare+0x11/0x60
> <6>[  125.344612][  T140][E]  ? irqentry_exit_to_user_mode+0x7e/0xa0
> <6>[  125.346260][  T140][E]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> 
> where [E] would mean an "emergency" message, rather than something
> usual, regardless of the loglevel.

This is an interesting idea. It has several advantages. It would:

  + still allow to filter out the extra details on too slow consoles [1]
  + work even when the "cut here" prefix/postfix lines get lost
  + obsolete the config option forcing the same loglevel in emergency
      section => safe space in struct task_struct. [2]

[1] Note that there is still floating a patchset which allows to define
     per-console loglevel, see
     https://lore.kernel.org/r/cover.1730133890.git.chris@chrisdown.name

[2] It might be eventually replaced by a config option which would show
    all emergency messages on consoles.

Best Regards,
Petr

next prev parent reply	other threads:[~2025-05-02 15:05 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-24  7:02 [PATCH] hung_task: configurable hung-task stacktrace loglevel Sergey Senozhatsky
2025-04-24 10:58 ` Petr Mladek
2025-04-25  4:49   ` Sergey Senozhatsky
2025-04-25 14:55     ` Petr Mladek
2025-04-30  1:34       ` Sergey Senozhatsky
2025-04-30  1:57         ` Sergey Senozhatsky
2025-04-25  6:58   ` Tomasz Figa
2025-04-25 15:32     ` Petr Mladek
2025-04-28  8:05       ` John Ogness
2025-04-30  5:05         ` Sergey Senozhatsky
2025-04-30  8:42       ` Tomasz Figa
2025-05-02 15:05         ` Petr Mladek [this message]
2025-05-02 15:30           ` John Ogness

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aBTfN5cSrPvHHvCS@localhost.localdomain \
    --to=pmladek@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=john.ogness@linutronix.de \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=senozhatsky@chromium.org \
    --cc=tfiga@chromium.org \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox