Re: [PATCH] hung_task: configurable hung-task stacktrace loglevel

From: Petr Mladek <pmladek@suse.com>
To: Tomasz Figa <tfiga@chromium.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	John Ogness <john.ogness@linutronix.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] hung_task: configurable hung-task stacktrace loglevel
Date: Fri, 2 May 2025 17:05:27 +0200
Message-ID: <aBTfN5cSrPvHHvCS@localhost.localdomain>
In-Reply-To: <CAAFQd5BeJnYXZt06WVFBWu8cvCmXWTe_tH8Ly3ywTNRCjxXCMA@mail.gmail.com>

On Wed 2025-04-30 17:42:51, Tomasz Figa wrote:
> On Sat, Apr 26, 2025 at 12:32 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Fri 2025-04-25 15:58:46, Tomasz Figa wrote:
> > > Hi Petr,
> > >
> > > On Thu, Apr 24, 2025 at 7:59 PM Petr Mladek <pmladek@suse.com> wrote:
> > > >
> > > > On Thu 2025-04-24 16:02:43, Sergey Senozhatsky wrote:
> > > > > Currently, hung-task watchdog uses two different loglevels
> > > > > to report hung-tasks: a) KERN_INFO for all the important task
> > > > > information (e.g. sched_show_task()) and b)  KERN_ERR for the
> > > > > rest.
> > > >
> > > > IMHO, the two different loglevels make sense. The KERN_ERR
> > > > message reports that a task has been blocked for too long,
> > > > and KERN_INFO is used for extra debug information.
> > > >
> > >
> > > > If the problem is matching all related lines, then a solution
> > > > would be printing some helper lines around the report, similar
> > > > to
> > > >
> > > >     ------------[ cut here ]------------
> > > >
> > > > in include/asm-generic/bug.h
> > > >
> > > > Plus, messages from other CPUs would need to be filtered out.
> > > > CONFIG_PRINTK_CALLER should help with this.
> > >
> > > I'm not really in love with that idea - it would make things so much
> > > more complicated, despite already having the right tool to
> > > differentiate between the importance of various logs - after all the
> > > log level is exactly that.
> >
> > Honestly, the more I think about it, the more I like the prefix/postfix
> > lines + the caller_id. I am afraid that manipulating log levels is a
> > losing battle because different people might have different opinions
> > about how important various messages are.
> 
> The problem with the special lines is that it completely breaks any
> line-based processing in a data pipeline. For a piece of
> infrastructure that needs to deal with thousands of reports, on an
> on-demand basis, that would mean quite a bit of sequential work done
> instead of doing it in parallel and taking much more time to answer
> users' queries.
> 
> That could be worked around, though, if we could prefix each line
> separately with some special tag in addition to the log level, timestamp
> and caller. Borrowing from Sergey's earlier example:
> 
> <3>[  125.297687][  T140][E] INFO: task zsh:470 blocked for more than 61 seconds.
> <3>[  125.302321][  T140][E]       Not tainted 6.15.0-rc3-next-20250424-00001-g258d8df78c77-dirty #154
> <3>[  125.309333][  T140][E] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> <6>[  125.315040][  T140][E] task:zsh             state:D stack:0 pid:470   tgid:470   ppid:430    task_flags:0x400100 flags:0x00004002
> <6>[  125.320594][  T140][E] Call Trace:
> <6>[  125.322327][  T140][E]  <TASK>
> <6>[  125.323852][  T140][E]  __schedule+0x13b4/0x2120
> <6>[  125.325459][  T140][E]  ? schedule+0xdc/0x280
> <6>[  125.327100][  T140][E]  schedule+0xdc/0x280
> <6>[  125.328590][  T140][E]  schedule_preempt_disabled+0x10/0x20
> <6>[  125.330589][  T140][E]  __mutex_lock+0x698/0x1200
> <6>[  125.332291][  T140][E]  ? __mutex_lock+0x485/0x1200
> <6>[  125.334074][  T140][E]  mutex_lock+0x81/0x90
> <6>[  125.335113][  T140][E]  drop_caches_sysctl_handler+0x3e/0x140
> <6>[  125.336665][  T140][E]  proc_sys_call_handler+0x327/0x4f0
> <6>[  125.338069][  T140][E]  vfs_write+0x794/0xb60
> <6>[  125.339216][  T140][E]  ? proc_sys_read+0x10/0x10
> <6>[  125.340568][  T140][E]  ksys_write+0xb8/0x170
> <6>[  125.341701][  T140][E]  do_syscall_64+0xd0/0x1a0
> <6>[  125.343009][  T140][E]  ? arch_exit_to_user_mode_prepare+0x11/0x60
> <6>[  125.344612][  T140][E]  ? irqentry_exit_to_user_mode+0x7e/0xa0
> <6>[  125.346260][  T140][E]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> 
> where [E] would mean an "emergency" message, rather than something
> usual, regardless of the loglevel.

This is an interesting idea. It has several advantages. It would:

  + still allow filtering out the extra details on consoles that are too slow [1]
  + work even when the "cut here" prefix/postfix lines get lost
  + obsolete the config option forcing the same loglevel in the emergency
      section => save space in struct task_struct. [2]
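
For illustration, here is a minimal sketch of how a line-based consumer
could use such a per-line tag. It assumes the record layout from the
example above (loglevel, timestamp, caller, optional "[E]"). The "[E]"
field is only a proposal in this thread, and the names LINE_RE, Record,
parse_line() and emergency_lines() are hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed layout, copied from the example in this thread:
#   <level>[ timestamp][ caller][E] message
# The "[E]" field is a proposal only; it is treated as optional here.
LINE_RE = re.compile(
    r"^<(?P<level>\d+)>"            # syslog-style loglevel prefix
    r"\[\s*(?P<ts>\d+\.\d+)\]"      # timestamp in seconds
    r"\[\s*(?P<caller>[TC]\d+)\]"   # CONFIG_PRINTK_CALLER id (task or CPU)
    r"(?P<tag>\[E\])?"              # proposed emergency tag
    r" ?(?P<msg>.*)$"               # message text, indentation preserved
)


class Record(NamedTuple):
    level: int
    timestamp: float
    caller: str
    emergency: bool
    message: str


def parse_line(line: str) -> Optional[Record]:
    """Parse one log line; return None if it does not match the layout."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    return Record(
        level=int(m["level"]),
        timestamp=float(m["ts"]),
        caller=m["caller"],
        emergency=m["tag"] is not None,
        message=m["msg"],
    )


def emergency_lines(lines: Iterable[str]) -> Iterator[Record]:
    """Yield only the tagged records, whatever their loglevel is."""
    for line in lines:
        rec = parse_line(line)
        if rec is not None and rec.emergency:
            yield rec
```

Each line is classified on its own, so no state has to be carried from a
"cut here" marker to the lines that follow it, and the input can be
split across workers at any line boundary.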

[1] Note that there is still a patchset floating around which allows
     defining a per-console loglevel, see
     https://lore.kernel.org/r/cover.1730133890.git.chris@chrisdown.name

[2] It might eventually be replaced by a config option which would show
    all emergency messages on consoles.
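
To make the console side concrete, here is a rough user-space model of
the filtering. It relies on the existing rule that a message reaches a
console only when its level is numerically lower than console_loglevel.
The show_all_emergency switch stands in for the possible config option
from [2]; it and the function name are hypothetical, not existing
kernel interfaces.

```python
from typing import Iterable, Iterator, Tuple

# (loglevel, emergency, text): a hypothetical, already-parsed log record.
Record = Tuple[int, bool, str]


def console_output(
    records: Iterable[Record],
    console_loglevel: int,
    show_all_emergency: bool = False,
) -> Iterator[str]:
    """Yield the text of the records that a console would print.

    A record passes when its level is below console_loglevel. With
    show_all_emergency set (a hypothetical option), tagged records
    pass regardless of their level.
    """
    for level, emergency, text in records:
        if level < console_loglevel or (show_all_emergency and emergency):
            yield text
```

With console_loglevel set to 4, a slow console would print only the
KERN_ERR summary lines and drop the KERN_INFO backtrace, while the tag
would still mark every line of the report in the full log.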

Best Regards,
Petr

Thread overview: 13+ messages
2025-04-24  7:02 [PATCH] hung_task: configurable hung-task stacktrace loglevel Sergey Senozhatsky
2025-04-24 10:58 ` Petr Mladek
2025-04-25  4:49   ` Sergey Senozhatsky
2025-04-25 14:55     ` Petr Mladek
2025-04-30  1:34       ` Sergey Senozhatsky
2025-04-30  1:57         ` Sergey Senozhatsky
2025-04-25  6:58   ` Tomasz Figa
2025-04-25 15:32     ` Petr Mladek
2025-04-28  8:05       ` John Ogness
2025-04-30  5:05         ` Sergey Senozhatsky
2025-04-30  8:42       ` Tomasz Figa
2025-05-02 15:05         ` Petr Mladek [this message]
2025-05-02 15:30           ` John Ogness
