From: Petr Mladek <pmladek@suse.com>
To: Andrew Halaney <ahalaney@redhat.com>
Cc: John Ogness <john.ogness@linutronix.de>,
Derek Barbosa <debarbos@redhat.com>,
rostedt@goodmis.org, senozhatsky@chromium.org,
linux-rt-users@vger.kernel.org, linux-kernel@vger.kernel.org,
williams@redhat.com, jlelli@redhat.com, lgoncalv@redhat.com,
jwyatt@redhat.com, aubaker@redhat.com
Subject: Re: [BUG] printk/nbcon.c: watchdog BUG: softlockup - CPU#x stuck for 78s
Date: Fri, 21 Jun 2024 09:57:41 +0200 [thread overview]
Message-ID: <ZnUydTAC9lbTvF9H@pathway.suse.cz> (raw)
In-Reply-To: <xt5rhuygy2wwugsbdxemfrlkfcj6czfsmdkkjlqpmf4lcvc4pk@o6j6errohzfs>
On Thu 2024-06-20 12:27:12, Andrew Halaney wrote:
> On Wed, Jun 19, 2024 at 11:46:38AM GMT, Petr Mladek wrote:
> > On Tue 2024-06-18 17:52:19, Andrew Halaney wrote:
> > > On Tue, Jun 18, 2024 at 09:03:00PM GMT, John Ogness wrote:
> > > Just in case I did something dumb, here's the module I wrote up:
> > >
> > > ahalaney@x1gen2nano ~/git/linux-rt-devel (git)-[tags/v6.10-rc4-rt6-rebase] % cat kernel/printk/test_thread.c :(
> > > /*
> > > * Test making a kthread similar to nbcon's (under load)
> > > * to see if it also has issues with migrate_swap()
> > > */
> > > #include "linux/nmi.h"
> > > #include <asm-generic/delay.h>
> > > #include <linux/kthread.h>
> > > #include <linux/module.h>
> > > #include <linux/sched.h>
> > >
> > > DEFINE_STATIC_SRCU(test_srcu);
> > > static DEFINE_SPINLOCK(test_lock);
> > > static struct task_struct *kt;
> > > static bool dont_stop = true;
> > >
> > > static int test_thread_func(void *unused) {
> > > unsigned long flags;
> > >
> > > pr_info("Starting the while true loop\n");
> > > do {
> > > int cookie = srcu_read_lock_nmisafe(&test_srcu);
> > > spin_lock_irqsave(&test_lock, flags);
> > > touch_nmi_watchdog();
> > > udelay(5000); // print a line to serial
> > > spin_unlock_irqrestore(&test_lock, flags);
> > > srcu_read_unlock_nmisafe(&test_srcu, cookie);
> >
> > Does it help to add here?
> >
> > cond_resched();
> >
> > > } while (dont_stop);
> > >
> > > return 0;
> > > }
> > >
> > > static int __init test_thread_init(void) {
> > >
> > > pr_info("Creating test_thread at -20 nice level\n");
> > > kt = kthread_run(test_thread_func, NULL, "test_thread");
> > > if (IS_ERR(kt)) {
> > > pr_err("Failed to make test_thread\n");
> > > return PTR_ERR(kt);
> > > }
> > > sched_set_normal(kt, -20);
> > >
> > > return 0;
> > > }
> > >
> > > static void __exit test_thread_exit(void) {
> > > dont_stop = false;
> > > kthread_stop(kt);
> > > }
> > >
> > > module_init(test_thread_init);
> > > module_exit(test_thread_exit);
> > > MODULE_LICENSE("GPL");
> >
> > The touch_nmi_watchdog() caused that watchdog_timer_fn() did not see
> > that "test_thread" kthread did not schedule. By other words, it did
> > hide the problem.
> >
>
> Is it reasonable to consider removing the touch_nmi_watchdog()'s in
> 8250_port.c? There's some rather old ones, and some new ones with the
> nbcon transition, and they sort of made finding this issue more
> indirect.
>
> Could be some valid reason they exist still, but to me it seems sensible
> to remove if we can't think of any good reasons.
Good point!
I believe that they were added because of flushing printk() messages.
This is the case of commit 54f19b4a6791491 ("tty/serial/8250: Touch
NMI watchdog in wait_for_xmitr"), definitely. The others were added
before git history so that it is more complicated to check it.
Anyway, I think that it is not necessary to touch the watchdog on
every operation on the serial console. It should be enough to
touch them only around writing single printk record/message.
And it is better to do so in the generic printk cycle than
in particular console drivers.
Well, we need to make sure that the watchdog is touched in all
cycles flushing consoles, like console_flush_all() or
__nbcon_atomic_flush_pending_con().
Best Regards,
Petr
next prev parent reply other threads:[~2024-06-21 7:57 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-18 17:37 [BUG] printk/nbcon.c: watchdog BUG: softlockup - CPU#x stuck for 78s Derek Barbosa
2024-06-18 18:57 ` John Ogness
2024-06-18 22:52 ` Andrew Halaney
2024-06-19 5:09 ` John Ogness
2024-06-19 9:46 ` Petr Mladek
2024-06-20 17:27 ` Andrew Halaney
2024-06-21 7:57 ` Petr Mladek [this message]
2024-06-20 7:15 ` Sebastian Andrzej Siewior
2024-06-20 9:32 ` Sebastian Andrzej Siewior
2024-06-20 9:43 ` [PATCH] prinkt/nbcon: Add a scheduling point to nbcon_kthread_func() Sebastian Andrzej Siewior
2024-06-20 17:18 ` Andrew Halaney
2024-06-20 18:34 ` Derek Barbosa
2024-06-21 1:16 ` John Ogness
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZnUydTAC9lbTvF9H@pathway.suse.cz \
--to=pmladek@suse.com \
--cc=ahalaney@redhat.com \
--cc=aubaker@redhat.com \
--cc=debarbos@redhat.com \
--cc=jlelli@redhat.com \
--cc=john.ogness@linutronix.de \
--cc=jwyatt@redhat.com \
--cc=lgoncalv@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rt-users@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=senozhatsky@chromium.org \
--cc=williams@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox