Re: [BUG] printk/nbcon.c: watchdog BUG: softlockup - CPU#x stuck for 78s

public inbox for linux-rt-users@vger.kernel.org

From: John Ogness <john.ogness@linutronix.de>
To: Andrew Halaney <ahalaney@redhat.com>, tglx@linutronix.de
Cc: Derek Barbosa <debarbos@redhat.com>,
	pmladek@suse.com, rostedt@goodmis.org, senozhatsky@chromium.org,
	linux-rt-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	williams@redhat.com, jlelli@redhat.com, lgoncalv@redhat.com,
	jwyatt@redhat.com, aubaker@redhat.com
Subject: Re: [BUG] printk/nbcon.c: watchdog BUG: softlockup - CPU#x stuck for 78s
Date: Wed, 19 Jun 2024 07:15:31 +0206
Message-ID: <87o77xa584.fsf@jogness.linutronix.de>
In-Reply-To: <dtde47mfm3amxg4mbrnbct53ehpfbekdvrjhhd6j5tzl7lulwj@zwdsvkq3orag>

[ Explicitly added tglx, hoping he can chime in here. ]

On 2024-06-18, Andrew Halaney <ahalaney@redhat.com> wrote:
>> Shouldn't the scheduler eventually kick the task off the CPU after
>> its timeslice is up?
>
> I trust you more than myself on this, but this is being
> reproduced with a CONFIG_PREEMPT_DYNAMIC=y +
> CONFIG_PREEMPT_VOLUNTARY=y setup (so the effective mode is
> essentially VOLUNTARY). Does it actually work that way for a kthread
> in that mode?

You should not trust me more than yourself here. I actually have very
little experience with the non-RT preemption models, so I will need to
investigate this further.

> Just in case I did something dumb, here's the module I wrote up:
>
> ahalaney@x1gen2nano ~/git/linux-rt-devel (git)-[tags/v6.10-rc4-rt6-rebase] % cat kernel/printk/test_thread.c
> /*
>  * Test making a kthread similar to nbcon's (under load)
>  * to see if it also has issues with migrate_swap()
>  */
> #include <linux/delay.h>
> #include <linux/err.h>
> #include <linux/kthread.h>
> #include <linux/module.h>
> #include <linux/nmi.h>
> #include <linux/sched.h>
> #include <linux/spinlock.h>
> #include <linux/srcu.h>
>
> DEFINE_STATIC_SRCU(test_srcu);
> static DEFINE_SPINLOCK(test_lock);
> static struct task_struct *kt;
>
> static int test_thread_func(void *unused)
> {
> 	unsigned long flags;
>
> 	pr_info("Starting the while true loop\n");
> 	do {
> 		/* Mimic the nbcon printer: SRCU read side + irq-off spinlock. */
> 		int cookie = srcu_read_lock_nmisafe(&test_srcu);
>
> 		spin_lock_irqsave(&test_lock, flags);
> 		touch_nmi_watchdog();
> 		udelay(5000);	/* busy-wait ~5 ms: "print a line to serial" */
> 		spin_unlock_irqrestore(&test_lock, flags);
> 		srcu_read_unlock_nmisafe(&test_srcu, cookie);
> 		/* Like the nbcon printer loop, no explicit scheduling point. */
> 	} while (!kthread_should_stop());
>
> 	return 0;
> }
>
> static int __init test_thread_init(void)
> {
> 	pr_info("Creating test_thread at -20 nice level\n");
> 	kt = kthread_run(test_thread_func, NULL, "test_thread");
> 	if (IS_ERR(kt)) {
> 		pr_err("Failed to make test_thread\n");
> 		return PTR_ERR(kt);
> 	}
> 	sched_set_normal(kt, -20);
>
> 	return 0;
> }
>
> static void __exit test_thread_exit(void)
> {
> 	/* Sets the flag seen by kthread_should_stop() and waits for exit. */
> 	kthread_stop(kt);
> }
>
> module_init(test_thread_init);
> module_exit(test_thread_exit);
> MODULE_LICENSE("GPL");

Thanks for the functional test! It should quite accurately reproduce
the situation where the printing thread cannot keep up with the rate of
incoming messages.

An explicit scheduling point may be needed, for example a cond_resched()
call outside the critical section (after srcu_read_unlock_nmisafe()),
just before the loop repeats. We would like to remove such explicit
preemption points from the kernel code, but perhaps they are still
necessary for the VOLUNTARY preemption model.

John


Thread overview: 13+ messages
2024-06-18 17:37 [BUG] printk/nbcon.c: watchdog BUG: softlockup - CPU#x stuck for 78s Derek Barbosa
2024-06-18 18:57 ` John Ogness
2024-06-18 22:52   ` Andrew Halaney
2024-06-19  5:09     ` John Ogness [this message]
2024-06-19  9:46     ` Petr Mladek
2024-06-20 17:27       ` Andrew Halaney
2024-06-21  7:57         ` Petr Mladek
2024-06-20  7:15 ` Sebastian Andrzej Siewior
2024-06-20  9:32   ` Sebastian Andrzej Siewior
2024-06-20  9:43     ` [PATCH] prinkt/nbcon: Add a scheduling point to nbcon_kthread_func() Sebastian Andrzej Siewior
2024-06-20 17:18       ` Andrew Halaney
2024-06-20 18:34         ` Derek Barbosa
2024-06-21  1:16       ` John Ogness
