From: Petr Mladek <pmladek@suse.com>
To: Ryo Takakura <ryotkkr98@gmail.com>
Cc: john.ogness@linutronix.de, Jason@zx2c4.com,
    gregkh@linuxfoundation.org, linux-serial@vger.kernel.org,
    lkp@intel.com, oe-lkp@lists.linux.dev, oliver.sang@intel.com
Subject: Re: [linux-next:master] [serial] b63e6f60ea: BUG:soft_lockup-CPU##stuck_for#s![modprobe:#]
Date: Thu, 24 Apr 2025 11:02:54 +0200
Message-ID: <aAn-PkxRAz34tTPR@pathway.suse.cz>
In-Reply-To: <20250424081101.110914-1-ryotkkr98@gmail.com>

On Thu 2025-04-24 17:11:01, Ryo Takakura wrote:
> Hi Petr and John!
>
> On Tue, 22 Apr 2025 14:15:01 +0200, Petr Mladek wrote:
> >On Mon 2025-04-21 12:41:50, Ryo Takakura wrote:
> >> Hi!
> >>
> >> I would like to follow up the last email that I sent.
> >>
> >> First, I'm sorry: I realized later that I should have run the rslib
> >> test as an inserted module, the way the robot does, by choosing
> >> CONFIG_REED_SOLOMON_TEST=m, not as a boot-time test with
> >> CONFIG_REED_SOLOMON_TEST=y.
> >>
> >> Running the rslib test as an inserted module without John's series
> >> was less prone to softlockups. Without John's series, a softlockup shows
> >> up once in a test or not at all. With John's series, softlockups can
> >> be observed constantly throughout the test.
> >
> >> >>Thanks Ryo for looking into this! I think we need to have a technical
> >> >>explanation/understanding of the problem so that it is clear how my
> >> >>series triggers or exaggerates the issue.
> >>
> >> As mentioned earlier, I'm sorry; I should have run the test as an
> >> inserted module... It seems the series does make the test more prone
> >> to softlockups.
> >
> >IMHO, the main difference is that the patch "serial: 8250: Switch to
> >nbcon console" removes touch_nmi_watchdog() from
> >serial8250_console_write().
> >
> >The touch_nmi_watchdog() resets the softlockup watchdog. It might
> >hide that the CPU did not schedule for a long time.
> >
> >The touch_nmi_watchdog() was there because the console_lock() owner,
> >used by the legacy loop, was responsible for flushing all pending
> >messages. And there might be many pending messages when new ones
> >were added by other CPUs in parallel. And the legacy loop
> >could not call cond_resched() when called from printk() because
> >printk() might be called in atomic context.
>
> I see. Without John's series, the cond_resched() in the mentioned
> code path should be called during the rslib test, as it is not in atomic
> context, in addition to the touch_nmi_watchdog().

Just to be sure: the right fix is to add cond_resched() to the rslib test.
The code should allow scheduling and should not block the CPU for too
long.

touch_nmi_watchdog() just hides the problem. It was used in printk()
because there was no better solution.
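
A minimal userspace sketch of that idea, not the actual rslib change. The
names run_trials(), cond_resched_standin() and resched_points are
hypothetical, and sched_yield() only stands in for the kernel's
cond_resched(). The point is that a long-running loop offers a scheduling
point once per iteration instead of touching the watchdog:

```c
#include <sched.h>
#include <stddef.h>
#include <stdint.h>

/* Counts how many scheduling points the loop offered (illustration only). */
static unsigned long resched_points;

/*
 * Userspace stand-in for cond_resched(). This is an assumption made for
 * the sketch: in the kernel the test would call cond_resched() directly.
 */
static void cond_resched_standin(void)
{
	resched_points++;
	sched_yield();
}

/* Hypothetical long-running test: one expensive iteration per trial. */
static uint32_t run_trials(size_t trials)
{
	uint32_t acc = 0;

	for (size_t i = 0; i < trials; i++) {
		/* Placeholder for the real per-trial work. */
		for (uint32_t j = 0; j < 1000; j++)
			acc = acc * 31u + j;

		/* Let other tasks run once per trial. */
		cond_resched_standin();
	}

	return acc;
}
```

Calling run_trials(5) passes through five scheduling points, so the CPU is
never held for longer than one trial. Where exactly the call belongs in
the rslib test loops is not decided here.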

> I used this kernel[1] which is for raspberry pi. Let me recheck
> with some other machine with Linus' master and linux-next to see
> if the behavior is raspberry pi specific.

John explained why the emergency context helped. I think that we have
a pretty good understanding of what is going on there.

I believe that the problem will be the same in all code streams.
It might be enough to check one of them (Linus' tree or linux-next)
just to be sure that the fix applies and that it has not already been
fixed.

Best Regards,
Petr