From: John Ogness <john.ogness@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jiri Slaby <jirislaby@kernel.org>,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Esben Haabendal <esben@geanix.com>,
	linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Arnd Bergmann <arnd@arndb.de>, Tony Lindgren <tony@atomide.com>,
	Niklas Schnelle <schnelle@linux.ibm.com>,
	Serge Semin <fancer.lancer@gmail.com>,
	Andrew Murray <amurray@thegoodpenguin.co.uk>,
	Petr Mladek <pmladek@suse.com>
Subject: Re: [PATCH 3/3] printk/nbcon: Release nbcon consoles ownership in atomic flush after each emitted record
Date: Fri, 26 Sep 2025 16:49:12 +0206	[thread overview]
Message-ID: <84v7l5gtq7.fsf@jogness.linutronix.de> (raw)
In-Reply-To: <20250926124912.243464-4-pmladek@suse.com>

On 2025-09-26, Petr Mladek <pmladek@suse.com> wrote:
> printk() tries to flush messages with NBCON_PRIO_EMERGENCY on
> nbcon consoles immediately. It might take seconds to flush all
> pending lines on slow serial consoles. Note that there might be
> hundreds of messages, for example:
>
> [    3.771531][    T1] pci 0000:3e:08.1: [8086:324
> ** replaying previous printk message **
> [    3.771531][    T1] pci 0000:3e:08.1: [8086:3246] type 00 class 0x088000 PCIe Root Complex Integrated Endpoint
> [ ... more than 2000 lines, about 200kB messages ... ]
> [    3.837752][    T1] pci 0000:20:01.0: Adding to iommu group 18
> [    3.837851][    T
> ** replaying previous printk message **
> [    3.837851][    T1] pci 0000:20:03.0: Adding to iommu group 19
> [    3.837946][    T1] pci 0000:20:05.0: Adding to iommu group 20
> [ ... more than 500 messages for iommu groups 21-590 ...]
> [    3.912932][    T1] pci 0000:f6:00.1: Adding to iommu group 591
> [    3.913070][    T1] pci 0000:f6:00.2: Adding to iommu group 592
> [    3.913243][    T1] DMAR: Intel(R) Virtualization Technology for Directed I/O
> [    3.913245][    T1] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> [    3.913245][    T1] software IO TLB: mapped [mem 0x000000004f000000-0x0000000053000000] (64MB)
> [    3.913324][    T1] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 655360 ms ovfl timer
> [    3.913325][    T1] RAPL PMU: hw unit of domain package 2^-14 Joules
> [    3.913326][    T1] RAPL PMU: hw unit of domain dram 2^-14 Joules
> [    3.913327][    T1] RAPL PMU: hw unit of domain psys 2^-0 Joules
> [    3.933486][    T1] ------------[ cut here ]------------
> [    3.933488][    T1] WARNING: CPU: 2 PID: 1 at arch/x86/events/intel/uncore.c:1156 uncore_pci_pmu_register+0x15e/0x180
> [    3.930291][    C0] watchdog: Watchdog detected hard LOCKUP on cpu 0
> [    3.930291][    C0] Kernel panic - not syncing: Hard LOCKUP
> [...]
> [    3.930291][    C0] CPU: 0 UID: 0 PID: 18 Comm: pr/ttyS0 Not tainted...
> [...]
> [    3.930291][    C0] RIP: 0010:nbcon_reacquire_nobuf+0x11/0x50
> [    3.930291][    C0] Call Trace:
> [...]
> [    3.930291][    C0]  <TASK>
> [    3.930291][    C0]  serial8250_console_write+0x16d/0x5c0
> [    3.930291][    C0]  nbcon_emit_next_record+0x22c/0x250
> [    3.930291][    C0]  nbcon_emit_one+0x93/0xe0
> [    3.930291][    C0]  nbcon_kthread_func+0x13c/0x1c0
>
> There are two visible takeovers of the console ownership:
>
>   - The 1st one is triggered by the "WARNING: CPU: 2 PID: 1 at
>     arch/x86/..." line printed with NBCON_PRIO_EMERGENCY.
>
>   - The 2nd one is triggered by the "Kernel panic - not syncing:
>     Hard LOCKUP" line printed with NBCON_PRIO_PANIC.
>
> There are more than 2500 lines, at about 240kB, emitted between
> the takeover and the 1st "WARNING" line in the emergency context.
> This amount of pending messages had to be flushed by
> nbcon_atomic_flush_pending() when WARN() printed its first line.
>
> The atomic flush held the nbcon console context for so long that
> it triggered a hard lockup on the CPU running the printk kthread
> "pr/ttyS0". The kthread needed to reacquire the console ownership
> to restore the original serial port state in serial8250_console_write().
>
> Prevent the hard lockup by releasing the nbcon console ownership after
> each emitted record.
>
> Note that __nbcon_atomic_flush_pending_con() used to hold the console
> ownership the whole time in order to block the printk kthread. Otherwise
> the kthread tried to flush the messages in parallel, which caused repeated
> takeovers and more replayed messages.
>
> It is no longer a problem because the repeated takeovers are blocked
> by the counter of emergency contexts, see nbcon_cpu_emergency_cnt.
>
> Link: https://lore.kernel.org/all/aNQO-zl3k1l4ENfy@pathway.suse.cz
> Signed-off-by: Petr Mladek <pmladek@suse.com>

Reviewed-by: John Ogness <john.ogness@linutronix.de>

Looks good and performs as advertised.


Thread overview: 13+ messages
2025-09-26 12:49 [PATCH 0/3] printk/nbcon: Prevent hardlockup reports caused by atomic nbcon flush Petr Mladek
2025-09-26 12:49 ` [PATCH 1/3] printk/nbcon: Block printk kthreads when any CPU is in an emergency context Petr Mladek
2025-09-26 14:37   ` John Ogness
2025-09-29 12:02     ` Petr Mladek
2025-09-29  8:40   ` Andrew Murray
2025-09-30 20:15   ` kernel test robot
2025-09-26 12:49 ` [PATCH 2/3] printk/nbcon/panic: Allow printk kthread to sleep when the system is in panic Petr Mladek
2025-09-26 14:38   ` John Ogness
2025-09-29  8:39   ` Andrew Murray
2025-09-26 12:49 ` [PATCH 3/3] printk/nbcon: Release nbcon consoles ownership in atomic flush after each emitted record Petr Mladek
2025-09-26 14:43   ` John Ogness [this message]
2025-09-29  8:38   ` Andrew Murray
2025-10-30 11:32 ` [PATCH 0/3] printk/nbcon: Prevent hardlockup reports caused by atomic nbcon flush Petr Mladek
