netdev.vger.kernel.org archive mirror
From: John Ogness <john.ogness@linutronix.de>
To: Calvin Owens <calvin@wbinvd.org>,
	Breno Leitao <leitao@debian.org>,
	Sebastian Siewior <bigeasy@linutronix.de>
Cc: Petr Mladek <pmladek@suse.com>, Mike Galbraith <efault@gmx.de>,
	Simon Horman <horms@kernel.org>,
	kuba@kernel.org, Pavel Begunkov <asml.silence@gmail.com>,
	Johannes Berg <johannes@sipsolutions.net>,
	paulmck@kernel.org, LKML <linux-kernel@vger.kernel.org>,
	netdev@vger.kernel.org, boqun.feng@gmail.com,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning
Date: Tue, 30 Sep 2025 16:29:02 +0206
Message-ID: <84frc4j9yx.fsf@jogness.linutronix.de>
In-Reply-To: <aNvh2Cd2i9MVA1d3@mozart.vkv.me>

(Added Sebastian.)

On 2025-09-30, Calvin Owens <calvin@wbinvd.org> wrote:
> On Wednesday 09/10 at 11:26 -0700, Breno Leitao wrote:
>> On Wed, Sep 10, 2025 at 05:12:43PM +0200, Petr Mladek wrote:
>> > On Wed 2025-09-10 14:28:40, John Ogness wrote:
>> 
>> > > @pmladek: We could introduce a new console flag (NBCON_ATOMIC_UNSAFE) so
>> > > that the callback is only used by nbcon_atomic_flush_unsafe().
>> > 
>> > This might be an acceptable compromise. It would try to emit messages
>> > only at the very end of panic() as the last desperate attempt.
>> > 
>> > Just to be sure, what do you mean by unsafe?
>> > 
>> >     + taking IRQ unsafe locks?
>> 
>> Taking IRQ unsafe locks is the major issue we have in netconsole today.
>> Basically, drivers can take locks in their .ndo_start_xmit()
>> callback, and in some cases those locks are IRQ unsafe, which
>> conflicts with .write_atomic(), which expects all the inner locks
>> to be IRQ safe.
>
> Hmm, I'm also hitting the below on next-20250926 with translated=strict,
> the triggering acquisition is here:
>
>     https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/iova.c?id=30d4efb2f5a515a60fe6b0ca85362cbebea21e2f#n832
>
> Naively I'd think the IOMMU code would need to be safe to call with
> interrupts disabled? Do we need raw_spin_lock() in some places there?
>
> I'll have more time to dig and maybe send a patch tomorrow, any quick
> thoughts are appreciated.
>
> [  319.006534][   T16] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> [  319.006536][   T16] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 16, name: pr/legacy
> [  319.006537][   T16] preempt_count: 0, expected: 0
> [  319.006537][   T16] RCU nest depth: 3, expected: 3
> [  319.006538][   T16] 8 locks held by pr/legacy/16:
> [  319.006539][   T16]  #0: ffffffff831ffbe0 (console_lock){+.+.}-{0:0}, at: legacy_kthread_func+0x1e/0xc0
> [  319.006546][   T16]  #1: ffffffff831ffc30 (console_srcu){....}-{0:0}, at: console_flush_all+0xf2/0x430
> [  319.006550][   T16]  #2: ffffffff832c3ef8 (target_list_lock){+.+.}-{3:3}, at: write_ext_msg.part.0+0x28/0x4d0
> [  319.006554][   T16]  #3: ffffffff83202720 (rcu_read_lock){....}-{1:3}, at: rt_spin_lock+0xd5/0x1a0
> [  319.006557][   T16]  #4: ffffffff83202720 (rcu_read_lock){....}-{1:3}, at: __netpoll_send_skb+0x4a/0x3c0
> [  319.006561][   T16]  #5: ffff888107c89e98 (_xmit_ETHER#2){+...}-{3:3}, at: __netpoll_send_skb+0x2d6/0x3c0
> [  319.006564][   T16]  #6: ffffffff83202720 (rcu_read_lock){....}-{1:3}, at: rt_spin_trylock+0x59/0x130
> [  319.006567][   T16]  #7: ffffe8ffffc06218 (&cpu_rcache->lock){+.+.}-{3:3}, at: alloc_iova_fast+0x70/0x2d0
> [  319.006570][   T16] irq event stamp: 20680
> [  319.006571][   T16] hardirqs last  enabled at (20679): [<ffffffff81ee7c3c>] _raw_spin_unlock_irqrestore+0x3c/0x50
> [  319.006573][   T16] hardirqs last disabled at (20680): [<ffffffff81d37b40>] netpoll_send_skb+0x30/0x70
> [  319.006575][   T16] softirqs last  enabled at (0): [<ffffffff8138970a>] copy_process+0x7aa/0x1940
> [  319.006577][   T16] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [  319.006580][   T16] CPU: 0 UID: 0 PID: 16 Comm: pr/legacy Not tainted 6.17.0-rc7-next-20250926 #1 PREEMPT_{RT,LAZY}
> [  319.006582][   T16] Hardware name: ASUSTeK COMPUTER INC. WS C246M PRO Series/WS C246M PRO Series, BIOS 3301 03/23/2020
> [  319.006583][   T16] Call Trace:
> [  319.006584][   T16]  <TASK>
> [  319.006586][   T16]  dump_stack_lvl+0x57/0x80
> [  319.006590][   T16]  __might_resched.cold+0xec/0xfd
> [  319.006592][   T16]  rt_spin_lock+0x52/0x1a0
> [  319.006594][   T16]  ? alloc_iova_fast+0x70/0x2d0
> [  319.006598][   T16]  alloc_iova_fast+0x70/0x2d0
> [  319.006603][   T16]  iommu_dma_alloc_iova+0xca/0x100
> [  319.006606][   T16]  __iommu_dma_map+0x7f/0x170
> [  319.006611][   T16]  iommu_dma_map_phys+0xb7/0x190
> [  319.006615][   T16]  dma_map_phys+0xc9/0x130
> [  319.006619][   T16]  igc_tx_map.isra.0+0x155/0x570
> [  319.006625][   T16]  igc_xmit_frame_ring+0x2f3/0x510
> [  319.006627][   T16]  ? rt_spin_trylock+0x59/0x130
> [  319.006631][   T16]  netpoll_start_xmit+0x11c/0x190
> [  319.006635][   T16]  __netpoll_send_skb+0x32b/0x3c0
> [  319.006640][   T16]  netpoll_send_skb+0x3e/0x70

netpoll_send_skb() calls local_irq_save(), which disables hardware
interrupts. So nothing deeper in the call chain may sleep. But
__iova_rcache_get() takes cpu_rcache->lock with spin_lock_irqsave(),
and on PREEMPT_RT a spinlock_t is a sleeping lock (note rt_spin_lock()
in the trace), so that acquisition can sleep.
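
To make the constraint concrete, here is a rough userspace C model. It is a sketch under simplifying assumptions, not kernel code. It tracks only an "IRQs disabled" flag and treats spinlock_t as a spinning lock on !PREEMPT_RT and as a sleeping lock on PREEMPT_RT. The toy_* names are hypothetical stand-ins for local_irq_save(), spin_lock_irqsave() and the netpoll_send_skb() -> __iova_rcache_get() call chain. The real might_sleep() machinery is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy model of one CPU; not kernel API. */
struct toy_cpu {
	bool irqs_disabled; /* models irqs_disabled() */
	int  bugs;          /* "sleeping function called from invalid context" reports */
};

/* Models local_irq_save()/local_irq_restore(). */
static bool toy_local_irq_save(struct toy_cpu *cpu)
{
	bool flags = cpu->irqs_disabled;

	cpu->irqs_disabled = true;
	return flags;
}

static void toy_local_irq_restore(struct toy_cpu *cpu, bool flags)
{
	cpu->irqs_disabled = flags;
}

/*
 * Models spin_lock_irqsave() on a spinlock_t.
 * !PREEMPT_RT: a spinning lock that disables IRQs and never sleeps.
 * PREEMPT_RT:  a sleeping (rtmutex-based) lock that leaves IRQs alone,
 *              so it must not be taken while IRQs are already disabled.
 */
static bool toy_spin_lock_irqsave(struct toy_cpu *cpu, bool preempt_rt)
{
	if (!preempt_rt)
		return toy_local_irq_save(cpu);

	if (cpu->irqs_disabled) {
		/* Stands in for the might_sleep()-style check on the RT lock path. */
		cpu->bugs++;
		printf("BUG: sleeping function called from invalid context\n");
	}
	return false; /* flags are unused on PREEMPT_RT */
}

static void toy_spin_unlock_irqrestore(struct toy_cpu *cpu, bool preempt_rt,
				       bool flags)
{
	if (!preempt_rt)
		toy_local_irq_restore(cpu, flags);
}

/* Models __iova_rcache_get() taking cpu_rcache->lock. */
static void toy_iova_rcache_get(struct toy_cpu *cpu, bool preempt_rt)
{
	bool flags = toy_spin_lock_irqsave(cpu, preempt_rt);

	/* ... look up a cached IOVA range ... */
	toy_spin_unlock_irqrestore(cpu, preempt_rt, flags);
}

/* Models netpoll_send_skb(): IRQs off around the whole transmit path. */
static int toy_netpoll_send_skb(bool preempt_rt)
{
	struct toy_cpu cpu = { .irqs_disabled = false, .bugs = 0 };
	bool flags = toy_local_irq_save(&cpu);

	toy_iova_rcache_get(&cpu, preempt_rt); /* driver -> DMA map -> IOMMU */
	toy_local_irq_restore(&cpu, flags);
	assert(!cpu.irqs_disabled);            /* IRQ state is balanced again */
	return cpu.bugs;
}
```

toy_netpoll_send_skb(false) reports no bug, while toy_netpoll_send_skb(true) reports one. This mirrors the splat above: the same lock nesting is fine on a non-RT kernel and only breaks once spinlock_t becomes a sleeping lock.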

It would be nice to replace that local_irq_save() with a real lock
type.
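
One candidate for such a lock type is local_lock_t. This is an assumption for illustration, not a settled proposal. Per Documentation/locking/locktypes.rst, local_lock_irqsave() maps to local_irq_save() on non-RT kernels. On PREEMPT_RT, it acquires a per-CPU spinlock_t and leaves interrupts enabled. The toy userspace C model below uses hypothetical toy_* names rather than kernel API. It sketches why that would make the nested cpu_rcache->lock acquisition legal in both configurations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one CPU; not kernel API. */
struct toy_cpu {
	bool irqs_disabled;   /* models irqs_disabled() */
	bool local_lock_held; /* models the per-CPU lock behind local_lock_t on RT */
	int  bugs;            /* sleeping-lock acquisitions with IRQs disabled */
};

/* A sleeping lock (spinlock_t on PREEMPT_RT) must not be taken with IRQs off. */
static void toy_check_may_sleep(struct toy_cpu *cpu)
{
	if (cpu->irqs_disabled)
		cpu->bugs++;
}

/*
 * Models local_lock_irqsave():
 * !PREEMPT_RT: equivalent to local_irq_save().
 * PREEMPT_RT:  acquires a per-CPU sleeping lock; IRQs stay enabled.
 */
static bool toy_local_lock_irqsave(struct toy_cpu *cpu, bool preempt_rt)
{
	bool flags = cpu->irqs_disabled;

	if (preempt_rt) {
		toy_check_may_sleep(cpu);
		cpu->local_lock_held = true;
		return false; /* flags are unused on PREEMPT_RT */
	}
	cpu->irqs_disabled = true;
	return flags;
}

static void toy_local_unlock_irqrestore(struct toy_cpu *cpu, bool preempt_rt,
					bool flags)
{
	if (preempt_rt)
		cpu->local_lock_held = false;
	else
		cpu->irqs_disabled = flags;
}

/* Models spin_lock_irqsave()/spin_unlock_irqrestore() on a spinlock_t. */
static bool toy_spin_lock_irqsave(struct toy_cpu *cpu, bool preempt_rt)
{
	bool flags = cpu->irqs_disabled;

	if (preempt_rt) {
		toy_check_may_sleep(cpu);
		return false;
	}
	cpu->irqs_disabled = true;
	return flags;
}

static void toy_spin_unlock_irqrestore(struct toy_cpu *cpu, bool preempt_rt,
				       bool flags)
{
	if (!preempt_rt)
		cpu->irqs_disabled = flags;
}

/* Models the transmit path with a local_lock_t instead of local_irq_save(). */
static int toy_send_with_local_lock(bool preempt_rt)
{
	struct toy_cpu cpu = { false, false, 0 };
	bool outer = toy_local_lock_irqsave(&cpu, preempt_rt);
	bool inner = toy_spin_lock_irqsave(&cpu, preempt_rt); /* e.g. cpu_rcache->lock */

	toy_spin_unlock_irqrestore(&cpu, preempt_rt, inner);
	toy_local_unlock_irqrestore(&cpu, preempt_rt, outer);
	assert(!cpu.irqs_disabled && !cpu.local_lock_held);
	return cpu.bugs;
}
```

Both configurations report zero bugs. On !PREEMPT_RT, the outer lock still disables interrupts and the inner spinlock_t spins. On PREEMPT_RT, neither lock disables interrupts, so both may sleep. Whether netpoll's local_irq_save() can actually be converted this way depends on what it protects, which this model does not capture.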

@bigeasy: You have some experience cleaning up this class of
problems. Any suggestions?

> [  319.006643][   T16]  write_ext_msg.part.0+0x457/0x4d0
> [  319.006650][   T16]  console_emit_next_record+0xcb/0x1c0
> [  319.006656][   T16]  console_flush_all+0x274/0x430
> [  319.006660][   T16]  ? devkmsg_write+0x110/0x110
> [  319.006663][   T16]  __console_flush_and_unlock+0x34/0xa0
> [  319.006666][   T16]  legacy_kthread_func+0x23/0xc0
> [  319.006669][   T16]  ? swake_up_locked+0x50/0x50
> [  319.006672][   T16]  kthread+0xf9/0x200
> [  319.006675][   T16]  ? kthread_fetch_affinity.isra.0+0x40/0x40
> [  319.006678][   T16]  ret_from_fork+0xff/0x150
> [  319.006681][   T16]  ? kthread_fetch_affinity.isra.0+0x40/0x40
> [  319.006682][   T16]  ? kthread_fetch_affinity.isra.0+0x40/0x40
> [  319.006684][   T16]  ret_from_fork_asm+0x11/0x20
> [  319.006694][   T16]  </TASK>

John Ogness
