From: John Ogness <john.ogness@linutronix.de>
To: Calvin Owens <calvin@wbinvd.org>,
Breno Leitao <leitao@debian.org>,
Sebastian Siewior <bigeasy@linutronix.de>
Cc: Petr Mladek <pmladek@suse.com>, Mike Galbraith <efault@gmx.de>,
Simon Horman <horms@kernel.org>,
kuba@kernel.org, Pavel Begunkov <asml.silence@gmail.com>,
Johannes Berg <johannes@sipsolutions.net>,
paulmck@kernel.org, LKML <linux-kernel@vger.kernel.org>,
netdev@vger.kernel.org, boqun.feng@gmail.com,
Sergey Senozhatsky <senozhatsky@chromium.org>,
Steven Rostedt <rostedt@goodmis.org>
Subject: Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning
Date: Tue, 30 Sep 2025 16:29:02 +0206
Message-ID: <84frc4j9yx.fsf@jogness.linutronix.de>
In-Reply-To: <aNvh2Cd2i9MVA1d3@mozart.vkv.me>

(Added Sebastian.)

On 2025-09-30, Calvin Owens <calvin@wbinvd.org> wrote:
> On Wednesday 09/10 at 11:26 -0700, Breno Leitao wrote:
>> On Wed, Sep 10, 2025 at 05:12:43PM +0200, Petr Mladek wrote:
>> > On Wed 2025-09-10 14:28:40, John Ogness wrote:
>>
>> > > @pmladek: We could introduce a new console flag (NBCON_ATOMIC_UNSAFE) so
>> > > that the callback is only used by nbcon_atomic_flush_unsafe().
>> >
>> > This might be an acceptable compromise. It would try to emit messages
>> > only at the very end of panic() as the last desperate attempt.
>> >
>> > Just to be sure, what do you mean with unsafe?
>> >
>> > + taking IRQ unsafe locks?
>>
>> Taking IRQ unsafe locks is the major issue we have in netconsole today.
>> Basically the drivers can implement IRQ unsafe locks in their
>> .ndo_start_xmit() callback, and in some cases those are IRQ unsafe,
>> which doesn't match with .write_atomic(), which expect all the inner
>> locks to be IRQ safe.
>
> Hmm, I'm also hitting the below on next-20250926 with translated=strict,
> the triggering acquisition is here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/iova.c?id=30d4efb2f5a515a60fe6b0ca85362cbebea21e2f#n832
>
> Naively I'd think the IOMMU code would need to be safe to call with
> interrupts disabled? Do we need raw_spin_lock() in some places there?
>
> I'll have more time to dig and maybe send a patch tomorrow, any quick
> thoughts are appreciated.
>
> [ 319.006534][ T16] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> [ 319.006536][ T16] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 16, name: pr/legacy
> [ 319.006537][ T16] preempt_count: 0, expected: 0
> [ 319.006537][ T16] RCU nest depth: 3, expected: 3
> [ 319.006538][ T16] 8 locks held by pr/legacy/16:
> [ 319.006539][ T16] #0: ffffffff831ffbe0 (console_lock){+.+.}-{0:0}, at: legacy_kthread_func+0x1e/0xc0
> [ 319.006546][ T16] #1: ffffffff831ffc30 (console_srcu){....}-{0:0}, at: console_flush_all+0xf2/0x430
> [ 319.006550][ T16] #2: ffffffff832c3ef8 (target_list_lock){+.+.}-{3:3}, at: write_ext_msg.part.0+0x28/0x4d0
> [ 319.006554][ T16] #3: ffffffff83202720 (rcu_read_lock){....}-{1:3}, at: rt_spin_lock+0xd5/0x1a0
> [ 319.006557][ T16] #4: ffffffff83202720 (rcu_read_lock){....}-{1:3}, at: __netpoll_send_skb+0x4a/0x3c0
> [ 319.006561][ T16] #5: ffff888107c89e98 (_xmit_ETHER#2){+...}-{3:3}, at: __netpoll_send_skb+0x2d6/0x3c0
> [ 319.006564][ T16] #6: ffffffff83202720 (rcu_read_lock){....}-{1:3}, at: rt_spin_trylock+0x59/0x130
> [ 319.006567][ T16] #7: ffffe8ffffc06218 (&cpu_rcache->lock){+.+.}-{3:3}, at: alloc_iova_fast+0x70/0x2d0
> [ 319.006570][ T16] irq event stamp: 20680
> [ 319.006571][ T16] hardirqs last enabled at (20679): [<ffffffff81ee7c3c>] _raw_spin_unlock_irqrestore+0x3c/0x50
> [ 319.006573][ T16] hardirqs last disabled at (20680): [<ffffffff81d37b40>] netpoll_send_skb+0x30/0x70
> [ 319.006575][ T16] softirqs last enabled at (0): [<ffffffff8138970a>] copy_process+0x7aa/0x1940
> [ 319.006577][ T16] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [ 319.006580][ T16] CPU: 0 UID: 0 PID: 16 Comm: pr/legacy Not tainted 6.17.0-rc7-next-20250926 #1 PREEMPT_{RT,LAZY}
> [ 319.006582][ T16] Hardware name: ASUSTeK COMPUTER INC. WS C246M PRO Series/WS C246M PRO Series, BIOS 3301 03/23/2020
> [ 319.006583][ T16] Call Trace:
> [ 319.006584][ T16] <TASK>
> [ 319.006586][ T16] dump_stack_lvl+0x57/0x80
> [ 319.006590][ T16] __might_resched.cold+0xec/0xfd
> [ 319.006592][ T16] rt_spin_lock+0x52/0x1a0
> [ 319.006594][ T16] ? alloc_iova_fast+0x70/0x2d0
> [ 319.006598][ T16] alloc_iova_fast+0x70/0x2d0
> [ 319.006603][ T16] iommu_dma_alloc_iova+0xca/0x100
> [ 319.006606][ T16] __iommu_dma_map+0x7f/0x170
> [ 319.006611][ T16] iommu_dma_map_phys+0xb7/0x190
> [ 319.006615][ T16] dma_map_phys+0xc9/0x130
> [ 319.006619][ T16] igc_tx_map.isra.0+0x155/0x570
> [ 319.006625][ T16] igc_xmit_frame_ring+0x2f3/0x510
> [ 319.006627][ T16] ? rt_spin_trylock+0x59/0x130
> [ 319.006631][ T16] netpoll_start_xmit+0x11c/0x190
> [ 319.006635][ T16] __netpoll_send_skb+0x32b/0x3c0
> [ 319.006640][ T16] netpoll_send_skb+0x3e/0x70

netpoll_send_skb() calls local_irq_save(), which disables hardware
interrupts, so nothing deeper in the stack may sleep. But
__iova_rcache_get() takes a spinlock_t via spin_lock_irqsave(), and on
PREEMPT_RT a spinlock_t is a sleeping lock.

It would be nice to replace that local_irq_save() with a real lock
type.

@bigeasy: You have some experience cleaning up this class of
problems. Any suggestions?

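To make the conflict concrete, here is a minimal userspace sketch in C.
It is a toy model, not kernel code: every model_* name and both xmit_*
helpers are hypothetical and exist only for illustration. It assumes
only what is stated above, namely that interrupts are disabled before
the driver transmit path runs, and that the lock taken further down can
sleep on PREEMPT_RT.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model state (hypothetical, userspace only). */
static bool irqs_off;          /* stands in for irqs_disabled() */
static int sleep_violations;   /* counts "sleeping in invalid context" hits */

/* Models local_irq_save()/local_irq_restore(). */
static bool model_local_irq_save(void)
{
	bool old = irqs_off;

	irqs_off = true;
	return old;
}

static void model_local_irq_restore(bool old)
{
	irqs_off = old;
}

/* Models a PREEMPT_RT spinlock_t: acquiring it may sleep. */
struct model_rt_spinlock {
	bool held;
};

static void model_rt_spin_lock(struct model_rt_spinlock *lock)
{
	/* A sleeping lock must not be taken with interrupts disabled. */
	if (irqs_off)
		sleep_violations++;
	lock->held = true;
}

static void model_rt_spin_unlock(struct model_rt_spinlock *lock)
{
	assert(lock->held);
	lock->held = false;
}

/* Models a driver transmit path that takes a sleeping lock. */
static void model_driver_xmit(struct model_rt_spinlock *rcache_lock)
{
	model_rt_spin_lock(rcache_lock);
	model_rt_spin_unlock(rcache_lock);
}

/* Pattern from the trace: interrupts off, then the driver path. */
int xmit_with_irqs_off(void)
{
	struct model_rt_spinlock rcache_lock = { .held = false };
	bool flags;

	sleep_violations = 0;
	flags = model_local_irq_save();
	model_driver_xmit(&rcache_lock);
	model_local_irq_restore(flags);
	return sleep_violations;
}

/* Alternative: serialize with a lock that leaves interrupts enabled. */
int xmit_with_real_lock(void)
{
	struct model_rt_spinlock outer = { .held = false };
	struct model_rt_spinlock rcache_lock = { .held = false };

	sleep_violations = 0;
	model_rt_spin_lock(&outer);
	model_driver_xmit(&rcache_lock);
	model_rt_spin_unlock(&outer);
	return sleep_violations;
}
```

In this model xmit_with_irqs_off() records one violation and
xmit_with_real_lock() records none. Which lock type would actually fit
netpoll is the open question above; the sketch does not answer it.
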
> [ 319.006643][ T16] write_ext_msg.part.0+0x457/0x4d0
> [ 319.006650][ T16] console_emit_next_record+0xcb/0x1c0
> [ 319.006656][ T16] console_flush_all+0x274/0x430
> [ 319.006660][ T16] ? devkmsg_write+0x110/0x110
> [ 319.006663][ T16] __console_flush_and_unlock+0x34/0xa0
> [ 319.006666][ T16] legacy_kthread_func+0x23/0xc0
> [ 319.006669][ T16] ? swake_up_locked+0x50/0x50
> [ 319.006672][ T16] kthread+0xf9/0x200
> [ 319.006675][ T16] ? kthread_fetch_affinity.isra.0+0x40/0x40
> [ 319.006678][ T16] ret_from_fork+0xff/0x150
> [ 319.006681][ T16] ? kthread_fetch_affinity.isra.0+0x40/0x40
> [ 319.006682][ T16] ? kthread_fetch_affinity.isra.0+0x40/0x40
> [ 319.006684][ T16] ret_from_fork_asm+0x11/0x20
> [ 319.006694][ T16] </TASK>
John Ogness