* [PATCH v2 0/2] block/blk-mq: fix RT kernel issues and interrupt context warnings
@ 2025-12-22 20:15 Ionut Nechita (WindRiver)
2025-12-22 20:15 ` [PATCH v2 1/2] block/blk-mq: fix RT kernel regression with queue_lock in hot path Ionut Nechita (WindRiver)
2025-12-22 20:15 ` [PATCH v2 2/2] block: Fix WARN_ON in blk_mq_run_hw_queue when called from interrupt context Ionut Nechita (WindRiver)
From: Ionut Nechita (WindRiver) @ 2025-12-22 20:15 UTC
To: ming.lei
Cc: axboe, gregkh, ionut.nechita, linux-block, linux-kernel,
muchun.song, sashal, stable
From: Ionut Nechita <ionut.nechita@windriver.com>
This series addresses two critical issues in the block layer multiqueue
(blk-mq) subsystem when running on PREEMPT_RT kernels.
The first patch fixes a severe performance regression where queue_lock
contention in the I/O hot path causes IRQ threads to sleep on RT kernels.
Testing on a MegaRAID 12GSAS controller showed a 76% performance drop
(640 MB/s -> 153 MB/s). The fix replaces the spinlock with memory barriers
to maintain the required ordering without sleeping.
The second patch fixes a WARN_ON that triggers during SCSI device scanning
when blk_freeze_queue_start() calls blk_mq_run_hw_queues() synchronously
from interrupt context. The warning "WARN_ON_ONCE(!async && in_interrupt())"
is resolved by switching to asynchronous execution.
Changes in v2:
- Removed the blk_mq_cpuhp_lock patch (needs more investigation)
- Added fix for WARN_ON in interrupt context during queue freezing
- Updated commit messages for clarity
Ionut Nechita (2):
block/blk-mq: fix RT kernel regression with queue_lock in hot path
block: Fix WARN_ON in blk_mq_run_hw_queue when called from interrupt
context
block/blk-mq.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
--
2.52.0
* [PATCH v2 1/2] block/blk-mq: fix RT kernel regression with queue_lock in hot path
From: Ionut Nechita (WindRiver) @ 2025-12-22 20:15 UTC
To: ming.lei
Cc: axboe, gregkh, ionut.nechita, linux-block, linux-kernel,
	muchun.song, sashal, stable

From: Ionut Nechita <ionut.nechita@windriver.com>

Commit 679b1874eba7 ("block: fix ordering between checking
QUEUE_FLAG_QUIESCED request adding") introduced queue_lock acquisition
in blk_mq_run_hw_queue() to synchronize QUEUE_FLAG_QUIESCED checks.

On RT kernels (CONFIG_PREEMPT_RT), regular spinlocks are converted to
rt_mutex (sleeping locks). When multiple MSI-X IRQ threads process I/O
completions concurrently, they contend on queue_lock in the hot path,
causing all IRQ threads to enter D (uninterruptible sleep) state. This
serializes interrupt processing completely.

Test case (MegaRAID 12GSAS with 8 MSI-X vectors on RT kernel):
- Good (v6.6.52-rt): 640 MB/s sequential read
- Bad (v6.6.64-rt): 153 MB/s sequential read (-76% regression)
- 6-8 out of 8 MSI-X IRQ threads stuck in D-state waiting on queue_lock

The original commit message mentioned memory barriers as an alternative
approach. Use full memory barriers (smp_mb) instead of queue_lock to
provide the same ordering guarantees without sleeping in RT kernel.

Memory barriers ensure proper synchronization:
- CPU0 either sees QUEUE_FLAG_QUIESCED cleared, OR
- CPU1 sees dispatch list/sw queue bitmap updates

This maintains correctness while avoiding lock contention that causes
RT kernel IRQ threads to sleep in the I/O completion path.

Fixes: 679b1874eba7 ("block: fix ordering between checking QUEUE_FLAG_QUIESCED request adding")
Cc: stable@vger.kernel.org
Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
---
 block/blk-mq.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5da948b07058..5fb8da4958d0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2292,22 +2292,19 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 
 	might_sleep_if(!async && hctx->flags & BLK_MQ_F_BLOCKING);
 
+	/*
+	 * First lockless check to avoid unnecessary overhead.
+	 * Memory barrier below synchronizes with blk_mq_unquiesce_queue().
+	 */
 	need_run = blk_mq_hw_queue_need_run(hctx);
 	if (!need_run) {
-		unsigned long flags;
-
-		/*
-		 * Synchronize with blk_mq_unquiesce_queue(), because we check
-		 * if hw queue is quiesced locklessly above, we need the use
-		 * ->queue_lock to make sure we see the up-to-date status to
-		 * not miss rerunning the hw queue.
-		 */
-		spin_lock_irqsave(&hctx->queue->queue_lock, flags);
+		/* Synchronize with blk_mq_unquiesce_queue() */
+		smp_mb();
 		need_run = blk_mq_hw_queue_need_run(hctx);
-		spin_unlock_irqrestore(&hctx->queue->queue_lock, flags);
-
 		if (!need_run)
 			return;
+		/* Ensure dispatch list/sw queue updates visible before execution */
+		smp_mb();
 	}
 
 	if (async || !cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask)) {
-- 
2.52.0
* Re: [PATCH v2 1/2] block/blk-mq: fix RT kernel regression with queue_lock in hot path
From: Muchun Song @ 2025-12-23  2:15 UTC
To: Ionut Nechita (WindRiver)
Cc: axboe, gregkh, ionut.nechita, linux-block, linux-kernel, sashal,
	stable, ming.lei

On 2025/12/23 04:15, Ionut Nechita (WindRiver) wrote:
> From: Ionut Nechita <ionut.nechita@windriver.com>
>
> Commit 679b1874eba7 ("block: fix ordering between checking
> QUEUE_FLAG_QUIESCED request adding") introduced queue_lock acquisition
> in blk_mq_run_hw_queue() to synchronize QUEUE_FLAG_QUIESCED checks.
>
> On RT kernels (CONFIG_PREEMPT_RT), regular spinlocks are converted to
> rt_mutex (sleeping locks). When multiple MSI-X IRQ threads process I/O
> completions concurrently, they contend on queue_lock in the hot path,
> causing all IRQ threads to enter D (uninterruptible sleep) state. This
> serializes interrupt processing completely.
>
> Test case (MegaRAID 12GSAS with 8 MSI-X vectors on RT kernel):
> - Good (v6.6.52-rt): 640 MB/s sequential read
> - Bad (v6.6.64-rt): 153 MB/s sequential read (-76% regression)
> - 6-8 out of 8 MSI-X IRQ threads stuck in D-state waiting on queue_lock
>
> The original commit message mentioned memory barriers as an alternative
> approach. Use full memory barriers (smp_mb) instead of queue_lock to
> provide the same ordering guarantees without sleeping in RT kernel.
>
> Memory barriers ensure proper synchronization:
> - CPU0 either sees QUEUE_FLAG_QUIESCED cleared, OR
> - CPU1 sees dispatch list/sw queue bitmap updates
>
> This maintains correctness while avoiding lock contention that causes
> RT kernel IRQ threads to sleep in the I/O completion path.
>
> Fixes: 679b1874eba7 ("block: fix ordering between checking QUEUE_FLAG_QUIESCED request adding")
> Cc: stable@vger.kernel.org
> Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
> ---
>  block/blk-mq.c | 19 ++++++++-----------
>  1 file changed, 8 insertions(+), 11 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 5da948b07058..5fb8da4958d0 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2292,22 +2292,19 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
>
>  	might_sleep_if(!async && hctx->flags & BLK_MQ_F_BLOCKING);
>
> +	/*
> +	 * First lockless check to avoid unnecessary overhead.
> +	 * Memory barrier below synchronizes with blk_mq_unquiesce_queue().
> +	 */
>  	need_run = blk_mq_hw_queue_need_run(hctx);
>  	if (!need_run) {
> -		unsigned long flags;
> -
> -		/*
> -		 * Synchronize with blk_mq_unquiesce_queue(), because we check
> -		 * if hw queue is quiesced locklessly above, we need the use
> -		 * ->queue_lock to make sure we see the up-to-date status to
> -		 * not miss rerunning the hw queue.
> -		 */
> -		spin_lock_irqsave(&hctx->queue->queue_lock, flags);
> +		/* Synchronize with blk_mq_unquiesce_queue() */

Memory barriers must be used in pairs. So how to synchronize?

> +		smp_mb();
>  		need_run = blk_mq_hw_queue_need_run(hctx);
> -		spin_unlock_irqrestore(&hctx->queue->queue_lock, flags);
> -
>  		if (!need_run)
>  			return;
> +		/* Ensure dispatch list/sw queue updates visible before execution */
> +		smp_mb();

Why we need another barrier? What order does this barrier guarantee?

Thanks.

>  	}
>
>  	if (async || !cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask)) {
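For reference, the pairing being asked about is the classic store-buffering
case. A litmus-test sketch in herd7 / tools/memory-model style (an
illustration only, not actual blk-mq code: "dispatch" stands for the
dispatch-list/sw-queue update, "unquiesced" for clearing
QUEUE_FLAG_QUIESCED):

	C quiesce-pairing

	(*
	 * P0: request insertion path - publish work, then check the flag.
	 * P1: unquiesce path - clear the flag, then look for pending work.
	 *)

	{}

	P0(int *dispatch, int *unquiesced)
	{
		int r0;

		WRITE_ONCE(*dispatch, 1);
		smp_mb();
		r0 = READ_ONCE(*unquiesced);
	}

	P1(int *dispatch, int *unquiesced)
	{
		int r0;

		WRITE_ONCE(*unquiesced, 1);
		smp_mb();
		r0 = READ_ONCE(*dispatch);
	}

	exists (0:r0=0 /\ 1:r0=0)

With an smp_mb() on both sides the "exists" outcome (each CPU missing the
other's store, so nobody reruns the queue) is forbidden; drop either barrier
and it becomes reachable. That is why a barrier added only in
blk_mq_run_hw_queue() is not enough: every site that publishes work or
clears the quiesced flag needs the matching barrier.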
* Re: [PATCH v2 1/2] block/blk-mq: fix RT kernel regression with queue_lock in hot path
From: djiony2011 @ 2026-01-06 11:36 UTC
To: muchun.song
Cc: axboe, gregkh, ionut.nechita, linux-block, linux-kernel, ming.lei,
	sashal, stable

From: Ionut Nechita <ionut.nechita@windriver.com>

Hi Muchun,

Thank you for the detailed review. Your questions about the memory barriers
are absolutely correct and highlight fundamental issues with my approach.

> Memory barriers must be used in pairs. So how to synchronize?
> Why we need another barrier? What order does this barrier guarantee?

You're right to ask these questions. After careful consideration and
discussion with Ming Lei, I've concluded that the memory barrier approach
in this patch is flawed and insufficient.

The fundamental problem is:

1. Memory barriers need proper pairing on both read and write sides
2. The write-side barriers would need to be inserted at MULTIPLE call sites
   throughout the block layer - everywhere work is added before calling
   blk_mq_run_hw_queue()
3. This is exactly why the original commit 679b1874eba7 chose the lock-based
   approach, noting that "memory barrier is not easy to be maintained"

My patch attempted to add barriers only in blk_mq_run_hw_queue(), but didn't
address the pairing barriers needed at all the call sites that add work to
dispatch lists/sw queues. This makes the synchronization incomplete.

## New approach: dedicated raw_spinlock

I'm abandoning the memory barrier approach and preparing a new patch that
uses a dedicated raw_spinlock_t (quiesce_sync_lock) instead of the
general-purpose queue_lock.

The key differences from the current problematic code:
- Current: Uses queue_lock (spinlock_t) which becomes rt_mutex in RT kernel
- New: Uses quiesce_sync_lock (raw_spinlock_t) which stays a real spinlock

Why raw_spinlock is safe:
- Critical section is provably short (only flag and counter checks)
- No sleeping operations under the lock
- Specific to quiesce synchronization, not general queue operations

This approach:
- Maintains the correct synchronization from 679b1874eba7
- Avoids sleeping in RT kernel's IRQ thread context
- Simpler and more maintainable than memory barrier pairing across many
  call sites

I'll send the new patch shortly. Thank you for catching these issues before
they made it into the kernel.

Best regards,
Ionut
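P.S. For concreteness, a rough and untested sketch of the direction I mean
(the field name quiesce_sync_lock, its placement in struct request_queue,
and the exact hunk are placeholders, not the final patch):

	/* sketch: a dedicated lock in struct request_queue, used only
	 * for quiesce/run-queue synchronization */
	raw_spinlock_t		quiesce_sync_lock;

	/* sketch of the corresponding section in blk_mq_run_hw_queue() */
	need_run = blk_mq_hw_queue_need_run(hctx);
	if (!need_run) {
		unsigned long flags;

		/*
		 * raw_spinlock_t is not converted to an rt_mutex on
		 * PREEMPT_RT, so an IRQ thread never sleeps here; the
		 * critical section is only the re-check of the quiesced
		 * flag and pending work.
		 */
		raw_spin_lock_irqsave(&hctx->queue->quiesce_sync_lock, flags);
		need_run = blk_mq_hw_queue_need_run(hctx);
		raw_spin_unlock_irqrestore(&hctx->queue->quiesce_sync_lock, flags);
		if (!need_run)
			return;
	}

blk_mq_unquiesce_queue() (and any other path that clears
QUEUE_FLAG_QUIESCED) would have to take the same raw lock for the re-check
to mean anything, mirroring what 679b1874eba7 does with queue_lock today.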
* [PATCH v2 2/2] block: Fix WARN_ON in blk_mq_run_hw_queue when called from interrupt context
From: Ionut Nechita (WindRiver) @ 2025-12-22 20:15 UTC
To: ming.lei
Cc: axboe, gregkh, ionut.nechita, linux-block, linux-kernel,
	muchun.song, sashal, stable

From: Ionut Nechita <ionut.nechita@windriver.com>

Fix warning "WARN_ON_ONCE(!async && in_interrupt())" that occurs during
SCSI device scanning when blk_freeze_queue_start() calls
blk_mq_run_hw_queues() synchronously from interrupt context.

The issue happens during device removal/scanning when:
1. blk_mq_destroy_queue() -> blk_queue_start_drain()
2. blk_freeze_queue_start() calls blk_mq_run_hw_queues(q, false)
3. This triggers the warning in blk_mq_run_hw_queue() when in interrupt
   context

Change the synchronous call to asynchronous to avoid running in interrupt
context.

Fixes: Warning in blk_mq_run_hw_queue+0x1fa/0x260
Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
---
 block/blk-mq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5fb8da4958d0..ae152f7a6933 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -128,7 +128,7 @@ void blk_freeze_queue_start(struct request_queue *q)
 		percpu_ref_kill(&q->q_usage_counter);
 		mutex_unlock(&q->mq_freeze_lock);
 		if (queue_is_mq(q))
-			blk_mq_run_hw_queues(q, false);
+			blk_mq_run_hw_queues(q, true);
 	} else {
 		mutex_unlock(&q->mq_freeze_lock);
 	}
-- 
2.52.0
* Re: [PATCH v2 2/2] block: Fix WARN_ON in blk_mq_run_hw_queue when called from interrupt context
From: Ming Lei @ 2025-12-23  1:22 UTC
To: Ionut Nechita (WindRiver)
Cc: axboe, gregkh, ionut.nechita, linux-block, linux-kernel,
	muchun.song, sashal, stable

On Mon, Dec 22, 2025 at 10:15:41PM +0200, Ionut Nechita (WindRiver) wrote:
> From: Ionut Nechita <ionut.nechita@windriver.com>
>
> Fix warning "WARN_ON_ONCE(!async && in_interrupt())" that occurs during
> SCSI device scanning when blk_freeze_queue_start() calls blk_mq_run_hw_queues()
> synchronously from interrupt context.

Can you show the whole stack trace in the warning? The in-code doesn't
indicate that freeze queue can be called from scsi's interrupt context.

Thanks,
Ming
* Re: [PATCH v2 2/2] block: Fix WARN_ON in blk_mq_run_hw_queue when called from interrupt context
From: djiony2011 @ 2026-01-06 11:14 UTC
To: ming.lei
Cc: axboe, djiony2011, gregkh, ionut.nechita, linux-block, linux-kernel,
	muchun.song, sashal, stable

From: Ionut Nechita <ionut.nechita@windriver.com>

Hi Ming,

Thank you for the review. You're absolutely right to ask for clarification -
I need to correct my commit message as it's misleading about the actual
call path.

> Can you show the whole stack trace in the warning? The in-code doesn't
> indicate that freeze queue can be called from scsi's interrupt context.

Here's the complete stack trace from the WARNING at blk_mq_run_hw_queue:

[Mon Dec 22 10:18:18 2025] WARNING: CPU: 190 PID: 2041 at block/blk-mq.c:2291 blk_mq_run_hw_queue+0x1fa/0x260
[Mon Dec 22 10:18:18 2025] Modules linked in:
[Mon Dec 22 10:18:18 2025] CPU: 190 PID: 2041 Comm: kworker/u385:1 Tainted: G W 6.6.0-1-rt-amd64 #1 Debian 6.6.71-1
[Mon Dec 22 10:18:18 2025] Hardware name: Dell Inc. PowerEdge R7615/09K9WP, BIOS 1.11.2 12/19/2024
[Mon Dec 22 10:18:18 2025] Workqueue: events_unbound async_run_entry_fn
[Mon Dec 22 10:18:18 2025] RIP: 0010:blk_mq_run_hw_queue+0x1fa/0x260
[Mon Dec 22 10:18:18 2025] Code: ff 75 68 44 89 f6 e8 e5 45 c0 ff e9 ac fe ff ff e8 2b 70 c0 ff 48 89 ef e8 b3 a0 00 00 5b 5d 41 5c 41 5d 41 5e e9 26 9e c0 ff <0f> 0b e9 43 fe ff ff e8 0a 70 c0 ff 48 8b 85 d0 00 00 00 48 8b 80
[Mon Dec 22 10:18:18 2025] RSP: 0018:ff630f098528fb98 EFLAGS: 00010206
[Mon Dec 22 10:18:18 2025] RAX: 0000000000ff0000 RBX: 0000000000000000 RCX: 0000000000000000
[Mon Dec 22 10:18:18 2025] RDX: 0000000000ff0000 RSI: 0000000000000000 RDI: ff3edc0247159400
[Mon Dec 22 10:18:18 2025] RBP: ff3edc0247159400 R08: ff3edc0247159400 R09: ff630f098528fb60
[Mon Dec 22 10:18:18 2025] R10: 0000000000000000 R11: 0000000045069ed3 R12: 0000000000000000
[Mon Dec 22 10:18:18 2025] R13: ff3edc024715a828 R14: 0000000000000000 R15: 0000000000000000
[Mon Dec 22 10:18:18 2025] FS: 0000000000000000(0000) GS:ff3edc10fd380000(0000) knlGS:0000000000000000
[Mon Dec 22 10:18:18 2025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Dec 22 10:18:18 2025] CR2: 0000000000000000 CR3: 000000073961a001 CR4: 0000000000771ee0
[Mon Dec 22 10:18:18 2025] PKRU: 55555554
[Mon Dec 22 10:18:18 2025] Call Trace:
[Mon Dec 22 10:18:18 2025]  <TASK>
[Mon Dec 22 10:18:18 2025]  ? __warn+0x89/0x140
[Mon Dec 22 10:18:18 2025]  ? blk_mq_run_hw_queue+0x1fa/0x260
[Mon Dec 22 10:18:18 2025]  ? report_bug+0x198/0x1b0
[Mon Dec 22 10:18:18 2025]  ? handle_bug+0x53/0x90
[Mon Dec 22 10:18:18 2025]  ? exc_invalid_op+0x18/0x70
[Mon Dec 22 10:18:18 2025]  ? asm_exc_invalid_op+0x1a/0x20
[Mon Dec 22 10:18:18 2025]  ? blk_mq_run_hw_queue+0x1fa/0x260
[Mon Dec 22 10:18:18 2025]  blk_mq_run_hw_queues+0x6c/0x130
[Mon Dec 22 10:18:18 2025]  blk_queue_start_drain+0x12/0x40
[Mon Dec 22 10:18:18 2025]  blk_mq_destroy_queue+0x37/0x70
[Mon Dec 22 10:18:18 2025]  __scsi_remove_device+0x6a/0x180
[Mon Dec 22 10:18:18 2025]  scsi_alloc_sdev+0x357/0x360
[Mon Dec 22 10:18:18 2025]  scsi_probe_and_add_lun+0x8ac/0xc00
[Mon Dec 22 10:18:18 2025]  ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Dec 22 10:18:18 2025]  ? dev_set_name+0x57/0x80
[Mon Dec 22 10:18:18 2025]  ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Dec 22 10:18:18 2025]  ? attribute_container_add_device+0x4d/0x130
[Mon Dec 22 10:18:18 2025]  __scsi_scan_target+0xf0/0x520
[Mon Dec 22 10:18:18 2025]  ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Dec 22 10:18:18 2025]  ? sched_clock_cpu+0x64/0x190
[Mon Dec 22 10:18:18 2025]  scsi_scan_channel+0x57/0x90
[Mon Dec 22 10:18:18 2025]  scsi_scan_host_selected+0xd4/0x110
[Mon Dec 22 10:18:18 2025]  do_scan_async+0x1c/0x190
[Mon Dec 22 10:18:18 2025]  async_run_entry_fn+0x2f/0x130
[Mon Dec 22 10:18:18 2025]  process_one_work+0x175/0x370
[Mon Dec 22 10:18:18 2025]  worker_thread+0x280/0x390
[Mon Dec 22 10:18:18 2025]  ? __pfx_worker_thread+0x10/0x10
[Mon Dec 22 10:18:18 2025]  kthread+0xdd/0x110
[Mon Dec 22 10:18:18 2025]  ? __pfx_kthread+0x10/0x10
[Mon Dec 22 10:18:18 2025]  ret_from_fork+0x31/0x50
[Mon Dec 22 10:18:18 2025]  ? __pfx_kthread+0x10/0x10
[Mon Dec 22 10:18:18 2025]  ret_from_fork_asm+0x1b/0x30
[Mon Dec 22 10:18:18 2025]  </TASK>
[Mon Dec 22 10:18:18 2025] ---[ end trace 0000000000000000 ]---

## Important clarifications:

1. **Not freeze queue, but drain during destroy**: My commit message was
   incorrect. The call path is:
   blk_mq_destroy_queue() -> blk_queue_start_drain() -> blk_mq_run_hw_queues(q, false)

   This is NOT during blk_freeze_queue_start(), but during queue destruction
   when a SCSI device probe fails and cleanup is triggered.

2. **Not true interrupt context**: You're correct that this isn't from an
   interrupt handler. The workqueue context is process context, not
   interrupt context.

3. **The actual problem on PREEMPT_RT**: There's a preceding "scheduling
   while atomic" error that provides the real context:

[Mon Dec 22 10:18:18 2025] BUG: scheduling while atomic: kworker/u385:1/2041/0x00000002
[Mon Dec 22 10:18:18 2025] Call Trace:
[Mon Dec 22 10:18:18 2025]  dump_stack_lvl+0x37/0x50
[Mon Dec 22 10:18:18 2025]  __schedule_bug+0x52/0x60
[Mon Dec 22 10:18:18 2025]  __schedule+0x87d/0xb10
[Mon Dec 22 10:18:18 2025]  rt_mutex_schedule+0x21/0x40
[Mon Dec 22 10:18:18 2025]  rt_mutex_slowlock_block.constprop.0+0x33/0x170
[Mon Dec 22 10:18:18 2025]  __rt_mutex_slowlock_locked.constprop.0+0xc4/0x1e0
[Mon Dec 22 10:18:18 2025]  mutex_lock+0x44/0x60
[Mon Dec 22 10:18:18 2025]  __cpuhp_state_add_instance_cpuslocked+0x41/0x110
[Mon Dec 22 10:18:18 2025]  __cpuhp_state_add_instance+0x48/0xd0
[Mon Dec 22 10:18:18 2025]  blk_mq_realloc_hw_ctxs+0x405/0x420
[Mon Dec 22 10:18:18 2025]  blk_mq_init_allocated_queue+0x10a/0x480

The context is atomic because on PREEMPT_RT, some spinlock earlier in the
call chain has been converted to an rt_mutex, and the code is holding that
lock. When blk_mq_run_hw_queues() is called with async=false, it triggers
kblockd_mod_delayed_work_on(), which calls in_interrupt(), and this returns
true because preempt_count() is non-zero due to the rt_mutex being held.

## What this means:

The issue is specific to PREEMPT_RT where:
- Spinlocks become sleeping mutexes (rt_mutex)
- Holding an rt_mutex sets preempt_count, making in_interrupt() return true
- blk_mq_run_hw_queues() with async=false hits WARN_ON_ONCE(!async && in_interrupt())

This is why the async parameter needs to be true when called in contexts
that might hold spinlocks on RT kernels.

I apologize for the confusion in my commit message. Should I:
1. Revise the commit message to accurately describe the blk_queue_start_drain() path?
2. Add details about the PREEMPT_RT context causing the atomic state?

Best regards,
Ionut
* Re: [PATCH v2 2/2] block: Fix WARN_ON in blk_mq_run_hw_queue when called from interrupt context
From: Bart Van Assche @ 2026-01-06 12:29 UTC
To: djiony2011, ming.lei
Cc: axboe, gregkh, ionut.nechita, linux-block, linux-kernel,
	muchun.song, sashal, stable

On 1/6/26 3:14 AM, djiony2011@gmail.com wrote:
> [Mon Dec 22 10:18:18 2025] WARNING: CPU: 190 PID: 2041 at block/blk-mq.c:2291 blk_mq_run_hw_queue+0x1fa/0x260
> [Mon Dec 22 10:18:18 2025] Modules linked in:
> [Mon Dec 22 10:18:18 2025] CPU: 190 PID: 2041 Comm: kworker/u385:1 Tainted: G W 6.6.0-1-rt-amd64 #1 Debian 6.6.71-1

6.6.71 is pretty far away from Jens' for-next branch. Please use Jens'
for-next branch for testing kernel patches intended for the upstream
kernel.

> [Mon Dec 22 10:18:18 2025] Call Trace:
> [Mon Dec 22 10:18:18 2025]  <TASK>
> [Mon Dec 22 10:18:18 2025]  blk_mq_run_hw_queues+0x6c/0x130
> [Mon Dec 22 10:18:18 2025]  blk_queue_start_drain+0x12/0x40
> [Mon Dec 22 10:18:18 2025]  blk_mq_destroy_queue+0x37/0x70
> [Mon Dec 22 10:18:18 2025]  __scsi_remove_device+0x6a/0x180
> [Mon Dec 22 10:18:18 2025]  scsi_alloc_sdev+0x357/0x360
> [Mon Dec 22 10:18:18 2025]  scsi_probe_and_add_lun+0x8ac/0xc00
> [Mon Dec 22 10:18:18 2025]  __scsi_scan_target+0xf0/0x520
> [Mon Dec 22 10:18:18 2025]  scsi_scan_channel+0x57/0x90
> [Mon Dec 22 10:18:18 2025]  scsi_scan_host_selected+0xd4/0x110
> [Mon Dec 22 10:18:18 2025]  do_scan_async+0x1c/0x190
> [Mon Dec 22 10:18:18 2025]  async_run_entry_fn+0x2f/0x130
> [Mon Dec 22 10:18:18 2025]  process_one_work+0x175/0x370
> [Mon Dec 22 10:18:18 2025]  worker_thread+0x280/0x390
> [Mon Dec 22 10:18:18 2025]  kthread+0xdd/0x110
> [Mon Dec 22 10:18:18 2025]  ret_from_fork+0x31/0x50
> [Mon Dec 22 10:18:18 2025]  ret_from_fork_asm+0x1b/0x30

Where in the above call stack is the code that disables interrupts?

> 3. **The actual problem on PREEMPT_RT**: There's a preceding "scheduling
>    while atomic" error that provides the real context:
>
> [Mon Dec 22 10:18:18 2025] BUG: scheduling while atomic: kworker/u385:1/2041/0x00000002
> [Mon Dec 22 10:18:18 2025] Call Trace:
> [Mon Dec 22 10:18:18 2025]  dump_stack_lvl+0x37/0x50
> [Mon Dec 22 10:18:18 2025]  __schedule_bug+0x52/0x60
> [Mon Dec 22 10:18:18 2025]  __schedule+0x87d/0xb10
> [Mon Dec 22 10:18:18 2025]  rt_mutex_schedule+0x21/0x40
> [Mon Dec 22 10:18:18 2025]  rt_mutex_slowlock_block.constprop.0+0x33/0x170
> [Mon Dec 22 10:18:18 2025]  __rt_mutex_slowlock_locked.constprop.0+0xc4/0x1e0
> [Mon Dec 22 10:18:18 2025]  mutex_lock+0x44/0x60
> [Mon Dec 22 10:18:18 2025]  __cpuhp_state_add_instance_cpuslocked+0x41/0x110
> [Mon Dec 22 10:18:18 2025]  __cpuhp_state_add_instance+0x48/0xd0
> [Mon Dec 22 10:18:18 2025]  blk_mq_realloc_hw_ctxs+0x405/0x420
> [Mon Dec 22 10:18:18 2025]  blk_mq_init_allocated_queue+0x10a/0x480

How is the above call stack related to the reported problem? The above
call stack is about request queue allocation while the reported problem
happens during request queue destruction.

> I apologize for the confusion in my commit message. Should I:
> 1. Revise the commit message to accurately describe the blk_queue_start_drain() path?
> 2. Add details about the PREEMPT_RT context causing the atomic state?

The answer to both questions is yes.

Thanks,

Bart.
* Re: [PATCH v2 2/2] block: Fix WARN_ON in blk_mq_run_hw_queue when called from interrupt context
From: Ionut Nechita @ 2026-01-06 14:40 UTC
To: bvanassche
Cc: axboe, gregkh, ionut.nechita, linux-block, linux-kernel, ming.lei,
	muchun.song, sashal, stable

Hi Bart,

Thank you for the thorough and insightful review. You've identified several
critical issues with my submission that I need to address.

> 6.6.71 is pretty far away from Jens' for-next branch. Please use Jens'
> for-next branch for testing kernel patches intended for the upstream kernel.

You're absolutely right. I was testing on the stable Debian kernel
(6.6.71-rt) which was where the issue was originally reported. I will now
fetch and test on Jens' for-next branch and ensure the issue reproduces
there before resubmitting.

> Where in the above call stack is the code that disables interrupts?

This was poorly worded on my part, and I apologize for the confusion. The
issue is NOT "interrupt context" in the hardirq sense.

What's actually happening:
- **Context:** kworker thread (async SCSI device scan)
- **State:** Running with preemption disabled (atomic context, not hardirq)
- **Path:** Queue destruction during device probe error cleanup
- **Trigger:** On PREEMPT_RT, in_interrupt() returns true when preemption is
  disabled, even in process context

The WARN_ON in blk_mq_run_hw_queue() at line 2291 is:

    WARN_ON_ONCE(!async && in_interrupt());

On PREEMPT_RT, this check fires because:
1. blk_freeze_queue_start() calls blk_mq_run_hw_queues(q, false) ← async=false
2. This eventually calls blk_mq_run_hw_queue() with async=false
3. in_interrupt() returns true (because preempt_count indicates atomic state)
4. WARN_ON triggers

So it's not "interrupt context" - it's atomic context (preemption disabled)
being detected by in_interrupt() on RT kernel.

> How is the above call stack related to the reported problem? The above
> call stack is about request queue allocation while the reported problem
> happens during request queue destruction.

You're absolutely correct, and I apologize for the confusion. I mistakenly
included two different call stacks in my commit message:

1. **"scheduling while atomic" during blk_mq_realloc_hw_ctxs** - This was
   from queue allocation and is a DIFFERENT issue. It should NOT have been
   included.

2. **WARN_ON during blk_queue_start_drain** - This is the ACTUAL issue that
   my patch addresses (queue destruction path).

I will revise the commit message to remove the unrelated allocation stack
trace and focus solely on the queue destruction path.

> I apologize for the confusion in my commit message. Should I:
> 1. Revise the commit message to accurately describe the blk_queue_start_drain() path?
> 2. Add details about the PREEMPT_RT context causing the atomic state?
>
> The answer to both questions is yes.

Understood. I will prepare v3->v5 with the following corrections:

1. **Test on Jens' for-next branch** - Fetch, reproduce, and validate the
   fix on the upstream development tree

2. **Accurate context description** - Replace "IRQ thread context" with
   "kworker context with preemption disabled (atomic context on RT)"

3. **Single, clear call stack** - Remove the confusing allocation stack
   trace, focus only on the destruction path:

   ```
   scsi_alloc_sdev (error path)
     → __scsi_remove_device
       → blk_mq_destroy_queue
         → blk_queue_start_drain
           → blk_freeze_queue_start
             → blk_mq_run_hw_queues(q, false)   ← Problem: async=false
   ```

4. **Explain PREEMPT_RT specifics** - Clearly describe why in_interrupt()
   returns true in atomic context on RT kernel, and how changing to
   async=true avoids the problem

5. **Accurate problem statement** - This is about avoiding synchronous queue
   runs in atomic context on RT, not about MSI-X IRQ thread contention (that
   was a misunderstanding on my part)

I'll respond again once I've validated on for-next and have a corrected
v3->v5 ready.

Thank you again for the detailed feedback.

Best regards,
Ionut

-- 
2.52.0
* Re: [PATCH v2 2/2] block: Fix WARN_ON in blk_mq_run_hw_queue when called from interrupt context
From: Ming Lei @ 2026-01-06 15:04 UTC
To: djiony2011
Cc: axboe, gregkh, ionut.nechita, linux-block, linux-kernel,
	muchun.song, sashal, stable

On Tue, Jan 06, 2026 at 01:14:11PM +0200, djiony2011@gmail.com wrote:
> From: Ionut Nechita <ionut.nechita@windriver.com>
>
> Hi Ming,
>
> Thank you for the review. You're absolutely right to ask for clarification - I need to
> correct my commit message as it's misleading about the actual call path.
>
> > Can you show the whole stack trace in the warning? The in-code doesn't
> > indicate that freeze queue can be called from scsi's interrupt context.
>
> Here's the complete stack trace from the WARNING at blk_mq_run_hw_queue:
>
> [Mon Dec 22 10:18:18 2025] WARNING: CPU: 190 PID: 2041 at block/blk-mq.c:2291 blk_mq_run_hw_queue+0x1fa/0x260
> [Mon Dec 22 10:18:18 2025] Modules linked in:
> [Mon Dec 22 10:18:18 2025] CPU: 190 PID: 2041 Comm: kworker/u385:1 Tainted: G W 6.6.0-1-rt-amd64 #1 Debian 6.6.71-1

There is so big change between 6.6.0-1-rt and 6.19, because

Real-Time "PREEMPT_RT" Support Merged For Linux 6.12

https://www.phoronix.com/news/Linux-6.12-Does-Real-Time

> [Mon Dec 22 10:18:18 2025] Hardware name: Dell Inc. PowerEdge R7615/09K9WP, BIOS 1.11.2 12/19/2024
> [Mon Dec 22 10:18:18 2025] Workqueue: events_unbound async_run_entry_fn
> [Mon Dec 22 10:18:18 2025] RIP: 0010:blk_mq_run_hw_queue+0x1fa/0x260
> [Mon Dec 22 10:18:18 2025] Code: ff 75 68 44 89 f6 e8 e5 45 c0 ff e9 ac fe ff ff e8 2b 70 c0 ff 48 89 ef e8 b3 a0 00 00 5b 5d 41 5c 41 5d 41 5e e9 26 9e c0 ff <0f> 0b e9 43 fe ff ff e8 0a 70 c0 ff 48 8b 85 d0 00 00 00 48 8b 80
> [Mon Dec 22 10:18:18 2025] RSP: 0018:ff630f098528fb98 EFLAGS: 00010206
> [Mon Dec 22 10:18:18 2025] RAX: 0000000000ff0000 RBX: 0000000000000000 RCX: 0000000000000000
> [Mon Dec 22 10:18:18 2025] RDX: 0000000000ff0000 RSI: 0000000000000000 RDI: ff3edc0247159400
> [Mon Dec 22 10:18:18 2025] RBP: ff3edc0247159400 R08: ff3edc0247159400 R09: ff630f098528fb60
> [Mon Dec 22 10:18:18 2025] R10: 0000000000000000 R11: 0000000045069ed3 R12: 0000000000000000
> [Mon Dec 22 10:18:18 2025] R13: ff3edc024715a828 R14: 0000000000000000 R15: 0000000000000000
> [Mon Dec 22 10:18:18 2025] FS: 0000000000000000(0000) GS:ff3edc10fd380000(0000) knlGS:0000000000000000
> [Mon Dec 22 10:18:18 2025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Mon Dec 22 10:18:18 2025] CR2: 0000000000000000 CR3: 000000073961a001 CR4: 0000000000771ee0
> [Mon Dec 22 10:18:18 2025] PKRU: 55555554
> [Mon Dec 22 10:18:18 2025] Call Trace:
> [Mon Dec 22 10:18:18 2025]  <TASK>
> [Mon Dec 22 10:18:18 2025]  ? __warn+0x89/0x140
> [Mon Dec 22 10:18:18 2025]  ? blk_mq_run_hw_queue+0x1fa/0x260
> [Mon Dec 22 10:18:18 2025]  ? report_bug+0x198/0x1b0
> [Mon Dec 22 10:18:18 2025]  ? handle_bug+0x53/0x90
> [Mon Dec 22 10:18:18 2025]  ? exc_invalid_op+0x18/0x70
> [Mon Dec 22 10:18:18 2025]  ? asm_exc_invalid_op+0x1a/0x20
> [Mon Dec 22 10:18:18 2025]  ? blk_mq_run_hw_queue+0x1fa/0x260
> [Mon Dec 22 10:18:18 2025]  blk_mq_run_hw_queues+0x6c/0x130
> [Mon Dec 22 10:18:18 2025]  blk_queue_start_drain+0x12/0x40
> [Mon Dec 22 10:18:18 2025]  blk_mq_destroy_queue+0x37/0x70
> [Mon Dec 22 10:18:18 2025]  __scsi_remove_device+0x6a/0x180
> [Mon Dec 22 10:18:18 2025]  scsi_alloc_sdev+0x357/0x360
> [Mon Dec 22 10:18:18 2025]  scsi_probe_and_add_lun+0x8ac/0xc00
> [Mon Dec 22 10:18:18 2025]  ? srso_alias_return_thunk+0x5/0xfbef5
> [Mon Dec 22 10:18:18 2025]  ? dev_set_name+0x57/0x80
> [Mon Dec 22 10:18:18 2025]  ? srso_alias_return_thunk+0x5/0xfbef5
> [Mon Dec 22 10:18:18 2025]  ? attribute_container_add_device+0x4d/0x130
> [Mon Dec 22 10:18:18 2025]  __scsi_scan_target+0xf0/0x520
> [Mon Dec 22 10:18:18 2025]  ? srso_alias_return_thunk+0x5/0xfbef5
> [Mon Dec 22 10:18:18 2025]  ? sched_clock_cpu+0x64/0x190
> [Mon Dec 22 10:18:18 2025]  scsi_scan_channel+0x57/0x90
> [Mon Dec 22 10:18:18 2025]  scsi_scan_host_selected+0xd4/0x110
> [Mon Dec 22 10:18:18 2025]  do_scan_async+0x1c/0x190
> [Mon Dec 22 10:18:18 2025]  async_run_entry_fn+0x2f/0x130
> [Mon Dec 22 10:18:18 2025]  process_one_work+0x175/0x370
> [Mon Dec 22 10:18:18 2025]  worker_thread+0x280/0x390
> [Mon Dec 22 10:18:18 2025]  ? __pfx_worker_thread+0x10/0x10
> [Mon Dec 22 10:18:18 2025]  kthread+0xdd/0x110
> [Mon Dec 22 10:18:18 2025]  ? __pfx_kthread+0x10/0x10
> [Mon Dec 22 10:18:18 2025]  ret_from_fork+0x31/0x50
> [Mon Dec 22 10:18:18 2025]  ? __pfx_kthread+0x10/0x10
> [Mon Dec 22 10:18:18 2025]  ret_from_fork_asm+0x1b/0x30
> [Mon Dec 22 10:18:18 2025]  </TASK>
> [Mon Dec 22 10:18:18 2025] ---[ end trace 0000000000000000 ]---
>
> ## Important clarifications:
>
> 1. **Not freeze queue, but drain during destroy**: My commit message was incorrect.
>    The call path is:
>    blk_mq_destroy_queue() -> blk_queue_start_drain() -> blk_mq_run_hw_queues(q, false)
>
>    This is NOT during blk_freeze_queue_start(), but during queue destruction when a
>    SCSI device probe fails and cleanup is triggered.
>
> 2. **Not true interrupt context**: You're correct that this isn't from an interrupt
>    handler. The workqueue context is process context, not interrupt context.
>
> 3. **The actual problem on PREEMPT_RT**: There's a preceding "scheduling while atomic"
>    error that provides the real context:
>
> [Mon Dec 22 10:18:18 2025] BUG: scheduling while atomic: kworker/u385:1/2041/0x00000002
> [Mon Dec 22 10:18:18 2025] Call Trace:
> [Mon Dec 22 10:18:18 2025]  dump_stack_lvl+0x37/0x50
> [Mon Dec 22 10:18:18 2025]  __schedule_bug+0x52/0x60
> [Mon Dec 22 10:18:18 2025]  __schedule+0x87d/0xb10
> [Mon Dec 22 10:18:18 2025]  rt_mutex_schedule+0x21/0x40
> [Mon Dec 22 10:18:18 2025]  rt_mutex_slowlock_block.constprop.0+0x33/0x170
> [Mon Dec 22 10:18:18 2025]  __rt_mutex_slowlock_locked.constprop.0+0xc4/0x1e0
> [Mon Dec 22 10:18:18 2025]  mutex_lock+0x44/0x60
> [Mon Dec 22 10:18:18 2025]  __cpuhp_state_add_instance_cpuslocked+0x41/0x110
> [Mon Dec 22 10:18:18 2025]  __cpuhp_state_add_instance+0x48/0xd0
> [Mon Dec 22 10:18:18 2025]  blk_mq_realloc_hw_ctxs+0x405/0x420

Why is the above warning related with your patch?

> [Mon Dec 22 10:18:18 2025]  blk_mq_init_allocated_queue+0x10a/0x480
>
> The context is atomic because on PREEMPT_RT, some spinlock earlier in the call chain has
> been converted to an rt_mutex, and the code is holding that lock. When blk_mq_run_hw_queues()
> is called with async=false, it triggers kblockd_mod_delayed_work_on(), which calls
> in_interrupt(), and this returns true because preempt_count() is non-zero due to the
> rt_mutex being held.
>
> ## What this means:
>
> The issue is specific to PREEMPT_RT where:
> - Spinlocks become sleeping mutexes (rt_mutex)
> - Holding an rt_mutex sets preempt_count, making in_interrupt() return true
> - blk_mq_run_hw_queues() with async=false hits WARN_ON_ONCE(!async && in_interrupt())

If you think the same issue exists on recent kernel, show the stack trace.

Or please share how preempt is disabled in the above blk_mq_run_hw_queues
code path.

Thanks,
Ming
* Re: [PATCH v2 2/2] block: Fix WARN_ON in blk_mq_run_hw_queue when called from interrupt context
From: Ionut Nechita (WindRiver) @ 2026-01-06 16:35 UTC
To: ming.lei
Cc: axboe, gregkh, ionut.nechita, linux-block, linux-kernel,
	muchun.song, sashal, stable

From: Ionut Nechita <ionut.nechita@windriver.com>

Hi Ming,

Thank you for the thorough review. You've identified critical issues with
my analysis.

> There is so big change between 6.6.0-1-rt and 6.19, because Real-Time
> "PREEMPT_RT" Support Merged For Linux 6.12

You're absolutely right. I tested on Debian's 6.6.71-rt which uses the
out-of-tree RT patches. I will retest on Friday with both 6.12 (first
kernel with merged RT support) and linux-next to confirm whether this
issue still exists in current upstream.

> Why is the above warning related with your patch?

After reviewing the complete dmesg log, I now see there are TWO separate
errors from the same process (PID 2041):

**Error #1** - Root cause (the one you highlighted):
```
BUG: scheduling while atomic: kworker/u385:1/2041/0x00000002
mutex_lock
  → __cpuhp_state_add_instance
    → blk_mq_realloc_hw_ctxs
      → blk_mq_init_queue
        → scsi_alloc_sdev          ← Queue ALLOCATION
```

**Error #2** - Symptom (the one my patch addresses):
```
WARNING at blk_mq_run_hw_queue+0x1fa
blk_mq_run_hw_queue
  → blk_mq_run_hw_queues
    → blk_queue_start_drain
      → blk_mq_destroy_queue
        → __scsi_remove_device
          → scsi_alloc_sdev        ← Queue DESTRUCTION (cleanup)
```

The sequence is:
1. Queue allocation in scsi_alloc_sdev() hits Error #1 (mutex in atomic context)
2. Allocation fails, enters cleanup path
3. Cleanup calls blk_mq_destroy_queue() while STILL in atomic context
4. blk_queue_start_drain() → blk_mq_run_hw_queues(q, false)
5. WARN_ON(!async && in_interrupt()) triggers → Error #2

> Or please share how preempt is disabled in the above blk_mq_run_hw_queues
> code path.

The atomic context (preempt_count = 0x00000002) is inherited from Error #1.
The code is already in atomic state when it enters the cleanup path.

> If you think the same issue exists on recent kernel, show the stack trace.

I don't have current data from upstream kernels. I will test on Friday and
provide:
1. Results from 6.12-rt (first kernel with merged RT support)
2. Results from linux-next
3. Complete stack traces if the issue reproduces

If the issue still exists on current upstream, I need to address Error #1
(the root cause) rather than Error #2 (the symptom). My current patch only
suppresses the warning during cleanup but doesn't fix the underlying atomic
context problem.

I will report back with test results on Friday.

-

BUG: scheduling while atomic: kworker/u385:1/2041/0x00000002
Modules linked in:
CPU: 190 PID: 2041 Comm: kworker/u385:1 Not tainted 6.6.0-1-rt-amd64 #1 Debian 6.6.71-1
Hardware name: Dell Inc. PowerEdge R7615/09K9WP, BIOS 1.11.2 12/19/2024
Workqueue: events_unbound async_run_entry_fn
Call Trace:
 <TASK>
 dump_stack_lvl+0x37/0x50
 __schedule_bug+0x52/0x60
 __schedule+0x87d/0xb10
 rt_mutex_schedule+0x21/0x40
 rt_mutex_slowlock_block.constprop.0+0x33/0x170
 __rt_mutex_slowlock_locked.constprop.0+0xc4/0x1e0
 mutex_lock+0x44/0x60
 __cpuhp_state_add_instance_cpuslocked+0x41/0x110
 __cpuhp_state_add_instance+0x48/0xd0
 blk_mq_realloc_hw_ctxs+0x405/0x420
 blk_mq_init_allocated_queue+0x10a/0x480
intel_rapl_common: Found RAPL domain package
 ? srso_alias_return_thunk+0x5/0xfbef5
intel_rapl_common: Found RAPL domain core
 ? percpu_ref_init+0x6e/0x130
 blk_mq_init_queue+0x3c/0x70
 scsi_alloc_sdev+0x225/0x360
 scsi_probe_and_add_lun+0x8ac/0xc00
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? dev_set_name+0x57/0x80
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? attribute_container_add_device+0x4d/0x130
 __scsi_scan_target+0xf0/0x520
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? sched_clock_cpu+0x64/0x190
 scsi_scan_channel+0x57/0x90
 scsi_scan_host_selected+0xd4/0x110
 do_scan_async+0x1c/0x190
 async_run_entry_fn+0x2f/0x130
 process_one_work+0x175/0x370
 worker_thread+0x280/0x390
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xdd/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x31/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>
gnss: GNSS driver registered with major 241
------------[ cut here ]------------
WARNING: CPU: 190 PID: 2041 at block/blk-mq.c:2291 blk_mq_run_hw_queue+0x1fa/0x260
Modules linked in:
CPU: 190 PID: 2041 Comm: kworker/u385:1 Tainted: G W 6.6.0-1-rt-amd64 #1 Debian 6.6.71-1
Hardware name: Dell Inc. PowerEdge R7615/09K9WP, BIOS 1.11.2 12/19/2024
Workqueue: events_unbound async_run_entry_fn
RIP: 0010:blk_mq_run_hw_queue+0x1fa/0x260
Code: ff 75 68 44 89 f6 e8 e5 45 c0 ff e9 ac fe ff ff e8 2b 70 c0 ff 48 89 ef e8 b3 a0 00 00 5b 5d 41 5c 41 5d 41 5e e9 26 9e c0 ff <0f> 0b e9 43 fe ff ff e8 0a 70 c0 ff 48 8b 85 d0 00 00 00 48 8b 80
RSP: 0018:ff630f098528fb98 EFLAGS: 00010206
RAX: 0000000000ff0000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000ff0000 RSI: 0000000000000000 RDI: ff3edc0247159400
RBP: ff3edc0247159400 R08: ff3edc0247159400 R09: ff630f098528fb60
R10: 0000000000000000 R11: 0000000045069ed3 R12: 0000000000000000
R13: ff3edc024715a828 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ff3edc10fd380000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000073961a001 CR4: 0000000000771ee0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn+0x89/0x140
 ? blk_mq_run_hw_queue+0x1fa/0x260
 ? report_bug+0x198/0x1b0
 ? handle_bug+0x53/0x90
 ? exc_invalid_op+0x18/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? blk_mq_run_hw_queue+0x1fa/0x260
 blk_mq_run_hw_queues+0x6c/0x130
 blk_queue_start_drain+0x12/0x40
 blk_mq_destroy_queue+0x37/0x70
 __scsi_remove_device+0x6a/0x180
 scsi_alloc_sdev+0x357/0x360
 scsi_probe_and_add_lun+0x8ac/0xc00
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? dev_set_name+0x57/0x80
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? attribute_container_add_device+0x4d/0x130
 __scsi_scan_target+0xf0/0x520
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? sched_clock_cpu+0x64/0x190
 scsi_scan_channel+0x57/0x90
 scsi_scan_host_selected+0xd4/0x110
 do_scan_async+0x1c/0x190
 async_run_entry_fn+0x2f/0x130
 process_one_work+0x175/0x370
 worker_thread+0x280/0x390
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xdd/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x31/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
WARNING: CPU: 190 PID: 2041 at kernel/time/timer.c:1570 __timer_delete_sync+0x152/0x170
Modules linked in:
CPU: 190 PID: 2041 Comm: kworker/u385:1 Tainted: G W 6.6.0-1-rt-amd64 #1 Debian 6.6.71-1
Hardware name: Dell Inc. PowerEdge R7615/09K9WP, BIOS 1.11.2 12/19/2024
Workqueue: events_unbound async_run_entry_fn
RIP: 0010:__timer_delete_sync+0x152/0x170
Code: 8b 04 24 4c 89 c7 e8 ad 11 b9 00 f0 ff 4d 30 4c 8b 04 24 4c 89 c7 e8 8d 03 b9 00 be 00 02 00 00 4c 89 ff e8 e0 83 f3 ff eb 93 <0f> 0b e9 e8 fe ff ff 49 8d 2c 16 eb a8 e8 5c 49 b8 00 66 66 2e 0f
RSP: 0018:ff630f098528fba8 EFLAGS: 00010246
RAX: 000000007fffffff RBX: ff3edc02829426d0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff3edc02829426d0
RBP: ff3edc02829425b0 R08: ff3edc0282942938 R09: ff630f098528fba0
R10: 0000000000000000 R11: 0000000045069ed3 R12: 0000000000000000
R13: ff3edc024715a828 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ff3edc10fd380000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000073961a001 CR4: 0000000000771ee0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn+0x89/0x140
 ? __timer_delete_sync+0x152/0x170
 ? report_bug+0x198/0x1b0
 ? handle_bug+0x53/0x90
 ? exc_invalid_op+0x18/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? __timer_delete_sync+0x152/0x170
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? percpu_ref_is_zero+0x3b/0x50
 ? srso_alias_return_thunk+0x5/0xfbef5
 blk_sync_queue+0x19/0x30
 blk_mq_destroy_queue+0x47/0x70
 __scsi_remove_device+0x6a/0x180
 scsi_alloc_sdev+0x357/0x360
 scsi_probe_and_add_lun+0x8ac/0xc00
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? dev_set_name+0x57/0x80
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? attribute_container_add_device+0x4d/0x130
 __scsi_scan_target+0xf0/0x520
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? sched_clock_cpu+0x64/0x190
 scsi_scan_channel+0x57/0x90
 scsi_scan_host_selected+0xd4/0x110
 do_scan_async+0x1c/0x190
 async_run_entry_fn+0x2f/0x130
 process_one_work+0x175/0x370
 worker_thread+0x280/0x390
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xdd/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x31/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>
---[ end trace 0000000000000000 ]---
drop_monitor: Initializing network drop monitor service
------------[ cut here ]------------
WARNING: CPU: 190 PID: 2041 at kernel/time/timer.c:1570 __timer_delete_sync+0x152/0x170
Modules linked in:
CPU: 190 PID: 2041 Comm: kworker/u385:1 Tainted: G W 6.6.0-1-rt-amd64 #1 Debian 6.6.71-1
Hardware name: Dell Inc. PowerEdge R7615/09K9WP, BIOS 1.11.2 12/19/2024
Workqueue: events_unbound async_run_entry_fn
RIP: 0010:__timer_delete_sync+0x152/0x170
Code: 8b 04 24 4c 89 c7 e8 ad 11 b9 00 f0 ff 4d 30 4c 8b 04 24 4c 89 c7 e8 8d 03 b9 00 be 00 02 00 00 4c 89 ff e8 e0 83 f3 ff eb 93 <0f> 0b e9 e8 fe ff ff 49 8d 2c 16 eb a8 e8 5c 49 b8 00 66 66 2e 0f
RSP: 0018:ff630f098528fba8 EFLAGS: 00010246
RAX: 000000007fffffff RBX: ff3edc0282943790 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff3edc0282943790
RBP: ff3edc0282943670 R08: ff3edc02829439f8 R09: ff630f098528fba0
R10: 0000000000000000 R11: 00000000b3b80e06 R12: 0000000000000000
R13: ff3edc02828dc428 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ff3edc10fd380000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000073961a001 CR4: 0000000000771ee0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn+0x89/0x140
 ? __timer_delete_sync+0x152/0x170
 ? report_bug+0x198/0x1b0
 ? handle_bug+0x53/0x90
 ? exc_invalid_op+0x18/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? __timer_delete_sync+0x152/0x170
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? percpu_ref_is_zero+0x3b/0x50
 ? srso_alias_return_thunk+0x5/0xfbef5
 blk_sync_queue+0x19/0x30
 blk_mq_destroy_queue+0x47/0x70
 __scsi_remove_device+0x6a/0x180
 scsi_alloc_sdev+0x357/0x360
 scsi_probe_and_add_lun+0x8ac/0xc00
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? dev_set_name+0x57/0x80
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? attribute_container_add_device+0x4d/0x130
 __scsi_scan_target+0xf0/0x520
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? sched_clock_cpu+0x64/0x190
 scsi_scan_channel+0x57/0x90
 scsi_scan_host_selected+0xd4/0x110
 do_scan_async+0x1c/0x190
 async_run_entry_fn+0x2f/0x130
 process_one_work+0x175/0x370
 worker_thread+0x280/0x390
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xdd/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x31/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
WARNING: CPU: 190 PID: 2041 at kernel/time/timer.c:1570 __timer_delete_sync+0x152/0x170
Modules linked in:
CPU: 190 PID: 2041 Comm: kworker/u385:1 Tainted: G W 6.6.0-1-rt-amd64 #1 Debian 6.6.71-1
Hardware name: Dell Inc. PowerEdge R7615/09K9WP, BIOS 1.11.2 12/19/2024
Workqueue: events_unbound async_run_entry_fn
RIP: 0010:__timer_delete_sync+0x152/0x170
Code: 8b 04 24 4c 89 c7 e8 ad 11 b9 00 f0 ff 4d 30 4c 8b 04 24 4c 89 c7 e8 8d 03 b9 00 be 00 02 00 00 4c 89 ff e8 e0 83 f3 ff eb 93 <0f> 0b e9 e8 fe ff ff 49 8d 2c 16 eb a8 e8 5c 49 b8 00 66 66 2e 0f
RSP: 0018:ff630f098528fba8 EFLAGS: 00010246
RAX: 000000007fffffff RBX: ff3edc0282944420 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff3edc0282944420
RBP: ff3edc0282944300 R08: ff3edc0282944688 R09: ff630f098528fba0
R10: 0000000000000000 R11: 0000000043ba156d R12: 0000000000000000
R13: ff3edc02829ec028 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ff3edc10fd380000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000073961a001 CR4: 0000000000771ee0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn+0x89/0x140
 ? __timer_delete_sync+0x152/0x170
 ? report_bug+0x198/0x1b0
 ? handle_bug+0x53/0x90
 ? exc_invalid_op+0x18/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? __timer_delete_sync+0x152/0x170
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? percpu_ref_is_zero+0x3b/0x50
 ? srso_alias_return_thunk+0x5/0xfbef5
 blk_sync_queue+0x19/0x30
 blk_mq_destroy_queue+0x47/0x70
 __scsi_remove_device+0x6a/0x180
 scsi_alloc_sdev+0x357/0x360
 scsi_probe_and_add_lun+0x8ac/0xc00
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? dev_set_name+0x57/0x80
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? attribute_container_add_device+0x4d/0x130
 __scsi_scan_target+0xf0/0x520
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? sched_clock_cpu+0x64/0x190
 scsi_scan_channel+0x57/0x90
 scsi_scan_host_selected+0xd4/0x110
 do_scan_async+0x1c/0x190
 async_run_entry_fn+0x2f/0x130
 process_one_work+0x175/0x370
 worker_thread+0x280/0x390
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xdd/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x31/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>
---[ end trace 0000000000000000 ]---
Initializing XFRM netlink socket

Thank you for the careful review,
Ionut
* Re: [PATCH v2 2/2] block: Fix WARN_ON in blk_mq_run_hw_queue when called from interrupt context
From: Muchun Song @ 2025-12-23  2:18 UTC
To: Ionut Nechita (WindRiver)
Cc: ming.lei, axboe, gregkh, ionut.nechita, linux-block, linux-kernel,
	sashal, stable

> On Dec 23, 2025, at 04:15, Ionut Nechita (WindRiver) <djiony2011@gmail.com> wrote:
>
> From: Ionut Nechita <ionut.nechita@windriver.com>
>
> Fix warning "WARN_ON_ONCE(!async && in_interrupt())" that occurs during
> SCSI device scanning when blk_freeze_queue_start() calls blk_mq_run_hw_queues()
> synchronously from interrupt context.
>
> The issue happens during device removal/scanning when:
> 1. blk_mq_destroy_queue() -> blk_queue_start_drain()
> 2. blk_freeze_queue_start() calls blk_mq_run_hw_queues(q, false)
> 3. This triggers the warning in blk_mq_run_hw_queue() when in interrupt context
>
> Change the synchronous call to asynchronous to avoid running in interrupt context.
>
> Fixes: Warning in blk_mq_run_hw_queue+0x1fa/0x260

You've added a wrong format of Fixes tag.

Thanks.

> Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
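For reference, the format described in
Documentation/process/submitting-patches.rst is the first 12+ characters
of the offending commit's SHA-1 followed by its quoted subject line, for
example (borrowing the tag from patch 1/2 purely to show the shape):

    Fixes: 679b1874eba7 ("block: fix ordering between checking QUEUE_FLAG_QUIESCED request adding")

A free-form description of the warning is not something the stable tooling
can resolve back to a commit.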