* [PATCH v2] ufs: core: Fix IRQ lock inversion
From: Bart Van Assche @ 2025-07-21 21:50 UTC
To: Martin K. Petersen
Cc: linux-scsi, Bart Van Assche, Neil Armstrong, André Draszik,
James E.J. Bottomley, Peter Wang, Avri Altman, Bean Huo,
Manivannan Sadhasivam, Bao D. Nguyen
Commit 3c7ac40d7322 ("scsi: ufs: core: Delegate the interrupt service
routine to a threaded IRQ handler") introduced an IRQ lock inversion
issue. Fix this lock inversion by converting the spin_lock() and
spin_lock_irq() calls into guard(spinlock_irqsave)() in code that can
be called either from interrupt context or from thread context. This
patch fixes the following lockdep complaint:
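
To make the inversion concrete, here is a minimal sketch of the pattern
(illustrative only; the lock and function names below are hypothetical
and not taken from ufshcd.c). A lock acquired with spin_lock() from the
threaded interrupt handler leaves hard interrupts enabled, so lockdep
complains once that lock participates in a dependency chain with a lock
that is also acquired from hard interrupt context:

/*
 * Illustrative sketch of the unsafe pattern and the fix; only the
 * locking pattern matches the ufshcd code touched by this patch.
 */
#include <linux/spinlock.h>
#include <linux/cleanup.h>

static DEFINE_SPINLOCK(outer_lock);     /* cf. shost->host_lock */

/* Before: runs in thread context with hard IRQs still enabled. */
static void handle_event_unsafe(void)
{
        spin_lock(&outer_lock);
        /*
         * A hard IRQ can fire here. If its handler needs a lock whose
         * other holders in turn wait for outer_lock, the CPUs deadlock.
         */
        spin_unlock(&outer_lock);
}

/* After: IRQs are masked and the lock is released at end of scope. */
static void handle_event_safe(void)
{
        guard(spinlock_irqsave)(&outer_lock);
        /* critical section runs with local hard IRQs disabled */
}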
WARNING: possible irq lock inversion dependency detected
6.12.30-android16-5-maybe-dirty-4k #1 Tainted: G W OE
--------------------------------------------------------
kworker/u28:0/12 just changed the state of lock:
ffffff881e29dd60 (&hba->clk_gating.lock){-...}-{2:2}, at: ufshcd_release_scsi_cmd+0x60/0x110
but this lock took another, HARDIRQ-unsafe lock in the past:
(shost->host_lock){+.+.}-{2:2}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(shost->host_lock);
                               local_irq_disable();
                               lock(&hba->clk_gating.lock);
                               lock(shost->host_lock);
  <Interrupt>
    lock(&hba->clk_gating.lock);

 *** DEADLOCK ***
4 locks held by kworker/u28:0/12:
#0: ffffff8800ac6158 ((wq_completion)async){+.+.}-{0:0}, at: process_one_work+0x1bc/0x65c
#1: ffffffc085c93d70 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x1e4/0x65c
#2: ffffff881e29c0e0 (&shost->scan_mutex){+.+.}-{3:3}, at: __scsi_add_device+0x74/0x120
#3: ffffff881960ea00 (&hwq->cq_lock){-...}-{2:2}, at: ufshcd_mcq_poll_cqe_lock+0x28/0x104
the shortest dependencies between 2nd lock and 1st lock:
-> (shost->host_lock){+.+.}-{2:2} {
HARDIRQ-ON-W at:
lock_acquire+0x134/0x2b4
_raw_spin_lock+0x48/0x64
ufshcd_sl_intr+0x4c/0xa08
ufshcd_threaded_intr+0x70/0x12c
irq_thread_fn+0x48/0xa8
irq_thread+0x130/0x1ec
kthread+0x110/0x134
ret_from_fork+0x10/0x20
SOFTIRQ-ON-W at:
lock_acquire+0x134/0x2b4
_raw_spin_lock+0x48/0x64
ufshcd_sl_intr+0x4c/0xa08
ufshcd_threaded_intr+0x70/0x12c
irq_thread_fn+0x48/0xa8
irq_thread+0x130/0x1ec
kthread+0x110/0x134
ret_from_fork+0x10/0x20
INITIAL USE at:
lock_acquire+0x134/0x2b4
_raw_spin_lock+0x48/0x64
ufshcd_sl_intr+0x4c/0xa08
ufshcd_threaded_intr+0x70/0x12c
irq_thread_fn+0x48/0xa8
irq_thread+0x130/0x1ec
kthread+0x110/0x134
ret_from_fork+0x10/0x20
}
... key at: [<ffffffc085ba1a98>] scsi_host_alloc.__key+0x0/0x10
... acquired at:
_raw_spin_lock_irqsave+0x5c/0x80
__ufshcd_release+0x78/0x118
ufshcd_send_uic_cmd+0xe4/0x118
ufshcd_dme_set_attr+0x88/0x1c8
ufs_google_phy_initialization+0x68/0x418 [ufs]
ufs_google_link_startup_notify+0x78/0x27c [ufs]
ufshcd_link_startup+0x84/0x720
ufshcd_init+0xf3c/0x1330
ufshcd_pltfrm_init+0x728/0x7d8
ufs_google_probe+0x30/0x84 [ufs]
platform_probe+0xa0/0xe0
really_probe+0x114/0x454
__driver_probe_device+0xa4/0x160
driver_probe_device+0x44/0x23c
__driver_attach_async_helper+0x60/0xd4
async_run_entry_fn+0x4c/0x17c
process_one_work+0x26c/0x65c
worker_thread+0x33c/0x498
kthread+0x110/0x134
ret_from_fork+0x10/0x20
-> (&hba->clk_gating.lock){-...}-{2:2} {
IN-HARDIRQ-W at:
lock_acquire+0x134/0x2b4
_raw_spin_lock_irqsave+0x5c/0x80
ufshcd_release_scsi_cmd+0x60/0x110
ufshcd_compl_one_cqe+0x2c0/0x3f4
ufshcd_mcq_poll_cqe_lock+0xb0/0x104
ufs_google_mcq_intr+0x80/0xa0 [ufs]
__handle_irq_event_percpu+0x104/0x32c
handle_irq_event+0x40/0x9c
handle_fasteoi_irq+0x170/0x2e8
generic_handle_domain_irq+0x58/0x80
gic_handle_irq+0x48/0x104
call_on_irq_stack+0x3c/0x50
do_interrupt_handler+0x7c/0xd8
el1_interrupt+0x34/0x58
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x68/0x6c
_raw_spin_unlock_irqrestore+0x3c/0x6c
debug_object_assert_init+0x16c/0x21c
__mod_timer+0x4c/0x48c
schedule_timeout+0xd4/0x16c
io_schedule_timeout+0x48/0x70
do_wait_for_common+0x100/0x194
wait_for_completion_io_timeout+0x48/0x6c
blk_execute_rq+0x124/0x17c
scsi_execute_cmd+0x18c/0x3f8
scsi_probe_and_add_lun+0x204/0xd74
__scsi_add_device+0xbc/0x120
ufshcd_async_scan+0x80/0x3c0
async_run_entry_fn+0x4c/0x17c
process_one_work+0x26c/0x65c
worker_thread+0x33c/0x498
kthread+0x110/0x134
ret_from_fork+0x10/0x20
INITIAL USE at:
lock_acquire+0x134/0x2b4
_raw_spin_lock_irqsave+0x5c/0x80
ufshcd_hold+0x34/0x14c
ufshcd_send_uic_cmd+0x28/0x118
ufshcd_dme_set_attr+0x88/0x1c8
ufs_google_phy_initialization+0x68/0x418 [ufs]
ufs_google_link_startup_notify+0x78/0x27c [ufs]
ufshcd_link_startup+0x84/0x720
ufshcd_init+0xf3c/0x1330
ufshcd_pltfrm_init+0x728/0x7d8
ufs_google_probe+0x30/0x84 [ufs]
platform_probe+0xa0/0xe0
really_probe+0x114/0x454
__driver_probe_device+0xa4/0x160
driver_probe_device+0x44/0x23c
__driver_attach_async_helper+0x60/0xd4
async_run_entry_fn+0x4c/0x17c
process_one_work+0x26c/0x65c
worker_thread+0x33c/0x498
kthread+0x110/0x134
ret_from_fork+0x10/0x20
}
... key at: [<ffffffc085ba6fe8>] ufshcd_init.__key+0x0/0x10
... acquired at:
mark_lock+0x1c4/0x224
__lock_acquire+0x438/0x2e1c
lock_acquire+0x134/0x2b4
_raw_spin_lock_irqsave+0x5c/0x80
ufshcd_release_scsi_cmd+0x60/0x110
ufshcd_compl_one_cqe+0x2c0/0x3f4
ufshcd_mcq_poll_cqe_lock+0xb0/0x104
ufs_google_mcq_intr+0x80/0xa0 [ufs]
__handle_irq_event_percpu+0x104/0x32c
handle_irq_event+0x40/0x9c
handle_fasteoi_irq+0x170/0x2e8
generic_handle_domain_irq+0x58/0x80
gic_handle_irq+0x48/0x104
call_on_irq_stack+0x3c/0x50
do_interrupt_handler+0x7c/0xd8
el1_interrupt+0x34/0x58
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x68/0x6c
_raw_spin_unlock_irqrestore+0x3c/0x6c
debug_object_assert_init+0x16c/0x21c
__mod_timer+0x4c/0x48c
schedule_timeout+0xd4/0x16c
io_schedule_timeout+0x48/0x70
do_wait_for_common+0x100/0x194
wait_for_completion_io_timeout+0x48/0x6c
blk_execute_rq+0x124/0x17c
scsi_execute_cmd+0x18c/0x3f8
scsi_probe_and_add_lun+0x204/0xd74
__scsi_add_device+0xbc/0x120
ufshcd_async_scan+0x80/0x3c0
async_run_entry_fn+0x4c/0x17c
process_one_work+0x26c/0x65c
worker_thread+0x33c/0x498
kthread+0x110/0x134
ret_from_fork+0x10/0x20
stack backtrace:
CPU: 6 UID: 0 PID: 12 Comm: kworker/u28:0 Tainted: G W OE 6.12.30-android16-5-maybe-dirty-4k #1 ccd4020fe444bdf629efc3b86df6be920b8df7d0
Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Spacecraft board based on MALIBU (DT)
Workqueue: async async_run_entry_fn
Call trace:
dump_backtrace+0xfc/0x17c
show_stack+0x18/0x28
dump_stack_lvl+0x40/0xa0
dump_stack+0x18/0x24
print_irq_inversion_bug+0x2fc/0x304
mark_lock_irq+0x388/0x4fc
mark_lock+0x1c4/0x224
__lock_acquire+0x438/0x2e1c
lock_acquire+0x134/0x2b4
_raw_spin_lock_irqsave+0x5c/0x80
ufshcd_release_scsi_cmd+0x60/0x110
ufshcd_compl_one_cqe+0x2c0/0x3f4
ufshcd_mcq_poll_cqe_lock+0xb0/0x104
ufs_google_mcq_intr+0x80/0xa0 [ufs dd6f385554e109da094ab91d5f7be18625a2222a]
__handle_irq_event_percpu+0x104/0x32c
handle_irq_event+0x40/0x9c
handle_fasteoi_irq+0x170/0x2e8
generic_handle_domain_irq+0x58/0x80
gic_handle_irq+0x48/0x104
call_on_irq_stack+0x3c/0x50
do_interrupt_handler+0x7c/0xd8
el1_interrupt+0x34/0x58
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x68/0x6c
_raw_spin_unlock_irqrestore+0x3c/0x6c
debug_object_assert_init+0x16c/0x21c
__mod_timer+0x4c/0x48c
schedule_timeout+0xd4/0x16c
io_schedule_timeout+0x48/0x70
do_wait_for_common+0x100/0x194
wait_for_completion_io_timeout+0x48/0x6c
blk_execute_rq+0x124/0x17c
scsi_execute_cmd+0x18c/0x3f8
scsi_probe_and_add_lun+0x204/0xd74
__scsi_add_device+0xbc/0x120
ufshcd_async_scan+0x80/0x3c0
async_run_entry_fn+0x4c/0x17c
process_one_work+0x26c/0x65c
worker_thread+0x33c/0x498
kthread+0x110/0x134
ret_from_fork+0x10/0x20
Cc: Neil Armstrong <neil.armstrong@linaro.org>
Cc: André Draszik <andre.draszik@linaro.org>
Fixes: 3c7ac40d7322 ("scsi: ufs: core: Delegate the interrupt service routine to a threaded IRQ handler")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
Changes compared to v1: fixed a build error.
drivers/ufs/core/ufshcd.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index acfc1b4691fa..8b72cf7811ca 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -5566,7 +5566,7 @@ static irqreturn_t ufshcd_uic_cmd_compl(struct ufs_hba *hba, u32 intr_status)
 	irqreturn_t retval = IRQ_NONE;
 	struct uic_command *cmd;
 
-	spin_lock(hba->host->host_lock);
+	guard(spinlock_irqsave)(hba->host->host_lock);
 	cmd = hba->active_uic_cmd;
 	if (WARN_ON_ONCE(!cmd))
 		goto unlock;
@@ -5593,8 +5593,6 @@ static irqreturn_t ufshcd_uic_cmd_compl(struct ufs_hba *hba, u32 intr_status)
 	ufshcd_add_uic_command_trace(hba, cmd, UFS_CMD_COMP);
 
 unlock:
-	spin_unlock(hba->host->host_lock);
-
 	return retval;
 }
 
@@ -6455,10 +6453,9 @@ void ufshcd_schedule_eh_work(struct ufs_hba *hba)
 
 static void ufshcd_force_error_recovery(struct ufs_hba *hba)
 {
-	spin_lock_irq(hba->host->host_lock);
+	guard(spinlock_irqsave)(hba->host->host_lock);
 	hba->force_reset = true;
 	ufshcd_schedule_eh_work(hba);
-	spin_unlock_irq(hba->host->host_lock);
 }
 
 static void ufshcd_clk_scaling_allow(struct ufs_hba *hba, bool allow)
@@ -6922,7 +6919,7 @@ static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba, u32 intr_status)
 	bool queue_eh_work = false;
 	irqreturn_t retval = IRQ_NONE;
 
-	spin_lock(hba->host->host_lock);
+	guard(spinlock_irqsave)(hba->host->host_lock);
 	hba->errors |= UFSHCD_ERROR_MASK & intr_status;
 
 	if (hba->errors & INT_FATAL_ERRORS) {
@@ -6981,7 +6978,7 @@ static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba, u32 intr_status)
 	 */
 	hba->errors = 0;
 	hba->uic_error = 0;
-	spin_unlock(hba->host->host_lock);
+
 	return retval;
 }
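
For reference, guard(spinlock_irqsave)(), built on the scope-based
cleanup helpers from <linux/cleanup.h>, releases the lock and restores
the saved interrupt flags automatically at every scope exit. That is
why the explicit unlock calls are dropped above, and why the goto
unlock path in ufshcd_uic_cmd_compl() remains correct without an unlock
at the label. Behaviorally, each converted critical section is
equivalent to the hand-written sketch below (not the actual macro
expansion):

        unsigned long flags;

        spin_lock_irqsave(hba->host->host_lock, flags);
        /* ... critical section; every exit path reaches ... */
        spin_unlock_irqrestore(hba->host->host_lock, flags);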
* Re: [PATCH v2] ufs: core: Fix IRQ lock inversion
From: Peter Wang (王信友) @ 2025-07-22 7:23 UTC
To: bvanassche@acm.org, martin.petersen@oracle.com
Cc: huobean@gmail.com, neil.armstrong@linaro.org,
quic_nguyenb@quicinc.com, linux-scsi@vger.kernel.org,
andre.draszik@linaro.org, avri.altman@sandisk.com,
mani@kernel.org, James.Bottomley@HansenPartnership.com
On Mon, 2025-07-21 at 14:50 -0700, Bart Van Assche wrote:
> other info that might help us debug this:
> Possible interrupt unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(shost->host_lock);
>                                local_irq_disable();
>                                lock(&hba->clk_gating.lock);
>                                lock(shost->host_lock);
>   <Interrupt>
>     lock(&hba->clk_gating.lock);
>
>  *** DEADLOCK ***
>
> 4 locks held by kworker/u28:0/12:
> #0: ffffff8800ac6158 ((wq_completion)async){+.+.}-{0:0}, at:
> process_one_work+0x1bc/0x65c
> #1: ffffffc085c93d70 ((work_completion)(&entry->work)){+.+.}-{0:0},
> at: process_one_work+0x1e4/0x65c
> #2: ffffff881e29c0e0 (&shost->scan_mutex){+.+.}-{3:3}, at:
> __scsi_add_device+0x74/0x120
> #3: ffffff881960ea00 (&hwq->cq_lock){-...}-{2:2}, at:
> ufshcd_mcq_poll_cqe_lock+0x28/0x104
>
Hi Bart,
This kworker is acquiring cq_lock, not host_lock?
> stack backtrace:
> CPU: 6 UID: 0 PID: 12 Comm: kworker/u28:0 Tainted: G W
> OE 6.12.30-android16-5-maybe-dirty-4k #1
> ccd4020fe444bdf629efc3b86df6be920b8df7d0
> Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> Hardware name: Spacecraft board based on MALIBU (DT)
> Workqueue: async async_run_entry_fn
> Call trace:
> dump_backtrace+0xfc/0x17c
> show_stack+0x18/0x28
> dump_stack_lvl+0x40/0xa0
> dump_stack+0x18/0x24
> print_irq_inversion_bug+0x2fc/0x304
> mark_lock_irq+0x388/0x4fc
> mark_lock+0x1c4/0x224
> __lock_acquire+0x438/0x2e1c
> lock_acquire+0x134/0x2b4
> _raw_spin_lock_irqsave+0x5c/0x80
> ufshcd_release_scsi_cmd+0x60/0x110
> ufshcd_compl_one_cqe+0x2c0/0x3f4
> ufshcd_mcq_poll_cqe_lock+0xb0/0x104
> ufs_google_mcq_intr+0x80/0xa0 [ufs
> dd6f385554e109da094ab91d5f7be18625a2222a]
> __handle_irq_event_percpu+0x104/0x32c
> handle_irq_event+0x40/0x9c
> handle_fasteoi_irq+0x170/0x2e8
> generic_handle_domain_irq+0x58/0x80
> gic_handle_irq+0x48/0x104
> call_on_irq_stack+0x3c/0x50
> do_interrupt_handler+0x7c/0xd8
> el1_interrupt+0x34/0x58
> el1h_64_irq_handler+0x18/0x24
> el1h_64_irq+0x68/0x6c
> _raw_spin_unlock_irqrestore+0x3c/0x6c
> debug_object_assert_init+0x16c/0x21c
> __mod_timer+0x4c/0x48c
> schedule_timeout+0xd4/0x16c
> io_schedule_timeout+0x48/0x70
> do_wait_for_common+0x100/0x194
> wait_for_completion_io_timeout+0x48/0x6c
> blk_execute_rq+0x124/0x17c
> scsi_execute_cmd+0x18c/0x3f8
> scsi_probe_and_add_lun+0x204/0xd74
> __scsi_add_device+0xbc/0x120
> ufshcd_async_scan+0x80/0x3c0
> async_run_entry_fn+0x4c/0x17c
> process_one_work+0x26c/0x65c
> worker_thread+0x33c/0x498
> kthread+0x110/0x134
> ret_from_fork+0x10/0x20
>
This backtrace also looks like it is acquiring cq_lock
rather than host_lock. Am I missing something?
>
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index acfc1b4691fa..8b72cf7811ca 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -5566,7 +5566,7 @@ static irqreturn_t ufshcd_uic_cmd_compl(struct
> ufs_hba *hba, u32 intr_status)
> irqreturn_t retval = IRQ_NONE;
> struct uic_command *cmd;
>
> - spin_lock(hba->host->host_lock);
> + guard(spinlock_irqsave)(hba->host->host_lock);
> cmd = hba->active_uic_cmd;
> if (WARN_ON_ONCE(!cmd))
> goto unlock;
> @@ -5593,8 +5593,6 @@ static irqreturn_t ufshcd_uic_cmd_compl(struct
> ufs_hba *hba, u32 intr_status)
> ufshcd_add_uic_command_trace(hba, cmd, UFS_CMD_COMP);
>
> unlock:
> - spin_unlock(hba->host->host_lock);
> -
> return retval;
> }
>
> @@ -6455,10 +6453,9 @@ void ufshcd_schedule_eh_work(struct ufs_hba
> *hba)
>
> static void ufshcd_force_error_recovery(struct ufs_hba *hba)
> {
> - spin_lock_irq(hba->host->host_lock);
> + guard(spinlock_irqsave)(hba->host->host_lock);
> hba->force_reset = true;
> ufshcd_schedule_eh_work(hba);
> - spin_unlock_irq(hba->host->host_lock);
> }
>
The other two functions do seem to have this issue,
but ufshcd_force_error_recovery() shouldn't,
because it is not called from ufshcd_threaded_intr(), right?
Thanks.
Peter
> static void ufshcd_clk_scaling_allow(struct ufs_hba *hba, bool
> allow)
> @@ -6922,7 +6919,7 @@ static irqreturn_t ufshcd_check_errors(struct
> ufs_hba *hba, u32 intr_status)
> bool queue_eh_work = false;
> irqreturn_t retval = IRQ_NONE;
>
> - spin_lock(hba->host->host_lock);
> + guard(spinlock_irqsave)(hba->host->host_lock);
> hba->errors |= UFSHCD_ERROR_MASK & intr_status;
>
> if (hba->errors & INT_FATAL_ERRORS) {
> @@ -6981,7 +6978,7 @@ static irqreturn_t ufshcd_check_errors(struct
> ufs_hba *hba, u32 intr_status)
> */
> hba->errors = 0;
> hba->uic_error = 0;
> - spin_unlock(hba->host->host_lock);
> +
> return retval;
> }
>
* Re: [PATCH v2] ufs: core: Fix IRQ lock inversion
From: Bart Van Assche @ 2025-07-22 15:35 UTC
To: Peter Wang (王信友), martin.petersen@oracle.com
Cc: huobean@gmail.com, neil.armstrong@linaro.org,
quic_nguyenb@quicinc.com, linux-scsi@vger.kernel.org,
andre.draszik@linaro.org, avri.altman@sandisk.com,
mani@kernel.org, James.Bottomley@HansenPartnership.com
On 7/22/25 12:23 AM, Peter Wang (王信友) wrote:
> On Mon, 2025-07-21 at 14:50 -0700, Bart Van Assche wrote:
>> other info that might help us debug this:
>> Possible interrupt unsafe locking scenario:
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(shost->host_lock);
>>                                local_irq_disable();
>>                                lock(&hba->clk_gating.lock);
>>                                lock(shost->host_lock);
>>   <Interrupt>
>>     lock(&hba->clk_gating.lock);
>>
>>  *** DEADLOCK ***
>>
>> 4 locks held by kworker/u28:0/12:
>> #0: ffffff8800ac6158 ((wq_completion)async){+.+.}-{0:0}, at:
>> process_one_work+0x1bc/0x65c
>> #1: ffffffc085c93d70 ((work_completion)(&entry->work)){+.+.}-{0:0},
>> at: process_one_work+0x1e4/0x65c
>> #2: ffffff881e29c0e0 (&shost->scan_mutex){+.+.}-{3:3}, at:
>> __scsi_add_device+0x74/0x120
>> #3: ffffff881960ea00 (&hwq->cq_lock){-...}-{2:2}, at:
>> ufshcd_mcq_poll_cqe_lock+0x28/0x104
>
> This kworker is acquiring cq_lock, not host_lock?
No. "lock(shost->host_lock)" means that it is acquiring the SCSI host
lock.
>> stack backtrace:
>> CPU: 6 UID: 0 PID: 12 Comm: kworker/u28:0 Tainted: G W
>> OE 6.12.30-android16-5-maybe-dirty-4k #1
>> ccd4020fe444bdf629efc3b86df6be920b8df7d0
>> Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
>> Hardware name: Spacecraft board based on MALIBU (DT)
>> Workqueue: async async_run_entry_fn
>> Call trace:
>> dump_backtrace+0xfc/0x17c
>> show_stack+0x18/0x28
>> dump_stack_lvl+0x40/0xa0
>> dump_stack+0x18/0x24
>> print_irq_inversion_bug+0x2fc/0x304
>> mark_lock_irq+0x388/0x4fc
>> mark_lock+0x1c4/0x224
>> __lock_acquire+0x438/0x2e1c
>> lock_acquire+0x134/0x2b4
>> _raw_spin_lock_irqsave+0x5c/0x80
>> ufshcd_release_scsi_cmd+0x60/0x110
>> ufshcd_compl_one_cqe+0x2c0/0x3f4
>> ufshcd_mcq_poll_cqe_lock+0xb0/0x104
>> ufs_google_mcq_intr+0x80/0xa0 [ufs
>> dd6f385554e109da094ab91d5f7be18625a2222a]
>> __handle_irq_event_percpu+0x104/0x32c
>> handle_irq_event+0x40/0x9c
>> handle_fasteoi_irq+0x170/0x2e8
>> generic_handle_domain_irq+0x58/0x80
>> gic_handle_irq+0x48/0x104
>> call_on_irq_stack+0x3c/0x50
>> do_interrupt_handler+0x7c/0xd8
>> el1_interrupt+0x34/0x58
>> el1h_64_irq_handler+0x18/0x24
>> el1h_64_irq+0x68/0x6c
>> _raw_spin_unlock_irqrestore+0x3c/0x6c
>> debug_object_assert_init+0x16c/0x21c
>> __mod_timer+0x4c/0x48c
>> schedule_timeout+0xd4/0x16c
>> io_schedule_timeout+0x48/0x70
>> do_wait_for_common+0x100/0x194
>> wait_for_completion_io_timeout+0x48/0x6c
>> blk_execute_rq+0x124/0x17c
>> scsi_execute_cmd+0x18c/0x3f8
>> scsi_probe_and_add_lun+0x204/0xd74
>> __scsi_add_device+0xbc/0x120
>> ufshcd_async_scan+0x80/0x3c0
>> async_run_entry_fn+0x4c/0x17c
>> process_one_work+0x26c/0x65c
>> worker_thread+0x33c/0x498
>> kthread+0x110/0x134
>> ret_from_fork+0x10/0x20
>>
>
> This backtrace also looks like it is acquiring cq_lock
> rather than host_lock. I'm not sure if I missed something?
No. ufshcd_release_scsi_cmd() doesn't call any function that acquires
cq_lock.
>> @@ -6455,10 +6453,9 @@ void ufshcd_schedule_eh_work(struct ufs_hba
>> *hba)
>>
>> static void ufshcd_force_error_recovery(struct ufs_hba *hba)
>> {
>> - spin_lock_irq(hba->host->host_lock);
>> + guard(spinlock_irqsave)(hba->host->host_lock);
>> hba->force_reset = true;
>> ufshcd_schedule_eh_work(hba);
>> - spin_unlock_irq(hba->host->host_lock);
>> }
>>
>
> The other two functions do seem to have issues,
> but ufshcd_force_error_recovery shouldn't,
> because it is not used by ufshcd_threaded_intr, right?
Agreed. I will leave out this change.
Thanks,
Bart.
* Re: [PATCH v2] ufs: core: Fix IRQ lock inversion
From: Peter Wang (王信友) @ 2025-07-23 9:31 UTC
To: bvanassche@acm.org, martin.petersen@oracle.com
Cc: huobean@gmail.com, neil.armstrong@linaro.org,
quic_nguyenb@quicinc.com, linux-scsi@vger.kernel.org,
andre.draszik@linaro.org, avri.altman@sandisk.com,
mani@kernel.org, James.Bottomley@HansenPartnership.com
On Tue, 2025-07-22 at 08:35 -0700, Bart Van Assche wrote:
> > >
> > > 4 locks held by kworker/u28:0/12:
> > > #0: ffffff8800ac6158 ((wq_completion)async){+.+.}-{0:0}, at:
> > > process_one_work+0x1bc/0x65c
> > > #1: ffffffc085c93d70 ((work_completion)(&entry->work)){+.+.}-
> > > {0:0},
> > > at: process_one_work+0x1e4/0x65c
> > > #2: ffffff881e29c0e0 (&shost->scan_mutex){+.+.}-{3:3}, at:
> > > __scsi_add_device+0x74/0x120
> > > #3: ffffff881960ea00 (&hwq->cq_lock){-...}-{2:2}, at:
> > > ufshcd_mcq_poll_cqe_lock+0x28/0x104
> >
> > This kworker is acquiring cq_lock, not host_lock?
>
> No. "lock(shost->host_lock)" means that it is acquiring the SCSI host
> lock.
>
Hi Bart,
I know that lock(shost->host_lock) means the SCSI host lock.
But among the 4 locks held by kworker/u28:0/12,
there is no host_lock, only cq_lock.
> > > stack backtrace:
> > > CPU: 6 UID: 0 PID: 12 Comm: kworker/u28:0 Tainted: G W
> > > OE 6.12.30-android16-5-maybe-dirty-4k #1
> > > ccd4020fe444bdf629efc3b86df6be920b8df7d0
> > > Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> > > Hardware name: Spacecraft board based on MALIBU (DT)
> > > Workqueue: async async_run_entry_fn
> > > Call trace:
> > > dump_backtrace+0xfc/0x17c
> > > show_stack+0x18/0x28
> > > dump_stack_lvl+0x40/0xa0
> > > dump_stack+0x18/0x24
> > > print_irq_inversion_bug+0x2fc/0x304
> > > mark_lock_irq+0x388/0x4fc
> > > mark_lock+0x1c4/0x224
> > > __lock_acquire+0x438/0x2e1c
> > > lock_acquire+0x134/0x2b4
> > > _raw_spin_lock_irqsave+0x5c/0x80
> > > ufshcd_release_scsi_cmd+0x60/0x110
> > > ufshcd_compl_one_cqe+0x2c0/0x3f4
> > > ufshcd_mcq_poll_cqe_lock+0xb0/0x104
> > > ufs_google_mcq_intr+0x80/0xa0 [ufs
> > > dd6f385554e109da094ab91d5f7be18625a2222a]
> > > __handle_irq_event_percpu+0x104/0x32c
> > > handle_irq_event+0x40/0x9c
> > > handle_fasteoi_irq+0x170/0x2e8
> > > generic_handle_domain_irq+0x58/0x80
> > > gic_handle_irq+0x48/0x104
> > > call_on_irq_stack+0x3c/0x50
> > > do_interrupt_handler+0x7c/0xd8
> > > el1_interrupt+0x34/0x58
> > > el1h_64_irq_handler+0x18/0x24
> > > el1h_64_irq+0x68/0x6c
> > > _raw_spin_unlock_irqrestore+0x3c/0x6c
> > > debug_object_assert_init+0x16c/0x21c
> > > __mod_timer+0x4c/0x48c
> > > schedule_timeout+0xd4/0x16c
> > > io_schedule_timeout+0x48/0x70
> > > do_wait_for_common+0x100/0x194
> > > wait_for_completion_io_timeout+0x48/0x6c
> > > blk_execute_rq+0x124/0x17c
> > > scsi_execute_cmd+0x18c/0x3f8
> > > scsi_probe_and_add_lun+0x204/0xd74
> > > __scsi_add_device+0xbc/0x120
> > > ufshcd_async_scan+0x80/0x3c0
> > > async_run_entry_fn+0x4c/0x17c
> > > process_one_work+0x26c/0x65c
> > > worker_thread+0x33c/0x498
> > > kthread+0x110/0x134
> > > ret_from_fork+0x10/0x20
> > >
> >
> > This backtrace also looks like it is acquiring cq_lock
> > rather than host_lock. I'm not sure if I missed something?
>
> No. ufshcd_release_scsi_cmd() doesn't call any function that acquires
> cq_lock.
>
> >
What I mean is: ufshcd_mcq_poll_cqe_lock() first acquires cq_lock,
and ufshcd_release_scsi_cmd() then acquires clk_gating.lock.
So CPU6 should look like this:

       CPU6
       ----
  <Interrupt>
    lock(&hwq->cq_lock);
    ...
      lock(&hba->clk_gating.lock);
Thanks.
Peter