linux-scsi.vger.kernel.org archive mirror
* [PATCH v2] ufs: core: Fix IRQ lock inversion
@ 2025-07-21 21:50 Bart Van Assche
  2025-07-22  7:23 ` Peter Wang (王信友)
  0 siblings, 1 reply; 4+ messages in thread
From: Bart Van Assche @ 2025-07-21 21:50 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-scsi, Bart Van Assche, Neil Armstrong, André Draszik,
	James E.J. Bottomley, Peter Wang, Avri Altman, Bean Huo,
	Manivannan Sadhasivam, Bao D. Nguyen

Commit 3c7ac40d7322 ("scsi: ufs: core: Delegate the interrupt service
routine to a threaded IRQ handler") introduced an IRQ lock inversion
issue. Fix this lock inversion by converting the spin_lock() and
spin_lock_irq() calls in code that can be called from either interrupt
context or thread context into guard(spinlock_irqsave)() statements.
This patch fixes the following lockdep complaint:

WARNING: possible irq lock inversion dependency detected
6.12.30-android16-5-maybe-dirty-4k #1 Tainted: G        W  OE
--------------------------------------------------------
kworker/u28:0/12 just changed the state of lock:
ffffff881e29dd60 (&hba->clk_gating.lock){-...}-{2:2}, at: ufshcd_release_scsi_cmd+0x60/0x110
but this lock took another, HARDIRQ-unsafe lock in the past:
 (shost->host_lock){+.+.}-{2:2}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(shost->host_lock);
                               local_irq_disable();
                               lock(&hba->clk_gating.lock);
                               lock(shost->host_lock);
  <Interrupt>
    lock(&hba->clk_gating.lock);

 *** DEADLOCK ***

4 locks held by kworker/u28:0/12:
 #0: ffffff8800ac6158 ((wq_completion)async){+.+.}-{0:0}, at: process_one_work+0x1bc/0x65c
 #1: ffffffc085c93d70 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x1e4/0x65c
 #2: ffffff881e29c0e0 (&shost->scan_mutex){+.+.}-{3:3}, at: __scsi_add_device+0x74/0x120
 #3: ffffff881960ea00 (&hwq->cq_lock){-...}-{2:2}, at: ufshcd_mcq_poll_cqe_lock+0x28/0x104

the shortest dependencies between 2nd lock and 1st lock:
 -> (shost->host_lock){+.+.}-{2:2} {
    HARDIRQ-ON-W at:
                      lock_acquire+0x134/0x2b4
                      _raw_spin_lock+0x48/0x64
                      ufshcd_sl_intr+0x4c/0xa08
                      ufshcd_threaded_intr+0x70/0x12c
                      irq_thread_fn+0x48/0xa8
                      irq_thread+0x130/0x1ec
                      kthread+0x110/0x134
                      ret_from_fork+0x10/0x20
    SOFTIRQ-ON-W at:
                      lock_acquire+0x134/0x2b4
                      _raw_spin_lock+0x48/0x64
                      ufshcd_sl_intr+0x4c/0xa08
                      ufshcd_threaded_intr+0x70/0x12c
                      irq_thread_fn+0x48/0xa8
                      irq_thread+0x130/0x1ec
                      kthread+0x110/0x134
                      ret_from_fork+0x10/0x20
    INITIAL USE at:
                     lock_acquire+0x134/0x2b4
                     _raw_spin_lock+0x48/0x64
                     ufshcd_sl_intr+0x4c/0xa08
                     ufshcd_threaded_intr+0x70/0x12c
                     irq_thread_fn+0x48/0xa8
                     irq_thread+0x130/0x1ec
                     kthread+0x110/0x134
                     ret_from_fork+0x10/0x20
  }
  ... key      at: [<ffffffc085ba1a98>] scsi_host_alloc.__key+0x0/0x10
  ... acquired at:
   _raw_spin_lock_irqsave+0x5c/0x80
   __ufshcd_release+0x78/0x118
   ufshcd_send_uic_cmd+0xe4/0x118
   ufshcd_dme_set_attr+0x88/0x1c8
   ufs_google_phy_initialization+0x68/0x418 [ufs]
   ufs_google_link_startup_notify+0x78/0x27c [ufs]
   ufshcd_link_startup+0x84/0x720
   ufshcd_init+0xf3c/0x1330
   ufshcd_pltfrm_init+0x728/0x7d8
   ufs_google_probe+0x30/0x84 [ufs]
   platform_probe+0xa0/0xe0
   really_probe+0x114/0x454
   __driver_probe_device+0xa4/0x160
   driver_probe_device+0x44/0x23c
   __driver_attach_async_helper+0x60/0xd4
   async_run_entry_fn+0x4c/0x17c
   process_one_work+0x26c/0x65c
   worker_thread+0x33c/0x498
   kthread+0x110/0x134
   ret_from_fork+0x10/0x20

-> (&hba->clk_gating.lock){-...}-{2:2} {
   IN-HARDIRQ-W at:
                    lock_acquire+0x134/0x2b4
                    _raw_spin_lock_irqsave+0x5c/0x80
                    ufshcd_release_scsi_cmd+0x60/0x110
                    ufshcd_compl_one_cqe+0x2c0/0x3f4
                    ufshcd_mcq_poll_cqe_lock+0xb0/0x104
                    ufs_google_mcq_intr+0x80/0xa0 [ufs]
                    __handle_irq_event_percpu+0x104/0x32c
                    handle_irq_event+0x40/0x9c
                    handle_fasteoi_irq+0x170/0x2e8
                    generic_handle_domain_irq+0x58/0x80
                    gic_handle_irq+0x48/0x104
                    call_on_irq_stack+0x3c/0x50
                    do_interrupt_handler+0x7c/0xd8
                    el1_interrupt+0x34/0x58
                    el1h_64_irq_handler+0x18/0x24
                    el1h_64_irq+0x68/0x6c
                    _raw_spin_unlock_irqrestore+0x3c/0x6c
                    debug_object_assert_init+0x16c/0x21c
                    __mod_timer+0x4c/0x48c
                    schedule_timeout+0xd4/0x16c
                    io_schedule_timeout+0x48/0x70
                    do_wait_for_common+0x100/0x194
                    wait_for_completion_io_timeout+0x48/0x6c
                    blk_execute_rq+0x124/0x17c
                    scsi_execute_cmd+0x18c/0x3f8
                    scsi_probe_and_add_lun+0x204/0xd74
                    __scsi_add_device+0xbc/0x120
                    ufshcd_async_scan+0x80/0x3c0
                    async_run_entry_fn+0x4c/0x17c
                    process_one_work+0x26c/0x65c
                    worker_thread+0x33c/0x498
                    kthread+0x110/0x134
                    ret_from_fork+0x10/0x20
   INITIAL USE at:
                   lock_acquire+0x134/0x2b4
                   _raw_spin_lock_irqsave+0x5c/0x80
                   ufshcd_hold+0x34/0x14c
                   ufshcd_send_uic_cmd+0x28/0x118
                   ufshcd_dme_set_attr+0x88/0x1c8
                   ufs_google_phy_initialization+0x68/0x418 [ufs]
                   ufs_google_link_startup_notify+0x78/0x27c [ufs]
                   ufshcd_link_startup+0x84/0x720
                   ufshcd_init+0xf3c/0x1330
                   ufshcd_pltfrm_init+0x728/0x7d8
                   ufs_google_probe+0x30/0x84 [ufs]
                   platform_probe+0xa0/0xe0
                   really_probe+0x114/0x454
                   __driver_probe_device+0xa4/0x160
                   driver_probe_device+0x44/0x23c
                   __driver_attach_async_helper+0x60/0xd4
                   async_run_entry_fn+0x4c/0x17c
                   process_one_work+0x26c/0x65c
                   worker_thread+0x33c/0x498
                   kthread+0x110/0x134
                   ret_from_fork+0x10/0x20
 }
 ... key      at: [<ffffffc085ba6fe8>] ufshcd_init.__key+0x0/0x10
 ... acquired at:
   mark_lock+0x1c4/0x224
   __lock_acquire+0x438/0x2e1c
   lock_acquire+0x134/0x2b4
   _raw_spin_lock_irqsave+0x5c/0x80
   ufshcd_release_scsi_cmd+0x60/0x110
   ufshcd_compl_one_cqe+0x2c0/0x3f4
   ufshcd_mcq_poll_cqe_lock+0xb0/0x104
   ufs_google_mcq_intr+0x80/0xa0 [ufs]
   __handle_irq_event_percpu+0x104/0x32c
   handle_irq_event+0x40/0x9c
   handle_fasteoi_irq+0x170/0x2e8
   generic_handle_domain_irq+0x58/0x80
   gic_handle_irq+0x48/0x104
   call_on_irq_stack+0x3c/0x50
   do_interrupt_handler+0x7c/0xd8
   el1_interrupt+0x34/0x58
   el1h_64_irq_handler+0x18/0x24
   el1h_64_irq+0x68/0x6c
   _raw_spin_unlock_irqrestore+0x3c/0x6c
   debug_object_assert_init+0x16c/0x21c
   __mod_timer+0x4c/0x48c
   schedule_timeout+0xd4/0x16c
   io_schedule_timeout+0x48/0x70
   do_wait_for_common+0x100/0x194
   wait_for_completion_io_timeout+0x48/0x6c
   blk_execute_rq+0x124/0x17c
   scsi_execute_cmd+0x18c/0x3f8
   scsi_probe_and_add_lun+0x204/0xd74
   __scsi_add_device+0xbc/0x120
   ufshcd_async_scan+0x80/0x3c0
   async_run_entry_fn+0x4c/0x17c
   process_one_work+0x26c/0x65c
   worker_thread+0x33c/0x498
   kthread+0x110/0x134
   ret_from_fork+0x10/0x20

stack backtrace:
CPU: 6 UID: 0 PID: 12 Comm: kworker/u28:0 Tainted: G        W  OE      6.12.30-android16-5-maybe-dirty-4k #1 ccd4020fe444bdf629efc3b86df6be920b8df7d0
Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Spacecraft board based on MALIBU (DT)
Workqueue: async async_run_entry_fn
Call trace:
 dump_backtrace+0xfc/0x17c
 show_stack+0x18/0x28
 dump_stack_lvl+0x40/0xa0
 dump_stack+0x18/0x24
 print_irq_inversion_bug+0x2fc/0x304
 mark_lock_irq+0x388/0x4fc
 mark_lock+0x1c4/0x224
 __lock_acquire+0x438/0x2e1c
 lock_acquire+0x134/0x2b4
 _raw_spin_lock_irqsave+0x5c/0x80
 ufshcd_release_scsi_cmd+0x60/0x110
 ufshcd_compl_one_cqe+0x2c0/0x3f4
 ufshcd_mcq_poll_cqe_lock+0xb0/0x104
 ufs_google_mcq_intr+0x80/0xa0 [ufs dd6f385554e109da094ab91d5f7be18625a2222a]
 __handle_irq_event_percpu+0x104/0x32c
 handle_irq_event+0x40/0x9c
 handle_fasteoi_irq+0x170/0x2e8
 generic_handle_domain_irq+0x58/0x80
 gic_handle_irq+0x48/0x104
 call_on_irq_stack+0x3c/0x50
 do_interrupt_handler+0x7c/0xd8
 el1_interrupt+0x34/0x58
 el1h_64_irq_handler+0x18/0x24
 el1h_64_irq+0x68/0x6c
 _raw_spin_unlock_irqrestore+0x3c/0x6c
 debug_object_assert_init+0x16c/0x21c
 __mod_timer+0x4c/0x48c
 schedule_timeout+0xd4/0x16c
 io_schedule_timeout+0x48/0x70
 do_wait_for_common+0x100/0x194
 wait_for_completion_io_timeout+0x48/0x6c
 blk_execute_rq+0x124/0x17c
 scsi_execute_cmd+0x18c/0x3f8
 scsi_probe_and_add_lun+0x204/0xd74
 __scsi_add_device+0xbc/0x120
 ufshcd_async_scan+0x80/0x3c0
 async_run_entry_fn+0x4c/0x17c
 process_one_work+0x26c/0x65c
 worker_thread+0x33c/0x498
 kthread+0x110/0x134
 ret_from_fork+0x10/0x20

Cc: Neil Armstrong <neil.armstrong@linaro.org>
Cc: André Draszik <andre.draszik@linaro.org>
Fixes: 3c7ac40d7322 ("scsi: ufs: core: Delegate the interrupt service routine to a threaded IRQ handler")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---

Changes compared to v1: fixed a build error.
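
For reviewers less familiar with the guard() helpers from <linux/cleanup.h>,
below is a minimal sketch of the locking pattern this patch applies. It is
illustrative only: struct foo and frob_foo() are made-up names, not code from
this driver. spin_unlock_irq() unconditionally re-enables interrupts, so a
spin_lock_irq()/spin_unlock_irq() pair is only correct where interrupts are
known to be enabled on entry, and plain spin_lock() does not disable
interrupts at all. guard(spinlock_irqsave)() saves the interrupt state,
disables interrupts, and restores the saved state when the guard goes out of
scope, which is what makes it safe in code reachable from both interrupt and
thread context:

  #include <linux/spinlock.h>
  #include <linux/cleanup.h>

  struct foo {
          spinlock_t lock;
          unsigned int counter;
  };

  /* May be called from hardirq, threaded-IRQ or process context. */
  static void frob_foo(struct foo *foo)
  {
          /*
           * spin_lock_irq() would be wrong here: the matching
           * spin_unlock_irq() re-enables interrupts even if the caller
           * ran with interrupts disabled. The irqsave guard restores
           * the saved interrupt state and releases the lock when it
           * goes out of scope, so no explicit unlock is needed.
           */
          guard(spinlock_irqsave)(&foo->lock);
          foo->counter++;
  }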

 drivers/ufs/core/ufshcd.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index acfc1b4691fa..8b72cf7811ca 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -5566,7 +5566,7 @@ static irqreturn_t ufshcd_uic_cmd_compl(struct ufs_hba *hba, u32 intr_status)
 	irqreturn_t retval = IRQ_NONE;
 	struct uic_command *cmd;
 
-	spin_lock(hba->host->host_lock);
+	guard(spinlock_irqsave)(hba->host->host_lock);
 	cmd = hba->active_uic_cmd;
 	if (WARN_ON_ONCE(!cmd))
 		goto unlock;
@@ -5593,8 +5593,6 @@ static irqreturn_t ufshcd_uic_cmd_compl(struct ufs_hba *hba, u32 intr_status)
 		ufshcd_add_uic_command_trace(hba, cmd, UFS_CMD_COMP);
 
 unlock:
-	spin_unlock(hba->host->host_lock);
-
 	return retval;
 }
 
@@ -6455,10 +6453,9 @@ void ufshcd_schedule_eh_work(struct ufs_hba *hba)
 
 static void ufshcd_force_error_recovery(struct ufs_hba *hba)
 {
-	spin_lock_irq(hba->host->host_lock);
+	guard(spinlock_irqsave)(hba->host->host_lock);
 	hba->force_reset = true;
 	ufshcd_schedule_eh_work(hba);
-	spin_unlock_irq(hba->host->host_lock);
 }
 
 static void ufshcd_clk_scaling_allow(struct ufs_hba *hba, bool allow)
@@ -6922,7 +6919,7 @@ static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba, u32 intr_status)
 	bool queue_eh_work = false;
 	irqreturn_t retval = IRQ_NONE;
 
-	spin_lock(hba->host->host_lock);
+	guard(spinlock_irqsave)(hba->host->host_lock);
 	hba->errors |= UFSHCD_ERROR_MASK & intr_status;
 
 	if (hba->errors & INT_FATAL_ERRORS) {
@@ -6981,7 +6978,7 @@ static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba, u32 intr_status)
 	 */
 	hba->errors = 0;
 	hba->uic_error = 0;
-	spin_unlock(hba->host->host_lock);
+
 	return retval;
 }
 


* Re: [PATCH v2] ufs: core: Fix IRQ lock inversion
  2025-07-21 21:50 [PATCH v2] ufs: core: Fix IRQ lock inversion Bart Van Assche
@ 2025-07-22  7:23 ` Peter Wang (王信友)
  2025-07-22 15:35   ` Bart Van Assche
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Wang (王信友) @ 2025-07-22  7:23 UTC (permalink / raw)
  To: bvanassche@acm.org, martin.petersen@oracle.com
  Cc: huobean@gmail.com, neil.armstrong@linaro.org,
	quic_nguyenb@quicinc.com, linux-scsi@vger.kernel.org,
	andre.draszik@linaro.org, avri.altman@sandisk.com,
	mani@kernel.org, James.Bottomley@HansenPartnership.com

On Mon, 2025-07-21 at 14:50 -0700, Bart Van Assche wrote:
> other info that might help us debug this:
>  Possible interrupt unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(shost->host_lock);
>                                local_irq_disable();
>                                lock(&hba->clk_gating.lock);
>                                lock(shost->host_lock);
>   <Interrupt>
>     lock(&hba->clk_gating.lock);
> 
>  *** DEADLOCK ***
> 
> 4 locks held by kworker/u28:0/12:
>  #0: ffffff8800ac6158 ((wq_completion)async){+.+.}-{0:0}, at:
> process_one_work+0x1bc/0x65c
>  #1: ffffffc085c93d70 ((work_completion)(&entry->work)){+.+.}-{0:0},
> at: process_one_work+0x1e4/0x65c
>  #2: ffffff881e29c0e0 (&shost->scan_mutex){+.+.}-{3:3}, at:
> __scsi_add_device+0x74/0x120
>  #3: ffffff881960ea00 (&hwq->cq_lock){-...}-{2:2}, at:
> ufshcd_mcq_poll_cqe_lock+0x28/0x104
> 

Hi Bart,

This kworker is acquiring cq_lock, not host_lock?


> stack backtrace:
> CPU: 6 UID: 0 PID: 12 Comm: kworker/u28:0 Tainted: G        W 
> OE      6.12.30-android16-5-maybe-dirty-4k #1
> ccd4020fe444bdf629efc3b86df6be920b8df7d0
> Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> Hardware name: Spacecraft board based on MALIBU (DT)
> Workqueue: async async_run_entry_fn
> Call trace:
>  dump_backtrace+0xfc/0x17c
>  show_stack+0x18/0x28
>  dump_stack_lvl+0x40/0xa0
>  dump_stack+0x18/0x24
>  print_irq_inversion_bug+0x2fc/0x304
>  mark_lock_irq+0x388/0x4fc
>  mark_lock+0x1c4/0x224
>  __lock_acquire+0x438/0x2e1c
>  lock_acquire+0x134/0x2b4
>  _raw_spin_lock_irqsave+0x5c/0x80
>  ufshcd_release_scsi_cmd+0x60/0x110
>  ufshcd_compl_one_cqe+0x2c0/0x3f4
>  ufshcd_mcq_poll_cqe_lock+0xb0/0x104
>  ufs_google_mcq_intr+0x80/0xa0 [ufs
> dd6f385554e109da094ab91d5f7be18625a2222a]
>  __handle_irq_event_percpu+0x104/0x32c
>  handle_irq_event+0x40/0x9c
>  handle_fasteoi_irq+0x170/0x2e8
>  generic_handle_domain_irq+0x58/0x80
>  gic_handle_irq+0x48/0x104
>  call_on_irq_stack+0x3c/0x50
>  do_interrupt_handler+0x7c/0xd8
>  el1_interrupt+0x34/0x58
>  el1h_64_irq_handler+0x18/0x24
>  el1h_64_irq+0x68/0x6c
>  _raw_spin_unlock_irqrestore+0x3c/0x6c
>  debug_object_assert_init+0x16c/0x21c
>  __mod_timer+0x4c/0x48c
>  schedule_timeout+0xd4/0x16c
>  io_schedule_timeout+0x48/0x70
>  do_wait_for_common+0x100/0x194
>  wait_for_completion_io_timeout+0x48/0x6c
>  blk_execute_rq+0x124/0x17c
>  scsi_execute_cmd+0x18c/0x3f8
>  scsi_probe_and_add_lun+0x204/0xd74
>  __scsi_add_device+0xbc/0x120
>  ufshcd_async_scan+0x80/0x3c0
>  async_run_entry_fn+0x4c/0x17c
>  process_one_work+0x26c/0x65c
>  worker_thread+0x33c/0x498
>  kthread+0x110/0x134
>  ret_from_fork+0x10/0x20
> 

This backtrace also looks like it is acquiring cq_lock
rather than host_lock. Am I missing something?


> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index acfc1b4691fa..8b72cf7811ca 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -5566,7 +5566,7 @@ static irqreturn_t ufshcd_uic_cmd_compl(struct
> ufs_hba *hba, u32 intr_status)
>         irqreturn_t retval = IRQ_NONE;
>         struct uic_command *cmd;
> 
> -       spin_lock(hba->host->host_lock);
> +       guard(spinlock_irqsave)(hba->host->host_lock);
>         cmd = hba->active_uic_cmd;
>         if (WARN_ON_ONCE(!cmd))
>                 goto unlock;
> @@ -5593,8 +5593,6 @@ static irqreturn_t ufshcd_uic_cmd_compl(struct
> ufs_hba *hba, u32 intr_status)
>                 ufshcd_add_uic_command_trace(hba, cmd, UFS_CMD_COMP);
> 
>  unlock:
> -       spin_unlock(hba->host->host_lock);
> -
>         return retval;
>  }
> 
> @@ -6455,10 +6453,9 @@ void ufshcd_schedule_eh_work(struct ufs_hba
> *hba)
> 
>  static void ufshcd_force_error_recovery(struct ufs_hba *hba)
>  {
> -       spin_lock_irq(hba->host->host_lock);
> +       guard(spinlock_irqsave)(hba->host->host_lock);
>         hba->force_reset = true;
>         ufshcd_schedule_eh_work(hba);
> -       spin_unlock_irq(hba->host->host_lock);
>  }
> 

The other two functions do seem to have issues, 
but ufshcd_force_error_recovery shouldn't,
because it is not used by ufshcd_threaded_intr, right?

Thanks.
Peter



>  static void ufshcd_clk_scaling_allow(struct ufs_hba *hba, bool
> allow)
> @@ -6922,7 +6919,7 @@ static irqreturn_t ufshcd_check_errors(struct
> ufs_hba *hba, u32 intr_status)
>         bool queue_eh_work = false;
>         irqreturn_t retval = IRQ_NONE;
> 
> -       spin_lock(hba->host->host_lock);
> +       guard(spinlock_irqsave)(hba->host->host_lock);
>         hba->errors |= UFSHCD_ERROR_MASK & intr_status;
> 
>         if (hba->errors & INT_FATAL_ERRORS) {
> @@ -6981,7 +6978,7 @@ static irqreturn_t ufshcd_check_errors(struct
> ufs_hba *hba, u32 intr_status)
>          */
>         hba->errors = 0;
>         hba->uic_error = 0;
> -       spin_unlock(hba->host->host_lock);
> +
>         return retval;
>  }
> 



* Re: [PATCH v2] ufs: core: Fix IRQ lock inversion
  2025-07-22  7:23 ` Peter Wang (王信友)
@ 2025-07-22 15:35   ` Bart Van Assche
  2025-07-23  9:31     ` Peter Wang (王信友)
  0 siblings, 1 reply; 4+ messages in thread
From: Bart Van Assche @ 2025-07-22 15:35 UTC (permalink / raw)
  To: Peter Wang (王信友), martin.petersen@oracle.com
  Cc: huobean@gmail.com, neil.armstrong@linaro.org,
	quic_nguyenb@quicinc.com, linux-scsi@vger.kernel.org,
	andre.draszik@linaro.org, avri.altman@sandisk.com,
	mani@kernel.org, James.Bottomley@HansenPartnership.com

On 7/22/25 12:23 AM, Peter Wang (王信友) wrote:
> On Mon, 2025-07-21 at 14:50 -0700, Bart Van Assche wrote:
>> other info that might help us debug this:
>>  Possible interrupt unsafe locking scenario:
>> 
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(shost->host_lock);
>>                                local_irq_disable();
>>                                lock(&hba->clk_gating.lock);
>>                                lock(shost->host_lock);
>>   <Interrupt>
>>     lock(&hba->clk_gating.lock);
>> 
>>  *** DEADLOCK ***
>> 
>> 4 locks held by kworker/u28:0/12:
>>  #0: ffffff8800ac6158 ((wq_completion)async){+.+.}-{0:0}, at:
>> process_one_work+0x1bc/0x65c
>>  #1: ffffffc085c93d70 ((work_completion)(&entry->work)){+.+.}-{0:0},
>> at: process_one_work+0x1e4/0x65c
>>  #2: ffffff881e29c0e0 (&shost->scan_mutex){+.+.}-{3:3}, at:
>> __scsi_add_device+0x74/0x120
>>  #3: ffffff881960ea00 (&hwq->cq_lock){-...}-{2:2}, at:
>> ufshcd_mcq_poll_cqe_lock+0x28/0x104
> 
> This kworker is acquiring cq_lock, not host_lock?

No. "lock(shost->host_lock)" means that it is acquiring the SCSI host
lock.

>> stack backtrace:
>> CPU: 6 UID: 0 PID: 12 Comm: kworker/u28:0 Tainted: G        W 
>> OE      6.12.30-android16-5-maybe-dirty-4k #1
>> ccd4020fe444bdf629efc3b86df6be920b8df7d0
>> Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
>> Hardware name: Spacecraft board based on MALIBU (DT)
>> Workqueue: async async_run_entry_fn
>> Call trace:
>>  dump_backtrace+0xfc/0x17c
>>  show_stack+0x18/0x28
>>  dump_stack_lvl+0x40/0xa0
>>  dump_stack+0x18/0x24
>>  print_irq_inversion_bug+0x2fc/0x304
>>  mark_lock_irq+0x388/0x4fc
>>  mark_lock+0x1c4/0x224
>>  __lock_acquire+0x438/0x2e1c
>>  lock_acquire+0x134/0x2b4
>>  _raw_spin_lock_irqsave+0x5c/0x80
>>  ufshcd_release_scsi_cmd+0x60/0x110
>>  ufshcd_compl_one_cqe+0x2c0/0x3f4
>>  ufshcd_mcq_poll_cqe_lock+0xb0/0x104
>>  ufs_google_mcq_intr+0x80/0xa0 [ufs
>> dd6f385554e109da094ab91d5f7be18625a2222a]
>>  __handle_irq_event_percpu+0x104/0x32c
>>  handle_irq_event+0x40/0x9c
>>  handle_fasteoi_irq+0x170/0x2e8
>>  generic_handle_domain_irq+0x58/0x80
>>  gic_handle_irq+0x48/0x104
>>  call_on_irq_stack+0x3c/0x50
>>  do_interrupt_handler+0x7c/0xd8
>>  el1_interrupt+0x34/0x58
>>  el1h_64_irq_handler+0x18/0x24
>>  el1h_64_irq+0x68/0x6c
>>  _raw_spin_unlock_irqrestore+0x3c/0x6c
>>  debug_object_assert_init+0x16c/0x21c
>>  __mod_timer+0x4c/0x48c
>>  schedule_timeout+0xd4/0x16c
>>  io_schedule_timeout+0x48/0x70
>>  do_wait_for_common+0x100/0x194
>>  wait_for_completion_io_timeout+0x48/0x6c
>>  blk_execute_rq+0x124/0x17c
>>  scsi_execute_cmd+0x18c/0x3f8
>>  scsi_probe_and_add_lun+0x204/0xd74
>>  __scsi_add_device+0xbc/0x120
>>  ufshcd_async_scan+0x80/0x3c0
>>  async_run_entry_fn+0x4c/0x17c
>>  process_one_work+0x26c/0x65c
>>  worker_thread+0x33c/0x498
>>  kthread+0x110/0x134
>>  ret_from_fork+0x10/0x20
>> 
> 
> This backtrace also looks like it is acquiring cq_lock
> rather than host_lock. Am I missing something?

No. ufshcd_release_scsi_cmd() doesn't call any function that acquires
cq_lock.

>> @@ -6455,10 +6453,9 @@ void ufshcd_schedule_eh_work(struct ufs_hba
>> *hba)
>> 
>>  static void ufshcd_force_error_recovery(struct ufs_hba *hba)
>>  {
>> -       spin_lock_irq(hba->host->host_lock);
>> +       guard(spinlock_irqsave)(hba->host->host_lock);
>>         hba->force_reset = true;
>>         ufshcd_schedule_eh_work(hba);
>> -       spin_unlock_irq(hba->host->host_lock);
>>  }
>> 
> 
> The other two functions do seem to have issues,
> but ufshcd_force_error_recovery shouldn't,
> because it is not used by ufshcd_threaded_intr, right?

Agreed. I will leave out this change.

Thanks,

Bart.


* Re: [PATCH v2] ufs: core: Fix IRQ lock inversion
  2025-07-22 15:35   ` Bart Van Assche
@ 2025-07-23  9:31     ` Peter Wang (王信友)
  0 siblings, 0 replies; 4+ messages in thread
From: Peter Wang (王信友) @ 2025-07-23  9:31 UTC (permalink / raw)
  To: bvanassche@acm.org, martin.petersen@oracle.com
  Cc: huobean@gmail.com, neil.armstrong@linaro.org,
	quic_nguyenb@quicinc.com, linux-scsi@vger.kernel.org,
	andre.draszik@linaro.org, avri.altman@sandisk.com,
	mani@kernel.org, James.Bottomley@HansenPartnership.com

On Tue, 2025-07-22 at 08:35 -0700, Bart Van Assche wrote:
> > > 
> > > 4 locks held by kworker/u28:0/12:
> > >  #0: ffffff8800ac6158 ((wq_completion)async){+.+.}-{0:0}, at:
> > > process_one_work+0x1bc/0x65c
> > >  #1: ffffffc085c93d70 ((work_completion)(&entry->work)){+.+.}-
> > > {0:0},
> > > at: process_one_work+0x1e4/0x65c
> > >  #2: ffffff881e29c0e0 (&shost->scan_mutex){+.+.}-{3:3}, at:
> > > __scsi_add_device+0x74/0x120
> > >  #3: ffffff881960ea00 (&hwq->cq_lock){-...}-{2:2}, at:
> > > ufshcd_mcq_poll_cqe_lock+0x28/0x104
> > 
> > This kworker is acquiring cq_lock, not host_lock?
> 
> No. "lock(shost->host_lock)" means that it is acquiring the SCSI host
> lock.
> 

Hi Bart,

I know that lock(shost->host_lock) means the SCSI host lock.
But among the 4 locks held by kworker/u28:0/12,
there is no host_lock, only cq_lock.



> > > stack backtrace:
> > > CPU: 6 UID: 0 PID: 12 Comm: kworker/u28:0 Tainted: G        W
> > > OE      6.12.30-android16-5-maybe-dirty-4k #1
> > > ccd4020fe444bdf629efc3b86df6be920b8df7d0
> > > Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> > > Hardware name: Spacecraft board based on MALIBU (DT)
> > > Workqueue: async async_run_entry_fn
> > > Call trace:
> > >  dump_backtrace+0xfc/0x17c
> > >  show_stack+0x18/0x28
> > >  dump_stack_lvl+0x40/0xa0
> > >  dump_stack+0x18/0x24
> > >  print_irq_inversion_bug+0x2fc/0x304
> > >  mark_lock_irq+0x388/0x4fc
> > >  mark_lock+0x1c4/0x224
> > >  __lock_acquire+0x438/0x2e1c
> > >  lock_acquire+0x134/0x2b4
> > >  _raw_spin_lock_irqsave+0x5c/0x80
> > >  ufshcd_release_scsi_cmd+0x60/0x110
> > >  ufshcd_compl_one_cqe+0x2c0/0x3f4
> > >  ufshcd_mcq_poll_cqe_lock+0xb0/0x104
> > >  ufs_google_mcq_intr+0x80/0xa0 [ufs
> > > dd6f385554e109da094ab91d5f7be18625a2222a]
> > >  __handle_irq_event_percpu+0x104/0x32c
> > >  handle_irq_event+0x40/0x9c
> > >  handle_fasteoi_irq+0x170/0x2e8
> > >  generic_handle_domain_irq+0x58/0x80
> > >  gic_handle_irq+0x48/0x104
> > >  call_on_irq_stack+0x3c/0x50
> > >  do_interrupt_handler+0x7c/0xd8
> > >  el1_interrupt+0x34/0x58
> > >  el1h_64_irq_handler+0x18/0x24
> > >  el1h_64_irq+0x68/0x6c
> > >  _raw_spin_unlock_irqrestore+0x3c/0x6c
> > >  debug_object_assert_init+0x16c/0x21c
> > >  __mod_timer+0x4c/0x48c
> > >  schedule_timeout+0xd4/0x16c
> > >  io_schedule_timeout+0x48/0x70
> > >  do_wait_for_common+0x100/0x194
> > >  wait_for_completion_io_timeout+0x48/0x6c
> > >  blk_execute_rq+0x124/0x17c
> > >  scsi_execute_cmd+0x18c/0x3f8
> > >  scsi_probe_and_add_lun+0x204/0xd74
> > >  __scsi_add_device+0xbc/0x120
> > >  ufshcd_async_scan+0x80/0x3c0
> > >  async_run_entry_fn+0x4c/0x17c
> > >  process_one_work+0x26c/0x65c
> > >  worker_thread+0x33c/0x498
> > >  kthread+0x110/0x134
> > >  ret_from_fork+0x10/0x20
> > > 
> > 
> > This backtrace also looks like it is acquiring cq_lock
> > rather than host_lock. Am I missing something?
> 
> No. ufshcd_release_scsi_cmd() doesn't call any function that acquires
> cq_lock.
> 
> > 

What I mean is:
ufshcd_mcq_poll_cqe_lock() first acquires cq_lock,
and ufshcd_release_scsi_cmd() then acquires clk_gating.lock.
So CPU6 should look like this:

        CPU6                
        ---- 
   <Interrupt>
   lock(&hwq->cq_lock);

   ...

   lock(&hba->clk_gating.lock);


Thanks.
Peter


