Linux SCSI subsystem development
* [PATCH] ufs: core: Fix an error handler crash
@ 2025-12-04 17:04 Bart Van Assche
  2025-12-04 17:38 ` Nitin Rawat
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Bart Van Assche @ 2025-12-04 17:04 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: linux-scsi, Bart Van Assche, Peter Wang, Nitin Rawat,
	James E.J. Bottomley, Matthias Brugger,
	AngeloGioacchino Del Regno, Avri Altman, Bean Huo, Adrian Hunter,
	Bao D. Nguyen

The UFS error handler may be activated before SCSI scanning has started and
hence before hba->ufs_device_wlun has been set. Check the
hba->ufs_device_wlun pointer before using it.

Cc: Peter Wang <peter.wang@mediatek.com>
Cc: Nitin Rawat <nitin.rawat@oss.qualcomm.com>
Fixes: e23ef4f22db3 ("scsi: ufs: core: Fix error handler host_sem issue")
Fixes: f966e02ae521 ("scsi: ufs: core: Fix runtime suspend error deadlock")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index b834b9635062..80c0b49f30b0 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -6698,19 +6698,22 @@ static void ufshcd_err_handler(struct work_struct *work)
 		 hba->saved_uic_err, hba->force_reset,
 		 ufshcd_is_link_broken(hba) ? "; link is broken" : "");
 
-	/*
-	 * Use ufshcd_rpm_get_noresume() here to safely perform link recovery
-	 * even if an error occurs during runtime suspend or runtime resume.
-	 * This avoids potential deadlocks that could happen if we tried to
-	 * resume the device while a PM operation is already in progress.
-	 */
-	ufshcd_rpm_get_noresume(hba);
-	if (hba->pm_op_in_progress) {
-		ufshcd_link_recovery(hba);
+	if (hba->ufs_device_wlun) {
+		/*
+		 * Use ufshcd_rpm_get_noresume() here to safely perform link
+		 * recovery even if an error occurs during runtime suspend or
+		 * runtime resume. This avoids potential deadlocks that could
+		 * happen if we tried to resume the device while a PM operation
+		 * is already in progress.
+		 */
+		ufshcd_rpm_get_noresume(hba);
+		if (hba->pm_op_in_progress) {
+			ufshcd_link_recovery(hba);
+			ufshcd_rpm_put(hba);
+			return;
+		}
 		ufshcd_rpm_put(hba);
-		return;
 	}
-	ufshcd_rpm_put(hba);
 
 	down(&hba->host_sem);
 	spin_lock_irqsave(hba->host->host_lock, flags);
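
The control flow after this patch can be sketched in user space as follows. This is only a minimal model, not kernel code: every `model_*` name and both struct layouts are hypothetical stand-ins, and the assumption that the runtime-PM helpers dereference hba->ufs_device_wlun is inferred from the commit message ("Check the hba->ufs_device_wlun pointer before using it").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device WLUN; not the kernel's scsi_device. */
struct model_device {
	int rpm_usage;			/* models the runtime-PM usage count */
};

/* Hypothetical stand-in for struct ufs_hba, reduced to what the sketch needs. */
struct model_hba {
	struct model_device *ufs_device_wlun;	/* NULL until SCSI scanning sets it */
	bool pm_op_in_progress;
	int link_recoveries;			/* counts modeled link recoveries */
};

/* Assumption: both helpers dereference ufs_device_wlun, so NULL would crash. */
static void model_rpm_get_noresume(struct model_hba *hba)
{
	hba->ufs_device_wlun->rpm_usage++;
}

static void model_rpm_put(struct model_hba *hba)
{
	hba->ufs_device_wlun->rpm_usage--;
}

/*
 * Mirrors the patched prologue of the error handler. Returns true if the
 * handler returned early after link recovery, false if it would continue
 * with the full error handling path.
 */
static bool model_err_handler(struct model_hba *hba)
{
	if (hba->ufs_device_wlun) {
		model_rpm_get_noresume(hba);
		if (hba->pm_op_in_progress) {
			hba->link_recoveries++;	/* stands in for link recovery */
			model_rpm_put(hba);
			return true;
		}
		model_rpm_put(hba);
	}
	/* Full error handling (host_sem, host_lock, ...) would start here. */
	return false;
}
```

With ufs_device_wlun still NULL the model skips the runtime-PM calls entirely and falls through to the full error handling path; once the pointer is set, the get/put calls stay balanced on both the early-return and the fall-through branch.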


* Re: [PATCH] ufs: core: Fix an error handler crash
  2025-12-04 17:04 [PATCH] ufs: core: Fix an error handler crash Bart Van Assche
@ 2025-12-04 17:38 ` Nitin Rawat
  2025-12-04 18:26   ` Bart Van Assche
  2025-12-05  8:31 ` Peter Wang (王信友)
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 6+ messages in thread
From: Nitin Rawat @ 2025-12-04 17:38 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: linux-scsi, Peter Wang, James E.J. Bottomley, Matthias Brugger,
	AngeloGioacchino Del Regno, Avri Altman, Bean Huo, Adrian Hunter,
	Bao D. Nguyen



On 12/4/2025 10:34 PM, Bart Van Assche wrote:
> The UFS error handler may be activated before SCSI scanning has started and
> hence before hba->ufs_device_wlun has been set. Check the
> hba->ufs_device_wlun pointer before using it.
> 
> Cc: Peter Wang <peter.wang@mediatek.com>
> Cc: Nitin Rawat <nitin.rawat@oss.qualcomm.com>
> Fixes: e23ef4f22db3 ("scsi: ufs: core: Fix error handler host_sem issue")
> Fixes: f966e02ae521 ("scsi: ufs: core: Fix runtime suspend error deadlock")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   drivers/ufs/core/ufshcd.c | 25 ++++++++++++++-----------
>   1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index b834b9635062..80c0b49f30b0 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -6698,19 +6698,22 @@ static void ufshcd_err_handler(struct work_struct *work)
>   		 hba->saved_uic_err, hba->force_reset,
>   		 ufshcd_is_link_broken(hba) ? "; link is broken" : "");
>   
> -	/*
> -	 * Use ufshcd_rpm_get_noresume() here to safely perform link recovery
> -	 * even if an error occurs during runtime suspend or runtime resume.
> -	 * This avoids potential deadlocks that could happen if we tried to
> -	 * resume the device while a PM operation is already in progress.
> -	 */
> -	ufshcd_rpm_get_noresume(hba);
> -	if (hba->pm_op_in_progress) {
> -		ufshcd_link_recovery(hba);
> +	if (hba->ufs_device_wlun) {
> +		/*
> +		 * Use ufshcd_rpm_get_noresume() here to safely perform link
> +		 * recovery even if an error occurs during runtime suspend or
> +		 * runtime resume. This avoids potential deadlocks that could
> +		 * happen if we tried to resume the device while a PM operation
> +		 * is already in progress.
> +		 */
> +		ufshcd_rpm_get_noresume(hba);
> +		if (hba->pm_op_in_progress) {
> +			ufshcd_link_recovery(hba);
> +			ufshcd_rpm_put(hba);
> +			return;
> +		}
>   		ufshcd_rpm_put(hba);
> -		return;
>   	}
> -	ufshcd_rpm_put(hba);
>   
>   	down(&hba->host_sem);
>   	spin_lock_irqsave(hba->host->host_lock, flags);


Hi Bart,

It seems you missed sending the patch below. Both patches are required
to address the issue (hang and clock scaling errors), except for the UIC
error, which still needs to be root-caused.


 > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
 > index 1b3fe1d8655e..fd0b6b620b53 100644
 > --- a/drivers/ufs/core/ufshcd.c
 > +++ b/drivers/ufs/core/ufshcd.c
 > @@ -1455,15 +1455,14 @@ static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba, u64 timeout_us)
 >   static void ufshcd_clock_scaling_unprepare(struct ufs_hba *hba, int err)
 >   {
 >         up_write(&hba->clk_scaling_lock);
 > -
 > +       mutex_unlock(&hba->wb_mutex);
 > +       blk_mq_unquiesce_tagset(&hba->host->tag_set);
 > +       mutex_unlock(&hba->host->scan_mutex);
 > +
 >         /* Enable Write Booster if current gear requires it else disable it */
 >         if (ufshcd_enable_wb_if_scaling_up(hba) && !err)
 >                 ufshcd_wb_toggle(hba, hba->pwr_info.gear_rx >= hba->clk_scaling.wb_gear);
 >
 > -       mutex_unlock(&hba->wb_mutex);
 > -
 > -       blk_mq_unquiesce_tagset(&hba->host->tag_set);
 > -       mutex_unlock(&hba->host->scan_mutex);
 >         ufshcd_release(hba);
 >   }


Thanks,
Nitin



* Re: [PATCH] ufs: core: Fix an error handler crash
  2025-12-04 17:38 ` Nitin Rawat
@ 2025-12-04 18:26   ` Bart Van Assche
  0 siblings, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2025-12-04 18:26 UTC (permalink / raw)
  To: Nitin Rawat, Martin K . Petersen; +Cc: linux-scsi, James E.J. Bottomley

On 12/4/25 7:38 AM, Nitin Rawat wrote:
> It seems you missed sending the below patch. Both patches are required 
> to address the issue (hang and clock scaling errors), except for the UIC 
> error, which still needs to be root-caused

My shell history tells me that I tried to send the fix for the clock
scaling hang to the linux-scsi mailing list yesterday. Apparently the
patch didn't reach the linux-scsi mailing list. Maybe this was caused by
the hotel Wi-Fi. Anyway, thanks for having pointed this out. I have
resent the fix for the clock scaling hang.

Bart.


* Re: [PATCH] ufs: core: Fix an error handler crash
  2025-12-04 17:04 [PATCH] ufs: core: Fix an error handler crash Bart Van Assche
  2025-12-04 17:38 ` Nitin Rawat
@ 2025-12-05  8:31 ` Peter Wang (王信友)
  2025-12-06 15:06 ` Nitin Rawat
  2025-12-09  2:58 ` Martin K. Petersen
  3 siblings, 0 replies; 6+ messages in thread
From: Peter Wang (王信友) @ 2025-12-05  8:31 UTC (permalink / raw)
  To: bvanassche@acm.org, martin.petersen@oracle.com
  Cc: beanhuo@micron.com, quic_nguyenb@quicinc.com,
	linux-scsi@vger.kernel.org, AngeloGioacchino Del Regno,
	adrian.hunter@intel.com, avri.altman@sandisk.com,
	matthias.bgg@gmail.com, nitin.rawat@oss.qualcomm.com,
	James.Bottomley@HansenPartnership.com

On Thu, 2025-12-04 at 07:04 -1000, Bart Van Assche wrote:
> The UFS error handler may be activated before SCSI scanning has
> started and
> hence before hba->ufs_device_wlun has been set. Check the
> hba->ufs_device_wlun pointer before using it.
> 
> Cc: Peter Wang <peter.wang@mediatek.com>
> Cc: Nitin Rawat <nitin.rawat@oss.qualcomm.com>
> Fixes: e23ef4f22db3 ("scsi: ufs: core: Fix error handler host_sem
> issue")
> Fixes: f966e02ae521 ("scsi: ufs: core: Fix runtime suspend error
> deadlock")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Thanks for fixing this bug.
Reviewed-by: Peter Wang <peter.wang@mediatek.com>



* Re: [PATCH] ufs: core: Fix an error handler crash
  2025-12-04 17:04 [PATCH] ufs: core: Fix an error handler crash Bart Van Assche
  2025-12-04 17:38 ` Nitin Rawat
  2025-12-05  8:31 ` Peter Wang (王信友)
@ 2025-12-06 15:06 ` Nitin Rawat
  2025-12-09  2:58 ` Martin K. Petersen
  3 siblings, 0 replies; 6+ messages in thread
From: Nitin Rawat @ 2025-12-06 15:06 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: linux-scsi, Peter Wang, James E.J. Bottomley, Matthias Brugger,
	AngeloGioacchino Del Regno, Avri Altman, Bean Huo, Adrian Hunter,
	Bao D. Nguyen



On 12/4/2025 10:34 PM, Bart Van Assche wrote:
> The UFS error handler may be activated before SCSI scanning has started and
> hence before hba->ufs_device_wlun has been set. Check the
> hba->ufs_device_wlun pointer before using it.
> 
> Cc: Peter Wang <peter.wang@mediatek.com>
> Cc: Nitin Rawat <nitin.rawat@oss.qualcomm.com>
> Fixes: e23ef4f22db3 ("scsi: ufs: core: Fix error handler host_sem issue")
> Fixes: f966e02ae521 ("scsi: ufs: core: Fix runtime suspend error deadlock")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Nitin Rawat <nitin.rawat@oss.qualcomm.com>

Tested-by: Nitin Rawat <nitin.rawat@oss.qualcomm.com> #SM8750





* Re: [PATCH] ufs: core: Fix an error handler crash
  2025-12-04 17:04 [PATCH] ufs: core: Fix an error handler crash Bart Van Assche
                   ` (2 preceding siblings ...)
  2025-12-06 15:06 ` Nitin Rawat
@ 2025-12-09  2:58 ` Martin K. Petersen
  3 siblings, 0 replies; 6+ messages in thread
From: Martin K. Petersen @ 2025-12-09  2:58 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K . Petersen, linux-scsi, Peter Wang, Nitin Rawat,
	James E.J. Bottomley, Matthias Brugger,
	AngeloGioacchino Del Regno, Avri Altman, Bean Huo, Adrian Hunter,
	Bao D. Nguyen


Bart,

> The UFS error handler may be activated before SCSI scanning has
> started and hence before hba->ufs_device_wlun has been set. Check the
> hba->ufs_device_wlun pointer before using it.

Applied to 6.19/scsi-staging, thanks!

-- 
Martin K. Petersen

