public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* FAILED: patch "[PATCH] ata: libata-scsi: Fix delayed scsi_rescan_device() execution" failed to apply to 6.5-stable tree
@ 2023-10-04 14:32 gregkh
  2023-10-04 23:39 ` Damien Le Moal
  2023-10-04 23:46 ` [PATCH 6.5.y] ata: libata-scsi: Fix delayed scsi_rescan_device() execution Damien Le Moal
  0 siblings, 2 replies; 3+ messages in thread
From: gregkh @ 2023-10-04 14:32 UTC (permalink / raw)
  To: dlemoal, geert+renesas, hare, martin.petersen, niklas.cassel; +Cc: stable


The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.

To reproduce the conflict and resubmit, you may use the following commands:

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 8b4d9469d0b0e553208ee6f62f2807111fde18b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2023100443-freewill-shanty-0f05@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..

Possible dependencies:



thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 8b4d9469d0b0e553208ee6f62f2807111fde18b9 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <dlemoal@kernel.org>
Date: Tue, 5 Sep 2023 09:06:23 +0900
Subject: [PATCH] ata: libata-scsi: Fix delayed scsi_rescan_device() execution

Commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after
device resume") modified ata_scsi_dev_rescan() to check the scsi device
"is_suspended" power field to ensure that the scsi device associated
with an ATA device is fully resumed when scsi_rescan_device() is
executed. However, this fix is problematic as:
1) It relies on a PM internal field that should not be used without PM
   device locking protection.
2) The check for is_suspended and the call to scsi_rescan_device() are
   not atomic and a suspend PM event may be triggered between them,
   casuing scsi_rescan_device() to be called on a suspended device and
   in that function blocking while holding the scsi device lock. This
   would deadlock a following resume operation.
These problems can trigger PM deadlocks on resume, especially with
resume operations triggered quickly after or during suspend operations.
E.g., a simple bash script like:

for (( i=0; i<10; i++ )); do
	echo "+2 > /sys/class/rtc/rtc0/wakealarm
	echo mem > /sys/power/state
done

that triggers a resume 2 seconds after starting suspending a system can
quickly lead to a PM deadlock preventing the system from correctly
resuming.

Fix this by replacing the check on is_suspended with a check on the
return value given by scsi_rescan_device() as that function will fail if
called against a suspended device. Also make sure rescan tasks already
scheduled are first cancelled before suspending an ata port.

Fixes: 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after device resume")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index a0bc01606b30..092372334e92 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5168,11 +5168,27 @@ static const unsigned int ata_port_suspend_ehi = ATA_EHI_QUIET
 
 static void ata_port_suspend(struct ata_port *ap, pm_message_t mesg)
 {
+	/*
+	 * We are about to suspend the port, so we do not care about
+	 * scsi_rescan_device() calls scheduled by previous resume operations.
+	 * The next resume will schedule the rescan again. So cancel any rescan
+	 * that is not done yet.
+	 */
+	cancel_delayed_work_sync(&ap->scsi_rescan_task);
+
 	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, false);
 }
 
 static void ata_port_suspend_async(struct ata_port *ap, pm_message_t mesg)
 {
+	/*
+	 * We are about to suspend the port, so we do not care about
+	 * scsi_rescan_device() calls scheduled by previous resume operations.
+	 * The next resume will schedule the rescan again. So cancel any rescan
+	 * that is not done yet.
+	 */
+	cancel_delayed_work_sync(&ap->scsi_rescan_task);
+
 	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, true);
 }
 
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index a0e58d22d222..6850cac803c1 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4756,7 +4756,7 @@ void ata_scsi_dev_rescan(struct work_struct *work)
 	struct ata_link *link;
 	struct ata_device *dev;
 	unsigned long flags;
-	bool delay_rescan = false;
+	int ret = 0;
 
 	mutex_lock(&ap->scsi_scan_mutex);
 	spin_lock_irqsave(ap->lock, flags);
@@ -4765,37 +4765,34 @@ void ata_scsi_dev_rescan(struct work_struct *work)
 		ata_for_each_dev(dev, link, ENABLED) {
 			struct scsi_device *sdev = dev->sdev;
 
+			/*
+			 * If the port was suspended before this was scheduled,
+			 * bail out.
+			 */
+			if (ap->pflags & ATA_PFLAG_SUSPENDED)
+				goto unlock;
+
 			if (!sdev)
 				continue;
 			if (scsi_device_get(sdev))
 				continue;
 
-			/*
-			 * If the rescan work was scheduled because of a resume
-			 * event, the port is already fully resumed, but the
-			 * SCSI device may not yet be fully resumed. In such
-			 * case, executing scsi_rescan_device() may cause a
-			 * deadlock with the PM code on device_lock(). Prevent
-			 * this by giving up and retrying rescan after a short
-			 * delay.
-			 */
-			delay_rescan = sdev->sdev_gendev.power.is_suspended;
-			if (delay_rescan) {
-				scsi_device_put(sdev);
-				break;
-			}
-
 			spin_unlock_irqrestore(ap->lock, flags);
-			scsi_rescan_device(sdev);
+			ret = scsi_rescan_device(sdev);
 			scsi_device_put(sdev);
 			spin_lock_irqsave(ap->lock, flags);
+
+			if (ret)
+				goto unlock;
 		}
 	}
 
+unlock:
 	spin_unlock_irqrestore(ap->lock, flags);
 	mutex_unlock(&ap->scsi_scan_mutex);
 
-	if (delay_rescan)
+	/* Reschedule with a delay if scsi_rescan_device() returned an error */
+	if (ret)
 		schedule_delayed_work(&ap->scsi_rescan_task,
 				      msecs_to_jiffies(5));
 }


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: FAILED: patch "[PATCH] ata: libata-scsi: Fix delayed scsi_rescan_device() execution" failed to apply to 6.5-stable tree
  2023-10-04 14:32 FAILED: patch "[PATCH] ata: libata-scsi: Fix delayed scsi_rescan_device() execution" failed to apply to 6.5-stable tree gregkh
@ 2023-10-04 23:39 ` Damien Le Moal
  2023-10-04 23:46 ` [PATCH 6.5.y] ata: libata-scsi: Fix delayed scsi_rescan_device() execution Damien Le Moal
  1 sibling, 0 replies; 3+ messages in thread
From: Damien Le Moal @ 2023-10-04 23:39 UTC (permalink / raw)
  To: gregkh, geert+renesas, hare, martin.petersen, niklas.cassel; +Cc: stable

On 10/4/23 23:32, gregkh@linuxfoundation.org wrote:
> 
> The patch below does not apply to the 6.5-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@vger.kernel.org>.
> 
> To reproduce the conflict and resubmit, you may use the following commands:
> 
> git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
> git checkout FETCH_HEAD
> git cherry-pick -x 8b4d9469d0b0e553208ee6f62f2807111fde18b9
> # <resolve conflicts, build, test, etc.>
> git commit -s
> git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2023100443-freewill-shanty-0f05@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
> 
> Possible dependencies:

commit ff48b37802e5c134e2dfc4d091f10b2eb5065a72

    scsi: Do not attempt to rescan suspended devices

is needed to backport this one.

> 
> 
> 
> thanks,
> 
> greg k-h
> 
> ------------------ original commit in Linus's tree ------------------
> 
> From 8b4d9469d0b0e553208ee6f62f2807111fde18b9 Mon Sep 17 00:00:00 2001
> From: Damien Le Moal <dlemoal@kernel.org>
> Date: Tue, 5 Sep 2023 09:06:23 +0900
> Subject: [PATCH] ata: libata-scsi: Fix delayed scsi_rescan_device() execution
> 
> Commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after
> device resume") modified ata_scsi_dev_rescan() to check the scsi device
> "is_suspended" power field to ensure that the scsi device associated
> with an ATA device is fully resumed when scsi_rescan_device() is
> executed. However, this fix is problematic as:
> 1) It relies on a PM internal field that should not be used without PM
>    device locking protection.
> 2) The check for is_suspended and the call to scsi_rescan_device() are
>    not atomic and a suspend PM event may be triggered between them,
>    casuing scsi_rescan_device() to be called on a suspended device and
>    in that function blocking while holding the scsi device lock. This
>    would deadlock a following resume operation.
> These problems can trigger PM deadlocks on resume, especially with
> resume operations triggered quickly after or during suspend operations.
> E.g., a simple bash script like:
> 
> for (( i=0; i<10; i++ )); do
> 	echo "+2 > /sys/class/rtc/rtc0/wakealarm
> 	echo mem > /sys/power/state
> done
> 
> that triggers a resume 2 seconds after starting suspending a system can
> quickly lead to a PM deadlock preventing the system from correctly
> resuming.
> 
> Fix this by replacing the check on is_suspended with a check on the
> return value given by scsi_rescan_device() as that function will fail if
> called against a suspended device. Also make sure rescan tasks already
> scheduled are first cancelled before suspending an ata port.
> 
> Fixes: 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after device resume")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index a0bc01606b30..092372334e92 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -5168,11 +5168,27 @@ static const unsigned int ata_port_suspend_ehi = ATA_EHI_QUIET
>  
>  static void ata_port_suspend(struct ata_port *ap, pm_message_t mesg)
>  {
> +	/*
> +	 * We are about to suspend the port, so we do not care about
> +	 * scsi_rescan_device() calls scheduled by previous resume operations.
> +	 * The next resume will schedule the rescan again. So cancel any rescan
> +	 * that is not done yet.
> +	 */
> +	cancel_delayed_work_sync(&ap->scsi_rescan_task);
> +
>  	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, false);
>  }
>  
>  static void ata_port_suspend_async(struct ata_port *ap, pm_message_t mesg)
>  {
> +	/*
> +	 * We are about to suspend the port, so we do not care about
> +	 * scsi_rescan_device() calls scheduled by previous resume operations.
> +	 * The next resume will schedule the rescan again. So cancel any rescan
> +	 * that is not done yet.
> +	 */
> +	cancel_delayed_work_sync(&ap->scsi_rescan_task);
> +
>  	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, true);
>  }
>  
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index a0e58d22d222..6850cac803c1 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -4756,7 +4756,7 @@ void ata_scsi_dev_rescan(struct work_struct *work)
>  	struct ata_link *link;
>  	struct ata_device *dev;
>  	unsigned long flags;
> -	bool delay_rescan = false;
> +	int ret = 0;
>  
>  	mutex_lock(&ap->scsi_scan_mutex);
>  	spin_lock_irqsave(ap->lock, flags);
> @@ -4765,37 +4765,34 @@ void ata_scsi_dev_rescan(struct work_struct *work)
>  		ata_for_each_dev(dev, link, ENABLED) {
>  			struct scsi_device *sdev = dev->sdev;
>  
> +			/*
> +			 * If the port was suspended before this was scheduled,
> +			 * bail out.
> +			 */
> +			if (ap->pflags & ATA_PFLAG_SUSPENDED)
> +				goto unlock;
> +
>  			if (!sdev)
>  				continue;
>  			if (scsi_device_get(sdev))
>  				continue;
>  
> -			/*
> -			 * If the rescan work was scheduled because of a resume
> -			 * event, the port is already fully resumed, but the
> -			 * SCSI device may not yet be fully resumed. In such
> -			 * case, executing scsi_rescan_device() may cause a
> -			 * deadlock with the PM code on device_lock(). Prevent
> -			 * this by giving up and retrying rescan after a short
> -			 * delay.
> -			 */
> -			delay_rescan = sdev->sdev_gendev.power.is_suspended;
> -			if (delay_rescan) {
> -				scsi_device_put(sdev);
> -				break;
> -			}
> -
>  			spin_unlock_irqrestore(ap->lock, flags);
> -			scsi_rescan_device(sdev);
> +			ret = scsi_rescan_device(sdev);
>  			scsi_device_put(sdev);
>  			spin_lock_irqsave(ap->lock, flags);
> +
> +			if (ret)
> +				goto unlock;
>  		}
>  	}
>  
> +unlock:
>  	spin_unlock_irqrestore(ap->lock, flags);
>  	mutex_unlock(&ap->scsi_scan_mutex);
>  
> -	if (delay_rescan)
> +	/* Reschedule with a delay if scsi_rescan_device() returned an error */
> +	if (ret)
>  		schedule_delayed_work(&ap->scsi_rescan_task,
>  				      msecs_to_jiffies(5));
>  }
> 

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 6.5.y] ata: libata-scsi: Fix delayed scsi_rescan_device() execution
  2023-10-04 14:32 FAILED: patch "[PATCH] ata: libata-scsi: Fix delayed scsi_rescan_device() execution" failed to apply to 6.5-stable tree gregkh
  2023-10-04 23:39 ` Damien Le Moal
@ 2023-10-04 23:46 ` Damien Le Moal
  1 sibling, 0 replies; 3+ messages in thread
From: Damien Le Moal @ 2023-10-04 23:46 UTC (permalink / raw)
  To: stable

Commit 8b4d9469d0b0e553208ee6f62f2807111fde18b9 upstream.

Commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after
device resume") modified ata_scsi_dev_rescan() to check the scsi device
"is_suspended" power field to ensure that the scsi device associated
with an ATA device is fully resumed when scsi_rescan_device() is
executed. However, this fix is problematic as:
1) It relies on a PM internal field that should not be used without PM
   device locking protection.
2) The check for is_suspended and the call to scsi_rescan_device() are
   not atomic and a suspend PM event may be triggered between them,
   casuing scsi_rescan_device() to be called on a suspended device and
   in that function blocking while holding the scsi device lock. This
   would deadlock a following resume operation.
These problems can trigger PM deadlocks on resume, especially with
resume operations triggered quickly after or during suspend operations.
E.g., a simple bash script like:

for (( i=0; i<10; i++ )); do
	echo "+2 > /sys/class/rtc/rtc0/wakealarm
	echo mem > /sys/power/state
done

that triggers a resume 2 seconds after starting suspending a system can
quickly lead to a PM deadlock preventing the system from correctly
resuming.

Fix this by replacing the check on is_suspended with a check on the
return value given by scsi_rescan_device() as that function will fail if
called against a suspended device. Also make sure rescan tasks already
scheduled are first cancelled before suspending an ata port.

Fixes: 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after device resume")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8b4d9469d0b0e553208ee6f62f2807111fde18b9)
---
 drivers/ata/libata-core.c | 16 ++++++++++++++++
 drivers/ata/libata-scsi.c | 33 +++++++++++++++------------------
 2 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 79d02eb4e479..397a06bcc0cc 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5245,11 +5245,27 @@ static const unsigned int ata_port_suspend_ehi = ATA_EHI_QUIET
 
 static void ata_port_suspend(struct ata_port *ap, pm_message_t mesg)
 {
+	/*
+	 * We are about to suspend the port, so we do not care about
+	 * scsi_rescan_device() calls scheduled by previous resume operations.
+	 * The next resume will schedule the rescan again. So cancel any rescan
+	 * that is not done yet.
+	 */
+	cancel_delayed_work_sync(&ap->scsi_rescan_task);
+
 	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, false);
 }
 
 static void ata_port_suspend_async(struct ata_port *ap, pm_message_t mesg)
 {
+	/*
+	 * We are about to suspend the port, so we do not care about
+	 * scsi_rescan_device() calls scheduled by previous resume operations.
+	 * The next resume will schedule the rescan again. So cancel any rescan
+	 * that is not done yet.
+	 */
+	cancel_delayed_work_sync(&ap->scsi_rescan_task);
+
 	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, true);
 }
 
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index c6ece32de8e3..90001d265336 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4861,7 +4861,7 @@ void ata_scsi_dev_rescan(struct work_struct *work)
 	struct ata_link *link;
 	struct ata_device *dev;
 	unsigned long flags;
-	bool delay_rescan = false;
+	int ret = 0;
 
 	mutex_lock(&ap->scsi_scan_mutex);
 	spin_lock_irqsave(ap->lock, flags);
@@ -4870,37 +4870,34 @@ void ata_scsi_dev_rescan(struct work_struct *work)
 		ata_for_each_dev(dev, link, ENABLED) {
 			struct scsi_device *sdev = dev->sdev;
 
+			/*
+			 * If the port was suspended before this was scheduled,
+			 * bail out.
+			 */
+			if (ap->pflags & ATA_PFLAG_SUSPENDED)
+				goto unlock;
+
 			if (!sdev)
 				continue;
 			if (scsi_device_get(sdev))
 				continue;
 
-			/*
-			 * If the rescan work was scheduled because of a resume
-			 * event, the port is already fully resumed, but the
-			 * SCSI device may not yet be fully resumed. In such
-			 * case, executing scsi_rescan_device() may cause a
-			 * deadlock with the PM code on device_lock(). Prevent
-			 * this by giving up and retrying rescan after a short
-			 * delay.
-			 */
-			delay_rescan = sdev->sdev_gendev.power.is_suspended;
-			if (delay_rescan) {
-				scsi_device_put(sdev);
-				break;
-			}
-
 			spin_unlock_irqrestore(ap->lock, flags);
-			scsi_rescan_device(&(sdev->sdev_gendev));
+			ret = scsi_rescan_device(&(sdev->sdev_gendev));
 			scsi_device_put(sdev);
 			spin_lock_irqsave(ap->lock, flags);
+
+			if (ret)
+				goto unlock;
 		}
 	}
 
+unlock:
 	spin_unlock_irqrestore(ap->lock, flags);
 	mutex_unlock(&ap->scsi_scan_mutex);
 
-	if (delay_rescan)
+	/* Reschedule with a delay if scsi_rescan_device() returned an error */
+	if (ret)
 		schedule_delayed_work(&ap->scsi_rescan_task,
 				      msecs_to_jiffies(5));
 }
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-10-04 23:46 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-04 14:32 FAILED: patch "[PATCH] ata: libata-scsi: Fix delayed scsi_rescan_device() execution" failed to apply to 6.5-stable tree gregkh
2023-10-04 23:39 ` Damien Le Moal
2023-10-04 23:46 ` [PATCH 6.5.y] ata: libata-scsi: Fix delayed scsi_rescan_device() execution Damien Le Moal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox