* FAILED: patch "[PATCH] ata: libata-scsi: Fix delayed scsi_rescan_device() execution" failed to apply to 6.5-stable tree
@ 2023-10-04 14:32 gregkh
2023-10-04 23:39 ` Damien Le Moal
2023-10-04 23:46 ` [PATCH 6.5.y] ata: libata-scsi: Fix delayed scsi_rescan_device() execution Damien Le Moal
0 siblings, 2 replies; 3+ messages in thread
From: gregkh @ 2023-10-04 14:32 UTC (permalink / raw)
To: dlemoal, geert+renesas, hare, martin.petersen, niklas.cassel; +Cc: stable
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 8b4d9469d0b0e553208ee6f62f2807111fde18b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2023100443-freewill-shanty-0f05@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8b4d9469d0b0e553208ee6f62f2807111fde18b9 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <dlemoal@kernel.org>
Date: Tue, 5 Sep 2023 09:06:23 +0900
Subject: [PATCH] ata: libata-scsi: Fix delayed scsi_rescan_device() execution
Commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after
device resume") modified ata_scsi_dev_rescan() to check the scsi device
"is_suspended" power field to ensure that the scsi device associated
with an ATA device is fully resumed when scsi_rescan_device() is
executed. However, this fix is problematic as:
1) It relies on a PM internal field that should not be used without PM
device locking protection.
2) The check for is_suspended and the call to scsi_rescan_device() are
not atomic and a suspend PM event may be triggered between them,
casuing scsi_rescan_device() to be called on a suspended device and
in that function blocking while holding the scsi device lock. This
would deadlock a following resume operation.
These problems can trigger PM deadlocks on resume, especially with
resume operations triggered quickly after or during suspend operations.
E.g., a simple bash script like:
for (( i=0; i<10; i++ )); do
echo "+2 > /sys/class/rtc/rtc0/wakealarm
echo mem > /sys/power/state
done
that triggers a resume 2 seconds after starting suspending a system can
quickly lead to a PM deadlock preventing the system from correctly
resuming.
Fix this by replacing the check on is_suspended with a check on the
return value given by scsi_rescan_device() as that function will fail if
called against a suspended device. Also make sure rescan tasks already
scheduled are first cancelled before suspending an ata port.
Fixes: 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after device resume")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index a0bc01606b30..092372334e92 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5168,11 +5168,27 @@ static const unsigned int ata_port_suspend_ehi = ATA_EHI_QUIET
static void ata_port_suspend(struct ata_port *ap, pm_message_t mesg)
{
+ /*
+ * We are about to suspend the port, so we do not care about
+ * scsi_rescan_device() calls scheduled by previous resume operations.
+ * The next resume will schedule the rescan again. So cancel any rescan
+ * that is not done yet.
+ */
+ cancel_delayed_work_sync(&ap->scsi_rescan_task);
+
ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, false);
}
static void ata_port_suspend_async(struct ata_port *ap, pm_message_t mesg)
{
+ /*
+ * We are about to suspend the port, so we do not care about
+ * scsi_rescan_device() calls scheduled by previous resume operations.
+ * The next resume will schedule the rescan again. So cancel any rescan
+ * that is not done yet.
+ */
+ cancel_delayed_work_sync(&ap->scsi_rescan_task);
+
ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, true);
}
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index a0e58d22d222..6850cac803c1 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4756,7 +4756,7 @@ void ata_scsi_dev_rescan(struct work_struct *work)
struct ata_link *link;
struct ata_device *dev;
unsigned long flags;
- bool delay_rescan = false;
+ int ret = 0;
mutex_lock(&ap->scsi_scan_mutex);
spin_lock_irqsave(ap->lock, flags);
@@ -4765,37 +4765,34 @@ void ata_scsi_dev_rescan(struct work_struct *work)
ata_for_each_dev(dev, link, ENABLED) {
struct scsi_device *sdev = dev->sdev;
+ /*
+ * If the port was suspended before this was scheduled,
+ * bail out.
+ */
+ if (ap->pflags & ATA_PFLAG_SUSPENDED)
+ goto unlock;
+
if (!sdev)
continue;
if (scsi_device_get(sdev))
continue;
- /*
- * If the rescan work was scheduled because of a resume
- * event, the port is already fully resumed, but the
- * SCSI device may not yet be fully resumed. In such
- * case, executing scsi_rescan_device() may cause a
- * deadlock with the PM code on device_lock(). Prevent
- * this by giving up and retrying rescan after a short
- * delay.
- */
- delay_rescan = sdev->sdev_gendev.power.is_suspended;
- if (delay_rescan) {
- scsi_device_put(sdev);
- break;
- }
-
spin_unlock_irqrestore(ap->lock, flags);
- scsi_rescan_device(sdev);
+ ret = scsi_rescan_device(sdev);
scsi_device_put(sdev);
spin_lock_irqsave(ap->lock, flags);
+
+ if (ret)
+ goto unlock;
}
}
+unlock:
spin_unlock_irqrestore(ap->lock, flags);
mutex_unlock(&ap->scsi_scan_mutex);
- if (delay_rescan)
+ /* Reschedule with a delay if scsi_rescan_device() returned an error */
+ if (ret)
schedule_delayed_work(&ap->scsi_rescan_task,
msecs_to_jiffies(5));
}
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: FAILED: patch "[PATCH] ata: libata-scsi: Fix delayed scsi_rescan_device() execution" failed to apply to 6.5-stable tree
2023-10-04 14:32 FAILED: patch "[PATCH] ata: libata-scsi: Fix delayed scsi_rescan_device() execution" failed to apply to 6.5-stable tree gregkh
@ 2023-10-04 23:39 ` Damien Le Moal
2023-10-04 23:46 ` [PATCH 6.5.y] ata: libata-scsi: Fix delayed scsi_rescan_device() execution Damien Le Moal
1 sibling, 0 replies; 3+ messages in thread
From: Damien Le Moal @ 2023-10-04 23:39 UTC (permalink / raw)
To: gregkh, geert+renesas, hare, martin.petersen, niklas.cassel; +Cc: stable
On 10/4/23 23:32, gregkh@linuxfoundation.org wrote:
>
> The patch below does not apply to the 6.5-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@vger.kernel.org>.
>
> To reproduce the conflict and resubmit, you may use the following commands:
>
> git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
> git checkout FETCH_HEAD
> git cherry-pick -x 8b4d9469d0b0e553208ee6f62f2807111fde18b9
> # <resolve conflicts, build, test, etc.>
> git commit -s
> git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2023100443-freewill-shanty-0f05@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
>
> Possible dependencies:
commit ff48b37802e5c134e2dfc4d091f10b2eb5065a72
scsi: Do not attempt to rescan suspended devices
is needed to backport this one.
>
>
>
> thanks,
>
> greg k-h
>
> ------------------ original commit in Linus's tree ------------------
>
> From 8b4d9469d0b0e553208ee6f62f2807111fde18b9 Mon Sep 17 00:00:00 2001
> From: Damien Le Moal <dlemoal@kernel.org>
> Date: Tue, 5 Sep 2023 09:06:23 +0900
> Subject: [PATCH] ata: libata-scsi: Fix delayed scsi_rescan_device() execution
>
> Commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after
> device resume") modified ata_scsi_dev_rescan() to check the scsi device
> "is_suspended" power field to ensure that the scsi device associated
> with an ATA device is fully resumed when scsi_rescan_device() is
> executed. However, this fix is problematic as:
> 1) It relies on a PM internal field that should not be used without PM
> device locking protection.
> 2) The check for is_suspended and the call to scsi_rescan_device() are
> not atomic and a suspend PM event may be triggered between them,
> casuing scsi_rescan_device() to be called on a suspended device and
> in that function blocking while holding the scsi device lock. This
> would deadlock a following resume operation.
> These problems can trigger PM deadlocks on resume, especially with
> resume operations triggered quickly after or during suspend operations.
> E.g., a simple bash script like:
>
> for (( i=0; i<10; i++ )); do
> echo "+2 > /sys/class/rtc/rtc0/wakealarm
> echo mem > /sys/power/state
> done
>
> that triggers a resume 2 seconds after starting suspending a system can
> quickly lead to a PM deadlock preventing the system from correctly
> resuming.
>
> Fix this by replacing the check on is_suspended with a check on the
> return value given by scsi_rescan_device() as that function will fail if
> called against a suspended device. Also make sure rescan tasks already
> scheduled are first cancelled before suspending an ata port.
>
> Fixes: 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after device resume")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index a0bc01606b30..092372334e92 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -5168,11 +5168,27 @@ static const unsigned int ata_port_suspend_ehi = ATA_EHI_QUIET
>
> static void ata_port_suspend(struct ata_port *ap, pm_message_t mesg)
> {
> + /*
> + * We are about to suspend the port, so we do not care about
> + * scsi_rescan_device() calls scheduled by previous resume operations.
> + * The next resume will schedule the rescan again. So cancel any rescan
> + * that is not done yet.
> + */
> + cancel_delayed_work_sync(&ap->scsi_rescan_task);
> +
> ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, false);
> }
>
> static void ata_port_suspend_async(struct ata_port *ap, pm_message_t mesg)
> {
> + /*
> + * We are about to suspend the port, so we do not care about
> + * scsi_rescan_device() calls scheduled by previous resume operations.
> + * The next resume will schedule the rescan again. So cancel any rescan
> + * that is not done yet.
> + */
> + cancel_delayed_work_sync(&ap->scsi_rescan_task);
> +
> ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, true);
> }
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index a0e58d22d222..6850cac803c1 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -4756,7 +4756,7 @@ void ata_scsi_dev_rescan(struct work_struct *work)
> struct ata_link *link;
> struct ata_device *dev;
> unsigned long flags;
> - bool delay_rescan = false;
> + int ret = 0;
>
> mutex_lock(&ap->scsi_scan_mutex);
> spin_lock_irqsave(ap->lock, flags);
> @@ -4765,37 +4765,34 @@ void ata_scsi_dev_rescan(struct work_struct *work)
> ata_for_each_dev(dev, link, ENABLED) {
> struct scsi_device *sdev = dev->sdev;
>
> + /*
> + * If the port was suspended before this was scheduled,
> + * bail out.
> + */
> + if (ap->pflags & ATA_PFLAG_SUSPENDED)
> + goto unlock;
> +
> if (!sdev)
> continue;
> if (scsi_device_get(sdev))
> continue;
>
> - /*
> - * If the rescan work was scheduled because of a resume
> - * event, the port is already fully resumed, but the
> - * SCSI device may not yet be fully resumed. In such
> - * case, executing scsi_rescan_device() may cause a
> - * deadlock with the PM code on device_lock(). Prevent
> - * this by giving up and retrying rescan after a short
> - * delay.
> - */
> - delay_rescan = sdev->sdev_gendev.power.is_suspended;
> - if (delay_rescan) {
> - scsi_device_put(sdev);
> - break;
> - }
> -
> spin_unlock_irqrestore(ap->lock, flags);
> - scsi_rescan_device(sdev);
> + ret = scsi_rescan_device(sdev);
> scsi_device_put(sdev);
> spin_lock_irqsave(ap->lock, flags);
> +
> + if (ret)
> + goto unlock;
> }
> }
>
> +unlock:
> spin_unlock_irqrestore(ap->lock, flags);
> mutex_unlock(&ap->scsi_scan_mutex);
>
> - if (delay_rescan)
> + /* Reschedule with a delay if scsi_rescan_device() returned an error */
> + if (ret)
> schedule_delayed_work(&ap->scsi_rescan_task,
> msecs_to_jiffies(5));
> }
>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH 6.5.y] ata: libata-scsi: Fix delayed scsi_rescan_device() execution
2023-10-04 14:32 FAILED: patch "[PATCH] ata: libata-scsi: Fix delayed scsi_rescan_device() execution" failed to apply to 6.5-stable tree gregkh
2023-10-04 23:39 ` Damien Le Moal
@ 2023-10-04 23:46 ` Damien Le Moal
1 sibling, 0 replies; 3+ messages in thread
From: Damien Le Moal @ 2023-10-04 23:46 UTC (permalink / raw)
To: stable
Commit 8b4d9469d0b0e553208ee6f62f2807111fde18b9 upstream.
Commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after
device resume") modified ata_scsi_dev_rescan() to check the scsi device
"is_suspended" power field to ensure that the scsi device associated
with an ATA device is fully resumed when scsi_rescan_device() is
executed. However, this fix is problematic as:
1) It relies on a PM internal field that should not be used without PM
device locking protection.
2) The check for is_suspended and the call to scsi_rescan_device() are
not atomic and a suspend PM event may be triggered between them,
casuing scsi_rescan_device() to be called on a suspended device and
in that function blocking while holding the scsi device lock. This
would deadlock a following resume operation.
These problems can trigger PM deadlocks on resume, especially with
resume operations triggered quickly after or during suspend operations.
E.g., a simple bash script like:
for (( i=0; i<10; i++ )); do
echo "+2 > /sys/class/rtc/rtc0/wakealarm
echo mem > /sys/power/state
done
that triggers a resume 2 seconds after starting suspending a system can
quickly lead to a PM deadlock preventing the system from correctly
resuming.
Fix this by replacing the check on is_suspended with a check on the
return value given by scsi_rescan_device() as that function will fail if
called against a suspended device. Also make sure rescan tasks already
scheduled are first cancelled before suspending an ata port.
Fixes: 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after device resume")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8b4d9469d0b0e553208ee6f62f2807111fde18b9)
---
drivers/ata/libata-core.c | 16 ++++++++++++++++
drivers/ata/libata-scsi.c | 33 +++++++++++++++------------------
2 files changed, 31 insertions(+), 18 deletions(-)
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 79d02eb4e479..397a06bcc0cc 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5245,11 +5245,27 @@ static const unsigned int ata_port_suspend_ehi = ATA_EHI_QUIET
static void ata_port_suspend(struct ata_port *ap, pm_message_t mesg)
{
+ /*
+ * We are about to suspend the port, so we do not care about
+ * scsi_rescan_device() calls scheduled by previous resume operations.
+ * The next resume will schedule the rescan again. So cancel any rescan
+ * that is not done yet.
+ */
+ cancel_delayed_work_sync(&ap->scsi_rescan_task);
+
ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, false);
}
static void ata_port_suspend_async(struct ata_port *ap, pm_message_t mesg)
{
+ /*
+ * We are about to suspend the port, so we do not care about
+ * scsi_rescan_device() calls scheduled by previous resume operations.
+ * The next resume will schedule the rescan again. So cancel any rescan
+ * that is not done yet.
+ */
+ cancel_delayed_work_sync(&ap->scsi_rescan_task);
+
ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, true);
}
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index c6ece32de8e3..90001d265336 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4861,7 +4861,7 @@ void ata_scsi_dev_rescan(struct work_struct *work)
struct ata_link *link;
struct ata_device *dev;
unsigned long flags;
- bool delay_rescan = false;
+ int ret = 0;
mutex_lock(&ap->scsi_scan_mutex);
spin_lock_irqsave(ap->lock, flags);
@@ -4870,37 +4870,34 @@ void ata_scsi_dev_rescan(struct work_struct *work)
ata_for_each_dev(dev, link, ENABLED) {
struct scsi_device *sdev = dev->sdev;
+ /*
+ * If the port was suspended before this was scheduled,
+ * bail out.
+ */
+ if (ap->pflags & ATA_PFLAG_SUSPENDED)
+ goto unlock;
+
if (!sdev)
continue;
if (scsi_device_get(sdev))
continue;
- /*
- * If the rescan work was scheduled because of a resume
- * event, the port is already fully resumed, but the
- * SCSI device may not yet be fully resumed. In such
- * case, executing scsi_rescan_device() may cause a
- * deadlock with the PM code on device_lock(). Prevent
- * this by giving up and retrying rescan after a short
- * delay.
- */
- delay_rescan = sdev->sdev_gendev.power.is_suspended;
- if (delay_rescan) {
- scsi_device_put(sdev);
- break;
- }
-
spin_unlock_irqrestore(ap->lock, flags);
- scsi_rescan_device(&(sdev->sdev_gendev));
+ ret = scsi_rescan_device(&(sdev->sdev_gendev));
scsi_device_put(sdev);
spin_lock_irqsave(ap->lock, flags);
+
+ if (ret)
+ goto unlock;
}
}
+unlock:
spin_unlock_irqrestore(ap->lock, flags);
mutex_unlock(&ap->scsi_scan_mutex);
- if (delay_rescan)
+ /* Reschedule with a delay if scsi_rescan_device() returned an error */
+ if (ret)
schedule_delayed_work(&ap->scsi_rescan_task,
msecs_to_jiffies(5));
}
--
2.41.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2023-10-04 23:46 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-04 14:32 FAILED: patch "[PATCH] ata: libata-scsi: Fix delayed scsi_rescan_device() execution" failed to apply to 6.5-stable tree gregkh
2023-10-04 23:39 ` Damien Le Moal
2023-10-04 23:46 ` [PATCH 6.5.y] ata: libata-scsi: Fix delayed scsi_rescan_device() execution Damien Le Moal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox