From: Damien Le Moal <dlemoal@kernel.org>
To: linux-ide@vger.kernel.org
Cc: linux-scsi@vger.kernel.org,
"Martin K . Petersen" <martin.petersen@oracle.com>,
John Garry <john.g.garry@oracle.com>,
Rodrigo Vivi <rodrigo.vivi@intel.com>,
Paul Ausbeck <paula@soe.ucsc.edu>,
Kai-Heng Feng <kai.heng.feng@canonical.com>,
Joe Breuer <linux-kernel@jmbreuer.net>,
Geert Uytterhoeven <geert@linux-m68k.org>,
Chia-Lin Kao <acelan.kao@canonical.com>
Subject: [PATCH v6 07/23] ata: libata-scsi: Fix delayed scsi_rescan_device() execution
Date: Sat, 23 Sep 2023 09:29:16 +0900
Message-ID: <20230923002932.1082348-8-dlemoal@kernel.org>
In-Reply-To: <20230923002932.1082348-1-dlemoal@kernel.org>
Commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after
device resume") modified ata_scsi_dev_rescan() to check the scsi device
"is_suspended" power field to ensure that the scsi device associated
with an ATA device is fully resumed when scsi_rescan_device() is
executed. However, this fix is problematic as:
1) It relies on a PM internal field that should not be used without PM
device locking protection.
2) The check for is_suspended and the call to scsi_rescan_device() are
not atomic and a suspend PM event may be triggered between them,
causing scsi_rescan_device() to be called on a suspended device and
to block while holding the scsi device lock. This would deadlock a
following resume operation.
These problems can trigger PM deadlocks on resume, especially with
resume operations triggered quickly after or during suspend operations.
E.g., a simple bash script like:
for (( i=0; i<10; i++ )); do
	echo "+2" > /sys/class/rtc/rtc0/wakealarm
	echo mem > /sys/power/state
done
that triggers a resume 2 seconds after starting to suspend the system
can quickly lead to a PM deadlock that prevents the system from
resuming correctly.
Fix this by replacing the check on is_suspended with a check on the
return value given by scsi_rescan_device() as that function will fail if
called against a suspended device. Also make sure rescan tasks already
scheduled are first cancelled before suspending an ata port.
Fixes: 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after device resume")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
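Note (not part of the commit message): the control flow described above
can be modeled in plain userspace C. This is only a sketch under stated
assumptions, not the libata code. All fake_* names are hypothetical
stand-ins, the "pending" flag stands in for the delayed work item, and
the -EAGAIN value is a placeholder for whatever error
scsi_rescan_device() returns on a suspended device.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_NR_DEVS 2

/* Hypothetical stand-in for a scsi device; not the kernel structure. */
struct fake_sdev {
	bool suspended;
	int rescans;	/* number of rescans that actually ran */
};

/* Hypothetical stand-in for an ata port and its delayed rescan work. */
struct fake_port {
	bool suspended;
	bool rescan_pending;	/* models a queued scsi_rescan_task */
	struct fake_sdev devs[FAKE_NR_DEVS];
};

/*
 * Models a rescan that refuses to run on a suspended device. The error
 * value is a placeholder: the sketch only relies on it being non-zero.
 */
static int fake_rescan_device(struct fake_sdev *sdev)
{
	assert(sdev != NULL);
	if (sdev->suspended)
		return -EAGAIN;
	sdev->rescans++;
	return 0;
}

/* Models the rescan work function after this patch. */
static void fake_rescan_work(struct fake_port *ap)
{
	int ret = 0;

	ap->rescan_pending = false;
	for (size_t i = 0; i < FAKE_NR_DEVS; i++) {
		/* Port suspended after the work was queued: bail out. */
		if (ap->suspended)
			return;
		ret = fake_rescan_device(&ap->devs[i]);
		if (ret)
			break;
	}
	/* Models schedule_delayed_work() when a rescan failed. */
	if (ret)
		ap->rescan_pending = true;
}

/* Models port suspend: pending rescans are cancelled first. */
static void fake_port_suspend(struct fake_port *ap)
{
	ap->rescan_pending = false;	/* models cancel_delayed_work_sync() */
	ap->suspended = true;
	for (size_t i = 0; i < FAKE_NR_DEVS; i++)
		ap->devs[i].suspended = true;
}

/* Models port resume: the port is up, its devices resume later. */
static void fake_port_resume(struct fake_port *ap)
{
	ap->suspended = false;
	ap->rescan_pending = true;
}
```

In this model, running fake_rescan_work() after fake_port_resume() but
before the devices are marked resumed leaves the rescan pending, and
running it again once the devices are resumed completes it. The point
of the design is that the suspended state is only ever judged by the
rescan call itself, so there is no separate check that a suspend event
could slip past.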
drivers/ata/libata-core.c | 16 ++++++++++++++++
drivers/ata/libata-scsi.c | 33 +++++++++++++++------------------
2 files changed, 31 insertions(+), 18 deletions(-)
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index a0bc01606b30..092372334e92 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5168,11 +5168,27 @@ static const unsigned int ata_port_suspend_ehi = ATA_EHI_QUIET

 static void ata_port_suspend(struct ata_port *ap, pm_message_t mesg)
 {
+	/*
+	 * We are about to suspend the port, so we do not care about
+	 * scsi_rescan_device() calls scheduled by previous resume operations.
+	 * The next resume will schedule the rescan again. So cancel any rescan
+	 * that is not done yet.
+	 */
+	cancel_delayed_work_sync(&ap->scsi_rescan_task);
+
 	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, false);
 }

 static void ata_port_suspend_async(struct ata_port *ap, pm_message_t mesg)
 {
+	/*
+	 * We are about to suspend the port, so we do not care about
+	 * scsi_rescan_device() calls scheduled by previous resume operations.
+	 * The next resume will schedule the rescan again. So cancel any rescan
+	 * that is not done yet.
+	 */
+	cancel_delayed_work_sync(&ap->scsi_rescan_task);
+
 	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, true);
 }

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index a69d63e7b919..576bb51cb480 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4756,7 +4756,7 @@ void ata_scsi_dev_rescan(struct work_struct *work)
 	struct ata_link *link;
 	struct ata_device *dev;
 	unsigned long flags;
-	bool delay_rescan = false;
+	int ret = 0;

 	mutex_lock(&ap->scsi_scan_mutex);
 	spin_lock_irqsave(ap->lock, flags);
@@ -4765,37 +4765,34 @@ void ata_scsi_dev_rescan(struct work_struct *work)
 		ata_for_each_dev(dev, link, ENABLED) {
 			struct scsi_device *sdev = dev->sdev;

+			/*
+			 * If the port was suspended before this was scheduled,
+			 * bail out.
+			 */
+			if (ap->pflags & ATA_PFLAG_SUSPENDED)
+				goto unlock;
+
 			if (!sdev)
 				continue;
 			if (scsi_device_get(sdev))
 				continue;

-			/*
-			 * If the rescan work was scheduled because of a resume
-			 * event, the port is already fully resumed, but the
-			 * SCSI device may not yet be fully resumed. In such
-			 * case, executing scsi_rescan_device() may cause a
-			 * deadlock with the PM code on device_lock(). Prevent
-			 * this by giving up and retrying rescan after a short
-			 * delay.
-			 */
-			delay_rescan = sdev->sdev_gendev.power.is_suspended;
-			if (delay_rescan) {
-				scsi_device_put(sdev);
-				break;
-			}
-
 			spin_unlock_irqrestore(ap->lock, flags);
-			scsi_rescan_device(sdev);
+			ret = scsi_rescan_device(sdev);
 			scsi_device_put(sdev);
 			spin_lock_irqsave(ap->lock, flags);
+
+			if (ret)
+				goto unlock;
 		}
 	}

+unlock:
 	spin_unlock_irqrestore(ap->lock, flags);
 	mutex_unlock(&ap->scsi_scan_mutex);

-	if (delay_rescan)
+	/* Reschedule with a delay if scsi_rescan_device() returned an error */
+	if (ret)
 		schedule_delayed_work(&ap->scsi_rescan_task,
 				      msecs_to_jiffies(5));
 }
--
2.41.0