From: Damien Le Moal <dlemoal@kernel.org>
To: linux-ide@vger.kernel.org
Cc: linux-scsi@vger.kernel.org,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	John Garry <john.g.garry@oracle.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Paul Ausbeck <paula@soe.ucsc.edu>,
	Kai-Heng Feng <kai.heng.feng@canonical.com>,
	Joe Breuer <linux-kernel@jmbreuer.net>
Subject: [PATCH v2 05/21] ata: libata-scsi: Fix delayed scsi_rescan_device() execution
Date: Tue, 12 Sep 2023 09:56:39 +0900
Message-ID: <20230912005655.368075-6-dlemoal@kernel.org>
In-Reply-To: <20230912005655.368075-1-dlemoal@kernel.org>

Commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after
device resume") modified ata_scsi_dev_rescan() to check the scsi device
"is_suspended" power field to ensure that the scsi device associated
with an ATA device is fully resumed when scsi_rescan_device() is
executed. However, this fix is problematic as:
1) It relies on a PM internal field that should not be used without PM
   device locking protection.
2) The check for is_suspended and the call to scsi_rescan_device() are
   not atomic and a suspend PM event may be triggered between them,
   causing scsi_rescan_device() to be called on a suspended device,
   resulting in that function blocking while holding the scsi device
   lock, which would deadlock a following resume operation.
These problems can trigger PM deadlocks on resume, especially with
resume operations triggered quickly after or during suspend operations.
E.g., a simple bash script like:

for (( i=0; i<10; i++ )); do
	echo +2 > /sys/class/rtc/rtc0/wakealarm
	echo mem > /sys/power/state
done

that triggers a resume 2 seconds after the system starts suspending can
quickly lead to a PM deadlock that prevents the system from resuming
correctly.
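
The check-then-act problem in point 2 above can be illustrated with a small
user-space model. This is only a sketch, not kernel code: struct model_dev,
model_suspend() and model_rescan() are hypothetical names, the pthread mutex
stands in for device_lock(), and the model assumes that the state only
changes with that lock held. The state is tested and acted upon inside one
critical section, so a suspend cannot slip in between the two steps.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the scsi device state values. */
enum model_state { MODEL_RUNNING, MODEL_SUSPENDED };

struct model_dev {
	pthread_mutex_t lock;   /* plays the role of device_lock() */
	enum model_state state; /* plays the role of sdev->sdev_state */
	int rescans;            /* number of rescans actually performed */
};

/* Suspend path of the model: the state changes only with the lock held. */
static void model_suspend(struct model_dev *d)
{
	pthread_mutex_lock(&d->lock);
	d->state = MODEL_SUSPENDED;
	pthread_mutex_unlock(&d->lock);
}

/*
 * Rescan path of the model: the state check and the rescan itself happen
 * under the same lock. If the device is not running, return an error
 * instead of blocking, and let the caller decide whether to retry.
 */
static int model_rescan(struct model_dev *d)
{
	int ret = 0;

	pthread_mutex_lock(&d->lock);
	if (d->state != MODEL_RUNNING)
		ret = -ENXIO;
	else
		d->rescans++;
	pthread_mutex_unlock(&d->lock);

	return ret;
}
```

In this model, a rescan on a running device returns 0 and is counted, while
a rescan issued after model_suspend() returns -ENXIO without doing any work.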

Fix this by replacing the check on is_suspended with a check on the scsi
device state inside scsi_rescan_device(), while holding the scsi device
lock, thus making the device rescan atomic with regard to PM operations.
Additionally, make sure that scheduled rescan tasks are first cancelled
before suspending an ata port.
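
The resulting retry policy can be summarized with another user-space sketch.
All names here (struct model_port, MODEL_PFLAG_SUSPENDED and the model_*
functions) are hypothetical; the rescan_pending flag only approximates a
queued delayed work item, and setting the suspended flag directly in
model_port_suspend() is a simplification of this model.

```c
#include <stdbool.h>

/* Hypothetical stand-in for ATA_PFLAG_SUSPENDED. */
#define MODEL_PFLAG_SUSPENDED (1u << 0)

struct model_port {
	unsigned int pflags;
	bool rescan_pending; /* models a queued rescan work item */
};

/* Retry only if the rescan failed and the port is not suspended. */
static bool model_should_retry(const struct model_port *ap, int ret)
{
	return ret != 0 && !(ap->pflags & MODEL_PFLAG_SUSPENDED);
}

/* Called when a rescan attempt finishes with result ret. */
static void model_rescan_done(struct model_port *ap, int ret)
{
	ap->rescan_pending = model_should_retry(ap, ret);
}

/* Suspend path: drop any pending rescan first, then mark the port. */
static void model_port_suspend(struct model_port *ap)
{
	ap->rescan_pending = false; /* models cancel_delayed_work_sync() */
	ap->pflags |= MODEL_PFLAG_SUSPENDED;
}
```

With this model, a failed rescan on an active port leaves a retry pending,
a successful one does not, and nothing stays pending once the port has been
suspended, since the next resume is expected to schedule the rescan again.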

Fixes: 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after device resume")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/ata/libata-core.c | 16 ++++++++++++++++
 drivers/ata/libata-scsi.c | 36 ++++++++++++++++++------------------
 drivers/scsi/scsi_scan.c  | 12 +++++++++++-
 include/scsi/scsi_host.h  |  2 +-
 4 files changed, 46 insertions(+), 20 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 0cf0caf77907..0479493e54bd 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5172,11 +5172,27 @@ static const unsigned int ata_port_suspend_ehi = ATA_EHI_QUIET
 
 static void ata_port_suspend(struct ata_port *ap, pm_message_t mesg)
 {
+	/*
+	 * We are about to suspend the port, so we do not care about
+	 * scsi_rescan_device() calls scheduled by previous resume operations.
+	 * The next resume will schedule the rescan again. So cancel any rescan
+	 * that is not done yet.
+	 */
+	cancel_delayed_work_sync(&ap->scsi_rescan_task);
+
 	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, false);
 }
 
 static void ata_port_suspend_async(struct ata_port *ap, pm_message_t mesg)
 {
+	/*
+	 * We are about to suspend the port, so we do not care about
+	 * scsi_rescan_device() calls scheduled by previous resume operations.
+	 * The next resume will schedule the rescan again. So cancel any rescan
+	 * that is not done yet.
+	 */
+	cancel_delayed_work_sync(&ap->scsi_rescan_task);
+
 	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, true);
 }
 
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 9bb1ace8bf79..f2d4460ab450 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4750,7 +4750,7 @@ void ata_scsi_dev_rescan(struct work_struct *work)
 	struct ata_link *link;
 	struct ata_device *dev;
 	unsigned long flags;
-	bool delay_rescan = false;
+	int ret = 0;
 
 	mutex_lock(&ap->scsi_scan_mutex);
 	spin_lock_irqsave(ap->lock, flags);
@@ -4759,37 +4759,37 @@ void ata_scsi_dev_rescan(struct work_struct *work)
 		ata_for_each_dev(dev, link, ENABLED) {
 			struct scsi_device *sdev = dev->sdev;
 
+			/*
+			 * If the port was suspended before this was scheduled,
+			 * bail out.
+			 */
+			if (ap->pflags & ATA_PFLAG_SUSPENDED)
+				goto unlock;
+
 			if (!sdev)
 				continue;
 			if (scsi_device_get(sdev))
 				continue;
 
-			/*
-			 * If the rescan work was scheduled because of a resume
-			 * event, the port is already fully resumed, but the
-			 * SCSI device may not yet be fully resumed. In such
-			 * case, executing scsi_rescan_device() may cause a
-			 * deadlock with the PM code on device_lock(). Prevent
-			 * this by giving up and retrying rescan after a short
-			 * delay.
-			 */
-			delay_rescan = sdev->sdev_gendev.power.is_suspended;
-			if (delay_rescan) {
-				scsi_device_put(sdev);
-				break;
-			}
-
 			spin_unlock_irqrestore(ap->lock, flags);
-			scsi_rescan_device(sdev);
+			ret = scsi_rescan_device(sdev);
 			scsi_device_put(sdev);
 			spin_lock_irqsave(ap->lock, flags);
+
+			if (ret)
+				goto unlock;
 		}
 	}
 
+unlock:
 	spin_unlock_irqrestore(ap->lock, flags);
 	mutex_unlock(&ap->scsi_scan_mutex);
 
-	if (delay_rescan)
+	/*
+	 * Reschedule with a delay if scsi_rescan_device() returned an error
+	 * and the port has not been suspended.
+	 */
+	if (ret && !(ap->pflags & ATA_PFLAG_SUSPENDED))
 		schedule_delayed_work(&ap->scsi_rescan_task,
 				      msecs_to_jiffies(5));
 }
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 52014b2d39e1..6650f63afec9 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1619,12 +1619,18 @@ int scsi_add_device(struct Scsi_Host *host, uint channel,
 }
 EXPORT_SYMBOL(scsi_add_device);
 
-void scsi_rescan_device(struct scsi_device *sdev)
+int scsi_rescan_device(struct scsi_device *sdev)
 {
 	struct device *dev = &sdev->sdev_gendev;
+	int ret = 0;
 
 	device_lock(dev);
 
+	if (sdev->sdev_state != SDEV_RUNNING) {
+		ret = -ENXIO;
+		goto unlock;
+	}
+
 	scsi_attach_vpd(sdev);
 	scsi_cdl_check(sdev);
 
@@ -1638,7 +1644,11 @@ void scsi_rescan_device(struct scsi_device *sdev)
 			drv->rescan(dev);
 		module_put(dev->driver->owner);
 	}
+
+unlock:
 	device_unlock(dev);
+
+	return ret;
 }
 EXPORT_SYMBOL(scsi_rescan_device);
 
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 49f768d0ff37..4c2dc8150c6d 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -764,7 +764,7 @@ scsi_template_proc_dir(const struct scsi_host_template *sht);
 #define scsi_template_proc_dir(sht) NULL
 #endif
 extern void scsi_scan_host(struct Scsi_Host *);
-extern void scsi_rescan_device(struct scsi_device *);
+extern int scsi_rescan_device(struct scsi_device *sdev);
 extern void scsi_remove_host(struct Scsi_Host *);
 extern struct Scsi_Host *scsi_host_get(struct Scsi_Host *);
 extern int scsi_host_busy(struct Scsi_Host *shost);
-- 
2.41.0

