From: Damien Le Moal <dlemoal@kernel.org>
To: linux-ide@vger.kernel.org
Cc: linux-scsi@vger.kernel.org,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	John Garry <john.g.garry@oracle.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Paul Ausbeck <paula@soe.ucsc.edu>,
	Kai-Heng Feng <kai.heng.feng@canonical.com>,
	Joe Breuer <linux-kernel@jmbreuer.net>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Chia-Lin Kao <acelan.kao@canonical.com>
Subject: [PATCH v6 07/23] ata: libata-scsi: Fix delayed scsi_rescan_device() execution
Date: Sat, 23 Sep 2023 09:29:16 +0900
Message-ID: <20230923002932.1082348-8-dlemoal@kernel.org>
In-Reply-To: <20230923002932.1082348-1-dlemoal@kernel.org>

Commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after
device resume") modified ata_scsi_dev_rescan() to check the scsi device
"is_suspended" power field to ensure that the scsi device associated
with an ATA device is fully resumed when scsi_rescan_device() is
executed. However, this fix is problematic as:
1) It relies on a PM internal field that should not be used without PM
   device locking protection.
2) The check for is_suspended and the call to scsi_rescan_device() are
   not atomic: a suspend PM event may be triggered between them,
   causing scsi_rescan_device() to be called on a suspended device and
   to block in that function while holding the scsi device lock. This
   would deadlock a following resume operation.
These problems can trigger PM deadlocks on resume, especially with
resume operations triggered quickly after or during suspend operations.
E.g., a simple bash script like:

for (( i=0; i<10; i++ )); do
	echo +2 > /sys/class/rtc/rtc0/wakealarm
	echo mem > /sys/power/state
done

that triggers a resume 2 seconds after the system starts suspending can
quickly lead to a PM deadlock that prevents the system from resuming
correctly.

Fix this by replacing the check on is_suspended with a check on the
return value given by scsi_rescan_device() as that function will fail if
called against a suspended device. Also make sure rescan tasks already
scheduled are first cancelled before suspending an ata port.

Fixes: 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after device resume")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 drivers/ata/libata-core.c | 16 ++++++++++++++++
 drivers/ata/libata-scsi.c | 33 +++++++++++++++------------------
 2 files changed, 31 insertions(+), 18 deletions(-)
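
The control flow described in the commit message can be modeled in user
space. The sketch below is not kernel code: the model_* types and functions
are hypothetical stand-ins for struct ata_port, struct scsi_device,
scsi_rescan_device() and ata_scsi_dev_rescan(), locking is omitted, and the
error code returned for a suspended device is an assumption (the commit
message only states that the call fails). It only illustrates the "bail out
and retry on error" logic.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCSI device attached to an ATA device. */
struct model_dev {
	bool present;   /* models dev->sdev != NULL */
	bool suspended; /* models a SCSI device not yet resumed by PM */
	int rescans;    /* number of rescans that completed */
};

/* Hypothetical stand-in for an ATA port. */
struct model_port {
	bool suspended; /* models ATA_PFLAG_SUSPENDED being set */
	struct model_dev devs[2];
	size_t ndevs;
	int reschedules; /* models calls to schedule_delayed_work() */
};

/*
 * Models a rescan that fails on a suspended device instead of blocking.
 * The specific error code is an assumption made for this sketch.
 */
static int model_rescan_device(struct model_dev *d)
{
	if (d->suspended)
		return -EWOULDBLOCK;
	d->rescans++;
	return 0;
}

/*
 * Models the reworked rescan worker: stop at the first failure and ask
 * for a delayed retry; do nothing if the port itself is suspended.
 * Returns true if a retry was requested.
 */
static bool model_port_rescan(struct model_port *p)
{
	int ret = 0;

	for (size_t i = 0; i < p->ndevs; i++) {
		struct model_dev *d = &p->devs[i];

		if (p->suspended)
			break; /* the next resume schedules a new rescan */
		if (!d->present)
			continue;

		ret = model_rescan_device(d);
		if (ret)
			break;
	}

	if (ret)
		p->reschedules++; /* retry later rather than block */

	return ret != 0;
}
```

In this model, a port with one resumed and one still suspended device
rescans the first device, fails on the second and requests a retry; once
the second device is resumed, the retry completes without rescheduling.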

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index a0bc01606b30..092372334e92 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5168,11 +5168,27 @@ static const unsigned int ata_port_suspend_ehi = ATA_EHI_QUIET
 
 static void ata_port_suspend(struct ata_port *ap, pm_message_t mesg)
 {
+	/*
+	 * We are about to suspend the port, so we do not care about
+	 * scsi_rescan_device() calls scheduled by previous resume operations.
+	 * The next resume will schedule the rescan again. So cancel any rescan
+	 * that is not done yet.
+	 */
+	cancel_delayed_work_sync(&ap->scsi_rescan_task);
+
 	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, false);
 }
 
 static void ata_port_suspend_async(struct ata_port *ap, pm_message_t mesg)
 {
+	/*
+	 * We are about to suspend the port, so we do not care about
+	 * scsi_rescan_device() calls scheduled by previous resume operations.
+	 * The next resume will schedule the rescan again. So cancel any rescan
+	 * that is not done yet.
+	 */
+	cancel_delayed_work_sync(&ap->scsi_rescan_task);
+
 	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, true);
 }
 
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index a69d63e7b919..576bb51cb480 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4756,7 +4756,7 @@ void ata_scsi_dev_rescan(struct work_struct *work)
 	struct ata_link *link;
 	struct ata_device *dev;
 	unsigned long flags;
-	bool delay_rescan = false;
+	int ret = 0;
 
 	mutex_lock(&ap->scsi_scan_mutex);
 	spin_lock_irqsave(ap->lock, flags);
@@ -4765,37 +4765,34 @@ void ata_scsi_dev_rescan(struct work_struct *work)
 		ata_for_each_dev(dev, link, ENABLED) {
 			struct scsi_device *sdev = dev->sdev;
 
+			/*
+			 * If the port was suspended before this was scheduled,
+			 * bail out.
+			 */
+			if (ap->pflags & ATA_PFLAG_SUSPENDED)
+				goto unlock;
+
 			if (!sdev)
 				continue;
 			if (scsi_device_get(sdev))
 				continue;
 
-			/*
-			 * If the rescan work was scheduled because of a resume
-			 * event, the port is already fully resumed, but the
-			 * SCSI device may not yet be fully resumed. In such
-			 * case, executing scsi_rescan_device() may cause a
-			 * deadlock with the PM code on device_lock(). Prevent
-			 * this by giving up and retrying rescan after a short
-			 * delay.
-			 */
-			delay_rescan = sdev->sdev_gendev.power.is_suspended;
-			if (delay_rescan) {
-				scsi_device_put(sdev);
-				break;
-			}
-
 			spin_unlock_irqrestore(ap->lock, flags);
-			scsi_rescan_device(sdev);
+			ret = scsi_rescan_device(sdev);
 			scsi_device_put(sdev);
 			spin_lock_irqsave(ap->lock, flags);
+
+			if (ret)
+				goto unlock;
 		}
 	}
 
+unlock:
 	spin_unlock_irqrestore(ap->lock, flags);
 	mutex_unlock(&ap->scsi_scan_mutex);
 
-	if (delay_rescan)
+	/* Reschedule with a delay if scsi_rescan_device() returned an error */
+	if (ret)
 		schedule_delayed_work(&ap->scsi_rescan_task,
 				      msecs_to_jiffies(5));
 }
-- 
2.41.0

