[PATCH v5 0/3] Accelerate Storage Resume (2x or more)

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v5 0/3] Accelerate Storage Resume (2x or more)
@ 2014-03-05 20:17 Dan Williams
  2014-03-05 20:17 ` [PATCH v5 1/3] libata, libsas: kill pm_result and related cleanup Dan Williams
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Dan Williams @ 2014-03-05 20:17 UTC (permalink / raw)
  To: tj, JBottomley
  Cc: Len Brown, linux-scsi, Phillip Susi, linux-ide, Alan Stern,
	Todd Brandt, Todd Brandt

It is not everyday that a kernel operation gets 2x faster.  Nice find
by Todd and his AnalyzeSuspend tool [1].

Todd is on vacation so I am taking care of these patches to make sure
they get in the queue for 3.15.

The significant changes from Todd's last submission [2]:

1/ Split out the pure cleanup into its own patch, and expand it to clean
   up libsas as well

2/ Move the entirety of scsi_device resume to an async context.  As
   written, v4 of the patchset overlapped the scsi-start-stop command
   with the scsi_device_resume().  Now in v5 the queue restart is
   properly ordered with the completion of the scsi-start-stop command.

Resume completion time as reported by:

"echo devices > /sys/power/pm_test && echo mem > /sys/power/state"

BEFORE: PM: resume of devices complete after 2271.097 msecs
AFTER:  PM: resume of devices complete after 1057.404 msecs

On a system with two SSDs attached via AHCI.

--
Dan


[1]: Todd's testing showing up to 12x resume latency improvement in some
cases:
https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach

[2]: Todd's v4 patchset:
[PATCH v4 0/2] http://marc.info/?l=linux-ide&m=138984698103487&w=2
[PATCH v4 1/2] http://marc.info/?l=linux-ide&m=138984713203515&w=2
[PATCH v4 2/2] http://marc.info/?l=linux-ide&m=138984722003539&w=2

---

Dan Williams (2):
      libata, libsas: kill pm_result and related cleanup
      scsi: async sd resume

Todd Brandt (1):
      libata: async resume


 drivers/ata/libata-core.c     |   75 ++++++++++-----------------
 drivers/ata/libata-eh.c       |   13 +----
 drivers/scsi/libsas/sas_ata.c |   35 ++----------
 drivers/scsi/scsi.c           |    3 +
 drivers/scsi/scsi_pm.c        |  115 ++++++++++++++++++++++++++++++++---------
 drivers/scsi/scsi_priv.h      |    1 
 drivers/scsi/sd.c             |    1 
 include/linux/libata.h        |    9 +--
 include/scsi/libsas.h         |    1 
 9 files changed, 134 insertions(+), 119 deletions(-)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v5 1/3] libata, libsas: kill pm_result and related cleanup
  2014-03-05 20:17 [PATCH v5 0/3] Accelerate Storage Resume (2x or more) Dan Williams
@ 2014-03-05 20:17 ` Dan Williams
  2014-03-10 20:29   ` Tejun Heo
  2014-03-05 20:17 ` [PATCH v5 2/3] libata: async resume Dan Williams
  2014-03-05 20:17 ` [PATCH v5 3/3] scsi: async sd resume Dan Williams
  2 siblings, 1 reply; 28+ messages in thread
From: Dan Williams @ 2014-03-05 20:17 UTC (permalink / raw)
  To: tj, JBottomley
  Cc: Phillip Susi, Todd Brandt, Alan Stern, linux-scsi, linux-ide

Tejun says:
  "At least for libata, worrying about suspend/resume failures don't make
   whole lot of sense.  If suspend failed, just proceed with suspend.  If
   the device can't be woken up afterwards, that's that.  There isn't
   anything we could have done differently anyway.  The same for resume, if
   spinup fails, the device is dud and the following commands will invoke
   EH actions and will eventually fail.  Again, there really isn't any
   *choice* to make.  Just making sure the errors are handled gracefully
   (ie. don't crash) and the following commands are handled correctly
   should be enough."

The only libata user that actually cares about the result from a suspend
operation is libsas.  However, it only cares about whether queuing a new
operation collides with an in-flight one.  All libsas does with the
error is retry, but we can just let libata wait for the previous
operation before continuing.

While we're at it take an opportunity to replace helper functions that
just do a to_ata_port(dev) conversion with macros whose names readily
distinguish sync vs async (queued) operations.

Reference: http://marc.info/?l=linux-scsi&m=138995409532286&w=2

Cc: Phillip Susi <psusi@ubuntu.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/ata/libata-core.c     |   75 +++++++++++++++--------------------------
 drivers/ata/libata-eh.c       |   13 +------
 drivers/scsi/libsas/sas_ata.c |   35 +++----------------
 include/linux/libata.h        |    9 ++---
 include/scsi/libsas.h         |    1 -
 5 files changed, 39 insertions(+), 94 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 1a3dbd1b196e..0f47436c714c 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5351,22 +5351,17 @@ bool ata_link_offline(struct ata_link *link)
 }
 
 #ifdef CONFIG_PM
-static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
-			       unsigned int action, unsigned int ehi_flags,
-			       int *async)
+static void ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
+				unsigned int action, unsigned int ehi_flags,
+				bool async)
 {
 	struct ata_link *link;
 	unsigned long flags;
-	int rc = 0;
 
 	/* Previous resume operation might still be in
 	 * progress.  Wait for PM_PENDING to clear.
 	 */
 	if (ap->pflags & ATA_PFLAG_PM_PENDING) {
-		if (async) {
-			*async = -EAGAIN;
-			return 0;
-		}
 		ata_port_wait_eh(ap);
 		WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
 	}
@@ -5375,11 +5370,6 @@ static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
 	spin_lock_irqsave(ap->lock, flags);
 
 	ap->pm_mesg = mesg;
-	if (async)
-		ap->pm_result = async;
-	else
-		ap->pm_result = &rc;
-
 	ap->pflags |= ATA_PFLAG_PM_PENDING;
 	ata_for_each_link(link, ap, HOST_FIRST) {
 		link->eh_info.action |= action;
@@ -5390,16 +5380,13 @@ static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
 
 	spin_unlock_irqrestore(ap->lock, flags);
 
-	/* wait and check result */
 	if (!async) {
 		ata_port_wait_eh(ap);
 		WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
 	}
-
-	return rc;
 }
 
-static int __ata_port_suspend_common(struct ata_port *ap, pm_message_t mesg, int *async)
+static int ata_port_suspend_common(struct ata_port *ap, pm_message_t mesg, bool async)
 {
 	/*
 	 * On some hardware, device fails to respond after spun down
@@ -5411,59 +5398,53 @@ static int __ata_port_suspend_common(struct ata_port *ap, pm_message_t mesg, int
 	 */
 	unsigned int ehi_flags = ATA_EHI_QUIET | ATA_EHI_NO_AUTOPSY |
 				 ATA_EHI_NO_RECOVERY;
-	return ata_port_request_pm(ap, mesg, 0, ehi_flags, async);
+	ata_port_request_pm(ap, mesg, 0, ehi_flags, async);
+	return 0;
 }
 
-static int ata_port_suspend_common(struct device *dev, pm_message_t mesg)
-{
-	struct ata_port *ap = to_ata_port(dev);
-
-	return __ata_port_suspend_common(ap, mesg, NULL);
-}
+#define ata_port_suspend_sync(ap, msg) ata_port_suspend_common((ap), (msg), false)
+#define queue_ata_port_suspend(ap, msg) ata_port_suspend_common((ap), (msg), true)
 
 static int ata_port_suspend(struct device *dev)
 {
+	struct ata_port *ap = to_ata_port(dev);
+
 	if (pm_runtime_suspended(dev))
 		return 0;
 
-	return ata_port_suspend_common(dev, PMSG_SUSPEND);
+	return ata_port_suspend_sync(ap, PMSG_SUSPEND);
 }
 
 static int ata_port_do_freeze(struct device *dev)
 {
+	struct ata_port *ap = to_ata_port(dev);
+
 	if (pm_runtime_suspended(dev))
 		return 0;
 
-	return ata_port_suspend_common(dev, PMSG_FREEZE);
+	return ata_port_suspend_sync(ap, PMSG_FREEZE);
 }
 
 static int ata_port_poweroff(struct device *dev)
 {
-	return ata_port_suspend_common(dev, PMSG_HIBERNATE);
+	return ata_port_suspend_sync(to_ata_port(dev), PMSG_HIBERNATE);
 }
 
-static int __ata_port_resume_common(struct ata_port *ap, pm_message_t mesg,
-				    int *async)
+static int ata_port_resume_common(struct ata_port *ap, pm_message_t mesg, bool async)
 {
-	int rc;
-
-	rc = ata_port_request_pm(ap, mesg, ATA_EH_RESET,
-		ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET, async);
-	return rc;
+	ata_port_request_pm(ap, mesg, ATA_EH_RESET,
+			    ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET, async);
+	return 0;
 }
 
-static int ata_port_resume_common(struct device *dev, pm_message_t mesg)
-{
-	struct ata_port *ap = to_ata_port(dev);
-
-	return __ata_port_resume_common(ap, mesg, NULL);
-}
+#define ata_port_resume_sync(ap, msg) ata_port_resume_common((ap), (msg), false)
+#define queue_ata_port_resume(ap, msg) ata_port_resume_common((ap), (msg), true)
 
 static int ata_port_resume(struct device *dev)
 {
 	int rc;
 
-	rc = ata_port_resume_common(dev, PMSG_RESUME);
+	rc = ata_port_resume_sync(to_ata_port(dev), PMSG_RESUME);
 	if (!rc) {
 		pm_runtime_disable(dev);
 		pm_runtime_set_active(dev);
@@ -5499,12 +5480,12 @@ static int ata_port_runtime_idle(struct device *dev)
 
 static int ata_port_runtime_suspend(struct device *dev)
 {
-	return ata_port_suspend_common(dev, PMSG_AUTO_SUSPEND);
+	return ata_port_suspend_sync(to_ata_port(dev), PMSG_AUTO_SUSPEND);
 }
 
 static int ata_port_runtime_resume(struct device *dev)
 {
-	return ata_port_resume_common(dev, PMSG_AUTO_RESUME);
+	return ata_port_resume_sync(to_ata_port(dev), PMSG_AUTO_RESUME);
 }
 
 static const struct dev_pm_ops ata_port_pm_ops = {
@@ -5525,15 +5506,15 @@ static const struct dev_pm_ops ata_port_pm_ops = {
  * level. sas suspend/resume is async to allow parallel port recovery
  * since sas has multiple ata_port instances per Scsi_Host.
  */
-int ata_sas_port_async_suspend(struct ata_port *ap, int *async)
+int ata_sas_port_async_suspend(struct ata_port *ap)
 {
-	return __ata_port_suspend_common(ap, PMSG_SUSPEND, async);
+	return queue_ata_port_suspend(ap, PMSG_SUSPEND);
 }
 EXPORT_SYMBOL_GPL(ata_sas_port_async_suspend);
 
-int ata_sas_port_async_resume(struct ata_port *ap, int *async)
+int ata_sas_port_async_resume(struct ata_port *ap)
 {
-	return __ata_port_resume_common(ap, PMSG_RESUME, async);
+	return queue_ata_port_resume(ap, PMSG_RESUME);
 }
 EXPORT_SYMBOL_GPL(ata_sas_port_async_resume);
 
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 6d8757008318..b4d0596f30a8 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -4069,7 +4069,7 @@ static void ata_eh_handle_port_suspend(struct ata_port *ap)
 
 	ata_acpi_set_state(ap, ap->pm_mesg);
  out:
-	/* report result */
+	/* update the flags */
 	spin_lock_irqsave(ap->lock, flags);
 
 	ap->pflags &= ~ATA_PFLAG_PM_PENDING;
@@ -4078,11 +4078,6 @@ static void ata_eh_handle_port_suspend(struct ata_port *ap)
 	else if (ap->pflags & ATA_PFLAG_FROZEN)
 		ata_port_schedule_eh(ap);
 
-	if (ap->pm_result) {
-		*ap->pm_result = rc;
-		ap->pm_result = NULL;
-	}
-
 	spin_unlock_irqrestore(ap->lock, flags);
 
 	return;
@@ -4134,13 +4129,9 @@ static void ata_eh_handle_port_resume(struct ata_port *ap)
 	/* tell ACPI that we're resuming */
 	ata_acpi_on_resume(ap);
 
-	/* report result */
+	/* update the flags */
 	spin_lock_irqsave(ap->lock, flags);
 	ap->pflags &= ~(ATA_PFLAG_PM_PENDING | ATA_PFLAG_SUSPENDED);
-	if (ap->pm_result) {
-		*ap->pm_result = rc;
-		ap->pm_result = NULL;
-	}
 	spin_unlock_irqrestore(ap->lock, flags);
 }
 #endif /* CONFIG_PM */
diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index d2895836f9fa..9a2af18fd843 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -700,46 +700,26 @@ void sas_probe_sata(struct asd_sas_port *port)
 
 }
 
-static bool sas_ata_flush_pm_eh(struct asd_sas_port *port, const char *func)
+static void sas_ata_flush_pm_eh(struct asd_sas_port *port, const char *func)
 {
 	struct domain_device *dev, *n;
-	bool retry = false;
 
 	list_for_each_entry_safe(dev, n, &port->dev_list, dev_list_node) {
-		int rc;
-
 		if (!dev_is_sata(dev))
 			continue;
 
 		sas_ata_wait_eh(dev);
-		rc = dev->sata_dev.pm_result;
-		if (rc == -EAGAIN)
-			retry = true;
-		else if (rc) {
-			/* since we don't have a
-			 * ->port_{suspend|resume} routine in our
-			 *  ata_port ops, and no entanglements with
-			 *  acpi, suspend should just be mechanical trip
-			 *  through eh, catch cases where these
-			 *  assumptions are invalidated
-			 */
-			WARN_ONCE(1, "failed %s %s error: %d\n", func,
-				 dev_name(&dev->rphy->dev), rc);
-		}
 
 		/* if libata failed to power manage the device, tear it down */
 		if (ata_dev_disabled(sas_to_ata_dev(dev)))
 			sas_fail_probe(dev, func, -ENODEV);
 	}
-
-	return retry;
 }
 
 void sas_suspend_sata(struct asd_sas_port *port)
 {
 	struct domain_device *dev;
 
- retry:
 	mutex_lock(&port->ha->disco_mutex);
 	list_for_each_entry(dev, &port->dev_list, dev_list_node) {
 		struct sata_device *sata;
@@ -751,20 +731,17 @@ void sas_suspend_sata(struct asd_sas_port *port)
 		if (sata->ap->pm_mesg.event == PM_EVENT_SUSPEND)
 			continue;
 
-		sata->pm_result = -EIO;
-		ata_sas_port_async_suspend(sata->ap, &sata->pm_result);
+		ata_sas_port_async_suspend(sata->ap);
 	}
 	mutex_unlock(&port->ha->disco_mutex);
 
-	if (sas_ata_flush_pm_eh(port, __func__))
-		goto retry;
+	sas_ata_flush_pm_eh(port, __func__);
 }
 
 void sas_resume_sata(struct asd_sas_port *port)
 {
 	struct domain_device *dev;
 
- retry:
 	mutex_lock(&port->ha->disco_mutex);
 	list_for_each_entry(dev, &port->dev_list, dev_list_node) {
 		struct sata_device *sata;
@@ -776,13 +753,11 @@ void sas_resume_sata(struct asd_sas_port *port)
 		if (sata->ap->pm_mesg.event == PM_EVENT_ON)
 			continue;
 
-		sata->pm_result = -EIO;
-		ata_sas_port_async_resume(sata->ap, &sata->pm_result);
+		ata_sas_port_async_resume(sata->ap);
 	}
 	mutex_unlock(&port->ha->disco_mutex);
 
-	if (sas_ata_flush_pm_eh(port, __func__))
-		goto retry;
+	sas_ata_flush_pm_eh(port, __func__);
 }
 
 /**
diff --git a/include/linux/libata.h b/include/linux/libata.h
index bec6dbe939a0..304d023a91aa 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -848,7 +848,6 @@ struct ata_port {
 	struct completion	park_req_pending;
 
 	pm_message_t		pm_mesg;
-	int			*pm_result;
 	enum ata_lpm_policy	target_lpm_policy;
 
 	struct timer_list	fastdrain_timer;
@@ -1140,14 +1139,14 @@ extern bool ata_link_offline(struct ata_link *link);
 #ifdef CONFIG_PM
 extern int ata_host_suspend(struct ata_host *host, pm_message_t mesg);
 extern void ata_host_resume(struct ata_host *host);
-extern int ata_sas_port_async_suspend(struct ata_port *ap, int *async);
-extern int ata_sas_port_async_resume(struct ata_port *ap, int *async);
+extern int ata_sas_port_async_suspend(struct ata_port *ap);
+extern int ata_sas_port_async_resume(struct ata_port *ap);
 #else
-static inline int ata_sas_port_async_suspend(struct ata_port *ap, int *async)
+static inline int ata_sas_port_async_suspend(struct ata_port *ap)
 {
 	return 0;
 }
-static inline int ata_sas_port_async_resume(struct ata_port *ap, int *async)
+static inline int ata_sas_port_async_resume(struct ata_port *ap)
 {
 	return 0;
 }
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index f843dd8722a9..ef7872c20da9 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -172,7 +172,6 @@ struct sata_device {
         enum   ata_command_set command_set;
         struct smp_resp        rps_resp; /* report_phy_sata_resp */
         u8     port_no;        /* port number, if this is a PM (Port) */
-	int    pm_result;
 
 	struct ata_port *ap;
 	struct ata_host ata_host;


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v5 2/3] libata: async resume
  2014-03-05 20:17 [PATCH v5 0/3] Accelerate Storage Resume (2x or more) Dan Williams
  2014-03-05 20:17 ` [PATCH v5 1/3] libata, libsas: kill pm_result and related cleanup Dan Williams
@ 2014-03-05 20:17 ` Dan Williams
  2014-12-19 23:46   ` Phillip Susi
  2014-03-05 20:17 ` [PATCH v5 3/3] scsi: async sd resume Dan Williams
  2 siblings, 1 reply; 28+ messages in thread
From: Dan Williams @ 2014-03-05 20:17 UTC (permalink / raw)
  To: tj, JBottomley
  Cc: Len Brown, linux-scsi, Phillip Susi, linux-ide, Alan Stern,
	Todd Brandt

From: Todd Brandt <todd.e.brandt@linux.intel.com>

Improve overall system resume time by making libata link recovery
actions asynchronous relative to other resume events.

Link resume operations are performed using the scsi_eh thread, so
commands, particularly the sd resume start/stop command, will be held
off until the device exits error handling.  Libata already flushes eh
with ata_port_wait_eh() in the port teardown paths, so there are no
concerns with async operation colliding with the end-of-life of the
ata_port object.  Also, libata-core is already careful to flush
in-flight pm operations before another round of pm starts on the given
ata_port.

Reference: https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach

Cc: Len Brown <len.brown@intel.com>
Cc: Phillip Susi <psusi@ubuntu.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
[djbw: rebase on cleanup patch, changelog wordsmithing]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/ata/libata-core.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 0f47436c714c..7719ec7d9df9 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5444,7 +5444,7 @@ static int ata_port_resume(struct device *dev)
 {
 	int rc;
 
-	rc = ata_port_resume_sync(to_ata_port(dev), PMSG_RESUME);
+	rc = queue_ata_port_resume(to_ata_port(dev), PMSG_RESUME);
 	if (!rc) {
 		pm_runtime_disable(dev);
 		pm_runtime_set_active(dev);


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v5 3/3] scsi: async sd resume
  2014-03-05 20:17 [PATCH v5 0/3] Accelerate Storage Resume (2x or more) Dan Williams
  2014-03-05 20:17 ` [PATCH v5 1/3] libata, libsas: kill pm_result and related cleanup Dan Williams
  2014-03-05 20:17 ` [PATCH v5 2/3] libata: async resume Dan Williams
@ 2014-03-05 20:17 ` Dan Williams
  2014-03-07 15:42   ` Alan Stern
  2 siblings, 1 reply; 28+ messages in thread
From: Dan Williams @ 2014-03-05 20:17 UTC (permalink / raw)
  To: tj, JBottomley
  Cc: Len Brown, linux-scsi, Phillip Susi, linux-ide, Alan Stern,
	Todd Brandt

async_schedule() sd resume work to allow disks and other devices to
resume in parallel.

This moves the entirety of scsi_device resume to an async context to
ensure that scsi_device_resume() remains ordered with respect to the
completion of the start/stop command.  For the duration of the resume,
new command submissions (that do not originate from the scsi-core) will
be deferred (BLKPREP_DEFER).

It adds a new ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain) as a container
of these operations.  Like scsi_sd_probe_domain it is flushed at
sd_remove() time to ensure async ops do not continue past the
end-of-life of the sdev.  The implementation explicitly refrains from
reusing scsi_sd_probe_domain directly for this purpose as it is flushed
at the end of dpm_resume(), potentially defeating some of the benefit.
Given sdevs are quiesced it is permissible for these resume operations
to bleed past the async_synchronize_full() calls made by the driver
core.

We defer the resolution of which pm callback to call until
scsi_dev_type_{suspend|resume} time and guarantee that the 'int
(*cb)(struct device *)' parameter is never NULL.  With this in place the
type of resume operation is encoded in the async function identifier.

Inspired by Todd's analysis and initial proposal [2]:
https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach

Cc: Len Brown <len.brown@intel.com>
Cc: Phillip Susi <psusi@ubuntu.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Suggested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
[djbw: kick all resume work to the async queue]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/scsi.c      |    3 +
 drivers/scsi/scsi_pm.c   |  115 ++++++++++++++++++++++++++++++++++++----------
 drivers/scsi/scsi_priv.h |    1 
 drivers/scsi/sd.c        |    1 
 4 files changed, 95 insertions(+), 25 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index d8afec8317cf..990d4a302788 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -91,6 +91,9 @@ EXPORT_SYMBOL(scsi_logging_level);
 ASYNC_DOMAIN(scsi_sd_probe_domain);
 EXPORT_SYMBOL(scsi_sd_probe_domain);
 
+ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain);
+EXPORT_SYMBOL(scsi_sd_pm_domain);
+
 /* NB: These are exposed through /proc/scsi/scsi and form part of the ABI.
  * You may not alter any existing entry (although adding new ones is
  * encouraged once assigned by ANSI/INCITS T10
diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index 001e9ceda4c3..1cb211bf0383 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -18,17 +18,52 @@
 
 #ifdef CONFIG_PM_SLEEP
 
+#define do_pm_op(dev, op) \
+	dev->driver ? dev->driver->pm ? \
+	dev->driver->pm->op(dev) : 0 : 0
+
+static int do_scsi_suspend(struct device *dev)
+{
+	return do_pm_op(dev, suspend);
+}
+
+static int do_scsi_freeze(struct device *dev)
+{
+	return do_pm_op(dev, freeze);
+}
+
+static int do_scsi_poweroff(struct device *dev)
+{
+	return do_pm_op(dev, poweroff);
+}
+
+static int do_scsi_resume(struct device *dev)
+{
+	return do_pm_op(dev, resume);
+}
+
+static int do_scsi_thaw(struct device *dev)
+{
+	return do_pm_op(dev, thaw);
+}
+
+static int do_scsi_restore(struct device *dev)
+{
+	return do_pm_op(dev, restore);
+}
+
 static int scsi_dev_type_suspend(struct device *dev, int (*cb)(struct device *))
 {
 	int err;
 
+	/* flush pending in-flight resume operations, suspend is synchronous */
+	async_synchronize_full_domain(&scsi_sd_pm_domain);
+
 	err = scsi_device_quiesce(to_scsi_device(dev));
 	if (err == 0) {
-		if (cb) {
-			err = cb(dev);
-			if (err)
-				scsi_device_resume(to_scsi_device(dev));
-		}
+		err = cb(dev);
+		if (err)
+			scsi_device_resume(to_scsi_device(dev));
 	}
 	dev_dbg(dev, "scsi suspend: %d\n", err);
 	return err;
@@ -38,10 +73,16 @@ static int scsi_dev_type_resume(struct device *dev, int (*cb)(struct device *))
 {
 	int err = 0;
 
-	if (cb)
-		err = cb(dev);
+	err = cb(dev);
 	scsi_device_resume(to_scsi_device(dev));
 	dev_dbg(dev, "scsi resume: %d\n", err);
+
+	if (err == 0) {
+		pm_runtime_disable(dev);
+		pm_runtime_set_active(dev);
+		pm_runtime_enable(dev);
+	}
+
 	return err;
 }
 
@@ -66,20 +107,50 @@ scsi_bus_suspend_common(struct device *dev, int (*cb)(struct device *))
 	return err;
 }
 
+static void async_sdev_resume(void *dev, async_cookie_t cookie)
+{
+	scsi_dev_type_resume(dev, do_scsi_resume);
+}
+
+static void async_sdev_thaw(void *dev, async_cookie_t cookie)
+{
+	scsi_dev_type_resume(dev, do_scsi_thaw);
+}
+
+static void async_sdev_restore(void *dev, async_cookie_t cookie)
+{
+	scsi_dev_type_resume(dev, do_scsi_restore);
+}
+
+static async_func_t to_async_sdev_resume_fn(struct device *dev,
+		int (*cb)(struct device *))
+{
+	if (!scsi_is_sdev_device(dev))
+		return NULL;
+
+	if (cb == do_scsi_resume)
+		return async_sdev_resume;
+	else if (cb == do_scsi_thaw)
+		return async_sdev_thaw;
+	else if (cb == do_scsi_restore)
+		return async_sdev_restore;
+	else
+		return NULL;
+}
+
 static int
 scsi_bus_resume_common(struct device *dev, int (*cb)(struct device *))
 {
-	int err = 0;
-
-	if (scsi_is_sdev_device(dev))
-		err = scsi_dev_type_resume(dev, cb);
+	async_func_t fn = to_async_sdev_resume_fn(dev, cb);
 
-	if (err == 0) {
+	if (fn)
+		async_schedule_domain(fn, dev, &scsi_sd_pm_domain);
+	else {
 		pm_runtime_disable(dev);
 		pm_runtime_set_active(dev);
 		pm_runtime_enable(dev);
 	}
-	return err;
+	return 0;
 }
 
 static int scsi_bus_prepare(struct device *dev)
@@ -97,38 +168,32 @@ static int scsi_bus_prepare(struct device *dev)
 
 static int scsi_bus_suspend(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_suspend_common(dev, pm ? pm->suspend : NULL);
+	return scsi_bus_suspend_common(dev, do_scsi_suspend);
 }
 
 static int scsi_bus_resume(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_resume_common(dev, pm ? pm->resume : NULL);
+	return scsi_bus_resume_common(dev, do_scsi_resume);
 }
 
 static int scsi_bus_freeze(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_suspend_common(dev, pm ? pm->freeze : NULL);
+	return scsi_bus_suspend_common(dev, do_scsi_freeze);
 }
 
 static int scsi_bus_thaw(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_resume_common(dev, pm ? pm->thaw : NULL);
+	return scsi_bus_resume_common(dev, do_scsi_thaw);
 }
 
 static int scsi_bus_poweroff(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_suspend_common(dev, pm ? pm->poweroff : NULL);
+	return scsi_bus_suspend_common(dev, do_scsi_poweroff);
 }
 
 static int scsi_bus_restore(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_resume_common(dev, pm ? pm->restore : NULL);
+	return scsi_bus_resume_common(dev, do_scsi_restore);
 }
 
 #else /* CONFIG_PM_SLEEP */
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index f079a598bed4..a26fd44c7738 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -166,6 +166,7 @@ static inline int scsi_autopm_get_host(struct Scsi_Host *h) { return 0; }
 static inline void scsi_autopm_put_host(struct Scsi_Host *h) {}
 #endif /* CONFIG_PM_RUNTIME */
 
+extern struct async_domain scsi_sd_pm_domain;
 extern struct async_domain scsi_sd_probe_domain;
 
 /* 
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 470954aba728..700c595c603e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3020,6 +3020,7 @@ static int sd_remove(struct device *dev)
 	devt = disk_devt(sdkp->disk);
 	scsi_autopm_get_device(sdkp->device);
 
+	async_synchronize_full_domain(&scsi_sd_pm_domain);
 	async_synchronize_full_domain(&scsi_sd_probe_domain);
 	blk_queue_prep_rq(sdkp->device->request_queue, scsi_prep_fn);
 	blk_queue_unprep_rq(sdkp->device->request_queue, NULL);


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-05 20:17 ` [PATCH v5 3/3] scsi: async sd resume Dan Williams
@ 2014-03-07 15:42   ` Alan Stern
  2014-03-08  0:41     ` Dan Williams
  2014-03-08  2:52     ` Dan Williams
  0 siblings, 2 replies; 28+ messages in thread
From: Alan Stern @ 2014-03-07 15:42 UTC (permalink / raw)
  To: Dan Williams
  Cc: tj, JBottomley, Len Brown, linux-scsi, Phillip Susi, linux-ide,
	Todd Brandt

On Wed, 5 Mar 2014, Dan Williams wrote:

> async_schedule() sd resume work to allow disks and other devices to
> resume in parallel.
> 
> This moves the entirety of scsi_device resume to an async context to
> ensure that scsi_device_resume() remains ordered with respect to the
> completion of the start/stop command.  For the duration of the resume,
> new command submissions (that do not originate from the scsi-core) will
> be deferred (BLKPREP_DEFER).
> 
> It adds a new ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain) as a container
> of these operations.  Like scsi_sd_probe_domain it is flushed at
> sd_remove() time to ensure async ops do not continue past the
> end-of-life of the sdev.  The implementation explicitly refrains from
> reusing scsi_sd_probe_domain directly for this purpose as it is flushed
> at the end of dpm_resume(), potentially defeating some of the benefit.
> Given sdevs are quiesced it is permissible for these resume operations
> to bleed past the async_synchronize_full() calls made by the driver
> core.
> 
> We defer the resolution of which pm callback to call until
> scsi_dev_type_{suspend|resume} time and guarantee that the 'int
> (*cb)(struct device *)' parameter is never NULL.  With this in place the
> type of resume operation is encoded in the async function identifier.
> 
> Inspired by Todd's analysis and initial proposal [2]:
> https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach
> 
> Cc: Len Brown <len.brown@intel.com>
> Cc: Phillip Susi <psusi@ubuntu.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Suggested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
> [djbw: kick all resume work to the async queue]
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---

> --- a/drivers/scsi/scsi_pm.c
> +++ b/drivers/scsi/scsi_pm.c
> @@ -18,17 +18,52 @@
>  
>  #ifdef CONFIG_PM_SLEEP
>  
> +#define do_pm_op(dev, op) \
> +	dev->driver ? dev->driver->pm ? \
> +	dev->driver->pm->op(dev) : 0 : 0

This will crash if dev->driver->pm->op is NULL.  How about making it 
easier to read, too?

#define RETURN_PM_OP(dev, op)						\
	if (dev->driver && dev->driver->pm && dev->driver->pm->op) 	\
		return dev->driver->pm->op(dev);			\
	else								\
		return 0

static int do_scsi_suspend(struct device *dev)
{
	RETURM_PM_OP(dev, suspend);
}

etc.

Alternatively, you could put the "dev->driver && dev->driver->pm" part
of the test directly into scsi_dev_type_suspend and
scsi_dev_type_resume, to save a little code space.  Then the original 
macro formulation would become sufficiently simple: one test and one 
function call.

> +static int do_scsi_suspend(struct device *dev)
> +{
> +	return do_pm_op(dev, suspend);
> +}
> +
> +static int do_scsi_freeze(struct device *dev)
> +{
> +	return do_pm_op(dev, freeze);
> +}
> +
> +static int do_scsi_poweroff(struct device *dev)
> +{
> +	return do_pm_op(dev, poweroff);
> +}
> +
> +static int do_scsi_resume(struct device *dev)
> +{
> +	return do_pm_op(dev, resume);
> +}
> +
> +static int do_scsi_thaw(struct device *dev)
> +{
> +	return do_pm_op(dev, thaw);
> +}
> +
> +static int do_scsi_restore(struct device *dev)
> +{
> +	return do_pm_op(dev, restore);
> +}
> +
>  static int scsi_dev_type_suspend(struct device *dev, int (*cb)(struct device *))
>  {
>  	int err;
>  
> +	/* flush pending in-flight resume operations, suspend is synchronous */
> +	async_synchronize_full_domain(&scsi_sd_pm_domain);
> +
>  	err = scsi_device_quiesce(to_scsi_device(dev));
>  	if (err == 0) {
> -		if (cb) {
> -			err = cb(dev);
> -			if (err)
> -				scsi_device_resume(to_scsi_device(dev));
> -		}
> +		err = cb(dev);
> +		if (err)
> +			scsi_device_resume(to_scsi_device(dev));
>  	}
>  	dev_dbg(dev, "scsi suspend: %d\n", err);
>  	return err;
> @@ -38,10 +73,16 @@ static int scsi_dev_type_resume(struct device *dev, int (*cb)(struct device *))
>  {
>  	int err = 0;
>  
> -	if (cb)
> -		err = cb(dev);
> +	err = cb(dev);
>  	scsi_device_resume(to_scsi_device(dev));
>  	dev_dbg(dev, "scsi resume: %d\n", err);
> +
> +	if (err == 0) {
> +		pm_runtime_disable(dev);
> +		pm_runtime_set_active(dev);
> +		pm_runtime_enable(dev);
> +	}
> +
>  	return err;
>  }
>  
> @@ -66,20 +107,50 @@ scsi_bus_suspend_common(struct device *dev, int (*cb)(struct device *))
>  	return err;
>  }
>  
> +static void async_sdev_resume(void *dev, async_cookie_t cookie)
> +{
> +	scsi_dev_type_resume(dev, do_scsi_resume);
> +}
> +
> +static void async_sdev_thaw(void *dev, async_cookie_t cookie)
> +{
> +	scsi_dev_type_resume(dev, do_scsi_thaw);
> +}
> +
> +static void async_sdev_restore(void *dev, async_cookie_t cookie)
> +{
> +	scsi_dev_type_resume(dev, do_scsi_restore);
> +}
> +
> +static async_func_t to_async_sdev_resume_fn(struct device *dev,
> +		int (*cb)(struct device *))
> +{
> +	if (!scsi_is_sdev_device(dev))
> +		return NULL;
> +
> +	if (cb == do_scsi_resume)
> +		return async_sdev_resume;
> +	else if (cb == do_scsi_thaw)
> +		return async_sdev_thaw;
> +	else if (cb == do_scsi_restore)
> +		return async_sdev_restore;
> +	else
> +		return NULL;
> +}

Given that this function is used in only one place; I would put it 
inline.

Alan Stern


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-07 15:42   ` Alan Stern
@ 2014-03-08  0:41     ` Dan Williams
  2014-03-08  2:52     ` Dan Williams
  1 sibling, 0 replies; 28+ messages in thread
From: Dan Williams @ 2014-03-08  0:41 UTC (permalink / raw)
  To: Alan Stern
  Cc: Tejun Heo, James Bottomley, Len Brown, linux-scsi, Phillip Susi,
	IDE/ATA development list, Todd Brandt

On Fri, Mar 7, 2014 at 7:42 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Wed, 5 Mar 2014, Dan Williams wrote:
>
>> async_schedule() sd resume work to allow disks and other devices to
>> resume in parallel.
>>
>> This moves the entirety of scsi_device resume to an async context to
>> ensure that scsi_device_resume() remains ordered with respect to the
>> completion of the start/stop command.  For the duration of the resume,
>> new command submissions (that do not originate from the scsi-core) will
>> be deferred (BLKPREP_DEFER).
>>
>> It adds a new ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain) as a container
>> of these operations.  Like scsi_sd_probe_domain it is flushed at
>> sd_remove() time to ensure async ops do not continue past the
>> end-of-life of the sdev.  The implementation explicitly refrains from
>> reusing scsi_sd_probe_domain directly for this purpose as it is flushed
>> at the end of dpm_resume(), potentially defeating some of the benefit.
>> Given sdevs are quiesced it is permissible for these resume operations
>> to bleed past the async_synchronize_full() calls made by the driver
>> core.
>>
>> We defer the resolution of which pm callback to call until
>> scsi_dev_type_{suspend|resume} time and guarantee that the 'int
>> (*cb)(struct device *)' parameter is never NULL.  With this in place the
>> type of resume operation is encoded in the async function identifier.
>>
>> Inspired by Todd's analysis and initial proposal [2]:
>> https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach
>>
>> Cc: Len Brown <len.brown@intel.com>
>> Cc: Phillip Susi <psusi@ubuntu.com>
>> Cc: Alan Stern <stern@rowland.harvard.edu>
>> Suggested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
>> [djbw: kick all resume work to the async queue]
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>
>> --- a/drivers/scsi/scsi_pm.c
>> +++ b/drivers/scsi/scsi_pm.c
>> @@ -18,17 +18,52 @@
>>
>>  #ifdef CONFIG_PM_SLEEP
>>
>> +#define do_pm_op(dev, op) \
>> +     dev->driver ? dev->driver->pm ? \
>> +     dev->driver->pm->op(dev) : 0 : 0
>
> This will crash if dev->driver->pm->op is NULL.  How about making it
> easier to read, too?
>
> #define RETURN_PM_OP(dev, op)                                           \
>         if (dev->driver && dev->driver->pm && dev->driver->pm->op)      \
>                 return dev->driver->pm->op(dev);                        \
>         else                                                            \
>                 return 0
>
> static int do_scsi_suspend(struct device *dev)
> {
>         RETURM_PM_OP(dev, suspend);
> }
>
> etc.
>
> Alternatively, you could put the "dev->driver && dev->driver->pm" part
> of the test directly into scsi_dev_type_suspend and
> scsi_dev_type_resume, to save a little code space.  Then the original
> macro formulation would become sufficiently simple: one test and one
> function call.

I like that better than hiding too much (especially bugs) in the macro.

>
>> +static int do_scsi_suspend(struct device *dev)
>> +{
>> +     return do_pm_op(dev, suspend);
>> +}
>> +
>> +static int do_scsi_freeze(struct device *dev)
>> +{
>> +     return do_pm_op(dev, freeze);
>> +}
>> +
>> +static int do_scsi_poweroff(struct device *dev)
>> +{
>> +     return do_pm_op(dev, poweroff);
>> +}
>> +
>> +static int do_scsi_resume(struct device *dev)
>> +{
>> +     return do_pm_op(dev, resume);
>> +}
>> +
>> +static int do_scsi_thaw(struct device *dev)
>> +{
>> +     return do_pm_op(dev, thaw);
>> +}
>> +
>> +static int do_scsi_restore(struct device *dev)
>> +{
>> +     return do_pm_op(dev, restore);
>> +}
>> +
>>  static int scsi_dev_type_suspend(struct device *dev, int (*cb)(struct device *))
>>  {
>>       int err;
>>
>> +     /* flush pending in-flight resume operations, suspend is synchronous */
>> +     async_synchronize_full_domain(&scsi_sd_pm_domain);
>> +
>>       err = scsi_device_quiesce(to_scsi_device(dev));
>>       if (err == 0) {
>> -             if (cb) {
>> -                     err = cb(dev);
>> -                     if (err)
>> -                             scsi_device_resume(to_scsi_device(dev));
>> -             }
>> +             err = cb(dev);
>> +             if (err)
>> +                     scsi_device_resume(to_scsi_device(dev));
>>       }
>>       dev_dbg(dev, "scsi suspend: %d\n", err);
>>       return err;
>> @@ -38,10 +73,16 @@ static int scsi_dev_type_resume(struct device *dev, int (*cb)(struct device *))
>>  {
>>       int err = 0;
>>
>> -     if (cb)
>> -             err = cb(dev);
>> +     err = cb(dev);
>>       scsi_device_resume(to_scsi_device(dev));
>>       dev_dbg(dev, "scsi resume: %d\n", err);
>> +
>> +     if (err == 0) {
>> +             pm_runtime_disable(dev);
>> +             pm_runtime_set_active(dev);
>> +             pm_runtime_enable(dev);
>> +     }
>> +
>>       return err;
>>  }
>>
>> @@ -66,20 +107,50 @@ scsi_bus_suspend_common(struct device *dev, int (*cb)(struct device *))
>>       return err;
>>  }
>>
>> +static void async_sdev_resume(void *dev, async_cookie_t cookie)
>> +{
>> +     scsi_dev_type_resume(dev, do_scsi_resume);
>> +}
>> +
>> +static void async_sdev_thaw(void *dev, async_cookie_t cookie)
>> +{
>> +     scsi_dev_type_resume(dev, do_scsi_thaw);
>> +}
>> +
>> +static void async_sdev_restore(void *dev, async_cookie_t cookie)
>> +{
>> +     scsi_dev_type_resume(dev, do_scsi_restore);
>> +}
>> +
>> +static async_func_t to_async_sdev_resume_fn(struct device *dev,
>> +             int (*cb)(struct device *))
>> +{
>> +     if (!scsi_is_sdev_device(dev))
>> +             return NULL;
>> +
>> +     if (cb == do_scsi_resume)
>> +             return async_sdev_resume;
>> +     else if (cb == do_scsi_thaw)
>> +             return async_sdev_thaw;
>> +     else if (cb == do_scsi_restore)
>> +             return async_sdev_restore;
>> +     else
>> +             return NULL;
>> +}
>
> Given that this function is used in only one place; I would put it
> inline.

Personal preference, but since you mention it, sure.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-07 15:42   ` Alan Stern
  2014-03-08  0:41     ` Dan Williams
@ 2014-03-08  2:52     ` Dan Williams
  2014-03-10 15:49       ` Alan Stern
  2014-03-10 20:43       ` Tejun Heo
  1 sibling, 2 replies; 28+ messages in thread
From: Dan Williams @ 2014-03-08  2:52 UTC (permalink / raw)
  To: Alan Stern
  Cc: tj, JBottomley, Len Brown, linux-scsi, Phillip Susi, linux-ide,
	Todd Brandt

On Fri, 2014-03-07 at 10:42 -0500, Alan Stern wrote:
> >  #ifdef CONFIG_PM_SLEEP
> >  
> > +#define do_pm_op(dev, op) \
> > +	dev->driver ? dev->driver->pm ? \
> > +	dev->driver->pm->op(dev) : 0 : 0
> 
> This will crash if dev->driver->pm->op is NULL.  How about making it 
> easier to read, too?
> 
> #define RETURN_PM_OP(dev, op)						\
> 	if (dev->driver && dev->driver->pm && dev->driver->pm->op) 	\
> 		return dev->driver->pm->op(dev);			\
> 	else								\
> 		return 0
> 
> static int do_scsi_suspend(struct device *dev)
> {
> 	RETURM_PM_OP(dev, suspend);
> }
> 
> etc.
> 
> Alternatively, you could put the "dev->driver && dev->driver->pm" part
> of the test directly into scsi_dev_type_suspend and
> scsi_dev_type_resume, to save a little code space.  Then the original 
> macro formulation would become sufficiently simple: one test and one 
> function call.
[..]
> Given that this function is used in only one place; I would put it 
> inline.
> 

Ended up putting the dev->driver->pm lookup into scsi_dev_type_{suspend|
resume} directly, and open coding the pm->op is NULL check.

Moved the lookup of the async function inline.

8<----------------
Subject: scsi: async sd resume

From: Dan Williams <dan.j.williams@intel.com>

async_schedule() sd resume work to allow disks and other devices to
resume in parallel.

This moves the entirety of scsi_device resume to an async context to
ensure that scsi_device_resume() remains ordered with respect to the
completion of the start/stop command.  For the duration of the resume,
new command submissions (that do not originate from the scsi-core) will
be deferred (BLKPREP_DEFER).

It adds a new ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain) as a container
of these operations.  Like scsi_sd_probe_domain it is flushed at
sd_remove() time to ensure async ops do not continue past the
end-of-life of the sdev.  The implementation explicitly refrains from
reusing scsi_sd_probe_domain directly for this purpose as it is flushed
at the end of dpm_resume(), potentially defeating some of the benefit.
Given sdevs are quiesced it is permissible for these resume operations
to bleed past the async_synchronize_full() calls made by the driver
core.

We defer the resolution of which pm callback to call until
scsi_dev_type_{suspend|resume} time and guarantee that the callback
parameter is never NULL.  With this in place the type of resume
operation is encoded in the async function identifier.

Inspired by Todd's analysis and initial proposal [2]:
https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach

Cc: Len Brown <len.brown@intel.com>
Cc: Phillip Susi <psusi@ubuntu.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
[alan: bug fix and clean up suggestion]
Suggested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
[djbw: kick all resume work to the async queue]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/scsi.c      |    3 +
 drivers/scsi/scsi_pm.c   |  119 +++++++++++++++++++++++++++++++++++-----------
 drivers/scsi/scsi_priv.h |    1 
 drivers/scsi/sd.c        |    1 
 4 files changed, 95 insertions(+), 29 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index d8afec8317cf..990d4a302788 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -91,6 +91,9 @@ EXPORT_SYMBOL(scsi_logging_level);
 ASYNC_DOMAIN(scsi_sd_probe_domain);
 EXPORT_SYMBOL(scsi_sd_probe_domain);
 
+ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain);
+EXPORT_SYMBOL(scsi_sd_pm_domain);
+
 /* NB: These are exposed through /proc/scsi/scsi and form part of the ABI.
  * You may not alter any existing entry (although adding new ones is
  * encouraged once assigned by ANSI/INCITS T10
diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index 001e9ceda4c3..06fc787e7787 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -18,35 +18,77 @@
 
 #ifdef CONFIG_PM_SLEEP
 
-static int scsi_dev_type_suspend(struct device *dev, int (*cb)(struct device *))
+static int do_scsi_suspend(struct device *dev, const struct dev_pm_ops *pm)
 {
+	return pm && pm->suspend ? pm->suspend(dev) : 0;
+}
+
+static int do_scsi_freeze(struct device *dev, const struct dev_pm_ops *pm)
+{
+	return pm && pm->freeze ? pm->freeze(dev) : 0;
+}
+
+static int do_scsi_poweroff(struct device *dev, const struct dev_pm_ops *pm)
+{
+	return pm && pm->poweroff ? pm->poweroff(dev) : 0;
+}
+
+static int do_scsi_resume(struct device *dev, const struct dev_pm_ops *pm)
+{
+	return pm && pm->resume ? pm->resume(dev) : 0;
+}
+
+static int do_scsi_thaw(struct device *dev, const struct dev_pm_ops *pm)
+{
+	return pm && pm->thaw ? pm->thaw(dev) : 0;
+}
+
+static int do_scsi_restore(struct device *dev, const struct dev_pm_ops *pm)
+{
+	return pm && pm->restore ? pm->restore(dev) : 0;
+}
+
+static int scsi_dev_type_suspend(struct device *dev,
+		int (*cb)(struct device *, const struct dev_pm_ops *))
+{
+	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 	int err;
 
+	/* flush pending in-flight resume operations, suspend is synchronous */
+	async_synchronize_full_domain(&scsi_sd_pm_domain);
+
 	err = scsi_device_quiesce(to_scsi_device(dev));
 	if (err == 0) {
-		if (cb) {
-			err = cb(dev);
-			if (err)
-				scsi_device_resume(to_scsi_device(dev));
-		}
+		err = cb(dev, pm);
+		if (err)
+			scsi_device_resume(to_scsi_device(dev));
 	}
 	dev_dbg(dev, "scsi suspend: %d\n", err);
 	return err;
 }
 
-static int scsi_dev_type_resume(struct device *dev, int (*cb)(struct device *))
+static int scsi_dev_type_resume(struct device *dev,
+		int (*cb)(struct device *, const struct dev_pm_ops *))
 {
+	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 	int err = 0;
 
-	if (cb)
-		err = cb(dev);
+	err = cb(dev, pm);
 	scsi_device_resume(to_scsi_device(dev));
 	dev_dbg(dev, "scsi resume: %d\n", err);
+
+	if (err == 0) {
+		pm_runtime_disable(dev);
+		pm_runtime_set_active(dev);
+		pm_runtime_enable(dev);
+	}
+
 	return err;
 }
 
 static int
-scsi_bus_suspend_common(struct device *dev, int (*cb)(struct device *))
+scsi_bus_suspend_common(struct device *dev,
+		int (*cb)(struct device *, const struct dev_pm_ops *))
 {
 	int err = 0;
 
@@ -66,20 +108,45 @@ scsi_bus_suspend_common(struct device *dev, int (*cb)(struct device *))
 	return err;
 }
 
-static int
-scsi_bus_resume_common(struct device *dev, int (*cb)(struct device *))
+static void async_sdev_resume(void *dev, async_cookie_t cookie)
 {
-	int err = 0;
+	scsi_dev_type_resume(dev, do_scsi_resume);
+}
 
-	if (scsi_is_sdev_device(dev))
-		err = scsi_dev_type_resume(dev, cb);
+static void async_sdev_thaw(void *dev, async_cookie_t cookie)
+{
+	scsi_dev_type_resume(dev, do_scsi_thaw);
+}
 
-	if (err == 0) {
+static void async_sdev_restore(void *dev, async_cookie_t cookie)
+{
+	scsi_dev_type_resume(dev, do_scsi_restore);
+}
+
+static int scsi_bus_resume_common(struct device *dev,
+		int (*cb)(struct device *, const struct dev_pm_ops *))
+{
+	async_func_t fn;
+
+	if (!scsi_is_sdev_device(dev))
+		fn = NULL;
+	else if (cb == do_scsi_resume)
+		fn = async_sdev_resume;
+	else if (cb == do_scsi_thaw)
+		fn = async_sdev_thaw;
+	else if (cb == do_scsi_restore)
+		fn = async_sdev_restore;
+	else
+		fn = NULL;
+
+	if (fn)
+		async_schedule_domain(fn, dev, &scsi_sd_pm_domain);
+	else {
 		pm_runtime_disable(dev);
 		pm_runtime_set_active(dev);
 		pm_runtime_enable(dev);
 	}
-	return err;
+	return 0;
 }
 
 static int scsi_bus_prepare(struct device *dev)
@@ -97,38 +164,32 @@ static int scsi_bus_prepare(struct device *dev)
 
 static int scsi_bus_suspend(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_suspend_common(dev, pm ? pm->suspend : NULL);
+	return scsi_bus_suspend_common(dev, do_scsi_suspend);
 }
 
 static int scsi_bus_resume(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_resume_common(dev, pm ? pm->resume : NULL);
+	return scsi_bus_resume_common(dev, do_scsi_resume);
 }
 
 static int scsi_bus_freeze(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_suspend_common(dev, pm ? pm->freeze : NULL);
+	return scsi_bus_suspend_common(dev, do_scsi_freeze);
 }
 
 static int scsi_bus_thaw(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_resume_common(dev, pm ? pm->thaw : NULL);
+	return scsi_bus_resume_common(dev, do_scsi_thaw);
 }
 
 static int scsi_bus_poweroff(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_suspend_common(dev, pm ? pm->poweroff : NULL);
+	return scsi_bus_suspend_common(dev, do_scsi_poweroff);
 }
 
 static int scsi_bus_restore(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_resume_common(dev, pm ? pm->restore : NULL);
+	return scsi_bus_resume_common(dev, do_scsi_restore);
 }
 
 #else /* CONFIG_PM_SLEEP */
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index f079a598bed4..a26fd44c7738 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -166,6 +166,7 @@ static inline int scsi_autopm_get_host(struct Scsi_Host *h) { return 0; }
 static inline void scsi_autopm_put_host(struct Scsi_Host *h) {}
 #endif /* CONFIG_PM_RUNTIME */
 
+extern struct async_domain scsi_sd_pm_domain;
 extern struct async_domain scsi_sd_probe_domain;
 
 /* 
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 470954aba728..700c595c603e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3020,6 +3020,7 @@ static int sd_remove(struct device *dev)
 	devt = disk_devt(sdkp->disk);
 	scsi_autopm_get_device(sdkp->device);
 
+	async_synchronize_full_domain(&scsi_sd_pm_domain);
 	async_synchronize_full_domain(&scsi_sd_probe_domain);
 	blk_queue_prep_rq(sdkp->device->request_queue, scsi_prep_fn);
 	blk_queue_unprep_rq(sdkp->device->request_queue, NULL);



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-08  2:52     ` Dan Williams
@ 2014-03-10 15:49       ` Alan Stern
  2014-03-10 20:43       ` Tejun Heo
  1 sibling, 0 replies; 28+ messages in thread
From: Alan Stern @ 2014-03-10 15:49 UTC (permalink / raw)
  To: Dan Williams
  Cc: tj, JBottomley, Len Brown, linux-scsi, Phillip Susi, linux-ide,
	Todd Brandt

On Fri, 7 Mar 2014, Dan Williams wrote:

> On Fri, 2014-03-07 at 10:42 -0500, Alan Stern wrote:
> > >  #ifdef CONFIG_PM_SLEEP
> > >  
> > > +#define do_pm_op(dev, op) \
> > > +	dev->driver ? dev->driver->pm ? \
> > > +	dev->driver->pm->op(dev) : 0 : 0
> > 
> > This will crash if dev->driver->pm->op is NULL.  How about making it 
> > easier to read, too?
> > 
> > #define RETURN_PM_OP(dev, op)						\
> > 	if (dev->driver && dev->driver->pm && dev->driver->pm->op) 	\
> > 		return dev->driver->pm->op(dev);			\
> > 	else								\
> > 		return 0
> > 
> > static int do_scsi_suspend(struct device *dev)
> > {
> > 	RETURM_PM_OP(dev, suspend);
> > }
> > 
> > etc.
> > 
> > Alternatively, you could put the "dev->driver && dev->driver->pm" part
> > of the test directly into scsi_dev_type_suspend and
> > scsi_dev_type_resume, to save a little code space.  Then the original 
> > macro formulation would become sufficiently simple: one test and one 
> > function call.
> [..]
> > Given that this function is used in only one place; I would put it 
> > inline.
> > 
> 
> Ended up putting the dev->driver->pm lookup into scsi_dev_type_{suspend|
> resume} directly, and open coding the pm->op is NULL check.
> 
> Moved the lookup of the async function inline.
> 
> 8<----------------
> Subject: scsi: async sd resume
> 
> From: Dan Williams <dan.j.williams@intel.com>
> 
> async_schedule() sd resume work to allow disks and other devices to
> resume in parallel.
> 
> This moves the entirety of scsi_device resume to an async context to
> ensure that scsi_device_resume() remains ordered with respect to the
> completion of the start/stop command.  For the duration of the resume,
> new command submissions (that do not originate from the scsi-core) will
> be deferred (BLKPREP_DEFER).
> 
> It adds a new ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain) as a container
> of these operations.  Like scsi_sd_probe_domain it is flushed at
> sd_remove() time to ensure async ops do not continue past the
> end-of-life of the sdev.  The implementation explicitly refrains from
> reusing scsi_sd_probe_domain directly for this purpose as it is flushed
> at the end of dpm_resume(), potentially defeating some of the benefit.
> Given sdevs are quiesced it is permissible for these resume operations
> to bleed past the async_synchronize_full() calls made by the driver
> core.
> 
> We defer the resolution of which pm callback to call until
> scsi_dev_type_{suspend|resume} time and guarantee that the callback
> parameter is never NULL.  With this in place the type of resume
> operation is encoded in the async function identifier.
> 
> Inspired by Todd's analysis and initial proposal [2]:
> https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach
> 
> Cc: Len Brown <len.brown@intel.com>
> Cc: Phillip Susi <psusi@ubuntu.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> [alan: bug fix and clean up suggestion]
> Suggested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
> [djbw: kick all resume work to the async queue]
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Acked-by: Alan Stern <stern@rowland.harvard.edu>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 1/3] libata, libsas: kill pm_result and related cleanup
  2014-03-05 20:17 ` [PATCH v5 1/3] libata, libsas: kill pm_result and related cleanup Dan Williams
@ 2014-03-10 20:29   ` Tejun Heo
  2014-03-10 20:44     ` Tejun Heo
  0 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2014-03-10 20:29 UTC (permalink / raw)
  To: Dan Williams
  Cc: JBottomley, Phillip Susi, Todd Brandt, Alan Stern, linux-scsi,
	linux-ide

Hello,

On Wed, Mar 05, 2014 at 12:17:30PM -0800, Dan Williams wrote:
> +#define ata_port_resume_sync(ap, msg) ata_port_resume_common((ap), (msg), false)
> +#define queue_ata_port_resume(ap, msg) ata_port_resume_common((ap), (msg), true)

Let's please use proper static functions.  The compiler can deal with
inlining.  Also, maybe ata_port_resume() and ata_port_resume_async()
are better names for the wrappers?

Other than that, libata part looks good to me.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-08  2:52     ` Dan Williams
  2014-03-10 15:49       ` Alan Stern
@ 2014-03-10 20:43       ` Tejun Heo
  2014-03-10 20:56         ` Dan Williams
  1 sibling, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2014-03-10 20:43 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alan Stern, JBottomley, Len Brown, linux-scsi, Phillip Susi,
	linux-ide, Todd Brandt

On Fri, Mar 07, 2014 at 06:52:06PM -0800, Dan Williams wrote:
> From: Dan Williams <dan.j.williams@intel.com>
> 
> async_schedule() sd resume work to allow disks and other devices to
> resume in parallel.
> 
> This moves the entirety of scsi_device resume to an async context to
> ensure that scsi_device_resume() remains ordered with respect to the
> completion of the start/stop command.  For the duration of the resume,
> new command submissions (that do not originate from the scsi-core) will
> be deferred (BLKPREP_DEFER).
> 
> It adds a new ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain) as a container
> of these operations.  Like scsi_sd_probe_domain it is flushed at
> sd_remove() time to ensure async ops do not continue past the
> end-of-life of the sdev.  The implementation explicitly refrains from
> reusing scsi_sd_probe_domain directly for this purpose as it is flushed
> at the end of dpm_resume(), potentially defeating some of the benefit.
> Given sdevs are quiesced it is permissible for these resume operations
> to bleed past the async_synchronize_full() calls made by the driver
> core.
> 
> We defer the resolution of which pm callback to call until
> scsi_dev_type_{suspend|resume} time and guarantee that the callback
> parameter is never NULL.  With this in place the type of resume
> operation is encoded in the async function identifier.
> 
> Inspired by Todd's analysis and initial proposal [2]:
> https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach

The only thing which is a bit concerning is that this doesn't have any
throttling mechanism for simultaneous wakeups.  Would this be able to
blow up the PSU if used on a machine with a lot of spindles?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 1/3] libata, libsas: kill pm_result and related cleanup
  2014-03-10 20:29   ` Tejun Heo
@ 2014-03-10 20:44     ` Tejun Heo
  0 siblings, 0 replies; 28+ messages in thread
From: Tejun Heo @ 2014-03-10 20:44 UTC (permalink / raw)
  To: Dan Williams
  Cc: JBottomley, Phillip Susi, Todd Brandt, Alan Stern, linux-scsi,
	linux-ide

On Mon, Mar 10, 2014 at 04:29:47PM -0400, Tejun Heo wrote:
> Let's please use proper static functions.  The compiler can deal with
> inlining.  Also, maybe ata_port_resume() and ata_port_resume_async()
> are better names for the wrappers?

So, if the suggested names don't jive well with the rest of the
series, my only concern is that none of other ATA functions start with
the verb, so something like ata_port_queue_resume() would be fine too.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-10 20:43       ` Tejun Heo
@ 2014-03-10 20:56         ` Dan Williams
  2014-03-10 20:59           ` Alan Stern
  0 siblings, 1 reply; 28+ messages in thread
From: Dan Williams @ 2014-03-10 20:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Stern, James Bottomley, Len Brown, linux-scsi, Phillip Susi,
	IDE/ATA development list, Todd Brandt

On Mon, Mar 10, 2014 at 1:43 PM, Tejun Heo <tj@kernel.org> wrote:
> On Fri, Mar 07, 2014 at 06:52:06PM -0800, Dan Williams wrote:
>> From: Dan Williams <dan.j.williams@intel.com>
>>
>> async_schedule() sd resume work to allow disks and other devices to
>> resume in parallel.
>>
>> This moves the entirety of scsi_device resume to an async context to
>> ensure that scsi_device_resume() remains ordered with respect to the
>> completion of the start/stop command.  For the duration of the resume,
>> new command submissions (that do not originate from the scsi-core) will
>> be deferred (BLKPREP_DEFER).
>>
>> It adds a new ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain) as a container
>> of these operations.  Like scsi_sd_probe_domain it is flushed at
>> sd_remove() time to ensure async ops do not continue past the
>> end-of-life of the sdev.  The implementation explicitly refrains from
>> reusing scsi_sd_probe_domain directly for this purpose as it is flushed
>> at the end of dpm_resume(), potentially defeating some of the benefit.
>> Given sdevs are quiesced it is permissible for these resume operations
>> to bleed past the async_synchronize_full() calls made by the driver
>> core.
>>
>> We defer the resolution of which pm callback to call until
>> scsi_dev_type_{suspend|resume} time and guarantee that the callback
>> parameter is never NULL.  With this in place the type of resume
>> operation is encoded in the async function identifier.
>>
>> Inspired by Todd's analysis and initial proposal [2]:
>> https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach
>
> The only thing which is a bit concerning is that this doesn't have any
> throttling mechanism for simultaneous wakeups.  Would this be able to
> blow up the PSU if used on a machine with a lot of spindles?

Good point.  The primary benefit is completing userspace resume
without needlessly waiting for the disk.  For now I think it would be
enough to have a mutex to maintain one disk at a time.  We can follow
on later with something more complex to enable a max simultaneous
spin-up tunable.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-10 20:56         ` Dan Williams
@ 2014-03-10 20:59           ` Alan Stern
  2014-03-10 21:06             ` Dan Williams
                               ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Alan Stern @ 2014-03-10 20:59 UTC (permalink / raw)
  To: Dan Williams
  Cc: Tejun Heo, James Bottomley, Len Brown, linux-scsi, Phillip Susi,
	IDE/ATA development list, Todd Brandt

On Mon, 10 Mar 2014, Dan Williams wrote:

> > The only thing which is a bit concerning is that this doesn't have any
> > throttling mechanism for simultaneous wakeups.  Would this be able to
> > blow up the PSU if used on a machine with a lot of spindles?
> 
> Good point.  The primary benefit is completing userspace resume
> without needlessly waiting for the disk.  For now I think it would be
> enough to have a mutex to maintain one disk at a time.  We can follow
> on later with something more complex to enable a max simultaneous
> spin-up tunable.

Why?  The existing code doesn't have anything like that.

Alan Stern


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-10 20:59           ` Alan Stern
@ 2014-03-10 21:06             ` Dan Williams
  2014-03-10 22:41             ` Phillip Susi
  2014-03-11  4:19             ` James Bottomley
  2 siblings, 0 replies; 28+ messages in thread
From: Dan Williams @ 2014-03-10 21:06 UTC (permalink / raw)
  To: Alan Stern
  Cc: Tejun Heo, James Bottomley, Len Brown, linux-scsi, Phillip Susi,
	IDE/ATA development list, Todd Brandt

On Mon, Mar 10, 2014 at 1:59 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Mon, 10 Mar 2014, Dan Williams wrote:
>
>> > The only thing which is a bit concerning is that this doesn't have any
>> > throttling mechanism for simultaneous wakeups.  Would this be able to
>> > blow up the PSU if used on a machine with a lot of spindles?
>>
>> Good point.  The primary benefit is completing userspace resume
>> without needlessly waiting for the disk.  For now I think it would be
>> enough to have a mutex to maintain one disk at a time.  We can follow
>> on later with something more complex to enable a max simultaneous
>> spin-up tunable.
>
> Why?  The existing code doesn't have anything like that.
>

Why follow up later, or why maintain one disk at a time?

I know at least the SCSI driver I maintained had this as a
per-low-level driver tunable.  Seems to be a good candidate for
attempting a unified platform-level tunable.  Mind you I would not be
looking to implement it anytime soon.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-10 20:59           ` Alan Stern
  2014-03-10 21:06             ` Dan Williams
@ 2014-03-10 22:41             ` Phillip Susi
  2014-03-11  4:19             ` James Bottomley
  2 siblings, 0 replies; 28+ messages in thread
From: Phillip Susi @ 2014-03-10 22:41 UTC (permalink / raw)
  To: Alan Stern, Dan Williams
  Cc: Tejun Heo, James Bottomley, Len Brown, linux-scsi,
	IDE/ATA development list, Todd Brandt

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 03/10/2014 04:59 PM, Alan Stern wrote:
> Why?  The existing code doesn't have anything like that.

The current sd code doesn't, but the libata code does use the AHCI
staggered spinup feature flag as a hint to only spin up one drive at a
time, so these patches together may defeat that behavior when resuming
from suspend.  That could be a problem on some systems.  Then again,
it may still follow the same code path in libata; I haven't looked at
it closely enough lately to be sure.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBCgAGBQJTHj+FAAoJEI5FoCIzSKrwYm0IAKZlralrzs8+5s7F4xouYrz8
samOep5+XxO8eVa6Cl6kqQvl02jYJNLUnabeBBTe8L5qws+ZFXnSkhFvAC5zd3LF
a68hns996l9wum+XWDAeMb7K7/dKYJiDB+6w3foquA4fgoZsc2tYLEjvUUA9h9U6
+bfZpelLnqqRgFwKwBazqVRB4ZIBGcq1+DLnDTldOH6e4gJSnzoJU1MbLYiCa6KU
tvg2XWDnp1ifhHcCGVIsdEdB1hxKKs3J+3CiXeg1Uc6LoWOfOpsfkvReNfIk4Ju9
zr2KEXGsIZwcmyx9i2Jni/I90oG3jNtVWlOT9Gvey2RW2fiMREesuHTt5KONc/U=
=wrih
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-10 20:59           ` Alan Stern
  2014-03-10 21:06             ` Dan Williams
  2014-03-10 22:41             ` Phillip Susi
@ 2014-03-11  4:19             ` James Bottomley
  2014-03-11  4:47               ` Dan Williams
  2 siblings, 1 reply; 28+ messages in thread
From: James Bottomley @ 2014-03-11  4:19 UTC (permalink / raw)
  To: Alan Stern
  Cc: Dan Williams, Tejun Heo, Len Brown, linux-scsi, Phillip Susi,
	IDE/ATA development list, Todd Brandt

On Mon, 2014-03-10 at 16:59 -0400, Alan Stern wrote:
> On Mon, 10 Mar 2014, Dan Williams wrote:
> 
> > > The only thing which is a bit concerning is that this doesn't have any
> > > throttling mechanism for simultaneous wakeups.  Would this be able to
> > > blow up the PSU if used on a machine with a lot of spindles?
> > 
> > Good point.  The primary benefit is completing userspace resume
> > without needlessly waiting for the disk.  For now I think it would be
> > enough to have a mutex to maintain one disk at a time.  We can follow
> > on later with something more complex to enable a max simultaneous
> > spin-up tunable.
> 
> Why?  The existing code doesn't have anything like that.

This only becomes a problem if enterprise class systems want to suspend
and resume.  Last I heard, this wasn't on the cards.

We also don't usually need to bother with in-kernel staggered spin ups
because if the device can blow the PSU by doing simultaneous spinups,
staggering is usually hard jumpered in the system.  The main reason for
this is that there's no way for us to work out the PSU topology to do
the stagger in kernel, so the enterprise likes to be sure.

James




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-11  4:19             ` James Bottomley
@ 2014-03-11  4:47               ` Dan Williams
  2014-03-11  5:16                 ` James Bottomley
  0 siblings, 1 reply; 28+ messages in thread
From: Dan Williams @ 2014-03-11  4:47 UTC (permalink / raw)
  To: James Bottomley
  Cc: Alan Stern, Tejun Heo, Len Brown, linux-scsi, Phillip Susi,
	IDE/ATA development list, Todd Brandt

On Mon, Mar 10, 2014 at 9:19 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Mon, 2014-03-10 at 16:59 -0400, Alan Stern wrote:
>> On Mon, 10 Mar 2014, Dan Williams wrote:
>>
>> > > The only thing which is a bit concerning is that this doesn't have any
>> > > throttling mechanism for simultaneous wakeups.  Would this be able to
>> > > blow up the PSU if used on a machine with a lot of spindles?
>> >
>> > Good point.  The primary benefit is completing userspace resume
>> > without needlessly waiting for the disk.  For now I think it would be
>> > enough to have a mutex to maintain one disk at a time.  We can follow
>> > on later with something more complex to enable a max simultaneous
>> > spin-up tunable.
>>
>> Why?  The existing code doesn't have anything like that.
>
> This only becomes a problem if enterprise class systems want to suspend
> and resume.  Last I heard, this wasn't on the cards.
>
> We also don't usually need to bother with in-kernel staggered spin ups
> because if the device can blow the PSU by doing simultaneous spinups,
> staggering is usually hard jumpered in the system.  The main reason for
> this is that there's no way for us to work out the PSU topology to do
> the stagger in kernel, so the enterprise likes to be sure.
>

In that case shouldn't we limit it to a disk at a time?  Enterprise
doesn't care and home brew folks that both put a lot of disks in a
system and suspend/resume them are less likely to burn themselves.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-11  4:47               ` Dan Williams
@ 2014-03-11  5:16                 ` James Bottomley
  2014-03-11  5:26                   ` Dan Williams
  0 siblings, 1 reply; 28+ messages in thread
From: James Bottomley @ 2014-03-11  5:16 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alan Stern, Tejun Heo, Len Brown, linux-scsi, Phillip Susi,
	IDE/ATA development list, Todd Brandt

On Mon, 2014-03-10 at 21:47 -0700, Dan Williams wrote:
> On Mon, Mar 10, 2014 at 9:19 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > On Mon, 2014-03-10 at 16:59 -0400, Alan Stern wrote:
> >> On Mon, 10 Mar 2014, Dan Williams wrote:
> >>
> >> > > The only thing which is a bit concerning is that this doesn't have any
> >> > > throttling mechanism for simultaneous wakeups.  Would this be able to
> >> > > blow up the PSU if used on a machine with a lot of spindles?
> >> >
> >> > Good point.  The primary benefit is completing userspace resume
> >> > without needlessly waiting for the disk.  For now I think it would be
> >> > enough to have a mutex to maintain one disk at a time.  We can follow
> >> > on later with something more complex to enable a max simultaneous
> >> > spin-up tunable.
> >>
> >> Why?  The existing code doesn't have anything like that.
> >
> > This only becomes a problem if enterprise class systems want to suspend
> > and resume.  Last I heard, this wasn't on the cards.
> >
> > We also don't usually need to bother with in-kernel staggered spin ups
> > because if the device can blow the PSU by doing simultaneous spinups,
> > staggering is usually hard jumpered in the system.  The main reason for
> > this is that there's no way for us to work out the PSU topology to do
> > the stagger in kernel, so the enterprise likes to be sure.
> >
> 
> In that case shouldn't we limit it to a disk at a time?  Enterprise
> doesn't care and home brew folks that both put a lot of disks in a
> system and suspend/resume them are less likely to burn themselves.

Where do you get that idea from?  the enterprise is amazingly sensitive
to system start times; they'd demand my head on a plate if we started a
disk at a time and their start times went up by 10x.  That's why we
leave it to hard wiring ... they can start in batches the maximum number
allowable and thus get the shortest start up times.

In the long game, though this whole debate is moot: setups with hard
wired start times adhere to them regardless of what the system does, so
they ignore start unit commands.  Systems without hard wired start times
can usually be started at once, so us introducing a delay is unnecessary
in either case.

James



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-11  5:16                 ` James Bottomley
@ 2014-03-11  5:26                   ` Dan Williams
  2014-03-11  6:29                     ` James Bottomley
  0 siblings, 1 reply; 28+ messages in thread
From: Dan Williams @ 2014-03-11  5:26 UTC (permalink / raw)
  To: James Bottomley
  Cc: Alan Stern, Tejun Heo, Len Brown, linux-scsi, Phillip Susi,
	IDE/ATA development list, Todd Brandt

On Mon, Mar 10, 2014 at 10:16 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Mon, 2014-03-10 at 21:47 -0700, Dan Williams wrote:
>> On Mon, Mar 10, 2014 at 9:19 PM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
>> > On Mon, 2014-03-10 at 16:59 -0400, Alan Stern wrote:
>> >> On Mon, 10 Mar 2014, Dan Williams wrote:
>> >>
>> >> > > The only thing which is a bit concerning is that this doesn't have any
>> >> > > throttling mechanism for simultaneous wakeups.  Would this be able to
>> >> > > blow up the PSU if used on a machine with a lot of spindles?
>> >> >
>> >> > Good point.  The primary benefit is completing userspace resume
>> >> > without needlessly waiting for the disk.  For now I think it would be
>> >> > enough to have a mutex to maintain one disk at a time.  We can follow
>> >> > on later with something more complex to enable a max simultaneous
>> >> > spin-up tunable.
>> >>
>> >> Why?  The existing code doesn't have anything like that.
>> >
>> > This only becomes a problem if enterprise class systems want to suspend
>> > and resume.  Last I heard, this wasn't on the cards.
>> >
>> > We also don't usually need to bother with in-kernel staggered spin ups
>> > because if the device can blow the PSU by doing simultaneous spinups,
>> > staggering is usually hard jumpered in the system.  The main reason for
>> > this is that there's no way for us to work out the PSU topology to do
>> > the stagger in kernel, so the enterprise likes to be sure.
>> >
>>
>> In that case shouldn't we limit it to a disk at a time?  Enterprise
>> doesn't care and home brew folks that both put a lot of disks in a
>> system and suspend/resume them are less likely to burn themselves.
>
> Where do you get that idea from?  the enterprise is amazingly sensitive
> to system start times; they'd demand my head on a plate if we started a
> disk at a time and their start times went up by 10x.  That's why we
> leave it to hard wiring ... they can start in batches the maximum number
> allowable and thus get the shortest start up times.

Just considering the dpm_resume path, not device discovery...

> In the long game, though this whole debate is moot: setups with hard
> wired start times adhere to them regardless of what the system does, so
> they ignore start unit commands.  Systems without hard wired start times
> can usually be started at once, so us introducing a delay is unnecessary
> in either case.

Ok, then I'll let the patch stand as is.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-11  5:26                   ` Dan Williams
@ 2014-03-11  6:29                     ` James Bottomley
  2014-03-14 20:11                       ` Dan Williams
  0 siblings, 1 reply; 28+ messages in thread
From: James Bottomley @ 2014-03-11  6:29 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alan Stern, Tejun Heo, Len Brown, linux-scsi, Phillip Susi,
	IDE/ATA development list, Todd Brandt

On Mon, 2014-03-10 at 22:26 -0700, Dan Williams wrote:
> On Mon, Mar 10, 2014 at 10:16 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > On Mon, 2014-03-10 at 21:47 -0700, Dan Williams wrote:
> >> On Mon, Mar 10, 2014 at 9:19 PM, James Bottomley
> >> <James.Bottomley@hansenpartnership.com> wrote:
> >> > On Mon, 2014-03-10 at 16:59 -0400, Alan Stern wrote:
> >> >> On Mon, 10 Mar 2014, Dan Williams wrote:
> >> >>
> >> >> > > The only thing which is a bit concerning is that this doesn't have any
> >> >> > > throttling mechanism for simultaneous wakeups.  Would this be able to
> >> >> > > blow up the PSU if used on a machine with a lot of spindles?
> >> >> >
> >> >> > Good point.  The primary benefit is completing userspace resume
> >> >> > without needlessly waiting for the disk.  For now I think it would be
> >> >> > enough to have a mutex to maintain one disk at a time.  We can follow
> >> >> > on later with something more complex to enable a max simultaneous
> >> >> > spin-up tunable.
> >> >>
> >> >> Why?  The existing code doesn't have anything like that.
> >> >
> >> > This only becomes a problem if enterprise class systems want to suspend
> >> > and resume.  Last I heard, this wasn't on the cards.
> >> >
> >> > We also don't usually need to bother with in-kernel staggered spin ups
> >> > because if the device can blow the PSU by doing simultaneous spinups,
> >> > staggering is usually hard jumpered in the system.  The main reason for
> >> > this is that there's no way for us to work out the PSU topology to do
> >> > the stagger in kernel, so the enterprise likes to be sure.
> >> >
> >>
> >> In that case shouldn't we limit it to a disk at a time?  Enterprise
> >> doesn't care and home brew folks that both put a lot of disks in a
> >> system and suspend/resume them are less likely to burn themselves.
> >
> > Where do you get that idea from?  the enterprise is amazingly sensitive
> > to system start times; they'd demand my head on a plate if we started a
> > disk at a time and their start times went up by 10x.  That's why we
> > leave it to hard wiring ... they can start in batches the maximum number
> > allowable and thus get the shortest start up times.
> 
> Just considering the dpm_resume path, not device discovery...

But the point is, it's the same.  Either the device was stopped in
suspend or it wasn't.  If it wasn't, the discussion is irrelevant and
there's nothing to wait for and if it was, we need to follow the same
start logic as we did in system bringup.

> > In the long game, though this whole debate is moot: setups with hard
> > wired start times adhere to them regardless of what the system does, so
> > they ignore start unit commands.  Systems without hard wired start times
> > can usually be started at once, so us introducing a delay is unnecessary
> > in either case.
> 
> Ok, then I'll let the patch stand as is.

Sounds good.

James



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-11  6:29                     ` James Bottomley
@ 2014-03-14 20:11                       ` Dan Williams
  2014-03-15 21:47                         ` James Bottomley
  0 siblings, 1 reply; 28+ messages in thread
From: Dan Williams @ 2014-03-14 20:11 UTC (permalink / raw)
  To: James Bottomley
  Cc: Alan Stern, Tejun Heo, Len Brown, linux-scsi, Phillip Susi,
	IDE/ATA development list, Todd Brandt

On Mon, Mar 10, 2014 at 11:29 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>> > In the long game, though this whole debate is moot: setups with hard
>> > wired start times adhere to them regardless of what the system does, so
>> > they ignore start unit commands.  Systems without hard wired start times
>> > can usually be started at once, so us introducing a delay is unnecessary
>> > in either case.
>>
>> Ok, then I'll let the patch stand as is.
>
> Sounds good.
>
> James
>
>

Well, one more chirp about this.  If the user has disabled async
scanning by CONFIG_SCSI_SCAN_ASYNC=n or scsi_mod.scan != "async" then
resume should follow suit.  I'll include this in the next rev.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-14 20:11                       ` Dan Williams
@ 2014-03-15 21:47                         ` James Bottomley
  2014-03-15 23:35                           ` Dan Williams
  0 siblings, 1 reply; 28+ messages in thread
From: James Bottomley @ 2014-03-15 21:47 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alan Stern, Tejun Heo, Len Brown, linux-scsi, Phillip Susi,
	IDE/ATA development list, Todd Brandt

On Fri, 2014-03-14 at 13:11 -0700, Dan Williams wrote:
> On Mon, Mar 10, 2014 at 11:29 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> >> > In the long game, though this whole debate is moot: setups with hard
> >> > wired start times adhere to them regardless of what the system does, so
> >> > they ignore start unit commands.  Systems without hard wired start times
> >> > can usually be started at once, so us introducing a delay is unnecessary
> >> > in either case.
> >>
> >> Ok, then I'll let the patch stand as is.
> >
> > Sounds good.
> >
> > James
> >
> >
> 
> Well, one more chirp about this.  If the user has disabled async
> scanning by CONFIG_SCSI_SCAN_ASYNC=n or scsi_mod.scan != "async" then
> resume should follow suit.  I'll include this in the next rev.

Hm, OK, if this is tied at the hip to async scanning, why do you need
another async domain for it?  Why not just use the current async
scanning domain ... it will actually probably resolve a few nasty (but
wholly manufactured) races where scanning races with suspend.

James



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-15 21:47                         ` James Bottomley
@ 2014-03-15 23:35                           ` Dan Williams
  2014-03-16 18:21                             ` James Bottomley
  0 siblings, 1 reply; 28+ messages in thread
From: Dan Williams @ 2014-03-15 23:35 UTC (permalink / raw)
  To: James Bottomley
  Cc: Alan Stern, Tejun Heo, Len Brown, linux-scsi, Phillip Susi,
	IDE/ATA development list, Todd Brandt

On Sat, Mar 15, 2014 at 2:47 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Fri, 2014-03-14 at 13:11 -0700, Dan Williams wrote:
>> On Mon, Mar 10, 2014 at 11:29 PM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
>> >> > In the long game, though this whole debate is moot: setups with hard
>> >> > wired start times adhere to them regardless of what the system does, so
>> >> > they ignore start unit commands.  Systems without hard wired start times
>> >> > can usually be started at once, so us introducing a delay is unnecessary
>> >> > in either case.
>> >>
>> >> Ok, then I'll let the patch stand as is.
>> >
>> > Sounds good.
>> >
>> > James
>> >
>> >
>>
>> Well, one more chirp about this.  If the user has disabled async
>> scanning by CONFIG_SCSI_SCAN_ASYNC=n or scsi_mod.scan != "async" then
>> resume should follow suit.  I'll include this in the next rev.
>
> Hm, OK, if this is tied at the hip to async scanning, why do you need
> another async domain for it?  Why not just use the current async
> scanning domain ... it will actually probably resolve a few nasty (but
> wholly manufactured) races where scanning races with suspend.

I considered that initially, but it ends up destroying most/all of the
benefit of doing it asynchronously.  This is due to the fact that
scsi_sd_probe_domain is flushed by the async_synchronize_full()
performed in dpm_resume().  We want userspace to resume while the disk
may still be starting.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-15 23:35                           ` Dan Williams
@ 2014-03-16 18:21                             ` James Bottomley
  2014-03-17 18:20                               ` Dan Williams
  0 siblings, 1 reply; 28+ messages in thread
From: James Bottomley @ 2014-03-16 18:21 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alan Stern, Tejun Heo, Len Brown, linux-scsi, Phillip Susi,
	IDE/ATA development list, Todd Brandt

On Sat, 2014-03-15 at 16:35 -0700, Dan Williams wrote:
> On Sat, Mar 15, 2014 at 2:47 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > On Fri, 2014-03-14 at 13:11 -0700, Dan Williams wrote:
> >> On Mon, Mar 10, 2014 at 11:29 PM, James Bottomley
> >> <James.Bottomley@hansenpartnership.com> wrote:
> >> >> > In the long game, though this whole debate is moot: setups with hard
> >> >> > wired start times adhere to them regardless of what the system does, so
> >> >> > they ignore start unit commands.  Systems without hard wired start times
> >> >> > can usually be started at once, so us introducing a delay is unnecessary
> >> >> > in either case.
> >> >>
> >> >> Ok, then I'll let the patch stand as is.
> >> >
> >> > Sounds good.
> >> >
> >> > James
> >> >
> >> >
> >>
> >> Well, one more chirp about this.  If the user has disabled async
> >> scanning by CONFIG_SCSI_SCAN_ASYNC=n or scsi_mod.scan != "async" then
> >> resume should follow suit.  I'll include this in the next rev.
> >
> > Hm, OK, if this is tied at the hip to async scanning, why do you need
> > another async domain for it?  Why not just use the current async
> > scanning domain ... it will actually probably resolve a few nasty (but
> > wholly manufactured) races where scanning races with suspend.
> 
> I considered that initially, but it ends up destroying most/all of the
> benefit of doing it asynchronously.  This is due to the fact that
> scsi_sd_probe_domain is flushed by the async_synchronize_full()
> performed in dpm_resume().  We want userspace to resume while the disk
> may still be starting.

OK, finally got it, the new domain doesn't participate in
async_synchronize_full() but scsi_sd_probe_domain does (and has to
because of the device and module operations).  That might actually be
worth a comment somewhere.

James




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-16 18:21                             ` James Bottomley
@ 2014-03-17 18:20                               ` Dan Williams
  2014-03-21  0:09                                 ` Dan Williams
  0 siblings, 1 reply; 28+ messages in thread
From: Dan Williams @ 2014-03-17 18:20 UTC (permalink / raw)
  To: James Bottomley
  Cc: Alan Stern, Tejun Heo, Len Brown, linux-scsi, Phillip Susi,
	IDE/ATA development list, Todd Brandt

On Sun, 2014-03-16 at 11:21 -0700, James Bottomley wrote:
> On Sat, 2014-03-15 at 16:35 -0700, Dan Williams wrote:
> > On Sat, Mar 15, 2014 at 2:47 PM, James Bottomley
> > <James.Bottomley@hansenpartnership.com> wrote:
> > > On Fri, 2014-03-14 at 13:11 -0700, Dan Williams wrote:
> > >> On Mon, Mar 10, 2014 at 11:29 PM, James Bottomley
> > >> <James.Bottomley@hansenpartnership.com> wrote:
> > >> >> > In the long game, though this whole debate is moot: setups with hard
> > >> >> > wired start times adhere to them regardless of what the system does, so
> > >> >> > they ignore start unit commands.  Systems without hard wired start times
> > >> >> > can usually be started at once, so us introducing a delay is unnecessary
> > >> >> > in either case.
> > >> >>
> > >> >> Ok, then I'll let the patch stand as is.
> > >> >
> > >> > Sounds good.
> > >> >
> > >> > James
> > >> >
> > >> >
> > >>
> > >> Well, one more chirp about this.  If the user has disabled async
> > >> scanning by CONFIG_SCSI_SCAN_ASYNC=n or scsi_mod.scan != "async" then
> > >> resume should follow suit.  I'll include this in the next rev.
> > >
> > > Hm, OK, if this is tied at the hip to async scanning, why do you need
> > > another async domain for it?  Why not just use the current async
> > > scanning domain ... it will actually probably resolve a few nasty (but
> > > wholly manufactured) races where scanning races with suspend.
> > 
> > I considered that initially, but it ends up destroying most/all of the
> > benefit of doing it asynchronously.  This is due to the fact that
> > scsi_sd_probe_domain is flushed by the async_synchronize_full()
> > performed in dpm_resume().  We want userspace to resume while the disk
> > may still be starting.
> 
> OK, finally got it, the new domain doesn't participate in
> async_synchronize_full() but scsi_sd_probe_domain does (and has to
> because of the device and module operations).  That might actually be
> worth a comment somewhere.

Fair enough, see the comment added to the declaration of
scsi_sd_pm_domain below.

BTW, these patches are independent.  Patch 1 and 2 can go through
Tejun/libata and this one can go through you/scsi by itself.

8<----------
Subject: scsi: async sd resume

From: Dan Williams <dan.j.williams@intel.com>

async_schedule() sd resume work to allow disks and other devices to
resume in parallel.

This moves the entirety of scsi_device resume to an async context to
ensure that scsi_device_resume() remains ordered with respect to the
completion of the start/stop command.  For the duration of the resume,
new command submissions (that do not originate from the scsi-core) will
be deferred (BLKPREP_DEFER).

It adds a new ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain) as a container
of these operations.  Like scsi_sd_probe_domain it is flushed at
sd_remove() time to ensure async ops do not continue past the
end-of-life of the sdev.  The implementation explicitly refrains from
reusing scsi_sd_probe_domain directly for this purpose as it is flushed
at the end of dpm_resume(), potentially defeating some of the benefit.
Given sdevs are quiesced it is permissible for these resume operations
to bleed past the async_synchronize_full() calls made by the driver
core.

We defer the resolution of which pm callback to call until
scsi_dev_type_{suspend|resume} time and guarantee that the callback
parameter is never NULL.  With this in place the type of resume
operation is encoded in the async function identifier.

There is a concern that async resume could trigger PSU overload.  In the
enterprise, storage enclosures enforce staggered spin-up regardless of
what the kernel does making async scanning safe by default.  Outside of
that context a user can disable asynchronous scanning via a kernel
command line or CONFIG_SCSI_SCAN_ASYNC.  Honor that setting when
deciding whether to do resume asynchronously.

Inspired by Todd's analysis and initial proposal [2]:
https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach

Cc: Len Brown <len.brown@intel.com>
Cc: Phillip Susi <psusi@ubuntu.com>
[alan: bug fix and clean up suggestion]
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Suggested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
[djbw: kick all resume work to the async queue]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/Kconfig     |    3 +
 drivers/scsi/scsi.c      |    9 +++
 drivers/scsi/scsi_pm.c   |  128 ++++++++++++++++++++++++++++++++++++----------
 drivers/scsi/scsi_priv.h |    2 +
 drivers/scsi/scsi_scan.c |    2 -
 drivers/scsi/sd.c        |    1 
 6 files changed, 115 insertions(+), 30 deletions(-)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index c8bd092fc945..02832d64d918 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -263,6 +263,9 @@ config SCSI_SCAN_ASYNC
 	  You can override this choice by specifying "scsi_mod.scan=sync"
 	  or async on the kernel's command line.
 
+	  Note that this setting also affects whether resuming from
+	  system suspend will be performed asynchronously.
+
 menu "SCSI Transports"
 	depends on SCSI
 
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index d8afec8317cf..1b345bf41a91 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -91,6 +91,15 @@ EXPORT_SYMBOL(scsi_logging_level);
 ASYNC_DOMAIN(scsi_sd_probe_domain);
 EXPORT_SYMBOL(scsi_sd_probe_domain);
 
+/*
+ * Separate domain (from scsi_sd_probe_domain) to maximize the benefit of
+ * asynchronous system resume operations.  It is marked 'exclusive' to avoid
+ * being included in the async_synchronize_full() that is invoked by
+ * dpm_resume()
+ */
+ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain);
+EXPORT_SYMBOL(scsi_sd_pm_domain);
+
 /* NB: These are exposed through /proc/scsi/scsi and form part of the ABI.
  * You may not alter any existing entry (although adding new ones is
  * encouraged once assigned by ANSI/INCITS T10
diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index 001e9ceda4c3..7454498c4091 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -18,35 +18,77 @@
 
 #ifdef CONFIG_PM_SLEEP
 
-static int scsi_dev_type_suspend(struct device *dev, int (*cb)(struct device *))
+static int do_scsi_suspend(struct device *dev, const struct dev_pm_ops *pm)
 {
+	return pm && pm->suspend ? pm->suspend(dev) : 0;
+}
+
+static int do_scsi_freeze(struct device *dev, const struct dev_pm_ops *pm)
+{
+	return pm && pm->freeze ? pm->freeze(dev) : 0;
+}
+
+static int do_scsi_poweroff(struct device *dev, const struct dev_pm_ops *pm)
+{
+	return pm && pm->poweroff ? pm->poweroff(dev) : 0;
+}
+
+static int do_scsi_resume(struct device *dev, const struct dev_pm_ops *pm)
+{
+	return pm && pm->resume ? pm->resume(dev) : 0;
+}
+
+static int do_scsi_thaw(struct device *dev, const struct dev_pm_ops *pm)
+{
+	return pm && pm->thaw ? pm->thaw(dev) : 0;
+}
+
+static int do_scsi_restore(struct device *dev, const struct dev_pm_ops *pm)
+{
+	return pm && pm->restore ? pm->restore(dev) : 0;
+}
+
+static int scsi_dev_type_suspend(struct device *dev,
+		int (*cb)(struct device *, const struct dev_pm_ops *))
+{
+	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 	int err;
 
+	/* flush pending in-flight resume operations, suspend is synchronous */
+	async_synchronize_full_domain(&scsi_sd_pm_domain);
+
 	err = scsi_device_quiesce(to_scsi_device(dev));
 	if (err == 0) {
-		if (cb) {
-			err = cb(dev);
-			if (err)
-				scsi_device_resume(to_scsi_device(dev));
-		}
+		err = cb(dev, pm);
+		if (err)
+			scsi_device_resume(to_scsi_device(dev));
 	}
 	dev_dbg(dev, "scsi suspend: %d\n", err);
 	return err;
 }
 
-static int scsi_dev_type_resume(struct device *dev, int (*cb)(struct device *))
+static int scsi_dev_type_resume(struct device *dev,
+		int (*cb)(struct device *, const struct dev_pm_ops *))
 {
+	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 	int err = 0;
 
-	if (cb)
-		err = cb(dev);
+	err = cb(dev, pm);
 	scsi_device_resume(to_scsi_device(dev));
 	dev_dbg(dev, "scsi resume: %d\n", err);
+
+	if (err == 0) {
+		pm_runtime_disable(dev);
+		pm_runtime_set_active(dev);
+		pm_runtime_enable(dev);
+	}
+
 	return err;
 }
 
 static int
-scsi_bus_suspend_common(struct device *dev, int (*cb)(struct device *))
+scsi_bus_suspend_common(struct device *dev,
+		int (*cb)(struct device *, const struct dev_pm_ops *))
 {
 	int err = 0;
 
@@ -66,20 +108,54 @@ scsi_bus_suspend_common(struct device *dev, int (*cb)(struct device *))
 	return err;
 }
 
-static int
-scsi_bus_resume_common(struct device *dev, int (*cb)(struct device *))
+static void async_sdev_resume(void *dev, async_cookie_t cookie)
 {
-	int err = 0;
+	scsi_dev_type_resume(dev, do_scsi_resume);
+}
 
-	if (scsi_is_sdev_device(dev))
-		err = scsi_dev_type_resume(dev, cb);
+static void async_sdev_thaw(void *dev, async_cookie_t cookie)
+{
+	scsi_dev_type_resume(dev, do_scsi_thaw);
+}
 
-	if (err == 0) {
+static void async_sdev_restore(void *dev, async_cookie_t cookie)
+{
+	scsi_dev_type_resume(dev, do_scsi_restore);
+}
+
+static int scsi_bus_resume_common(struct device *dev,
+		int (*cb)(struct device *, const struct dev_pm_ops *))
+{
+	async_func_t fn;
+
+	if (!scsi_is_sdev_device(dev))
+		fn = NULL;
+	else if (cb == do_scsi_resume)
+		fn = async_sdev_resume;
+	else if (cb == do_scsi_thaw)
+		fn = async_sdev_thaw;
+	else if (cb == do_scsi_restore)
+		fn = async_sdev_restore;
+	else
+		fn = NULL;
+
+	if (fn) {
+		async_schedule_domain(fn, dev, &scsi_sd_pm_domain);
+
+		/*
+		 * If a user has disabled async probing a likely reason
+		 * is due to a storage enclosure that does not inject
+		 * staggered spin-ups.  For safety, make resume
+		 * synchronous as well in that case.
+		 */
+		if (strncmp(scsi_scan_type, "async", 5) != 0)
+			async_synchronize_full_domain(&scsi_sd_pm_domain);
+	} else {
 		pm_runtime_disable(dev);
 		pm_runtime_set_active(dev);
 		pm_runtime_enable(dev);
 	}
-	return err;
+	return 0;
 }
 
 static int scsi_bus_prepare(struct device *dev)
@@ -97,38 +173,32 @@ static int scsi_bus_prepare(struct device *dev)
 
 static int scsi_bus_suspend(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_suspend_common(dev, pm ? pm->suspend : NULL);
+	return scsi_bus_suspend_common(dev, do_scsi_suspend);
 }
 
 static int scsi_bus_resume(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_resume_common(dev, pm ? pm->resume : NULL);
+	return scsi_bus_resume_common(dev, do_scsi_resume);
 }
 
 static int scsi_bus_freeze(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_suspend_common(dev, pm ? pm->freeze : NULL);
+	return scsi_bus_suspend_common(dev, do_scsi_freeze);
 }
 
 static int scsi_bus_thaw(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_resume_common(dev, pm ? pm->thaw : NULL);
+	return scsi_bus_resume_common(dev, do_scsi_thaw);
 }
 
 static int scsi_bus_poweroff(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_suspend_common(dev, pm ? pm->poweroff : NULL);
+	return scsi_bus_suspend_common(dev, do_scsi_poweroff);
 }
 
 static int scsi_bus_restore(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_resume_common(dev, pm ? pm->restore : NULL);
+	return scsi_bus_resume_common(dev, do_scsi_restore);
 }
 
 #else /* CONFIG_PM_SLEEP */
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index f079a598bed4..48e5b657e79f 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -112,6 +112,7 @@ extern void scsi_exit_procfs(void);
 #endif /* CONFIG_PROC_FS */
 
 /* scsi_scan.c */
+extern char scsi_scan_type[];
 extern int scsi_complete_async_scans(void);
 extern int scsi_scan_host_selected(struct Scsi_Host *, unsigned int,
 				   unsigned int, unsigned int, int);
@@ -166,6 +167,7 @@ static inline int scsi_autopm_get_host(struct Scsi_Host *h) { return 0; }
 static inline void scsi_autopm_put_host(struct Scsi_Host *h) {}
 #endif /* CONFIG_PM_RUNTIME */
 
+extern struct async_domain scsi_sd_pm_domain;
 extern struct async_domain scsi_sd_probe_domain;
 
 /* 
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 307a81137607..6b2f51f52af6 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -97,7 +97,7 @@ MODULE_PARM_DESC(max_luns,
 #define SCSI_SCAN_TYPE_DEFAULT "sync"
 #endif
 
-static char scsi_scan_type[6] = SCSI_SCAN_TYPE_DEFAULT;
+char scsi_scan_type[6] = SCSI_SCAN_TYPE_DEFAULT;
 
 module_param_string(scan, scsi_scan_type, sizeof(scsi_scan_type), S_IRUGO);
 MODULE_PARM_DESC(scan, "sync, async or none");
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 470954aba728..700c595c603e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3020,6 +3020,7 @@ static int sd_remove(struct device *dev)
 	devt = disk_devt(sdkp->disk);
 	scsi_autopm_get_device(sdkp->device);
 
+	async_synchronize_full_domain(&scsi_sd_pm_domain);
 	async_synchronize_full_domain(&scsi_sd_probe_domain);
 	blk_queue_prep_rq(sdkp->device->request_queue, scsi_prep_fn);
 	blk_queue_unprep_rq(sdkp->device->request_queue, NULL);



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] scsi: async sd resume
  2014-03-17 18:20                               ` Dan Williams
@ 2014-03-21  0:09                                 ` Dan Williams
  0 siblings, 0 replies; 28+ messages in thread
From: Dan Williams @ 2014-03-21  0:09 UTC (permalink / raw)
  To: James Bottomley
  Cc: Alan Stern, Tejun Heo, Len Brown, linux-scsi, Phillip Susi,
	IDE/ATA development list, Todd Brandt

On Mon, 2014-03-17 at 11:20 -0700, Dan Williams wrote:
> On Sun, 2014-03-16 at 11:21 -0700, James Bottomley wrote:
> > On Sat, 2014-03-15 at 16:35 -0700, Dan Williams wrote:
> > > On Sat, Mar 15, 2014 at 2:47 PM, James Bottomley
> > > > Hm, OK, if this is tied at the hip to async scanning, why do you need
> > > > another async domain for it?  Why not just use the current async
> > > > scanning domain ... it will actually probably resolve a few nasty (but
> > > > wholly manufactured) races where scanning races with suspend.
> > > 
> > > I considered that initially, but it ends up destroying most/all of the
> > > benefit of doing it asynchronously.  This is due to the fact that
> > > scsi_sd_probe_domain is flushed by the async_synchronize_full()
> > > performed in dpm_resume().  We want userspace to resume while the disk
> > > may still be starting.
> > 
> > OK, finally got it, the new domain doesn't participate in
> > async_synchronize_full() but scsi_sd_probe_domain does (and has to
> > because of the device and module operations).  That might actually be
> > worth a comment somewhere.
> 
> Fair enough, see the comment added to the declaration of
> scsi_sd_pm_domain below.
> 
> BTW, these patches are independent.  Patch 1 and 2 can go through
> Tejun/libata and this one can go through you/scsi by itself.
> 

...and Tejun has now applied the ata bits for 3.15.

James, when you get a chance, please apply this latest version of patch
3 or otherwise let me know your disposition.

Regards,
Dan

(no changes since the version I sent on the 17th)

8<------
scsi: async sd resume

From: Dan Williams <dan.j.williams@intel.com>

async_schedule() sd resume work to allow disks and other devices to
resume in parallel.

This moves the entirety of scsi_device resume to an async context to
ensure that scsi_device_resume() remains ordered with respect to the
completion of the start/stop command.  For the duration of the resume,
new command submissions (that do not originate from the scsi-core) will
be deferred (BLKPREP_DEFER).

It adds a new ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain) as a container
of these operations.  Like scsi_sd_probe_domain it is flushed at
sd_remove() time to ensure async ops do not continue past the
end-of-life of the sdev.  The implementation explicitly refrains from
reusing scsi_sd_probe_domain directly for this purpose as it is flushed
at the end of dpm_resume(), potentially defeating some of the benefit.
Given sdevs are quiesced it is permissible for these resume operations
to bleed past the async_synchronize_full() calls made by the driver
core.

We defer the resolution of which pm callback to call until
scsi_dev_type_{suspend|resume} time and guarantee that the callback
parameter is never NULL.  With this in place the type of resume
operation is encoded in the async function identifier.

There is a concern that async resume could trigger PSU overload.  In the
enterprise, storage enclosures enforce staggered spin-up regardless of
what the kernel does making async scanning safe by default.  Outside of
that context a user can disable asynchronous scanning via a kernel
command line or CONFIG_SCSI_SCAN_ASYNC.  Honor that setting when
deciding whether to do resume asynchronously.

Inspired by Todd's analysis and initial proposal [2]:
https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach

Cc: Len Brown <len.brown@intel.com>
Cc: Phillip Susi <psusi@ubuntu.com>
[alan: bug fix and clean up suggestion]
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Suggested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
[djbw: kick all resume work to the async queue]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/Kconfig     |    3 +
 drivers/scsi/scsi.c      |    9 +++
 drivers/scsi/scsi_pm.c   |  128 ++++++++++++++++++++++++++++++++++++----------
 drivers/scsi/scsi_priv.h |    2 +
 drivers/scsi/scsi_scan.c |    2 -
 drivers/scsi/sd.c        |    1 
 6 files changed, 115 insertions(+), 30 deletions(-)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index c8bd092fc945..02832d64d918 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -263,6 +263,9 @@ config SCSI_SCAN_ASYNC
 	  You can override this choice by specifying "scsi_mod.scan=sync"
 	  or async on the kernel's command line.
 
+	  Note that this setting also affects whether resuming from
+	  system suspend will be performed asynchronously.
+
 menu "SCSI Transports"
 	depends on SCSI
 
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index d8afec8317cf..1b345bf41a91 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -91,6 +91,15 @@ EXPORT_SYMBOL(scsi_logging_level);
 ASYNC_DOMAIN(scsi_sd_probe_domain);
 EXPORT_SYMBOL(scsi_sd_probe_domain);
 
+/*
+ * Separate domain (from scsi_sd_probe_domain) to maximize the benefit of
+ * asynchronous system resume operations.  It is marked 'exclusive' to avoid
+ * being included in the async_synchronize_full() that is invoked by
+ * dpm_resume()
+ */
+ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain);
+EXPORT_SYMBOL(scsi_sd_pm_domain);
+
 /* NB: These are exposed through /proc/scsi/scsi and form part of the ABI.
  * You may not alter any existing entry (although adding new ones is
  * encouraged once assigned by ANSI/INCITS T10
diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index 001e9ceda4c3..7454498c4091 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -18,35 +18,77 @@
 
 #ifdef CONFIG_PM_SLEEP
 
-static int scsi_dev_type_suspend(struct device *dev, int (*cb)(struct device *))
+static int do_scsi_suspend(struct device *dev, const struct dev_pm_ops *pm)
 {
+	return pm && pm->suspend ? pm->suspend(dev) : 0;
+}
+
+static int do_scsi_freeze(struct device *dev, const struct dev_pm_ops *pm)
+{
+	return pm && pm->freeze ? pm->freeze(dev) : 0;
+}
+
+static int do_scsi_poweroff(struct device *dev, const struct dev_pm_ops *pm)
+{
+	return pm && pm->poweroff ? pm->poweroff(dev) : 0;
+}
+
+static int do_scsi_resume(struct device *dev, const struct dev_pm_ops *pm)
+{
+	return pm && pm->resume ? pm->resume(dev) : 0;
+}
+
+static int do_scsi_thaw(struct device *dev, const struct dev_pm_ops *pm)
+{
+	return pm && pm->thaw ? pm->thaw(dev) : 0;
+}
+
+static int do_scsi_restore(struct device *dev, const struct dev_pm_ops *pm)
+{
+	return pm && pm->restore ? pm->restore(dev) : 0;
+}
+
+static int scsi_dev_type_suspend(struct device *dev,
+		int (*cb)(struct device *, const struct dev_pm_ops *))
+{
+	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 	int err;
 
+	/* flush pending in-flight resume operations, suspend is synchronous */
+	async_synchronize_full_domain(&scsi_sd_pm_domain);
+
 	err = scsi_device_quiesce(to_scsi_device(dev));
 	if (err == 0) {
-		if (cb) {
-			err = cb(dev);
-			if (err)
-				scsi_device_resume(to_scsi_device(dev));
-		}
+		err = cb(dev, pm);
+		if (err)
+			scsi_device_resume(to_scsi_device(dev));
 	}
 	dev_dbg(dev, "scsi suspend: %d\n", err);
 	return err;
 }
 
-static int scsi_dev_type_resume(struct device *dev, int (*cb)(struct device *))
+static int scsi_dev_type_resume(struct device *dev,
+		int (*cb)(struct device *, const struct dev_pm_ops *))
 {
+	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 	int err = 0;
 
-	if (cb)
-		err = cb(dev);
+	err = cb(dev, pm);
 	scsi_device_resume(to_scsi_device(dev));
 	dev_dbg(dev, "scsi resume: %d\n", err);
+
+	if (err == 0) {
+		pm_runtime_disable(dev);
+		pm_runtime_set_active(dev);
+		pm_runtime_enable(dev);
+	}
+
 	return err;
 }
 
 static int
-scsi_bus_suspend_common(struct device *dev, int (*cb)(struct device *))
+scsi_bus_suspend_common(struct device *dev,
+		int (*cb)(struct device *, const struct dev_pm_ops *))
 {
 	int err = 0;
 
@@ -66,20 +108,54 @@ scsi_bus_suspend_common(struct device *dev, int (*cb)(struct device *))
 	return err;
 }
 
-static int
-scsi_bus_resume_common(struct device *dev, int (*cb)(struct device *))
+static void async_sdev_resume(void *dev, async_cookie_t cookie)
 {
-	int err = 0;
+	scsi_dev_type_resume(dev, do_scsi_resume);
+}
 
-	if (scsi_is_sdev_device(dev))
-		err = scsi_dev_type_resume(dev, cb);
+static void async_sdev_thaw(void *dev, async_cookie_t cookie)
+{
+	scsi_dev_type_resume(dev, do_scsi_thaw);
+}
 
-	if (err == 0) {
+static void async_sdev_restore(void *dev, async_cookie_t cookie)
+{
+	scsi_dev_type_resume(dev, do_scsi_restore);
+}
+
+static int scsi_bus_resume_common(struct device *dev,
+		int (*cb)(struct device *, const struct dev_pm_ops *))
+{
+	async_func_t fn;
+
+	if (!scsi_is_sdev_device(dev))
+		fn = NULL;
+	else if (cb == do_scsi_resume)
+		fn = async_sdev_resume;
+	else if (cb == do_scsi_thaw)
+		fn = async_sdev_thaw;
+	else if (cb == do_scsi_restore)
+		fn = async_sdev_restore;
+	else
+		fn = NULL;
+
+	if (fn) {
+		async_schedule_domain(fn, dev, &scsi_sd_pm_domain);
+
+		/*
+		 * If a user has disabled async probing a likely reason
+		 * is due to a storage enclosure that does not inject
+		 * staggered spin-ups.  For safety, make resume
+		 * synchronous as well in that case.
+		 */
+		if (strncmp(scsi_scan_type, "async", 5) != 0)
+			async_synchronize_full_domain(&scsi_sd_pm_domain);
+	} else {
 		pm_runtime_disable(dev);
 		pm_runtime_set_active(dev);
 		pm_runtime_enable(dev);
 	}
-	return err;
+	return 0;
 }
 
 static int scsi_bus_prepare(struct device *dev)
@@ -97,38 +173,32 @@ static int scsi_bus_prepare(struct device *dev)
 
 static int scsi_bus_suspend(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_suspend_common(dev, pm ? pm->suspend : NULL);
+	return scsi_bus_suspend_common(dev, do_scsi_suspend);
 }
 
 static int scsi_bus_resume(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_resume_common(dev, pm ? pm->resume : NULL);
+	return scsi_bus_resume_common(dev, do_scsi_resume);
 }
 
 static int scsi_bus_freeze(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_suspend_common(dev, pm ? pm->freeze : NULL);
+	return scsi_bus_suspend_common(dev, do_scsi_freeze);
 }
 
 static int scsi_bus_thaw(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_resume_common(dev, pm ? pm->thaw : NULL);
+	return scsi_bus_resume_common(dev, do_scsi_thaw);
 }
 
 static int scsi_bus_poweroff(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_suspend_common(dev, pm ? pm->poweroff : NULL);
+	return scsi_bus_suspend_common(dev, do_scsi_poweroff);
 }
 
 static int scsi_bus_restore(struct device *dev)
 {
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	return scsi_bus_resume_common(dev, pm ? pm->restore : NULL);
+	return scsi_bus_resume_common(dev, do_scsi_restore);
 }
 
 #else /* CONFIG_PM_SLEEP */
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index f079a598bed4..48e5b657e79f 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -112,6 +112,7 @@ extern void scsi_exit_procfs(void);
 #endif /* CONFIG_PROC_FS */
 
 /* scsi_scan.c */
+extern char scsi_scan_type[];
 extern int scsi_complete_async_scans(void);
 extern int scsi_scan_host_selected(struct Scsi_Host *, unsigned int,
 				   unsigned int, unsigned int, int);
@@ -166,6 +167,7 @@ static inline int scsi_autopm_get_host(struct Scsi_Host *h) { return 0; }
 static inline void scsi_autopm_put_host(struct Scsi_Host *h) {}
 #endif /* CONFIG_PM_RUNTIME */
 
+extern struct async_domain scsi_sd_pm_domain;
 extern struct async_domain scsi_sd_probe_domain;
 
 /* 
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 307a81137607..6b2f51f52af6 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -97,7 +97,7 @@ MODULE_PARM_DESC(max_luns,
 #define SCSI_SCAN_TYPE_DEFAULT "sync"
 #endif
 
-static char scsi_scan_type[6] = SCSI_SCAN_TYPE_DEFAULT;
+char scsi_scan_type[6] = SCSI_SCAN_TYPE_DEFAULT;
 
 module_param_string(scan, scsi_scan_type, sizeof(scsi_scan_type), S_IRUGO);
 MODULE_PARM_DESC(scan, "sync, async or none");
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 470954aba728..700c595c603e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3020,6 +3020,7 @@ static int sd_remove(struct device *dev)
 	devt = disk_devt(sdkp->disk);
 	scsi_autopm_get_device(sdkp->device);
 
+	async_synchronize_full_domain(&scsi_sd_pm_domain);
 	async_synchronize_full_domain(&scsi_sd_probe_domain);
 	blk_queue_prep_rq(sdkp->device->request_queue, scsi_prep_fn);
 	blk_queue_unprep_rq(sdkp->device->request_queue, NULL);



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 2/3] libata: async resume
  2014-03-05 20:17 ` [PATCH v5 2/3] libata: async resume Dan Williams
@ 2014-12-19 23:46   ` Phillip Susi
  2014-12-20  1:42     ` Phillip Susi
  0 siblings, 1 reply; 28+ messages in thread
From: Phillip Susi @ 2014-12-19 23:46 UTC (permalink / raw)
  To: Todd Brandt
  Cc: Dan Williams, tj, JBottomley, Len Brown, linux-scsi, linux-ide,
	Alan Stern

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 03/05/2014 03:17 PM, Dan Williams wrote:
> From: Todd Brandt <todd.e.brandt@linux.intel.com>
> 
> Improve overall system resume time by making libata link recovery 
> actions asynchronous relative to other resume events.
> 
> Link resume operations are performed using the scsi_eh thread, so 
> commands, particularly the sd resume start/stop command, will be
> held off until the device exits error handling.  Libata already
> flushes eh with ata_port_wait_eh() in the port teardown paths, so
> there are no concerns with async operation colliding with the
> end-of-life of the ata_port object.  Also, libata-core is already
> careful to flush in-flight pm operations before another round of pm
> starts on the given ata_port.

I realize this is a little late but I finally started looking at the
patch set I was working on last year again, and now that I look at
your version that was accepted, I realize that it only addresses the
libata side of things.  sd still issues START_STOP_UNIT synchronously
in the resume path, so without the patch fixing that, you shouldn't
see any actual speed up in resume times.  Or are you using the
manage_start_stop flag to inhibit that ( at the cost of taking an
emergency park on each suspend/shutdown )?


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBCgAGBQJUlLjNAAoJENRVrw2cjl5RU48H/2APBJMJ9XyTEfa7r6+M76zH
jf238VwOJUuTUC+Mh2h3AoQVkNy4E8CM/CAnWww8Y1iAvuRTptp9J64NrAQdylCf
p3KLIqhXaGmGvgx1SpzwzwGhbvZ9YM8w1uRC1VLACr9ZwySjEXyEv3B2kZDDUMEj
xxnnQfM47f2km6pxhV7nzt1jHlvaWhvPsuRSaVFxLQstbGR9U1VLJnESZgBFYipR
w5z0dmhssE21A/T8B7dSAx5tDCATeWsMn5fDtQ15MFXgfguXrmmOuHLBtv9EGPZt
d5M1rr2E7WXems5pBoxJMcYFblwQ/h30qPEkRNgYXrfTRx7h79q20tWNI+B1D1c=
=s6tS
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 2/3] libata: async resume
  2014-12-19 23:46   ` Phillip Susi
@ 2014-12-20  1:42     ` Phillip Susi
  0 siblings, 0 replies; 28+ messages in thread
From: Phillip Susi @ 2014-12-20  1:42 UTC (permalink / raw)
  To: Todd Brandt
  Cc: Dan Williams, tj, JBottomley, Len Brown, linux-scsi, linux-ide,
	Alan Stern

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 12/19/2014 06:46 PM, Phillip Susi wrote:
> I realize this is a little late but I finally started looking at
> the patch set I was working on last year again, and now that I look
> at your version that was accepted, I realize that it only addresses
> the libata side of things.  sd still issues START_STOP_UNIT
> synchronously in the resume path, so without the patch fixing that,
> you shouldn't see any actual speed up in resume times.  Or are you
> using the manage_start_stop flag to inhibit that ( at the cost of
> taking an emergency park on each suspend/shutdown )?

Sorry for the noise, I missed the patch to the scsi core that makes
the entire call to sd_resume async.  Very nice.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBCgAGBQJUlNQQAAoJENRVrw2cjl5RUP8H/3hz6qRtSbRJPYya2RCsjgWE
dM5tm9RXJurPYCa615NmMX+bym2ooSVlYFZPAkuEtIh+uOeO7S9z5WX6de7ErtyN
yNZNSPQXMZDLOzwy0gnZr2BU0Htk4LpX+2mdmos9oVQJJXqc2n1Nje3NXF/F6qm8
I8fjKTyOLQgI8GbJNFkK028rjvxc1DYTGx2hgC2KYpuktdyrJv6c68TZBUF1jFQm
UVsp1xpN5BoGGbhb13LBVm1To8V5YyEvR6Ou1Y633VM4089HLUWMwVBwVIjryyiY
gR6dHdqs/a/1nuXFGZXAMZXhLFuNJ2a/h98QcX5ZxuaKP8XFEN6V/JyXxiekxK8=
=Vtbl
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2014-12-20  1:42 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-05 20:17 [PATCH v5 0/3] Accelerate Storage Resume (2x or more) Dan Williams
2014-03-05 20:17 ` [PATCH v5 1/3] libata, libsas: kill pm_result and related cleanup Dan Williams
2014-03-10 20:29   ` Tejun Heo
2014-03-10 20:44     ` Tejun Heo
2014-03-05 20:17 ` [PATCH v5 2/3] libata: async resume Dan Williams
2014-12-19 23:46   ` Phillip Susi
2014-12-20  1:42     ` Phillip Susi
2014-03-05 20:17 ` [PATCH v5 3/3] scsi: async sd resume Dan Williams
2014-03-07 15:42   ` Alan Stern
2014-03-08  0:41     ` Dan Williams
2014-03-08  2:52     ` Dan Williams
2014-03-10 15:49       ` Alan Stern
2014-03-10 20:43       ` Tejun Heo
2014-03-10 20:56         ` Dan Williams
2014-03-10 20:59           ` Alan Stern
2014-03-10 21:06             ` Dan Williams
2014-03-10 22:41             ` Phillip Susi
2014-03-11  4:19             ` James Bottomley
2014-03-11  4:47               ` Dan Williams
2014-03-11  5:16                 ` James Bottomley
2014-03-11  5:26                   ` Dan Williams
2014-03-11  6:29                     ` James Bottomley
2014-03-14 20:11                       ` Dan Williams
2014-03-15 21:47                         ` James Bottomley
2014-03-15 23:35                           ` Dan Williams
2014-03-16 18:21                             ` James Bottomley
2014-03-17 18:20                               ` Dan Williams
2014-03-21  0:09                                 ` Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).