public inbox for linux-scsi@vger.kernel.org
* [PATCH] libsas: flush initial device discovery before completing ->scan_finished()
@ 2011-02-17  3:06 Dan Williams
  2011-02-19  0:02 ` James Bottomley
  0 siblings, 1 reply; 3+ messages in thread
From: Dan Williams @ 2011-02-17  3:06 UTC (permalink / raw)
  To: james.bottomley
  Cc: dave.jiang, linux-scsi, David Milburn, jacek.danecki, jack_wang,
	lindar_liu, jeffrey.d.skirvin, edmund.nadolski, Srinivas

During initial scan libsas drivers start their phys and notify libsas
with PORTE_BYTES_DMAED events as port links are established.  This
notification in turn causes libsas to post DISCE_DISCOVER_DOMAIN events
to the queue.  Calling scsi_flush_work() at the end of scan_finished
guarantees that all preceding PORTE_BYTES_DMAED events have been
registered in the queue, but it does not guarantee that the resulting
DISCE_DISCOVER_DOMAIN events have been processed because
flush_workqueue() explicitly avoids live-locking with incoming work.

Introduce sas_flush_discovery() to guarantee that all initial discovery
events have completed.  It is called after the driver determines all
initial PORTE_BYTES_DMAED events have had a chance to enter the queue.
This does not cover BCNs that are generated during expander bring up,
only the initial sas_discover_domain() event.

Cc: David Milburn <dmilburn@redhat.com>
Cc: Srinivas <satyasrinivasp@hcl.in>
Cc: jack_wang@usish.com
Cc: lindar_liu@usish.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/aic94xx/aic94xx_init.c |    2 +-
 drivers/scsi/mvsas/mv_sas.c         |    2 +-
 drivers/scsi/pm8001/pm8001_sas.c    |    2 +-
 include/scsi/libsas.h               |   13 +++++++++++++
 4 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/aic94xx/aic94xx_init.c b/drivers/scsi/aic94xx/aic94xx_init.c
index 3b7e83d..f763daa 100644
--- a/drivers/scsi/aic94xx/aic94xx_init.c
+++ b/drivers/scsi/aic94xx/aic94xx_init.c
@@ -972,7 +972,7 @@ static int asd_scan_finished(struct Scsi_Host *shost, unsigned long time)
 	if (time < HZ)
 		return 0;
 	/* Wait for discovery to finish */
-	scsi_flush_work(shost);
+	sas_flush_discovery(shost);
 	return 1;
 }
 
diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
index adedaa9..e34c378 100644
--- a/drivers/scsi/mvsas/mv_sas.c
+++ b/drivers/scsi/mvsas/mv_sas.c
@@ -440,7 +440,7 @@ int mvs_scan_finished(struct Scsi_Host *shost, unsigned long time)
 	if (time < HZ)
 		return 0;
 	/* Wait for discovery to finish */
-	scsi_flush_work(shost);
+	sas_flush_discovery(shost);
 	return 1;
 }
 
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index 6ae059e..997cf59 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -253,7 +253,7 @@ int pm8001_scan_finished(struct Scsi_Host *shost, unsigned long time)
 	if (time < HZ)
 		return 0;
 	/* Wait for discovery to finish */
-	scsi_flush_work(shost);
+	sas_flush_discovery(shost);
 	return 1;
 }
 
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 5173bba..d4ada3b 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -405,6 +405,19 @@ static inline void sas_phy_disconnected(struct asd_sas_phy *phy)
 	phy->linkrate = SAS_LINK_RATE_UNKNOWN;
 }
 
+/* Before returning from ->scan_finished() an LLDD calls this routine to
+ * ensure that all port notifications have been promoted to domain
+ * discovery events, and that initial domain discovery has completed
+ */
+static inline void sas_flush_discovery(struct Scsi_Host *shost)
+{
+	/* flush port events */
+	scsi_flush_work(shost);
+
+	/* flush domain discovery events queued by the port events */
+	scsi_flush_work(shost);
+}
+
 /* ---------- Tasks ---------- */
 /*
       service_response |  SAS_TASK_COMPLETE  |  SAS_TASK_UNDELIVERED |



* Re: [PATCH] libsas: flush initial device discovery before completing ->scan_finished()
  2011-02-17  3:06 [PATCH] libsas: flush initial device discovery before completing ->scan_finished() Dan Williams
@ 2011-02-19  0:02 ` James Bottomley
  2011-02-19  1:32   ` Dan Williams
  0 siblings, 1 reply; 3+ messages in thread
From: James Bottomley @ 2011-02-19  0:02 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, linux-scsi, David Milburn, jacek.danecki, jack_wang,
	lindar_liu, jeffrey.d.skirvin, edmund.nadolski, Srinivas

On Wed, 2011-02-16 at 19:06 -0800, Dan Williams wrote:
> During initial scan libsas drivers start their phys and notify libsas
> with PORTE_BYTES_DMAED events as port links are established.  This
> notification in turn causes libsas to post DISCE_DISCOVER_DOMAIN events
> to the queue.  Calling scsi_flush_work() at the end of scan_finished
> guarantees that all preceding PORTE_BYTES_DMAED events have been
> registered in the queue, but it does not guarantee that the resulting
> DISCE_DISCOVER_DOMAIN events have been processed because
> flush_workqueue() explicitly avoids live-locking with incoming work.
> 
> Introduce sas_flush_discovery() to guarantee that all initial discovery
> events have completed.  It is called after the driver determines all
> initial PORTE_BYTES_DMAED events have had a chance to enter the queue.
> This does not cover BCNs that are generated during expander bring up,
> only the initial sas_discover_domain() event.

I think this is a workaround for an old bug in workqueue flushing (the
flush doesn't clean work it causes) ... I thought that's been fixed for
ages (well, months at least) ... have you verified that this is still a
problem?

James




* Re: [PATCH] libsas: flush initial device discovery before completing ->scan_finished()
  2011-02-19  0:02 ` James Bottomley
@ 2011-02-19  1:32   ` Dan Williams
  0 siblings, 0 replies; 3+ messages in thread
From: Dan Williams @ 2011-02-19  1:32 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jiang, Dave, linux-scsi@vger.kernel.org, David Milburn,
	Danecki, Jacek, jack_wang@usish.com, lindar_liu@usish.com,
	Skirvin, Jeffrey D, Nadolski, Edmund, Srinivas

On Fri, 2011-02-18 at 16:02 -0800, James Bottomley wrote:
> On Wed, 2011-02-16 at 19:06 -0800, Dan Williams wrote:
> > During initial scan libsas drivers start their phys and notify libsas
> > with PORTE_BYTES_DMAED events as port links are established.  This
> > notification in turn causes libsas to post DISCE_DISCOVER_DOMAIN events
> > to the queue.  Calling scsi_flush_work() at the end of scan_finished
> > guarantees that all preceding PORTE_BYTES_DMAED events have been
> > registered in the queue, but it does not guarantee that the resulting
> > DISCE_DISCOVER_DOMAIN events have been processed because
> > flush_workqueue() explicitly avoids live-locking with incoming work.
> > 
> > Introduce sas_flush_discovery() to guarantee that all initial discovery
> > events have completed.  It is called after the driver determines all
> > initial PORTE_BYTES_DMAED events have had a chance to enter the queue.
> > This does not cover BCNs that are generated during expander bring up,
> > only the initial sas_discover_domain() event.
> 
> I think this is a workaround for an old bug in workqueue flushing (the
> flush doesn't clean work it causes) ... I thought that's been fixed for
> ages (well, months at least) ... have you verified that this is still a
> problem?
> 

Hmm... I saw this initially on 2.6.36.

Latest git still has the "livelock" comment [1], and I was able to
capture the following trace with two disks connected on a 2.6.38-rc5
build.  The second "sas_discover_domain" completion occurs after the
"first flush done".

# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
           <...>-5     [007]    93.849947: sas_porte_bytes_dmaed: sas_porte_bytes_dmaed: done
           <...>-5     [007]    94.444643: sas_discover_domain: sas_discover_domain: complete
           <...>-5     [007]    94.451993: sas_porte_bytes_dmaed: sas_porte_bytes_dmaed: done
           <...>-1792  [006]    94.452011: isci_host_scan_finished: isci_host_scan_finished: first flush done
           <...>-5     [007]    94.773256: sas_discover_domain: sas_discover_domain: complete
           <...>-1792  [006]    94.773270: isci_host_scan_finished: isci_host_scan_finished: second flush done
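
To make the ordering in the trace concrete, here is a toy, single-threaded
user-space model of the behaviour described above.  It is only a sketch:
every name in it (model_queue, model_flush, simulate_scan, ...) is
hypothetical, it is not kernel code, and it assumes a flush waits only for
the items that were queued when the flush started.  The timing is also
simplified, since all port events are queued before the first flush runs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a FIFO work queue.  All names are hypothetical. */
#define MODEL_QUEUE_MAX 64

struct model_queue;
typedef void (*model_work_fn)(struct model_queue *q);

struct model_queue {
	model_work_fn work[MODEL_QUEUE_MAX];
	size_t head;		/* next item to run */
	size_t tail;		/* next free slot */
	int discoveries_done;	/* completed "domain discovery" items */
};

static void model_queue_work(struct model_queue *q, model_work_fn fn)
{
	assert(q->tail < MODEL_QUEUE_MAX);
	q->work[q->tail++] = fn;
}

/* Stands in for the DISCE_DISCOVER_DOMAIN handler. */
static void model_discover_domain(struct model_queue *q)
{
	q->discoveries_done++;
}

/* Stands in for the PORTE_BYTES_DMAED handler: it queues discovery. */
static void model_porte_bytes_dmaed(struct model_queue *q)
{
	model_queue_work(q, model_discover_domain);
}

/*
 * Run only the items that were already queued when the flush started.
 * Work queued by those items stays pending until a later flush.
 */
static void model_flush(struct model_queue *q)
{
	size_t barrier = q->tail;

	while (q->head < barrier) {
		q->work[q->head](q);
		q->head++;
	}
}

/* Queue nports port events, flush nflushes times, report discoveries. */
int simulate_scan(int nports, int nflushes)
{
	struct model_queue q = { .head = 0 };
	int i;

	for (i = 0; i < nports; i++)
		model_queue_work(&q, model_porte_bytes_dmaed);
	for (i = 0; i < nflushes; i++)
		model_flush(&q);
	return q.discoveries_done;
}
```

In this model simulate_scan(2, 1) returns 0 and simulate_scan(2, 2)
returns 2: the first flush only drains the port events, and the second one
drains the discovery work they queued.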

--
Dan


[1]: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=kernel/workqueue.c;h=11869faa;hb=HEAD#l2201


