linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC v3 01/15] scsi: Drop struct Scsi_Host->host_lock usage in scsi_dispatch_cmd()
@ 2010-09-23 23:37 Nicholas A. Bellinger
  2010-09-24 13:41 ` Brian King
  0 siblings, 1 reply; 8+ messages in thread
From: Nicholas A. Bellinger @ 2010-09-23 23:37 UTC (permalink / raw)
  To: linux-scsi, linux-kernel, Vasu Dev, Tim Chen, Andi Kleen
  Cc: James Smart, Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke,
	Joe Eykholt, Christoph Hellwig, MPTFusionLinux, eata.c maintainer,
	Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

This patch converts scsi_dispatch_cmd() drop the struct Scsi_Host->host_lock
legacy usage so that SHT->queuecommand() can be called without this lock held
softirq interrupts disabled via spin_lock_irq()

This patch drops the usage of scsi_cmd_get_serial() in scsi_dispatch_cmd()
and assumes the legacy SCSI LLDs that depend upon struct scsi_cmnd->serial_number
will call the now EXPORT_SYMBOL()'ed scsi_cmd_get_serial() call.

This patch also updates scsi_error.c:scsi_try_to_abort_cmd() to use:

        if (test_bit(REQ_ATOM_COMPLETE, &scmd->request->atomic_flags))
                return SUCCESS;

instead of checking for (scmd->serial_number == 0) following a recommendation
by Mike Christie.

This patch also converts the remaining struct Scsi_Host->cmd_serial_number to
atomic_t following a recommedation by Joe Eykholt to start struct Scsi_Host->
cmd_serial_number at 1, and increment each serial_number by 2 so that the
serial is odd, and wraps to 1 instead of 0.

Many thanks to Vasu Dev, Tim Chen, Mike and Joe for their help with particular
series!

Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
---
 drivers/scsi/scsi.c       |   29 ++++++++++++++---------------
 drivers/scsi/scsi_error.c |   10 +++++++---
 include/scsi/scsi_cmnd.h  |    1 +
 include/scsi/scsi_host.h  |    5 ++---
 4 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index ad0ed21..84dbe76 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -628,18 +628,22 @@ void scsi_log_completion(struct scsi_cmnd *cmd, int disposition)
 
 /**
  * scsi_cmd_get_serial - Assign a serial number to a command
- * @host: the scsi host
  * @cmd: command to assign serial number to
  *
  * Description: a serial number identifies a request for error recovery
- * and debugging purposes.  Protected by the Host_Lock of host.
+ * and debugging purposes.  Called directly by SCSI LLDs that have a
+ * legacy requirement for struct scsi_cmnd->serial_number.
  */
-static inline void scsi_cmd_get_serial(struct Scsi_Host *host, struct scsi_cmnd *cmd)
+void scsi_cmd_get_serial(struct scsi_cmnd *cmd)
 {
-	cmd->serial_number = host->cmd_serial_number++;
-	if (cmd->serial_number == 0) 
-		cmd->serial_number = host->cmd_serial_number++;
+	struct Scsi_Host *host = cmd->device->host;
+	/*
+	 * Increment the host->cmd_serial_number by 2 so cmd->serial_number
+	 * is always odd and wraps to 1 instead of 0.
+	 */
+	cmd->serial_number = atomic_add_return(2, &host->cmd_serial_number);
 }
+EXPORT_SYMBOL(scsi_cmd_get_serial);
 
 /**
  * scsi_dispatch_command - Dispatch a command to the low-level driver.
@@ -651,7 +655,6 @@ static inline void scsi_cmd_get_serial(struct Scsi_Host *host, struct scsi_cmnd
 int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 {
 	struct Scsi_Host *host = cmd->device->host;
-	unsigned long flags = 0;
 	unsigned long timeout;
 	int rtn = 0;
 
@@ -736,15 +739,11 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 		scsi_done(cmd);
 		goto out;
 	}
-
-	spin_lock_irqsave(host->host_lock, flags);
 	/*
-	 * AK: unlikely race here: for some reason the timer could
-	 * expire before the serial number is set up below.
-	 *
-	 * TODO: kill serial or move to blk layer
+	 * Note that scsi_cmd_get_serial() used to be called here, but
+	 * now we expect the legacy SCSI LLDs that actually need this
+	 * to call it directly within their SHT->queuecommand() caller.
 	 */
-	scsi_cmd_get_serial(host, cmd); 
 
 	if (unlikely(host->shost_state == SHOST_DEL)) {
 		cmd->result = (DID_NO_CONNECT << 16);
@@ -753,7 +752,7 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 		trace_scsi_dispatch_cmd_start(cmd);
 		rtn = host->hostt->queuecommand(cmd, scsi_done);
 	}
-	spin_unlock_irqrestore(host->host_lock, flags);
+
 	if (rtn) {
 		trace_scsi_dispatch_cmd_error(cmd, rtn);
 		if (rtn != SCSI_MLQUEUE_DEVICE_BUSY &&
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 1de30eb..34a4fce 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -39,6 +39,8 @@
 #include "scsi_logging.h"
 #include "scsi_transport_api.h"
 
+#include <../block/blk.h> /* For REQ_ATOM_COMPLETE */
+
 #include <trace/events/scsi.h>
 
 #define SENSE_TIMEOUT		(10*HZ)
@@ -645,11 +647,13 @@ static int __scsi_try_to_abort_cmd(struct scsi_cmnd *scmd)
 static int scsi_try_to_abort_cmd(struct scsi_cmnd *scmd)
 {
 	/*
-	 * scsi_done was called just after the command timed out and before
-	 * we had a chance to process it. (db)
+	 * Use the struct request atomic_flags here to check if
+	 * block/blk.h:blk_mark_rq_complete() has already been called
+	 * from the block softirq
 	 */
-	if (scmd->serial_number == 0)
+	if (test_bit(REQ_ATOM_COMPLETE, &scmd->request->atomic_flags))
 		return SUCCESS;
+
 	return __scsi_try_to_abort_cmd(scmd);
 }
 
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index a5e885a..bbba4fa 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -136,6 +136,7 @@ extern struct scsi_cmnd *__scsi_get_command(struct Scsi_Host *, gfp_t);
 extern void scsi_put_command(struct scsi_cmnd *);
 extern void __scsi_put_command(struct Scsi_Host *, struct scsi_cmnd *,
 			       struct device *);
+extern void scsi_cmd_get_serial(struct scsi_cmnd *);
 extern void scsi_finish_command(struct scsi_cmnd *cmd);
 
 extern void *scsi_kmap_atomic_sg(struct scatterlist *sg, int sg_count,
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index b7bdecb..1b69511 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -602,10 +602,9 @@ struct Scsi_Host {
 	short unsigned int max_sectors;
 	unsigned long dma_boundary;
 	/* 
-	 * Used to assign serial numbers to the cmds.
-	 * Protected by the host lock.
+	 * Used to assign serial numbers to the cmds in scsi_cmd_get_serial()
 	 */
-	unsigned long cmd_serial_number;
+	atomic_t cmd_serial_number;
 	
 	unsigned active_mode:2;
 	unsigned unchecked_isa_dma:1;
-- 
1.7.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC v3 01/15] scsi: Drop struct Scsi_Host->host_lock usage in scsi_dispatch_cmd()
@ 2010-09-24  0:46 Nicholas A. Bellinger
  0 siblings, 0 replies; 8+ messages in thread
From: Nicholas A. Bellinger @ 2010-09-24  0:46 UTC (permalink / raw)
  To: linux-scsi, linux-kernel, Vasu Dev, Tim Chen, Andi Kleen
  Cc: James Smart, Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke,
	Joe Eykholt, Christoph Hellwig, MPTFusionLinux, eata.c maintainer,
	Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

Hi Guys,

This is a quick re-spin of patch #1 that was missing the hosts.c:scsi_host_alloc()
initialization of struct Scsi_Host->cmd_serial_number.  Thanks to Joe for
spotting this.

--------------------------------------------------------------------------------

This patch converts scsi_dispatch_cmd() drop the struct Scsi_Host->host_lock
legacy usage so that SHT->queuecommand() can be called without this lock held
softirq interrupts disabled via spin_lock_irq()

This patch drops the usage of scsi_cmd_get_serial() in scsi_dispatch_cmd()
and assumes the legacy SCSI LLDs that depend upon struct scsi_cmnd->serial_number
will call the now EXPORT_SYMBOL()'ed scsi_cmd_get_serial() call.

This patch also updates scsi_error.c:scsi_try_to_abort_cmd() to use:

        if (test_bit(REQ_ATOM_COMPLETE, &scmd->request->atomic_flags))
                return SUCCESS;

instead of checking for (scmd->serial_number == 0) following a recommendation
by Mike Christie.

This patch also converts the remaining struct Scsi_Host->cmd_serial_number to
atomic_t following a recommedation by Joe Eykholt to start struct Scsi_Host->
cmd_serial_number at 1, and increment each serial_number by 2 so that the
serial is odd, and wraps to 1 instead of 0.  struct Scsi_Host->cmd_serial_number
is initialized to '1' in drivers/scsi/hosts.c:scsi_host_alloc().

Many thanks to Vasu Dev, Tim Chen, Mike and Joe for their help with particular
series!

Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
---
 drivers/scsi/hosts.c      |    4 ++++
 drivers/scsi/scsi.c       |   29 ++++++++++++++---------------
 drivers/scsi/scsi_error.c |   10 +++++++---
 include/scsi/scsi_cmnd.h  |    1 +
 include/scsi/scsi_host.h  |    5 ++---
 5 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 8a8f803..aff1c9c 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -380,6 +380,10 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 	shost->unchecked_isa_dma = sht->unchecked_isa_dma;
 	shost->use_clustering = sht->use_clustering;
 	shost->ordered_tag = sht->ordered_tag;
+	/*
+	 * Set the default shost->cmd_serial_number to 1.
+	 */
+	atomic_set(&shost->cmd_serial_number, 1);
 
 	if (sht->supported_mode == MODE_UNKNOWN)
 		/* means we didn't set it ... default to INITIATOR */
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index ad0ed21..84dbe76 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -628,18 +628,22 @@ void scsi_log_completion(struct scsi_cmnd *cmd, int disposition)
 
 /**
  * scsi_cmd_get_serial - Assign a serial number to a command
- * @host: the scsi host
  * @cmd: command to assign serial number to
  *
  * Description: a serial number identifies a request for error recovery
- * and debugging purposes.  Protected by the Host_Lock of host.
+ * and debugging purposes.  Called directly by SCSI LLDs that have a
+ * legacy requirement for struct scsi_cmnd->serial_number.
  */
-static inline void scsi_cmd_get_serial(struct Scsi_Host *host, struct scsi_cmnd *cmd)
+void scsi_cmd_get_serial(struct scsi_cmnd *cmd)
 {
-	cmd->serial_number = host->cmd_serial_number++;
-	if (cmd->serial_number == 0) 
-		cmd->serial_number = host->cmd_serial_number++;
+	struct Scsi_Host *host = cmd->device->host;
+	/*
+	 * Increment the host->cmd_serial_number by 2 so cmd->serial_number
+	 * is always odd and wraps to 1 instead of 0.
+	 */
+	cmd->serial_number = atomic_add_return(2, &host->cmd_serial_number);
 }
+EXPORT_SYMBOL(scsi_cmd_get_serial);
 
 /**
  * scsi_dispatch_command - Dispatch a command to the low-level driver.
@@ -651,7 +655,6 @@ static inline void scsi_cmd_get_serial(struct Scsi_Host *host, struct scsi_cmnd
 int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 {
 	struct Scsi_Host *host = cmd->device->host;
-	unsigned long flags = 0;
 	unsigned long timeout;
 	int rtn = 0;
 
@@ -736,15 +739,11 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 		scsi_done(cmd);
 		goto out;
 	}
-
-	spin_lock_irqsave(host->host_lock, flags);
 	/*
-	 * AK: unlikely race here: for some reason the timer could
-	 * expire before the serial number is set up below.
-	 *
-	 * TODO: kill serial or move to blk layer
+	 * Note that scsi_cmd_get_serial() used to be called here, but
+	 * now we expect the legacy SCSI LLDs that actually need this
+	 * to call it directly within their SHT->queuecommand() caller.
 	 */
-	scsi_cmd_get_serial(host, cmd); 
 
 	if (unlikely(host->shost_state == SHOST_DEL)) {
 		cmd->result = (DID_NO_CONNECT << 16);
@@ -753,7 +752,7 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 		trace_scsi_dispatch_cmd_start(cmd);
 		rtn = host->hostt->queuecommand(cmd, scsi_done);
 	}
-	spin_unlock_irqrestore(host->host_lock, flags);
+
 	if (rtn) {
 		trace_scsi_dispatch_cmd_error(cmd, rtn);
 		if (rtn != SCSI_MLQUEUE_DEVICE_BUSY &&
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 1de30eb..34a4fce 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -39,6 +39,8 @@
 #include "scsi_logging.h"
 #include "scsi_transport_api.h"
 
+#include <../block/blk.h> /* For REQ_ATOM_COMPLETE */
+
 #include <trace/events/scsi.h>
 
 #define SENSE_TIMEOUT		(10*HZ)
@@ -645,11 +647,13 @@ static int __scsi_try_to_abort_cmd(struct scsi_cmnd *scmd)
 static int scsi_try_to_abort_cmd(struct scsi_cmnd *scmd)
 {
 	/*
-	 * scsi_done was called just after the command timed out and before
-	 * we had a chance to process it. (db)
+	 * Use the struct request atomic_flags here to check if
+	 * block/blk.h:blk_mark_rq_complete() has already been called
+	 * from the block softirq
 	 */
-	if (scmd->serial_number == 0)
+	if (test_bit(REQ_ATOM_COMPLETE, &scmd->request->atomic_flags))
 		return SUCCESS;
+
 	return __scsi_try_to_abort_cmd(scmd);
 }
 
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index a5e885a..bbba4fa 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -136,6 +136,7 @@ extern struct scsi_cmnd *__scsi_get_command(struct Scsi_Host *, gfp_t);
 extern void scsi_put_command(struct scsi_cmnd *);
 extern void __scsi_put_command(struct Scsi_Host *, struct scsi_cmnd *,
 			       struct device *);
+extern void scsi_cmd_get_serial(struct scsi_cmnd *);
 extern void scsi_finish_command(struct scsi_cmnd *cmd);
 
 extern void *scsi_kmap_atomic_sg(struct scatterlist *sg, int sg_count,
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index b7bdecb..1b69511 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -602,10 +602,9 @@ struct Scsi_Host {
 	short unsigned int max_sectors;
 	unsigned long dma_boundary;
 	/* 
-	 * Used to assign serial numbers to the cmds.
-	 * Protected by the host lock.
+	 * Used to assign serial numbers to the cmds in scsi_cmd_get_serial()
 	 */
-	unsigned long cmd_serial_number;
+	atomic_t cmd_serial_number;
 	
 	unsigned active_mode:2;
 	unsigned unchecked_isa_dma:1;
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC v3 01/15] scsi: Drop struct Scsi_Host->host_lock usage in scsi_dispatch_cmd()
  2010-09-23 23:37 [RFC v3 01/15] scsi: Drop struct Scsi_Host->host_lock usage in scsi_dispatch_cmd() Nicholas A. Bellinger
@ 2010-09-24 13:41 ` Brian King
  2010-09-24 20:44   ` Nicholas A. Bellinger
  0 siblings, 1 reply; 8+ messages in thread
From: Brian King @ 2010-09-24 13:41 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: linux-scsi, linux-kernel, Vasu Dev, Tim Chen, Andi Kleen,
	Matthew Wilcox, James Bottomley, Mike Christie, James Smart,
	Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke, Joe Eykholt,
	Christoph Hellwig, MPTFusionLinux, eata.c maintainer

On 09/23/2010 06:37 PM, Nicholas A. Bellinger wrote:
> @@ -651,7 +655,6 @@ static inline void scsi_cmd_get_serial(struct Scsi_Host *host, struct scsi_cmnd
>  int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
>  {
>  	struct Scsi_Host *host = cmd->device->host;
> -	unsigned long flags = 0;
>  	unsigned long timeout;
>  	int rtn = 0;
> 
> @@ -736,15 +739,11 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
>  		scsi_done(cmd);
>  		goto out;
>  	}
> -
> -	spin_lock_irqsave(host->host_lock, flags);
>  	/*
> -	 * AK: unlikely race here: for some reason the timer could
> -	 * expire before the serial number is set up below.
> -	 *
> -	 * TODO: kill serial or move to blk layer
> +	 * Note that scsi_cmd_get_serial() used to be called here, but
> +	 * now we expect the legacy SCSI LLDs that actually need this
> +	 * to call it directly within their SHT->queuecommand() caller.
>  	 */
> -	scsi_cmd_get_serial(host, cmd); 
> 
>  	if (unlikely(host->shost_state == SHOST_DEL)) {
>  		cmd->result = (DID_NO_CONNECT << 16);
> @@ -753,7 +752,7 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
>  		trace_scsi_dispatch_cmd_start(cmd);
>  		rtn = host->hostt->queuecommand(cmd, scsi_done);
>  	}
> -	spin_unlock_irqrestore(host->host_lock, flags);
> +
>  	if (rtn) {
>  		trace_scsi_dispatch_cmd_error(cmd, rtn);
>  		if (rtn != SCSI_MLQUEUE_DEVICE_BUSY &&

Are you planning a future revision that moves the acquiring of the host lock
into the LLDD's queuecommand for all the other drivers you don't currently
touch in this patch set?

-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC v3 01/15] scsi: Drop struct Scsi_Host->host_lock usage in scsi_dispatch_cmd()
  2010-09-24 13:41 ` Brian King
@ 2010-09-24 20:44   ` Nicholas A. Bellinger
  2010-09-24 21:10     ` Brian King
  2010-09-27 14:19     ` Christof Schmitt
  0 siblings, 2 replies; 8+ messages in thread
From: Nicholas A. Bellinger @ 2010-09-24 20:44 UTC (permalink / raw)
  To: Brian King
  Cc: linux-scsi, linux-kernel, Vasu Dev, Tim Chen, Andi Kleen,
	Matthew Wilcox, James Bottomley, Mike Christie, James Smart,
	Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke, Joe Eykholt,
	Christoph Hellwig, MPTFusionLinux, eata.c maintainer

On Fri, 2010-09-24 at 08:41 -0500, Brian King wrote:
> On 09/23/2010 06:37 PM, Nicholas A. Bellinger wrote:
> > @@ -651,7 +655,6 @@ static inline void scsi_cmd_get_serial(struct Scsi_Host *host, struct scsi_cmnd
> >  int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> >  {
> >  	struct Scsi_Host *host = cmd->device->host;
> > -	unsigned long flags = 0;
> >  	unsigned long timeout;
> >  	int rtn = 0;
> > 
> > @@ -736,15 +739,11 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> >  		scsi_done(cmd);
> >  		goto out;
> >  	}
> > -
> > -	spin_lock_irqsave(host->host_lock, flags);
> >  	/*
> > -	 * AK: unlikely race here: for some reason the timer could
> > -	 * expire before the serial number is set up below.
> > -	 *
> > -	 * TODO: kill serial or move to blk layer
> > +	 * Note that scsi_cmd_get_serial() used to be called here, but
> > +	 * now we expect the legacy SCSI LLDs that actually need this
> > +	 * to call it directly within their SHT->queuecommand() caller.
> >  	 */
> > -	scsi_cmd_get_serial(host, cmd); 
> > 
> >  	if (unlikely(host->shost_state == SHOST_DEL)) {
> >  		cmd->result = (DID_NO_CONNECT << 16);
> > @@ -753,7 +752,7 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> >  		trace_scsi_dispatch_cmd_start(cmd);
> >  		rtn = host->hostt->queuecommand(cmd, scsi_done);
> >  	}
> > -	spin_unlock_irqrestore(host->host_lock, flags);
> > +
> >  	if (rtn) {
> >  		trace_scsi_dispatch_cmd_error(cmd, rtn);
> >  		if (rtn != SCSI_MLQUEUE_DEVICE_BUSY &&
> 
> Are you planning a future revision that moves the acquiring of the host lock
> into the LLDD's queuecommand for all the other drivers you don't currently
> touch in this patch set?
> 

Hi Brian,

I was under the impression that this would be unnecessary for the vast
majority of existing LLD drivers, but if you are aware of specific LLDs
that would still need the struct Scsi_Host->host_lock held in their
SHT->queuecommand() for whaterver reason please let me know and I would
be happy to include this into an RFCv4.

Thanks for your comments!

--nab

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC v3 01/15] scsi: Drop struct Scsi_Host->host_lock usage in scsi_dispatch_cmd()
  2010-09-24 20:44   ` Nicholas A. Bellinger
@ 2010-09-24 21:10     ` Brian King
  2010-09-24 21:12       ` Nicholas A. Bellinger
  2010-09-27 14:19     ` Christof Schmitt
  1 sibling, 1 reply; 8+ messages in thread
From: Brian King @ 2010-09-24 21:10 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: linux-scsi, linux-kernel, Vasu Dev, Tim Chen, Andi Kleen,
	Matthew Wilcox, James Bottomley, Mike Christie, James Smart,
	Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke, Joe Eykholt,
	Christoph Hellwig, MPTFusionLinux, eata.c maintainer

On 09/24/2010 03:44 PM, Nicholas A. Bellinger wrote:
> On Fri, 2010-09-24 at 08:41 -0500, Brian King wrote:
>> On 09/23/2010 06:37 PM, Nicholas A. Bellinger wrote:
>>> @@ -651,7 +655,6 @@ static inline void scsi_cmd_get_serial(struct Scsi_Host *host, struct scsi_cmnd
>>>  int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
>>>  {
>>>  	struct Scsi_Host *host = cmd->device->host;
>>> -	unsigned long flags = 0;
>>>  	unsigned long timeout;
>>>  	int rtn = 0;
>>>
>>> @@ -736,15 +739,11 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
>>>  		scsi_done(cmd);
>>>  		goto out;
>>>  	}
>>> -
>>> -	spin_lock_irqsave(host->host_lock, flags);
>>>  	/*
>>> -	 * AK: unlikely race here: for some reason the timer could
>>> -	 * expire before the serial number is set up below.
>>> -	 *
>>> -	 * TODO: kill serial or move to blk layer
>>> +	 * Note that scsi_cmd_get_serial() used to be called here, but
>>> +	 * now we expect the legacy SCSI LLDs that actually need this
>>> +	 * to call it directly within their SHT->queuecommand() caller.
>>>  	 */
>>> -	scsi_cmd_get_serial(host, cmd); 
>>>
>>>  	if (unlikely(host->shost_state == SHOST_DEL)) {
>>>  		cmd->result = (DID_NO_CONNECT << 16);
>>> @@ -753,7 +752,7 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
>>>  		trace_scsi_dispatch_cmd_start(cmd);
>>>  		rtn = host->hostt->queuecommand(cmd, scsi_done);
>>>  	}
>>> -	spin_unlock_irqrestore(host->host_lock, flags);
>>> +
>>>  	if (rtn) {
>>>  		trace_scsi_dispatch_cmd_error(cmd, rtn);
>>>  		if (rtn != SCSI_MLQUEUE_DEVICE_BUSY &&
>>
>> Are you planning a future revision that moves the acquiring of the host lock
>> into the LLDD's queuecommand for all the other drivers you don't currently
>> touch in this patch set?
>>
> 
> Hi Brian,
> 
> I was under the impression that this would be unnecessary for the vast
> majority of existing LLD drivers, but if you are aware of specific LLDs
> that would still need the struct Scsi_Host->host_lock held in their
> SHT->queuecommand() for whaterver reason please let me know and I would
> be happy to include this into an RFCv4.

I would think that most drivers might have issues without some pretty careful
auditing. When Christoph did this for the EH handlers, the first step was to
simply move acquiring the host lock into the LLDs. That way we can optimize
drivers one at a time after ensuring they can run lockless in their queuecommand
handler.

A couple examples of possible issues with drivers I'm familiar with (ibmvfc, ipr):

* Some drivers will do list manipulation for resources needed to send commands. If
  done lockless, this could result in list corruption with multiple readers/writers.

* Some drivers check the state of the hardware before sending a command. Failing to
  do this when the hardware is being reset may result in nasty things like PCI bus
  errors or even sending a command to the wrong device.

These are all the sorts of errors that will be very difficult to hit but have
pretty bad consequence when they are hit.

Thanks,

Brian

-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC v3 01/15] scsi: Drop struct Scsi_Host->host_lock usage in scsi_dispatch_cmd()
  2010-09-24 21:10     ` Brian King
@ 2010-09-24 21:12       ` Nicholas A. Bellinger
  0 siblings, 0 replies; 8+ messages in thread
From: Nicholas A. Bellinger @ 2010-09-24 21:12 UTC (permalink / raw)
  To: Brian King
  Cc: linux-scsi, linux-kernel, Vasu Dev, Tim Chen, Andi Kleen,
	Matthew Wilcox, James Bottomley, Mike Christie, James Smart,
	Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke, Joe Eykholt,
	Christoph Hellwig, MPTFusionLinux, eata.c maintainer

On Fri, 2010-09-24 at 16:10 -0500, Brian King wrote:
> On 09/24/2010 03:44 PM, Nicholas A. Bellinger wrote:
> > On Fri, 2010-09-24 at 08:41 -0500, Brian King wrote:
> >> On 09/23/2010 06:37 PM, Nicholas A. Bellinger wrote:
> >>> @@ -651,7 +655,6 @@ static inline void scsi_cmd_get_serial(struct Scsi_Host *host, struct scsi_cmnd
> >>>  int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> >>>  {
> >>>  	struct Scsi_Host *host = cmd->device->host;
> >>> -	unsigned long flags = 0;
> >>>  	unsigned long timeout;
> >>>  	int rtn = 0;
> >>>
> >>> @@ -736,15 +739,11 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> >>>  		scsi_done(cmd);
> >>>  		goto out;
> >>>  	}
> >>> -
> >>> -	spin_lock_irqsave(host->host_lock, flags);
> >>>  	/*
> >>> -	 * AK: unlikely race here: for some reason the timer could
> >>> -	 * expire before the serial number is set up below.
> >>> -	 *
> >>> -	 * TODO: kill serial or move to blk layer
> >>> +	 * Note that scsi_cmd_get_serial() used to be called here, but
> >>> +	 * now we expect the legacy SCSI LLDs that actually need this
> >>> +	 * to call it directly within their SHT->queuecommand() caller.
> >>>  	 */
> >>> -	scsi_cmd_get_serial(host, cmd); 
> >>>
> >>>  	if (unlikely(host->shost_state == SHOST_DEL)) {
> >>>  		cmd->result = (DID_NO_CONNECT << 16);
> >>> @@ -753,7 +752,7 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> >>>  		trace_scsi_dispatch_cmd_start(cmd);
> >>>  		rtn = host->hostt->queuecommand(cmd, scsi_done);
> >>>  	}
> >>> -	spin_unlock_irqrestore(host->host_lock, flags);
> >>> +
> >>>  	if (rtn) {
> >>>  		trace_scsi_dispatch_cmd_error(cmd, rtn);
> >>>  		if (rtn != SCSI_MLQUEUE_DEVICE_BUSY &&
> >>
> >> Are you planning a future revision that moves the acquiring of the host lock
> >> into the LLDD's queuecommand for all the other drivers you don't currently
> >> touch in this patch set?
> >>
> > 
> > Hi Brian,
> > 
> > I was under the impression that this would be unnecessary for the vast
> > majority of existing LLD drivers, but if you are aware of specific LLDs
> > that would still need the struct Scsi_Host->host_lock held in their
> > SHT->queuecommand() for whaterver reason please let me know and I would
> > be happy to include this into an RFCv4.
> 
> I would think that most drivers might have issues without some pretty careful
> auditing. When Christoph did this for the EH handlers, the first step was to
> simply move acquiring the host lock into the LLDs. That way we can optimize
> drivers one at a time after ensuring they can run lockless in their queuecommand
> handler.
> 
> A couple examples of possible issues with drivers I'm familiar with (ibmvfc, ipr):
> 
> * Some drivers will do list manipulation for resources needed to send commands. If
>   done lockless, this could result in list corruption with multiple readers/writers.
> 
> * Some drivers check the state of the hardware before sending a command. Failing to
>   do this when the hardware is being reset may result in nasty things like PCI bus
>   errors or even sending a command to the wrong device.
> 

Indeed, I can very much see some older LLDs making these types of
assumptions.

> These are all the sorts of errors that will be very difficult to hit but have
> pretty bad consequence when they are hit.
> 

I think pretty bad would be an under-statement when running into either
of the above two items in ancient LLD code.

In that case, I will re-spin a v4 series that contains a legacy
host_lock held w/ SHT->queuecomand() for all of the "historically high
host_lock optimized in ->queuecommand()" LLDs that are in RFCv3, and
include the other specific ones (namely mpt/fusion and mpt2sas) that we
know are safe to drop host_lock.

Many thanks for your invaluable input on some of the non-obvious items
at play here.

Best,

--nab



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC v3 01/15] scsi: Drop struct Scsi_Host->host_lock usage in scsi_dispatch_cmd()
  2010-09-24 20:44   ` Nicholas A. Bellinger
  2010-09-24 21:10     ` Brian King
@ 2010-09-27 14:19     ` Christof Schmitt
  2010-09-27 23:15       ` Nicholas A. Bellinger
  1 sibling, 1 reply; 8+ messages in thread
From: Christof Schmitt @ 2010-09-27 14:19 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Brian King, linux-scsi, linux-kernel, Vasu Dev, Tim Chen,
	Andi Kleen, Matthew Wilcox, James Bottomley, Mike Christie,
	James Smart, Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke,
	Joe Eykholt, Christoph Hellwig, MPTFusionLinux, eata.c maintainer

On Fri, Sep 24, 2010 at 01:44:00PM -0700, Nicholas A. Bellinger wrote:
> On Fri, 2010-09-24 at 08:41 -0500, Brian King wrote:
> > On 09/23/2010 06:37 PM, Nicholas A. Bellinger wrote:
> > > @@ -651,7 +655,6 @@ static inline void scsi_cmd_get_serial(struct Scsi_Host *host, struct scsi_cmnd
> > >  int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> > >  {
> > >  	struct Scsi_Host *host = cmd->device->host;
> > > -	unsigned long flags = 0;
> > >  	unsigned long timeout;
> > >  	int rtn = 0;
> > > 
> > > @@ -736,15 +739,11 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> > >  		scsi_done(cmd);
> > >  		goto out;
> > >  	}
> > > -
> > > -	spin_lock_irqsave(host->host_lock, flags);
> > >  	/*
> > > -	 * AK: unlikely race here: for some reason the timer could
> > > -	 * expire before the serial number is set up below.
> > > -	 *
> > > -	 * TODO: kill serial or move to blk layer
> > > +	 * Note that scsi_cmd_get_serial() used to be called here, but
> > > +	 * now we expect the legacy SCSI LLDs that actually need this
> > > +	 * to call it directly within their SHT->queuecommand() caller.
> > >  	 */
> > > -	scsi_cmd_get_serial(host, cmd); 
> > > 
> > >  	if (unlikely(host->shost_state == SHOST_DEL)) {
> > >  		cmd->result = (DID_NO_CONNECT << 16);
> > > @@ -753,7 +752,7 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> > >  		trace_scsi_dispatch_cmd_start(cmd);
> > >  		rtn = host->hostt->queuecommand(cmd, scsi_done);
> > >  	}
> > > -	spin_unlock_irqrestore(host->host_lock, flags);
> > > +
> > >  	if (rtn) {
> > >  		trace_scsi_dispatch_cmd_error(cmd, rtn);
> > >  		if (rtn != SCSI_MLQUEUE_DEVICE_BUSY &&
> > 
> > Are you planning a future revision that moves the acquiring of the host lock
> > into the LLDD's queuecommand for all the other drivers you don't currently
> > touch in this patch set?
> > 
> 
> Hi Brian,
> 
> I was under the impression that this would be unnecessary for the vast
> majority of existing LLD drivers, but if you are aware of specific LLDs
> that would still need the struct Scsi_Host->host_lock held in their
> SHT->queuecommand() for whaterver reason please let me know and I would
> be happy to include this into an RFCv4.
> 
> Thanks for your comments!

zfcp relies on having the interrupts disabled when calling
queuecommand. Without the spin_lock_irqsave in scsi_dispatch_cmd, the
locking in zfcp_fsf_send_fcp_command_task has to be changed from
spin_lock(&qdio->req_q_lock) to spin_lock_irqsave. It is a simple
change, but other drivers might have similar requirements.

Christof

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC v3 01/15] scsi: Drop struct Scsi_Host->host_lock usage in scsi_dispatch_cmd()
  2010-09-27 14:19     ` Christof Schmitt
@ 2010-09-27 23:15       ` Nicholas A. Bellinger
  0 siblings, 0 replies; 8+ messages in thread
From: Nicholas A. Bellinger @ 2010-09-27 23:15 UTC (permalink / raw)
  To: Christof Schmitt
  Cc: Brian King, linux-scsi, linux-kernel, Vasu Dev, Tim Chen,
	Andi Kleen, Matthew Wilcox, James Bottomley, Mike Christie,
	James Smart, Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke,
	Joe Eykholt, Christoph Hellwig, MPTFusionLinux, eata.c maintainer

On Mon, 2010-09-27 at 16:19 +0200, Christof Schmitt wrote:
> On Fri, Sep 24, 2010 at 01:44:00PM -0700, Nicholas A. Bellinger wrote:
> > On Fri, 2010-09-24 at 08:41 -0500, Brian King wrote:
> > > On 09/23/2010 06:37 PM, Nicholas A. Bellinger wrote:
> > > > @@ -651,7 +655,6 @@ static inline void scsi_cmd_get_serial(struct Scsi_Host *host, struct scsi_cmnd
> > > >  int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> > > >  {
> > > >  	struct Scsi_Host *host = cmd->device->host;
> > > > -	unsigned long flags = 0;
> > > >  	unsigned long timeout;
> > > >  	int rtn = 0;
> > > > 
> > > > @@ -736,15 +739,11 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> > > >  		scsi_done(cmd);
> > > >  		goto out;
> > > >  	}
> > > > -
> > > > -	spin_lock_irqsave(host->host_lock, flags);
> > > >  	/*
> > > > -	 * AK: unlikely race here: for some reason the timer could
> > > > -	 * expire before the serial number is set up below.
> > > > -	 *
> > > > -	 * TODO: kill serial or move to blk layer
> > > > +	 * Note that scsi_cmd_get_serial() used to be called here, but
> > > > +	 * now we expect the legacy SCSI LLDs that actually need this
> > > > +	 * to call it directly within their SHT->queuecommand() caller.
> > > >  	 */
> > > > -	scsi_cmd_get_serial(host, cmd); 
> > > > 
> > > >  	if (unlikely(host->shost_state == SHOST_DEL)) {
> > > >  		cmd->result = (DID_NO_CONNECT << 16);
> > > > @@ -753,7 +752,7 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> > > >  		trace_scsi_dispatch_cmd_start(cmd);
> > > >  		rtn = host->hostt->queuecommand(cmd, scsi_done);
> > > >  	}
> > > > -	spin_unlock_irqrestore(host->host_lock, flags);
> > > > +
> > > >  	if (rtn) {
> > > >  		trace_scsi_dispatch_cmd_error(cmd, rtn);
> > > >  		if (rtn != SCSI_MLQUEUE_DEVICE_BUSY &&
> > > 
> > > Are you planning a future revision that moves the acquiring of the host lock
> > > into the LLDD's queuecommand for all the other drivers you don't currently
> > > touch in this patch set?
> > > 
> > 
> > Hi Brian,
> > 
> > I was under the impression that this would be unnecessary for the vast
> > majority of existing LLD drivers, but if you are aware of specific LLDs
> > that would still need the struct Scsi_Host->host_lock held in their
> > SHT->queuecommand() for whaterver reason please let me know and I would
> > be happy to include this into an RFCv4.
> > 
> > Thanks for your comments!
> 
> zfcp relies on having the interrupts disabled when calling
> queuecommand. Without the spin_lock_irqsave in scsi_dispatch_cmd, the
> locking in zfcp_fsf_send_fcp_command_task has to be changed from
> spin_lock(&qdio->req_q_lock) to spin_lock_irqsave. It is a simple
> change, but other drivers might have similar requirements.
> 

Very good to know, thanks for this useful bit of knowledge Christof.

So my next steps for an RFC v4 of this series is to:

*) Create a seperate scsi_legacy_dispatch_cmd() that still uses
host_lock for SHT->queuecommand(), and enable this by default for all
the LLDs not included in the RFCv3 series to drop their historical
'->queuecommand() unlock -> do_work() -> lock' optimization and that we
know are 100% sure we can run w/o host_lock being held.  Also note that
MPT/Fusion and mpt2sas will do not require host_lock as Tim Chen has
demonstrated with his testing.  (Any LSI folks comments here..?)

*) Following Mike Anderson's comments wrt to the signaling
scsi_try_to_abort_cmd() callpath for block softirq completion w/o struct
scsi_cmnd->serial_number, and how it currently breaks scsi_unjam_host ->
scsi_eh_abort_cmds -> scsi_try_to_abort_cmd().

I don't for see any particular issues with the former at this point, and
I will be reposting with these changes soon.  For the latter it appears
we need a seperate struct request->atomic_flags bit for signaling the
call from block softirq context, so that the scsi_try_to_abort_cmd() can
still complete the outstanding struct scsi_cmnd descriptor in the EH
scsi_unjam_host() path.  Any other comments on this particular item from
Intel, IBM and other distro kernel folks would be apperciated.

Thanks for your comments!

--nab
	 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2010-09-27 23:15 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-23 23:37 [RFC v3 01/15] scsi: Drop struct Scsi_Host->host_lock usage in scsi_dispatch_cmd() Nicholas A. Bellinger
2010-09-24 13:41 ` Brian King
2010-09-24 20:44   ` Nicholas A. Bellinger
2010-09-24 21:10     ` Brian King
2010-09-24 21:12       ` Nicholas A. Bellinger
2010-09-27 14:19     ` Christof Schmitt
2010-09-27 23:15       ` Nicholas A. Bellinger
  -- strict thread matches above, loose matches on Subject: below --
2010-09-24  0:46 Nicholas A. Bellinger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).