linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/1] mid-layer unblocks blocked sdev leaving queue stopped
@ 2009-12-16 22:43 Michael Reed
  2010-02-10 22:24 ` Michael Reed
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Reed @ 2009-12-16 22:43 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Jeremy Higdon, Robin Holt

scsi_add_lun() and scsi_sysfs_add_sdev() unconditionally set the sdev
state to SDEV_RUNNING.  If this sdev was blocked via scsi_internal_device_block()
the changing of the state to running will result in the device's queue
remaining stopped after the call to scsi_internal_device_unblock() and
commands becoming stuck.  This patch changes scsi_add_lun() and
scsi_sysfs_add_sdev() to not move the sdev to running if it is
blocked. It leaves it to the blocker of the sdev to unblock it.

Found during large lun count testing using 13 FC host adapter ports.
In my testing the sdev was blocked due to disruptions or events on the
fibre channel link.

Signed-off-by: Michael Reed <mdr@sgi.com>

==

--- a/drivers/scsi/scsi_sysfs.c	2009-12-16 16:26:14.512882449 -0600
+++ scsi-misc/drivers/scsi/scsi_sysfs.c	2009-12-16 16:23:33.352882975 -0600
@@ -878,8 +878,14 @@ int scsi_sysfs_add_sdev(struct scsi_devi
 	struct request_queue *rq = sdev->request_queue;
 	struct scsi_target *starget = sdev->sdev_target;
 
-	if ((error = scsi_device_set_state(sdev, SDEV_RUNNING)) != 0)
-		return error;
+	/*
+	 * If device is blocked, leave state alone and let
+	 * blocker unblock when appropriate.
+	 */
+	if (sdev->sdev_state != SDEV_BLOCK) {
+		if ((error = scsi_device_set_state(sdev, SDEV_RUNNING)) != 0)
+			return error;
+	}
 
 	error = scsi_target_add(starget);
 	if (error)
--- a/drivers/scsi/scsi_scan.c	2009-12-16 16:26:14.512882449 -0600
+++ scsi-misc/drivers/scsi/scsi_scan.c	2009-12-16 16:25:58.084976675 -0600
@@ -902,17 +902,22 @@ static int scsi_add_lun(struct scsi_devi
 	if (*bflags & BLIST_USE_10_BYTE_MS)
 		sdev->use_10_for_ms = 1;
 
-	/* set the device running here so that slave configure
-	 * may do I/O */
-	ret = scsi_device_set_state(sdev, SDEV_RUNNING);
-	if (ret) {
-		ret = scsi_device_set_state(sdev, SDEV_BLOCK);
-
+	/*
+	 * If device is blocked, leave state alone and let blocker
+	 * unblock when appropriate.  Otherwise, set the device
+	 * running here so that slave configure may perform i/o.
+	 */
+	if (sdev->sdev_state != SDEV_BLOCK) {
+		ret = scsi_device_set_state(sdev, SDEV_RUNNING);
 		if (ret) {
-			sdev_printk(KERN_ERR, sdev,
+			ret = scsi_device_set_state(sdev, SDEV_BLOCK);
+
+			if (ret) {
+				sdev_printk(KERN_ERR, sdev,
 				    "in wrong state %s to complete scan\n",
 				    scsi_device_state_name(sdev->sdev_state));
-			return SCSI_SCAN_NO_RESPONSE;
+				return SCSI_SCAN_NO_RESPONSE;
+			}
 		}
 	}
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/1] mid-layer unblocks blocked sdev leaving queue stopped
  2009-12-16 22:43 [PATCH 1/1] mid-layer unblocks blocked sdev leaving queue stopped Michael Reed
@ 2010-02-10 22:24 ` Michael Reed
  2010-02-10 22:57   ` Mike Christie
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Reed @ 2010-02-10 22:24 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, James Bottomley

No comments after almost 2 months?  This is a real problem resulting in hung i/o.
Any thoughts on when / if this might be integrated?

Thanks,
 Mike


Michael Reed wrote:
> scsi_add_lun() and scsi_sysfs_add_sdev() unconditionally set the sdev
> state to SDEV_RUNNING.  If this sdev was blocked via scsi_internal_device_block()
> the changing of the state to running will result in the device's queue
> remaining stopped after the call to scsi_internal_device_unblock() and
> commands becoming stuck.  This patch changes scsi_add_lun() and
> scsi_sysfs_add_sdev() to not move the sdev to running if it is
> blocked. It leaves it to the blocker of the sdev to unblock it.
> 
> Found during large lun count testing using 13 FC host adapter ports.
> In my testing the sdev was blocked due to disruptions or events on the
> fibre channel link.
> 
> Signed-off-by: Michael Reed <mdr@sgi.com>
> 
> ==
> 
> --- a/drivers/scsi/scsi_sysfs.c	2009-12-16 16:26:14.512882449 -0600
> +++ scsi-misc/drivers/scsi/scsi_sysfs.c	2009-12-16 16:23:33.352882975 -0600
> @@ -878,8 +878,14 @@ int scsi_sysfs_add_sdev(struct scsi_devi
>  	struct request_queue *rq = sdev->request_queue;
>  	struct scsi_target *starget = sdev->sdev_target;
>  
> -	if ((error = scsi_device_set_state(sdev, SDEV_RUNNING)) != 0)
> -		return error;
> +	/*
> +	 * If device is blocked, leave state alone and let
> +	 * blocker unblock when appropriate.
> +	 */
> +	if (sdev->sdev_state != SDEV_BLOCK) {
> +		if ((error = scsi_device_set_state(sdev, SDEV_RUNNING)) != 0)
> +			return error;
> +	}
>  
>  	error = scsi_target_add(starget);
>  	if (error)
> --- a/drivers/scsi/scsi_scan.c	2009-12-16 16:26:14.512882449 -0600
> +++ scsi-misc/drivers/scsi/scsi_scan.c	2009-12-16 16:25:58.084976675 -0600
> @@ -902,17 +902,22 @@ static int scsi_add_lun(struct scsi_devi
>  	if (*bflags & BLIST_USE_10_BYTE_MS)
>  		sdev->use_10_for_ms = 1;
>  
> -	/* set the device running here so that slave configure
> -	 * may do I/O */
> -	ret = scsi_device_set_state(sdev, SDEV_RUNNING);
> -	if (ret) {
> -		ret = scsi_device_set_state(sdev, SDEV_BLOCK);
> -
> +	/*
> +	 * If device is blocked, leave state alone and let blocker
> +	 * unblock when appropriate.  Otherwise, set the device
> +	 * running here so that slave configure may perform i/o.
> +	 */
> +	if (sdev->sdev_state != SDEV_BLOCK) {
> +		ret = scsi_device_set_state(sdev, SDEV_RUNNING);
>  		if (ret) {
> -			sdev_printk(KERN_ERR, sdev,
> +			ret = scsi_device_set_state(sdev, SDEV_BLOCK);
> +
> +			if (ret) {
> +				sdev_printk(KERN_ERR, sdev,
>  				    "in wrong state %s to complete scan\n",
>  				    scsi_device_state_name(sdev->sdev_state));
> -			return SCSI_SCAN_NO_RESPONSE;
> +				return SCSI_SCAN_NO_RESPONSE;
> +			}
>  		}
>  	}
>  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/1] mid-layer unblocks blocked sdev leaving queue stopped
  2010-02-10 22:24 ` Michael Reed
@ 2010-02-10 22:57   ` Mike Christie
  2010-02-16 21:48     ` Michael Reed
  0 siblings, 1 reply; 6+ messages in thread
From: Mike Christie @ 2010-02-10 22:57 UTC (permalink / raw)
  To: Michael Reed; +Cc: linux-scsi, James Smart, James Bottomley

On 02/10/2010 04:24 PM, Michael Reed wrote:
>> +	/*
>> +	 * If device is blocked, leave state alone and let blocker
>> +	 * unblock when appropriate.  Otherwise, set the device
>> +	 * running here so that slave configure may perform i/o.
>> +	 */
>> +	if (sdev->sdev_state != SDEV_BLOCK) {
>> +		ret = scsi_device_set_state(sdev, SDEV_RUNNING);

Do we need locking here? Is it possible that right after we check the 
sdev_state for being blocked, it could be be set to blocked?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/1] mid-layer unblocks blocked sdev leaving queue stopped
  2010-02-10 22:57   ` Mike Christie
@ 2010-02-16 21:48     ` Michael Reed
  2010-02-24 20:17       ` Michael Reed
  2010-03-09 16:14       ` James Smart
  0 siblings, 2 replies; 6+ messages in thread
From: Michael Reed @ 2010-02-16 21:48 UTC (permalink / raw)
  To: Mike Christie; +Cc: linux-scsi, James Smart, James Bottomley

Mike Christie wrote:
> On 02/10/2010 04:24 PM, Michael Reed wrote:
>>> +	/*
>>> +	 * If device is blocked, leave state alone and let blocker
>>> +	 * unblock when appropriate.  Otherwise, set the device
>>> +	 * running here so that slave configure may perform i/o.
>>> +	 */
>>> +	if (sdev->sdev_state != SDEV_BLOCK) {
>>> +		ret = scsi_device_set_state(sdev, SDEV_RUNNING);
> 
> Do we need locking here? Is it possible that right after we check the 
> sdev_state for being blocked, it could be be set to blocked?

Since the sdev_state can be set to blocked upon entry to the routines,
it seems likely that it could transition as you describe.  The host_lock
appears to be the right lock for this.  Nice catch.  It does, however,
open a can of worms.  The documentation for the _block()/_unblock()
functions state that it is assumed the host_lock is held upon entry.
This doesn't line up with the code, which explicitly releases the
host_lock before calling scsi_internal_device_[un]block().

I previously toyed around with an alternate method of resolving this which
put the onus on the unblocking thread which doesn't care about the
sdev_state.  As scsi_internal_device_block() is the routine which stops
the queue, I made scsi_internal_device_unblock() allow for the
possibility that another thread had transitioned sdev_state from SDEV_BLOCK.

It's been long enough that I don't recall why I chose the patch I
posted to resolve the issue, but this patch was also effective at
resolving the issue.  It also confines the change to the _block()/
_unblock() pair of routines and doesn't require changes to the
existing locking logic.  In retrospect, this patch was probably the
one I should have chosen to post.

Please review.

Signed-off-by: Michael Reed <mdr@sgi.com>

--- a/drivers/scsi/scsi_lib.c	2010-02-08 11:19:57.000000000 -0600
+++ b/drivers/scsi/scsi_lib.c	2010-02-11 14:35:26.249142688 -0600
@@ -2376,7 +2376,7 @@ EXPORT_SYMBOL(scsi_target_resume);
  *	(which must be a legal transition).  When the device is in this
  *	state, all commands are deferred until the scsi lld reenables
  *	the device with scsi_device_unblock or device_block_tmo fires.
- *	This routine assumes the host_lock is held on entry.
+ *	This routine assumes the host_lock is not held on entry.
  */
 int
 scsi_internal_device_block(struct scsi_device *sdev)
@@ -2420,7 +2420,7 @@ EXPORT_SYMBOL_GPL(scsi_internal_device_b
  *	This routine transitions the device to the SDEV_RUNNING state
  *	(which must be a legal transition) allowing the midlayer to
  *	goose the queue for this device.  This routine assumes the 
- *	host_lock is held upon entry.
+ *	host_lock is not held upon entry.
  */
 int
 scsi_internal_device_unblock(struct scsi_device *sdev)
@@ -2431,15 +2431,23 @@ scsi_internal_device_unblock(struct scsi
 	/* 
 	 * Try to transition the scsi device to SDEV_RUNNING
 	 * and goose the device queue if successful.  
+	 * It's possible that an sdev may be blocked before it is
+	 * completely set up.  scsi_sysfs_add_sdev() and scsi_add_lun()
+	 * will unconditionally attempt to transition the sdev to
+	 * SDEV_RUNNING.  To avoid leaving the queue stopped, we
+	 * allow for the sdev to already be in the SDEV_RUNNING state.
 	 */
+	spin_lock_irqsave(q->queue_lock, flags);
+
 	if (sdev->sdev_state == SDEV_BLOCK)
 		sdev->sdev_state = SDEV_RUNNING;
 	else if (sdev->sdev_state == SDEV_CREATED_BLOCK)
 		sdev->sdev_state = SDEV_CREATED;
-	else
+	else if (sdev->sdev_state != SDEV_RUNNING || !blk_queue_stopped(q)) {
+		spin_unlock_irqrestore(q->queue_lock, flags);
 		return -EINVAL;
+	}
 
-	spin_lock_irqsave(q->queue_lock, flags);
 	blk_start_queue(q);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/1] mid-layer unblocks blocked sdev leaving queue stopped
  2010-02-16 21:48     ` Michael Reed
@ 2010-02-24 20:17       ` Michael Reed
  2010-03-09 16:14       ` James Smart
  1 sibling, 0 replies; 6+ messages in thread
From: Michael Reed @ 2010-02-24 20:17 UTC (permalink / raw)
  To: Mike Christie; +Cc: linux-scsi, James Smart, James Bottomley, Hannes Reinecke



Michael Reed wrote:
> Mike Christie wrote:
>> On 02/10/2010 04:24 PM, Michael Reed wrote:
>>>> +	/*
>>>> +	 * If device is blocked, leave state alone and let blocker
>>>> +	 * unblock when appropriate.  Otherwise, set the device
>>>> +	 * running here so that slave configure may perform i/o.
>>>> +	 */
>>>> +	if (sdev->sdev_state != SDEV_BLOCK) {
>>>> +		ret = scsi_device_set_state(sdev, SDEV_RUNNING);
>> Do we need locking here? Is it possible that right after we check the 
>> sdev_state for being blocked, it could be be set to blocked?
> 
> Since the sdev_state can be set to blocked upon entry to the routines,
> it seems likely that it could transition as you describe.  The host_lock
> appears to be the right lock for this.  Nice catch.  It does, however,
> open a can of worms.  The documentation for the _block()/_unblock()
> functions state that it is assumed the host_lock is held upon entry.
> This doesn't line up with the code, which explicitly releases the
> host_lock before calling scsi_internal_device_[un]block().
> 
> I previously toyed around with an alternate method of resolving this which
> put the onus on the unblocking thread which doesn't care about the
> sdev_state.  As scsi_internal_device_block() is the routine which stops
> the queue, I made scsi_internal_device_unblock() allow for the
> possibility that another thread had transitioned sdev_state from SDEV_BLOCK.
> 
> It's been long enough that I don't recall why I chose the patch I
> posted to resolve the issue, but this patch was also effective at
> resolving the issue.  It also confines the change to the _block()/
> _unblock() pair of routines and doesn't require changes to the
> existing locking logic.  In retrospect, this patch was probably the
> one I should have chosen to post.
> 
> Please review.

I've encountered a test system which exhibits this problem regularly.
This patch consistently corrects the hang and allows the system to
boot.  I used a printk to confirm the condition was encountered.

Please consider this patch for integration to stable as well as
any other appropriate trees.

Thank you.

Mike

> 
> Signed-off-by: Michael Reed <mdr@sgi.com>
> 
> --- a/drivers/scsi/scsi_lib.c	2010-02-08 11:19:57.000000000 -0600
> +++ b/drivers/scsi/scsi_lib.c	2010-02-11 14:35:26.249142688 -0600
> @@ -2376,7 +2376,7 @@ EXPORT_SYMBOL(scsi_target_resume);
>   *	(which must be a legal transition).  When the device is in this
>   *	state, all commands are deferred until the scsi lld reenables
>   *	the device with scsi_device_unblock or device_block_tmo fires.
> - *	This routine assumes the host_lock is held on entry.
> + *	This routine assumes the host_lock is not held on entry.
>   */
>  int
>  scsi_internal_device_block(struct scsi_device *sdev)
> @@ -2420,7 +2420,7 @@ EXPORT_SYMBOL_GPL(scsi_internal_device_b
>   *	This routine transitions the device to the SDEV_RUNNING state
>   *	(which must be a legal transition) allowing the midlayer to
>   *	goose the queue for this device.  This routine assumes the 
> - *	host_lock is held upon entry.
> + *	host_lock is not held upon entry.
>   */
>  int
>  scsi_internal_device_unblock(struct scsi_device *sdev)
> @@ -2431,15 +2431,23 @@ scsi_internal_device_unblock(struct scsi
>  	/* 
>  	 * Try to transition the scsi device to SDEV_RUNNING
>  	 * and goose the device queue if successful.  
> +	 * It's possible that an sdev may be blocked before it is
> +	 * completely set up.  scsi_sysfs_add_sdev() and scsi_add_lun()
> +	 * will unconditionally attempt to transition the sdev to
> +	 * SDEV_RUNNING.  To avoid leaving the queue stopped, we
> +	 * allow for the sdev to already be in the SDEV_RUNNING state.
>  	 */
> +	spin_lock_irqsave(q->queue_lock, flags);
> +
>  	if (sdev->sdev_state == SDEV_BLOCK)
>  		sdev->sdev_state = SDEV_RUNNING;
>  	else if (sdev->sdev_state == SDEV_CREATED_BLOCK)
>  		sdev->sdev_state = SDEV_CREATED;
> -	else
> +	else if (sdev->sdev_state != SDEV_RUNNING || !blk_queue_stopped(q)) {
> +		spin_unlock_irqrestore(q->queue_lock, flags);
>  		return -EINVAL;
> +	}
>  
> -	spin_lock_irqsave(q->queue_lock, flags);
>  	blk_start_queue(q);
>  	spin_unlock_irqrestore(q->queue_lock, flags);
>  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/1] mid-layer unblocks blocked sdev leaving queue stopped
  2010-02-16 21:48     ` Michael Reed
  2010-02-24 20:17       ` Michael Reed
@ 2010-03-09 16:14       ` James Smart
  1 sibling, 0 replies; 6+ messages in thread
From: James Smart @ 2010-03-09 16:14 UTC (permalink / raw)
  To: Michael Reed; +Cc: Mike Christie, linux-scsi, James Bottomley

Re: locks-held comment : I went back to some of the original patches (Oct 
2004) and see that the comments were wrong then too...

Re: the patch...
In review, seeing how many places allow for the sdev_state to change w/o being 
under a lock (host lock or scan lock) is a real concern. But, as you mention 
(prior post), really correcting it opens a rats nest.  This does get around 
the primary issue - which is the overwrite of the blocked state by the scan 
code.  I wasn't sure the queue lock synchronization was good relative to the 
scan state, but I assume that's more for the potential blocked_queue_stopped() 
call than for anything else.

Acked-by: James Smart <james.smart@emulex.com>

-- james s


Michael Reed wrote:
> Mike Christie wrote:
>> On 02/10/2010 04:24 PM, Michael Reed wrote:
>>>> +	/*
>>>> +	 * If device is blocked, leave state alone and let blocker
>>>> +	 * unblock when appropriate.  Otherwise, set the device
>>>> +	 * running here so that slave configure may perform i/o.
>>>> +	 */
>>>> +	if (sdev->sdev_state != SDEV_BLOCK) {
>>>> +		ret = scsi_device_set_state(sdev, SDEV_RUNNING);
>> Do we need locking here? Is it possible that right after we check the 
>> sdev_state for being blocked, it could be be set to blocked?
> 
> Since the sdev_state can be set to blocked upon entry to the routines,
> it seems likely that it could transition as you describe.  The host_lock
> appears to be the right lock for this.  Nice catch.  It does, however,
> open a can of worms.  The documentation for the _block()/_unblock()
> functions state that it is assumed the host_lock is held upon entry.
> This doesn't line up with the code, which explicitly releases the
> host_lock before calling scsi_internal_device_[un]block().
> 
> I previously toyed around with an alternate method of resolving this which
> put the onus on the unblocking thread which doesn't care about the
> sdev_state.  As scsi_internal_device_block() is the routine which stops
> the queue, I made scsi_internal_device_unblock() allow for the
> possibility that another thread had transitioned sdev_state from SDEV_BLOCK.
> 
> It's been long enough that I don't recall why I chose the patch I
> posted to resolve the issue, but this patch was also effective at
> resolving the issue.  It also confines the change to the _block()/
> _unblock() pair of routines and doesn't require changes to the
> existing locking logic.  In retrospect, this patch was probably the
> one I should have chosen to post.
> 
> Please review.
> 
> Signed-off-by: Michael Reed <mdr@sgi.com>
> 
> --- a/drivers/scsi/scsi_lib.c	2010-02-08 11:19:57.000000000 -0600
> +++ b/drivers/scsi/scsi_lib.c	2010-02-11 14:35:26.249142688 -0600
> @@ -2376,7 +2376,7 @@ EXPORT_SYMBOL(scsi_target_resume);
>   *	(which must be a legal transition).  When the device is in this
>   *	state, all commands are deferred until the scsi lld reenables
>   *	the device with scsi_device_unblock or device_block_tmo fires.
> - *	This routine assumes the host_lock is held on entry.
> + *	This routine assumes the host_lock is not held on entry.
>   */
>  int
>  scsi_internal_device_block(struct scsi_device *sdev)
> @@ -2420,7 +2420,7 @@ EXPORT_SYMBOL_GPL(scsi_internal_device_b
>   *	This routine transitions the device to the SDEV_RUNNING state
>   *	(which must be a legal transition) allowing the midlayer to
>   *	goose the queue for this device.  This routine assumes the 
> - *	host_lock is held upon entry.
> + *	host_lock is not held upon entry.
>   */
>  int
>  scsi_internal_device_unblock(struct scsi_device *sdev)
> @@ -2431,15 +2431,23 @@ scsi_internal_device_unblock(struct scsi
>  	/* 
>  	 * Try to transition the scsi device to SDEV_RUNNING
>  	 * and goose the device queue if successful.  
> +	 * It's possible that an sdev may be blocked before it is
> +	 * completely set up.  scsi_sysfs_add_sdev() and scsi_add_lun()
> +	 * will unconditionally attempt to transition the sdev to
> +	 * SDEV_RUNNING.  To avoid leaving the queue stopped, we
> +	 * allow for the sdev to already be in the SDEV_RUNNING state.
>  	 */
> +	spin_lock_irqsave(q->queue_lock, flags);
> +
>  	if (sdev->sdev_state == SDEV_BLOCK)
>  		sdev->sdev_state = SDEV_RUNNING;
>  	else if (sdev->sdev_state == SDEV_CREATED_BLOCK)
>  		sdev->sdev_state = SDEV_CREATED;
> -	else
> +	else if (sdev->sdev_state != SDEV_RUNNING || !blk_queue_stopped(q)) {
> +		spin_unlock_irqrestore(q->queue_lock, flags);
>  		return -EINVAL;
> +	}
>  
> -	spin_lock_irqsave(q->queue_lock, flags);
>  	blk_start_queue(q);
>  	spin_unlock_irqrestore(q->queue_lock, flags);
>  
> 
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-03-09 16:14 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-12-16 22:43 [PATCH 1/1] mid-layer unblocks blocked sdev leaving queue stopped Michael Reed
2010-02-10 22:24 ` Michael Reed
2010-02-10 22:57   ` Mike Christie
2010-02-16 21:48     ` Michael Reed
2010-02-24 20:17       ` Michael Reed
2010-03-09 16:14       ` James Smart

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).