[PATCH 1/5] SCSI scanning and removal fixes

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/5] SCSI scanning and removal fixes
@ 2005-07-26 14:12 Alan Stern
  2005-09-07 15:16 ` James Bottomley
  0 siblings, 1 reply; 30+ messages in thread
From: Alan Stern @ 2005-07-26 14:12 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI development list

James:

This patch (as542) fixes a few loose ends left by Mike's patches.  It adds 
a declaration for the new scsi_host_set_state routine, adds an allowed 
transition from the SHOST_RECOVERY state to the SHOST_CANCEL state, and 
avoids returning an uninitialized value in __scsi_add_device.

Alan Stern



Signed-off-by: Alan Stern <stern@rowland.harvard.edu>

Index: 2613/include/scsi/scsi_host.h
===================================================================
--- 2613.orig/include/scsi/scsi_host.h
+++ 2613/include/scsi/scsi_host.h
@@ -636,6 +636,7 @@ extern void scsi_remove_host(struct Scsi
 extern struct Scsi_Host *scsi_host_get(struct Scsi_Host *);
 extern void scsi_host_put(struct Scsi_Host *t);
 extern struct Scsi_Host *scsi_host_lookup(unsigned short);
+extern int scsi_host_set_state(struct Scsi_Host *, enum scsi_host_state);
 extern const char *scsi_host_state_name(enum scsi_host_state);
 
 extern u64 scsi_calculate_bounce_limit(struct Scsi_Host *);
Index: 2613/drivers/scsi/hosts.c
===================================================================
--- 2613.orig/drivers/scsi/hosts.c
+++ 2613/drivers/scsi/hosts.c
@@ -97,6 +97,7 @@ int scsi_host_set_state(struct Scsi_Host
 		switch (oldstate) {
 		case SHOST_CREATED:
 		case SHOST_RUNNING:
+		case SHOST_RECOVERY:
 			break;
 		default:
 			goto illegal;
Index: 2613/drivers/scsi/scsi_scan.c
===================================================================
--- 2613.orig/drivers/scsi/scsi_scan.c
+++ 2613/drivers/scsi/scsi_scan.c
@@ -1210,25 +1210,28 @@ struct scsi_device *__scsi_add_device(st
 				      uint id, uint lun, void *hostdata)
 {
 	struct scsi_device *sdev;
-	struct device *parent = &shost->shost_gendev;
-	int res;
-	struct scsi_target *starget = scsi_alloc_target(parent, channel, id);
-
-	if (!starget)
-		return ERR_PTR(-ENOMEM);
+	struct scsi_target *starget;
 
-	get_device(&starget->dev);
 	down(&shost->scan_mutex);
-	if (scsi_host_scan_allowed(shost)) {
-		res = scsi_probe_and_add_lun(starget, lun, NULL, &sdev, 1,
-					     hostdata);
-		if (res != SCSI_SCAN_LUN_PRESENT)
-			sdev = ERR_PTR(-ENODEV);
+	if (!scsi_host_scan_allowed(shost)) {
+		sdev = ERR_PTR(-ENODEV);
+		goto out;
 	}
-	up(&shost->scan_mutex);
+	starget = scsi_alloc_target(&shost->shost_gendev, channel, id);
+	if (!starget) {
+		sdev = ERR_PTR(-ENOMEM);
+		goto out;
+	}
+
+	get_device(&starget->dev);
+	if (scsi_probe_and_add_lun(starget, lun, NULL, &sdev, 1, hostdata)
+			!= SCSI_SCAN_LUN_PRESENT)
+		sdev = ERR_PTR(-ENODEV);
 	scsi_target_reap(starget);
 	put_device(&starget->dev);
 
+ out:
+	up(&shost->scan_mutex);
 	return sdev;
 }
 EXPORT_SYMBOL(__scsi_add_device);


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-07-26 14:12 [PATCH 1/5] SCSI scanning and removal fixes Alan Stern
@ 2005-09-07 15:16 ` James Bottomley
  2005-09-07 18:27   ` Alan Stern
  0 siblings, 1 reply; 30+ messages in thread
From: James Bottomley @ 2005-09-07 15:16 UTC (permalink / raw)
  To: Alan Stern; +Cc: SCSI development list

On Tue, 2005-07-26 at 10:12 -0400, Alan Stern wrote:
> This patch (as542) fixes a few loose ends left by Mike's patches.  It adds 
> a declaration for the new scsi_host_set_state routine, adds an allowed 
> transition from the SHOST_RECOVERY state to the SHOST_CANCEL state, and 
> avoids returning an uninitialized value in __scsi_add_device.

The first part (external declaration) is already in.

The second (allow RECOVERY->CANCEL) isn't really an answer.  The correct
thing, I suppose, is to have scsi_remove_host() wait for the error
handler to finish if the state transition cannot be accomodated
(otherwise the error handler will try to transition ->RUNNING part way
through the removal).

The third (don't allow uninitialised return from __scsi_add_device) is
correct, but the return is universally ignored (the only actual user
being ipr).  Unfortunately, now I look at this, ipr gets it wrong too.
This API currently returns an sdev with an incremented refcount.  If you
ignore the return, as ipr does, you're leaking refcounts.  So, since the
only user gets it wrong all the time, I'd vote for modifying the API not
to return a device.

James

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-07 15:16 ` James Bottomley
@ 2005-09-07 18:27   ` Alan Stern
  2005-09-07 18:37     ` Luben Tuikov
                       ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Alan Stern @ 2005-09-07 18:27 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI development list

On Wed, 7 Sep 2005, James Bottomley wrote:

> On Tue, 2005-07-26 at 10:12 -0400, Alan Stern wrote:
> > This patch (as542) fixes a few loose ends left by Mike's patches.  It adds 
> > a declaration for the new scsi_host_set_state routine, adds an allowed 
> > transition from the SHOST_RECOVERY state to the SHOST_CANCEL state, and 
> > avoids returning an uninitialized value in __scsi_add_device.
> 
> The first part (external declaration) is already in.
> 
> The second (allow RECOVERY->CANCEL) isn't really an answer.  The correct
> thing, I suppose, is to have scsi_remove_host() wait for the error
> handler to finish if the state transition cannot be accomodated
> (otherwise the error handler will try to transition ->RUNNING part way
> through the removal).

I'm going to argue strongly about this.  scsi_remove_host should _not_
wait for error recovery to complete -- to do so will invite deadlocks.  
(Suppose the error handler is waiting for a bus reset, but the bus reset
routine requires a semaphore held by the LLD during the call to
scsi_remove_host?)  Furthermore, error recovery can potentially take quite
a long time -- much longer than we want to wait during a removal event.  
Instead, the error handler should not be allowed to make the transition to
RUNNING once the removal has started.

> The third (don't allow uninitialised return from __scsi_add_device) is
> correct, but the return is universally ignored (the only actual user
> being ipr).  Unfortunately, now I look at this, ipr gets it wrong too.
> This API currently returns an sdev with an incremented refcount.  If you
> ignore the return, as ipr does, you're leaking refcounts.  So, since the
> only user gets it wrong all the time, I'd vote for modifying the API not
> to return a device.

Changing the API is fine with me, but the existing code is still shaky
because it calls scsi_alloc_target before checking scsi_host_scan_allowed.  
Maybe that's not an out-and-out mistake, but better to avoid it.

Would you like me to submit an updated patch?

Alan Stern

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-07 18:27   ` Alan Stern
@ 2005-09-07 18:37     ` Luben Tuikov
  2005-09-07 18:42     ` Luben Tuikov
  2005-09-07 19:58     ` James Bottomley
  2 siblings, 0 replies; 30+ messages in thread
From: Luben Tuikov @ 2005-09-07 18:37 UTC (permalink / raw)
  To: Alan Stern; +Cc: James Bottomley, SCSI development list

On 09/07/05 14:27, Alan Stern wrote:
> On Wed, 7 Sep 2005, James Bottomley wrote:
> 
> 
>>On Tue, 2005-07-26 at 10:12 -0400, Alan Stern wrote:
>>
>>>This patch (as542) fixes a few loose ends left by Mike's patches.  It adds 
>>>a declaration for the new scsi_host_set_state routine, adds an allowed 
>>>transition from the SHOST_RECOVERY state to the SHOST_CANCEL state, and 
>>>avoids returning an uninitialized value in __scsi_add_device.
>>
>>The first part (external declaration) is already in.
>>
>>The second (allow RECOVERY->CANCEL) isn't really an answer.  The correct
>>thing, I suppose, is to have scsi_remove_host() wait for the error
>>handler to finish if the state transition cannot be accomodated
>>(otherwise the error handler will try to transition ->RUNNING part way
>>through the removal).
> 
> 
> I'm going to argue strongly about this.  scsi_remove_host should _not_
> wait for error recovery to complete -- to do so will invite deadlocks.  
> (Suppose the error handler is waiting for a bus reset, but the bus reset
> routine requires a semaphore held by the LLD during the call to
> scsi_remove_host?)  Furthermore, error recovery can potentially take quite
> a long time -- much longer than we want to wait during a removal event.  
> Instead, the error handler should not be allowed to make the transition to
> RUNNING once the removal has started.

Hey guys,

I haven't looked at Alan's patches in detail, but would just like
to mention that my driver supports hot plugging _and_unplugging_
of devices and _whole_domains_ (many devices at the same time,
plug and unplug).

scsi hosts are also dynamically created and destroyed.

Currently there is absolutely no problems with any deadlocks or anything
like that.  (Of course the driver implements its own eh+timeout hooks.)

Make sure you guys do not break something. ;-)

	Luben



> 
> 
>>The third (don't allow uninitialised return from __scsi_add_device) is
>>correct, but the return is universally ignored (the only actual user
>>being ipr).  Unfortunately, now I look at this, ipr gets it wrong too.
>>This API currently returns an sdev with an incremented refcount.  If you
>>ignore the return, as ipr does, you're leaking refcounts.  So, since the
>>only user gets it wrong all the time, I'd vote for modifying the API not
>>to return a device.
> 
> 
> Changing the API is fine with me, but the existing code is still shaky
> because it calls scsi_alloc_target before checking scsi_host_scan_allowed.  
> Maybe that's not an out-and-out mistake, but better to avoid it.
> 
> Would you like me to submit an updated patch?
> 
> Alan Stern
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-07 18:27   ` Alan Stern
  2005-09-07 18:37     ` Luben Tuikov
@ 2005-09-07 18:42     ` Luben Tuikov
  2005-09-07 19:31       ` Alan Stern
  2005-09-07 19:58     ` James Bottomley
  2 siblings, 1 reply; 30+ messages in thread
From: Luben Tuikov @ 2005-09-07 18:42 UTC (permalink / raw)
  To: Alan Stern; +Cc: James Bottomley, SCSI development list

On 09/07/05 14:27, Alan Stern wrote:
> On Wed, 7 Sep 2005, James Bottomley wrote:
> 
> 
>>On Tue, 2005-07-26 at 10:12 -0400, Alan Stern wrote:
>>
>>>This patch (as542) fixes a few loose ends left by Mike's patches.  It adds 
>>>a declaration for the new scsi_host_set_state routine, adds an allowed 
>>>transition from the SHOST_RECOVERY state to the SHOST_CANCEL state, and 
>>>avoids returning an uninitialized value in __scsi_add_device.
>>
>>The first part (external declaration) is already in.
>>
>>The second (allow RECOVERY->CANCEL) isn't really an answer.  The correct
>>thing, I suppose, is to have scsi_remove_host() wait for the error
>>handler to finish if the state transition cannot be accomodated
>>(otherwise the error handler will try to transition ->RUNNING part way
>>through the removal).
> 
> 
> I'm going to argue strongly about this.  scsi_remove_host should _not_
> wait for error recovery to complete -- to do so will invite deadlocks.  
> (Suppose the error handler is waiting for a bus reset, but the bus reset
> routine requires a semaphore held by the LLD during the call to
> scsi_remove_host?)  Furthermore, error recovery can potentially take quite
> a long time -- much longer than we want to wait during a removal event.  
> Instead, the error handler should not be allowed to make the transition to
> RUNNING once the removal has started.

Alan, this tells me one thing: the _layering_ infrastructure is broken,
and in this case, it looks like is not SCSI Core.

E.g. why is the LLDD messing with semas of the host? (rhetorical, please
do not answer as this would go into another thread...)

BTW, since the eh is a _function of the host_, James is correct that
scsi_remove_host should wait for the eh to finish.

This makes me believe that maybe USB storage would need an overhaul
of the event infra: removing and adding, and/or implement its own
eh.

	Luben


> 
> 
>>The third (don't allow uninitialised return from __scsi_add_device) is
>>correct, but the return is universally ignored (the only actual user
>>being ipr).  Unfortunately, now I look at this, ipr gets it wrong too.
>>This API currently returns an sdev with an incremented refcount.  If you
>>ignore the return, as ipr does, you're leaking refcounts.  So, since the
>>only user gets it wrong all the time, I'd vote for modifying the API not
>>to return a device.
> 
> 
> Changing the API is fine with me, but the existing code is still shaky
> because it calls scsi_alloc_target before checking scsi_host_scan_allowed.  
> Maybe that's not an out-and-out mistake, but better to avoid it.
> 
> Would you like me to submit an updated patch?
> 
> Alan Stern
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-07 18:42     ` Luben Tuikov
@ 2005-09-07 19:31       ` Alan Stern
  2005-09-07 20:00         ` Mike Anderson
  2005-09-07 20:43         ` Luben Tuikov
  0 siblings, 2 replies; 30+ messages in thread
From: Alan Stern @ 2005-09-07 19:31 UTC (permalink / raw)
  To: Luben Tuikov; +Cc: James Bottomley, SCSI development list

On Wed, 7 Sep 2005, Luben Tuikov wrote:

> On 09/07/05 14:27, Alan Stern wrote:

> > I'm going to argue strongly about this.  scsi_remove_host should _not_
> > wait for error recovery to complete -- to do so will invite deadlocks.  
> > (Suppose the error handler is waiting for a bus reset, but the bus reset
> > routine requires a semaphore held by the LLD during the call to
> > scsi_remove_host?)  Furthermore, error recovery can potentially take quite
> > a long time -- much longer than we want to wait during a removal event.  
> > Instead, the error handler should not be allowed to make the transition to
> > RUNNING once the removal has started.
> 
> Alan, this tells me one thing: the _layering_ infrastructure is broken,
> and in this case, it looks like is not SCSI Core.
> 
> E.g. why is the LLDD messing with semas of the host? (rhetorical, please
> do not answer as this would go into another thread...)
> 
> BTW, since the eh is a _function of the host_, James is correct that
> scsi_remove_host should wait for the eh to finish.

That's a very good point.  It hadn't occurred to me before, but you're
absolutely right.  scsi_remove_host should indeed wait for the error
handler to finish.  But first it should set things up so that the
everything the error handler does will fail-fast, so that the eh can
return quickly.  That will include putting the device into the SDEV_CANCEL
state, so it remains true that the error handler better not try to move
from CANCEL back to RUNNING.

As for layering violations and deadlocks...  Unfortunately the violation
is unavoidable.  It's related to the way the error handler sometimes tries
to fix a non-working device by doing a bus reset, which will also affect
all the other devices on the same bus.  The same sort of thing applies to
USB.  Fortunately the deadlock _is_ avoidable; in fact the USB driver
already has code in place to fail the reset attempt if it takes too long
to acquire the lock.  So that's no longer an issue.

> This makes me believe that maybe USB storage would need an overhaul
> of the event infra: removing and adding, and/or implement its own
> eh.

In the long run that might be good, but for now I think we'll be okay.  
The important thing is to make sure that once the device has moved to the
CANCEL state, everything fails quickly.

Alan Stern

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-07 18:27   ` Alan Stern
  2005-09-07 18:37     ` Luben Tuikov
  2005-09-07 18:42     ` Luben Tuikov
@ 2005-09-07 19:58     ` James Bottomley
  2005-09-07 22:05       ` James Bottomley
                         ` (2 more replies)
  2 siblings, 3 replies; 30+ messages in thread
From: James Bottomley @ 2005-09-07 19:58 UTC (permalink / raw)
  To: Alan Stern; +Cc: SCSI development list

On Wed, 2005-09-07 at 14:27 -0400, Alan Stern wrote:
> > The second (allow RECOVERY->CANCEL) isn't really an answer.  The correct
> > thing, I suppose, is to have scsi_remove_host() wait for the error
> > handler to finish if the state transition cannot be accomodated
> > (otherwise the error handler will try to transition ->RUNNING part way
> > through the removal).
> 
> I'm going to argue strongly about this.  scsi_remove_host should _not_
> wait for error recovery to complete -- to do so will invite deadlocks.  
> (Suppose the error handler is waiting for a bus reset, but the bus reset
> routine requires a semaphore held by the LLD during the call to
> scsi_remove_host?)  Furthermore, error recovery can potentially take quite
> a long time -- much longer than we want to wait during a removal event.  
> Instead, the error handler should not be allowed to make the transition to
> RUNNING once the removal has started.

I agree (about the deadlocks).  However, as things stand RECOVERY is a
state in the model and the model can only be in a single state.  If you
permit the transition, and recovery is going on in parallel with
removal, they'll race to set the final state (removal wants DEL and the
eh thread will set it to RUNNING).

Either we go back to having an in_recovery flag (i.e. lift recovery out
of the state model) or we make the model more complex to cope with this.
Since really the only thing we test is in_recovery, we could do a more
complex model; something like:

created
   |
   v    <--------- 
 running ---------> recovery
   |                   |
   v   <----------     v
 cancel ----------> recover/cancel
   |                   |
   v   ----------->    v
  del <------------ recover/del

I also think I'd like not to go from del -> recover/del, but unless del
actually means that all devices have completed their I/O for deletion
that can't be avoided.

> Changing the API is fine with me, but the existing code is still shaky
> because it calls scsi_alloc_target before checking scsi_host_scan_allowed.  
> Maybe that's not an out-and-out mistake, but better to avoid it.

Actually, alloc_target is properly guarded so it doesn't need the scan
mutex.  It might be nice to update the SDEV_ state model to include a
"scanning" state, that way we could properly guard the sdev_alloc as
well and dump the scan mutex ... that's probably more than a slight
change, though.

> Would you like me to submit an updated patch?

Yes, please.  It's been suggested that we should have a scsi_add_target
that returns zero on success or error on failure (with no ref to the
sdev) and keep the old behaviour of __scsi_add_target().

James

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-07 19:31       ` Alan Stern
@ 2005-09-07 20:00         ` Mike Anderson
  2005-09-07 20:43         ` Luben Tuikov
  1 sibling, 0 replies; 30+ messages in thread
From: Mike Anderson @ 2005-09-07 20:00 UTC (permalink / raw)
  To: Alan Stern; +Cc: Luben Tuikov, James Bottomley, SCSI development list

Alan Stern <stern@rowland.harvard.edu> wrote:
> On Wed, 7 Sep 2005, Luben Tuikov wrote:
> 
> > On 09/07/05 14:27, Alan Stern wrote:
> 
> > > I'm going to argue strongly about this.  scsi_remove_host should _not_
> > > wait for error recovery to complete -- to do so will invite deadlocks.  
> > > (Suppose the error handler is waiting for a bus reset, but the bus reset
> > > routine requires a semaphore held by the LLD during the call to
> > > scsi_remove_host?)  Furthermore, error recovery can potentially take quite
> > > a long time -- much longer than we want to wait during a removal event.  
> > > Instead, the error handler should not be allowed to make the transition to
> > > RUNNING once the removal has started.
> > 
> > Alan, this tells me one thing: the _layering_ infrastructure is broken,
> > and in this case, it looks like is not SCSI Core.
> > 
> > E.g. why is the LLDD messing with semas of the host? (rhetorical, please
> > do not answer as this would go into another thread...)
> > 
> > BTW, since the eh is a _function of the host_, James is correct that
> > scsi_remove_host should wait for the eh to finish.
> 
> That's a very good point.  It hadn't occurred to me before, but you're
> absolutely right.  scsi_remove_host should indeed wait for the error
> handler to finish.  But first it should set things up so that the
> everything the error handler does will fail-fast, so that the eh can
> return quickly.  That will include putting the device into the SDEV_CANCEL
> state, so it remains true that the error handler better not try to move
> from CANCEL back to RUNNING.
> 

Well the scsi_device_set_state function / model will not let us move a
device from SDEV_CANCEL to SDEV_RUNNING again.

To fail faster (I assumed you mean the concept not the flag) we would need
to add a few checks during the start of some of the functions. It would be
good to make these as efficient as possibly, but I guess we are already in
the error handler so we have take a time hit already.

-andmike
--
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-07 19:31       ` Alan Stern
  2005-09-07 20:00         ` Mike Anderson
@ 2005-09-07 20:43         ` Luben Tuikov
  2005-09-07 21:34           ` Stefan Richter
  2005-09-08 15:19           ` Alan Stern
  1 sibling, 2 replies; 30+ messages in thread
From: Luben Tuikov @ 2005-09-07 20:43 UTC (permalink / raw)
  To: Alan Stern; +Cc: James Bottomley, SCSI development list

On 09/07/05 15:31, Alan Stern wrote:
> 
> As for layering violations and deadlocks...  Unfortunately the violation
> is unavoidable.  It's related to the way the error handler sometimes tries
> to fix a non-working device by doing a bus reset, which will also affect
> all the other devices on the same bus.

Yes, that's a good point.

Is it possible to implement LU Reset TMF for USB storage devices?
If so, then you do not need to do a "bus reset".

*General note:* It is very bad to have to punish all devices on the
"bus" just because one device failed.  Isn't there a better way
to recover(*) a failed USB Storage device?

(*) That doesn't necessarily mean "bring back to life", it could
also mean "remove".

That is, a "bus reset" means that also any other device on the "bus" is
unreachable.  If this is _not_ the case, then a "bus" reset must _not_
take place.

>>This makes me believe that maybe USB storage would need an overhaul
>>of the event infra: removing and adding, and/or implement its own
>>eh.
> 
> In the long run that might be good, but for now I think we'll be okay.  
> The important thing is to make sure that once the device has moved to the
> CANCEL state, everything fails quickly.

If you have the means to do transport checking of the state of the
device, (and it seems like you can for USB Storage, since it is USB after
all), then you _must_ implement your own eh handler.  This is what
it is for.  Please, consider.

	Luben

P.S. It may actually simplify things in the USB transport layer _and_ in
SCSI Core (i.e. no need for those patches).

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-07 20:43         ` Luben Tuikov
@ 2005-09-07 21:34           ` Stefan Richter
  2005-09-08 15:19           ` Alan Stern
  1 sibling, 0 replies; 30+ messages in thread
From: Stefan Richter @ 2005-09-07 21:34 UTC (permalink / raw)
  To: Luben Tuikov; +Cc: Alan Stern, James Bottomley, SCSI development list

Luben Tuikov wrote:
> It is very bad to have to punish all devices on the
> "bus" just because one device failed.  Isn't there a better way
> to recover(*) a failed USB Storage device?

usb-storage implements scsi_mod bus reset as USB port reset. This 
affects only the failing device.

There is one scsi_mod bus per SCSI target in usb-storage. There is no 
mapping from scsi_mod bus to USB bus. The same is true for sbp2. 
Usb-storage and sbp2 actually don't care at all for the bus a device is 
attached to.

(Or at least that's how I understand the USB subsystem; I'm sure about 
the 1394 subsystem.)
-- 
Stefan Richter
-=====-=-=-= =--= --===
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-07 19:58     ` James Bottomley
@ 2005-09-07 22:05       ` James Bottomley
  2005-09-08 15:59       ` Alan Stern
  2005-09-08 16:08       ` Alan Stern
  2 siblings, 0 replies; 30+ messages in thread
From: James Bottomley @ 2005-09-07 22:05 UTC (permalink / raw)
  To: Alan Stern; +Cc: SCSI development list

On Wed, 2005-09-07 at 14:58 -0500, James Bottomley wrote:
> On Wed, 2005-09-07 at 14:27 -0400, Alan Stern wrote:
> > I'm going to argue strongly about this.  scsi_remove_host should _not_
> > wait for error recovery to complete -- to do so will invite deadlocks.  
> > (Suppose the error handler is waiting for a bus reset, but the bus reset
> > routine requires a semaphore held by the LLD during the call to
> > scsi_remove_host?)  Furthermore, error recovery can potentially take quite
> > a long time -- much longer than we want to wait during a removal event.  
> > Instead, the error handler should not be allowed to make the transition to
> > RUNNING once the removal has started.
> 
> I agree (about the deadlocks).  However, as things stand RECOVERY is a
> state in the model and the model can only be in a single state.  If you
> permit the transition, and recovery is going on in parallel with
> removal, they'll race to set the final state (removal wants DEL and the
> eh thread will set it to RUNNING).
> 
> Either we go back to having an in_recovery flag (i.e. lift recovery out
> of the state model) or we make the model more complex to cope with this.
> Since really the only thing we test is in_recovery, we could do a more
> complex model; something like:

OK, try this as an implementation of that model (except I junked the DEL
-> DEL_RECOVERY path).

There's a few nasties in this (notably that timed out commands will be
finished without error recovery from the DEL state).

James

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -98,6 +98,7 @@ int scsi_host_set_state(struct Scsi_Host
 		switch (oldstate) {
 		case SHOST_CREATED:
 		case SHOST_RUNNING:
+		case SHOST_CANCEL_RECOVERY:
 			break;
 		default:
 			goto illegal;
@@ -107,12 +108,31 @@ int scsi_host_set_state(struct Scsi_Host
 	case SHOST_DEL:
 		switch (oldstate) {
 		case SHOST_CANCEL:
+		case SHOST_DEL_RECOVERY:
 			break;
 		default:
 			goto illegal;
 		}
 		break;
 
+	case SHOST_CANCEL_RECOVERY:
+		switch (oldstate) {
+		case SHOST_CANCEL:
+		case SHOST_RECOVERY:
+			break;
+		default:
+			goto illegal;
+		}
+		break;
+
+	case SHOST_DEL_RECOVERY:
+		switch (oldstate) {
+		case SHOST_CANCEL_RECOVERY:
+			break;
+		default:
+			goto illegal;
+		}
+		break;
 	}
 	shost->shost_state = state;
 	return 0;
@@ -135,12 +155,17 @@ EXPORT_SYMBOL(scsi_host_set_state);
 void scsi_remove_host(struct Scsi_Host *shost)
 {
 	down(&shost->scan_mutex);
-	scsi_host_set_state(shost, SHOST_CANCEL);
+	if (!scsi_host_set_state(shost, SHOST_CANCEL))
+		if (!scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY)) {
+			up(&shost->scan_mutex);
+			return;
+		}
 	up(&shost->scan_mutex);
 	scsi_forget_host(shost);
 	scsi_proc_host_rm(shost);
 
-	scsi_host_set_state(shost, SHOST_DEL);
+	if (!scsi_host_set_state(shost, SHOST_DEL))
+		BUG_ON(!scsi_host_set_state(shost, SHOST_DEL_RECOVERY));
 
 	transport_unregister_device(&shost->shost_gendev);
 	class_device_unregister(&shost->shost_classdev);
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -68,19 +68,24 @@ int scsi_eh_scmd_add(struct scsi_cmnd *s
 {
 	struct Scsi_Host *shost = scmd->device->host;
 	unsigned long flags;
+	int ret = 0;
 
 	if (shost->eh_wait == NULL)
 		return 0;
 
 	spin_lock_irqsave(shost->host_lock, flags);
+	if (!scsi_host_set_state(shost, SHOST_RECOVERY))
+		if (!scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
+			goto out_unlock;
 
+	ret = 1;
 	scmd->eh_eflags |= eh_flag;
 	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
-	scsi_host_set_state(shost, SHOST_RECOVERY);
 	shost->host_failed++;
 	scsi_eh_wakeup(shost);
+ out_unlock:
 	spin_unlock_irqrestore(shost->host_lock, flags);
-	return 1;
+	return ret;
 }
 
 /**
@@ -176,8 +181,8 @@ void scsi_times_out(struct scsi_cmnd *sc
 		}
 
 	if (unlikely(!scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))) {
-		panic("Error handler thread not present at %p %p %s %d",
-		      scmd, scmd->device->host, __FILE__, __LINE__);
+		scmd->result |= DID_TIME_OUT << 16;
+		__scsi_done(scmd);
 	}
 }
 
@@ -196,8 +201,7 @@ int scsi_block_when_processing_errors(st
 {
 	int online;
 
-	wait_event(sdev->host->host_wait, (sdev->host->shost_state !=
-					   SHOST_RECOVERY));
+	wait_event(sdev->host->host_wait, !scsi_host_in_recovery(sdev->host));
 
 	online = scsi_device_online(sdev);
 
@@ -1460,7 +1464,9 @@ static void scsi_restart_operations(stru
 	SCSI_LOG_ERROR_RECOVERY(3, printk("%s: waking up host to restart\n",
 					  __FUNCTION__));
 
-	scsi_host_set_state(shost, SHOST_RUNNING);
+	if (!scsi_host_set_state(shost, SHOST_RUNNING))
+		if (!scsi_host_set_state(shost, SHOST_CANCEL))
+			BUG_ON(!scsi_host_set_state(shost, SHOST_DEL));
 
 	wake_up(&shost->host_wait);
 
diff --git a/drivers/scsi/scsi_ioctl.c b/drivers/scsi/scsi_ioctl.c
--- a/drivers/scsi/scsi_ioctl.c
+++ b/drivers/scsi/scsi_ioctl.c
@@ -458,7 +458,7 @@ int scsi_nonblockable_ioctl(struct scsi_
 	 * error processing, as long as the device was opened
 	 * non-blocking */
 	if (filp && filp->f_flags & O_NONBLOCK) {
-		if (sdev->host->shost_state == SHOST_RECOVERY)
+		if (scsi_host_in_recovery(sdev->host))
 			return -ENODEV;
 	} else if (!scsi_block_when_processing_errors(sdev))
 		return -ENODEV;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -424,7 +424,7 @@ void scsi_device_unbusy(struct scsi_devi
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	shost->host_busy--;
-	if (unlikely((shost->shost_state == SHOST_RECOVERY) &&
+	if (unlikely(scsi_host_in_recovery(shost) &&
 		     shost->host_failed))
 		scsi_eh_wakeup(shost);
 	spin_unlock(shost->host_lock);
@@ -1306,7 +1306,7 @@ static inline int scsi_host_queue_ready(
 				   struct Scsi_Host *shost,
 				   struct scsi_device *sdev)
 {
-	if (shost->shost_state == SHOST_RECOVERY)
+	if (scsi_host_in_recovery(shost))
 		return 0;
 	if (shost->host_busy == 0 && shost->host_blocked) {
 		/*
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -57,6 +57,8 @@ static struct {
 	{ SHOST_CANCEL, "cancel" },
 	{ SHOST_DEL, "deleted" },
 	{ SHOST_RECOVERY, "recovery" },
+	{ SHOST_CANCEL_RECOVERY, "cancel/recovery" },
+	{ SHOST_DEL_RECOVERY, "deleted/recovery", },
 };
 const char *scsi_host_state_name(enum scsi_host_state state)
 {
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1027,7 +1027,7 @@ sg_ioctl(struct inode *inode, struct fil
 		if (sdp->detached)
 			return -ENODEV;
 		if (filp->f_flags & O_NONBLOCK) {
-			if (sdp->device->host->shost_state == SHOST_RECOVERY)
+			if (scsi_host_in_recovery(sdp->device->host))
 				return -EBUSY;
 		} else if (!scsi_block_when_processing_errors(sdp->device))
 			return -EBUSY;
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -439,6 +439,8 @@ enum scsi_host_state {
 	SHOST_CANCEL,
 	SHOST_DEL,
 	SHOST_RECOVERY,
+	SHOST_CANCEL_RECOVERY,
+	SHOST_DEL_RECOVERY,
 };
 
 struct Scsi_Host {
@@ -621,6 +623,13 @@ static inline struct Scsi_Host *dev_to_s
 	return container_of(dev, struct Scsi_Host, shost_gendev);
 }
 
+static inline int scsi_host_in_recovery(struct Scsi_Host *shost)
+{
+	return shost->shost_state == SHOST_RECOVERY ||
+		shost->shost_state == SHOST_CANCEL_RECOVERY ||
+		shost->shost_state == SHOST_DEL_RECOVERY;
+}
+
 extern int scsi_queue_work(struct Scsi_Host *, struct work_struct *);
 extern void scsi_flush_work(struct Scsi_Host *);
 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-07 20:43         ` Luben Tuikov
  2005-09-07 21:34           ` Stefan Richter
@ 2005-09-08 15:19           ` Alan Stern
  2005-09-08 16:07             ` Luben Tuikov
  1 sibling, 1 reply; 30+ messages in thread
From: Alan Stern @ 2005-09-08 15:19 UTC (permalink / raw)
  To: Stefan Richter, Luben Tuikov; +Cc: James Bottomley, SCSI development list

On Wed, 7 Sep 2005, Stefan Richter wrote:

> usb-storage implements scsi_mod bus reset as USB port reset. This
> affects only the failing device.

Correct, but you're forgetting that in USB a single device can contain
more than one "interface", and drivers bind to interfaces, not devices.  
Doing a USB port reset has implications beyond those of the single
interface controlled by usb-storage, since it affects the entire device.  
In fact, the current version refuses to do a port reset if there are other
interfaces present.

> There is one scsi_mod bus per SCSI target in usb-storage.

That's not quite entirely true.  There are a few obscure USB mass-storage 
devices that actually are a SCSI host adapter and control a real SCSI bus.  
But for all practical purposes we can forget about them.

On Wed, 7 Sep 2005, Luben Tuikov wrote:

> Is it possible to implement LU Reset TMF for USB storage devices?
> If so, then you do not need to do a "bus reset".

Sorry, I don't know that acronym.  What is TMF?

> *General note:* It is very bad to have to punish all devices on the
> "bus" just because one device failed.  Isn't there a better way
> to recover(*) a failed USB Storage device?

No.  There are only two reset mechanisms available for USB Mass Storage.  
One is a class-specific reset command (which usb-storage issues when asked 
to do a device reset), and the other is a USB port reset (which 
usb-storage issues when asked to do a bus reset).

In practice, it turns out that many (perhaps even most) USB Storage 
devices don't implement the class-specific reset command correctly.  
Windows doesn't use it at all, so far as I know.  This means our only 
realistic option is the USB port reset.

> (*) That doesn't necessarily mean "bring back to life", it could
> also mean "remove".

Well, usb-storage doesn't need to "remove" a failed device.  The SCSI core 
does a perfectly good job just by setting it off-line.  And since there 
aren't other targets sharing the bus, that's good enough.

> That is, a "bus reset" means that also any other device on the "bus" is
> unreachable.  If this is _not_ the case, then a "bus" reset must _not_
> take place.

I'm not sure what that first sentence is intended to mean.  The SCSI error 
handler _does_ issue bus resets even when other devices on the bus are 
still reachable, doesn't it?  Provided the others are idle at the time, 
there shouldn't be any harm in it.

> If you have the means to do transport checking of the state of the
> device, (and it seems like you can for USB Storage, since it is USB after
> all), then you _must_ implement your own eh handler.  This is what
> it is for.  Please, consider.
> 
> 	Luben
> 
> P.S. It may actually simplify things in the USB transport layer _and_ in
> SCSI Core (i.e. no need for those patches).

We are sort of moving in that direction.  usb-storage already does its 
own unsolicited resets when unexpected error occur.  But we don't have 
anything like the error-handler's mechanism for recovering from timeouts 
(TUR, then device reset, then TUR, then bus reset, ...).  There doesn't 
seem to be any point in reinventing all that code.

Alan Stern

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-07 19:58     ` James Bottomley
  2005-09-07 22:05       ` James Bottomley
@ 2005-09-08 15:59       ` Alan Stern
  2005-09-08 16:15         ` James Bottomley
  2005-09-08 16:08       ` Alan Stern
  2 siblings, 1 reply; 30+ messages in thread
From: Alan Stern @ 2005-09-08 15:59 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI development list

On Wed, 7 Sep 2005, James Bottomley wrote:

> I agree (about the deadlocks).  However, as things stand RECOVERY is a
> state in the model and the model can only be in a single state.  If you
> permit the transition, and recovery is going on in parallel with
> removal, they'll race to set the final state (removal wants DEL and the
> eh thread will set it to RUNNING).
> 
> Either we go back to having an in_recovery flag (i.e. lift recovery out
> of the state model) or we make the model more complex to cope with this.
> Since really the only thing we test is in_recovery, we could do a more
> complex model; something like:
> 
> created
>    |
>    v    <--------- 
>  running ---------> recovery
>    |                   |
>    v   <----------     v
>  cancel ----------> recover/cancel
>    |                   |
>    v   ----------->    v
>   del <------------ recover/del
> 
> I also think I'd like not to go from del -> recover/del, but unless del
> actually means that all devices have completed their I/O for deletion
> that can't be avoided.

I don't understand your reasoning.  With your new system, you end up with 
two threads doing this:

	Removal thread			Error handler thread
	-------------------------	---------------------------
					Go from RUNNING to RECOVERY
	Try to go to CANCEL, fail
	Go to CANCEL_RECOVERY
race:	Go to DEL_RECOVERY		Try to go to RUNNING, fail


Whereas the old system has the two threads doing this:

	Removal thread			Error handler thread
	-------------------------	---------------------------
					Go from RUNNING to RECOVERY
	Go to CANCEL
race:	Go to DEL			Try to go to RUNNING, fail


At least, that's how it would work if you allow the RECOVERY -> CANCEL
transition.  Either way you end up in the correct state.  So what's wrong
with the old (current) system?

Alan Stern


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-08 15:19           ` Alan Stern
@ 2005-09-08 16:07             ` Luben Tuikov
  2005-09-08 18:36               ` Alan Stern
  0 siblings, 1 reply; 30+ messages in thread
From: Luben Tuikov @ 2005-09-08 16:07 UTC (permalink / raw)
  To: Alan Stern; +Cc: Stefan Richter, James Bottomley, SCSI development list

On 09/08/05 11:19, Alan Stern wrote:
> On Wed, 7 Sep 2005, Luben Tuikov wrote:
> 
> 
>>Is it possible to implement LU Reset TMF for USB storage devices?
>>If so, then you do not need to do a "bus reset".
> 
> 
> Sorry, I don't know that acronym.  What is TMF?

Good morning Alan,

I meant Task Management Function. (SAM from T10.org, section 7).

All newer storage transports are requred to have a direct mapping
of TMFs to transport units, e.g. SAS.

>>*General note:* It is very bad to have to punish all devices on the
>>"bus" just because one device failed.  Isn't there a better way
>>to recover(*) a failed USB Storage device?
> 
> No.  There are only two reset mechanisms available for USB Mass Storage.  

I see.

> One is a class-specific reset command (which usb-storage issues when asked 
> to do a device reset),

So as far as I can see it sends a reset on the wire to the device?

> and the other is a USB port reset (which 
> usb-storage issues when asked to do a bus reset).

As far as I see this resets the port on the host side and then
rediscovers the devices?

> In practice, it turns out that many (perhaps even most) USB Storage 
> devices don't implement the class-specific reset command correctly.  
> Windows doesn't use it at all, so far as I know.  This means our only 
> realistic option is the USB port reset.
> 
> 
>>(*) That doesn't necessarily mean "bring back to life", it could
>>also mean "remove".
> 
> 
> Well, usb-storage doesn't need to "remove" a failed device.  The SCSI core 
> does a perfectly good job just by setting it off-line.  And since there 
> aren't other targets sharing the bus, that's good enough.

Well what happens when I just unplug the device?
 
>>That is, a "bus reset" means that also any other device on the "bus" is
>>unreachable.  If this is _not_ the case, then a "bus" reset must _not_
>>take place.
> 
> I'm not sure what that first sentence is intended to mean.  The SCSI error 
> handler _does_ issue bus resets even when other devices on the bus are 
> still reachable, doesn't it?  Provided the others are idle at the time, 
> there shouldn't be any harm in it.

Ok, I see where the confusion is.

>>If you have the means to do transport checking of the state of the
>>device, (and it seems like you can for USB Storage, since it is USB after
>>all), then you _must_ implement your own eh handler.  This is what
>>it is for.  Please, consider.
>>
>>	Luben
>>
>>P.S. It may actually simplify things in the USB transport layer _and_ in
>>SCSI Core (i.e. no need for those patches).
> 
> 
> We are sort of moving in that direction.  usb-storage already does its 
> own unsolicited resets when unexpected error occur.  But we don't have 
> anything like the error-handler's mechanism for recovering from timeouts 
> (TUR, then device reset, then TUR, then bus reset, ...).  There doesn't 
> seem to be any point in reinventing all that code.

What code?  What reinvention?

You _need_ to implement a timeout and eh hook if you want to do this
in any sane way.

You also need to implement kref via kobject for the devices which you/USB Storage
represent to SCSI Core.  You also need to get rid of that horrible patchwork
of a solution: dev_semaphore.  Maybe you can take a look in the SAS code and
see how this is all done?

	Luben


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-07 19:58     ` James Bottomley
  2005-09-07 22:05       ` James Bottomley
  2005-09-08 15:59       ` Alan Stern
@ 2005-09-08 16:08       ` Alan Stern
  2 siblings, 0 replies; 30+ messages in thread
From: Alan Stern @ 2005-09-08 16:08 UTC (permalink / raw)
  To: James Bottomley, Mike Anderson; +Cc: SCSI development list

On Wed, 7 Sep 2005, James Bottomley wrote:

> Actually, alloc_target is properly guarded so it doesn't need the scan
> mutex.  It might be nice to update the SDEV_ state model to include a
> "scanning" state, that way we could properly guard the sdev_alloc as
> well and dump the scan mutex ... that's probably more than a slight
> change, though.

Probably not a good idea, either.  The "scanning" state applies to the 
host, not a particular device -- the idea is to prevent one thread from 
adding new devices while another thread is trying to get rid of all the 
devices because the host is going away.  (Not to mention, preventing two 
threads from adding the same target at the same time!)

And this needs to be a mutex, since host removal must wait until scanning 
finishes.  Making it a state instead won't work well.

> > Would you like me to submit an updated patch?
> 
> Yes, please.  It's been suggested that we should have a scsi_add_target
> that returns zero on success or error on failure (with no ref to the
> sdev) and keep the old behaviour of __scsi_add_target().

On Wed, 7 Sep 2005, Mike Anderson wrote:

> On as542. The first hunk of the diff is already in (declare of
> scsi_host_set_state). The second hunk looks good (SHOST_RECOVERY label).
> The third hunk I will take in combination with the other patch that
> effects the scan code. This should have probably been
> another patch.

At this point, maybe it's best for me to step back and let Mike look 
through the issues first.  There's Patrick's patch to be combined with the 
one I sent, and other things that need to be straightened out.

Alan Stern

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-08 15:59       ` Alan Stern
@ 2005-09-08 16:15         ` James Bottomley
  2005-09-08 18:58           ` Alan Stern
  0 siblings, 1 reply; 30+ messages in thread
From: James Bottomley @ 2005-09-08 16:15 UTC (permalink / raw)
  To: Alan Stern; +Cc: SCSI development list

On Thu, 2005-09-08 at 11:59 -0400, Alan Stern wrote:
> I don't understand your reasoning.  With your new system, you end up with 
> two threads doing this:
> 
> 	Removal thread			Error handler thread
> 	-------------------------	---------------------------
> 					Go from RUNNING to RECOVERY
> 	Try to go to CANCEL, fail
> 	Go to CANCEL_RECOVERY
> race:	Go to DEL_RECOVERY		Try to go to RUNNING, fail

Actually, no, that's why we have the parallel EH states ... let me put
in the events that trigger state transitions so you can see what
happens:


        EH thread finishes
	<---------------

        EH thread begins
        ---------------->


         <--------- 
  running ---------> recovery
    |                   |
    |                   | scsi_remove_host() called
    v   <----------     v
  cancel ----------> recover/cancel
    |                   |
    |                   | scsi_remove_host finishes visibility removal
    v                   v
   del <------------ recover/del

So the EH is allowed to activate in either running or cancel states, but
goes through its own state transition eventually coming back to del when
it finishes.  Once the EH gets into recover/cancel it can never
transition back to running.

> At least, that's how it would work if you allow the RECOVERY -> CANCEL
> transition.  Either way you end up in the correct state.  So what's wrong
> with the old (current) system?

It's just nasty on two counts:

1) we have an incorrect bifurcation in the state model and
2) we never actually enforce any of the state transitions.

James



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-08 16:07             ` Luben Tuikov
@ 2005-09-08 18:36               ` Alan Stern
  2005-09-08 23:59                 ` Luben Tuikov
  0 siblings, 1 reply; 30+ messages in thread
From: Alan Stern @ 2005-09-08 18:36 UTC (permalink / raw)
  To: Luben Tuikov; +Cc: SCSI development list

On Thu, 8 Sep 2005, Luben Tuikov wrote:

> > No.  There are only two reset mechanisms available for USB Mass Storage.  
> 
> I see.
> 
> > One is a class-specific reset command (which usb-storage issues when asked 
> > to do a device reset),
> 
> So as far as I can see it sends a reset on the wire to the device?

Sort of.  It's not a low-level hardware reset, it's rather high-level.  
If the device's firmware is busy doing something else, or is hung, or
doesn't implement resets correctly, then the reset won't work.

> > and the other is a USB port reset (which 
> > usb-storage issues when asked to do a bus reset).
> 
> As far as I see this resets the port on the host side and then
> rediscovers the devices?

That's right.  (Except there's only one device to rediscover.)  This is a 
much lower-level action than the other sort, and is much more likely to 
succeed.

> > Well, usb-storage doesn't need to "remove" a failed device.  The SCSI core 
> > does a perfectly good job just by setting it off-line.  And since there 
> > aren't other targets sharing the bus, that's good enough.
> 
> Well what happens when I just unplug the device?

For a brief time (maybe 1/4 second) all commands will return DID_ERROR
and resets will return FAILED.  Then usbcore will realize that the device
has been unplugged and will call the usb-storage disconnect method, which
in turn will call scsi_remove_host.

Does that answer your question?  For more, see below.

> > We are sort of moving in that direction.  usb-storage already does its 
> > own unsolicited resets when unexpected error occur.  But we don't have 
> > anything like the error-handler's mechanism for recovering from timeouts 
> > (TUR, then device reset, then TUR, then bus reset, ...).  There doesn't 
> > seem to be any point in reinventing all that code.
> 
> What code?  What reinvention?

scsi_unjam_host plus the things that it calls.  Why add a private version 
of all that to usb-storage when it already exists in scsi_error.c?

> You _need_ to implement a timeout and eh hook if you want to do this
> in any sane way.

No I don't.  There already is a timeout in the SCSI core
(scmd->eh_timeout); I don't need to add another one.

> You also need to implement kref via kobject for the devices which you/USB Storage
> represent to SCSI Core.

Again, I don't need to.  The data used by usb-storage belongs to the
private portion of the scsi_host, so it's already protected by the host's
kref.

>  You also need to get rid of that horrible patchwork
> of a solution: dev_semaphore.  Maybe you can take a look in the SAS code and
> see how this is all done?

I assume you're referring to the way dev_semaphore is used in the 
bus_reset routine?  It's sort of a preventative measure, because the SCSI 
core and the driver core haven't quite settled down yet.

Here's the idea.  We want to protect against a race between the error 
handler doing a bus reset and the user doing rmmod usb-storage.  With the 
current code, I believe scsi_remove_host does not wait for devices to go 
out of error recovery.  Here is what might happen without dev_semaphore:

	Error handler thread		Rmmod thread
	--------------------		------------
	eh calls usb-storage's bus_reset
	-> calls usb_stor_port_reset
	 -> calls usb_lock_device_for_reset
	-> calls usb_reset_device
					Driver core calls usb-storage's
						remove method
					storage_disconnect calls
						quiesce_and_remove_host
					-> locks & releases dev_semaphore (*)
					-> calls scsi_remove_host
						(doesn't wait for eh)
					-> returns to driver core
	 -> usb_reset_device does
		the port reset

(This picture is over-simplified because the driver core does some locking 
of its own before calling the remove method.  Let's not worry about that.)

Now see what has happened.  usb-storage's remove method has returned, so
it has given up all claims to the device.  It shouldn't do anything more
that will affect the device.  And yet in the eh thread it's about to do a
port reset!

Since in reality the bus_reset routine acquires dev_semaphore, the rmmod
thread will wait at (*) until the reset has finished.  Alternatively, if
scsi_remove_host waited until the device was out of error recovery, then
we would be okay without using dev_semaphore.

The complication exists inside usb_lock_device_for_reset.  That routine
ends up calling down_trylock and looping with a timeout, because of the 
potential for a complicated deadlock that I have described before.

There's one more thing worth mentioning.  Thanks to the resets carried out
internally by usb-storage, we may decide it's easiest not to implement the
bus_reset method at all!  That would certainly solve all the problems with
dev_semaphore.  :-)

Alan Stern

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-08 16:15         ` James Bottomley
@ 2005-09-08 18:58           ` Alan Stern
  2005-09-08 20:15             ` James Bottomley
  0 siblings, 1 reply; 30+ messages in thread
From: Alan Stern @ 2005-09-08 18:58 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI development list

On Thu, 8 Sep 2005, James Bottomley wrote:

> Actually, no, that's why we have the parallel EH states ... let me put
> in the events that trigger state transitions so you can see what
> happens:
> 
> 
>         EH thread finishes
> 	<---------------
> 
>         EH thread begins
>         ---------------->
> 
> 
>          <--------- 
>   running ---------> recovery
>     |                   |
>     |                   | scsi_remove_host() called
>     v   <----------     v
>   cancel ----------> recover/cancel
>     |                   |
>     |                   | scsi_remove_host finishes visibility removal
>     v                   v
>    del <------------ recover/del
> 
> So the EH is allowed to activate in either running or cancel states, but
> goes through its own state transition eventually coming back to del when
> it finishes.  Once the EH gets into recover/cancel it can never
> transition back to running.

Why allow cancel -> recover/cancel?  Once the device is in the cancel 
state, there isn't anything useful the error handler can accomplish, is 
there?  Failed or timed-out commands should simply return an error.

And if you accept that, then what point is there in distinguishing between 
cancel and recover/cancel?  As far as I can see, the only significant 
difference is that in recover/cancel the error handler is running (but not 
accomplishing anything).  Is this related to 1) below?

> > At least, that's how it would work if you allow the RECOVERY -> CANCEL
> > transition.  Either way you end up in the correct state.  So what's wrong
> > with the old (current) system?
> 
> It's just nasty on two counts:
> 
> 1) we have an incorrect bifurcation in the state model and
> 2) we never actually enforce any of the state transitions.

Can you explain 1) more fully?  I don't really understand what you're 
getting at.

As for 2), what do you mean?  In 2.6.13, scsi_device_set_state does not 
change the state if the transition is illegal.

Alan Stern

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-08 18:58           ` Alan Stern
@ 2005-09-08 20:15             ` James Bottomley
  2005-09-09  0:18               ` Luben Tuikov
  2005-09-09 14:16               ` Alan Stern
  0 siblings, 2 replies; 30+ messages in thread
From: James Bottomley @ 2005-09-08 20:15 UTC (permalink / raw)
  To: Alan Stern; +Cc: SCSI development list

On Thu, 2005-09-08 at 14:58 -0400, Alan Stern wrote:
> On Thu, 8 Sep 2005, James Bottomley wrote:
> > Actually, no, that's why we have the parallel EH states ... let me put
> > in the events that trigger state transitions so you can see what
> > happens:
> > 
> > 
> >         EH thread finishes
> > 	<---------------
> > 
> >         EH thread begins
> >         ---------------->
> > 
> > 
> >          <--------- 
> >   running ---------> recovery
> >     |                   |
> >     |                   | scsi_remove_host() called
> >     v   <----------     v
> >   cancel ----------> recover/cancel
> >     |                   |
> >     |                   | scsi_remove_host finishes visibility removal
> >     v                   v
> >    del <------------ recover/del
> > 
> > So the EH is allowed to activate in either running or cancel states, but
> > goes through its own state transition eventually coming back to del when
> > it finishes.  Once the EH gets into recover/cancel it can never
> > transition back to running.
> 
> Why allow cancel -> recover/cancel?  Once the device is in the cancel 
> state, there isn't anything useful the error handler can accomplish, is 
> there?  Failed or timed-out commands should simply return an error.
> 
> And if you accept that, then what point is there in distinguishing between 
> cancel and recover/cancel?  As far as I can see, the only significant 
> difference is that in recover/cancel the error handler is running (but not 
> accomplishing anything).  Is this related to 1) below?

Yes, a state machine like I've shown can only be in one state at any
given time.  The problem is what happens if the error handler is running
when you try to remove the device.  We can reach in and try to stop it,
but under a state model we'd have to wait until the error handler put
the device back to running unless we allow a set of state transitions
that the error handler can go through in parallel with the removal
(which is the model I've shown above).

The idea of allowing the error handler to start up in cancel is that not
every device removal is a forced removal, so you want to permit the
shutdown commands (like sync cache) and consequent error handling on
those commands.

> > It's just nasty on two counts:
> > 
> > 1) we have an incorrect bifurcation in the state model and
> > 2) we never actually enforce any of the state transitions.
> 
> Can you explain 1) more fully?  I don't really understand what you're 
> getting at.

Does the explanation above cover this?

> As for 2), what do you mean?  In 2.6.13, scsi_device_set_state does not 
> change the state if the transition is illegal.

Yes, but we never act on the return value.  if the state is illegal, we
shouldn't be taking actions believing we're in that state, so we should
always be checking the returns.

James



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-08 18:36               ` Alan Stern
@ 2005-09-08 23:59                 ` Luben Tuikov
  2005-09-09 14:44                   ` Alan Stern
  2005-09-09 17:08                   ` Stefan Richter
  0 siblings, 2 replies; 30+ messages in thread
From: Luben Tuikov @ 2005-09-08 23:59 UTC (permalink / raw)
  To: Alan Stern; +Cc: SCSI development list

On 09/08/05 14:36, Alan Stern wrote:
> On Thu, 8 Sep 2005, Luben Tuikov wrote:
> 
> 
>>>No.  There are only two reset mechanisms available for USB Mass Storage.  
>>
>>I see.
>>
>>
>>>One is a class-specific reset command (which usb-storage issues when asked 
>>>to do a device reset),
>>
>>So as far as I can see it sends a reset on the wire to the device?
> 
> 
> Sort of.  It's not a low-level hardware reset, it's rather high-level.  
> If the device's firmware is busy doing something else, or is hung, or
> doesn't implement resets correctly, then the reset won't work.

So this is perfect for LU Reset.

>>>and the other is a USB port reset (which 
>>>usb-storage issues when asked to do a bus reset).
>>
>>As far as I see this resets the port on the host side and then
>>rediscovers the devices?
> 
> 
> That's right.  (Except there's only one device to rediscover.)  This is a 
> much lower-level action than the other sort, and is much more likely to 
> succeed.

So this is perfect for I_T Nexus Reset.

>>>Well, usb-storage doesn't need to "remove" a failed device.  The SCSI core 
>>>does a perfectly good job just by setting it off-line.  And since there 
>>>aren't other targets sharing the bus, that's good enough.
>>
>>Well what happens when I just unplug the device?
> 
> 
> For a brief time (maybe 1/4 second) all commands will return DID_ERROR
> and resets will return FAILED.  Then usbcore will realize that the device

Perfectly ok.

> has been unplugged and will call the usb-storage disconnect method, which
> in turn will call scsi_remove_host.

This is fine.  I haven't looked closely at the code, but note that
scsi_remove_host() should absolutely be called last.
 
>>What code?  What reinvention?
> 
> scsi_unjam_host plus the things that it calls.  Why add a private version 
> of all that to usb-storage when it already exists in scsi_error.c?

Because it is not implemented correctly.

Think about this:
	EH is function of scsi_host.
	You can define YOUR OWN EH! in usb storage, which you control!
	So that when you want to remove the host you know what position
	you're at. ;-)

>>You _need_ to implement a timeout and eh hook if you want to do this
>>in any sane way.
> 
> 
> No I don't.  There already is a timeout in the SCSI core
> (scmd->eh_timeout); I don't need to add another one.

Eh, you're missing my point completely.
Hopefully one day you can see what I mean.

>>You also need to implement kref via kobject for the devices which you/USB Storage
>>represent to SCSI Core.
> 
> Again, I don't need to.  The data used by usb-storage belongs to the
> private portion of the scsi_host, so it's already protected by the host's
> kref.

Well then, if your code is so correct, why are all these problems?

>> You also need to get rid of that horrible patchwork
>>of a solution: dev_semaphore.  Maybe you can take a look in the SAS code and
>>see how this is all done?
> 
> I assume you're referring to the way dev_semaphore is used in the 
> bus_reset routine?  It's sort of a preventative measure, because the SCSI 
> core and the driver core haven't quite settled down yet.

"Settled down" -- you mean, litte design if ever has been done writing this
thing?  Or do you mean: "one is waiting for the other to change in certain
ways to make it work"?

> Here's the idea.  We want to protect against a race between the error 
> handler doing a bus reset and the user doing rmmod usb-storage.  With the 
> current code, I believe scsi_remove_host does not wait for devices to go 
> out of error recovery.  Here is what might happen without dev_semaphore:
> 
> 	Error handler thread		Rmmod thread
> 	--------------------		------------
> 	eh calls usb-storage's bus_reset
> 	-> calls usb_stor_port_reset
> 	 -> calls usb_lock_device_for_reset
> 	-> calls usb_reset_device
> 					Driver core calls usb-storage's
> 						remove method
> 					storage_disconnect calls
> 						quiesce_and_remove_host
> 					-> locks & releases dev_semaphore (*)
> 					-> calls scsi_remove_host
> 						(doesn't wait for eh)
> 					-> returns to driver core

How can all this happen if a kref (module count) is upped by one,
due to SCSI Core using usb-storage?

> 	 -> usb_reset_device does
> 		the port reset
> 
> (This picture is over-simplified because the driver core does some locking 
> of its own before calling the remove method.  Let's not worry about that.)
> 
> Now see what has happened.  usb-storage's remove method has returned, so
> it has given up all claims to the device.  It shouldn't do anything more
> that will affect the device.  And yet in the eh thread it's about to do a
> port reset!

Yes, this is because eh should be in USB Storage.

> 
> Since in reality the bus_reset routine acquires dev_semaphore, the rmmod
> thread will wait at (*) until the reset has finished.  Alternatively, if

OMG!

> scsi_remove_host waited until the device was out of error recovery, then
> we would be okay without using dev_semaphore.
> 
> The complication exists inside usb_lock_device_for_reset.  That routine
> ends up calling down_trylock and looping with a timeout, because of the 
> potential for a complicated deadlock that I have described before.

OMG!
 
> There's one more thing worth mentioning.  Thanks to the resets carried out
> internally by usb-storage, we may decide it's easiest not to implement the
> bus_reset method at all!  That would certainly solve all the problems with
> dev_semaphore.  :-)

What would solve your problem is implementing your own eh and timer hook.
More so for transports such as USB.

	Luben


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-08 20:15             ` James Bottomley
@ 2005-09-09  0:18               ` Luben Tuikov
  2005-09-09 14:16               ` Alan Stern
  1 sibling, 0 replies; 30+ messages in thread
From: Luben Tuikov @ 2005-09-09  0:18 UTC (permalink / raw)
  To: James Bottomley; +Cc: Alan Stern, SCSI development list

On 09/08/05 16:15, James Bottomley wrote:
> On Thu, 2005-09-08 at 14:58 -0400, Alan Stern wrote:
> 
>>On Thu, 8 Sep 2005, James Bottomley wrote:
>>
>>>Actually, no, that's why we have the parallel EH states ... let me put
>>>in the events that trigger state transitions so you can see what
>>>happens:
>>>
>>>
>>>        EH thread finishes
>>>	<---------------
>>>
>>>        EH thread begins
>>>        ---------------->
>>>
>>>
>>>         <--------- 
>>>  running ---------> recovery
>>>    |                   |
>>>    |                   | scsi_remove_host() called
>>>    v   <----------     v
>>>  cancel ----------> recover/cancel
>>>    |                   |
>>>    |                   | scsi_remove_host finishes visibility removal
>>>    v                   v
>>>   del <------------ recover/del
>>>
>>>So the EH is allowed to activate in either running or cancel states, but
>>>goes through its own state transition eventually coming back to del when
>>>it finishes.  Once the EH gets into recover/cancel it can never
>>>transition back to running.
>>
>>Why allow cancel -> recover/cancel?  Once the device is in the cancel 
>>state, there isn't anything useful the error handler can accomplish, is 
>>there?  Failed or timed-out commands should simply return an error.
>>
>>And if you accept that, then what point is there in distinguishing between 
>>cancel and recover/cancel?  As far as I can see, the only significant 
>>difference is that in recover/cancel the error handler is running (but not 
>>accomplishing anything).  Is this related to 1) below?
> 
> 
> Yes, a state machine like I've shown can only be in one state at any
> given time.

And this is fine.

> The problem is what happens if the error handler is running
> when you try to remove the device.  We can reach in and try to stop it,
> but under a state model we'd have to wait until the error handler put
> the device back to running unless we allow a set of state transitions
> that the error handler can go through in parallel with the removal
> (which is the model I've shown above).

I'd suggest "waiting" until EH is done.  EH is just another user
of the SDS.  Thus if the device is physically removed, all tasks
fail and new ones are returned with failed delivery (the SDS tells
you this).

So one state at a time is the way to go.

In effect, a device remove _event_ is queued up, _after_ the
_recover_event_, and executed thereafter, when the state process
of _recover_event_ has finished and the state machine is ready
to transition.

	Luben


> The idea of allowing the error handler to start up in cancel is that not
> every device removal is a forced removal, so you want to permit the
> shutdown commands (like sync cache) and consequent error handling on
> those commands.
> 
> 
>>>It's just nasty on two counts:
>>>
>>>1) we have an incorrect bifurcation in the state model and
>>>2) we never actually enforce any of the state transitions.
>>
>>Can you explain 1) more fully?  I don't really understand what you're 
>>getting at.
> 
> 
> Does the explanation above cover this?
> 
> 
>>As for 2), what do you mean?  In 2.6.13, scsi_device_set_state does not 
>>change the state if the transition is illegal.
> 
> 
> Yes, but we never act on the return value.  if the state is illegal, we
> shouldn't be taking actions believing we're in that state, so we should
> always be checking the returns.
> 
> James
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-08 20:15             ` James Bottomley
  2005-09-09  0:18               ` Luben Tuikov
@ 2005-09-09 14:16               ` Alan Stern
  2005-09-09 14:44                 ` James Bottomley
  1 sibling, 1 reply; 30+ messages in thread
From: Alan Stern @ 2005-09-09 14:16 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI development list

On Thu, 8 Sep 2005, James Bottomley wrote:

> Yes, a state machine like I've shown can only be in one state at any
> given time.  The problem is what happens if the error handler is running
> when you try to remove the device.  We can reach in and try to stop it,
> but under a state model we'd have to wait until the error handler put
> the device back to running unless we allow a set of state transitions
> that the error handler can go through in parallel with the removal
> (which is the model I've shown above).
> 
> The idea of allowing the error handler to start up in cancel is that not
> every device removal is a forced removal, so you want to permit the
> shutdown commands (like sync cache) and consequent error handling on
> those commands.

I see.  You must be making an unstated assumption, something along the
lines of "If the error handler is running then the host is in a *RECOVERY
state."  Given that, there definitely is a need for a separate 
CANCEL_RECOVERY and DEL_RECOVERY.

> > As for 2), what do you mean?  In 2.6.13, scsi_device_set_state does not 
> > change the state if the transition is illegal.
> 
> Yes, but we never act on the return value.  if the state is illegal, we
> shouldn't be taking actions believing we're in that state, so we should
> always be checking the returns.

Hmmm, well looking through the patch you posted yesterday, there are a few 
problems.  If the ->CANCEL transition fails, scsi_remove_host then tries 
->CANCEL_RECOVERY.  But the failed transition still writes a error message 
to the log, even though it's not really an error.

Also, you've got a race between scsi_remove_host doing ->CANCEL_RECOVERY
and the error handler doing ->RUNNING.  If both happen at the same time,
you can end up in the wrong state.  To make this work, the state
transitions should only be made under protection of the host's spinlock.  
And to avoid the error messages, the choice of next state should be made 
depending on the current state (like, if the current state is RECOVERY 
then go to CANCEL_RECOVERY, otherwise go to CANCEL).

Alan Stern

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-08 23:59                 ` Luben Tuikov
@ 2005-09-09 14:44                   ` Alan Stern
  2005-09-09 17:08                   ` Stefan Richter
  1 sibling, 0 replies; 30+ messages in thread
From: Alan Stern @ 2005-09-09 14:44 UTC (permalink / raw)
  To: Luben Tuikov; +Cc: SCSI development list

On Thu, 8 Sep 2005, Luben Tuikov wrote:

> >>So as far as I can see it sends a reset on the wire to the device?
> > 
> > 
> > Sort of.  It's not a low-level hardware reset, it's rather high-level.  
> > If the device's firmware is busy doing something else, or is hung, or
> > doesn't implement resets correctly, then the reset won't work.
> 
> So this is perfect for LU Reset.
> 
> >>>and the other is a USB port reset (which 
> >>>usb-storage issues when asked to do a bus reset).
> >>
> >>As far as I see this resets the port on the host side and then
> >>rediscovers the devices?
> > 
> > 
> > That's right.  (Except there's only one device to rediscover.)  This is a 
> > much lower-level action than the other sort, and is much more likely to 
> > succeed.
> 
> So this is perfect for I_T Nexus Reset.

Forgive me; I haven't read the SAM document and I'm not familiar with all
the strange SCSI terms even in the regular SCSI spec.  You're agreeing
that usb-storage uses the appropriate sort of reset operations at the
correct spots, right?

> >>What code?  What reinvention?
> > 
> > scsi_unjam_host plus the things that it calls.  Why add a private version 
> > of all that to usb-storage when it already exists in scsi_error.c?
> 
> Because it is not implemented correctly.

Are you saying the scsi_unjam_host is wrong ("implemented incorrectly")?  
Can you point out any particular errors in it, or send in a patch to fix
the implementation?

> Think about this:
> 	EH is function of scsi_host.
> 	You can define YOUR OWN EH! in usb storage, which you control!
> 	So that when you want to remove the host you know what position
> 	you're at. ;-)

Well yes, that's true.  But doing it will add extra code to the 
usb-storage driver without making it better or improving its behavior in 
any way.

> Eh, you're missing my point completely.
> Hopefully one day you can see what I mean.

And maybe you'll see what I mean.  (I really _am_ trying to understand, 
but it's always hard to communicate subtle meanings over email.)

> >>You also need to implement kref via kobject for the devices which you/USB Storage
> >>represent to SCSI Core.
> > 
> > Again, I don't need to.  The data used by usb-storage belongs to the
> > private portion of the scsi_host, so it's already protected by the host's
> > kref.
> 
> Well then, if your code is so correct, why are all these problems?

All what problems?  There aren't any problems in usb-storage.  There's an
awkward bit of code in one routine (usb_lock_device_for_reset), but that's
inevitable because of the layering violation implicit in its job.

> >> You also need to get rid of that horrible patchwork
> >>of a solution: dev_semaphore.  Maybe you can take a look in the SAS code and
> >>see how this is all done?
> > 
> > I assume you're referring to the way dev_semaphore is used in the 
> > bus_reset routine?  It's sort of a preventative measure, because the SCSI 
> > core and the driver core haven't quite settled down yet.
> 
> "Settled down" -- you mean, litte design if ever has been done writing this
> thing?  Or do you mean: "one is waiting for the other to change in certain
> ways to make it work"?

What I mean is: I wasn't sure whether scsi_remove_host would wait for the 
error handler to finish.  Until recently it would.  Now with Mike's 
changes it place, it doesn't.  With the changes you've been proposing, 
once again it would.

Since I couldn't depend on any particular behavior, I had to write the 
bus_reset routine in usb-storage so that it would work either way.  That 
meant it needed to acquire the dev_semaphore lock.

After thinking it over last night, I decided that the details of
scsi_remove_host's behavior really should remain private to the SCSI core.  
So far as I'm concerned, it's okay if it doesn't wait for the error
handler to finish and it's okay if it does.  The one _crucial_
requirement is this: It must synchronize with the error handler.  That is,
if the error handler is executing a command or a reset at the time
scsi_remove_host is called, then scsi_remove_host must not return until
the command or reset has completed.  How long it chooses to wait after
that is unimportant.

If scsi_remove_host would make that guarantee, then dev_semaphore could be 
removed from usb-storage's bus_reset.  (It wouldn't hurt to add a comment 
to scsi_remove_host, documenting the guarantee so that it doesn't get 
changed in the future.)

> > Here's the idea.  We want to protect against a race between the error 
> > handler doing a bus reset and the user doing rmmod usb-storage.  With the 
> > current code, I believe scsi_remove_host does not wait for devices to go 
> > out of error recovery.  Here is what might happen without dev_semaphore:
> > 
> > 	Error handler thread		Rmmod thread
> > 	--------------------		------------
> > 	eh calls usb-storage's bus_reset
> > 	-> calls usb_stor_port_reset
> > 	 -> calls usb_lock_device_for_reset
> > 	-> calls usb_reset_device
> > 					Driver core calls usb-storage's
> > 						remove method
> > 					storage_disconnect calls
> > 						quiesce_and_remove_host
> > 					-> locks & releases dev_semaphore (*)
> > 					-> calls scsi_remove_host
> > 						(doesn't wait for eh)
> > 					-> returns to driver core
> 
> How can all this happen if a kref (module count) is upped by one,
> due to SCSI Core using usb-storage?

I do have a blind spot for that, don't I?  All right, this can't happen by 
way of rmmod.  But it can happen if the user unbinds usb-storage via 
sysfs.

> > 	 -> usb_reset_device does
> > 		the port reset
> > 
> > (This picture is over-simplified because the driver core does some locking 
> > of its own before calling the remove method.  Let's not worry about that.)
> > 
> > Now see what has happened.  usb-storage's remove method has returned, so
> > it has given up all claims to the device.  It shouldn't do anything more
> > that will affect the device.  And yet in the eh thread it's about to do a
> > port reset!
> 
> Yes, this is because eh should be in USB Storage.

No, that wouldn't make any difference!  Particularly since the part that
does the port reset already _is_ in usb-storage.

What matters is whether or not scsi_remove_host waits for the reset to 
complete.  Adding a private error handler to usb-storage won't affect 
that.

> What would solve your problem is implementing your own eh and timer hook.
> More so for transports such as USB.

But there isn't a problem to solve!  Everything works as it is.

The only part one might object to is the awkward code in 
usb_lock_device_for_reset.  However, that code will have to remain awkward 
no matter what.  Even with a private error handler for usb-storage that 
will still be true.

Alan Stern

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-09 14:16               ` Alan Stern
@ 2005-09-09 14:44                 ` James Bottomley
  2005-09-09 15:16                   ` Alan Stern
  0 siblings, 1 reply; 30+ messages in thread
From: James Bottomley @ 2005-09-09 14:44 UTC (permalink / raw)
  To: Alan Stern; +Cc: SCSI development list

On Fri, 2005-09-09 at 10:16 -0400, Alan Stern wrote:
> I see.  You must be making an unstated assumption, something along the
> lines of "If the error handler is running then the host is in a *RECOVERY
> state."  Given that, there definitely is a need for a separate 
> CANCEL_RECOVERY and DEL_RECOVERY.

Sorry, yes ... it wasn't stated because I thought it was apparent in the
diagram.

> Hmmm, well looking through the patch you posted yesterday, there are a few 
> problems.  If the ->CANCEL transition fails, scsi_remove_host then tries 
> ->CANCEL_RECOVERY.  But the failed transition still writes a error message 
> to the log, even though it's not really an error.

Well ... the log is just that, a log.  It's not active unless someone
asks to see it and all its entries don't necessarily represent errors.
However, we can probably tone down the logging to KERN_NOTICE.

> Also, you've got a race between scsi_remove_host doing ->CANCEL_RECOVERY
> and the error handler doing ->RUNNING.  If both happen at the same time,
> you can end up in the wrong state.  To make this work, the state
> transitions should only be made under protection of the host's spinlock.  
> And to avoid the error messages, the choice of next state should be made 
> depending on the current state (like, if the current state is RECOVERY 
> then go to CANCEL_RECOVERY, otherwise go to CANCEL).

Yes ... we definitely use the lock for device transitions.  Since all of
this is done with user context, a semaphore would be the most
appropriate, but since we already have a host lock, it's probably better
to use it.

So which way do you want to go?  Either we wait in recovery for the
error handler to finish and transition the host state to RUNNING or we
introduce the parallel states for the error handler.

James

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-09 14:44                 ` James Bottomley
@ 2005-09-09 15:16                   ` Alan Stern
  2005-09-09 15:37                     ` James Bottomley
  0 siblings, 1 reply; 30+ messages in thread
From: Alan Stern @ 2005-09-09 15:16 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI development list

On Fri, 9 Sep 2005, James Bottomley wrote:

> So which way do you want to go?  Either we wait in recovery for the
> error handler to finish and transition the host state to RUNNING or we
> introduce the parallel states for the error handler.

For usb-storage it won't make any difference on the whole, as far as I can
see.  The important thing is that scsi_remove_host needs to synchronize
somehow with the error handler.  Waiting for the host state to go back to
RUNNING would be valid.  Introducing the parallel states would mean 
waiting for the host to go from CANCEL_RECOVERY to CANCEL, right?

Either way should work.  Would there be more of a difference for drivers
that allow non-forced removal?  And without the parallel states, would you
worry about the possibility of starving scsi_remove_host (every time it
tries to go from RUNNING to CANCEL, the error handler gets there first and
changes the state to RECOVERY)?

In the absence of other considerations, my vote goes for adding the least 
amount of additional code.

Alan Stern

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-09 15:16                   ` Alan Stern
@ 2005-09-09 15:37                     ` James Bottomley
  2005-09-09 16:17                       ` Alan Stern
  0 siblings, 1 reply; 30+ messages in thread
From: James Bottomley @ 2005-09-09 15:37 UTC (permalink / raw)
  To: Alan Stern; +Cc: SCSI development list

On Fri, 2005-09-09 at 11:16 -0400, Alan Stern wrote:
> > So which way do you want to go?  Either we wait in recovery for the
> > error handler to finish and transition the host state to RUNNING or we
> > introduce the parallel states for the error handler.
> 
> For usb-storage it won't make any difference on the whole, as far as I can
> see.  The important thing is that scsi_remove_host needs to synchronize
> somehow with the error handler.  Waiting for the host state to go back to
> RUNNING would be valid.  Introducing the parallel states would mean 
> waiting for the host to go from CANCEL_RECOVERY to CANCEL, right?

Actually, no, there would be no waiting.  Once the host gets to DEL or
CANCEL_DEL, then all the devices should be in DEL and the mid-layer will
begin rejecting any commands the error handler makes.  So it can
continue until it's exhausted.  It keeps a reference to the devices it
needs to operate, so they'll be finally freed when it stops.

Since the eh has references, module removal would have to wait until it
had finished, but everything else can proceed without it.

> Either way should work.  Would there be more of a difference for drivers
> that allow non-forced removal?  And without the parallel states, would you
> worry about the possibility of starving scsi_remove_host (every time it
> tries to go from RUNNING to CANCEL, the error handler gets there first and
> changes the state to RECOVERY)?

Not really.  Starting the eh is quite a long and involved process, so
the chances of it restarting before scsi_remove_host gets scheduled is
small (not impossible, but small).

> In the absence of other considerations, my vote goes for adding the least 
> amount of additional code.

Actually, they're both about the same amount of code.

James

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-09 15:37                     ` James Bottomley
@ 2005-09-09 16:17                       ` Alan Stern
  2005-09-09 16:47                         ` Mike Anderson
  0 siblings, 1 reply; 30+ messages in thread
From: Alan Stern @ 2005-09-09 16:17 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI development list

On Fri, 9 Sep 2005, James Bottomley wrote:

> On Fri, 2005-09-09 at 11:16 -0400, Alan Stern wrote:
> > > So which way do you want to go?  Either we wait in recovery for the
> > > error handler to finish and transition the host state to RUNNING or we
> > > introduce the parallel states for the error handler.
> > 
> > For usb-storage it won't make any difference on the whole, as far as I can
> > see.  The important thing is that scsi_remove_host needs to synchronize
> > somehow with the error handler.  Waiting for the host state to go back to
> > RUNNING would be valid.  Introducing the parallel states would mean 
> > waiting for the host to go from CANCEL_RECOVERY to CANCEL, right?
> 
> Actually, no, there would be no waiting.

You mean that there would be no waiting if the parallel states are used -- 
without them we have to wait for the state to go to RUNNING.

>  Once the host gets to DEL or
> CANCEL_DEL, then all the devices should be in DEL and the mid-layer will
> begin rejecting any commands the error handler makes.  So it can
> continue until it's exhausted.  It keeps a reference to the devices it
> needs to operate, so they'll be finally freed when it stops.
> 
> Since the eh has references, module removal would have to wait until it
> had finished, but everything else can proceed without it.

Then what about making scsi_remove_host wait for the current command or
reset to complete?  If you don't wait for the state to change from
CANCEL_RECOVERY to CANCEL before moving to DEL then scsi_remove_host might
return too early, while the error handler is still using the host.  (On
the plus side, if scsi_remove_host always waits for the state to be CANCEL
before making the transition to DEL, then there's no need for DEL_RECOVERY
at all.  The error handler can simply refuse to start when the state is
already DEL.)

The conundrum I'm facing is how to make sure that when scsi_remove_host
returns, the mid-layer is no longer sending anything to the host.  Sure,
no new commands will be issued once the state is set to DEL (or
DEL_RECOVERY).  But what about commands/resets that were already in
progress at that time?  (Especially if they were issued before
scsi_remove_host was called.)  The routine shouldn't return until they
have completed.  This applies to commands coming from either the
high-level driver or from the error handler.

I don't know what the best approach is.  The issue may be unrelated to
whether you use the parallel states; if it is, then I don't care whether
the parallel states are used or not.

Alan Stern

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-09 16:17                       ` Alan Stern
@ 2005-09-09 16:47                         ` Mike Anderson
  0 siblings, 0 replies; 30+ messages in thread
From: Mike Anderson @ 2005-09-09 16:47 UTC (permalink / raw)
  To: Alan Stern; +Cc: James Bottomley, SCSI development list

Alan Stern <stern@rowland.harvard.edu> wrote:
> The conundrum I'm facing is how to make sure that when scsi_remove_host
> returns, the mid-layer is no longer sending anything to the host.  Sure,
> no new commands will be issued once the state is set to DEL (or
> DEL_RECOVERY).  But what about commands/resets that were already in
> progress at that time?  (Especially if they were issued before
> scsi_remove_host was called.)  The routine shouldn't return until they
> have completed.  This applies to commands coming from either the
> high-level driver or from the error handler.
> 

I do not understand why we would not want to wait on the eh thread and
shut it down before returning from scsi_remove_host. In the end we are
really going to wait on this action anyways.

Currently scsi_send_eh_cmnd does not go through scsi_dispatch_cmd and eh_*
calls have no checks besides SUCCESS and FAILED. The scsi_send_eh_cmnd
should be updated to follow the same model that scsi_dispatch_cmd uses and
the eh_* calls good be improved if we allowed another return code that the
LLDDs could use to indicate that no device was present. These would only
be an optimization to reduce the wait time on the eh thread and if not
implemented would just increase the wait time.

-andmike
--
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-08 23:59                 ` Luben Tuikov
  2005-09-09 14:44                   ` Alan Stern
@ 2005-09-09 17:08                   ` Stefan Richter
  2005-09-09 17:15                     ` Luben Tuikov
  1 sibling, 1 reply; 30+ messages in thread
From: Stefan Richter @ 2005-09-09 17:08 UTC (permalink / raw)
  To: SCSI development list; +Cc: Luben Tuikov, Alan Stern

Luben Tuikov wrote:
> Think about this:
> 	EH is function of scsi_host.

The host does not use the EH. The EH uses the host.

> 	You can define YOUR OWN EH! in usb storage, which you control!

There are a few hotpluggable low-level providers besides usb-storage.
-- 
Stefan Richter
-=====-=-=-= =--= -=--=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] SCSI scanning and removal fixes
  2005-09-09 17:08                   ` Stefan Richter
@ 2005-09-09 17:15                     ` Luben Tuikov
  0 siblings, 0 replies; 30+ messages in thread
From: Luben Tuikov @ 2005-09-09 17:15 UTC (permalink / raw)
  To: Stefan Richter; +Cc: SCSI development list, Alan Stern

On 09/09/05 13:08, Stefan Richter wrote:
> Luben Tuikov wrote:
> 
>>Think about this:
>>	EH is function of scsi_host.
> 
> 
> The host does not use the EH. The EH uses the host.

That's exactly what my statement above means:

	F(scsi_host) = EH

>>	You can define YOUR OWN EH! in usb storage, which you control!
> 
> 
> There are a few hotpluggable low-level providers besides usb-storage.

Ok.

	Luben


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2005-09-09 17:15 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-07-26 14:12 [PATCH 1/5] SCSI scanning and removal fixes Alan Stern
2005-09-07 15:16 ` James Bottomley
2005-09-07 18:27   ` Alan Stern
2005-09-07 18:37     ` Luben Tuikov
2005-09-07 18:42     ` Luben Tuikov
2005-09-07 19:31       ` Alan Stern
2005-09-07 20:00         ` Mike Anderson
2005-09-07 20:43         ` Luben Tuikov
2005-09-07 21:34           ` Stefan Richter
2005-09-08 15:19           ` Alan Stern
2005-09-08 16:07             ` Luben Tuikov
2005-09-08 18:36               ` Alan Stern
2005-09-08 23:59                 ` Luben Tuikov
2005-09-09 14:44                   ` Alan Stern
2005-09-09 17:08                   ` Stefan Richter
2005-09-09 17:15                     ` Luben Tuikov
2005-09-07 19:58     ` James Bottomley
2005-09-07 22:05       ` James Bottomley
2005-09-08 15:59       ` Alan Stern
2005-09-08 16:15         ` James Bottomley
2005-09-08 18:58           ` Alan Stern
2005-09-08 20:15             ` James Bottomley
2005-09-09  0:18               ` Luben Tuikov
2005-09-09 14:16               ` Alan Stern
2005-09-09 14:44                 ` James Bottomley
2005-09-09 15:16                   ` Alan Stern
2005-09-09 15:37                     ` James Bottomley
2005-09-09 16:17                       ` Alan Stern
2005-09-09 16:47                         ` Mike Anderson
2005-09-08 16:08       ` Alan Stern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).