[PATCH 1/2] Block I/O while SG reset operation in progress

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/2] Block I/O while SG reset operation in progress - midlayer portion
@ 2006-02-24 16:52 James Smart
  2006-02-24 20:11 ` Mike Anderson
  0 siblings, 1 reply; 4+ messages in thread
From: James Smart @ 2006-02-24 16:52 UTC (permalink / raw)
  To: linux-scsi

This is the midlayer portion of the patch

This patch ensures that i/o is stopped while an eh handler is being
processed. It adds a new flag, set by the async reset callers, which
augments the host-in-reset checks and stops i/o.  The async reset
callers are already synchronized to hold off until the error thread is
no longer running.

-- james s

Signed-off-by: James Smart <James.Smart@emulex.com>


diff -upNr a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
--- a/drivers/scsi/scsi_error.c	2006-02-06 12:00:11.000000000 -0500
+++ b/drivers/scsi/scsi_error.c	2006-02-13 10:44:10.000000000 -0500
@@ -1654,7 +1654,9 @@ int
  scsi_reset_provider(struct scsi_device *dev, int flag)
  {
  	struct scsi_cmnd *scmd = scsi_get_command(dev, GFP_KERNEL);
+	struct Scsi_Host *shost = dev->host;
  	struct request req;
+	unsigned long flags;
  	int rtn;

  	scmd->request = &req;
@@ -1684,6 +1686,10 @@ scsi_reset_provider(struct scsi_device *
  	 */
  	scmd->pid			= 0;

+	spin_lock_irqsave(shost->host_lock, flags);
+	shost->tmf_in_progress = 1;
+	spin_unlock_irqrestore(shost->host_lock, flags);
+
  	switch (flag) {
  	case SCSI_TRY_RESET_DEVICE:
  		rtn = scsi_try_bus_device_reset(scmd);
@@ -1702,6 +1708,10 @@ scsi_reset_provider(struct scsi_device *
  		rtn = FAILED;
  	}

+	spin_lock_irqsave(shost->host_lock, flags);
+	shost->tmf_in_progress = 0;
+	spin_unlock_irqrestore(shost->host_lock, flags);
+
  	scsi_next_command(scmd);
  	return rtn;
  }
diff -upNr a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
--- a/include/scsi/scsi_host.h	2006-02-06 12:00:33.000000000 -0500
+++ b/include/scsi/scsi_host.h	2006-02-13 10:21:01.000000000 -0500
@@ -556,6 +556,9 @@ struct Scsi_Host {
  	 */
  	unsigned ordered_tag:1;

+	/* task mgmt function in progress */
+	unsigned tmf_in_progress:1;
+
  	/*
  	 * Optional work queue to be utilized by the transport
  	 */
@@ -633,7 +636,8 @@ static inline int scsi_host_in_recovery(
  {
  	return shost->shost_state == SHOST_RECOVERY ||
  		shost->shost_state == SHOST_CANCEL_RECOVERY ||
-		shost->shost_state == SHOST_DEL_RECOVERY;
+		shost->shost_state == SHOST_DEL_RECOVERY ||
+		shost->tmf_in_progress;
  }

  extern int scsi_queue_work(struct Scsi_Host *, struct work_struct *);


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] Block I/O while SG reset operation in progress - midlayer portion
  2006-02-24 16:52 [PATCH 1/2] Block I/O while SG reset operation in progress - midlayer portion James Smart
@ 2006-02-24 20:11 ` Mike Anderson
  2006-02-24 22:35   ` James Smart
  0 siblings, 1 reply; 4+ messages in thread
From: Mike Anderson @ 2006-02-24 20:11 UTC (permalink / raw)
  To: James Smart; +Cc: linux-scsi

James Smart <James.Smart@Emulex.Com> wrote:
> This is the midlayer portion of the patch
> 
> This patch ensures that i/o is stopped while an eh handler is being
> processed. It adds a new flag, set by the async reset callers, which
> augments the host-in-reset checks and stops i/o.  The async reset
> callers are already synchronized to hold off until the error thread is
> no longer running.

Why the new flag instead of calling scsi_host_set_state(shost,
SHOST_RECOVERY) or a new state if you want different behavior?

Not using the state model but still doing recovery actions may run into
issues if these happen while someone tries to remove a host.

-andmike
--
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] Block I/O while SG reset operation in progress - midlayer portion
  2006-02-24 20:11 ` Mike Anderson
@ 2006-02-24 22:35   ` James Smart
  2006-02-28  7:09     ` Mike Anderson
  0 siblings, 1 reply; 4+ messages in thread
From: James Smart @ 2006-02-24 22:35 UTC (permalink / raw)
  To: Mike Anderson; +Cc: linux-scsi

Well for a couple of reasons. I didn't view it as an actual recovery
function as it's execution doesn't attempt to change the state of the
shost, starget or sdev. It's not part of an escalation policy, etc.
It's essentially performing the same effect as a reset by some other
initiator in a multi-initiator environment, just with i/o failing faster
than it may normally take us to detect it. As for side effects on the
teardown, I don't see anything different than what exists today.
At worst, it should only add a short delay until the ioflow resumes.

I also looked at what it would take to implement a full state change
or piggy back on SHOST_RECOVERY. It's a significant amount of code,
and the addition of a new state brings it's own set of additional
issues to get everything to play right. In the end, I saw very little
change in result, but with lots of code changes.

My main concern was - what is the interface definition we're telling
the LLDD's for the eh routines ?  Does the LLDD expect to always be
entered in them while in the error thread ? Does the LLDD expect that
an eh routine will never be invoked will another call to that routine
is outstanding ?  Obviously, none of these are true with the existing
implementation.

Thoughts ?

-- james

Mike Anderson wrote:
> James Smart <James.Smart@Emulex.Com> wrote:
>> This is the midlayer portion of the patch
>>
>> This patch ensures that i/o is stopped while an eh handler is being
>> processed. It adds a new flag, set by the async reset callers, which
>> augments the host-in-reset checks and stops i/o.  The async reset
>> callers are already synchronized to hold off until the error thread is
>> no longer running.
> 
> Why the new flag instead of calling scsi_host_set_state(shost,
> SHOST_RECOVERY) or a new state if you want different behavior?
> 
> Not using the state model but still doing recovery actions may run into
> issues if these happen while someone tries to remove a host.
> 
> -andmike
> --
> Michael Anderson
> andmike@us.ibm.com
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] Block I/O while SG reset operation in progress - midlayer portion
  2006-02-24 22:35   ` James Smart
@ 2006-02-28  7:09     ` Mike Anderson
  0 siblings, 0 replies; 4+ messages in thread
From: Mike Anderson @ 2006-02-28  7:09 UTC (permalink / raw)
  To: James Smart; +Cc: linux-scsi

James Smart <James.Smart@Emulex.Com> wrote:
> Well for a couple of reasons. I didn't view it as an actual recovery
> function as it's execution doesn't attempt to change the state of the
> shost, starget or sdev. It's not part of an escalation policy, etc.
> It's essentially performing the same effect as a reset by some other
> initiator in a multi-initiator environment, just with i/o failing faster
> than it may normally take us to detect it. As for side effects on the
> teardown, I don't see anything different than what exists today.
> At worst, it should only add a short delay until the ioflow resumes.

I agree it is unsafe today as we do not restrict the scsi_reset_provider
during scsi_remove_host.

In scsi_remove_host there was code added to detect recovery running during
removal and try to reduce the chance of eh_* routines being called post
scsi_remove_host returning.

> 
> I also looked at what it would take to implement a full state change
> or piggy back on SHOST_RECOVERY. It's a significant amount of code,
> and the addition of a new state brings it's own set of additional
> issues to get everything to play right. In the end, I saw very little
> change in result, but with lots of code changes.
> 

Yes, there is some code overhead, but some of it is needed. In looking at
your patch you add to the scsi_host_in_recovery case, but do not have a
wake_up for event waiters on host_wait.

> My main concern was - what is the interface definition we're telling
> the LLDD's for the eh routines ?  Does the LLDD expect to always be
> entered in them while in the error thread ? Does the LLDD expect that
> an eh routine will never be invoked will another call to that routine
> is outstanding ?  Obviously, none of these are true with the existing
> implementation.

Well the scsi_mid_low_api.txt indicates that the eh_*_reset_handler
routines will be called with "No other commands will be queued on current
host during eh". As you have indicated this is not true today. I did
review a few reset functions: One checked to see if a reset action was
pending and returned an error, the others appeared to not care about
current state of a reset action.

-andmike
--
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2006-02-28  7:10 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-02-24 16:52 [PATCH 1/2] Block I/O while SG reset operation in progress - midlayer portion James Smart
2006-02-24 20:11 ` Mike Anderson
2006-02-24 22:35   ` James Smart
2006-02-28  7:09     ` Mike Anderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).