[PATCH 1/1] scsi: Add EH Start Unit retry

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/1] scsi: Add EH Start Unit retry
@ 2007-03-29 20:25 Brian King
  0 siblings, 0 replies; 7+ messages in thread
From: Brian King @ 2007-03-29 20:25 UTC (permalink / raw)
  To: James.Bottomley; +Cc: linux-scsi, thlin, brking


Currently, the scsi error handler will issue a START_UNIT
command if the drive indicates it needs its motor started
and the allow_restart flag is set in the scsi_device. If,
after the scsi error handler invokes a host adapter reset
due to error recovery, a device is in a unit attention
state AND also needs a START_UNIT, that device will be placed
offline. The disk array devices on an ipr RAID adapter
will do exactly this when in a dual initiator configuration.
This patch adds a single retry to the EH initiated
START_UNIT.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---

 linux-2.6-bjking1/drivers/scsi/scsi_error.c |   14 ++++++++++++--
 1 files changed, 12 insertions(+), 2 deletions(-)

diff -puN drivers/scsi/scsi_error.c~scsi_eh_stu_retry drivers/scsi/scsi_error.c
--- linux-2.6/drivers/scsi/scsi_error.c~scsi_eh_stu_retry	2007-03-14 15:04:17.000000000 -0500
+++ linux-2.6-bjking1/drivers/scsi/scsi_error.c	2007-03-14 15:04:17.000000000 -0500
@@ -923,12 +923,22 @@ static int scsi_eh_try_stu(struct scsi_c
 	static unsigned char stu_command[6] = {START_STOP, 0, 0, 0, 1, 0};
 
 	if (scmd->device->allow_restart) {
-		int rtn;
+		int retry_cnt = 1, rtn;
 
+retry_stu:
 		rtn = scsi_send_eh_cmnd(scmd, stu_command, 6,
 					START_UNIT_TIMEOUT, 0);
-		if (rtn == SUCCESS)
+
+		switch (rtn) {
+		case SUCCESS:
 			return 0;
+		case NEEDS_RETRY:
+			if (retry_cnt--)
+				goto retry_stu;
+			/*FALLTHRU*/
+		default:
+			return 1;
+		}
 	}
 
 	return 1;
_

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] scsi: Add EH Start Unit retry
       [not found] <11751999513070-patch-mail.ibm.com>
@ 2007-03-29 22:02 ` Douglas Gilbert
  2007-04-02 18:57   ` Brian King
  2007-04-01 16:36 ` James Bottomley
  1 sibling, 1 reply; 7+ messages in thread
From: Douglas Gilbert @ 2007-03-29 22:02 UTC (permalink / raw)
  To: Brian King; +Cc: James.Bottomley, linux-scsi, thlin

Brian King wrote:
> Currently, the scsi error handler will issue a START_UNIT
> command if the drive indicates it needs its motor started
> and the allow_restart flag is set in the scsi_device. If,
> after the scsi error handler invokes a host adapter reset
> due to error recovery, a device is in a unit attention
> state AND also needs a START_UNIT, that device will be placed
> offline. The disk array devices on an ipr RAID adapter
> will do exactly this when in a dual initiator configuration.
> This patch adds a single retry to the EH initiated
> START_UNIT.

I have no objection to this patch. Just seems a pity
that SCSI devices go to the trouble of sending unit
attentions while OSes just throw them away.

Perhaps the scsi_device sysfs directory could have entries
like:
  last_ua_asc
  last_ua_ascq
  last_ua_timestamp
where code could place the asc/ascq codes and a timestamp
then continue doing a retry.
Could we get a log entry, hotplug event?

Logical units may queue unit attentions (sam4r10.pdf
section 5.8.7) so it is possible that one retry may
not be enough. With my suggestion above, only the last
one would persist for a reasonable time.

Doug Gilbert

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] scsi: Add EH Start Unit retry
       [not found] <11751999513070-patch-mail.ibm.com>
  2007-03-29 22:02 ` [PATCH 1/1] scsi: Add EH Start Unit retry Douglas Gilbert
@ 2007-04-01 16:36 ` James Bottomley
  2007-04-02 13:38   ` Brian King
  1 sibling, 1 reply; 7+ messages in thread
From: James Bottomley @ 2007-04-01 16:36 UTC (permalink / raw)
  To: Brian King; +Cc: linux-scsi, thlin

On Thu, 2007-03-29 at 15:25 -0500, Brian King wrote:
> -		int rtn;
> +		int retry_cnt = 1, rtn;
>  
> +retry_stu:
>  		rtn = scsi_send_eh_cmnd(scmd, stu_command, 6,
>  					START_UNIT_TIMEOUT, 0);
> -		if (rtn == SUCCESS)
> +
> +		switch (rtn) {
> +		case SUCCESS:
>  			return 0;
> +		case NEEDS_RETRY:
> +			if (retry_cnt--)
> +				goto retry_stu;
> +			/*FALLTHRU*/
> +		default:
> +			return 1;
> +		}

This is pretty much an open coded for loop ... how about just doing a
for loop as attached below?

The two advantages to this are:

     1. it's cleaner and more readable
     2. the compiler knows how to optimize it better

James

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 7a1a1bb..28a266c 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -932,10 +932,12 @@ static int scsi_eh_try_stu(struct scsi_cmnd *scmd)
 	static unsigned char stu_command[6] = {START_STOP, 0, 0, 0, 1, 0};
 
 	if (scmd->device->allow_restart) {
-		int rtn;
+		int i, rtn = NEEDS_RETRY;
+
+		for (i = 0; rtn == NEEDS_RETRY && i < 2; i++)
+			rtn = scsi_send_eh_cmnd(scmd, stu_command, 6,
+						START_UNIT_TIMEOUT, 0);
 
-		rtn = scsi_send_eh_cmnd(scmd, stu_command, 6,
-					START_UNIT_TIMEOUT, 0);
 		if (rtn == SUCCESS)
 			return 0;
 	}



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] scsi: Add EH Start Unit retry
  2007-04-01 16:36 ` James Bottomley
@ 2007-04-02 13:38   ` Brian King
  0 siblings, 0 replies; 7+ messages in thread
From: Brian King @ 2007-04-02 13:38 UTC (permalink / raw)
  To: James Bottomley; +Cc: Brian King, linux-scsi, thlin

James Bottomley wrote:
> On Thu, 2007-03-29 at 15:25 -0500, Brian King wrote:
> This is pretty much an open coded for loop ... how about just doing a
> for loop as attached below?

Fine by me. I was simply coding it up to look like scsi_eh_tur.

Thanks,

Brian

> 
> The two advantages to this are:
> 
>      1. it's cleaner and more readable
>      2. the compiler knows how to optimize it better
> 
> James
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 7a1a1bb..28a266c 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -932,10 +932,12 @@ static int scsi_eh_try_stu(struct scsi_cmnd *scmd)
>  	static unsigned char stu_command[6] = {START_STOP, 0, 0, 0, 1, 0};
> 
>  	if (scmd->device->allow_restart) {
> -		int rtn;
> +		int i, rtn = NEEDS_RETRY;
> +
> +		for (i = 0; rtn == NEEDS_RETRY && i < 2; i++)
> +			rtn = scsi_send_eh_cmnd(scmd, stu_command, 6,
> +						START_UNIT_TIMEOUT, 0);
> 
> -		rtn = scsi_send_eh_cmnd(scmd, stu_command, 6,
> -					START_UNIT_TIMEOUT, 0);
>  		if (rtn == SUCCESS)
>  			return 0;
>  	}
> 
> 


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] scsi: Add EH Start Unit retry
  2007-03-29 22:02 ` [PATCH 1/1] scsi: Add EH Start Unit retry Douglas Gilbert
@ 2007-04-02 18:57   ` Brian King
  2007-04-03  3:09     ` James Bottomley
  0 siblings, 1 reply; 7+ messages in thread
From: Brian King @ 2007-04-02 18:57 UTC (permalink / raw)
  To: dougg; +Cc: James.Bottomley, linux-scsi, thlin

Douglas Gilbert wrote:
> Brian King wrote:
>> Currently, the scsi error handler will issue a START_UNIT
>> command if the drive indicates it needs its motor started
>> and the allow_restart flag is set in the scsi_device. If,
>> after the scsi error handler invokes a host adapter reset
>> due to error recovery, a device is in a unit attention
>> state AND also needs a START_UNIT, that device will be placed
>> offline. The disk array devices on an ipr RAID adapter
>> will do exactly this when in a dual initiator configuration.
>> This patch adds a single retry to the EH initiated
>> START_UNIT.
> 
> I have no objection to this patch. Just seems a pity
> that SCSI devices go to the trouble of sending unit
> attentions while OSes just throw them away.

I agree. The reason the ipr adapter firmware added this UA
in this configuration was to support SCSI 1 reservations and
communicate to the host that any reservation previously
held to the disk array is now lost since the adapter was reset.

> Perhaps the scsi_device sysfs directory could have entries
> like:
>   last_ua_asc
>   last_ua_ascq
>   last_ua_timestamp
> where code could place the asc/ascq codes and a timestamp
> then continue doing a retry.
> Could we get a log entry, hotplug event?

If we did have a way to communicate UA's to userspace like this
it seems like it would allow usage of SCSI 1 reservations
in this config and make it easier for an out of band tool to
manage these reservations. I wonder if it would be cleaner if
UAs could simply be sent up as netlink events / uevents, so they
could contain all the information needed in one packet, rather
than having to read sysfs attributes to figure out what happened.

> Logical units may queue unit attentions (sam4r10.pdf
> section 5.8.7) so it is possible that one retry may
> not be enough. With my suggestion above, only the last
> one would persist for a reasonable time.

Yep. I've already ran into that with dual ported SAS devices.
While one retry is sufficient for the ipr disk array devices I am
trying to fix this for, I have no objection to increasing it.
Maybe its just a case of increasing it later if it ends up
being an issue.

Brian

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] scsi: Add EH Start Unit retry
  2007-04-02 18:57   ` Brian King
@ 2007-04-03  3:09     ` James Bottomley
  2007-04-03 13:20       ` Brian King
  0 siblings, 1 reply; 7+ messages in thread
From: James Bottomley @ 2007-04-03  3:09 UTC (permalink / raw)
  To: brking; +Cc: dougg, linux-scsi, thlin

On Mon, 2007-04-02 at 13:57 -0500, Brian King wrote:
> I agree. The reason the ipr adapter firmware added this UA
> in this configuration was to support SCSI 1 reservations and
> communicate to the host that any reservation previously
> held to the disk array is now lost since the adapter was reset.

For that particular case, scsi_report_bus_reset() was the original
design concept ... it's asynchronous when the problem occurs ... if you
have to wait until UA, you don't necessarily find there's a problem
until much later.

James




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] scsi: Add EH Start Unit retry
  2007-04-03  3:09     ` James Bottomley
@ 2007-04-03 13:20       ` Brian King
  0 siblings, 0 replies; 7+ messages in thread
From: Brian King @ 2007-04-03 13:20 UTC (permalink / raw)
  To: James Bottomley; +Cc: dougg, linux-scsi, thlin

James Bottomley wrote:
> On Mon, 2007-04-02 at 13:57 -0500, Brian King wrote:
>> I agree. The reason the ipr adapter firmware added this UA
>> in this configuration was to support SCSI 1 reservations and
>> communicate to the host that any reservation previously
>> held to the disk array is now lost since the adapter was reset.
> 
> For that particular case, scsi_report_bus_reset() was the original
> design concept ... it's asynchronous when the problem occurs ... if you
> have to wait until UA, you don't necessarily find there's a problem
> until much later.

Right. One of the ipr patches in my queue is to call scsi_report_bus_reset
for the logical disk array bus after the adapter is reset, so that scsi core
knows about this reset and it can handle the UA, although this patch is
obviously still needed even with that change... 

Brian

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2007-04-03 13:21 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <11751999513070-patch-mail.ibm.com>
2007-03-29 22:02 ` [PATCH 1/1] scsi: Add EH Start Unit retry Douglas Gilbert
2007-04-02 18:57   ` Brian King
2007-04-03  3:09     ` James Bottomley
2007-04-03 13:20       ` Brian King
2007-04-01 16:36 ` James Bottomley
2007-04-02 13:38   ` Brian King
2007-03-29 20:25 Brian King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).