public inbox for linux-scsi@vger.kernel.org
* [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
@ 2013-04-15 18:39 wenxiong
  2013-04-15 18:39 ` [PATCH 1/1] " wenxiong
  2013-04-15 20:45 ` [PATCH 0/1] " James Bottomley
  0 siblings, 2 replies; 7+ messages in thread
From: wenxiong @ 2013-04-15 18:39 UTC (permalink / raw)
  To: James.Bottomley; +Cc: linux-scsi, brking

In scsi_send_eh_cmnd(), this fix will check the return code of queuecommand
when sending commands and retry for a bit if the driver returns a
busy response.

Thanks,
Wendy

-- 


* [PATCH 1/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
  2013-04-15 18:39 [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd wenxiong
@ 2013-04-15 18:39 ` wenxiong
  2013-04-15 20:45 ` [PATCH 0/1] " James Bottomley
  1 sibling, 0 replies; 7+ messages in thread
From: wenxiong @ 2013-04-15 18:39 UTC (permalink / raw)
  To: James.Bottomley; +Cc: linux-scsi, brking, Wen Xiong

[-- Attachment #1: check_return_of_queuecommand --]
[-- Type: text/plain, Size: 1468 bytes --]

Fix scsi_send_eh_cmnd to check the return code of queuecommand when
sending commands and, if the LLDD returns a busy response, retry once per
second, allowing up to 30 busy responses.
This fixes an issue seen with the ipr driver where an ipr-initiated reset
immediately following an eh_host_reset caused EH-initiated commands to fail,
resulting in devices being taken offline.

Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>

---
 drivers/scsi/scsi_error.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Index: b/drivers/scsi/scsi_error.c
===================================================================
--- a/drivers/scsi/scsi_error.c	2013-04-10 12:55:57.000000000 -0500
+++ b/drivers/scsi/scsi_error.c	2013-04-10 13:04:12.467858487 -0500
@@ -793,6 +793,7 @@ static int scsi_send_eh_cmnd(struct scsi
 	DECLARE_COMPLETION_ONSTACK(done);
 	unsigned long timeleft;
 	struct scsi_eh_save ses;
+	int attempts = 30;
 	int rtn;
 
 	scsi_eh_prep_cmnd(scmd, &ses, cmnd, cmnd_size, sense_bytes);
@@ -800,7 +801,14 @@ static int scsi_send_eh_cmnd(struct scsi
 
 	scsi_log_send(scmd);
 	scmd->scsi_done = scsi_eh_done;
-	shost->hostt->queuecommand(shost, scmd);
+
+	while ((rtn = shost->hostt->queuecommand(shost, scmd)) && attempts) {
+		if (rtn == SCSI_MLQUEUE_DEVICE_BUSY ||
+		    rtn == SCSI_MLQUEUE_TARGET_BUSY ||
+		    rtn == SCSI_MLQUEUE_HOST_BUSY)
+			attempts--;
+		ssleep(1);
+	}
 
 	timeleft = wait_for_completion_timeout(&done, timeout);
 

-- 
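
The attempt-bounded retry in the patch above can be modelled in user space.
This is a minimal sketch, not kernel code: fake_host, fake_queuecommand(),
send_with_attempts() and the Q_* codes are hypothetical stand-ins for
shost->hostt->queuecommand() and the SCSI_MLQUEUE_*_BUSY values, and the
one-second sleep is left out. Unlike the posted loop, the sketch returns
immediately on any non-busy status; that is an assumption about the desired
behaviour, not something the patch does.

```c
#include <assert.h>

/* Hypothetical status codes standing in for 0 and SCSI_MLQUEUE_*_BUSY. */
enum q_status { Q_OK = 0, Q_BUSY = 1, Q_ERROR = 2 };

/* Hypothetical host model: rejects the first busy_calls submissions. */
struct fake_host {
	int busy_calls;   /* submissions that will still be rejected as busy */
	int submissions;  /* total submissions seen so far */
};

static enum q_status fake_queuecommand(struct fake_host *h)
{
	h->submissions++;
	if (h->busy_calls > 0) {
		h->busy_calls--;
		return Q_BUSY;
	}
	return Q_OK;
}

/*
 * Submit until accepted, allowing at most max_retries resubmissions after
 * a busy response. Any non-busy status ends the loop at once.
 * Returns the last status reported by the host.
 */
static enum q_status send_with_attempts(struct fake_host *h, int max_retries)
{
	enum q_status rtn;

	assert(max_retries >= 0);
	for (;;) {
		rtn = fake_queuecommand(h);
		if (rtn != Q_BUSY || max_retries == 0)
			return rtn;
		max_retries--;
		/* The kernel patch sleeps here with ssleep(1). */
	}
}
```

With a host that is busy for three submissions and a limit of 30 retries, the
fourth submission is accepted. With a host that stays busy, the function gives
up after the initial submission plus 30 retries.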


* Re: [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
  2013-04-15 18:39 [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd wenxiong
  2013-04-15 18:39 ` [PATCH 1/1] " wenxiong
@ 2013-04-15 20:45 ` James Bottomley
  2013-04-15 21:55   ` Brian King
  1 sibling, 1 reply; 7+ messages in thread
From: James Bottomley @ 2013-04-15 20:45 UTC (permalink / raw)
  To: wenxiong; +Cc: linux-scsi, brking

On Mon, 2013-04-15 at 13:39 -0500, wenxiong@linux.vnet.ibm.com wrote:
> In scsi_send_eh_cmnd(), this fix will check the return code of queuecommand
> when sending commands and retry for a bit if the driver returns a
> busy response.

This is already handled by the timeout, I think.  If a driver
continuously returns MLQUEUE BUSY, then we'll fail the request after the
timeout on the command expires.

James




* Re: [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
  2013-04-15 20:45 ` [PATCH 0/1] " James Bottomley
@ 2013-04-15 21:55   ` Brian King
  2013-04-15 22:33     ` James Bottomley
  0 siblings, 1 reply; 7+ messages in thread
From: Brian King @ 2013-04-15 21:55 UTC (permalink / raw)
  To: James Bottomley; +Cc: wenxiong, linux-scsi

On 04/15/2013 03:45 PM, James Bottomley wrote:
> On Mon, 2013-04-15 at 13:39 -0500, wenxiong@linux.vnet.ibm.com wrote:
>> In scsi_send_eh_cmnd(), this fix will check the return code of queuecommand
>> when sending commands and retry for a bit if the driver returns a
>> busy response.
> 
> This is already handled by the timeout, I think.  If a driver
> continuously returns MLQUEUE BUSY, then we'll fail the request after the
> timeout on the command expires.

If we get a timeout in scsi_send_eh_cmnd we call scsi_abort_eh_cmnd. If the
abort works, we return FAILED out of scsi_send_eh_cmnd, which results in
no retry being performed, since scsi_eh_tur only retries once and
only if NEEDS_RETRY is returned. Or am I missing something?

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center




* Re: [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
  2013-04-15 21:55   ` Brian King
@ 2013-04-15 22:33     ` James Bottomley
  2013-04-16  0:09       ` wenxiong
  2013-04-16 15:12       ` Brian King
  0 siblings, 2 replies; 7+ messages in thread
From: James Bottomley @ 2013-04-15 22:33 UTC (permalink / raw)
  To: Brian King; +Cc: wenxiong, linux-scsi

On Mon, 2013-04-15 at 16:55 -0500, Brian King wrote:
> On 04/15/2013 03:45 PM, James Bottomley wrote:
> > On Mon, 2013-04-15 at 13:39 -0500, wenxiong@linux.vnet.ibm.com wrote:
> >> In scsi_send_eh_cmnd(), this fix will check the return code of queuecommand
> >> when sending commands and retry for a bit if the driver returns a
> >> busy response.
> > 
> > This is already handled by the timeout, I think.  If a driver
> > continuously returns MLQUEUE BUSY, then we'll fail the request after the
> > timeout on the command expires.
> 
> If we get a timeout in scsi_send_eh_cmnd we call scsi_abort_eh_cmnd. If the
> abort works, we return FAILED out of scsi_send_eh_cmnd, which results in
> no retry being performed, since scsi_eh_tur only retries once and
> only if NEEDS_RETRY is returned. Or am I missing something?

Sorry, I'm not being clear.  It comes with being at a conference.  What
I mean is that if you do this, the criterion for success or failure
should be the amount of time left not the number of retries.  This is
what the non-eh submission path also does for retries of events that
don't count against the retry limit ... so something like this patch
(uncompiled and untested #include stddisclaimer.h)

James

----
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c1b05a8..93ab4f4 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -793,6 +793,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 	DECLARE_COMPLETION_ONSTACK(done);
 	unsigned long timeleft;
 	struct scsi_eh_save ses;
+	const int stall_for = max(HZ/10, 1); /* 100 ms, in jiffies */
 	int rtn;
 
 	scsi_eh_prep_cmnd(scmd, &ses, cmnd, cmnd_size, sense_bytes);
@@ -802,6 +803,8 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 	scmd->scsi_done = scsi_eh_done;
 	shost->hostt->queuecommand(shost, scmd);
 
+  retry:
+
 	timeleft = wait_for_completion_timeout(&done, timeout);
 
 	shost->eh_action = NULL;
@@ -831,8 +834,12 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 		case TARGET_ERROR:
 			break;
 		case ADD_TO_MLQUEUE:
-			rtn = NEEDS_RETRY;
-			break;
+			if (timeleft > stall_for) {
+				timeout = timeleft - stall_for;
+				msleep(jiffies_to_msecs(stall_for));
+				goto retry;
+			}
+			/* fall through */
 		default:
 			rtn = FAILED;
 			break;
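
The idea of bounding retries by the time left rather than by a retry count can
be modelled in user space. This is a minimal sketch under stated assumptions,
not the kernel change: fake_dev, fake_submit() and send_within_budget() are
hypothetical names, time is a simulated millisecond counter, and advancing
that counter stands in for msleep(). The sketch keeps the budget and the stall
interval in the same unit throughout.

```c
#include <assert.h>

/* Hypothetical device model: busy until ready_at_ms of simulated time. */
struct fake_dev {
	unsigned long now_ms;       /* simulated clock */
	unsigned long ready_at_ms;  /* first instant a command is accepted */
	int submissions;            /* total submissions seen so far */
};

/* Returns 0 when the command is accepted, nonzero when the device is busy. */
static int fake_submit(struct fake_dev *d)
{
	d->submissions++;
	return d->now_ms >= d->ready_at_ms ? 0 : 1;
}

/*
 * Resubmit while the remaining budget exceeds one stall interval.
 * Each stall is charged against *budget_ms, so the total time spent can
 * never exceed the original budget.
 * Returns 0 if the command was accepted, -1 if the budget ran out.
 */
static int send_within_budget(struct fake_dev *d, unsigned long *budget_ms,
			      unsigned long stall_ms)
{
	assert(stall_ms > 0);
	while (fake_submit(d) != 0) {
		if (*budget_ms <= stall_ms)
			return -1;           /* not enough time for another stall */
		*budget_ms -= stall_ms;      /* charge the stall to the budget */
		d->now_ms += stall_ms;       /* stands in for msleep(stall_ms) */
	}
	return 0;
}
```

A device that becomes ready at 250 ms, polled every 100 ms with a 10 s budget,
is accepted on the fourth submission with 9700 ms left. A device that never
becomes ready exhausts a 350 ms budget after four submissions.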





* Re: [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
  2013-04-15 22:33     ` James Bottomley
@ 2013-04-16  0:09       ` wenxiong
  2013-04-16 15:12       ` Brian King
  1 sibling, 0 replies; 7+ messages in thread
From: wenxiong @ 2013-04-16  0:09 UTC (permalink / raw)
  To: James Bottomley; +Cc: Brian King, linux-scsi


Quoting James Bottomley <James.Bottomley@hansenpartnership.com>:

> On Mon, 2013-04-15 at 16:55 -0500, Brian King wrote:
>> On 04/15/2013 03:45 PM, James Bottomley wrote:
>> > On Mon, 2013-04-15 at 13:39 -0500, wenxiong@linux.vnet.ibm.com wrote:
>> >> In scsi_send_eh_cmnd(), this fix will check the return code of queuecommand
>> >> when sending commands and retry for a bit if the driver returns a
>> >> busy response.
>> >
>> > This is already handled by the timeout, I think.  If a driver
>> > continuously returns MLQUEUE BUSY, then we'll fail the request after the
>> > timeout on the command expires.
>>
>> If we get a timeout in scsi_send_eh_cmnd we call scsi_abort_eh_cmnd. If the
>> abort works, we return FAILED out of scsi_send_eh_cmnd, which results in
>> no retry being performed, since scsi_eh_tur only retries once and
>> only if NEEDS_RETRY is returned. Or am I missing something?
>
> Sorry, I'm not being clear.  It comes with being at a conference.  What
> I mean is that if you do this, the criterion for success or failure
> should be the amount of time left not the number of retries.  This is
> what the non-eh submission path also does for retries of events that
> don't count against the retry limit ... so something like this patch
> (uncompiled and untested #include stddisclaimer.h)
>
> James

Hi James,

The failing case for us is this: no matter what timeout value we pass to
wait_for_completion_timeout(), it always returns with timeleft = 0.
For example, if I set timeout = 50 secs, wait_for_completion_timeout()
still returns with timeleft = 0 (even though the adapter is already back in
good shape after 20 secs). We never get a chance to enter the if (timeleft) leg.

My understanding is: if shost->hostt->queuecommand() fails with an MLQUEUE busy
response the first time, wait_for_completion_timeout() always wakes up
because the timeout expired.

Here is the log with SCSI logging enabled:

Apr 15 18:44:35 ltcsatiocp5 kernel: scsi_send_eh_cmnd: scmd: c0000000f88bc980, timeleft: 0

I applied your patch. Because timeleft is always zero, we never get a
chance to enter the if (timeleft) { leg.

Thanks,
Wendy
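
The behaviour described in this message can be modelled in user space: when the
driver rejects a command as busy, nothing ever completes it, so a timed wait
can only expire. This is a minimal sketch, not kernel code. fake_completion,
fake_queue(), fake_wait() and the two send_* helpers are hypothetical names,
and fake_wait() simplifies wait_for_completion_timeout() by returning the full
timeout when the command has completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a struct completion. */
struct fake_completion {
	bool done;
};

/*
 * Hypothetical driver entry point. A busy host rejects the command, so
 * nothing will ever mark the completion done.
 */
static int fake_queue(struct fake_completion *c, bool host_busy)
{
	if (host_busy)
		return 1;       /* rejected: no completion will follow */
	c->done = true;         /* accepted: completes at once in this model */
	return 0;
}

/* Timed wait model: time left if completed, 0 if the timeout expired. */
static unsigned long fake_wait(const struct fake_completion *c,
			       unsigned long timeout)
{
	return c->done ? timeout : 0;
}

/* Return code ignored: a rejected command is seen only as timeleft == 0. */
static unsigned long send_ignoring_rc(bool host_busy, unsigned long timeout)
{
	struct fake_completion c = { false };

	(void)fake_queue(&c, host_busy);
	return fake_wait(&c, timeout);
}

/*
 * Return code checked: a rejection is reported at once and the timeout
 * budget is left untouched for a later resubmission.
 * Returns 0 if accepted, 1 if rejected as busy.
 */
static int send_checking_rc(bool host_busy, unsigned long timeout,
			    unsigned long *timeleft)
{
	struct fake_completion c = { false };

	assert(timeleft != NULL);
	if (fake_queue(&c, host_busy) != 0) {
		*timeleft = timeout;
		return 1;
	}
	*timeleft = fake_wait(&c, timeout);
	return 0;
}
```

In this model send_ignoring_rc() returns 0 whenever the host is busy, whatever
timeout is passed, which matches the timeleft: 0 log line above. The variant
that checks the return code can tell a rejection from a real timeout.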



* Re: [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
  2013-04-15 22:33     ` James Bottomley
  2013-04-16  0:09       ` wenxiong
@ 2013-04-16 15:12       ` Brian King
  1 sibling, 0 replies; 7+ messages in thread
From: Brian King @ 2013-04-16 15:12 UTC (permalink / raw)
  To: James Bottomley; +Cc: wenxiong, linux-scsi

On 04/15/2013 05:33 PM, James Bottomley wrote:
> On Mon, 2013-04-15 at 16:55 -0500, Brian King wrote:
>> On 04/15/2013 03:45 PM, James Bottomley wrote:
>>> On Mon, 2013-04-15 at 13:39 -0500, wenxiong@linux.vnet.ibm.com wrote:
>>>> In scsi_send_eh_cmnd(), this fix will check the return code of queuecommand
>>>> when sending commands and retry for a bit if the driver returns a
>>>> busy response.
>>>
>>> This is already handled by the timeout, I think.  If a driver
>>> continuously returns MLQUEUE BUSY, then we'll fail the request after the
>>> timeout on the command expires.
>>
>> If we get a timeout in scsi_send_eh_cmnd we call scsi_abort_eh_cmnd. If the
>> abort works, we return FAILED out of scsi_send_eh_cmnd, which results in
>> no retry being performed, since scsi_eh_tur only retries once and
>> only if NEEDS_RETRY is returned. Or am I missing something?
> 
> Sorry, I'm not being clear.  It comes with being at a conference.  What
> I mean is that if you do this, the criterion for success or failure
> should be the amount of time left not the number of retries.  This is
> what the non-eh submission path also does for retries of events that
> don't count against the retry limit ... so something like this patch
> (uncompiled and untested #include stddisclaimer.h)

James,

Wendy and I discussed this a bit more and I think we understand your concern.
Wendy is working on an updated patch.

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



