* [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
From: wenxiong @ 2013-04-15 18:39 UTC
To: James.Bottomley; +Cc: linux-scsi, brking
In scsi_send_eh_cmnd(), this fix checks the return code of queuecommand
when sending commands and retries for a bit if the driver returns a
busy response.
Thanks,
Wendy
--
* [PATCH 1/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
From: wenxiong @ 2013-04-15 18:39 UTC
To: James.Bottomley; +Cc: linux-scsi, brking, Wen Xiong
Fix scsi_send_eh_cmnd to check the return code of queuecommand when
sending commands and retry for a bit if the LLDD returns a busy response.
This fixes an issue seen with the ipr driver, where an ipr-initiated reset
immediately following an eh_host_reset caused EH-initiated commands to fail,
resulting in devices being taken offline.
Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
---
drivers/scsi/scsi_error.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
Index: b/drivers/scsi/scsi_error.c
===================================================================
--- a/drivers/scsi/scsi_error.c 2013-04-10 12:55:57.000000000 -0500
+++ b/drivers/scsi/scsi_error.c 2013-04-10 13:04:12.467858487 -0500
@@ -793,6 +793,7 @@ static int scsi_send_eh_cmnd(struct scsi
 	DECLARE_COMPLETION_ONSTACK(done);
 	unsigned long timeleft;
 	struct scsi_eh_save ses;
+	int attempts = 30;
 	int rtn;
 	scsi_eh_prep_cmnd(scmd, &ses, cmnd, cmnd_size, sense_bytes);
@@ -800,7 +801,14 @@ static int scsi_send_eh_cmnd(struct scsi
 	scsi_log_send(scmd);
 	scmd->scsi_done = scsi_eh_done;
-	shost->hostt->queuecommand(shost, scmd);
+
+	while ((rtn = shost->hostt->queuecommand(shost, scmd)) && attempts) {
+		if (rtn == SCSI_MLQUEUE_DEVICE_BUSY ||
+		    rtn == SCSI_MLQUEUE_TARGET_BUSY ||
+		    rtn == SCSI_MLQUEUE_HOST_BUSY)
+			attempts--;
+		ssleep(1);
+	}
 	timeleft = wait_for_completion_timeout(&done, timeout);
--
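For illustration, the retry-on-busy idea in the patch above can be sketched
in user-space C. This is a minimal sketch under stated assumptions, not the
kernel code: submit_fn, submit_with_retries(), busy_n_times() and the
MLQUEUE_* values are hypothetical stand-ins, and the delay callback stands in
for ssleep(1). As posted, the loop in the patch also sleeps and resubmits on a
non-busy, non-zero return without counting it against attempts; the sketch
stops on such a return instead.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel's SCSI_MLQUEUE_* codes; the values are made up. */
enum { MLQUEUE_HOST_BUSY = 1, MLQUEUE_DEVICE_BUSY, MLQUEUE_TARGET_BUSY };

/* Hypothetical submit callback: 0 means accepted, non-zero means refused. */
typedef int (*submit_fn)(void *ctx);

static int is_busy(int rtn)
{
	return rtn == MLQUEUE_HOST_BUSY ||
	       rtn == MLQUEUE_DEVICE_BUSY ||
	       rtn == MLQUEUE_TARGET_BUSY;
}

/*
 * Resubmit while the driver reports busy, at most max_retries extra times.
 * Any other non-zero return is handed straight back instead of being retried.
 * delay() stands in for ssleep(1) and may be NULL.
 */
static int submit_with_retries(submit_fn submit, void *ctx, int max_retries,
			       void (*delay)(void))
{
	int rtn;

	assert(submit != NULL && max_retries >= 0);
	while ((rtn = submit(ctx)) != 0) {
		if (!is_busy(rtn) || max_retries-- == 0)
			break;
		if (delay)
			delay();
	}
	return rtn;
}

/* Demo helper: reports busy for the first *ctx calls, then accepts. */
static int busy_n_times(void *ctx)
{
	int *left = ctx;

	if (*left > 0) {
		(*left)--;
		return MLQUEUE_HOST_BUSY;
	}
	return 0;
}
```

With a driver that is busy three times and a limit of 30 retries, the fourth
submission is accepted; with a limit of 2, the caller gets the busy code back
after the third attempt.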
* Re: [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
From: James Bottomley @ 2013-04-15 20:45 UTC
To: wenxiong; +Cc: linux-scsi, brking
On Mon, 2013-04-15 at 13:39 -0500, wenxiong@linux.vnet.ibm.com wrote:
> In scsi_send_eh_cmnd(), this fix will check the return code of queuecommand
> when sending commands and retry for a bit if the driver returns a
> busy response.
This is already handled by the timeout, I think. If a driver
continuously returns MLQUEUE BUSY, then we'll fail the request after the
timeout on the command expires.
James
* Re: [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
From: Brian King @ 2013-04-15 21:55 UTC
To: James Bottomley; +Cc: wenxiong, linux-scsi
On 04/15/2013 03:45 PM, James Bottomley wrote:
> On Mon, 2013-04-15 at 13:39 -0500, wenxiong@linux.vnet.ibm.com wrote:
>> In scsi_send_eh_cmnd(), this fix will check the return code of queuecommand
>> when sending commands and retry for a bit if the driver returns a
>> busy response.
>
> This is already handled by the timeout, I think. If a driver
> continuously returns MLQUEUE BUSY, then we'll fail the request after the
> timeout on the command expires.
If we get a timeout in scsi_send_eh_cmnd we call scsi_abort_eh_cmnd. If the
abort works, we return FAILED out of scsi_send_eh_cmnd, which results in
no retry being performed, since scsi_eh_tur only retries once and
only if NEEDS_RETRY is returned. Or am I missing something?
Thanks,
Brian
--
Brian King
Power Linux I/O
IBM Linux Technology Center
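The point about the timeout path can be modelled in a few lines of user-space
C. This is a minimal sketch, assuming the return-value mapping visible in the
diff context elsewhere in this thread; the enum names and both functions are
hypothetical stand-ins rather than kernel identifiers, and passing SUCCESS,
NEEDS_RETRY and FAILED through unchanged is an assumption.

```c
#include <assert.h>

/* Local stand-ins for the kernel's disposition codes; not the real values. */
enum eh_disp {
	EH_SUCCESS, EH_NEEDS_RETRY, EH_FAILED, EH_TARGET_ERROR,
	EH_ADD_TO_MLQUEUE, EH_OTHER
};

/*
 * Model of how the result is chosen after the wait: timeleft == 0 means the
 * wait timed out, so the command is aborted and FAILED is returned no matter
 * why it never completed.
 */
static enum eh_disp eh_cmnd_result(unsigned long timeleft,
				   enum eh_disp completed)
{
	if (timeleft == 0)
		return EH_FAILED;	/* timeout path: abort, then FAILED */

	switch (completed) {
	case EH_SUCCESS:
	case EH_NEEDS_RETRY:
	case EH_FAILED:
	case EH_TARGET_ERROR:
		return completed;	/* assumed pass-through */
	case EH_ADD_TO_MLQUEUE:
		return EH_NEEDS_RETRY;
	default:
		return EH_FAILED;
	}
}

/* A caller such as scsi_eh_tur() retries only when it sees NEEDS_RETRY. */
static int caller_would_retry(enum eh_disp rtn)
{
	return rtn == EH_NEEDS_RETRY;
}
```

In this model a timed-out command can never produce NEEDS_RETRY, which is why
relying on the command timeout alone leaves the caller with nothing to retry.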
* Re: [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
From: James Bottomley @ 2013-04-15 22:33 UTC
To: Brian King; +Cc: wenxiong, linux-scsi
On Mon, 2013-04-15 at 16:55 -0500, Brian King wrote:
> On 04/15/2013 03:45 PM, James Bottomley wrote:
> > On Mon, 2013-04-15 at 13:39 -0500, wenxiong@linux.vnet.ibm.com wrote:
> >> In scsi_send_eh_cmnd(), this fix will check the return code of queuecommand
> >> when sending commands and retry for a bit if the driver returns a
> >> busy response.
> >
> > This is already handled by the timeout, I think. If a driver
> > continuously returns MLQUEUE BUSY, then we'll fail the request after the
> > timeout on the command expires.
>
> If we get a timeout in scsi_send_eh_cmnd we call scsi_abort_eh_cmnd. If the
> abort works, we return FAILED out of scsi_send_eh_cmnd, which results in
> no retry being performed, since scsi_eh_tur only retries once and
> only if NEEDS_RETRY is returned. Or am I missing something?
Sorry, I'm not being clear. It comes with being at a conference. What
I mean is that if you do this, the criterion for success or failure
should be the amount of time left not the number of retries. This is
what the non-eh submission path also does for retries of events that
don't count against the retry limit ... so something like this patch
(uncompiled and untested #include stddisclaimer.h)
James
----
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c1b05a8..93ab4f4 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -793,6 +793,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 	DECLARE_COMPLETION_ONSTACK(done);
 	unsigned long timeleft;
 	struct scsi_eh_save ses;
+	const int stall_for = max(HZ/10, 1); /* 100 ms */
 	int rtn;
 	scsi_eh_prep_cmnd(scmd, &ses, cmnd, cmnd_size, sense_bytes);
@@ -802,6 +803,8 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 	scmd->scsi_done = scsi_eh_done;
 	shost->hostt->queuecommand(shost, scmd);
+ retry:
+
 	timeleft = wait_for_completion_timeout(&done, timeout);
 	shost->eh_action = NULL;
@@ -831,8 +834,12 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 		case TARGET_ERROR:
 			break;
 		case ADD_TO_MLQUEUE:
-			rtn = NEEDS_RETRY;
-			break;
+			if (timeleft > stall_for) {
+				timeout = timeleft - stall_for;
+				msleep(jiffies_to_msecs(stall_for));
+				goto retry;
+			}
+			/* fall through */
 		default:
 			rtn = FAILED;
 			break;
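
The criterion described above, time left rather than a retry count, can be
sketched in user-space C as follows. This is a minimal sketch under stated
assumptions: submit_fn, submit_within_budget() and busy_n_times() are
hypothetical names, time is counted in abstract ticks, and the point where a
real caller would sleep is only marked by a comment.

```c
#include <assert.h>

/* Hypothetical submit callback: 0 means accepted, non-zero means busy. */
typedef int (*submit_fn)(void *ctx);

/*
 * Keep resubmitting while the remaining budget is larger than one stall.
 * Every stall is charged against *budget, so the total time spent retrying
 * can never exceed the original timeout.
 * Returns 0 once the command is accepted, -1 if the budget ran out.
 */
static int submit_within_budget(submit_fn submit, void *ctx,
				unsigned long *budget, unsigned long stall)
{
	assert(stall > 0);
	while (submit(ctx) != 0) {
		if (*budget <= stall)
			return -1;	/* no room for another stall */
		*budget -= stall;	/* a real caller would sleep here */
	}
	return 0;
}

/* Demo helper: reports busy for the first *ctx calls, then accepts. */
static int busy_n_times(void *ctx)
{
	int *left = ctx;

	if (*left > 0) {
		(*left)--;
		return 1;
	}
	return 0;
}
```

The number of attempts then follows from the timeout and the stall length
instead of being a separate constant.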
* Re: [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
From: wenxiong @ 2013-04-16 0:09 UTC
To: James Bottomley; +Cc: Brian King, linux-scsi
Quoting James Bottomley <James.Bottomley@hansenpartnership.com>:
> On Mon, 2013-04-15 at 16:55 -0500, Brian King wrote:
>> On 04/15/2013 03:45 PM, James Bottomley wrote:
>> > On Mon, 2013-04-15 at 13:39 -0500, wenxiong@linux.vnet.ibm.com wrote:
>> >> In scsi_send_eh_cmnd(), this fix will check the return code of
>> >> queuecommand
>> >> when sending commands and retry for a bit if the driver returns a
>> >> busy response.
>> >
>> > This is already handled by the timeout, I think. If a driver
>> > continuously returns MLQUEUE BUSY, then we'll fail the request after the
>> > timeout on the command expires.
>>
>> If we get a timeout in scsi_send_eh_cmnd we call scsi_abort_eh_cmnd. If the
>> abort works, we return FAILED out of scsi_send_eh_cmnd, which results in
>> no retry being performed, since scsi_eh_tur only retries once and
>> only if NEEDS_RETRY is returned. Or am I missing something?
>
> Sorry, I'm not being clear. It comes with being at a conference. What
> I mean is that if you do this, the criterion for success or failure
> should be the amount of time left not the number of retries. This is
> what the non-eh submission path also does for retries of events that
> don't count against the retry limit ... so something like this patch
> (uncompiled and untested #include stddisclaimer.h)
>
> James
Hi James,
The failing case for us is this: no matter what timeout value we pass to
wait_for_completion_timeout(), it always returns with timeleft = 0.
For example, if I set timeout = 50 secs, wait_for_completion_timeout()
still returns with timeleft = 0 (even though the adapter is back in good
shape within 20 secs). We never get a chance to enter the if (timeleft) leg.
My understanding is: if shost->hostt->queuecommand() fails with an MLQUEUE busy
response the first time, wait_for_completion_timeout() always wakes
up because the timeout expired.
Here is the log with SCSI logging enabled:
Apr 15 18:44:35 ltcsatiocp5 kernel: scsi_send_eh_cmnd: scmd: c0000000f88bc980, timeleft: 0
I applied your patch. Because timeleft is always zero, we never get a
chance to enter the if (timeleft) { leg.
Thanks,
Wendy
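
The behaviour reported above can be reproduced with a toy model in user-space
C. This is a minimal sketch, not the kernel code: fake_completion,
fake_queuecommand(), fake_wait() and send_ignoring_return() are hypothetical
names, and the model assumes that a command refused as busy is never
completed by the driver and that timeout is greater than zero.

```c
#include <assert.h>
#include <stdbool.h>

struct fake_completion { bool done; };

/* Hypothetical driver: when busy it refuses the command and never completes it. */
static int fake_queuecommand(bool busy, struct fake_completion *c)
{
	if (busy)
		return 1;	/* refused: the done callback will never run */
	c->done = true;		/* accepted: pretend it completes at once */
	return 0;
}

/* Returns the ticks left if the completion fired, 0 if the wait expired. */
static unsigned long fake_wait(const struct fake_completion *c,
			       unsigned long timeout)
{
	return c->done ? timeout : 0;
}

/* Mirrors the flow without a return-code check on the submission. */
static unsigned long send_ignoring_return(bool busy, unsigned long timeout)
{
	struct fake_completion c = { .done = false };

	(void)fake_queuecommand(busy, &c);
	return fake_wait(&c, timeout);
}
```

In this model a larger timeout does not help: as long as the one submission is
refused, the wait expires and the result is 0 whatever the timeout was.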
* Re: [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
From: Brian King @ 2013-04-16 15:12 UTC
To: James Bottomley; +Cc: wenxiong, linux-scsi
On 04/15/2013 05:33 PM, James Bottomley wrote:
> On Mon, 2013-04-15 at 16:55 -0500, Brian King wrote:
>> On 04/15/2013 03:45 PM, James Bottomley wrote:
>>> On Mon, 2013-04-15 at 13:39 -0500, wenxiong@linux.vnet.ibm.com wrote:
>>>> In scsi_send_eh_cmnd(), this fix will check the return code of queuecommand
>>>> when sending commands and retry for a bit if the driver returns a
>>>> busy response.
>>>
>>> This is already handled by the timeout, I think. If a driver
>>> continuously returns MLQUEUE BUSY, then we'll fail the request after the
>>> timeout on the command expires.
>>
>> If we get a timeout in scsi_send_eh_cmnd we call scsi_abort_eh_cmnd. If the
>> abort works, we return FAILED out of scsi_send_eh_cmnd, which results in
>> no retry being performed, since scsi_eh_tur only retries once and
>> only if NEEDS_RETRY is returned. Or am I missing something?
>
> Sorry, I'm not being clear. It comes with being at a conference. What
> I mean is that if you do this, the criterion for success or failure
> should be the amount of time left not the number of retries. This is
> what the non-eh submission path also does for retries of events that
> don't count against the retry limit ... so something like this patch
> (uncompiled and untested #include stddisclaimer.h)
James,
Wendy and I discussed this a bit more and I think we understand your concern.
Wendy is working on an updated patch.
Thanks,
Brian
--
Brian King
Power Linux I/O
IBM Linux Technology Center
Thread overview: 7+ messages
2013-04-15 18:39 [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd wenxiong
2013-04-15 18:39 ` [PATCH 1/1] " wenxiong
2013-04-15 20:45 ` [PATCH 0/1] " James Bottomley
2013-04-15 21:55 ` Brian King
2013-04-15 22:33 ` James Bottomley
2013-04-16 0:09 ` wenxiong
2013-04-16 15:12 ` Brian King