* [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd @ 2013-04-15 18:39 wenxiong 2013-04-15 18:39 ` [PATCH 1/1] " wenxiong 2013-04-15 20:45 ` [PATCH 0/1] " James Bottomley 0 siblings, 2 replies; 7+ messages in thread From: wenxiong @ 2013-04-15 18:39 UTC (permalink / raw) To: James.Bottomley; +Cc: linux-scsi, brking In scsi_send_eh_cmnd(), this fix will check the return code of queuecomamnd when sending commands and retry for a bit if the driver returns a busy response. Thanks, Wendy -- ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd 2013-04-15 18:39 [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd wenxiong @ 2013-04-15 18:39 ` wenxiong 2013-04-15 20:45 ` [PATCH 0/1] " James Bottomley 1 sibling, 0 replies; 7+ messages in thread From: wenxiong @ 2013-04-15 18:39 UTC (permalink / raw) To: James.Bottomley; +Cc: linux-scsi, brking, Wen Xiong [-- Attachment #1: check_return_of_queuecommand --] [-- Type: text/plain, Size: 1468 bytes --] Fix scsi_send_eh_cmnd to check the return code of queuecommand when sending commands and retry for a bit if the LLDD returns a busy response. This fixes an issue seen with the ipr driver where an ipr initiated reset immediately following an eh_host_reset caused EH initiated commands to fail, resulting in devices being taken offline. This patch resolves the issue. Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> --- drivers/scsi/scsi_error.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) Index: b/drivers/scsi/scsi_error.c =================================================================== --- a/drivers/scsi/scsi_error.c 2013-04-10 12:55:57.000000000 -0500 +++ b/drivers/scsi/scsi_error.c 2013-04-10 13:04:12.467858487 -0500 @@ -793,6 +793,7 @@ static int scsi_send_eh_cmnd(struct scsi DECLARE_COMPLETION_ONSTACK(done); unsigned long timeleft; struct scsi_eh_save ses; + int attempts = 30; int rtn; scsi_eh_prep_cmnd(scmd, &ses, cmnd, cmnd_size, sense_bytes); @@ -800,7 +801,14 @@ static int scsi_send_eh_cmnd(struct scsi scsi_log_send(scmd); scmd->scsi_done = scsi_eh_done; - shost->hostt->queuecommand(shost, scmd); + + while ((rtn = shost->hostt->queuecommand(shost, scmd)) && attempts) { + if (rtn == SCSI_MLQUEUE_DEVICE_BUSY || + rtn == SCSI_MLQUEUE_TARGET_BUSY || + rtn == SCSI_MLQUEUE_HOST_BUSY) + attempts--; + ssleep(1); + } timeleft = wait_for_completion_timeout(&done, timeout); -- ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd 2013-04-15 18:39 [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd wenxiong 2013-04-15 18:39 ` [PATCH 1/1] " wenxiong @ 2013-04-15 20:45 ` James Bottomley 2013-04-15 21:55 ` Brian King 1 sibling, 1 reply; 7+ messages in thread From: James Bottomley @ 2013-04-15 20:45 UTC (permalink / raw) To: wenxiong; +Cc: linux-scsi, brking On Mon, 2013-04-15 at 13:39 -0500, wenxiong@linux.vnet.ibm.com wrote: > In scsi_send_eh_cmnd(), this fix will check the return code of queuecomamnd > when sending commands and retry for a bit if the driver returns a > busy response. This is already handled by the timeout, I think. If a driver continuously returns MLQUEUE BUSY, then we'll fail the request after the timeout on the command expires. James ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd 2013-04-15 20:45 ` [PATCH 0/1] " James Bottomley @ 2013-04-15 21:55 ` Brian King 2013-04-15 22:33 ` James Bottomley 0 siblings, 1 reply; 7+ messages in thread From: Brian King @ 2013-04-15 21:55 UTC (permalink / raw) To: James Bottomley; +Cc: wenxiong, linux-scsi On 04/15/2013 03:45 PM, James Bottomley wrote: > On Mon, 2013-04-15 at 13:39 -0500, wenxiong@linux.vnet.ibm.com wrote: >> In scsi_send_eh_cmnd(), this fix will check the return code of queuecomamnd >> when sending commands and retry for a bit if the driver returns a >> busy response. > > This is already handled by the timeout, I think. If a driver > continuously returns MLQUEUE BUSY, then we'll fail the request after the > timeout on the command expires. If we get a timeout in scsi_send_eh_cmnd we call scsi_abort_eh_cmnd. If the abort works, we return FAILED out of scsi_send_eh_cmnd, which results in no retry being performed, since scsi_eh_tur only retries once and only if NEEDS_RETRY is returned. Or am I missing something? Thanks, Brian -- Brian King Power Linux I/O IBM Linux Technology Center ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd 2013-04-15 21:55 ` Brian King @ 2013-04-15 22:33 ` James Bottomley 2013-04-16 0:09 ` wenxiong 2013-04-16 15:12 ` Brian King 0 siblings, 2 replies; 7+ messages in thread From: James Bottomley @ 2013-04-15 22:33 UTC (permalink / raw) To: Brian King; +Cc: wenxiong, linux-scsi On Mon, 2013-04-15 at 16:55 -0500, Brian King wrote: > On 04/15/2013 03:45 PM, James Bottomley wrote: > > On Mon, 2013-04-15 at 13:39 -0500, wenxiong@linux.vnet.ibm.com wrote: > >> In scsi_send_eh_cmnd(), this fix will check the return code of queuecomamnd > >> when sending commands and retry for a bit if the driver returns a > >> busy response. > > > > This is already handled by the timeout, I think. If a driver > > continuously returns MLQUEUE BUSY, then we'll fail the request after the > > timeout on the command expires. > > If we get a timeout in scsi_send_eh_cmnd we call scsi_abort_eh_cmnd. If the > abort works, we return FAILED out of scsi_send_eh_cmnd, which results in > no retry being performed, since scsi_eh_tur only retries once and > only if NEEDS_RETRY is returned. Or am I missing something? Sorry, I'm not being clear. It comes with being at a conference. What I mean is that if you do this, the criterion for success or failure should be the amount of time left not the number of retries. This is what the non-eh submission path also does for retries of events that don't count against the retry limit ... so something like this patch (uncompiled and untested #include stddisclaimer.h) James ---- diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index c1b05a8..93ab4f4 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -793,6 +793,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd, DECLARE_COMPLETION_ONSTACK(done); unsigned long timeleft; struct scsi_eh_save ses; + const int stall_for = min(HZ/10,1); /* 100 ms */ int rtn; scsi_eh_prep_cmnd(scmd, &ses, cmnd, cmnd_size, sense_bytes); @@ -802,6 +803,8 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd, scmd->scsi_done = scsi_eh_done; shost->hostt->queuecommand(shost, scmd); + retry: + timeleft = wait_for_completion_timeout(&done, timeout); shost->eh_action = NULL; @@ -831,8 +834,12 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd, case TARGET_ERROR: break; case ADD_TO_MLQUEUE: - rtn = NEEDS_RETRY; - break; + if (timeleft > stall_for) { + timeout = timeleft - stall_for; + msleep(stall_for); + goto retry; + } + /* fall through */ default: rtn = FAILED; break; ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd 2013-04-15 22:33 ` James Bottomley @ 2013-04-16 0:09 ` wenxiong 2013-04-16 15:12 ` Brian King 1 sibling, 0 replies; 7+ messages in thread From: wenxiong @ 2013-04-16 0:09 UTC (permalink / raw) To: James Bottomley; +Cc: Brian King, linux-scsi Quoting James Bottomley <James.Bottomley@hansenpartnership.com>: > On Mon, 2013-04-15 at 16:55 -0500, Brian King wrote: >> On 04/15/2013 03:45 PM, James Bottomley wrote: >> > On Mon, 2013-04-15 at 13:39 -0500, wenxiong@linux.vnet.ibm.com wrote: >> >> In scsi_send_eh_cmnd(), this fix will check the return code of >> queuecomamnd >> >> when sending commands and retry for a bit if the driver returns a >> >> busy response. >> > >> > This is already handled by the timeout, I think. If a driver >> > continuously returns MLQUEUE BUSY, then we'll fail the request after the >> > timeout on the command expires. >> >> If we get a timeout in scsi_send_eh_cmnd we call scsi_abort_eh_cmnd. If the >> abort works, we return FAILED out of scsi_send_eh_cmnd, which results in >> no retry being performed, since scsi_eh_tur only retries once and >> only if NEEDS_RETRY is returned. Or am I missing something? > > Sorry, I'm not being clear. It comes with being at a conference. What > I mean is that if you do this, the criterion for success or faiure > should be the amount of time left not the number of retries. This is > what the non-eh submission path also does for retries of events that > don't count against the retry limit ... so something like this patch > (uncompiled and untested #include stddisclaimer.h) > > James Hi James, The failing case for us is: Doesn't matter what timeout value we set in wait_for_completion_timeout(), it always returns with timeleft = 0. For example, if I set timeout = 50 secs, wait_for_completion_timeout() always returns with timeleft =0(even adapter is already in good shape in 20 secs). We never gets a chance to call into if (timeleft) leg. My understanding is: if shost->host->queuecommand() failed with MLQUEUE busy response at the first time, wait_for_completion_timeout() always wakes up by expired. Here is log when I enabled scsi log: Apr 15 18:44:35 ltcsatiocp5 kernel: scsi_send_eh_cmnd: scmd: c0000000f88bc980, timeleft: 0 I applied your patch. Because timeleft is always zero, never got a chance to call into if(timeleft) { leg. Thanks, Wendy ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd 2013-04-15 22:33 ` James Bottomley 2013-04-16 0:09 ` wenxiong @ 2013-04-16 15:12 ` Brian King 1 sibling, 0 replies; 7+ messages in thread From: Brian King @ 2013-04-16 15:12 UTC (permalink / raw) To: James Bottomley; +Cc: wenxiong, linux-scsi On 04/15/2013 05:33 PM, James Bottomley wrote: > On Mon, 2013-04-15 at 16:55 -0500, Brian King wrote: >> On 04/15/2013 03:45 PM, James Bottomley wrote: >>> On Mon, 2013-04-15 at 13:39 -0500, wenxiong@linux.vnet.ibm.com wrote: >>>> In scsi_send_eh_cmnd(), this fix will check the return code of queuecomamnd >>>> when sending commands and retry for a bit if the driver returns a >>>> busy response. >>> >>> This is already handled by the timeout, I think. If a driver >>> continuously returns MLQUEUE BUSY, then we'll fail the request after the >>> timeout on the command expires. >> >> If we get a timeout in scsi_send_eh_cmnd we call scsi_abort_eh_cmnd. If the >> abort works, we return FAILED out of scsi_send_eh_cmnd, which results in >> no retry being performed, since scsi_eh_tur only retries once and >> only if NEEDS_RETRY is returned. Or am I missing something? > > Sorry, I'm not being clear. It comes with being at a conference. What > I mean is that if you do this, the criterion for success or failure > should be the amount of time left not the number of retries. This is > what the non-eh submission path also does for retries of events that > don't count against the retry limit ... so something like this patch > (uncompiled and untested #include stddisclaimer.h) Jams, Wendy and I discussed this a bit more and I think we understand your concern. Wendy is working on an updated patch. Thanks, Brian -- Brian King Power Linux I/O IBM Linux Technology Center ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2013-04-16 15:13 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-04-15 18:39 [PATCH 0/1] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd wenxiong 2013-04-15 18:39 ` [PATCH 1/1] " wenxiong 2013-04-15 20:45 ` [PATCH 0/1] " James Bottomley 2013-04-15 21:55 ` Brian King 2013-04-15 22:33 ` James Bottomley 2013-04-16 0:09 ` wenxiong 2013-04-16 15:12 ` Brian King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox