public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
@ 2013-04-25  6:10 Hannes Reinecke
  2013-05-03 14:24 ` wenxiong
  0 siblings, 1 reply; 5+ messages in thread
From: Hannes Reinecke @ 2013-04-25  6:10 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, Hannes Reinecke, Wen Xiong, Brian King

scsi_send_eh_cmnd() is calling queuecommand() directly, so
it needs to check the return value here.
The only valid return codes for queuecommand() are 'busy'
states, so we need to wait for a bit to allow the LLDD
to recover.

Based on an earlier patch from Wen Xiong.

Cc: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index d58db32..6a3c1d2 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -889,22 +889,32 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 	struct scsi_device *sdev = scmd->device;
 	struct Scsi_Host *shost = sdev->host;
 	DECLARE_COMPLETION_ONSTACK(done);
-	unsigned long timeleft;
+	unsigned long timeleft = timeout;
 	struct scsi_eh_save ses;
+	const int stall_for = min(HZ/10, 1);
 	int rtn;
 
+retry:
 	scsi_eh_prep_cmnd(scmd, &ses, cmnd, cmnd_size, sense_bytes);
 	shost->eh_action = &done;
 
 	scsi_log_send(scmd);
 	scmd->scsi_done = scsi_eh_done;
-	shost->hostt->queuecommand(shost, scmd);
-
-	timeleft = wait_for_completion_timeout(&done, timeout);
+	rtn = shost->hostt->queuecommand(shost, scmd);
+	if (rtn) {
+		if (timeleft) {
+			scsi_eh_restore_cmnd(scmd, &ses);
+			timeleft -= stall_for;
+			msleep(stall_for);
+			goto retry;
+		}
+		rtn = NEEDS_RETRY;
+	} else
+		timeleft = wait_for_completion_timeout(&done, timeout);
 
 	shost->eh_action = NULL;
 
-	scsi_log_completion(scmd, SUCCESS);
+	scsi_log_completion(scmd, rtn);
 
 	SCSI_LOG_ERROR_RECOVERY(3,
 		printk("%s: scmd: %p, timeleft: %ld\n",
@@ -935,7 +945,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 			rtn = FAILED;
 			break;
 		}
-	} else {
+	} else if (!rtn) {
 		scsi_abort_eh_cmnd(scmd);
 		rtn = FAILED;
 	}

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
  2013-04-25  6:10 [PATCH] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd Hannes Reinecke
@ 2013-05-03 14:24 ` wenxiong
  2013-05-03 18:23   ` James Bottomley
  0 siblings, 1 reply; 5+ messages in thread
From: wenxiong @ 2013-05-03 14:24 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: James Bottomley, linux-scsi, Brian King


Quoting Hannes Reinecke <hare@suse.de>:

> scsi_send_eh_cmnd() is calling queuecommand() directly, so
> it needs to check the return value here.
> The only valid return codes for queuecommand() are 'busy'
> states, so we need to wait for a bit to allow the LLDD
> to recover.
>
> Based on an earlier patch from Wen Xiong.
>
> Cc: Wen Xiong <wenxiong@linux.vnet.ibm.com>
> Cc: Brian King <brking@linux.vnet.ibm.com>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
>
Hi James,

I have verified this patch with two new ipr adapters. EEH error can be  
recoery successfully.
Do you have any question about this new patch?

Thanks,
Wendy
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index d58db32..6a3c1d2 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -889,22 +889,32 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd  
> *scmd, unsigned char *cmnd,
>  	struct scsi_device *sdev = scmd->device;
>  	struct Scsi_Host *shost = sdev->host;
>  	DECLARE_COMPLETION_ONSTACK(done);
> -	unsigned long timeleft;
> +	unsigned long timeleft = timeout;
>  	struct scsi_eh_save ses;
> +	const int stall_for = min(HZ/10, 1);
>  	int rtn;
>
> +retry:
>  	scsi_eh_prep_cmnd(scmd, &ses, cmnd, cmnd_size, sense_bytes);
>  	shost->eh_action = &done;
>
>  	scsi_log_send(scmd);
>  	scmd->scsi_done = scsi_eh_done;
> -	shost->hostt->queuecommand(shost, scmd);
> -
> -	timeleft = wait_for_completion_timeout(&done, timeout);
> +	rtn = shost->hostt->queuecommand(shost, scmd);
> +	if (rtn) {
> +		if (timeleft) {
> +			scsi_eh_restore_cmnd(scmd, &ses);
> +			timeleft -= stall_for;
> +			msleep(stall_for);
> +			goto retry;
> +		}
> +		rtn = NEEDS_RETRY;
> +	} else
> +		timeleft = wait_for_completion_timeout(&done, timeout);
>
>  	shost->eh_action = NULL;
>
> -	scsi_log_completion(scmd, SUCCESS);
> +	scsi_log_completion(scmd, rtn);
>
>  	SCSI_LOG_ERROR_RECOVERY(3,
>  		printk("%s: scmd: %p, timeleft: %ld\n",
> @@ -935,7 +945,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd  
> *scmd, unsigned char *cmnd,
>  			rtn = FAILED;
>  			break;
>  		}
> -	} else {
> +	} else if (!rtn) {
>  		scsi_abort_eh_cmnd(scmd);
>  		rtn = FAILED;
>  	}




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
  2013-05-03 14:24 ` wenxiong
@ 2013-05-03 18:23   ` James Bottomley
  2013-05-04 17:02     ` Bart Van Assche
  0 siblings, 1 reply; 5+ messages in thread
From: James Bottomley @ 2013-05-03 18:23 UTC (permalink / raw)
  To: wenxiong@linux.vnet.ibm.com
  Cc: Hannes Reinecke, linux-scsi@vger.kernel.org, Brian King

On Fri, 2013-05-03 at 10:24 -0400, wenxiong@linux.vnet.ibm.com wrote:
> Quoting Hannes Reinecke <hare@suse.de>:
> 
> > scsi_send_eh_cmnd() is calling queuecommand() directly, so
> > it needs to check the return value here.
> > The only valid return codes for queuecommand() are 'busy'
> > states, so we need to wait for a bit to allow the LLDD
> > to recover.
> >
> > Based on an earlier patch from Wen Xiong.
> >
> > Cc: Wen Xiong <wenxiong@linux.vnet.ibm.com>
> > Cc: Brian King <brking@linux.vnet.ibm.com>
> > Signed-off-by: Hannes Reinecke <hare@suse.de>
> >
> Hi James,
> 
> I have verified this patch with two new ipr adapters. EEH error can be  
> recoery successfully.
> Do you have any question about this new patch?

Yes, it's not correct: stall_for is in jiffies not msec, so
msleep(stall_for) is taking far too long.  Plus you don't know that
stall_for is a divisor of timeleft, so timeleft -= stall_for could wrap
and cause the retry loop to go on effectively forever.

I think the below is the correct patch, so I'll commit it unless there
are objections.

James

---
>From a74498dda7acf6f5fb99e43df54f2c1f5c6beec9 Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare@suse.de>
Date: Thu, 25 Apr 2013 08:10:00 +0200
Subject: [PATCH] [SCSI] Handle MLQUEUE busy response in scsi_send_eh_cmnd

scsi_send_eh_cmnd() is calling queuecommand() directly, so
it needs to check the return value here.
The only valid return codes for queuecommand() are 'busy'
states, so we need to wait for a bit to allow the LLDD
to recover.

Based on an earlier patch from Wen Xiong.

[jejb: fix confusion between msec and jiffies values and other issues]
Cc: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c1b05a8..cfd1ef2 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -25,6 +25,7 @@
 #include <linux/interrupt.h>
 #include <linux/blkdev.h>
 #include <linux/delay.h>
+#include <linux/jiffies.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -791,22 +792,33 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 	struct scsi_device *sdev = scmd->device;
 	struct Scsi_Host *shost = sdev->host;
 	DECLARE_COMPLETION_ONSTACK(done);
-	unsigned long timeleft;
+	unsigned long timeleft = timeout;
 	struct scsi_eh_save ses;
+	const unsigned long stall_for = min(msecs_to_jiffies(10), 1UL);
 	int rtn;
 
+retry:
 	scsi_eh_prep_cmnd(scmd, &ses, cmnd, cmnd_size, sense_bytes);
 	shost->eh_action = &done;
 
 	scsi_log_send(scmd);
 	scmd->scsi_done = scsi_eh_done;
-	shost->hostt->queuecommand(shost, scmd);
-
-	timeleft = wait_for_completion_timeout(&done, timeout);
+	rtn = shost->hostt->queuecommand(shost, scmd);
+	if (rtn) {
+		if (timeleft > stall_for) {
+			scsi_eh_restore_cmnd(scmd, &ses);
+			timeleft -= stall_for;
+			msleep(jiffies_to_msecs(stall_for));
+			goto retry;
+		}
+		timeleft = 0;
+		rtn = NEEDS_RETRY;
+	} else
+		timeleft = wait_for_completion_timeout(&done, timeout);
 
 	shost->eh_action = NULL;
 
-	scsi_log_completion(scmd, SUCCESS);
+	scsi_log_completion(scmd, rtn);
 
 	SCSI_LOG_ERROR_RECOVERY(3,
 		printk("%s: scmd: %p, timeleft: %ld\n",
@@ -837,7 +849,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 			rtn = FAILED;
 			break;
 		}
-	} else {
+	} else if (!rtn) {
 		scsi_abort_eh_cmnd(scmd);
 		rtn = FAILED;
 	}


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
  2013-05-03 18:23   ` James Bottomley
@ 2013-05-04 17:02     ` Bart Van Assche
  2013-05-04 18:20       ` James Bottomley
  0 siblings, 1 reply; 5+ messages in thread
From: Bart Van Assche @ 2013-05-04 17:02 UTC (permalink / raw)
  To: James Bottomley
  Cc: wenxiong@linux.vnet.ibm.com, Hannes Reinecke,
	linux-scsi@vger.kernel.org, Brian King

On 05/03/13 20:23, James Bottomley wrote:
> +	const unsigned long stall_for = min(msecs_to_jiffies(10), 1UL);

Hello James,

Can you please clarify what the intention of this statement is ? Is the 
purpose of this statement to avoid that stall_for would be zero in case 
HZ < 100 ? If that is the case, maybe you meant max() instead of min() ? 
Also, are you aware that msecs_to_jiffies() already rounds up the result 
of the division ?

Bart.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd
  2013-05-04 17:02     ` Bart Van Assche
@ 2013-05-04 18:20       ` James Bottomley
  0 siblings, 0 replies; 5+ messages in thread
From: James Bottomley @ 2013-05-04 18:20 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: wenxiong@linux.vnet.ibm.com, Hannes Reinecke,
	linux-scsi@vger.kernel.org, Brian King

On Sat, 2013-05-04 at 19:02 +0200, Bart Van Assche wrote:
> On 05/03/13 20:23, James Bottomley wrote:
> > +	const unsigned long stall_for = min(msecs_to_jiffies(10), 1UL);
> 
> Hello James,
> 
> Can you please clarify what the intention of this statement is ? Is the 
> purpose of this statement to avoid that stall_for would be zero in case 
> HZ < 100 ? If that is the case, maybe you meant max() instead of min() ? 
> Also, are you aware that msecs_to_jiffies() already rounds up the result 
> of the division ?

Yes, I thought afterwards I should dump the bogus min statement as well.
Plus HZ/10 is actually 100ms, so the value is 10x wrong.  I've fixed it
up below (plus a bit of comment rework and some style fixes).

Thanks,

James

---

>From 4bd9ef9789ad86656d8e52e8fff5422b741097e1 Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare@suse.de>
Date: Thu, 25 Apr 2013 08:10:00 +0200
Subject: [PATCH] [SCSI] Handle MLQUEUE busy response in scsi_send_eh_cmnd

scsi_send_eh_cmnd() is calling queuecommand() directly, so
it needs to check the return value here.
The only valid return codes for queuecommand() are 'busy'
states, so we need to wait for a bit to allow the LLDD
to recover.

Based on an earlier patch from Wen Xiong.

[jejb: fix confusion between msec and jiffies values and other issues]
[bvanassche: correct stall_for interval]
Cc: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c1b05a8..f43de1e 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -25,6 +25,7 @@
 #include <linux/interrupt.h>
 #include <linux/blkdev.h>
 #include <linux/delay.h>
+#include <linux/jiffies.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -791,32 +792,48 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 	struct scsi_device *sdev = scmd->device;
 	struct Scsi_Host *shost = sdev->host;
 	DECLARE_COMPLETION_ONSTACK(done);
-	unsigned long timeleft;
+	unsigned long timeleft = timeout;
 	struct scsi_eh_save ses;
+	const unsigned long stall_for = msecs_to_jiffies(100);
 	int rtn;
 
+retry:
 	scsi_eh_prep_cmnd(scmd, &ses, cmnd, cmnd_size, sense_bytes);
 	shost->eh_action = &done;
 
 	scsi_log_send(scmd);
 	scmd->scsi_done = scsi_eh_done;
-	shost->hostt->queuecommand(shost, scmd);
-
-	timeleft = wait_for_completion_timeout(&done, timeout);
+	rtn = shost->hostt->queuecommand(shost, scmd);
+	if (rtn) {
+		if (timeleft > stall_for) {
+			scsi_eh_restore_cmnd(scmd, &ses);
+			timeleft -= stall_for;
+			msleep(jiffies_to_msecs(stall_for));
+			goto retry;
+		}
+		/* signal not to enter either branch of the if () below */
+		timeleft = 0;
+		rtn = NEEDS_RETRY;
+	} else {
+		timeleft = wait_for_completion_timeout(&done, timeout);
+	}
 
 	shost->eh_action = NULL;
 
-	scsi_log_completion(scmd, SUCCESS);
+	scsi_log_completion(scmd, rtn);
 
 	SCSI_LOG_ERROR_RECOVERY(3,
 		printk("%s: scmd: %p, timeleft: %ld\n",
 			__func__, scmd, timeleft));
 
 	/*
-	 * If there is time left scsi_eh_done got called, and we will
-	 * examine the actual status codes to see whether the command
-	 * actually did complete normally, else tell the host to forget
-	 * about this command.
+	 * If there is time left scsi_eh_done got called, and we will examine
+	 * the actual status codes to see whether the command actually did
+	 * complete normally, else if we have a zero return and no time left,
+	 * the command must still be pending, so abort it and return FAILED.
+	 * If we never actually managed to issue the command, because
+	 * ->queuecommand() kept returning non zero, use the rtn = FAILED
+	 * value above (so don't execute either branch of the if)
 	 */
 	if (timeleft) {
 		rtn = scsi_eh_completed_normally(scmd);
@@ -837,7 +854,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 			rtn = FAILED;
 			break;
 		}
-	} else {
+	} else if (!rtn) {
 		scsi_abort_eh_cmnd(scmd);
 		rtn = FAILED;
 	}


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-05-04 18:20 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-25  6:10 [PATCH] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd Hannes Reinecke
2013-05-03 14:24 ` wenxiong
2013-05-03 18:23   ` James Bottomley
2013-05-04 17:02     ` Bart Van Assche
2013-05-04 18:20       ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox