[PATCH] scsi device recovery

public inbox for linux-scsi@vger.kernel.org

* [PATCH] scsi device recovery
@ 2007-12-12 12:54 Bernd Schubert
  2007-12-12 13:39 ` Matthew Wilcox
  0 siblings, 1 reply; 12+ messages in thread
From: Bernd Schubert @ 2007-12-12 12:54 UTC
  To: linux-scsi

Hi,

below is a patch introducing device recovery. It tries to prevent I/O errors 
when a DID_NO_CONNECT or DID_SOFT_ERROR occurs.

The patch still needs quite some work:

1.) I still haven't figured out the best place to run 

sdev->deh.ehandler = kthread_run(scsi_device_error_handler, ...)

2.) As I see it, it's not a good idea to call spi_schedule_dv_device() from 
scsi_error.c, since spi_schedule_dv_device() lives in scsi_transport_spi.c, 
which seems to be separate from the core SCSI layer.
So what is another way to initiate a DV from scsi_error.c?

3.) Maybe related to 2): for now I'm calling spi_schedule_dv_device(), but 
it does not always do what I want.

[  406.785104] sd 5:0:2:0: deh: scheduling domain validation
[  408.422530]  target5:0:2: Beginning Domain Validation
[  408.466620]  target5:0:2: Domain Validation skipping write tests
[  408.472771]  target5:0:2: Ending Domain Validation

Hmm, this seems somehow related to sdev->inquiry_len, but isn't it the job of 
spi_schedule_dv_device() and its helpers to handle that properly?

Any comments, hints and help are appreciated.
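
On question 2, one common way to let core code trigger a transport-specific
action without referencing the transport module directly is an optional
callback that the transport registers. The sketch below shows only that
pattern, in plain userspace C. Everything in it is an assumption made for
illustration: fake_sdev, fake_transport_ops, the revalidate hook and
core_request_revalidation() are hypothetical names, not existing kernel
interfaces, and fake_spi_revalidate() merely stands in for
spi_schedule_dv_device().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct scsi_device; only what the sketch needs. */
struct fake_sdev {
	int id;
	int revalidations;	/* counts how often the hook ran */
};

/* Hypothetical hook table that a transport module would fill in. */
struct fake_transport_ops {
	void (*revalidate)(struct fake_sdev *sdev);	/* may be NULL */
};

/*
 * "Core" side: knows only the hook table, not the SPI module.
 * Returns 0 if a revalidation was requested, -1 if the transport
 * offers none.
 */
int core_request_revalidation(const struct fake_transport_ops *ops,
			      struct fake_sdev *sdev)
{
	if (!ops || !ops->revalidate)
		return -1;
	ops->revalidate(sdev);
	return 0;
}

/* "Transport" side: stands in for a call such as spi_schedule_dv_device(). */
void fake_spi_revalidate(struct fake_sdev *sdev)
{
	sdev->revalidations++;
}
```

With a hook of this kind the core file would not need to include
scsi_transport_spi.h. Whether the SCSI core should grow such a hook is a
separate design question that this sketch does not settle.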


Signed-off-by: Bernd Schubert <bs@q-leap.de>

Index: linux-2.6.22/drivers/scsi/scsi_error.c
===================================================================
--- linux-2.6.22.orig/drivers/scsi/scsi_error.c	2007-12-12 12:26:20.000000000 +0100
+++ linux-2.6.22/drivers/scsi/scsi_error.c	2007-12-12 13:08:40.000000000 +0100
@@ -33,6 +33,7 @@
 #include <scsi/scsi_transport.h>
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_ioctl.h>
+#include <scsi/scsi_transport_spi.h>
 
 #include "scsi_priv.h"
 #include "scsi_logging.h"
@@ -1589,6 +1590,153 @@ int scsi_error_handler(void *data)
 	return 0;
 }
 
+/**
+ * scsi_unjam_sdev - try to recover a failed SCSI device
+ * @sdev:	SCSI device we are recovering
+ */
+static int scsi_unjam_sdev(struct scsi_device *sdev)
+{
+	int rtn;
+
+	sdev_printk(KERN_CRIT, sdev, "resetting device\n");
+	rtn = scsi_reset_provider(sdev, SCSI_TRY_RESET_DEVICE);
+	scsi_report_device_reset(sdev->host, sdev->channel, sdev->id);
+	if (rtn == SUCCESS)
+		sdev_printk(KERN_INFO, sdev, "device reset succeeded, "
+		            "set device to running state\n");
+	return SUCCESS;
+}
+
+/**
+ * scsi_schedule_deh - schedule EH for SCSI device
+ * @sdev:	SCSI device to invoke error handling on.
+ *
+ **/
+void scsi_schedule_deh(struct scsi_device *sdev)
+{
+#if 0
+	if (sdev->deh.error) {
+		/* blocking the device does not work! another recovery was
+		 * scheduled, though no i/o should go to the device now! */
+		sdev_printk(KERN_CRIT, sdev,
+		            "device already in recovery, but another recovery "
+		            "was scheduled\n");
+		dump_stack();
+	}
+#endif
+	if (sdev->deh.error)
+		return; /* recovery already running */
+
+	if (sdev->deh.last_recovery
+	&&  jiffies < sdev->deh.last_recovery + 300 * HZ)
+		sdev->deh.count++;
+	else
+		sdev->deh.count = 0;
+
+	if (sdev->deh.count >= 10) {
+		sdev_printk(KERN_WARNING, sdev,
+		            "too many errors within time limit, setting "
+		            "device offline\n");
+		scsi_device_set_state(sdev, SDEV_OFFLINE);
+		return;
+	} else if (sdev->deh.count >= 5) {
+		sdev_printk(KERN_INFO, sdev, "Initiating host recovery\n");
+		scsi_schedule_eh(sdev->host); /* host recovery */
+		return;
+	} else
+		sdev->deh.count++;
+
+	sdev_printk(KERN_INFO, sdev, "n-error: %d\n", sdev->deh.count);
+
+	if (!scsi_internal_device_block(sdev)) {
+		sdev->deh.error = 1;
+		if (sdev->deh.ehandler)
+			wake_up_process(sdev->deh.ehandler);
+		else
+			sdev_printk(KERN_WARNING, sdev,
+			            "deh handler missing\n");
+	} else {
+		sdev_printk(KERN_WARNING, sdev,
+		            "Couldn't block device, calling host recovery\n");
+		scsi_schedule_eh(sdev->host);
+	}
+}
+EXPORT_SYMBOL_GPL(scsi_schedule_deh);
+
+/**
+ * scsi_device_error_handler - SCSI error handler thread
+ * @data:	Device for which we are running.
+ *
+ * Notes:
+ *    This is the main device error handling loop.  This is run as a kernel thread
+ *    for every SCSI device and handles all device error handling activity.
+ **/
+int scsi_device_error_handler(void *data)
+{
+	struct scsi_device *sdev = data;
+	int sleeptime = 30;
+
+	current->flags |= PF_NOFREEZE;
+
+	/*
+	 * We use TASK_INTERRUPTIBLE so that the thread is not
+	 * counted against the load average as a running process.
+	 * We never actually get interrupted because kthread_run
+	 * disables signal delivery for the created thread.
+	 */
+	set_current_state(TASK_INTERRUPTIBLE);
+	while (!kthread_should_stop()) {
+		if (sdev->deh.error == 0) {
+			SCSI_LOG_ERROR_RECOVERY(1,
+				printk("Error handler scsi_deh sleeping\n"));
+			schedule();
+			set_current_state(TASK_INTERRUPTIBLE);
+			continue;
+		}
+
+		__set_current_state(TASK_RUNNING);
+		SCSI_LOG_ERROR_RECOVERY(1,
+			printk("Error handler scsi_deh waking up\n"));
+
+		sdev_printk(KERN_CRIT, sdev, "waiting %ds to settle device\n",
+		            sleeptime);
+		msleep (sleeptime * 1000);
+
+		if (sdev->deh.count < 2) {
+			sdev_printk(KERN_WARNING, sdev,
+			            "First device error, simple recovery\n");
+			goto cont;
+		}
+
+		/*
+		 * We have a device that is failing for some reason.  Figure out
+		 * what we need to do to get it up and online again (if we can).
+		 * If we fail, we call host recovery
+		 */
+		if (scsi_unjam_sdev(sdev) != SUCCESS) {
+			sdev_printk(KERN_CRIT, sdev, "device recovery failed,"
+			            " initiating host recovery\n");
+			scsi_schedule_eh(sdev->host);
+			/* scsi_schedule_eh() doesn't know about deh.error */
+			goto error_cont;
+		}
+cont:
+		if (scsi_internal_device_unblock(sdev))
+			sdev_printk(KERN_WARNING, sdev,
+			            "deh: device unblocking failed!\n");
+		spi_schedule_dv_device(sdev);
+error_cont:
+		sdev->deh.error = 0;
+		sdev->deh.last_recovery = jiffies;
+		set_current_state(TASK_INTERRUPTIBLE);
+	}
+	__set_current_state(TASK_RUNNING);
+
+	sdev_printk(KERN_CRIT, sdev, "Error handler scsi_deh exiting\n");
+	sdev->deh.ehandler = NULL;
+	return 0;
+}
+
 /*
  * Function:    scsi_report_bus_reset()
  *
Index: linux-2.6.22/include/scsi/scsi_device.h
===================================================================
--- linux-2.6.22.orig/include/scsi/scsi_device.h	2007-12-12 12:26:20.000000000 +0100
+++ linux-2.6.22/include/scsi/scsi_device.h	2007-12-12 12:26:23.000000000 +0100
@@ -145,6 +145,13 @@ struct scsi_device {
 
 	enum scsi_device_state sdev_state;
 	unsigned long		sdev_data[0];
+
+	struct device_error_handler {
+		unsigned error;
+		struct task_struct * ehandler;	/* Error recovery thread. */
+		time_t	last_recovery; 		/* time on last error recovery */
+		unsigned count;			/* error count */
+	} deh;
 } __attribute__((aligned(sizeof(unsigned long))));
 #define	to_scsi_device(d)	\
 	container_of(d, struct scsi_device, sdev_gendev)
Index: linux-2.6.22/drivers/scsi/scsi_scan.c
===================================================================
--- linux-2.6.22.orig/drivers/scsi/scsi_scan.c	2007-12-12 12:26:20.000000000 +0100
+++ linux-2.6.22/drivers/scsi/scsi_scan.c	2007-12-12 12:26:23.000000000 +0100
@@ -1313,6 +1313,12 @@ static int scsi_report_lun_scan(struct s
 			return 0;
 	}
 
+	if (!sdev->deh.ehandler)
+		sdev->deh.ehandler = kthread_run(scsi_device_error_handler,
+		                                 sdev, "sdeh_%d_%d_%d_%d",
+	                                         shost->host_no, sdev->channel,
+	                                         sdev->id, sdev->lun);
+
 	sprintf(devname, "host %d channel %d id %d",
 		shost->host_no, sdev->channel, sdev->id);
 
@@ -1489,8 +1495,13 @@ struct scsi_device *__scsi_add_device(st
 		scsi_probe_and_add_lun(starget, lun, NULL, &sdev, 1, hostdata);
 	mutex_unlock(&shost->scan_mutex);
 	scsi_target_reap(starget);
-	put_device(&starget->dev);
 
+	if (!sdev->deh.ehandler)
+		sdev->deh.ehandler = kthread_run(scsi_device_error_handler,
+		                                 sdev, "sdeh_%d_%d_%d_%d",
+	                                         shost->host_no, sdev->channel,
+	                                         sdev->id, sdev->lun);
+	put_device(&starget->dev);
 	return sdev;
 }
 EXPORT_SYMBOL(__scsi_add_device);
Index: linux-2.6.22/drivers/scsi/scsi_priv.h
===================================================================
--- linux-2.6.22.orig/drivers/scsi/scsi_priv.h	2007-12-12 12:26:20.000000000 +0100
+++ linux-2.6.22/drivers/scsi/scsi_priv.h	2007-12-12 12:26:23.000000000 +0100
@@ -54,6 +54,7 @@ extern void scsi_add_timer(struct scsi_c
 extern int scsi_delete_timer(struct scsi_cmnd *);
 extern void scsi_times_out(struct scsi_cmnd *cmd);
 extern int scsi_error_handler(void *host);
+extern int scsi_device_error_handler(void *sdev);
 extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
 extern void scsi_eh_wakeup(struct Scsi_Host *shost);
 extern int scsi_eh_scmd_add(struct scsi_cmnd *, int);
Index: linux-2.6.22/drivers/scsi/scsi_sysfs.c
===================================================================
--- linux-2.6.22.orig/drivers/scsi/scsi_sysfs.c	2007-12-12 12:26:20.000000000 +0100
+++ linux-2.6.22/drivers/scsi/scsi_sysfs.c	2007-12-12 12:26:23.000000000 +0100
@@ -10,6 +10,7 @@
 #include <linux/init.h>
 #include <linux/blkdev.h>
 #include <linux/device.h>
+#include <linux/kthread.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_device.h>
@@ -798,6 +799,9 @@ void __scsi_remove_device(struct scsi_de
 	if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
 		return;
 
+	if (sdev->deh.ehandler)
+		kthread_stop(sdev->deh.ehandler);
+
 	class_device_unregister(&sdev->sdev_classdev);
 	transport_remove_device(dev);
 	device_del(dev);
Index: linux-2.6.22/drivers/scsi/scsi_lib.c
===================================================================
--- linux-2.6.22.orig/drivers/scsi/scsi_lib.c	2007-12-12 12:26:20.000000000 +0100
+++ linux-2.6.22/drivers/scsi/scsi_lib.c	2007-12-12 12:52:31.000000000 +0100
@@ -28,6 +28,7 @@
 
 #include "scsi_priv.h"
 #include "scsi_logging.h"
+#include "scsi_transport_api.h"
 
 
 #define SG_MEMPOOL_NR		ARRAY_SIZE(scsi_sg_pools)
@@ -820,6 +821,7 @@ void scsi_io_completion(struct scsi_cmnd
 	int this_count = cmd->request_bufflen;
 	request_queue_t *q = cmd->device->request_queue;
 	struct request *req = cmd->request;
+	struct scsi_device *sdev = cmd->device;
 	int clear_errors = 1;
 	struct scsi_sense_hdr sshdr;
 	int sense_valid = 0;
@@ -958,13 +960,26 @@ void scsi_io_completion(struct scsi_cmnd
 			break;
 		}
 	}
-	if (host_byte(result) == DID_RESET) {
+	switch (host_byte(result)) {
+	case DID_OK:
+		break;
+	case DID_RESET:
 		/* Third party bus reset or reset for error recovery
 		 * reasons.  Just retry the request and see what
 		 * happens.
 		 */
 		scsi_requeue_command(q, cmd);
 		return;
+	case DID_NO_CONNECT:
+		sdev_printk(KERN_CRIT, sdev, "DID_NO_CONNECT\n");
+		scsi_schedule_deh(sdev);
+		scsi_requeue_command(q, cmd);
+		return;
+	case DID_SOFT_ERROR:
+		sdev_printk(KERN_CRIT, sdev, "DID_SOFT_ERROR\n");
+		scsi_schedule_deh(sdev);
+		scsi_requeue_command(q, cmd);
+		return;
 	}
 	if (result) {
 		if (!(req->cmd_flags & REQ_QUIET)) {
@@ -2007,18 +2022,18 @@ scsi_device_set_state(struct scsi_device
 			goto illegal;
 		}
 		break;
-
 	}
 	sdev->sdev_state = state;
 	return 0;
 
  illegal:
-	SCSI_LOG_ERROR_RECOVERY(1, 
+	SCSI_LOG_ERROR_RECOVERY(1,
 				sdev_printk(KERN_ERR, sdev,
 					    "Illegal state transition %s->%s\n",
 					    scsi_device_state_name(oldstate),
 					    scsi_device_state_name(state))
 				);
+	dump_stack();
 	return -EINVAL;
 }
 EXPORT_SYMBOL(scsi_device_set_state);
Index: linux-2.6.22/drivers/scsi/scsi_transport_api.h
===================================================================
--- linux-2.6.22.orig/drivers/scsi/scsi_transport_api.h	2007-12-12 12:26:20.000000000 +0100
+++ linux-2.6.22/drivers/scsi/scsi_transport_api.h	2007-12-12 12:26:23.000000000 +0100
@@ -2,5 +2,6 @@
 #define _SCSI_TRANSPORT_API_H
 
 void scsi_schedule_eh(struct Scsi_Host *shost);
+void scsi_schedule_deh(struct scsi_device *sdev);
 
 #endif /* _SCSI_TRANSPORT_API_H */
Index: linux-2.6.22/drivers/scsi/scsi.c
===================================================================
--- linux-2.6.22.orig/drivers/scsi/scsi.c	2007-12-12 12:26:20.000000000 +0100
+++ linux-2.6.22/drivers/scsi/scsi.c	2007-12-12 12:26:23.000000000 +0100
@@ -494,7 +494,8 @@ int scsi_dispatch_cmd(struct scsi_cmnd *
 		 */
 		scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
 
-		SCSI_LOG_MLQUEUE(3, printk("queuecommand : device blocked \n"));
+		SCSI_LOG_MLQUEUE(3, printk("queuecommand : device blocked or "
+		                           "in recovery\n"));
 
 		/*
 		 * NOTE: rtn is still zero here because we don't need the


-- 
Bernd Schubert
Q-Leap Networks GmbH
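
The escalation policy of scsi_schedule_deh() in the patch above can be
modelled in userspace roughly as follows. This is a simplified sketch under
stated assumptions, not the patch's code: deh_decide(), ticks_before(), the
deh_state layout and the HZ value are invented for illustration, and the
error count is incremented once per call, whereas the patch increments it in
two places. Where the patch compares with a plain
"jiffies < last_recovery + 300 * HZ", the sketch compares tick counters in
the wraparound-safe style of the kernel's time_before().

```c
#include <assert.h>

#define HZ		250UL		/* assumed tick rate, illustration only */
#define DEH_WINDOW	(300UL * HZ)	/* 300 s window, as in the patch */

/* Wraparound-safe "a is before b" for free-running unsigned tick counters. */
#define ticks_before(a, b)	((long)((a) - (b)) < 0)

enum deh_action { DEH_RECOVER_DEVICE, DEH_RECOVER_HOST, DEH_OFFLINE };

struct deh_state {
	unsigned long last_recovery;	/* tick of last recovery, 0 = never */
	unsigned int count;		/* errors seen inside the window */
};

/*
 * Decide what to do for a new error reported at tick 'now'.
 * Errors inside the window after the last recovery accumulate;
 * an error outside the window resets the count.
 */
enum deh_action deh_decide(struct deh_state *st, unsigned long now)
{
	if (st->last_recovery &&
	    ticks_before(now, st->last_recovery + DEH_WINDOW))
		st->count++;
	else
		st->count = 0;

	if (st->count >= 10)
		return DEH_OFFLINE;		/* too many errors: give up */
	if (st->count >= 5)
		return DEH_RECOVER_HOST;	/* escalate to host recovery */
	return DEH_RECOVER_DEVICE;		/* block device, wake handler */
}
```

The thresholds (5 and 10) and the 300 second window are taken from the
patch; making them tunable would be a natural extension but is not part of
what was posted.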


* Re: [PATCH] scsi device recovery
  2007-12-12 12:54 [PATCH] scsi device recovery Bernd Schubert
@ 2007-12-12 13:39 ` Matthew Wilcox
  2007-12-12 14:36   ` Bernd Schubert
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2007-12-12 13:39 UTC
  To: Bernd Schubert; +Cc: linux-scsi

On Wed, Dec 12, 2007 at 01:54:14PM +0100, Bernd Schubert wrote:
> below is a patch introducing device recovery, trying to prevent i/o errors 
> when a DID_NO_CONNECT or SOFT_ERROR does happen.

Why doesn't the regular scsi_eh do what you need?

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: [PATCH] scsi device recovery
  2007-12-12 13:39 ` Matthew Wilcox
@ 2007-12-12 14:36   ` Bernd Schubert
  2007-12-12 15:59     ` James Bottomley
  0 siblings, 1 reply; 12+ messages in thread
From: Bernd Schubert @ 2007-12-12 14:36 UTC
  To: Matthew Wilcox; +Cc: linux-scsi

On Wednesday 12 December 2007 14:39:27 Matthew Wilcox wrote:
> On Wed, Dec 12, 2007 at 01:54:14PM +0100, Bernd Schubert wrote:
> > below is a patch introducing device recovery, trying to prevent i/o
> > errors when a DID_NO_CONNECT or SOFT_ERROR does happen.
>
> Why doesn't the regular scsi_eh do what you need?

First of all, it is presently simply not called when the two errors above 
occur. This could be changed, of course.

Secondly, I think scsi_eh is in most cases doing too much. We are fighting 
with flaky Infortrend boxes here, and scsi_eh sometimes manages to crash 
their SCSI channels. In most cases it is sufficient to stall all I/O to the 
device and then resume.
For most SCSI devices one probably doesn't need a suspend time, or it can be 
very small; this still needs to become configurable via sysfs.

Thirdly, scsi_eh doesn't give up. In most cases where the SCSI channel of an 
Infortrend box crashed, it tried forever to recover.
Improving this is still on my todo list.

Thanks,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH


* Re: [PATCH] scsi device recovery
  2007-12-12 14:36   ` Bernd Schubert
@ 2007-12-12 15:59     ` James Bottomley
  2007-12-12 17:54       ` Bernd Schubert
  0 siblings, 1 reply; 12+ messages in thread
From: James Bottomley @ 2007-12-12 15:59 UTC
  To: Bernd Schubert; +Cc: Matthew Wilcox, linux-scsi

On Wed, 2007-12-12 at 15:36 +0100, Bernd Schubert wrote:
> On Wednesday 12 December 2007 14:39:27 Matthew Wilcox wrote:
> > On Wed, Dec 12, 2007 at 01:54:14PM +0100, Bernd Schubert wrote:
> > > below is a patch introducing device recovery, trying to prevent i/o
> > > errors when a DID_NO_CONNECT or SOFT_ERROR does happen.
> >
> > Why doesn't the regular scsi_eh do what you need?
> 
> First of all, it is presently simply not called when the two errors above do 
> happen. This could be changed, of course.

Erm, I think you'll find the error handler does activate on
DID_SOFT_ERROR.  It causes a retry via the eh.  DID_NO_CONNECT is an
immediate error with no eh intervention because it means that the target
went away.  Handling this as a retryable error isn't an option because
it will interfere with hotplug.
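
The distinction drawn in the paragraph above can be sketched as a small
classifier over the host byte of a command result. The DID_* values and the
host_byte() shift are the ones from the kernel's include/scsi/scsi.h;
classify(), enum disposition and the retry rule are simplifications invented
for this sketch and are not the logic of scsi_decide_disposition().

```c
#include <assert.h>

/* Host byte values as defined in include/scsi/scsi.h. */
#define DID_OK		0x00
#define DID_NO_CONNECT	0x01
#define DID_RESET	0x08
#define DID_SOFT_ERROR	0x0b

/* The host byte lives in bits 16..23 of the 32-bit result word. */
#define host_byte(result)	(((result) >> 16) & 0xff)

/* Hypothetical outcome type for this sketch only. */
enum disposition { DISP_DONE, DISP_RETRY, DISP_FAIL };

/*
 * Simplified model: a soft error is retried while retries remain,
 * a missing target fails immediately, anything else unknown fails.
 */
enum disposition classify(int result, int retries, int allowed)
{
	switch (host_byte(result)) {
	case DID_OK:
		return DISP_DONE;
	case DID_NO_CONNECT:
		return DISP_FAIL;	/* target went away, no retry */
	case DID_SOFT_ERROR:
		return retries < allowed ? DISP_RETRY : DISP_FAIL;
	default:
		return DISP_FAIL;
	}
}
```

In this model a DID_SOFT_ERROR still surfaces as an I/O error once the retry
budget is used up, while a DID_NO_CONNECT is never retried, so that a
genuinely removed device is reported promptly.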

> Secondly, I think scsi_eh is in most cases doing too much. We are fighting 
> with flaky Infortrend boxes here, and scsi_eh sometimes manages to crash 
> their scsi channels. In most cases it is sufficient to stall any io to the 
> device and then to resume.

But that's basically the default behaviour of the error handler (stall
then resume).

> For most scsi devices one probably doesn't need a suspend time or it can be 
> very small, this still needs to become configurable via sysfs.

You mean a wait time beyond what the error handler currently does
(basically it waits for the quiesce, begins error handling and then
sends a test unit ready when it finishes before restarting).

> Thirdly, scsi_eh doesn't give up, in most cases, when the scsi channel of a 
> Infortrend box crashed, it tried forever to recover.
> To improve this is still on my todo list.

Could you send traces for this?  I thought the error handler had been
fixed over the last few years always to terminate.  If there's a case
where it doesn't, this needs fixing.

James


* Re: [PATCH] scsi device recovery
  2007-12-12 15:59     ` James Bottomley
@ 2007-12-12 17:54       ` Bernd Schubert
  2007-12-13 14:18         ` James Bottomley
  0 siblings, 1 reply; 12+ messages in thread
From: Bernd Schubert @ 2007-12-12 17:54 UTC
  To: James Bottomley; +Cc: Matthew Wilcox, linux-scsi

[Hmm, resending since the mail is still not on the ML after more than 30 min; 
maybe the attachment was too large? I have uploaded the log to 
http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/scsi/kern.log.1]

On Wednesday 12 December 2007 16:59:36 James Bottomley wrote:
> On Wed, 2007-12-12 at 15:36 +0100, Bernd Schubert wrote:
> > On Wednesday 12 December 2007 14:39:27 Matthew Wilcox wrote:
> > > On Wed, Dec 12, 2007 at 01:54:14PM +0100, Bernd Schubert wrote:
> > > > below is a patch introducing device recovery, trying to prevent i/o
> > > > errors when a DID_NO_CONNECT or SOFT_ERROR does happen.
> > >
> > > Why doesn't the regular scsi_eh do what you need?
> >
> > First of all, it is presently simply not called when the two errors above
> > do happen. This could be changed, of course.
>
> Erm, I think you'll find the error handler does activate on
> DID_SOFT_ERROR.  It causes a retry via the eh.  DID_NO_CONNECT is an

Dec  7 23:48:45 beo-96 kernel: [94605.297924] sd 2:0:5:0: [sdd] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
Dec  7 23:48:45 beo-96 kernel: [94605.297932] end_request: I/O error, dev sdd, sector 7706802052
Dec  7 23:48:45 beo-96 kernel: [94605.297937] raid5:md5: read error not correctable (sector 871932472 on sdd3).

Full log attached.

> immediate error with no eh intervention because it means that the target
> went away.  Handling this as a retryable error isn't an option because
> it will interfere with hotplug.

Then we need a sysfs flag that can be set to manually enable eh for these
devices on DID_NO_CONNECT.

>
> > Secondly, I think scsi_eh is in most cases doing too much. We are
> > fighting with flaky Infortrend boxes here, and scsi_eh sometimes manages
> > to crash their scsi channels. In most cases it is sufficient to stall any
> > io to the device and then to resume.
>
> But that's basically the default behaviour of the error handler (stall
> then resume).
>
> > For most scsi devices one probably doesn't need a suspend time or it can
> > be very small, this still needs to become configurable via sysfs.
>
> You mean a wait time beyond what the error handler currently does
> (basically it waits for the quiesce, begins error handling and then
> sends a test unit ready when it finishes before restarting).

deh just waits on the first error and then only does a DV. For 
these Infortrend devices, that's mostly sufficient.
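
The flow described here (settle and resume on the first error, device reset
on repeated errors, host recovery if the reset fails) can be modelled
roughly as below. This is a userspace sketch under stated assumptions:
deh_handle(), enum deh_outcome, reset_fn and the two stub reset functions
are hypothetical names. It also assumes the reset result is propagated to
the caller; in the patch as posted, scsi_unjam_sdev() always returns
SUCCESS, so its host-recovery branch would not be reached as written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reset callback: returns nonzero if the reset succeeded. */
typedef int (*reset_fn)(void *ctx);

enum deh_outcome {
	DEH_RESUMED,			/* waited, unblocked, scheduled DV */
	DEH_RESUMED_AFTER_RESET,	/* device reset worked, then resumed */
	DEH_ESCALATED			/* reset failed, host recovery next */
};

/*
 * One pass of the per-device handler, after the settle delay.
 * 'count' is the number of errors seen in the current window.
 */
enum deh_outcome deh_handle(unsigned int count, reset_fn try_reset, void *ctx)
{
	if (count < 2)
		return DEH_RESUMED;	/* first error: no reset at all */
	if (try_reset(ctx))
		return DEH_RESUMED_AFTER_RESET;
	return DEH_ESCALATED;
}

/* Stub reset implementations used to exercise the sketch. */
int reset_always_ok(void *ctx)
{
	(void)ctx;
	return 1;
}

int reset_always_fails(void *ctx)
{
	(void)ctx;
	return 0;
}
```

For example, deh_handle(1, reset_always_fails, NULL) resumes without ever
calling the reset function, which mirrors the "first error" shortcut in the
handler thread.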

>
> > Thirdly, scsi_eh doesn't give up, in most cases, when the scsi channel of
> > a Infortrend box crashed, it tried forever to recover.
> > To improve this is still on my todo list.
>
> Could you send traces for this.  I thought the error handler had been
> fixed over the last few years always to terminate.  If there's a case
> where it doesn't, this needs fixing.

I'm attaching the syslog; this is 2.6.22 plus additional printk(), dump_stack()
and msleep() calls.
At 03:59:36 the system finally went into wait_for_completion(), similar
to the "everything in wait_for_completion, what is my system doing?" thread.


Thanks,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH


* Re: [PATCH] scsi device recovery
  2007-12-12 17:54       ` Bernd Schubert
@ 2007-12-13 14:18         ` James Bottomley
  2007-12-14 11:26           ` fusion problem (was Re: [PATCH] scsi device recovery) Bernd Schubert
  2007-12-14 12:04           ` [PATCH] scsi device recovery Bernd Schubert
  0 siblings, 2 replies; 12+ messages in thread
From: James Bottomley @ 2007-12-13 14:18 UTC
  To: Bernd Schubert; +Cc: Matthew Wilcox, linux-scsi, Moore, Eric


On Wed, 2007-12-12 at 18:54 +0100, Bernd Schubert wrote:
> [Hmm, resending since mail after more than 30min still not on the ML, maybe 
> the attachment was too large? I have uploaded the log to 
> http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/scsi/kern.log.1]
> 
> On Wednesday 12 December 2007 16:59:36 James Bottomley wrote:
> > On Wed, 2007-12-12 at 15:36 +0100, Bernd Schubert wrote:
> > > On Wednesday 12 December 2007 14:39:27 Matthew Wilcox wrote:
> > > > On Wed, Dec 12, 2007 at 01:54:14PM +0100, Bernd Schubert wrote:
> > > > > below is a patch introducing device recovery, trying to prevent i/o
> > > > > errors when a DID_NO_CONNECT or SOFT_ERROR does happen.
> > > >
> > > > Why doesn't the regular scsi_eh do what you need?
> > >
> > > First of all, it is presently simply not called when the two errors above
> > > do happen. This could be changed, of course.
> >
> > Erm, I think you'll find the error handler does activate on
> > DID_SOFT_ERROR.  It causes a retry via the eh.  DID_NO_CONNECT is an
> 
> Dec  7 23:48:45 beo-96 kernel: [94605.297924] sd 2:0:5:0: [sdd] Result: 
> hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
> Dec  7 23:48:45 beo-96 kernel: [94605.297932] end_request: I/O error, dev sdd, 
> sector 7706802052
> Dec  7 23:48:45 beo-96 kernel: [94605.297937] raid5:md5: read error not 
> correctable (sector 871932472 on sdd3).

This is some type of ioc internal error.  What we do on DID_SOFT_ERROR
is retry for the usual number of times up to the timeout limit.
Unfortunately, the retries are fixed at SD_MAX_RETRIES in sd.c.  Without
diagnosing what's going wrong in the fusion, it's impossible to say if
this is reasonable, but your fusion is signalling ioc errors (firmware
errors).

> Full log attached.
> 
> > immediate error with no eh intervention because it means that the target
> > went away.  Handling this as a retryable error isn't an option because
> > it will interfere with hotplug.
> 
> Then we need a sysfs flag one can set to manually enable eh for these devices
> on DID_NO_CONNECT. 

No, because that will seriously damage a lot of other systems.

The DID_NO_CONNECT looks to be a genuine reselection issue caused by a
device out of spec on the bus.  The SPI standard says a device should
respond within 250ms, which is what most HBAs take as the default selection
timeout.  I'd say for the device you have, you need to increase this.
Unfortunately doing this for the fusion is some type of mode page
setting, I think, but I don't have the doc in front of me.  I'd be
amenable to putting the selection timeout as a parameter in the spi
transport class, since others might find it valuable occasionally to
control.

> >
> > > Secondly, I think scsi_eh is in most cases doing too much. We are
> > > fighting with flaky Infortrend boxes here, and scsi_eh sometimes manages
> > > to crash their scsi channels. In most cases it is sufficient to stall any
> > > io to the device and then to resume.
> >
> > But that's basically the default behaviour of the error handler (stall
> > then resume).
> >
> > > For most scsi devices one probably doesn't need a suspend time or it can
> > > be very small, this still needs to become configurable via sysfs.
> >
> > You mean a wait time beyond what the error handler currently does
> > (basically it waits for the quiesce, begins error handling and then
> > sends a test unit ready when it finishes before restarting).
> 
> In deh just waits on the first error and then only does a DV. For 
> these infortrend devices, thats mostly sufficient.

> > > Thirdly, scsi_eh doesn't give up, in most cases, when the scsi channel of
> > > a Infortrend box crashed, it tried forever to recover.
> > > To improve this is still on my todo list.
> >
> > Could you send traces for this.  I thought the error handler had been
> > fixed over the last few years always to terminate.  If there's a case
> > where it doesn't, this needs fixing.
> 
> I'm attaching the syslog, this is 2.6.22 + additional printks, dump_stack()'s
> and msleep()'s.
> At 03:59:36 the system finally went into wait_for_completion(), similar
> to the "everything in wait_for_completion, what is my system doing?" thread.

This looks like a genuine bug.  I missed the thread, since my email
system went off line while I was on holiday for two weeks.  The symptoms
look to be lost commands, but I can't see why from the traces.  There's
a known bug where we can hang in domain validation because of a resource
starvation issue, but I know of none where everything hangs just after
error recovery completes.

James




* fusion problem (was Re: [PATCH] scsi device recovery)
  2007-12-13 14:18         ` James Bottomley
@ 2007-12-14 11:26           ` Bernd Schubert
  2007-12-14 12:04           ` [PATCH] scsi device recovery Bernd Schubert
  1 sibling, 0 replies; 12+ messages in thread
From: Bernd Schubert @ 2007-12-14 11:26 UTC
  To: James Bottomley; +Cc: Matthew Wilcox, linux-scsi, Moore, Eric

On Thursday 13 December 2007 15:18:33 James Bottomley wrote:
> On Wed, 2007-12-12 at 18:54 +0100, Bernd Schubert wrote:
> > [Hmm, resending since mail after more than 30min still not on the ML,
> > maybe the attachment was too large? I have uploaded the log to
> > http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/scsi/kern.log.1]
> >
> > On Wednesday 12 December 2007 16:59:36 James Bottomley wrote:
> > > On Wed, 2007-12-12 at 15:36 +0100, Bernd Schubert wrote:
> > > > On Wednesday 12 December 2007 14:39:27 Matthew Wilcox wrote:
> > > > > On Wed, Dec 12, 2007 at 01:54:14PM +0100, Bernd Schubert wrote:
> > > > > > below is a patch introducing device recovery, trying to prevent
> > > > > > i/o errors when a DID_NO_CONNECT or SOFT_ERROR does happen.
> > > > >
> > > > > Why doesn't the regular scsi_eh do what you need?
> > > >
> > > > First of all, it is presently simply not called when the two errors
> > > > above do happen. This could be changed, of course.
> > >
> > > Erm, I think you'll find the error handler does activate on
> > > DID_SOFT_ERROR.  It causes a retry via the eh.  DID_NO_CONNECT is an
> >
> > Dec  7 23:48:45 beo-96 kernel: [94605.297924] sd 2:0:5:0: [sdd] Result:
> > hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
> > Dec  7 23:48:45 beo-96 kernel: [94605.297932] end_request: I/O error, dev
> > sdd, sector 7706802052
> > Dec  7 23:48:45 beo-96 kernel: [94605.297937] raid5:md5: read error not
> > correctable (sector 871932472 on sdd3).
>
> This is some type of ioc internal error.  What we do on DID_SOFT_ERROR
> is retry for the usual number of times up to the timeout limit.
> Unfortunately, the retries are fixed at SD_MAX_RETRIES in sd.c.  Without
> diagnosing what's going wrong in the fusion, it's impossible to say if
> this is reasonable, but your fusion is signalling ioc errors (firmware
> errors).

Yes, I also think this is a fusion problem. If I'm not entirely mistaken, 
it does a DV for the wrong host.

Dec  6 22:32:33 beo-96 kernel: [  106.478866] ioc0: 53C1030: Capabilities={Initiator}
Dec  6 22:32:33 beo-96 kernel: [  106.923643] scsi2 : ioc0: LSI53C1030, FwRev=01033010h, Ports=1, MaxQ=222, IRQ=16
Dec  6 22:32:33 beo-96 kernel: [  107.939374] scsi 2:0:4:0: Direct-Access     IFT      A16U-G2430       348E PQ: 0 ANSI: 5
Dec  6 22:32:33 beo-96 kernel: [  107.947632]  target2:0:4: Beginning Domain Validation
[...]
Dec  6 22:32:33 beo-96 kernel: [  108.157159] scsi 2:0:5:0: Direct-Access     IFT      A16U-G2430       348E PQ: 0 ANSI: 5
Dec  6 22:32:33 beo-96 kernel: [  108.165396]  target2:0:5: Beginning Domain Validation
[...]
Dec  6 22:32:33 beo-96 kernel: [  110.625321] mptbase: Initiating ioc1 bringup
Dec  6 22:32:33 beo-96 kernel: [  111.117987] ioc1: 53C1030: Capabilities={Initiator}
Dec  6 22:32:33 beo-96 kernel: [  111.562771] scsi3 : ioc1: LSI53C1030, FwRev=01033010h, Ports=1, MaxQ=222, IRQ=17
Dec  6 22:32:33 beo-96 kernel: [  113.829617] scsi 3:0:10:0: Direct-Access     IFT      A16U-G2430       348E PQ: 0 ANSI: 5
Dec  6 22:32:33 beo-96 kernel: [  113.837929]  target3:0:10: Beginning Domain Validation
[...]
Dec  6 22:32:33 beo-96 kernel: [  114.083750] scsi 3:0:11:0: Direct-Access     IFT      A16U-G2430       348E PQ: 0 ANSI: 5
Dec  6 22:32:33 beo-96 kernel: [  114.092085]  target3:0:11: Beginning Domain Validation

[...]

So ioc0 is host 2 (target2) with IDs 4 and 5, and ioc1 is host 3 (target3) 
with IDs 10 and 11.

As you can see from the logs I posted before (repeated below for 
completeness), the troublesome Infortrend box was on ioc1 (target3), 
but sometimes there have been domain validations for target2. 
To me the syslog suggests it simply did the DV on the wrong host.

Dec  7 23:45:14 beo-96 kernel: [94142.892782] mptbase: Initiating ioc1 recovery
Dec  7 23:45:14 beo-96 kernel: [94156.622334] mptscsih: ioc1: Issue of TaskMgmt failed!
Dec  7 23:45:14 beo-96 kernel: [94156.627458] mptscsih: ioc1: target reset: FAILED (sc=ffff8100aff2fcc0)
Dec  7 23:45:14 beo-96 kernel: [94156.634059] scsi_eh_ready_devs: !scsi_eh_bus_device_reset(), sleeping 10s
Dec  7 23:45:14 beo-96 kernel: [94156.640999]  target2:0:4: Beginning Domain Validation
Dec  7 23:45:14 beo-96 kernel: [94156.646242] 0/3 read DV
Dec  7 23:45:14 beo-96 kernel: [94156.648954]  target2:0:4: Domain Validation Initial Inquiry Failed
Dec  7 23:45:14 beo-96 kernel: [94156.655191]  target2:0:4: Ending Domain Validation
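
The comparison made by eye above (which ioc is being recovered versus which
host the following DV is addressed to) could be automated with a few small
parsers. The following is a minimal userspace sketch; parse_target(),
parse_ioc() and parse_host_binding() are hypothetical helper names, and each
only looks at the first "target", "ioc" or "scsi" substring in a line, which
is an assumption that happens to fit the log lines quoted in this mail.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract host, channel and id from the first "targetH:C:I" in a log
 * line. Returns 1 on success, 0 if the line has no such token.
 */
int parse_target(const char *line, int *host, int *channel, int *id)
{
	const char *p = strstr(line, "target");

	if (!p)
		return 0;
	return sscanf(p, "target%d:%d:%d", host, channel, id) == 3;
}

/* Extract N from the first "iocN" in a log line. Returns 1 on success. */
int parse_ioc(const char *line, int *ioc)
{
	const char *p = strstr(line, "ioc");

	if (!p)
		return 0;
	return sscanf(p, "ioc%d", ioc) == 1;
}

/*
 * Extract the host/ioc pairing from a line such as
 * "scsi3 : ioc1: LSI53C1030, ...". Returns 1 on success.
 */
int parse_host_binding(const char *line, int *host, int *ioc)
{
	const char *p = strstr(line, "scsi");

	if (!p)
		return 0;
	return sscanf(p, "scsi%d : ioc%d", host, ioc) == 2;
}
```

Feeding the bring-up lines through parse_host_binding() gives the ioc to
host mapping, and a DV line whose parse_target() host differs from the host
of the ioc under recovery is the kind of mismatch reported here.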


Best,
Bernd


-- 
Bernd Schubert
Q-Leap Networks GmbH


* Re: [PATCH] scsi device recovery
  2007-12-13 14:18         ` James Bottomley
  2007-12-14 11:26           ` fusion problem (was Re: [PATCH] scsi device recovery) Bernd Schubert
@ 2007-12-14 12:04           ` Bernd Schubert
  2007-12-14 12:22             ` Matthew Wilcox
  2007-12-14 14:35             ` James Bottomley
  1 sibling, 2 replies; 12+ messages in thread
From: Bernd Schubert @ 2007-12-14 12:04 UTC
  To: James Bottomley; +Cc: Matthew Wilcox, linux-scsi, Moore, Eric

Hello James,

On Thursday 13 December 2007 15:18:33 James Bottomley wrote:
> On Wed, 2007-12-12 at 18:54 +0100, Bernd Schubert wrote:
> > [Hmm, resending since mail after more than 30min still not on the ML,
> > maybe the attachment was too large? I have uploaded the log to
> > http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/scsi/kern.log.1]
> >
> > On Wednesday 12 December 2007 16:59:36 James Bottomley wrote:
> > > On Wed, 2007-12-12 at 15:36 +0100, Bernd Schubert wrote:
> > > > On Wednesday 12 December 2007 14:39:27 Matthew Wilcox wrote:
> > > > > On Wed, Dec 12, 2007 at 01:54:14PM +0100, Bernd Schubert wrote:
> > > > > > below is a patch introducing device recovery, trying to prevent
> > > > > > i/o errors when a DID_NO_CONNECT or SOFT_ERROR does happen.
> > > > >
> > > > > Why doesn't the regular scsi_eh do what you need?
> > > >
> > > > First of all, it is presently simply not called when the two errors
> > > > above do happen. This could be changed, of course.
> > >
> > > Erm, I think you'll find the error handler does activate on
> > > DID_SOFT_ERROR.  It causes a retry via the eh.  DID_NO_CONNECT is an
> >
> > Dec  7 23:48:45 beo-96 kernel: [94605.297924] sd 2:0:5:0: [sdd] Result:
> > hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
> > Dec  7 23:48:45 beo-96 kernel: [94605.297932] end_request: I/O error, dev
> > sdd, sector 7706802052
> > Dec  7 23:48:45 beo-96 kernel: [94605.297937] raid5:md5: read error not
> > correctable (sector 871932472 on sdd3).
>
> This is some type of ioc internal error.  What we do on DID_SOFT_ERROR
> is retry for the usual number of times up to the timeout limit.
> Unfortunately, the retries are fixed at SD_MAX_RETRIES in sd.c.  Without
> diagnosing what's going wrong in the fusion, it's impossible to say if
> this is reasonable, but your fusion is signalling ioc errors (firmware
> errors).

Apart from this looking like a fusion driver or firmware problem, I still think
eh is not activated for this error. I'm not absolutely sure, but I think with my
patch deh, and later on eh, would be triggered, wouldn't they?

>
> > Full log attached.
> >
> > > immediate error with no eh intervention because it means that the
> > > target went away.  Handling this as a retryable error isn't an option
> > > because it will interfere with hotplug.
> >
> > Then we need a sysfs flag one can set to manually enable eh for these
> > devices on DID_NO_CONNECT.
>
> No, because that will seriously damage a lot of other systems.

How would it, if we create a device-specific sysfs parameter defaulting to
off? If you think users could activate it by accident, we could also print a
big warning when the parameter is read from userspace.
Furthermore, as far as I understood you, DID_NO_CONNECT is only required
for hotplugging. But real scsi doesn't do automatic hotplugging, does it? One
always needs to do it manually, e.g. with scsiadd or similar tools. So is
DID_NO_CONNECT really required for native scsi? If not, we could also have
the scsi drivers set a flag to activate eh on DID_NO_CONNECT.
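
To make the proposal concrete, here is a minimal userspace sketch of such an
opt-in flag. It is only a model: recover_on_no_connect, struct device_model,
the MODEL_*/ACT_* names and classify() are all hypothetical and exist neither
in the mid layer nor in sysfs. The only point it illustrates is that the flag
defaults to off, so DID_NO_CONNECT keeps failing immediately unless someone
sets it for that one device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the host bytes discussed in this thread (not kernel values). */
enum host_byte_model {
	MODEL_DID_OK,
	MODEL_DID_NO_CONNECT,
	MODEL_DID_SOFT_ERROR
};

/* What the model decides to do with a completed command. */
enum action_model {
	ACT_COMPLETE,         /* command succeeded */
	ACT_FAIL_IMMEDIATELY, /* report the error upwards, no recovery */
	ACT_RETRY,            /* resubmit within the retry budget */
	ACT_RECOVER           /* hand the device to error recovery */
};

struct device_model {
	/* Hypothetical per-device knob; zero-initialised, i.e. off by default. */
	bool recover_on_no_connect;
};

/* Map a host byte to an action, honouring the per-device opt-in. */
static enum action_model classify(const struct device_model *dev,
				  enum host_byte_model hb)
{
	assert(dev != NULL);

	switch (hb) {
	case MODEL_DID_OK:
		return ACT_COMPLETE;
	case MODEL_DID_SOFT_ERROR:
		return ACT_RETRY;
	case MODEL_DID_NO_CONNECT:
		/* Only devices that were explicitly opted in get recovery. */
		return dev->recover_on_no_connect ? ACT_RECOVER
						  : ACT_FAIL_IMMEDIATELY;
	}
	return ACT_FAIL_IMMEDIATELY;
}
```

With a zero-initialised struct device_model, classify() returns
ACT_FAIL_IMMEDIATELY for MODEL_DID_NO_CONNECT, which is today's behaviour;
only after setting recover_on_no_connect does it return ACT_RECOVER.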

>
> The DID_NO_CONNECT looks to be a genuine reselection issue caused by a
> device out of spec on the bus.  The SPI standard says a device should
> respond in 250ms, which is what most HBA's take as the default selection
> timeout.  I'd say for the device you have, you need to increase this.
> Unfortunately doing this for the fusion is some type of mode page
> setting, I think, but I don't have the doc in front of me.  I'd be
> amenable to putting the selection timeout as a parameter in the spi
> transport class, since others might find it valuable occasionally to
> control.

It's of course best to fix the real cause of our problems. I have now asked
Infortrend which value should be used for their devices.

Eric, I would be grateful if you could point me to the code fragment using or
setting the selection (response) timeout.


[...]

> > I'm attaching the syslog, this is 2.6.22 + additional printks,
> > dump_stack()'s and msleep()'s.
> > At 03:59:36 the system finally went into wait_for_completion(), similar
> > to the "everything in wait_for_completion, what is my system doing?"
> > thread.
>
> This looks like a genuine bug.  I missed the thread, since my email
> system went off line while I was on holiday for two weeks.  The symptoms
> look to be lost commands, but I can't see why from the traces.  There's
> a known bug where we can hang in domain validation because of a resource
> starvation issue, but I know of none where everything hangs just after
> error recovery completes.

Since not much has happened yet to solve this bug, shall I create a bugzilla
entry?


Thanks a lot,
Bernd

PS: Do you have some links to scsi and SPI specs? 

-- 
Bernd Schubert
Q-Leap Networks GmbH


* Re: [PATCH] scsi device recovery
  2007-12-14 12:04           ` [PATCH] scsi device recovery Bernd Schubert
@ 2007-12-14 12:22             ` Matthew Wilcox
  2007-12-14 12:28               ` Bernd Schubert
  2007-12-14 14:35             ` James Bottomley
  1 sibling, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2007-12-14 12:22 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: James Bottomley, linux-scsi, Moore, Eric

On Fri, Dec 14, 2007 at 01:04:12PM +0100, Bernd Schubert wrote:
> PS: Do you have some links to scsi and SPI specs? 

The final versions are available for a fee from ANSI.  However,
you can download the final draft versions for free from
http://www.t10.org/drafts.htm  You probably want:

http://www.t10.org/ftp/t10/drafts/sam3/sam3r14.pdf (SCSI Architecture)
http://www.t10.org/ftp/t10/drafts/spi5/spi5r06.pdf (Parallel SCSI)
http://www.t10.org/ftp/t10/drafts/sdv/sdv-r08b.pdf (Domain Validation)
http://www.t10.org/ftp/t10/drafts/spc3/spc3r23.pdf (Primary Commands)
http://www.t10.org/ftp/t10/drafts/sbc2/sbc2r16.pdf (Block Commands)

I sometimes find it easier to look at SCSI-2 to find
things which haven't changed in the last fourteen years:
http://www.t10.org/ftp/t10/drafts/s2/s2-r10l.pdf

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: [PATCH] scsi device recovery
  2007-12-14 12:22             ` Matthew Wilcox
@ 2007-12-14 12:28               ` Bernd Schubert
  0 siblings, 0 replies; 12+ messages in thread
From: Bernd Schubert @ 2007-12-14 12:28 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: James Bottomley, linux-scsi, Moore, Eric

On Friday 14 December 2007 13:22:55 Matthew Wilcox wrote:
> On Fri, Dec 14, 2007 at 01:04:12PM +0100, Bernd Schubert wrote:
> > PS: Do you have some links to scsi and SPI specs?
>
> The final versions are available for a fee from ANSI.  However,
> you can download the final draft versions for free from
> http://www.t10.org/drafts.htm  You probably want:
>
> http://www.t10.org/ftp/t10/drafts/sam3/sam3r14.pdf (SCSI Architecture)
> http://www.t10.org/ftp/t10/drafts/spi5/spi5r06.pdf (Parallel SCSI)
> http://www.t10.org/ftp/t10/drafts/sdv/sdv-r08b.pdf (Domain Validation)
> http://www.t10.org/ftp/t10/drafts/spc3/spc3r23.pdf (Primary Commands)
> http://www.t10.org/ftp/t10/drafts/sbc2/sbc2r16.pdf (Block Commands)
>
> I sometimes find it easier to look at SCSI-2 to find
> things which haven't changed in the last fourteen years:
> http://www.t10.org/ftp/t10/drafts/s2/s2-r10l.pdf

Thanks a lot!!!


-- 
Bernd Schubert
Q-Leap Networks GmbH


* Re: [PATCH] scsi device recovery
  2007-12-14 12:04           ` [PATCH] scsi device recovery Bernd Schubert
  2007-12-14 12:22             ` Matthew Wilcox
@ 2007-12-14 14:35             ` James Bottomley
  2007-12-14 15:26               ` Bernd Schubert
  1 sibling, 1 reply; 12+ messages in thread
From: James Bottomley @ 2007-12-14 14:35 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: Matthew Wilcox, linux-scsi, Moore, Eric


On Fri, 2007-12-14 at 13:04 +0100, Bernd Schubert wrote:
> Hello James,
> 
> On Thursday 13 December 2007 15:18:33 James Bottomley wrote:
> > On Wed, 2007-12-12 at 18:54 +0100, Bernd Schubert wrote:
> > > [Hmm, resending since mail after more than 30min still not on the ML,
> > > maybe the attachment was too large? I have uploaded the log to
> > > http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/scsi/kern.log.1]
> > >
> > > On Wednesday 12 December 2007 16:59:36 James Bottomley wrote:
> > > > On Wed, 2007-12-12 at 15:36 +0100, Bernd Schubert wrote:
> > > > > On Wednesday 12 December 2007 14:39:27 Matthew Wilcox wrote:
> > > > > > On Wed, Dec 12, 2007 at 01:54:14PM +0100, Bernd Schubert wrote:
> > > > > > > below is a patch introducing device recovery, trying to prevent
> > > > > > > i/o errors when a DID_NO_CONNECT or SOFT_ERROR does happen.
> > > > > >
> > > > > > Why doesn't the regular scsi_eh do what you need?
> > > > >
> > > > > First of all, it is presently simply not called when the two errors
> > > > > above do happen. This could be changed, of course.
> > > >
> > > > Erm, I think you'll find the error handler does activate on
> > > > DID_SOFT_ERROR.  It causes a retry via the eh.  DID_NO_CONNECT is an
> > >
> > > Dec  7 23:48:45 beo-96 kernel: [94605.297924] sd 2:0:5:0: [sdd] Result:
> > > hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
> > > Dec  7 23:48:45 beo-96 kernel: [94605.297932] end_request: I/O error, dev
> > > sdd, sector 7706802052
> > > Dec  7 23:48:45 beo-96 kernel: [94605.297937] raid5:md5: read error not
> > > correctable (sector 871932472 on sdd3).
> >
> > This is some type of ioc internal error.  What we do on DID_SOFT_ERROR
> > is retry for the usual number of times up to the timeout limit.
> > Unfortunately, the retries are fixed at SD_MAX_RETRIES in sd.c.  Without
> > diagnosing what's going wrong in the fusion, it's impossible to say if
> > this is reasonable, but your fusion is signalling ioc errors (firmware
> > errors).
> 
> Apart from this looking like a fusion driver or firmware problem, I still think
> eh is not activated for this error. I'm not absolutely sure, but I think with my
> patch deh, and later on eh, would be triggered, wouldn't they?

The full eh machinery, by design, isn't activated for a simple retry.
If you look in scsi_lib.c:scsi_softirq_done() you'll see the processing
of the outcome of scsi_decide_disposition() (DID_SOFT_ERROR comes out of
here with NEEDS_RETRY, providing there are retries left).  Right at the
moment, this means that the retry is absolutely immediate, so you
probably run through all of the retries before firmware recovery even
has time to activate.  I'd be amenable to giving it an ADD_TO_MLQUEUE
type return (provided it still increments retries) which will cause a
pause in the resubmission (until either a command returns or io pressure
builds up in the block layer).
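
A rough userspace model of the retry accounting described above, as a sketch
only. The struct, the enum and both function names are stand-ins invented for
illustration, not kernel definitions. It assumes the requeue variant still
increments the retry counter, so the budget is consumed identically either
way; what the model cannot show is the timing, which is the whole point of the
change: NEEDS_RETRY resubmits immediately, ADD_TO_MLQUEUE defers the
resubmission.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the dispositions named in this thread (not kernel values). */
enum disposition_model {
	DISP_SUCCESS,        /* finished: pass the result to the upper layer */
	DISP_NEEDS_RETRY,    /* resubmit immediately */
	DISP_ADD_TO_MLQUEUE  /* resubmit later via the mid-layer queue */
};

struct cmd_model {
	int retries;   /* attempts consumed so far */
	int allowed;   /* retry budget, e.g. what sd derives from SD_MAX_RETRIES */
	bool noretry;  /* models a request marked "do not retry" */
};

/* Decide what to do with one DID_SOFT_ERROR completion. */
static enum disposition_model decide_soft_error(struct cmd_model *cmd,
						bool requeue)
{
	if (++cmd->retries <= cmd->allowed && !cmd->noretry)
		return requeue ? DISP_ADD_TO_MLQUEUE : DISP_NEEDS_RETRY;

	/* Budget exhausted: the error goes back to the upper layer. */
	return DISP_SUCCESS;
}

/* Count how often a permanently failing command gets resubmitted. */
static int count_resubmissions(int allowed, bool requeue)
{
	struct cmd_model cmd = { 0, allowed, false };
	int resubmissions = 0;

	assert(allowed >= 0);
	while (decide_soft_error(&cmd, requeue) != DISP_SUCCESS)
		resubmissions++;
	return resubmissions;
}
```

For a given budget, count_resubmissions() gives the same number with requeue
set or cleared, so switching the return type alone does not buy extra
attempts; it only spreads the existing ones out in time.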

> >
> > > Full log attached.
> > >
> > > > immediate error with no eh intervention because it means that the
> > > > target went away.  Handling this as a retryable error isn't an option
> > > > because it will interfere with hotplug.
> > >
> > > Then we need a sysfs flag one can set to manually enable eh for these
> > > devices on DID_NO_CONNECT.
> >
> > No, because that will seriously damage a lot of other systems.
> 
> How would it, if we create a device-specific sysfs parameter defaulting to
> off? If you think users could activate it by accident, we could also print a
> big warning when the parameter is read from userspace.
> Furthermore, as far as I understood you, DID_NO_CONNECT is only required
> for hotplugging. But real scsi doesn't do automatic hotplugging, does it? 

Yes, it does.  Most modern busses are hot plug aware and use
DID_NO_CONNECT to signal target went away.  Even some SPI frames are
quasi hotplug aware.

> One 
> always needs to do it manually, e.g. with scsiadd or similar tools. So is 
> DID_NO_CONNECT really required for native scsi? If not, we could also have
> the scsi drivers set a flag to activate eh on DID_NO_CONNECT.

Just grep through the mid layer ... you'll see we use DID_NO_CONNECT on
a host of other error conditions to force an immediate error as well.

> >
> > The DID_NO_CONNECT looks to be a genuine reselection issue caused by a
> > device out of spec on the bus.  The SPI standard says a device should
> > respond in 250ms, which is what most HBA's take as the default selection
> > timeout.  I'd say for the device you have, you need to increase this.
> > Unfortunately doing this for the fusion is some type of mode page
> > setting, I think, but I don't have the doc in front of me.  I'd be
> > amenable to putting the selection timeout as a parameter in the spi
> > transport class, since others might find it valuable occasionally to
> > control.
> 
> It's of course best to fix the real cause of our problems. I have now asked
> Infortrend which value should be used for their devices.
> 
> Eric, I would be grateful if you could point me to the code fragment using or
> setting the selection (response) timeout.
> 
> 
> [...]
> 
> > > I'm attaching the syslog, this is 2.6.22 + additional printks,
> > > dump_stack()'s and msleep()'s.
> > > At 03:59:36 the system finally went into wait_for_completion(), similar
> > > to the "everything in wait_for_completion, what is my system doing?"
> > > thread.
> >
> > This looks like a genuine bug.  I missed the thread, since my email
> > system went off line while I was on holiday for two weeks.  The symptoms
> > look to be lost commands, but I can't see why from the traces.  There's
> > a known bug where we can hang in domain validation because of a resource
> > starvation issue, but I know of none where everything hangs just after
> > error recovery completes.
> 
> Since not much has happened yet to solve this bug, shall I create a bugzilla
> entry?

Sure ... on further analysis, it is the fusion DV resource starvation
issue.  The email thread is here:

http://marc.info/?t=118039577800004

James




* Re: [PATCH] scsi device recovery
  2007-12-14 14:35             ` James Bottomley
@ 2007-12-14 15:26               ` Bernd Schubert
  0 siblings, 0 replies; 12+ messages in thread
From: Bernd Schubert @ 2007-12-14 15:26 UTC (permalink / raw)
  To: James Bottomley; +Cc: Matthew Wilcox, linux-scsi, Moore, Eric

On Friday 14 December 2007 15:35:01 James Bottomley wrote:
> > > This is some type of ioc internal error.  What we do on DID_SOFT_ERROR
> > > is retry for the usual number of times up to the timeout limit.
> > > Unfortunately, the retries are fixed at SD_MAX_RETRIES in sd.c. 
> > > Without diagnosing what's going wrong in the fusion, it's impossible to
> > > say if this is reasonable, but your fusion is signalling ioc errors
> > > (firmware errors).
> >
> > Apart from this looking like a fusion driver or firmware problem, I still
> > think eh is not activated for this error. I'm not absolutely sure, but I
> > think with my patch deh, and later on eh, would be triggered, wouldn't they?
>
> The full eh machinery, by design, isn't activated for a simple retry.
> If you look in scsi_lib.c:scsi_softirq_done() you'll see the processing
> of the outcome of scsi_decide_disposition() (DID_SOFT_ERROR comes out of
> here with NEEDS_RETRY, providing there are retries left).  Right at the
> moment, this means that the retry is absolutely immediate, so you
> probably run through all of the retries before firmware recovery even
> has time to activate.  I'd be amenable to giving it an ADD_TO_MLQUEUE
> type return (provided it still increments retries) which will cause a
> pause in the resubmission (until either a command returns or io pressure
> builds up in the block layer).

Isn't there always i/o pressure if the scsi bus is saturated? Can we activate
the eh machinery once the retries are exhausted?


Index: linux-2.6.22/drivers/scsi/scsi_error.c
===================================================================
--- linux-2.6.22.orig/drivers/scsi/scsi_error.c	2007-12-14 15:53:48.000000000 +0100
+++ linux-2.6.22/drivers/scsi/scsi_error.c	2007-12-14 15:58:27.000000000 +0100
@@ -1235,7 +1235,7 @@ int scsi_decide_disposition(struct scsi_
 		 * and not get stuck in a loop.
 		 */
 	case DID_SOFT_ERROR:
-		goto maybe_retry;
+		goto maybe_requeue;
 	case DID_IMM_RETRY:
 		return NEEDS_RETRY;
 
@@ -1342,6 +1342,24 @@ int scsi_decide_disposition(struct scsi_
 		 */
 		return SUCCESS;
 	}
+
+      maybe_requeue:
+
+	/* we requeue for retry because the error was retryable, and
+	 * the request was not marked fast fail.  Note that above,
+	 * even if the request is marked fast fail, we still requeue
+	 * for queue congestion conditions (QUEUE_FULL or BUSY) */
+	if ((++scmd->retries) <= scmd->allowed
+	    && !blk_noretry_request(scmd->request)) {
+		return ADD_TO_MLQUEUE;
+	} else {
+		/*
+		 * no more retries - report this one back to upper level.
+		 *
+		 * TODO: initiate full error recovery now?
+		 */
+		return SUCCESS;
+	}
 }
 
 /**


>
> > > > Full log attached.
> > > >
> > > > > immediate error with no eh intervention because it means that the
> > > > > target went away.  Handling this as a retryable error isn't an
> > > > > option because it will interfere with hotplug.
> > > >
> > > > Then we need a sysfs flag one can set to manually enable eh for these
> > > > devices on DID_NO_CONNECT.
> > >
> > > No, because that will seriously damage a lot of other systems.
> >
> > How would it, if we create a device-specific sysfs parameter defaulting
> > to off? If you think users could activate it by accident, we could also
> > print a big warning when the parameter is read from userspace.
> > Furthermore, as far as I understood you, DID_NO_CONNECT is only
> > required for hotplugging. But real scsi doesn't do automatic hotplugging,
> > does it?
>
> Yes, it does.  Most modern busses are hot plug aware and use
> DID_NO_CONNECT to signal target went away.  Even some SPI frames are
> quasi hotplug aware.
>
> > One
> > always needs to do it manually, e.g. with scsiadd or similar tools. So is
> > DID_NO_CONNECT really required for native scsi? If not, we could also
> > have the scsi drivers set a flag to activate eh on DID_NO_CONNECT.
>
> Just grep through the mid layer ... you'll see we use DID_NO_CONNECT on
> a host of other error conditions to force an immediate error as well.

I will do so later on. I will also write a patch allowing error recovery for
manually overridden devices.

[...]

> > > This looks like a genuine bug.  I missed the thread, since my email
> > > system went off line while I was on holiday for two weeks.  The
> > > symptoms look to be lost commands, but I can't see why from the traces.
> > >  There's a known bug where we can hang in domain validation because of
> > > a resource starvation issue, but I know of none where everything hangs
> > > just after error recovery completes.
> >
> > Since not much has happened yet to solve this bug, shall I create a bugzilla
> > entry?
>
> Sure ... on further analysis, it is the fusion DV resource starvation
> issue.  The email thread is here:
>
> http://marc.info/?t=118039577800004


Interesting thread, I don't understand the details yet, but I'm really curious 
if this can somehow also explain the *almost deadlock* we are seeing when we 
do md-resync at maximum device speed.


Thanks a lot for your help,
Bernd


-- 
Bernd Schubert
Q-Leap Networks GmbH


Thread overview: 12+ messages
2007-12-12 12:54 [PATCH] scsi device recovery Bernd Schubert
2007-12-12 13:39 ` Matthew Wilcox
2007-12-12 14:36   ` Bernd Schubert
2007-12-12 15:59     ` James Bottomley
2007-12-12 17:54       ` Bernd Schubert
2007-12-13 14:18         ` James Bottomley
2007-12-14 11:26           ` fusion problem (was Re: [PATCH] scsi device recovery) Bernd Schubert
2007-12-14 12:04           ` [PATCH] scsi device recovery Bernd Schubert
2007-12-14 12:22             ` Matthew Wilcox
2007-12-14 12:28               ` Bernd Schubert
2007-12-14 14:35             ` James Bottomley
2007-12-14 15:26               ` Bernd Schubert
