public inbox for linux-scsi@vger.kernel.org
* [PATCH] improvement of fastfail operation
@ 2004-03-24  0:38 Masao Fukuchi
From: Masao Fukuchi @ 2004-03-24  0:38 UTC
  To: linux-scsi

Hi all,

We are planning to use Linux for an enterprise server.
Since data reliability is an important factor, this server uses RAID
or a clustering system.
The server also needs to respond to the host quickly (< 30 sec) even if
a device or path error occurs.

We are planning to use the fastfail flag for this purpose.
We reviewed the fastfail sequence, but its behavior is inadequate for
some error cases (mainly command timeout).

We propose the following improvements for fastfail.

1. Honor the fastfail flag on command timeout.
   Currently the fastfail flag has no effect on command timeout, and the
   command is repeated 4 times.
2. Set the timeout value to 10 sec.
   Currently the timeout value is 30 sec.
3. Set the wait time after a bus reset/host reset to 5 sec.
   Currently the wait time is 10 sec.
   (In many cases the abort task command fails after a command timeout,
   so a bus reset or host reset operation is needed.)
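
As a rough illustration of the three items above, here is a minimal
user-space sketch. It is not kernel code: struct model_cmd and the
model_* functions are hypothetical names invented for this example, and
the fastfail field only stands in for blk_noretry_request().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a command; NOT struct scsi_cmnd. */
struct model_cmd {
	bool fastfail;	/* stands in for blk_noretry_request(scmd->request) */
	int retries;	/* attempts counted so far */
	int allowed;	/* retry limit */
};

/* Item 1: a timed-out fastfail command is never queued for retry. */
static bool model_retry_after_timeout(struct model_cmd *cmd)
{
	assert(cmd != NULL);
	/* The counter is bumped first, mirroring the order of tests in the patch. */
	return (++cmd->retries < cmd->allowed) && !cmd->fastfail;
}

/* Item 2: 10 sec command timeout for fastfail requests, otherwise 30 sec. */
static int model_cmd_timeout_sec(const struct model_cmd *cmd)
{
	assert(cmd != NULL);
	return cmd->fastfail ? 10 : 30;
}

/* Item 3: 5 sec settle time after a reset for fastfail, otherwise 10 sec. */
static int model_reset_settle_sec(const struct model_cmd *cmd)
{
	assert(cmd != NULL);
	return cmd->fastfail ? 5 : 10;
}
```

In this model a command without the flag keeps the existing behavior
(30 sec timeout, 10 sec settle time, retried up to the limit), so only
fastfail requests are affected.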

The timeout values are derived from:
  timeout(10sec)+Abort/Bus reset(5sec+)+alt retry timeout(10sec) < 30sec
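
The budget above can be checked with a small user-space sketch. The
struct ff_budget type and the ff_budget_* helpers are hypothetical names
used only for this illustration, and the sum is a lower bound because
the "5sec+" term excludes the time spent on the abort and reset
themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the three terms of the budget, in seconds. */
struct ff_budget {
	int cmd_timeout_sec;		/* first command timeout */
	int reset_settle_sec;		/* settle time after bus/host reset (lower bound) */
	int alt_retry_timeout_sec;	/* timeout of the retry on the alternate path */
};

/* Sum of the three terms; the real elapsed time can only be longer. */
static int ff_budget_min_total_sec(const struct ff_budget *b)
{
	assert(b != NULL);
	return b->cmd_timeout_sec + b->reset_settle_sec +
	       b->alt_retry_timeout_sec;
}

/* True if the lower bound stays strictly under the response limit. */
static bool ff_budget_fits(const struct ff_budget *b, int limit_sec)
{
	return ff_budget_min_total_sec(b) < limit_sec;
}
```

With the proposed values (10, 5, 10) the lower bound is 25 sec, which
leaves about 5 sec for the abort attempt and the reset itself. With the
current values, and assuming the alternate-path retry also uses the
30 sec timeout, the sum is 30 + 10 + 30 = 70 sec.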

This is one idea for getting a quick response on device/path errors.
If you have any comments or ideas on these improvements, please let me know.

Thanks,
Masao Fukuchi

diff -urN linux-2.6.4/drivers/scsi/scsi_error.c linux-2.6.4FF/drivers/scsi/scsi_error.c
--- linux-2.6.4/drivers/scsi/scsi_error.c       2004-02-18 12:57:12.000000000 +0900
+++ linux-2.6.4FF/drivers/scsi/scsi_error.c     2004-03-18 16:59:50.000000000 +0900
@@ -43,6 +43,8 @@
  */
 #define BUS_RESET_SETTLE_TIME   10*HZ
 #define HOST_RESET_SETTLE_TIME  10*HZ
+#define BUS_RESET_SETTLE_TIME_FAST   5*HZ
+#define HOST_RESET_SETTLE_TIME_FAST  5*HZ

 /* called with shost->host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
@@ -909,7 +911,10 @@
        spin_unlock_irqrestore(scmd->device->host->host_lock, flags);

        if (rtn == SUCCESS) {
-               scsi_sleep(BUS_RESET_SETTLE_TIME);
+               if (blk_noretry_request(scmd->request))
+                       scsi_sleep(BUS_RESET_SETTLE_TIME_FAST);
+               else
+                       scsi_sleep(BUS_RESET_SETTLE_TIME);
                spin_lock_irqsave(scmd->device->host->host_lock, flags);
                scsi_report_bus_reset(scmd->device->host, scmd->device->channel);
                spin_unlock_irqrestore(scmd->device->host->host_lock, flags);
@@ -940,7 +945,10 @@
        spin_unlock_irqrestore(scmd->device->host->host_lock, flags);

        if (rtn == SUCCESS) {
-               scsi_sleep(HOST_RESET_SETTLE_TIME);
+               if (blk_noretry_request(scmd->request))
+                       scsi_sleep(HOST_RESET_SETTLE_TIME_FAST);
+               else
+                       scsi_sleep(HOST_RESET_SETTLE_TIME);
                spin_lock_irqsave(scmd->device->host->host_lock, flags);
                scsi_report_bus_reset(scmd->device->host, scmd->device->channel);
                spin_unlock_irqrestore(scmd->device->host->host_lock, flags);
@@ -1421,7 +1429,8 @@
                scmd = list_entry(lh, struct scsi_cmnd, eh_entry);
                list_del_init(lh);
                if (scmd->device->online &&
-                       (++scmd->retries < scmd->allowed)) {
+                       (++scmd->retries < scmd->allowed) &&
+                       (!blk_noretry_request(scmd->request))) {
                        SCSI_LOG_ERROR_RECOVERY(3, printk("%s: flush"
                                                          " retry cmd: %p\n",
                                                          current->comm,
diff -urN linux-2.6.4/drivers/scsi/sd.c linux-2.6.4FF/drivers/scsi/sd.c
--- linux-2.6.4/drivers/scsi/sd.c       2004-03-18 16:12:01.000000000 +0900
+++ linux-2.6.4FF/drivers/scsi/sd.c     2004-03-18 17:14:36.000000000 +0900
@@ -67,6 +67,7 @@
  * Time out in seconds for disks and Magneto-opticals (which are slower).
  */
 #define SD_TIMEOUT             (30 * HZ)
+#define SD_TIMEOUT_FAST         (10 * HZ)
 #define SD_MOD_TIMEOUT         (75 * HZ)

 /*
@@ -178,7 +179,10 @@
        sector_t block;
        struct scsi_device *sdp = SCpnt->device;

-       timeout = SD_TIMEOUT;
+        if (blk_noretry_request(SCpnt->request))
+               timeout = SD_TIMEOUT_FAST;
+        else
+               timeout = SD_TIMEOUT;
        if (SCpnt->device->type != TYPE_DISK)
                timeout = SD_MOD_TIMEOUT;

