* [RFC PATCH 4/4] convert scsi to blkerr error values
@ 2005-08-24 9:04 Mike Christie
0 siblings, 0 replies; 7+ messages in thread
From: Mike Christie @ 2005-08-24 9:04 UTC (permalink / raw)
To: axboe, linux-scsi, dm-devel
This patch convert scsi to use the BLKERR values.
It uses the scsi_cmnd->result values to try
and "guess" (LLDs are not all consistent in
their host byte usage) if it is a transport or
device or driver error. Maybe it would be better to
have the transport classes define new host byte error
values (or add a transport byte), and scsi-ml can then
call into the transport class to tell it if the error
is retryable or fatal.
If this approach is ok, I guess the next step
is more testing and figure out how to best
support vendor specifics. While converting
dm-emc's asc/ascq code there are comments
indicating we might need to send additional
commands to try and figure out what to do
(Enginio or Sun people did you need this too?)
The patch has been lightly tested with iscsi_sfnet
and open-iscsi using dm-multipath to verify that
when we send retryable errors dm-multipath does
nice things like not automatically fail a path
and fail the IO.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -1269,6 +1269,7 @@ int scsi_device_cancel(struct scsi_devic
scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
} else {
scmd->result = (DID_ABORT << 16);
+ scmd->blk_result = BLKERR_FATAL_DEV;
scsi_finish_command(scmd);
}
}
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1213,6 +1213,7 @@ int scsi_decide_disposition(struct scsi_
SCSI_LOG_ERROR_RECOVERY(5, printk("%s: device offline - report"
" as SUCCESS\n",
__FUNCTION__));
+ scmd->blk_result = BLKERR_FATAL_DEV;
return SUCCESS;
}
@@ -1228,6 +1229,7 @@ int scsi_decide_disposition(struct scsi_
* did_ok.
*/
scmd->result &= 0xff00ffff;
+ scmd->blk_result = BLK_SUCCESS;
return SUCCESS;
case DID_OK:
/*
@@ -1236,7 +1238,10 @@ int scsi_decide_disposition(struct scsi_
break;
case DID_NO_CONNECT:
case DID_BAD_TARGET:
+ scmd->blk_result = BLKERR_FATAL_XPT;
+ return SUCCESS;
case DID_ABORT:
+ scmd->blk_result = BLKERR_RETRY_DEV;
/*
* note - this means that we just report the status back
* to the top level driver, not that we actually think
@@ -1253,6 +1258,7 @@ int scsi_decide_disposition(struct scsi_
* and not get stuck in a loop.
*/
case DID_SOFT_ERROR:
+ scmd->blk_result = BLKERR_RETRY_DEV;
goto maybe_retry;
case DID_IMM_RETRY:
return NEEDS_RETRY;
@@ -1269,11 +1275,12 @@ int scsi_decide_disposition(struct scsi_
*/
break;
/* fallthrough */
-
case DID_BUS_BUSY:
case DID_PARITY:
+ scmd->blk_result = BLKERR_RETRY_XPT;
goto maybe_retry;
case DID_TIME_OUT:
+ scmd->blk_result = BLKERR_FATAL_DEV;
/*
* when we scan the bus, we get timeout messages for
* these commands if there is no device available.
@@ -1286,16 +1293,20 @@ int scsi_decide_disposition(struct scsi_
return FAILED;
}
case DID_RESET:
+ scmd->blk_result = BLKERR_RETRY_DEV;
return SUCCESS;
default:
+ scmd->blk_result = BLKERR_FATAL_DEV;
return FAILED;
}
/*
* next, check the message byte.
*/
- if (msg_byte(scmd->result) != COMMAND_COMPLETE)
+ if (msg_byte(scmd->result) != COMMAND_COMPLETE) {
+ scmd->blk_result = BLKERR_FATAL_DEV;
return FAILED;
+ }
/*
* check the status byte to see if this indicates anything special.
@@ -1315,13 +1326,19 @@ int scsi_decide_disposition(struct scsi_
*/
return ADD_TO_MLQUEUE;
case GOOD:
+ scmd->blk_result = BLK_SUCCESS;
+ return SUCCESS;
case COMMAND_TERMINATED:
case TASK_ABORTED:
+ scmd->blk_result = BLKERR_FATAL_DEV;
return SUCCESS;
case CHECK_CONDITION:
rtn = scsi_check_sense(scmd);
- if (rtn == NEEDS_RETRY)
+ if (rtn == NEEDS_RETRY) {
+ scmd->blk_result = BLKERR_RETRY_DEV;
goto maybe_retry;
+ } else
+ scmd->blk_result = BLKERR_FATAL_DEV;
/* if rtn == FAILED, we have no sense information;
* returning FAILED will wake the error handler thread
* to collect the sense and redo the decide
@@ -1334,6 +1351,7 @@ int scsi_decide_disposition(struct scsi_
/*
* who knows? FIXME(eric)
*/
+ scmd->blk_result = BLK_SUCCESS;
return SUCCESS;
case RESERVATION_CONFLICT:
@@ -1341,10 +1359,13 @@ int scsi_decide_disposition(struct scsi_
" %d channel %d id %d lun %d\n",
scmd->device->host->host_no, scmd->device->channel,
scmd->device->id, scmd->device->lun);
+ scmd->blk_result = BLK_SUCCESS;
return SUCCESS; /* causes immediate i/o error */
default:
+ scmd->blk_result = BLKERR_FATAL_DEV;
return FAILED;
}
+ scmd->blk_result = BLKERR_FATAL_DEV;
return FAILED;
maybe_retry:
@@ -1519,6 +1540,7 @@ static void scsi_eh_flush_done_q(struct
SCSI_LOG_ERROR_RECOVERY(3, printk("%s: flush finish"
" cmd: %p\n",
current->comm, scmd));
+ scmd->blk_result = BLKERR_RETRY_DEV;
scsi_finish_command(scmd);
}
}
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -513,7 +513,7 @@ void scsi_run_host_queues(struct Scsi_Ho
* of upper level post-processing and scsi_io_completion).
*
* Arguments: cmd - command that is complete.
- * uptodate - 1 if I/O indicates success, <= 0 for I/O error.
+ * uptodate - a block layer error value
* bytes - number of bytes of completed I/O
* requeue - indicates whether we should requeue leftovers.
*
@@ -527,7 +527,7 @@ void scsi_run_host_queues(struct Scsi_Ho
* We are guaranteeing that the request queue will be goosed
* at some point during this call.
*/
-static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int uptodate,
+static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int blkerr,
int bytes, int requeue)
{
request_queue_t *q = cmd->device->request_queue;
@@ -538,15 +538,15 @@ static struct scsi_cmnd *scsi_end_reques
* If there are blocks left over at the end, set up the command
* to queue the remainder of them.
*/
- if (end_that_request_chunk(req, uptodate, bytes)) {
+ if (end_that_request_chunk(req, blkerr, bytes)) {
int leftover = (req->hard_nr_sectors << 9);
if (blk_pc_request(req))
leftover = req->data_len;
/* kill remainder if no retrys */
- if (!uptodate && blk_noretry_request(req))
- end_that_request_chunk(req, 0, leftover);
+ if (blkerr != BLK_SUCCESS && blk_noretry_request(req))
+ end_that_request_chunk(req, blkerr, leftover);
else {
if (requeue)
/*
@@ -781,7 +781,8 @@ void scsi_io_completion(struct scsi_cmnd
* requeueing right here - we will requeue down below
* when we handle the bad sectors.
*/
- cmd = scsi_end_request(cmd, 1, good_bytes, result == 0);
+ cmd = scsi_end_request(cmd, BLK_SUCCESS, good_bytes,
+ result == 0);
/*
* If the command completed without error, then either finish off the
@@ -804,8 +805,8 @@ void scsi_io_completion(struct scsi_cmnd
* and quietly refuse further access.
*/
cmd->device->changed = 1;
- cmd = scsi_end_request(cmd, 0,
- this_count, 1);
+ cmd = scsi_end_request(cmd, BLKERR_FATAL_DEV,
+ this_count, 1);
return;
} else {
/*
@@ -838,7 +839,8 @@ void scsi_io_completion(struct scsi_cmnd
scsi_requeue_command(q, cmd);
result = 0;
} else {
- cmd = scsi_end_request(cmd, 0, this_count, 1);
+ cmd = scsi_end_request(cmd, cmd->blk_result,
+ this_count, 1);
return;
}
break;
@@ -853,7 +855,8 @@ void scsi_io_completion(struct scsi_cmnd
}
printk(KERN_INFO "Device %s not ready.\n",
req->rq_disk ? req->rq_disk->disk_name : "");
- cmd = scsi_end_request(cmd, 0, this_count, 1);
+ cmd = scsi_end_request(cmd, cmd->blk_result,
+ this_count, 1);
return;
case VOLUME_OVERFLOW:
printk(KERN_INFO "Volume overflow <%d %d %d %d> CDB: ",
@@ -862,7 +865,8 @@ void scsi_io_completion(struct scsi_cmnd
(int)cmd->device->id, (int)cmd->device->lun);
__scsi_print_command(cmd->data_cmnd);
scsi_print_sense("", cmd);
- cmd = scsi_end_request(cmd, 0, block_bytes, 1);
+ cmd = scsi_end_request(cmd, cmd->blk_result,
+ block_bytes, 1);
return;
default:
break;
@@ -894,7 +898,7 @@ void scsi_io_completion(struct scsi_cmnd
block_bytes = req->hard_cur_sectors << 9;
if (!block_bytes)
block_bytes = req->data_len;
- cmd = scsi_end_request(cmd, 0, block_bytes, 1);
+ cmd = scsi_end_request(cmd, cmd->blk_result, block_bytes, 1);
}
}
EXPORT_SYMBOL(scsi_io_completion);
@@ -1246,7 +1250,8 @@ static void scsi_kill_requests(request_q
while ((req = elv_next_request(q)) != NULL) {
blkdev_dequeue_request(req);
req->flags |= REQ_QUIET;
- while (end_that_request_first(req, 0, req->nr_sectors))
+ while (end_that_request_first(req, BLKERR_FATAL_DEV,
+ req->nr_sectors))
;
end_that_request_last(req);
}
@@ -1301,7 +1306,8 @@ static void scsi_request_fn(struct reque
sdev->host->host_no, sdev->id, sdev->lun);
blkdev_dequeue_request(req);
req->flags |= REQ_QUIET;
- while (end_that_request_first(req, 0, req->nr_sectors))
+ while (end_that_request_first(req, BLKERR_FATAL_DEV,
+ req->nr_sectors))
;
end_that_request_last(req);
continue;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -754,7 +754,8 @@ static void sd_end_flush(request_queue_t
/*
* force journal abort of barriers
*/
- end_that_request_first(rq, -EOPNOTSUPP, rq->hard_nr_sectors);
+ end_that_request_first(rq, BLKERR_NOTSUPP,
+ rq->hard_nr_sectors);
end_that_request_last(rq);
}
}
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -127,6 +127,7 @@ struct scsi_cmnd {
* to be at an address < 16Mb). */
int result; /* Status code from lower level driver */
+ int blk_result; /* block layer return value */
unsigned char tag; /* SCSI-II queued command tag */
unsigned long pid; /* Process ID, starts at 0. Unique per host. */
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH 4/4] convert scsi to blkerr error values
2005-09-16 20:32 goggin, edward
@ 2005-09-16 20:26 ` Mike Christie
2005-09-16 20:53 ` Mike Christie
0 siblings, 1 reply; 7+ messages in thread
From: Mike Christie @ 2005-09-16 20:26 UTC (permalink / raw)
To: goggin, edward; +Cc: dm-devel, axboe, linux-scsi
goggin, edward wrote:
> Mike,
>
> I don't think it is reasonably possible to anticipate
> all possible parsing requirements for the asc and ascq
> portions of SCSI sense information across all device
> models. I'm in favor of having a "small" framework in
> SCSI where a SCSI sense interpreter module (per
> vendor & model possibly) could be registered
> dynamically, by dm-emc.c for instance.
Yeah I agree, I mentioned this before in some other mails. I think a
module versus some table that userspace could write to were discussed.
The BLKERR values were meant to be able to tell upper layer code whether
a transport or device or driver error occured and whether the lower
level thought it was retryable. But then I thought I could also wedge in
the handling of the vendor specifcs by adding a vendor specific SCSI
module that would map the their specific value to a BLKERR_* one. And as
I said offlist it is not working perfectly becuase we are losing some
information in the translations.
>
> The extended error interpreter callout would be
> triggered indirectly by a call from
> __end_that_request_first to a extended error parser
> associated with the io request's queue whenever it
> sees a non-zero sense field of the io request.
> Perhaps the sense and sense_len fields in the
> request structure should be changed to not be
> SCSI specific.
So this just handles the sense type of error right? We still need
something seperate for the transport and device etc errors correct?
>
> Also, in order to allow for more variation and detail
> in the interpretation of device specific SCSI asc and
> ascq values, the results of the interpretation should
> not be required to be block layer generic, but instead
> are saved in something like a void *bi_extended_error
> field of the bio. __end_that_request_first would push
> the results of the extended_error interpretation to the
> bi_extended_error field of each bio in the request,
> similar to how Jens's code currently works.
>
> This extended error parsing approach should extend easily
> (without requiring a kernel revision for new BLKERR values)
> to new SCSI devices/models and their device specific asc
> and ascq values. This design could also be extended to
> support the interpretation of extended error information
> for non-SCSI block devices like DASD.
>
I am fine with such a thing. You basically described what we have been
talking about for some time but sperated the BLKERR part from the vendor
specific part (which solves the problem I was having in trying to use
the BLKERR value for two things). I am not the decision maker though so.
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [RFC PATCH 4/4] convert scsi to blkerr error values
@ 2005-09-16 20:32 goggin, edward
2005-09-16 20:26 ` Mike Christie
0 siblings, 1 reply; 7+ messages in thread
From: goggin, edward @ 2005-09-16 20:32 UTC (permalink / raw)
To: 'Mike Christie', axboe, linux-scsi, dm-devel
Mike,
I don't think it is reasonably possible to anticipate
all possible parsing requirements for the asc and ascq
portions of SCSI sense information across all device
models. I'm in favor of having a "small" framework in
SCSI where a SCSI sense interpreter module (per
vendor & model possibly) could be registered
dynamically, by dm-emc.c for instance.
The extended error interpreter callout would be
triggered indirectly by a call from
__end_that_request_first to a extended error parser
associated with the io request's queue whenever it
sees a non-zero sense field of the io request.
Perhaps the sense and sense_len fields in the
request structure should be changed to not be
SCSI specific.
Also, in order to allow for more variation and detail
in the interpretation of device specific SCSI asc and
ascq values, the results of the interpretation should
not be required to be block layer generic, but instead
are saved in something like a void *bi_extended_error
field of the bio. __end_that_request_first would push
the results of the extended_error interpretation to the
bi_extended_error field of each bio in the request,
similar to how Jens's code currently works.
This extended error parsing approach should extend easily
(without requiring a kernel revision for new BLKERR values)
to new SCSI devices/models and their device specific asc
and ascq values. This design could also be extended to
support the interpretation of extended error information
for non-SCSI block devices like DASD.
Ed Goggin
> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org
> [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Mike Christie
> Sent: Wednesday, August 24, 2005 5:05 AM
> To: axboe@suse.de; linux-scsi@vger.kernel.org; dm-devel@redhat.com
> Subject: [RFC PATCH 4/4] convert scsi to blkerr error values
>
> This patch convert scsi to use the BLKERR values.
> It uses the scsi_cmnd->result values to try
> and "guess" (LLDs are not all consistent in
> their host byte usage) if it is a transport or
> device or driver error. Maybe it would be better to
> have the transport classes define new host byte error
> values (or add a transport byte), and scsi-ml can then
> call into the transport class to tell it if the error
> is retryable or fatal.
>
> If this approach is ok, I guess the next step
> is more testing and figure out how to best
> support vendor specifics. While converting
> dm-emc's asc/ascq code there are comments
> indicating we might need to send additional
> commands to try and figure out what to do
> (Enginio or Sun people did you need this too?)
>
> The patch has been lightly tested with iscsi_sfnet
> and open-iscsi using dm-multipath to verify that
> when we send retryable errors dm-multipath does
> nice things like not automatically fail a path
> and fail the IO.
>
> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
>
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -1269,6 +1269,7 @@ int scsi_device_cancel(struct scsi_devic
> scsi_eh_scmd_add(scmd,
> SCSI_EH_CANCEL_CMD);
> } else {
> scmd->result = (DID_ABORT << 16);
> + scmd->blk_result = BLKERR_FATAL_DEV;
> scsi_finish_command(scmd);
> }
> }
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -1213,6 +1213,7 @@ int scsi_decide_disposition(struct scsi_
> SCSI_LOG_ERROR_RECOVERY(5, printk("%s: device
> offline - report"
> " as SUCCESS\n",
> __FUNCTION__));
> + scmd->blk_result = BLKERR_FATAL_DEV;
> return SUCCESS;
> }
>
> @@ -1228,6 +1229,7 @@ int scsi_decide_disposition(struct scsi_
> * did_ok.
> */
> scmd->result &= 0xff00ffff;
> + scmd->blk_result = BLK_SUCCESS;
> return SUCCESS;
> case DID_OK:
> /*
> @@ -1236,7 +1238,10 @@ int scsi_decide_disposition(struct scsi_
> break;
> case DID_NO_CONNECT:
> case DID_BAD_TARGET:
> + scmd->blk_result = BLKERR_FATAL_XPT;
> + return SUCCESS;
> case DID_ABORT:
> + scmd->blk_result = BLKERR_RETRY_DEV;
> /*
> * note - this means that we just report the status back
> * to the top level driver, not that we actually think
> @@ -1253,6 +1258,7 @@ int scsi_decide_disposition(struct scsi_
> * and not get stuck in a loop.
> */
> case DID_SOFT_ERROR:
> + scmd->blk_result = BLKERR_RETRY_DEV;
> goto maybe_retry;
> case DID_IMM_RETRY:
> return NEEDS_RETRY;
> @@ -1269,11 +1275,12 @@ int scsi_decide_disposition(struct scsi_
> */
> break;
> /* fallthrough */
> -
> case DID_BUS_BUSY:
> case DID_PARITY:
> + scmd->blk_result = BLKERR_RETRY_XPT;
> goto maybe_retry;
> case DID_TIME_OUT:
> + scmd->blk_result = BLKERR_FATAL_DEV;
> /*
> * when we scan the bus, we get timeout messages for
> * these commands if there is no device available.
> @@ -1286,16 +1293,20 @@ int scsi_decide_disposition(struct scsi_
> return FAILED;
> }
> case DID_RESET:
> + scmd->blk_result = BLKERR_RETRY_DEV;
> return SUCCESS;
> default:
> + scmd->blk_result = BLKERR_FATAL_DEV;
> return FAILED;
> }
>
> /*
> * next, check the message byte.
> */
> - if (msg_byte(scmd->result) != COMMAND_COMPLETE)
> + if (msg_byte(scmd->result) != COMMAND_COMPLETE) {
> + scmd->blk_result = BLKERR_FATAL_DEV;
> return FAILED;
> + }
>
> /*
> * check the status byte to see if this indicates
> anything special.
> @@ -1315,13 +1326,19 @@ int scsi_decide_disposition(struct scsi_
> */
> return ADD_TO_MLQUEUE;
> case GOOD:
> + scmd->blk_result = BLK_SUCCESS;
> + return SUCCESS;
> case COMMAND_TERMINATED:
> case TASK_ABORTED:
> + scmd->blk_result = BLKERR_FATAL_DEV;
> return SUCCESS;
> case CHECK_CONDITION:
> rtn = scsi_check_sense(scmd);
> - if (rtn == NEEDS_RETRY)
> + if (rtn == NEEDS_RETRY) {
> + scmd->blk_result = BLKERR_RETRY_DEV;
> goto maybe_retry;
> + } else
> + scmd->blk_result = BLKERR_FATAL_DEV;
> /* if rtn == FAILED, we have no sense information;
> * returning FAILED will wake the error handler thread
> * to collect the sense and redo the decide
> @@ -1334,6 +1351,7 @@ int scsi_decide_disposition(struct scsi_
> /*
> * who knows? FIXME(eric)
> */
> + scmd->blk_result = BLK_SUCCESS;
> return SUCCESS;
>
> case RESERVATION_CONFLICT:
> @@ -1341,10 +1359,13 @@ int scsi_decide_disposition(struct scsi_
> " %d channel %d id %d lun %d\n",
> scmd->device->host->host_no,
> scmd->device->channel,
> scmd->device->id, scmd->device->lun);
> + scmd->blk_result = BLK_SUCCESS;
> return SUCCESS; /* causes immediate i/o error */
> default:
> + scmd->blk_result = BLKERR_FATAL_DEV;
> return FAILED;
> }
> + scmd->blk_result = BLKERR_FATAL_DEV;
> return FAILED;
>
> maybe_retry:
> @@ -1519,6 +1540,7 @@ static void scsi_eh_flush_done_q(struct
> SCSI_LOG_ERROR_RECOVERY(3, printk("%s:
> flush finish"
> " cmd: %p\n",
>
> current->comm, scmd));
> + scmd->blk_result = BLKERR_RETRY_DEV;
> scsi_finish_command(scmd);
> }
> }
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -513,7 +513,7 @@ void scsi_run_host_queues(struct Scsi_Ho
> * of upper level post-processing and scsi_io_completion).
> *
> * Arguments: cmd - command that is complete.
> - * uptodate - 1 if I/O indicates success, <= 0
> for I/O error.
> + * uptodate - a block layer error value
> * bytes - number of bytes of completed I/O
> * requeue - indicates whether we should requeue
> leftovers.
> *
> @@ -527,7 +527,7 @@ void scsi_run_host_queues(struct Scsi_Ho
> * We are guaranteeing that the request queue will
> be goosed
> * at some point during this call.
> */
> -static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd
> *cmd, int uptodate,
> +static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd
> *cmd, int blkerr,
> int bytes, int requeue)
> {
> request_queue_t *q = cmd->device->request_queue;
> @@ -538,15 +538,15 @@ static struct scsi_cmnd *scsi_end_reques
> * If there are blocks left over at the end, set up the command
> * to queue the remainder of them.
> */
> - if (end_that_request_chunk(req, uptodate, bytes)) {
> + if (end_that_request_chunk(req, blkerr, bytes)) {
> int leftover = (req->hard_nr_sectors << 9);
>
> if (blk_pc_request(req))
> leftover = req->data_len;
>
> /* kill remainder if no retrys */
> - if (!uptodate && blk_noretry_request(req))
> - end_that_request_chunk(req, 0, leftover);
> + if (blkerr != BLK_SUCCESS && blk_noretry_request(req))
> + end_that_request_chunk(req, blkerr, leftover);
> else {
> if (requeue)
> /*
> @@ -781,7 +781,8 @@ void scsi_io_completion(struct scsi_cmnd
> * requeueing right here - we will requeue down below
> * when we handle the bad sectors.
> */
> - cmd = scsi_end_request(cmd, 1, good_bytes, result == 0);
> + cmd = scsi_end_request(cmd, BLK_SUCCESS, good_bytes,
> + result == 0);
>
> /*
> * If the command completed without error, then
> either finish off the
> @@ -804,8 +805,8 @@ void scsi_io_completion(struct scsi_cmnd
> * and quietly refuse further access.
> */
> cmd->device->changed = 1;
> - cmd = scsi_end_request(cmd, 0,
> - this_count, 1);
> + cmd = scsi_end_request(cmd,
> BLKERR_FATAL_DEV,
> + this_count, 1);
> return;
> } else {
> /*
> @@ -838,7 +839,8 @@ void scsi_io_completion(struct scsi_cmnd
> scsi_requeue_command(q, cmd);
> result = 0;
> } else {
> - cmd = scsi_end_request(cmd, 0,
> this_count, 1);
> + cmd = scsi_end_request(cmd,
> cmd->blk_result,
> + this_count, 1);
> return;
> }
> break;
> @@ -853,7 +855,8 @@ void scsi_io_completion(struct scsi_cmnd
> }
> printk(KERN_INFO "Device %s not ready.\n",
> req->rq_disk ?
> req->rq_disk->disk_name : "");
> - cmd = scsi_end_request(cmd, 0, this_count, 1);
> + cmd = scsi_end_request(cmd, cmd->blk_result,
> + this_count, 1);
> return;
> case VOLUME_OVERFLOW:
> printk(KERN_INFO "Volume overflow <%d
> %d %d %d> CDB: ",
> @@ -862,7 +865,8 @@ void scsi_io_completion(struct scsi_cmnd
> (int)cmd->device->id,
> (int)cmd->device->lun);
> __scsi_print_command(cmd->data_cmnd);
> scsi_print_sense("", cmd);
> - cmd = scsi_end_request(cmd, 0, block_bytes, 1);
> + cmd = scsi_end_request(cmd, cmd->blk_result,
> + block_bytes, 1);
> return;
> default:
> break;
> @@ -894,7 +898,7 @@ void scsi_io_completion(struct scsi_cmnd
> block_bytes = req->hard_cur_sectors << 9;
> if (!block_bytes)
> block_bytes = req->data_len;
> - cmd = scsi_end_request(cmd, 0, block_bytes, 1);
> + cmd = scsi_end_request(cmd, cmd->blk_result,
> block_bytes, 1);
> }
> }
> EXPORT_SYMBOL(scsi_io_completion);
> @@ -1246,7 +1250,8 @@ static void scsi_kill_requests(request_q
> while ((req = elv_next_request(q)) != NULL) {
> blkdev_dequeue_request(req);
> req->flags |= REQ_QUIET;
> - while (end_that_request_first(req, 0, req->nr_sectors))
> + while (end_that_request_first(req, BLKERR_FATAL_DEV,
> + req->nr_sectors))
> ;
> end_that_request_last(req);
> }
> @@ -1301,7 +1306,8 @@ static void scsi_request_fn(struct reque
> sdev->host->host_no, sdev->id,
> sdev->lun);
> blkdev_dequeue_request(req);
> req->flags |= REQ_QUIET;
> - while (end_that_request_first(req, 0,
> req->nr_sectors))
> + while (end_that_request_first(req,
> BLKERR_FATAL_DEV,
> + req->nr_sectors))
> ;
> end_that_request_last(req);
> continue;
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -754,7 +754,8 @@ static void sd_end_flush(request_queue_t
> /*
> * force journal abort of barriers
> */
> - end_that_request_first(rq, -EOPNOTSUPP,
> rq->hard_nr_sectors);
> + end_that_request_first(rq, BLKERR_NOTSUPP,
> + rq->hard_nr_sectors);
> end_that_request_last(rq);
> }
> }
> diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
> --- a/include/scsi/scsi_cmnd.h
> +++ b/include/scsi/scsi_cmnd.h
> @@ -127,6 +127,7 @@ struct scsi_cmnd {
> * to be at an
> address < 16Mb). */
>
> int result; /* Status code from lower level
> driver */
> + int blk_result; /* block layer return value */
>
> unsigned char tag; /* SCSI-II queued command tag */
> unsigned long pid; /* Process ID, starts at 0.
> Unique per host. */
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe
> linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH 4/4] convert scsi to blkerr error values
2005-09-16 20:26 ` Mike Christie
@ 2005-09-16 20:53 ` Mike Christie
0 siblings, 0 replies; 7+ messages in thread
From: Mike Christie @ 2005-09-16 20:53 UTC (permalink / raw)
To: goggin, edward; +Cc: dm-devel, axboe, linux-scsi
Mike Christie wrote:
> goggin, edward wrote:
>
>> Mike,
>>
>> I don't think it is reasonably possible to anticipate
>> all possible parsing requirements for the asc and ascq
>> portions of SCSI sense information across all device
>> models. I'm in favor of having a "small" framework in
>> SCSI where a SCSI sense interpreter module (per
>> vendor & model possibly) could be registered
>> dynamically, by dm-emc.c for instance.
>
>
> Yeah I agree, I mentioned this before in some other mails. I think a
> module versus some table that userspace could write to were discussed.
>
> The BLKERR values were meant to be able to tell upper layer code whether
> a transport or device or driver error occured and whether the lower
> level thought it was retryable. But then I thought I could also wedge in
> the handling of the vendor specifcs by adding a vendor specific SCSI
> module that would map the their specific value to a BLKERR_* one. And as
> I said offlist it is not working perfectly becuase we are losing some
> information in the translations.
>
Oh yeah so the problem I am having is emc boxes may return "LUN Not
Ready - Manual Intervention Required". When dm-emc.c sees this error it
wants to bypass a group of paths and retry the IO but under ceratin
conditions not fail those paths. So I am not sure what to return for
this error. I thought if I redo my BLKERR so they describe the error like
BLKERR_DEV_NOT_READY
BLKERR_MANUAL_INTERVENTION_REQ
BLKERR_NOT_CONN
... and set them up as a bitmap like suggested by JamesB. I could return
BLKERR_MANUAL_INTERVENTION_REQ from a scsi module then have dm-emc.c
evaluate that value to a dm-mpaths return value of "MP_BYPASS_PG |
MP_RETRY_IO" which means bypass the priority group (group of paths) and
retry the IO.
But as more vendors use dm and they cannot use existing BLKERR values I
have to add more and more. And then we have to handle the case where
some block layer code may return BLKERR_MANUAL_INTERVENTION_REQ for
something not related to the reasons EMC's HW was returning "LUN Not
Ready - Manual Intervention Required" and we end up with getting things
wrong.
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [RFC PATCH 4/4] convert scsi to blkerr error values
@ 2005-09-17 12:06 goggin, edward
0 siblings, 0 replies; 7+ messages in thread
From: goggin, edward @ 2005-09-17 12:06 UTC (permalink / raw)
To: 'Mike Christie', goggin, edward; +Cc: dm-devel, axboe, linux-scsi
> -----Original Message-----
> From: Mike Christie [mailto:michaelc@cs.wisc.edu]
> Sent: Friday, September 16, 2005 4:54 PM
> To: goggin, edward
> Cc: axboe@suse.de; linux-scsi@vger.kernel.org; dm-devel@redhat.com
> Subject: Re: [RFC PATCH 4/4] convert scsi to blkerr error values
>
> Mike Christie wrote:
> > goggin, edward wrote:
> >
> >> Mike,
> >>
> >> I don't think it is reasonably possible to anticipate
> >> all possible parsing requirements for the asc and ascq
> >> portions of SCSI sense information across all device
> >> models. I'm in favor of having a "small" framework in
> >> SCSI where a SCSI sense interpreter module (per
> >> vendor & model possibly) could be registered
> >> dynamically, by dm-emc.c for instance.
> >
> >
> > Yeah I agree, I mentioned this before in some other mails.
> I think a
> > module versus some table that userspace could write to were
> discussed.
> >
> > The BLKERR values were meant to be able to tell upper layer
> code whether
> > a transport or device or driver error occured and whether the lower
> > level thought it was retryable. But then I thought I could
> also wedge in
> > the handling of the vendor specifcs by adding a vendor
> specific SCSI
> > module that would map the their specific value to a
> BLKERR_* one. And as
> > I said offlist it is not working perfectly becuase we are
> losing some
> > information in the translations.
> >
>
> Oh yeah so the problem I am having is emc boxes may return "LUN Not
> Ready - Manual Intervention Required". When dm-emc.c sees
> this error it
> wants to bypass a group of paths and retry the IO but under ceratin
> conditions not fail those paths. So I am not sure what to return for
> this error. I thought if I redo my BLKERR so they describe
> the error like
>
> BLKERR_DEV_NOT_READY
> BLKERR_MANUAL_INTERVENTION_REQ
> BLKERR_NOT_CONN
>
> ... and set them up as a bitmap like suggested by JamesB. I
> could return
> BLKERR_MANUAL_INTERVENTION_REQ from a scsi module then have dm-emc.c
> evaluate that value to a dm-mpaths return value of "MP_BYPASS_PG |
> MP_RETRY_IO" which means bypass the priority group (group of
> paths) and
> retry the IO.
>
> But as more vendors use dm and they cannot use existing
> BLKERR values I
> have to add more and more.
I was hoping we could avoid the need to do this by having the framework
described in my email -- the idea for which I heard about from you in
the first place :))
> some block layer code may return BLKERR_MANUAL_INTERVENTION_REQ for
> something not related to the reasons EMC's HW was returning "LUN Not
> Ready - Manual Intervention Required" and we end up with
> getting things
> wrong.
Good point, I hadn't thought of that case.
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [RFC PATCH 4/4] convert scsi to blkerr error values
@ 2005-09-17 12:10 goggin, edward
2005-09-17 15:41 ` Mike Christie
0 siblings, 1 reply; 7+ messages in thread
From: goggin, edward @ 2005-09-17 12:10 UTC (permalink / raw)
To: 'Mike Christie', goggin, edward; +Cc: axboe, linux-scsi, dm-devel
> -----Original Message-----
> From: Mike Christie [mailto:michaelc@cs.wisc.edu]
> Sent: Friday, September 16, 2005 4:26 PM
> To: goggin, edward
> Cc: axboe@suse.de; linux-scsi@vger.kernel.org; dm-devel@redhat.com
> Subject: Re: [RFC PATCH 4/4] convert scsi to blkerr error values
>
> goggin, edward wrote:
> > Mike,
> >
> > I don't think it is reasonably possible to anticipate
> > all possible parsing requirements for the asc and ascq
> > portions of SCSI sense information across all device
> > models. I'm in favor of having a "small" framework in
> > SCSI where a SCSI sense interpreter module (per
> > vendor & model possibly) could be registered
> > dynamically, by dm-emc.c for instance.
>
> Yeah I agree, I mentioned this before in some other mails. I think a
> module versus some table that userspace could write to were discussed.
Yes, I first heard about this idea from you on one of the multipathing
conference calls. I wasn't sure if you were still advocating for this
approach though :))
>
> The BLKERR values were meant to be able to tell upper layer
> code whether
> a transport or device or driver error occured and whether the lower
> level thought it was retryable. But then I thought I could
> also wedge in
> the handling of the vendor specifcs by adding a vendor specific SCSI
> module that would map the their specific value to a BLKERR_*
> one. And as
> I said offlist it is not working perfectly becuase we are losing some
> information in the translations.
>
> >
> > The extended error interpreter callout would be
> > triggered indirectly by a call from
> > __end_that_request_first to a extended error parser
> > associated with the io request's queue whenever it
> > sees a non-zero sense field of the io request.
> > Perhaps the sense and sense_len fields in the
> > request structure should be changed to not be
> > SCSI specific.
>
> So this just handles the sense type of error right? We still need
> something seperate for the transport and device etc errors correct?
Yeah, I forgot about those.
>
> >
> > Also, in order to allow for more variation and detail
> > in the interpretation of device specific SCSI asc and
> > ascq values, the results of the interpretation should
> > not be required to be block layer generic, but instead
> > are saved in something like a void *bi_extended_error
> > field of the bio. __end_that_request_first would push
> > the results of the extended_error interpretation to the
> > bi_extended_error field of each bio in the request,
> > similar to how Jens's code currently works.
> >
> > This extended error parsing approach should extend easily
> > (without requiring a kernel revision for new BLKERR values)
> > to new SCSI devices/models and their device specific asc
> > and ascq values. This design could also be extended to
> > support the interpretation of extended error information
> > for non-SCSI block devices like DASD.
> >
>
> I am fine with such a thing. You basically described what we
> have been
> talking about for some time but sperated the BLKERR part from
> the vendor
> specific part (which solves the problem I was having in trying to use
> the BLKERR value for two things). I am not the decision maker
> though so.
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH 4/4] convert scsi to blkerr error values
2005-09-17 12:10 [RFC PATCH 4/4] convert scsi to blkerr error values goggin, edward
@ 2005-09-17 15:41 ` Mike Christie
0 siblings, 0 replies; 7+ messages in thread
From: Mike Christie @ 2005-09-17 15:41 UTC (permalink / raw)
To: goggin, edward; +Cc: axboe, linux-scsi, dm-devel
goggin, edward wrote:
>>-----Original Message-----
>>From: Mike Christie [mailto:michaelc@cs.wisc.edu]
>>Sent: Friday, September 16, 2005 4:26 PM
>>To: goggin, edward
>>Cc: axboe@suse.de; linux-scsi@vger.kernel.org; dm-devel@redhat.com
>>Subject: Re: [RFC PATCH 4/4] convert scsi to blkerr error values
>>
>>goggin, edward wrote:
>>
>>>Mike,
>>>
>>>I don't think it is reasonably possible to anticipate
>>>all possible parsing requirements for the asc and ascq
>>>portions of SCSI sense information across all device
>>>models. I'm in favor of having a "small" framework in
>>>SCSI where a SCSI sense interpreter module (per
>>>vendor & model possibly) could be registered
>>>dynamically, by dm-emc.c for instance.
>>
>>Yeah I agree, I mentioned this before in some other mails. I think a
>>module versus some table that userspace could write to were discussed.
>
>
> Yes, I first heard about this idea from you on one of the multipathing
> cosense. In Lars's comments it stated we may need to send another request nference calls. I wasn't sure if you were still advocating for this
> approach though :))
>
To actually implement the vendor specifics, I think I was just being
lazy and waiting to hear about what people need when they decode the
sense. In Lars's comments it stated we may need to send another request
to determine if the paths need to be failed. I think SUN may have
wanted the same thing. Any ideas? Did you guys need to do this in your
multipath solution?
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2005-09-17 17:42 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-09-17 12:10 [RFC PATCH 4/4] convert scsi to blkerr error values goggin, edward
2005-09-17 15:41 ` Mike Christie
-- strict thread matches above, loose matches on Subject: below --
2005-09-17 12:06 goggin, edward
2005-09-16 20:32 goggin, edward
2005-09-16 20:26 ` Mike Christie
2005-09-16 20:53 ` Mike Christie
2005-08-24 9:04 Mike Christie
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).