From mboxrd@z Thu Jan 1 00:00:00 1970
From: Luben Tuikov
Subject: [PATCH]: Flexible timeout infrastructure
Date: Tue, 15 Jun 2004 11:02:55 -0400
Sender: linux-scsi-owner@vger.kernel.org
Message-ID: <40CF0F9F.4050902@adaptec.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from magic.adaptec.com ([216.52.22.17]:61855 "EHLO magic.adaptec.com")
	by vger.kernel.org with ESMTP id S265676AbUFOPC5 (ORCPT );
	Tue, 15 Jun 2004 11:02:57 -0400
Received: from redfish.adaptec.com (redfish.adaptec.com [162.62.50.11])
	by magic.adaptec.com (8.11.6/8.11.6) with ESMTP id i5FF2uX28339
	for ; Tue, 15 Jun 2004 08:02:56 -0700
Received: from rtpe2k01.adaptec.com (rtpe2k01.adaptec.com [10.110.12.40])
	by redfish.adaptec.com (8.11.6/8.11.6) with ESMTP id i5FF2uo29249
	for ; Tue, 15 Jun 2004 08:02:56 -0700
List-Id: linux-scsi@vger.kernel.org
To: SCSI Mailing List

Hello,

This patch introduces a flexible command timeout infrastructure. It
fully accommodates the current SCSI Core command timeout behaviour, but
also offers the ability to hand command timeout handling to an LLDD,
_yet_ still have it all go through *SCSI Core*.

This is somewhat similar to Christoph's proposed patch, but I wasn't
aware of his patch when I thought of this. Interestingly enough, it can
be viewed as an extension to his patch.

The patch is very short (minimal changes indeed), but the text is a bit
lengthy since it outlines the many different usages of such an
infrastructure.

New method:

struct scsi_host_template ::
	int (*eh_cmd_timed_out)(struct scsi_cmnd *);

This method's status is OPTIONAL.

If the driver does not define the eh_cmd_timed_out() method in its host
template, then current behaviour is assumed.

If the driver does define it, then SCSI Core will give the command to
the LLDD without a timer running.
Then, when the LLDD notifies the transport and before it sends the
command to the interconnect, it starts the timer *by calling SCSI
Core*: scsi_add_timer(,,scsi_times_out).

Timeline:

---+-------------------------+-------------> t
   t0                        t1

t0: SCSI Core::queuecommand()
t1: LLDD::scsi_add_timer(,,scsi_times_out)

NB: |t1-t0| is *always* bounded! (hint)

SUCCESS: If the command completes, the LLDD calls scsi_delete_timer(),
and then scsi_done(). This completes the SCSI command transaction.

TO: If the command times out, then *SCSI Core* is called,
scsi_times_out(), which calls the defined eh_cmd_timed_out() method.

RETURNS: eh_cmd_timed_out() returns:

EH_HANDLED:
  - the LLDD has called scsi_done(), having set the status/response
    codes appropriately, XOR
  - the LLDD has resent the command back to the LU.
  Either way, there's nothing to do at this point and the timer
  routine completes.

EH_NONE:
  - SCSI Core should do error handling as usual.

Note that after eh_cmd_timed_out() returns, we are in a _consistent_
state:
  - either the command is back out to the LU: the timer has been rerun
    by the LLDD, the command belongs to the LLDD, and SCSI Core should
    do nothing -- just the same as if the timer never fired, XOR
  - scsi_done() has been called and SCSI Core will be processing the
    command shortly; no timer is running since it has fired (we're
    executing in this method). Consistent, as if the command completed.
  - Or we returned EH_NONE, and SCSI Core adds the command to the eh
    list for recovery.

Let me know your comments, USB, FC, SAS, iSCSI device driver writers.

The main point of this patch is to allow complicated _recovery_ for
protocols of which SCSI Core cannot have internal knowledge, while SCSI
Core still gets notified *at every step*. I.e. SAM-like.

Note that the driver may also decide to do some processing in the
eh_cmd_timed_out() method (interrupt context) and still return EH_NONE,
to get back at that command in process context from the eh thread (e.g.
if TMF ABORT TASK takes some time to return).
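The contract described above can be modelled in a few lines of plain
userspace C. This is only an illustrative sketch, not kernel code and
not part of the patch: every mock_* name, the boolean stand-in for the
per-command timer, and the retry budget are assumptions made for the
example. Only the EH_NONE/EH_HANDLED values and the eh_cmd_timed_out
hook name are taken from the patch.

```c
/* Userspace model of the proposed timeout flow. NOT kernel code:
 * all mock_* names are hypothetical; only EH_NONE, EH_HANDLED and the
 * eh_cmd_timed_out hook name come from the patch. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EH_NONE    (0)
#define EH_HANDLED (1)

struct mock_cmd;

struct mock_host_template {
	/* Optional hook; NULL selects the current/old behaviour. */
	int (*eh_cmd_timed_out)(struct mock_cmd *);
};

struct mock_cmd {
	const struct mock_host_template *hostt;
	bool timer_armed;  /* stands in for the per-command timer */
	bool done;         /* command has been returned to the core */
	bool on_eh_list;   /* core queued the command for recovery */
	int retries_left;  /* hypothetical LLDD retry budget */
};

static void mock_add_timer(struct mock_cmd *cmd)
{
	cmd->timer_armed = true;
}

/* Returns true if the timer was still pending when deleted. */
static bool mock_delete_timer(struct mock_cmd *cmd)
{
	bool was_armed = cmd->timer_armed;

	cmd->timer_armed = false;
	return was_armed;
}

/* Completion path: the command must come back without a running timer. */
static void mock_done(struct mock_cmd *cmd)
{
	assert(!cmd->timer_armed);
	cmd->done = true;
}

/* Core dispatch: arm the timer only if the driver defines no hook. */
static void mock_dispatch(struct mock_cmd *cmd)
{
	if (!cmd->hostt->eh_cmd_timed_out)
		mock_add_timer(cmd);
	/* a real core would hand the command to queuecommand() here */
}

/* Timer expiry: consult the hook, fall back to core recovery. */
static void mock_times_out(struct mock_cmd *cmd)
{
	int ret = EH_NONE;

	cmd->timer_armed = false;  /* the timer has fired */
	if (cmd->hostt->eh_cmd_timed_out)
		ret = cmd->hostt->eh_cmd_timed_out(cmd);
	if (ret == EH_HANDLED)
		return;
	cmd->on_eh_list = true;
}

/* Hypothetical LLDD hook: resend while retries remain, else defer. */
static int mock_lldd_timed_out(struct mock_cmd *cmd)
{
	if (cmd->retries_left > 0) {
		cmd->retries_left--;
		mock_add_timer(cmd);  /* command goes back out to the LU */
		return EH_HANDLED;
	}
	return EH_NONE;
}
```

In this model a driver without the hook gets its timer armed at
dispatch and lands on the recovery list when it fires, while a driver
with the hook arms the timer itself, may resend on expiry (EH_HANDLED),
and hands the command to core recovery once it returns EH_NONE.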
Against scsi-misc-2.6:

===== drivers/scsi/scsi.c 1.143 vs edited =====
--- 1.143/drivers/scsi/scsi.c	2004-04-28 12:32:09 -04:00
+++ edited/drivers/scsi/scsi.c	2004-06-14 15:13:12 -04:00
@@ -549,7 +549,12 @@
 		host->resetting = 0;
 	}
 
-	scsi_add_timer(cmd, cmd->timeout_per_command, scsi_times_out);
+	/* If the LLDD claims to be able to handle its own command timeout,
+	 * don't start the timer.  It will start it before sending the command
+	 * to the LU.
+	 */
+	if (!host->hostt->eh_cmd_timed_out)
+		scsi_add_timer(cmd, cmd->timeout_per_command, scsi_times_out);
 
 	scsi_log_send(cmd);
 
@@ -699,8 +704,9 @@
 	 * that function could really be.  It might be on another processor,
 	 * etc, etc.
 	 */
-	if (!scsi_delete_timer(cmd))
-		return;
+	if (!cmd->device->host->hostt->eh_cmd_timed_out)
+		if (!scsi_delete_timer(cmd))
+			return;
 
 	/*
 	 * Set the serial numbers back to zero
===== drivers/scsi/scsi_error.c 1.76 vs edited =====
--- 1.76/drivers/scsi/scsi_error.c	2004-06-04 12:54:06 -04:00
+++ edited/drivers/scsi/scsi_error.c	2004-06-14 15:13:53 -04:00
@@ -155,8 +155,8 @@
 }
 
 /**
- * scsi_times_out - Timeout function for normal scsi commands.
- * @scmd:	Cmd that is timing out.
+ * scsi_times_out - Timeout function for normal SCSI commands.
+ * @scmd:	Command that is timing out.
  *
  * Notes:
  *     We do not need to lock this.  There is the potential for a race
@@ -166,7 +166,16 @@
  **/
 void scsi_times_out(struct scsi_cmnd *scmd)
 {
+	int ret = EH_NONE;
+
 	scsi_log_completion(scmd, TIMEOUT_ERROR);
+
+	if (scmd->device->host->hostt->eh_cmd_timed_out)
+		ret = scmd->device->host->hostt->eh_cmd_timed_out(scmd);
+
+	if (ret == EH_HANDLED)
+		return;
+
 	if (unlikely(!scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))) {
 		panic("Error handler thread not present at %p %p %s %d",
 		      scmd, scmd->device->host, __FILE__, __LINE__);
===== drivers/scsi/scsi_syms.c 1.47 vs edited =====
--- 1.47/drivers/scsi/scsi_syms.c	2004-05-19 13:46:14 -04:00
+++ edited/drivers/scsi/scsi_syms.c	2004-06-14 11:09:29 -04:00
@@ -107,3 +107,4 @@
  */
 EXPORT_SYMBOL(scsi_add_timer);
 EXPORT_SYMBOL(scsi_delete_timer);
+EXPORT_SYMBOL(scsi_times_out);
===== include/scsi/scsi.h 1.21 vs edited =====
--- 1.21/include/scsi/scsi.h	2004-04-21 12:54:43 -04:00
+++ edited/include/scsi/scsi.h	2004-06-14 14:33:15 -04:00
@@ -327,6 +327,12 @@
 #define SCSI_MLQUEUE_EH_RETRY 0x1057
 
 /*
+ * LLDD timeout handler return values
+ */
+#define EH_NONE (0)
+#define EH_HANDLED (1)
+
+/*
  * Use these to separate status msg and our bytes
  *
  * These are set by:
===== include/scsi/scsi_eh.h 1.2 vs edited =====
--- 1.2/include/scsi/scsi_eh.h	2004-06-04 07:45:01 -04:00
+++ edited/include/scsi/scsi_eh.h	2004-06-14 11:12:13 -04:00
@@ -7,10 +7,11 @@
 
 extern void scsi_add_timer(struct scsi_cmnd *, int,
 		void (*)(struct scsi_cmnd *));
-extern int scsi_delete_timer(struct scsi_cmnd *);
+extern int scsi_delete_timer(struct scsi_cmnd *);
+extern void scsi_times_out(struct scsi_cmnd *);
 extern void scsi_report_bus_reset(struct Scsi_Host *, int);
 extern void scsi_report_device_reset(struct Scsi_Host *, int, int);
-extern int scsi_block_when_processing_errors(struct scsi_device *);
+extern int scsi_block_when_processing_errors(struct scsi_device *);
 extern void scsi_sleep(int);
 
 /*
===== include/scsi/scsi_host.h 1.17 vs edited =====
--- 1.17/include/scsi/scsi_host.h	2004-06-04 12:51:31 -04:00
+++ edited/include/scsi/scsi_host.h	2004-06-14 15:12:23 -04:00
@@ -125,6 +125,40 @@
 	int (* eh_bus_reset_handler)(struct scsi_cmnd *);
 	int (* eh_host_reset_handler)(struct scsi_cmnd *);
 
+
+	/*
+	 * If defined, SCSI Core will leave _all_ command timeout
+	 * handling to the LLDD.  The LLDD should call
+	 * scsi_add_timer(,,scsi_times_out) when
+	 * appropriate, possibly before sending the command to the LU,
+	 * and scsi_delete_timer() before returning it to SCSI
+	 * Core.  That is, a SCSI command is passed to the LLDD (via
+	 * queuecommand()) _without_ a timer running and is expected
+	 * to be returned to SCSI Core (via scsi_done()) just the
+	 * same, without one running.
+	 *
+	 * This method is called whenever a SCSI command times out, if
+	 * the driver had called
+	 * scsi_add_timer(,,scsi_times_out) of course.
+	 *
+	 * Returns: EH_HANDLED or EH_NONE.
+	 * EH_HANDLED -- means that the LLDD has taken all steps
+	 *     to ensure proper command recovery handling and had
+	 *     A) either called scsi_done() and set the response/result
+	 *        properly in the SCSI command structure,
+	 *     XOR
+	 *     B) resent the command to the LU.
+	 * EH_NONE -- SCSI Core should try to recover the command as
+	 *     usual (eh recovery thread, etc).
+	 *
+	 * If this method is not defined, current/old behaviour is assumed.
+	 *
+	 * NOTE: This runs in interrupt context, so it cannot sleep.
+	 *
+	 * STATUS: OPTIONAL
+	 */
+	int (* eh_cmd_timed_out)(struct scsi_cmnd *);
+
 	/*
 	 * Old EH handlers, no longer used. Make them warn the user of old
 	 * drivers by using a wrong type

-- 
Luben