From: Boaz Harrosh <bharrosh@panasas.com>
To: Min Zhang <mzhang@mvista.com>, Tejun Heo <tj@kernel.org>,
James Bottomley <James.Bottomley@suse.de>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [PATCH] scsi: timeout reset timer before command is queued
Date: Sun, 13 Dec 2009 11:52:34 +0200 [thread overview]
Message-ID: <4B24B962.4070804@panasas.com> (raw)
In-Reply-To: <4B21B143.0@mvista.com>
On 12/11/2009 04:41 AM, Min Zhang wrote:
> Kernel panic of NULL pointer in scsi_dispatch_cmd during fiber channel
> scsi disk access. It is because scsi command can time out even before
> dispatcher queues it to lower level driver, then scsi_times_out error
> handler could complete the request and free the command memory while
> scsi_dispatch_cmd still uses the command pointer.
>
> This patch adds new SCSI_EH_RESET_TIMER flag per command to indicate the
> command hasn't been queued, so scsi_times_out won't complete and free
> the command. req->special command pointer is also NULLed when the
> command is freed.
>
> This premature time out is rare, mostly triggered by occasional network
> delay for fibre channel scsi disk. The patch is verified by
> instrumenting a testing delay to force scsi_times_out and then
> monitoring the command pointer integrity.
>
> Signed-off-by: Min Zhang <mzhang@mvista.com>
>
OK I stare at the existing code hard and I do not understand why we
ask about (blk_queue_tagged(q) && !blk_queue_start_tag(q, req)) twice?
And why we let in a failing chance between the blk_start and the dispatch?
I mean why not do the below?
This should also solve Min's problem by not letting any failure between
blk_start_request and scsi_dispatch_cmd.
(Only compile tested just to explain the question above)
---
git diff --stat -p -M drivers/scsi/
drivers/scsi/scsi_lib.c | 38 ++++++++++++++++++--------------------
1 files changed, 18 insertions(+), 20 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 5987da8..e748b6f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1487,26 +1487,6 @@ static void scsi_request_fn(struct request_queue *q)
continue;
}
-
- /*
- * Remove the request from the request list.
- */
- if (!(blk_queue_tagged(q) && !blk_queue_start_tag(q, req)))
- blk_start_request(req);
- sdev->device_busy++;
-
- spin_unlock(q->queue_lock);
- cmd = req->special;
- if (unlikely(cmd == NULL)) {
- printk(KERN_CRIT "impossible request in %s.\n"
- "please mail a stack trace to "
- "linux-scsi@vger.kernel.org\n",
- __func__);
- blk_dump_rq_flags(req, "foo");
- BUG();
- }
- spin_lock(shost->host_lock);
-
/*
* We hit this when the driver is using a host wide
* tag map. For device level tag maps the queue_depth check
@@ -1528,6 +1508,24 @@ static void scsi_request_fn(struct request_queue *q)
if (!scsi_host_queue_ready(q, shost, sdev))
goto not_ready;
+ /*
+ * Remove the request from the request list.
+ */
+ blk_start_request(req);
+ sdev->device_busy++;
+
+ spin_unlock(q->queue_lock);
+ cmd = req->special;
+ if (unlikely(cmd == NULL)) {
+ printk(KERN_CRIT "impossible request in %s.\n"
+ "please mail a stack trace to "
+ "linux-scsi@vger.kernel.org\n",
+ __func__);
+ blk_dump_rq_flags(req, "foo");
+ BUG();
+ }
+ spin_lock(shost->host_lock);
+
scsi_target(sdev)->target_busy++;
shost->host_busy++;
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index dd098ca..9be78a5 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -743,6 +743,9 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> */
> scsi_cmd_get_serial(host, cmd);
>
> + /* Timeout error handler can start processing cmd now */
> + cmd->eh_eflags &= ~SCSI_EH_RESET_TIMER;
> +
> if (unlikely(host->shost_state == SHOST_DEL)) {
> cmd->result = (DID_NO_CONNECT << 16);
> scsi_done(cmd);
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 1b0060b..94dca64 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -128,6 +128,14 @@ enum blk_eh_timer_return scsi_times_out(struct
> request *req)
>
> scsi_log_completion(scmd, TIMEOUT_ERROR);
>
> + /*
> + * Extend timeout if cmd has not been queued yet, otherwise error
> + * handler could complete request and free the cmd memory while
> + * the dispatch handler still uses the cmd pointer.
> + */
> + if (scmd->eh_eflags | SCSI_EH_RESET_TIMER)
> + return BLK_EH_RESET_TIMER;
> +
> if (scmd->device->host->transportt->eh_timed_out)
> rtn = scmd->device->host->transportt->eh_timed_out(scmd);
> else if (scmd->device->host->hostt->eh_timed_out)
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 5987da8..fd1afb6 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -492,11 +492,13 @@ void scsi_next_command(struct scsi_cmnd *cmd)
> {
> struct scsi_device *sdev = cmd->device;
> struct request_queue *q = sdev->request_queue;
> + struct request *req = cmd->request;
>
> /* need to hold a reference on the device before we let go of the
> cmd */
> get_device(&sdev->sdev_gendev);
>
> scsi_put_command(cmd);
> + req->special = NULL;
> scsi_run_queue(q);
>
> /* ok to remove device now */
> @@ -1491,12 +1493,21 @@ static void scsi_request_fn(struct request_queue *q)
> /*
> * Remove the request from the request list.
> */
> + cmd = req->special;
> if (!(blk_queue_tagged(q) && !blk_queue_start_tag(q, req)))
> + {
> + /*
> + * Prevent timeout error handler from processing
> + * the cmd until it is queued in scsi_dispatch_cmd,
> + * otherwise cmd pointer might be freed by error handle
> + */
> + cmd->eh_eflags |= SCSI_EH_RESET_TIMER;
> +
> blk_start_request(req);
> + }
> sdev->device_busy++;
>
> spin_unlock(q->queue_lock);
> - cmd = req->special;
> if (unlikely(cmd == NULL)) {
> printk(KERN_CRIT "impossible request in %s.\n"
> "please mail a stack trace to "
> diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
> index 1fbf7c7..4c71010 100644
> --- a/drivers/scsi/scsi_priv.h
> +++ b/drivers/scsi/scsi_priv.h
> @@ -16,6 +16,7 @@ struct scsi_nl_hdr;
> * Scsi Error Handler Flags
> */
> #define SCSI_EH_CANCEL_CMD 0x0001 /* Cancel this cmd */
> +#define SCSI_EH_RESET_TIMER 0x0002 /* Reset timer on the this cmd */
>
> #define SCSI_SENSE_VALID(scmd) \
> (((scmd)->sense_buffer[0] & 0x70) == 0x70)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2009-12-13 9:52 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-12-11 2:41 [PATCH] scsi: timeout reset timer before command is queued Min Zhang
2009-12-13 9:52 ` Boaz Harrosh [this message]
2009-12-15 1:17 ` Min Zhang
2009-12-15 2:21 ` Matthew Wilcox
2009-12-15 2:40 ` Min Zhang
2009-12-15 2:59 ` Matthew Wilcox
2009-12-15 10:40 ` Matthew Wilcox
2009-12-15 13:16 ` Boaz Harrosh
2009-12-15 20:53 ` Min Zhang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4B24B962.4070804@panasas.com \
--to=bharrosh@panasas.com \
--cc=James.Bottomley@suse.de \
--cc=linux-scsi@vger.kernel.org \
--cc=mzhang@mvista.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.