From: Boaz Harrosh <bharrosh@panasas.com>
To: Min Zhang <mzhang@mvista.com>, Tejun Heo <tj@kernel.org>,
James Bottomley <James.Bottomley@suse.de>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [PATCH] scsi: timeout reset timer before command is queued
Date: Sun, 13 Dec 2009 11:52:34 +0200 [thread overview]
Message-ID: <4B24B962.4070804@panasas.com> (raw)
In-Reply-To: <4B21B143.0@mvista.com>
On 12/11/2009 04:41 AM, Min Zhang wrote:
> Kernel panic of NULL pointer in scsi_dispatch_cmd during fiber channel
> scsi disk access. It is because scsi command can time out even before
> dispatcher queues it to lower level driver, then scsi_times_out error
> handler could complete the request and free the command memory while
> scsi_dispatch_cmd still uses the command pointer.
>
> This patch adds new SCSI_EH_RESET_TIMER flag per command to indicate the
> command hasn't been queued, so scsi_times_out won't complete and free
> the command. req->special command pointer is also NULLed when the
> command is freed.
>
> This premature time out is rare, mostly triggered by occasional network
> delay for fibre channel scsi disk. The patch is verified by
> instrumenting a testing delay to force scsi_times_out and then
> monitoring the command pointer integrity.
>
> Signed-off-by: Min Zhang <mzhang@mvista.com>
>
OK I stare at the existing code hard and I do not understand why we
ask about (blk_queue_tagged(q) && !blk_queue_start_tag(q, req)) twice?
And why we let in a failing chance between the blk_start and the dispatch?
I mean why not do the below?
This should also solve Min's problem by not letting any failure between
blk_start_request and scsi_dispatch_cmd.
(Only compile tested just to explain the question above)
---
git diff --stat -p -M drivers/scsi/
drivers/scsi/scsi_lib.c | 38 ++++++++++++++++++--------------------
1 files changed, 18 insertions(+), 20 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 5987da8..e748b6f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1487,26 +1487,6 @@ static void scsi_request_fn(struct request_queue *q)
continue;
}
-
- /*
- * Remove the request from the request list.
- */
- if (!(blk_queue_tagged(q) && !blk_queue_start_tag(q, req)))
- blk_start_request(req);
- sdev->device_busy++;
-
- spin_unlock(q->queue_lock);
- cmd = req->special;
- if (unlikely(cmd == NULL)) {
- printk(KERN_CRIT "impossible request in %s.\n"
- "please mail a stack trace to "
- "linux-scsi@vger.kernel.org\n",
- __func__);
- blk_dump_rq_flags(req, "foo");
- BUG();
- }
- spin_lock(shost->host_lock);
-
/*
* We hit this when the driver is using a host wide
* tag map. For device level tag maps the queue_depth check
@@ -1528,6 +1508,24 @@ static void scsi_request_fn(struct request_queue *q)
if (!scsi_host_queue_ready(q, shost, sdev))
goto not_ready;
+ /*
+ * Remove the request from the request list.
+ */
+ blk_start_request(req);
+ sdev->device_busy++;
+
+ spin_unlock(q->queue_lock);
+ cmd = req->special;
+ if (unlikely(cmd == NULL)) {
+ printk(KERN_CRIT "impossible request in %s.\n"
+ "please mail a stack trace to "
+ "linux-scsi@vger.kernel.org\n",
+ __func__);
+ blk_dump_rq_flags(req, "foo");
+ BUG();
+ }
+ spin_lock(shost->host_lock);
+
scsi_target(sdev)->target_busy++;
shost->host_busy++;
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index dd098ca..9be78a5 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -743,6 +743,9 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> */
> scsi_cmd_get_serial(host, cmd);
>
> + /* Timeout error handler can start processing cmd now */
> + cmd->eh_eflags &= ~SCSI_EH_RESET_TIMER;
> +
> if (unlikely(host->shost_state == SHOST_DEL)) {
> cmd->result = (DID_NO_CONNECT << 16);
> scsi_done(cmd);
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 1b0060b..94dca64 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -128,6 +128,14 @@ enum blk_eh_timer_return scsi_times_out(struct
> request *req)
>
> scsi_log_completion(scmd, TIMEOUT_ERROR);
>
> + /*
> + * Extend timeout if cmd has not been queued yet, otherwise error
> + * handler could complete request and free the cmd memory while
> + * the dispatch handler still uses the cmd pointer.
> + */
> + if (scmd->eh_eflags | SCSI_EH_RESET_TIMER)
> + return BLK_EH_RESET_TIMER;
> +
> if (scmd->device->host->transportt->eh_timed_out)
> rtn = scmd->device->host->transportt->eh_timed_out(scmd);
> else if (scmd->device->host->hostt->eh_timed_out)
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 5987da8..fd1afb6 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -492,11 +492,13 @@ void scsi_next_command(struct scsi_cmnd *cmd)
> {
> struct scsi_device *sdev = cmd->device;
> struct request_queue *q = sdev->request_queue;
> + struct request *req = cmd->request;
>
> /* need to hold a reference on the device before we let go of the
> cmd */
> get_device(&sdev->sdev_gendev);
>
> scsi_put_command(cmd);
> + req->special = NULL;
> scsi_run_queue(q);
>
> /* ok to remove device now */
> @@ -1491,12 +1493,21 @@ static void scsi_request_fn(struct request_queue *q)
> /*
> * Remove the request from the request list.
> */
> + cmd = req->special;
> if (!(blk_queue_tagged(q) && !blk_queue_start_tag(q, req)))
> + {
> + /*
> + * Prevent timeout error handler from processing
> + * the cmd until it is queued in scsi_dispatch_cmd,
> + * otherwise cmd pointer might be freed by error handle
> + */
> + cmd->eh_eflags |= SCSI_EH_RESET_TIMER;
> +
> blk_start_request(req);
> + }
> sdev->device_busy++;
>
> spin_unlock(q->queue_lock);
> - cmd = req->special;
> if (unlikely(cmd == NULL)) {
> printk(KERN_CRIT "impossible request in %s.\n"
> "please mail a stack trace to "
> diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
> index 1fbf7c7..4c71010 100644
> --- a/drivers/scsi/scsi_priv.h
> +++ b/drivers/scsi/scsi_priv.h
> @@ -16,6 +16,7 @@ struct scsi_nl_hdr;
> * Scsi Error Handler Flags
> */
> #define SCSI_EH_CANCEL_CMD 0x0001 /* Cancel this cmd */
> +#define SCSI_EH_RESET_TIMER 0x0002 /* Reset timer on the this cmd */
>
> #define SCSI_SENSE_VALID(scmd) \
> (((scmd)->sense_buffer[0] & 0x70) == 0x70)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2009-12-13 9:52 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-12-11 2:41 [PATCH] scsi: timeout reset timer before command is queued Min Zhang
2009-12-13 9:52 ` Boaz Harrosh [this message]
2009-12-15 1:17 ` Min Zhang
2009-12-15 2:21 ` Matthew Wilcox
2009-12-15 2:40 ` Min Zhang
2009-12-15 2:59 ` Matthew Wilcox
2009-12-15 10:40 ` Matthew Wilcox
2009-12-15 13:16 ` Boaz Harrosh
2009-12-15 20:53 ` Min Zhang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4B24B962.4070804@panasas.com \
--to=bharrosh@panasas.com \
--cc=James.Bottomley@suse.de \
--cc=linux-scsi@vger.kernel.org \
--cc=mzhang@mvista.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox