From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Roland Dreier <roland@kernel.org>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Jens Axboe <jaxboe@fusionio.com>,
linux-scsi@vger.kernel.org,
Steffen Maier <maier@linux.vnet.ibm.com>,
"Manvanthara B. Puttashankar" <manvanth@linux.vnet.ibm.com>,
Tarak Reddy <tarak.reddy@in.ibm.com>,
"Seshagiri N. Ippili" <sesh17@linux.vnet.ibm.com>
Subject: Re: [BUG] 2.6.39.1 crash in scsi_dispatch_cmd()
Date: Thu, 07 Jul 2011 15:45:40 -0500 [thread overview]
Message-ID: <1310071540.3282.71.camel@mulgrave> (raw)
In-Reply-To: <Pine.LNX.4.44L0.1107061432340.1995-100000@iolanthe.rowland.org>
On Wed, 2011-07-06 at 14:49 -0400, Alan Stern wrote:
> On Wed, 6 Jul 2011, Roland Dreier wrote:
>
> > On Wed, Jul 6, 2011 at 9:53 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> > > He probably meant blk_execute_rq_nowait(). The test has to be done
> > > before the elevator is accessed.
> >
> > Hmm, seems we would need the test in multiple places, since my second call
> > trace is io_schedule -> blk_flush_plug_list -> queue_unplugged ->
> > __blk_run_queue
> >
> > So I don't think I hit blk_execute_rq_nowait in my crash.
> >
> > But maybe the problem is that dm-multipath is trying to requeue the IO to an
> > underlying sdX device that is already dead?
>
> I'm not at all familiar with the block layer. It seems that the check
> for a dead queue would have to be made on every path that ends up
> calling the elevator, which would be a difficult sort of thing to
> enforce.
>
> I'm not too sure about James's comment:
>
> > Moving the
> > queue free is wrong ... it recently moved to fix another oops.
>
> Apparently this refers to commit
> e73e079bf128d68284efedeba1fbbc18d78610f9 ([SCSI] Fix oops caused by
> queue refcounting failure). In fact that commit does _not_ move the
> call to scsi_free_queue(). Instead it merely takes another reference
> to the queue, so that scsi_free_queue() doesn't actually deallocate the
> queue. But it does still deallocate the elevator.
Not that one, 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
> Perhaps this means the elevator shouldn't be freed until the queue is.
> I just don't know. Jens and James are the experts, but Jens hasn't
> said anything and James is currently busy.
OK, back from being busy now (for a short while).
This is what I think should fix it all (at least it fixes it for me now
I've finally figured out how to reproduce it).
The nasty about this is that blk_get_request() has to return NULL even
if __GFP_WAIT is specified, if the queue is already dead. Lots of block
users don't check for NULL if they specify __GFP_WAIT.
James
---
diff --git a/block/blk-core.c b/block/blk-core.c
index d2f8f40..1d49e1c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -839,6 +839,9 @@ struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
{
struct request *rq;
+ if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)))
+ return NULL;
+
BUG_ON(rw != READ && rw != WRITE);
spin_lock_irq(q->queue_lock);
diff --git a/block/blk-exec.c b/block/blk-exec.c
index 8a0e7ec..a1ebceb 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -50,6 +50,13 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
{
int where = at_head ? ELEVATOR_INSERT_FRONT : ELEVATOR_INSERT_BACK;
+ if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags))) {
+ rq->errors = -ENXIO;
+ if (rq->end_io)
+ rq->end_io(rq, rq->errors);
+ return;
+ }
+
rq->rq_disk = bd_disk;
rq->end_io = done;
WARN_ON(irqs_disabled());
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ec1803a..28d9c9d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -213,6 +213,8 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
int ret = DRIVER_ERROR << 24;
req = blk_get_request(sdev->request_queue, write, __GFP_WAIT);
+ if (!req)
+ return ret;
if (bufflen && blk_rq_map_kern(sdev->request_queue, req,
buffer, bufflen, __GFP_WAIT))
next prev parent reply other threads:[~2011-07-07 20:45 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-06-15 11:20 [BUG] 2.6.39.1 crash in scsi_dispatch_cmd() Heiko Carstens
2011-06-16 16:01 ` Heiko Carstens
2011-06-16 16:37 ` James Bottomley
2011-06-16 18:40 ` Heiko Carstens
2011-06-20 15:30 ` Heiko Carstens
2011-07-01 18:07 ` Roland Dreier
2011-07-01 19:04 ` James Bottomley
2011-07-06 0:34 ` Roland Dreier
2011-07-06 6:47 ` Heiko Carstens
2011-07-06 8:06 ` Roland Dreier
2011-07-06 9:25 ` Heiko Carstens
2011-07-06 14:20 ` Alan Stern
2011-07-06 14:24 ` James Bottomley
2011-07-06 16:30 ` Roland Dreier
2011-07-06 16:53 ` Alan Stern
2011-07-06 18:07 ` Roland Dreier
2011-07-06 18:49 ` Alan Stern
2011-07-07 20:45 ` James Bottomley [this message]
2011-07-07 21:07 ` Alan Stern
2011-07-08 17:04 ` James Bottomley
2011-07-08 19:43 ` Alan Stern
2011-07-08 20:41 ` James Bottomley
2011-07-08 22:08 ` Alan Stern
2011-07-08 22:25 ` James Bottomley
2011-07-08 20:47 ` Roland Dreier
2011-07-08 23:04 ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Roland Dreier
2011-07-09 9:05 ` Stefan Richter
2011-07-11 22:40 ` Mike Snitzer
2011-07-12 0:52 ` Alan Stern
2011-07-12 0:52 ` Alan Stern
2011-07-12 1:22 ` Mike Snitzer
2011-07-12 1:46 ` Vivek Goyal
2011-07-12 15:24 ` Alan Stern
2011-07-12 15:24 ` Alan Stern
2011-07-12 17:10 ` Vivek Goyal
2011-07-12 14:58 ` [PATCH] dm mpath: manage reference on request queue of underlying devices Mike Snitzer
2011-07-12 17:06 ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Vivek Goyal
2011-07-12 17:41 ` James Bottomley
2011-07-12 18:02 ` Vivek Goyal
2011-07-12 18:28 ` James Bottomley
2011-07-12 18:54 ` Vivek Goyal
2011-07-12 18:54 ` Vivek Goyal
2011-07-12 21:02 ` Alan Stern
2011-07-12 21:02 ` Alan Stern
2011-07-12 2:09 ` Vivek Goyal
2011-07-06 16:24 ` [BUG] 2.6.39.1 crash in scsi_dispatch_cmd() Roland Dreier
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1310071540.3282.71.camel@mulgrave \
--to=james.bottomley@hansenpartnership.com \
--cc=heiko.carstens@de.ibm.com \
--cc=jaxboe@fusionio.com \
--cc=linux-scsi@vger.kernel.org \
--cc=maier@linux.vnet.ibm.com \
--cc=manvanth@linux.vnet.ibm.com \
--cc=roland@kernel.org \
--cc=sesh17@linux.vnet.ibm.com \
--cc=stern@rowland.harvard.edu \
--cc=tarak.reddy@in.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.