From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Roland Dreier <roland@kernel.org>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Jens Axboe <jaxboe@fusionio.com>,
linux-scsi@vger.kernel.org,
Steffen Maier <maier@linux.vnet.ibm.com>,
"Manvanthara B. Puttashankar" <manvanth@linux.vnet.ibm.com>,
Tarak Reddy <tarak.reddy@in.ibm.com>,
"Seshagiri N. Ippili" <sesh17@linux.vnet.ibm.com>
Subject: Re: [BUG] 2.6.39.1 crash in scsi_dispatch_cmd()
Date: Fri, 08 Jul 2011 12:04:26 -0500 [thread overview]
Message-ID: <1310144667.3282.94.camel@mulgrave> (raw)
In-Reply-To: <Pine.LNX.4.44L0.1107071653010.1687-100000@iolanthe.rowland.org>
On Thu, 2011-07-07 at 17:07 -0400, Alan Stern wrote:
> On Thu, 7 Jul 2011, James Bottomley wrote:
>
> > > Perhaps this means the elevator shouldn't be freed until the queue is.
> > > I just don't know. Jens and James are the experts, but Jens hasn't
> > > said anything and James is currently busy.
> >
> > OK, back from being busy now (for a short while).
> >
> > This is what I think should fix it all (at least it fixes it for me now
> > I've finally figured out how to reproduce it).
> >
> > The nasty about this is that blk_get_request() has to return NULL even
> > if __GFP_WAIT is specified, if the queue is already dead. Lots of block
> > users don't check for NULL if they specify __GFP_WAIT.
>
> There are a few things here you might be able to explain.
>
> First, after blk_cleanup_queue() is called, whose responsibility is it
> to make sure that no more requests can be added to the queue: the block
> layer or the client (in this case the SCSI layer)? Your patch implies
> that the block layer is responsible; is this documented anywhere? Or
> if it isn't, does Jens agree?
So when I asked Jens, he agreed block. The problem we've been running
into is that the teardown state (QUEUE_FLAG_DEAD) isn't really checked
where it should be, so the early teardown SCSI now does runs into a lot
of these missing checks.
> Second, there isn't any synchronization between __scsi_remove_device()
> and scsi_prep_fn(). Isn't it still possible that a request will get
> added to the queue and processed after queuedata has been set to NULL,
> leading to another invalid memory access? Isn't a NULL check still
> needed in scsi_prep_fn()?
only in the REQ_TYPE_BLOCK_PC case. The theory is that before we tear
down the queue, we've waited for the ULD detach to occur, so none of the
ULD prep functions need check.
However, that does beg the question of why sr is using the device after
sr_remove() has completed. That seems to be because of the block ops
being the dominant structure, so we try to do cleanup when we get the
block release rather than the driver release ... that looks to be the
root cause to me.
> Third, do you recall the reason (if any) for freeing the queue in
> __scsi_remove_device() rather than
> scsi_device_dev_release_usercontext()? At that point, you can be quite
> sure no more requests will be added to the queue.
Yes, it was to get early teardown of the queue. I've forgotten the
specific bug, but it's somewhere in bugzilla. There was an oops because
the queue was still running.
James
next prev parent reply other threads:[~2011-07-08 17:04 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-06-15 11:20 [BUG] 2.6.39.1 crash in scsi_dispatch_cmd() Heiko Carstens
2011-06-16 16:01 ` Heiko Carstens
2011-06-16 16:37 ` James Bottomley
2011-06-16 18:40 ` Heiko Carstens
2011-06-20 15:30 ` Heiko Carstens
2011-07-01 18:07 ` Roland Dreier
2011-07-01 19:04 ` James Bottomley
2011-07-06 0:34 ` Roland Dreier
2011-07-06 6:47 ` Heiko Carstens
2011-07-06 8:06 ` Roland Dreier
2011-07-06 9:25 ` Heiko Carstens
2011-07-06 14:20 ` Alan Stern
2011-07-06 14:24 ` James Bottomley
2011-07-06 16:30 ` Roland Dreier
2011-07-06 16:53 ` Alan Stern
2011-07-06 18:07 ` Roland Dreier
2011-07-06 18:49 ` Alan Stern
2011-07-07 20:45 ` James Bottomley
2011-07-07 21:07 ` Alan Stern
2011-07-08 17:04 ` James Bottomley [this message]
2011-07-08 19:43 ` Alan Stern
2011-07-08 20:41 ` James Bottomley
2011-07-08 22:08 ` Alan Stern
2011-07-08 22:25 ` James Bottomley
2011-07-08 20:47 ` Roland Dreier
2011-07-08 23:04 ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Roland Dreier
2011-07-09 9:05 ` Stefan Richter
2011-07-11 22:40 ` Mike Snitzer
2011-07-12 0:52 ` Alan Stern
2011-07-12 1:22 ` Mike Snitzer
2011-07-12 1:46 ` Vivek Goyal
2011-07-12 15:24 ` Alan Stern
2011-07-12 17:10 ` Vivek Goyal
2011-07-12 14:58 ` [PATCH] dm mpath: manage reference on request queue of underlying devices Mike Snitzer
2011-07-12 17:06 ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Vivek Goyal
2011-07-12 17:41 ` James Bottomley
2011-07-12 18:02 ` Vivek Goyal
2011-07-12 18:28 ` James Bottomley
2011-07-12 18:54 ` Vivek Goyal
2011-07-12 21:02 ` Alan Stern
2011-07-12 2:09 ` Vivek Goyal
2011-07-06 16:24 ` [BUG] 2.6.39.1 crash in scsi_dispatch_cmd() Roland Dreier
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1310144667.3282.94.camel@mulgrave \
--to=james.bottomley@hansenpartnership.com \
--cc=heiko.carstens@de.ibm.com \
--cc=jaxboe@fusionio.com \
--cc=linux-scsi@vger.kernel.org \
--cc=maier@linux.vnet.ibm.com \
--cc=manvanth@linux.vnet.ibm.com \
--cc=roland@kernel.org \
--cc=sesh17@linux.vnet.ibm.com \
--cc=stern@rowland.harvard.edu \
--cc=tarak.reddy@in.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox