From: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>
To: Ben Hutchings <ben@decadent.org.uk>, jaxboe@fusionio.com
Cc: Alan Stern <stern@rowland.harvard.edu>,
James Bottomley <James.Bottomley@HansenPartnership.com>,
Rocko Requin <rockorequin@hotmail.com>,
tytso@mit.edu,
Kernel development list <linux-kernel@vger.kernel.org>,
linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
Date: Tue, 20 Sep 2011 16:32:52 +0900 [thread overview]
Message-ID: <4E7841A4.8040104@ce.jp.nec.com> (raw)
In-Reply-To: <1316386825.14749.207.camel@deadeye>
On 09/19/11 08:00, Ben Hutchings wrote:
> On Sat, 2011-09-17 at 13:34 -0400, Alan Stern wrote:
>> On Sat, 17 Sep 2011, Rocko Requin wrote:
>>
>>>> Why were you using gnome-terminal? You should be running the tests at
>>>> a console VT, not under X at all. Ctrl-Alt-F2 or the equivalent...
>>>
>>> Because with Ted's patch it doesn't crash when run from a console VT, even with an X server running.
>>
>> That's weird. Maybe the screen updates change some timing.
>>
>>>> Here's another patch to address the new problem. You can apply it on
>>>> top of all the other patches.
>>>
>>> Attached is the crash log I get with the latest patch applied.
>>
>> Okay, more fallout from the same problem. Here's an updated version of
>> the previous patch.
> [...]
>
> There have been reports of this in Debian going back to 2.6.39:
>
> http://bugs.debian.org/631187
> http://bugs.debian.org/636263
> http://bugs.debian.org/642043
>
> Plus possibly related crashes in elv_put_request after CD-ROM removal:
>
> http://bugs.debian.org/633890
> http://bugs.debian.org/634681
> http://bugs.debian.org/636103
>
> The former was also reported in Ubuntu since their 2.6.38-10:
>
> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
>
> The result of the discussion there was that it appeared to be a
> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
> ("[SCSI] put stricter guards on queue dead checks") which was also
> included in a stable update for 2.6.38.
>
> There was also a report on bugzilla.kernel.org, though no-one can see
> quite what that says now:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=38842
>
> I also reported most of the above to James Bottomley and linux-scsi
> nearly 2 months ago, to no response.
I've reported a similar oops related to the above commit:
[BUG] Oops when SCSI device under multipath is removed
https://lkml.org/lkml/2011/8/10/11
Elevator being removed is the core of the problem.
And the essential issue seems 2 different models of queue/driver relation
implied by queue_lock.
If reverting the commit is not an option,
until somebody comes up to fix the essential issue,
the patch below should close the regressions introduced by the commit.
Thanks,
--
Jun'ichi Nomura, NEC Corporation
This patch moves elevator_exit() and blk_throtl_exit() from
blk_cleanup_queue() to blk_release_queue() when it is possible.
elevator_exit() and blk_throtl_exit() were called in blk_cleanup_queue()
because they use queue_lock.
There are 2 types of queue_locks:
a) supplied by driver (via blk_init_queue)
b) embedded in struct request_queue (__queue_lock)
When queue_lock is supplied by driver, there is no guarantee that
the pointer is valid after blk_cleanup_queue(), so they have to be
called in blk_cleanup_queue().
In this case, the driver has to make sure nobody is using the queue
before calling blk_cleanup_queue().
However, OTOH, if queue_lock is '__queue_lock' in request_queue,
blk_release_queue() is better place for freeing structures
because the block layer knows for sure there is no reference.
This patch is ugly but should fix various oopses introduced by this change:
86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
[SCSI] put stricter guards on queue dead checks
For example:
https://lkml.org/lkml/2011/8/10/11
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Index: linux-3.1-rc4/block/blk-core.c
===================================================================
--- linux-3.1-rc4.orig/block/blk-core.c 2011-08-29 13:16:01.000000000 +0900
+++ linux-3.1-rc4/block/blk-core.c 2011-09-20 15:53:23.496814819 +0900
@@ -352,6 +352,14 @@
* unexpectedly as some queue cleanup components like elevator_exit() and
* blk_throtl_exit() need queue lock.
*/
+void blk_release_queue_components_with_queuelock(struct request_queue *q)
+{
+ if (q->elevator)
+ elevator_exit(q->elevator);
+
+ blk_throtl_exit(q);
+}
+
void blk_cleanup_queue(struct request_queue *q)
{
/*
@@ -367,10 +375,12 @@
queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
mutex_unlock(&q->sysfs_lock);
- if (q->elevator)
- elevator_exit(q->elevator);
-
- blk_throtl_exit(q);
+ /*
+ * A driver supplied the queue lock.
+ * Cleanup components while the queue lock is valid.
+ */
+ if (q->queue_lock != &q->__queue_lock)
+ blk_release_queue_components_with_queuelock(q);
blk_put_queue(q);
}
Index: linux-3.1-rc4/block/blk-sysfs.c
===================================================================
--- linux-3.1-rc4.orig/block/blk-sysfs.c 2011-09-19 09:38:51.000000000 +0900
+++ linux-3.1-rc4/block/blk-sysfs.c 2011-09-20 15:57:50.358807023 +0900
@@ -477,6 +477,9 @@
blk_sync_queue(q);
+ if (q->queue_lock == &q->__queue_lock)
+ blk_release_queue_components_with_queuelock(q);
+
if (rl->rq_pool)
mempool_destroy(rl->rq_pool);
Index: linux-3.1-rc4/block/blk.h
===================================================================
--- linux-3.1-rc4.orig/block/blk.h 2011-08-29 13:16:01.000000000 +0900
+++ linux-3.1-rc4/block/blk.h 2011-09-20 15:57:38.306807136 +0900
@@ -25,6 +25,9 @@
void blk_add_timer(struct request *);
void __generic_unplug_device(struct request_queue *);
+/* Wrapper to release functions to be called while queue_lock is valid */
+void blk_release_queue_components_with_queuelock(struct request_queue *q);
+
/*
* Internal atomic flags for request handling
*/
next prev parent reply other threads:[~2011-09-20 7:34 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <BAY151-W32DCB4BAFEC97DD4913A12A1090@phx.gbl>
2011-09-17 17:34 ` [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed Alan Stern
2011-09-18 23:00 ` Ben Hutchings
2011-09-20 7:32 ` Jun'ichi Nomura [this message]
2011-09-22 12:26 ` Hannes Reinecke
2011-09-22 12:26 ` Hannes Reinecke
2011-09-22 12:35 ` James Bottomley
2011-09-22 15:16 ` Alan Stern
2011-09-22 15:16 ` Alan Stern
2011-09-22 16:20 ` Thadeu Lima de Souza Cascardo
2011-09-22 16:32 ` Hannes Reinecke
2011-09-22 16:32 ` Hannes Reinecke
[not found] <BAY151-W13DDCCEFEB7B68EE506214A10C0@phx.gbl>
2011-09-23 15:18 ` Alan Stern
2011-09-23 15:18 ` Alan Stern
[not found] <BAY151-W234D9A977DF076A732C2AAA1080@phx.gbl>
2011-09-18 14:43 ` Alan Stern
[not found] <BAY151-W1224E6C1A20D179965A149A1090@phx.gbl>
2011-09-17 13:21 ` Alan Stern
[not found] <BAY151-W3498E8491E671BDAE90421A1070@phx.gbl>
2011-09-16 16:28 ` Alan Stern
[not found] <bug-25832-13602@https.bugzilla.kernel.org/>
2011-04-22 13:42 ` bugzilla-daemon
2011-04-22 15:00 ` bugzilla-daemon
2011-04-23 0:32 ` bugzilla-daemon
2011-04-23 4:12 ` bugzilla-daemon
2011-04-23 19:31 ` bugzilla-daemon
2011-04-24 1:35 ` bugzilla-daemon
2011-04-25 0:36 ` bugzilla-daemon
2011-04-25 0:37 ` bugzilla-daemon
2011-04-25 0:39 ` bugzilla-daemon
2011-04-25 20:28 ` bugzilla-daemon
2011-04-26 0:28 ` bugzilla-daemon
2011-04-26 0:44 ` bugzilla-daemon
2011-04-26 1:22 ` bugzilla-daemon
2011-04-26 3:29 ` bugzilla-daemon
2011-04-26 4:02 ` bugzilla-daemon
2011-04-26 18:15 ` bugzilla-daemon
2011-05-03 2:19 ` bugzilla-daemon
2011-05-04 7:36 ` bugzilla-daemon
2011-05-10 23:27 ` bugzilla-daemon
2011-05-26 6:44 ` bugzilla-daemon
2011-05-26 14:27 ` bugzilla-daemon
2011-07-13 7:52 ` bugzilla-daemon
2011-08-31 5:00 ` bugzilla-daemon
2011-08-31 5:07 ` bugzilla-daemon
2011-08-31 14:36 ` bugzilla-daemon
2011-08-31 23:43 ` bugzilla-daemon
2011-09-01 1:30 ` bugzilla-daemon
2011-09-04 3:53 ` bugzilla-daemon
2011-09-04 13:55 ` bugzilla-daemon
2011-09-04 14:00 ` bugzilla-daemon
2011-09-05 17:44 ` bugzilla-daemon
2011-09-09 19:13 ` Ted Ts'o
2011-09-09 22:10 ` Alan Stern
[not found] ` <BAY151-W6176D929049AA9E2BDBAEBA1000@phx.gbl>
2011-09-10 14:06 ` Ted Ts'o
2011-09-10 18:07 ` Alan Stern
2011-09-12 1:58 ` Alan Stern
2012-07-02 13:24 ` bugzilla-daemon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4E7841A4.8040104@ce.jp.nec.com \
--to=j-nomura@ce.jp.nec.com \
--cc=James.Bottomley@HansenPartnership.com \
--cc=ben@decadent.org.uk \
--cc=jaxboe@fusionio.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=rockorequin@hotmail.com \
--cc=stern@rowland.harvard.edu \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.