* RE: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] ` <Pine.LNX.4.44L0.1109171330360.18455-100000@netrider.rowland.org>
@ 2011-09-18 23:00   ` Ben Hutchings
  2011-09-20  7:32     ` Jun'ichi Nomura
  0 siblings, 1 reply; 8+ messages in thread
From: Ben Hutchings @ 2011-09-18 23:00 UTC
To: Alan Stern, James Bottomley
Cc: Rocko Requin, tytso, Kernel development list, linux-scsi

On Sat, 2011-09-17 at 13:34 -0400, Alan Stern wrote:
> On Sat, 17 Sep 2011, Rocko Requin wrote:
>
> > > Why were you using gnome-terminal? You should be running the tests at
> > > a console VT, not under X at all. Ctrl-Alt-F2 or the equivalent...
> >
> > Because with Ted's patch it doesn't crash when run from a console VT, even with an X server running.
>
> That's weird. Maybe the screen updates change some timing.
>
> > > Here's another patch to address the new problem. You can apply it on
> > > top of all the other patches.
> >
> > Attached is the crash log I get with the latest patch applied.
>
> Okay, more fallout from the same problem. Here's an updated version of
> the previous patch.
[...]

There have been reports of this in Debian going back to 2.6.39:

http://bugs.debian.org/631187
http://bugs.debian.org/636263
http://bugs.debian.org/642043

Plus possibly related crashes in elv_put_request after CD-ROM removal:

http://bugs.debian.org/633890
http://bugs.debian.org/634681
http://bugs.debian.org/636103

The former was also reported in Ubuntu since their 2.6.38-10:

https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796

The result of the discussion there was that it appeared to be a
regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
("[SCSI] put stricter guards on queue dead checks") which was also
included in a stable update for 2.6.38.

There was also a report on bugzilla.kernel.org, though no-one can see
quite what that says now:

https://bugzilla.kernel.org/show_bug.cgi?id=38842

I also reported most of the above to James Bottomley and linux-scsi
nearly 2 months ago, to no response.

Ben.

-- 
Ben Hutchings
Power corrupts. Absolute power is kind of neat.
- John Lehman, Secretary of the US Navy 1981-1987
* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
  2011-09-18 23:00   ` [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed Ben Hutchings
@ 2011-09-20  7:32     ` Jun'ichi Nomura
  2011-09-22 12:26       ` Hannes Reinecke
  0 siblings, 1 reply; 8+ messages in thread
From: Jun'ichi Nomura @ 2011-09-20 7:32 UTC
To: Ben Hutchings, jaxboe
Cc: Alan Stern, James Bottomley, Rocko Requin, tytso, Kernel development list, linux-scsi

On 09/19/11 08:00, Ben Hutchings wrote:
> On Sat, 2011-09-17 at 13:34 -0400, Alan Stern wrote:
>> On Sat, 17 Sep 2011, Rocko Requin wrote:
>>
>>>> Why were you using gnome-terminal? You should be running the tests at
>>>> a console VT, not under X at all. Ctrl-Alt-F2 or the equivalent...
>>>
>>> Because with Ted's patch it doesn't crash when run from a console VT, even with an X server running.
>>
>> That's weird. Maybe the screen updates change some timing.
>>
>>>> Here's another patch to address the new problem. You can apply it on
>>>> top of all the other patches.
>>>
>>> Attached is the crash log I get with the latest patch applied.
>>
>> Okay, more fallout from the same problem. Here's an updated version of
>> the previous patch.
> [...]
>
> There have been reports of this in Debian going back to 2.6.39:
>
> http://bugs.debian.org/631187
> http://bugs.debian.org/636263
> http://bugs.debian.org/642043
>
> Plus possibly related crashes in elv_put_request after CD-ROM removal:
>
> http://bugs.debian.org/633890
> http://bugs.debian.org/634681
> http://bugs.debian.org/636103
>
> The former was also reported in Ubuntu since their 2.6.38-10:
>
> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
>
> The result of the discussion there was that it appeared to be a
> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
> ("[SCSI] put stricter guards on queue dead checks") which was also
> included in a stable update for 2.6.38.
>
> There was also a report on bugzilla.kernel.org, though no-one can see
> quite what that says now:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=38842
>
> I also reported most of the above to James Bottomley and linux-scsi
> nearly 2 months ago, to no response.

I've reported a similar oops related to the above commit:
  [BUG] Oops when SCSI device under multipath is removed
  https://lkml.org/lkml/2011/8/10/11

Elevator being removed is the core of the problem.
And the essential issue seems 2 different models of queue/driver relation
implied by queue_lock.

If reverting the commit is not an option,
until somebody comes up to fix the essential issue,
the patch below should close the regressions introduced by the commit.

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation


This patch moves elevator_exit() and blk_throtl_exit() from
blk_cleanup_queue() to blk_release_queue() when it is possible.

elevator_exit() and blk_throtl_exit() were called in blk_cleanup_queue()
because they use queue_lock.

There are 2 types of queue_locks:
  a) supplied by driver (via blk_init_queue)
  b) embedded in struct request_queue (__queue_lock)

When queue_lock is supplied by driver, there is no guarantee that
the pointer is valid after blk_cleanup_queue(), so they have to be
called in blk_cleanup_queue(). In this case, the driver has to make
sure nobody is using the queue before calling blk_cleanup_queue().

However, OTOH, if queue_lock is '__queue_lock' in request_queue,
blk_release_queue() is better place for freeing structures because
the block layer knows for sure there is no reference.

This patch is ugly but should fix various oopses introduced by this change:
  86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
  [SCSI] put stricter guards on queue dead checks
For example:
  https://lkml.org/lkml/2011/8/10/11

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>

Index: linux-3.1-rc4/block/blk-core.c
===================================================================
--- linux-3.1-rc4.orig/block/blk-core.c	2011-08-29 13:16:01.000000000 +0900
+++ linux-3.1-rc4/block/blk-core.c	2011-09-20 15:53:23.496814819 +0900
@@ -352,6 +352,14 @@
  * unexpectedly as some queue cleanup components like elevator_exit() and
  * blk_throtl_exit() need queue lock.
  */
+void blk_release_queue_components_with_queuelock(struct request_queue *q)
+{
+	if (q->elevator)
+		elevator_exit(q->elevator);
+
+	blk_throtl_exit(q);
+}
+
 void blk_cleanup_queue(struct request_queue *q)
 {
 	/*
@@ -367,10 +375,12 @@
 	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
 	mutex_unlock(&q->sysfs_lock);
 
-	if (q->elevator)
-		elevator_exit(q->elevator);
-
-	blk_throtl_exit(q);
+	/*
+	 * A driver supplied the queue lock.
+	 * Cleanup components while the queue lock is valid.
+	 */
+	if (q->queue_lock != &q->__queue_lock)
+		blk_release_queue_components_with_queuelock(q);
 
 	blk_put_queue(q);
 }
Index: linux-3.1-rc4/block/blk-sysfs.c
===================================================================
--- linux-3.1-rc4.orig/block/blk-sysfs.c	2011-09-19 09:38:51.000000000 +0900
+++ linux-3.1-rc4/block/blk-sysfs.c	2011-09-20 15:57:50.358807023 +0900
@@ -477,6 +477,9 @@
 
 	blk_sync_queue(q);
 
+	if (q->queue_lock == &q->__queue_lock)
+		blk_release_queue_components_with_queuelock(q);
+
 	if (rl->rq_pool)
 		mempool_destroy(rl->rq_pool);
 
Index: linux-3.1-rc4/block/blk.h
===================================================================
--- linux-3.1-rc4.orig/block/blk.h	2011-08-29 13:16:01.000000000 +0900
+++ linux-3.1-rc4/block/blk.h	2011-09-20 15:57:38.306807136 +0900
@@ -25,6 +25,9 @@
 void blk_add_timer(struct request *);
 void __generic_unplug_device(struct request_queue *);
 
+/* Wrapper to release functions to be called while queue_lock is valid */
+void blk_release_queue_components_with_queuelock(struct request_queue *q);
+
 /*
  * Internal atomic flags for request handling
  */
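
The rule in the patch description above (release at cleanup time when the driver owns the lock, release at final-release time when the lock is embedded in the queue) can be sketched in user space. This is only an illustrative model, not kernel code: mock_spinlock, struct mock_queue, and every mock_* function are hypothetical stand-ins, and the components_released flag stands in for the calls to elevator_exit() and blk_throtl_exit().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for spinlock_t. */
typedef struct { int locked; } mock_spinlock;

/* Hypothetical stand-in for struct request_queue. */
struct mock_queue {
	mock_spinlock *queue_lock;	/* lock in use: driver's or embedded */
	mock_spinlock __queue_lock;	/* lock embedded in the queue itself */
	bool components_released;	/* models elevator/throttle teardown */
};

/* Pass a driver lock (case a) or NULL to use the embedded lock (case b). */
static void mock_queue_init(struct mock_queue *q, mock_spinlock *driver_lock)
{
	q->__queue_lock.locked = 0;
	q->queue_lock = driver_lock ? driver_lock : &q->__queue_lock;
	q->components_released = false;
}

static bool mock_uses_embedded_lock(const struct mock_queue *q)
{
	return q->queue_lock == &q->__queue_lock;
}

/* Models blk_release_queue_components_with_queuelock(). */
static void mock_release_components(struct mock_queue *q)
{
	q->components_released = true;
}

/* Models blk_cleanup_queue(): a driver-owned lock may vanish afterwards,
 * so tear down now while that lock is still valid. */
static void mock_cleanup_queue(struct mock_queue *q)
{
	if (!mock_uses_embedded_lock(q))
		mock_release_components(q);
}

/* Models blk_release_queue(): the last reference is gone, and the
 * embedded lock lives exactly as long as the queue does. */
static void mock_release_queue(struct mock_queue *q)
{
	if (mock_uses_embedded_lock(q))
		mock_release_components(q);
}
```

In this model a queue created with a driver lock is torn down inside mock_cleanup_queue(), while a queue using the embedded lock keeps its components until mock_release_queue() runs, which is the window the oops reports fall into.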
* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
  2011-09-20  7:32     ` Jun'ichi Nomura
@ 2011-09-22 12:26       ` Hannes Reinecke
  2011-09-22 12:35         ` James Bottomley
  2011-09-22 15:16         ` Alan Stern
  0 siblings, 2 replies; 8+ messages in thread
From: Hannes Reinecke @ 2011-09-22 12:26 UTC
To: Jun'ichi Nomura
Cc: Ben Hutchings, jaxboe, Alan Stern, James Bottomley, Rocko Requin, tytso, Kernel development list, linux-scsi

On 09/20/2011 09:32 AM, Jun'ichi Nomura wrote:
> On 09/19/11 08:00, Ben Hutchings wrote:
[ .. ]
>>
>> There have been reports of this in Debian going back to 2.6.39:
>>
>> http://bugs.debian.org/631187
>> http://bugs.debian.org/636263
>> http://bugs.debian.org/642043
>>
>> Plus possibly related crashes in elv_put_request after CD-ROM removal:
>>
>> http://bugs.debian.org/633890
>> http://bugs.debian.org/634681
>> http://bugs.debian.org/636103
>>
>> The former was also reported in Ubuntu since their 2.6.38-10:
>>
>> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
>>
>> The result of the discussion there was that it appeared to be a
>> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
>> ("[SCSI] put stricter guards on queue dead checks") which was also
>> included in a stable update for 2.6.38.
>>
>> There was also a report on bugzilla.kernel.org, though no-one can see
>> quite what that says now:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=38842
>>
>> I also reported most of the above to James Bottomley and linux-scsi
>> nearly 2 months ago, to no response.
>
> I've reported a similar oops related to the above commit:
> [BUG] Oops when SCSI device under multipath is removed
> https://lkml.org/lkml/2011/8/10/11
>
> Elevator being removed is the core of the problem.
> And the essential issue seems 2 different models of queue/driver relation
> implied by queue_lock.
>
> If reverting the commit is not an option,
> until somebody comes up to fix the essential issue,
> the patch below should close the regressions introduced by the commit.
>
Why do you have to do it that complicated?
Couldn't we just state that any external lock is being disconnected from
queue_lock after blk_cleanup_queue()?

Then something like this should suffice here:

diff --git a/block/blk-core.c b/block/blk-core.c
index 90e1ffd..a4ac005 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -367,10 +367,8 @@ void blk_cleanup_queue(struct request_queue *q)
 	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
 	mutex_unlock(&q->sysfs_lock);
 
-	if (q->elevator)
-		elevator_exit(q->elevator);
-
-	blk_throtl_exit(q);
+        if (q->queue_lock != q->__queue_lock)
+                q->queue_lock = q->__queue_lock;
 
 	blk_put_queue(q);
 }
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 0ee17b5..a5a756b 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj)
 
 	blk_sync_queue(q);
 
+        if (q->elevator)
+                elevator_exit(q->elevator);
+
+        blk_throtl_exit(q);
+
 	if (rl->rq_pool)
 		mempool_destroy(rl->rq_pool);
 

And yeah, I find it pretty annoying, too.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
  2011-09-22 12:26       ` Hannes Reinecke
@ 2011-09-22 12:35         ` James Bottomley
  2011-09-22 15:16         ` Alan Stern
  1 sibling, 0 replies; 8+ messages in thread
From: James Bottomley @ 2011-09-22 12:35 UTC
To: Hannes Reinecke
Cc: Jun'ichi Nomura, Ben Hutchings, jaxboe, Alan Stern, Rocko Requin, tytso, Kernel development list, linux-scsi

On Thu, 2011-09-22 at 14:26 +0200, Hannes Reinecke wrote:
> On 09/20/2011 09:32 AM, Jun'ichi Nomura wrote:
> > On 09/19/11 08:00, Ben Hutchings wrote:
> [ .. ]
> >>
> >> There have been reports of this in Debian going back to 2.6.39:
> >>
> >> http://bugs.debian.org/631187
> >> http://bugs.debian.org/636263
> >> http://bugs.debian.org/642043
> >>
> >> Plus possibly related crashes in elv_put_request after CD-ROM removal:
> >>
> >> http://bugs.debian.org/633890
> >> http://bugs.debian.org/634681
> >> http://bugs.debian.org/636103
> >>
> >> The former was also reported in Ubuntu since their 2.6.38-10:
> >>
> >> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
> >>
> >> The result of the discussion there was that it appeared to be a
> >> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
> >> ("[SCSI] put stricter guards on queue dead checks") which was also
> >> included in a stable update for 2.6.38.
> >>
> >> There was also a report on bugzilla.kernel.org, though no-one can see
> >> quite what that says now:
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=38842
> >>
> >> I also reported most of the above to James Bottomley and linux-scsi
> >> nearly 2 months ago, to no response.
> >
> > I've reported a similar oops related to the above commit:
> > [BUG] Oops when SCSI device under multipath is removed
> > https://lkml.org/lkml/2011/8/10/11
> >
> > Elevator being removed is the core of the problem.
> > And the essential issue seems 2 different models of queue/driver relation
> > implied by queue_lock.
> >
> > If reverting the commit is not an option,
> > until somebody comes up to fix the essential issue,
> > the patch below should close the regressions introduced by the commit.
> >
> Why do you have to do it that complicated?
> Couldn't we just state that any external lock is being disconnected from
> queue_lock after blk_cleanup_queue()?
>
> Then something like this should suffice here:
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 90e1ffd..a4ac005 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -367,10 +367,8 @@ void blk_cleanup_queue(struct request_queue *q)
>  	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
>  	mutex_unlock(&q->sysfs_lock);
>
> -	if (q->elevator)
> -		elevator_exit(q->elevator);
> -
> -	blk_throtl_exit(q);
> +        if (q->queue_lock != q->__queue_lock)
> +                q->queue_lock = q->__queue_lock;
>
>  	blk_put_queue(q);
>  }
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 0ee17b5..a5a756b 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj)
>
>  	blk_sync_queue(q);
>
> +        if (q->elevator)
> +                elevator_exit(q->elevator);
> +
> +        blk_throtl_exit(q);
> +

OK, I'll buy this one (when you fix the whitespace issue ... you have
spaces instead of tabs).

The fact that the lock check/replacement doesn't actually need any
locking is probably worthy of a comment.

James
* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
  2011-09-22 12:26       ` Hannes Reinecke
  2011-09-22 12:35         ` James Bottomley
@ 2011-09-22 15:16         ` Alan Stern
  2011-09-22 16:20           ` Thadeu Lima de Souza Cascardo
  1 sibling, 1 reply; 8+ messages in thread
From: Alan Stern @ 2011-09-22 15:16 UTC
To: Rocko Requin
Cc: Hannes Reinecke, Jun'ichi Nomura, Ben Hutchings, jaxboe, James Bottomley, tytso, Kernel development list, linux-scsi

Rocko:

Can you try testing this patch instead of all the patches I sent to
you (but keep Ted's patch)?

Alan Stern

On Thu, 22 Sep 2011, Hannes Reinecke wrote:

> On 09/20/2011 09:32 AM, Jun'ichi Nomura wrote:
> > On 09/19/11 08:00, Ben Hutchings wrote:
> [ .. ]
> >>
> >> There have been reports of this in Debian going back to 2.6.39:
> >>
> >> http://bugs.debian.org/631187
> >> http://bugs.debian.org/636263
> >> http://bugs.debian.org/642043
> >>
> >> Plus possibly related crashes in elv_put_request after CD-ROM removal:
> >>
> >> http://bugs.debian.org/633890
> >> http://bugs.debian.org/634681
> >> http://bugs.debian.org/636103
> >>
> >> The former was also reported in Ubuntu since their 2.6.38-10:
> >>
> >> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
> >>
> >> The result of the discussion there was that it appeared to be a
> >> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
> >> ("[SCSI] put stricter guards on queue dead checks") which was also
> >> included in a stable update for 2.6.38.
> >>
> >> There was also a report on bugzilla.kernel.org, though no-one can see
> >> quite what that says now:
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=38842
> >>
> >> I also reported most of the above to James Bottomley and linux-scsi
> >> nearly 2 months ago, to no response.
> >
> > I've reported a similar oops related to the above commit:
> > [BUG] Oops when SCSI device under multipath is removed
> > https://lkml.org/lkml/2011/8/10/11
> >
> > Elevator being removed is the core of the problem.
> > And the essential issue seems 2 different models of queue/driver relation
> > implied by queue_lock.
> >
> > If reverting the commit is not an option,
> > until somebody comes up to fix the essential issue,
> > the patch below should close the regressions introduced by the commit.
> >
> Why do you have to do it that complicated?
> Couldn't we just state that any external lock is being disconnected from
> queue_lock after blk_cleanup_queue()?
>
> Then something like this should suffice here:



diff --git a/block/blk-core.c b/block/blk-core.c
index 90e1ffd..a4ac005 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -367,10 +367,8 @@ void blk_cleanup_queue(struct request_queue *q)
 	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
 	mutex_unlock(&q->sysfs_lock);
 
-	if (q->elevator)
-		elevator_exit(q->elevator);
-
-	blk_throtl_exit(q);
+        if (q->queue_lock != q->__queue_lock)
+                q->queue_lock = q->__queue_lock;
 
 	blk_put_queue(q);
 }
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 0ee17b5..a5a756b 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj)
 
 	blk_sync_queue(q);
 
+        if (q->elevator)
+                elevator_exit(q->elevator);
+
+        blk_throtl_exit(q);
+
 	if (rl->rq_pool)
 		mempool_destroy(rl->rq_pool);
 


> And yeah, I find it pretty annoying, too.
>
> Cheers,
>
> Hannes
>
* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
  2011-09-22 15:16         ` Alan Stern
@ 2011-09-22 16:20           ` Thadeu Lima de Souza Cascardo
  2011-09-22 16:32             ` Hannes Reinecke
  0 siblings, 1 reply; 8+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2011-09-22 16:20 UTC
To: Alan Stern
Cc: Rocko Requin, Hannes Reinecke, Jun'ichi Nomura, Ben Hutchings, jaxboe, James Bottomley, tytso, Kernel development list, linux-scsi

On Thu, Sep 22, 2011 at 11:16:30AM -0400, Alan Stern wrote:
> Rocko:
>
> Can you try testing this patch instead of all the patches I sent to
> you (but keep Ted's patch)?
>
> Alan Stern
>
> On Thu, 22 Sep 2011, Hannes Reinecke wrote:
>
> > On 09/20/2011 09:32 AM, Jun'ichi Nomura wrote:
> > > On 09/19/11 08:00, Ben Hutchings wrote:
> > [ .. ]
> > >>
> > >> There have been reports of this in Debian going back to 2.6.39:
> > >>
> > >> http://bugs.debian.org/631187
> > >> http://bugs.debian.org/636263
> > >> http://bugs.debian.org/642043
> > >>
> > >> Plus possibly related crashes in elv_put_request after CD-ROM removal:
> > >>
> > >> http://bugs.debian.org/633890
> > >> http://bugs.debian.org/634681
> > >> http://bugs.debian.org/636103
> > >>
> > >> The former was also reported in Ubuntu since their 2.6.38-10:
> > >>
> > >> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
> > >>
> > >> The result of the discussion there was that it appeared to be a
> > >> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
> > >> ("[SCSI] put stricter guards on queue dead checks") which was also
> > >> included in a stable update for 2.6.38.
> > >>
> > >> There was also a report on bugzilla.kernel.org, though no-one can see
> > >> quite what that says now:
> > >>
> > >> https://bugzilla.kernel.org/show_bug.cgi?id=38842
> > >>
> > >> I also reported most of the above to James Bottomley and linux-scsi
> > >> nearly 2 months ago, to no response.
> > >
> > > I've reported a similar oops related to the above commit:
> > > [BUG] Oops when SCSI device under multipath is removed
> > > https://lkml.org/lkml/2011/8/10/11
> > >
> > > Elevator being removed is the core of the problem.
> > > And the essential issue seems 2 different models of queue/driver relation
> > > implied by queue_lock.
> > >
> > > If reverting the commit is not an option,
> > > until somebody comes up to fix the essential issue,
> > > the patch below should close the regressions introduced by the commit.
> > >
> > Why do you have to do it that complicated?
> > Couldn't we just state that any external lock is being disconnected from
> > queue_lock after blk_cleanup_queue()?
> >
> > Then something like this should suffice here:
>
>
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 90e1ffd..a4ac005 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -367,10 +367,8 @@ void blk_cleanup_queue(struct request_queue *q)
>  	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
>  	mutex_unlock(&q->sysfs_lock);
>
> -	if (q->elevator)
> -		elevator_exit(q->elevator);
> -
> -	blk_throtl_exit(q);
> +        if (q->queue_lock != q->__queue_lock)
> +                q->queue_lock = q->__queue_lock;

That should be &q->__queue_lock.

Regards,
Cascardo.

>
>  	blk_put_queue(q);
>  }
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 0ee17b5..a5a756b 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj)
>
>  	blk_sync_queue(q);
>
> +        if (q->elevator)
> +                elevator_exit(q->elevator);
> +
> +        blk_throtl_exit(q);
> +
>  	if (rl->rq_pool)
>  		mempool_destroy(rl->rq_pool);
>
>
>
> > And yeah, I find it pretty annoying, too.
> >
> > Cheers,
> >
> > Hannes
> >
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
  2011-09-22 16:20           ` Thadeu Lima de Souza Cascardo
@ 2011-09-22 16:32             ` Hannes Reinecke
  0 siblings, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2011-09-22 16:32 UTC
To: Thadeu Lima de Souza Cascardo
Cc: Alan Stern, Rocko Requin, Jun'ichi Nomura, Ben Hutchings, jaxboe, James Bottomley, tytso, Kernel development list, linux-scsi

On 09/22/2011 06:20 PM, Thadeu Lima de Souza Cascardo wrote:
> On Thu, Sep 22, 2011 at 11:16:30AM -0400, Alan Stern wrote:
>> Rocko:
>>
>> Can you try testing this patch instead of all the patches I sent to
>> you (but keep Ted's patch)?
>>
>> Alan Stern
>>
>> On Thu, 22 Sep 2011, Hannes Reinecke wrote:
>>
>>> On 09/20/2011 09:32 AM, Jun'ichi Nomura wrote:
>>>> On 09/19/11 08:00, Ben Hutchings wrote:
>>> [ .. ]
>>>>>
>>>>> There have been reports of this in Debian going back to 2.6.39:
>>>>>
>>>>> http://bugs.debian.org/631187
>>>>> http://bugs.debian.org/636263
>>>>> http://bugs.debian.org/642043
>>>>>
>>>>> Plus possibly related crashes in elv_put_request after CD-ROM removal:
>>>>>
>>>>> http://bugs.debian.org/633890
>>>>> http://bugs.debian.org/634681
>>>>> http://bugs.debian.org/636103
>>>>>
>>>>> The former was also reported in Ubuntu since their 2.6.38-10:
>>>>>
>>>>> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
>>>>>
>>>>> The result of the discussion there was that it appeared to be a
>>>>> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
>>>>> ("[SCSI] put stricter guards on queue dead checks") which was also
>>>>> included in a stable update for 2.6.38.
>>>>>
>>>>> There was also a report on bugzilla.kernel.org, though no-one can see
>>>>> quite what that says now:
>>>>>
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=38842
>>>>>
>>>>> I also reported most of the above to James Bottomley and linux-scsi
>>>>> nearly 2 months ago, to no response.
>>>>
>>>> I've reported a similar oops related to the above commit:
>>>> [BUG] Oops when SCSI device under multipath is removed
>>>> https://lkml.org/lkml/2011/8/10/11
>>>>
>>>> Elevator being removed is the core of the problem.
>>>> And the essential issue seems 2 different models of queue/driver relation
>>>> implied by queue_lock.
>>>>
>>>> If reverting the commit is not an option,
>>>> until somebody comes up to fix the essential issue,
>>>> the patch below should close the regressions introduced by the commit.
>>>>
>>> Why do you have to do it that complicated?
>>> Couldn't we just state that any external lock is being disconnected from
>>> queue_lock after blk_cleanup_queue()?
>>>
>>> Then something like this should suffice here:
>>
>>
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 90e1ffd..a4ac005 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -367,10 +367,8 @@ void blk_cleanup_queue(struct request_queue *q)
>>  	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
>>  	mutex_unlock(&q->sysfs_lock);
>>
>> -	if (q->elevator)
>> -		elevator_exit(q->elevator);
>> -
>> -	blk_throtl_exit(q);
>> +        if (q->queue_lock != q->__queue_lock)
>> +                q->queue_lock = q->__queue_lock;
>
> That should be &q->__queue_lock.
>
Why, but of course.

It's been fixed with the official patch
(cf block: Free queue resources at blk_release_queue())

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
[parent not found: <BAY151-W13DDCCEFEB7B68EE506214A10C0@phx.gbl>]
* RE: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <BAY151-W13DDCCEFEB7B68EE506214A10C0@phx.gbl>
@ 2011-09-23 15:18 ` Alan Stern
  0 siblings, 0 replies; 8+ messages in thread
From: Alan Stern @ 2011-09-23 15:18 UTC
To: Rocko Requin
Cc: hare, j-nomura, ben, jaxboe, james.bottomley, tytso, linux-kernel, linux-scsi

On Thu, 22 Sep 2011, Rocko Requin wrote:

> > Rocko:
> >
> > Can you try testing this patch instead of all the patches I sent to
> > you (but keep Ted's patch)?
> >
> > Alan Stern
> >
>
> The simpler patch (in conjunction with Ted's patch) does stop the
> crashes. I get the same results as before: no kernel crashes
> (marvellous!), but the script's attempt to umount fails. I can then
> manually umount afterwards.

That sounds like a problem in the ext4 unmount implementation. Ted
should be able to help track it down.

What happens if you change your script to try two unmounts in a row?
In theory, the second should work like your manual unmount.

> Are these patches likely to be backported to the 3.0 kernel?

Yes, I should think so. The ext4/ext3 patches may be ported even
farther back.

Alan Stern
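
For readers following the thread, the idea behind the simpler patch tested above, with the &q->__queue_lock correction from earlier in the thread applied, can be modelled in user space as below. This is a sketch under stated assumptions, not the official patch: sketch_lock, struct sketch_queue, and the sketch_* functions are hypothetical names, and setting components_released stands in for elevator_exit() and blk_throtl_exit().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for spinlock_t. */
typedef struct { int held; } sketch_lock;

/* Hypothetical stand-in for struct request_queue. */
struct sketch_queue {
	sketch_lock *queue_lock;	/* pointer: may refer to a driver's lock */
	sketch_lock __queue_lock;	/* object embedded in the queue */
	bool dead;
	bool components_released;
};

static void sketch_lock_acquire(sketch_lock *l) { l->held = 1; }
static void sketch_lock_release(sketch_lock *l) { l->held = 0; }

/* Models blk_cleanup_queue(): mark the queue dead and disconnect any
 * external lock. queue_lock is a pointer and __queue_lock is an object,
 * so the comparison and the assignment both need its address. */
static void sketch_cleanup_queue(struct sketch_queue *q)
{
	q->dead = true;
	if (q->queue_lock != &q->__queue_lock)
		q->queue_lock = &q->__queue_lock;
}

/* Models blk_release_queue(): teardown always happens here, under a
 * lock that is guaranteed to live as long as the queue. */
static void sketch_release_queue(struct sketch_queue *q)
{
	sketch_lock_acquire(q->queue_lock);
	q->components_released = true;
	sketch_lock_release(q->queue_lock);
}
```

After sketch_cleanup_queue() returns, the model never dereferences the driver's lock again, so the driver is free to destroy it before the last queue reference is dropped.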
Thread overview: 8+ messages
[not found] <BAY151-W32DCB4BAFEC97DD4913A12A1090@phx.gbl>
[not found] ` <Pine.LNX.4.44L0.1109171330360.18455-100000@netrider.rowland.org>
2011-09-18 23:00 ` [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed Ben Hutchings
2011-09-20 7:32 ` Jun'ichi Nomura
2011-09-22 12:26 ` Hannes Reinecke
2011-09-22 12:35 ` James Bottomley
2011-09-22 15:16 ` Alan Stern
2011-09-22 16:20 ` Thadeu Lima de Souza Cascardo
2011-09-22 16:32 ` Hannes Reinecke
[not found] <BAY151-W13DDCCEFEB7B68EE506214A10C0@phx.gbl>
2011-09-23 15:18 ` Alan Stern