* md related oops triggered in bdev_inode_switch_bdi
@ 2011-08-31 6:22 NeilBrown
2011-09-01 3:30 ` Wu Fengguang
0 siblings, 1 reply; 5+ messages in thread
From: NeilBrown @ 2011-08-31 6:22 UTC (permalink / raw)
To: Christoph Hellwig, Hugh Dickins, Andrew Morton, Wu Fengguang; +Cc: lkml
Hi Christoph et. al.,
My testing recently triggered an oops in bdi_lock_two called from
bdev_inode_switch_bdi.
The bdi and the request_queue that contains it had been freed.
This happens with md which can free the md device and request queue
immediately after last close.
It seems that this is caused by your patch f758eeabeb96f8.
Prior to that the 'old' bdi was never dereferenced in
bdev_inode_switch_bdi. Now it is.
I think we can fix that by simply moving the call to bdev_inode_switch_bdi
before the call to ->release as in the patch below.
Do you see any problem with this patch?
Thanks,
NeilBrown
>From bcc5851cbb6876c97cce214feddd0ec092f7d71c Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Wed, 31 Aug 2011 15:37:22 +1000
Subject: [PATCH] Avoid dereferencing a 'request_queue' after last close.
On the last close of an 'md' device which as been stopped, the device
is destroyed and in particular the request_queue is freed. The free
is done in a separate thread so it might happen a short time later.
__blkdev_put calls bdev_inode_switch_bdi *after* ->release has been
called.
Since commit f758eeabeb96f878c860e8f110f94ec8820822a9
bdev_inode_switch_bdi will dereference the 'old' bdi, which lives
inside a request_queue, to get a spin lock. This causes the last
close on an md device to sometime take a spin_lock which lives in
freed memory - which results in an oops.
So move the called to bdev_inode_switch_bdi before the call to
->release.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
diff --git a/fs/block_dev.c b/fs/block_dev.c
index ff77262..d8a753f 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1430,6 +1430,12 @@ static int __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
sync_blockdev(bdev);
kill_bdev(bdev);
}
+ if (!bdev->bd_openers)
+ /* ->release can cause the old bdi to disappear,
+ * so must switch it out first
+ */
+ bdev_inode_switch_bdi(bdev->bd_inode,
+ &default_backing_dev_info);
if (bdev->bd_contains == bdev) {
if (disk->fops->release)
ret = disk->fops->release(disk, mode);
@@ -1442,8 +1448,6 @@ static int __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
disk_put_part(bdev->bd_part);
bdev->bd_part = NULL;
bdev->bd_disk = NULL;
- bdev_inode_switch_bdi(bdev->bd_inode,
- &default_backing_dev_info);
if (bdev != bdev->bd_contains)
victim = bdev->bd_contains;
bdev->bd_contains = NULL;
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: md related oops triggered in bdev_inode_switch_bdi
2011-08-31 6:22 md related oops triggered in bdev_inode_switch_bdi NeilBrown
@ 2011-09-01 3:30 ` Wu Fengguang
2011-09-01 5:49 ` NeilBrown
0 siblings, 1 reply; 5+ messages in thread
From: Wu Fengguang @ 2011-09-01 3:30 UTC (permalink / raw)
To: NeilBrown; +Cc: Christoph Hellwig, Hugh Dickins, Andrew Morton, lkml
Hi Neil,
> Subject: [PATCH] Avoid dereferencing a 'request_queue' after last close.
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
with comments below.
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1430,6 +1430,12 @@ static int __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
> sync_blockdev(bdev);
> kill_bdev(bdev);
> }
> + if (!bdev->bd_openers)
> + /* ->release can cause the old bdi to disappear,
> + * so must switch it out first
> + */
> + bdev_inode_switch_bdi(bdev->bd_inode,
> + &default_backing_dev_info);
> if (bdev->bd_contains == bdev) {
> if (disk->fops->release)
> ret = disk->fops->release(disk, mode);
The bdev_inode_switch_bdi() call can be further moved into the
previous if block, like this:
if (!--bdev->bd_openers) {
WARN_ON_ONCE(bdev->bd_holders);
sync_blockdev(bdev);
kill_bdev(bdev);
+
+ /* ->release can cause the old bdi to disappear,
+ * so must switch it out first
+ */
+ bdev_inode_switch_bdi(bdev->bd_inode,
+ &default_backing_dev_info);
}
Then it's obvious that kill_bdev() will truncate all inode pages
and there won't be further interactions with dirty writes.
Although there are dozens of disk->fops->release functions, however
it's very unlikely they need to access some inode on top of the disk
(which is illogical thing).
So I don't see any problems. It makes sense to push it to next for
broader test ASAP. Will you do it, or me?
Thanks,
Fengguang
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: md related oops triggered in bdev_inode_switch_bdi
2011-09-01 3:30 ` Wu Fengguang
@ 2011-09-01 5:49 ` NeilBrown
2011-09-09 8:56 ` Lin Ming
0 siblings, 1 reply; 5+ messages in thread
From: NeilBrown @ 2011-09-01 5:49 UTC (permalink / raw)
To: Wu Fengguang; +Cc: Christoph Hellwig, Hugh Dickins, Andrew Morton, lkml
On Thu, 1 Sep 2011 11:30:56 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote:
> Hi Neil,
>
> > Subject: [PATCH] Avoid dereferencing a 'request_queue' after last close.
>
> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Thanks.
>
> with comments below.
>
> > --- a/fs/block_dev.c
> > +++ b/fs/block_dev.c
> > @@ -1430,6 +1430,12 @@ static int __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
> > sync_blockdev(bdev);
> > kill_bdev(bdev);
> > }
> > + if (!bdev->bd_openers)
> > + /* ->release can cause the old bdi to disappear,
> > + * so must switch it out first
> > + */
> > + bdev_inode_switch_bdi(bdev->bd_inode,
> > + &default_backing_dev_info);
> > if (bdev->bd_contains == bdev) {
> > if (disk->fops->release)
> > ret = disk->fops->release(disk, mode);
>
> The bdev_inode_switch_bdi() call can be further moved into the
> previous if block, like this:
>
> if (!--bdev->bd_openers) {
> WARN_ON_ONCE(bdev->bd_holders);
> sync_blockdev(bdev);
> kill_bdev(bdev);
> +
> + /* ->release can cause the old bdi to disappear,
> + * so must switch it out first
> + */
> + bdev_inode_switch_bdi(bdev->bd_inode,
> + &default_backing_dev_info);
> }
Yes, and obvious improvement now that you have pointed it out - thanks.
>
> Then it's obvious that kill_bdev() will truncate all inode pages
> and there won't be further interactions with dirty writes.
>
> Although there are dozens of disk->fops->release functions, however
> it's very unlikely they need to access some inode on top of the disk
> (which is illogical thing).
>
> So I don't see any problems. It makes sense to push it to next for
> broader test ASAP. Will you do it, or me?
I've just push it into my for-next. If I heard nothing else by mid next week
I'll push it to Linus
Thanks,
NeilBrown
>
> Thanks,
> Fengguang
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: md related oops triggered in bdev_inode_switch_bdi
2011-09-01 5:49 ` NeilBrown
@ 2011-09-09 8:56 ` Lin Ming
2011-09-11 7:01 ` Sitsofe Wheeler
0 siblings, 1 reply; 5+ messages in thread
From: Lin Ming @ 2011-09-09 8:56 UTC (permalink / raw)
To: NeilBrown
Cc: Wu Fengguang, Christoph Hellwig, Hugh Dickins, Andrew Morton,
lkml
On Thu, Sep 1, 2011 at 1:49 PM, NeilBrown <neilb@suse.de> wrote:
> On Thu, 1 Sep 2011 11:30:56 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote:
>
>> Hi Neil,
>>
>> > Subject: [PATCH] Avoid dereferencing a 'request_queue' after last close.
>>
>> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
>
> Thanks.
>
>
>>
>> with comments below.
>>
>> > --- a/fs/block_dev.c
>> > +++ b/fs/block_dev.c
>> > @@ -1430,6 +1430,12 @@ static int __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
>> > sync_blockdev(bdev);
>> > kill_bdev(bdev);
>> > }
>> > + if (!bdev->bd_openers)
>> > + /* ->release can cause the old bdi to disappear,
>> > + * so must switch it out first
>> > + */
>> > + bdev_inode_switch_bdi(bdev->bd_inode,
>> > + &default_backing_dev_info);
>> > if (bdev->bd_contains == bdev) {
>> > if (disk->fops->release)
>> > ret = disk->fops->release(disk, mode);
>>
>> The bdev_inode_switch_bdi() call can be further moved into the
>> previous if block, like this:
>>
>> if (!--bdev->bd_openers) {
>> WARN_ON_ONCE(bdev->bd_holders);
>> sync_blockdev(bdev);
>> kill_bdev(bdev);
>> +
>> + /* ->release can cause the old bdi to disappear,
>> + * so must switch it out first
>> + */
>> + bdev_inode_switch_bdi(bdev->bd_inode,
>> + &default_backing_dev_info);
>> }
>
> Yes, and obvious improvement now that you have pointed it out - thanks.
>
>
>>
>> Then it's obvious that kill_bdev() will truncate all inode pages
>> and there won't be further interactions with dirty writes.
>>
>> Although there are dozens of disk->fops->release functions, however
>> it's very unlikely they need to access some inode on top of the disk
>> (which is illogical thing).
>>
>> So I don't see any problems. It makes sense to push it to next for
>> broader test ASAP. Will you do it, or me?
>
> I've just push it into my for-next. If I heard nothing else by mid next week
> I'll push it to Linus
Just FYI,
this patch also fixes below bug which can be reproduced by unplug the
usb disk before umount.
BUG blkdev_queue: Poison overwritten
-----------------------------------------------------------------------------
INFO: 0xffff880106640308-0xffff880106640309. First byte 0x6c instead of 0x6b
INFO: Allocated in blk_alloc_queue_node+0x1f/0x1a9 age=5794 cpu=0 pid=3449
__slab_alloc+0x503/0x56e
kmem_cache_alloc_node+0x61/0x18e
blk_alloc_queue_node+0x1f/0x1a9
blk_init_queue_node+0x24/0x61
blk_init_queue+0xc/0xe
__scsi_alloc_queue+0x21/0x145
scsi_alloc_queue+0x18/0x64
scsi_alloc_sdev+0x1bb/0x282
scsi_probe_and_add_lun+0x13b/0xb6a
__scsi_scan_target+0xc5/0x60c
scsi_scan_channel+0x58/0x80
scsi_scan_host_selected+0xed/0x136
do_scsi_scan_host+0x6b/0x70
scsi_scan_host+0x1a0/0x1c5
usb_stor_scan_thread+0x16d/0x17b
kthread+0x7d/0x85
INFO: Freed in blk_release_queue+0x60/0x65 age=2267 cpu=1 pid=3484
__slab_free+0x2c/0x30d
kmem_cache_free+0xbc/0x127
blk_release_queue+0x60/0x65
kobject_release+0x51/0x67
kref_put+0x43/0x4d
kobject_put+0x47/0x4b
blk_put_queue+0x10/0x12
scsi_device_dev_release_usercontext+0xbf/0x10a
execute_in_process_context+0x2a/0x61
scsi_device_dev_release+0x17/0x19
device_release+0x49/0x7e
kobject_release+0x51/0x67
kref_put+0x43/0x4d
kobject_put+0x47/0x4b
put_device+0x12/0x14
scsi_device_put+0x3d/0x42
Thanks,
Lin Ming
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: md related oops triggered in bdev_inode_switch_bdi
2011-09-09 8:56 ` Lin Ming
@ 2011-09-11 7:01 ` Sitsofe Wheeler
0 siblings, 0 replies; 5+ messages in thread
From: Sitsofe Wheeler @ 2011-09-11 7:01 UTC (permalink / raw)
To: Lin Ming
Cc: NeilBrown, Wu Fengguang, Christoph Hellwig, Hugh Dickins,
Andrew Morton, lkml
On Fri, Sep 09, 2011 at 04:56:30PM +0800, Lin Ming wrote:
> On Thu, Sep 1, 2011 at 1:49 PM, NeilBrown <neilb@suse.de> wrote:
> > On Thu, 1 Sep 2011 11:30:56 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote:
> >
> >> Hi Neil,
> >>
> >> > Subject: [PATCH] Avoid dereferencing a 'request_queue' after last close.
> >>
> >> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
> >
> > Thanks.
> Just FYI,
> this patch also fixes below bug which can be reproduced by unplug the
> usb disk before umount.
Agreed - this fixes the oops I was seeing pulling out a USB key with a
mounted FAT32 filesystem (https://lkml.org/lkml/2011/9/5/339 ). Thanks
to Yang RuiRui and akpm for point me to this patch.
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
--
Sitsofe | http://sucs.org/~sits/
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2011-09-11 7:01 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-08-31 6:22 md related oops triggered in bdev_inode_switch_bdi NeilBrown
2011-09-01 3:30 ` Wu Fengguang
2011-09-01 5:49 ` NeilBrown
2011-09-09 8:56 ` Lin Ming
2011-09-11 7:01 ` Sitsofe Wheeler
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox