From: Jan Kara <jack@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Wei Fang <fangwei1@huawei.com>,
jack@suse.cz, hannes@cmpxchg.org, hch@infradead.org,
linux-mm@kvack.org, stable@vger.kernel.org,
Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH] mm: Fix a NULL dereference crash while accessing bdev->bd_disk
Date: Wed, 30 Nov 2016 08:30:45 +0100
Message-ID: <20161130073045.GA16667@quack2.suse.cz>
In-Reply-To: <20161129150828.e0a4897160b9ee7301e5f554@linux-foundation.org>

On Tue 29-11-16 15:08:28, Andrew Morton wrote:
> On Sat, 26 Nov 2016 10:06:22 +0800 Wei Fang <fangwei1@huawei.com> wrote:
>
> > ->bd_disk is assigned to NULL in __blkdev_put() when no one is holding
> > the bdev. After that, ->bd_inode still can be touched in the
> > blockdev_superblock->s_inodes list before the final iput. So iterate_bdevs()
> > can still get this inode, and start writeback on mapping dirty pages.
> > ->bd_disk will be dereferenced in mapping_cap_writeback_dirty() in this
> > case, and a NULL dereference crash will be triggered:
> >
> > Unable to handle kernel NULL pointer dereference at virtual address 00000388
> > ...
> > [<ffff8000004cb1e4>] blk_get_backing_dev_info+0x1c/0x28
> > [<ffff8000001c879c>] __filemap_fdatawrite_range+0x54/0x98
> > [<ffff8000001c8804>] filemap_fdatawrite+0x24/0x2c
> > [<ffff80000027e7a4>] fdatawrite_one_bdev+0x20/0x28
> > [<ffff800000288b44>] iterate_bdevs+0xec/0x144
> > [<ffff80000027eb50>] sys_sync+0x84/0xd0
> >
> > Since mapping_cap_writeback_dirty() always returns true for
> > block device inodes, there is no need to check it if the inode
> > is a block device inode.
> >
> > ...
> >
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -334,8 +334,9 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start,
> >  		.range_end = end,
> >  	};
> >  
> > -	if (!mapping_cap_writeback_dirty(mapping))
> > -		return 0;
> > +	if (!sb_is_blkdev_sb(mapping->host->i_sb))
> > +		if (!mapping_cap_writeback_dirty(mapping))
> > +			return 0;
> >  
> >  	wbc_attach_fdatawrite_inode(&wbc, mapping->host);
> >  	ret = do_writepages(mapping, &wbc);
>
> This seems wrong to me. If __blkdev_put() has got so deep into the
> release process as to be zeroing out ->bd_disk then the blockdev's
> inode shouldn't be visible to iterate_bdevs()?
That's the trouble with how block devices currently work. On the last close
of the block device, the block device inode is detached from bd_disk and
thus from the request_queue and bdi. bd_disk and company get freed, but the
inode stays (the bdev inode is referenced by the inodes representing the
block device in the filesystem, which are in turn referenced by dentries).
This happens asynchronously with respect to iterate_bdevs() and
inode_to_bdi() calls in general: any inode_to_bdi() call on a block device
inode can oops if it happens to race with __blkdev_put(). The use of
inode_to_bdi() in mapping_cap_writeback_dirty() from iterate_bdevs() is one
such possibility. That one is relatively easy to fix by modifying
iterate_bdevs(); however, it is not so easy to protect in the same way the
inode_to_bdi() calls in writeback that happens periodically from the
flusher work.
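
To illustrate the iterate_bdevs() side only, here is a minimal userspace
sketch. It assumes the fix takes the form of skipping devices that have no
openers while holding a per-device lock; the thread does not spell out the
actual change. All names here (model_bdev, model_put, model_visit) are
hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel objects; every name is hypothetical. */
struct model_disk {
	int writebacks;			/* counts callback invocations */
};

struct model_bdev {
	pthread_mutex_t lock;		/* plays the role of a per-bdev mutex */
	int openers;			/* number of current openers */
	struct model_disk *disk;	/* NULL after last close, like ->bd_disk */
};

/* Close path: the last closer detaches the disk while holding the lock. */
static void model_put(struct model_bdev *b)
{
	pthread_mutex_lock(&b->lock);
	assert(b->openers > 0);
	if (--b->openers == 0)
		b->disk = NULL;
	pthread_mutex_unlock(&b->lock);
}

/* Writeback stand-in: dereferences ->disk, so it must not see NULL. */
static void model_writeback(struct model_bdev *b)
{
	b->disk->writebacks++;
}

/*
 * One iteration step: run fn only while the device still has openers.
 * Holding the lock means model_put() cannot clear ->disk underneath fn.
 * Returns 1 if fn ran, 0 if the device was skipped.
 */
static int model_visit(struct model_bdev *b, void (*fn)(struct model_bdev *))
{
	int ran = 0;

	pthread_mutex_lock(&b->lock);
	if (b->openers > 0) {
		fn(b);
		ran = 1;
	}
	pthread_mutex_unlock(&b->lock);
	return ran;
}
```

With one opener, model_visit() runs the callback. After model_put() drops
the last opener, it skips the device instead of dereferencing the NULL disk
pointer. This does nothing for the inode_to_bdi() calls made from the
flusher work, which is the harder part.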
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR