From: Jan Kara <jack@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Wei Fang <fangwei1@huawei.com>,
jack@suse.cz, hannes@cmpxchg.org, hch@infradead.org,
linux-mm@kvack.org, stable@vger.kernel.org,
Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH] mm: Fix a NULL dereference crash while accessing bdev->bd_disk
Date: Wed, 30 Nov 2016 08:30:45 +0100 [thread overview]
Message-ID: <20161130073045.GA16667@quack2.suse.cz> (raw)
In-Reply-To: <20161129150828.e0a4897160b9ee7301e5f554@linux-foundation.org>
On Tue 29-11-16 15:08:28, Andrew Morton wrote:
> On Sat, 26 Nov 2016 10:06:22 +0800 Wei Fang <fangwei1@huawei.com> wrote:
>
> > ->bd_disk is assigned to NULL in __blkdev_put() when no one is holding
> > the bdev. After that, ->bd_inode still can be touched in the
> > blockdev_superblock->s_inodes list before the final iput. So iterate_bdevs()
> > can still get this inode, and start writeback on mapping dirty pages.
> > ->bd_disk will be dereferenced in mapping_cap_writeback_dirty() in this
> > case, and a NULL dereference crash will be triggered:
> >
> > Unable to handle kernel NULL pointer dereference at virtual address 00000388
> > ...
> > [<ffff8000004cb1e4>] blk_get_backing_dev_info+0x1c/0x28
> > [<ffff8000001c879c>] __filemap_fdatawrite_range+0x54/0x98
> > [<ffff8000001c8804>] filemap_fdatawrite+0x24/0x2c
> > [<ffff80000027e7a4>] fdatawrite_one_bdev+0x20/0x28
> > [<ffff800000288b44>] iterate_bdevs+0xec/0x144
> > [<ffff80000027eb50>] sys_sync+0x84/0xd0
> >
> > Since mapping_cap_writeback_dirty() is always return true about
> > block device inodes, no need to check it if the inode is a block
> > device inode.
> >
> > ...
> >
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -334,8 +334,9 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start,
> > .range_end = end,
> > };
> >
> > - if (!mapping_cap_writeback_dirty(mapping))
> > - return 0;
> > + if (!sb_is_blkdev_sb(mapping->host->i_sb))
> > + if (!mapping_cap_writeback_dirty(mapping))
> > + return 0;
> >
> > wbc_attach_fdatawrite_inode(&wbc, mapping->host);
> > ret = do_writepages(mapping, &wbc);
>
> This seems wrong to me. If __blkdev_put() has got so deep into the
> release process as to be zeroing out ->bd_disk then the blockdev's
> inode shouldn't be visible to iterate_bdevs()?
That's the trouble with how block devices currently work. On last close of
the block device, the block device inode is detached from bd_disk and thus
from request_queue & bdi. bd_disk & company gets freed, inode stays (bdev
inode is referenced by inodes representing block device in the filesystem
which are referenced by dentries). This happens asynchronously wrt
iterate_bdevs() and inode_to_bdi() calls in general - any inode_to_bdi()
call on block device inode can oops if it happens to race with
__blkdev_put(). The use of inode_to_bdi() in mapping_cap_writeback_dirty()
from iterate_bdevs() is one such possibility - that is relatively easy to
fix by modifying iterate_bdevs() however it is not so easy to protect in
this way inode_to_bdi() calls in writeback happening periodically from the
flusher work.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
prev parent reply other threads:[~2016-11-30 7:30 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-11-26 2:06 [PATCH] mm: Fix a NULL dereference crash while accessing bdev->bd_disk Wei Fang
2016-11-28 10:07 ` Jan Kara
2016-11-28 15:57 ` Tejun Heo
2016-11-29 9:30 ` Jan Kara
2016-11-29 16:43 ` Tejun Heo
2016-11-30 9:50 ` Jan Kara
2016-11-29 1:58 ` Wei Fang
2016-11-30 9:51 ` Jan Kara
2016-12-01 2:30 ` Wei Fang
2016-12-01 8:18 ` Jan Kara
2016-11-29 23:08 ` Andrew Morton
2016-11-30 7:30 ` Jan Kara [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161130073045.GA16667@quack2.suse.cz \
--to=jack@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=fangwei1@huawei.com \
--cc=hannes@cmpxchg.org \
--cc=hch@infradead.org \
--cc=linux-mm@kvack.org \
--cc=stable@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).