From: Jan Kara <jack@suse.cz>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Dmitry Vyukov <dvyukov@google.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>, Jens Axboe <axboe@fb.com>,
Andrew Morton <akpm@linux-foundation.org>,
Tejun Heo <tj@kernel.org>, Jan Kara <jack@suse.cz>,
Johannes Weiner <hannes@cmpxchg.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Andrey Ryabinin <aryabinin@virtuozzo.com>,
syzkaller <syzkaller@googlegroups.com>
Subject: Re: mm: GPF in bdi_put
Date: Wed, 1 Mar 2017 16:05:44 +0100 [thread overview]
Message-ID: <20170301150544.GH20512@quack2.suse.cz> (raw)
In-Reply-To: <20170301142909.GG20512@quack2.suse.cz>
[-- Attachment #1: Type: text/plain, Size: 2631 bytes --]
On Wed 01-03-17 15:29:09, Jan Kara wrote:
> On Mon 27-02-17 18:27:55, Al Viro wrote:
> > On Mon, Feb 27, 2017 at 06:11:11PM +0100, Dmitry Vyukov wrote:
> > > Hello,
> > >
> > > The following program triggers GPF in bdi_put:
> > > https://gist.githubusercontent.com/dvyukov/15b3e211f937ff6abc558724369066ce/raw/cc017edf57963e30175a6a6fe2b8d917f6e92899/gistfile1.txt
> >
> > What happens is
> > * attempt of, essentially, mount -t bdev ..., calls mount_pseudo()
> > and then promptly destroys the new instance it has created.
> > * the only inode created on that sucker (root directory, that
> > is) gets evicted.
> > * most of ->evict_inode() is harmless, until it gets to
> > if (bdev->bd_bdi != &noop_backing_dev_info)
> > bdi_put(bdev->bd_bdi);
>
> Thanks for the analysis!
>
> > added there by "block: Make blk_get_backing_dev_info() safe without open bdev".
> > Since ->bd_bdi hadn't been initialized for that sucker (the same patch has
> > placed initialization into bdget()), we step into shit of varying nastiness,
> > depending on phase of moon, etc.
>
> Yup, I've missed that the root inode of bdev superblock does not go through
> bdget() (in fact I didn't think what happens with root inode for bdev
> superblock at all) and thus bd_bdi is left uninitialized in that case. I'll
> send a fix for that in a while.
>
> > Could somebody explain WTF do we have those two lines in bdev_evict_inode(),
> > anyway? We set ->bd_bdi to something other than noop_backing_dev_info only
> > in __blkdev_get() when ->bd_openers goes from zero to positive, so why is
> > the matching bdi_put() not in __blkdev_put()? Jan?
>
> The problem is writeback code (from flusher work or through sync(2) -
> generally inode_to_bdi() users) can be looking at bdev inode independently
> from it being open. So if they start looking while the bdev is open but the
> dereference happens after it is closed and device removed, we oops. We have
> seen oopses due to this for quite a while. And all the stuff that is done
> in __blkdev_put() is not enough to prevent writeback code from having a
> look whether there is not something to write.
>
> So what we do now is that once we establish valid bd_bdi reference, we
> leave it alone until bdev inode gets evicted. And to handle the case when
> underlying device actually changes, we unhash bdev inode when the device
> gets removed from the system so that it cannot be found by bdget() anymore.
Attached patch fixes the problem for me. I'll post it officially tomorrow
once Al has a chance to reply...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
[-- Attachment #2: 0001-block-Initialize-bd_bdi-on-inode-initialization.patch --]
[-- Type: text/x-patch, Size: 2034 bytes --]
>From a533c8dd1fb4dbf840cd3adaf68afb6ad6851ddc Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 1 Mar 2017 15:31:11 +0100
Subject: [PATCH] block: Initialize bd_bdi on inode initialization
So far we initialized bd_bdi only in bdget(). That is fine for normal
bdev inodes however for the special case of the root inode of
blockdev_superblock that function is never called and thus bd_bdi is
left uninitialized. As a result bdev_evict_inode() may oops doing
bdi_put(root->bd_bdi) on that inode as can be seen when doing:
mount -t bdev none /mnt
Fix the problem by initializing bd_bdi when first allocating the inode
and then reinitializing bd_bdi in bdev_evict_inode().
Thanks to syzkaller team for finding the problem.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: b1d2dc5659b41741f5a29b2ade76ffb4e5bb13d8
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/block_dev.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 77c30f15a02c..2eca00ec4370 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -870,6 +870,7 @@ static void init_once(void *foo)
#ifdef CONFIG_SYSFS
INIT_LIST_HEAD(&bdev->bd_holder_disks);
#endif
+ bdev->bd_bdi = &noop_backing_dev_info;
inode_init_once(&ei->vfs_inode);
/* Initialize mutex for freeze. */
mutex_init(&bdev->bd_fsfreeze_mutex);
@@ -884,8 +885,10 @@ static void bdev_evict_inode(struct inode *inode)
spin_lock(&bdev_lock);
list_del_init(&bdev->bd_list);
spin_unlock(&bdev_lock);
- if (bdev->bd_bdi != &noop_backing_dev_info)
+ if (bdev->bd_bdi != &noop_backing_dev_info) {
bdi_put(bdev->bd_bdi);
+ bdev->bd_bdi = &noop_backing_dev_info;
+ }
}
static const struct super_operations bdev_sops = {
@@ -988,7 +991,6 @@ struct block_device *bdget(dev_t dev)
bdev->bd_contains = NULL;
bdev->bd_super = NULL;
bdev->bd_inode = inode;
- bdev->bd_bdi = &noop_backing_dev_info;
bdev->bd_block_size = i_blocksize(inode);
bdev->bd_part_count = 0;
bdev->bd_invalidated = 0;
--
2.10.2
next prev parent reply other threads:[~2017-03-01 15:05 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-02-27 17:11 mm: GPF in bdi_put Dmitry Vyukov
2017-02-27 17:14 ` Dmitry Vyukov
2017-02-27 18:27 ` Al Viro
2017-02-28 17:55 ` Dmitry Vyukov
2017-02-28 18:23 ` Al Viro
2017-03-01 14:29 ` Jan Kara
2017-03-01 15:05 ` Jan Kara [this message]
2017-03-02 11:44 ` Al Viro
2017-03-02 12:20 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170301150544.GH20512@quack2.suse.cz \
--to=jack@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=aryabinin@virtuozzo.com \
--cc=axboe@fb.com \
--cc=dvyukov@google.com \
--cc=hannes@cmpxchg.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=syzkaller@googlegroups.com \
--cc=tj@kernel.org \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).