From: Al Viro <viro@ZenIV.linux.org.uk>
To: "Suzuki K. Poulose" <suzuki.poulose@arm.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	marc.zyngier@arm.com, torvalds@linux-foundation.org,
	Tejun Heo <tj@kernel.org>,
	stable@vger.kernel.org
Subject: Re: [PATCH] blkdev: Fix blkdev_open to release the bdev on error
Date: Tue, 8 Dec 2015 07:25:08 +0000
Message-ID: <20151208072508.GM20997@ZenIV.linux.org.uk>
In-Reply-To: <1449511503-7543-1-git-send-email-suzuki.poulose@arm.com>

On Mon, Dec 07, 2015 at 06:05:03PM +0000, Suzuki K. Poulose wrote:
> blkdev_open() doesn't release the bdev, it attached to a given
> inode, if blkdev_get() fails (e.g, due to absence of a device).
> This can cause kernel crashes when the original filesystem
> tries to flush the data during evict_inode.
> 
> This can be triggered easily with virtio-9p fs using the following
> simple steps.

???
How can filesystem type affect the behaviour of block devices?

Having mknod /tmp/splat b 8 1; rm /tmp/splat try to evict the pagecache
of /dev/sda1 is simply wrong, no matter what type /tmp happens to have.
And they must share pagecache, or you'll get one hell of a cache coherency
problem.  As it is, that pagecache belongs to the inode on bdevfs (see
fs/block_dev.c; not mountable anywhere visible, the one and only mount is
internal).  That inode is tied to struct bdev, ditto for its lifetime.

Block device inodes on anything else have their ->i_mapping pointing to
the corresponding (unique for given major/minor) inode on bdevfs; that
gives us the coherency, but that also means that their *own* pagecache
(->i_data) is empty.  Which is just fine, since inode eviction should
get rid of everything in its embedded struct address_space.  In case of
block device inodes on ext2, 9p, etc. that amounts to no pages at all.
In case of bdevfs, it contains the page cache of the block device.
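
To illustrate the aliasing described above, here is a minimal user-space
sketch.  It is a toy model, not kernel code: the struct and field names
mirror the kernel's, but the toy_* helpers are hypothetical and only count
pages.

```c
/* User-space toy model, NOT kernel code. */
#include <assert.h>
#include <stddef.h>

struct address_space {
	unsigned long nrpages;	/* pages currently cached in this mapping */
};

struct inode {
	struct address_space *i_mapping;	/* mapping that I/O goes through */
	struct address_space i_data;		/* the inode's own embedded mapping */
};

/* Hypothetical helper: a new inode points at its own, empty i_data. */
void toy_inode_init(struct inode *inode)
{
	assert(inode != NULL);
	inode->i_data.nrpages = 0;
	inode->i_mapping = &inode->i_data;
}

/* Hypothetical helper: an opened alias borrows the bdevfs inode's mapping. */
void toy_alias_open(struct inode *alias, struct inode *bdev_inode)
{
	assert(alias != NULL && bdev_inode != NULL);
	alias->i_mapping = bdev_inode->i_mapping;
}

/* Hypothetical helper: cache one more page via whatever i_mapping points to. */
void toy_cache_page(struct inode *inode)
{
	assert(inode != NULL);
	inode->i_mapping->nrpages++;
}
```

With one toy bdevfs inode and two aliases opened against it (say, one on
ext2 and one on 9p), pages cached through either alias land in the bdevfs
inode's i_data, and each alias's own i_data stays at zero pages.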

<looks> 
Aha...
        truncate_inode_pages_final(inode->i_mapping);
        clear_inode(inode);
        filemap_fdatawrite(inode->i_mapping);

in there is obviously wrong - it should be

        truncate_inode_pages_final(&inode->i_data);
        clear_inode(inode);
        filemap_fdatawrite(&inode->i_data);

and if you check other filesystems' ->evict_inode() you'll see the same thing
there.
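
The difference between the two variants can be sketched in user space as
follows.  Again, this is a toy model under stated assumptions, not kernel
code: toy_truncate() is a hypothetical stand-in for
truncate_inode_pages_final() that only zeroes a page counter, and the two
toy_evict_* functions are made-up names for the two call patterns.

```c
/* User-space toy model, NOT kernel code. */
#include <assert.h>
#include <stddef.h>

struct address_space {
	unsigned long nrpages;	/* pages currently cached in this mapping */
};

struct inode {
	struct address_space *i_mapping;	/* may point into another inode */
	struct address_space i_data;		/* the inode's own embedded mapping */
};

/* Hypothetical stand-in: drop every page, return how many were dropped. */
unsigned long toy_truncate(struct address_space *mapping)
{
	unsigned long dropped;

	assert(mapping != NULL);
	dropped = mapping->nrpages;
	mapping->nrpages = 0;
	return dropped;
}

/* Call pattern being removed: follows ->i_mapping, wherever it points. */
unsigned long toy_evict_via_i_mapping(struct inode *inode)
{
	assert(inode != NULL);
	return toy_truncate(inode->i_mapping);
}

/* Call pattern being added: touches only the inode's embedded i_data. */
unsigned long toy_evict_via_i_data(struct inode *inode)
{
	assert(inode != NULL);
	return toy_truncate(&inode->i_data);
}
```

For an alias whose i_mapping points at another inode's i_data, the first
variant empties that other inode's cache, while the second finds nothing to
drop and leaves the other inode alone.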

We should not do bd_forget() upon failing open() - what for?  As long as
->i_rdev remains the same, the pointer to struct bdev is valid.  It
doesn't pin bdev down; having it (or any other alias) opened does.  When
we decide to evict bdev, *all* aliasing inodes are dissociated from it;
none of them is open at that point, so we are OK.  When an aliasing inode
gets evicted, we have it dissociated from its ->i_bdev (if any).  Since we
only access the ->i_mapping of an aliasing inode while it's open, those places
are fine, and anything that wants ->i_data of an alias will simply find it empty.

AFAICS, the cause of your oopsen is that 9p evict_inode is accessing an
object it has no business touching.

Could you confirm that the patch below fixes your problem?

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 699941e..5110785 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -451,9 +451,9 @@ void v9fs_evict_inode(struct inode *inode)
 {
 	struct v9fs_inode *v9inode = V9FS_I(inode);
 
-	truncate_inode_pages_final(inode->i_mapping);
+	truncate_inode_pages_final(&inode->i_data);
 	clear_inode(inode);
-	filemap_fdatawrite(inode->i_mapping);
+	filemap_fdatawrite(&inode->i_data);
 
 	v9fs_cache_inode_put_cookie(inode);
 	/* clunk the fid stashed in writeback_fid */
