From: Stefan Priebe <s.priebe@profihost.ag>
To: Jeff Mahoney <jeffm@suse.com>, Christoph Hellwig <hch@lst.de>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
linux-fsdevel@vger.kernel.org, Chris Mason <clm@fb.com>
Subject: Re: btrfs regression since 4.X kernel NULL pointer dereference
Date: Sat, 12 Sep 2015 01:31:08 +0200 [thread overview]
Message-ID: <55F3643C.3010700@profihost.ag> (raw)
In-Reply-To: <55F32600.6080201@suse.com>
Am 11.09.2015 um 21:05 schrieb Jeff Mahoney:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 9/11/15 2:55 PM, Jeff Mahoney wrote:
>> On 8/25/15 5:00 AM, Christoph Hellwig wrote:
>>> I think this is btrfs using a struct block_device that doesn't
>>> have a valid queue pointer in it's gendisk for ->s_bdev. And
>>> there are some fishy looking ->s_bdev assignments in the code
>>> which I suspect are related to it:
>>
>>> fs/btrfs/dev-replace.c: if (fs_info->sb->s_bdev ==
>>> src_device->bdev) fs/btrfs/dev-replace.c: fs_info->sb->s_bdev =
>>> tgt_device->bdev; fs/btrfs/volumes.c: if (device->bdev ==
>>> root->fs_info->sb->s_bdev) fs/btrfs/volumes.c:
>>> root->fs_info->sb->s_bdev = next_device->bdev;
>>> fs/btrfs/volumes.c: if (tgtdev->bdev == fs_info->sb->s_bdev)
>>> fs/btrfs/volumes.c: fs_info->sb->s_bdev = next_device->bdev;
>>
>> The report at https://bugzilla.kernel.org/show_bug.cgi?id=100911
>> tracks it down a bit further and it's bdev->bd_disk == NULL instead
>> of the queue in the gendisk. I don't think that the s_bdev stuff
>> is related, though I'd certainly love to see that bit go away.
>>
>> If we're calling blk_get_backing_dev_info, that means we're
>> already using an inode that has blockdev_superblock and the btrfs
>> superblock isn't even involved.
>>
>> We're getting there because btrfs_evict_inode ->
>> btrfs_wait_ordered_range -> btrfs_fdatawrite_range ->
>> filemap_fdatawrite_range gets called with inode->i_mapping. That
>> mapping gets passed down through __filemap_fdatawrite_range to
>> wbc_attach_fdatawrite_inode where the inode passed is mapping->host
>> -- which will be the block device inode rather than the btrfs
>> device node inode. That inode is the one ultimately checked in
>> inode_to_bdi.
>>
>> So it looks like we're causing writeback on an unrelated block
>> device that was opened using a device node hosted on btrfs, which
>> is obviously wrong.
>>
>> I don't think snapshot removal is even a requirement to trigger
>> this. I expect it's possible to trigger with two device nodes for
>> the same block device where one is getting closed and cleaned up
>> while the eviction of the other happens. The device nodes wouldn't
>> even need to be on the same fs.
>>
>> Other file systems use &inode->i_data in eviction. Is it that
>> simple here?
Your patch works fine here. Did some simple tests already.
Thanks!
Stefan
>
> Incidentally, this explanation also covers why I was unable to
> reproduce it locally. SLES systems use devtmpfs and I just bind
> mounted it into my chroot environment like I normally would. When I
> cp'd /dev into the test environment, I was able to reproduce immediately
> .
>
> - -Jeff
>
> - --
> Jeff Mahoney
> SUSE Labs
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG/MacGPG2 v2.0.19 (Darwin)
>
> iQIcBAEBAgAGBQJV8yYAAAoJEB57S2MheeWy2BwP+QGdpsErIfyHJcx95LLrvsxu
> n0kBoI4Jd5yfNxp8m+Ll3xgUdsd6rKHJV2Muq8aRdNEdzf1E0DFrRcE0d1W5UrJy
> lPzrA8QxCVaLf5jFysFp0xygKbLKHGmOAv2KnAGYFw6exIjb344UnZb6aiw5Uekm
> DqrTmEq+0Yb/mE04GVpWMylK6pkDOhgkOzFVZa1Pff0eKY4E61G5GtmA2kNAUP9v
> CsoZ0FO1WdF2Fc9ONSPjq7FdZLKH+OmIVakHnaELa8EEM3W7NU+mxLRabznBV25e
> L/KPjr+awzkhV1ieyAAww/dddE3bN5nmDOq+OgvA9WPgaRvvwne2tHVTFaxHoiHg
> d8oHDLkC1/Z1MqINLi5dZNsSuIWMvRhIMV9Th5F2rdWxrBCSRvID7N+Z2HHh6mJC
> Q9rgSOyYKclTam6IF7yX8lDWIqkAnoA6OxvOKRccgr3hS/u4DzVtRmWHO9RblEi+
> a9dF2FCP+v+Lgdb8C5n7XUixrtF5H6BWHhmArgjmxD6iyeXOmphyGrgqmSLdY1s9
> sakvLrSB9i3O27CKoup2OHyF6MOdgsaa90FZLPLt6BDrCTWAscd0LDy8MbaKgKCR
> kjfSTiwNydzZfkJixH71U/1mGbuB9nqf6jrNWCQdE5f57MSCEwiFqQvaD1KK+Uug
> ZW2Bz1VQxkOvGbYiJ4HV
> =ic4+
> -----END PGP SIGNATURE-----
>
next prev parent reply other threads:[~2015-09-11 23:31 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-22 17:29 btrfs regression since 4.X kernel NULL pointer dereference Stefan Priebe
2015-08-25 9:00 ` Christoph Hellwig
2015-08-25 9:44 ` Stefan Priebe - Profihost AG
2015-08-25 13:51 ` Chris Mason
2015-08-31 17:32 ` Stefan Priebe - Profihost AG
2015-09-01 0:06 ` Chris Mason
2015-09-01 4:41 ` Stefan Priebe
2015-09-11 23:21 ` Christoph Biedl
2015-09-10 22:21 ` Jeff Mahoney
2015-09-11 4:55 ` Stefan Priebe
2015-09-11 18:55 ` Jeff Mahoney
2015-09-11 19:05 ` Jeff Mahoney
2015-09-11 23:31 ` Stefan Priebe [this message]
2015-09-11 19:34 ` Chris Mason
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55F3643C.3010700@profihost.ag \
--to=s.priebe@profihost.ag \
--cc=clm@fb.com \
--cc=hch@lst.de \
--cc=jeffm@suse.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.