From: Jeff Mahoney <jeffm@jeffm.io>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Roman Lebedev <lebedev.ri@gmail.com>,
David Howells <dhowells@redhat.com>,
linux-btrfs@vger.kernel.org, linux-unionfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, fstests@vger.kernel.org,
Filipe Manana <fdmanana@suse.com>
Subject: Re: kernel BUG when fsync'ing file in a overlayfs merged dir, located on btrfs
Date: Thu, 24 Mar 2016 11:31:56 -0400 [thread overview]
Message-ID: <56F4086C.2050609@jeffm.io> (raw)
In-Reply-To: <20160324152043.GA22781@ZenIV.linux.org.uk>
[-- Attachment #1.1: Type: text/plain, Size: 4184 bytes --]
On 3/24/16 11:20 AM, Al Viro wrote:
> On Thu, Nov 05, 2015 at 11:03:58PM -0500, Jeff Mahoney wrote:
>
>> I suppose the irony here is that, AFAIK, that code is to ensure a file
>> doesn't get lost between transactions due to rename.
>>
>> Isn't the file->f_path.dentry relationship stable otherwise, though? The
>> name might change and the parent might change but the dentry that the
>> file points to won't.
>
> Sure, it is stable. But that doesn't guarantee anything about the
> ancestors.
>
> btrfs_log_inode_parent() is a mess. Look at that loop you've got there:
>
> while (1) {
> if (!parent || d_really_is_negative(parent) || sb != d_inode(parent)->i_sb)
> break;
>
> Really? NULL from dget_parent()? A negative from dget_parent()? Sodding
> superblock changing from transition to parent?
>
> inode = d_inode(parent);
> if (root != BTRFS_I(inode)->root)
> break;
>
> Umm... Something like crossing a snapshot boundary?
>
> if (BTRFS_I(inode)->generation > last_committed) {
> ret = btrfs_log_inode(trans, root, inode,
> LOG_INODE_EXISTS,
> 0, LLONG_MAX, ctx);
> if (ret)
> goto end_trans;
> }
> if (IS_ROOT(parent))
> break;
>
> parent = dget_parent(parent);
> dput(old_parent);
> old_parent = parent;
> }
>
> Note that there's not a damn thing to guarantee that those directories have
> anything to do with your file - race with rename(2) easily could've sent you
> chasing the wrong chain of ancestors. Incidentally, what will happen if
> you do that fsync() to an unlinked file? It still has ->d_parent pointing
> to the what used to be its parent (quite possibly itself already rmdir'ed and
> pinned down by that reference; inode is still busy, as if we'd done open and
> rmdir). Is that what those checks are attempting to catch? And what happens
> if rmdir happens between the check and btrfs_log_inode()?
>
> The thing is, playing with ancestors of opened file is very easy to get
> wrong, regardless of overlayfs involvement. And this code is anything
> but clear. I don't know btrfs guts enough to be able to tell how much
> can actually break (as opposed to adding wrong inodes to transaction),
> but I would really suggest taking a good look at what's going on there...
Yeah, absolutely. The file_dentry() patch that you and Miklos worked
out the other day fixes the fsync crashes for us when we use it in
btrfs_file_sync but you're 100% correct in your opinion of this code.
>> I did find a few other places where that assumption happens without any
>> questionable traversals. Sure, all three are in file systems unlikely
>> to be used with overlayfs.
>>
>> ocfs2_prepare_inode_for_write uses file->f_path.dentry for
>> should_remove_suid (due to needing to do it early since cluster locking
>> is unknown in setattr, according to the commit). Having
>> should_remove_suid operate on an inode would solve that easily.
>
> Can't do - there are filesystems that _need_ dentry for ->setattr().
>
>
> Grr... Sorry, misread what you'd written. should_remove_suid()
> ought to be switched to inode, indeed.
Ok, I'll write that up.
>> cifs_new_fileinfo keeps a reference to the dentry but it seems to be
>> used mostly to access the inode except for the nasty-looking call to
>> build_path_from_dentry in cifs_reopen_file, which I won't be touching.
>> That does look like a questionable traversal, especially with the "we
>> can't take the rename lock here" comment.
>
> cifs uses fs-root-relative pathnames in protocol; nothing to do there.
> Where do you see that comment, BTW? It uses read_seqbegin/read_seqretry
> on rename_lock there...
I must've been looking at old code. I don't see it now either.
-Jeff
--
Jeff Mahoney
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 881 bytes --]
prev parent reply other threads:[~2016-03-24 15:32 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-30 19:57 kernel BUG when fsync'ing file in a overlayfs merged dir, located on btrfs Roman Lebedev
2015-09-30 19:57 ` [RFC PATCH] fstests: generic: Test that fsync works on file in overlayfs merged directory Roman Lebedev
2015-09-30 21:56 ` Dave Chinner
2015-09-30 22:07 ` Eric Sandeen
2015-11-06 2:57 ` kernel BUG when fsync'ing file in a overlayfs merged dir, located on btrfs Jeff Mahoney
2015-11-06 3:18 ` Al Viro
2015-11-06 4:03 ` Jeff Mahoney
2015-11-06 14:46 ` Jeff Mahoney
2016-03-24 15:20 ` Al Viro
2016-03-24 15:25 ` Al Viro
2016-03-24 15:31 ` Jeff Mahoney [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56F4086C.2050609@jeffm.io \
--to=jeffm@jeffm.io \
--cc=dhowells@redhat.com \
--cc=fdmanana@suse.com \
--cc=fstests@vger.kernel.org \
--cc=lebedev.ri@gmail.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-unionfs@vger.kernel.org \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).