From: Jeff Mahoney <jeffm@jeffm.io>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Roman Lebedev <lebedev.ri@gmail.com>,
David Howells <dhowells@redhat.com>,
linux-btrfs@vger.kernel.org, linux-unionfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, fstests@vger.kernel.org,
Filipe Manana <fdmanana@suse.com>
Subject: Re: kernel BUG when fsync'ing file in a overlayfs merged dir, located on btrfs
Date: Thu, 24 Mar 2016 11:31:56 -0400 [thread overview]
Message-ID: <56F4086C.2050609@jeffm.io> (raw)
In-Reply-To: <20160324152043.GA22781@ZenIV.linux.org.uk>
[-- Attachment #1.1: Type: text/plain, Size: 4184 bytes --]
On 3/24/16 11:20 AM, Al Viro wrote:
> On Thu, Nov 05, 2015 at 11:03:58PM -0500, Jeff Mahoney wrote:
>
>> I suppose the irony here is that, AFAIK, that code is to ensure a file
>> doesn't get lost between transactions due to rename.
>>
>> Isn't the file->f_path.dentry relationship stable otherwise, though? The
>> name might change and the parent might change but the dentry that the
>> file points to won't.
>
> Sure, it is stable. But that doesn't guarantee anything about the
> ancestors.
>
> btrfs_log_inode_parent() is a mess. Look at that loop you've got there:
>
> while (1) {
> if (!parent || d_really_is_negative(parent) || sb != d_inode(parent)->i_sb)
> break;
>
> Really? NULL from dget_parent()? A negative from dget_parent()? Sodding
> superblock changing from transition to parent?
>
> inode = d_inode(parent);
> if (root != BTRFS_I(inode)->root)
> break;
>
> Umm... Something like crossing a snapshot boundary?
>
> if (BTRFS_I(inode)->generation > last_committed) {
> ret = btrfs_log_inode(trans, root, inode,
> LOG_INODE_EXISTS,
> 0, LLONG_MAX, ctx);
> if (ret)
> goto end_trans;
> }
> if (IS_ROOT(parent))
> break;
>
> parent = dget_parent(parent);
> dput(old_parent);
> old_parent = parent;
> }
>
> Note that there's not a damn thing to guarantee that those directories have
> anything to do with your file - race with rename(2) easily could've sent you
> chasing the wrong chain of ancestors. Incidentally, what will happen if
> you do that fsync() to an unlinked file? It still has ->d_parent pointing
> to the what used to be its parent (quite possibly itself already rmdir'ed and
> pinned down by that reference; inode is still busy, as if we'd done open and
> rmdir). Is that what those checks are attempting to catch? And what happens
> if rmdir happens between the check and btrfs_log_inode()?
>
> The thing is, playing with ancestors of opened file is very easy to get
> wrong, regardless of overlayfs involvement. And this code is anything
> but clear. I don't know btrfs guts enough to be able to tell how much
> can actually break (as opposed to adding wrong inodes to transaction),
> but I would really suggest taking a good look at what's going on there...
Yeah, absolutely. The file_dentry() patch that you and Miklos worked
out the other day fixes the fsync crashes for us when we use it in
btrfs_file_sync but you're 100% correct in your opinion of this code.
>> I did find a few other places where that assumption happens without any
>> questionable traversals. Sure, all three are in file systems unlikely
>> to be used with overlayfs.
>>
>> ocfs2_prepare_inode_for_write uses file->f_path.dentry for
>> should_remove_suid (due to needing to do it early since cluster locking
>> is unknown in setattr, according to the commit). Having
>> should_remove_suid operate on an inode would solve that easily.
>
> Can't do - there are filesystems that _need_ dentry for ->setattr().
>
>
> Grr... Sorry, misread what you'd written. should_remove_suid()
> ought to be switched to inode, indeed.
Ok, I'll write that up.
>> cifs_new_fileinfo keeps a reference to the dentry but it seems to be
>> used mostly to access the inode except for the nasty-looking call to
>> build_path_from_dentry in cifs_reopen_file, which I won't be touching.
>> That does look like a questionable traversal, especially with the "we
>> can't take the rename lock here" comment.
>
> cifs uses fs-root-relative pathnames in protocol; nothing to do there.
> Where do you see that comment, BTW? It uses read_seqbegin/read_seqretry
> on rename_lock there...
I must've been looking at old code. I don't see it now either.
-Jeff
--
Jeff Mahoney
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 881 bytes --]
prev parent reply other threads:[~2016-03-24 15:32 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-30 19:57 kernel BUG when fsync'ing file in a overlayfs merged dir, located on btrfs Roman Lebedev
2015-09-30 19:57 ` Roman Lebedev
2015-09-30 19:57 ` [RFC PATCH] fstests: generic: Test that fsync works on file in overlayfs merged directory Roman Lebedev
2015-09-30 21:56 ` Dave Chinner
2015-09-30 22:07 ` Eric Sandeen
2015-11-06 2:57 ` kernel BUG when fsync'ing file in a overlayfs merged dir, located on btrfs Jeff Mahoney
2015-11-06 3:18 ` Al Viro
2015-11-06 4:03 ` Jeff Mahoney
2015-11-06 14:46 ` Jeff Mahoney
2016-03-24 15:20 ` Al Viro
2016-03-24 15:25 ` Al Viro
2016-03-24 15:31 ` Jeff Mahoney [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56F4086C.2050609@jeffm.io \
--to=jeffm@jeffm.io \
--cc=dhowells@redhat.com \
--cc=fdmanana@suse.com \
--cc=fstests@vger.kernel.org \
--cc=lebedev.ri@gmail.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-unionfs@vger.kernel.org \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.