From: Al Viro <viro@ZenIV.linux.org.uk>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-fsdevel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [RFC][PATCH] do d_instantiate/unlock_new_inode combinations safely
Date: Fri, 11 May 2018 03:18:43 +0100
Message-ID: <20180511021843.GY30522@ZenIV.linux.org.uk>
In-Reply-To: <20180511013208.GV23861@dastard>
On Fri, May 11, 2018 at 11:32:08AM +1000, Dave Chinner wrote:
> i.e. we already have code in xfs_setup_inode() that sets the xfs
> inode ILOCK rwsem dir/non-dir lockdep class before the new inode is
> unlocked - we could just do the i_rwsem lockdep setup there, too.
... which would suffice -
	if (S_ISDIR(inode->i_mode)) {
		struct file_system_type *type = inode->i_sb->s_type;

		/* Set new key only if filesystem hasn't already changed it */
		if (lockdep_match_class(&inode->i_rwsem, &type->i_mutex_key)) {
in lockdep_annotate_inode_mutex_key() would make sure that ->i_rwsem will be
left alone by unlock_new_inode().
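The effect of that check can be illustrated with a small user-space model. This is only a sketch: the toy_* types and the three class objects are hypothetical stand-ins for lockdep classes, not kernel API, and the real function also reinitializes the rwsem before setting the directory class.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for lockdep classes; none of these are kernel types. */
struct toy_lock_class { const char *name; };
struct toy_lock { const struct toy_lock_class *class; };

static const struct toy_lock_class default_key = { "fs default" };
static const struct toy_lock_class default_dir_key = { "fs default, directory" };
static const struct toy_lock_class fs_private_dir_key = { "fs-private directory" };

/*
 * Mirrors the shape of the quoted check: a directory lock is moved to
 * the directory class only if it is still in the default class, so a
 * class chosen earlier by the filesystem is left untouched.
 */
static void toy_annotate_dir_lock(struct toy_lock *lock, bool is_dir)
{
	if (!is_dir)
		return;
	if (lock->class == &default_key)
		lock->class = &default_dir_key;
}
```

In this model a lock that the filesystem has already switched to fs_private_dir_key keeps that class after toy_annotate_dir_lock(), which is the property being relied on in the reply above.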
> Then, if we were to factor unlock_new_inode() as Andreas suggested,
> we could call __unlock_new_inode() from xfs_finish_inode_setup().
No need - if you set the class in xfs_setup_inode(), you are fine.
That said, hash insertion is also potentially delicate - another ext2/nfsd
race from the same pile back in 2008 went like this:
* ext2_new_inode() chooses inumber
* open-by-fhandle guesses the inumber and hits ext2_iget(), which
inserts a locked in-core inode into icache and proceeds to block reading
it from disk.
* ext2_new_inode() inserts *its* in-core inode into icache (with
the same inumber) and sets the things up, both in-core and on disk
* open-by-fhandle is back and sees a good live on-disk inode.
It finishes setting the in-core one up and we'd got *TWO* in-core inodes
with the same inumber, both hashed, both with dentries, both used by
syscalls to do IO. Good times all around - fs corruption is fun.
That was fixed by using insert_inode_locked() in ext2_new_inode(), and doing
that before the on-disk inode would start looking good. If it came during
ext2_iget(), it would've found an in-core inode with that inumber (locked,
doomed to be rejected), waited for it to come unlocked, seen it unhashed
(since ext2_iget() had decided it was no good) and inserted its own in-core
inode into the hash (after having rechecked that nobody had an in-core inode
with the same inumber in there, that is).
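The ordering described above can be sketched as a single-threaded user-space model. All toy_* names are hypothetical and only illustrate the idea; in particular, where the kernel's insert_inode_locked() sleeps until the other inode is unlocked and then rechecks, this model returns TOY_MUST_WAIT and leaves the retry to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8

enum toy_result { TOY_OK, TOY_BUSY, TOY_MUST_WAIT, TOY_FULL };

struct toy_inode {
	unsigned long ino;
	bool hashed;	/* visible to lookups by inode number */
	bool locked;	/* still being set up, in the spirit of I_NEW */
};

struct toy_icache {
	struct toy_inode *slot[TOY_SLOTS];
};

/* Find a hashed in-core inode with this number, if any. */
static struct toy_inode *toy_find(struct toy_icache *c, unsigned long ino)
{
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		struct toy_inode *p = c->slot[i];

		if (p && p->hashed && p->ino == ino)
			return p;
	}
	return NULL;
}

/* Insert a locked inode unless another one with the same number is hashed. */
static enum toy_result toy_insert_locked(struct toy_icache *c,
					 struct toy_inode *inode)
{
	struct toy_inode *old = toy_find(c, inode->ino);

	if (old)	/* locked: outcome not known yet; unlocked: a live duplicate */
		return old->locked ? TOY_MUST_WAIT : TOY_BUSY;
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (!c->slot[i]) {
			c->slot[i] = inode;
			inode->hashed = true;
			inode->locked = true;
			return TOY_OK;
		}
	}
	return TOY_FULL;
}

/* Setup finished successfully: the inode stays hashed and becomes usable. */
static void toy_unlock(struct toy_inode *inode)
{
	inode->locked = false;
}

/* Setup failed (e.g. the on-disk inode was no good): unhash, then unlock. */
static void toy_reject(struct toy_icache *c, struct toy_inode *inode)
{
	for (size_t i = 0; i < TOY_SLOTS; i++)
		if (c->slot[i] == inode)
			c->slot[i] = NULL;
	inode->hashed = false;
	inode->locked = false;
}
```

With a lookup-created inode for number 42 inserted first, a second toy_insert_locked() for 42 reports TOY_MUST_WAIT; once the first one is rejected and unhashed, the retry succeeds, and any later duplicate gets TOY_BUSY. At no point are two inodes with the same number hashed at once, which is the invariant the 2008 race violated.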
I'm not familiar enough with XFS icache replacement to tell if anything
of that sort is a problem there; might be a non-issue for any number
of reasons.