Re: [RFC][PATCH] do d_instantiate/unlock_new_inode combinations safely

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [RFC][PATCH] do d_instantiate/unlock_new_inode combinations safely
Date: Fri, 11 May 2018 11:32:08 +1000	[thread overview]
Message-ID: <20180511013208.GV23861@dastard> (raw)
In-Reply-To: <20180511003901.GW30522@ZenIV.linux.org.uk>

On Fri, May 11, 2018 at 01:39:01AM +0100, Al Viro wrote:
> On Fri, May 11, 2018 at 08:56:07AM +1000, Dave Chinner wrote:
> 
> > > For anything NFS-exported we do _not_ want to unlock new inode
> > > before it has grown an alias; original set of fixes got the
> > > ordering right, but missed the nasty complication in case of
> > > lockdep being enabled - unlock_new_inode() does
> > >     lockdep_annotate_inode_mutex_key(inode)
> > > which can only be done before anyone gets a chance to touch
> > > ->i_mutex.  Unfortunately, flipping the order and doing
> > > unlock_new_inode() before d_instantiate() opens a window when
> > > mkdir can race with open-by-fhandle on a guessed fhandle, leading
> > > to multiple aliases for a directory inode and all the breakage
> > > that follows from that.
> > > 
> > >     Correct solution: a new primitive (d_instantiate_new())
> > > combining these two in the right order - lockdep annotate, then
> > > d_instantiate(), then the rest of unlock_new_inode().  All
> > > combinations of d_instantiate() with unlock_new_inode() should
> > > be converted to that.
> > 
> > Ok, so this seems to touch only the paths that create new inodes
> > (mkdir, mknod, etc). Is the lookup path that does:
> > 
> > 
> > 	unlock_new_inode()
> > 	.....
> > 	d_splice_alias(inode, dentry);
> > 
> > OK?
> 
> Yes.  d_splice_alias()
> 	* will do the right thing when it runs into directory inode
> that already has an alias
> 	* is called from ->d_lookup(), which has calling conventions
> allowing to return a preexisting alias
> 
> The race in question is between mkdir() and open-by-fhandle that manages
> to guess an fhandle for directory about to be created.  mkdir() side
> creates a new inode, inserts it into icache (locked) and proceeds towards
> unlock_new_inode()/d_instantiate().  Suppose it loses CPU right after
> unlock_new_inode() and open-by-fhandle picks the inode from icache
> (either having just gotten there, or finally gets woken up after having
> waited for the sucker to get unlocked).  inode is valid, everything's
> set up properply, so we pass it to d_obtain_alias(), which sees that
> there's no exiting dentries, allocates one, rechecks, finds that there's
> still nothing and proceeds to attach its new anon dentry to that inode.
> Now mkdir regains CPU and does d_instantiate().  And we are fucked -
> there are *two* dentries for given directory inode.

Ok, thanks for the description of the race, Al. I understand it now.
:)

> 
> The window is narrow - to have a chance to hit it you need either
> to run it in a VM or have security_d_instantiate() (from d_instantiate())
> to do something slow (ideally - blocking).  It's non-empty, though.
> 
> Doing it in the opposite order (as XFS does on mkdir et.al.) plugs that
> window - open-by-fhandle won't get to the inode until after mkdir has
> attached a dentry to it.  Then d_obtain_alias() will simply return that
> dentry and we are done.  It's only d_instantiate() (or d_add()) that is
> a problem - d_splice_alias() is fine, so on the lookup path we don't
> need anything like that.  d_add_ci() is like d_splice_alias() in that
> respect, so the lookup is OK in case-insensitive variant as well.
> 
> So it would appear that XFS doesn't need to be touched.  HOWEVER,
> lockdep shite *can't* be done after something has had a chance to grab
> the damn rwsem.  I really wonder if
> 		d_instantiate(dentry, inode);
>         xfs_finish_inode_setup(cip);
> doesn't lead to unpleasantness with lockdep enabled:
> 	xfs_finish_inode_setup() -> unlock_new_inode() ->
> lockdep_annotate_inode_mutex_key() -> init_rwsem(&inode->i_rwsem)
> which does wonders if something has already gotten to the inode
> via that dentry and tried e.g. lock_inode() on it.

Could well do. Though it seems fixable.

i.e. we already have code in xfs_setup_inode() that sets the xfs
inode ILOCK rwsem dir/non-dir lockdep class before the new inode is
unlocked - we could just do the i_rwsem lockdep setup there, too.
Then, if we were to factor unlock_new_inode() as Andreas suggested,
we could call __unlock_new_inode() from xfs_finish_inode_setup().

I might be missing something subtle, but that looks to me like it
would work.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

next prev parent reply	other threads:[~2018-05-11  1:32 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-10 18:20 [RFC][PATCH] do d_instantiate/unlock_new_inode combinations safely Al Viro
2018-05-10 19:11 ` Andreas Dilger
2018-05-10 19:32   ` Al Viro
2018-05-10 20:44 ` Mike Marshall
2018-05-10 22:56 ` Dave Chinner
2018-05-11  0:39   ` Al Viro
2018-05-11  1:32     ` Dave Chinner [this message]
2018-05-11  2:18       ` Al Viro
2018-05-11  3:00         ` Dave Chinner
2018-05-11 19:56           ` Al Viro
2018-05-11  6:15     ` Ritesh Harjani

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180511013208.GV23861@dastard \
    --to=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).