From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from ipmail07.adl2.internode.on.net ([150.101.137.131]:34711 "EHLO
        ipmail07.adl2.internode.on.net" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1750758AbeEKDAd (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Thu, 10 May 2018 23:00:33 -0400
Date: Fri, 11 May 2018 13:00:29 +1000
From: Dave Chinner <david@fromorbit.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org,
        Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [RFC][PATCH] do d_instantiate/unlock_new_inode combinations
 safely
Message-ID: <20180511030029.GW23861@dastard>
References: <20180510182058.GP30522@ZenIV.linux.org.uk>
 <20180510225607.GU23861@dastard>
 <20180511003901.GW30522@ZenIV.linux.org.uk>
 <20180511013208.GV23861@dastard>
 <20180511021843.GY30522@ZenIV.linux.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180511021843.GY30522@ZenIV.linux.org.uk>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Fri, May 11, 2018 at 03:18:43AM +0100, Al Viro wrote:
> On Fri, May 11, 2018 at 11:32:08AM +1000, Dave Chinner wrote:
> 
> > i.e. we already have code in xfs_setup_inode() that sets the xfs
> > inode ILOCK rwsem dir/non-dir lockdep class before the new inode is
> > unlocked - we could just do the i_rwsem lockdep setup there, too.
> 
> ... which would suffice -
> 
>         if (S_ISDIR(inode->i_mode)) {
>                 struct file_system_type *type = inode->i_sb->s_type;
> 
>                 /* Set new key only if filesystem hasn't already changed it */
>                 if (lockdep_match_class(&inode->i_rwsem, &type->i_mutex_key)) {
> 
> in lockdep_annotate_inode_mutex_key() would make sure that ->i_rwsem will be
> left alone by unlock_new_inode().

OK, if you are happy with XFS doing that, I'll put together a patch
and send it out.
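
For illustration only, here is a minimal userspace sketch of the guard
quoted above. It is not kernel code and not the actual patch: every
name in it (class_key, model_inode, model_fs_set_class,
model_annotate_on_unlock, fs_dir_key) is a hypothetical stand-in, and
"lock classes" are modelled as plain pointer comparisons. It only
shows the idea that a class set by the filesystem during inode setup
is left alone when the inode is later unlocked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for lockdep class keys; only addresses matter. */
struct class_key { char unused; };

static struct class_key default_key;	/* models type->i_mutex_key */
static struct class_key dir_key;	/* models type->i_mutex_dir_key */
static struct class_key fs_dir_key;	/* models a class picked by the fs */

struct model_lock  { const struct class_key *key; };
struct model_inode { bool is_dir; struct model_lock rwsem; };

/* A freshly initialised inode starts out in the default class. */
static void model_inode_init(struct model_inode *inode, bool is_dir)
{
	assert(inode != NULL);
	inode->is_dir = is_dir;
	inode->rwsem.key = &default_key;
}

/* Models the filesystem choosing its own class during inode setup. */
static void model_fs_set_class(struct model_inode *inode,
			       const struct class_key *key)
{
	inode->rwsem.key = key;
}

/*
 * Models the guard: only directories still in the default class are
 * re-keyed.  Returns true if the class was changed.
 */
static bool model_annotate_on_unlock(struct model_inode *inode)
{
	if (!inode->is_dir)
		return false;
	if (inode->rwsem.key != &default_key)
		return false;	/* filesystem already changed it */
	inode->rwsem.key = &dir_key;
	return true;
}
```

In this model a directory whose class was set via model_fs_set_class()
keeps that class across model_annotate_on_unlock(), which is the
property being relied on here.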

> > Then, if we were to factor unlock_new_inode() as Andreas suggested,
> > we could call __unlock_new_inode() from xfs_finish_inode_setup().
> 
> No need - if you set the class in xfs_setup_inode(), you are fine.
> 
> Said that, hash insertion is also potentially delicate - another ext2/nfsd
> race from the same pile back in 2008 had been
> 	* ext2_new_inode() chooses inumber
> 	* open-by-fhandle guesses the inumber and hits ext2_iget(), which
> inserts a locked in-core inode into icache and proceeds to block reading
> it from disk.
> 	* ext2_new_inode() inserts *its* in-core inode into icache (with
> the same inumber) and sets the things up, both in-core and on disk
> 	* open-by-fhandle is back and sees a good live on-disk inode.
> It finishes setting the in-core one up and we'd got *TWO* in-core inodes
> with the same inumber, both hashed, both with dentries, both used by
> syscalls to do IO.  Good times all around - fs corruption is fun.
> 
> That was fixed by using insert_inode_locked() in ext2_new_inode(), and doing
> that before the on-disk inode would start looking good.  If it came during
> ext2_iget(), it would've found an in-core inode with that inumber (locked,
> doomed to be rejected), waited for it to come unlocked, see it unhashed
> (since ext2_iget() said it was no good) and inserted its in-core inode
> into hash (after having rechecked that nobody had an in-core inode with
> the same inumber in there, that is).
> 
> I'm not familiar enough with XFS icache replacement to tell if anything
> of that sort is a problem there; might be a non-issue for any number
> of reasons.

I'm pretty sure we handle those cases. Amongst other things, we
don't trust inode numbers in filehandles, so validation of inode
numbers in incoming filehandles is serialised against inode
allocation and freeing before it even gets to the inode cache
lookup.
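
As an aside, the ext2 fix described in the quoted text can be
modelled roughly as below. This is a single-threaded userspace sketch
under loud assumptions, not the kernel's insert_inode_locked(): the
names (model_icache, model_insert_locked, model_unhash) are
hypothetical, the cache is a tiny fixed array, and the "wait for the
other inode to come unlocked, then recheck" step is collapsed into
the caller unhashing the rejected inode and retrying. The only point
it makes is that a second in-core inode with the same inumber cannot
be hashed while the first one is still there.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SLOTS 8

struct model_inode {
	unsigned long ino;
	bool hashed;
	bool is_new;	/* models the locked, not-yet-set-up state */
};

struct model_icache {
	struct model_inode *slot[MODEL_SLOTS];
};

/* Return the hashed inode with this inumber, or NULL if there is none. */
static struct model_inode *model_find(struct model_icache *c,
				      unsigned long ino)
{
	for (size_t i = 0; i < MODEL_SLOTS; i++)
		if (c->slot[i] && c->slot[i]->ino == ino)
			return c->slot[i];
	return NULL;
}

/*
 * Insert a locked inode unless one with the same inumber is already
 * hashed.  Returns 0 on success, -EBUSY on a duplicate, -ENOSPC if
 * the (model-only) fixed-size table is full.
 */
static int model_insert_locked(struct model_icache *c,
			       struct model_inode *inode)
{
	assert(c != NULL && inode != NULL);
	if (model_find(c, inode->ino))
		return -EBUSY;
	for (size_t i = 0; i < MODEL_SLOTS; i++) {
		if (!c->slot[i]) {
			c->slot[i] = inode;
			inode->hashed = true;
			inode->is_new = true;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Models a rejected inode being dropped from the hash. */
static void model_unhash(struct model_icache *c, struct model_inode *inode)
{
	for (size_t i = 0; i < MODEL_SLOTS; i++)
		if (c->slot[i] == inode)
			c->slot[i] = NULL;
	inode->hashed = false;
	inode->is_new = false;
}
```

In the model, the allocation path gets -EBUSY while the lookup path's
inode is still hashed, and succeeds only after that inode has been
unhashed, so there is never more than one hashed inode per inumber.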

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com