From: "J. Bruce Fields" <bfields@fieldses.org>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Jeff Layton <jlayton@kernel.org>
Subject: Re: [RFC][PATCH] nfsd: vfs_mkdir() might succeed leaving dentry negative unhashed
Date: Wed, 16 May 2018 10:09:14 -0400 [thread overview]
Message-ID: <20180516140914.GA3730@fieldses.org> (raw)
In-Reply-To: <20180514164743.GI30522@ZenIV.linux.org.uk>
On Mon, May 14, 2018 at 05:47:43PM +0100, Al Viro wrote:
> On Mon, May 14, 2018 at 04:45:51PM +0100, Al Viro wrote:
>
> > > > 2) is nfserr_serverfail valid for situation when
> > > > directory created by such vfs_mkdir() manages to disappear
> > > > under us immediately afterwards? Or should we return nfserr_stale
> > > > instead?
> > >
> > > We just got a successful result on the create and the parent's still
> > > locked, so if somebody hits this I think we want them reporting a bug,
> > > not wasting time trying to find a mistake in their own logic.
> >
> > No. Suppose it's NFS-over-NFS (and yes, I agree that it's a bad idea;
> > somebody *has* done that). Lookup after successful mkdir can legitimately
> > fail if it's been removed server-side.
> >
> > And we *will* need to allow nfs_mkdir() to return that way in some cases
> > (== success with dentry passed to it left unhashed negative), I'm afraid ;-/
>
> Consider the situation when you have NFS reexported - there's server S1,
> exporting a filesystem to S2, which reexports it for client C.
Thanks for the explanation! I'd missed the connection between this and
the mkdir/filehandle-lookup races.
Acked-by: J. Bruce Fields <bfields@redhat.com>
for the patch.
--b.
>
> On S2, process A does stat foo, gets ENOENT and proceeds to mkdir foo.
> Dentry of foo is passed to nfs_mkdir(), which calls e.g. nfs3_proc_mkdir(),
> which calls nfs3_do_create(), which requests S1 to create the damn thing.
> Then, after that succeeds, it calls nfs_instantiate(). There we would
> proceed to get the inode and call d_add(dentry, inode).
>
> In the meanwhile, C, having figured out the fhandle S2 would assign to
> foo (e.g. having spied upon the traffic, etc.) sends that fhandle to
> S2. nfs_fh_to_dentry() is called by process B on S2 (either knfsd, or,
> in case if C==S2 and attacker has done open-by-fhandle - the caller
> of open_by_handle(2)). It gets the inode - the damn thing has been
> created on S1 already. That inode still has no dentries attached to
> it (B has just set it up), so d_obtain_alias() creates one and attaches
> to it.
>
> Now A gets around to nfs_fhget() and finds the same inode. Voila -
> d_add(dentry, inode) and we are fucked; two dentries over the same
> directory inode. Fun starts very shortly when fs/exportfs/* code
> is used by B to reconnect its dentry - the things really get bogus
> once that constraint is violated.
>
> The obvious solution would be to replace that d_add() with
> res = d_splice_alias(inode, dentry);
> if (IS_ERR(res)) {
> error = PTR_ERR(res);
> goto out_error;
> }
> dput(res);
>
> leaving the dentry passed to nfs_mkdir() unhashed and negative if
> we hit that race. That's fine - the next lookup (e.g. the one
> done by exportfs to reconnect the sucker) will find the right
> dentry in dcache; it's just that it won't the one passed to
> vfs_mkdir().
>
> That's different from the local case - there mkdir gets the inumber,
> inserts locked in-core inode with that inumber into icache and
> only then proceeds to set on-disk data up. Anyone guessing the
> inumber (and thus the fhandle) will either get -ESTALE (if they
> come first, as in this scenario) or find the in-core inode mkdir
> is setting up and wait for it to be unlocked, at which point
> d_obtain_alias() will already find it attached to dentry passed
> to mkdir.
>
> But that critically relies upon the icache search key being known
> to mkdir *BEFORE* the on-disk metadata starts looking acceptable.
> For NFS we really can't do that - there the key is S1's fhandle
> and we don't get that until S1 has created the damn thing.
>
> I don't see any variation of the trick used by local filesystems
> that would not have this race; we really can end up with B getting
> there first and creating directory inode with dentry attached to
> it before A gets through. Which, AFAICS, leaves only one solution -
> let A put the dentry created by B in place of what had been passed
> to A (and leave its argument unhashed). Which is trivial to
> implement (see above); however, it means that NFS itself is in
> the same class as cgroup - its ->mkdir() may, in some cases,
> end up succeeding and leaving its argument unhashed negative.
> Note that dcache is perfectly fine after that - we have hashed
> positive dentry with the right name and right parent, over the
> inode for directory we'd just created; everything's fine, except
> that it's not the struct dentry originally passed to vfs_mkdir().
prev parent reply other threads:[~2018-05-16 14:09 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-11 21:13 [RFC][PATCH] nfsd: vfs_mkdir() might succeed leaving dentry negative unhashed Al Viro
2018-05-14 15:32 ` J. Bruce Fields
2018-05-14 15:45 ` Al Viro
2018-05-14 16:27 ` J. Bruce Fields
2018-05-14 17:00 ` Al Viro
2018-05-14 17:56 ` J. Bruce Fields
2018-05-14 16:47 ` Al Viro
2018-05-16 14:09 ` J. Bruce Fields [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180516140914.GA3730@fieldses.org \
--to=bfields@fieldses.org \
--cc=jlayton@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.