From: Al Viro <viro@ZenIV.linux.org.uk>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Jeff Layton <jlayton@kernel.org>
Subject: Re: [RFC][PATCH] nfsd: vfs_mkdir() might succeed leaving dentry negative unhashed
Date: Mon, 14 May 2018 17:47:43 +0100 [thread overview]
Message-ID: <20180514164743.GI30522@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20180514154550.GH30522@ZenIV.linux.org.uk>
On Mon, May 14, 2018 at 04:45:51PM +0100, Al Viro wrote:
> > > 2) is nfserr_serverfail valid for situation when
> > > directory created by such vfs_mkdir() manages to disappear
> > > under us immediately afterwards? Or should we return nfserr_stale
> > > instead?
> >
> > We just got a successful result on the create and the parent's still
> > locked, so if somebody hits this I think we want them reporting a bug,
> > not wasting time trying to find a mistake in their own logic.
>
> No. Suppose it's NFS-over-NFS (and yes, I agree that it's a bad idea;
> somebody *has* done that). Lookup after successful mkdir can legitimately
> fail if it's been removed server-side.
>
> And we *will* need to allow nfs_mkdir() to return that way in some cases
> (== success with dentry passed to it left unhashed negative), I'm afraid ;-/
Consider the situation when you have NFS reexported - there's server S1,
exporting a filesystem to S2, which reexports it for client C.
On S2, process A does stat foo, gets ENOENT and proceeds to mkdir foo.
Dentry of foo is passed to nfs_mkdir(), which calls e.g. nfs3_proc_mkdir(),
which calls nfs3_do_create(), which requests S1 to create the damn thing.
Then, after that succeeds, it calls nfs_instantiate(). There we would
proceed to get the inode and call d_add(dentry, inode).
In the meanwhile, C, having figured out the fhandle S2 would assign to
foo (e.g. having spied upon the traffic, etc.) sends that fhandle to
S2. nfs_fh_to_dentry() is called by process B on S2 (either knfsd, or,
in case if C==S2 and attacker has done open-by-fhandle - the caller
of open_by_handle(2)). It gets the inode - the damn thing has been
created on S1 already. That inode still has no dentries attached to
it (B has just set it up), so d_obtain_alias() creates one and attaches
to it.
Now A gets around to nfs_fhget() and finds the same inode. Voila -
d_add(dentry, inode) and we are fucked; two dentries over the same
directory inode. Fun starts very shortly when fs/exportfs/* code
is used by B to reconnect its dentry - the things really get bogus
once that constraint is violated.
The obvious solution would be to replace that d_add() with
res = d_splice_alias(inode, dentry);
if (IS_ERR(res)) {
error = PTR_ERR(res);
goto out_error;
}
dput(res);
leaving the dentry passed to nfs_mkdir() unhashed and negative if
we hit that race. That's fine - the next lookup (e.g. the one
done by exportfs to reconnect the sucker) will find the right
dentry in dcache; it's just that it won't the one passed to
vfs_mkdir().
That's different from the local case - there mkdir gets the inumber,
inserts locked in-core inode with that inumber into icache and
only then proceeds to set on-disk data up. Anyone guessing the
inumber (and thus the fhandle) will either get -ESTALE (if they
come first, as in this scenario) or find the in-core inode mkdir
is setting up and wait for it to be unlocked, at which point
d_obtain_alias() will already find it attached to dentry passed
to mkdir.
But that critically relies upon the icache search key being known
to mkdir *BEFORE* the on-disk metadata starts looking acceptable.
For NFS we really can't do that - there the key is S1's fhandle
and we don't get that until S1 has created the damn thing.
I don't see any variation of the trick used by local filesystems
that would not have this race; we really can end up with B getting
there first and creating directory inode with dentry attached to
it before A gets through. Which, AFAICS, leaves only one solution -
let A put the dentry created by B in place of what had been passed
to A (and leave its argument unhashed). Which is trivial to
implement (see above); however, it means that NFS itself is in
the same class as cgroup - its ->mkdir() may, in some cases,
end up succeeding and leaving its argument unhashed negative.
Note that dcache is perfectly fine after that - we have hashed
positive dentry with the right name and right parent, over the
inode for directory we'd just created; everything's fine, except
that it's not the struct dentry originally passed to vfs_mkdir().
next prev parent reply other threads:[~2018-05-14 16:47 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-11 21:13 [RFC][PATCH] nfsd: vfs_mkdir() might succeed leaving dentry negative unhashed Al Viro
2018-05-14 15:32 ` J. Bruce Fields
2018-05-14 15:45 ` Al Viro
2018-05-14 16:27 ` J. Bruce Fields
2018-05-14 17:00 ` Al Viro
2018-05-14 17:56 ` J. Bruce Fields
2018-05-14 16:47 ` Al Viro [this message]
2018-05-16 14:09 ` J. Bruce Fields
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180514164743.GI30522@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=bfields@fieldses.org \
--cc=jlayton@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).