From: Al Viro <viro@ZenIV.linux.org.uk>
To: "Drokin, Oleg" <oleg.drokin@intel.com>
Cc: "Dilger, Andreas" <andreas.dilger@intel.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
"<linux-fsdevel@vger.kernel.org>" <linux-fsdevel@vger.kernel.org>,
Theodore Ts'o <tytso@mit.edu>, Mark Fasheh <mfasheh@suse.com>
Subject: Re: races in ll_splice_alias() and elsewhere (ext4, ocfs2)
Date: Thu, 10 Mar 2016 23:23:14 +0000 [thread overview]
Message-ID: <20160310232314.GP17997@ZenIV.linux.org.uk> (raw)
In-Reply-To: <11114F89-310D-460E-9758-C061E94B7811@intel.com>
On Thu, Mar 10, 2016 at 09:22:21PM +0000, Drokin, Oleg wrote:
> In fact I was surprised this was not the case already and so people
> needed to call it by hand in every possible case where a new dentry is possibly
> originated leading to a bit of a mess (and some bugs).
Most of the filesystems do not need that at all - just look at the
d_splice_alias() callers. In mainline we have
* 22 callers of form return d_splice_alias(inode, dentry); in
some ->lookup() instance.
* 1 caller in d_add_ci(), all callers of which are in ->lookup()
instances, with return value directly returned to caller of ->lookup().
* 1 caller in kernfs ->lookup(), with return value directly
returned after releasing a global mutex.
* ceph - returned in a very convoluted way, but return values is
not dereferenced until it gets passed to caller of ->lookup()
* gfs2 - passed to caller (or fed to finish_open() when it's in
atomic open) with some unlocks along the way.
* ocfs2 - broken, and in a way that wouldn't be fixable at dentry
allocation time.
* cifs - seeding dcache on readdir. No postprocessing of dentry.
fs/9p/vfs_inode.c:834: res = d_splice_alias(inode, dentry);
* fuse - essentially touching a timestamp of dentry, be it new or
old one. Both on ->lookup() and when seeding dcache on readdir. Same
as done on successful revalidate, so the worst we are risking is extra work
in ->d_revalidate() for somebody finding it in dcache before we'd touched
the timestamp.
* nfs - ditto.
* 9p - the fid we'd obtained is hung on whatever dentry we get.
If revalidate doesn't find a fid, it'll try to get one from server and
hang it on the dentry.
d_obtain_alias() callers make it even more clear - 27 call sites where it's
directly returned, 5 more where it's returned without being dereferenced
and 5 callers in 3 filesystems (ceph, fuse, lustre) where something is
(or ought to be) done. ceph might benefit from that thing as well - looks
like its users of d_obtain_alias() might be racy wrt d_splice_alias() from
->lookup() picking the alias before ceph_init_dentry() finishes. Or not -
the call chain in there is convoluted as hell and I might miss the call
of ceph_init_denty() along the way. ocfs2 might or might not be correct -
it doesn't do any postprocessing after d_obtain_alias(); no idea if it
really needs one there.
So we have two filesystems that could benefit from having ->d_init() -
ceph and lustre. Note that e.g. ocfs2 wouldn't benefit from that - it needs
a lot more than just allocation *and* it uses ->d_fsdata differently for
positives and negatives.
FWIW, there's a troubling thing in ceph_init_dentry() -
if (ceph_snap(d_inode(dentry->d_parent)) == CEPH_NOSNAP)
d_set_d_op(dentry, &ceph_dentry_ops);
else if (ceph_snap(d_inode(dentry->d_parent)) == CEPH_SNAPDIR)
d_set_d_op(dentry, &ceph_snapdir_dentry_ops);
else
d_set_d_op(dentry, &ceph_snap_dentry_ops);
_That_ can't be done in __d_alloc() and, what's worse, it doesn't look good
in their __fh_to_dentry() call of ceph_init_dentry(). ->d_parent has
a good chance to be pointing to the dentry itself. Not sure how much trouble
does it lead to - I'm not familiar with ceph snapshots or ceph reexports over
NFS, let alone the combination...
> The downside is that we do the allocation at all times even if the dentry is
> going to be promptly destroyed because we found a better alias.
The cost of kmalloc() (or kmem_cache_alloc() - it might make sense
to introduce a specialized slab for those beasts) is trivial compared to the
cost of talking to server to find which inode you are going to deal with.
next prev parent reply other threads:[~2016-03-10 23:23 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-03-08 16:05 races in ll_splice_alias() Al Viro
2016-03-08 20:44 ` Drokin, Oleg
2016-03-08 21:11 ` Al Viro
2016-03-08 23:18 ` Drokin, Oleg
2016-03-09 0:34 ` Al Viro
2016-03-09 0:53 ` Drokin, Oleg
2016-03-09 1:26 ` Al Viro
2016-03-09 5:20 ` Drokin, Oleg
2016-03-09 23:47 ` Drokin, Oleg
2016-03-10 2:20 ` races in ll_splice_alias() and elsewhere (ext4, ocfs2) Al Viro
2016-03-10 2:59 ` Al Viro
2016-03-10 23:55 ` Theodore Ts'o
2016-03-11 3:18 ` Al Viro
2016-03-11 15:42 ` Theodore Ts'o
2016-03-10 3:08 ` Drokin, Oleg
2016-03-10 3:34 ` Al Viro
2016-03-10 3:46 ` Drokin, Oleg
2016-03-10 4:22 ` Drokin, Oleg
2016-03-10 4:43 ` Al Viro
2016-03-10 5:15 ` Al Viro
2016-03-11 3:47 ` Drokin, Oleg
2016-03-10 5:47 ` Drokin, Oleg
2016-03-10 19:59 ` Al Viro
2016-03-10 20:34 ` do we need that smp_wmb() in __d_alloc()? Al Viro
2016-03-10 21:17 ` Al Viro
2016-03-10 21:22 ` races in ll_splice_alias() and elsewhere (ext4, ocfs2) Drokin, Oleg
2016-03-10 23:23 ` Al Viro [this message]
2016-03-11 3:25 ` Drokin, Oleg
2016-03-12 17:22 ` Al Viro
2016-03-13 14:35 ` Sage Weil
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160310232314.GP17997@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=andreas.dilger@intel.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=mfasheh@suse.com \
--cc=oleg.drokin@intel.com \
--cc=torvalds@linux-foundation.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).