From: Al Viro <viro@ZenIV.linux.org.uk>
To: "Drokin, Oleg" <oleg.drokin@intel.com>
Cc: "Dilger, Andreas" <andreas.dilger@intel.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
"<linux-fsdevel@vger.kernel.org>" <linux-fsdevel@vger.kernel.org>,
Mark Fasheh <mfasheh@suse.com>
Subject: Re: races in ll_splice_alias() and elsewhere (ext4, ocfs2)
Date: Thu, 10 Mar 2016 05:15:50 +0000
Message-ID: <20160310051550.GJ17997@ZenIV.linux.org.uk>
In-Reply-To: <20160310044316.GI17997@ZenIV.linux.org.uk>

On Thu, Mar 10, 2016 at 04:43:16AM +0000, Al Viro wrote:
> On Thu, Mar 10, 2016 at 03:46:43AM +0000, Drokin, Oleg wrote:
>
> > > Wait a minute. If it's hashed, has the right name and the right parent,
> > > why the hell are we calling ->lookup() on a new dentry in the first place?
> > > Why hadn't we simply picked it from dcache?
> >
> > This is because of the trickery we do in the d_compare.
> > our d_compare looks at the "invalid" flag and if it's set, returns "not matching",
> > triggering the lookup instead of revalidate.
> > This makes revalidate simple and fast.
> > (We used to have a complicated revalidate with a lot of code duplication with
> > lookup in order to be able to query the server and pass all sorts of data there
> > and it was nothing but trouble).
>
> *Ugh*... That's really nasty. We certainly could make d_exact_match()
> accept unhashed ones and make rehashing conditional (NFS doesn't pull
> anything similar, so it won't care), but your ->d_revalidate()
> has exact same problem as ext4_d_revalidate() one mentioned upthread -
> there's no warranty that dentry->d_parent will stay stable.
>
> We are *NOT* guaranteed locked parent when ->d_revalidate() is called, or
> we would have to lock every damn directory on the way through the pathname
> resolution. Moreover, ->d_revalidate() really can overlap with rename(2).

PS: there's a reason why e.g. NFS ->d_revalidate() is doing
	if (flags & LOOKUP_RCU) {
		parent = ACCESS_ONCE(dentry->d_parent);
		dir = d_inode_rcu(parent);
		if (!dir)
			return -ECHILD;
	} else {
		parent = dget_parent(dentry);
		dir = d_inode(parent);
	}
and so do other instances.  It does *not* guarantee that the parent will remain
the parent through the whole thing (or will still be one by the time the
dget_parent() caller gets the return value), but it does guarantee that it
won't get freed under you.  Note that the original parent won't disappear
(it's pinned by the caller), but there's no promise that what you'll
fetch from dentry->d_parent inside the method will have anything to do with
that.

BTW, we might be better off if we passed the parent and child as separate
arguments...
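
To make the distinction between "pinned" and "still the parent" concrete,
here is a minimal userspace sketch.  It is a toy model only: struct toy_dentry
and the toy_*() helpers are hypothetical names invented for illustration, not
kernel APIs, and the model is single-threaded, so it shows only the outcome of
a rename landing between dget_parent() and the use of its result, not the
locking the real helpers do.

```c
/* Toy userspace model; none of these names exist in the kernel. */
struct toy_dentry {
	const char *name;
	int refcount;			/* > 0 means the object cannot be freed */
	struct toy_dentry *parent;
};

/* Stand-in for dget_parent(): pin whatever the parent is right now. */
static struct toy_dentry *toy_dget_parent(struct toy_dentry *d)
{
	struct toy_dentry *p = d->parent;

	p->refcount++;
	return p;
}

/* Stand-in for dput(): drop one reference. */
static void toy_dput(struct toy_dentry *d)
{
	d->refcount--;
}

/* Stand-in for rename(2) moving the child into another directory. */
static void toy_rename(struct toy_dentry *child, struct toy_dentry *new_parent)
{
	toy_dput(child->parent);	/* child lets go of the old parent */
	new_parent->refcount++;		/* ... and holds the new one instead */
	child->parent = new_parent;
}

/*
 * Returns 1 if the parent pinned before the rename is still alive
 * (refcount > 0) and yet is no longer the child's parent afterwards.
 */
static int toy_pinned_but_stale(void)
{
	struct toy_dentry a = { "a", 1, 0 };
	struct toy_dentry b = { "b", 1, 0 };
	struct toy_dentry child = { "child", 1, &a };
	struct toy_dentry *pinned;
	int alive, stale;

	a.refcount++;			/* reference the child holds on "a" */
	pinned = toy_dget_parent(&child);
	toy_rename(&child, &b);		/* rename lands after the pin */

	alive = pinned->refcount > 0;	/* not freed under us */
	stale = child.parent != pinned;	/* but no longer the parent */
	toy_dput(pinned);
	return alive && stale;
}
```

In this model the pinned object stays safe to dereference while saying
nothing about who the child's parent is by the time it is used.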

By a quick look through the instances, we have
	* a bunch that don't look at the parent at all
	* some that use dget_parent()/dput() (and often enough use only
	  ->d_inode of the parent).
	* some that look at it under dentry->d_lock - that's enough for
	  stability, but can't block.  ceph, BTW, does igrab() of parent's
	  inode under ->d_lock, uses it outside of ->d_lock and iput() in
	  the end.
	* kernfs, which serializes just about everything on a single
	  system-wide mutex.
	* lustre (and ext4 crypto in -next) - broken

Only the third class (and actually only one instance in there - vfat) wouldn't
be just as fine if we passed it the parent as an argument.  The vfat one does
	spin_lock(&dentry->d_lock);
	if (dentry->d_time != d_inode(dentry->d_parent)->i_version)
		ret = 0;
	spin_unlock(&dentry->d_lock);
That one does care about ->d_time and ->d_parent being from the same moment.
And it can bloody well keep doing what it does.

Reducing the number of dget_parent() callers would also be nice...
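
As an illustration of why the vfat-style check wants both fields read under
one lock, here is a small userspace sketch.  Everything in it is hypothetical:
struct toy_entry, struct toy_dir and the toy_*() helpers are made up for the
example, a C11 atomic_flag spinlock stands in for dentry->d_lock, and the
assumption that the mover updates both fields under that same lock belongs to
the toy model and is not something stated above about vfat's writers.

```c
#include <stdatomic.h>

/* Toy userspace model; none of these names exist in the kernel. */
struct toy_dir {
	unsigned long version;		/* stands in for i_version */
};

struct toy_entry {
	atomic_flag lock;		/* stands in for dentry->d_lock */
	unsigned long time;		/* stands in for dentry->d_time */
	struct toy_dir *parent;		/* stands in for dentry->d_parent */
};

static void toy_lock(struct toy_entry *e)
{
	/* Spin until we own the flag; like a spinlock, this never sleeps. */
	while (atomic_flag_test_and_set_explicit(&e->lock, memory_order_acquire))
		;
}

static void toy_unlock(struct toy_entry *e)
{
	atomic_flag_clear_explicit(&e->lock, memory_order_release);
}

/* Model assumption: a move updates parent and time together, under the lock. */
static void toy_move(struct toy_entry *e, struct toy_dir *new_parent)
{
	toy_lock(e);
	e->parent = new_parent;
	e->time = new_parent->version;
	toy_unlock(e);
}

/* Returns 1 if the entry still looks valid, 0 if it is stale. */
static int toy_revalidate(struct toy_entry *e)
{
	int ret = 1;

	toy_lock(e);
	/* Both fields are read under the lock, so they are from the same moment. */
	if (e->time != e->parent->version)
		ret = 0;
	toy_unlock(e);
	return ret;
}
```

Because the comparison needs ->time and ->parent as a matched pair, handing
such a check a separately fetched parent would not give it anything it could
use, which is why this kind of instance can keep reading ->d_parent itself.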