From: Al Viro <viro@zeniv.linux.org.uk>
To: "Tobin C. Harding" <me@tobin.cc>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: dcache locking question
Date: Fri, 15 Mar 2019 01:50:21 +0000 [thread overview]
Message-ID: <20190315015021.GU2217@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20190314231939.GA17269@eros.localdomain>
On Fri, Mar 15, 2019 at 10:19:39AM +1100, Tobin C. Harding wrote:
> On Fri, Mar 15, 2019 at 09:56:32AM +1100, Tobin C. Harding wrote:
> > Hi,
> >
> > I'm not able to understand the locking order in dcache. dcache has been
> > around for a while so clearly its right and I'm wrong.
> >
> > Could someone please explain to me how the locking order commented at
> > the top of the file is not violated in the following:
>
> >
> > From top of fs/dcache.c
> >
> > * If there is an ancestor relationship:
> > * dentry->d_parent->...->d_parent->d_lock
> > * ...
> > * dentry->d_parent->d_lock
> > * dentry->d_lock
> >
>
> This is a more simple example
>
> static struct dentry *dentry_kill(struct dentry *dentry)
> __releases(dentry->d_lock)
> {
> ...
>
> slow_positive:
> spin_unlock(&dentry->d_lock);
> spin_lock(&inode->i_lock);
> spin_lock(&dentry->d_lock);
> parent = lock_parent(dentry);
Yes? Why is that a problem? We start with ->d_lock held.
We drop it. At that point we are holding a reference to
dentry and no locks whatsoever. We know that inode is not
going away (we are holding a reference to positive dentry,
so its ->d_inode can't change under us, and it contributes
to inode refcount). So we can safely grab its ->i_lock.
Then we grab (in normal order) ->d_lock. Then, in
lock_parent() we check if ->d_parent points to ourselves.
->d_parent is stabilized by ->d_lock, so that check
if fine. If it does *not* point to ourselves, we do
trylock on its ->d_parent. Also fine - ->d_parent isn't
going anywhere and trylock can't lead to deadlocks;
that's what "try" in the name is about. If it succeeds,
we have
* inode->i_lock held
* dentry->d_lock held
* parent->d_lock held
* inode == dentry->d_inode
* parent == dentry->d_parent
and we are fine, except that the reference we are holding
might not be the only existing one anymore (it used to be,
but we'd dropped ->d_lock, so additional ones might've
appeared, and we need to repeat the retain_dentry()-related
checks).
If it fails, we call __lock_parent(). Which
* grabs RCU lock
* drops ->d_lock (now we are not holding ->d_lock
on anything).
* fetches ->d_parent. Note the READ_ONCE() there -
it's *NOT* stable (no ->d_lock held). We can't expect
that ->d_parent won't change or that the reference it used
to contribute to parent's refcount is there anymore; as
the matter of fact, the only thing that prevents outright
_freeing_ of the object 'parent' points to is rcu_read_lock()
and RCU delay between dropping the last reference and
actual freeing of the sucker. rcu_read_lock() is there,
though, which makes it safe to grab ->d_lock on 'parent'.
That 'parent' might very well have nothing to do with our
dentry by now. We can check if it's equal to its
->d_parent, though. dentry->d_parent is *NOT* stable
at that point. It might be changing right now.
However, the first store to dentry->d_parent making it
not equal to parent would have been done under parent->d_lock.
And since we are holding parent->d_lock, we won't miss that
store. We might miss subsequent ones, but if we observe
dentry->d_parent == parent, we know that it's stable. And
if we see dentry->d_parent != parent, we know that dentry
has moved around and we need to retry anyway.
If we do *not* need to retry, we have
* inode->i_lock held
* parent->d_lock held
* dentry->d_parent == parent
* dentry->d_inode == inode
At that point we can drop rcu_read_lock() - parent can't be
freed under us without dentry->d_parent switched from
parent to something else, and anyone trying to do that will
spin on parent->d_lock.
And now we can grab dentry->d_lock - ordering is guaranteed.
The only minor twist is that once upon a time __lock_parent()
used to have to cope with the possibility of dentry becoming
detached while we were retrying; in that case we obviously
don't want to grab dentry->d_lock - we are holding it already
(as parent->d_lock).
These days (since "make non-exchanging __d_move() copy ->d_parent
rather than swap them") if IS_ROOT(dentry) is false at some
point, it *never* will become true, so this check could've
been dropped, along with the checks for __lock_parent() return
value possibly being NULL.
I'm not saying it's not subtle and nasty - any time you see
READ_ONCE() (or rcu_read_lock(), for that matter), expect
headache. But trylock is absolutely innocious part here; we
are not even retrying it once.
next prev parent reply other threads:[~2019-03-15 1:50 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-14 22:56 dcache locking question Tobin C. Harding
2019-03-14 23:09 ` Matthew Wilcox
2019-03-15 1:38 ` Tobin C. Harding
2019-03-14 23:19 ` Tobin C. Harding
2019-03-15 1:50 ` Al Viro [this message]
2019-03-15 17:38 ` Eric Biggers
2019-03-15 18:54 ` Al Viro
2019-03-16 22:31 ` Paul E. McKenney
2019-03-17 0:18 ` Al Viro
2019-03-17 0:50 ` Paul E. McKenney
2019-03-17 2:20 ` James Bottomley
2019-03-17 3:06 ` Al Viro
2019-03-17 4:23 ` James Bottomley
2019-03-18 0:35 ` Paul E. McKenney
2019-03-18 16:26 ` James Bottomley
2019-03-18 17:11 ` Paul E. McKenney
2019-03-19 15:45 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190315015021.GU2217@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=linux-fsdevel@vger.kernel.org \
--cc=me@tobin.cc \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).