From: Al Viro <viro@zeniv.linux.org.uk>
To: "Tobin C. Harding" <me@tobin.cc>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: dcache locking question
Date: Fri, 15 Mar 2019 01:50:21 +0000 [thread overview]
Message-ID: <20190315015021.GU2217@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20190314231939.GA17269@eros.localdomain>
On Fri, Mar 15, 2019 at 10:19:39AM +1100, Tobin C. Harding wrote:
> On Fri, Mar 15, 2019 at 09:56:32AM +1100, Tobin C. Harding wrote:
> > Hi,
> >
> > I'm not able to understand the locking order in dcache. dcache has been
> > around for a while so clearly its right and I'm wrong.
> >
> > Could someone please explain to me how the locking order commented at
> > the top of the file is not violated in the following:
>
> >
> > From top of fs/dcache.c
> >
> > * If there is an ancestor relationship:
> > * dentry->d_parent->...->d_parent->d_lock
> > * ...
> > * dentry->d_parent->d_lock
> > * dentry->d_lock
> >
>
> This is a more simple example
>
> static struct dentry *dentry_kill(struct dentry *dentry)
> __releases(dentry->d_lock)
> {
> ...
>
> slow_positive:
> spin_unlock(&dentry->d_lock);
> spin_lock(&inode->i_lock);
> spin_lock(&dentry->d_lock);
> parent = lock_parent(dentry);
Yes? Why is that a problem? We start with ->d_lock held.
We drop it. At that point we are holding a reference to
dentry and no locks whatsoever. We know that inode is not
going away (we are holding a reference to positive dentry,
so its ->d_inode can't change under us, and it contributes
to inode refcount). So we can safely grab its ->i_lock.
Then we grab (in normal order) ->d_lock. Then, in
lock_parent() we check if ->d_parent points to ourselves.
->d_parent is stabilized by ->d_lock, so that check
if fine. If it does *not* point to ourselves, we do
trylock on its ->d_parent. Also fine - ->d_parent isn't
going anywhere and trylock can't lead to deadlocks;
that's what "try" in the name is about. If it succeeds,
we have
* inode->i_lock held
* dentry->d_lock held
* parent->d_lock held
* inode == dentry->d_inode
* parent == dentry->d_parent
and we are fine, except that the reference we are holding
might not be the only existing one anymore (it used to be,
but we'd dropped ->d_lock, so additional ones might've
appeared, and we need to repeat the retain_dentry()-related
checks).
If it fails, we call __lock_parent(). Which
* grabs RCU lock
* drops ->d_lock (now we are not holding ->d_lock
on anything).
* fetches ->d_parent. Note the READ_ONCE() there -
it's *NOT* stable (no ->d_lock held). We can't expect
that ->d_parent won't change or that the reference it used
to contribute to parent's refcount is there anymore; as
the matter of fact, the only thing that prevents outright
_freeing_ of the object 'parent' points to is rcu_read_lock()
and RCU delay between dropping the last reference and
actual freeing of the sucker. rcu_read_lock() is there,
though, which makes it safe to grab ->d_lock on 'parent'.
That 'parent' might very well have nothing to do with our
dentry by now. We can check if it's equal to its
->d_parent, though. dentry->d_parent is *NOT* stable
at that point. It might be changing right now.
However, the first store to dentry->d_parent making it
not equal to parent would have been done under parent->d_lock.
And since we are holding parent->d_lock, we won't miss that
store. We might miss subsequent ones, but if we observe
dentry->d_parent == parent, we know that it's stable. And
if we see dentry->d_parent != parent, we know that dentry
has moved around and we need to retry anyway.
If we do *not* need to retry, we have
* inode->i_lock held
* parent->d_lock held
* dentry->d_parent == parent
* dentry->d_inode == inode
At that point we can drop rcu_read_lock() - parent can't be
freed under us without dentry->d_parent switched from
parent to something else, and anyone trying to do that will
spin on parent->d_lock.
And now we can grab dentry->d_lock - ordering is guaranteed.
The only minor twist is that once upon a time __lock_parent()
used to have to cope with the possibility of dentry becoming
detached while we were retrying; in that case we obviously
don't want to grab dentry->d_lock - we are holding it already
(as parent->d_lock).
These days (since "make non-exchanging __d_move() copy ->d_parent
rather than swap them") if IS_ROOT(dentry) is false at some
point, it *never* will become true, so this check could've
been dropped, along with the checks for __lock_parent() return
value possibly being NULL.
I'm not saying it's not subtle and nasty - any time you see
READ_ONCE() (or rcu_read_lock(), for that matter), expect
headache. But trylock is absolutely innocious part here; we
are not even retrying it once.
next prev parent reply other threads:[~2019-03-15 1:50 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-14 22:56 dcache locking question Tobin C. Harding
2019-03-14 23:09 ` Matthew Wilcox
2019-03-15 1:38 ` Tobin C. Harding
2019-03-14 23:19 ` Tobin C. Harding
2019-03-15 1:50 ` Al Viro [this message]
2019-03-15 17:38 ` Eric Biggers
2019-03-15 18:54 ` Al Viro
2019-03-16 22:31 ` Paul E. McKenney
2019-03-17 0:18 ` Al Viro
2019-03-17 0:50 ` Paul E. McKenney
2019-03-17 2:20 ` James Bottomley
2019-03-17 3:06 ` Al Viro
2019-03-17 4:23 ` James Bottomley
2019-03-18 0:35 ` Paul E. McKenney
2019-03-18 16:26 ` James Bottomley
2019-03-18 17:11 ` Paul E. McKenney
2019-03-19 15:45 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190315015021.GU2217@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=linux-fsdevel@vger.kernel.org \
--cc=me@tobin.cc \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.