linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: "Tobin C. Harding" <me@tobin.cc>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: dcache locking question
Date: Fri, 15 Mar 2019 01:50:21 +0000	[thread overview]
Message-ID: <20190315015021.GU2217@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20190314231939.GA17269@eros.localdomain>

On Fri, Mar 15, 2019 at 10:19:39AM +1100, Tobin C. Harding wrote:
> On Fri, Mar 15, 2019 at 09:56:32AM +1100, Tobin C. Harding wrote:
> > Hi,
> > 
> > I'm not able to understand the locking order in dcache.  dcache has been
> > around for a while so clearly its right and I'm wrong.
> > 
> > Could someone please explain to me how the locking order commented at
> > the top of the file is not violated in the following:
> 
> > 
> > From top of fs/dcache.c
> > 
> >  * If there is an ancestor relationship:
> >  * dentry->d_parent->...->d_parent->d_lock
> >  *   ...
> >  *     dentry->d_parent->d_lock
> >  *       dentry->d_lock
> > 
> 
> This is a more simple example
> 
> static struct dentry *dentry_kill(struct dentry *dentry)
> 	__releases(dentry->d_lock)
> {
>         ...
> 
> slow_positive:
> 	spin_unlock(&dentry->d_lock);
> 	spin_lock(&inode->i_lock);
> 	spin_lock(&dentry->d_lock);
> 	parent = lock_parent(dentry);

Yes?  Why is that a problem?  We start with ->d_lock held.
We drop it.  At that point we are holding a reference to
dentry and no locks whatsoever.  We know that inode is not
going away (we are holding a reference to positive dentry,
so its ->d_inode can't change under us, and it contributes
to inode refcount).  So we can safely grab its ->i_lock.
Then we grab (in normal order) ->d_lock.  Then, in
lock_parent() we check if ->d_parent points to ourselves.
->d_parent is stabilized by ->d_lock, so that check
if fine.  If it does *not* point to ourselves, we do
trylock on its ->d_parent. Also fine - ->d_parent isn't
going anywhere and trylock can't lead to deadlocks;
that's what "try" in the name is about.  If it succeeds,
we have
	* inode->i_lock held
	* dentry->d_lock held
	* parent->d_lock held
	* inode == dentry->d_inode
	* parent == dentry->d_parent
and we are fine, except that the reference we are holding
might not be the only existing one anymore (it used to be,
but we'd dropped ->d_lock, so additional ones might've
appeared, and we need to repeat the retain_dentry()-related
checks).

If it fails, we call __lock_parent().  Which
	* grabs RCU lock
	* drops ->d_lock (now we are not holding ->d_lock
on anything).
	* fetches ->d_parent.  Note the READ_ONCE() there -
it's *NOT* stable (no ->d_lock held).  We can't expect
that ->d_parent won't change or that the reference it used
to contribute to parent's refcount is there anymore; as
the matter of fact, the only thing that prevents outright
_freeing_ of the object 'parent' points to is rcu_read_lock()
and RCU delay between dropping the last reference and
actual freeing of the sucker.  rcu_read_lock() is there,
though, which makes it safe to grab ->d_lock on 'parent'.

That 'parent' might very well have nothing to do with our
dentry by now.  We can check if it's equal to its
->d_parent, though.  dentry->d_parent is *NOT* stable
at that point.  It might be changing right now.

However, the first store to dentry->d_parent making it
not equal to parent would have been done under parent->d_lock.
And since we are holding parent->d_lock, we won't miss that
store.  We might miss subsequent ones, but if we observe
dentry->d_parent == parent, we know that it's stable.  And
if we see dentry->d_parent != parent, we know that dentry
has moved around and we need to retry anyway.

If we do *not* need to retry, we have
	* inode->i_lock held
	* parent->d_lock held
	* dentry->d_parent == parent
	* dentry->d_inode == inode
At that point we can drop rcu_read_lock() - parent can't be
freed under us without dentry->d_parent switched from
parent to something else, and anyone trying to do that will
spin on parent->d_lock.

And now we can grab dentry->d_lock - ordering is guaranteed.
The only minor twist is that once upon a time __lock_parent()
used to have to cope with the possibility of dentry becoming
detached while we were retrying; in that case we obviously
don't want to grab dentry->d_lock - we are holding it already
(as parent->d_lock).

These days (since "make non-exchanging __d_move() copy ->d_parent
rather than swap them") if IS_ROOT(dentry) is false at some
point, it *never* will become true, so this check could've
been dropped, along with the checks for __lock_parent() return
value possibly being NULL.

I'm not saying it's not subtle and nasty - any time you see
READ_ONCE() (or rcu_read_lock(), for that matter), expect
headache.  But trylock is absolutely innocious part here; we
are not even retrying it once.

  reply	other threads:[~2019-03-15  1:50 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-14 22:56 dcache locking question Tobin C. Harding
2019-03-14 23:09 ` Matthew Wilcox
2019-03-15  1:38   ` Tobin C. Harding
2019-03-14 23:19 ` Tobin C. Harding
2019-03-15  1:50   ` Al Viro [this message]
2019-03-15 17:38     ` Eric Biggers
2019-03-15 18:54       ` Al Viro
2019-03-16 22:31         ` Paul E. McKenney
2019-03-17  0:18           ` Al Viro
2019-03-17  0:50             ` Paul E. McKenney
2019-03-17  2:20               ` James Bottomley
2019-03-17  3:06                 ` Al Viro
2019-03-17  4:23                   ` James Bottomley
2019-03-18  0:35                     ` Paul E. McKenney
2019-03-18 16:26                       ` James Bottomley
2019-03-18 17:11                         ` Paul E. McKenney
2019-03-19 15:45                           ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190315015021.GU2217@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=me@tobin.cc \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).