From: Al Viro <viro@zeniv.linux.org.uk>
To: Eric Biggers <ebiggers@kernel.org>
Cc: "Tobin C. Harding" <me@tobin.cc>,
linux-fsdevel@vger.kernel.org,
"Paul E. McKenney" <paulmck@linux.ibm.com>
Subject: Re: dcache locking question
Date: Fri, 15 Mar 2019 18:54:55 +0000
Message-ID: <20190315185455.GA2217@ZenIV.linux.org.uk>
In-Reply-To: <20190315173819.GB77949@gmail.com>

On Fri, Mar 15, 2019 at 10:38:23AM -0700, Eric Biggers wrote:
> On Fri, Mar 15, 2019 at 01:50:21AM +0000, Al Viro wrote:
> >
> > If it fails, we call __lock_parent(). Which
> > * grabs RCU lock
> > * drops ->d_lock (now we are not holding ->d_lock
> > on anything).
> > * fetches ->d_parent. Note the READ_ONCE() there -
> > it's *NOT* stable (no ->d_lock held). We can't expect
> > that ->d_parent won't change or that the reference it used
> > to contribute to parent's refcount is there anymore; as
> > the matter of fact, the only thing that prevents outright
> > _freeing_ of the object 'parent' points to is rcu_read_lock()
> > and RCU delay between dropping the last reference and
> > actual freeing of the sucker. rcu_read_lock() is there,
> > though, which makes it safe to grab ->d_lock on 'parent'.
> >
> > That 'parent' might very well have nothing to do with our
> > dentry by now. We can check if it's equal to its
> > ->d_parent, though. dentry->d_parent is *NOT* stable
> > at that point. It might be changing right now.
> >
> > However, the first store to dentry->d_parent making it
> > not equal to parent would have been done under parent->d_lock.
> > And since we are holding parent->d_lock, we won't miss that
> > store. We might miss subsequent ones, but if we observe
> > dentry->d_parent == parent, we know that it's stable. And
> > if we see dentry->d_parent != parent, we know that dentry
> > has moved around and we need to retry anyway.
>
> Why isn't it necessary to use READ_ONCE(dentry->d_parent) here?
>
> if (unlikely(parent != dentry->d_parent)) {
>
> Suppose 'parent' is 0xAAAABBBB, and 'dentry->d_parent' is 0xAAAAAAAA and is
> concurrently changed to 0xBBBBBBBB.
>
> d_parent could be read in two parts, 0xAAAA then 0xBBBB, resulting in it
> appearing that d_parent == 0xAAAABBBB == parent.
>
> Yes it won't really be compiled as that in practice, but I thought the point of
> READ_ONCE() is to *guarantee* it's really done right...
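
To make the hypothetical in the quoted question concrete, here is a minimal sketch of the arithmetic only. The helper torn_load32() is made up for illustration and models a 32-bit load split into two 16-bit halves with a store landing in between. It says nothing about what a real compiler or CPU does with an aligned pointer load.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: model a 32-bit load that is
 * performed as two 16-bit halves, with a concurrent store changing the
 * value from 'before' to 'after' between the two halves.
 */
static uint32_t torn_load32(uint32_t before, uint32_t after)
{
	uint32_t high = before & 0xFFFF0000u;	/* upper half read before the store */
	uint32_t low  = after  & 0x0000FFFFu;	/* lower half read after the store */

	return high | low;
}
```

With before = 0xAAAAAAAA and after = 0xBBBBBBBB the result is 0xAAAABBBB, the value the question worries about being mistaken for 'parent'.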

READ_ONCE does not add any extra guarantees of atomicity. Fetches and stores
of pointers are atomic, period; if that ever breaks, we are in very deep
trouble all over the place.
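
As a rough userspace illustration of that point, the sketch below uses a volatile-cast stand-in for READ_ONCE. The macro body here is an approximation written for this example (it relies on the GNU C __typeof__ extension) and is not copied from the kernel headers; struct node and parent_or_self() are hypothetical names. What the macro buys is a single load into a local that the compiler may not silently repeat. Atomicity of the load itself comes from the pointer being naturally aligned and word-sized.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace approximation for illustration only; assumes GNU C __typeof__. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for an object with a parent pointer. */
struct node {
	struct node *parent;
};

/*
 * Take exactly one snapshot of n->parent and use only that snapshot.
 * Without the volatile access the compiler would be allowed to load
 * n->parent once for the NULL test and again for the return value.
 */
static struct node *parent_or_self(struct node *n)
{
	struct node *p = READ_ONCE(n->parent);

	return p ? p : n;
}
```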

What's more, spin_lock acts as a compiler barrier and, on SMP, is an ACQUIRE
operation. So that second fetch of ->d_parent will happen after we grab
parent->d_lock, from everyone's POV. Critical areas for the same spinlock
are ordered wrt each other. So we have observed
	FETCH dentry->d_parent => parent
	LOCK parent->d_lock
	FETCH dentry->d_parent => parent

All stores to dentry->d_parent are done with ->d_lock held on dentry,
old value of dentry->d_parent *and* new value of dentry->d_parent. So
the second fetch is ordered wrt all stores making dentry->d_parent change
to parent and all stores making it change *from* parent. We might miss
some stores changing it from one value other than parent to another such,
but the predicate itself is fine and will stay fine until we drop
parent->d_lock.

Paul, could you comment on that one? The function in question is
this:

static struct dentry *__lock_parent(struct dentry *dentry)
{
	struct dentry *parent;
	rcu_read_lock();
	spin_unlock(&dentry->d_lock);
again:
	parent = READ_ONCE(dentry->d_parent);
	spin_lock(&parent->d_lock);
	/*
	 * We can't blindly lock dentry until we are sure
	 * that we won't violate the locking order.
	 * Any changes of dentry->d_parent must have
	 * been done with parent->d_lock held, so
	 * spin_lock() above is enough of a barrier
	 * for checking if it's still our child.
	 */
	if (unlikely(parent != dentry->d_parent)) {
		spin_unlock(&parent->d_lock);
		goto again;
	}
	rcu_read_unlock();
	if (parent != dentry)
		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
	else
		parent = NULL;
	return parent;
}

(in fs/dcache.c) and all stores to ->d_parent are guaranteed to be done
under ->d_lock on dentry itself and ->d_lock on both old and new values
of ->d_parent.
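
For readers who want to experiment with the protocol outside the kernel, the following is a minimal userspace model, under several loudly stated assumptions: pthread mutexes stand in for ->d_lock, a relaxed C11 atomic pointer stands in for ->d_parent (userspace C needs the atomic type to avoid a formal data race), nodes are never freed so nothing plays the role of RCU, and a single global move_lock serializes movers. All names (struct node, lock_parent_model, reparent_model, run_demo) are hypothetical and this is not the dcache implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for a dentry; names are hypothetical. */
struct node {
	pthread_mutex_t lock;		/* plays the role of ->d_lock */
	struct node *_Atomic parent;	/* plays the role of ->d_parent */
};

/* Assumption: one global mutex serializes movers, so a mover sees a stable n->parent. */
static pthread_mutex_t move_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Called with n->lock held. Returns with parent->lock and n->lock held,
 * or NULL with only n->lock held when n is its own parent.
 */
static struct node *lock_parent_model(struct node *n)
{
	struct node *parent;

	pthread_mutex_unlock(&n->lock);
again:
	/* Unlocked fetch: may already be stale by the time we lock it. */
	parent = atomic_load_explicit(&n->parent, memory_order_relaxed);
	pthread_mutex_lock(&parent->lock);
	/* Any store moving n away from 'parent' needed parent->lock, which we now hold. */
	if (parent != atomic_load_explicit(&n->parent, memory_order_relaxed)) {
		pthread_mutex_unlock(&parent->lock);
		goto again;
	}
	if (parent != n)
		pthread_mutex_lock(&n->lock);
	else
		parent = NULL;
	return parent;
}

/*
 * Mover: stores to n->parent while holding the locks of the old parent,
 * the new parent and n itself. Assumes n is not its own parent and
 * new_parent != n. The two parents are locked in address order.
 */
static void reparent_model(struct node *n, struct node *new_parent)
{
	pthread_mutex_lock(&move_lock);
	struct node *old = atomic_load_explicit(&n->parent, memory_order_relaxed);
	int old_first = (uintptr_t)old < (uintptr_t)new_parent;
	struct node *first = old_first ? old : new_parent;
	struct node *second = old_first ? new_parent : old;

	pthread_mutex_lock(&first->lock);
	if (second != first)
		pthread_mutex_lock(&second->lock);
	pthread_mutex_lock(&n->lock);
	atomic_store_explicit(&n->parent, new_parent, memory_order_relaxed);
	pthread_mutex_unlock(&n->lock);
	if (second != first)
		pthread_mutex_unlock(&second->lock);
	pthread_mutex_unlock(&first->lock);
	pthread_mutex_unlock(&move_lock);
}

/* Two self-parented roots and one child that bounces between them. */
static struct node a = { PTHREAD_MUTEX_INITIALIZER, &a };
static struct node b = { PTHREAD_MUTEX_INITIALIZER, &b };
static struct node child = { PTHREAD_MUTEX_INITIALIZER, &a };

static void *mover(void *unused)
{
	(void)unused;
	for (int i = 0; i < 20000; i++)
		reparent_model(&child, (i & 1) ? &a : &b);
	return NULL;
}

/* Returns the number of times the locked parent disagreed with child.parent. */
static int run_demo(void)
{
	pthread_t t;
	int bad = 0;

	if (pthread_create(&t, NULL, mover, NULL) != 0)
		return -1;
	for (int i = 0; i < 20000; i++) {
		pthread_mutex_lock(&child.lock);
		struct node *p = lock_parent_model(&child);
		if (p == NULL || p != atomic_load_explicit(&child.parent, memory_order_relaxed))
			bad++;
		pthread_mutex_unlock(&child.lock);
		if (p)
			pthread_mutex_unlock(&p->lock);
	}
	pthread_join(t, NULL);
	return bad;
}
```

Calling run_demo() races one mover against one locker. Under the stated locking rules the recheck after taking the parent's lock should never return a parent that differs from child.parent while both locks are held, so the function is expected to return 0.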