From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=Dyet=RS=vger.kernel.org=linux-fsdevel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-2.5 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,
	MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_MUTT autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 895A0C43381
	for <linux-fsdevel@archiver.kernel.org>; Fri, 15 Mar 2019 18:55:01 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 6518621871
	for <linux-fsdevel@archiver.kernel.org>; Fri, 15 Mar 2019 18:55:01 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1726213AbfCOSzA (ORCPT
        <rfc822;linux-fsdevel@archiver.kernel.org>);
        Fri, 15 Mar 2019 14:55:00 -0400
Received: from zeniv.linux.org.uk ([195.92.253.2]:36596 "EHLO
        ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1725922AbfCOSzA (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Fri, 15 Mar 2019 14:55:00 -0400
Received: from viro by ZenIV.linux.org.uk with local (Exim 4.92 #3 (Red Hat Linux))
        id 1h4ryp-0001sz-OT; Fri, 15 Mar 2019 18:54:55 +0000
Date:   Fri, 15 Mar 2019 18:54:55 +0000
From:   Al Viro <viro@zeniv.linux.org.uk>
To:     Eric Biggers <ebiggers@kernel.org>
Cc:     "Tobin C. Harding" <me@tobin.cc>, linux-fsdevel@vger.kernel.org,
        "Paul E. McKenney" <paulmck@linux.ibm.com>
Subject: Re: dcache locking question
Message-ID: <20190315185455.GA2217@ZenIV.linux.org.uk>
References: <20190314225632.GB15813@eros.localdomain>
 <20190314231939.GA17269@eros.localdomain>
 <20190315015021.GU2217@ZenIV.linux.org.uk>
 <20190315173819.GB77949@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190315173819.GB77949@gmail.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Fri, Mar 15, 2019 at 10:38:23AM -0700, Eric Biggers wrote:
> On Fri, Mar 15, 2019 at 01:50:21AM +0000, Al Viro wrote:
> > 
> > If it fails, we call __lock_parent().  Which
> > 	* grabs RCU lock
> > 	* drops ->d_lock (now we are not holding ->d_lock
> > on anything).
> > 	* fetches ->d_parent.  Note the READ_ONCE() there -
> > it's *NOT* stable (no ->d_lock held).  We can't expect
> > that ->d_parent won't change or that the reference it used
> > to contribute to parent's refcount is there anymore; as
> > the matter of fact, the only thing that prevents outright
> > _freeing_ of the object 'parent' points to is rcu_read_lock()
> > and RCU delay between dropping the last reference and
> > actual freeing of the sucker.  rcu_read_lock() is there,
> > though, which makes it safe to grab ->d_lock on 'parent'.
> > 
> > That 'parent' might very well have nothing to do with our
> > dentry by now.  We can check if it's equal to its
> > ->d_parent, though.  dentry->d_parent is *NOT* stable
> > at that point.  It might be changing right now.
> > 
> > However, the first store to dentry->d_parent making it
> > not equal to parent would have been done under parent->d_lock.
> > And since we are holding parent->d_lock, we won't miss that
> > store.  We might miss subsequent ones, but if we observe
> > dentry->d_parent == parent, we know that it's stable.  And
> > if we see dentry->d_parent != parent, we know that dentry
> > has moved around and we need to retry anyway.
> 
> Why isn't it necessary to use READ_ONCE(dentry->d_parent) here?
> 
> 	if (unlikely(parent != dentry->d_parent)) {
> 
> Suppose 'parent' is 0xAAAABBBB, and 'dentry->d_parent' is 0xAAAAAAAA and is
> concurrently changed to 0xBBBBBBBB.
> 
> d_parent could be read in two parts, 0xAAAA then 0xBBBB, resulting in it
> appearing that d_parent == 0xAAAABBBB == parent.
> 
> Yes it won't really be compiled as that in practice, but I thought the point of
> READ_ONCE() is to *guarantee* it's really done right...

READ_ONCE does not add any extra warranties of atomicity.  Fetches and stores
of pointers are atomic, period; if that ever breaks, we are in a very deep
trouble all over the place.

What's more, spin_lock acts as a compiler barrier and, on SMP, is an ACQUIRE
operation.  So that second fetch of ->d_parent will happen after we grab
parent->d_lock, from everyone's POV.  Critical areas for the same spinlock
are ordered wrt each other.  So we have observed
	FETCH dentry->d_parent => parent
	LOCK parent->d_lock
	FETCH dentry->d_parent => parent
All stores to dentry->d_parent are done with ->d_lock held on dentry,
old value of dentry->d_parent *and* new value of dentry->d_parent.  So
the second fetch is ordered wrt all stores making dentry->d_parent change
to parent and all stores making it change *from* parent.  We might miss
some stores changing it from one value other than parent to another such,
but the predicate itself is fine and will stay fine until we drop
parent->d_lock.

Paul, could you comment on that one?  The function in question is
this:
static struct dentry *__lock_parent(struct dentry *dentry)
{
        struct dentry *parent;
        rcu_read_lock();
        spin_unlock(&dentry->d_lock);
again:
        parent = READ_ONCE(dentry->d_parent);
        spin_lock(&parent->d_lock);
        /*
         * We can't blindly lock dentry until we are sure
         * that we won't violate the locking order.
         * Any changes of dentry->d_parent must have
         * been done with parent->d_lock held, so
         * spin_lock() above is enough of a barrier
         * for checking if it's still our child.
         */
        if (unlikely(parent != dentry->d_parent)) {
                spin_unlock(&parent->d_lock);
                goto again;
        }
        rcu_read_unlock();
        if (parent != dentry)
                spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
        else
                parent = NULL;
        return parent;
}
(in fs/dcache.c) and all stores to ->d_parent are guaranteed to be done
under ->d_lock on dentry itself and ->d_lock on both old and new values
of ->d_parent.