From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.5 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 895A0C43381 for ; Fri, 15 Mar 2019 18:55:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6518621871 for ; Fri, 15 Mar 2019 18:55:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726213AbfCOSzA (ORCPT ); Fri, 15 Mar 2019 14:55:00 -0400 Received: from zeniv.linux.org.uk ([195.92.253.2]:36596 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725922AbfCOSzA (ORCPT ); Fri, 15 Mar 2019 14:55:00 -0400 Received: from viro by ZenIV.linux.org.uk with local (Exim 4.92 #3 (Red Hat Linux)) id 1h4ryp-0001sz-OT; Fri, 15 Mar 2019 18:54:55 +0000 Date: Fri, 15 Mar 2019 18:54:55 +0000 From: Al Viro To: Eric Biggers Cc: "Tobin C. Harding" , linux-fsdevel@vger.kernel.org, "Paul E. McKenney" Subject: Re: dcache locking question Message-ID: <20190315185455.GA2217@ZenIV.linux.org.uk> References: <20190314225632.GB15813@eros.localdomain> <20190314231939.GA17269@eros.localdomain> <20190315015021.GU2217@ZenIV.linux.org.uk> <20190315173819.GB77949@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190315173819.GB77949@gmail.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org On Fri, Mar 15, 2019 at 10:38:23AM -0700, Eric Biggers wrote: > On Fri, Mar 15, 2019 at 01:50:21AM +0000, Al Viro wrote: > > > > If it fails, we call __lock_parent(). Which > > * grabs RCU lock > > * drops ->d_lock (now we are not holding ->d_lock > > on anything). > > * fetches ->d_parent. Note the READ_ONCE() there - > > it's *NOT* stable (no ->d_lock held). We can't expect > > that ->d_parent won't change or that the reference it used > > to contribute to parent's refcount is there anymore; as > > the matter of fact, the only thing that prevents outright > > _freeing_ of the object 'parent' points to is rcu_read_lock() > > and RCU delay between dropping the last reference and > > actual freeing of the sucker. rcu_read_lock() is there, > > though, which makes it safe to grab ->d_lock on 'parent'. > > > > That 'parent' might very well have nothing to do with our > > dentry by now. We can check if it's equal to its > > ->d_parent, though. dentry->d_parent is *NOT* stable > > at that point. It might be changing right now. > > > > However, the first store to dentry->d_parent making it > > not equal to parent would have been done under parent->d_lock. > > And since we are holding parent->d_lock, we won't miss that > > store. We might miss subsequent ones, but if we observe > > dentry->d_parent == parent, we know that it's stable. And > > if we see dentry->d_parent != parent, we know that dentry > > has moved around and we need to retry anyway. > > Why isn't it necessary to use READ_ONCE(dentry->d_parent) here? > > if (unlikely(parent != dentry->d_parent)) { > > Suppose 'parent' is 0xAAAABBBB, and 'dentry->d_parent' is 0xAAAAAAAA and is > concurrently changed to 0xBBBBBBBB. > > d_parent could be read in two parts, 0xAAAA then 0xBBBB, resulting in it > appearing that d_parent == 0xAAAABBBB == parent. > > Yes it won't really be compiled as that in practice, but I thought the point of > READ_ONCE() is to *guarantee* it's really done right... READ_ONCE does not add any extra warranties of atomicity. Fetches and stores of pointers are atomic, period; if that ever breaks, we are in a very deep trouble all over the place. What's more, spin_lock acts as a compiler barrier and, on SMP, is an ACQUIRE operation. So that second fetch of ->d_parent will happen after we grab parent->d_lock, from everyone's POV. Critical areas for the same spinlock are ordered wrt each other. So we have observed FETCH dentry->d_parent => parent LOCK parent->d_lock FETCH dentry->d_parent => parent All stores to dentry->d_parent are done with ->d_lock held on dentry, old value of dentry->d_parent *and* new value of dentry->d_parent. So the second fetch is ordered wrt all stores making dentry->d_parent change to parent and all stores making it change *from* parent. We might miss some stores changing it from one value other than parent to another such, but the predicate itself is fine and will stay fine until we drop parent->d_lock. Paul, could you comment on that one? The function in question is this: static struct dentry *__lock_parent(struct dentry *dentry) { struct dentry *parent; rcu_read_lock(); spin_unlock(&dentry->d_lock); again: parent = READ_ONCE(dentry->d_parent); spin_lock(&parent->d_lock); /* * We can't blindly lock dentry until we are sure * that we won't violate the locking order. * Any changes of dentry->d_parent must have * been done with parent->d_lock held, so * spin_lock() above is enough of a barrier * for checking if it's still our child. */ if (unlikely(parent != dentry->d_parent)) { spin_unlock(&parent->d_lock); goto again; } rcu_read_unlock(); if (parent != dentry) spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED); else parent = NULL; return parent; } (in fs/dcache.c) and all stores to ->d_parent are guaranteed to be done under ->d_lock on dentry itself and ->d_lock on both old and new values of ->d_parent.