From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from zeniv.linux.org.uk ([195.92.253.2]:47400 "EHLO
        ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751225AbcFTOwc (ORCPT );
        Mon, 20 Jun 2016 10:52:32 -0400
Date: Mon, 20 Jun 2016 15:51:25 +0100
From: Al Viro
To: "J. R. Okajima"
Cc: linux-fsdevel@vger.kernel.org, Linus Torvalds
Subject: Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()
Message-ID: <20160620145125.GL14480@ZenIV.linux.org.uk>
References: <13136.1466196630@jrobl>
        <20160617221614.GE14480@ZenIV.linux.org.uk>
        <2123.1466313884@jrobl>
        <20160619165557.GH14480@ZenIV.linux.org.uk>
        <28627.1466397254@jrobl>
        <20160620053530.GI14480@ZenIV.linux.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160620053530.GI14480@ZenIV.linux.org.uk>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Mon, Jun 20, 2016 at 06:35:30AM +0100, Al Viro wrote:
> On Mon, Jun 20, 2016 at 01:34:14PM +0900, J. R. Okajima wrote:
> >
> > Al Viro:
> > > How would processB get past d_wait_lookup()? It would have to have
> >
> > By the first d_unhashed() test in the loop, processB doesn't reach
> > d_wait_lookup().
>
> Huh? What first d_unhashed()...
>
> That check is definitely bogus and I'm completely at loss as to WTF is it
> doing there. Thanks for catching that; this kind of idiotic braino can
> escape notice when rereading the code again and again, unfortunately ;-/
>
> Fixed, will push to Linus tonight or tomorrow.

FWIW, I understand how it got there; it was garbage left over from
cut'n'paste from the lockless primary hash lookups (the cut'n'paste was
for the sake of the "compare the name" logic). It was absolutely wrong -
a dentry is never added to the primary hash until it has been removed
from the in-lookup one. And we are walking the in-lookup hash chain with
its bitlock held, so there's no chance of that. In effect that junk
prevented d_alloc_parallel() from *ever* spotting in-lookup matches.

What's more, removing it has instantly uncovered another bug in the
match-handling code - dget() done under the chain bitlock, which nests
inside ->d_lock. Trivially fixed, of course (we just hold rcu_read_lock()
through the in-lookup hash search and instead of dget() while holding the
chain bitlock do lockref_get_not_dead() after dropping the bitlock),
but... *ouch*

It's going through the local tests right now; seems to be OK so far; I'll
send a pull request once it's through those.

But this demonstrates why RTFS (and by somebody other than the author of
TFS being R) is really, _really_ important. I have read through that loop
many times and kept missing that turdlet ;-/

Al, wearing a brown paperbag ;-/
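
For readers following along, the locking change described above (search
the chain under the bitlock, drop it, and only then try to take a
reference) can be sketched in userspace C. This is an illustration under
stated assumptions, not the actual kernel change: struct node, struct
chain, chain_find_get() and the helpers are hypothetical stand-ins, the
atomic_flag spinlock plays the role of the chain bitlock, and
node_get_not_dead() only mimics the "fail if already dead" behaviour of
lockref_get_not_dead(). Nothing here models RCU; in the kernel it is
rcu_read_lock() that keeps the matched dentry's memory valid between
dropping the bitlock and attempting the reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dentry; count < 0 means "dead". */
struct node {
	const char *name;
	atomic_int count;
	struct node *next;
};

/* Hypothetical stand-in for one in-lookup hash chain. */
struct chain {
	atomic_flag lock;	/* plays the role of the chain bitlock */
	struct node *head;
};

static void chain_lock(struct chain *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;	/* spin */
}

static void chain_unlock(struct chain *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Take a reference unless the node is already dead. */
static bool node_get_not_dead(struct node *n)
{
	int old = atomic_load(&n->count);

	while (old >= 0) {
		/* on failure 'old' is reloaded and the dead check repeats */
		if (atomic_compare_exchange_weak(&n->count, &old, old + 1))
			return true;
	}
	return false;
}

static void node_put(struct node *n)
{
	int old = atomic_fetch_sub(&n->count, 1);

	assert(old > 0);
}

static void node_kill(struct node *n)
{
	atomic_store(&n->count, -1);
}

/*
 * Find 'name' on the chain.  The reference is taken only after the
 * chain lock has been dropped, so nothing that could need an outer
 * lock is done while the inner one is held.
 */
static struct node *chain_find_get(struct chain *c, const char *name)
{
	struct node *n;

	/* kernel analogue: rcu_read_lock() would be taken here */
	chain_lock(c);
	for (n = c->head; n; n = n->next)
		if (strcmp(n->name, name) == 0)
			break;
	chain_unlock(c);

	if (n && !node_get_not_dead(n))
		n = NULL;	/* died under us; caller should retry */
	/* kernel analogue: rcu_read_unlock() */
	return n;
}
```

A caller that gets NULL back for a name it had reason to expect would
simply retry the search, since the entry may have died between the unlock
and the reference attempt.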