From mboxrd@z Thu Jan 1 00:00:00 1970 From: Al Viro Subject: Re: ->d_lock FUBAR (was Re: Linux 3.0-rc6) Date: Wed, 13 Jul 2011 01:56:34 +0100 Message-ID: <20110713005634.GK11013@ZenIV.linux.org.uk> References: <20110711060315.GI11013@ZenIV.linux.org.uk> <20110712234806.GJ11013@ZenIV.linux.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: linux-fsdevel@vger.kernel.org, npiggin@kernel.dk To: Linus Torvalds Return-path: Received: from zeniv.linux.org.uk ([195.92.253.2]:42965 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753709Ab1GMA4f (ORCPT ); Tue, 12 Jul 2011 20:56:35 -0400 Content-Disposition: inline In-Reply-To: Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Tue, Jul 12, 2011 at 05:04:55PM -0700, Linus Torvalds wrote: > > All flakiness of the locking "order" aside, here we simply lock two dentries > > that might be nowhere near each other by now. ?Hell, by that point the parent > > might've been moved under (what used to be) child. ?Or it might have address > > greater than that of child and be not an ancestor anymore. ?Note that no > > i_mutex, etc. is held at that point, so there's no external serialization > > to save our arses... > > So why wouldn't it be sufficient to just take the s_vfs_rename_mutex > in the caller? We're only talking d_materialise_unique(), no? Alas, no. d_materialize_unique() aside (we have a couple of bugs in there), ->d_lock handling is fscked up. Basically, it's unlazy_walk() taking ->d_lock instances without anything to guarantee that it's doing that in order consistent with d_move() (and any other users, for that matter). *And* the "locking order" in question is not transitive, but that's a separate story. And no, we obviously are not going to serialize the switch from RCU to normal pathwalk on a per-fs mutex... I'm crawling through the d_lock users right now (>400 places ;-/), trying to put together reasonable locking rules. FWIW, we used to have this for d_move() et.al.: * all places changing ->d_parent hold i_mutex on parent(s) * if parents differ, we have ->s_vfs_rename_mutex as well * if old_dentry was not attached to the tree, it was positive and a subdirectory of new_dentry->d_parent. Said new_dentry->d_parent was been locked by caller. Unfortunately, current tree has these rules violated in several places. And these rules have nothing to say about ->d_lock ;-/ Nick, could you please describe the locking rules you had in mind for ->d_lock? unlazy_walk() (aka nameidata_dentry_drop_rcu()) can probably be dealt with by checking d_seq twice, once before locking the child. Then we could be sure that it's still a child of parent and will stay so as long as parent's ->d_lock is held, and thus the ordering would stay stable...