Date: Wed, 8 Apr 2026 23:43:19 +0100
From: Al Viro
To: Jeff Layton
Cc: Christian Brauner, Jan Kara, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, clm@meta.com, gustavold@meta.com
Subject: Re: [PATCH] dcache: warn when a dentry is freed with a non-empty ->d_lru
Message-ID: <20260408224319.GJ3836593@ZenIV>
References: <20260406-dcache-warn-v1-1-c665efbc005f@kernel.org>
 <20260408064251.GE3836593@ZenIV>
 <20260408192620.GI3836593@ZenIV>
 <10d8499d60824f319e32f52fd6a196b3645f6652.camel@kernel.org>
In-Reply-To: <10d8499d60824f319e32f52fd6a196b3645f6652.camel@kernel.org>

On Wed, Apr 08, 2026 at 05:05:41PM -0400, Jeff Layton wrote:
> > ... and it still offers zero explanation of the path from livelock to
> > UAF. It may or may not be real, but there's nothing in all that verbiage
> > even suggesting what it might be. And proposed analysis is flat-out
> > wrong.
> >
> > As for the livelock, see viro/vfs.git #work.dcache-busy-wait (in -next
> > as of today).
>
> Thanks for taking a look, Al. We'll keep looking at this thing and see
> if we can collect more info, and come up with a better theory of the
> crash. I'll dig deeper into the vmcore tomorrow.

OK...
FWIW, on the trees up to 7.0-rc7:

* a dentry passed to select_collect2() must still be attached to the tree
(no DCACHE_DENTRY_KILLED on it)
* the caller holds ->d_lock on that dentry.
* if its ->d_lockref.count is positive, it's busy (at least at the
moment) and there's nothing to be done; move on.
* if its ->d_lockref.count is negative, it's getting killed right now.
That's the busy-wait case - we ignore it on this pass, but remember that
we'll need to rescan.
* if its ->d_lockref.count is 0 and it's not on a shrink list: it's
evictable; move it to data.dispose, no problem.
* if its ->d_lockref.count is 0 and it *is* on a shrink list: try to
steal it.

We can't do the stealing from the d_walk() callback itself - it's a
non-blocking environment, to start with, so no evictions are possible
there.  What we can do is have it returned to shrink_dcache_tree() -
grab rcu_read_lock() to make sure it won't get freed (it hasn't reached
dentry_unlist(), let alone dentry_free(), so the grace period for
freeing hasn't started yet) and tell d_walk() to stop and return to its
caller immediately.  Note that we deliberately return without
rcu_read_unlock() - that's what protects the victim from getting freed
under us.  Another thread might or might not get around to starting the
eviction of the victim, but whether it does or not, it won't get around
to freeing it.

In shrink_dcache_tree() we grab ->d_lock on the victim again (it had
been dropped by d_walk() when the callback returned).  Then we call
lock_for_kill(), which starts by checking that ->d_lockref.count is 0.
If it isn't, there's nothing to be done to that sucker; it either went
busy (in which case we should ignore it and move on) or somebody (likely
the owner of the shrink list it had been on) got around to evicting it;
in the latter case we need to wait until the damn thing is killed.
*IF* the refcount is still 0, we carefully acquire ->i_lock on its inode
(if any).
Note that we are still holding rcu_read_lock(), so it can't get freed
under us even if we have to drop and regain ->d_lock.  We do need to
recheck that the refcount is still zero if we do that, obviously.

A failing lock_for_kill() is either due to the dentry having become busy
(and thus to be skipped) or due to it having been passed to
__dentry_kill().  In the latter case we busy-wait, same as we would if
we saw a negative ->d_lockref.count in select_collect2().

A successful lock_for_kill() is followed by evicting the sucker; we do
*not* remove it (or its ancestors, if it had been the only thing holding
them busy) from whatever shrink lists they'd been on.  Anything on a
shrink list gets reduced to the state of a passive chunk of memory no
longer connected to filesystem objects, marked with DCACHE_DENTRY_KILLED
and left for the owner of the shrink list to free once it gets around to
that.  There's no urgency anymore -

	rcu_read_lock();
	if (!lock_for_kill(dentry)) {
		bool can_free;
		rcu_read_unlock();
		d_shrink_del(dentry);
		can_free = dentry->d_flags & DCACHE_DENTRY_KILLED;
		spin_unlock(&dentry->d_lock);
		if (can_free)
			dentry_free(dentry);
		continue;
	}

on the shrink_dentry_list() side will have lock_for_kill() return false
(->d_lockref.count already negative), remove the sucker from the list,
see DCACHE_DENTRY_KILLED on it, unlock it and free it.  Note that
nothing in that sequence touches any fs objects - it's just the disposal
of an inert chunk of memory by that point.

Note that we can't be stealing from ourselves - if anything had been
added to data.dispose, d_walk() sees D_WALK_QUIT or D_WALK_NORETRY and
either buggers off immediately or sets retry to false.  It won't restart
walking the tree in either case, so there's no way for the same dentry
to be revisited by it.  Having one shrink_dcache_tree() steal from
another is OK - see the conditions under which shrink_dentry_list() and
__dentry_kill() call dentry_free() (the flag is called 'can_free' in
both).
Getting rid of the busy-wait takes a fairly small modification to the
above: select_collect2() treats a negative refcount the same way it
treats the 0-and-on-shrink-list case, and shrink_dcache_tree() checks
whether the victim has a negative refcount and no DCACHE_DENTRY_KILLED
yet.  In that case it adds a local object (a struct completion_list
node) to a list hanging off dentry->waiters, evicts whatever else it
might have collected, then waits for dentry_unlist() on the victim to
have called complete() on node.completion for everything on ->waiters.
The rest of the logic is unchanged - if the victim is not in the middle
of eviction, we try to steal it, etc., same as in the mainline.