Date: Sat, 11 Apr 2026 01:51:26 +0100
From: Al Viro
To: Calvin Owens
Cc: Jeff Layton, Linus Torvalds, Boqun Feng, "Paul E. McKenney",
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett,
	Uladzislau Rezki, linux-fsdevel@vger.kernel.org, Christian Brauner,
	Jan Kara, Nikolay Borisov, Max Kellermann, Eric Sandeen,
	Paulo Alcantara
Subject: Re: [RFC][PATCH] make sure that lock_for_kill() callers drop the locks in safe order
Message-ID: <20260411005126.GA3836593@ZenIV>
References: <4305138de599923591df7403aefc4d663f50324a.camel@kernel.org>
	<20260410191907.GV3836593@ZenIV>
	<30ac5108ada614560326636d4da353d6304c3f91.camel@kernel.org>
	<20260410212403.GY3836593@ZenIV>
	<20260410230553.GZ3836593@ZenIV>
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Fri, Apr 10, 2026 at 04:30:32PM -0700, Calvin Owens wrote:
> Yes exactly, I was trying a lot of different pathological behaviors and
> that was the first pattern I found that triggered the d_walk() spin
> consistently.
>
> Initially I made the reproducer ignore itself in /proc. But it turns out
> to be much more reliable if it *doesn't*.
> I realized it was opening its own children's files in /proc/*/fd/*, and
> the children were also opening other /proc files, which is what made me
> suspect the magic symlinks.
>
> > IOW, you are opening a random mix of files, both procfs and ones some
> > processes had opened, then exiting. That triggers invalidation of your
> > /proc/*, with a _lot_ of shite that needs to be taken out. It might be
> > triggering a livelock or a UAF somewhere in dcache or it might be
> > something entirely different - no idea at that point. I'll look
> > further into that, but I wouldn't be surprised if it turns out to be
> > entirely unrelated. Would be easier to deal with if the mix had been
> > more predictable, but we have what we have...
>
> I'm sure I can narrow down the reproducer more, I'll try a bit.

No need... Just have parent and child share the descriptor table, then
let the parent wait for the child, and have the child do

	char buf[128], buf2[128];
	int fd = 0, n;

	do {
		n = fd;
		sprintf(buf, "/proc/self/fdinfo/%d", fd);
		fd = open(buf, 0);
	} while (fd >= 0);

	for (int i = 0; i <= n; i++) {
		sprintf(buf, "/proc/self/fd/%d", i);
		readlink(buf, buf2, sizeof(buf2));
	}

and that's it. That's a nice demonstration of how nasty the conditions
for d_invalidate() can get: *plenty* of busy dentries in the tree (to
the tune of several millions), with some evictable ones scattered here
and there. When you have that much busy stuff to wade through before
finding an eviction candidate, select_collect() will return D_WALK_QUIT
as soon as it has collected a single evictable sucker. Of course, then
you need to restart the whole thing - with the same pile of busy ones to
get through. Repeat for every single evictable dentry left in there. No
memory corruption involved, just a fuckton of rescans ;-/ It *is*
recoverable - you just need to flush the caches hard enough and that'll
unwedge the sucker.

IMO that's a very convincing argument for not using
shrink_dcache_parent() in d_invalidate().
That goes back to 2.1.65 - very early in dcache history. Back then (and
until about 2014, IIRC) the logic had been "try to evict the subtree,
unhash if it's all gone, fail otherwise"; these days it's worse, since
d_invalidate() is not allowed to fail. The "rescan if we have found
something and ran out of timeslice" behaviour has been there since
mingo's 2005 commit 116194f29f13 ("[PATCH] sched: vfs: fix scheduling
latencies in prune_dcache() and select_parent()") - the latency problems
prior to that got traded for this "rescan the same pile of busy stuff
for each evictable" pathology.

FWIW, it doesn't have to be on procfs - you can arrange for something
similar with fuse. Joy...