From: Al Viro <viro@zeniv.linux.org.uk>
To: Calvin Owens <calvin@wbinvd.org>
Cc: Jeff Layton <jlayton@kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Boqun Feng <boqun@kernel.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
Frederic Weisbecker <frederic@kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Joel Fernandes <joelagnelf@nvidia.com>,
Josh Triplett <josh@joshtriplett.org>,
Uladzislau Rezki <urezki@gmail.com>,
linux-fsdevel@vger.kernel.org,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Nikolay Borisov <nik.borisov@suse.com>,
Max Kellermann <max.kellermann@ionos.com>,
Eric Sandeen <sandeen@redhat.com>,
Paulo Alcantara <pc@manguebit.org>
Subject: Re: [RFC][PATCH] make sure that lock_for_kill() callers drop the locks in safe order
Date: Sat, 11 Apr 2026 01:51:26 +0100 [thread overview]
Message-ID: <20260411005126.GA3836593@ZenIV> (raw)
In-Reply-To: <admIGCR-xGfKgIQ-@mozart.vkv.me>
On Fri, Apr 10, 2026 at 04:30:32PM -0700, Calvin Owens wrote:
> Yes exactly, I was trying a lot of different pathological behaviors and
> that was the first pattern I found that triggered the d_walk() spin
> consistently.
>
> Initially I made the reproducer ignore itself in /proc. But it turns out
> to be much more reliable if it *doesn't*. I realized it was opening its
> own children's files in /proc/*/fd/*, and the children were also opening
> other /proc files, which is what made me suspect the magic symlinks.
>
> > IOW, you are opening a random mix of files, both procfs and ones some processes
> > had opened, then exiting. That triggers invalidation of your /proc/*, with
> > a _lot_ of shite that needs to be taken out. It might be triggering a livelock
> > or a UAF somewhere in dcache or it might be something entirely different -
> > no idea at that point. I'll look further into that, but I wouldn't be surprised
> > if it turns out to be entirely unrelated. Would be easier to deal with if the
> > mix had been more predictable, but we have what we have...
>
> I'm sure I can narrow down the reproducer more, I'll try a bit.
No need... Just have the parent and child share a descriptor table, have the
parent wait for the child, and have the child do
	char buf[128], buf2[128];
	int fd = 0, n;

	do {
		n = fd;
		sprintf(buf, "/proc/self/fdinfo/%d", fd);
		fd = open(buf, 0);
	} while (fd >= 0);

	for (int i = 0; i <= n; i++) {
		sprintf(buf, "/proc/self/fd/%d", i);
		readlink(buf, buf2, sizeof(buf2));
	}
and that's it.
That's a nice demonstration of how nasty conditions can be arranged for
d_invalidate(); *plenty* of busy dentries in the tree (to the tune of
several millions), with some evictables scattered here and there.
If there's enough busy stuff to wade through before you find an eviction
candidate, select_collect() will return D_WALK_QUIT as soon as it finds a
single evictable sucker. Of course, then you need to restart
the whole thing - with the same pile of busy ones to get through. Repeat
for every single evictable left in there.
No memory corruption involved, just a fuckton of rescans ;-/
It *is* recoverable - you just need to flush caches hard enough and that'll
unwedge the sucker.
IMO that's a very convincing argument for not using shrink_dcache_parent()
in d_invalidate().
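To put a rough number on the pathology: here's a toy cost model (mine, not
kernel code - the select_collect()/d_walk() details are elided) of "rescan
the whole pile for every evictable dentry":

```c
/* Toy model: each pass walks past all the busy dentries to find a
 * single evictable one, takes it out, then restarts from scratch,
 * so the total work is roughly busy * evictable. */
unsigned long rescan_cost(unsigned long busy, unsigned long evictable)
{
	unsigned long scanned = 0;

	for (unsigned long pass = 0; pass < evictable; pass++)
		scanned += busy + 1;	/* busy ones skipped, one evicted */
	return scanned;
}
```

With a few million busy dentries and a few thousand evictable ones scattered
among them, that's billions of dentries walked before d_invalidate() returns.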
That goes back to 2.1.65 - very early in dcache history. Back then (and
until about 2014, IIRC) it had been "try to evict the subtree, unhash
if they are all gone, fail otherwise"; these days it's worse, since
d_invalidate() is not allowed to fail.
"Rescan if we have found something and ran out of timeslice" has been there
since Ingo Molnar's 2005 commit 116194f29f13 ("[PATCH] sched: vfs: fix
scheduling latencies in prune_dcache() and select_parent()") - the latency
problems prior to that got traded for this "rescan the same pile of busy
stuff for each evictable" pathological behaviour.
FWIW, it doesn't have to be on procfs - you can arrange for something similar
with fuse.
Joy...
Thread overview: 63+ messages
2026-01-22 20:20 [PATCH][RFC] get rid of busy-wait in shrink_dcache_tree() Al Viro
2026-01-23 0:19 ` Linus Torvalds
2026-01-23 0:36 ` Al Viro
2026-01-24 4:36 ` Al Viro
2026-01-24 4:46 ` Linus Torvalds
2026-01-24 5:36 ` Al Viro
2026-01-24 17:45 ` Linus Torvalds
2026-01-24 18:43 ` Al Viro
2026-01-24 19:32 ` Linus Torvalds
2026-01-24 20:28 ` Al Viro
2026-04-02 18:08 ` [RFC PATCH v2 0/4] getting rid of busy-wait in shrink_dcache_parent() Al Viro
2026-04-02 18:08 ` [RFC PATCH v2 1/4] for_each_alias(): helper macro for iterating through dentries of given inode Al Viro
2026-04-02 18:08 ` [RFC PATCH v2 2/4] struct dentry: make ->d_u anonymous Al Viro
2026-04-02 18:08 ` [RFC PATCH v2 3/4] dcache.c: more idiomatic "positives are not allowed" sanity checks Al Viro
2026-04-02 18:08 ` [RFC PATCH v2 4/4] get rid of busy-waiting in shrink_dcache_tree() Al Viro
2026-04-02 19:52 ` Linus Torvalds
2026-04-02 22:44 ` Al Viro
2026-04-02 22:49 ` Linus Torvalds
2026-04-02 23:16 ` Al Viro
2026-04-03 0:29 ` Linus Torvalds
2026-04-03 2:15 ` Al Viro
2026-04-04 0:02 ` Al Viro
2026-04-04 0:04 ` Linus Torvalds
2026-04-04 18:54 ` Al Viro
2026-04-04 19:04 ` Linus Torvalds
2026-04-05 0:04 ` Al Viro
2026-04-02 20:28 ` [RFC PATCH v2 0/4] getting rid of busy-wait in shrink_dcache_parent() Paulo Alcantara
2026-04-03 4:46 ` Al Viro
2026-04-04 8:07 ` [RFC PATCH v3 " Al Viro
2026-04-04 8:07 ` [RFC PATCH v3 1/4] for_each_alias(): helper macro for iterating through dentries of given inode Al Viro
2026-04-04 8:07 ` [RFC PATCH v3 2/4] struct dentry: make ->d_u anonymous Al Viro
2026-04-04 8:07 ` [RFC PATCH v3 3/4] dcache.c: more idiomatic "positives are not allowed" sanity checks Al Viro
2026-04-04 8:07 ` [RFC PATCH v3 4/4] get rid of busy-waiting in shrink_dcache_tree() Al Viro
2026-04-09 16:51 ` [RFC PATCH v3 0/4] getting rid of busy-wait in shrink_dcache_parent() Jeff Layton
2026-04-09 19:02 ` Al Viro
2026-04-09 20:10 ` Jeff Layton
2026-04-09 21:57 ` Al Viro
2026-04-09 22:38 ` Jeff Layton
2026-04-10 8:48 ` [RFC][PATCH] make sure that lock_for_kill() callers drop the locks in safe order Al Viro
2026-04-10 11:18 ` Jeff Layton
2026-04-10 11:56 ` Jeff Layton
2026-04-10 15:25 ` Linus Torvalds
2026-04-10 15:57 ` Al Viro
2026-04-10 16:27 ` Boqun Feng
2026-04-10 17:31 ` Linus Torvalds
2026-04-10 18:11 ` Paul E. McKenney
2026-04-10 18:21 ` Jeff Layton
2026-04-10 19:19 ` Al Viro
2026-04-10 19:32 ` Jeff Layton
2026-04-10 21:13 ` Calvin Owens
2026-04-10 21:24 ` Al Viro
2026-04-10 22:15 ` Calvin Owens
2026-04-10 23:05 ` Al Viro
2026-04-10 23:30 ` Calvin Owens
2026-04-11 0:51 ` Al Viro [this message]
2026-04-10 17:32 ` Paul E. McKenney
2026-04-10 18:26 ` Jeff Layton
2026-04-10 18:36 ` Paul E. McKenney
2026-04-10 18:52 ` Al Viro
2026-04-10 19:21 ` Paul E. McKenney
2026-04-10 19:30 ` Linus Torvalds
2026-04-10 20:24 ` Al Viro
2026-04-10 20:48 ` Al Viro