* [PATCH][RFC] get rid of busy-wait in shrink_dcache_tree()
@ 2026-01-22 20:20 Al Viro
2026-01-23 0:19 ` Linus Torvalds
0 siblings, 1 reply; 10+ messages in thread
From: Al Viro @ 2026-01-22 20:20 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-fsdevel, Christian Brauner, Jan Kara, Nikolay Borisov,
Max Kellermann
There's a case in which shrink_dcache_tree() ends up busy-waiting: when some
dentry in the subtree in question is found to be in the process of being evicted
by another thread. We need to wait for that eviction to finish, so that the
parent is no longer pinned; otherwise we could end up in a situation where
nothing in the tree is busy, yet shrink_dcache_tree() fails to evict some
directory only because memory pressure initiated eviction of some of its
children before we got to evicting those ourselves. That would be bogus both
for shrink_dcache_parent() and for shrink_dcache_for_umount().
Unfortunately, we have nothing to wait on. That had led to the possibility
of busy-waiting - getting through an iteration of the shrink_dcache_tree() main
loop without having made any progress. That's Not Nice(tm), and it has been
discussed quite a few times since at least 2018. Recently it became obvious
that this goes beyond "not nice" - on a sufficiently contrived setup it's
possible to get a livelock there, with both threads involved tied to the same
CPU, the shrink_dcache_tree() one running at higher priority than the thread
that has given the CPU up on might_sleep() at the very beginning of iput()
during eviction of the dentry shrink_dcache_tree() is busy-waiting for.
Let's get rid of that busy-waiting. Constraints are
* don't grow struct dentry
* don't slow the normal case of __dentry_kill() down
and it turns out to be doable.
Dentries in question are
	* already marked dead (negative ->d_count)
	* already negative
	* already unhashed
	* already not in the in-lookup hash
	* yet to be removed from ->d_sib and get DCACHE_DENTRY_KILLED in flags.
Neither ->d_alias nor the fields overlapping it (->d_rcu and ->d_in_lookup_hash)
are going to be accessed for these dentries until after dentry_unlist(). What's
more, ->d_alias.next is guaranteed to be NULL.
So we can embed struct completion into struct select_data and (ab)use
->d_alias.next for a linked list of struct select_data instances.
If dentry_unlist() finds ->d_alias.next non-NULL, it carefully goes over that
list and calls complete() for each of those.
That way select_collect2() can treat negative ->d_count the same way it deals
with dentries on another thread's shrink list - grab rcu_read_lock(), stash the
dentry into data.victim and tell d_walk() to stop.
If shrink_dcache_parent() runs into that case, it should attach its select_data
to the victim dentry, evict whatever normal eviction candidates it has gathered
and wait for completion. Voila...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/dcache.c b/fs/dcache.c
index dc2fff4811d1..6db72a684d8d 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -605,6 +605,54 @@ void d_drop(struct dentry *dentry)
 }
 EXPORT_SYMBOL(d_drop);
 
+struct select_data {
+	struct dentry *start;
+	union {
+		long found;
+		struct {
+			struct dentry *victim;
+			struct select_data *next;
+			struct completion completion;
+		};
+	};
+	struct list_head dispose;
+};
+
+/*
+ * shrink_dcache_parent() needs to be notified when dentry in process of
+ * being evicted finally gets unlisted.  Such dentries are
+ *	already with negative ->d_count
+ *	already negative
+ *	already not in in-lookup hash
+ *	reachable only via ->d_sib.
+ *
+ * Neither ->d_alias, nor ->d_rcu, nor ->d_in_lookup_hash are going to be
+ * accessed for those, so we can (ab)use ->d_alias.next for list of
+ * select_data of waiters.  Initially it's going to be NULL and as long
+ * as dentry_unlist() returns it to that state we are fine.
+ */
+static inline void d_add_waiter(struct dentry *dentry, struct select_data *p)
+{
+	struct select_data *v = (void *)dentry->d_u.d_alias.next;
+	init_completion(&p->completion);
+	p->next = v;
+	dentry->d_u.d_alias.next = (void *)p;
+}
+
+static inline void d_complete_waiters(struct dentry *dentry)
+{
+	struct select_data *v = (void *)dentry->d_u.d_alias.next;
+	if (unlikely(v)) {
+		/* some shrink_dcache_tree() instances are waiting */
+		dentry->d_u.d_alias.next = NULL;
+		while (v) {
+			struct completion *r = &v->completion;
+			v = v->next;
+			complete(r);
+		}
+	}
+}
+
 static inline void dentry_unlist(struct dentry *dentry)
 {
 	struct dentry *next;
@@ -613,6 +661,7 @@ static inline void dentry_unlist(struct dentry *dentry)
 	 * attached to the dentry tree
 	 */
 	dentry->d_flags |= DCACHE_DENTRY_KILLED;
+	d_complete_waiters(dentry);
 	if (unlikely(hlist_unhashed(&dentry->d_sib)))
 		return;
 	__hlist_del(&dentry->d_sib);
@@ -1499,15 +1548,6 @@ int d_set_mounted(struct dentry *dentry)
  * constraints.
  */
 
-struct select_data {
-	struct dentry *start;
-	union {
-		long found;
-		struct dentry *victim;
-	};
-	struct list_head dispose;
-};
-
 static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
 {
 	struct select_data *data = _data;
@@ -1559,6 +1599,10 @@ static enum d_walk_ret select_collect2(void *_data, struct dentry *dentry)
 			return D_WALK_QUIT;
 		}
 		to_shrink_list(dentry, &data->dispose);
+	} else if (dentry->d_lockref.count < 0) {
+		rcu_read_lock();
+		data->victim = dentry;
+		return D_WALK_QUIT;
 	}
 	/*
 	 * We can return to the caller if we have found some (this
@@ -1598,12 +1642,26 @@ static void shrink_dcache_tree(struct dentry *parent, bool for_umount)
 		data.victim = NULL;
 		d_walk(parent, &data, select_collect2);
 		if (data.victim) {
-			spin_lock(&data.victim->d_lock);
-			if (!lock_for_kill(data.victim)) {
-				spin_unlock(&data.victim->d_lock);
+			struct dentry *v = data.victim;
+
+			spin_lock(&v->d_lock);
+			if (v->d_lockref.count < 0 &&
+			    !(v->d_flags & DCACHE_DENTRY_KILLED)) {
+				// It's busy dying; have it notify us once
+				// it becomes invisible to d_walk().
+				d_add_waiter(v, &data);
+				spin_unlock(&v->d_lock);
+				rcu_read_unlock();
+				if (!list_empty(&data.dispose))
+					shrink_dentry_list(&data.dispose);
+				wait_for_completion(&data.completion);
+				continue;
+			}
+			if (!lock_for_kill(v)) {
+				spin_unlock(&v->d_lock);
 				rcu_read_unlock();
 			} else {
-				shrink_kill(data.victim);
+				shrink_kill(v);
 			}
 		}
 		if (!list_empty(&data.dispose))
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH][RFC] get rid of busy-wait in shrink_dcache_tree()
2026-01-22 20:20 [PATCH][RFC] get rid of busy-wait in shrink_dcache_tree() Al Viro
@ 2026-01-23 0:19 ` Linus Torvalds
2026-01-23 0:36 ` Al Viro
From: Linus Torvalds @ 2026-01-23 0:19 UTC (permalink / raw)
To: Al Viro
Cc: linux-fsdevel, Christian Brauner, Jan Kara, Nikolay Borisov,
Max Kellermann
On Thu, 22 Jan 2026 at 12:18, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> +static inline void d_add_waiter(struct dentry *dentry, struct select_data *p)
> +{
> + struct select_data *v = (void *)dentry->d_u.d_alias.next;
> + init_completion(&p->completion);
> + p->next = v;
> + dentry->d_u.d_alias.next = (void *)p;
> +}
I tend to not love it when I see new users of completions - I've seen
too many mis-uses - but this does seem to be a good use-case for them.
That said, I absolutely abhor your cast. Christ - that 'd_u' is
*already* a union, exactly because that thing gets used for different
things - just add a new union member, instead of mis-using an existing
union member that then requires you to cast the data to a different
form.
Yes, you had an explanation for why you used d_alias.next, but please
make that explanation be in the union itself, not in the commit
message of something that mis-uses the union. Please?
That way there's no need for a cast, and you can name that new union
member something that also clarifies things on a source level
("eviction_completion" or whatever).
Or am I missing something?
Linus
* Re: [PATCH][RFC] get rid of busy-wait in shrink_dcache_tree()
2026-01-23 0:19 ` Linus Torvalds
@ 2026-01-23 0:36 ` Al Viro
2026-01-24 4:36 ` Al Viro
From: Al Viro @ 2026-01-23 0:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-fsdevel, Christian Brauner, Jan Kara, Nikolay Borisov,
Max Kellermann
On Thu, Jan 22, 2026 at 04:19:56PM -0800, Linus Torvalds wrote:
> On Thu, 22 Jan 2026 at 12:18, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > +static inline void d_add_waiter(struct dentry *dentry, struct select_data *p)
> > +{
> > + struct select_data *v = (void *)dentry->d_u.d_alias.next;
> > + init_completion(&p->completion);
> > + p->next = v;
> > + dentry->d_u.d_alias.next = (void *)p;
> > +}
>
> I tend to not love it when I see new users of completions - I've seen
> too many mis-uses - but this does seem to be a good use-case for them.
>
> That said, I absolutely abhor your cast. Christ - that 'd_u' is
> *already* a union, exactly because that thing gets used for different
> things - just add a new union member, instead of mis-using an existing
> union member that then requires you to cast the data to a different
> form.
>
> Yes, you had an explanation for why you used d_alias.next, but please
> make that explanation be in the union itself, not in the commit
> message of something that mis-uses the union. Please?
>
> That way there's no need for a cast, and you can name that new union
> member something that also clarifies things on a source level
> ("eviction_completion" or whatever).
>
> Or am I missing something?
In practice it doesn't really matter, but we don't want to initialize
that field to NULL - no good place for doing that. Sure, the entire
d_alias has been subject to hlist_del_init() or INIT_HLIST_NODE(), so
any pointer field unioned with it will end up being NULL without
any assignments to it, but... ugh. "We have a union of a two-pointer
struct, a pointer and some other stuff; we'd set both members of that
struct member to NULL and count upon the pointer member of the union
having been zeroed by that" leaves a bad taste.
* Re: [PATCH][RFC] get rid of busy-wait in shrink_dcache_tree()
2026-01-23 0:36 ` Al Viro
@ 2026-01-24 4:36 ` Al Viro
2026-01-24 4:46 ` Linus Torvalds
From: Al Viro @ 2026-01-24 4:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-fsdevel, Christian Brauner, Jan Kara, Nikolay Borisov,
Max Kellermann
On Fri, Jan 23, 2026 at 12:36:51AM +0000, Al Viro wrote:
> On Thu, Jan 22, 2026 at 04:19:56PM -0800, Linus Torvalds wrote:
> > On Thu, 22 Jan 2026 at 12:18, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > +static inline void d_add_waiter(struct dentry *dentry, struct select_data *p)
> > > +{
> > > + struct select_data *v = (void *)dentry->d_u.d_alias.next;
> > > + init_completion(&p->completion);
> > > + p->next = v;
> > > + dentry->d_u.d_alias.next = (void *)p;
> > > +}
> >
> > I tend to not love it when I see new users of completions - I've seen
> > too many mis-uses - but this does seem to be a good use-case for them.
> >
> > That said, I absolutely abhor your cast. Christ - that 'd_u' is
> > *already* a union, exactly because that thing gets used for different
> > things - just add a new union member, instead of mis-using an existing
> > union member that then requires you to cast the data to a different
> > form.
> >
> > Yes, you had an explanation for why you used d_alias.next, but please
> > make that explanation be in the union itself, not in the commit
> > message of something that mis-uses the union. Please?
> >
> > That way there's no need for a cast, and you can name that new union
> > member something that also clarifies things on a source level
> > ("eviction_completion" or whatever).
> >
> > Or am I missing something?
>
> In practice it doesn't really matter, but we don't want to initialize
> that field to NULL - no good place for doing that. Sure, the entire
> d_alias has been subject to hlist_del_init() or INIT_HLIST_NODE(), so
> any pointer field unioned with it will end up being NULL without
> any assignments to it, but... ugh. "We have a union of two-pointer
> struct, a pointer and some other stuff; we'd set both members of that
> struct member to NULL and count upon the pointer member of union
> having been zeroed by that" leaves a bad taste.
FWIW, there's another reason, but it's not one I'm fond of. There are a few
places where we use hlist_unhashed(&dentry->d_u.d_alias) as a debugging
check. I _think_ that none of those are reachable for dentries in that state,
but then all of them are of the "it should evaluate true unless we have a
kernel bug" kind.
I'm not saying it's a good reason. As a matter of fact, the tests are
misspelled checks for the dentry being negative. _If_ we kill those, we
could declare the state of ->d_u.d_alias undefined for negative dentries
and have transitions to that state explicitly clear the new field instead.
IOW, hlist_del_init() in dentry_unlink_inode() becomes

	__hlist_del(&dentry->d_u.d_alias);	// !hlist_unhashed() guaranteed
	dentry->d_u.shrink_waiters = NULL;

and INIT_HLIST_NODE(&dentry->d_u.d_alias) in __d_alloc() and
__d_lookup_unhash() simply

	dentry->d_u.shrink_waiters = NULL;

Then we are out of the nasal demon country.
Debugging tests are interesting, though. We have them in
dentry_free() - WARN_ON if on alias list; what we want is rather "have
DENTRY_KILLED in flags", and I would probably add "no PERSISTENT" as well as
"no LRU_LIST" there.
d_instantiate(), d_instantiate_new(), d_make_persistent(), d_mark_tmpfile() -
a mix of WARN_ON() and BUG_ON(); in all cases it should be "it must be negative
here", and in all cases it's done before ->d_lock is taken. Not wanting an oops
while holding ->d_lock (or ->i_lock, for that matter) on anything is understandable,
and stability is actually provided by the callers, but... it's still confusing
for readers, especially since the real proof of stability is nowhere near trivial.
I'd probably go for d_really_is_negative() in all of those; it might make sense to
unify some of that boilerplate, but that's for the next cycle...
* Re: [PATCH][RFC] get rid of busy-wait in shrink_dcache_tree()
2026-01-24 4:36 ` Al Viro
@ 2026-01-24 4:46 ` Linus Torvalds
2026-01-24 5:36 ` Al Viro
From: Linus Torvalds @ 2026-01-24 4:46 UTC (permalink / raw)
To: Al Viro
Cc: linux-fsdevel, Christian Brauner, Jan Kara, Nikolay Borisov,
Max Kellermann
On Fri, 23 Jan 2026 at 20:34, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> >
> > In practice it doesn't really matter, but we don't want to initialize
> > that field to NULL - no good place for doing that. Sure, the entire
> > d_alias has been subject to hlist_del_init() or INIT_HLIST_NODE(), so
> > any pointer field unioned with it will end up being NULL without
> > any assignments to it, but... ugh. "We have a union of two-pointer
> > struct, a pointer and some other stuff; we'd set both members of that
> > struct member to NULL and count upon the pointer member of union
> > having been zeroed by that" leaves a bad taste.
>
> FWIW, there's another reason, but it's not one I'm fond of. There are few
> places where we use hlist_unhashed(&dentry->d_u.d_alias) as a debugging
> check. I _think_ that none of those are reachable for dentries in that state,
> but then all of them are of "it should evaluate true unless we have a kernel
> bug" kind.
>
Note that I'm not complaining about re-using the
'dentry->d_u.d_alias.next' field.
I'm complaining about re-using it WITH A CAST, AND WITHOUT DOCUMENTING
IT IN THE TYPE DECLARATION.
So by all means use that field, and use that value, but use it *BY*
making it a proper union member, and by documenting the lifetime
change of that union member.
None of this "randomly cast a different pointer to 'void *' where it
will then be implicitly cast to 'struct select_data *'" garbage.
Just make a proper "struct select_data *" union member that aliases
that "d_alias.next" field, and the compiler will generate the EXACT
same code, except the source code will be cleaner, and you won't need
any hacky pointer casts.
And document how that field is NULL when the dentry is killed, and how
that NULL 'dentry->d_u.d_alias.next' field at that point becomes a
NULL 'dentry->d_u.d_select_data' field.
You don't need to describe 'struct select_data', you just need to
declare it. IOW, something like this:
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -89,6 +89,11 @@ union shortname_store {
 #define d_lock	d_lockref.lock
 #define d_iname	d_shortname.string
 
+// The d_alias list becomes a 'struct select_data' pointer when
+// the dentry is killed (so the NULL d_alias.next pointer becomes
+// a NULL 'struct select_data *')
+struct select_data;
+
 struct dentry {
 	/* RCU lookup touched fields */
 	unsigned int d_flags;		/* protected by d_lock */
@@ -128,6 +133,7 @@ struct dentry {
 	union {
 		struct hlist_node d_alias;	/* inode alias list */
 		struct hlist_bl_node d_in_lookup_hash;	/* only for in-lookup ones */
 		struct rcu_head d_rcu;
+		struct select_data *d_select_data;	/* only for killed dentries */
 	} d_u;
 };
but I obviously didn't test it.
Linus
* Re: [PATCH][RFC] get rid of busy-wait in shrink_dcache_tree()
2026-01-24 4:46 ` Linus Torvalds
@ 2026-01-24 5:36 ` Al Viro
2026-01-24 17:45 ` Linus Torvalds
From: Al Viro @ 2026-01-24 5:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-fsdevel, Christian Brauner, Jan Kara, Nikolay Borisov,
Max Kellermann
On Fri, Jan 23, 2026 at 08:46:52PM -0800, Linus Torvalds wrote:
> Just make a proper "struct select_data *" union member that aliases
> that "d_alias.next" field, and the compiler will generate the EXACT
> same code, except the source code will be cleaner, and you won't need
> any hacky pointer casts.
>
> And document how that field is NULL when the dentry is killed, and how
> that NULL 'dentry->d_u.d_alias.next' field at that point becomes a
> NULL 'dentry->d_u.d_select_data' field.
>
> You don't need to describe 'struct select_data', you just need to
> declare it. IOW, something like this:
As a matter of fact, that _was_ the previous iteration of that patch -
see http://ftp.linux.org.uk/people/viro/y8
The only trouble is that as soon as some joker slaps __randomize_layout
on struct hlist_node, the new member will start flipping from sharing with
->next to sharing with ->pprev, at random.
I'm not saying it wouldn't work, but I would rather have the proofs of
correctness less subtle. And it's not even hard to do - the only
rule added would be that ->d_u.d_alias should never be accessed for
negative dentries and never without ->i_lock on the inode of dentry
in question.
The only places where it does not hold at the moment are those WARN_ON()
and we'll be better off having those spelled in less obscure way; we
want to verify that dentry is negative, so let's express that in the
idiomatic way.
And that's it - with that done, we can add a field, obviously with
forward declaration of struct select_data, etc. and have it explicitly
initialized whenever dentry goes negative. Instead of zeroing
->d_u.d_alias.{next,pprev} as we do now.
Currently !hlist_unhashed(&dentry->d_u.d_alias) is equivalent to
dentry->d_inode != NULL, with identical stability requirements.
And nobody ever traverses that hlist without ->i_lock - no RCU accesses
there. We do have lockless checks that the list is not empty (right before
grabbing ->i_lock and rechecking), but those come from the inode side -
"are there any aliases for this inode" rather than "is this dentry
an alias for anything (== positive)".
I'm putting together short documentation on d_inode/d_alias/i_dentry/type
bits in d_flags; should be done by tomorrow morning...
PS: a fun catch while doing that code audit - AFAICS, we don't really need
to play with the fake root dentry for NFS anymore; the reason it used to be
needed has been gone since 2013, as an unnoticed side effect of switching
shrink_dcache_for_umount() to the use of d_walk()... Obviously needs
a review from NFS folks, but if they see no problems with that, it would
be nice to get rid of that kludge, as in
diff --git a/fs/nfs/getroot.c b/fs/nfs/getroot.c
index f13d25d95b85..2ac8404e1a15 100644
--- a/fs/nfs/getroot.c
+++ b/fs/nfs/getroot.c
@@ -32,35 +32,6 @@
 
 #define NFSDBG_FACILITY		NFSDBG_CLIENT
 
-/*
- * Set the superblock root dentry.
- * Note that this function frees the inode in case of error.
- */
-static int nfs_superblock_set_dummy_root(struct super_block *sb, struct inode *inode)
-{
-	/* The mntroot acts as the dummy root dentry for this superblock */
-	if (sb->s_root == NULL) {
-		sb->s_root = d_make_root(inode);
-		if (sb->s_root == NULL)
-			return -ENOMEM;
-		ihold(inode);
-		/*
-		 * Ensure that this dentry is invisible to d_find_alias().
-		 * Otherwise, it may be spliced into the tree by
-		 * d_splice_alias if a parent directory from the same
-		 * filesystem gets mounted at a later time.
-		 * This again causes shrink_dcache_for_umount_subtree() to
-		 * Oops, since the test for IS_ROOT() will fail.
-		 */
-		spin_lock(&d_inode(sb->s_root)->i_lock);
-		spin_lock(&sb->s_root->d_lock);
-		hlist_del_init(&sb->s_root->d_u.d_alias);
-		spin_unlock(&sb->s_root->d_lock);
-		spin_unlock(&d_inode(sb->s_root)->i_lock);
-	}
-	return 0;
-}
-
 /*
  * get a root dentry from the root filehandle
  */
@@ -99,10 +70,6 @@ int nfs_get_root(struct super_block *s, struct fs_context *fc)
 		goto out_fattr;
 	}
 
-	error = nfs_superblock_set_dummy_root(s, inode);
-	if (error != 0)
-		goto out_fattr;
-
 	/* root dentries normally start off anonymous and get spliced in later
 	 * if the dentry tree reaches them; however if the dentry already
 	 * exists, we'll pick it up at this point and use it as the root
@@ -115,7 +82,6 @@ int nfs_get_root(struct super_block *s, struct fs_context *fc)
 		goto out_fattr;
 	}
 
-	security_d_instantiate(root, inode);
 	spin_lock(&root->d_lock);
 	if (IS_ROOT(root) && !root->d_fsdata &&
 	    !(root->d_flags & DCACHE_NFSFS_RENAMED)) {
@@ -123,6 +89,8 @@ int nfs_get_root(struct super_block *s, struct fs_context *fc)
 		name = NULL;
 	}
 	spin_unlock(&root->d_lock);
+	if (!s->s_root)
+		s->s_root = dget(root);
 	fc->root = root;
 	if (server->caps & NFS_CAP_SECURITY_LABEL)
 		kflags |= SECURITY_LSM_NATIVE_LABELS;
* Re: [PATCH][RFC] get rid of busy-wait in shrink_dcache_tree()
2026-01-24 5:36 ` Al Viro
@ 2026-01-24 17:45 ` Linus Torvalds
2026-01-24 18:43 ` Al Viro
From: Linus Torvalds @ 2026-01-24 17:45 UTC (permalink / raw)
To: Al Viro
Cc: linux-fsdevel, Christian Brauner, Jan Kara, Nikolay Borisov,
Max Kellermann
On Fri, 23 Jan 2026 at 21:34, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> The only trouble is that as soon as some joker slaps __randomize_layout
> on struct hlist_node they'll start flipping from sharing with ->next to
> sharing with ->pprev, at random.
If somebody starts using randomize_layout on core data structures,
they get what they deserve.
We have tons of data structures that are *NOT* randomizable.
In fact, RANDSTRUCT is so broken in general that we actually taint the
kernel if you enable that crazy option in the first place. So no,
"what if somebody enables it on random things" is not even remotely
worth worrying about.
Linus
* Re: [PATCH][RFC] get rid of busy-wait in shrink_dcache_tree()
2026-01-24 17:45 ` Linus Torvalds
@ 2026-01-24 18:43 ` Al Viro
2026-01-24 19:32 ` Linus Torvalds
From: Al Viro @ 2026-01-24 18:43 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-fsdevel, Christian Brauner, Jan Kara, Nikolay Borisov,
Max Kellermann
On Sat, Jan 24, 2026 at 09:45:54AM -0800, Linus Torvalds wrote:
> On Fri, 23 Jan 2026 at 21:34, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > The only trouble is that as soon as some joker slaps __randomize_layout
> > on struct hlist_node they'll start flipping from sharing with ->next to
> > sharing with ->pprev, at random.
>
> If somebody starts using randomize_layout on core data structures,
> they get what they deserve.
>
> We have tons of data structures that are *NOT* randomizable.
>
> In fact, RANDSTRUCT is so broken in general that we actually taint the
> kernel if you enable that crazy option in the first place. So no,
> "what if somebody enables it on random things" is not even remotely
> worth worrying about.
Very much agreed, but we *do* have that on e.g. struct path (two pointers),
as well as struct inode, struct file, struct mount, etc. As far as VFS goes,
those are core data structures...
While we are at it, does anybody have objections to making dentry->d_u anonymous?
We are already using anonymous union members in struct dentry, so compiler support
is no longer a consideration.
Another thing in the same area:
#define for_each_alias(dentry, inode) \
	hlist_for_each_entry(dentry, &(inode)->i_dentry, d_u.d_alias)
to avoid the boilerplate. Objections?
* Re: [PATCH][RFC] get rid of busy-wait in shrink_dcache_tree()
2026-01-24 18:43 ` Al Viro
@ 2026-01-24 19:32 ` Linus Torvalds
2026-01-24 20:28 ` Al Viro
From: Linus Torvalds @ 2026-01-24 19:32 UTC (permalink / raw)
To: Al Viro
Cc: linux-fsdevel, Christian Brauner, Jan Kara, Nikolay Borisov,
Max Kellermann
On Sat, 24 Jan 2026 at 10:41, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> > In fact, RANDSTRUCT is so broken in general that we actually taint the
> > kernel if you enable that crazy option in the first place. So no,
> > "what if somebody enables it on random things" is not even remotely
> > worth worrying about.
>
> Very much agreed, but we *do* have that on e.g. struct path (two pointers),
> as well as struct inode, struct file, struct mount, etc. As far as VFS goes,
> those are core data structures...
I certainly wouldn't mind if we remove the 'struct path' one in
particular. It's insanely pointless to do randstruct on two fields.
The inode and file structs make a lot more sense, in that they are
obviously not only much bigger, but they are prime candidates for the
kind of attacks that randstruct is designed for.
> While we are at it, does anybody have objections to making dentry->d_u anonymous?
> We are already using anonymous union members in struct dentry, so compiler support
> is no longer a consideration.
>
> Another thing in the same area:
>
> #define for_each_alias(dentry, inode) \
> hlist_for_each_entry(dentry, &(inode)->i_dentry, d_u.d_alias)
>
> to avoid the boilerplate. Objections?
No objection to either of those from me.
Linus
* Re: [PATCH][RFC] get rid of busy-wait in shrink_dcache_tree()
2026-01-24 19:32 ` Linus Torvalds
@ 2026-01-24 20:28 ` Al Viro
From: Al Viro @ 2026-01-24 20:28 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-fsdevel, Christian Brauner, Jan Kara, Nikolay Borisov,
Max Kellermann
On Sat, Jan 24, 2026 at 11:32:59AM -0800, Linus Torvalds wrote:
> On Sat, 24 Jan 2026 at 10:41, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > > In fact, RANDSTRUCT is so broken in general that we actually taint the
> > > kernel if you enable that crazy option in the first place. So no,
> > > "what if somebody enables it on random things" is not even remotely
> > > worth worrying about.
> >
> > Very much agreed, but we *do* have that on e.g. struct path (two pointers),
> > as well as struct inode, struct file, struct mount, etc. As far as VFS goes,
> > those are core data structures...
>
> I certainly wouldn't mind if we remove the 'struct path' one in
> particular. It's insanely pointless to do randstruct on two fields.
<wry>
Frankly, each time I talk to these folks I keep hearing "Every bit
is wanted/Every bit is good/Every bit is needed/In your neighborhood"
set to the obvious tune...
</wry>