From: Al Viro <viro@zeniv.linux.org.uk>
To: linux-fsdevel@vger.kernel.org
Cc: torvalds@linux-foundation.org, brauner@kernel.org, jack@suse.cz,
raven@themaw.net, miklos@szeredi.hu, neil@brown.name,
a.hindborg@kernel.org, linux-mm@kvack.org,
linux-efi@vger.kernel.org, ocfs2-devel@lists.linux.dev,
kees@kernel.org, rostedt@goodmis.org, gregkh@linuxfoundation.org,
linux-usb@vger.kernel.org, paul@paul-moore.com,
casey@schaufler-ca.com, linuxppc-dev@lists.ozlabs.org,
john.johansen@canonical.com, selinux@vger.kernel.org,
borntraeger@linux.ibm.com, bpf@vger.kernel.org, clm@meta.com
Subject: [PATCH v4 05/54] introduce a flag for explicitly marking persistently pinned dentries
Date: Tue, 18 Nov 2025 05:15:14 +0000
Message-ID: <20251118051604.3868588-6-viro@zeniv.linux.org.uk>
In-Reply-To: <20251118051604.3868588-1-viro@zeniv.linux.org.uk>
Some filesystems use a kinda-sorta controlled dentry refcount leak to pin
dentries of created objects in dcache (and undo it when removing those).
A reference is grabbed and never released, but it's not actually _stored_
anywhere. That works, but it's hard to follow and verify; among other
things, we have no way to tell _which_ of the increments is intended
to be the unpaired one. Worse, on removal we need to decide whether
the reference had already been dropped, which can be non-trivial if
that removal happens at umount time and we need to figure out whether
this dentry is still pinned because, e.g., unlink() was never done on it.
Usually that is handled by using kill_litter_super() as ->kill_sb(), but
there are open-coded special cases of the same (consider e.g. /proc/self).
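To make the ambiguity concrete, here is a tiny userspace model. It is not kernel code: struct model_dentry and the model_* helpers are hypothetical stand-ins for struct dentry and dget(), and model_guess_and_unpin() is a deliberately naive strategy invented for illustration, not something the kernel does. Creation takes an unpaired reference that is stored nowhere, so at umount time a dentry whose count is 1 looks the same whether that 1 is the creation pin or a reference some other code still holds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's struct dentry. */
struct model_dentry {
	int count;              /* stands in for d_lockref.count */
};

/* Old style: creating an object takes a reference that is stored nowhere. */
static void model_create_pinned(struct model_dentry *d)
{
	d->count++;             /* the deliberately "leaked" pin */
}

/* An ordinary, paired holder (say, an open file) also contributes +1. */
static void model_get(struct model_dentry *d)
{
	d->count++;
}

/*
 * Naive umount-time guesswork: the count alone cannot say whether the
 * remaining reference is the creation pin or a live holder, so this
 * guess is wrong for any dentry that was never pinned but is in use.
 */
static bool model_guess_and_unpin(struct model_dentry *d)
{
	assert(d->count > 0);
	if (d->count == 1) {
		d->count--;     /* may steal somebody else's reference */
		return true;
	}
	return false;
}
```

With one pinned dentry and one merely in use, both end up at the same count, and the guess "unpins" both; the second decrement steals the open file's reference. A flag that records which reference is the pin removes the need to guess.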
Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT)
marking those "leaked" dentries. Having it set claims responsibility
for one reference (+1) in the refcount.
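The invariant "the flag owns exactly one reference" can be sketched in the same userspace model. MODEL_PERSISTENT stands in for DCACHE_PERSISTENT; model_make_persistent() and model_make_discardable() are hypothetical names chosen for this sketch. The primitives the series actually adds come in later patches, and their exact semantics may differ.

```c
#include <assert.h>

#define MODEL_PERSISTENT 0x1u   /* stand-in for DCACHE_PERSISTENT */

/* Hypothetical userspace model; not the kernel's struct dentry. */
struct model_dentry {
	unsigned int flags;
	int count;              /* stands in for d_lockref.count */
};

/* Pin: take a reference and record that the flag now owns it. */
static void model_make_persistent(struct model_dentry *d)
{
	assert(!(d->flags & MODEL_PERSISTENT));
	d->flags |= MODEL_PERSISTENT;
	d->count++;
}

/* Unpin: drop exactly the reference the flag owned, and only that one. */
static void model_make_discardable(struct model_dentry *d)
{
	assert(d->flags & MODEL_PERSISTENT);
	assert(d->count > 0);
	d->flags &= ~MODEL_PERSISTENT;
	d->count--;
}

/* Number of references held by someone other than the flag. */
static int model_other_refs(const struct model_dentry *d)
{
	return d->count - ((d->flags & MODEL_PERSISTENT) ? 1 : 0);
}
```

Because the pin is now recorded, unpinning can never take away a reference that belongs to another holder, and a double unpin trips the assertion instead of silently corrupting the count.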
The end result this series is aiming for:
* replace these unbalanced dget() and dput() calls with new primitives
that, in addition to adjusting the refcount, set and clear the
persistency flag.
* instead of having kill_litter_super() mess with removing the remaining
"leaked" references (e.g. for all tmpfs files that hadn't been removed
prior to umount), have the regular shrink_dcache_for_umount() strip
DCACHE_PERSISTENT from all dentries, dropping the corresponding
reference if it had been set. After that kill_litter_super() becomes
an equivalent of kill_anon_super().
Doing that in a single step is not feasible: it would affect too many
places in too many filesystems, so it has to be split into a series.
Here we
* introduce the new flag
* teach shrink_dcache_for_umount() to handle it (i.e. clear the flag
and drop the reference on anything that survives to umount with that
flag still set)
* teach kill_litter_super() (via d_genocide()) that anything with that
flag does *not* need to be unpinned.
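Why both changes are needed during the transition can be shown with a flat userspace model. It assumes the call order kill_litter_super() → d_genocide() → kill_anon_super() → ... → shrink_dcache_for_umount(), so both walks run on the same dentries. All model_* names and MODEL_* flags are hypothetical stand-ins for the dcache code in the diff; the model has no tree (so it ignores that D_WALK_SKIP also skips a subtree), omits the root, and has no locking or freeing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for DCACHE_PERSISTENT and DCACHE_GENOCIDE. */
#define MODEL_PERSISTENT 0x1u
#define MODEL_GENOCIDE   0x2u

/* Hypothetical flat model: no tree, no root, no locking, no freeing. */
struct model_dentry {
	unsigned int flags;
	bool hashed;
	bool positive;          /* has an inode */
	int count;              /* stands in for d_lockref.count */
};

/* Mirrors d_genocide_kill() after this patch: persistent ones are skipped. */
static void model_genocide(struct model_dentry *v, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct model_dentry *d = &v[i];

		if (!d->hashed || !d->positive || (d->flags & MODEL_PERSISTENT))
			continue;       /* the kernel returns D_WALK_SKIP here */
		if (!(d->flags & MODEL_GENOCIDE)) {
			d->flags |= MODEL_GENOCIDE;
			d->count--;     /* drop the old-style "leaked" pin */
		}
	}
}

/* Mirrors select_collect_umount(): strip the flag and the ref it owns. */
static void model_umount_strip(struct model_dentry *v, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (v[i].flags & MODEL_PERSISTENT) {
			v[i].flags &= ~MODEL_PERSISTENT;
			v[i].count--;
		}
	}
}

/* Transition-era kill_litter_super(): genocide first, then umount shrink. */
static void model_kill_litter(struct model_dentry *v, size_t n)
{
	model_genocide(v, n);
	model_umount_strip(v, n);
	/* without the skip above, a converted dentry would reach -1 here */
	for (size_t i = 0; i < n; i++)
		assert(v[i].count >= 0);
}
```

With one old-style pinned dentry and one converted (DCACHE_PERSISTENT) dentry, each loses its pin exactly once: the genocide walk handles the former, the umount strip the latter. Dropping the persistent check from model_genocide() makes the converted dentry's count go negative.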
The next commits will add primitives for maintaining that flag and convert
the common helpers to those. After that comes a long series of
per-filesystem patches converting to those primitives.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
fs/dcache.c | 27 ++++++++++++++++++++++-----
include/linux/dcache.h | 1 +
2 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/fs/dcache.c b/fs/dcache.c
index 035cccbc9276..f2c9f4fef2a2 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1511,6 +1511,15 @@ static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
return ret;
}
+static enum d_walk_ret select_collect_umount(void *_data, struct dentry *dentry)
+{
+ if (dentry->d_flags & DCACHE_PERSISTENT) {
+ dentry->d_flags &= ~DCACHE_PERSISTENT;
+ dentry->d_lockref.count--;
+ }
+ return select_collect(_data, dentry);
+}
+
static enum d_walk_ret select_collect2(void *_data, struct dentry *dentry)
{
struct select_data *data = _data;
@@ -1539,18 +1548,20 @@ static enum d_walk_ret select_collect2(void *_data, struct dentry *dentry)
}
/**
- * shrink_dcache_parent - prune dcache
+ * shrink_dcache_tree - prune dcache
* @parent: parent of entries to prune
+ * @for_umount: true if we want to unpin the persistent ones
*
* Prune the dcache to remove unused children of the parent dentry.
*/
-void shrink_dcache_parent(struct dentry *parent)
+static void shrink_dcache_tree(struct dentry *parent, bool for_umount)
{
for (;;) {
struct select_data data = {.start = parent};
INIT_LIST_HEAD(&data.dispose);
- d_walk(parent, &data, select_collect);
+ d_walk(parent, &data,
+ for_umount ? select_collect_umount : select_collect);
if (!list_empty(&data.dispose)) {
shrink_dentry_list(&data.dispose);
@@ -1575,6 +1586,11 @@ void shrink_dcache_parent(struct dentry *parent)
shrink_dentry_list(&data.dispose);
}
}
+
+void shrink_dcache_parent(struct dentry *parent)
+{
+ shrink_dcache_tree(parent, false);
+}
EXPORT_SYMBOL(shrink_dcache_parent);
static enum d_walk_ret umount_check(void *_data, struct dentry *dentry)
@@ -1601,7 +1617,7 @@ static enum d_walk_ret umount_check(void *_data, struct dentry *dentry)
static void do_one_tree(struct dentry *dentry)
{
- shrink_dcache_parent(dentry);
+ shrink_dcache_tree(dentry, true);
d_walk(dentry, dentry, umount_check);
d_drop(dentry);
dput(dentry);
@@ -3111,7 +3127,8 @@ static enum d_walk_ret d_genocide_kill(void *data, struct dentry *dentry)
{
struct dentry *root = data;
if (dentry != root) {
- if (d_unhashed(dentry) || !dentry->d_inode)
+ if (d_unhashed(dentry) || !dentry->d_inode ||
+ dentry->d_flags & DCACHE_PERSISTENT)
return D_WALK_SKIP;
if (!(dentry->d_flags & DCACHE_GENOCIDE)) {
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index c83e02b94389..94b58655322a 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -225,6 +225,7 @@ enum dentry_flags {
DCACHE_PAR_LOOKUP = BIT(24), /* being looked up (with parent locked shared) */
DCACHE_DENTRY_CURSOR = BIT(25),
DCACHE_NORCU = BIT(26), /* No RCU delay for freeing */
+ DCACHE_PERSISTENT = BIT(27)
};
#define DCACHE_MANAGED_DENTRY \
--
2.47.3
Thread overview: 81+ messages
2025-11-18 5:15 [PATCH v4 00/54] tree-in-dcache stuff Al Viro
2025-11-18 5:15 ` [PATCH v4 01/54] fuse_ctl_add_conn(): fix nlink breakage in case of early failure Al Viro
2025-11-18 5:15 ` [PATCH v4 02/54] tracefs: fix a leak in eventfs_create_events_dir() Al Viro
2025-11-18 5:15 ` [PATCH v4 03/54] new helper: simple_remove_by_name() Al Viro
2025-11-18 5:15 ` [PATCH v4 04/54] new helper: simple_done_creating() Al Viro
2025-11-18 5:15 ` Al Viro [this message]
2025-11-18 5:15 ` [PATCH v4 06/54] primitives for maintaining persisitency Al Viro
2025-11-18 5:15 ` [PATCH v4 07/54] convert simple_{link,unlink,rmdir,rename,fill_super}() to new primitives Al Viro
2025-11-18 5:15 ` [PATCH v4 08/54] convert ramfs and tmpfs Al Viro
2025-11-18 5:15 ` [PATCH v4 09/54] procfs: make /self and /thread_self dentries persistent Al Viro
2025-11-18 5:15 ` [PATCH v4 10/54] configfs, securityfs: kill_litter_super() not needed Al Viro
2025-11-18 5:15 ` [PATCH v4 11/54] convert xenfs Al Viro
2025-11-18 5:15 ` [PATCH v4 12/54] convert smackfs Al Viro
2025-11-18 5:15 ` [PATCH v4 13/54] convert hugetlbfs Al Viro
2025-11-18 5:15 ` [PATCH v4 14/54] convert mqueue Al Viro
2025-11-18 5:15 ` [PATCH v4 15/54] convert bpf Al Viro
2025-11-18 5:15 ` [PATCH v4 16/54] convert dlmfs Al Viro
2025-11-18 5:15 ` [PATCH v4 17/54] convert fuse_ctl Al Viro
2025-11-18 5:15 ` [PATCH v4 18/54] convert pstore Al Viro
2025-11-18 5:15 ` [PATCH v4 19/54] convert tracefs Al Viro
2025-11-18 5:15 ` [PATCH v4 20/54] convert debugfs Al Viro
2025-11-18 5:15 ` [PATCH v4 21/54] debugfs: remove duplicate checks in callers of start_creating() Al Viro
2025-11-18 5:15 ` [PATCH v4 22/54] convert efivarfs Al Viro
2025-11-18 5:15 ` [PATCH v4 23/54] convert spufs Al Viro
2025-11-18 5:15 ` [PATCH v4 24/54] convert ibmasmfs Al Viro
2025-11-18 5:15 ` [PATCH v4 25/54] ibmasmfs: get rid of ibmasmfs_dir_ops Al Viro
2025-11-18 5:15 ` [PATCH v4 26/54] convert devpts Al Viro
2025-11-18 5:15 ` [PATCH v4 27/54] binderfs: use simple_start_creating() Al Viro
2025-11-18 5:15 ` [PATCH v4 28/54] binderfs_binder_ctl_create(): kill a bogus check Al Viro
2025-11-18 5:15 ` [PATCH v4 29/54] convert binderfs Al Viro
2025-11-18 5:15 ` [PATCH v4 30/54] autofs_{rmdir,unlink}: dentry->d_fsdata->dentry == dentry there Al Viro
2025-11-18 5:15 ` [PATCH v4 31/54] convert autofs Al Viro
2025-11-18 5:15 ` [PATCH v4 32/54] convert binfmt_misc Al Viro
2025-11-18 5:15 ` [PATCH v4 33/54] selinuxfs: don't stash the dentry of /policy_capabilities Al Viro
2025-11-18 5:15 ` [PATCH v4 34/54] selinuxfs: new helper for attaching files to tree Al Viro
2025-11-18 5:15 ` [PATCH v4 35/54] convert selinuxfs Al Viro
2025-11-18 5:15 ` [PATCH v4 36/54] functionfs: don't abuse ffs_data_closed() on fs shutdown Al Viro
2025-11-18 5:15 ` [PATCH v4 37/54] functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}() Al Viro
2025-11-18 5:15 ` [PATCH v4 38/54] functionfs: need to cancel ->reset_work in ->kill_sb() Al Viro
2025-11-18 5:15 ` [PATCH v4 39/54] functionfs: fix the open/removal races Al Viro
2025-11-18 5:15 ` [PATCH v4 40/54] functionfs: switch to simple_remove_by_name() Al Viro
2025-11-18 5:15 ` [PATCH v4 41/54] convert functionfs Al Viro
2025-11-18 5:15 ` [PATCH v4 42/54] gadgetfs: switch to simple_remove_by_name() Al Viro
2025-11-18 5:15 ` [PATCH v4 43/54] convert gadgetfs Al Viro
2025-11-18 5:15 ` [PATCH v4 44/54] hypfs: don't pin dentries twice Al Viro
2025-11-18 5:15 ` [PATCH v4 45/54] hypfs: switch hypfs_create_str() to returning int Al Viro
2025-11-18 5:15 ` [PATCH v4 46/54] hypfs: swich hypfs_create_u64() " Al Viro
2025-11-18 5:15 ` [PATCH v4 47/54] convert hypfs Al Viro
2025-11-18 5:15 ` [PATCH v4 48/54] convert rpc_pipefs Al Viro
2025-11-18 5:15 ` [PATCH v4 49/54] convert nfsctl Al Viro
2025-11-18 5:15 ` [PATCH v4 50/54] convert rust_binderfs Al Viro
2025-11-18 5:16 ` [PATCH v4 51/54] get rid of kill_litter_super() Al Viro
2025-11-18 5:16 ` [PATCH v4 52/54] convert securityfs Al Viro
2025-11-18 5:16 ` [PATCH v4 53/54] kill securityfs_recursive_remove() Al Viro
2025-11-18 5:16 ` [PATCH v4 54/54] d_make_discardable(): warn if given a non-persistent dentry Al Viro
2026-01-27 0:56 ` [PATCH v4 00/54] tree-in-dcache stuff Samuel Wu
2026-01-27 7:42 ` Greg KH
2026-01-27 18:39 ` Linus Torvalds
2026-01-27 20:14 ` Al Viro
2026-01-28 8:53 ` Greg KH
2026-01-28 2:02 ` Samuel Wu
2026-01-28 4:59 ` Al Viro
2026-01-29 0:58 ` Samuel Wu
2026-01-29 3:23 ` Al Viro
2026-01-29 22:54 ` Al Viro
2026-01-30 1:16 ` Samuel Wu
2026-01-30 7:04 ` Al Viro
2026-01-30 22:31 ` Samuel Wu
2026-01-30 23:57 ` Al Viro
2026-01-31 0:14 ` Linus Torvalds
2026-01-31 1:08 ` Al Viro
2026-01-31 1:11 ` Linus Torvalds
2026-02-01 0:11 ` Al Viro
2026-01-31 0:59 ` Al Viro
2026-01-31 1:05 ` Samuel Wu
2026-01-31 1:18 ` Al Viro
2026-01-31 2:09 ` Samuel Wu
2026-01-31 2:43 ` Al Viro
2026-01-31 19:48 ` Samuel Wu
2026-01-31 14:58 ` Krishna Kurapati PSSNV
2026-01-31 20:02 ` Samuel Wu