* [PATCH v2 00/50] tree-in-dcache stuff
@ 2025-10-28  0:45 Al Viro
  2025-10-28  0:45 ` [PATCH v2 01/50] fuse_ctl_add_conn(): fix nlink breakage in case of early failure Al Viro
                   ` (51 more replies)
  0 siblings, 52 replies; 87+ messages in thread
From: Al Viro @ 2025-10-28  0:45 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: torvalds, brauner, jack, raven, miklos, neil, a.hindborg,
	linux-mm, linux-efi, ocfs2-devel, kees, rostedt, gregkh,
	linux-usb, paul, casey, linuxppc-dev, john.johansen, selinux,
	borntraeger, bpf

Some filesystems use a kinda-sorta controlled dentry refcount leak to pin
dentries of created objects in dcache (and undo it when removing those).
Reference is grabbed and not released, but it's not actually _stored_
anywhere.  That works, but it's hard to follow and verify; among other
things, we have no way to tell _which_ of the increments is intended
to be an unpaired one.  Worse, on removal we need to decide whether
the reference had already been dropped, which can be non-trivial if
that removal is on umount and we need to figure out if this dentry is
pinned because e.g. unlink() was never done.  Usually that is handled by using
kill_litter_super() as ->kill_sb(), but there are open-coded special
cases of the same (consider e.g. /proc/self).

Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT)
marking those "leaked" dentries.  Having it set claims responsibility
for +1 in refcount.

The end result this series is aiming for:

* get these unbalanced dget() and dput() replaced with new primitives that
  would, in addition to adjusting refcount, set and clear the persistency flag.
* instead of having kill_litter_super() mess with removing the remaining
  "leaked" references (e.g. for all tmpfs files that hadn't been removed
  prior to umount), have the regular shrink_dcache_for_umount() strip
  DCACHE_PERSISTENT from all dentries, dropping the corresponding
  reference if it had been set.  After that kill_litter_super() becomes
  an equivalent of kill_anon_super().
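
The intended end state can be sketched with a toy userspace model.  This is
only an illustration of the bookkeeping described above, not kernel code:
struct toy_dentry, TOY_PERSISTENT and all toy_*() functions are made up
for the example, and locking, hashing and the actual tree walk are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the kernel's struct dentry or flag value. */
#define TOY_PERSISTENT 0x1u

struct toy_dentry {
	int refcount;
	unsigned int flags;
};

/* Pin: take the extra reference and record that the flag owns it. */
static void toy_make_persistent(struct toy_dentry *d)
{
	assert(!(d->flags & TOY_PERSISTENT));
	d->refcount++;
	d->flags |= TOY_PERSISTENT;
}

/* Unpin: clear the flag and drop the reference it was responsible for. */
static void toy_make_discardable(struct toy_dentry *d)
{
	assert(d->flags & TOY_PERSISTENT);
	d->flags &= ~TOY_PERSISTENT;
	d->refcount--;
}

/*
 * Umount-time sweep: anything still marked gets unpinned here, so no
 * filesystem-specific cleanup has to guess which references were "leaked".
 * Returns the number of dentries that were still pinned.
 */
static size_t toy_shrink_for_umount(struct toy_dentry *tree, size_t n)
{
	size_t unpinned = 0;

	for (size_t i = 0; i < n; i++) {
		if (tree[i].flags & TOY_PERSISTENT) {
			toy_make_discardable(&tree[i]);
			unpinned++;
		}
	}
	return unpinned;
}
```

In this model an object removed before umount has already lost both the
flag and the reference, so the sweep skips it; there is never a question
of whether the extra reference is still held.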

Doing that in a single step is not feasible - it would affect too many places
in too many filesystems.  It has to be split into a series.

This work really started in early 2024; quite a few preliminary pieces
have already gone into mainline.  This chunk is finally getting to the
meat of that stuff - infrastructure and most of the conversions to it.

Some pieces are still sitting in the local branches, but the bulk of
that stuff is here.

Compared to v1:
	* fusectl nlink leak fix
	* selinuxfs stuff split
	* rpc_pipe conversion added
	* nfsctl conversion added
	* rust_binderfs conversion added
	* securityfs conversion added
	* now that the last users are converted, kill_litter_super() is gone.
	* tracefs leak fix in LOCKDOWN_TRACEFS case added.
	* shmem_{un,}link() makes use of simple_{un,}link() rather than
open-coding those.
	* killed securityfs_recursive_remove() - it's an unused alias for
securityfs_remove()
	* configfs and apparmorfs switched away from calling simple_unlink()
and simple_rmdir(), which lets d_make_discardable() warn when given
a dentry that had not been marked persistent.

The branch is -rc3-based; it lives in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.persistency
Individual patches are in followups.

Please, help with review and testing; it does appear to survive the local beating,
but extra eyes on it would be very welcome.

Shortlog:
      fuse_ctl_add_conn(): fix nlink breakage in case of early failure
      tracefs: fix a leak in eventfs_create_events_dir()
      new helper: simple_remove_by_name()
      new helper: simple_done_creating()
      introduce a flag for explicitly marking persistently pinned dentries
      primitives for maintaining persisitency
      convert simple_{link,unlink,rmdir,rename,fill_super}() to new primitives
      convert ramfs and tmpfs
      procfs: make /self and /thread_self dentries persistent
      configfs, securityfs: kill_litter_super() not needed
      convert xenfs
      convert smackfs
      convert hugetlbfs
      convert mqueue
      convert bpf
      convert dlmfs
      convert fuse_ctl
      convert pstore
      convert tracefs
      convert debugfs
      debugfs: remove duplicate checks in callers of start_creating()
      convert efivarfs
      convert spufs
      convert ibmasmfs
      ibmasmfs: get rid of ibmasmfs_dir_ops
      convert devpts
      binderfs: use simple_start_creating()
      binderfs_binder_ctl_create(): kill a bogus check
      convert binderfs
      autofs_{rmdir,unlink}: dentry->d_fsdata->dentry == dentry there
      convert autofs
      convert binfmt_misc
      selinuxfs: don't stash the dentry of /policy_capabilities
      selinuxfs: new helper for attaching files to tree
      convert selinuxfs
      functionfs: switch to simple_remove_by_name()
      convert functionfs
      gadgetfs: switch to simple_remove_by_name()
      convert gadgetfs
      hypfs: don't pin dentries twice
      hypfs: switch hypfs_create_str() to returning int
      hypfs: swich hypfs_create_u64() to returning int
      convert hypfs
      convert rpc_pipefs
      convert nfsctl
      convert rust_binderfs
      get rid of kill_litter_super()
      convert securityfs
      kill securityfs_recursive_remove()
      d_make_discardable(): warn if given a non-persistent dentry

Diffstat:
 Documentation/filesystems/porting.rst     |   7 ++
 arch/powerpc/platforms/cell/spufs/inode.c |  15 +--
 arch/s390/hypfs/hypfs.h                   |   6 +-
 arch/s390/hypfs/hypfs_diag_fs.c           |  60 ++++------
 arch/s390/hypfs/hypfs_vm_fs.c             |  21 ++--
 arch/s390/hypfs/inode.c                   |  82 +++++--------
 drivers/android/binder/rust_binderfs.c    | 121 ++++++-------------
 drivers/android/binderfs.c                |  82 +++----------
 drivers/base/devtmpfs.c                   |   2 +-
 drivers/misc/ibmasm/ibmasmfs.c            |  24 ++--
 drivers/usb/gadget/function/f_fs.c        |  54 ++++-----
 drivers/usb/gadget/legacy/inode.c         |  49 ++++----
 drivers/xen/xenfs/super.c                 |   2 +-
 fs/autofs/inode.c                         |   2 +-
 fs/autofs/root.c                          |  11 +-
 fs/binfmt_misc.c                          |  69 ++++++-----
 fs/configfs/dir.c                         |  10 +-
 fs/configfs/inode.c                       |   3 +-
 fs/configfs/mount.c                       |   2 +-
 fs/dcache.c                               | 111 +++++++++++-------
 fs/debugfs/inode.c                        |  32 ++----
 fs/devpts/inode.c                         |  57 ++++-----
 fs/efivarfs/inode.c                       |   7 +-
 fs/efivarfs/super.c                       |   5 +-
 fs/fuse/control.c                         |  38 +++---
 fs/hugetlbfs/inode.c                      |  12 +-
 fs/internal.h                             |   1 -
 fs/libfs.c                                |  52 +++++++--
 fs/nfsd/nfsctl.c                          |  18 +--
 fs/ocfs2/dlmfs/dlmfs.c                    |   8 +-
 fs/proc/base.c                            |   6 +-
 fs/proc/internal.h                        |   1 +
 fs/proc/root.c                            |  14 +--
 fs/proc/self.c                            |  10 +-
 fs/proc/thread_self.c                     |  11 +-
 fs/pstore/inode.c                         |   7 +-
 fs/ramfs/inode.c                          |   8 +-
 fs/super.c                                |   8 --
 fs/tracefs/event_inode.c                  |   7 +-
 fs/tracefs/inode.c                        |  13 +--
 include/linux/dcache.h                    |   4 +-
 include/linux/fs.h                        |   6 +-
 include/linux/proc_fs.h                   |   2 -
 include/linux/security.h                  |   2 -
 init/do_mounts.c                          |   2 +-
 ipc/mqueue.c                              |  12 +-
 kernel/bpf/inode.c                        |  15 +--
 mm/shmem.c                                |  38 ++----
 net/sunrpc/rpc_pipe.c                     |  27 ++---
 security/apparmor/apparmorfs.c            |  13 ++-
 security/inode.c                          |  35 +++---
 security/selinux/selinuxfs.c              | 185 +++++++++++++-----------------
 security/smack/smackfs.c                  |   2 +-
 53 files changed, 585 insertions(+), 806 deletions(-)

	Overview:

First two commits are bugfixes (fusectl and tracefs resp.)

[1/50] fuse_ctl_add_conn(): fix nlink breakage in case of early failure
[2/50] tracefs: fix a leak in eventfs_create_events_dir()

Next come two commits adding a couple of useful helpers, then two adding
the infrastructure and one converting the libfs helpers to it; the rest
consists of per-filesystem conversions.

[3/50] new helper: simple_remove_by_name()
[4/50] new helper: simple_done_creating()
	end_creating_path() analogue for internal object creation; unlike
end_creating_path() no mount is passed to it (or guaranteed to exist, for
that matter - it might be used during the filesystem setup, before the
superblock gets attached to any mounts).

Infrastructure:
[5/50] introduce a flag for explicitly marking persistently pinned dentries
	* introduce the new flag
	* teach shrink_dcache_for_umount() to handle it (i.e. remove
and drop refcount on anything that survives to umount with that flag
still set)
	* teach kill_litter_super() that anything with that flag does
*not* need to be unpinned.
[6/50] primitives for maintaining persisitency
	* d_make_persistent(dentry, inode) - bump refcount, mark persistent
and make hashed positive.  Return value is a borrowed reference to dentry;
it can be used until something removes persistency (at the very least,
until the parent gets unlocked, but some filesystems may have stronger
exclusion).
	* d_make_discardable() - remove persistency mark and drop reference.

NOTE: at that stage d_make_discardable() does not reject dentries not
marked persistent - it acts as if the mark had been set.
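
A toy userspace sketch of the two primitives as described above, including
the lenient behaviour at this stage of the series.  All names (toy_dentry,
toy_inode, toy_make_persistent(), toy_make_discardable()) are hypothetical,
and the bool return value is an addition for illustration only; the real
primitives are not described as returning it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; no locking is modelled. */
struct toy_inode {
	int ino;
};

struct toy_dentry {
	int refcount;
	bool persistent;
	bool hashed;
	struct toy_inode *inode;	/* NULL means negative */
};

/*
 * Bump refcount, mark persistent, make hashed positive.  The returned
 * pointer is a borrowed reference: the caller gets no extra count for it
 * and may use it only while nothing can remove the persistency.
 */
static struct toy_dentry *toy_make_persistent(struct toy_dentry *d,
					      struct toy_inode *inode)
{
	d->inode = inode;
	d->hashed = true;
	d->persistent = true;
	d->refcount++;
	return d;
}

/*
 * Lenient variant: drop a reference whether or not the mark was set.
 * The return value only reports whether the mark had been there, i.e.
 * the condition a later warning could check.
 */
static bool toy_make_discardable(struct toy_dentry *d)
{
	bool was_marked = d->persistent;

	d->persistent = false;
	assert(d->refcount > 0);
	d->refcount--;
	return was_marked;
}
```

A creator that holds its own reference would call toy_make_persistent(),
drop its own reference, and from then on rely on the mark to keep the
object in the tree.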

Rationale: less noise in series splitup that way.  We want (and on the
next commit will get) simple_unlink() to do the right thing - remove
persistency, if it's there.  However, it's used by many filesystems.
We would have either to convert them all at once or split simple_unlink()
into "want persistent" and "don't want persistent" versions, the latter
being the old one.  In the course of the series almost all callers
would migrate to the replacement, leaving only two pathological cases
with the old one.  The same goes for simple_rmdir() (two callers left in
the end), simple_recursive_removal() (all callers gone in the end), etc.
That's a lot of noise and it's easier to start with d_make_discardable()
quietly accepting non-persistent dentries, then, in the end, add private
copies of simple_unlink() and simple_rmdir() for two weird users (configfs
and apparmorfs) and have those use dput() instead of d_make_discardable().
At that point we'd be left with all callers of d_make_discardable()
always passing persistent dentries, which allows adding a warning to it.

[7/50] convert simple_{link,unlink,rmdir,rename,fill_super}() to new primitives
	See above re quietly accepting non-persistent dentries in
simple_unlink(), simple_rmdir(), etc.

	Converting filesystems:
[8/50] convert ramfs and tmpfs
[9/50] procfs: make /self and /thread_self dentries persistent
[10/50] configfs, securityfs: kill_litter_super() not needed
[11/50] convert xenfs
[12/50] convert smackfs
[13/50] convert hugetlbfs
[14/50] convert mqueue
[15/50] convert bpf
[16/50] convert dlmfs
[17/50] convert fuse_ctl
[18/50] convert pstore
[19/50] convert tracefs
[20/50] convert debugfs
[21/50] debugfs: remove duplicate checks in callers of start_creating()
[22/50] convert efivarfs
[23/50] convert spufs
[24/50] convert ibmasmfs
[25/50] ibmasmfs: get rid of ibmasmfs_dir_ops
[26/50] convert devpts
[27/50] binderfs: use simple_start_creating()
[28/50] binderfs_binder_ctl_create(): kill a bogus check
[29/50] convert binderfs
[30/50] autofs_{rmdir,unlink}: dentry->d_fsdata->dentry == dentry there
[31/50] convert autofs
[32/50] convert binfmt_misc
[33/50] selinuxfs: don't stash the dentry of /policy_capabilities
[34/50] selinuxfs: new helper for attaching files to tree
[35/50] convert selinuxfs
[36/50] functionfs: switch to simple_remove_by_name()
[37/50] convert functionfs
[38/50] gadgetfs: switch to simple_remove_by_name()
[39/50] convert gadgetfs
[40/50] hypfs: don't pin dentries twice
[41/50] hypfs: switch hypfs_create_str() to returning int
[42/50] hypfs: swich hypfs_create_u64() to returning int
[43/50] convert hypfs
[44/50] convert rpc_pipefs
[45/50] convert nfsctl
[46/50] convert rust_binderfs

	... and no kill_litter_super() callers remain, so we
can take it out:
[47/50] get rid of kill_litter_super()
	
	Followups:
[48/50] convert securityfs
	That was the last remaining user of simple_recursive_removal()
that did *not* mark things persistent.  Now the only places where
d_make_discardable() is still called for dentries that are not marked
persistent are the calls of simple_{unlink,rmdir}() in configfs and
apparmorfs.

[49/50] kill securityfs_recursive_remove()
	Unused macro...

[50/50] d_make_discardable(): warn if given a non-persistent dentry

At this point there are very few call chains that might lead to
d_make_discardable() on a dentry that hadn't been made persistent:
calls of simple_unlink() and simple_rmdir() in configfs and
apparmorfs.

Both filesystems do pin (part of) their contents in dcache, but
they are currently playing very unusual games with that.  Converting
them to more usual patterns might be possible, but it's definitely
going to be a long series of changes in both cases.

For now the easiest solution is to have both stop using simple_unlink()
and simple_rmdir() - that lets d_make_discardable() warn
when given a non-persistent dentry.

Rather than giving them full-blown private copies (with calls of
d_make_discardable() replaced with dput()), let's pull the parts of
simple_unlink() and simple_rmdir() that deal with timestamps and link
counts into separate helpers (__simple_unlink() and __simple_rmdir()
resp.) and have those used by configfs and apparmorfs.
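
The shape of that split, as a toy userspace sketch.  Everything here is
hypothetical (toy_* names, a single "ctime" stamp standing in for the
timestamps, a counter standing in for the warning), and whether the real
d_make_discardable() still drops the reference after warning is an
assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel structures. */
struct toy_inode {
	unsigned int nlink;
	long ctime;
};

struct toy_dentry {
	int refcount;
	bool persistent;
	struct toy_inode *inode;
};

static int toy_warnings;	/* stands in for a kernel warning */

/* Plain reference drop. */
static void toy_dput(struct toy_dentry *d)
{
	assert(d->refcount > 0);
	d->refcount--;
}

/* Strict variant: complain if the dentry was never marked persistent. */
static void toy_make_discardable(struct toy_dentry *d)
{
	if (!d->persistent)
		toy_warnings++;
	d->persistent = false;
	toy_dput(d);	/* assumption: reference is dropped either way */
}

/* Shared part: timestamps and link count only. */
static void toy_unlink_common(struct toy_inode *dir, struct toy_dentry *d,
			      long now)
{
	dir->ctime = now;
	d->inode->ctime = now;
	d->inode->nlink--;
}

/* Regular users: shared part plus removal of persistency. */
static void toy_simple_unlink(struct toy_inode *dir, struct toy_dentry *d,
			      long now)
{
	toy_unlink_common(dir, d, now);
	toy_make_discardable(d);
}

/* The two odd users: shared part plus a plain reference drop. */
static void toy_private_unlink(struct toy_inode *dir, struct toy_dentry *d,
			       long now)
{
	toy_unlink_common(dir, d, now);
	toy_dput(d);
}
```

With this arrangement the only way to trip the warning in the model is to
hand an unmarked dentry to toy_simple_unlink().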

-- 
2.47.3



end of thread, other threads:[~2025-10-30 13:36 UTC | newest]

Thread overview: 87+ messages
2025-10-28  0:45 [PATCH v2 00/50] tree-in-dcache stuff Al Viro
2025-10-28  0:45 ` [PATCH v2 01/50] fuse_ctl_add_conn(): fix nlink breakage in case of early failure Al Viro
2025-10-28  0:45 ` [PATCH v2 02/50] tracefs: fix a leak in eventfs_create_events_dir() Al Viro
2025-10-28  1:15   ` Steven Rostedt
2025-10-28  0:45 ` [PATCH v2 03/50] new helper: simple_remove_by_name() Al Viro
2025-10-28  0:45 ` [PATCH v2 04/50] new helper: simple_done_creating() Al Viro
2025-10-28  0:45 ` [PATCH v2 05/50] introduce a flag for explicitly marking persistently pinned dentries Al Viro
2025-10-28  0:45 ` [PATCH v2 06/50] primitives for maintaining persisitency Al Viro
2025-10-28 12:38   ` James Bottomley
2025-10-29  5:10     ` Al Viro
2025-10-29 15:25       ` James Bottomley
2025-10-28  0:45 ` [PATCH v2 07/50] convert simple_{link,unlink,rmdir,rename,fill_super}() to new primitives Al Viro
2025-10-29 14:02   ` [External] : " Mark Tinguely
2025-10-29 17:55     ` Al Viro
2025-10-28  0:45 ` [PATCH v2 08/50] convert ramfs and tmpfs Al Viro
2025-10-28  0:45 ` [PATCH v2 09/50] procfs: make /self and /thread_self dentries persistent Al Viro
2025-10-28  0:45 ` [PATCH v2 10/50] configfs, securityfs: kill_litter_super() not needed Al Viro
2025-10-28 23:58   ` Paul Moore
2025-10-29  6:18   ` Andreas Hindborg
2025-10-28  0:45 ` [PATCH v2 11/50] convert xenfs Al Viro
2025-10-28  0:45 ` [PATCH v2 12/50] convert smackfs Al Viro
2025-10-28  0:45 ` [PATCH v2 13/50] convert hugetlbfs Al Viro
2025-10-28  0:45 ` [PATCH v2 14/50] convert mqueue Al Viro
2025-10-28  0:45 ` [PATCH v2 15/50] convert bpf Al Viro
2025-10-28  0:45 ` [PATCH v2 16/50] convert dlmfs Al Viro
2025-10-28  0:45 ` [PATCH v2 17/50] convert fuse_ctl Al Viro
2025-10-28  0:45 ` [PATCH v2 18/50] convert pstore Al Viro
2025-10-28  0:45 ` [PATCH v2 19/50] convert tracefs Al Viro
2025-10-28 15:37   ` Steven Rostedt
2025-10-28  0:45 ` [PATCH v2 20/50] convert debugfs Al Viro
2025-10-28  0:45 ` [PATCH v2 21/50] debugfs: remove duplicate checks in callers of start_creating() Al Viro
2025-10-28  0:45 ` [PATCH v2 22/50] convert efivarfs Al Viro
2025-10-28 12:53   ` James Bottomley
2025-10-28 17:45     ` Al Viro
2025-10-28 21:08       ` Al Viro
2025-10-28 21:34         ` Ard Biesheuvel
2025-10-29 18:08           ` Al Viro
2025-10-29 18:26             ` Ard Biesheuvel
2025-10-29 18:57           ` James Bottomley
2025-10-29 19:37             ` Al Viro
2025-10-29 19:48               ` James Bottomley
2025-10-30 13:35               ` Ard Biesheuvel
2025-10-28  0:45 ` [PATCH v2 23/50] convert spufs Al Viro
2025-10-28  1:15   ` bot+bpf-ci
2025-10-28  1:33     ` Al Viro
2025-10-28  0:45 ` [PATCH v2 24/50] convert ibmasmfs Al Viro
2025-10-28  0:45 ` [PATCH v2 25/50] ibmasmfs: get rid of ibmasmfs_dir_ops Al Viro
2025-10-28  0:45 ` [PATCH v2 26/50] convert devpts Al Viro
2025-10-28  0:45 ` [PATCH v2 27/50] binderfs: use simple_start_creating() Al Viro
2025-10-28  0:45 ` [PATCH v2 28/50] binderfs_binder_ctl_create(): kill a bogus check Al Viro
2025-10-28  0:45 ` [PATCH v2 29/50] convert binderfs Al Viro
2025-10-28  0:45 ` [PATCH v2 30/50] autofs_{rmdir,unlink}: dentry->d_fsdata->dentry == dentry there Al Viro
2025-10-28  0:45 ` [PATCH v2 31/50] convert autofs Al Viro
2025-10-28  1:55   ` Al Viro
2025-10-28  5:32     ` Linus Torvalds
2025-10-28  0:45 ` [PATCH v2 32/50] convert binfmt_misc Al Viro
2025-10-28  0:45 ` [PATCH v2 33/50] selinuxfs: don't stash the dentry of /policy_capabilities Al Viro
2025-10-29  0:08   ` Paul Moore
2025-10-29 15:19   ` Stephen Smalley
2025-10-28  0:45 ` [PATCH v2 34/50] selinuxfs: new helper for attaching files to tree Al Viro
2025-10-28 23:51   ` Paul Moore
2025-10-29 15:22   ` Stephen Smalley
2025-10-28  0:45 ` [PATCH v2 35/50] convert selinuxfs Al Viro
2025-10-29  0:02   ` Paul Moore
2025-10-29  3:24     ` Al Viro
2025-10-29 14:49       ` Paul Moore
2025-10-29 15:06   ` Stephen Smalley
2025-10-28  0:45 ` [PATCH v2 36/50] functionfs: switch to simple_remove_by_name() Al Viro
2025-10-28  8:47   ` Greg KH
2025-10-28  0:45 ` [PATCH v2 37/50] convert functionfs Al Viro
2025-10-28  0:45 ` [PATCH v2 38/50] gadgetfs: switch to simple_remove_by_name() Al Viro
2025-10-28  0:45 ` [PATCH v2 39/50] convert gadgetfs Al Viro
2025-10-28  0:45 ` [PATCH v2 40/50] hypfs: don't pin dentries twice Al Viro
2025-10-28  0:46 ` [PATCH v2 41/50] hypfs: switch hypfs_create_str() to returning int Al Viro
2025-10-28  0:46 ` [PATCH v2 42/50] hypfs: swich hypfs_create_u64() " Al Viro
2025-10-28  0:46 ` [PATCH v2 43/50] convert hypfs Al Viro
2025-10-28  0:46 ` [PATCH v2 44/50] convert rpc_pipefs Al Viro
2025-10-28  0:46 ` [PATCH v2 45/50] convert nfsctl Al Viro
2025-10-28  0:46 ` [PATCH v2 46/50] convert rust_binderfs Al Viro
2025-10-28  0:46 ` [PATCH v2 47/50] get rid of kill_litter_super() Al Viro
2025-10-28  0:46 ` [PATCH v2 48/50] convert securityfs Al Viro
2025-10-29  0:10   ` Paul Moore
2025-10-28  0:46 ` [PATCH v2 49/50] kill securityfs_recursive_remove() Al Viro
2025-10-29  0:04   ` Paul Moore
2025-10-28  0:46 ` [PATCH v2 50/50] d_make_discardable(): warn if given a non-persistent dentry Al Viro
2025-10-28  0:59 ` [PATCH v2 00/50] tree-in-dcache stuff Al Viro
2025-10-28  5:33 ` Linus Torvalds
