* [GIT PULL 00/16 for v7.2] v7.2
@ 2026-06-12 15:10 Christian Brauner
2026-06-12 15:11 ` [GIT PULL 01/16 for v7.2] vfs kfunc Christian Brauner
` (15 more replies)
0 siblings, 16 replies; 17+ messages in thread
From: Christian Brauner @ 2026-06-12 15:10 UTC
To: Linus Torvalds; +Cc: Christian Brauner, linux-fsdevel, linux-kernel
Hey Linus,
This is the batch of pull requests for the v7.2 merge window.
This cycle is light on new uapi and heavy on infrastructure: a couple
of long-standing scalability problems are fixed and a few pieces of
filesystem behavior that file servers have wanted for a long time are
finally exposed.
Case folding behavior of local filesystems is now exposed so file
servers - nfsd, ksmbd, and user space servers - can report it to
clients instead of guessing. Filesystems report case-insensitive and
case-nonpreserving behavior through fileattr_get, and nfsd implements
NFSv3 PATHCONF and the NFSv4 FATTR4_CASE_INSENSITIVE and
FATTR4_CASE_PRESERVING attributes, which have been part of the NFS
protocols for decades. Windows NFS clients hard-require this
information for Win32 applications to behave correctly, the Linux
client uses it to disable negative dentry caching on case-insensitive
shares, and multi-protocol NFS/SMB servers need it to participate as
first-class citizens in such environments.
openat2() grows two new flags. O_EMPTYPATH allows reopening the file
behind an O_PATH file descriptor through an empty path string,
removing the detour through /proc/<pid>/fd and the procfs dependency
that comes with it. OPENAT2_REGULAR refuses to open anything but
regular files, returning the new EFTYPE error code, so services can
protect themselves against being redirected to fifos or device nodes.
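For context, here is a minimal user-space sketch of the detour the two flags are meant to replace, using only interfaces that exist today: open with O_PATH, check the file type by hand, then reopen through /proc/self/fd. The helper name open_regular_via_procfs() is made up for illustration, and EINVAL stands in for the new EFTYPE code. The numeric values of O_EMPTYPATH, OPENAT2_REGULAR and EFTYPE are not given in this message, so the sketch does not use them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef O_PATH
#define O_PATH 010000000	/* asm-generic value; some architectures differ */
#endif

/*
 * Hypothetical helper, not part of the pull request: open `path` read-only
 * if and only if it is a regular file. Returns an fd, or -1 with errno set.
 */
static int open_regular_via_procfs(const char *path)
{
	char procpath[64];
	struct stat st;
	int pathfd, fd, saved;

	/* An O_PATH open does not open the fifo or device itself. */
	pathfd = open(path, O_PATH | O_CLOEXEC);
	if (pathfd < 0)
		return -1;

	if (fstat(pathfd, &st) < 0) {
		saved = errno;
		close(pathfd);
		errno = saved;
		return -1;
	}

	/* Manual type check; EINVAL is a placeholder for the new EFTYPE. */
	if (!S_ISREG(st.st_mode)) {
		close(pathfd);
		errno = EINVAL;
		return -1;
	}

	/* The procfs detour: reopen the O_PATH fd with real access rights. */
	snprintf(procpath, sizeof(procpath), "/proc/self/fd/%d", pathfd);
	fd = open(procpath, O_RDONLY | O_CLOEXEC);
	saved = errno;
	close(pathfd);
	errno = saved;
	return fd;
}
```

The sketch only works where procfs is mounted, which is exactly the dependency O_EMPTYPATH is described as removing.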
exec gains a per-task task_exec_state structure holding the dumpable
mode and the user namespace captured at execve(). Both used to live on
mm_struct which exit_mm() clears long before a task is reaped, so
__ptrace_may_access() and several /proc visibility checks misbehaved
for zombies - denying legitimate access to non-dumpable zombies that
were running in nested user namespaces. exec also stops tearing down
the old mm while holding exec_update_lock and cred_guard_mutex, so
execve() of a large process no longer blocks ptrace_attach() and every
exec_update_lock reader for the duration of the teardown.
The VFS prerequisites for directory delegations land: lease holders
can opt out of having specific directory change events break their
delegation and fsnotify grows the helpers nfsd needs to drive
CB_NOTIFY callbacks from inotify watches in a future cycle.
Acquiring an inode reference becomes lockless as long as the refcount
was already at least 1, so only the 0->1 and 1->0 transitions take
inode->i_lock anymore.
The race between cgroup_writeback_umount() and inode_switch_wbs() that
could trigger "VFS: Busy inodes after unmount" and a use-after-free on
percpu counters is fixed, and the global serialization in the umount
path is replaced with a per-sb counter. Umount latency under cgroup
writeback churn drops from ~92-138ms p50 to ~5-8ms p50. Writeback also
learns to track dirty RWF_DONTCACHE pages per bdi_writeback so the
flusher can be kicked in a targeted fashion, improving uncached write
performance.
b_end_io is removed from struct buffer_head. The completion path loses
an indirect function call, struct buffer_head shrinks from 104 to 96
bytes, and a corruptible function pointer in the middle of a writable
data structure goes away. All in-tree users are converted to the new
bh_submit() interface.
fs/eventpoll.c is extensively documented and refactored. The
invariants the recent UAF fixes relied on were nowhere written down
and had to be reverse-engineered, so they are now codified in source,
long function bodies are split into named helpers, and the per-CTL_ADD
scratch state moves off file-scope globals. epoll also gains a
file-based control interface so io_uring can stop supporting nested
epoll contexts, and a long-standing race that made epoll_wait() report
false negatives with a zero timeout is fixed.
The simple xattr infrastructure moves its hash table into a
per-superblock cache and handles lazy allocation internally instead of
burdening every caller. On top of this bpffs gains support for
trusted.* and security.* xattrs so metadata like content hashes or
security labels can be attached to pinned objects.
iomap brings the vfs infrastructure required for fs-verity support in
XFS with a post-EOF merkle tree, stops pointlessly zeroing the iomap
on the final iteration which improves polled I/O IOPS by about 5%, and
introduces the IOMAP_F_ZERO_TAIL flag needed by filesystems with a
valid data length like exFAT and NTFS.
The string emitted from /proc/filesystems is pre-generated and cached
and the filesystems list is RCU-ified. The file is read by libselinux
and thus by a surprising number of programs; open+read+close goes from
~440k to ~1.06M ops/s single-threaded and from ~600k to ~3.3M ops/s
with 20 processes. procfs mounts with subset=pid are exempted from the
full mount visibility checks, unblocking procfs mounts in rootless
containers, and most ptrace_may_access() users in procfs now hold
exec_update_lock to avoid TOCTOU races with concurrent privileged
execve().
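The /proc/filesystems change above boils down to building the output once per registration instead of once per read. A rough single-threaded sketch of that idea follows. The struct and function names are invented for illustration, and the kernel version relies on RCU rather than the plain arrays used here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FS 16
#define CACHE_SIZE 512

/* Hypothetical registry; not the kernel's data structures. */
struct fs_registry {
	const char *names[MAX_FS];
	int nodev[MAX_FS];
	int count;
	char cache[CACHE_SIZE];	/* pre-generated output */
	size_t cache_len;
};

/* Regenerate the cached text; called only when the list changes. */
static void registry_rebuild_cache(struct fs_registry *r)
{
	size_t off = 0;

	for (int i = 0; i < r->count; i++) {
		int n = snprintf(r->cache + off, CACHE_SIZE - off, "%s\t%s\n",
				 r->nodev[i] ? "nodev" : "", r->names[i]);

		/* Stop at the last entry that fits completely. */
		if (n < 0 || (size_t)n >= CACHE_SIZE - off)
			break;
		off += (size_t)n;
	}
	r->cache[off] = '\0';
	r->cache_len = off;
}

/* Slow path: registration pays for the string formatting. */
static int registry_add(struct fs_registry *r, const char *name, int nodev)
{
	if (r->count >= MAX_FS)
		return -1;
	r->names[r->count] = name;
	r->nodev[r->count] = nodev;
	r->count++;
	registry_rebuild_cache(r);
	return 0;
}

/* Fast path: a read is a bounded memcpy of the cached text. */
static size_t registry_read(const struct fs_registry *r, char *buf, size_t size)
{
	size_t n = r->cache_len < size ? r->cache_len : size;

	memcpy(buf, r->cache, n);
	return n;
}
```

Registrations are rare and reads are frequent, so moving the formatting cost to the registration side fits the access pattern described above.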
pipe writes pre-allocate pages outside pipe->mutex so readers no
longer stall behind a writer doing direct reclaim under the mutex,
improving throughput by 6-28% and up to 48% under memory pressure.
sget() is retired with the last users converted to sget_fc(), and the
exportfs support for block-style layouts is cleaned up in preparation
for multi-device filesystem exports.
Smaller items include a fix for the bpf dentry xattr kfuncs with
negative dentries, per-instance lockdep classes for rhashtable, fixes
and new helpers for the copy_struct_*() machinery, set_blocksize()
error handling for a pile of legacy filesystems that crashed when
mounting devices with sector size > PAGE_SIZE, SB_I_NOEXEC and
SB_I_NODEV being set by default in init_pseudo(), honouring SB_NOUSER
in the new mount API, a SOFTIRQ-unsafe lock order fix in fasync
signaling, an FS_USERNS_DELEGATABLE flag to unbreak delegated NFS
mounts in containers, documentation with guidelines for submitting new
filesystems, and assorted selftest fixes and cleanups.
Thanks!
Christian
* [GIT PULL 01/16 for v7.2] vfs kfunc
From: Christian Brauner @ 2026-06-12 15:11 UTC
To: Linus Torvalds; +Cc: Christian Brauner, linux-fsdevel, linux-kernel
Hey Linus,
/* Summary */
This contains a fix for the bpf filesystem kfuncs.
The bpf_set_dentry_xattr() and bpf_remove_dentry_xattr() kfuncs locked
the inode of the supplied dentry without checking whether the dentry is
negative. Passing a negative dentry (e.g., from security_inode_create)
caused a NULL pointer dereference. Negative dentries now fail with
EINVAL. The WARN_ON(!inode) in the bpf xattr permission helpers is
dropped as well since it could be triggered the same way, amounting to
a denial of service on systems with panic_on_warn enabled.
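The shape of the fix can be modeled in user space as follows. This is only a sketch with mock types: mock_dentry, mock_inode and mock_set_dentry_xattr() are invented stand-ins and do not mirror the code in fs/bpf_fs_kfuncs.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's definitions. */
struct mock_inode {
	int locked;
	int xattr_set;
};

struct mock_dentry {
	struct mock_inode *inode;	/* NULL models a negative dentry */
};

static int mock_set_dentry_xattr(struct mock_dentry *dentry)
{
	struct mock_inode *inode = dentry->inode;

	/* Reject negative dentries before the inode is touched at all. */
	if (!inode)
		return -EINVAL;

	inode->locked = 1;	/* stands in for taking the inode lock */
	inode->xattr_set = 1;
	inode->locked = 0;
	return 0;
}
```

Returning an error rather than warning matters here because, as noted above, a reachable WARN_ON() becomes a crash when panic_on_warn is enabled.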
/* Testing */
gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known conflicts.
Merge conflicts with other trees
================================
The following changes since commit 254f49634ee16a731174d2ae34bc50bd5f45e731:
Linux 7.1-rc1 (2026-04-26 14:19:00 -0700)
are available in the Git repository at:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-7.2-rc1.kfunc
for you to fetch changes up to 07410646f6ff1d23222f105ccab778957d401bbe:
bpf: fix crash in bpf_[set|remove]_dentry_xattr for negative dentries (2026-05-11 11:23:00 +0200)
----------------------------------------------------------------
vfs-7.2-rc1.kfunc
Please consider pulling these changes from the signed vfs-7.2-rc1.kfunc tag.
Thanks!
Christian
----------------------------------------------------------------
Matt Bobrowski (1):
bpf: fix crash in bpf_[set|remove]_dentry_xattr for negative dentries
fs/bpf_fs_kfuncs.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
* [GIT PULL 02/16 for v7.2] vfs exportfs
From: Christian Brauner @ 2026-06-12 15:11 UTC
To: Linus Torvalds; +Cc: Christian Brauner, linux-fsdevel, linux-kernel
Hey Linus,
/* Summary */
This cleans up the exportfs support for block-style layouts that
provide direct block device access: the operations for layout-based
block device access are split out of struct export_operations into a
separate header, ->commit_blocks() no longer takes a struct iattr
argument, and the way support for layout-based block device access is
detected is reworked. nfsd's blocklayout code also stops honoring
loca_time_modify. This is preparation for supporting export of more
than a single device per file system.
Note that the nfsd tree is based on a merge of this branch so these
changes may also reach you through the nfsd pull request.
/* Testing */
gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known conflicts.
Merge conflicts with other trees
================================
The following changes since commit 254f49634ee16a731174d2ae34bc50bd5f45e731:
Linux 7.1-rc1 (2026-04-26 14:19:00 -0700)
are available in the Git repository at:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-7.2-rc1.exportfs
for you to fetch changes up to 79e33ddc62c03cce6c29f0792454e1d618228acf:
Merge patch series "cleanup block-style layouts exports" (2026-05-11 11:11:55 +0200)
----------------------------------------------------------------
vfs-7.2-rc1.exportfs
Please consider pulling these changes from the signed vfs-7.2-rc1.exportfs tag.
Thanks!
Christian
----------------------------------------------------------------
Christian Brauner (1):
Merge patch series "cleanup block-style layouts exports"
Christoph Hellwig (4):
nfsd/blocklayout: always ignore loca_time_modify
exportfs: split out the ops for layout-based block device access
exportfs: don't pass struct iattr to ->commit_blocks
exportfs,nfsd: rework checking for layout-based block device access support
MAINTAINERS | 2 +-
fs/nfsd/blocklayout.c | 37 ++++++++----------
fs/nfsd/export.c | 3 +-
fs/nfsd/nfs4layouts.c | 29 ++++----------
fs/xfs/xfs_export.c | 4 +-
fs/xfs/xfs_pnfs.c | 44 +++++++++++++++------
fs/xfs/xfs_pnfs.h | 11 +++---
include/linux/exportfs.h | 25 ++++--------
include/linux/exportfs_block.h | 88 ++++++++++++++++++++++++++++++++++++++++++
9 files changed, 162 insertions(+), 81 deletions(-)
create mode 100644 include/linux/exportfs_block.h
* [GIT PULL 03/16 for v7.2] vfs inode
From: Christian Brauner @ 2026-06-12 15:12 UTC
To: Linus Torvalds; +Cc: Christian Brauner, linux-fsdevel, linux-kernel
Hey Linus,
/* Summary */
This extends the lockless ->i_count handling. iput() could already
decrement any value greater than 1 locklessly but acquiring a
reference always required taking inode->i_lock. Now acquiring a
reference is lockless as long as the count was already at least 1,
i.e., only the 0->1 and 1->0 transitions take the lock. This avoids
the lock for the common cases of nfs calling into the inode hash and
btrfs using igrab(). Cleanup-wise icount_read_once() is added to line
up with inode_state_read_once() and the open-coded ->i_count loads
across the tree are converted, and ihold() is relocated and tidied up.
On top of that some stale lock ordering annotations are retired from
the inode hash code: iunique() no longer takes the hash lock since the
inode hash became RCU-searchable and s_inode_list_lock is no longer
taken under the hash lock either.
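The counting scheme described above can be sketched in user space with C11 atomics. This is an illustration of the general technique, not the code in fs/inode.c: struct obj and the obj_*() helpers are made-up names, and a pthread mutex stands in for inode->i_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode: just a refcount and its lock. */
struct obj {
	atomic_int count;
	pthread_mutex_t lock;
};

/* Lockless fast path: bump the count only if it is already >= 1. */
static bool obj_get_if_held(struct obj *o)
{
	int old = atomic_load_explicit(&o->count, memory_order_relaxed);

	while (old >= 1) {
		/* On failure `old` is refreshed and the condition rechecked. */
		if (atomic_compare_exchange_weak(&o->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Acquire a reference; only the 0->1 transition takes the lock. */
static void obj_get(struct obj *o)
{
	if (obj_get_if_held(o))
		return;
	pthread_mutex_lock(&o->lock);
	atomic_fetch_add(&o->count, 1);
	pthread_mutex_unlock(&o->lock);
}

/* Drop a reference; returns true when the locked 1->0 transition happened. */
static bool obj_put(struct obj *o)
{
	int old = atomic_load_explicit(&o->count, memory_order_relaxed);
	bool last;

	while (old > 1) {
		if (atomic_compare_exchange_weak(&o->count, &old, old - 1))
			return false;
	}
	pthread_mutex_lock(&o->lock);
	last = atomic_fetch_sub(&o->count, 1) == 1;
	pthread_mutex_unlock(&o->lock);
	return last;
}
```

The compare-and-exchange loop is what keeps the fast path from ever performing a 0->1 or 1->0 transition without the lock.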
/* Testing */
gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known conflicts.
Merge conflicts with other trees
================================
This has a merge conflict with the xfs tree in fs/xfs/xfs_trace.h
between commit 1113a6d6d5d133 ("xfs: remove the i_ino field in struct
xfs_inode") from the xfs tree and commit 769e143b115a4a ("fs: add
icount_read_once() and stop open-coding ->i_count loads") from this
tree, reported in [1]. It can be resolved as follows:
[1]: https://lore.kernel.org/linux-next/aigwDvQMI2CHiLl3@sirena.co.uk
diff --cc fs/xfs/xfs_trace.h
index ae5faa78783005,f87c738d84b248..00000000000000
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@@ -1157,8 -1157,8 +1157,8 @@@ DECLARE_EVENT_CLASS(xfs_iref_class
),
TP_fast_assign(
__entry->dev = VFS_I(ip)->i_sb->s_dev;
- __entry->ino = ip->i_ino;
+ __entry->ino = I_INO(ip);
- __entry->count = icount_read(VFS_I(ip));
+ __entry->count = icount_read_once(VFS_I(ip));
__entry->pincount = atomic_read(&ip->i_pincount);
__entry->iflags = ip->i_flags;
__entry->caller_ip = caller_ip;
The following changes since commit 254f49634ee16a731174d2ae34bc50bd5f45e731:
Linux 7.1-rc1 (2026-04-26 14:19:00 -0700)
are available in the Git repository at:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-7.2-rc1.inode
for you to fetch changes up to 5b451b76c85c8309d2e02caa467b38f5999c986f:
fs: retire stale lock ordering annotations from inode hash (2026-05-11 23:12:29 +0200)
----------------------------------------------------------------
vfs-7.2-rc1.inode
Please consider pulling these changes from the signed vfs-7.2-rc1.inode tag.
Thanks!
Christian
----------------------------------------------------------------
Christian Brauner (1):
Merge patch series "assorted ->i_count changes + extension of lockless handling"
Mateusz Guzik (4):
fs: add icount_read_once() and stop open-coding ->i_count loads
fs: relocate and tidy up ihold()
fs: allow lockless ->i_count bumps as long as it does not transition 0->1
fs: retire stale lock ordering annotations from inode hash
arch/powerpc/platforms/cell/spufs/file.c | 2 +-
fs/btrfs/inode.c | 2 +-
fs/ceph/mds_client.c | 2 +-
fs/dcache.c | 4 ++
fs/ext4/ialloc.c | 4 +-
fs/hpfs/inode.c | 2 +-
fs/inode.c | 100 +++++++++++++++++++++++++------
fs/nfs/inode.c | 4 +-
fs/smb/client/inode.c | 2 +-
fs/ubifs/super.c | 2 +-
fs/xfs/xfs_inode.c | 2 +-
fs/xfs/xfs_trace.h | 2 +-
include/linux/fs.h | 13 ++++
include/trace/events/filelock.h | 2 +-
security/landlock/fs.c | 2 +-
15 files changed, 112 insertions(+), 33 deletions(-)
* [GIT PULL 04/16 for v7.2] vfs directory delegations
From: Christian Brauner @ 2026-06-12 15:12 UTC
To: Linus Torvalds; +Cc: Christian Brauner, linux-fsdevel, linux-kernel
Hey Linus,
/* Summary */
This contains the VFS prerequisites for supporting directory
delegations in nfsd via CB_NOTIFY callbacks.
The filelock core gains support for ignoring delegation breaks for
directory change events together with an inode_lease_ignore_mask()
helper, and fsnotify gains fsnotify_modify_mark_mask() and a
FSNOTIFY_EVENT_RENAME data type. With this in place nfsd can request
delegations on directories and set up inotify watches to trigger
sending CB_NOTIFY events to clients instead of having every directory
change break the delegation. New tracepoints are added to fsnotify()
and to the start of break_lease(), and trace_break_lease_block() is
passed the currently blocking lease instead of the new one.
A follow-up fix moves the LEASE_BREAK_* flags out of
#ifdef CONFIG_FILE_LOCKING to fix the build for CONFIG_FILE_LOCKING=n
configurations.
Note that the nfsd tree is based on a merge of this branch so these
changes may also reach you through the nfsd pull request.
/* Testing */
gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known conflicts.
Merge conflicts with other trees
================================
The following changes since commit 254f49634ee16a731174d2ae34bc50bd5f45e731:
Linux 7.1-rc1 (2026-04-26 14:19:00 -0700)
are available in the Git repository at:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-7.2-rc1.directory.delegations
for you to fetch changes up to 246bc86d0fd891273a8502314f158eab23af823c:
filelock: move LEASE_BREAK_* flags out of #ifdef CONFIG_FILE_LOCKING (2026-05-16 17:05:52 +0200)
----------------------------------------------------------------
vfs-7.2-rc1.directory.delegations
Please consider pulling these changes from the signed vfs-7.2-rc1.directory.delegations tag.
Thanks!
Christian
----------------------------------------------------------------
Christian Brauner (1):
Merge patch series "VFS changes for nfsd CB_NOTIFY callbacks in directory delegations"
Jeff Layton (8):
filelock: pass current blocking lease to trace_break_lease_block() rather than "new_fl"
filelock: add support for ignoring deleg breaks for dir change events
filelock: add a tracepoint to start of break_lease()
filelock: add an inode_lease_ignore_mask helper
fsnotify: new tracepoint in fsnotify()
fsnotify: add fsnotify_modify_mark_mask()
fsnotify: add FSNOTIFY_EVENT_RENAME data type
filelock: move LEASE_BREAK_* flags out of #ifdef CONFIG_FILE_LOCKING
fs/attr.c | 2 +-
fs/locks.c | 118 ++++++++++++++++++++++++++++++---------
fs/namei.c | 31 +++++-----
fs/notify/fsnotify.c | 5 ++
fs/notify/mark.c | 29 ++++++++++
fs/posix_acl.c | 4 +-
fs/xattr.c | 4 +-
include/linux/filelock.h | 66 ++++++++++++++--------
include/linux/fsnotify.h | 8 ++-
include/linux/fsnotify_backend.h | 21 +++++++
include/trace/events/filelock.h | 38 ++++++++++++-
include/trace/events/fsnotify.h | 51 +++++++++++++++++
include/trace/misc/fsnotify.h | 35 ++++++++++++
13 files changed, 341 insertions(+), 71 deletions(-)
create mode 100644 include/trace/events/fsnotify.h
create mode 100644 include/trace/misc/fsnotify.h
* [GIT PULL 05/16 for v7.2] vfs casefold
From: Christian Brauner @ 2026-06-12 15:12 UTC
To: Linus Torvalds; +Cc: Christian Brauner, linux-fsdevel, linux-kernel
Hey Linus,
/* Summary */
This exposes the case folding behavior of local filesystems so that
file servers - nfsd, ksmbd, and user space file servers - can report
the actual behavior to clients instead of guessing.
Filesystems report case-insensitive and case-nonpreserving behavior
via new file_kattr flags in their fileattr_get implementations. fat,
exfat, ntfs3, hfs, hfsplus, xfs, cifs, nfs, vboxsf, and isofs are
wired up; local filesystems not explicitly handled default to the
usual POSIX behavior of case-sensitive and case-preserving. nfsd uses
this to report case folding via NFSv3 PATHCONF and to implement the
NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING attributes -
both have been part of the NFS protocols for decades to support
clients on non-POSIX systems - and ksmbd reports it via
FS_ATTRIBUTE_INFORMATION. Exposing the information through the
fileattr uapi covers user space file servers.
The immediate motivation is interoperability: Windows NFS clients
hard-require servers to report case-insensitivity for Win32
applications to work correctly, and a client that knows the server is
case-insensitive can avoid issuing multiple LOOKUP/READDIR requests
searching for case variants. The Linux NFS client already grew
support for case-insensitive shares years ago in support of the
Hammerspace NFS server - negative dentry caching must be disabled (a
lookup for "FILE.TXT" failing must not cache a negative entry when
"file.txt" exists) and directory change invalidation must drop cached
case-folded name variants. Such servers often operate in
multi-protocol environments where a single file service instance
caters to both NFS and SMB clients, and nfsd needs to report case
folding properly to participate as a first-class citizen there.
A follow-up series brings fixes for the initial work: the nfsd
case-info probe now uses kernel credentials, maps -ESTALE to
NFS3ERR_STALE, and has its cost capped across READDIR entries; the
nfs client avoids transiently zeroed case capability bits during the
probe and skips the pathconf probe when neither field is consumed;
the FS_CASEFOLD_FL semantics are clarified in the UAPI header; and
the tools UAPI headers are synced.
Note that the nfsd tree is based on a merge of this branch so these
changes may also reach you through the nfsd pull request.
/* Testing */
gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known conflicts.
Merge conflicts with other trees
================================
This has a merge conflict with the ntfs3 tree in fs/ntfs3/file.c,
fs/ntfs3/namei.c, and fs/ntfs3/ntfs_fs.h between commit eeb7b37b9700f
("ntfs3: Implement fileattr_get for case sensitivity") from this tree
and commit 245bbdd2b9d65 ("fs/ntfs3: add fileattr support") from the
ntfs3 tree. Reported with resolution in [1].
[1]: https://lore.kernel.org/linux-next/ahmF4spkQMYcQMGI@sirena.org.uk
The following changes since commit 254f49634ee16a731174d2ae34bc50bd5f45e731:
Linux 7.1-rc1 (2026-04-26 14:19:00 -0700)
are available in the Git repository at:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-7.2-rc1.casefold
for you to fetch changes up to ea3120fd5153c967efb20e6e3330caecbf9d8b0a:
Merge patch series "Casefold Fixes" (2026-05-15 17:49:29 +0200)
----------------------------------------------------------------
vfs-7.2-rc1.casefold
Please consider pulling these changes from the signed vfs-7.2-rc1.casefold tag.
Thanks!
Christian
----------------------------------------------------------------
Christian Brauner (2):
Merge patch series "Exposing case folding behavior"
Merge patch series "Casefold Fixes"
Chuck Lever (22):
fs: Move file_kattr initialization to callers
fs: Add case sensitivity flags to file_kattr
fat: Implement fileattr_get for case sensitivity
exfat: Implement fileattr_get for case sensitivity
ntfs3: Implement fileattr_get for case sensitivity
hfs: Implement fileattr_get for case sensitivity
hfsplus: Report case sensitivity in fileattr_get
xfs: Report case sensitivity in fileattr_get
cifs: Implement fileattr_get for case sensitivity
nfs: Implement fileattr_get for case sensitivity
vboxsf: Implement fileattr_get for case sensitivity
isofs: Implement fileattr_get for case sensitivity
nfsd: Report export case-folding via NFSv3 PATHCONF
nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION
tools headers UAPI: Sync case-sensitivity flags from linux/fs.h
nfs: Avoid transient zeroed case capability bits during probe
nfs: Skip pathconf probe when neither field is consumed
fs: Clarify FS_CASEFOLD_FL semantics in UAPI header
nfsd: Use kernel credentials for case-info probe
nfsd: Map -ESTALE from case probe to NFS3ERR_STALE
nfsd: Cap case-folding probe cost across READDIR entries
fs/exfat/exfat_fs.h | 2 +
fs/exfat/file.c | 18 ++++-
fs/exfat/namei.c | 1 +
fs/fat/fat.h | 3 +
fs/fat/file.c | 36 +++++++++
fs/fat/namei_msdos.c | 1 +
fs/fat/namei_vfat.c | 1 +
fs/file_attr.c | 16 ++--
fs/hfs/dir.c | 1 +
fs/hfs/hfs_fs.h | 2 +
fs/hfs/inode.c | 14 ++++
fs/hfsplus/inode.c | 16 +++-
fs/isofs/dir.c | 16 ++++
fs/isofs/isofs.h | 3 +
fs/nfs/client.c | 32 +++++---
fs/nfs/inode.c | 15 ++++
fs/nfs/internal.h | 3 +
fs/nfs/namespace.c | 2 +
fs/nfs/nfs3proc.c | 2 +
fs/nfs/nfs3xdr.c | 7 +-
fs/nfs/nfs4proc.c | 10 ++-
fs/nfs/proc.c | 3 +
fs/nfs/symlink.c | 3 +
fs/nfsd/nfs3proc.c | 39 ++++++++--
fs/nfsd/nfs4xdr.c | 99 +++++++++++++++++++++++--
fs/nfsd/vfs.c | 86 +++++++++++++++++++++
fs/nfsd/vfs.h | 3 +
fs/nfsd/xdr3.h | 4 +-
fs/nfsd/xdr4.h | 14 ++++
fs/ntfs3/file.c | 29 ++++++++
fs/ntfs3/namei.c | 1 +
fs/ntfs3/ntfs_fs.h | 1 +
fs/smb/client/cifsfs.c | 53 +++++++++++++
fs/smb/client/cifsfs.h | 3 +
fs/smb/client/namespace.c | 1 +
fs/smb/server/smb2pdu.c | 30 ++++++--
fs/vboxsf/dir.c | 1 +
fs/vboxsf/file.c | 6 +-
fs/vboxsf/super.c | 7 ++
fs/vboxsf/utils.c | 30 ++++++++
fs/vboxsf/vfsmod.h | 6 ++
fs/xfs/libxfs/xfs_inode_util.c | 2 +
fs/xfs/xfs_ioctl.c | 22 +++++-
include/linux/fileattr.h | 3 +-
include/linux/nfs_fs_sb.h | 2 +-
include/linux/nfs_xdr.h | 2 +
include/uapi/linux/fs.h | 18 ++++-
tools/perf/trace/beauty/include/uapi/linux/fs.h | 7 ++
48 files changed, 618 insertions(+), 58 deletions(-)
* [GIT PULL 06/16 for v7.2] kernel task_exec_state
From: Christian Brauner @ 2026-06-12 15:13 UTC
To: Linus Torvalds; +Cc: Christian Brauner, linux-fsdevel, linux-kernel
Hey Linus,
/* Summary */
This introduces a new per-task task_exec_state structure and relocates
the dumpable mode and the user namespace captured at execve() from
mm_struct onto it. It stays attached to the task for its full
lifetime.
__ptrace_may_access() and several /proc owner and visibility checks
need to consult two pieces of state for any observable task, including
zombies that have already gone through exit_mm(): the dumpable mode
and the user namespace captured at execve(). Both live on mm_struct
today, which exit_mm() clears from the task long before the task is
reaped. A reader that races with do_exit() observes task->mm == NULL
and either fails the check or falls back to init_user_ns - which
denies legitimate access to non-dumpable zombies that were running in
a nested user namespace.
mm_struct loses ->user_ns and the dumpability bits in ->flags.
MMF_DUMPABLE_BITS is reserved so the MMF_DUMP_FILTER_* layout exposed
via /proc/<pid>/coredump_filter stays stable. task->user_dumpable and
its exit_mm() snapshot are removed.
task_exec_state is the privilege domain established by an execve().
Within a thread group it is shared via refcount; across thread groups
each task has its own:
- CLONE_VM siblings (thread-group members, io_uring workers)
refcount-share the parent's exec_state.
- Non-CLONE_VM clones (fork(), clone() with CLONE_VFORK but without
CLONE_VM) allocate a fresh exec_state inheriting the parent's
dumpable mode and user_ns.
- execve() in the child allocates a fresh instance and installs it
under task_lock + exec_update_lock via task_exec_state_replace().
- Credential changes (setresuid, capset, ...) and
prctl(PR_SET_DUMPABLE) update dumpability on the current task's
exec_state, i.e., on the thread group's shared instance.
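The sharing rules in the list above can be modeled in a few lines of user-space C. This is a single-threaded sketch with invented names (struct task, task_clone(), task_execve(), task_set_dumpable()). It uses a plain integer refcount and an integer in place of the user namespace pointer, and it ignores the locking the real code takes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space model; names and fields are illustrative. */
struct exec_state {
	int refcount;	/* plain int: this sketch is single-threaded */
	int dumpable;
	int user_ns_id;	/* stands in for a user namespace pointer */
};

struct task {
	struct exec_state *es;
};

static struct exec_state *exec_state_alloc(int dumpable, int user_ns_id)
{
	struct exec_state *es = malloc(sizeof(*es));

	if (!es)
		return NULL;
	es->refcount = 1;
	es->dumpable = dumpable;
	es->user_ns_id = user_ns_id;
	return es;
}

static void exec_state_put(struct exec_state *es)
{
	if (es && --es->refcount == 0)
		free(es);
}

/* CLONE_VM siblings share the instance; other clones get an inherited copy. */
static int task_clone(const struct task *parent, struct task *child, bool clone_vm)
{
	if (clone_vm) {
		parent->es->refcount++;
		child->es = parent->es;
		return 0;
	}
	child->es = exec_state_alloc(parent->es->dumpable, parent->es->user_ns_id);
	return child->es ? 0 : -1;
}

/* execve() installs a fresh instance and drops the old reference. */
static int task_execve(struct task *t, int dumpable, int user_ns_id)
{
	struct exec_state *fresh = exec_state_alloc(dumpable, user_ns_id);

	if (!fresh)
		return -1;
	exec_state_put(t->es);
	t->es = fresh;
	return 0;
}

/* Dumpability updates land on whichever instance the task holds. */
static void task_set_dumpable(struct task *t, int dumpable)
{
	t->es->dumpable = dumpable;
}
```

In this model a dumpability change made through one thread is visible through every task sharing the instance, while a forked child keeps its own copy.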
On top of this exec_mmap() no longer tears down the old mm while
holding exec_update_lock for writing and cred_guard_mutex. Neither
lock is needed for that: exec_update_lock only exists to make the mm
swap atomic with the later commit_creds() and all its readers operate
on the new mm; none looks at the detached old mm. The cost was real:
__mmput() runs exit_mmap() over the entire old address space and can
block in exit_aio() waiting for in-flight AIO, so execve() of a large
process blocked ptrace_attach() and every exec_update_lock reader for
the duration of the teardown. The old mm is now stashed in
bprm->old_mm and released from setup_new_exec() after both locks are
dropped, with a backstop in free_bprm() for the error paths.
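The pattern is to swap the pointer under the lock and run the expensive teardown only after unlocking. A small user-space sketch of that ordering follows. The names struct proc, swap_mm() and teardown() are made up, a pthread mutex stands in for exec_update_lock, and the trylock probe exists only so the ordering can be observed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an address space and its owner. */
struct addr_space {
	size_t size;
};

struct proc {
	pthread_mutex_t update_lock;
	struct addr_space *mm;
	int teardown_ran_unlocked;	/* set by the probe in teardown() */
};

/* Expensive cleanup; must not run with update_lock held. */
static void teardown(struct proc *p, struct addr_space *old)
{
	/* Probe: trylock succeeds only if the lock is currently free. */
	if (pthread_mutex_trylock(&p->update_lock) == 0) {
		p->teardown_ran_unlocked = 1;
		pthread_mutex_unlock(&p->update_lock);
	}
	free(old);
}

static void swap_mm(struct proc *p, struct addr_space *new_mm)
{
	struct addr_space *old;

	pthread_mutex_lock(&p->update_lock);
	old = p->mm;		/* stash, much like bprm->old_mm */
	p->mm = new_mm;		/* readers only ever see the new mm */
	pthread_mutex_unlock(&p->update_lock);

	teardown(p, old);	/* slow work after the lock is dropped */
}
```

The critical section shrinks to a pointer swap, so lock readers no longer wait for the teardown.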
/* Testing */
gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known conflicts.
Merge conflicts with other trees
================================
The following changes since commit 5200f5f493f79f14bbdc349e402a40dfb32f23c8:
Linux 7.1-rc4 (2026-05-17 13:59:58 -0700)
are available in the Git repository at:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/kernel-7.2-rc1.task_exec_state
for you to fetch changes up to 38205ecbe6b6dc47968ad4e9c978e2117720969e:
exec: free the old mm outside the exec locks (2026-05-26 11:02:02 +0200)
----------------------------------------------------------------
kernel-7.2-rc1.task_exec_state
Please consider pulling these changes from the signed kernel-7.2-rc1.task_exec_state tag.
Thanks!
Christian
----------------------------------------------------------------
Christian Brauner (2):
Merge patch series "exec: introduce task_exec_state for exec-time metadata"
exec: free the old mm outside the exec locks
Christian Brauner (Amutable) (4):
sched/coredump: introduce enum task_dumpable
exec: introduce struct task_exec_state
ptrace: add ptracer_access_allowed()
exec_state: relocate dumpable information
arch/arm64/kernel/mte.c | 6 +-
drivers/firmware/efi/efi.c | 1 -
fs/coredump.c | 22 +++-----
fs/exec.c | 65 +++++++++++++--------
fs/pidfs.c | 23 +++-----
fs/proc/base.c | 39 ++++++-------
include/linux/binfmts.h | 3 +
include/linux/coredump.h | 4 ++
include/linux/mm_types.h | 9 ++-
include/linux/ptrace.h | 1 +
include/linux/sched.h | 6 +-
include/linux/sched/coredump.h | 47 ++++------------
include/linux/sched/exec_state.h | 31 ++++++++++
init/init_task.c | 10 ++++
kernel/Makefile | 2 +-
kernel/cred.c | 3 +-
kernel/exec_state.c | 119 +++++++++++++++++++++++++++++++++++++++
kernel/exit.c | 1 -
kernel/fork.c | 33 +++++++++--
kernel/kthread.c | 1 -
kernel/ptrace.c | 51 +++++++++++------
kernel/sys.c | 6 +-
mm/init-mm.c | 1 -
23 files changed, 329 insertions(+), 155 deletions(-)
create mode 100644 include/linux/sched/exec_state.h
create mode 100644 kernel/exec_state.c
* [GIT PULL 07/16 for v7.2] kernel misc
From: Christian Brauner @ 2026-06-12 15:13 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Christian Brauner, linux-fsdevel, linux-kernel
Hey Linus,
/* Summary */
Fixes
- rhashtable: give each instance its own lockdep class
syzbot reported a circular locking dependency between ht->mutex and
fs_reclaim via the simple_xattrs rhashtable being torn down during
inode eviction. The predicted deadlock cannot occur:
rhashtable_free_and_destroy() cancels the deferred worker before
taking ht->mutex and acquisitions on distinct rhashtables are on
distinct mutexes. Lockdep flags a cycle anyway because every
ht->mutex in the kernel shared the single static lockdep class from
rhashtable_init_noprof(). The lockdep key is lifted to a
per-call-site static key so every rhashtable instance gets its own
class.
- selftests/clone3: fix misuse of the libcap library interface in the
cap_checkpoint_restore test and remove unused variables
- selftests/pid_namespace: compute the pid_max test limits dynamically
instead of hardcoding values below the kernel-enforced minimum of
PIDS_PER_CPU_MIN * num_possible_cpus() which made the tests fail on
machines with many possible CPUs
- selftests: fix the Makefile TARGETS entry for nsfs which wasn't
adjusted when the tests moved under filesystems/
Cleanups
- ipc/sem.c: use unsigned int for nsops to match the declaration in
syscalls.h
/* Testing */
gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known conflicts.
Merge conflicts with other trees
================================
No known conflicts.
The following changes since commit 254f49634ee16a731174d2ae34bc50bd5f45e731:
Linux 7.1-rc1 (2026-04-26 14:19:00 -0700)
are available in the Git repository at:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/kernel-7.2-rc1.misc
for you to fetch changes up to ee8ab98f831226d69d43ccd93f53c50e6f19b389:
Merge patch series "selftests/clone3: fix cap_checkpoint_restore test" (2026-05-27 14:11:47 +0200)
----------------------------------------------------------------
kernel-7.2-rc1.misc
Please consider pulling these changes from the signed kernel-7.2-rc1.misc tag.
Thanks!
Christian
----------------------------------------------------------------
Bjoern Doebel (1):
selftests/pid_namespace: compute pid_max test limits dynamically
Christian Brauner (2):
rhashtable: give each instance its own lockdep class
Merge patch series "selftests/clone3: fix cap_checkpoint_restore test"
Eva Kurchatova (1):
selftests/clone3: fix libcap interface usage
Florian Schmaus (1):
selftests: Fix Makefile target for nsfs
Konstantin Khorenko (1):
selftests/clone3: remove unused variables
Yi Xie (1):
ipc/sem.c: use unsigned int for nsops
include/linux/rhashtable-types.h | 22 ++-
ipc/sem.c | 6 +-
lib/rhashtable.c | 17 ++-
tools/testing/selftests/Makefile | 2 +-
.../clone3/clone3_cap_checkpoint_restore.c | 24 +---
tools/testing/selftests/pid_namespace/pid_max.c | 156 ++++++++++++++++-----
6 files changed, 161 insertions(+), 66 deletions(-)
* [GIT PULL 08/16 for v7.2] vfs openat2
2026-06-12 15:10 [GIT PULL 00/16 for v7.2] v7.2 Christian Brauner
` (6 preceding siblings ...)
2026-06-12 15:13 ` [GIT PULL 07/16 for v7.2] kernel misc Christian Brauner
@ 2026-06-12 15:13 ` Christian Brauner
2026-06-12 15:14 ` [GIT PULL 09/16 for v7.2] vfs super Christian Brauner
` (7 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Christian Brauner @ 2026-06-12 15:13 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Christian Brauner, linux-fsdevel, linux-kernel
Hey Linus,
/* Summary */
This contains the openat2 changes for this cycle:
Features
- Add O_EMPTYPATH to openat(2)/openat2(2). To get an operable file
descriptor from an O_PATH file descriptor it is possible to use
openat(fd, ".", O_DIRECTORY) for directories, but other file types
require going through open("/proc/<pid>/fd/<nr>") and thus depend on
a functioning procfs. With O_EMPTYPATH an empty path string is
accepted and LOOKUP_EMPTY is set at path resolution time, so the
file behind the file descriptor can be reopened directly. Selftests
are included.
- Add an OPENAT2_REGULAR flag for openat2(2) which refuses to open
anything but regular files with the new EFTYPE error code. This
implements the "ability to only open regular files" feature
requested by userspace via uapi-group.org and protects services
from being redirected to fifos, device nodes, and friends.
All atomic_open implementations were audited for OPENAT2_REGULAR
handling. Explicit checks were added to ceph, gfs2, nfs (v4), and
cifs/smb - these are the filesystems whose atomic_open can encounter
an existing non-regular file and would otherwise call finish_open()
on it or return a misleading error code. The remaining
implementations (9p, fuse, vboxsf, nfs v2/v3) only call
finish_open() on freshly created files and use finish_no_open() for
lookup hits, letting the VFS catch non-regular files via the
do_open() safety net.
Cleanups
- Migrate the openat2 selftests to the kselftest harness and move
them under selftests/filesystems/. The tests were written in the
early days of selftests' TAP support and the modern kselftest
harness is much easier to follow and maintain. The contents of the
tests are unchanged and the new emptypath tests are ported on top.
- Make the LAST_XXX last-type constants private to fs/namei.c. The
only user outside of fs/namei.c was ksmbd which only needs to know
whether the last component is a regular one, so
vfs_path_parent_lookup() now performs the LAST_NORM check
internally. The ints are replaced with a dedicated enum last_type.
/* Testing */
gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known conflicts.
Merge conflicts with other trees
================================
No known conflicts.
The following changes since commit 254f49634ee16a731174d2ae34bc50bd5f45e731:
Linux 7.1-rc1 (2026-04-26 14:19:00 -0700)
are available in the Git repository at:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-7.2-rc1.openat2
for you to fetch changes up to 318643721de396012da102723f337f35ba7ec1e9:
vfs: replace ints with enum last_type for LAST_XXX (2026-05-29 09:47:02 +0200)
----------------------------------------------------------------
vfs-7.2-rc1.openat2
Please consider pulling these changes from the signed vfs-7.2-rc1.openat2 tag.
Thanks!
Christian
----------------------------------------------------------------
Aleksa Sarai (4):
selftests: move openat2 tests to selftests/filesystems/
selftests: openat2: move helpers to header
selftests: openat2: switch from custom ARRAY_LEN to ARRAY_SIZE
selftests: openat2: migrate to kselftest harness
Christian Brauner (4):
Merge patch series "selftests: openat2: migrate to kselftest harness"
Merge patch series "vfs: add O_EMPTYPATH to openat(2)/openat2(2)"
Merge patch series "OPENAT2_REGULAR flag support for openat2"
selftests: openat2: port emptypath_test to kselftest harness
Dorjoy Chowdhury (3):
openat2: introduce EFTYPE error code
openat2: new OPENAT2_REGULAR flag support
kselftest/openat2: test for OPENAT2_REGULAR flag
Jori Koolstra (4):
vfs: add O_EMPTYPATH to openat(2)/openat2(2)
selftest: add tests for O_EMPTYPATH
vfs: make LAST_XXX private to fs/namei.c
vfs: replace ints with enum last_type for LAST_XXX
arch/alpha/include/uapi/asm/errno.h | 2 +
arch/mips/include/uapi/asm/errno.h | 2 +
arch/parisc/include/uapi/asm/errno.h | 2 +
arch/sparc/include/uapi/asm/errno.h | 2 +
fs/ceph/file.c | 4 +
fs/fcntl.c | 4 +-
fs/gfs2/inode.c | 7 +
fs/namei.c | 48 ++-
fs/nfs/dir.c | 4 +
fs/open.c | 41 ++-
fs/smb/client/dir.c | 18 +-
fs/smb/server/vfs.c | 15 +-
include/linux/fcntl.h | 20 +-
include/linux/namei.h | 7 +-
include/uapi/asm-generic/errno.h | 2 +
include/uapi/asm-generic/fcntl.h | 4 +
include/uapi/linux/openat2.h | 7 +
tools/arch/alpha/include/uapi/asm/errno.h | 2 +
tools/arch/mips/include/uapi/asm/errno.h | 2 +
tools/arch/parisc/include/uapi/asm/errno.h | 2 +
tools/arch/sparc/include/uapi/asm/errno.h | 2 +
tools/include/uapi/asm-generic/errno.h | 2 +
tools/include/uapi/linux/openat2.h | 43 +++
.../selftests/{ => filesystems}/openat2/.gitignore | 0
.../selftests/{ => filesystems}/openat2/Makefile | 9 +-
.../selftests/filesystems/openat2/emptypath_test.c | 77 +++++
.../selftests/filesystems/openat2/helpers.h | 135 ++++++++
.../{ => filesystems}/openat2/openat2_test.c | 262 ++++++++-------
.../filesystems/openat2/rename_attack_test.c | 159 +++++++++
.../{ => filesystems}/openat2/resolve_test.c | 368 ++++++++++++---------
tools/testing/selftests/openat2/helpers.c | 109 ------
tools/testing/selftests/openat2/helpers.h | 108 ------
.../testing/selftests/openat2/rename_attack_test.c | 160 ---------
33 files changed, 920 insertions(+), 709 deletions(-)
create mode 100644 tools/include/uapi/linux/openat2.h
rename tools/testing/selftests/{ => filesystems}/openat2/.gitignore (100%)
rename tools/testing/selftests/{ => filesystems}/openat2/Makefile (65%)
create mode 100644 tools/testing/selftests/filesystems/openat2/emptypath_test.c
create mode 100644 tools/testing/selftests/filesystems/openat2/helpers.h
rename tools/testing/selftests/{ => filesystems}/openat2/openat2_test.c (63%)
create mode 100644 tools/testing/selftests/filesystems/openat2/rename_attack_test.c
rename tools/testing/selftests/{ => filesystems}/openat2/resolve_test.c (74%)
delete mode 100644 tools/testing/selftests/openat2/helpers.c
delete mode 100644 tools/testing/selftests/openat2/helpers.h
delete mode 100644 tools/testing/selftests/openat2/rename_attack_test.c
* [GIT PULL 09/16 for v7.2] vfs super
2026-06-12 15:10 [GIT PULL 00/16 for v7.2] v7.2 Christian Brauner
` (7 preceding siblings ...)
2026-06-12 15:13 ` [GIT PULL 08/16 for v7.2] vfs openat2 Christian Brauner
@ 2026-06-12 15:14 ` Christian Brauner
2026-06-12 15:14 ` [GIT PULL 10/16 for v7.2] vfs writeback Christian Brauner
` (6 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Christian Brauner @ 2026-06-12 15:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Christian Brauner, linux-fsdevel, linux-kernel
Hey Linus,
/* Summary */
This retires sget(). CIFS plus the two ext4 KUnit tests (extents-test,
mballoc-test) were the last in-tree callers, and all three convert
cleanly to sget_fc(). That lets sget() and its prototype be removed,
along with ~60 lines that only existed to be kept in lockstep with
sget_fc() on every publish-path change.
/* Testing */
gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known conflicts.
Merge conflicts with other trees
================================
No known conflicts.
The following changes since commit 254f49634ee16a731174d2ae34bc50bd5f45e731:
Linux 7.1-rc1 (2026-04-26 14:19:00 -0700)
are available in the Git repository at:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-7.2-rc1.super
for you to fetch changes up to 2c6f0c248a6b49a6fc8c301c84d367860c56ccd8:
Merge patch series "super: retire sget()" (2026-06-03 09:09:57 +0200)
----------------------------------------------------------------
vfs-7.2-rc1.super
Please consider pulling these changes from the signed vfs-7.2-rc1.super tag.
Thanks!
Christian
----------------------------------------------------------------
Christian Brauner (5):
ext4: convert extents KUnit test to sget_fc()
ext4: convert mballoc KUnit test to sget_fc()
smb: client: convert cifs_smb3_do_mount() to sget_fc()
fs: retire sget()
Merge patch series "super: retire sget()"
fs/btrfs/super.c | 2 +-
fs/ext4/extents-test.c | 22 +++++++++++---
fs/ext4/mballoc-test.c | 17 +++++++++--
fs/smb/client/cifsfs.c | 43 +++++++++++++++++-----------
fs/smb/client/cifsfs.h | 3 +-
fs/smb/client/cifsproto.h | 3 +-
fs/smb/client/connect.c | 5 ++--
fs/smb/client/fs_context.c | 2 +-
fs/super.c | 71 ++++------------------------------------------
include/linux/fs.h | 4 ---
10 files changed, 73 insertions(+), 99 deletions(-)
* [GIT PULL 10/16 for v7.2] vfs writeback
2026-06-12 15:10 [GIT PULL 00/16 for v7.2] v7.2 Christian Brauner
` (8 preceding siblings ...)
2026-06-12 15:14 ` [GIT PULL 09/16 for v7.2] vfs super Christian Brauner
@ 2026-06-12 15:14 ` Christian Brauner
2026-06-12 15:14 ` [GIT PULL 11/16 for v7.2] vfs bh Christian Brauner
` (5 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Christian Brauner @ 2026-06-12 15:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Christian Brauner, linux-fsdevel, linux-kernel
Hey Linus,
/* Summary */
This contains the writeback changes for this cycle:
* Fix a race between cgroup_writeback_umount() and inode_switch_wbs()
When a container exits, a race between cgroup_writeback_umount() and
inode_switch_wbs()/cleanup_offline_cgwb() can trigger "VFS: Busy
inodes after unmount" followed by a use-after-free on percpu
counters. There is a window between inode_prepare_wbs_switch()
returning true (having passed the SB_ACTIVE check and grabbed the
inode) and the subsequent wb_queue_isw() call: if
cgroup_writeback_umount() observes the global isw_nr_in_flight
counter as non-zero but flush_workqueue() finds nothing queued yet,
it returns early - leaving a held inode reference that blocks
evict_inodes() and a later iput() that hits freed percpu counters.
The race is closed by covering the window from
inode_prepare_wbs_switch() through wb_queue_isw() with an RCU
read-side critical section and synchronizing in the umount path. On
top of that the now-dead rcu_barrier() left over from the
queue_rcu_work() era is removed, and the global
synchronize_rcu()/flush_workqueue() pair is replaced with a per-sb
in-flight counter plus pin/unpin/drain helpers so umount no longer
serializes against switch activity on unrelated superblocks.
Under cgroup writeback churn on a 16 vCPU guest this takes umount
latency from ~92-138ms p50 down to ~5-8ms p50 and the cumulative
cost of cgroup_writeback_umount() from ~62ms to ~4us per call. The
initial race fix is kept separate and minimal so it backports
cleanly to stable trees that still queue switches via
queue_rcu_work().
* Improve write performance with RWF_DONTCACHE
Dirty DONTCACHE pages are now tracked per bdi_writeback so that the
writeback flusher can be kicked in a targeted fashion for
IOCB_DONTCACHE writes instead of relying on global writeback, and
the PG_dropbehind flag is preserved when a folio is split.
/* Testing */
gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known conflicts.
Merge conflicts with other trees
================================
No known conflicts.
The following changes since commit 254f49634ee16a731174d2ae34bc50bd5f45e731:
Linux 7.1-rc1 (2026-04-26 14:19:00 -0700)
are available in the Git repository at:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-7.2-rc1.writeback
for you to fetch changes up to 0275dc184aa007b260374af6d46fb15741c062a8:
Merge patch series "mm: improve write performance with RWF_DONTCACHE" (2026-06-04 10:18:25 +0200)
----------------------------------------------------------------
vfs-7.2-rc1.writeback
Please consider pulling these changes from the signed vfs-7.2-rc1.writeback tag.
Thanks!
Christian
----------------------------------------------------------------
Baokun Li (3):
writeback: fix race between cgroup_writeback_umount() and inode_switch_wbs()
writeback: drop now-unnecessary rcu_barrier() in cgroup_writeback_umount()
writeback: use a per-sb counter to drain inode wb switches at umount
Christian Brauner (2):
Merge patch series "writeback: fix race between cgroup_writeback_umount() and inode_switch_wbs()"
Merge patch series "mm: improve write performance with RWF_DONTCACHE"
Jeff Layton (3):
mm: preserve PG_dropbehind flag during folio split
mm: track DONTCACHE dirty pages per bdi_writeback
mm: kick writeback flusher for IOCB_DONTCACHE with targeted dirty tracking
fs/fs-writeback.c | 138 +++++++++++++++++++++++++++++++--------
include/linux/backing-dev-defs.h | 3 +
include/linux/fs.h | 6 +-
include/linux/fs/super_types.h | 8 +++
include/trace/events/writeback.h | 3 +-
mm/filemap.c | 15 ++++-
mm/huge_memory.c | 1 +
mm/page-writeback.c | 6 ++
8 files changed, 147 insertions(+), 33 deletions(-)
* [GIT PULL 11/16 for v7.2] vfs bh
2026-06-12 15:10 [GIT PULL 00/16 for v7.2] v7.2 Christian Brauner
` (9 preceding siblings ...)
2026-06-12 15:14 ` [GIT PULL 10/16 for v7.2] vfs writeback Christian Brauner
@ 2026-06-12 15:14 ` Christian Brauner
2026-06-12 15:15 ` [GIT PULL 12/16 for v7.2] vfs eventpoll Christian Brauner
` (4 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Christian Brauner @ 2026-06-12 15:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Christian Brauner, linux-fsdevel, linux-kernel
Hey Linus,
/* Summary */
This removes b_end_io from struct buffer_head.
Instead of setting bio->bi_end_io to end_bio_bh_io_sync() which then
calls bh->b_end_io(), the new bh_submit() and __bh_submit() interfaces
set bio->bi_end_io to the appropriate completion handler directly,
replacing two indirect function calls in the completion path with one.
It also removes a function pointer from the middle of a writable data
structure, where it could be corrupted; it shrinks struct buffer_head
from 104 to 96 bytes, allowing roughly 7% more buffer_heads to be
cached in the same amount of memory; and it removes some atomic
operations as the buffer refcount is no longer incremented before
calling the end_io handler.
All in-tree users (fs/buffer.c itself, ext4, jbd2, ocfs2, gfs2,
nilfs2, and md-bitmap) are converted, and submit_bh(),
mark_buffer_async_write(), and end_buffer_write_sync() are removed.
/* Testing */
gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known conflicts.
Merge conflicts with other trees
================================
No known conflicts.
The following changes since commit 254f49634ee16a731174d2ae34bc50bd5f45e731:
Linux 7.1-rc1 (2026-04-26 14:19:00 -0700)
are available in the Git repository at:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-7.2-rc1.bh
for you to fetch changes up to f0d857543e4d37464759c338f46ad6c85a618a2e:
Merge patch series "Remove b_end_io from struct buffer_head" (2026-06-04 10:28:17 +0200)
----------------------------------------------------------------
vfs-7.2-rc1.bh
Please consider pulling these changes from the signed vfs-7.2-rc1.bh tag.
Thanks!
Christian
----------------------------------------------------------------
Christian Brauner (1):
Merge patch series "Remove b_end_io from struct buffer_head"
Matthew Wilcox (Oracle) (34):
buffer: Remove forward declaration of submit_bh_wbc()
buffer: Add bh_submit()
buffer: Remove mark_buffer_async_write_endio()
buffer: Add bh_end_read(), bh_end_write() and bh_end_async_write()
buffer: Convert write_dirty_buffer to bh_submit()
buffer: Convert __bread_slow to bh_submit()
buffer: Convert __sync_dirty_buffer to bh_submit()
buffer: Convert __bh_read to bh_submit()
buffer: Convert __bh_read_batch to bh_submit()
buffer: Convert block_read_full_folio to bh_submit()
buffer: Convert __block_write_full_folio to __bh_submit()
ext4; Convert __ext4_read_bh() to bh_submit()
ext4: Convert ext4_fc_submit_bh() to bh_submit()
ext4: Convert write_mmp_block_thawed() to bh_submit()
ext4: Convert ext4_commit_super() to bh_submit()
jbd2: Convert journal commit to bh_submit()
jbd2: Convert jbd2_write_superblock() to bh_submit()
ocfs2: Convert ocfs2_write_block to bh_submit()
ocfs2: Convert ocfs2_read_block to bh_submit()
ocfs2: Convert ocfs2_read_blocks to bh_submit()
ocfs2: Convert ocfs2_write_super_or_backup to bh_submit()
gfs2: Convert gfs2_metapath_ra to bh_submit()
gfs2: Convert gfs2_dir_readahead to bh_submit()
gfs2: Remove use of b_end_io in gfs2_meta_read_endio()
gfs2: Convert gfs2_aspace_write_folio to bh_submit()
buffer: Remove mark_buffer_async_write()
nilfs2: Convert nilfs_btnode_submit_block to bh_submit()
nilfs2: Convert nilfs_gccache_submit_read_data to bh_submit()
nilfs2: Convert nilfs_mdt_submit_block to bh_submit()
md-bitmap: Convert read_file_page and write_file_page to bh_submit()
buffer: Remove submit_bh()
buffer: Remove b_end_io
buffer: Change calling convention for end_buffer_read_sync()
buffer: Remove end_buffer_write_sync()
Documentation/filesystems/locking.rst | 14 --
Documentation/trace/ftrace.rst | 4 +-
drivers/md/md-bitmap.c | 27 +--
drivers/md/raid5.h | 6 +-
fs/buffer.c | 385 ++++++++++++++++++----------------
fs/ext4/ext4.h | 10 +-
fs/ext4/fast_commit.c | 8 +-
fs/ext4/ialloc.c | 6 +-
fs/ext4/mmp.c | 5 +-
fs/ext4/super.c | 18 +-
fs/gfs2/bmap.c | 13 +-
fs/gfs2/dir.c | 12 +-
fs/gfs2/meta_io.c | 13 +-
fs/jbd2/commit.c | 13 +-
fs/jbd2/journal.c | 4 +-
fs/nilfs2/btnode.c | 4 +-
fs/nilfs2/gcinode.c | 4 +-
fs/nilfs2/mdt.c | 4 +-
fs/ocfs2/buffer_head_io.c | 16 +-
include/linux/buffer_head.h | 16 +-
mm/vmscan.c | 2 +-
21 files changed, 288 insertions(+), 296 deletions(-)
* [GIT PULL 12/16 for v7.2] vfs eventpoll
2026-06-12 15:10 [GIT PULL 00/16 for v7.2] v7.2 Christian Brauner
` (10 preceding siblings ...)
2026-06-12 15:14 ` [GIT PULL 11/16 for v7.2] vfs bh Christian Brauner
@ 2026-06-12 15:15 ` Christian Brauner
2026-06-12 15:15 ` [GIT PULL 13/16 for v7.2] vfs iomap Christian Brauner
` (3 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Christian Brauner @ 2026-06-12 15:15 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Christian Brauner, linux-fsdevel, linux-kernel
Hey Linus,
/* Summary */
This contains the eventpoll changes for this cycle:
* eventpoll clarity refactor
The recent eventpoll UAF fixes (a6dc643c6931 and follow-ups) rode on
invariants in fs/eventpoll.c that were nowhere documented and had to
be reverse-engineered from the code: the lifetime relationships
between struct eventpoll, struct epitem, and struct file, the three
removal paths coordinating via epi_fget() pins and ep->mtx, the
ovflist sentinel-encoded scan state machine, the POLLFREE
release/acquire handshake, and the loop / path check globals
serialized by epnested_mutex. The fixes were correct but the next
person to touch this code would hit the same learning curve.
This series codifies those invariants in source and tightens the
surrounding structure. No functional changes intended:
- Documentation: a top-of-file overview with field-protection
tables for struct eventpoll and struct epitem, a section
gathering the loop-check / path-check globals next to their
declarations, labelled comments on the two sides of the POLLFREE
handshake, refreshed comments on epi_fget() and ep_remove_file(),
and a docblock on ep_clear_and_put() that names its two-pass
structure as load-bearing.
- Mechanical renames: ep_refcount_dec_and_test() -> ep_put() to
pair with ep_get(), attach_epitem() -> ep_attach_file() for
ep_remove_file() symmetry, the unused depth argument dropped from
epoll_mutex_lock(), and the CONFIG_KCMP block relocated next to
CONFIG_COMPAT so the hot-path code is contiguous.
- Helper extraction: ep_insert() splits into ep_alloc_epitem() and
ep_register_epitem(), ep_clear_and_put()'s two passes become
ep_drain_pollwaits() and ep_drain_tree() so the ordering
invariant is enforced by the call sequence rather than
convention, the per-event delivery loop body becomes
ep_deliver_event(), and the ep->mtx + epnested_mutex acquisition
dance lifts out of do_epoll_ctl() into ep_ctl_lock() /
ep_ctl_unlock().
- Sentinel and predicate cleanup: the EP_UNACTIVE_PTR overload is
hidden behind named helpers (ep_is_scanning, epi_on_ovflist,
...), epi->next is renamed to epi->ovflist_next, and the boolean
predicates return bool.
- The per-CTL_ADD scratch state (tfile_check_list, path_count[],
inserting_into) moves from file-scope globals into a
stack-allocated struct ep_ctl_ctx plumbed through the loop / path
check chain.
Two follow-up fixes are included: missing kernel-doc for the new
@ctx parameters, and restoring the EP_UNACTIVE_PTR sentinel for
ctx->tfile_check_list - replacing it with NULL termination broke
ep_remove_file()'s "never listed" check for the list tail, causing
a syzbot-reported use-after-free.
* io_uring related epoll cleanups
One of the nastier things about epoll is how it allows nesting
contexts inside each other, leading to the necessity of loop
detection and the issues that have come with that. There is no
reason to support nesting on the io_uring side, so contain the
damage and disallow nested contexts from there: eventpoll gains a
file based control interface and struct epoll_filefd is renamed to
epoll_key. The io_uring side proper goes on top of this through the
block tree.
* Fix epoll_wait() reporting false negatives
ep_events_available() checks ep->rdllist and ep_is_scanning()
without a lock and can race with a concurrent scan such that
neither check sees the events, causing epoll_wait() with a zero
timeout to wrongly report no events even though events are
available. A sequence lock closes the race and a reproducer is
added to the eventpoll selftests.
/* Testing */
gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known conflicts.
Merge conflicts with other trees
================================
No known conflicts.
The following changes since commit 254f49634ee16a731174d2ae34bc50bd5f45e731:
Linux 7.1-rc1 (2026-04-26 14:19:00 -0700)
are available in the Git repository at:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-7.2-rc1.eventpoll
for you to fetch changes up to a1e9718b406bc4e6d0c63c7b999d06febbdc4091:
eventpoll: restore EP_UNACTIVE_PTR sentinel for ctx->tfile_check_list (2026-06-04 13:53:50 +0200)
----------------------------------------------------------------
vfs-7.2-rc1.eventpoll
Please consider pulling these changes from the signed vfs-7.2-rc1.eventpoll tag.
Thanks!
Christian
----------------------------------------------------------------
Christian Brauner (20):
eventpoll: expand top-of-file overview / locking doc
eventpoll: document loop-check / path-check globals
eventpoll: clarify POLLFREE handshake comments
eventpoll: refresh epi_fget() / ep_remove_file() comments
eventpoll: document ep_clear_and_put() two-pass pattern
eventpoll: rename ep_refcount_dec_and_test() to ep_put()
eventpoll: drop unused depth argument from epoll_mutex_lock()
eventpoll: rename attach_epitem() to ep_attach_file()
eventpoll: relocate KCMP helpers near compat syscalls
eventpoll: split ep_insert() into alloc + register stages
eventpoll: split ep_clear_and_put() into drain helpers
eventpoll: extract ep_deliver_event() from ep_send_events()
eventpoll: extract lock dance from do_epoll_ctl() into ep_ctl_lock()
eventpoll: wrap EP_UNACTIVE_PTR in typed sentinel helpers
eventpoll: rename epi->next and txlist for clarity
eventpoll: use bool for predicate helpers
eventpoll: hoist CTL_ADD scratch state into struct ep_ctl_ctx
Merge patch series "eventpoll: clarity refactor"
Merge patch series "io_uring related epoll cleanups"
Merge patch series "eventpoll: Fix epoll_wait() report false negative"
Jens Axboe (4):
eventpoll: pass struct epoll_filefd through ep_find() and ep_insert()
eventpoll: export is_file_epoll()
eventpoll: add file based control interface
eventpoll: rename struct epoll_filefd to epoll_key
Nam Cao (2):
selftests/eventpoll: Add test for multiple waiters
eventpoll: Fix epoll_wait() report false negative
Randy Dunlap (1):
eventpoll: add missing kernel-doc for @ctx function parameters
Zhan Wei (1):
eventpoll: restore EP_UNACTIVE_PTR sentinel for ctx->tfile_check_list
fs/eventpoll.c | 1275 +++++++++++++-------
include/linux/eventpoll.h | 8 +
.../filesystems/epoll/epoll_wakeup_test.c | 45 +
3 files changed, 886 insertions(+), 442 deletions(-)
* [GIT PULL 13/16 for v7.2] vfs iomap
2026-06-12 15:10 [GIT PULL 00/16 for v7.2] v7.2 Christian Brauner
` (11 preceding siblings ...)
2026-06-12 15:15 ` [GIT PULL 12/16 for v7.2] vfs eventpoll Christian Brauner
@ 2026-06-12 15:15 ` Christian Brauner
2026-06-12 15:15 ` [GIT PULL 14/16 for v7.2] vfs xattr Christian Brauner
` (2 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Christian Brauner @ 2026-06-12 15:15 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Christian Brauner, linux-fsdevel, linux-kernel
Hey Linus,
/* Summary */
This contains the iomap changes for this cycle:
* Add the vfs infrastructure required to implement fs-verity support
for XFS with a post-EOF merkle tree: fsverity generates and stores a
zero-block hash, and iomap learns to verify data on buffered reads,
to handle fsverity during writeback via the new IOMAP_F_FSVERITY
flag, and to write fsverity metadata through iomap_fsverity_write().
* Skip the memset of the iomap in iomap_iter() once the iteration is
done. In high-IOPS scenarios (4k randread NVMe polling via io_uring)
the pointless memset wasted memory write bandwidth; this improves
IOPS by about 5% on ext4 and xfs.
* Add balance_dirty_pages_ratelimited() to iomap_zero_iter(), aligning
it with iomap_write_iter(). This prepares for the exFAT iomap
conversion where zeroing beyond valid_size can trigger large-scale
zeroing operations that caused memory pressure without throttling.
* Remove the over-strict inline data boundary check. If a filesystem
provides a valid inline_data pointer and length there is no reason
to require that inline data must not cross a page boundary.
* Don't make REQ_POLLED imply REQ_NOWAIT, matching the earlier
equivalent block layer fix: there are valid cases to poll for I/O
completion without REQ_NOWAIT, and REQ_NOWAIT for file system writes
is currently not supported as writes aren't idempotent.
* Introduce IOMAP_F_ZERO_TAIL for filesystems that maintain a separate
valid data length (exFAT, NTFS). For a write starting at or beyond
valid_size, __iomap_write_begin() now zeroes only the tail portion
of the block while preserving valid data before it, instead of
leaving stale data in the page cache. The flag is also added to the
iomap trace event strings.
/* Testing */
gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known conflicts.
Merge conflicts with other trees
================================
The exfat tree carries the same "iomap: introduce IOMAP_F_ZERO_TAIL
flag" patch as this tree as a dependency for the exFAT iomap
conversion, so the merge gets a conflict in include/linux/iomap.h
together with the IOMAP_F_FSVERITY additions from this tree. Reported
in [1]. It can be resolved as follows:
[1]: https://lore.kernel.org/linux-next/aiKrepiU3-L6KRqJ@sirena.org.uk
diff --combined include/linux/iomap.h
index cea6bbc97b6ef,3582ed1fe2361..0000000000000
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@@ -91,6 -91,14 +91,14 @@@ struct vm_fault
#endif /* CONFIG_BLK_DEV_INTEGRITY */
#define IOMAP_F_ZERO_TAIL (1U << 10)
+ /*
+ * Indicates reads and writes of fsverity metadata.
+ *
+ * Fsverity metadata is stored after the regular file data and thus beyond
+ * i_size.
+ */
+ #define IOMAP_F_FSVERITY (1U << 11)
+
/*
* Flag reserved for file system specific usage
*/
@@@ -345,6 -353,9 +353,9 @@@ static inline bool iomap_want_unshare_i
ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
const struct iomap_ops *ops,
const struct iomap_write_ops *write_ops, void *private);
+ int iomap_fsverity_write(struct file *file, loff_t pos, size_t length,
+ const void *buf, const struct iomap_ops *ops,
+ const struct iomap_write_ops *write_ops);
void iomap_read_folio(const struct iomap_ops *ops,
struct iomap_read_folio_ctx *ctx, void *private);
void iomap_readahead(const struct iomap_ops *ops,
@@@ -421,6 -432,7 +432,7 @@@ struct iomap_ioend
loff_t io_offset; /* offset in the file */
sector_t io_sector; /* start sector of ioend */
void *io_private; /* file system private data */
+ struct fsverity_info *io_vi; /* fsverity info */
struct bio io_bio; /* MUST BE LAST! */
};
@@@ -495,6 -507,7 +507,7 @@@ struct iomap_read_folio_ctx
struct readahead_control *rac;
void *read_ctx;
loff_t read_ctx_file_offset;
+ struct fsverity_info *vi;
};
The following changes since commit 254f49634ee16a731174d2ae34bc50bd5f45e731:
Linux 7.1-rc1 (2026-04-26 14:19:00 -0700)
are available in the Git repository at:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-7.2-rc1.iomap
for you to fetch changes up to b7bae6880e8de2a5f693c18d87ad5cc26f157eb2:
iomap: Add IOMAP_F_ZERO_TAIL flag to trace event strings (2026-06-05 13:36:42 +0200)
----------------------------------------------------------------
vfs-7.2-rc1.iomap
Please consider pulling these changes from the signed vfs-7.2-rc1.iomap tag.
Thanks!
Christian
----------------------------------------------------------------
Andrey Albershteyn (4):
fsverity: generate and store zero-block hash
iomap: introduce IOMAP_F_FSVERITY and teach writeback to handle fsverity
iomap: teach iomap to read files with fsverity
iomap: introduce iomap_fsverity_write() for writing fsverity metadata
Chi Zhiling (1):
iomap: add dirty page control to iomap_zero_iter
Christian Brauner (1):
Merge patch series "vfs infrastructure for fs-verity support for XFS with post EOF merkle tree"
Christoph Hellwig (1):
iomap: don't make REQ_POLLED imply REQ_NOWAIT
Fengnan Chang (1):
iomap: avoid memset iomap when iter is done
Namjae Jeon (3):
iomap: remove over-strict inline data boundary check
iomap: introduce IOMAP_F_ZERO_TAIL flag
iomap: Add IOMAP_F_ZERO_TAIL flag to trace event strings
fs/iomap/buffered-io.c | 116 ++++++++++++++++++++++++++++++++++++++-----
fs/iomap/direct-io.c | 5 +-
fs/iomap/ioend.c | 1 +
fs/iomap/iter.c | 12 ++---
fs/iomap/trace.h | 4 +-
fs/verity/fsverity_private.h | 3 ++
fs/verity/measure.c | 4 +-
fs/verity/open.c | 3 ++
fs/verity/pagecache.c | 22 ++++++++
include/linux/bio.h | 14 ------
include/linux/fsverity.h | 8 +++
include/linux/iomap.h | 27 ++++++----
12 files changed, 170 insertions(+), 49 deletions(-)
* [GIT PULL 14/16 for v7.2] vfs xattr
From: Christian Brauner @ 2026-06-12 15:15 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Christian Brauner, linux-fsdevel, linux-kernel
Hey Linus,
/* Summary */
This reworks the simple xattr api to make it more efficient and easier
to use for all consumers.
The simple_xattr hash table moves from the inode into a per-superblock
cache, removing the per-inode overhead for the common case of few or
no xattrs. The interface now passes struct simple_xattrs ** so lazy
allocation is handled internally instead of by every caller, kernfs
xattr operations on kernfs nodes shared between multiple superblocks
are properly serialized, and tmpfs constructs "security.foo" xattr
names with kasprintf() instead of kmalloc() plus two memcpy()s.
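
The difference between the two ways of building a "security.foo" name can be
sketched in userspace C. This is an illustration only: name_with_memcpy() and
name_with_format() are hypothetical helpers, malloc() stands in for kmalloc(),
and the two-pass snprintf() stands in for kasprintf(), which is not available
outside the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SECURITY_PREFIX "security."

/* Old pattern: compute the size by hand, allocate, then copy twice. */
static char *name_with_memcpy(const char *suffix)
{
	size_t plen = strlen(SECURITY_PREFIX);
	size_t slen = strlen(suffix);
	char *name = malloc(plen + slen + 1);

	if (!name)
		return NULL;
	memcpy(name, SECURITY_PREFIX, plen);
	memcpy(name + plen, suffix, slen + 1);	/* copies the NUL as well */
	return name;
}

/* New pattern: one formatted allocation, modeled with snprintf(). */
static char *name_with_format(const char *suffix)
{
	int len = snprintf(NULL, 0, "%s%s", SECURITY_PREFIX, suffix);
	char *name;

	assert(len >= 0);
	name = malloc((size_t)len + 1);
	if (!name)
		return NULL;
	snprintf(name, (size_t)len + 1, "%s%s", SECURITY_PREFIX, suffix);
	return name;
}
```

Both helpers return the same string; the formatted variant simply removes the
manual length arithmetic that is easy to get wrong.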
A follow-up fix links kernfs nodes to their parent before the LSM init
hook runs: with the per-sb cache kernfs_xattr_set() computes the cache
via kernfs_root(kn), which faulted on a freshly allocated node when
selinux_kernfs_init_security() called into it - reproducible as a NULL
pointer dereference on the first cgroup mkdir on SELinux-enabled
systems.
On top of this bpffs gains support for trusted.* and security.* xattrs
so that user space and BPF LSM programs can attach metadata - for
example a content hash or a security label - to pinned objects and
directories and inspect it uniformly like on other filesystems. The
store is in-memory and non-persistent, living only for the lifetime of
the mount like everything else in bpffs.
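
From user space, attaching and reading such metadata would presumably go
through the regular xattr system calls. The following is a minimal sketch
under that assumption: tag_object() and read_tag() are hypothetical wrappers,
and the path and attribute name used with them are examples, not names from
the series. Note that trusted.* attributes generally require CAP_SYS_ADMIN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Attach a value to a pinned object; returns 0 or a negative errno. */
static int tag_object(const char *path, const char *name,
		      const void *value, size_t len)
{
	assert(path != NULL && name != NULL);

	if (setxattr(path, name, value, len, 0) < 0)
		return -errno;
	return 0;
}

/* Read a value back; returns its length or a negative errno. */
static ssize_t read_tag(const char *path, const char *name,
			void *buf, size_t len)
{
	ssize_t n;

	assert(path != NULL && name != NULL);

	n = getxattr(path, name, buf, len);
	return n < 0 ? -errno : n;
}
```

A caller might use tag_object("/sys/fs/bpf/my_prog", "trusted.content_hash",
hash, hash_len), where the pin path and the attribute name are made up for
this example. Since the store is in-memory, the attribute would be gone after
the mount is torn down.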
/* Testing */
gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known conflicts.
Merge conflicts with other trees
================================
This has merge conflicts with the bpf-next tree in kernel/bpf/inode.c
between commit 9722955b54307 ("bpf: Add simple xattr support to
bpffs") from this tree and commit b93c55b4932dd ("bpf: fix UAF by
restoring RCU-delayed inode freeing in bpffs") from the bpf-next tree,
and in include/linux/bpf.h. Reported in [1] and [2]; Daniel confirmed
the resolution in [3]. They can be resolved as follows:
[1]: https://lore.kernel.org/linux-next/aiF2rsdpUb5LuhmZ@sirena.org.uk
[2]: https://lore.kernel.org/linux-next/aiamrLm8DnCP6dbw@sirena.org.uk
[3]: https://lore.kernel.org/linux-next/8906796e-0542-46d2-bb92-9e49642d86dc@iogearbox.net
diff --cc kernel/bpf/inode.c
index c3f79b5a2f8c0,188c774a469ca..0000000000000
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@@ -842,9 -768,12 +842,13 @@@ static void bpf_destroy_inode(struct in
if (!bpf_inode_type(inode, &type))
bpf_any_put(inode->i_private, type);
+ simple_xattrs_free(&opts->xa_cache, &bi->xattrs, NULL);
}
+ /*
+ * Called after RCU grace period - safe to free inode and anything
+ * that might be accessed by RCU pathwalk (inode fields, i_link).
+ */
static void bpf_free_inode(struct inode *inode)
{
if (S_ISLNK(inode->i_mode))
diff --cc include/linux/bpf.h
index 64efc3fdb7163,62bba7a4876f5..0000000000000
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@@ -31,7 -32,8 +32,9 @@@
#include <linux/static_call.h>
#include <linux/memcontrol.h>
#include <linux/cfi.h>
+ #include <linux/key.h>
+ #include <linux/ftrace.h>
+#include <linux/xattr.h>
#include <asm/rqspinlock.h>
struct bpf_verifier_env;
The following changes since commit 254f49634ee16a731174d2ae34bc50bd5f45e731:
Linux 7.1-rc1 (2026-04-26 14:19:00 -0700)
are available in the Git repository at:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-7.2-rc1.xattr
for you to fetch changes up to 9722955b54307e9070994f2382ec06af3d7405e0:
bpf: Add simple xattr support to bpffs (2026-06-06 15:22:44 +0200)
----------------------------------------------------------------
vfs-7.2-rc1.xattr
Please consider pulling these changes from the signed vfs-7.2-rc1.xattr tag.
Thanks!
Christian
----------------------------------------------------------------
Christian Brauner (2):
Merge patch series "Rework simple xattrs"
kernfs: link kn to its parent before the LSM init hook
Daniel Borkmann (1):
bpf: Add simple xattr support to bpffs
Miklos Szeredi (4):
kernfs: fix xattr race condition with multiple superblocks
tmpfs: simplify constructing "security.foo" xattr names
simple_xattr: change interface to pass struct simple_xattrs **
simpe_xattr: use per-sb cache
fs/kernfs/dir.c | 22 ++--
fs/kernfs/file.c | 13 +--
fs/kernfs/inode.c | 36 +++---
fs/kernfs/kernfs-internal.h | 24 +++-
fs/kernfs/mount.c | 2 +-
fs/pidfs.c | 45 ++-----
fs/xattr.c | 278 ++++++++++++++++++++++++++------------------
include/linux/bpf.h | 3 +
include/linux/kernfs.h | 11 +-
include/linux/shmem_fs.h | 3 +-
include/linux/xattr.h | 39 ++++---
kernel/bpf/inode.c | 256 +++++++++++++++++++++++++++++++++++++---
mm/shmem.c | 50 +++-----
net/socket.c | 30 ++---
14 files changed, 526 insertions(+), 286 deletions(-)
* [GIT PULL 15/16 for v7.2] vfs misc
From: Christian Brauner @ 2026-06-12 15:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Christian Brauner, linux-fsdevel, linux-kernel
Hey Linus,
/* Summary */
Features
- Reduce pipe->mutex contention by pre-allocating pages outside the
lock in anon_pipe_write().
anon_pipe_write() called alloc_page() once per page while holding
pipe->mutex. The allocation can sleep doing direct reclaim and runs
memcg charging, which extends the critical section and stalls any
concurrent reader on the same mutex. Now up to 8 pages are
pre-allocated before the mutex is taken, leftovers are recycled into
the per-pipe tmp_page[] cache before unlock, and any remainder is
released after unlock, keeping the allocator out of the critical
section on both sides. On a writers x readers sweep with 64 KB writes
against a 1 MB pipe, throughput improves 6-28% and average write
latency drops 5-22%; under memory pressure - when the cost of
holding the mutex across reclaim is highest - throughput improves
21-48% and latency drops 17-33%. The microbenchmark is added to
selftests.
- uaccess/sockptr: fix the ignored_trailing logic in
copy_struct_to_user() to behave as documented and the usize check in
copy_struct_from_sockptr() for user pointers, and add
copy_struct_{from,to}_bounce_buffer() and copy_struct_to_sockptr()
helpers for upcoming users (IPPROTO_SMBDIRECT, IPPROTO_QUIC).
- bpf: add a sleepable bpf_real_inode() kfunc that resolves the real
inode backing a dentry via d_real_inode(). On overlayfs the inode
attached to the dentry doesn't carry the underlying device
information; this is used by the filesystem restriction BPF program
that was merged into systemd.
- docs: add guidelines for submitting new filesystems, motivated by
the maintenance burden that abandoned and untestable filesystems
impose on VFS developers, which blocks infrastructure work like
folio conversions and iomap migration.
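
The allocation pattern described for anon_pipe_write() above can be sketched
as a userspace model. Everything here is an assumption made for illustration:
struct model_pipe, its ring and the two-slot cache are stand-ins, malloc() and
free() replace the page allocator, and a pthread mutex replaces pipe->mutex.
The model handles at most PREALLOC_MAX pages per call and does not copy data.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define PREALLOC_MAX 8		/* the text above mentions up to 8 pages */
#define CACHE_SLOTS  2		/* assumed cache size for this model */
#define RING_SLOTS   16		/* assumed pipe capacity for this model */
#define PAGE_BYTES   4096

struct model_pipe {
	pthread_mutex_t lock;
	void *ring[RING_SLOTS];		/* pages holding queued data */
	size_t used;			/* occupied ring slots */
	void *cache[CACHE_SLOTS];	/* stand-in for tmp_page[] */
};

static void model_pipe_init(struct model_pipe *p)
{
	size_t i;

	pthread_mutex_init(&p->lock, NULL);
	p->used = 0;
	for (i = 0; i < RING_SLOTS; i++)
		p->ring[i] = NULL;
	for (i = 0; i < CACHE_SLOTS; i++)
		p->cache[i] = NULL;
}

/* Queue up to npages pages; returns how many were queued. */
static size_t model_pipe_write(struct model_pipe *p, size_t npages)
{
	void *pre[PREALLOC_MAX];
	size_t have = 0, next = 0, queued = 0, i;

	assert(p != NULL);

	/* Step 1: allocate before taking the lock; this may be slow. */
	while (have < npages && have < PREALLOC_MAX) {
		pre[have] = malloc(PAGE_BYTES);
		if (!pre[have])
			break;
		have++;
	}

	pthread_mutex_lock(&p->lock);

	/* Step 2: consume preallocated pages while the pipe has room. */
	while (next < have && p->used < RING_SLOTS) {
		p->ring[p->used++] = pre[next++];
		queued++;
	}

	/* Step 3: recycle leftovers into empty cache slots before unlock. */
	for (i = 0; i < CACHE_SLOTS && next < have; i++) {
		if (!p->cache[i])
			p->cache[i] = pre[next++];
	}

	pthread_mutex_unlock(&p->lock);

	/* Step 4: release the remainder, again outside the lock. */
	while (next < have)
		free(pre[next++]);

	return queued;
}
```

The point of the shape is that neither malloc() nor free() runs between lock
and unlock, so a concurrent reader never waits on the allocator.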
Fixes
- libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()
and drop the now-redundant assignments in callers. This began as a
one-line dma-buf fix for a path_noexec() warning; a pseudo
filesystem has no reason not to set SB_I_NOEXEC. All init_pseudo()
callers were audited: the only visible effect is on dma-buf where
SB_I_NOEXEC silences the warning.
- Handle set_blocksize() failures in legacy filesystems (bfs, hpfs,
qnx4, jfs, befs, affs, isofs, minix, ntfs3, omfs). Mounting a device
with a sector size > PAGE_SIZE crashed roughly half of them; the
rest had the same missing error handling pattern. Plus a follow-up
releasing the superblock buffer_head when setting the minix v3 block
size fails.
- mount: honour SB_NOUSER in the new mount API.
- fs/fcntl: fix a SOFTIRQ-unsafe lock order in fasync signaling by
switching the process-group paths of send_sigio() and send_sigurg()
from read_lock(&tasklist_lock) to RCU, matching the single-PID path.
- vfs: add an FS_USERNS_DELEGATABLE flag and set it for NFS, fixing
delegated NFS mounts (fsopen() in a container with the mount
performed by a privileged daemon) that broke when non-init s_user_ns
was tied to FS_USERNS_MOUNT.
- selftests/namespaces: fix a hang in nsid_test where an unreaped
grandchild kept the TAP pipe write-end open, a waitpid(-1) race in
listns_efault_test, and a false FAIL on kernels without listns()
where the tests should SKIP.
- filelock: fix the break_lease() stub signature for
CONFIG_FILE_LOCKING=n.
- init/initramfs_test: wait for the async initramfs unpacking before
running; the test and do_populate_rootfs() share the parser state.
- fs/coredump: reduce redundant log noise in
validate_coredump_safety().
- iomap: pass the correct length to fserror_report_io() in
__iomap_write_begin().
- backing-file: fix the backing_file_open() kerneldoc.
Cleanups
- initramfs: refactor the cpio hex header parsing to use hex2bin()
instead of the hand-rolled simple_strntoul(), which is reverted as
a result, and extend the initramfs KUnit tests to cover header
fields with 0x prefixes.
- Replace __get_free_pages() and friends with kmalloc()/kzalloc()
across quota, proc, ocfs2/dlm, nilfs2, nfs, nfsd, libfs, jfs, jbd2,
isofs, fuse, select, namespace, configfs, binfmt_misc, bfs, and the
do_mounts init code - part of the larger work of replacing page
allocator calls with kmalloc().
- Use clear_and_wake_up_bit() in unlock_buffer() and
journal_end_buffer_io_sync() instead of open-coding the sequence.
- Drop unused VFS exports: unexport drop_super_exclusive(), remove
start_removing_user_path_at(), and fold __start_removing_path() into
start_removing_path().
- fs/read_write: narrow the __kernel_write() export with
EXPORT_SYMBOL_FOR_MODULES().
- vfs: uapi: retire octal and hex constants in favor of (1 << n) for
the O_ flags. Finding a free bit for a new flag across the
architectures was needlessly hard with the mixed bases.
- dcache: add extra sanity checks of dead dentries in dentry_free()
via a new DENTRY_WARN_ONCE() that also prints d_flags.
- iov_iter: use kmemdup_array() in dup_iter() to harden the allocation
against multiplication overflow.
- fs/pipe: write to ->poll_usage only once.
- vfs: remove an always-taken if-branch in find_next_fd().
- dcache: use kmalloc_flex() for struct external_name in __d_alloc().
- namei: use QSTR() instead of QSTR_INIT() in path_pts().
- sync_file_range: delete dead S_ISLNK code.
- Comment fixes: retire a stale comment in fget_task_next() and fix
assorted spelling mistakes.
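
The O_ flag notation change in the list above can be illustrated with a few
of the generic values. This is a sketch, not the patch itself: the OLD_ and
NEW_ prefixes and lowest_free_bit() are names invented for this example, and
only the generic (not the per-architecture) values are shown.

```c
#include <assert.h>

/* Octal notation, as the generic values were written before. */
#define OLD_O_CREAT	00000100
#define OLD_O_EXCL	00000200
#define OLD_O_TRUNC	00001000
#define OLD_O_APPEND	00002000
#define OLD_O_CLOEXEC	02000000

/* The same values written as bit positions. */
#define NEW_O_CREAT	(1 << 6)
#define NEW_O_EXCL	(1 << 7)
#define NEW_O_TRUNC	(1 << 9)
#define NEW_O_APPEND	(1 << 10)
#define NEW_O_CLOEXEC	(1 << 19)

/* The notation changes, the ABI values do not. */
static_assert(OLD_O_CREAT == NEW_O_CREAT, "O_CREAT value changed");
static_assert(OLD_O_EXCL == NEW_O_EXCL, "O_EXCL value changed");
static_assert(OLD_O_TRUNC == NEW_O_TRUNC, "O_TRUNC value changed");
static_assert(OLD_O_APPEND == NEW_O_APPEND, "O_APPEND value changed");
static_assert(OLD_O_CLOEXEC == NEW_O_CLOEXEC, "O_CLOEXEC value changed");

/* Return the lowest bit position not set in used, or -1 if none is free. */
static int lowest_free_bit(unsigned int used)
{
	int n;

	for (n = 0; n < 32; n++) {
		if (!(used & (1U << n)))
			return n;
	}
	return -1;
}
```

With every flag written as (1 << n), the occupied bit positions can be read
off directly per architecture instead of being converted from octal or hex
first.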
/* Testing */
gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known conflicts.
Merge conflicts with other trees
================================
This has a merge conflict with the ext4 tree in fs/jbd2/journal.c
between commit bbe9015f23432b ("jbd2: remove special jbd2 slabs") from
the ext4 tree and commit 2f6702dc6fdcf0 ("jbd2: replace
__get_free_pages() with kmalloc()") from this tree. The change in this
tree is a subset of the ext4 tree's commit, so the conflict can be
resolved by taking the ext4 side. Reported in [1].
It was suggested in [2] to drop the patch from this tree. But the
patch is part of the merged "fs: replace __get_free_pages() call with
kmalloc()" series with a dozen commits on top of it, so dropping it
would have meant rewriting the whole branch after it had been exposed
in linux-next. Since the change is a strict subset of the ext4 commit,
taking the ext4 side during the merge yields the identical end result.
[1]: https://lore.kernel.org/linux-next/aiq8CByJNMlXo6Be@sirena.co.uk
[2]: https://lore.kernel.org/linux-next/airBGjtjTf3Yuy0X@casper.infradead.org
The following changes since commit 254f49634ee16a731174d2ae34bc50bd5f45e731:
Linux 7.1-rc1 (2026-04-26 14:19:00 -0700)
are available in the Git repository at:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-7.2-rc1.misc
for you to fetch changes up to aa5c4fe3ba0cb2af90bbcfa7a8ef4fefcd5c2370:
backing-file: fix backing_file_open() kerneldoc parameter (2026-06-10 09:49:25 +0200)
----------------------------------------------------------------
vfs-7.2-rc1.misc
Please consider pulling these changes from the signed vfs-7.2-rc1.misc tag.
Thanks!
Christian
----------------------------------------------------------------
Agatha Isabelle Moreira (2):
fs: buffer: use clear_and_wake_up_bit() in unlock_buffer()
fs: jbd2: use clear_and_wake_up_bit() in journal_end_buffer_io_sync()
Al Viro (1):
mount: honour SB_NOUSER in the new mount API
Alexey Dobriyan (1):
sync_file_range: delete dead S_ISLNK code
Amir Goldstein (1):
docs: add guidelines for submitting new filesystems
Andy Shevchenko (5):
initramfs: Sort headers alphabetically
initramfs: Refactor to use hex2bin() instead of custom approach
vsprintf: Revert "add simple_strntoul"
kstrtox: Drop extern keyword in the simple_strtox() declarations
fs/read_write: Do not export __kernel_write() to the entire world
Breno Leitao (2):
fs/pipe: pre-allocate pages outside pipe->mutex in anon_pipe_write
selftests/pipe: add pipe_bench microbenchmark
Christian Brauner (11):
Merge patch series "uaccess/sockptr: copy_struct_ fixes and more helpers"
Merge patch series "selftests/namespaces: Fix test hangs and false failures"
Merge patch series "initramfs: test and improve cpio hex header validation"
Merge patch series "drop unused VFS exports"
Merge patch series "fix crashes when mounting legacy file system with sector size > PAGE_SIZE"
Merge patch series "fs: refactor code to use clear_and_wake_up_bit()"
Merge patch series "fs: replace __get_free_pages() call with kmalloc()"
Merge patch series "fs/pipe: reduce pipe->mutex contention by pre-allocating outside the lock"
Merge patch series "libfs: set SB_I_NOEXEC and SB_I_NODEV in init_pseudo()"
bpf: add bpf_real_inode() kfunc
filelock: fix break_lease() stub signature for CONFIG_FILE_LOCKING=n
Christoph Hellwig (15):
fs: unexport drop_super_exclusive
fs: remove start_removing_user_path_at
fs: fold __start_removing_path into start_removing_path
bfs: handle set_blocksize failures
hpfs: handle set_blocksize failures
qnx4: handle set_blocksize failures
jfs: handle set_blocksize failures
befs: handle set_blocksize failures
affs: handle set_blocksize failures
isofs: handle set_blocksize failures
minix: handle set_blocksize failures
ntfs3: handle set_blocksize failures
omfs: handle set_blocksize failures
minix: release the sb buffer_head when setting the v3 block size fails
iomap: pass the correct len to fserror_report_io in __iomap_write_begin
David Disseldorp (2):
initramfs_test: add fill_cpio() inject_ox parameter
initramfs_test: test header fields with 0x hex prefix
Jeff Layton (2):
dcache: add extra sanity checks of the dentry in dentry_free()
vfs: add FS_USERNS_DELEGATABLE flag and set it for NFS
Jia He (1):
init/initramfs_test: wait_for_initramfs() before running
John Hubbard (2):
libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()
libfs: drop redundant SB_I_NOEXEC/SB_I_NODEV in init_pseudo() callers
Jori Koolstra (2):
vfs: remove always taken if-branch in find_next_fd()
vfs: uapi: retire octal and hex numbers in favor of (1 << n) for O_ flags
Li RongQing (1):
fs/coredump: reduce redundant log noise in validate_coredump_safety
Li Wang (1):
backing-file: fix backing_file_open() kerneldoc parameter
Mateusz Guzik (2):
fs/pipe: write to ->poll_usage only once
fs: retire stale comment in fget_task_next()
Mike Rapoport (Microsoft) (18):
init: do_mounts: use kmalloc() for allocations of temporary buffers
quota: allocate dquot_hash with kmalloc()
proc: replace __get_free_page() with kmalloc()
ocfs2/dlm: replace __get_free_page() with kmalloc()
nilfs2: replace get_zeroed_page() with kzalloc()
NFS: replace __get_free_page() with kmalloc() in nfs_show_devname()
NFS: remove unused page and page2 in nfs4_replace_transport()
NFSD: replace __get_free_page() with kmalloc() in nfsd_buffered_readdir()
libfs: simple_transaction_get(): replace get_zeroed_page() with kzalloc()
jfs: replace __get_free_page() with kmalloc()
jbd2: replace __get_free_pages() with kmalloc()
isofs: replace __get_free_page() with kmalloc()
fuse: replace __get_free_page() with kmalloc()
fs/select: replace __get_free_page() with kmalloc()
fs/namespace: use __getname() to allocate mntpath buffer
configfs: replace __get_free_pages() with kzalloc()
binfmt_misc: replace __get_free_page() with kmalloc()
bfs: replace get_zeroed_page() with kzalloc()
Mingyu Wang (1):
fs/fcntl: fix SOFTIRQ-unsafe lock order in fasync signaling
Qingshuang Fu (1):
fs: fix spelling mistakes in comment
Ricardo B. Marlière (3):
selftests/namespaces: Kill grandchild in nsid fixture teardown
selftests/namespaces: Fix waitpid race in listns_efault_test cleanup
selftests/namespaces: Skip efault tests when listns() is not available
Stefan Metzmacher (5):
uaccess: fix ignored_trailing logic in copy_struct_to_user()
sockptr: fix usize check in copy_struct_from_sockptr() for user pointers
uaccess: add copy_struct_{from,to}_bounce_buffer() helpers
sockptr: let copy_struct_from_sockptr() use copy_struct_from_bounce_buffer()
sockptr: introduce copy_struct_to_sockptr()
Thorsten Blum (2):
dcache: use kmalloc_flex() in __d_alloc
namei: use QSTR() instead of QSTR_INIT() in path_pts
Wang Haoran (1):
iov_iter: use kmemdup_array for dup_iter to harden against overflow
.../filesystems/adding-new-filesystems.rst | 195 +++++++
Documentation/filesystems/index.rst | 1 +
Documentation/filesystems/porting.rst | 1 -
arch/alpha/include/uapi/asm/fcntl.h | 34 +-
arch/arm/include/uapi/asm/fcntl.h | 8 +-
arch/arm64/include/uapi/asm/fcntl.h | 8 +-
arch/m68k/include/uapi/asm/fcntl.h | 8 +-
arch/mips/include/uapi/asm/fcntl.h | 22 +-
arch/parisc/include/uapi/asm/fcntl.h | 28 +-
arch/powerpc/include/uapi/asm/fcntl.h | 8 +-
arch/sparc/include/uapi/asm/fcntl.h | 34 +-
fs/affs/affs.h | 5 -
fs/affs/super.c | 6 +-
fs/aio.c | 1 -
fs/anon_inodes.c | 2 -
fs/backing-file.c | 13 +-
fs/befs/linuxvfs.c | 3 +-
fs/bfs/inode.c | 7 +-
fs/binfmt_misc.c | 4 +-
fs/bpf_fs_kfuncs.c | 16 +
fs/buffer.c | 4 +-
fs/configfs/file.c | 7 +-
fs/coredump.c | 3 +-
fs/dcache.c | 18 +-
fs/exec.c | 6 +-
fs/fcntl.c | 8 +-
fs/file.c | 18 +-
fs/file_table.c | 4 +-
fs/fuse/ioctl.c | 5 +-
fs/hpfs/super.c | 3 +-
fs/iomap/buffered-io.c | 2 +-
fs/isofs/dir.c | 5 +-
fs/isofs/inode.c | 3 +-
fs/jbd2/commit.c | 4 +-
fs/jbd2/journal.c | 7 +-
fs/jfs/jfs_dtree.c | 16 +-
fs/jfs/super.c | 3 +-
fs/libfs.c | 7 +-
fs/minix/inode.c | 3 +-
fs/namei.c | 25 +-
fs/namespace.c | 11 +-
fs/nfs/fs_context.c | 8 +-
fs/nfs/nfs4namespace.c | 15 +-
fs/nfs/super.c | 4 +-
fs/nfsd/vfs.c | 4 +-
fs/nilfs2/ioctl.c | 4 +-
fs/nsfs.c | 1 -
fs/ntfs3/super.c | 8 +-
fs/ocfs2/dlm/dlmdebug.c | 24 +-
fs/ocfs2/dlm/dlmdomain.c | 8 +-
fs/ocfs2/dlm/dlmmaster.c | 5 +-
fs/ocfs2/dlm/dlmrecovery.c | 4 +-
fs/omfs/inode.c | 6 +-
fs/pidfs.c | 2 -
fs/pipe.c | 106 +++-
fs/proc/base.c | 16 +-
fs/qnx4/inode.c | 3 +-
fs/quota/dquot.c | 11 +-
fs/read_write.c | 5 +-
fs/select.c | 4 +-
fs/super.c | 12 +-
fs/sync.c | 3 +-
include/linux/filelock.h | 2 +-
include/linux/fs.h | 1 +
include/linux/kstrtox.h | 9 +-
include/linux/namei.h | 1 -
include/linux/sockptr.h | 28 +-
include/linux/uaccess.h | 65 ++-
include/uapi/asm-generic/fcntl.h | 50 +-
init/do_mounts.c | 21 +-
init/initramfs.c | 68 ++-
init/initramfs_test.c | 97 +++-
lib/iov_iter.c | 8 +-
lib/vsprintf.c | 7 -
mm/secretmem.c | 2 -
tools/testing/selftests/Makefile | 1 +
.../selftests/namespaces/listns_efault_test.c | 33 +-
tools/testing/selftests/namespaces/nsid_test.c | 14 +-
tools/testing/selftests/pipe/.gitignore | 1 +
tools/testing/selftests/pipe/Makefile | 9 +
tools/testing/selftests/pipe/pipe_bench.c | 616 +++++++++++++++++++++
virt/kvm/guest_memfd.c | 2 -
82 files changed, 1464 insertions(+), 390 deletions(-)
create mode 100644 Documentation/filesystems/adding-new-filesystems.rst
create mode 100644 tools/testing/selftests/pipe/.gitignore
create mode 100644 tools/testing/selftests/pipe/Makefile
create mode 100644 tools/testing/selftests/pipe/pipe_bench.c
* [GIT PULL 16/16 for v7.2] vfs procfs
From: Christian Brauner @ 2026-06-12 15:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Christian Brauner, linux-fsdevel, linux-kernel
Hey Linus,
/* Summary */
This contains the procfs changes for this cycle:
* Revamp fs/filesystems.c
The file was a mess with a hand-rolled linked list in desperate need
of a cleanup. The filesystems list is now RCU-ified, /proc files can
be marked permanent from outside fs/proc/, and the string emitted
when reading /proc/filesystems is pre-generated and cached instead
of pointer-chasing and printfing entry by entry on every read. The
file is read frequently because libselinux reads it and is linked
into numerous frequently used programs (even ones you would not
suspect, like sed!). Scalability also improves since reference
maintenance on open/close is bypassed.
open+read+close cycle single-threaded (ops/s):
before: 442732
after: 1063462 (+140%)
open+read+close cycle with 20 processes (ops/s):
before: 606177
after: 3300576 (+444%)
A follow-up patch adds missing unlocks in some corner cases and
tidies things up.
* Relax the mount visibility check for subset=pid mounts
When procfs is mounted with subset=pid, all static files become
unavailable and only the dynamic pid information is accessible. In
that case there is no point in imposing the full mount visibility
restrictions on the mounter - everything that can be hidden in
procfs is already inaccessible. These restrictions prevented procfs
from being mounted inside rootless containers since almost all
container implementations overmount parts of procfs to hide certain
directories.
As part of this, /proc/self/net is only shown in subset=pid mounts
for callers with CAP_NET_ADMIN, reconfiguring subset=pid is rejected, the
SB_I_USERNS_VISIBLE superblock flag is replaced with an
FS_USERNS_MOUNT_RESTRICTED filesystem flag, fully visible mounts are
recorded in a list, and the mount restrictions are finally
documented.
* Protect ptrace_may_access() with exec_update_lock in procfs
Most uses of ptrace_may_access() in procfs should hold
exec_update_lock to avoid TOCTOU issues with concurrent privileged
execve() (like setuid binary execution). This fixes the easy cases -
the owner and visibility checks and the FD link permission checks -
with the gnarlier ones to follow later.
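
The caching idea behind the /proc/filesystems rework can be sketched in a
single-threaded userspace model. This is an illustration under stated
assumptions, not the kernel code: struct fs_registry, registry_add() and
registry_listing() are hypothetical names, and the RCU protection and
reference handling of the real list are left out entirely. The listing is
generated once and only regenerated after the set of filesystems changes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FS 8

struct fs_entry {
	const char *name;
	int requires_dev;	/* 0 means the entry is listed as "nodev" */
};

struct fs_registry {
	struct fs_entry fs[MAX_FS];
	size_t count;
	char *cached;		/* pre-generated listing, NULL when stale */
};

/* Register a filesystem and invalidate the cached listing. */
static int registry_add(struct fs_registry *r, const char *name,
			int requires_dev)
{
	if (r->count == MAX_FS)
		return -1;
	r->fs[r->count].name = name;
	r->fs[r->count].requires_dev = requires_dev;
	r->count++;
	free(r->cached);
	r->cached = NULL;
	return 0;
}

/* Return the listing, generating it only if no cached copy exists. */
static const char *registry_listing(struct fs_registry *r)
{
	size_t len = 1, off = 0, i;

	if (r->cached)
		return r->cached;	/* fast path: no per-entry work */

	/* Upper bound: "nodev" + tab + name + newline for every entry. */
	for (i = 0; i < r->count; i++)
		len += strlen("nodev") + 1 + strlen(r->fs[i].name) + 1;

	r->cached = malloc(len);
	if (!r->cached)
		return NULL;
	r->cached[0] = '\0';

	for (i = 0; i < r->count; i++) {
		int n = snprintf(r->cached + off, len - off, "%s\t%s\n",
				 r->fs[i].requires_dev ? "" : "nodev",
				 r->fs[i].name);

		assert(n >= 0 && (size_t)n < len - off);
		off += (size_t)n;
	}
	return r->cached;
}
```

Repeated reads return the same buffer without walking the entries, which is
the effect the open+read+close numbers above are measuring.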
/* Testing */
gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known conflicts.
Merge conflicts with other trees
================================
This will have a merge conflict with the vfs misc pull request [1].
[1]: https://lore.kernel.org/20260612-vfs-misc-v72-13d57389d260@brauner
Both add a new fs_flags define at the same location in
include/linux/fs.h. The bit values don't overlap. It can be resolved
as follows:
diff --cc include/linux/fs.h
index 10d35a68f597,e7ff9f8b1485..dcd0575a3830
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@@ -2281,7 -2281,7 +2281,8 @@@ struct file_system_type
#define FS_MGTIME 64 /* FS uses multigrain timestamps */
#define FS_LBS 128 /* FS supports LBS */
#define FS_POWER_FREEZE 256 /* Always freeze on suspend/hibernate */
+ #define FS_USERNS_MOUNT_RESTRICTED 512 /* Restrict mount in userns if not already visible */
+#define FS_USERNS_DELEGATABLE 1024 /* Can be mounted inside userns from outside */
#define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */
int (*init_fs_context)(struct fs_context *);
const struct fs_parameter_spec *parameters;
The following changes since commit 254f49634ee16a731174d2ae34bc50bd5f45e731:
Linux 7.1-rc1 (2026-04-26 14:19:00 -0700)
are available in the Git repository at:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-7.2-rc1.procfs
for you to fetch changes up to cf30ceccfaec3d2549ff60f7c915625f12dd3a93:
fs: fix ups and tidy ups to /proc/filesystems caching (2026-06-12 14:26:27 +0200)
----------------------------------------------------------------
vfs-7.2-rc1.procfs
Please consider pulling these changes from the signed vfs-7.2-rc1.procfs tag.
Thanks!
Christian
----------------------------------------------------------------
Alexey Dobriyan (1):
proc: allow to mark /proc files permanent outside of fs/proc/
Alexey Gladkov (4):
proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN
proc: prevent reconfiguring subset=pid
proc: handle subset=pid separately in userns visibility checks
docs: proc: add documentation about mount restrictions
Christian Brauner (7):
namespace: record fully visible mounts in list
fs: move SB_I_USERNS_VISIBLE to FS_USERNS_MOUNT_RESTRICTED
fs: RCU-ify filesystems list
sysfs: remove trivial sysfs_get_tree() wrapper
Merge patch series "revamp fs/filesystems.c"
Merge patch series "proc: subset=pid: Relax check of mount visibility"
Merge patch series "proc: protect ptrace_may_access() with exec_update_lock"
Jann Horn (2):
proc: protect ptrace_may_access() with exec_update_lock (part 1)
proc: protect ptrace_may_access() with exec_update_lock (FD links)
Mateusz Guzik (2):
fs: cache the string generated by reading /proc/filesystems
fs: fix ups and tidy ups to /proc/filesystems caching
Documentation/filesystems/proc.rst | 19 ++-
fs/filesystems.c | 330 +++++++++++++++++++++++++------------
fs/mount.h | 4 +
fs/namespace.c | 34 +++-
fs/ocfs2/super.c | 1 -
fs/proc/array.c | 6 +
fs/proc/base.c | 160 ++++++++----------
fs/proc/fd.c | 27 ++-
fs/proc/generic.c | 10 ++
fs/proc/internal.h | 5 +-
fs/proc/namespaces.c | 12 ++
fs/proc/proc_net.c | 8 +
fs/proc/root.c | 24 ++-
fs/sysfs/mount.c | 18 +-
include/linux/fs.h | 3 +-
include/linux/fs/super_types.h | 2 +-
include/linux/proc_fs.h | 13 ++
kernel/acct.c | 2 +-
18 files changed, 429 insertions(+), 249 deletions(-)