* Re: [SECURITY] e2fsprogs v1.47.4 Vulnerabilities — Orphan File & Extent Handling
From: Theodore Tso @ 2026-04-03 23:11 UTC
To: Andreas Dilger; +Cc: 4fqr, linux-ext4@vger.kernel.org
In-Reply-To: <0778B5AC-16ED-4736-AE64-849541629466@dilger.ca>
On Fri, Apr 03, 2026 at 10:17:38AM -0600, Andreas Dilger wrote:
>
> I don't see how this exposes any kind of security vulnerability if it
> requires that the image be specially modified in the first place?
> At that point the "attacker" can directly modify the image in any way
> they want, regardless of how e2fsprogs behaves.
After taking a closer look at the "vulnerabilities", I understand where
Andreas is coming from. If the threat model is "the attacker can
modify the file system", then the fact that you can craft the orphan to
"force" the file system to release an inode isn't interesting,
because the attacker could simply overwrite the inode and mark its
blocks as free in the block allocation bitmap directly.
If passing a very large inode number to release_orphan_inode() could
result in a buffer overrun, then that might be interesting. But "oh,
noes! You might be able to force the file system to release the resize
inode" is not interesting; the attacker could just stomp on the resize
inode directly.
Should we add some better bounds checking? Sure, so that we can give
a more user-friendly error message, and reduce the chance of
accidentally making things worse if the file system is corrupted by
chance and metadata checksum is not enabled. But is it a "security
vulnerability"? No.
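Below is a minimal sketch of the kind of bounds check mentioned above. It assumes that an inode number read from the on-disk orphan list is validated against the superblock's first non-reserved inode and its total inode count before it is used. `struct orphan_sb` and `orphan_ino_valid()` are hypothetical names for illustration, not e2fsprogs code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock fields the check needs. */
struct orphan_sb {
	uint32_t s_inodes_count;	/* total inodes; valid numbers are 1..count */
	uint32_t s_first_ino;		/* first non-reserved inode, e.g. 11 */
};

/*
 * Reject orphan entries that name a reserved inode (root, journal,
 * resize inode, ...) or an inode number past the end of the inode
 * table, before they are used to index any per-inode metadata.
 */
static bool orphan_ino_valid(const struct orphan_sb *sb, uint32_t ino)
{
	return ino >= sb->s_first_ino && ino <= sb->s_inodes_count;
}
```

With `s_first_ino = 11`, this rejects the resize inode (7) as well as any number beyond `s_inodes_count`. That gives a friendlier error on accidental corruption; it is not a security boundary.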
Now, if you could actually force a malicious payload to run via a
stack overrun attack, that would be interesting. And I was expecting
to find *something* like that in your analysis, only to be
disappointed.
Cheers,
- Ted
* Re: [PATCH v2 3/3] ext4: derive f_fsid from block device to avoid collisions
From: Anand Jain @ 2026-04-04 8:59 UTC
To: Theodore Tso, Christoph Hellwig, Darrick J. Wong
Cc: linux-ext4, linux-btrfs, linux-xfs, Anand Jain
In-Reply-To: <d4a9970b-e7ed-4e74-be9d-2d08400f9d79@gmail.com>
Hi Ted, Christoph, Darrick,
As I prepare v3, I'd appreciate your final thoughts on the mount option
naming and its necessity for ext4.
For the new option, I am considering:
-o nodup_f_fsid
-o unique_f_fsid
Context:
Currently, ext4's f_fsid is consistent across reboots but fails to be
unique when dealing with cloned filesystems (sharing the same UUID). Per
statfs(2) [1], the primary requirement is that the (f_fsid, ino) pair
uniquely identifies a file. The man page makes no explicit guarantee
regarding consistency across mount cycles or reboots.
Proposal:
With this fix, f_fsid becomes f(uuid, dev_t). This ensures OS-wide
uniqueness and maintains consistency as long as the underlying dev_t
remains stable.
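Below is a userspace sketch of what f(uuid, dev_t) could look like. It assumes a simple XOR fold of the 16-byte UUID into two 32-bit words, with the device's major and minor numbers mixed in. The helper names and the exact mixing are hypothetical and are not taken from the v2 patch.

```c
#include <stdint.h>

/* Same shape as fsid_t in statfs(2): struct { int val[2]; }. */
struct fsid_pair {
	uint32_t val[2];
};

/* Read four UUID bytes as one big-endian word (byte order chosen for the sketch). */
static uint32_t be32_at(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* UUID-only fsid: every clone that shares the UUID gets the same value. */
static struct fsid_pair fsid_from_uuid(const uint8_t uuid[16])
{
	struct fsid_pair f = { { be32_at(uuid) ^ be32_at(uuid + 4),
				 be32_at(uuid + 8) ^ be32_at(uuid + 12) } };
	return f;
}

/*
 * f(uuid, dev_t): fold the device's major and minor numbers into the two
 * halves, so clones on different block devices report different fsids
 * while one device keeps the same fsid across remounts.
 */
static struct fsid_pair fsid_from_uuid_dev(const uint8_t uuid[16],
					   uint32_t dev_major, uint32_t dev_minor)
{
	struct fsid_pair f = fsid_from_uuid(uuid);

	f.val[0] ^= dev_major;
	f.val[1] ^= dev_minor;
	return f;
}
```

Two clones that share a UUID but are mounted from different block devices get different fsids. If dev_t itself changes between boots, the fsid changes with it, which is the persistence trade-off raised below.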
Dilemma:
While statfs(2) [1] suggests f_fsid is "some random stuff," we know
userspace (NFS, systemd) often treats it as a persistent handle.
Do you prefer one of the names above, or is there a more idiomatic ext4
naming convention I should follow?
Given the ambiguity in the man page, is gating this behind an -o option
necessary, or should we consider making uniqueness the default behavior?
[1]
----------
statfs(2)
<snap>
Nobody knows what f_fsid is supposed to contain (but see below).
<snap>
The f_fsid field
Solaris, Irix, and POSIX have a system call statvfs(2)
that returns a struct statvfs (defined in <sys/statvfs.h>) containing
an unsigned long f_fsid. Linux, SunOS, HP-UX, 4.4BSD have a system
call statfs() that returns a struct statfs (defined in <sys/vfs.h>)
containing a fsid_t f_fsid, where fsid_t is defined as struct { int
val[2]; }. The same holds for FreeBSD, except that it uses the
include file <sys/mount.h>.
The general idea is that f_fsid contains some random stuff such
that the pair (f_fsid,ino) uniquely determines a file. Some operating
systems use (a variation on) the device number, or the device number
combined with the filesystem type. Several operating systems restrict
giving out the f_fsid field to the superuser only (and zero it for
unprivileged users), because this field is used in the filehandle of
the filesystem when NFS-exported, and giving it out is a security concern.
Under some operating systems, the fsid can be used as the second
argument to the sysfs(2) system call.
----------
Thanks, Anand
* [PATCH] ext2: reject inodes with zero i_nlink and valid mode in ext2_iget()
From: Vasiliy Kovalev @ 2026-04-04 15:20 UTC
To: Jan Kara, linux-ext4
Cc: Andrew Morton, Alexey Dobriyan, linux-kernel, lvc-project,
kovalev
ext2_iget() already rejects inodes with i_nlink == 0 when i_mode is
zero or i_dtime is set, treating them as deleted. However, the case of
i_nlink == 0 with a non-zero mode and zero dtime slips through. Since
ext2 has no orphan list, such a combination can only result from
filesystem corruption: a legitimate inode deletion always either sets
i_dtime or clears i_mode before freeing the inode.
A crafted image can exploit this gap to present such an inode to the
VFS, which then triggers WARN_ON inside drop_nlink() (fs/inode.c) via
ext2_unlink(), ext2_rename() and ext2_rmdir():
WARNING: CPU: 3 PID: 609 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336
CPU: 3 UID: 0 PID: 609 Comm: syz-executor Not tainted 6.12.77+ #1
Call Trace:
<TASK>
inode_dec_link_count include/linux/fs.h:2518 [inline]
ext2_unlink+0x26c/0x300 fs/ext2/namei.c:295
vfs_unlink+0x2fc/0x9b0 fs/namei.c:4477
do_unlinkat+0x53e/0x730 fs/namei.c:4541
__x64_sys_unlink+0xc6/0x110 fs/namei.c:4587
do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
WARNING: CPU: 0 PID: 646 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336
CPU: 0 UID: 0 PID: 646 Comm: syz.0.17 Not tainted 6.12.77+ #1
Call Trace:
<TASK>
inode_dec_link_count include/linux/fs.h:2518 [inline]
ext2_rename+0x35e/0x850 fs/ext2/namei.c:374
vfs_rename+0xf2f/0x2060 fs/namei.c:5021
do_renameat2+0xbe2/0xd50 fs/namei.c:5178
__x64_sys_rename+0x7e/0xa0 fs/namei.c:5223
do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
WARNING: CPU: 0 PID: 634 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336
CPU: 0 UID: 0 PID: 634 Comm: syz-executor Not tainted 6.12.77+ #1
Call Trace:
<TASK>
inode_dec_link_count include/linux/fs.h:2518 [inline]
ext2_rmdir+0xca/0x110 fs/ext2/namei.c:311
vfs_rmdir+0x204/0x690 fs/namei.c:4348
do_rmdir+0x372/0x3e0 fs/namei.c:4407
__x64_sys_unlinkat+0xf0/0x130 fs/namei.c:4577
do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
Extend the existing i_nlink == 0 check to also catch this case,
reporting the corruption via ext2_error() and returning -EFSCORRUPTED.
This rejects the inode at load time and prevents it from reaching any
of the namei.c paths.
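The patched decision in ext2_iget() can be modelled as a small pure function. This is a userspace sketch that assumes the inputs are the already-decoded i_nlink, i_mode and i_dtime values. The enum and the function name are hypothetical; the enum values stand in for continuing to load the inode and for the -ESTALE and -EFSCORRUPTED returns.

```c
#include <stdint.h>

/* Hypothetical outcome codes for the i_nlink == 0 check. */
enum nlink0_verdict {
	INODE_OK,	/* i_nlink != 0: continue loading the inode */
	INODE_DELETED,	/* ext2_iget() returns -ESTALE */
	INODE_CORRUPT,	/* ext2_iget() returns -EFSCORRUPTED */
};

/*
 * A zero-link inode counts as "deleted" only if its mode was cleared or a
 * deletion time was recorded. Because ext2 has no orphan list, any other
 * zero-link inode can only come from corruption.
 */
static enum nlink0_verdict classify_nlink0(unsigned int i_nlink,
					   uint16_t i_mode, uint32_t i_dtime)
{
	if (i_nlink != 0)
		return INODE_OK;
	if (i_mode == 0 || i_dtime != 0)
		return INODE_DELETED;
	return INODE_CORRUPT;
}
```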
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
---
fs/ext2/inode.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index dbfe9098a124..39d972722f5f 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1430,9 +1430,17 @@ struct inode *ext2_iget (struct super_block *sb, unsigned long ino)
* the test is that same one that e2fsck uses
* NeilBrown 1999oct15
*/
- if (inode->i_nlink == 0 && (inode->i_mode == 0 || ei->i_dtime)) {
- /* this inode is deleted */
- ret = -ESTALE;
+ if (inode->i_nlink == 0) {
+ if (inode->i_mode == 0 || ei->i_dtime) {
+ /* this inode is deleted */
+ ret = -ESTALE;
+ } else {
+ ext2_error(sb, __func__,
+ "inode %lu has zero i_nlink with mode 0%o and no dtime, "
+ "filesystem may be corrupt",
+ ino, inode->i_mode);
+ ret = -EFSCORRUPTED;
+ }
goto bad_inode;
}
inode->i_blocks = le32_to_cpu(raw_inode->i_blocks);
--
2.50.1
* Re: [PATCH 0/2] ext2: fix WARN_ON in drop_nlink() triggered by corrupt images
From: Vasiliy Kovalev @ 2026-04-04 15:27 UTC
To: Jan Kara, Andrew Morton, Alexey Dobriyan, linux-ext4
Cc: linux-kernel, lvc-project
In-Reply-To: <20260401220837.2424925-1-kovalev@altlinux.org>
On 4/2/26 01:08, Vasiliy Kovalev wrote:
> A crafted ext2 image can contain a directory entry pointing to an inode
> whose on-disk i_links_count is zero. ext2 mounts such an image without
> error. Any subsequent syscall that decrements i_nlink on that inode
> triggers WARN_ON inside drop_nlink() in fs/inode.c.
>
> These patches prevent the warning by validating i_nlink before decrementing
> it in ext2_unlink() and ext2_rename(), reporting the corruption via
> ext2_error() instead.
>
> The issues were found by Linux Verification Center (linuxtesting.org)
> with Syzkaller.
>
> Vasiliy Kovalev (2):
> ext2: validate i_nlink before decrement in ext2_unlink()
> ext2: guard against zero i_nlink on new_inode in ext2_rename()
Syzkaller found a third trigger via ext2_rmdir(). Rather than adding
another guard in namei.c, I fixed the root cause in ext2_iget(): a
single check there covers all three cases at once.
New patch:
https://lore.kernel.org/all/20260404152011.2590197-1-kovalev@altlinux.org/
If the previous two patches have not been picked up yet, please
consider this one as a replacement for the entire series.
> fs/ext2/namei.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> --- [Reproducer for PATCH 1/2: ext2_unlink] ---
> [...]
>
> --- [Reproducer for PATCH 2/2: ext2_rename] ---
> [...]
--
Thanks,
Vasiliy
* [PATCH v3] ext2: use get_random_u32() where appropriate
From: David Carlier @ 2026-04-05 15:47 UTC
To: Jan Kara; +Cc: linux-ext4, linux-kernel, David Carlier
Use the typed random integer helpers instead of
get_random_bytes() when filling a single integer variable.
The helpers return the value directly, require no pointer
or size argument, and better express intent.
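Below is a userspace analogue of the two styles. It assumes glibc 2.25 or later for getrandom(2) from `<sys/random.h>`. The kernel helpers themselves cannot be called from userspace, and `fill_u32()` and `random_u32()` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>		/* abort() */
#include <sys/random.h>		/* getrandom(), glibc >= 2.25 */
#include <sys/types.h>		/* ssize_t */

/*
 * Pointer-and-size style, like get_random_bytes(): the caller passes the
 * destination and a size that must stay in sync with its type.
 */
static int fill_u32(uint32_t *out)
{
	ssize_t n = getrandom(out, sizeof(*out), 0);

	return n == (ssize_t)sizeof(*out) ? 0 : -1;
}

/* Typed style, like get_random_u32(): the value is simply returned. */
static uint32_t random_u32(void)
{
	uint32_t v;

	if (fill_u32(&v) != 0)
		abort();	/* sketch only: real code would retry or report */
	return v;
}
```

The typed form removes the `sizeof` argument that has to be kept in sync with the destination's type, which is the readability point the patch makes.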
Signed-off-by: David Carlier <devnexen@gmail.com>
---
fs/ext2/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 603f2641fe10..e4136490c883 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1151,7 +1151,7 @@ static int ext2_fill_super(struct super_block *sb, struct fs_context *fc)
goto failed_mount2;
}
sbi->s_gdb_count = db_count;
- get_random_bytes(&sbi->s_next_generation, sizeof(u32));
+ sbi->s_next_generation = get_random_u32();
spin_lock_init(&sbi->s_next_gen_lock);
/* per filesystem reservation list head & lock */
--
2.53.0
* [RFC PATCH v1 0/6] provenance_time (ptime): a new settable timestamp for cross-filesystem provenance
From: Sean Smith @ 2026-04-05 19:49 UTC
To: linux-fsdevel
Cc: linux-ext4, linux-btrfs, tytso, dsterba, david, brauner, osandov,
almaz, hirofumi, linkinjeon, Sean Smith
This series adds provenance_time (ptime) -- a new settable inode
timestamp that records when a file's content was first created,
preserving this date across copies, moves, and application saves.
This is a working implementation of the concept I proposed in my
RFC in March:
https://lore.kernel.org/linux-fsdevel/CAOx6djP4hb-Cd1Zk07SNfFfLc8irjNmbVqq+58h1Whz+h1wSFA@mail.gmail.com/T/#u
MOTIVATION
Linux has no mechanism to preserve original creation dates when
files move between filesystems. Every copy resets btime to "now."
For workflows involving document migration (NTFS to Btrfs, between
ext4 volumes, to USB drives), creation date provenance is lost.
Since the March RFC, I attempted an xattr-based workaround
(user.provenance_time) and found it structurally unworkable:
1. Application atomic saves destroy xattrs. Programs that save
via write-to-temp + rename() replace the inode, permanently
destroying all extended attributes. Only the VFS sees both
inodes during rename -- no userspace mechanism can intercept
this and copy metadata across.
2. Every tool in the copy chain must explicitly opt in to xattr
preservation. cp requires --preserve=xattr, rsync requires -X,
tar requires --xattrs. Each missing flag causes silent data
loss. Transparent preservation through arbitrary tool flows
is not achievable in userspace.
Atomic saves are the default behavior of mainstream applications
(LibreOffice, Vim, Kate, etc.).
DESIGN
ptime is a separate timestamp from btime. btime remains immutable
and forensic ("when was this inode born on this disk"). ptime is
settable and portable ("when was this content first created").
This resolves the 2019 impasse: Dave Chinner's forensic argument
for immutable btime is fully respected -- btime is untouched on
native Linux filesystems. Ted Ts'o's March 2025 concept of a
settable "crtime" alongside immutable btime is implemented in ext4
with dedicated i_ptime fields.
Two implementation categories:
Native (Btrfs, ext4): Dedicated on-disk ptime field. btime
remains immutable. Full nanosecond precision.
Mapped (ntfs3, FAT32/vfat, exFAT): ptime reads/writes the
existing creation time field. This matches Windows and macOS
behavior, where creation time is already settable via standard
APIs. No new on-disk structures needed.
Key VFS capability -- rename-over preservation: when rename()
overwrites an existing file, the kernel copies ptime from the
old file to the new file. This fixes the atomic-save xattr
destruction problem at its root, for every application on
every supported filesystem.
API
ptime is exposed through existing interfaces with minimal
additions:
- statx: STATX_PTIME (0x00040000U) returns ptime in stx_ptime
- utimensat: AT_UTIME_PTIME (0x20000) flag with times[2]
extension for setting ptime
- setattr_prepare: ATTR_PTIME (bit 19) / ATTR_PTIME_SET (bit 20)
The utimensat extension reuses Sandoval's 2019 pattern. For
upstream, an extensible-struct syscall (utimensat2, following
the clone3/openat2 convention) may be preferred -- I am open
to guidance on the API design.
Permissions follow the existing utimensat model: file owner
or CAP_FOWNER required.
TESTING
This has been running on EndeavourOS (kernel 6.19.11) for daily
use. Test coverage:
- 10 xfstests (7 generic VFS + 3 Btrfs-specific): basic
set/read, persistence, rename-over, permissions, utime-omit,
chmod/truncate survival, snapshots, nlink guards, compat_ro
- Runtime tests across all 5 filesystems: set/read, rename-over,
cp -a preservation, cross-FS copies (Btrfs, ext4, ntfs3,
FAT32, exFAT)
KNOWN LIMITATIONS
- XFS: deferred (separate inode structure analysis needed)
- Btrfs send/receive: not yet patched for ptime
- glibc utimensat() wrapper: cannot pass ptime; tools use raw
syscall()
- Btrfs compat_ro: writing ptime sets a compat_ro flag;
unpatched kernels refuse RW mount (correct Btrfs behavior)
The userspace ecosystem (patched cp, rsync, tar, KDE Dolphin)
and xfstests are available at:
https://github.com/DefendTheDisabled/linux-ptime
This implementation was developed using AI-assisted tooling for
code generation, iterative review, and test infrastructure. I am
responsible for review, testing, and sign-off.
Sean Smith (6):
vfs: add provenance_time (ptime) infrastructure
btrfs: add provenance time (ptime) support
ntfs3: map ptime to NTFS creation time with rename-over
ext4: add dedicated ptime field alongside i_crtime
fat: map ptime to FAT creation time with rename-over
exfat: map ptime to exFAT creation time with rename-over
fs/attr.c | 6 +++-
fs/btrfs/btrfs_inode.h | 4 +++
fs/btrfs/delayed-inode.c | 4 +++
fs/btrfs/fs.h | 3 +-
fs/btrfs/inode.c | 43 +++++++++++++++++++++++++
fs/btrfs/tree-log.c | 2 ++
fs/btrfs/volumes.c | 2 +-
fs/exfat/file.c | 9 ++++++
fs/exfat/namei.c | 21 +++++++++++--
fs/ext4/ext4.h | 3 ++
fs/ext4/inode.c | 14 +++++++++
fs/ext4/namei.c | 13 ++++++++
fs/fat/file.c | 6 ++++
fs/fat/namei_vfat.c | 20 ++++++++++--
fs/init.c | 2 +-
fs/ntfs3/file.c | 13 ++++++++
fs/ntfs3/frecord.c | 8 +++++
fs/ntfs3/namei.c | 14 +++++++++
fs/stat.c | 2 ++
fs/utimes.c | 56 +++++++++++++++++++++++++--------
include/linux/fs.h | 5 ++-
include/linux/stat.h | 1 +
include/uapi/linux/btrfs.h | 1 +
include/uapi/linux/btrfs_tree.h | 4 ++-
include/uapi/linux/fcntl.h | 3 ++
include/uapi/linux/stat.h | 4 ++-
init/initramfs.c | 2 +-
27 files changed, 239 insertions(+), 26 deletions(-)
--
2.53.0
* [PATCH 1/6] vfs: add provenance_time (ptime) infrastructure
From: Sean Smith @ 2026-04-05 19:49 UTC
To: linux-fsdevel
Cc: linux-ext4, linux-btrfs, tytso, dsterba, david, brauner, osandov,
almaz, hirofumi, linkinjeon, Sean Smith
In-Reply-To: <20260405195007.1306-1-DefendTheDisabled@gmail.com>
Add a new settable inode timestamp, provenance_time (ptime), for tracking
the original creation date of file content across filesystem boundaries.
ptime is distinct from btime (forensic, immutable): it records when file
content first came into existence on any filesystem, and is designed to
be set during cross-filesystem migration and preserved through copies.
VFS changes:
- ATTR_PTIME (bit 19) and ATTR_PTIME_SET (bit 20) in struct iattr
- STATX_PTIME (0x00040000) in struct statx at offset 0xC0
- AT_UTIME_PTIME (0x20000) flag for utimensat()
- ptime field in struct kstat
- Permission model matches mtime (owner or CAP_FOWNER)
- UTIME_NOW and UTIME_OMIT supported for ptime element
- All existing vfs_utimes() callers updated for new flags parameter
Signed-off-by: Sean Smith <DefendTheDisabled@gmail.com>
---
fs/attr.c | 6 +++-
fs/btrfs/volumes.c | 2 +-
fs/init.c | 2 +-
fs/stat.c | 2 ++
fs/utimes.c | 56 +++++++++++++++++++++++++++++---------
include/linux/fs.h | 5 +++-
include/linux/stat.h | 1 +
include/uapi/linux/fcntl.h | 3 ++
include/uapi/linux/stat.h | 4 ++-
init/initramfs.c | 2 +-
10 files changed, 64 insertions(+), 19 deletions(-)
diff --git a/fs/attr.c b/fs/attr.c
index b9ec6b47b..7fa9c01d1 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -206,7 +206,7 @@ int setattr_prepare(struct mnt_idmap *idmap, struct dentry *dentry,
}
/* Check for setting the inode time. */
- if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET | ATTR_TIMES_SET)) {
+ if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET | ATTR_PTIME_SET | ATTR_TIMES_SET)) {
if (!inode_owner_or_capable(idmap, inode))
return -EPERM;
}
@@ -466,6 +466,10 @@ int notify_change(struct mnt_idmap *idmap, struct dentry *dentry,
attr->ia_mtime = timestamp_truncate(attr->ia_mtime, inode);
else
attr->ia_mtime = now;
+ if (ia_valid & ATTR_PTIME_SET)
+ attr->ia_ptime = timestamp_truncate(attr->ia_ptime, inode);
+ else if (ia_valid & ATTR_PTIME)
+ attr->ia_ptime = now;
if (ia_valid & ATTR_KILL_PRIV) {
error = security_inode_need_killpriv(dentry);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 052b830a0..0e81f2cc9 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2117,7 +2117,7 @@ static void update_dev_time(const char *device_path)
struct path path;
if (!kern_path(device_path, LOOKUP_FOLLOW, &path)) {
- vfs_utimes(&path, NULL);
+ vfs_utimes(&path, NULL, 0);
path_put(&path);
}
}
diff --git a/fs/init.c b/fs/init.c
index e0f5429c0..e9a9f4d93 100644
--- a/fs/init.c
+++ b/fs/init.c
@@ -254,7 +254,7 @@ int __init init_utimes(char *filename, struct timespec64 *ts)
error = kern_path(filename, 0, &path);
if (error)
return error;
- error = vfs_utimes(&path, ts);
+ error = vfs_utimes(&path, ts, 0);
path_put(&path);
return error;
}
diff --git a/fs/stat.c b/fs/stat.c
index 6c79661e1..9284bb753 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -728,6 +728,8 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
tmp.stx_atime.tv_nsec = stat->atime.tv_nsec;
tmp.stx_btime.tv_sec = stat->btime.tv_sec;
tmp.stx_btime.tv_nsec = stat->btime.tv_nsec;
+ tmp.stx_ptime.tv_sec = stat->ptime.tv_sec;
+ tmp.stx_ptime.tv_nsec = stat->ptime.tv_nsec;
tmp.stx_ctime.tv_sec = stat->ctime.tv_sec;
tmp.stx_ctime.tv_nsec = stat->ctime.tv_nsec;
tmp.stx_mtime.tv_sec = stat->mtime.tv_sec;
diff --git a/fs/utimes.c b/fs/utimes.c
index 86f8ce8cd..50b5ad296 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -17,10 +17,10 @@ static bool nsec_valid(long nsec)
return nsec >= 0 && nsec <= 999999999;
}
-int vfs_utimes(const struct path *path, struct timespec64 *times)
+int vfs_utimes(const struct path *path, struct timespec64 *times, int flags)
{
int error;
- struct iattr newattrs;
+ struct iattr newattrs = {};
struct inode *inode = path->dentry->d_inode;
struct delegated_inode delegated_inode = { };
@@ -28,7 +28,11 @@ int vfs_utimes(const struct path *path, struct timespec64 *times)
if (!nsec_valid(times[0].tv_nsec) ||
!nsec_valid(times[1].tv_nsec))
return -EINVAL;
- if (times[0].tv_nsec == UTIME_NOW &&
+ if ((flags & AT_UTIME_PTIME) &&
+ !nsec_valid(times[2].tv_nsec))
+ return -EINVAL;
+ if (!(flags & AT_UTIME_PTIME) &&
+ times[0].tv_nsec == UTIME_NOW &&
times[1].tv_nsec == UTIME_NOW)
times = NULL;
}
@@ -52,6 +56,15 @@ int vfs_utimes(const struct path *path, struct timespec64 *times)
newattrs.ia_mtime = times[1];
newattrs.ia_valid |= ATTR_MTIME_SET;
}
+ if (flags & AT_UTIME_PTIME) {
+ if (times[2].tv_nsec != UTIME_OMIT) {
+ newattrs.ia_valid |= ATTR_PTIME;
+ if (times[2].tv_nsec != UTIME_NOW) {
+ newattrs.ia_ptime = times[2];
+ newattrs.ia_valid |= ATTR_PTIME_SET;
+ }
+ }
+ }
/*
* Tell setattr_prepare(), that this is an explicit time
* update, even if neither ATTR_ATIME_SET nor ATTR_MTIME_SET
@@ -84,7 +97,7 @@ static int do_utimes_path(int dfd, const char __user *filename,
struct path path;
int lookup_flags = 0, error;
- if (flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH))
+ if (flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_UTIME_PTIME))
return -EINVAL;
if (!(flags & AT_SYMLINK_NOFOLLOW))
@@ -97,7 +110,7 @@ static int do_utimes_path(int dfd, const char __user *filename,
if (error)
return error;
- error = vfs_utimes(&path, times);
+ error = vfs_utimes(&path, times, flags);
path_put(&path);
if (retry_estale(error, lookup_flags)) {
lookup_flags |= LOOKUP_REVAL;
@@ -109,13 +122,13 @@ static int do_utimes_path(int dfd, const char __user *filename,
static int do_utimes_fd(int fd, struct timespec64 *times, int flags)
{
- if (flags)
+ if (flags & ~AT_UTIME_PTIME)
return -EINVAL;
CLASS(fd, f)(fd);
if (fd_empty(f))
return -EBADF;
- return vfs_utimes(&fd_file(f)->f_path, times);
+ return vfs_utimes(&fd_file(f)->f_path, times, flags);
}
/*
@@ -144,16 +157,24 @@ long do_utimes(int dfd, const char __user *filename, struct timespec64 *times,
SYSCALL_DEFINE4(utimensat, int, dfd, const char __user *, filename,
struct __kernel_timespec __user *, utimes, int, flags)
{
- struct timespec64 tstimes[2];
+ struct timespec64 tstimes[3];
+
+ if ((flags & AT_UTIME_PTIME) && !utimes)
+ return -EINVAL;
if (utimes) {
if ((get_timespec64(&tstimes[0], &utimes[0]) ||
- get_timespec64(&tstimes[1], &utimes[1])))
+ get_timespec64(&tstimes[1], &utimes[1])))
+ return -EFAULT;
+ if ((flags & AT_UTIME_PTIME) &&
+ get_timespec64(&tstimes[2], &utimes[2]))
return -EFAULT;
/* Nothing to do, we must not even check the path. */
if (tstimes[0].tv_nsec == UTIME_OMIT &&
- tstimes[1].tv_nsec == UTIME_OMIT)
+ tstimes[1].tv_nsec == UTIME_OMIT &&
+ (!(flags & AT_UTIME_PTIME) ||
+ tstimes[2].tv_nsec == UTIME_OMIT))
return 0;
}
@@ -247,14 +268,23 @@ SYSCALL_DEFINE2(utime32, const char __user *, filename,
SYSCALL_DEFINE4(utimensat_time32, unsigned int, dfd, const char __user *, filename, struct old_timespec32 __user *, t, int, flags)
{
- struct timespec64 tv[2];
+ struct timespec64 tv[3];
+
+ if ((flags & AT_UTIME_PTIME) && !t)
+ return -EINVAL;
- if (t) {
+ if (t) {
if (get_old_timespec32(&tv[0], &t[0]) ||
get_old_timespec32(&tv[1], &t[1]))
return -EFAULT;
+ if ((flags & AT_UTIME_PTIME) &&
+ get_old_timespec32(&tv[2], &t[2]))
+ return -EFAULT;
- if (tv[0].tv_nsec == UTIME_OMIT && tv[1].tv_nsec == UTIME_OMIT)
+ if (tv[0].tv_nsec == UTIME_OMIT &&
+ tv[1].tv_nsec == UTIME_OMIT &&
+ (!(flags & AT_UTIME_PTIME) ||
+ tv[2].tv_nsec == UTIME_OMIT))
return 0;
}
return do_utimes(dfd, filename, t ? tv : NULL, flags);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a01621fa6..07719e216 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -239,6 +239,8 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
#define ATTR_TIMES_SET (1 << 16)
#define ATTR_TOUCH (1 << 17)
#define ATTR_DELEG (1 << 18) /* Delegated attrs. Don't break write delegations */
+#define ATTR_PTIME (1 << 19) /* Set provenance time */
+#define ATTR_PTIME_SET (1 << 20) /* Set provenance time to specific value */
/*
* Whiteout is represented by a char device. The following constants define the
@@ -283,6 +285,7 @@ struct iattr {
struct timespec64 ia_atime;
struct timespec64 ia_mtime;
struct timespec64 ia_ctime;
+ struct timespec64 ia_ptime;
/*
* Not an attribute, but an auxiliary info for filesystems wanting to
@@ -1814,7 +1817,7 @@ int vfs_mkobj(struct dentry *, umode_t,
int vfs_fchown(struct file *file, uid_t user, gid_t group);
int vfs_fchmod(struct file *file, umode_t mode);
-int vfs_utimes(const struct path *path, struct timespec64 *times);
+int vfs_utimes(const struct path *path, struct timespec64 *times, int flags);
#ifdef CONFIG_COMPAT
extern long compat_ptr_ioctl(struct file *file, unsigned int cmd,
diff --git a/include/linux/stat.h b/include/linux/stat.h
index e3d00e7bb..52272000c 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -48,6 +48,7 @@ struct kstat {
struct timespec64 mtime;
struct timespec64 ctime;
struct timespec64 btime; /* File creation time */
+ struct timespec64 ptime; /* Provenance time */
u64 blocks;
u64 mnt_id;
u64 change_cookie;
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index aadfbf6e0..f80ce0295 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -190,4 +190,7 @@ struct delegation {
#define AT_EXECVE_CHECK 0x10000 /* Only perform a check if execution
would be allowed. */
+/* Flag for utimensat(2): times[2] carries provenance time */
+#define AT_UTIME_PTIME 0x20000
+
#endif /* _UAPI_LINUX_FCNTL_H */
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 1686861aa..0c8db3715 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -187,7 +187,8 @@ struct statx {
__u32 __spare2[1];
/* 0xc0 */
- __u64 __spare3[8]; /* Spare space for future expansion */
+ struct statx_timestamp stx_ptime; /* File provenance time */
+ __u64 __spare3[6]; /* Spare space for future expansion */
/* 0x100 */
};
@@ -219,6 +220,7 @@ struct statx {
#define STATX_SUBVOL 0x00008000U /* Want/got stx_subvol */
#define STATX_WRITE_ATOMIC 0x00010000U /* Want/got atomic_write_* fields */
#define STATX_DIO_READ_ALIGN 0x00020000U /* Want/got dio read alignment info */
+#define STATX_PTIME 0x00040000U /* Want/got stx_ptime */
#define STATX__RESERVED 0x80000000U /* Reserved for future struct statx expansion */
diff --git a/init/initramfs.c b/init/initramfs.c
index 6ddbfb17f..e066b1fee 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -139,7 +139,7 @@ static void __init do_utime(char *filename, time64_t mtime)
static void __init do_utime_path(const struct path *path, time64_t mtime)
{
struct timespec64 t[2] = { { .tv_sec = mtime }, { .tv_sec = mtime } };
- vfs_utimes(path, t);
+ vfs_utimes(path, t, 0);
}
static __initdata LIST_HEAD(dir_list);
--
2.53.0
* [PATCH 2/6] btrfs: add provenance time (ptime) support
From: Sean Smith @ 2026-04-05 19:49 UTC
To: linux-fsdevel
Cc: linux-ext4, linux-btrfs, tytso, dsterba, david, brauner, osandov,
almaz, hirofumi, linkinjeon, Sean Smith
In-Reply-To: <20260405195007.1306-1-DefendTheDisabled@gmail.com>
Store ptime as a dedicated field in btrfs_inode_item reserved space:
struct btrfs_timespec (12 bytes) + __le32 pad (4 bytes) = 16 bytes,
consuming 2 of 4 reserved __le64 slots, leaving 2 free.
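One way to picture the layout described above is shown below. It assumes that the `reserved[4]` area is treated as a 32-byte region in host byte order; the real structures use packed `__le` fields, and the exact field placement in btrfs_tree.h is not shown here. The struct names are hypothetical. The static assertions check the 12 + 4 = 16 byte arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors struct btrfs_timespec: packed 64-bit sec + 32-bit nsec. */
struct __attribute__((packed)) ts_on_disk {
	uint64_t sec;
	uint32_t nsec;
};

/*
 * Hypothetical view of the former reserved[4] region after this patch:
 * ptime plus a 4-byte pad fill the first two 64-bit slots, and two
 * slots remain reserved.
 */
struct __attribute__((packed)) inode_item_tail {
	struct ts_on_disk ptime;	/* 12 bytes */
	uint32_t ptime_pad;		/* 4 bytes */
	uint64_t reserved[2];		/* 16 bytes still free */
};

_Static_assert(sizeof(struct ts_on_disk) == 12, "btrfs_timespec is 12 bytes");
_Static_assert(offsetof(struct inode_item_tail, reserved) == 2 * sizeof(uint64_t),
	       "ptime + pad consume exactly two reserved 64-bit slots");
_Static_assert(sizeof(struct inode_item_tail) == 4 * sizeof(uint64_t),
	       "the tail still fits the original reserved[4]");
```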
In-memory: i_ptime_sec/i_ptime_nsec in struct btrfs_inode.
Persistence: delayed-inode read/write path (the primary persistence
path for normal inodes, not fill_inode_item).
Tree-log: ptime written to log tree for fsync crash recovery.
New inode: initialized to zero (ptime unset).
Getattr reports ptime only when it is non-zero, so STATX_PTIME in the
result mask means "ptime is set", not merely "ptime is supported"; a
ptime of exactly zero is treated as unset. Setattr accepts ATTR_PTIME
and sets BTRFS_FEATURE_COMPAT_RO_PTIME, so old kernels see an unknown
compat_ro bit and refuse a RW mount, protecting ptime data.
Rename-over preservation: when rename(source, target) replaces an
existing regular file, and the source has no ptime (ptime == 0) while
the target has one, the target's ptime is copied to the source.
Guards: both inodes S_ISREG, source nlink == 1, and neither
RENAME_EXCHANGE nor RENAME_WHITEOUT. The update is atomic with the
rename transaction. This enables atomic-save survival (write-temp +
rename).
Signed-off-by: Sean Smith <DefendTheDisabled@gmail.com>
---
fs/btrfs/btrfs_inode.h | 4 ++++
fs/btrfs/delayed-inode.c | 4 ++++
fs/btrfs/fs.h | 3 ++-
fs/btrfs/inode.c | 42 +++++++++++++++++++++++++++++++++
fs/btrfs/tree-log.c | 2 ++
include/uapi/linux/btrfs.h | 1 +
include/uapi/linux/btrfs_tree.h | 4 +++-
7 files changed, 58 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 73602ee8d..bac92f766 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -334,6 +334,10 @@ struct btrfs_inode {
u64 i_otime_sec;
u32 i_otime_nsec;
+ /* Provenance time - original creation date of file content. */
+ u64 i_ptime_sec;
+ u32 i_ptime_nsec;
+
/* Hook into fs_info->delayed_iputs */
struct list_head delayed_iput;
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 7e3d294a6..649de7c29 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1887,6 +1887,8 @@ static void fill_stack_inode_item(struct btrfs_trans_handle *trans,
btrfs_set_stack_timespec_sec(&inode_item->otime, inode->i_otime_sec);
btrfs_set_stack_timespec_nsec(&inode_item->otime, inode->i_otime_nsec);
+ btrfs_set_stack_timespec_sec(&inode_item->ptime, inode->i_ptime_sec);
+ btrfs_set_stack_timespec_nsec(&inode_item->ptime, inode->i_ptime_nsec);
}
int btrfs_fill_inode(struct btrfs_inode *inode, u32 *rdev)
@@ -1935,6 +1937,8 @@ int btrfs_fill_inode(struct btrfs_inode *inode, u32 *rdev)
inode->i_otime_sec = btrfs_stack_timespec_sec(&inode_item->otime);
inode->i_otime_nsec = btrfs_stack_timespec_nsec(&inode_item->otime);
+ inode->i_ptime_sec = btrfs_stack_timespec_sec(&inode_item->ptime);
+ inode->i_ptime_nsec = btrfs_stack_timespec_nsec(&inode_item->ptime);
vfs_inode->i_generation = inode->generation;
if (S_ISDIR(vfs_inode->i_mode))
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 8ffbc40eb..7c8105ecf 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -284,7 +284,8 @@ enum {
(BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE | \
BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID | \
BTRFS_FEATURE_COMPAT_RO_VERITY | \
- BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE)
+ BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE | \
+ BTRFS_FEATURE_COMPAT_RO_PTIME)
#define BTRFS_FEATURE_COMPAT_RO_SAFE_SET 0ULL
#define BTRFS_FEATURE_COMPAT_RO_SAFE_CLEAR 0ULL
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 13f1f3b52..dce80561a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4029,6 +4029,8 @@ static int btrfs_read_locked_inode(struct btrfs_inode *inode, struct btrfs_path
inode->i_otime_sec = btrfs_timespec_sec(leaf, &inode_item->otime);
inode->i_otime_nsec = btrfs_timespec_nsec(leaf, &inode_item->otime);
+ inode->i_ptime_sec = btrfs_timespec_sec(leaf, &inode_item->ptime);
+ inode->i_ptime_nsec = btrfs_timespec_nsec(leaf, &inode_item->ptime);
inode_set_bytes(vfs_inode, btrfs_inode_nbytes(leaf, inode_item));
inode->generation = btrfs_inode_generation(leaf, inode_item);
@@ -4220,6 +4222,8 @@ static void fill_inode_item(struct btrfs_trans_handle *trans,
btrfs_set_timespec_sec(leaf, &item->otime, BTRFS_I(inode)->i_otime_sec);
btrfs_set_timespec_nsec(leaf, &item->otime, BTRFS_I(inode)->i_otime_nsec);
+ btrfs_set_timespec_sec(leaf, &item->ptime, BTRFS_I(inode)->i_ptime_sec);
+ btrfs_set_timespec_nsec(leaf, &item->ptime, BTRFS_I(inode)->i_ptime_nsec);
btrfs_set_inode_nbytes(leaf, item, inode_get_bytes(inode));
btrfs_set_inode_generation(leaf, item, BTRFS_I(inode)->generation);
@@ -5424,6 +5428,12 @@ static int btrfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
}
if (attr->ia_valid) {
+ if (attr->ia_valid & ATTR_PTIME) {
+ BTRFS_I(inode)->i_ptime_sec = attr->ia_ptime.tv_sec;
+ BTRFS_I(inode)->i_ptime_nsec = attr->ia_ptime.tv_nsec;
+ btrfs_set_fs_compat_ro(BTRFS_I(inode)->root->fs_info, PTIME);
+ }
+
setattr_copy(idmap, inode, attr);
inode_inc_iversion(inode);
ret = btrfs_dirty_inode(BTRFS_I(inode));
@@ -8007,6 +8017,8 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
ei->i_otime_sec = 0;
ei->i_otime_nsec = 0;
+ ei->i_ptime_sec = 0;
+ ei->i_ptime_nsec = 0;
inode = &ei->vfs_inode;
btrfs_extent_map_tree_init(&ei->extent_tree);
@@ -8159,6 +8171,14 @@ static int btrfs_getattr(struct mnt_idmap *idmap,
u32 bi_ro_flags = BTRFS_I(inode)->ro_flags;
stat->result_mask |= STATX_BTIME;
+ if (request_mask & STATX_PTIME) {
+ if (BTRFS_I(inode)->i_ptime_sec ||
+ BTRFS_I(inode)->i_ptime_nsec) {
+ stat->ptime.tv_sec = BTRFS_I(inode)->i_ptime_sec;
+ stat->ptime.tv_nsec = BTRFS_I(inode)->i_ptime_nsec;
+ stat->result_mask |= STATX_PTIME;
+ }
+ }
stat->btime.tv_sec = BTRFS_I(inode)->i_otime_sec;
stat->btime.tv_nsec = BTRFS_I(inode)->i_otime_nsec;
if (bi_flags & BTRFS_INODE_APPEND)
@@ -8675,6 +8695,28 @@ static int btrfs_rename(struct mnt_idmap *idmap,
btrfs_abort_transaction(trans, ret);
goto out_fail;
}
+ /*
+ * ptime rename-over preservation: if a file with no ptime
+ * is being renamed over a file that has ptime (the atomic
+ * save pattern: write-to-temp + rename over original),
+ * inherit the target's ptime so provenance survives.
+ */
+ if (new_inode && S_ISREG(old_inode->i_mode) &&
+ S_ISREG(new_inode->i_mode) && old_inode->i_nlink == 1 &&
+ !(flags & (RENAME_EXCHANGE | RENAME_WHITEOUT))) {
+ struct btrfs_inode *old_bi = BTRFS_I(old_inode);
+ struct btrfs_inode *new_bi = BTRFS_I(new_inode);
+ if (!old_bi->i_ptime_sec && !old_bi->i_ptime_nsec &&
+ (new_bi->i_ptime_sec || new_bi->i_ptime_nsec)) {
+ old_bi->i_ptime_sec = new_bi->i_ptime_sec;
+ old_bi->i_ptime_nsec = new_bi->i_ptime_nsec;
+ }
+ }
+ /* Note: if rename fails below, ptime mutation is harmless —
+ * the source file keeps its previous ptime=0 semantics since
+ * the rename didn't complete. The in-memory value will be
+ * overwritten on next inode read from disk. */
+
ret = btrfs_update_inode(trans, BTRFS_I(old_inode));
if (unlikely(ret)) {
btrfs_abort_transaction(trans, ret);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 6c40f48cc..7ed09af22 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4640,6 +4640,8 @@ static void fill_inode_item(struct btrfs_trans_handle *trans,
btrfs_set_timespec_sec(leaf, &item->otime, BTRFS_I(inode)->i_otime_sec);
btrfs_set_timespec_nsec(leaf, &item->otime, BTRFS_I(inode)->i_otime_nsec);
+ btrfs_set_timespec_sec(leaf, &item->ptime, BTRFS_I(inode)->i_ptime_sec);
+ btrfs_set_timespec_nsec(leaf, &item->ptime, BTRFS_I(inode)->i_ptime_nsec);
/*
* We do not need to set the nbytes field, in fact during a fast fsync
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index e8fd92789..d2c542425 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -313,6 +313,7 @@ struct btrfs_ioctl_fs_info_args {
* reducing mount time for large filesystem due to better locality.
*/
#define BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE (1ULL << 3)
+#define BTRFS_FEATURE_COMPAT_RO_PTIME (1ULL << 4)
#define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF (1ULL << 0)
#define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL (1ULL << 1)
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index fc29d2738..719c00363 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -890,7 +890,9 @@ struct btrfs_inode_item {
* a little future expansion, for more than this we can
* just grow the inode item and version it
*/
- __le64 reserved[4];
+ struct btrfs_timespec ptime;
+ __le32 __reserved_pad;
+ __le64 reserved[2];
struct btrfs_timespec atime;
struct btrfs_timespec ctime;
struct btrfs_timespec mtime;
--
2.53.0
^ permalink raw reply related
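The btrfs_tree.h hunk above takes ptime out of the inode item's 32-byte reserved area. struct btrfs_inode_item and struct btrfs_timespec (an __le64 seconds field followed by an __le32 nanoseconds field) are both packed, so the replacement fields must add up to exactly 32 bytes: 12 + 4 + 2 * 8. Below is a compile-time sketch of that check. It uses hypothetical mirror structs with userspace fixed-width types, not the real UAPI headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct btrfs_timespec (packed: 8 + 4 bytes). */
struct model_btrfs_timespec {
	uint64_t sec;
	uint32_t nsec;
} __attribute__((__packed__));

/* The reserved tail of the inode item before the patch. */
struct model_tail_old {
	uint64_t reserved[4];
} __attribute__((__packed__));

/* The same tail after the patch. */
struct model_tail_new {
	struct model_btrfs_timespec ptime;
	uint32_t reserved_pad;
	uint64_t reserved[2];
} __attribute__((__packed__));

/* If either check failed, every field after the tail (atime, ctime, ...)
 * would shift and existing on-disk inode items would be misread. */
_Static_assert(sizeof(struct model_btrfs_timespec) == 12,
	       "btrfs_timespec is 12 bytes");
_Static_assert(sizeof(struct model_tail_new) == sizeof(struct model_tail_old),
	       "ptime must fit exactly in the old reserved area");
```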
* [PATCH 3/6] ntfs3: map ptime to NTFS creation time with rename-over
From: Sean Smith @ 2026-04-05 19:49 UTC (permalink / raw)
To: linux-fsdevel
Cc: linux-ext4, linux-btrfs, tytso, dsterba, david, brauner, osandov,
almaz, hirofumi, linkinjeon, Sean Smith
In-Reply-To: <20260405195007.1306-1-DefendTheDisabled@gmail.com>
Map ptime to the NTFS Date Created field in $STANDARD_INFORMATION.
This is a mapped-ptime implementation: setting ptime overwrites the
creation time. Justified because Windows treats NTFS creation time
as mutable via SetFileTime() - it was never truly immutable.
Getattr: report NTFS creation time as ptime.
Setattr: write ptime to NTFS creation time via frecord cr_time path.
Rename-over: save target creation time before unlink, restore to
source after rename. Replicates Windows behavior where creation
time survives application atomic saves.
Round-trip: NTFS Date Created -> Btrfs ptime -> NTFS Date Created
preserves the original creation date through cross-FS copies.
Signed-off-by: Sean Smith <DefendTheDisabled@gmail.com>
---
fs/ntfs3/file.c | 13 +++++++++++++
fs/ntfs3/frecord.c | 8 ++++++++
fs/ntfs3/namei.c | 14 ++++++++++++++
3 files changed, 35 insertions(+)
diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
index 13d014b87..8688a48b1 100644
--- a/fs/ntfs3/file.c
+++ b/fs/ntfs3/file.c
@@ -161,6 +161,13 @@ int ntfs_getattr(struct mnt_idmap *idmap, const struct path *path,
stat->result_mask |= STATX_BTIME;
stat->btime = ni->i_crtime;
+
+ /* Map NTFS creation time to ptime (provenance time) */
+ if (request_mask & STATX_PTIME) {
+ stat->ptime = ni->i_crtime;
+ stat->result_mask |= STATX_PTIME;
+ }
+
stat->blksize = ni->mi.sbi->cluster_size; /* 512, 1K, ..., 2M */
if (inode->i_flags & S_IMMUTABLE)
@@ -857,6 +864,12 @@ int ntfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
i_size_write(inode, newsize);
}
+ /* Accept ptime and store as NTFS creation time */
+ if (ia_valid & ATTR_PTIME) {
+ ni->i_crtime = attr->ia_ptime;
+ ni->ni_flags |= NI_FLAG_UPDATE_PARENT;
+ }
+
setattr_copy(idmap, inode, attr);
if (mode != inode->i_mode) {
diff --git a/fs/ntfs3/frecord.c b/fs/ntfs3/frecord.c
index d5bbd47e1..b164b2f50 100644
--- a/fs/ntfs3/frecord.c
+++ b/fs/ntfs3/frecord.c
@@ -3197,6 +3197,14 @@ int ni_write_inode(struct inode *inode, int sync, const char *hint)
modified = true;
}
+ /* Write creation time (ptime maps to NTFS cr_time) */
+ ts = ni->i_crtime;
+ dup.cr_time = kernel2nt(&ts);
+ if (std->cr_time != dup.cr_time) {
+ std->cr_time = dup.cr_time;
+ modified = true;
+ }
+
dup.fa = ni->std_fa;
if (std->fa != dup.fa) {
std->fa = dup.fa;
diff --git a/fs/ntfs3/namei.c b/fs/ntfs3/namei.c
index b2af8f695..40d06884f 100644
--- a/fs/ntfs3/namei.c
+++ b/fs/ntfs3/namei.c
@@ -292,6 +292,16 @@ static int ntfs_rename(struct mnt_idmap *idmap, struct inode *dir,
return -EINVAL;
}
+ /* ptime rename-over: save target creation time before unlink */
+ struct timespec64 saved_crtime = {};
+ bool inherit_crtime = false;
+
+ if (new_inode && S_ISREG(inode->i_mode) &&
+ S_ISREG(new_inode->i_mode) && inode->i_nlink == 1) {
+ saved_crtime = ntfs_i(new_inode)->i_crtime;
+ inherit_crtime = true;
+ }
+
if (new_inode) {
/* Target name exists. Unlink it. */
dget(new_dentry);
@@ -330,6 +340,10 @@ static int ntfs_rename(struct mnt_idmap *idmap, struct inode *dir,
err = ni_rename(dir_ni, new_dir_ni, ni, de, new_de);
if (!err) {
+ /* ptime rename-over: inherit target creation time */
+ if (inherit_crtime)
+ ni->i_crtime = saved_crtime;
+
simple_rename_timestamp(dir, dentry, new_dir, new_dentry);
mark_inode_dirty(inode);
mark_inode_dirty(dir);
--
2.53.0
^ permalink raw reply related
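The round-trip claim in patch 3/6 depends on how NTFS stores time: a 64-bit count of 100 ns intervals since 1601-01-01 UTC, compared with the kernel's struct timespec64. The sketch below is a standalone userspace model of that conversion. Its arithmetic is meant to match what ntfs3's kernel2nt() computes, but it is not driver code, and the names (`ts64`, `ts_to_nt`, `nt_to_ts`) are hypothetical. When you start from an NTFS value, the conversion is exact in both directions. When you start from a Linux timestamp, anything below 100 ns is dropped.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds between 1601-01-01 and 1970-01-01 (UTC). */
#define NT_EPOCH_OFFSET_SECS 11644473600LL
/* NTFS timestamps count 100 ns ticks. */
#define NT_TICKS_PER_SEC 10000000LL

/* Hypothetical stand-in for struct timespec64. */
struct ts64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Unix time -> NTFS tick count; sub-100 ns precision is truncated. */
static uint64_t ts_to_nt(struct ts64 ts)
{
	return (uint64_t)(ts.tv_sec + NT_EPOCH_OFFSET_SECS) * NT_TICKS_PER_SEC +
	       (uint64_t)(ts.tv_nsec / 100);
}

/* NTFS tick count -> Unix time (exact). */
static struct ts64 nt_to_ts(uint64_t nt)
{
	struct ts64 ts;

	ts.tv_sec = (int64_t)(nt / NT_TICKS_PER_SEC) - NT_EPOCH_OFFSET_SECS;
	ts.tv_nsec = (long)(nt % NT_TICKS_PER_SEC) * 100;
	return ts;
}
```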
* [PATCH 4/6] ext4: add dedicated ptime field alongside i_crtime
From: Sean Smith @ 2026-04-05 19:50 UTC (permalink / raw)
To: linux-fsdevel
Cc: linux-ext4, linux-btrfs, tytso, dsterba, david, brauner, osandov,
almaz, hirofumi, linkinjeon, Sean Smith
In-Reply-To: <20260405195007.1306-1-DefendTheDisabled@gmail.com>
Add i_ptime (__le32) and i_ptime_extra (__le32) to the ext4 on-disk
inode structure after i_projid. Total: 8 bytes in the extended inode
area. i_crtime remains untouched as immutable birth time.
This is a native-ptime implementation: ptime and btime are separate
fields. On 256-byte inodes (modern default), both fit easily. On
128-byte inodes, ptime is silently unavailable (same graceful
degradation as i_crtime via EXT4_FITS_IN_INODE).
Uses existing EXT4_EINODE_GET_XTIME/SET_XTIME macros for read/write.
Rename-over: when a file with ptime=0 replaces a file with ptime set,
inherit target ptime (same zero-sentinel logic as Btrfs).
Signed-off-by: Sean Smith <DefendTheDisabled@gmail.com>
---
fs/ext4/ext4.h | 3 +++
fs/ext4/inode.c | 14 ++++++++++++++
fs/ext4/namei.c | 13 +++++++++++++
3 files changed, 30 insertions(+)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index f1c476303..5c2812637 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -860,6 +860,8 @@ struct ext4_inode {
__le32 i_crtime_extra; /* extra FileCreationtime (nsec << 2 | epoch) */
__le32 i_version_hi; /* high 32 bits for 64-bit version */
__le32 i_projid; /* Project ID */
+ __le32 i_ptime; /* Provenance time */
+ __le32 i_ptime_extra; /* extra Provenance time (nsec << 2 | epoch) */
};
#define EXT4_EPOCH_BITS 2
@@ -1136,6 +1138,7 @@ struct ext4_inode_info {
* struct timespec64 i_{a,c,m}time in the generic inode.
*/
struct timespec64 i_crtime;
+ struct timespec64 i_ptime;
/* mballoc */
atomic_t i_prealloc_active;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 625cfbf61..15b6b6dc6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4753,6 +4753,7 @@ static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode
EXT4_INODE_SET_MTIME(inode, raw_inode);
EXT4_INODE_SET_ATIME(inode, raw_inode);
EXT4_EINODE_SET_XTIME(i_crtime, ei, raw_inode);
+ EXT4_EINODE_SET_XTIME(i_ptime, ei, raw_inode);
raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
raw_inode->i_flags = cpu_to_le32(ei->i_flags & 0xFFFFFFFF);
@@ -5409,6 +5410,7 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
EXT4_INODE_GET_ATIME(inode, raw_inode);
EXT4_INODE_GET_MTIME(inode, raw_inode);
EXT4_EINODE_GET_XTIME(i_crtime, ei, raw_inode);
+ EXT4_EINODE_GET_XTIME(i_ptime, ei, raw_inode);
if (likely(!test_opt2(inode->i_sb, HURD_COMPAT))) {
u64 ivers = le32_to_cpu(raw_inode->i_disk_version);
@@ -6061,6 +6063,9 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
if (!error) {
if (inc_ivers)
inode_inc_iversion(inode);
+ if (attr->ia_valid & ATTR_PTIME)
+ EXT4_I(inode)->i_ptime = attr->ia_ptime;
+
setattr_copy(idmap, inode, attr);
mark_inode_dirty(inode);
}
@@ -6114,6 +6119,15 @@ int ext4_getattr(struct mnt_idmap *idmap, const struct path *path,
stat->btime.tv_nsec = ei->i_crtime.tv_nsec;
}
+ /* Report ptime from dedicated field, not crtime */
+ if ((request_mask & STATX_PTIME) &&
+ EXT4_FITS_IN_INODE(raw_inode, ei, i_ptime) &&
+ (ei->i_ptime.tv_sec || ei->i_ptime.tv_nsec)) {
+ stat->result_mask |= STATX_PTIME;
+ stat->ptime.tv_sec = ei->i_ptime.tv_sec;
+ stat->ptime.tv_nsec = ei->i_ptime.tv_nsec;
+ }
+
/*
* Return the DIO alignment restrictions if requested. We only return
* this information when requested, since on encrypted files it might
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index c4b5e252a..1bfe4df24 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3942,6 +3942,19 @@ static int ext4_rename(struct mnt_idmap *idmap, struct inode *old_dir,
* rename.
*/
inode_set_ctime_current(old.inode);
+
+ /* ptime rename-over: preserve ptime across atomic saves */
+ if (new.inode && S_ISREG(old.inode->i_mode) &&
+ S_ISREG(new.inode->i_mode) && old.inode->i_nlink == 1 &&
+ !(flags & RENAME_WHITEOUT)) {
+ struct ext4_inode_info *old_ei = EXT4_I(old.inode);
+ struct ext4_inode_info *new_ei = EXT4_I(new.inode);
+
+ if (!old_ei->i_ptime.tv_sec && !old_ei->i_ptime.tv_nsec &&
+ (new_ei->i_ptime.tv_sec || new_ei->i_ptime.tv_nsec))
+ old_ei->i_ptime = new_ei->i_ptime;
+ }
+
retval = ext4_mark_inode_dirty(handle, old.inode);
if (unlikely(retval))
goto end_rename;
--
2.53.0
^ permalink raw reply related
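Patch 4/6 stores ptime the same way ext4 already stores i_crtime. A 32-bit seconds word is paired with an `_extra` word that packs nanoseconds with two epoch bits (`nsec << 2 | epoch`). The epoch bits extend the signed 32-bit seconds past 2038. Below is a standalone model of that encoding. It is written from that layout and from how ext4 decodes its extra timestamps. The function names are hypothetical; this is not the EXT4_EINODE_* macros themselves.

```c
#include <assert.h>
#include <stdint.h>

#define EPOCH_BITS 2
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1)

/* Split 64-bit seconds + nsec into the on-disk (lo, extra) pair. */
static void encode_xtime(int64_t sec, uint32_t nsec,
			 uint32_t *lo, uint32_t *extra)
{
	/* What the 32-bit field reads back as when treated as signed. */
	int64_t base = (int32_t)(uint32_t)sec;

	*lo = (uint32_t)sec;
	*extra = (uint32_t)(((sec - base) >> 32) & EPOCH_MASK) |
		 (nsec << EPOCH_BITS);
}

/* Rebuild seconds and nsec from the on-disk (lo, extra) pair. */
static void decode_xtime(uint32_t lo, uint32_t extra,
			 int64_t *sec, uint32_t *nsec)
{
	*sec = (int32_t)lo + ((int64_t)(extra & EPOCH_MASK) << 32);
	*nsec = extra >> EPOCH_BITS;
}
```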
* [PATCH 5/6] fat: map ptime to FAT creation time with rename-over
From: Sean Smith @ 2026-04-05 19:50 UTC (permalink / raw)
To: linux-fsdevel
Cc: linux-ext4, linux-btrfs, tytso, dsterba, david, brauner, osandov,
almaz, hirofumi, linkinjeon, Sean Smith
In-Reply-To: <20260405195007.1306-1-DefendTheDisabled@gmail.com>
Map ptime to the FAT/VFAT creation time field. Only active on VFAT
(long filename) mounts, since the msdos driver does not use the
creation time field. FAT creation time has 10 ms precision (a
2-second time word plus the ctime_cs centisecond byte).
Getattr: report creation time as ptime (VFAT only, via isvfat check).
Setattr: write ptime to i_crtime.
Rename-over: save target creation time before detach, restore to
source after attach. Preserves creation time across atomic saves.
Signed-off-by: Sean Smith <DefendTheDisabled@gmail.com>
---
fs/fat/file.c | 6 ++++++
fs/fat/namei_vfat.c | 20 ++++++++++++++++++--
2 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/fs/fat/file.c b/fs/fat/file.c
index 4fc49a614..9d1fcc554 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -413,6 +413,10 @@ int fat_getattr(struct mnt_idmap *idmap, const struct path *path,
stat->result_mask |= STATX_BTIME;
stat->btime = MSDOS_I(inode)->i_crtime;
}
+ if (sbi->options.isvfat && (request_mask & STATX_PTIME)) {
+ stat->result_mask |= STATX_PTIME;
+ stat->ptime = MSDOS_I(inode)->i_crtime;
+ }
return 0;
}
@@ -564,6 +568,8 @@ int fat_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
fat_truncate_time(inode, &attr->ia_mtime, S_MTIME);
attr->ia_valid &= ~(ATTR_ATIME|ATTR_CTIME|ATTR_MTIME);
+ if (attr->ia_valid & ATTR_PTIME)
+ MSDOS_I(inode)->i_crtime = attr->ia_ptime;
setattr_copy(idmap, inode, attr);
mark_inode_dirty(inode);
out:
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 47ff083cf..f1e2eadf8 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -975,8 +975,24 @@ static int vfat_rename(struct inode *old_dir, struct dentry *old_dentry,
}
inode_inc_iversion(new_dir);
- fat_detach(old_inode);
- fat_attach(old_inode, new_i_pos);
+ /* ptime rename-over: save target creation time */
+ {
+ struct timespec64 saved_crtime = {};
+ bool inherit_crtime = false;
+
+ if (new_inode && S_ISREG(old_inode->i_mode) &&
+ S_ISREG(new_inode->i_mode) && old_inode->i_nlink == 1) {
+ saved_crtime = MSDOS_I(new_inode)->i_crtime;
+ inherit_crtime = true;
+ }
+
+ fat_detach(old_inode);
+ fat_attach(old_inode, new_i_pos);
+
+ if (inherit_crtime)
+ MSDOS_I(old_inode)->i_crtime = saved_crtime;
+ }
+
err = vfat_sync_ipos(new_dir, old_inode);
if (err)
goto error_inode;
--
2.53.0
^ permalink raw reply related
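For reference, a FAT directory entry stores creation time in three parts:

- a DOS time word (hour, minute, seconds / 2);
- a DOS date word (years since 1980, month, day);
- a ctime_cs byte (0-199, in 10 ms units) that refines the 2-second time word.

The sketch below packs a broken-down UTC time into those fields and recovers the seconds. The helper names are hypothetical. The timezone handling that the fat driver applies is omitted.

```c
#include <assert.h>
#include <stdint.h>

struct fat_ctime {
	uint16_t time;	/* hour:5 | minute:6 | (sec / 2):5 */
	uint16_t date;	/* (year - 1980):7 | month:4 | day:5 */
	uint8_t cs;	/* 0-199, 10 ms units on top of the 2 s word */
};

/* Pack a UTC time (year >= 1980); msec is truncated to 10 ms. */
static struct fat_ctime fat_pack(int year, int mon, int day, int hour,
				 int min, int sec, int msec)
{
	struct fat_ctime c;

	c.time = (uint16_t)((hour << 11) | (min << 5) | (sec / 2));
	c.date = (uint16_t)(((year - 1980) << 9) | (mon << 5) | day);
	/* An odd second contributes 100 centiseconds. */
	c.cs = (uint8_t)((sec % 2) * 100 + msec / 10);
	return c;
}

/* Recover seconds and milliseconds within the minute. */
static void fat_unpack_secs(struct fat_ctime c, int *sec, int *msec)
{
	*sec = (c.time & 0x1f) * 2 + c.cs / 100;
	*msec = (c.cs % 100) * 10;
}
```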
* [PATCH 6/6] exfat: map ptime to exFAT creation time with rename-over
From: Sean Smith @ 2026-04-05 19:50 UTC (permalink / raw)
To: linux-fsdevel
Cc: linux-ext4, linux-btrfs, tytso, dsterba, david, brauner, osandov,
almaz, hirofumi, linkinjeon, Sean Smith
In-Reply-To: <20260405195007.1306-1-DefendTheDisabled@gmail.com>
Map ptime to the exFAT creation time field. exFAT creation time
has 10ms precision.
Getattr: report creation time as ptime.
Setattr: write ptime to i_crtime.
Rename-over: save target creation time before __exfat_rename, restore
after. Preserves creation time across atomic saves.
Signed-off-by: Sean Smith <DefendTheDisabled@gmail.com>
---
fs/btrfs/inode.c | 3 ++-
fs/exfat/file.c | 9 +++++++++
fs/exfat/namei.c | 21 ++++++++++++++++++---
3 files changed, 29 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index dce80561a..918dfd4c5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8715,7 +8715,8 @@ static int btrfs_rename(struct mnt_idmap *idmap,
/* Note: if rename fails below, ptime mutation is harmless —
* the source file keeps its previous ptime=0 semantics since
* the rename didn't complete. The in-memory value will be
- * overwritten on next inode read from disk. */
+ * overwritten on next inode read from disk.
+ */
ret = btrfs_update_inode(trans, BTRFS_I(old_inode));
if (unlikely(ret)) {
diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index 536c8078f..b6438bd79 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -277,6 +277,11 @@ int exfat_getattr(struct mnt_idmap *idmap, const struct path *path,
stat->result_mask |= STATX_BTIME;
stat->btime.tv_sec = ei->i_crtime.tv_sec;
stat->btime.tv_nsec = ei->i_crtime.tv_nsec;
+ if (request_mask & STATX_PTIME) {
+ stat->result_mask |= STATX_PTIME;
+ stat->ptime.tv_sec = ei->i_crtime.tv_sec;
+ stat->ptime.tv_nsec = ei->i_crtime.tv_nsec;
+ }
stat->blksize = EXFAT_SB(inode->i_sb)->cluster_size;
return 0;
}
@@ -337,6 +342,10 @@ int exfat_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
if (attr->ia_valid & ATTR_SIZE)
inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
+ if (attr->ia_valid & ATTR_PTIME) {
+ struct exfat_inode_info *exi = EXFAT_I(inode);
+ exi->i_crtime = attr->ia_ptime;
+ }
setattr_copy(idmap, inode, attr);
exfat_truncate_inode_atime(inode);
diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
index dfe957493..9c0b59e00 100644
--- a/fs/exfat/namei.c
+++ b/fs/exfat/namei.c
@@ -1262,9 +1262,24 @@ static int exfat_rename(struct mnt_idmap *idmap,
old_inode = old_dentry->d_inode;
new_inode = new_dentry->d_inode;
- err = __exfat_rename(old_dir, EXFAT_I(old_inode), new_dir, new_dentry);
- if (err)
- goto unlock;
+ /* ptime rename-over: save target creation time */
+ {
+ struct timespec64 saved_crtime = {};
+ bool inherit_crtime = false;
+
+ if (new_inode && S_ISREG(old_inode->i_mode) &&
+ S_ISREG(new_inode->i_mode) && old_inode->i_nlink == 1) {
+ saved_crtime = EXFAT_I(new_inode)->i_crtime;
+ inherit_crtime = true;
+ }
+
+ err = __exfat_rename(old_dir, EXFAT_I(old_inode), new_dir, new_dentry);
+ if (err)
+ goto unlock;
+
+ if (inherit_crtime)
+ EXFAT_I(old_inode)->i_crtime = saved_crtime;
+ }
inode_inc_iversion(new_dir);
simple_rename_timestamp(old_dir, old_dentry, new_dir, new_dentry);
--
2.53.0
^ permalink raw reply related
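Because exFAT keeps creation time at 10 ms granularity, a ptime written through this path reads back truncated. Below is a minimal model of that truncation on a timespec-like value. The names are hypothetical. The driver's own conversion also handles the odd-second/centisecond split and the timezone offset, which this sketch ignores.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_10MS 10000000L

/* Hypothetical stand-in for struct timespec64. */
struct ts64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Drop everything below 10 ms, as a 10 ms on-disk field would. */
static struct ts64 trunc_10ms(struct ts64 ts)
{
	ts.tv_nsec -= ts.tv_nsec % NSEC_PER_10MS;
	return ts;
}

/* Nonzero if ts survives a 10 ms field unchanged. */
static int survives_10ms(struct ts64 ts)
{
	struct ts64 t = trunc_10ms(ts);

	return t.tv_sec == ts.tv_sec && t.tv_nsec == ts.tv_nsec;
}
```

For example, a ptime carried over from NTFS (100 ns resolution) generally will not compare equal after a pass through exFAT. A verification step would therefore need to compare at 10 ms granularity.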
* Re: [RFC PATCH v1 0/6] provenance_time (ptime): a new settable timestamp for cross-filesystem provenance
From: Theodore Tso @ 2026-04-05 22:54 UTC (permalink / raw)
To: Sean Smith
Cc: linux-fsdevel, linux-ext4, linux-btrfs, dsterba, david, brauner,
osandov, almaz, hirofumi, linkinjeon
In-Reply-To: <20260405195007.1306-1-DefendTheDisabled@gmail.com>
On Sun, Apr 05, 2026 at 02:49:56PM -0500, Sean Smith wrote:
>
> 1. Application atomic saves destroy xattrs. Programs that save
> via write-to-temp + rename() replace the inode, permanently
> destroying all extended attributes. Only the VFS sees both
> inodes during rename -- no userspace mechanism can intercept
> this and copy metadata across.
The VFS could potentially copy the xattr on a rename, no?
> 2. Every tool in the copy chain must explicitly opt in to xattr
> preservation. cp requires --preserve=xattr, rsync requires -X,
> tar requires --xattrs. Each missing flag causes silent data
> loss. Transparent preservation through arbitrary tool flows
> is not achievable in userspace.
But this is true for your proposed ptime as well. You have to change
every single tool to copy over the ptime. Worse, you have to change
the format of tar in a non-standard on-disk format change to support
this new ptime timestamp. And rsync will require a non-standard
protocol change to support the new timestamp.
> Atomic saves are the default behavior of mainstream applications
> (LibreOffice, Vim, Kate, etc.).
You will also have to change mainstream applications to copy ptime
from the original file to the file.new before the atomic rename.
Using ptime doesn't change this. So you will need to make this
non-standard, Linux-specific change to all of these mainstream
applications.
Is it worth it? It's a huge amount of cost being spread across a very
large part of the open source ecosystem for just this fairly narrow use
case. Personally, I'm not convinced it's worth the effort.
- Ted
^ permalink raw reply
* [PATCH v2] ext4: fix missing brelse() in ext4_xattr_inode_dec_ref_all()
From: skoyama.kernel @ 2026-04-06 7:48 UTC (permalink / raw)
To: linux-ext4
Cc: tytso, adilger.kernel, libaokun, jack, ojaswin, ritesh.list,
yi.zhang, bhupesh, Sohei Koyama, Andreas Dilger, stable
From: Sohei Koyama <skoyama@ddn.com>
The commit c8e008b60492 ("ext4: ignore xattrs past end")
introduced a refcount leak when block_csum is false.
ext4_xattr_inode_dec_ref_all() calls ext4_get_inode_loc() to
get iloc.bh, but never releases it with brelse().
Fixes: c8e008b60492 ("ext4: ignore xattrs past end")
Signed-off-by: Sohei Koyama <skoyama@ddn.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: stable@vger.kernel.org
---
fs/ext4/xattr.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 7bf9ba19a89d..19c72e38fb82 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1165,7 +1165,7 @@ ext4_xattr_inode_dec_ref_all(handle_t *handle, struct inode *parent,
{
struct inode *ea_inode;
struct ext4_xattr_entry *entry;
- struct ext4_iloc iloc;
+ struct ext4_iloc iloc = { .bh = NULL };
bool dirty = false;
unsigned int ea_ino;
int err;
@@ -1260,6 +1260,8 @@ ext4_xattr_inode_dec_ref_all(handle_t *handle, struct inode *parent,
ext4_warning_inode(parent,
"handle dirty metadata err=%d", err);
}
+
+ brelse(iloc.bh);
}
/*
--
2.39.3 (Apple Git-146)
^ permalink raw reply related
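The v2 fix above follows a common kernel pattern. Initialize the handle to NULL, so that one release at the end is safe on every path, including paths that never acquired it; brelse() returns early for a NULL buffer_head. Below is a userspace analogue. The get_buf()/put_buf() helpers are hypothetical stand-ins for ext4_get_inode_loc() and brelse().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_buffers;	/* outstanding "buffer head" references */

static void *get_buf(void)
{
	void *p = malloc(64);

	if (p)
		live_buffers++;
	return p;
}

/* NULL-safe release, like brelse(). */
static void put_buf(void *bh)
{
	if (bh) {
		live_buffers--;
		free(bh);
	}
}

/* Only some paths take the buffer; the release is unconditional. */
static void dec_ref_all(bool need_buf)
{
	void *bh = NULL;	/* like iloc.bh = NULL */

	if (need_buf)
		bh = get_buf();
	/* ... work that may or may not use bh ... */
	put_buf(bh);
}
```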
* Re: [PATCH v2] ext4: fix missing brelse() in ext4_xattr_inode_dec_ref_all()
From: Baokun Li @ 2026-04-06 14:38 UTC (permalink / raw)
To: skoyama.kernel, linux-ext4
Cc: tytso, adilger.kernel, jack, ojaswin, ritesh.list, yi.zhang,
bhupesh, Sohei Koyama, Andreas Dilger, stable
In-Reply-To: <20260406074830.8480-1-skoyama@ddn.com>
On 2026/4/6 15:48, skoyama.kernel@gmail.com wrote:
> From: Sohei Koyama <skoyama@ddn.com>
>
> The commit c8e008b60492 ("ext4: ignore xattrs past end")
> introduced a refcount leak when block_csum is false.
>
> ext4_xattr_inode_dec_ref_all() calls ext4_get_inode_loc() to
> get iloc.bh, but never releases it with brelse().
>
> Fixes: c8e008b60492 ("ext4: ignore xattrs past end")
> Signed-off-by: Sohei Koyama <skoyama@ddn.com>
> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> Cc: stable@vger.kernel.org
Looks good, feel free to add:
Reviewed-by: Baokun Li <libaokun@linux.alibaba.com>
> ---
> fs/ext4/xattr.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index 7bf9ba19a89d..19c72e38fb82 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -1165,7 +1165,7 @@ ext4_xattr_inode_dec_ref_all(handle_t *handle, struct inode *parent,
> {
> struct inode *ea_inode;
> struct ext4_xattr_entry *entry;
> - struct ext4_iloc iloc;
> + struct ext4_iloc iloc = { .bh = NULL };
> bool dirty = false;
> unsigned int ea_ino;
> int err;
> @@ -1260,6 +1260,8 @@ ext4_xattr_inode_dec_ref_all(handle_t *handle, struct inode *parent,
> ext4_warning_inode(parent,
> "handle dirty metadata err=%d", err);
> }
> +
> + brelse(iloc.bh);
> }
>
> /*
^ permalink raw reply
* Re: [RFC PATCH v1 0/6] provenance_time (ptime): a new settable timestamp for cross-filesystem provenance
From: Sean Smith @ 2026-04-07 0:05 UTC (permalink / raw)
To: tytso
Cc: defendthedisabled, linux-fsdevel, linux-ext4, linux-btrfs,
dsterba, david, brauner, osandov, almaz, hirofumi, linkinjeon
In-Reply-To: <20260405225442.GA1763@macsyma-wired.lan>
[written with AI assistance]
On Sun, Apr 05, 2026 at 06:54:42PM -0400, Theodore Tso wrote:
Thanks for the substantive engagement — it helps clarify where
the proposal needs to justify itself.
> On Sun, Apr 05, 2026 at 02:49:56PM -0500, Sean Smith wrote:
> >
> > 1. Application atomic saves destroy xattrs. Programs that save
> > via write-to-temp + rename() replace the inode, permanently
> > destroying all extended attributes. Only the VFS sees both
> > inodes during rename -- no userspace mechanism can intercept
> > this and copy metadata across.
>
> The VFS could potentially copy the xattr on a rename, no?
It could, but even scoping to user.* means adding conditional
xattr-copy logic into every filesystem's rename handler — with
dynamic allocation and xattr tree lookups on a hot path. ptime
avoids this: one inline inode field, clear semantics, same VFS
patterns as atime/mtime/btime.
> > 2. Every tool in the copy chain must explicitly opt in to xattr
> > preservation. cp requires --preserve=xattr, rsync requires -X,
> > tar requires --xattrs. Each missing flag causes silent data
> > loss. Transparent preservation through arbitrary tool flows
> > is not achievable in userspace.
>
> But this is true for your proposed ptime as well. You have to change
> every single tool to copy over the ptime. Worse, you have to change
> the format of tar in a non-standard on-disk format change to support
> this new ptime timestamp. And rsync will require a non-standard
> protocol change to support the new timestamp.
You are right that copy tools require patches. If ptime only
improved the copy-tool situation, I would agree it does not
justify new kernel surface over xattrs.
The structural difference is in the default adoption path.
xattr preservation is permanently per-invocation opt-in: each
tool call needs the correct flag, and the default is to drop
them. A kernel timestamp exposed through statx/utimensat
follows the same API pattern as mtime — standard libraries
and tools naturally evolve to preserve all standard timestamps
by default. ptime has a path to default-preservation that
xattrs structurally cannot reach.
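For comparison, this is roughly how a copy tool preserves today's settable timestamps (atime and mtime), using POSIX stat() and utimensat(). The helper name is hypothetical. Nothing here assumes a ptime interface, because the userspace API for ptime is not shown in this part of the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>	/* AT_FDCWD */
#include <stdio.h>
#include <sys/stat.h>	/* stat(), utimensat() */
#include <time.h>

/* Copy atime and mtime from src to dst, as `cp -p` style tools do. */
static int copy_std_times(const char *src, const char *dst)
{
	struct stat st;
	struct timespec times[2];

	if (stat(src, &st) != 0)
		return -1;
	times[0] = st.st_atim;	/* access time */
	times[1] = st.st_mtim;	/* modification time */
	/* ctime and btime cannot be set this way; the kernel owns them. */
	return utimensat(AT_FDCWD, dst, times, 0);
}
```

On current kernels, statx() reports btime, but no syscall can set it. A settable ptime is meant to fill that gap.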
On the formats: the tar patch uses a vendor-prefixed PAX
header (SCHILY.ptime), backward-compatible — old readers
ignore it cleanly. The rsync patch plugs into the existing
--crtimes machinery that already supports macOS and Cygwin.
> > Atomic saves are the default behavior of mainstream applications
> > (LibreOffice, Vim, Kate, etc.).
>
> You will also have to change mainstream applications to copy ptime
> from the original file to the file.new before the atomic rename.
> Using ptime doesn't change this. So you will need to make this
> non-standard, Linux-specific change to all of these mainstream
> applications.
This is where the cover letter was not clear enough, and it
is the core reason ptime must be a kernel timestamp.
The patches implement rename-over preservation in all 5
filesystem rename handlers. When rename(source, target)
replaces an existing file, and the source has ptime=0 (the
default for any newly-created temp file) while the target
has ptime != 0, the filesystem copies the target's ptime to
the source before destroying the target's inode. This runs
inside the rename transaction, atomic with the rename itself.
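The rule in the previous paragraph can be written as a standalone predicate. This is a userspace model, not kernel code, and the struct and function names are hypothetical. The conditions mirror the ext4 and Btrfs hunks in this series:

- regular files on both sides;
- a source link count of 1;
- source ptime unset and target ptime set;
- no RENAME_WHITEOUT (and, in Btrfs, no RENAME_EXCHANGE).

The ntfs3, FAT, and exFAT hunks differ: they copy the target's creation time without the source-is-zero check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct model_inode {
	bool is_reg;		/* S_ISREG(i_mode) */
	unsigned int nlink;	/* i_nlink */
	int64_t ptime_sec;	/* 0/0 is the "no ptime" sentinel */
	uint32_t ptime_nsec;
};

static bool ptime_is_set(const struct model_inode *i)
{
	return i->ptime_sec != 0 || i->ptime_nsec != 0;
}

/*
 * Apply rename-over inheritance to src. skip_flags models
 * RENAME_WHITEOUT (and RENAME_EXCHANGE in the Btrfs hunk).
 * Returns true if ptime was copied.
 */
static bool ptime_rename_over(struct model_inode *src,
			      const struct model_inode *dst, bool skip_flags)
{
	if (!dst || skip_flags)
		return false;
	if (!src->is_reg || !dst->is_reg || src->nlink != 1)
		return false;
	if (ptime_is_set(src) || !ptime_is_set(dst))
		return false;
	src->ptime_sec = dst->ptime_sec;
	src->ptime_nsec = dst->ptime_nsec;
	return true;
}
```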
Most GUI applications — LibreOffice, Kate, Qt and GNOME
apps — save via write-to-temp + rename-over-original. For
these, ptime survives automatically with no application
changes:
1. App writes to temp file (ptime = 0)
2. rename(temp, document.odt)
3. Kernel: source ptime == 0, target ptime != 0 -> copies target's ptime to source
4. ptime preserved. No app change.
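For readers unfamiliar with the pattern, steps 1-2 look roughly like this in POSIX C. The helper name and the `.XXXXXX` temp suffix are illustrative choices, not taken from any particular application. The temp file is a brand-new inode, which is why its ptime would start at 0 under this proposal. Real applications typically also copy the original's mode and fsync the directory; both are omitted here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>	/* rename(), snprintf() */
#include <stdlib.h>	/* mkstemp() */
#include <string.h>
#include <unistd.h>

/* Replace `path` atomically with `data`; returns 0 on success, -1 on error. */
static int atomic_save(const char *path, const char *data)
{
	char tmp[4096];
	size_t len = strlen(data);
	int fd;

	if (snprintf(tmp, sizeof(tmp), "%s.XXXXXX", path) >= (int)sizeof(tmp))
		return -1;
	fd = mkstemp(tmp);	/* step 1: new inode in the same directory */
	if (fd < 0)
		return -1;
	/* A single write() for brevity; real code loops on short writes. */
	if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
		close(fd);
		unlink(tmp);
		return -1;
	}
	if (close(fd) != 0 || rename(tmp, path) != 0) {	/* step 2 */
		unlink(tmp);
		return -1;
	}
	return 0;
}
```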
This is not universal: editors that use rename-away +
create-new (Vim with its default backupcopy=auto, Emacs) do not
trigger rename-over, and the spec documents this as a known
limitation. But the write-to-temp + rename-over pattern is
the dominant GUI save path, and the kernel handles it
transparently — something no xattr mechanism can provide
without application cooperation.
> Is it worth it? It's a huge amount of cost being spread across a very
> large part of the open source ecosystem for just this fairly narrow use
> case. Personally, I'm not convinced it's worth the effort.
I think the use case is broader than I conveyed. Any workflow
that copies files from NTFS, APFS, or HFS+ onto native Linux
filesystems loses user-visible creation time unless carried
out-of-band. This affects personal migrations, enterprise
backups, dual-boot users, and professional workflows in
photography, legal, scientific data, and media production.
Windows, macOS, and SMB have supported a settable creation
timestamp for decades — Linux is the outlier.
Users already expend significant resources working around
this gap — metadata manifests, scripts to stamp creation
dates into filenames or xattrs, side-channel databases —
or simply accept the data loss. The cost is already being
paid, continuously and redundantly across the ecosystem.
One upstream investment in ptime converts that distributed
ongoing cost into a bounded effort.
ptime is separate from btime by design: it preserves btime's
value as immutable forensic metadata while providing a
settable timestamp that travels with file content across
filesystem boundaries.
On ecosystem cost: the kernel surface is ~240 lines across
28 files. For context, I am a disabled Medicaid recipient
who came to this from a disability rights litigation
workflow — I need file provenance preserved across an
NTFS-to-Btrfs migration for legal work. The complete
implementation — kernel patches across 5 filesystems,
tool patches, and xfstests — was produced in a few days using
agentic development tools, which suggests the adoption cost may
be meaningfully lower than traditional estimates as these
tools become available across the ecosystem.
I understand a new timestamp is permanent API surface and
the bar should be high. My claim is that rename-over
preservation — automatic ptime survival through application
saves, without application changes — makes this materially
different from an xattr workaround, and justifies that cost.
Sean
^ permalink raw reply
* Re: [PATCH v2] ext4: fix missing brelse() in ext4_xattr_inode_dec_ref_all()
From: Zhang Yi @ 2026-04-07 1:13 UTC (permalink / raw)
To: skoyama.kernel, linux-ext4
Cc: tytso, adilger.kernel, libaokun, jack, ojaswin, ritesh.list,
bhupesh, Sohei Koyama, Andreas Dilger, stable
In-Reply-To: <20260406074830.8480-1-skoyama@ddn.com>
On 4/6/2026 3:48 PM, skoyama.kernel@gmail.com wrote:
> From: Sohei Koyama <skoyama@ddn.com>
>
> The commit c8e008b60492 ("ext4: ignore xattrs past end")
> introduced a refcount leak when block_csum is false.
>
> ext4_xattr_inode_dec_ref_all() calls ext4_get_inode_loc() to
> get iloc.bh, but never releases it with brelse().
>
> Fixes: c8e008b60492 ("ext4: ignore xattrs past end")
> Signed-off-by: Sohei Koyama <skoyama@ddn.com>
> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> Cc: stable@vger.kernel.org
Looks good to me.
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
> ---
> fs/ext4/xattr.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index 7bf9ba19a89d..19c72e38fb82 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -1165,7 +1165,7 @@ ext4_xattr_inode_dec_ref_all(handle_t *handle, struct inode *parent,
> {
> struct inode *ea_inode;
> struct ext4_xattr_entry *entry;
> - struct ext4_iloc iloc;
> + struct ext4_iloc iloc = { .bh = NULL };
> bool dirty = false;
> unsigned int ea_ino;
> int err;
> @@ -1260,6 +1260,8 @@ ext4_xattr_inode_dec_ref_all(handle_t *handle, struct inode *parent,
> ext4_warning_inode(parent,
> "handle dirty metadata err=%d", err);
> }
> +
> + brelse(iloc.bh);
> }
>
> /*
^ permalink raw reply
* Re: [RFC PATCH v1 0/6] provenance_time (ptime): a new settable timestamp for cross-filesystem provenance
From: Darrick J. Wong @ 2026-04-07 1:42 UTC (permalink / raw)
To: Sean Smith
Cc: tytso, linux-fsdevel, linux-ext4, linux-btrfs, dsterba, david,
brauner, osandov, hirofumi, linkinjeon
In-Reply-To: <20260407000558.417-1-DefendTheDisabled@gmail.com>
[drop almaz because the kernel.org mailer immediately refused]
On Mon, Apr 06, 2026 at 07:05:55PM -0500, Sean Smith wrote:
> [written with AI assistance]
>
> On Sun, Apr 05, 2026 at 06:54:42PM -0400, Theodore Tso wrote:
>
> Thanks for the substantive engagement — it helps clarify where
> the proposal needs to justify itself.
>
> > On Sun, Apr 05, 2026 at 02:49:56PM -0500, Sean Smith wrote:
> > >
> > > 1. Application atomic saves destroy xattrs. Programs that save
> > > via write-to-temp + rename() replace the inode, permanently
> > > destroying all extended attributes. Only the VFS sees both
> > > inodes during rename -- no userspace mechanism can intercept
> > > this and copy metadata across.
> >
> > The VFS could potentially copy the xattr on a rename, no?
>
> It could, but even scoping to user.* means adding conditional
> xattr-copy logic into every filesystem's rename handler — with
> dynamic allocation and xattr tree lookups on a hot path. ptime
> avoids this: one inline inode field, clear semantics, same VFS
> patterns as atime/mtime/btime.
>
> > > 2. Every tool in the copy chain must explicitly opt in to xattr
> > > preservation. cp requires --preserve=xattr, rsync requires -X,
> > > tar requires --xattrs. Each missing flag causes silent data
> > > loss. Transparent preservation through arbitrary tool flows
> > > is not achievable in userspace.
> >
> > But this is true for your proposed ptime as well. You have to change
> > every single tool to copy over the ptime. Worse, you have to change
> > the format of tar in a non-standard on-disk format change to support
> > this new ptime timestamp. And rsync will require a non-standard
> > protocol change to support the new timestamp.
>
> You are right that copy tools require patches. If ptime only
> improved the copy-tool situation, I would agree it does not
> justify new kernel surface over xattrs.
>
> The structural difference is in the default adoption path.
> xattr preservation is permanently per-invocation opt-in: each
> tool call needs the correct flag, and the default is to drop
> them. A kernel timestamp exposed through statx/utimensat
> follows the same API pattern as mtime — standard libraries
> and tools naturally evolve to preserve all standard timestamps
> by default. ptime has a path to default-preservation that
> xattrs structurally cannot reach.
"Standard"... I was about to write a sardonic reply here, but then I
remembered that Linux finally *does* have a standard means to transfer
some of those newer file attributes: file_getattr/file_setattr.
(Go Andrey!)
So, I guess all you really need to do is extend struct file_attr and now
userspace has a fairly convenient means to propagate the provenance
time. :)
> On the formats: the tar patch uses a vendor-prefixed PAX
> header (SCHILY.ptime), backward-compatible — old readers
> ignore it cleanly. The rsync patch plugs into the existing
> --crtimes machinery that already supports macOS and Cygwin.
>
> > > Atomic saves are the default behavior of mainstream applications
> > > (LibreOffice, Vim, Kate, etc.).
> >
> > You will also have to change mainstream applications to copy ptime
> > from the original file to the file.new before the atomic rename.
> > Using ptime doesn't change this. So you will need to make this
> > non-standard, Linux-specific change to all of these mainstream
> > applications.
>
> This is where the cover letter was not clear enough, and it
> is the core reason ptime must be a kernel timestamp.
>
> The patches implement rename-over preservation in all 5
> filesystem rename handlers. When rename(source, target)
> replaces an existing file, and the source has ptime=0 (the
> default for any newly-created temp file) while the target
> has ptime != 0, the filesystem copies the target's ptime to
> the source before destroying the target's inode. This runs
> inside the rename transaction, atomic with the rename itself.
>
> Most GUI applications — LibreOffice, Kate, Qt and GNOME
> apps — save via write-to-temp + rename-over-original. For
> these, ptime survives automatically with no application
> changes:
>
> 1. App writes to temp file (ptime = 0)
> 2. rename(temp, document.odt)
> 3. Kernel: source ptime=0, target!=0 -> copies ptime
> 4. ptime preserved. No app change.
>
> This is not universal: editors that use rename-away +
> create-new (Vim with default backupcopy=no, Emacs) do not
> trigger rename-over, and the spec documents this as a known
> limitation. But the write-to-temp + rename-over pattern is
> the dominant GUI save path, and the kernel handles it
> transparently — something no xattr mechanism can provide
> without application cooperation.
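The rename-over rule described above can be stated as a small predicate. This is a userspace sketch with a hypothetical `struct model_inode` and a hypothetical `rename_over_inherit_ptime()`. The real patches apply the rule inside each filesystem's rename handler, and these names are not taken from them. A ptime of zero is treated as "not set", as in the description.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inode model; sec == 0 && nsec == 0 means "no ptime set". */
struct model_inode {
	int64_t ptime_sec;
	uint32_t ptime_nsec;
};

static int ptime_is_set(const struct model_inode *ino)
{
	return ino->ptime_sec != 0 || ino->ptime_nsec != 0;
}

/*
 * Called when rename(source, target) is about to replace target (target
 * is NULL for a plain rename). The source inherits the target's ptime only
 * if the source has none, e.g. a freshly written temp file.
 */
static void rename_over_inherit_ptime(struct model_inode *source,
				      const struct model_inode *target)
{
	if (target == NULL)
		return;
	if (!ptime_is_set(source) && ptime_is_set(target)) {
		source->ptime_sec = target->ptime_sec;
		source->ptime_nsec = target->ptime_nsec;
	}
}
```

Note the asymmetry. A source that already carries a ptime keeps it, so renaming one stamped file over another does not overwrite the mover's provenance.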
So does the provenance time cover just the file's contents, or the other
attributes and xattrs?
The reason I ask is, does the ptime get copied over for an FICLONE,
which maps all of one file's data blocks into another?
And by extension, would it also need to be exchanged if you told
XFS_IOC_EXCHANGE_RANGE to exchange all contents between two files?
(I know, I know, you said XFS was TBDHBD ;))
Last question: Is the provenance time only useful if the file is
immutable? Either directly via chattr +i, or by enabling fsverity?
--D
> > Is it worth it? It's a huge amount of cost being spread across a very
> > large part of the open source ecosystem just this fairly narrow use
> > case. Personally, I'm not convinced it's worth the effort.
>
> I think the use case is broader than I conveyed. Any workflow
> that copies files from NTFS, APFS, or HFS+ onto native Linux
> filesystems loses user-visible creation time unless carried
> out-of-band. This affects personal migrations, enterprise
> backups, dual-boot users, and professional workflows in
> photography, legal, scientific data, and media production.
> Windows, macOS, and SMB have supported a settable creation
> timestamp for decades — Linux is the outlier.
>
> Users already expend significant resources working around
> this gap — metadata manifests, scripts to stamp creation
> dates into filenames or xattrs, side-channel databases —
> or simply accept the data loss. The cost is already being
> paid, continuously and redundantly across the ecosystem.
> One upstream investment in ptime converts that distributed
> ongoing cost into a bounded effort.
>
> ptime is separate from btime by design: it preserves btime's
> value as immutable forensic metadata while providing a
> settable timestamp that travels with file content across
> filesystem boundaries.
>
> On ecosystem cost: the kernel surface is ~240 lines across
> 28 files. For context, I am a disabled Medicaid recipient
> who came to this from a disability rights litigation
> workflow — I need file provenance preserved across an
> NTFS-to-Btrfs migration for legal work. The complete
> implementation — kernel patches across 5 filesystems,
> tool patches, and xfstests — was produced in a few days using
> agentic development tools, which suggests the adoption cost may
> be meaningfully lower than traditional estimates as these
> tools become available across the ecosystem.
>
> I understand a new timestamp is permanent API surface and
> the bar should be high. My claim is that rename-over
> preservation — automatic ptime survival through application
> saves, without application changes — makes this materially
> different from an xattr workaround, and justifies that cost.
>
> Sean
>
^ permalink raw reply
* Re: [PATCH v2 3/3] ext4: derive f_fsid from block device to avoid collisions
From: Christoph Hellwig @ 2026-04-07 5:22 UTC (permalink / raw)
To: Anand Jain
Cc: Theodore Tso, Christoph Hellwig, Darrick J. Wong, linux-ext4,
linux-btrfs, linux-xfs, Anand Jain
In-Reply-To: <5bda3d00-df35-4ea1-b313-2fef6e5c5682@gmail.com>
On Sat, Apr 04, 2026 at 04:59:08PM +0800, Anand Jain wrote:
> Context:
> Currently, ext4's f_fsid is consistent across reboots but fails to be
> unique when dealing with cloned filesystems (sharing the same UUID). Per
> statfs(2) [1], the primary requirement is that the (f_fsid, ino) pair
> uniquely identifies a file. The man page makes no explicit guarantee
> regarding consistency across mount cycles or reboots.
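The statfs(2) requirement quoted above, that the (f_fsid, ino) pair identifies a file, is what userspace relies on when it treats that pair as a key. The following minimal sketch uses fstatfs(2) and fstat(2). `struct file_key`, `file_key_of()` and `file_key_equal()` are illustrative names. Two mounted clones that share a UUID currently get the same ext4 f_fsid, so keys taken from the two mounts can collide. That collision is what the patch targets.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/vfs.h>
#include <unistd.h>

/* Illustrative key: the (f_fsid, ino) pair described by statfs(2). */
struct file_key {
	int32_t fsid[2];
	uint64_t ino;
};

_Static_assert(sizeof(((struct statfs *)0)->f_fsid) ==
	       sizeof(((struct file_key *)0)->fsid), "fsid is 8 bytes");

/* Fill *key for an open file descriptor; returns 0, or -1 with errno set. */
static int file_key_of(int fd, struct file_key *key)
{
	struct statfs sfs;
	struct stat st;

	if (fstatfs(fd, &sfs) != 0 || fstat(fd, &st) != 0)
		return -1;
	memcpy(key->fsid, &sfs.f_fsid, sizeof(key->fsid));
	key->ino = (uint64_t)st.st_ino;
	return 0;
}

static int file_key_equal(const struct file_key *a, const struct file_key *b)
{
	return a->fsid[0] == b->fsid[0] && a->fsid[1] == b->fsid[1] &&
	       a->ino == b->ino;
}
```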
>
> Proposal:
> With this fix, f_fsid becomes f(uuid, dev_t). This ensures OS-wide
> uniqueness and maintains consistency as long as the underlying dev_t
> remains stable.
>
> Dilemma:
> While statfs(2) [1] suggests f_fsid is "some random stuff," we know
> userspace (NFS, systemd) often treats it as a persistent handle.
>
> Do you prefer one of the names above, or is there a more idiomatic ext4
> naming convention I should follow?
>
> Given the ambiguity in the man page, is gating this behind an -o option
> necessary, or should we consider making uniqueness the default behavior?
>
My take is that anything that should persist should be an on-disk
feature flag, not a mount option. But I'm not in charge of ext4.
^ permalink raw reply
* Re: [RFC PATCH v1 0/6] provenance_time (ptime): a new settable timestamp for cross-filesystem provenance
From: Sean Smith @ 2026-04-07 6:06 UTC (permalink / raw)
To: Darrick J. Wong
Cc: tytso, linux-fsdevel, linux-ext4, linux-btrfs, dsterba, david,
brauner, osandov, hirofumi, linkinjeon
In-Reply-To: <20260407014129.GC6192@frogsfrogsfrogs>
[written with AI assistance]
On 4/6/2026 20:42, Darrick J. Wong wrote:
> "Standard"... I was about to write a sardonic reply here, but then I
> remembered that Linux finally *does* have a standard means to transfer
> some of those newer file attributes: file_getattr/file_setattr.
>
> (Go Andrey!)
>
> So, I guess all you really need to do is extend struct file_attr and now
> userspace has a fairly convenient means to propagate the provenance
> time. 🙂
Thank you for pointing to file_getattr/file_setattr — this
is a significantly better API path than our utimensat
extension. The size-versioned struct file_attr eliminates
the glibc times[2] limitation entirely, which was one of
the main upstream concerns with the current approach.
We will investigate extending struct file_attr with ptime
fields for v2. The on-disk storage across all 5 filesystems
and the rename-over preservation are API-independent and
would remain unchanged. The change is re-plumbing the
userspace write path from utimensat to file_setattr.
Two design questions:
Would you recommend fa_ptime_sec (__u64) + fa_ptime_nsec
(__u32) matching the statx timespec pattern, or a different
representation?
Pali Rohar has announced plans for mask fields in file_attr.
Should we coordinate with his mask work so ptime can be
selectively set without read-modify-write?
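For the first design question, one option is to reuse the layout of struct statx_timestamp: a signed 64-bit tv_sec, a 32-bit tv_nsec and a 32-bit reserved field. With that layout the struct stays 8-byte aligned as it grows. The sketch below is hypothetical throughout. The real struct file_attr has no ptime fields, and the first 24 bytes here are only assumed to mirror its current members. `struct file_attr_model`, `FA_MODEL_SIZE_VER0`/`FA_MODEL_SIZE_VER1` and `copy_versioned()` are invented to show how a size-versioned struct lets an older caller pass a shorter struct, with the unknown tail zero-filled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct; the first 24 bytes are assumed to mirror today's. */
struct file_attr_model {
	uint64_t fa_xflags;
	uint32_t fa_extsize;
	uint32_t fa_nextents;
	uint32_t fa_projid;
	uint32_t fa_cowextsize;
	/* Hypothetical ptime extension, laid out like struct statx_timestamp. */
	int64_t fa_ptime_sec;
	uint32_t fa_ptime_nsec;
	uint32_t fa_ptime_reserved;	/* must be zero */
};

#define FA_MODEL_SIZE_VER0 24	/* hypothetical: size before the extension */
#define FA_MODEL_SIZE_VER1 40	/* hypothetical: size including ptime */

_Static_assert(sizeof(struct file_attr_model) == FA_MODEL_SIZE_VER1,
	       "no implicit padding");
_Static_assert(offsetof(struct file_attr_model, fa_ptime_sec) ==
	       FA_MODEL_SIZE_VER0, "extension starts at the old size");

/*
 * Copy a caller-supplied struct of usize bytes into a full-size struct,
 * zero-filling the fields an older caller does not know about.
 * Returns -1 for unsupported sizes or a non-zero reserved field.
 */
static int copy_versioned(struct file_attr_model *dst, const void *src,
			  size_t usize)
{
	if (usize < FA_MODEL_SIZE_VER0 || usize > sizeof(*dst))
		return -1;
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, usize);
	if (dst->fa_ptime_reserved != 0)
		return -1;
	return 0;
}
```

Without a mask field of the kind Pali Rohar has announced, a caller that wants to set only the ptime fields must read the current struct first and write it back whole.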
> So does the provenance time cover just the file's contents, or the other
> attributes and xattrs?
Content only. ptime records when the file's data first came
into existence. Metadata changes (permissions, owner,
xattrs) update ctime but leave ptime unchanged. This
matches the semantics of Windows Date Created and macOS
creation time.
> The reason I ask is, does the ptime get copied over for an FICLONE,
> which maps all of one file's data blocks into another?
It should, conceptually — the content's provenance doesn't
change when you clone it. Currently FICLONE shares data
extents but does not copy inode metadata (timestamps,
permissions), so ptime would not be automatically
preserved. The calling tool (e.g., cp --reflink) handles
timestamp copying separately via the write path.
The question is whether FICLONE should be enhanced to copy
ptime from source to destination at the kernel level —
similar to how rename-over preserves ptime. There is an
argument for it: if the kernel handles provenance during
clone, tools don't need to know. But FICLONE doesn't
currently copy mtime either, so adding ptime alone would
be inconsistent. Worth discussing.
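The division of labour described above (FICLONE shares extents, and the tool copies timestamps itself) can be sketched roughly as follows. `clone_then_copy_times()` and `copy_bytes()` are hypothetical helpers. The plain read/write fallback for filesystems without reflink support is an assumption added so the sketch runs anywhere. Only atime and mtime are carried over, because futimens(2) takes exactly two timestamps. That is the times[2] limitation mentioned above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>
#include <linux/fs.h>		/* FICLONE */

/* Plain copy used when the filesystem cannot share extents. */
static int copy_bytes(int src_fd, int dst_fd)
{
	char buf[65536];
	ssize_t n;

	if (lseek(src_fd, 0, SEEK_SET) < 0)
		return -1;
	while ((n = read(src_fd, buf, sizeof(buf))) > 0) {
		char *p = buf;

		while (n > 0) {
			ssize_t w = write(dst_fd, p, (size_t)n);

			if (w < 0)
				return -1;
			p += w;
			n -= w;
		}
	}
	return n < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: reflink src into dst when possible, otherwise copy
 * the bytes, then carry atime/mtime across. FICLONE copies no timestamps.
 */
static int clone_then_copy_times(int src_fd, int dst_fd)
{
	struct stat st;
	struct timespec times[2];

	if (ioctl(dst_fd, FICLONE, src_fd) != 0) {
		if (errno != EOPNOTSUPP && errno != EXDEV &&
		    errno != EINVAL && errno != ENOTTY)
			return -1;
		if (copy_bytes(src_fd, dst_fd) != 0)
			return -1;
	}
	if (fstat(src_fd, &st) != 0)
		return -1;
	times[0] = st.st_atim;
	times[1] = st.st_mtim;
	return futimens(dst_fd, times);
}
```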
Btrfs subvolume snapshots are a different case — they do
preserve ptime because the inodes are COW copies of the
originals.
> And by extension, would it also need to be exchanged if you told
> XFS_IOC_EXCHANGE_RANGE to exchange all contents between two files?
Yes — if the content moves, the provenance should move
with it. If files A and B exchange data extents, their
ptimes should swap. ptime follows the content, not the
inode identity.
> (I know, I know, you said XFS was TBDHBD ;))
Worth considering for a future XFS implementation — and
the file_attr route you suggested would give XFS a clean
integration path for ptime alongside FICLONE and
EXCHANGE_RANGE.
> Last question: Is the provenance time only useful if the file is
> immutable? Either directly via chattr +i, or by enabling fsverity?
No — ptime is useful regardless of mutability. It records
when the document was born, the same way Windows Date
Created works. Editing a document updates mtime but not
the creation date. Both are independently valuable:
ptime: "This file was first created March 15, 2019"
mtime: "It was last modified today"
btime: "This inode was created when I copied it here"
Immutable files (chattr +i, fsverity) are a special case
where ptime has extra forensic strength — the content
provably hasn't changed since the provenance date. But for
the primary use case — preserving creation dates across
cross-platform migrations — mutability doesn't diminish
ptime's value. A document's creation date remains meaningful
regardless of subsequent edits.
Sean
^ permalink raw reply
* Re: [BUG] lseek in sparse files broken on ext3 mounted as ext4
From: Alexander Monakov @ 2026-04-07 7:46 UTC (permalink / raw)
To: linux-ext4; +Cc: Zhang Yi
In-Reply-To: <594c17d9-c00f-e485-96fb-cedf27ce3aa3@ispras.ru>
On Sat, 28 Mar 2026, Alexander Monakov wrote:
> Hi!
>
> Mounting ext3 with '-o delalloc' is explicitly rejected by the kernel
> ("EXT4-fs: Mount option(s) incompatible with ext3" in dmesg).
>
> At the same time, mounting ext3 with '-t ext4' is accepted, and enables
> delayed allocation. In this case, lseek with SEEK_DATA/SEEK_HOLE requests
> does not work correctly and breaks userspace programs such as install(1)
> from coreutils.
>
> To reproduce, it is sufficient to prepare an ext3 image as usual and mount it
> as ext4:
>
> truncate -s 1G img-ext3
> mkfs.ext3 img-ext3
> mkdir mnt-ext3
> mount -t ext4 img-ext3 mnt-ext3
>
> and run the following repro script:
>
> #!/bin/sh
> echo | dd of=src bs=1 count=1 seek=64K
> strace -v -o install.strace install src dst
> cmp src dst
>
> the output should be
>
> src dst differ: char 65537, line 1
>
> with
>
> lseek(3, 0, SEEK_DATA) = -1 ENXIO
>
> in install.strace on the first lseek call, meaning that it reports
> "no more data until EOF" (and hence install does not copy anything).
>
> Either mounting with '-o nodelalloc' or fdatasync'ing the file before install
> (conv=fdatasync in dd or 'sync -d src' after dd) avoids the problem.
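The symptom can also be checked without strace. The following minimal C sketch follows the same sequence as the dd command above. It writes one byte at offset 64 KiB, optionally calls fdatasync() (the workaround mentioned above), and then asks lseek(2) for the first data offset. `first_data_offset()` is an illustrative name. A correct filesystem answers 65536, or 0 if it does not track holes. An ENXIO result on this 65537-byte file is the reported bug.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3		/* Linux value, for older headers */
#endif

/*
 * Write a single '\n' at offset 65536 (like `echo | dd bs=1 count=1
 * seek=64K`), optionally fdatasync(), then return lseek(fd, 0, SEEK_DATA).
 * Returns -1 with errno set on failure; ENXIO here is the reported bug.
 */
static off_t first_data_offset(const char *path, int do_sync)
{
	off_t off = -1;
	int saved;
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;
	if (pwrite(fd, "\n", 1, 65536) == 1 &&
	    (!do_sync || fdatasync(fd) == 0))
		off = lseek(fd, 0, SEEK_DATA);
	saved = errno;		/* close() must not clobber the lseek error */
	close(fd);
	errno = saved;
	return off;
}
```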
>
> The old ext4 wiki [1], the KernelNewbies wiki [2], and the Arch wiki [3]
> all claim that mounting ext3 as ext4 is expected to work correctly.
>
> [1] https://archive.kernel.org/oldwiki/ext4.wiki.kernel.org/index.php/UpgradeToExt4.html
> [2] https://kernelnewbies.org/Ext4
> [3] https://wiki.archlinux.org/title/Ext4
I would like to ping this, and take this chance to cc Zhang Yi: it seems you
have experience in this area, do you have some ideas what is happening here?
How does delayed allocation interact with lseek like that?
I think it would be nice to solve this problem, because it is just silent
data corruption (no sign in dmesg of anything going wrong) in a setup that
was claimed to be safe by ext4 wiki (and propagated elsewhere).
Downstream this was discovered in Gentoo bug 970253 (I'm not the affected user,
I assisted with root-causing the bug): https://bugs.gentoo.org/970253
Thanks.
Alexander
^ permalink raw reply
* Re: [PATCH 0/3] show orphan file inode detail info
From: Jan Kara @ 2026-04-07 10:29 UTC (permalink / raw)
To: Ye Bin; +Cc: tytso, adilger.kernel, linux-ext4, jack, linux-fsdevel
In-Reply-To: <20260403082507.1882703-1-yebin@huaweicloud.com>
Hi!
On Fri 03-04-26 16:25:04, Ye Bin wrote:
> From: Ye Bin <yebin10@huawei.com>
>
> In actual production environments, df and du frequently disagree. In
> many cases the cause can be identified with lsof. However, when
> overlayfs is combined with project quota, the issue becomes much harder
> to diagnose. First, to determine the project ID, one has to list the
> orphan inodes with `fsck.ext4 -fn /dev/xx` and then retrieve file
> information through `debugfs`. Even then, the file names cannot always
> be obtained, so it is often unclear which files these are. Identifying
> them requires online debugging with crash, or gathering information
> incrementally with kprobes. However, some customers in production
> environments do not allow any tools to be uploaded, and online debugging
> might impact the business. There are also scenarios where files are
> opened from kernel mode, which creates no file descriptors (fds), making
> it impossible to find deleted-but-still-referenced files with lsof. This
> patchset adds a procfs interface to query information about orphan
> inodes, which can assist in the analysis and localization of such issues.
I agree listing orphan inodes for a superblock is useful and the usefulness
could actually go beyond ext4. I imagine the very same problem is there for
XFS or btrfs so perhaps we could think for a while whether we can provide
an interface that wouldn't be ext4 specific? Perhaps an ioctl
(GET_ORPHAN_FILES) that would return an fd and reading from that fd would
return entries for orphan inodes?
Also, regarding the information reported about orphan inodes, wouldn't it
be a better interface to just return a list of file handles? Userspace can then do
whatever it needs with them - open, statx, calling ioctl, etc - so we
thwart feature creep with people asking us to add more information to the
interface. This also offloads a lot of security questions about the
interface to appropriate syscalls. So overall it looks like a win to me.
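To make the file-handle suggestion concrete, assume that reading the returned fd yielded a stream of variable-length records shaped like struct file_handle (see open_by_handle_at(2)). Userspace could then walk the records as below and hand each one to open_by_handle_at(), which already requires CAP_DAC_READ_SEARCH. Everything about the record stream is hypothetical, because GET_ORPHAN_FILES is only a proposal here. The 8-byte record alignment, `struct orphan_rec_hdr` and `walk_orphan_records()` are all invented for illustration. The header mirrors the first two fields of struct file_handle so the sketch needs no feature macros.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HANDLE_MAX 128		/* same bound as MAX_HANDLE_SZ */

/* Hypothetical record header; mirrors struct file_handle's first fields. */
struct orphan_rec_hdr {
	uint32_t handle_bytes;
	int32_t handle_type;
	/* followed by handle_bytes of opaque handle, padded to 8 bytes */
};

typedef void (*orphan_cb)(const struct orphan_rec_hdr *hdr,
			  const unsigned char *handle, void *arg);

/*
 * Walk a buffer of hypothetical orphan records, calling cb (if non-NULL)
 * for each one. Returns the number of records, or -1 if a record is
 * truncated or its handle is larger than HANDLE_MAX.
 */
static int walk_orphan_records(const unsigned char *buf, size_t len,
			       orphan_cb cb, void *arg)
{
	size_t off = 0;
	int count = 0;

	while (off < len) {
		struct orphan_rec_hdr hdr;
		size_t rec;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));	/* avoid unaligned reads */
		if (hdr.handle_bytes > HANDLE_MAX)
			return -1;
		rec = (sizeof(hdr) + hdr.handle_bytes + 7) & ~(size_t)7;
		if (len - off < rec)
			return -1;
		if (cb)
			cb(&hdr, buf + off + sizeof(hdr), arg);
		off += rec;
		count++;
	}
	return count;
}
```

Keeping the payload to opaque handles means anything else (statx fields, project ID, a path) is obtained after opening the handle, so the record format itself never has to grow.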
Honza
>
> Ye Bin (3):
> ext4: register 'orphan_list' procfs
> ext4: show inode orphan list detail information
> ext4: show orphan file inode detail info
>
> fs/ext4/ext4.h | 1 +
> fs/ext4/orphan.c | 227 +++++++++++++++++++++++++++++++++++++++++++++++
> fs/ext4/sysfs.c | 2 +
> 3 files changed, 230 insertions(+)
>
> --
> 2.34.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply
* Re: [PATCH v3] ext2: use get_random_u32() where appropriate
From: Jan Kara @ 2026-04-07 11:18 UTC (permalink / raw)
To: David Carlier; +Cc: Jan Kara, linux-ext4, linux-kernel
In-Reply-To: <20260405154717.4705-1-devnexen@gmail.com>
On Sun 05-04-26 16:47:17, David Carlier wrote:
> Use the typed random integer helpers instead of
> get_random_bytes() when filling a single integer variable.
> The helpers return the value directly, require no pointer
> or size argument, and better express intent.
>
> Signed-off-by: David Carlier <devnexen@gmail.com>
Thanks! I've added the patch to my tree.
Honza
> ---
> fs/ext2/super.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/ext2/super.c b/fs/ext2/super.c
> index 603f2641fe10..e4136490c883 100644
> --- a/fs/ext2/super.c
> +++ b/fs/ext2/super.c
> @@ -1151,7 +1151,7 @@ static int ext2_fill_super(struct super_block *sb, struct fs_context *fc)
> goto failed_mount2;
> }
> sbi->s_gdb_count = db_count;
> - get_random_bytes(&sbi->s_next_generation, sizeof(u32));
> + sbi->s_next_generation = get_random_u32();
> spin_lock_init(&sbi->s_next_gen_lock);
>
> /* per filesystem reservation list head & lock */
> --
> 2.53.0
>
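The same idea, returning the value directly instead of filling a buffer through a pointer and a size, can be mirrored in userspace on top of getrandom(2). This is only an analogy. `random_u32()` is a hypothetical helper, not the kernel's get_random_u32(). It aborts on a short read because a typed helper has no error channel.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical typed helper in the style of get_random_u32(). */
static uint32_t random_u32(void)
{
	uint32_t v;

	/* Once the pool is initialised, requests up to 256 bytes return in full. */
	if (getrandom(&v, sizeof(v), 0) != (ssize_t)sizeof(v))
		abort();
	return v;
}
```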
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply
* Re: [PATCH] ext2: reject inodes with zero i_nlink and valid mode in ext2_iget()
From: Jan Kara @ 2026-04-07 14:00 UTC (permalink / raw)
To: Vasiliy Kovalev
Cc: Jan Kara, linux-ext4, Andrew Morton, Alexey Dobriyan,
linux-kernel, lvc-project
In-Reply-To: <20260404152011.2590197-1-kovalev@altlinux.org>
On Sat 04-04-26 18:20:11, Vasiliy Kovalev wrote:
> ext2_iget() already rejects inodes with i_nlink == 0 when i_mode is
> zero or i_dtime is set, treating them as deleted. However, the case of
> i_nlink == 0 with a non-zero mode and zero dtime slips through. Since
> ext2 has no orphan list, such a combination can only result from
> filesystem corruption: a legitimate inode deletion always either sets
> i_dtime or clears i_mode before freeing the inode.
>
> A crafted image can exploit this gap to present such an inode to the
> VFS, which then triggers WARN_ON inside drop_nlink() (fs/inode.c) via
> ext2_unlink(), ext2_rename() and ext2_rmdir():
>
> WARNING: CPU: 3 PID: 609 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336
> CPU: 3 UID: 0 PID: 609 Comm: syz-executor Not tainted 6.12.77+ #1
> Call Trace:
> <TASK>
> inode_dec_link_count include/linux/fs.h:2518 [inline]
> ext2_unlink+0x26c/0x300 fs/ext2/namei.c:295
> vfs_unlink+0x2fc/0x9b0 fs/namei.c:4477
> do_unlinkat+0x53e/0x730 fs/namei.c:4541
> __x64_sys_unlink+0xc6/0x110 fs/namei.c:4587
> do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> </TASK>
>
> WARNING: CPU: 0 PID: 646 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336
> CPU: 0 UID: 0 PID: 646 Comm: syz.0.17 Not tainted 6.12.77+ #1
> Call Trace:
> <TASK>
> inode_dec_link_count include/linux/fs.h:2518 [inline]
> ext2_rename+0x35e/0x850 fs/ext2/namei.c:374
> vfs_rename+0xf2f/0x2060 fs/namei.c:5021
> do_renameat2+0xbe2/0xd50 fs/namei.c:5178
> __x64_sys_rename+0x7e/0xa0 fs/namei.c:5223
> do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> </TASK>
>
> WARNING: CPU: 0 PID: 634 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336
> CPU: 0 UID: 0 PID: 634 Comm: syz-executor Not tainted 6.12.77+ #1
> Call Trace:
> <TASK>
> inode_dec_link_count include/linux/fs.h:2518 [inline]
> ext2_rmdir+0xca/0x110 fs/ext2/namei.c:311
> vfs_rmdir+0x204/0x690 fs/namei.c:4348
> do_rmdir+0x372/0x3e0 fs/namei.c:4407
> __x64_sys_unlinkat+0xf0/0x130 fs/namei.c:4577
> do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> </TASK>
>
> Extend the existing i_nlink == 0 check to also catch this case,
> reporting the corruption via ext2_error() and returning -EFSCORRUPTED.
> This rejects the inode at load time and prevents it from reaching any
> of the namei.c paths.
>
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Cc: stable@vger.kernel.org
> Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
Thanks. I've added the patch to my tree.
Honza
> ---
> fs/ext2/inode.c | 14 +++++++++++---
> 1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> index dbfe9098a124..39d972722f5f 100644
> --- a/fs/ext2/inode.c
> +++ b/fs/ext2/inode.c
> @@ -1430,9 +1430,17 @@ struct inode *ext2_iget (struct super_block *sb, unsigned long ino)
> * the test is that same one that e2fsck uses
> * NeilBrown 1999oct15
> */
> - if (inode->i_nlink == 0 && (inode->i_mode == 0 || ei->i_dtime)) {
> - /* this inode is deleted */
> - ret = -ESTALE;
> + if (inode->i_nlink == 0) {
> + if (inode->i_mode == 0 || ei->i_dtime) {
> + /* this inode is deleted */
> + ret = -ESTALE;
> + } else {
> + ext2_error(sb, __func__,
> + "inode %lu has zero i_nlink with mode 0%o and no dtime, "
> + "filesystem may be corrupt",
> + ino, inode->i_mode);
> + ret = -EFSCORRUPTED;
> + }
> goto bad_inode;
> }
> inode->i_blocks = le32_to_cpu(raw_inode->i_blocks);
> --
> 2.50.1
>
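The patched check can be read as a three-way classification. Below is a userspace sketch of that decision. `classify_nlink()` is a hypothetical name. The fallback definition of EFSCORRUPTED as EUCLEAN follows the kernel's own alias.

```c
#include <errno.h>
#include <stdint.h>

#ifndef EFSCORRUPTED
#define EFSCORRUPTED EUCLEAN	/* kernel alias for "filesystem is corrupted" */
#endif

/*
 * Model of the patched ext2_iget() test. Returns 0 when the inode may be
 * used, -ESTALE for a normally deleted inode, and -EFSCORRUPTED when the
 * on-disk fields contradict each other.
 */
static int classify_nlink(unsigned int nlink, uint16_t mode, uint32_t dtime)
{
	if (nlink != 0)
		return 0;
	if (mode == 0 || dtime != 0)
		return -ESTALE;		/* deleted the usual way */
	return -EFSCORRUPTED;		/* nlink 0, live mode, no dtime */
}
```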
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply
* Re: [PATCH v2 3/3] ext4: derive f_fsid from block device to avoid collisions
From: Theodore Tso @ 2026-04-07 14:47 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Anand Jain, Darrick J. Wong, linux-ext4, linux-btrfs, linux-xfs,
Anand Jain
In-Reply-To: <adSUiB9L0sFAd04U@infradead.org>
On Mon, Apr 06, 2026 at 10:22:16PM -0700, Christoph Hellwig wrote:
> > Dilemma:
> > While statfs(2) [1] suggests f_fsid is "some random stuff," we know
> > userspace (NFS, systemd) often treats it as a persistent handle.
> >
> > Do you prefer one of the names above, or is there a more idiomatic ext4
> > naming convention I should follow?
> >
>
> My take is that anything that should persist should be an on-disk
> feature flag, not a mount option. But I'm not in charge of ext4.
My take is that f_fsid is random stuff, as documented by the
specification, so anyone who tries to depend on it needs to be kept in
a padded room where they can't hurt themselves or their users.
And as far as NFS is concerned, file handles should be based on
the super block UUID, not statfs's f_fsid, and anyone who wants to
mount a snapshot as an NFS exported file system at the same time that
the original file system is mounted _also_ should be gently coaxed
into a padded room where they can't hurt themselves or their users.
The solution that we've used for *years* for people who clone block
devices for things like cloud images has been to use
"tune2fs -U random /dev/sda1". This works on a mounted file system,
and is (for example) built into various cloud images for Google Cloud
Engine.
If we want to change statfs's f_fsid from one set of "random stuff"
to another set of "random stuff", I don't really mind, but I don't
think it's worth *either* a mount option *or* a feature flag, as
either would be confusing for system administrators when some file
systems behave one way and other file systems behave another.
- Ted
^ permalink raw reply