public inbox for linux-fsdevel@vger.kernel.org
From: Sean Smith <defendthedisabled@gmail.com>
To: tytso@mit.edu
Cc: defendthedisabled@gmail.com, linux-fsdevel@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org,
	dsterba@suse.com, david@fromorbit.com, brauner@kernel.org,
	osandov@osandov.com, almaz@kernel.org,
	hirofumi@mail.parknet.co.jp, linkinjeon@kernel.org
Subject: Re: [RFC PATCH v1 0/6] provenance_time (ptime): a new settable timestamp for cross-filesystem provenance
Date: Mon,  6 Apr 2026 19:05:55 -0500
Message-ID: <20260407000558.417-1-DefendTheDisabled@gmail.com>
In-Reply-To: <20260405225442.GA1763@macsyma-wired.lan>

[written with AI assistance]

Thanks for the substantive engagement; it helps clarify where
the proposal needs to justify itself.

On Sun, Apr 05, 2026 at 06:54:42PM -0400, Theodore Tso wrote:

> On Sun, Apr 05, 2026 at 02:49:56PM -0500, Sean Smith wrote:
> > 
> >   1. Application atomic saves destroy xattrs. Programs that save
> >      via write-to-temp + rename() replace the inode, permanently
> >      destroying all extended attributes. Only the VFS sees both
> >      inodes during rename -- no userspace mechanism can intercept
> >      this and copy metadata across.
> 
> The VFS could potentially copy the xattr on a rename, no?

It could, but even scoping to user.* means adding conditional
xattr-copy logic into every filesystem's rename handler — with
dynamic allocation and xattr tree lookups on a hot path. ptime
avoids this: one inline inode field, clear semantics, same VFS
patterns as atime/mtime/btime.

> >   2. Every tool in the copy chain must explicitly opt in to xattr
> >      preservation. cp requires --preserve=xattr, rsync requires -X,
> >      tar requires --xattrs. Each missing flag causes silent data
> >      loss. Transparent preservation through arbitrary tool flows
> >      is not achievable in userspace.
> 
> But this is true for your proposed ptime as well.  You have to change
> every single tool to copy over the ptime.  Worse, you have to change
> the format of tar in a non-standard on-disk format change to support
> this new ptime timestamp.  And rsync will require a non-standard
> protocol change to support the new timestamp.

You are right that copy tools require patches. If ptime only
improved the copy-tool situation, I would agree it does not
justify new kernel surface over xattrs.

The structural difference is in the default adoption path.
xattr preservation is permanently per-invocation opt-in: each
tool call needs the correct flag, and the default is to drop
them. A kernel timestamp exposed through statx/utimensat
follows the same API pattern as mtime — standard libraries
and tools naturally evolve to preserve all standard timestamps
by default. ptime has a path to default-preservation that
xattrs structurally cannot reach.

On the formats: the tar patch uses a vendor-prefixed PAX
header (SCHILY.ptime), backward-compatible — old readers
ignore it cleanly. The rsync patch plugs into the existing
--crtimes machinery that already supports macOS and Cygwin.
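
To make the backward-compatibility point concrete, here is a
minimal sketch of how a PAX extended header record is laid out.
It is illustrative only and is not the code from the tar patch.
The function name pax_format_record is hypothetical, and the
value format (decimal seconds with a fractional part, as PAX
uses for mtime) is an assumption about how ptime is encoded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one PAX extended header record: "<len> <key>=<value>\n".
 * <len> is the decimal length of the whole record, including the
 * length digits themselves, the space, and the trailing newline.
 * Returns the record length, or -1 if buf is too small.
 * Hypothetical helper for illustration; not taken from the patch.
 */
static int pax_format_record(char *buf, size_t bufsize,
                             const char *key, const char *value)
{
    assert(key != NULL && value != NULL);

    /* Everything except the length digits: ' ' key '=' value '\n'. */
    size_t payload = 1 + strlen(key) + 1 + strlen(value) + 1;
    size_t total = payload + 1; /* start by assuming one digit */

    /* The length field counts its own digits, so iterate to a fixed point. */
    for (;;) {
        char digits[32];
        int ndigits = snprintf(digits, sizeof digits, "%zu", total);
        if (payload + (size_t)ndigits == total)
            break;
        total = payload + (size_t)ndigits;
    }

    int written = snprintf(buf, bufsize, "%zu %s=%s\n", total, key, value);
    if (written < 0 || (size_t)written >= bufsize)
        return -1;
    return written;
}
```

Calling pax_format_record(buf, sizeof buf, "SCHILY.ptime",
"1700000000.000000005") yields a 37-byte record. A reader that
does not know the keyword can use the leading length to skip the
record, which is what makes the vendor-prefixed header safe for
old readers.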

> > Atomic saves are the default behavior of mainstream applications
> > (LibreOffice, Vim, Kate, etc.).
> 
> You will also have to change mainstream applications to copy ptime
> from the original file to the file.new before the atomic rename.
> Using ptime doesn't change this.  So you will need to make this
> non-standard, Linux-specific change to all of these mainstream
> applications.

This is where the cover letter was not clear enough, and it
is the core reason ptime must be a kernel timestamp.

The patches implement rename-over preservation in all 5
filesystem rename handlers. When rename(source, target)
replaces an existing file, and the source has ptime=0 (the
default for any newly-created temp file) while the target
has ptime != 0, the filesystem copies the target's ptime to
the source before destroying the target's inode. This runs
inside the rename transaction, atomic with the rename itself.

Most GUI applications — LibreOffice, Kate, Qt and GNOME
apps — save via write-to-temp + rename-over-original. For
these, ptime survives automatically with no application
changes:

  1. App writes to temp file              (ptime = 0)
  2. rename(temp, document.odt)
  3. Kernel: source ptime=0, target!=0 -> copies ptime
  4. ptime preserved. No app change.
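
The rule in steps 1-4 can be modelled in a few lines of
userspace C. This is a sketch of the decision only, not the
kernel code: struct model_inode and both function names are
hypothetical, and treating "ptime = 0" as both seconds and
nanoseconds being zero is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for per-inode state; not the kernel's struct inode. */
struct model_inode {
    int64_t  ptime_sec;  /* assumption: 0/0 means "ptime not set" */
    uint32_t ptime_nsec;
};

static bool ptime_is_set(const struct model_inode *inode)
{
    return inode->ptime_sec != 0 || inode->ptime_nsec != 0;
}

/*
 * Model of the rename-over rule: when `source` replaces an existing
 * `target`, the source inherits the target's ptime only if the source
 * has none and the target has one. Returns true if ptime was copied.
 * A NULL target models a plain rename that replaces nothing.
 */
static bool ptime_inherit_on_rename_over(struct model_inode *source,
                                         const struct model_inode *target)
{
    assert(source != NULL);

    if (target == NULL)
        return false;           /* nothing is being replaced */
    if (ptime_is_set(source))
        return false;           /* an explicitly set ptime wins */
    if (!ptime_is_set(target))
        return false;           /* nothing to inherit */

    source->ptime_sec = target->ptime_sec;
    source->ptime_nsec = target->ptime_nsec;
    return true;
}
```

In the real patches this check has to sit inside each
filesystem's rename transaction so that the copy is atomic with
the rename; the model above only captures which way the value
flows and when.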

This is not universal: editors that use rename-away +
create-new (Vim, whose default backupcopy=auto usually renames
the original aside and writes a new file; Emacs) do not
trigger rename-over, and the spec documents this as a known
limitation. But the write-to-temp + rename-over pattern is
the dominant GUI save path, and the kernel handles it
transparently, which no xattr mechanism can do without
application cooperation.

> Is it worth it?  It's a huge amount of cost being spread across a very
> large part of the open source ecosystem just this fairly narrow use
> case.  Personally, I'm not convinced it's worth the effort.

I think the use case is broader than I conveyed. Any workflow
that copies files from NTFS, APFS, or HFS+ onto native Linux
filesystems loses user-visible creation time unless carried
out-of-band. This affects personal migrations, enterprise
backups, dual-boot users, and professional workflows in
photography, legal, scientific data, and media production.
Windows, macOS, and SMB have supported a settable creation
timestamp for decades — Linux is the outlier.

Users already expend significant resources working around
this gap — metadata manifests, scripts to stamp creation
dates into filenames or xattrs, side-channel databases —
or simply accept the data loss. The cost is already being
paid, continuously and redundantly across the ecosystem.
One upstream investment in ptime converts that distributed
ongoing cost into a bounded effort.

ptime is separate from btime by design: it preserves btime's
value as immutable forensic metadata while providing a
settable timestamp that travels with file content across
filesystem boundaries.

On ecosystem cost: the kernel surface is ~240 lines across
28 files. For context, I am a disabled Medicaid recipient
who came to this from a disability rights litigation
workflow — I need file provenance preserved across an
NTFS-to-Btrfs migration for legal work. The complete
implementation (kernel patches across 5 filesystems,
tool patches, and xfstests) was produced in a few days using
agentic development tools, which suggests the adoption cost may
be meaningfully lower than traditional estimates as these
tools become available across the ecosystem.

I understand a new timestamp is permanent API surface and
the bar should be high. My claim is that rename-over
preservation — automatic ptime survival through application
saves, without application changes — makes this materially
different from an xattr workaround, and justifies that cost.

Sean

Thread overview: 12+ messages
2026-04-05 19:49 [RFC PATCH v1 0/6] provenance_time (ptime): a new settable timestamp for cross-filesystem provenance Sean Smith
2026-04-05 19:49 ` [PATCH 1/6] vfs: add provenance_time (ptime) infrastructure Sean Smith
2026-04-05 19:49 ` [PATCH 2/6] btrfs: add provenance time (ptime) support Sean Smith
2026-04-05 19:49 ` [PATCH 3/6] ntfs3: map ptime to NTFS creation time with rename-over Sean Smith
2026-04-05 19:50 ` [PATCH 4/6] ext4: add dedicated ptime field alongside i_crtime Sean Smith
2026-04-05 19:50 ` [PATCH 5/6] fat: map ptime to FAT creation time with rename-over Sean Smith
2026-04-05 19:50 ` [PATCH 6/6] exfat: map ptime to exFAT " Sean Smith
2026-04-05 22:54 ` [RFC PATCH v1 0/6] provenance_time (ptime): a new settable timestamp for cross-filesystem provenance Theodore Tso
2026-04-07  0:05   ` Sean Smith [this message]
2026-04-07  1:42     ` Darrick J. Wong
2026-04-07  6:06       ` Sean Smith
2026-04-07 15:17         ` Darrick J. Wong
