linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Theodore Ts'o" <tytso@mit.edu>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Pavel Emelyanov <xemul@scylladb.com>,
	linux-fsdevel@vger.kernel.org,
	"Raphael S . Carvalho" <raphaelsc@scylladb.com>,
	linux-api@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH] fs: Propagate FMODE_NOCMTIME flag to user-facing O_NOCMTIME
Date: Sun, 5 Oct 2025 22:16:32 -0400	[thread overview]
Message-ID: <20251006021632.GI386127@mit.edu> (raw)
In-Reply-To: <aOMBbKUlvv2uYLzD@dread.disaster.area>

On Mon, Oct 06, 2025 at 10:38:20AM +1100, Dave Chinner wrote:
> We have already provided a safe method for minimising the overhead
> of c/mtime updates in the IO path - it's called lazytime.  The
> lazytime mount option provides eventual consistency for c/mtime
> updates for IO operations instead of immediate consistency.
> 
> Timestamps are still updated to have the correct values, but the
> latency/performance of the timestamp updates is greatly improved by
> holding them purely in memory until some other trigger forces them
> to be persisted to disk.

Specifically, the timestamps are persisted to stable store when (a)
the file system is unmounted, (b) when the inode needs to be pushed
out to memory due to memory pressure, (c) when the inode is forcibly
persisted using fsync(), (d) when some other inode field is updated,
and the inode gets written out, or (e) after 24 hours.

As a result, the on-disk timestamps will be at most 24 hours stale.
But this is POSIX compliant, because if you read the timestamps using
stat(1), you will get the updated values, and what happens after a
crash in the absense of an fsync(2) is not defined.

The reason why we implemented this at $WORK is you are constantly
updating a database using fdatasync(2), and you care about 99.9
percentage I/O latency, the 4k writes to the inode table will
eventually triger a hard drive's Adjacent Track Interference (ATI)
mitigation, which involves rewriting set of disk tracks to avoid the
analog signal for adjacent tracks getting weakened by the hot-spot
writes, and this is measurable if you are looking at long-tail I/O
latencies.  (And yes, we had to talk to our HDD vendors to figure out
this is what was going on, since performance is out of scop[e of
SCSI/SATA specifications.  Hence, random long-tail ATI latencies to
preserve data integrity is allowed, and in fact, actually a good
thing.  :-)

					- Ted

      reply	other threads:[~2025-10-06  2:16 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20251003093213.52624-1-xemul@scylladb.com>
2025-10-04  4:26 ` [PATCH] fs: Propagate FMODE_NOCMTIME flag to user-facing O_NOCMTIME Christoph Hellwig
2025-10-04 16:08   ` Andy Lutomirski
2025-10-07  5:08     ` Christoph Hellwig
2025-10-08 15:22       ` Andy Lutomirski
2025-10-08 21:27         ` Dave Chinner
2025-10-08 21:51           ` Andy Lutomirski
2025-10-11  1:35             ` Dave Chinner
2025-10-11  4:04               ` Andy Lutomirski
2025-10-10  5:27         ` Christoph Hellwig
2025-10-10 17:35           ` Andy Lutomirski
2025-10-05 22:06   ` Dave Chinner
2025-10-07  5:10     ` Christoph Hellwig
2025-10-05 23:38   ` Dave Chinner
2025-10-06  2:16     ` Theodore Ts'o [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251006021632.GI386127@mit.edu \
    --to=tytso@mit.edu \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=raphaelsc@scylladb.com \
    --cc=xemul@scylladb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).