From: Josef Bacik <josef@toxicpanda.com>
To: Jeff Layton <jlayton@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Chandan Babu R <chandan.babu@oracle.com>,
"Darrick J. Wong" <djwong@kernel.org>,
Theodore Ts'o <tytso@mit.edu>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Chris Mason <clm@fb.com>, David Sterba <dsterba@suse.com>,
Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andi Kleen <ak@linux.intel.com>,
kernel-team@fb.com, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-btrfs@vger.kernel.org, linux-mm@kvack.org,
linux-nfs@vger.kernel.org
Subject: Re: [PATCH v2 11/11] Documentation: add a new file documenting multigrain timestamps
Date: Mon, 1 Jul 2024 09:52:55 -0400
Message-ID: <20240701135255.GC504479@perftesting>
In-Reply-To: <20240701-mgtime-v2-11-19d412a940d9@kernel.org>
On Mon, Jul 01, 2024 at 06:26:47AM -0400, Jeff Layton wrote:
> Add a high-level document that describes how multigrain timestamps work,
> rationale for them, and some info about implementation and tradeoffs.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> Documentation/filesystems/multigrain-ts.rst | 126 ++++++++++++++++++++++++++++
> 1 file changed, 126 insertions(+)
>
> diff --git a/Documentation/filesystems/multigrain-ts.rst b/Documentation/filesystems/multigrain-ts.rst
> new file mode 100644
> index 000000000000..beef7f79108c
> --- /dev/null
> +++ b/Documentation/filesystems/multigrain-ts.rst
> @@ -0,0 +1,126 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=====================
> +Multigrain Timestamps
> +=====================
> +
> +Introduction
> +============
> +Historically, the kernel has always used a coarse time value to stamp
> +inodes. This value is updated on every jiffy, so any change that happens
> +within that jiffy will end up with the same timestamp.
> +
> +When the kernel goes to stamp an inode (due to a read or write), it first gets
> +the current time and then compares it to the existing timestamp(s) to see
> +whether anything will change. If nothing changed, then it can avoid updating
> +the inode's metadata.
> +
> +Coarse timestamps are therefore good from a performance standpoint, since they
> +reduce the need for metadata updates, but bad from the standpoint of
> +determining whether anything has changed, since a lot of things can happen in a
> +jiffy.
> +
> +They are particularly troublesome with NFSv3, where unchanging timestamps can
> +make it difficult to tell whether to invalidate caches. NFSv4 provides a
> +dedicated change attribute that should always show a visible change, but not
> +all filesystems implement this properly, and many just populate it with
> +the ctime.
> +
> +Multigrain timestamps aim to remedy this by selectively using fine-grained
> +timestamps when a file has had its timestamps queried recently, and the current
> +coarse-grained time does not cause a change.
> +
> +Inode Timestamps
> +================
> +There are currently 3 timestamps in the inode that are updated to the current
> +wallclock time on different activity:
> +
> +ctime:
> + The inode change time. This is stamped with the current time whenever
> + the inode's metadata is changed. Note that this value is not settable
> + from userland.
> +
> +mtime:
> + The inode modification time. This is stamped with the current time
> + any time a file's contents change.
> +
> +atime:
> + The inode access time. This is stamped whenever an inode's contents are
> + read. Widely considered to be a terrible mistake. Usually avoided with
> + options like noatime or relatime.
> +
> +Updating the mtime always implies a change to the ctime, but updating the
> +atime due to a read request does not.
> +
> +Multigrain timestamps are only tracked for the ctime and the mtime. atimes are
> +not affected and always use the coarse-grained value (subject to the floor).
> +
> +Inode Timestamp Ordering
> +========================
> +
> +In addition to providing info about changes to individual files, file
> +timestamps also serve an important purpose in applications like "make". These
> +programs measure timestamps in order to determine whether source files might be
> +newer than cached objects.
> +
> +Userland applications like make can only determine ordering based on
> +operational boundaries. For a syscall those are the syscall entry and exit
> +points. For io_uring or nfsd operations, that's the request submission and
> +response. In the case of concurrent operations, userland can make no
> +determination about the order in which things will occur.
> +
> +For instance, if a single thread modifies one file, and then another file in
> +sequence, the second file must show an equal or later mtime than the first. The
> +same is true if two threads are issuing similar operations that do not overlap
> +in time.
> +
> +If, however, two threads have racing syscalls that overlap in time, then there
> +is no such guarantee, and the second file may appear to have been modified
> +before, after or at the same time as the first, regardless of which one was
> +submitted first.
> +
> +Multigrain Timestamps
> +=====================
> +Multigrain timestamps are aimed at ensuring that changes to a single file are
> +always recognizable, without violating the ordering guarantees when multiple
> +different files are modified. This affects the mtime and the ctime, but the
> +atime will always use coarse-grained timestamps.
> +
> +They use the lowest-order bit in the timestamp as a flag that indicates whether
> +the mtime or ctime has been queried. If either or both have, then the kernel
> +takes special care to ensure the next timestamp update will display a visible
> +change. This ensures tight cache coherency for use-cases like NFS, without
> +sacrificing the benefits of reduced metadata updates when files aren't being
> +watched.
> +
> +The ctime Floor Value
> +=====================
> +It's not sufficient to simply use fine or coarse-grained timestamps based on
> +whether the mtime or ctime has been queried. A file could get a fine grained
> +timestamp, and then a second file modified later could get a coarse-grained one
> +that appears earlier than the first, which would break the kernel's timestamp
> +ordering guarantees.
> +
> +To mitigate this problem, we maintain a per-time_namespace floor value that

You dropped this bit in the series, so this isn't correct. It should just be
"we maintain a floor value".

Thanks,

Josef