linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>,
	John Stultz <jstultz@google.com>,
	 Stephen Boyd <sboyd@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Chandan Babu R <chandan.babu@oracle.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>,  Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Chuck Lever <chuck.lever@oracle.com>,
	Vadim Fedorenko <vadim.fedorenko@linux.dev>
Cc: Randy Dunlap <rdunlap@infradead.org>,
	linux-kernel@vger.kernel.org,  linux-fsdevel@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org,  linux-doc@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org,
	 linux-btrfs@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v8 01/11] timekeeping: move multigrain timestamp floor handling into timekeeper
Date: Thu, 19 Sep 2024 18:50:59 +0200	[thread overview]
Message-ID: <d771ea4d44f3c9da8470d0aa9d58ee1d96f5fb30.camel@kernel.org> (raw)
In-Reply-To: <87a5g79aag.ffs@tglx>

On Mon, 2024-09-16 at 12:12 +0200, Thomas Gleixner wrote:
> On Sat, Sep 14 2024 at 13:07, Jeff Layton wrote:
> > For multigrain timestamps, we must keep track of the latest timestamp
> 
> What is a multgrain timestamp? Can you please describe the concept
> behind it? I'm not going to chase random documentation or whatever
> because change logs have to self contained.
> 
> And again 'we' do nothing. Describe the problem in technical terms and
> do not impersonate code.
> 

Hi Thomas!

Sorry for the delay in responding. I'll try to summarize below, but
I'll also note that patch #7 in the v8 series adds a file to
Documentation/ that explains this in a bit more depth:

Currently the kernel always stamps files (mtime, ctime, etc.) using the
coarse-grained clock. This is usually a good thing, since it reduces
the number of metadata updates, but means that you can't reliably use
file timestamps to detect whether there have been changes to the file
since it was last checked. This is particularly a problem for NFSv3
clients, which use timestamps to know when to invalidate their
pagecache for an inode [1].

The idea is to allow the kernel to use fine-grained timestamps (mtime
and ctime) on files when they are under direct observation. When a task
does a ->getattr against an inode for STATX_MTIME or STATX_CTIME, a
flag is set in the inode that tells the kernel to use the fine-grained
clock for the timestamp update iff the current coarse-grained clock
value would not cause a change to the mtime/ctime.

This works, but there is a problem:

It's possible for one inode to get a fine-grained timestamp, and then
another to get a coarse-grained timestamp. If this happens within a
single coarse-grained timer tick, then the files may appear to have
been modified in reverse order, which breaks POSIX guarantees (and
obscure programs like "make").

The fix for this is to establish a floor value for the coarse-grained
clock. When stamping a file with a fine-grained timestamp, we update
the floor value with the current monotonic time (using cmpxchg). Then
later, when a coarse-grained timestamp is requested, check whether the
floor is later than the current coarse-grained time. If it is, then the
kernel will return the floor value (converted to realtime) instead of
the current coarse-grained clock. That allows us to maintain the
ordering guarantees.

My original implementation of this tracked the floor value in
fs/inode.c (also using cmpxchg), but that caused a performance
regression, mostly due to multiple calls into the timekeeper functions
with seqcount loops. By adding the floor to the timekeeper we can get
that back down to 1 seqcount loop.

Let me know if you have more questions about this, or suggestions about
how to do this better. The timekeeping code is not my area of expertise
(obviously) so I'm open to doing this a better way if there is one.

Thanks for the review so far!

[1]: NFSv4 mandates an opaque change attribute (usually using
inode->i_version), but only some filesystems have a proper
implementation of it (XFS being the notable exception). For the others,
we end up using the ctime to generate a change attribute, which means
that NFSv4 has the same problem on those filesystems. i_version also
doesn't help NFSv3 clients and servers.
-- 
Jeff Layton <jlayton@kernel.org>

  parent reply	other threads:[~2024-09-19 16:51 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-14 17:07 [PATCH v8 00/11] fs: multigrain timestamp redux Jeff Layton
2024-09-14 17:07 ` [PATCH v8 01/11] timekeeping: move multigrain timestamp floor handling into timekeeper Jeff Layton
2024-09-14 20:10   ` John Stultz
2024-09-14 23:14     ` Jeff Layton
2024-09-16 10:12   ` Thomas Gleixner
2024-09-16 10:32     ` Thomas Gleixner
2024-09-16 10:57       ` Jeff Layton
2024-09-30 19:16         ` Thomas Gleixner
2024-09-30 19:37           ` Jeff Layton
2024-09-30 20:19             ` Thomas Gleixner
2024-09-30 20:53               ` Jeff Layton
2024-09-30 21:35                 ` Thomas Gleixner
2024-10-01  9:45                   ` Jeff Layton
2024-10-01 12:45                     ` Thomas Gleixner
2024-10-02 12:41                       ` Jeff Layton
2024-09-19 16:50     ` Jeff Layton [this message]
2024-09-30 19:43       ` Thomas Gleixner
2024-09-30 20:12         ` Jeff Layton
2024-09-30 19:13   ` Thomas Gleixner
2024-09-30 19:27     ` Jeff Layton
2024-09-30 20:15       ` Thomas Gleixner
2024-09-14 17:07 ` [PATCH v8 02/11] fs: add infrastructure for multigrain timestamps Jeff Layton
2024-09-14 17:07 ` [PATCH v8 03/11] fs: have setattr_copy handle multigrain timestamps appropriately Jeff Layton
2024-09-14 17:07 ` [PATCH v8 04/11] fs: handle delegated timestamps in setattr_copy_mgtime Jeff Layton
2024-09-14 17:07 ` [PATCH v8 05/11] fs: tracepoints around multigrain timestamp events Jeff Layton
2024-09-15  8:21   ` Steven Rostedt
2024-09-14 17:07 ` [PATCH v8 06/11] fs: add percpu counters for significant " Jeff Layton
2024-09-16 10:20   ` Thomas Gleixner
2024-09-14 17:07 ` [PATCH v8 07/11] Documentation: add a new file documenting multigrain timestamps Jeff Layton
2024-09-16  1:01   ` Bagas Sanjaya
2024-09-19 16:53     ` Jeff Layton
2024-09-14 17:07 ` [PATCH v8 08/11] xfs: switch to " Jeff Layton
2024-09-14 17:07 ` [PATCH v8 09/11] ext4: " Jeff Layton
2024-09-14 17:07 ` [PATCH v8 10/11] btrfs: convert " Jeff Layton
2024-09-14 17:07 ` [PATCH v8 11/11] tmpfs: add support for " Jeff Layton
2024-09-26 16:59 ` [PATCH v8 00/11] fs: multigrain timestamp redux Randy Dunlap

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d771ea4d44f3c9da8470d0aa9d58ee1d96f5fb30.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=chandan.babu@oracle.com \
    --cc=chuck.lever@oracle.com \
    --cc=clm@fb.com \
    --cc=corbet@lwn.net \
    --cc=djwong@kernel.org \
    --cc=dsterba@suse.com \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=jstultz@google.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    --cc=rdunlap@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tytso@mit.edu \
    --cc=vadim.fedorenko@linux.dev \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).