From: Thomas Gleixner <tglx@linutronix.de>
To: Jeff Layton <jlayton@kernel.org>,
John Stultz <jstultz@google.com>, Stephen Boyd <sboyd@kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Jonathan Corbet <corbet@lwn.net>,
Randy Dunlap <rdunlap@infradead.org>,
Chandan Babu R <chandan.babu@oracle.com>,
"Darrick J. Wong" <djwong@kernel.org>,
Theodore Ts'o <tytso@mit.edu>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>, Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Chuck Lever <chuck.lever@oracle.com>,
Vadim Fedorenko <vadim.fedorenko@linux.dev>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-btrfs@vger.kernel.org, linux-nfs@vger.kernel.org,
linux-mm@kvack.org, Jeff Layton <jlayton@kernel.org>
Subject: Re: [PATCH v8 01/12] timekeeping: add interfaces for handling timestamps with a floor value
Date: Wed, 02 Oct 2024 16:16:59 +0200 [thread overview]
Message-ID: <87zfnmwpwk.ffs@tglx> (raw)
In-Reply-To: <20241001-mgtime-v8-1-903343d91bc3@kernel.org>
On Tue, Oct 01 2024 at 06:58, Jeff Layton wrote:
V8 ? I remember that I reviewed v* already :)
Also the sentence after the susbsystem prefix starts with an uppercase
letter.
> Multigrain timestamps allow the kernel to use fine-grained timestamps
> when an inode's attributes is being actively observed via ->getattr().
> With this support, it's possible for a file to get a fine-grained
> timestamp, and another modified after it to get a coarse-grained stamp
> that is earlier than the fine-grained time. If this happens then the
> files can appear to have been modified in reverse order, which breaks
> VFS ordering guarantees.
>
> To prevent this, maintain a floor value for multigrain timestamps.
> Whenever a fine-grained timestamp is handed out, record it, and when
> coarse-grained stamps are handed out, ensure they are not earlier than
> that value. If the coarse-grained timestamp is earlier than the
> fine-grained floor, return the floor value instead.
>
> Add a static singleton atomic64_t into timekeeper.c that we can use to
s/we can use/is used/
> keep track of the latest fine-grained time ever handed out. This is
> tracked as a monotonic ktime_t value to ensure that it isn't affected by
> clock jumps. Because it is updated at different times than the rest of
> the timekeeper object, the floor value is managed independently of the
> timekeeper via a cmpxchg() operation, and sits on its own cacheline.
>
> This patch also adds two new public interfaces:
git grep 'This patch' Docuementation/process/
> - ktime_get_coarse_real_ts64_mg() fills a timespec64 with the later of the
> coarse-grained clock and the floor time
>
> - ktime_get_real_ts64_mg() gets the fine-grained clock value, and tries
> to swap it into the floor. A timespec64 is filled with the result.
>
> Since the floor is global, take care to avoid updating it unless it's
> absolutely necessary. If we do the cmpxchg and find that the value has
We do nothing. Please describe your changes in passive voice and do not
impersonate code.
> been updated since we fetched it, then we discard the fine-grained time
> that was fetched in favor of the recent update.
This still is confusing. Something like this:
The floor value is global and updated via a single try_cmpxchg(). If
that fails then the operation raced with a concurrent update. It does
not matter whether the new value is later than the value, which the
failed cmpxchg() tried to write, or not. Add sensible technical
explanation.
> +/*
> + * Multigrain timestamps require that we keep track of the latest fine-grained
> + * timestamp that has been issued, and never return a coarse-grained timestamp
> + * that is earlier than that value.
> + *
> + * mg_floor represents the latest fine-grained time that we have handed out as
> + * a timestamp on the system. Tracked as a monotonic ktime_t, and converted to
> + * the realtime clock on an as-needed basis.
> + *
> + * This ensures that we never issue a timestamp earlier than one that has
> + * already been issued, as long as the realtime clock never experiences a
> + * backward clock jump. If the realtime clock is set to an earlier time, then
> + * realtime timestamps can appear to go backward.
> + */
> +static __cacheline_aligned_in_smp atomic64_t mg_floor;
> +
> static inline void tk_normalize_xtime(struct timekeeper *tk)
> {
> while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
> @@ -2394,6 +2410,86 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts)
> }
> EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
>
> +/**
> + * ktime_get_coarse_real_ts64_mg - return latter of coarse grained time or floor
> + * @ts: timespec64 to be filled
> + *
> + * Fetch the global mg_floor value, convert it to realtime and
> + * compare it to the current coarse-grained time. Fill @ts with
> + * whichever is latest. Note that this is a filesystem-specific
> + * interface and should be avoided outside of that context.
> + */
> +void ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
> +{
> + struct timekeeper *tk = &tk_core.timekeeper;
> + u64 floor = atomic64_read(&mg_floor);
> + ktime_t f_real, offset, coarse;
> + unsigned int seq;
> +
> + do {
> + seq = read_seqcount_begin(&tk_core.seq);
> + *ts = tk_xtime(tk);
> + offset = tk_core.timekeeper.offs_real;
> + } while (read_seqcount_retry(&tk_core.seq, seq));
> +
> + coarse = timespec64_to_ktime(*ts);
> + f_real = ktime_add(floor, offset);
> + if (ktime_after(f_real, coarse))
> + *ts = ktime_to_timespec64(f_real);
> +}
> +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);
> +
> +/**
> + * ktime_get_real_ts64_mg - attempt to update floor value and return result
> + * @ts: pointer to the timespec to be set
> + *
> + * Get a monotonic fine-grained time value and attempt to swap it into the
> + * floor. If it succeeds then accept the new floor value. If it fails
> + * then another task raced in during the interim time and updated the floor.
> + * That value is just as valid, so accept that value in this case.
Why? 'just as valid' does not tell me anything.
> + * @ts will be filled with the resulting floor value, regardless of
> + * the outcome of the swap. Note that this is a filesystem specific interface
> + * and should be avoided outside of that context.
> + */
> +void ktime_get_real_ts64_mg(struct timespec64 *ts)
> +{
> + struct timekeeper *tk = &tk_core.timekeeper;
> + ktime_t old = atomic64_read(&mg_floor);
> + ktime_t offset, mono;
> + unsigned int seq;
> + u64 nsecs;
> +
> + do {
> + seq = read_seqcount_begin(&tk_core.seq);
> +
> + ts->tv_sec = tk->xtime_sec;
> + mono = tk->tkr_mono.base;
> + nsecs = timekeeping_get_ns(&tk->tkr_mono);
> + offset = tk_core.timekeeper.offs_real;
> + } while (read_seqcount_retry(&tk_core.seq, seq));
> +
> + mono = ktime_add_ns(mono, nsecs);
> +
> + /*
> + * Attempt to update the floor with the new time value. Accept the
> + * resulting floor value regardless of the outcome of the swap.
> + */
> + if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
> + ts->tv_nsec = 0;
> + timespec64_add_ns(ts, nsecs);
> + } else {
> + /*
> + * Something has changed mg_floor since "old" was
I complained about 'something has changed ...' before. Can we please
have proper technical explanations?
> + * fetched. "old" has now been updated with the
> + * current value of mg_floor, so use that to return
> + * the current coarse floor value.
This still does not explain why the new floor value is valid even when
it is before the value in @mono.
Thanks,
tglx
next prev parent reply other threads:[~2024-10-02 14:17 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-01 10:58 [PATCH v8 00/12] fs: multigrain timestamp redux Jeff Layton
2024-10-01 10:58 ` [PATCH v8 01/12] timekeeping: add interfaces for handling timestamps with a floor value Jeff Layton
2024-10-01 16:17 ` John Stultz
2024-10-02 14:16 ` Thomas Gleixner [this message]
2024-10-01 10:58 ` [PATCH v8 02/12] fs: add infrastructure for multigrain timestamps Jeff Layton
2024-10-01 13:20 ` Jan Kara
2024-10-01 13:34 ` Jeff Layton
2024-10-02 9:14 ` Jan Kara
2024-10-02 12:32 ` Jeff Layton
2024-10-01 10:58 ` [PATCH v8 03/12] fs: have setattr_copy handle multigrain timestamps appropriately Jeff Layton
2024-10-01 10:58 ` [PATCH v8 04/12] fs: handle delegated timestamps in setattr_copy_mgtime Jeff Layton
2024-10-01 13:32 ` Jan Kara
2024-10-01 10:58 ` [PATCH v8 05/12] fs: tracepoints around multigrain timestamp events Jeff Layton
2024-10-01 10:59 ` [PATCH v8 06/12] fs: add percpu counters for significant " Jeff Layton
2024-10-01 10:59 ` [PATCH v8 07/12] timekeeping: add percpu counter for tracking floor swap events Jeff Layton
2024-10-01 10:59 ` [PATCH v8 08/12] Documentation: add a new file documenting multigrain timestamps Jeff Layton
2024-10-01 10:59 ` [PATCH v8 09/12] xfs: switch to " Jeff Layton
2024-10-01 10:59 ` [PATCH v8 10/12] ext4: " Jeff Layton
2024-10-01 10:59 ` [PATCH v8 11/12] btrfs: convert " Jeff Layton
2024-10-01 10:59 ` [PATCH v8 12/12] tmpfs: add support for " Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87zfnmwpwk.ffs@tglx \
--to=tglx@linutronix.de \
--cc=adilger.kernel@dilger.ca \
--cc=akpm@linux-foundation.org \
--cc=brauner@kernel.org \
--cc=chandan.babu@oracle.com \
--cc=chuck.lever@oracle.com \
--cc=clm@fb.com \
--cc=corbet@lwn.net \
--cc=djwong@kernel.org \
--cc=dsterba@suse.com \
--cc=hughd@google.com \
--cc=jack@suse.cz \
--cc=jlayton@kernel.org \
--cc=josef@toxicpanda.com \
--cc=jstultz@google.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=rdunlap@infradead.org \
--cc=rostedt@goodmis.org \
--cc=sboyd@kernel.org \
--cc=tytso@mit.edu \
--cc=vadim.fedorenko@linux.dev \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).