From: Thomas Gleixner <tglx@linutronix.de>
To: Jeff Layton <jlayton@kernel.org>,
John Stultz <jstultz@google.com>, Stephen Boyd <sboyd@kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Jonathan Corbet <corbet@lwn.net>,
Randy Dunlap <rdunlap@infradead.org>,
Chandan Babu R <chandan.babu@oracle.com>,
"Darrick J. Wong" <djwong@kernel.org>,
Theodore Ts'o <tytso@mit.edu>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>, Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Chuck Lever <chuck.lever@oracle.com>,
Vadim Fedorenko <vadim.fedorenko@linux.dev>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-btrfs@vger.kernel.org, linux-nfs@vger.kernel.org,
linux-mm@kvack.org, Jeff Layton <jlayton@kernel.org>
Subject: Re: [PATCH v8 01/12] timekeeping: add interfaces for handling timestamps with a floor value
Date: Wed, 02 Oct 2024 16:16:59 +0200 [thread overview]
Message-ID: <87zfnmwpwk.ffs@tglx> (raw)
In-Reply-To: <20241001-mgtime-v8-1-903343d91bc3@kernel.org>
On Tue, Oct 01 2024 at 06:58, Jeff Layton wrote:
V8 ? I remember that I reviewed v* already :)
Also the sentence after the susbsystem prefix starts with an uppercase
letter.
> Multigrain timestamps allow the kernel to use fine-grained timestamps
> when an inode's attributes is being actively observed via ->getattr().
> With this support, it's possible for a file to get a fine-grained
> timestamp, and another modified after it to get a coarse-grained stamp
> that is earlier than the fine-grained time. If this happens then the
> files can appear to have been modified in reverse order, which breaks
> VFS ordering guarantees.
>
> To prevent this, maintain a floor value for multigrain timestamps.
> Whenever a fine-grained timestamp is handed out, record it, and when
> coarse-grained stamps are handed out, ensure they are not earlier than
> that value. If the coarse-grained timestamp is earlier than the
> fine-grained floor, return the floor value instead.
>
> Add a static singleton atomic64_t into timekeeper.c that we can use to
s/we can use/is used/
> keep track of the latest fine-grained time ever handed out. This is
> tracked as a monotonic ktime_t value to ensure that it isn't affected by
> clock jumps. Because it is updated at different times than the rest of
> the timekeeper object, the floor value is managed independently of the
> timekeeper via a cmpxchg() operation, and sits on its own cacheline.
>
> This patch also adds two new public interfaces:
git grep 'This patch' Docuementation/process/
> - ktime_get_coarse_real_ts64_mg() fills a timespec64 with the later of the
> coarse-grained clock and the floor time
>
> - ktime_get_real_ts64_mg() gets the fine-grained clock value, and tries
> to swap it into the floor. A timespec64 is filled with the result.
>
> Since the floor is global, take care to avoid updating it unless it's
> absolutely necessary. If we do the cmpxchg and find that the value has
We do nothing. Please describe your changes in passive voice and do not
impersonate code.
> been updated since we fetched it, then we discard the fine-grained time
> that was fetched in favor of the recent update.
This still is confusing. Something like this:
The floor value is global and updated via a single try_cmpxchg(). If
that fails then the operation raced with a concurrent update. It does
not matter whether the new value is later than the value, which the
failed cmpxchg() tried to write, or not. Add sensible technical
explanation.
> +/*
> + * Multigrain timestamps require that we keep track of the latest fine-grained
> + * timestamp that has been issued, and never return a coarse-grained timestamp
> + * that is earlier than that value.
> + *
> + * mg_floor represents the latest fine-grained time that we have handed out as
> + * a timestamp on the system. Tracked as a monotonic ktime_t, and converted to
> + * the realtime clock on an as-needed basis.
> + *
> + * This ensures that we never issue a timestamp earlier than one that has
> + * already been issued, as long as the realtime clock never experiences a
> + * backward clock jump. If the realtime clock is set to an earlier time, then
> + * realtime timestamps can appear to go backward.
> + */
> +static __cacheline_aligned_in_smp atomic64_t mg_floor;
> +
> static inline void tk_normalize_xtime(struct timekeeper *tk)
> {
> while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
> @@ -2394,6 +2410,86 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts)
> }
> EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
>
> +/**
> + * ktime_get_coarse_real_ts64_mg - return latter of coarse grained time or floor
> + * @ts: timespec64 to be filled
> + *
> + * Fetch the global mg_floor value, convert it to realtime and
> + * compare it to the current coarse-grained time. Fill @ts with
> + * whichever is latest. Note that this is a filesystem-specific
> + * interface and should be avoided outside of that context.
> + */
> +void ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
> +{
> + struct timekeeper *tk = &tk_core.timekeeper;
> + u64 floor = atomic64_read(&mg_floor);
> + ktime_t f_real, offset, coarse;
> + unsigned int seq;
> +
> + do {
> + seq = read_seqcount_begin(&tk_core.seq);
> + *ts = tk_xtime(tk);
> + offset = tk_core.timekeeper.offs_real;
> + } while (read_seqcount_retry(&tk_core.seq, seq));
> +
> + coarse = timespec64_to_ktime(*ts);
> + f_real = ktime_add(floor, offset);
> + if (ktime_after(f_real, coarse))
> + *ts = ktime_to_timespec64(f_real);
> +}
> +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);
> +
> +/**
> + * ktime_get_real_ts64_mg - attempt to update floor value and return result
> + * @ts: pointer to the timespec to be set
> + *
> + * Get a monotonic fine-grained time value and attempt to swap it into the
> + * floor. If it succeeds then accept the new floor value. If it fails
> + * then another task raced in during the interim time and updated the floor.
> + * That value is just as valid, so accept that value in this case.
Why? 'just as valid' does not tell me anything.
> + * @ts will be filled with the resulting floor value, regardless of
> + * the outcome of the swap. Note that this is a filesystem specific interface
> + * and should be avoided outside of that context.
> + */
> +void ktime_get_real_ts64_mg(struct timespec64 *ts)
> +{
> + struct timekeeper *tk = &tk_core.timekeeper;
> + ktime_t old = atomic64_read(&mg_floor);
> + ktime_t offset, mono;
> + unsigned int seq;
> + u64 nsecs;
> +
> + do {
> + seq = read_seqcount_begin(&tk_core.seq);
> +
> + ts->tv_sec = tk->xtime_sec;
> + mono = tk->tkr_mono.base;
> + nsecs = timekeeping_get_ns(&tk->tkr_mono);
> + offset = tk_core.timekeeper.offs_real;
> + } while (read_seqcount_retry(&tk_core.seq, seq));
> +
> + mono = ktime_add_ns(mono, nsecs);
> +
> + /*
> + * Attempt to update the floor with the new time value. Accept the
> + * resulting floor value regardless of the outcome of the swap.
> + */
> + if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
> + ts->tv_nsec = 0;
> + timespec64_add_ns(ts, nsecs);
> + } else {
> + /*
> + * Something has changed mg_floor since "old" was
I complained about 'something has changed ...' before. Can we please
have proper technical explanations?
> + * fetched. "old" has now been updated with the
> + * current value of mg_floor, so use that to return
> + * the current coarse floor value.
This still does not explain why the new floor value is valid even when
it is before the value in @mono.
Thanks,
tglx
next prev parent reply other threads:[~2024-10-02 14:17 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-01 10:58 [PATCH v8 00/12] fs: multigrain timestamp redux Jeff Layton
2024-10-01 10:58 ` [PATCH v8 01/12] timekeeping: add interfaces for handling timestamps with a floor value Jeff Layton
2024-10-01 16:17 ` John Stultz
2024-10-02 14:16 ` Thomas Gleixner [this message]
2024-10-01 10:58 ` [PATCH v8 02/12] fs: add infrastructure for multigrain timestamps Jeff Layton
2024-10-01 13:20 ` Jan Kara
2024-10-01 13:34 ` Jeff Layton
2024-10-02 9:14 ` Jan Kara
2024-10-02 12:32 ` Jeff Layton
2024-10-01 10:58 ` [PATCH v8 03/12] fs: have setattr_copy handle multigrain timestamps appropriately Jeff Layton
2024-10-01 10:58 ` [PATCH v8 04/12] fs: handle delegated timestamps in setattr_copy_mgtime Jeff Layton
2024-10-01 13:32 ` Jan Kara
2024-10-01 10:58 ` [PATCH v8 05/12] fs: tracepoints around multigrain timestamp events Jeff Layton
2024-10-01 10:59 ` [PATCH v8 06/12] fs: add percpu counters for significant " Jeff Layton
2024-10-01 10:59 ` [PATCH v8 07/12] timekeeping: add percpu counter for tracking floor swap events Jeff Layton
2024-10-01 10:59 ` [PATCH v8 08/12] Documentation: add a new file documenting multigrain timestamps Jeff Layton
2024-10-01 10:59 ` [PATCH v8 09/12] xfs: switch to " Jeff Layton
2024-10-01 10:59 ` [PATCH v8 10/12] ext4: " Jeff Layton
2024-10-01 10:59 ` [PATCH v8 11/12] btrfs: convert " Jeff Layton
2024-10-01 10:59 ` [PATCH v8 12/12] tmpfs: add support for " Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87zfnmwpwk.ffs@tglx \
--to=tglx@linutronix.de \
--cc=adilger.kernel@dilger.ca \
--cc=akpm@linux-foundation.org \
--cc=brauner@kernel.org \
--cc=chandan.babu@oracle.com \
--cc=chuck.lever@oracle.com \
--cc=clm@fb.com \
--cc=corbet@lwn.net \
--cc=djwong@kernel.org \
--cc=dsterba@suse.com \
--cc=hughd@google.com \
--cc=jack@suse.cz \
--cc=jlayton@kernel.org \
--cc=josef@toxicpanda.com \
--cc=jstultz@google.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=rdunlap@infradead.org \
--cc=rostedt@goodmis.org \
--cc=sboyd@kernel.org \
--cc=tytso@mit.edu \
--cc=vadim.fedorenko@linux.dev \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.