From: John Stultz <jstultz@google.com>
To: Jeff Layton <jlayton@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Thomas Gleixner <tglx@linutronix.de>,
Stephen Boyd <sboyd@kernel.org>, Arnd Bergmann <arnd@kernel.org>,
Vadim Fedorenko <vadim.fedorenko@linux.dev>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
kernel test robot <oliver.sang@intel.com>
Subject: Re: [PATCH v2] timekeeping: move multigrain timestamp floor handling into timekeeper
Date: Thu, 12 Sep 2024 13:11:31 -0700 [thread overview]
Message-ID: <CANDhNCrpkTfe6BRVNf1ihhGALbPBBhOs1PCPxA4MDHa1+=sEbQ@mail.gmail.com> (raw)
In-Reply-To: <20240912-mgtime-v2-1-54db84afb7a7@kernel.org>
On Thu, Sep 12, 2024 at 11:02 AM Jeff Layton <jlayton@kernel.org> wrote:
>
> The kernel test robot reported a performance hit in some will-it-scale
> tests due to the multigrain timestamp patches. My own testing showed
> about a 7% drop in performance on the pipe1_threads test, and the data
> showed that coarse_ctime() was slowing down current_time().
So, you provided some useful detail about why coarse_ctime() was slow
in your reply earlier, but it would be good to preserve that in the
commit message here.
> Move the multigrain timestamp floor tracking word into timekeeper.c. Add
> two new public interfaces: The first fills a timespec64 with the later
> of the coarse-grained clock and the floor time, and the second gets a
> fine-grained time and tries to swap it into the floor and fills a
> timespec64 with the result.
>
> The first function returns an opaque cookie that is suitable for passing
> to the second, which will use it as the "old" value in the cmpxchg.
The cookie usage isn't totally clear to me right off. It feels a bit
more subtle then I'd expect.
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 5391e4167d60..bb039c9d525e 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -114,6 +114,13 @@ static struct tk_fast tk_fast_raw ____cacheline_aligned = {
> .base[1] = FAST_TK_INIT,
> };
>
> +/*
> + * This represents the latest fine-grained time that we have handed out as a
> + * timestamp on the system. Tracked as a monotonic ktime_t, and converted to the
> + * realtime clock on an as-needed basis.
> + */
> +static __cacheline_aligned_in_smp atomic64_t mg_floor;
> +
So I do really like this general approach of having an internal floor
value combined with special coarse/fine grained assessors that work
with the floor, so we're not impacting the normal hotpath logic
(basically I was writing up a suggestion to this effect to the thread
with Arnd when I realized you had follow up patch in my inbox).
> static inline void tk_normalize_xtime(struct timekeeper *tk)
> {
> while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
> @@ -2394,6 +2401,76 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts)
> }
> EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
>
> +/**
> + * ktime_get_coarse_real_ts64_mg - get later of coarse grained time or floor
> + * @ts: timespec64 to be filled
> + *
> + * Adjust floor to realtime and compare it to the coarse time. Fill
> + * @ts with the latest one. Returns opaque cookie suitable to pass
> + * to ktime_get_real_ts64_mg.
> + */
> +u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
> +{
> + struct timekeeper *tk = &tk_core.timekeeper;
> + u64 floor = atomic64_read(&mg_floor);
> + ktime_t f_real, offset, coarse;
> + unsigned int seq;
> +
> + WARN_ON(timekeeping_suspended);
> +
> + do {
> + seq = read_seqcount_begin(&tk_core.seq);
> + *ts = tk_xtime(tk);
> + offset = *offsets[TK_OFFS_REAL];
> + } while (read_seqcount_retry(&tk_core.seq, seq));
> +
> + coarse = timespec64_to_ktime(*ts);
> + f_real = ktime_add(floor, offset);
> + if (ktime_after(f_real, coarse))
> + *ts = ktime_to_timespec64(f_real);
> + return floor;
> +}
> +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);
Generally this looks ok to me.
> +/**
> + * ktime_get_real_ts64_mg - attempt to update floor value and return result
> + * @ts: pointer to the timespec to be set
> + * @cookie: opaque cookie from earlier call to ktime_get_coarse_real_ts64_mg()
> + *
> + * Get a current monotonic fine-grained time value and attempt to swap
> + * it into the floor using @cookie as the "old" value. @ts will be
> + * filled with the resulting floor value, regardless of the outcome of
> + * the swap.
> + */
Again this cookie argument usage and the behavior of this function
isn't very clear to me.
> +void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie)
> +{
> + struct timekeeper *tk = &tk_core.timekeeper;
> + ktime_t offset, mono, old = (ktime_t)cookie;
> + unsigned int seq;
> + u64 nsecs;
> +
> + WARN_ON(timekeeping_suspended);
> +
> + do {
> + seq = read_seqcount_begin(&tk_core.seq);
> +
> + ts->tv_sec = tk->xtime_sec;
> + mono = tk->tkr_mono.base;
> + nsecs = timekeeping_get_ns(&tk->tkr_mono);
> + offset = *offsets[TK_OFFS_REAL];
> + } while (read_seqcount_retry(&tk_core.seq, seq));
> +
> + mono = ktime_add_ns(mono, nsecs);
> + if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
> + ts->tv_nsec = 0;
> + timespec64_add_ns(ts, nsecs);
> + } else {
> + *ts = ktime_to_timespec64(ktime_add(old, offset));
> + }
> +
> +}
> +EXPORT_SYMBOL(ktime_get_real_ts64_mg);
So initially I was expecting this to look something like (sorry for
the whitespace damage here):
{
do {
seq = read_seqcount_begin(&tk_core.seq);
ts->tv_sec = tk->xtime_sec;
mono = tk->tkr_mono.base;
nsecs = timekeeping_get_ns(&tk->tkr_mono);
offset = *offsets[TK_OFFS_REAL];
} while (read_seqcount_retry(&tk_core.seq, seq));
mono = ktime_add_ns(mono, nsecs);
do {
old = atomic64_read(&mg_floor);
if (floor >= mono)
break;
} while(!atomic64_try_cmpxchg(&mg_floor, old, mono);
ts->tv_nsec = 0;
timespec64_add_ns(ts, nsecs);
}
Where you read the tk data, atomically update the floor (assuming it's
not in the future) and then return the finegrained value, not needing
to manage a cookie value.
But instead, it seems like if something has happened since the cookie
value was saved (another cpu getting a fine grained timestamp), your
ktime_get_real_ts64_mg() will fall back to returning the same coarse
grained time saved to the cookie, as if no time had past?
It seems like that could cause problems:
cpu1 cpu2
--------------------------------------------------------------------------
t2a = ktime_get_coarse_real_ts64_mg
t1a = ktime_get_coarse_real_ts64_mg()
t1b = ktime_get_real_ts64_mg(t1a)
t2b = ktime_get_real_ts64_mg(t2a)
Where t2b will seem to be before t1b, even though it happened afterwards.
thanks
-john
next prev parent reply other threads:[~2024-09-12 20:11 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-12 18:02 [PATCH v2] timekeeping: move multigrain timestamp floor handling into timekeeper Jeff Layton
2024-09-12 20:11 ` John Stultz [this message]
2024-09-12 20:15 ` John Stultz
2024-09-12 20:18 ` Jeff Layton
2024-09-12 20:33 ` John Stultz
2024-09-12 20:35 ` John Stultz
2024-09-13 11:26 ` Jan Kara
2024-09-13 12:01 ` Jeff Layton
2024-09-13 18:43 ` John Stultz
2024-09-13 19:01 ` Jeff Layton
2024-09-16 10:41 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CANDhNCrpkTfe6BRVNf1ihhGALbPBBhOs1PCPxA4MDHa1+=sEbQ@mail.gmail.com' \
--to=jstultz@google.com \
--cc=arnd@kernel.org \
--cc=brauner@kernel.org \
--cc=jack@suse.cz \
--cc=jlayton@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=oliver.sang@intel.com \
--cc=sboyd@kernel.org \
--cc=tglx@linutronix.de \
--cc=vadim.fedorenko@linux.dev \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).