From: Jeff Layton <jlayton@kernel.org>
To: John Stultz <jstultz@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Thomas Gleixner <tglx@linutronix.de>,
Stephen Boyd <sboyd@kernel.org>, Arnd Bergmann <arnd@kernel.org>,
Vadim Fedorenko <vadim.fedorenko@linux.dev>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
kernel test robot <oliver.sang@intel.com>
Subject: Re: [PATCH v2] timekeeping: move multigrain timestamp floor handling into timekeeper
Date: Thu, 12 Sep 2024 16:18:43 -0400 [thread overview]
Message-ID: <80f8f49da3f7208101e497a130a8abe106b298ad.camel@kernel.org> (raw)
In-Reply-To: <CANDhNCrpkTfe6BRVNf1ihhGALbPBBhOs1PCPxA4MDHa1+=sEbQ@mail.gmail.com>
On Thu, 2024-09-12 at 13:11 -0700, John Stultz wrote:
> On Thu, Sep 12, 2024 at 11:02 AM Jeff Layton <jlayton@kernel.org> wrote:
> >
> > The kernel test robot reported a performance hit in some will-it-scale
> > tests due to the multigrain timestamp patches. My own testing showed
> > about a 7% drop in performance on the pipe1_threads test, and the data
> > showed that coarse_ctime() was slowing down current_time().
>
> So, you provided some useful detail about why coarse_ctime() was slow
> in your reply earlier, but it would be good to preserve that in the
> commit message here.
>
>
> > Move the multigrain timestamp floor tracking word into timekeeper.c. Add
> > two new public interfaces: The first fills a timespec64 with the later
> > of the coarse-grained clock and the floor time, and the second gets a
> > fine-grained time and tries to swap it into the floor and fills a
> > timespec64 with the result.
> >
> > The first function returns an opaque cookie that is suitable for passing
> > to the second, which will use it as the "old" value in the cmpxchg.
>
> The cookie usage isn't totally clear to me right off. It feels a bit
> more subtle then I'd expect.
>
>
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index 5391e4167d60..bb039c9d525e 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -114,6 +114,13 @@ static struct tk_fast tk_fast_raw ____cacheline_aligned = {
> > .base[1] = FAST_TK_INIT,
> > };
> >
> > +/*
> > + * This represents the latest fine-grained time that we have handed out as a
> > + * timestamp on the system. Tracked as a monotonic ktime_t, and converted to the
> > + * realtime clock on an as-needed basis.
> > + */
> > +static __cacheline_aligned_in_smp atomic64_t mg_floor;
> > +
>
> So I do really like this general approach of having an internal floor
> value combined with special coarse/fine grained assessors that work
> with the floor, so we're not impacting the normal hotpath logic
> (basically I was writing up a suggestion to this effect to the thread
> with Arnd when I realized you had follow up patch in my inbox).
>
>
> > static inline void tk_normalize_xtime(struct timekeeper *tk)
> > {
> > while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
> > @@ -2394,6 +2401,76 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts)
> > }
> > EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
> >
> > +/**
> > + * ktime_get_coarse_real_ts64_mg - get later of coarse grained time or floor
> > + * @ts: timespec64 to be filled
> > + *
> > + * Adjust floor to realtime and compare it to the coarse time. Fill
> > + * @ts with the latest one. Returns opaque cookie suitable to pass
> > + * to ktime_get_real_ts64_mg.
> > + */
> > +u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
> > +{
> > + struct timekeeper *tk = &tk_core.timekeeper;
> > + u64 floor = atomic64_read(&mg_floor);
> > + ktime_t f_real, offset, coarse;
> > + unsigned int seq;
> > +
> > + WARN_ON(timekeeping_suspended);
> > +
> > + do {
> > + seq = read_seqcount_begin(&tk_core.seq);
> > + *ts = tk_xtime(tk);
> > + offset = *offsets[TK_OFFS_REAL];
> > + } while (read_seqcount_retry(&tk_core.seq, seq));
> > +
> > + coarse = timespec64_to_ktime(*ts);
> > + f_real = ktime_add(floor, offset);
> > + if (ktime_after(f_real, coarse))
> > + *ts = ktime_to_timespec64(f_real);
> > + return floor;
> > +}
> > +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);
>
> Generally this looks ok to me.
>
>
> > +/**
> > + * ktime_get_real_ts64_mg - attempt to update floor value and return result
> > + * @ts: pointer to the timespec to be set
> > + * @cookie: opaque cookie from earlier call to ktime_get_coarse_real_ts64_mg()
> > + *
> > + * Get a current monotonic fine-grained time value and attempt to swap
> > + * it into the floor using @cookie as the "old" value. @ts will be
> > + * filled with the resulting floor value, regardless of the outcome of
> > + * the swap.
> > + */
>
> Again this cookie argument usage and the behavior of this function
> isn't very clear to me.
>
> > +void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie)
> > +{
> > + struct timekeeper *tk = &tk_core.timekeeper;
> > + ktime_t offset, mono, old = (ktime_t)cookie;
> > + unsigned int seq;
> > + u64 nsecs;
> > +
> > + WARN_ON(timekeeping_suspended);
> > +
> > + do {
> > + seq = read_seqcount_begin(&tk_core.seq);
> > +
> > + ts->tv_sec = tk->xtime_sec;
> > + mono = tk->tkr_mono.base;
> > + nsecs = timekeeping_get_ns(&tk->tkr_mono);
> > + offset = *offsets[TK_OFFS_REAL];
> > + } while (read_seqcount_retry(&tk_core.seq, seq));
> > +
> > + mono = ktime_add_ns(mono, nsecs);
> > + if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
> > + ts->tv_nsec = 0;
> > + timespec64_add_ns(ts, nsecs);
> > + } else {
> > + *ts = ktime_to_timespec64(ktime_add(old, offset));
> > + }
> > +
> > +}
> > +EXPORT_SYMBOL(ktime_get_real_ts64_mg);
>
>
> So initially I was expecting this to look something like (sorry for
> the whitespace damage here):
> {
> do {
> seq = read_seqcount_begin(&tk_core.seq);
> ts->tv_sec = tk->xtime_sec;
> mono = tk->tkr_mono.base;
> nsecs = timekeeping_get_ns(&tk->tkr_mono);
> offset = *offsets[TK_OFFS_REAL];
> } while (read_seqcount_retry(&tk_core.seq, seq));
>
> mono = ktime_add_ns(mono, nsecs);
> do {
> old = atomic64_read(&mg_floor);
> if (floor >= mono)
> break;
> } while(!atomic64_try_cmpxchg(&mg_floor, old, mono);
> ts->tv_nsec = 0;
> timespec64_add_ns(ts, nsecs);
> }
>
> Where you read the tk data, atomically update the floor (assuming it's
> not in the future) and then return the finegrained value, not needing
> to manage a cookie value.
>
> But instead, it seems like if something has happened since the cookie
> value was saved (another cpu getting a fine grained timestamp), your
> ktime_get_real_ts64_mg() will fall back to returning the same coarse
> grained time saved to the cookie, as if no time had past?
>
> It seems like that could cause problems:
>
> cpu1 cpu2
> --------------------------------------------------------------------------
> t2a = ktime_get_coarse_real_ts64_mg
> t1a = ktime_get_coarse_real_ts64_mg()
> t1b = ktime_get_real_ts64_mg(t1a)
>
> t2b = ktime_get_real_ts64_mg(t2a)
>
> Where t2b will seem to be before t1b, even though it happened afterwards.
>
Ahh no, the subtle thing about atomic64_try_cmpxchg is that it
overwrites "old" with the value that was currently there in the event
that the cmp fails.
So, the try_cmpxchg there will either swap the new value into place, or
if it was updated in the meantime, "old" will now refer to the value
that's currently in the floor word. Either is fine in this case, so we
don't need to retry anything.
--
Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2024-09-12 20:18 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-12 18:02 [PATCH v2] timekeeping: move multigrain timestamp floor handling into timekeeper Jeff Layton
2024-09-12 20:11 ` John Stultz
2024-09-12 20:15 ` John Stultz
2024-09-12 20:18 ` Jeff Layton [this message]
2024-09-12 20:33 ` John Stultz
2024-09-12 20:35 ` John Stultz
2024-09-13 11:26 ` Jan Kara
2024-09-13 12:01 ` Jeff Layton
2024-09-13 18:43 ` John Stultz
2024-09-13 19:01 ` Jeff Layton
2024-09-16 10:41 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=80f8f49da3f7208101e497a130a8abe106b298ad.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=arnd@kernel.org \
--cc=brauner@kernel.org \
--cc=jack@suse.cz \
--cc=jstultz@google.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=oliver.sang@intel.com \
--cc=sboyd@kernel.org \
--cc=tglx@linutronix.de \
--cc=vadim.fedorenko@linux.dev \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).