From: Vincent Donnefort <vdonnefort@google.com>
To: John Stultz <jstultz@google.com>
Cc: rostedt@goodmis.org, mhiramat@kernel.org,
linux-trace-kernel@vger.kernel.org, maz@kernel.org,
oliver.upton@linux.dev, kvmarm@lists.linux.dev, will@kernel.org,
qperret@google.com, kernel-team@android.com,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RFC PATCH 04/11] timekeeping: Export the boot clock in snapshots
Date: Thu, 5 Sep 2024 14:17:42 +0100
Message-ID: <ZtmvdjGwAARUv2yR@google.com>
In-Reply-To: <CANDhNCoWvBn=h3ENEM9bbJO8LBVh5m=3vw79Bawr3e4NE4UMSQ@mail.gmail.com>
On Thu, Aug 22, 2024 at 11:13:11AM -0700, John Stultz wrote:
> On Mon, Aug 5, 2024 at 10:33 AM 'Vincent Donnefort' via kernel-team
> <kernel-team@android.com> wrote:
> >
> > On arm64 systems, the arch timer can be accessible by both EL1 and EL2.
> > This means when running with nVHE or protected KVM, it is easy to
> > generate clock values from the hypervisor, synchronized with the kernel.
> >
> > For tracing purposes, the boot clock is interesting as it doesn't stop on
> > suspend. Export it as part of the time snapshot. This will later allow
> > the hypervisor to add boot clock timestamps to its events.
> >
> > Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> >
> > diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
> > index fc12a9ba2c88..0fc6a61d64bd 100644
> > --- a/include/linux/timekeeping.h
> > +++ b/include/linux/timekeeping.h
> > @@ -275,18 +275,24 @@ struct ktime_timestamps {
> > * counter value
> > * @cycles: Clocksource counter value to produce the system times
> > * @real: Realtime system time
> > + * @boot: Boot time
>
> So, adding the boottime to this kernel-internal snapshot seems reasonable to me.
>
> > * @raw: Monotonic raw system time
> > * @cs_id: Clocksource ID
> > * @clock_was_set_seq: The sequence number of clock-was-set events
> > * @cs_was_changed_seq: The sequence number of clocksource change events
> > + * @mono_shift: The monotonic clock slope shift
> > + * @mono_mult: The monotonic clock slope mult
>
>
> This bit, including the mult/shift pair however, isn't well explained
> and is a little more worrying.
>
>
> > @@ -1074,14 +1076,21 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
> > systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
> > base_real = ktime_add(tk->tkr_mono.base,
> > tk_core.timekeeper.offs_real);
> > + base_boot = ktime_add(tk->tkr_mono.base,
> > + tk_core.timekeeper.offs_boot);
> > base_raw = tk->tkr_raw.base;
> > nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
> > nsec_raw = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
> > + mono_mult = tk->tkr_mono.mult;
> > + mono_shift = tk->tkr_mono.shift;
> > } while (read_seqcount_retry(&tk_core.seq, seq));
> >
> > systime_snapshot->cycles = now;
> > systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
> > + systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
> > systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
> > + systime_snapshot->mono_shift = mono_shift;
> > + systime_snapshot->mono_mult = mono_mult;
> > }
> > EXPORT_SYMBOL_GPL(ktime_get_snapshot);
>
> So this looks like you're trying to stuff kernel timekeeping internal
> values into the snapshot so you can skirt around the timekeeping
> subsystem and generate your own timestamps.
>
> This ends up duplicating logic, but in an incomplete way. For
> instance, you don't have things like ntp state, etc, so the timestamps
> you generate will not exactly match the kernel, and may have
> discontinuities. :(
>
> Now for many cases "close enough" is fine. But the difficulty is the
> expectation bar always raises, and eventually "close enough" isn't and
> we have a broken interface that has to be fixed.
>
> That said, I do get that the need for something like this is
> legitimate. There have been a number of cases where external hardware
> (PTP timestamps from NICs) or contexts (virt) are able to record
> hardware clocksource timestamps on their own, and want to be able to
> map that back to the kernel's (or maybe "a kernel's" if there are
> multiple VMs) sense of time. Sometimes even wanting to do this quite
> a bit later after the timestamp was recorded. The ktime_get_snapshot()
> logic was added in the first place for this reason.
>
> Some more aggressive approaches try to dump a bunch of the internal
> kernel timekeeping state out to userland and call it an api.
> See https://lore.kernel.org/lkml/410bbef9771ef8aa51704994a70d5965e367e2ce.camel@infradead.org/
> for a recent (and thorough) effort there.
>
> I'm very much not a fan of this approach, as it mimics older efforts
> for userspace time calculations that were done before we settled on
> VDSOs, which were very fragile and required years of keeping backwards
> compatibility logic to map the current kernel state back to separate
> structures and expensive conversions to different units that userland
> expected.
>
> The benefit of the VDSO interface is that, while the data is exposed to
> userland, the structure is not, and the logic is still kernel
> controlled, so changes to internal state can be done without breaking
> userland.
>
> Something I have been thinking about is maybe it would be beneficial
> to rework the timekeeping core so that given a clocksource timestamp,
> it could calculate the time for that timestamp. While existing apis
> would still do a new read of the clocksource, so the timestamps would
> always increase, an old timestamp could be used to retro-calculate a
> past time. The thing that prevents this now is that the timekeeping
> core doesn't keep any history, so we can't correctly back-calculate
> times before the last state change. But potentially we could keep a
> buffer of timekeeper states associated with clocksource intervals, and
> so we could find the right state to use for a given clocksource
> timestamp. Now, this would still only work to a point, as we don't
> want to keep tons of historical state. But then with this, maybe we
> could switch to something more VDSO-like where the PTP drivers or host
> systems could request a time given a timestamp (and probably some
> clocksource id so we can sanity check everyone is using the same
> clock), and we could still provide what they want without having to
> expose all of our state.
>
> Unfortunately though, this is all hand waving and pontificating on my
> part, as it would be a large rework. But it seems something closer
> where we share opaque kernel state along with logic with proper
> syscall like APIs to do the calculations, would be a much better
> approach over just exporting more kernel state as an API.
>
> For a more short term approach, since you can't be exact outside of
> the timekeeping logic, why not interpolate from the data
> ktime_get_snapshot already provides to calculate your own sense of the
> frequency?

Understood, I shouldn't sneak out mult and shift. So for the next version,
I'll just use the boot clock value and compute my own mult and shift.

Thanks for having a look at the change!
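
A minimal sketch of that interpolation, assuming two snapshots that each
carry a counter value and a boot clock value in nanoseconds. The names
clock_sample, clock_slope, clock_slope_init() and clock_slope_to_ns() are
hypothetical and only illustrate the idea; this is not the code of the next
version. The slope is derived purely from the two samples, so it only
approximates the kernel's boot clock between updates.

```c
#include <stdint.h>

/* Hypothetical: the two fields copied out of a time snapshot. */
struct clock_sample {
	uint64_t cycles;	/* counter value */
	uint64_t boot_ns;	/* boot clock at that counter value, in ns */
};

/* Hypothetical: locally derived slope, anchored at the newest sample. */
struct clock_slope {
	uint64_t epoch_cycles;
	uint64_t epoch_ns;
	uint32_t mult;
	uint32_t shift;
};

/*
 * Derive mult/shift from two samples so that
 *   ns = (delta_cycles * mult) >> shift
 * Returns 0 on success, -1 if the samples cannot be used.
 */
static int clock_slope_init(struct clock_slope *s,
			    const struct clock_sample *a,
			    const struct clock_sample *b)
{
	uint64_t dc, dn, mult;
	uint32_t shift;

	if (b->cycles <= a->cycles || b->boot_ns < a->boot_ns)
		return -1;

	dc = b->cycles - a->cycles;
	dn = b->boot_ns - a->boot_ns;

	/* Largest shift where (dn << shift) fits in 64 bits and mult in 32. */
	for (shift = 32; ; shift--) {
		if (shift == 0 || !(dn >> (64 - shift))) {
			mult = (dn << shift) / dc;
			if (mult <= UINT32_MAX) {
				s->epoch_cycles = b->cycles;
				s->epoch_ns = b->boot_ns;
				s->mult = (uint32_t)mult;
				s->shift = shift;
				return 0;
			}
		}
		if (shift == 0)
			return -1;
	}
}

/*
 * Convert a counter value taken at or after the epoch to boot clock ns.
 * delta * mult can overflow 64 bits for large deltas, so the slope has
 * to be refreshed from a new pair of samples regularly.
 */
static uint64_t clock_slope_to_ns(const struct clock_slope *s, uint64_t cycles)
{
	uint64_t delta = cycles - s->epoch_cycles;

	return s->epoch_ns + ((delta * s->mult) >> s->shift);
}
```

With a 25 MHz counter, for example, two samples taken one second apart give
40 ns per cycle, and a counter value 250 cycles past the newest sample maps
to that sample's boot clock plus 10000 ns.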
>
> thanks
> -john