From: David Woodhouse <dwmw2@infradead.org>
To: Peter Hilber <peter.hilber@opensynergy.com>,
linux-kernel@vger.kernel.org, virtualization@lists.linux.dev,
linux-arm-kernel@lists.infradead.org, linux-rtc@vger.kernel.org,
"Ridoux, Julien" <ridouxj@amazon.com>,
virtio-dev@lists.linux.dev, "Luu, Ryan" <rluu@amazon.com>,
"Chashper, David" <chashper@amazon.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>,
Jason Wang <jasowang@redhat.com>,
John Stultz <jstultz@google.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
netdev@vger.kernel.org,
Richard Cochran <richardcochran@gmail.com>,
Stephen Boyd <sboyd@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
Marc Zyngier <maz@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Alessandro Zummo <a.zummo@towertech.it>,
Alexandre Belloni <alexandre.belloni@bootlin.com>
Subject: Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
Date: Wed, 03 Jul 2024 11:40:16 +0100
Message-ID: <352a7f910269daf1a7ff57ea4a41a306d6981b21.camel@infradead.org>
In-Reply-To: <02077acb-7f26-4cfb-90be-cf085a048334@opensynergy.com>
On Wed, 2024-07-03 at 11:56 +0200, Peter Hilber wrote:
> On 02.07.24 20:40, David Woodhouse wrote:
> > On 2 July 2024 19:12:00 BST, Peter Hilber <peter.hilber@opensynergy.com> wrote:
> > > On 02.07.24 18:39, David Woodhouse wrote:
> > > > To clarify then, the main types are
> > > >
> > > > VIRTIO_RTC_CLOCK_UTC == 0
> > > > VIRTIO_RTC_CLOCK_TAI == 1
> > > > VIRTIO_RTC_CLOCK_MONOTONIC == 2
> > > > VIRTIO_RTC_CLOCK_SMEARED_UTC == 3
> > > >
> > > > And the subtypes are *only* for the case of
> > > > VIRTIO_RTC_CLOCK_SMEARED_UTC. They include
> > > >
> > > > VIRTIO_RTC_SUBTYPE_STRICT
> > > > VIRTIO_RTC_SUBTYPE_UNDEFINED /* or whatever you want to call it */
> > > > VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR
> > > > VIRTIO_RTC_SUBTYPE_UTC_SLS /* if it's worth doing this one */
> > > >
> > > > Is that what we just agreed on?
> > > >
> > > >
> > >
> > > This is a misunderstanding. My idea was that the main types are
> > >
> > > > VIRTIO_RTC_CLOCK_UTC == 0
> > > > VIRTIO_RTC_CLOCK_TAI == 1
> > > > VIRTIO_RTC_CLOCK_MONOTONIC == 2
> > > > VIRTIO_RTC_CLOCK_SMEARED_UTC == 3
> > >
> > > VIRTIO_RTC_CLOCK_MAYBE_SMEARED_UTC == 4
> > >
> > > The subtypes would be (1st for clocks other than
> > > VIRTIO_RTC_CLOCK_SMEARED_UTC, 2nd to last for
> > > VIRTIO_RTC_CLOCK_SMEARED_UTC):
> > >
> > > #define VIRTIO_RTC_SUBTYPE_STRICT 0
> > > #define VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 1
> > > #define VIRTIO_RTC_SUBTYPE_SMEAR_UTC_SLS 2
> > >
> >
> > Thanks. I really do think that from the guest point of view there's
> > really no distinction between "maybe smeared" and "undefined
> > smearing", and have a preference for using the latter form, which
> > is the key difference there?
> >
> > Again though, not a hill for me to die on.
>
> I have no issue with staying with "undefined smearing", so would you agree
> to something like
>
> VIRTIO_RTC_CLOCK_SMEAR_UNDEFINED_UTC == 4
>
> (or another name if you prefer)?
Well, the point of contention was really whether that was a *type* or a
*subtype*.

Either way, it's a "precision clock" telling its consumer that the
device *itself* doesn't really know what time is being exposed. Which
seems like a bizarre thing to support.

But I think I've constructed an argument which persuades me to your
point of view that *if* we permit it, it should be a primary type...

A clock can *either* be UTC, *or* it can be monotonic. The whole point
of smearing is to produce a monotonic clock, of course.

VIRTIO_RTC_CLOCK_UTC is UTC. It is not monotonic.

VIRTIO_RTC_CLOCK_SMEARED_UTC is, presumably, monotonic (and I think we
should explicitly require that to be true in virtio-rtc).

But VIRTIO_RTC_CLOCK_MAYBE_SMEARED_UTC is the worst of both worlds. It
is neither known to be correct UTC, *nor* is it known to be monotonic.
So (again, if we permit it at all) I think it probably does make sense
for that to be a primary type.

This is what I currently have for 'struct vmclock_abi' that I'd like to
persuade you to adopt. I need to tweak it some more, for at least the
following reasons, as well as any more you can see:

 • size isn't big enough for 64KiB pages
 • Should be explicitly little-endian
 • Does it need esterror as well as maxerror?
 • Why is maxerror in picoseconds? It's the only use of that unit
 • Where do the clock_status values come from? Do they make sense?
 • Are signed integers OK? (I think so!)
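
To make the first bullet concrete: a 64KiB page is 65536 bytes, which is one more than a uint16_t can represent, so the value wraps to zero when stored. A minimal sketch follows; the helper name store_size_u16 is purely illustrative and not part of the proposal.

```c
#include <stdint.h>

/* Hypothetical helper: what would end up in a 16-bit 'size' field. */
static uint16_t store_size_u16(uint32_t page_size)
{
	return (uint16_t)page_size;	/* truncates modulo 65536 */
}
```

A 4KiB page survives the round trip, but store_size_u16(65536) yields 0, so the field would need to be wider (or encoded differently) to describe 64KiB pages.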
/*
 * This structure provides a vDSO-style clock to VM guests, exposing the
 * relationship (or lack thereof) between the CPU clock (TSC, timebase, arch
 * counter, etc.) and real time. It is designed to address the problem of
 * live migration, which other clock enlightenments do not.
 *
 * When a guest is live migrated, this affects the clock in two ways.
 *
 * First, even between identical hosts the actual frequency of the underlying
 * counter will change within the tolerances of its specification (typically
 * ±50PPM, or 4 seconds a day). This frequency also varies over time on the
 * same host, but can be tracked by NTP as it generally varies slowly. With
 * live migration there is a step change in the frequency, with no warning.
 *
 * Second, there may be a step change in the value of the counter itself, as
 * its accuracy is limited by the precision of the NTP synchronization on the
 * source and destination hosts.
 *
 * So any calibration (NTP, PTP, etc.) which the guest has done on the source
 * host before migration is invalid, and needs to be redone on the new host.
 *
 * In its most basic mode, this structure provides only an indication to the
 * guest that live migration has occurred. This allows the guest to know that
 * its clock is invalid and take remedial action. For applications that need
 * reliable accurate timestamps (e.g. distributed databases), the structure
 * can be mapped all the way to userspace. This allows the application to see
 * directly for itself that the clock is disrupted and take appropriate
 * action, even when using a vDSO-style method to get the time instead of a
 * system call.
 *
 * In its more advanced mode, this structure can also be used to expose the
 * precise relationship of the CPU counter to real time, as calibrated by the
 * host. This means that userspace applications can have accurate time
 * immediately after live migration, rather than having to pause operations
 * and wait for NTP to recover. This mode does, of course, rely on the
 * counter being reliable and consistent across CPUs.
 *
 * Note that this must be true UTC, never with smeared leap seconds. If a
 * guest wishes to construct a smeared clock, it can do so. Presenting a
 * smeared clock through this interface would be problematic because it
 * actually messes with the apparent counter *period*. A linear smearing
 * of 1 ms per second would effectively tweak the counter period by 1000PPM
 * at the start/end of the smearing period, while a sinusoidal smear would
 * basically be impossible to represent.
 *
 * This structure is offered with the intent that it be adopted into the
 * nascent virtio-rtc standard, as a virtio-rtc that does not address the live
 * migration problem seems a little less than fit for purpose. For that
 * reason, certain fields use precisely the same numeric definitions as in
 * the virtio-rtc proposal. The structure can also be exposed through an ACPI
 * device with the CID "VMCLOCK", modelled on the "VMGENID" device except for
 * the fact that it uses a real _CRS to convey the address of the structure
 * (which should be a full page, to allow for mapping directly to userspace).
 */
#ifndef __VMCLOCK_ABI_H__
#define __VMCLOCK_ABI_H__
#ifdef __KERNEL__
#include <linux/types.h>
#else
#include <stdint.h>
#endif
struct vmclock_abi {
	uint64_t magic;
#define VMCLOCK_MAGIC 0x4b4c4356 /* "VCLK" */
	uint16_t size;		/* Size of page containing this structure */
	uint16_t version;	/* 1 */

	/* Sequence lock. Low bit means an update is in progress. */
	uint32_t seq_count;

	uint32_t flags;
	/* Indicates that the tai_offset_sec field is valid */
#define VMCLOCK_FLAG_TAI_OFFSET_VALID (1 << 0)
	/*
	 * Optionally used to notify guests of pending maintenance events.
	 * A guest may wish to remove itself from service if an event is
	 * coming up. Two flags indicate the rough imminence of the event.
	 */
#define VMCLOCK_FLAG_DISRUPTION_SOON (1 << 1) /* About a day */
#define VMCLOCK_FLAG_DISRUPTION_IMMINENT (1 << 2) /* About an hour */
	/* Indicates that the utc_time_maxerror_picosec field is valid */
#define VMCLOCK_FLAG_UTC_MAXERROR_VALID (1 << 3)
	/* Indicates counter_period_error_rate_frac_sec is valid */
#define VMCLOCK_FLAG_PERIOD_ERROR_VALID (1 << 4)

	/*
	 * This field changes to another non-repeating value when the CPU
	 * counter is disrupted, for example on live migration. This lets
	 * the guest know that it should discard any calibration it has
	 * performed of the counter against external sources (NTP/PTP/etc.).
	 */
	uint64_t disruption_marker;

	uint8_t clock_status;
#define VMCLOCK_STATUS_UNKNOWN 0
#define VMCLOCK_STATUS_INITIALIZING 1
#define VMCLOCK_STATUS_SYNCHRONIZED 2
#define VMCLOCK_STATUS_FREERUNNING 3
#define VMCLOCK_STATUS_UNRELIABLE 4

	uint8_t counter_id;	/* Matches VIRTIO_RTC_COUNTER_xxx */
#define VMCLOCK_COUNTER_ARM_VCNT 0
#define VMCLOCK_COUNTER_X86_TSC 1
#define VMCLOCK_COUNTER_INVALID 0xff

	/*
	 * By providing the offset from UTC to TAI, the guest can know both
	 * UTC and TAI reliably, whichever is indicated in the time_type
	 * field. Valid if VMCLOCK_FLAG_TAI_OFFSET_VALID is set in flags.
	 */
	int16_t tai_offset_sec;

	/*
	 * What time is exposed in the time_sec/time_frac_sec fields?
	 */
	uint8_t time_type;	/* Matches VIRTIO_RTC_TYPE_xxx */
#define VMCLOCK_TIME_UTC 0 /* Since 1970-01-01 00:00:00z */
#define VMCLOCK_TIME_TAI 1 /* Since 1970-01-01 00:00:00z */
#define VMCLOCK_TIME_MONOTONIC 2 /* Since undefined epoch */
#define VMCLOCK_TIME_INVALID_SMEARED 3 /* Not supported */
#define VMCLOCK_TIME_INVALID_MAYBE_SMEARED 4 /* Not supported */

	/*
	 * The time exposed through this device is never smeared. This field
	 * corresponds to the 'subtype' field in virtio-rtc, which indicates
	 * the smearing method. However in this case it provides a *hint* to
	 * the guest operating system, such that *if* the guest OS wants to
	 * provide its users with an alternative clock which does not follow
	 * the POSIX CLOCK_REALTIME standard, it may do so in a fashion
	 * consistent with the other systems in the nearby environment.
	 */
	uint8_t leap_second_smearing_hint; /* Matches VIRTIO_RTC_SUBTYPE_xxx */
#define VMCLOCK_SMEARING_STRICT 0
#define VMCLOCK_SMEARING_NOON_LINEAR 1
#define VMCLOCK_SMEARING_UTC_SLS 2

	/* Bit shift for counter_period_frac_sec and its error rate */
	uint8_t counter_period_shift;

	/*
	 * Unlike in NTP, this can indicate a leap second in the past. This
	 * is needed to allow guests to derive an imprecise clock with
	 * smeared leap seconds for themselves, as some modes of smearing
	 * need the adjustments to continue even after the moment at which
	 * the leap second should have occurred.
	 */
	uint8_t leap_indicator;	/* Matches VIRTIO_RTC_LEAP_xxx */
#define VMCLOCK_LEAP_NONE 0
#define VMCLOCK_LEAP_PRE_POS 1
#define VMCLOCK_LEAP_PRE_NEG 2
#define VMCLOCK_LEAP_POS 3
#define VMCLOCK_LEAP_NEG 4

	uint64_t leapsecond_tai_sec; /* Since 1970-01-01 00:00:00z */

	/*
	 * Paired values of counter and UTC at a given point in time.
	 */
	uint64_t counter_value;
	uint64_t time_sec;
	uint64_t time_frac_sec;

	/*
	 * Counter period, and error margin of same. The unit of these
	 * fields is seconds >> (64 + counter_period_shift)
	 */
	uint64_t counter_period_frac_sec;
	uint64_t counter_period_error_rate_frac_sec;

	/* Error margin of UTC reading above (± picoseconds) */
	uint64_t utc_time_maxerror_picosec;
};
#endif /* __VMCLOCK_ABI_H__ */
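
As a rough sketch of how a guest might consume these fields (illustrative only, not part of the proposed ABI): take a consistent snapshot under seq_count, then extrapolate from the (counter_value, time_sec, time_frac_sec) pair using counter_period_frac_sec and counter_period_shift. The sketch assumes that time_frac_sec is in units of 2^-64 seconds and that the caller's counter reading is not older than counter_value. It uses a cut-down, hypothetical 'struct vmclock_view' rather than the real layout, relies on the GCC/Clang __atomic builtins and unsigned __int128, and leaves reading the actual CPU counter to the caller. The names vmclock_view, vmclock_time, vmclock_extrapolate and vmclock_read are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cut-down view; field names follow struct vmclock_abi. */
struct vmclock_view {
	uint32_t seq_count;
	uint8_t counter_period_shift;
	uint64_t counter_value;
	uint64_t time_sec;
	uint64_t time_frac_sec;		/* assumed: units of 2^-64 s */
	uint64_t counter_period_frac_sec;
};

struct vmclock_time {
	uint64_t sec;
	uint64_t frac_sec;		/* units of 2^-64 s */
};

/* Extrapolate the time at counter reading 'now' from a stable snapshot. */
static struct vmclock_time vmclock_extrapolate(const struct vmclock_view *s,
					       uint64_t now)
{
	struct vmclock_time t;
	unsigned __int128 frac;

	/* Sanity bound chosen for this sketch only. */
	assert(s->counter_period_shift < 64);

	/* Elapsed ticks times period, in units of 2^-(64 + shift) s. */
	frac = (unsigned __int128)(now - s->counter_value) *
	       s->counter_period_frac_sec;
	/* Drop the extra shift, then add the reference fraction. */
	frac >>= s->counter_period_shift;
	frac += s->time_frac_sec;

	t.sec = s->time_sec + (uint64_t)(frac >> 64);
	t.frac_sec = (uint64_t)frac;
	return t;
}

/* Seqlock-style read: retry if an update was in progress or intervened. */
static struct vmclock_time vmclock_read(volatile struct vmclock_view *v,
					uint64_t now)
{
	struct vmclock_view snap;
	uint32_t seq;

	do {
		seq = __atomic_load_n(&v->seq_count, __ATOMIC_ACQUIRE);
		snap.counter_period_shift = v->counter_period_shift;
		snap.counter_value = v->counter_value;
		snap.time_sec = v->time_sec;
		snap.time_frac_sec = v->time_frac_sec;
		snap.counter_period_frac_sec = v->counter_period_frac_sec;
		__atomic_thread_fence(__ATOMIC_ACQUIRE);
	} while ((seq & 1) ||
		 seq != __atomic_load_n(&v->seq_count, __ATOMIC_RELAXED));

	return vmclock_extrapolate(&snap, now);
}
```

A real consumer would also compare disruption_marker against the value it last saw and discard its own calibration when that changes; this sketch leaves that out for brevity.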