[RFC PATCH v2 0/8] timekeeping: Fix draft tracking precision and add feed-forward discipline via vmclock
From: David Woodhouse @ 2026-05-17 21:25 UTC
  To: Richard Cochran, Wen Gu, David Woodhouse, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	John Stultz, Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
	Frederic Weisbecker, Shuah Khan, Peter Zijlstra,
	Thomas Weißschuh, Arnd Bergmann, Miroslav Lichvar,
	Julien Ridoux, Ryan Luu, linux-kernel

This is v2 of the series to add feed-forward clock discipline, allowing
a guest kernel to lock its system clock directly to a hypervisor-provided
vmclock reference with sub-10ns precision and no drift.

The vmclock device (https://uapi-group.org/specifications/specs/vmclock/)
provides a shared memory page containing a linear time function:
time = base + (counter - counter_value) × period. The guest can read
this at any time to determine the hypervisor's view of the current time,
without a VM exit. Unlike guest-driven NTP, it allows for accurate time
to be preserved across live migration.
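The read side of that linear function can be sketched in C. The structure and its fields (`toy_vmclock`, `period_frac` as a period in units of 2^-32 ns, an odd/even `seq` update protocol) are hypothetical simplifications for illustration, not the vmclock ABI. The counter value is passed in rather than read from hardware so the arithmetic is easy to check.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified page layout for illustration only; the real
 * vmclock ABI uses different fields and units.
 */
struct toy_vmclock {
	uint32_t seq;		/* assumed: odd while the page is being updated */
	uint64_t counter_value;	/* counter reading at which time == base_ns */
	uint64_t base_ns;	/* reference time at counter_value, in ns */
	uint64_t period_frac;	/* ns per counter tick, in units of 2^-32 ns */
};

/*
 * time = base + (counter - counter_value) * period, read under a
 * seqcount-style retry loop so a torn update is never returned.
 */
static uint64_t toy_vmclock_ns(const volatile struct toy_vmclock *vc,
			       uint64_t counter)
{
	for (;;) {
		uint32_t seq = vc->seq;

		if (seq & 1)
			continue;	/* update in progress: try again */
		__atomic_thread_fence(__ATOMIC_ACQUIRE);

		uint64_t cv = vc->counter_value;
		assert(counter >= cv);	/* sketch assumes counter is past the base */

		/* 128-bit product (GCC/Clang extension) avoids overflow */
		unsigned __int128 scaled =
			(unsigned __int128)(counter - cv) * vc->period_frac;
		uint64_t ns = vc->base_ns + (uint64_t)(scaled >> 32);

		__atomic_thread_fence(__ATOMIC_ACQUIRE);
		if (vc->seq == seq)
			return ns;	/* page was stable across the read */
	}
}
```

In a real guest the counter would typically be sampled inside the retry loop, so that the counter reading and the page contents belong to the same update.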

The existing ptp_vmclock driver already exposes this as a PTP clock for
userspace consumers (phc2sys, chrony). This series adds kernel-internal
consumption: the timekeeping core can lock directly to the vmclock
reference, eliminating the need for NTP to discipline the guest clock.

The previous series introduced an external oracle to drive the per-tick 
dithering mechanism towards the reference clock. By fixing all the 
inaccuracies and systematic drift in the kernel's own tracking, we can 
dispense with the external oracle and just configure the timekeeping 
using the existing frequency/tick_length and time_offset/ntp_error 
mechanisms.
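As a rough illustration of the frequency/tick_length side, a nominal per-tick length can be derived from a frequency offset in ns per second (ppb), held in the same 2^-32 ns fixed point the kernel's NTP code uses (NTP_SCALE_SHIFT is 32). The helper below is a hypothetical simplification; the kernel's own frequency update also accounts for tick_usec and ntp_tick_adj.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSEC_PER_SEC	1000000000LL
#define TOY_NTP_SCALE_SHIFT	32	/* same scaling as the kernel's NTP_SCALE_SHIFT */

/*
 * Hypothetical helper: length of one tick, in units of 2^-32 ns, for a
 * clock that should advance (1e9 + freq_ppb) ns per second at hz ticks/s.
 */
static uint64_t toy_tick_length(int64_t freq_ppb, unsigned int hz)
{
	int64_t second_ns = TOY_NSEC_PER_SEC + freq_ppb;

	assert(second_ns > 0 && hz > 0);
	return ((uint64_t)second_ns << TOY_NTP_SCALE_SHIFT) / hz;
}
```

A positive freq_ppb lengthens every tick, so the clock advances faster. That is the rate knob a feed-forward reference would set, leaving any phase error to be absorbed through time_offset.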

Changes since v1 (RFC):
 • Fixed three additional issues in the timekeeping code that were
   discovered during nanosecond-precision testing with the vmclock
   reference:
   - The clawback adjustment in timekeeping_apply_adjustment() moved
     xtime without updating ntp_error (patch 2).
   - The exponential tail of ntp_offset_chunk() asymptotically approached
     zero, preventing convergence to the final nanosecond (patch 3).
   - A divide-by-zero in timekeeping_adjust() when cycle_interval is
     momentarily zero during TSC recalibration on KVM guests (patch 4).
 • Replaced the per-tick absolute reference clamping with a cleaner
   mechanism: the skew from time_offset is now driven by per-tick
   transfer into ntp_error with a matching mult adjustment, rather than
   by inflating tick_length (patch 7). This gives exact per-tick
   accounting of the time_offset drain with no rounding loss.
 • The timekeeping_set_reference() API (patch 5) sets time_offset and
   the frequency, letting the standard skew mechanism handle convergence.
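The clawback issue can be seen in a toy model with integer nanoseconds per cycle and no shift scaling; all names are hypothetical and this is not the patch's diff. Changing mult part-way through a tick must move xtime so that the current time stays continuous. In this model ntp_error is defined as ideal minus accumulated time, so moving xtime without a matching ntp_error update breaks that invariant by exactly offset × mult_adj.

```c
#include <assert.h>
#include <stdint.h>

/* Toy timekeeper: integer ns per cycle, no shifts (hypothetical). */
struct toy_tk {
	int64_t mult;		/* ns per cycle */
	int64_t cycle_interval;	/* cycles per tick */
	int64_t xtime;		/* accumulated ns at the last tick */
	int64_t ntp_error;	/* ideal - accumulated, in ns */
	int64_t ideal;		/* ideal accumulated ns */
};

/* Current time, 'offset' cycles after the last accumulation point. */
static int64_t toy_now(const struct toy_tk *tk, int64_t offset)
{
	return tk->xtime + offset * tk->mult;
}

/* One tick: accumulate a full interval at the current mult. */
static void toy_accumulate(struct toy_tk *tk, int64_t ntp_tick)
{
	int64_t interval_ns = tk->cycle_interval * tk->mult;

	tk->xtime += interval_ns;
	tk->ideal += ntp_tick;
	tk->ntp_error += ntp_tick - interval_ns;
}

/* Change mult 'offset' cycles into the current tick. */
static void toy_apply_adjustment(struct toy_tk *tk, int64_t mult_adj,
				 int64_t offset, int fix)
{
	tk->mult += mult_adj;
	/* Clawback: keep toy_now(tk, offset) unchanged across the switch. */
	tk->xtime -= offset * mult_adj;
	if (fix)
		tk->ntp_error += offset * mult_adj;	/* account for moving xtime */
}
```

With `fix` set, ntp_error still equals ideal minus xtime after the next tick; without it, the two differ by offset × mult_adj, and that discrepancy is then steered away as if it were real error.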

The series:

Patches 1-4: Timekeeping bugfixes (suitable for stable/independent review)
  1. Remove stale xtime_remainder from ntp_error accumulation.
  2. Account for clawback adjustment in ntp_error.
  3. Clamp time_offset delta to prevent infinite exponential tail.
  4. Guard against divide-by-zero during clocksource recalibration.

Patches 5-6: Feed-forward reference clock infrastructure
  5. Add timekeeping_set_reference() API for external clock references.
  6. Wire ptp_vmclock to call timekeeping_set_reference() on probe.

Patch 7: Improved time_offset skew mechanism
  7. Drive time_offset skew via per-tick ntp_error transfer instead of
     tick_length inflation, with mult adjustment for dithering bandwidth.
     (we can't *yet* kill tick_length_base; I have to frown at adjtime()
     some more first).

Patch 8: Host-side vmclock page export (WIP)
  8. Add /dev/vmclock_host miscdev for VMM consumption.
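The guard in patch 4 amounts to refusing to recompute mult when there are no cycles per tick to divide by. The helper below is a hypothetical sketch loosely modelled on the division in timekeeping_adjust(); its name, parameters and the choice to keep the previous mult are assumptions, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: mult = (shifted ns per tick) / (cycles per tick).
 * If cycle_interval is zero (e.g. transiently during clocksource
 * recalibration), leave *mult untouched instead of dividing by zero.
 */
static bool toy_update_mult(uint64_t tick_ns_shifted, uint64_t cycle_interval,
			    uint32_t *mult)
{
	if (cycle_interval == 0)
		return false;	/* caller skips this adjustment */
	*mult = (uint32_t)(tick_ns_shifted / cycle_interval);
	return true;
}
```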

Tested with QEMU passing through a vmclock device to a guest¹. The guest 
clock converges to the reference within seconds and remains within 
single-digit nanoseconds indefinitely, with no further external
correction. Injecting a ±10µs offset via ntp_set_time_offset() converges 
to the target via the same exponential decay as before over about 70 
seconds, and retains the same single-digit nanosecond jitter around 
precisely ±10000ns once converged. Obviously in real usage, the 
reference will be periodically changing too, but the feed-forward setup 
does rely on the kernel being able to converge to, and remain on, the 
precise line it's given.
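The ±10µs test exercises the exponential drain of time_offset. A toy model of that drain (offset held in ns scaled by 2^32, an arbitrary shift of 4 per second, hypothetical names) shows the tail problem patch 3 addresses: with a pure shift the per-second chunk keeps shrinking, rounds to zero, and leaves a sub-nanosecond residue forever, whereas flooring the chunk at 1 ns lets the offset reach exactly zero. The floor rule is one plausible choice, not necessarily the patch's, and with this arbitrary shift the convergence time will not match the ~70 seconds quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SCALE_SHIFT	32
#define TOY_ONE_NS	((int64_t)1 << TOY_SCALE_SHIFT)

/* Shift right, rounding toward zero for both signs. */
static int64_t toy_shift_right(int64_t v, int s)
{
	return v < 0 ? -((-v) >> s) : v >> s;
}

/* Per-second chunk of the scaled offset; optionally floor it at 1 ns. */
static int64_t toy_offset_chunk(int64_t offset, int shift, int clamp)
{
	int64_t chunk = toy_shift_right(offset, shift);

	if (clamp && chunk > -TOY_ONE_NS && chunk < TOY_ONE_NS) {
		int64_t mag = offset < 0 ? -offset : offset;

		if (mag > TOY_ONE_NS)
			mag = TOY_ONE_NS;	/* never overshoot the remainder */
		chunk = offset < 0 ? -mag : mag;
	}
	return chunk;
}

/* Drain the offset once per second; return seconds taken (or max_secs). */
static int toy_converge(int64_t *offset, int shift, int clamp, int max_secs)
{
	int secs;

	for (secs = 0; secs < max_secs && *offset != 0; secs++)
		*offset -= toy_offset_chunk(*offset, shift, clamp);
	return secs;
}
```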

¹ https://git.infradead.org/?p=users/dwmw2/qemu.git;a=shortlog;h=refs/heads/vmclock-passthrough

David Woodhouse (8):
      timekeeping: Remove xtime_remainder from ntp_error accumulation
      timekeeping: Account for clawback adjustment in ntp_error
      timekeeping: Clamp time_offset delta to prevent infinite tail
      timekeeping: Guard against divide-by-zero in timekeeping_adjust
      timekeeping: Add absolute reference for feed-forward clock discipline
      ptp_vmclock: Feed reference to timekeeping for feed-forward discipline
      timekeeping: Drive time_offset skew via per-tick ntp_error transfer
      WIP: kernel/time: Add /dev/vmclock_host miscdev

 drivers/ptp/ptp_vmclock.c                          |  79 +++++
 include/linux/timekeeper_internal.h                |   3 +-
 include/linux/timekeeping_reference.h              |  19 ++
 include/linux/vmclock_host.h                       |  17 ++
 kernel/time/Kconfig                                |   8 +
 kernel/time/Makefile                               |   1 +
 kernel/time/ntp.c                                  |  72 ++++-
 kernel/time/ntp_internal.h                         |   6 +
 kernel/time/timekeeping.c                          |  83 +++++-
 kernel/time/vmclock_host.c                         | 319 +++++++++++++++++++++
 tools/testing/selftests/timers/vmclock_host_test.c | 171 +++++++++++
 11 files changed, 766 insertions(+), 12 deletions(-)


Thread overview: 50+ messages
2026-05-17 21:25 [RFC PATCH v2 0/8] timekeeping: Fix draft tracking precision and add feed-forward discipline via vmclock David Woodhouse
2026-05-17 21:25 ` [RFC PATCH v2 1/8] timekeeping: Remove xtime_remainder from ntp_error accumulation David Woodhouse
2026-05-17 21:25 ` [RFC PATCH v2 2/8] timekeeping: Account for clawback adjustment in ntp_error David Woodhouse
2026-05-19  1:59   ` John Stultz
2026-05-19 10:04     ` David Woodhouse
2026-05-19 19:28       ` John Stultz
2026-05-20 10:47         ` Miroslav Lichvar
2026-05-20 12:37           ` David Woodhouse
2026-05-17 21:25 ` [RFC PATCH v2 3/8] timekeeping: Clamp time_offset delta to prevent infinite tail David Woodhouse
2026-05-19 13:25   ` Miroslav Lichvar
2026-05-19 13:31     ` David Woodhouse
2026-05-19 14:17       ` Miroslav Lichvar
2026-05-19 15:06         ` David Woodhouse
2026-05-17 21:25 ` [RFC PATCH v2 4/8] timekeeping: Add absolute reference for feed-forward clock discipline David Woodhouse
2026-05-19  2:09   ` John Stultz
2026-05-19 11:07     ` David Woodhouse
2026-05-17 21:25 ` [RFC PATCH v2 5/8] ptp_vmclock: Feed reference to timekeeping for feed-forward discipline David Woodhouse
2026-05-17 21:25 ` [RFC PATCH v2 6/8] timekeeping: Guard against divide-by-zero in timekeeping_adjust David Woodhouse
2026-05-17 21:25 ` [RFC PATCH v2 7/8] timekeeping: Drive time_offset skew via per-tick ntp_error transfer David Woodhouse
2026-05-17 21:25 ` [RFC PATCH v2 8/8] WIP: kernel/time: Add /dev/vmclock_host miscdev David Woodhouse
2026-05-19 13:16 ` [RFC PATCH v2 0/8] timekeeping: Fix draft tracking precision and add feed-forward discipline via vmclock Miroslav Lichvar
2026-05-19 15:50   ` David Woodhouse
2026-05-20 10:39     ` Miroslav Lichvar
2026-05-20 12:21       ` David Woodhouse
2026-05-21  6:35         ` Miroslav Lichvar
2026-05-21  9:54           ` David Woodhouse
2026-05-25  8:08             ` Miroslav Lichvar
2026-05-25  9:14               ` David Woodhouse
2026-05-26  7:10                 ` Miroslav Lichvar
2026-05-26 10:00                   ` David Woodhouse
2026-05-27  7:46                     ` Miroslav Lichvar
2026-05-27 12:28                       ` David Woodhouse
2026-05-21 18:30         ` Thomas Gleixner
2026-05-21 21:06           ` David Woodhouse
2026-05-22  8:02             ` Thomas Gleixner
2026-05-22 10:01               ` David Woodhouse
2026-05-22 15:28                 ` Thomas Gleixner
2026-05-22 16:23                   ` David Woodhouse
2026-05-24 12:36                     ` Thomas Gleixner
2026-05-24 13:13                       ` David Woodhouse
2026-05-24 15:05                         ` Thomas Gleixner
2026-05-25  8:06                       ` Arthur Kiyanovski
2026-05-25  8:41                         ` David Woodhouse
2026-05-26 14:12                         ` Thomas Gleixner
2026-05-22 16:50                   ` David Woodhouse
2026-05-24 15:15                     ` Thomas Gleixner
2026-05-24 15:37                       ` Thomas Gleixner
2026-05-24 15:48                         ` Thomas Gleixner
2026-05-24 16:36                         ` Thomas Gleixner
2026-05-24 16:42                           ` David Woodhouse
