From: Miroslav Lichvar <mlichvar@redhat.com>
To: David Woodhouse <dwmw2@infradead.org>
Cc: "Richard Cochran" <richardcochran@gmail.com>,
	"Wen Gu" <guwen@linux.alibaba.com>,
	"Andrew Lunn" <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"John Stultz" <jstultz@google.com>,
	"Thomas Gleixner" <tglx@kernel.org>,
	"Stephen Boyd" <sboyd@kernel.org>,
	"Anna-Maria Behnsen" <anna-maria@linutronix.de>,
	"Frederic Weisbecker" <frederic@kernel.org>,
	"Shuah Khan" <shuah@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Thomas Weißschuh" <thomas.weissschuh@linutronix.de>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Julien Ridoux" <ridouxj@amazon.com>,
	"Ryan Luu" <rluu@amazon.com>,
	linux-kernel@vger.kernel.org,
	"Marcelo Tosatti" <mtosatti@redhat.com>
Subject: Re: [RFC PATCH v2 0/8] timekeeping: Fix draft tracking precision and add feed-forward discipline via vmclock
Date: Wed, 27 May 2026 09:46:50 +0200
Message-ID: <ahahalJIqU7Mn967@localhost>
In-Reply-To: <69a953d665738f5021e511c44e193dd832ba009b.camel@infradead.org>

On Tue, May 26, 2026 at 11:00:28AM +0100, David Woodhouse wrote:
> Let us assume that userspace, either from vmclock or direct discipline
> of the arch counter against external sources, has:
>   • Reference time T.
>   • Arch counter value at time T.
>   • Period of a single arch counter tick.
> 
> This translates fairly directly into the kernel's tick_length and
> time_offset. But *only* if you know cycle_interval, ntp_error and other
> details. Which is why my timekeeping_set_reference() takes the
> information in that form, and then translates it within the core
> timekeeping.
> 
> If you can show me how to do that with adjtimex(), that would be great.

tick_length can be set by the adjtimex() modes ADJ_FREQUENCY (in
scaled units of 1/65536 ppm up to 500 ppm) and ADJ_TICK (in
microseconds per 1/USER_HZ tick).
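
As a rough illustration of those units, the Python sketch below splits a
total frequency correction in ppm into an ADJ_TICK value and an
ADJ_FREQUENCY value. It assumes USER_HZ is 100 (nominal tick of 10000 us,
so one microsecond of tick is 100 ppm) and a tick range of +/-10% around
nominal. The name split_frequency is made up for this example, and
nothing here calls adjtimex() itself.

```python
USER_HZ = 100                                   # assumption: common value, not guaranteed
NOMINAL_TICK_US = 1_000_000 // USER_HZ          # 10000 us per 1/USER_HZ tick
PPM_PER_TICK_US = 1_000_000 / NOMINAL_TICK_US   # 1 us of tick = 100 ppm
FREQ_SCALE = 1 << 16                            # ADJ_FREQUENCY unit is 1/65536 ppm
MAX_TICK_DELTA_US = NOMINAL_TICK_US // 10       # assumption: tick limited to +/-10%


def split_frequency(total_ppm):
    """Return (tick_us, scaled_freq) approximating total_ppm.

    The coarse part goes into the tick length (whole microseconds),
    the remainder into the scaled frequency field.
    """
    tick_delta = round(total_ppm / PPM_PER_TICK_US)
    if abs(tick_delta) > MAX_TICK_DELTA_US:
        raise ValueError("correction outside the assumed tick range")
    # The remainder is at most 50 ppm in magnitude here, well inside
    # the 500 ppm limit of ADJ_FREQUENCY.
    residual_ppm = total_ppm - tick_delta * PPM_PER_TICK_US
    return NOMINAL_TICK_US + tick_delta, round(residual_ppm * FREQ_SCALE)
```

For example, split_frequency(1234.0) puts 1200 ppm into the tick
(10012 us) and the remaining 34 ppm into the frequency field.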

time_offset can be set by the ADJ_OFFSET mode. The PLL needs to be
enabled first by setting the STA_PLL status flag (ADJ_STATUS mode), and
the STA_FREQHOLD flag also needs to be set so that the offset does not
change the PLL frequency.
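
The sequence above could be sketched as follows. This only builds the
struct timex fields for the two requests as plain dictionaries; it does
not issue the system call. The constant values are the ones I believe
<linux/timex.h> defines, the 0.5 s offset limit is my understanding of
the kernel's PLL phase limit, and pll_offset_requests is a hypothetical
helper name.

```python
# Values as defined in the Linux UAPI header <linux/timex.h>.
ADJ_OFFSET = 0x0001
ADJ_STATUS = 0x0010
ADJ_NANO = 0x2000          # interpret the offset in nanoseconds
STA_PLL = 0x0001
STA_FREQHOLD = 0x0080
MAX_PHASE_NS = 500_000_000  # assumption: PLL offset limit of 0.5 s


def pll_offset_requests(current_status, offset_ns):
    """Return the two adjtimex() requests as dicts of struct timex fields.

    current_status should come from a prior read-only adjtimex() call,
    because ADJ_STATUS replaces all writable status bits at once.
    """
    if abs(offset_ns) > MAX_PHASE_NS:
        raise ValueError("offset too large for the PLL in this sketch")
    enable_pll = {
        "modes": ADJ_STATUS,
        "status": current_status | STA_PLL | STA_FREQHOLD,
    }
    set_offset = {
        "modes": ADJ_OFFSET | ADJ_NANO,
        "offset": offset_ns,
    }
    return [enable_pll, set_offset]
```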

The ntp_error and other details need to be exposed to userspace. Maybe
in the same API that will be used for reporting the time and frequency
offsets between system clocks.

> As chrony introduces a change on the host, QEMU propagates that to the
> guest (the vmclock: line is from QEMU), and the guest adjusts
> accordingly. And then converges *really* slowly, as even setting the
> time constant to 0 gives a half-life for time_offset of about 11
> seconds.

A simple linear slew would be better for this. The offset is accurate,
there is no need for filtering.
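
To put rough numbers on the difference, here is a small sketch that
compares an exponential decay with a given half-life against a linear
slew at a fixed rate. The 11 s half-life is the figure quoted above; the
slew rate, the residual threshold and the function names are
assumptions chosen only for illustration.

```python
import math


def exponential_time_to_threshold(offset_ns, half_life_s, threshold_ns):
    """Seconds until an exponentially decaying offset drops to threshold_ns."""
    if abs(offset_ns) <= threshold_ns:
        return 0.0
    return half_life_s * math.log2(abs(offset_ns) / threshold_ns)


def linear_slew_time(offset_ns, slew_rate_ppm):
    """Seconds to remove offset_ns completely at a constant slew rate.

    1 ppm of slew corrects 1000 ns per second.
    """
    return abs(offset_ns) / (slew_rate_ppm * 1000.0)
```

With a 1 ms offset, the exponential decay needs about 110 s to get
below 1 us at an 11 s half-life, while a linear slew at an assumed
500 ppm removes the whole offset in 2 s.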

> Given the simplicity of the 'bad shortcut', and the fact that we do
> want the kernel to follow the reference at *boot* time, I do think I'd
> like to have a mode for microvms which optionally *allows* the kernel
> to continue to track the reference for itself rather than having an
> extra userspace tool that is literally just polling on /dev/vmclock in
> order to feed precisely that same information back into the kernel
> directly.

Setting the values on boot in the kernel makes sense to me. There is
no loop involved. It follows the setting of the system clock from the
RTC.

> > I think a better solution is scaling of the clocksource, i.e. a layer
> > below the realtime clock. An additional multiplier applied in HW or
> > SW. That would address the problem for all system clocks, not just the
> > realtime clock. adjtimex() changes are applied on top of that, they
> > are not in conflict.
> 
> But we literally already have a way to 'scale' the counter in order to
> derive CLOCK_MONOTONIC/CLOCK_REALTIME: the kernel's timekeeping code.
> Currently driven *only* by NTP/adjtimex().

I see that as a different purpose than guest migrations. A migrated
guest should have its clocksource frequency corrected while the clock
is controlled by NTP/PTP. If this mechanism was shared, that would not
be possible.

> Are you suggesting that the actual clocksource driver in the kernel for
> e.g. CSID_ARM_ARCH_COUNTER should *scale* the results it returns,
> instead of giving raw counter reads? So we have some NTP-like process
> to adjust each clocksource, in *addition* to the core kernel
> timekeeping?

Not so much NTP-like. There would be no mult dithering or phase
adjustments, only frequency.

> And then those skewed clocksource values are only
> meaningful under a seqlock like the existing kernel timekeeper values
> are valid under the tk_data.seq seqlock?

I guess you are implying here that this SW-fallback scaling would have
a significant impact on performance. Could it not be applied at the
same time as the normal multiplier in the conversion to nanoseconds?
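
A minimal sketch of what I mean, assuming the usual
(cycles * mult) >> shift form of the conversion: the extra frequency
scale is folded into the multiplier once, when the scale changes, so
the read path still does a single multiply and shift. The 1 GHz counter
parameters and the fold_scale name are hypothetical; this is not an
existing kernel interface.

```python
PPB = 1_000_000_000


def cycles_to_ns(delta_cycles, mult, shift):
    """Fixed-point conversion of a counter delta to nanoseconds."""
    return (delta_cycles * mult) >> shift


def fold_scale(mult, scale_ppb):
    """Return a multiplier that also applies a frequency scale in ppb.

    Done once per scale update, not per clock read. The result is
    quantized to the resolution of mult (about 0.06 ppm for 2**24).
    """
    return mult + (mult * scale_ppb) // PPB


# Hypothetical 1 GHz counter: exactly 1 ns per cycle.
mult, shift = 1 << 24, 24
scaled_mult = fold_scale(mult, 10_000)   # speed up by 10 ppm
```

One second of cycles then converts to roughly 1 s + 10 us using
scaled_mult, with the same per-read cost as the unscaled conversion.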

> And would we have a separate way to get real value, to use for
> CLOCK_MONOTONIC_RAW?

All system clocks should be scaled, that's my point.

-- 
Miroslav Lichvar

