From: Petr Mladek <pmladek@suse.com>
To: "Bird, Tim" <Tim.Bird@sony.com>
Cc: Michael Kelley <mhklinux@outlook.com>,
"rostedt@goodmis.org" <rostedt@goodmis.org>,
"senozhatsky@chromium.org" <senozhatsky@chromium.org>,
"francesco@valla.it" <francesco@valla.it>,
"geert@linux-m68k.org" <geert@linux-m68k.org>,
"linux-embedded@vger.kernel.org" <linux-embedded@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Thomas Gleixner <tglx@kernel.org>,
John Stultz <jstultz@google.com>, Stephen Boyd <sboyd@kernel.org>,
John Ogness <john.ogness@linutronix.de>
Subject: Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
Date: Fri, 13 Mar 2026 11:45:02 +0100 [thread overview]
Message-ID: <abPqrqgXHnDu8XWf@pathway> (raw)
In-Reply-To: <MW5PR13MB563216F6D786D4DCDFEBED2BFD45A@MW5PR13MB5632.namprd13.prod.outlook.com>
Finally added timekeeping maintainers and John into Cc.
We should have added time since v1.
Anyway, you might see the entire history at
https://lore.kernel.org/all/39b09edb-8998-4ebd-a564-7d594434a981@bird.org/
On Fri 2026-03-13 04:52:40, Bird, Tim wrote:
> Hey Micheal,
>
> This report is very interesting.
>
> Thanks very much for trying it out!
>
> > -----Original Message-----
> > From: Michael Kelley <mhklinux@outlook.com>
> > Sent: Wednesday, March 11, 2026 9:47 AM
> > From: Tim Bird <tim.bird@sony.com> Sent: Tuesday, February 10, 2026 3:48 PM
> > >
> > > During early boot, printk timestamps are reported as zero before
> > > kernel timekeeping starts (e.g. before time_init()). This
> > > hinders boot-time optimization efforts. This period is about 400
> > > milliseconds for many current desktop and embedded machines
> > > running Linux.
> > >
> > > Add support to save cycles during early boot, and output correct
> > > timestamp values after timekeeping is initialized. get_cycles()
> > > is operational on arm64 and x86_64 from kernel start. Add code
> > > and variables to save calibration values used to later convert
> > > cycle counts to time values in the early printks. Add a config
> > > to control the feature.
> > >
> > > This yields non-zero timestamps for printks from the very start
> > > of kernel execution. The timestamps are relative to the start of
> > > the architecture-specified counter used in get_cycles
> > > (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > >
> > > All timestamps reflect time from processor power-on instead of
> > > time from the kernel's timekeeping initialization.
> >
> > I tried this patch in linux-next20260302 kernel running as a guest VM
> > on a Hyper-V host. Two things:
> >
> > 1) In the dmesg output, I'm seeing a place where the timestamps briefly go
> > backwards -- i.e., they are not monotonically increasing. Here's a snippet,
> > where there's a smaller timestamp immediately after the tsc is detected:
> >
> > [ 27.994891] SMBIOS 3.1.0 present.
> > [ 27.994893] DMI: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025
> > [ 27.994898] DMI: Memory slots populated: 2/2
> > [ 27.995202] Hypervisor detected: Microsoft Hyper-V
> > [ 27.995205] Hyper-V: privilege flags low 0xae7f, high 0x3b8030, ext 0x62, hints 0xa0e24, misc 0xe0bed7b2
> > [ 27.995208] Hyper-V: Nested features: 0x0
> > [ 27.995209] Hyper-V: LAPIC Timer Frequency: 0xc3500
> > [ 27.995210] Hyper-V: Using hypercall for remote TLB flush
> > [ 27.995216] clocksource: hyperv_clocksource_tsc_page: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
> > [ 27.995218] clocksource: hyperv_clocksource_msr: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
> > [ 27.995220] tsc: Detected 2918.401 MHz processor
>
> I wonder if the tsc is getting fiddled with or virtualized somewhere in here, as part of clocksource initialization.
> I believe each clocksource in the kernel maintains it's own internal offset, and maybe the offset that is
> being used ends up being slightly different from the cycle-counter offset that the early_times feature uses.
> I'm just throwing out guesses. It's about a 4ms delta, which is pretty big.
>
> > [ 27.991060] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
> > [ 27.991062] e820: remove [mem 0x000a0000-0x000fffff] System RAM
> > [ 27.991064] last_pfn = 0x210000 max_arch_pfn = 0x400000000
> > [ 27.991065] x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.
> > [ 27.991066] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
I wonder how the calibration of the cycles is precise. I wonder if
the problem might be that cycles were faster right after boot than
later during the calibration.
I added the following debug output on top of this patch:
diff --git a/include/linux/early_times.h b/include/linux/early_times.h
index 05388dcb8573..cdb467345bcc 100644
--- a/include/linux/early_times.h
+++ b/include/linux/early_times.h
@@ -20,6 +20,7 @@ static inline void early_times_start_calibration(void)
{
start_cycles = get_cycles();
start_ns = local_clock();
+ pr_info("Early printk times: started callibration: %llu ns\n", start_ns);
}
static inline void early_times_finish_calibration(void)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 774ffb1fa5ac..836cb03aaa6d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2308,6 +2308,8 @@ int vprintk_store(int facility, int level,
ts_nsec = local_clock();
if (!ts_nsec)
ts_nsec = early_cycles();
+ else
+ pr_info_once("local_clock() returned non-zero timestamp: %llu nsec\n", ts_nsec);
caller_id = printk_caller_id();
And it produced in my kvm:
Let's say that start of the cycle counter is
Start of stage A
[ 8.684438] Linux version 7.0.0-rc2-default+ (pmladek@pathway) (gcc (SUSE Linux) 15.2.1 20260202, GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.45.0.20251103-2) #571 SMP PREEMPT_DYNAMIC Fri Mar 13 10:23:54 CET 2026
[ 8.684442] Command line: BOOT_IMAGE=/boot/vmlinuz-7.0.0-rc2-default+ root=/dev/vda2 resume=/dev/disk/by-uuid/369c7453-3d16-409d-88b2-5de027891a12 mitigations=auto nosplash earlycon=uart8250,io,0x3f8,115200 console=ttyS0,115200 console=tty0 ignore_loglevel log_buf_len=1M crashkernel=512M,high crashkernel=72M,low
[...]
[ 8.696633] earlycon: uart8250 at I/O port 0x3f8 (options '115200')
[ 8.696639] printk: legacy bootconsole [uart8250] enabled
[ 8.731303] printk: debug: ignoring loglevel setting.
[ 8.732349] NX (Execute Disable) protection: active
[ 8.733447] APIC: Static calls initialized
[ 8.734667] SMBIOS 2.8 present.
[ 8.735358] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-2-g4f253b9b-prebuilt.qemu.org 04/01/2014
[ 8.737285] DMI: Memory slots populated: 1/1
[ 8.738380] Hypervisor detected: KVM
[ 8.739151] last_pfn = 0x7ffdc max_arch_pfn = 0x400000000
[ 8.740254] kvm-clock: Using msrs 4b564d01 and 4b564d00
[ 8.732971] printk: local_clock() returned non-zero timestamp: 3486 nsec
End of stage A
This is the point where printk() started storing the values from
local_clock() instead of cycles.
Start of stage B
[ 8.732971] kvm-clock: using sched offset of 252367014082295 cycles
[ 8.735471] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 8.738880] tsc: Detected 3293.776 MHz processor
[ 8.740474] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
[...]
[ 8.932671] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[ 8.934047] Early printk times: started callibration: 201079507 ns
End of stage B
This is where we started calibration of early cycles.
Start of stage C
[ 8.935571] Console: colour dummy device 80x25
[...]
[ 9.289673] thermal_sys: Registered thermal governor 'fair_share'
[ 9.290077] thermal_sys: Registered thermal governor 'bang_bang'
[ 9.292290] thermal_sys: Registered thermal governor 'step_wise'
[ 9.293530] thermal_sys: Registered thermal governor 'user_space'
[ 9.294856] cpuidle: using governor ladder
[ 9.296302] cpuidle: using governor menu
Here the thermal governors are registered. I guess that they might
reduce speed of some HW.
[...]
[ 11.974147] clk: Disabling unused clocks
Some unused clocks are disabled. I wonder if this might affect
counting the cycles.
[ 12.330852] Freeing unused kernel image (rodata/data gap) memory: 1500K
[ 12.351191] Early printk times: mult=19634245, shift=26, offset=8732967929 ns
End of stage C
Here is the end on calibration.
Now, if the frequence of the cycles was:
+ was higher in the stage A when only cycles were stored
+ was lower in stage C when it was calibrated against local_clock()
Then it might result in higher (calibrated) timestamps in stage A
and step back in stage B.
Or something like this. It is possible that even local_clock() does
not have a stable frequence during the early boot.
Idea: A solution might be to start calibration when printk()
gets first non-zero time from local_clock.
Something like:
diff --git a/include/linux/early_times.h b/include/linux/early_times.h
index 05388dcb8573..09d278996184 100644
--- a/include/linux/early_times.h
+++ b/include/linux/early_times.h
@@ -16,10 +16,13 @@ extern u64 start_ns;
extern u32 early_mult, early_shift;
extern u64 early_ts_offset;
-static inline void early_times_start_calibration(void)
+static inline void early_times_may_start_calibration(u64 ts_ns)
{
+ if (start_ns)
+ return;
+
+ start_ns = ts_ns;
start_cycles = get_cycles();
- start_ns = local_clock();
}
static inline void early_times_finish_calibration(void)
diff --git a/init/main.c b/init/main.c
index 27835270dfb5..a333b0da69cf 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1123,9 +1123,6 @@ void start_kernel(void)
timekeeping_init();
time_init();
- /* This must be after timekeeping is initialized */
- early_times_start_calibration();
-
/* This must be after timekeeping is initialized */
random_init();
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 774ffb1fa5ac..19330b6b4eb2 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2306,7 +2306,9 @@ int vprintk_store(int facility, int level,
* timestamp with respect to the caller.
*/
ts_nsec = local_clock();
- if (!ts_nsec)
+ if (ts_nsec)
+ early_times_may_start_calibration(ts_nsec);
+ else
ts_nsec = early_cycles();
caller_id = printk_caller_id();
> >
> > 2) A Linux VM running in the Azure cloud is also running on Hyper-V. Such a
> > VM typically uses cloud-init to set everything up at boot time, and cloud-init
> > is outputting lines to the serial console with a timestamp that looks like the
> > printk() timestamp, but apparently is not adjusted for the early timestamping
> > that this patch adds. Again, I haven't debugged what's going on -- I'm not
> > immediately sure of the mechanism that cloud-init uses to do output to the
> > serial console. The use of the Hyper-V synthetic clock source might the cause
> > of the problem here as well. Here's an output snippet from the serial console:
> >
> > [ 20.330414] systemd[1]: Condition check resulted in OpenVSwitch configuration for cleanup being skipped.
> > [ 20.332911] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
> > [ 20.333257] pstore: Registered efi_pstore as persistent store backend
> > [ 20.334360] systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
> > [ 20.338319] systemd[1]: Starting Load Kernel Modules...
> > [ 20.341094] systemd[1]: Starting Remount Root and Kernel File Systems...
> > [ 20.350993] systemd[1]: Starting udev Coldplug all Devices...
> > [ 20.356255] systemd[1]: Starting Uncomplicated firewall...
> > [ 20.361536] systemd[1]: Started Journal Service.
> > [ 20.386902] EXT4-fs (sda1): re-mounted c02dce0c-0c40-4e6e-88af-c5a0987b0adb r/w.
> > [ 22.532033] /dev/sr0: Can't lookup blockdev
> > [ 7.955973] cloud-init[783]: Cloud-init v. 24.3.1-0ubuntu0~20.04.1 running 'init-local' at Wed, 11 Mar 2026 15:27:06 +0000. Up 7.48
> > seconds.
> > [ 9.933120] cloud-init[822]: Cloud-init v. 24.3.1-0ubuntu0~20.04.1 running 'init' at Wed, 11 Mar 2026 15:27:08 +0000. Up 9.82 seconds.
> > [ 9.935483] cloud-init[822]: ci-info: ++++++++++++++++++++++++++++++++++++++Net device
> > info+++++++++++++++++++++++++++++++++++++++
> > [ 9.937726] cloud-init[822]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
> > [ 9.939905] cloud-init[822]: ci-info: | Device | Up | Address | Mask | Scope | Hw-Address |
> > [ 9.942059] cloud-init[822]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
This is more complicated. I wonder if the timestamps from cloud-init()
are somehow synchronized with local_clock().
We might need to synchronize local_clock() with the cycles as well.
But there is the chicken&egg problem. We need:
+ to know the offset caused by cycles when local_clock() gets initialized.
+ local_clock() running for some time to calibrate cycles.
Hmm, I see that time-management people are not in Cc. We should have
added them since v1.
I add them now. Better late than never.
Best Regards,
Petr
next prev parent reply other threads:[~2026-03-13 10:45 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-25 5:30 [PATCH] printk: add early_counter_ns routine for printk blind spot Tim Bird
2025-11-25 7:52 ` kernel test robot
2025-11-25 13:08 ` Francesco Valla
2025-11-26 7:38 ` Geert Uytterhoeven
2025-11-27 0:16 ` Bird, Tim
2025-11-27 16:16 ` Petr Mladek
2025-11-26 12:55 ` Petr Mladek
2025-11-27 0:03 ` Bird, Tim
2025-11-26 11:13 ` Petr Mladek
2025-11-27 9:13 ` kernel test robot
2026-01-24 19:40 ` [PATCH v2] printk: fix zero-valued printk timestamps in early boot Tim Bird
2026-01-25 14:41 ` Francesco Valla
2026-01-26 16:52 ` Bird, Tim
2026-02-02 16:23 ` Petr Mladek
2026-01-26 10:12 ` Geert Uytterhoeven
2026-01-26 17:11 ` Bird, Tim
2026-01-27 8:10 ` Geert Uytterhoeven
2026-02-10 23:47 ` [PATCH v3] " Tim Bird
2026-03-04 11:23 ` Petr Mladek
2026-03-09 17:27 ` Shashank Balaji
2026-03-10 10:43 ` Petr Mladek
2026-03-10 19:17 ` Bird, Tim
2026-03-09 19:25 ` Shashank Balaji
2026-03-10 11:39 ` Petr Mladek
2026-03-10 18:54 ` Bird, Tim
2026-03-11 15:45 ` Petr Mladek
2026-03-11 15:47 ` Michael Kelley
2026-03-13 4:52 ` Bird, Tim
2026-03-13 10:45 ` Petr Mladek [this message]
2026-03-14 14:16 ` Shashank Balaji
2026-03-24 20:07 ` Bird, Tim
2026-03-14 16:15 ` Michael Kelley
2026-03-24 19:47 ` Bird, Tim
2026-03-26 9:24 ` John Ogness
2026-03-27 18:04 ` Bird, Tim
2026-03-20 18:15 ` Bird, Tim
2026-03-26 13:17 ` Thomas Gleixner
2026-03-27 18:48 ` Bird, Tim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=abPqrqgXHnDu8XWf@pathway \
--to=pmladek@suse.com \
--cc=Tim.Bird@sony.com \
--cc=francesco@valla.it \
--cc=geert@linux-m68k.org \
--cc=john.ogness@linutronix.de \
--cc=jstultz@google.com \
--cc=linux-embedded@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mhklinux@outlook.com \
--cc=rostedt@goodmis.org \
--cc=sboyd@kernel.org \
--cc=senozhatsky@chromium.org \
--cc=tglx@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox