public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: "Bird, Tim" <Tim.Bird@sony.com>
Cc: Michael Kelley <mhklinux@outlook.com>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	"senozhatsky@chromium.org" <senozhatsky@chromium.org>,
	"francesco@valla.it" <francesco@valla.it>,
	"geert@linux-m68k.org" <geert@linux-m68k.org>,
	"linux-embedded@vger.kernel.org" <linux-embedded@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@kernel.org>,
	John Stultz <jstultz@google.com>, Stephen Boyd <sboyd@kernel.org>,
	John Ogness <john.ogness@linutronix.de>
Subject: Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
Date: Fri, 13 Mar 2026 11:45:02 +0100	[thread overview]
Message-ID: <abPqrqgXHnDu8XWf@pathway> (raw)
In-Reply-To: <MW5PR13MB563216F6D786D4DCDFEBED2BFD45A@MW5PR13MB5632.namprd13.prod.outlook.com>

Finally added timekeeping maintainers and John into Cc.
We should have added time since v1.

Anyway, you might see the entire history at
https://lore.kernel.org/all/39b09edb-8998-4ebd-a564-7d594434a981@bird.org/


On Fri 2026-03-13 04:52:40, Bird, Tim wrote:
> Hey Micheal,
> 
> This report is very interesting. 
> 
> Thanks very much for trying it out!
> 
> > -----Original Message-----
> > From: Michael Kelley <mhklinux@outlook.com>
> > Sent: Wednesday, March 11, 2026 9:47 AM
> > From: Tim Bird <tim.bird@sony.com> Sent: Tuesday, February 10, 2026 3:48 PM
> > >
> > > During early boot, printk timestamps are reported as zero before
> > > kernel timekeeping starts (e.g. before time_init()).  This
> > > hinders boot-time optimization efforts.  This period is about 400
> > > milliseconds for many current desktop and embedded machines
> > > running Linux.
> > >
> > > Add support to save cycles during early boot, and output correct
> > > timestamp values after timekeeping is initialized.  get_cycles()
> > > is operational on arm64 and x86_64 from kernel start.  Add code
> > > and variables to save calibration values used to later convert
> > > cycle counts to time values in the early printks.  Add a config
> > > to control the feature.
> > >
> > > This yields non-zero timestamps for printks from the very start
> > > of kernel execution.  The timestamps are relative to the start of
> > > the architecture-specified counter used in get_cycles
> > > (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > >
> > > All timestamps reflect time from processor power-on instead of
> > > time from the kernel's timekeeping initialization.
> > 
> > I tried this patch in linux-next20260302 kernel running as a guest VM
> > on a Hyper-V host. Two things:
> > 
> > 1) In the dmesg output, I'm seeing a place where the timestamps briefly go
> > backwards -- i.e., they are not monotonically increasing. Here's a snippet,
> > where there's a smaller timestamp immediately after the tsc is detected:
> > 
> > [   27.994891] SMBIOS 3.1.0 present.
> > [   27.994893] DMI: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025
> > [   27.994898] DMI: Memory slots populated: 2/2
> > [   27.995202] Hypervisor detected: Microsoft Hyper-V
> > [   27.995205] Hyper-V: privilege flags low 0xae7f, high 0x3b8030, ext 0x62, hints 0xa0e24, misc 0xe0bed7b2
> > [   27.995208] Hyper-V: Nested features: 0x0
> > [   27.995209] Hyper-V: LAPIC Timer Frequency: 0xc3500
> > [   27.995210] Hyper-V: Using hypercall for remote TLB flush
> > [   27.995216] clocksource: hyperv_clocksource_tsc_page: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
> > [   27.995218] clocksource: hyperv_clocksource_msr: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
> > [   27.995220] tsc: Detected 2918.401 MHz processor
> 
> I wonder if the tsc is getting fiddled with or virtualized somewhere in here, as part of clocksource initialization.
> I believe each clocksource in the kernel maintains it's own internal offset, and maybe the offset that is
> being used ends up being slightly different from the cycle-counter offset that the early_times feature uses.
> I'm just throwing out guesses.  It's about a 4ms delta, which is pretty big.
> 
> > [   27.991060] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
> > [   27.991062] e820: remove [mem 0x000a0000-0x000fffff] System RAM
> > [   27.991064] last_pfn = 0x210000 max_arch_pfn = 0x400000000
> > [   27.991065] x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.
> > [   27.991066] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC

I wonder how the calibration of the cycles is precise. I wonder if
the problem might be that cycles were faster right after boot than
later during the calibration.

I added the following debug output on top of this patch:

diff --git a/include/linux/early_times.h b/include/linux/early_times.h
index 05388dcb8573..cdb467345bcc 100644
--- a/include/linux/early_times.h
+++ b/include/linux/early_times.h
@@ -20,6 +20,7 @@ static inline void early_times_start_calibration(void)
 {
 	start_cycles = get_cycles();
 	start_ns = local_clock();
+	pr_info("Early printk times: started callibration: %llu ns\n", start_ns);
 }
 
 static inline void early_times_finish_calibration(void)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 774ffb1fa5ac..836cb03aaa6d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2308,6 +2308,8 @@ int vprintk_store(int facility, int level,
 	ts_nsec = local_clock();
 	if (!ts_nsec)
 		ts_nsec = early_cycles();
+	else
+		pr_info_once("local_clock() returned non-zero timestamp: %llu nsec\n", ts_nsec);
 
 	caller_id = printk_caller_id();
 

And it produced in my kvm:

Let's say that start of the cycle counter is

Start of stage A

[    8.684438] Linux version 7.0.0-rc2-default+ (pmladek@pathway) (gcc (SUSE Linux) 15.2.1 20260202, GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.45.0.20251103-2) #571 SMP PREEMPT_DYNAMIC Fri Mar 13 10:23:54 CET 2026
[    8.684442] Command line: BOOT_IMAGE=/boot/vmlinuz-7.0.0-rc2-default+ root=/dev/vda2 resume=/dev/disk/by-uuid/369c7453-3d16-409d-88b2-5de027891a12 mitigations=auto nosplash earlycon=uart8250,io,0x3f8,115200 console=ttyS0,115200 console=tty0 ignore_loglevel log_buf_len=1M crashkernel=512M,high crashkernel=72M,low
[...]
[    8.696633] earlycon: uart8250 at I/O port 0x3f8 (options '115200')
[    8.696639] printk: legacy bootconsole [uart8250] enabled
[    8.731303] printk: debug: ignoring loglevel setting.
[    8.732349] NX (Execute Disable) protection: active
[    8.733447] APIC: Static calls initialized
[    8.734667] SMBIOS 2.8 present.
[    8.735358] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-2-g4f253b9b-prebuilt.qemu.org 04/01/2014
[    8.737285] DMI: Memory slots populated: 1/1
[    8.738380] Hypervisor detected: KVM
[    8.739151] last_pfn = 0x7ffdc max_arch_pfn = 0x400000000
[    8.740254] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    8.732971] printk: local_clock() returned non-zero timestamp: 3486 nsec

End of stage A

This is the point where printk() started storing the values from
local_clock() instead of cycles.

Start of stage B

[    8.732971] kvm-clock: using sched offset of 252367014082295 cycles
[    8.735471] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    8.738880] tsc: Detected 3293.776 MHz processor
[    8.740474] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
[...]
[    8.932671] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    8.934047] Early printk times: started callibration: 201079507 ns

End of stage B

This is where we started calibration of early cycles.

Start of stage C

[    8.935571] Console: colour dummy device 80x25
[...]
[    9.289673] thermal_sys: Registered thermal governor 'fair_share'
[    9.290077] thermal_sys: Registered thermal governor 'bang_bang'
[    9.292290] thermal_sys: Registered thermal governor 'step_wise'
[    9.293530] thermal_sys: Registered thermal governor 'user_space'
[    9.294856] cpuidle: using governor ladder
[    9.296302] cpuidle: using governor menu

Here the thermal governors are registered. I guess that they might
reduce speed of some HW.

[...]
[   11.974147] clk: Disabling unused clocks

Some unused clocks are disabled. I wonder if this might affect
counting the cycles.

[   12.330852] Freeing unused kernel image (rodata/data gap) memory: 1500K
[   12.351191] Early printk times: mult=19634245, shift=26, offset=8732967929 ns

End of stage C

Here is the end on calibration.

Now, if the frequence of the cycles was:

   + was higher in the stage A when only cycles were stored
   + was lower in stage C when it was calibrated against local_clock()

Then it might result in higher (calibrated) timestamps in stage A
and step back in stage B.

Or something like this. It is possible that even local_clock() does
not have a stable frequence during the early boot.

Idea: A solution might be to start calibration when printk()
      gets first non-zero time from local_clock.

      Something like:

diff --git a/include/linux/early_times.h b/include/linux/early_times.h
index 05388dcb8573..09d278996184 100644
--- a/include/linux/early_times.h
+++ b/include/linux/early_times.h
@@ -16,10 +16,13 @@ extern u64 start_ns;
 extern u32 early_mult, early_shift;
 extern u64 early_ts_offset;
 
-static inline void early_times_start_calibration(void)
+static inline void early_times_may_start_calibration(u64 ts_ns)
 {
+	if (start_ns)
+		return;
+
+	start_ns = ts_ns;
 	start_cycles = get_cycles();
-	start_ns = local_clock();
 }
 
 static inline void early_times_finish_calibration(void)
diff --git a/init/main.c b/init/main.c
index 27835270dfb5..a333b0da69cf 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1123,9 +1123,6 @@ void start_kernel(void)
 	timekeeping_init();
 	time_init();
 
-	/* This must be after timekeeping is initialized */
-	early_times_start_calibration();
-
 	/* This must be after timekeeping is initialized */
 	random_init();
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 774ffb1fa5ac..19330b6b4eb2 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2306,7 +2306,9 @@ int vprintk_store(int facility, int level,
 	 * timestamp with respect to the caller.
 	 */
 	ts_nsec = local_clock();
-	if (!ts_nsec)
+	if (ts_nsec)
+		early_times_may_start_calibration(ts_nsec);
+	else
 		ts_nsec = early_cycles();
 
 	caller_id = printk_caller_id();


> > 
> > 2) A Linux VM running in the Azure cloud is also running on Hyper-V. Such a
> > VM typically uses cloud-init to set everything up at boot time, and cloud-init
> > is outputting lines to the serial console with a timestamp that looks like the
> > printk() timestamp, but apparently is not adjusted for the early timestamping
> > that this patch adds. Again, I haven't debugged what's going on -- I'm not
> > immediately sure of the mechanism that cloud-init uses to do output to the
> > serial console. The use of the Hyper-V synthetic clock source might the cause
> > of the problem here as well. Here's an output snippet from the serial console:
> > 
> > [   20.330414] systemd[1]: Condition check resulted in OpenVSwitch configuration for cleanup being skipped.
> > [   20.332911] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
> > [   20.333257] pstore: Registered efi_pstore as persistent store backend
> > [   20.334360] systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
> > [   20.338319] systemd[1]: Starting Load Kernel Modules...
> > [   20.341094] systemd[1]: Starting Remount Root and Kernel File Systems...
> > [   20.350993] systemd[1]: Starting udev Coldplug all Devices...
> > [   20.356255] systemd[1]: Starting Uncomplicated firewall...
> > [   20.361536] systemd[1]: Started Journal Service.
> > [   20.386902] EXT4-fs (sda1): re-mounted c02dce0c-0c40-4e6e-88af-c5a0987b0adb r/w.
> > [   22.532033] /dev/sr0: Can't lookup blockdev
> > [    7.955973] cloud-init[783]: Cloud-init v. 24.3.1-0ubuntu0~20.04.1 running 'init-local' at Wed, 11 Mar 2026 15:27:06 +0000. Up 7.48
> > seconds.
> > [    9.933120] cloud-init[822]: Cloud-init v. 24.3.1-0ubuntu0~20.04.1 running 'init' at Wed, 11 Mar 2026 15:27:08 +0000. Up 9.82 seconds.
> > [    9.935483] cloud-init[822]: ci-info: ++++++++++++++++++++++++++++++++++++++Net device
> > info+++++++++++++++++++++++++++++++++++++++
> > [    9.937726] cloud-init[822]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
> > [    9.939905] cloud-init[822]: ci-info: | Device |  Up  |           Address           |      Mask     | Scope  |     Hw-Address    |
> > [    9.942059] cloud-init[822]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+

This is more complicated. I wonder if the timestamps from cloud-init()
are somehow synchronized with local_clock().

We might need to synchronize local_clock() with the cycles as well.
But there is the chicken&egg problem. We need:

    + to know the offset caused by cycles when local_clock() gets initialized.
    + local_clock() running for some time to calibrate cycles.

Hmm, I see that time-management people are not in Cc. We should have
added them since v1.

I add them now. Better late than never.

Best Regards,
Petr

  reply	other threads:[~2026-03-13 10:45 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-25  5:30 [PATCH] printk: add early_counter_ns routine for printk blind spot Tim Bird
2025-11-25  7:52 ` kernel test robot
2025-11-25 13:08 ` Francesco Valla
2025-11-26  7:38   ` Geert Uytterhoeven
2025-11-27  0:16     ` Bird, Tim
2025-11-27 16:16       ` Petr Mladek
2025-11-26 12:55   ` Petr Mladek
2025-11-27  0:03     ` Bird, Tim
2025-11-26 11:13 ` Petr Mladek
2025-11-27  9:13 ` kernel test robot
2026-01-24 19:40 ` [PATCH v2] printk: fix zero-valued printk timestamps in early boot Tim Bird
2026-01-25 14:41   ` Francesco Valla
2026-01-26 16:52     ` Bird, Tim
2026-02-02 16:23       ` Petr Mladek
2026-01-26 10:12   ` Geert Uytterhoeven
2026-01-26 17:11     ` Bird, Tim
2026-01-27  8:10       ` Geert Uytterhoeven
2026-02-10 23:47 ` [PATCH v3] " Tim Bird
2026-03-04 11:23   ` Petr Mladek
2026-03-09 17:27   ` Shashank Balaji
2026-03-10 10:43     ` Petr Mladek
2026-03-10 19:17     ` Bird, Tim
2026-03-09 19:25   ` Shashank Balaji
2026-03-10 11:39     ` Petr Mladek
2026-03-10 18:54       ` Bird, Tim
2026-03-11 15:45         ` Petr Mladek
2026-03-11 15:47   ` Michael Kelley
2026-03-13  4:52     ` Bird, Tim
2026-03-13 10:45       ` Petr Mladek [this message]
2026-03-14 14:16         ` Shashank Balaji
2026-03-24 20:07           ` Bird, Tim
2026-03-14 16:15         ` Michael Kelley
2026-03-24 19:47           ` Bird, Tim
2026-03-26  9:24             ` John Ogness
2026-03-27 18:04               ` Bird, Tim
2026-03-20 18:15         ` Bird, Tim
2026-03-26 13:17   ` Thomas Gleixner
2026-03-27 18:48     ` Bird, Tim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=abPqrqgXHnDu8XWf@pathway \
    --to=pmladek@suse.com \
    --cc=Tim.Bird@sony.com \
    --cc=francesco@valla.it \
    --cc=geert@linux-m68k.org \
    --cc=john.ogness@linutronix.de \
    --cc=jstultz@google.com \
    --cc=linux-embedded@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhklinux@outlook.com \
    --cc=rostedt@goodmis.org \
    --cc=sboyd@kernel.org \
    --cc=senozhatsky@chromium.org \
    --cc=tglx@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox