[PATCH] printk: add early_counter_ns routine for printk blind spot

public inbox for linux-kernel@vger.kernel.org

* [PATCH] printk: add early_counter_ns routine for printk blind spot
@ 2025-11-25  5:30 Tim Bird
  2025-11-25  7:52 ` kernel test robot
                   ` (5 more replies)
  0 siblings, 6 replies; 36+ messages in thread
From: Tim Bird @ 2025-11-25  5:30 UTC
  To: pmladek, Steve Rostedt, john.ogness, senozhatsky
  Cc: Tim Bird, Andrew Morton, Francesco Valla, LKML, Linux Embedded

From: Tim Bird <tim.bird@sony.com>

During early boot, printk timestamps are reported as zero,
which creates a blind spot in early boot timings.  This blind
spot hinders timing and optimization efforts for code that
executes before time_init(), which is when local_clock() is
initialized sufficiently to start returning non-zero timestamps.
This period is about 400 milliseconds for many current desktop
and embedded machines running Linux.

Add an early_counter_ns function that returns nanosecond
timestamps based on get_cycles().  get_cycles() is operational
on arm64 and x86_64 from kernel start.  Add some calibration
printks to allow setting configuration variables that are used
to convert cycle counts to nanoseconds (which are then used
in early printks).  Add CONFIG_EARLY_COUNTER_NS, as well as
some associated conversion variables, as new kernel config
variables.

After proper configuration, this yields non-zero timestamps for
printks from the very start of kernel execution.  The timestamps
are relative to the start of the architecture-specific counter
used in get_cycles (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
This means that the time reported reflects time-from-power-on for
most embedded products.  This is also a useful data point for
boot-time optimization work.

Note that there is a discontinuity in the timestamp sequencing
when standard clocks are finally initialized in time_init().
The printk timestamps are thus not monotonically increasing
through the entire boot.

Signed-off-by: Tim Bird <tim.bird@sony.com>
---
 init/Kconfig           | 47 ++++++++++++++++++++++++++++++++++++++++++
 init/main.c            | 25 ++++++++++++++++++++++
 kernel/printk/printk.c | 15 ++++++++++++++
 3 files changed, 87 insertions(+)

diff --git a/init/Kconfig b/init/Kconfig
index cab3ad28ca49..5352567c43ed 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -770,6 +770,53 @@ config IKHEADERS
 	  or similar programs.  If you build the headers as a module, a module called
 	  kheaders.ko is built which can be loaded on-demand to get access to headers.
 
+config EARLY_COUNTER_NS
+	bool "Use counter for early printk timestamps"
+	default y
+	depends on PRINTK
+	help
+	  Use a cycle-counter to provide printk timestamps during early
+	  boot.  This allows seeing timing information that would
+	  otherwise be displayed with 0-valued timestamps.
+
+	  In order for this to work, you need to specify values for
+	  EARLY_COUNTER_MULT and EARLY_COUNTER_SHIFT, used to convert
+	  from the cycle count to nanoseconds.
+
+config EARLY_COUNTER_MULT
+	int "Multiplier for early cycle counter"
+	depends on PRINTK && EARLY_COUNTER_NS
+	default 1
+	help
+	  This value specifies a multiplier to be used when converting
+	  cycle counts to nanoseconds.  The formula used is:
+		  ns = (cycles * mult) >> shift
+
+	  Use a multiplier that will bring the value of (cycles * mult)
+	  to near a power of two, that is greater than 1000.  The
+	  nanoseconds returned by this conversion are divided by 1000
+	  to be used as the printk timestamp counter (with resolution
+	  of microseconds).
+
+	  As an example, for a cycle-counter with a frequency of 200 Mhz,
+	  the multiplier would be: 10485760, and the shift would be 21.
+
+config EARLY_COUNTER_SHIFT
+	int "Shift value for early cycle counter"
+	range 0 63
+	depends on PRINTK && EARLY_COUNTER_NS
+	default 0
+	help
+	  This value specifies a shift value to be used when converting
+	  cycle counts to nanoseconds.  The formula used is:
+		  ns = (cycles * mult) >> shift
+
+	  Use a shift that will bring the result to a value
+	  in nanoseconds.
+
+	  As an example, for a cycle-counter with a frequency of 200 Mhz,
+	  the multiplier would be: 10485760, and the shift would be 21.
+
 config LOG_BUF_SHIFT
 	int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
 	range 12 25
diff --git a/init/main.c b/init/main.c
index 07a3116811c5..587aaaad22d1 100644
--- a/init/main.c
+++ b/init/main.c
@@ -105,6 +105,8 @@
 #include <linux/ptdump.h>
 #include <linux/time_namespace.h>
 #include <net/net_namespace.h>
+#include <linux/timex.h>
+#include <linux/sched/clock.h>
 
 #include <asm/io.h>
 #include <asm/setup.h>
@@ -906,6 +908,8 @@ static void __init early_numa_node_init(void)
 #endif
 }
 
+static u64 start_cycles, start_ns;
+
 asmlinkage __visible __init __no_sanitize_address __noreturn __no_stack_protector
 void start_kernel(void)
 {
@@ -1023,6 +1027,10 @@ void start_kernel(void)
 	timekeeping_init();
 	time_init();
 
+	/* used to calibrate early_counter_ns */
+	start_cycles = get_cycles();
+	start_ns = local_clock();
+
 	/* This must be after timekeeping is initialized */
 	random_init();
 
@@ -1474,6 +1482,8 @@ void __weak free_initmem(void)
 static int __ref kernel_init(void *unused)
 {
 	int ret;
+	u64 end_cycles, end_ns;
+	u32 early_mult, early_shift;
 
 	/*
 	 * Wait until kthreadd is all set-up.
@@ -1505,6 +1515,21 @@ static int __ref kernel_init(void *unused)
 
 	do_sysctl_args();
 
+	/* show calibration data for early_counter_ns */
+	end_cycles = get_cycles();
+	end_ns = local_clock();
+	clocks_calc_mult_shift(&early_mult, &early_shift,
+		((end_cycles - start_cycles) * NSEC_PER_SEC)/(end_ns - start_ns),
+		NSEC_PER_SEC, 50);
+
+#ifdef CONFIG_EARLY_COUNTER_NS
+	pr_info("Early Counter: start_cycles=%llu, end_cycles=%llu, cycles=%llu\n",
+		start_cycles, end_cycles, (end_cycles - start_cycles));
+	pr_info("Early Counter: start_ns=%llu, end_ns=%llu, ns=%llu\n",
+		start_ns, end_ns, (end_ns - start_ns));
+	pr_info("Early Counter: MULT=%u, SHIFT=%u\n", early_mult, early_shift);
+#endif
+
 	if (ramdisk_execute_command) {
 		ret = run_init_process(ramdisk_execute_command);
 		if (!ret)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 5aee9ffb16b9..522dd24cd534 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2210,6 +2210,19 @@ static u16 printk_sprint(char *text, u16 size, int facility,
 	return text_len;
 }
 
+#ifdef CONFIG_EARLY_COUNTER_NS
+static inline u64 early_counter_ns(void)
+{
+	return ((u64)get_cycles() * CONFIG_EARLY_COUNTER_MULT)
+		>> CONFIG_EARLY_COUNTER_SHIFT;
+}
+#else
+static inline u64 early_counter_ns(void)
+{
+	return 0;
+}
+#endif
+
 __printf(4, 0)
 int vprintk_store(int facility, int level,
 		  const struct dev_printk_info *dev_info,
@@ -2239,6 +2252,8 @@ int vprintk_store(int facility, int level,
 	 * timestamp with respect to the caller.
 	 */
 	ts_nsec = local_clock();
+	if (!ts_nsec)
+		ts_nsec = early_counter_ns();
 
 	caller_id = printk_caller_id();
 
-- 
2.43.0



* Re: [PATCH] printk: add early_counter_ns routine for printk blind spot
  2025-11-25  5:30 [PATCH] printk: add early_counter_ns routine for printk blind spot Tim Bird
@ 2025-11-25  7:52 ` kernel test robot
  2025-11-25 13:08 ` Francesco Valla
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: kernel test robot @ 2025-11-25  7:52 UTC
  To: Tim Bird, pmladek, Steve Rostedt, john.ogness, senozhatsky
  Cc: oe-kbuild-all, Tim Bird, Andrew Morton,
	Linux Memory Management List, Francesco Valla, LKML,
	Linux Embedded

Hi Tim,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.18-rc7]
[cannot apply to akpm-mm/mm-everything next-20251124]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Tim-Bird/printk-add-early_counter_ns-routine-for-printk-blind-spot/20251125-133242
base:   linus/master
patch link:    https://lore.kernel.org/r/39b09edb-8998-4ebd-a564-7d594434a981%40bird.org
patch subject: [PATCH] printk: add early_counter_ns routine for printk blind spot
config: powerpc-allnoconfig (https://download.01.org/0day-ci/archive/20251125/202511251534.9kMSsAH6-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251125/202511251534.9kMSsAH6-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511251534.9kMSsAH6-lkp@intel.com/

All errors (new ones prefixed by >>):

   powerpc-linux-ld: init/main.o: in function `kernel_init':
>> main.c:(.ref.text+0x144): undefined reference to `__udivdi3'
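
The `__udivdi3` reference comes from the plain 64-bit division in the
calibration code: on 32-bit targets such as this powerpc config, GCC lowers
a u64 `/` to a libgcc helper that the kernel does not provide. In-kernel
code normally calls div64_u64() from <linux/math64.h> instead. Below is a
minimal userspace sketch of the same calibration arithmetic;
div64_u64_standin() and early_counter_freq_hz() are hypothetical names used
only for illustration, and the stand-in merely marks where the kernel helper
would be called.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Userspace stand-in (assumption) for the kernel's div64_u64() from
 * <linux/math64.h>.  In kernel code, call the real helper so that
 * 32-bit builds do not pull in __udivdi3.
 */
static uint64_t div64_u64_standin(uint64_t dividend, uint64_t divisor)
{
	return dividend / divisor;
}

/*
 * Estimate the counter frequency in Hz from two (cycles, ns) samples.
 * Returns 0 when no time has elapsed, so the divisor is never zero.
 * Note: delta_cycles * NSEC_PER_SEC wraps a u64 once delta_cycles
 * exceeds roughly 1.8e10, so the sampling window has to stay short.
 */
static uint64_t early_counter_freq_hz(uint64_t start_cycles, uint64_t end_cycles,
				      uint64_t start_ns, uint64_t end_ns)
{
	uint64_t delta_cycles = end_cycles - start_cycles;
	uint64_t delta_ns = end_ns - start_ns;

	if (!delta_ns)
		return 0;

	return div64_u64_standin(delta_cycles * NSEC_PER_SEC, delta_ns);
}
```

The zero check is not in the posted patch; it is added here because a zero
elapsed time would otherwise divide by zero.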

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH] printk: add early_counter_ns routine for printk blind spot
  2025-11-25  5:30 [PATCH] printk: add early_counter_ns routine for printk blind spot Tim Bird
  2025-11-25  7:52 ` kernel test robot
@ 2025-11-25 13:08 ` Francesco Valla
  2025-11-26  7:38   ` Geert Uytterhoeven
  2025-11-26 12:55   ` Petr Mladek
  2025-11-26 11:13 ` Petr Mladek
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 36+ messages in thread
From: Francesco Valla @ 2025-11-25 13:08 UTC
  To: Tim Bird
  Cc: pmladek, Steve Rostedt, john.ogness, senozhatsky, Tim Bird,
	Andrew Morton, LKML, Linux Embedded

Hi Tim,

I tested this on my i.MX93 FRDM (arm64) board and after a bit of
fiddling with the MULT/SHIFT values I got it working. It can be a very
valuable addition.

Some comments follow.

On Mon, Nov 24, 2025 at 10:30:52PM -0700, Tim Bird wrote:
> From: Tim Bird <tim.bird@sony.com>
> 
> During early boot, printk timestamps are reported as zero,
> which creates a blind spot in early boot timings.  This blind
> spot hinders timing and optimization efforts for code that
> executes before time_init(), which is when local_clock() is
> initialized sufficiently to start returning non-zero timestamps.
> This period is about 400 milliseconds for many current desktop
> and embedded machines running Linux.
> 
> Add an early_counter_ns function that returns nanosecond
> timestamps based on get_cycles().  get_cycles() is operational
> on arm64 and x86_64 from kernel start.  Add some calibration
> printks to allow setting configuration variables that are used
> to convert cycle counts to nanoseconds (which are then used
> in early printks).  Add CONFIG_EARLY_COUNTER_NS, as well as
> some associated conversion variables, as new kernel config
> variables.
> 
> After proper configuration, this yields non-zero timestamps for
> printks from the very start of kernel execution.  The timestamps
> are relative to the start of the architecture-specific counter
> used in get_cycles (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> This means that the time reported reflects time-from-power-on for
> most embedded products.  This is also a useful data point for
> boot-time optimization work.
> 
> Note that there is a discontinuity in the timestamp sequencing
> when standard clocks are finally initialized in time_init().
> The printk timestamps are thus not monotonically increasing
> through the entire boot.

This is... not going to work, IMO, and might lead to breakages in
userspace tools (are printk timings a userspace API?).

I actually have a counter-proposal: the time obtained through cycle
evaluation is used as an offset to be added to the printk time after
time_init() is called. A (working, but maybe sub-optimal) patch to
obtain this is attached at the end.

> 
> Signed-off-by: Tim Bird <tim.bird@sony.com>
> ---
>  init/Kconfig           | 47 ++++++++++++++++++++++++++++++++++++++++++
>  init/main.c            | 25 ++++++++++++++++++++++
>  kernel/printk/printk.c | 15 ++++++++++++++
>  3 files changed, 87 insertions(+)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index cab3ad28ca49..5352567c43ed 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -770,6 +770,53 @@ config IKHEADERS
>  	  or similar programs.  If you build the headers as a module, a module called
>  	  kheaders.ko is built which can be loaded on-demand to get access to headers.
>  
> +config EARLY_COUNTER_NS
> +	bool "Use counter for early printk timestamps"
> +	default y
> +	depends on PRINTK
> +	help
> +	  Use a cycle-counter to provide printk timestamps during early
> +	  boot.  This allows seeing timing information that would
> +	  otherwise be displayed with 0-valued timestamps.
> +
> +	  In order for this to work, you need to specify values for
> +	  EARLY_COUNTER_MULT and EARLY_COUNTER_SHIFT, used to convert
> +	  from the cycle count to nanoseconds.
> +
> +config EARLY_COUNTER_MULT
> +	int "Multiplier for early cycle counter"
> +	depends on PRINTK && EARLY_COUNTER_NS
> +	default 1
> +	help
> +	  This value specifies a multiplier to be used when converting
> +	  cycle counts to nanoseconds.  The formula used is:
> +		  ns = (cycles * mult) >> shift
> +
> +	  Use a multiplier that will bring the value of (cycles * mult)
> +	  to near a power of two, that is greater than 1000.  The
> +	  nanoseconds returned by this conversion are divided by 1000
> +	  to be used as the printk timestamp counter (with resolution
> +	  of microseconds).
> +
> +	  As an example, for a cycle-counter with a frequency of 200 Mhz,
> +	  the multiplier would be: 10485760, and the shift would be 21.
> +

If I got this correctly:

	EARLY_COUNTER_MULT = (10^9 / freq) << EARLY_COUNTER_SHIFT

where EARLY_COUNTER_SHIFT can be chosen at will, provided it is big
enough to survive the ns->us conversion but small enough not to overflow
the u64 container.
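
As a worked check of this relation, here is a small sketch that derives the
multiplier from a counter frequency and a chosen shift, then applies the
conversion from the patch. The helper names are hypothetical. One deliberate
difference from the formula as written: the sketch shifts before dividing,
since (10^9 / freq) computed first in integer arithmetic drops the
fractional part whenever freq does not divide 10^9.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: derive EARLY_COUNTER_MULT for a counter running
 * at freq_hz, given a chosen shift.  Shifting before dividing keeps the
 * fractional part of (10^9 / freq) that integer division would drop.
 * Valid while (10^9 << shift) fits in 64 bits, i.e. shift <= 34.
 */
static uint64_t early_counter_mult(uint64_t freq_hz, unsigned int shift)
{
	return (NSEC_PER_SEC << shift) / freq_hz;
}

/* Same conversion as the patch: ns = (cycles * mult) >> shift. */
static uint64_t early_cycles_to_ns(uint64_t cycles, uint64_t mult,
				   unsigned int shift)
{
	return (cycles * mult) >> shift;
}
```

For 200 MHz and a shift of 21 this yields 10485760, matching the Kconfig
help text, and 200000000 cycles convert to exactly 10^9 ns. For a
hypothetical 24 MHz counter with the same shift it yields 87381333, whereas
(10^9 / freq) << 21 would give 41 << 21 = 85983232, about 1.6% low.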

> +config EARLY_COUNTER_SHIFT
> +	int "Shift value for early cycle counter"
> +	range 0 63
> +	depends on PRINTK && EARLY_COUNTER_NS
> +	default 0
> +	help
> +	  This value specifies a shift value to be used when converting
> +	  cycle counts to nanoseconds.  The formula used is:
> +		  ns = (cycles * mult) >> shift
> +
> +	  Use a shift that will bring the result to a value
> +	  in nanoseconds.
> +
> +	  As an example, for a cycle-counter with a frequency of 200 Mhz,
> +	  the multiplier would be: 10485760, and the shift would be 21.
> +
>  config LOG_BUF_SHIFT
>  	int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
>  	range 12 25
> diff --git a/init/main.c b/init/main.c
> index 07a3116811c5..587aaaad22d1 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -105,6 +105,8 @@
>  #include <linux/ptdump.h>
>  #include <linux/time_namespace.h>
>  #include <net/net_namespace.h>
> +#include <linux/timex.h>
> +#include <linux/sched/clock.h>
>  
>  #include <asm/io.h>
>  #include <asm/setup.h>
> @@ -906,6 +908,8 @@ static void __init early_numa_node_init(void)
>  #endif
>  }
>  
> +static u64 start_cycles, start_ns;
> +
>  asmlinkage __visible __init __no_sanitize_address __noreturn __no_stack_protector
>  void start_kernel(void)
>  {
> @@ -1023,6 +1027,10 @@ void start_kernel(void)
>  	timekeeping_init();
>  	time_init();
>  
> +	/* used to calibrate early_counter_ns */
> +	start_cycles = get_cycles();
> +	start_ns = local_clock();
> +
>  	/* This must be after timekeeping is initialized */
>  	random_init();
>  
> @@ -1474,6 +1482,8 @@ void __weak free_initmem(void)
>  static int __ref kernel_init(void *unused)
>  {
>  	int ret;
> +	u64 end_cycles, end_ns;
> +	u32 early_mult, early_shift;
>  
>  	/*
>  	 * Wait until kthreadd is all set-up.
> @@ -1505,6 +1515,21 @@ static int __ref kernel_init(void *unused)
>  
>  	do_sysctl_args();
>  
> +	/* show calibration data for early_counter_ns */
> +	end_cycles = get_cycles();
> +	end_ns = local_clock();
> +	clocks_calc_mult_shift(&early_mult, &early_shift,
> +		((end_cycles - start_cycles) * NSEC_PER_SEC)/(end_ns - start_ns),
> +		NSEC_PER_SEC, 50);
> +
> +#ifdef CONFIG_EARLY_COUNTER_NS
> +	pr_info("Early Counter: start_cycles=%llu, end_cycles=%llu, cycles=%llu\n",
> +		start_cycles, end_cycles, (end_cycles - start_cycles));
> +	pr_info("Early Counter: start_ns=%llu, end_ns=%llu, ns=%llu\n",
> +		start_ns, end_ns, (end_ns - start_ns));
> +	pr_info("Early Counter: MULT=%u, SHIFT=%u\n", early_mult, early_shift);
> +#endif
> +

I don't get the need to have these here - should they be a help for the
integrator to calibrate and choose EARLY_COUNTER_MULT and
EARLY_COUNTER_SHIFT? The ns values printed here have some meaning only if
these two parameters are already set correctly in the first place -
what's the foreseen calibration procedure?

Moreover, if they are only required for calibration, maybe pr_debug()
would be a better choice?

>  	if (ramdisk_execute_command) {
>  		ret = run_init_process(ramdisk_execute_command);
>  		if (!ret)
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 5aee9ffb16b9..522dd24cd534 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2210,6 +2210,19 @@ static u16 printk_sprint(char *text, u16 size, int facility,
>  	return text_len;
>  }
>  
> +#ifdef CONFIG_EARLY_COUNTER_NS
> +static inline u64 early_counter_ns(void)
> +{
> +	return ((u64)get_cycles() * CONFIG_EARLY_COUNTER_MULT)
> +		>> CONFIG_EARLY_COUNTER_SHIFT;
> +}
> +#else
> +static inline u64 early_counter_ns(void)
> +{
> +	return 0;
> +}
> +#endif
> +
>  __printf(4, 0)
>  int vprintk_store(int facility, int level,
>  		  const struct dev_printk_info *dev_info,
> @@ -2239,6 +2252,8 @@ int vprintk_store(int facility, int level,
>  	 * timestamp with respect to the caller.
>  	 */
>  	ts_nsec = local_clock();
> +	if (!ts_nsec)
> +		ts_nsec = early_counter_ns();
>  
>  	caller_id = printk_caller_id();
>  
> -- 
> 2.43.0
> 
> 

Best regards,

Francesco

---

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 522dd24cd534..b4108f215c5e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2216,11 +2216,26 @@ static inline u64 early_counter_ns(void)
 	return ((u64)get_cycles() * CONFIG_EARLY_COUNTER_MULT)
 		>> CONFIG_EARLY_COUNTER_SHIFT;
 }
+
+static u64 early_counter_ns_offset(void)
+{
+	static u64 early_counter_ns_start = 0;
+
+	if (!early_counter_ns_start)
+		early_counter_ns_start = early_counter_ns();
+
+	return early_counter_ns_start;
+}
 #else
 static inline u64 early_counter_ns(void)
 {
 	return 0;
 }
+
+static inline u64 early_counter_ns_offset(void)
+{
+	return 0;
+}
 #endif
 
 __printf(4, 0)
@@ -2254,6 +2269,8 @@ int vprintk_store(int facility, int level,
 	ts_nsec = local_clock();
 	if (!ts_nsec)
 		ts_nsec = early_counter_ns();
+	else
+		ts_nsec += early_counter_ns_offset();
 
 	caller_id = printk_caller_id();
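
The timestamp selection in the diff above can be modelled outside the kernel
as a pure function, which makes the intended monotonic behaviour easy to
check. This is only a sketch: struct ts_state and printk_timestamp() are
hypothetical names, and the two clock readings are passed in as arguments
rather than read from local_clock() and get_cycles().

```c
#include <stdint.h>

/* Hypothetical state: the early-counter value latched as an offset. */
struct ts_state {
	uint64_t offset_ns;
};

/*
 * Mirror of the selection logic in the diff above:
 *  - while local_ns is still zero, report the early counter directly;
 *  - afterwards, latch the early counter once and add it as an offset.
 */
static uint64_t printk_timestamp(struct ts_state *st, uint64_t local_ns,
				 uint64_t counter_ns)
{
	if (!local_ns)
		return counter_ns;

	if (!st->offset_ns)
		st->offset_ns = counter_ns;

	return local_ns + st->offset_ns;
}
```

Fed the readings (0, 100), (0, 400), (5, 410) and (50, 455), this returns
100, 400, 415 and 460, so the sequence keeps increasing across the switch.
The last value is 5 ns ahead of the counter because the offset is latched at
the first non-zero local clock reading rather than at the instant the local
clock starts.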



* Re: [PATCH] printk: add early_counter_ns routine for printk blind spot
  2025-11-25 13:08 ` Francesco Valla
@ 2025-11-26  7:38   ` Geert Uytterhoeven
  2025-11-27  0:16     ` Bird, Tim
  2025-11-26 12:55   ` Petr Mladek
  1 sibling, 1 reply; 36+ messages in thread
From: Geert Uytterhoeven @ 2025-11-26  7:38 UTC
  To: Francesco Valla
  Cc: Tim Bird, pmladek, Steve Rostedt, john.ogness, senozhatsky,
	Tim Bird, Andrew Morton, LKML, Linux Embedded

Hi all,

On Wed, 26 Nov 2025 at 03:24, Francesco Valla <francesco@valla.it> wrote:
> On Mon, Nov 24, 2025 at 10:30:52PM -0700, Tim Bird wrote:
> > From: Tim Bird <tim.bird@sony.com>
> >
> > During early boot, printk timestamps are reported as zero,
> > which creates a blind spot in early boot timings.  This blind
> > spot hinders timing and optimization efforts for code that
> > executes before time_init(), which is when local_clock() is
> > initialized sufficiently to start returning non-zero timestamps.
> > This period is about 400 milliseconds for many current desktop
> > and embedded machines running Linux.
> >
> > Add an early_counter_ns function that returns nanosecond
> > timestamps based on get_cycles().  get_cycles() is operational
> > on arm64 and x86_64 from kernel start.  Add some calibration
> > printks to allow setting configuration variables that are used
> > to convert cycle counts to nanoseconds (which are then used
> > in early printks).  Add CONFIG_EARLY_COUNTER_NS, as well as
> > some associated conversion variables, as new kernel config
> > variables.
> >
> > After proper configuration, this yields non-zero timestamps for
> > printks from the very start of kernel execution.  The timestamps
> > are relative to the start of the architecture-specific counter
> > used in get_cycles (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > This means that the time reported reflects time-from-power-on for
> > most embedded products.  This is also a useful data point for
> > boot-time optimization work.
> >
> > Note that there is a discontinuity in the timestamp sequencing
> > when standard clocks are finally initialized in time_init().
> > The printk timestamps are thus not monotonically increasing
> > through the entire boot.
>
> This is... not going to work, IMO, and might lead to breakages in
> userspace tools (are printk timings a userspace API?).

I think they are.

Another approach would be to defer the calibration/conversion to
userspace, and make sure the early part stands out.
I.e. when real timekeeping is available, kernel messages are prefixed by
"[%5lu.%06lu]".  Early messages could be prefixed by a plain integer
"[%12u]", containing the raw cycle counter value.
The presence of the decimal point would make the difference obvious.
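
A userspace sketch of the two prefix styles, to show how they would look
side by side. format_ts_prefix() is a hypothetical helper, not printk code,
and it prints the raw value with a 64-bit conversion instead of "%12u" so
that large cycle counts are not truncated.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a printk-style prefix into buf.
 * Regular timestamps (nanoseconds) use "[%5lu.%06lu]"; raw cycle
 * counts use a plain right-aligned integer with no decimal point.
 */
static int format_ts_prefix(char *buf, size_t size, uint64_t value,
			    int is_raw_cycles)
{
	if (is_raw_cycles)
		return snprintf(buf, size, "[%12" PRIu64 "]", value);

	return snprintf(buf, size, "[%5lu.%06lu]",
			(unsigned long)(value / 1000000000ULL),
			(unsigned long)((value % 1000000000ULL) / 1000));
}
```

A timestamp of 1234567890 ns formats as "[    1.234567]" and a raw count of
345678 as "[      345678]". Both are 14 characters wide, so columns stay
aligned and only the decimal point tells the two apart.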

> I actually have a counter-proposal: the time obtained through cycle
> evaluation is used as an offset to be added to the printk time after
> time_init() is called. A (working, but maybe sub-optimal) patch to
> obtain this is attached at the end.

Oh, that's a nice idea, too!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* RE: [PATCH] printk: add early_counter_ns routine for printk blind spot
  2025-11-26  7:38   ` Geert Uytterhoeven
@ 2025-11-27  0:16     ` Bird, Tim
  2025-11-27 16:16       ` Petr Mladek
  0 siblings, 1 reply; 36+ messages in thread
From: Bird, Tim @ 2025-11-27  0:16 UTC
  To: Geert Uytterhoeven, Francesco Valla
  Cc: Tim Bird, pmladek@suse.com, Steve Rostedt,
	john.ogness@linutronix.de, senozhatsky@chromium.org,
	Andrew Morton, LKML, Linux Embedded



> -----Original Message-----
> From: Geert Uytterhoeven <geert@linux-m68k.org>
> Hi all,
> 
> On Wed, 26 Nov 2025 at 03:24, Francesco Valla <francesco@valla.it> wrote:
> > On Mon, Nov 24, 2025 at 10:30:52PM -0700, Tim Bird wrote:
> > > From: Tim Bird <tim.bird@sony.com>
> > >
> > > During early boot, printk timestamps are reported as zero,
> > > which creates a blind spot in early boot timings.  This blind
> > > spot hinders timing and optimization efforts for code that
> > > executes before time_init(), which is when local_clock() is
> > > initialized sufficiently to start returning non-zero timestamps.
> > > This period is about 400 milliseconds for many current desktop
> > > and embedded machines running Linux.
> > >
> > > Add an early_counter_ns function that returns nanosecond
> > > timestamps based on get_cycles().  get_cycles() is operational
> > > on arm64 and x86_64 from kernel start.  Add some calibration
> > > printks to allow setting configuration variables that are used
> > > to convert cycle counts to nanoseconds (which are then used
> > > in early printks).  Add CONFIG_EARLY_COUNTER_NS, as well as
> > > some associated conversion variables, as new kernel config
> > > variables.
> > >
> > > After proper configuration, this yields non-zero timestamps for
> > > printks from the very start of kernel execution.  The timestamps
> > > are relative to the start of the architecture-specific counter
> > > used in get_cycles (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > > This means that the time reported reflects time-from-power-on for
> > > most embedded products.  This is also a useful data point for
> > > boot-time optimization work.
> > >
> > > Note that there is a discontinuity in the timestamp sequencing
> > > when standard clocks are finally initialized in time_init().
> > > The printk timestamps are thus not monotonically increasing
> > > through the entire boot.
> >
> > This is... not going to work, IMO, and might lead to breakages in
> > userspace tools (are printk timings a userspace API?).
> 
> I think they are.
> 
> Another approach would be to defer the calibration/conversion to
> userspace, and make sure the early part stands out.
> I.e. when real timekeeping is available, kernel messages are prefixed by
> "[%5lu.%06lu]".  Early messages could be prefixed by a plain integer
> "[%12u]", containing the raw cycle counter value.
> The presence of the decimal point would make the difference obvious.

I thought about this while I was creating this.
It wouldn't require the extra configuration for MULT and SHIFT (which would be nice),
and it would be, as you say, very obvious that this was not a regular timestamp. 
This means it could be enabled on a generic kernel (making it more likely that it could
be enabled by default). And really only boot-time optimizers would care enough to
decode the data, so the onus would be on them to run the tool.  Everyone else
could ignore them.

I'm not sure if it would break existing printk-processing tools.  I suspect it would.

Also,  I find that post-processing tools often get overlooked.
I asked at ELC this year how many people are using show_delta, which 
has been upstream for years, and can do a few neat things with printk timestamps,
and not a single person had even heard of it.

In this scenario, you would still need to have the calibration printks in
the code so that the tool could pull them out to then convert the cycle-valued
printks into printks with regular timestamps.

I could see doing this if people object to the non-genericity of the current
solution.

> 
> > I actually have a counter-proposal: the time obtained through cycle
> > evaluation is used as an offset to be added to the printk time after
> > time_init() is called. A (working, but maybe sub-optimal) patch to
> > obtain this is attached at the end.
> 
> Oh, that's a nice idea, too!

Thanks for the feedback!
 -- Tim



* Re: [PATCH] printk: add early_counter_ns routine for printk blind spot
  2025-11-27  0:16     ` Bird, Tim
@ 2025-11-27 16:16       ` Petr Mladek
  0 siblings, 0 replies; 36+ messages in thread
From: Petr Mladek @ 2025-11-27 16:16 UTC
  To: Bird, Tim
  Cc: Geert Uytterhoeven, Francesco Valla, Tim Bird, Steve Rostedt,
	john.ogness@linutronix.de, senozhatsky@chromium.org,
	Andrew Morton, LKML, Linux Embedded

On Thu 2025-11-27 00:16:23, Bird, Tim wrote:
> 
> 
> > -----Original Message-----
> > From: Geert Uytterhoeven <geert@linux-m68k.org>
> > Hi all,
> > 
> > On Wed, 26 Nov 2025 at 03:24, Francesco Valla <francesco@valla.it> wrote:
> > > On Mon, Nov 24, 2025 at 10:30:52PM -0700, Tim Bird wrote:
> > > > From: Tim Bird <tim.bird@sony.com>
> > > >
> > > > During early boot, printk timestamps are reported as zero,
> > > > which creates a blind spot in early boot timings.  This blind
> > > > spot hinders timing and optimization efforts for code that
> > > > executes before time_init(), which is when local_clock() is
> > > > initialized sufficiently to start returning non-zero timestamps.
> > > > This period is about 400 milliseconds for many current desktop
> > > > and embedded machines running Linux.
> > > >
> > > > Add an early_counter_ns function that returns nanosecond
> > > > timestamps based on get_cycles().  get_cycles() is operational
> > > > on arm64 and x86_64 from kernel start.  Add some calibration
> > > > printks to allow setting configuration variables that are used
> > > > to convert cycle counts to nanoseconds (which are then used
> > > > in early printks).  Add CONFIG_EARLY_COUNTER_NS, as well as
> > > > some associated conversion variables, as new kernel config
> > > > variables.
> > > >
> > > > After proper configuration, this yields non-zero timestamps for
> > > > printks from the very start of kernel execution.  The timestamps
> > > > are relative to the start of the architecture-specific counter
> > > > used in get_cycles (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > > > This means that the time reported reflects time-from-power-on for
> > > > most embedded products.  This is also a useful data point for
> > > > boot-time optimization work.
> > > >
> > > > Note that there is a discontinuity in the timestamp sequencing
> > > > when standard clocks are finally initialized in time_init().
> > > > The printk timestamps are thus not monotonically increasing
> > > > through the entire boot.
> > >
> > > This is... not going to work, IMO, and might lead to breakages in
> > > userspace tools (are printk timings a userspace API?).
> > 
> > I think they are.
> > 
> > Another approach would be to defer the calibration/conversion to
> > userspace, and make sure the early part stands out.
> > I.e. when real timekeeping is available, kernel messages are prefixed by
> > "[%5lu.%06lu]".  Early messages could be prefixed by a plain integer
> > "[%12u]", containing the raw cycle counter value.
> > The presence of the decimal point would make the difference obvious.
> 
> I thought about this while I was creating this.
> It wouldn't require the extra configuration for MULT and SHIFT (which would be nice),
> and it would be, as you say, very obvious that this was not a regular timestamp. 
> This means it could be enabled on a generic kernel (making more likely it could be
> enabled by default). And really only boot-time optimizers would care enough to
> decode the data, so the onus would be on them to run the tool.  Everyone else
> could ignore them.
> 
> I'm not sure if it would break existing printk-processing tools.  I suspect it would.

I guess that it might break even basic tools, like dmesg, journalctl,
or crash.

A solution might be to pass it as an extra information to the official
timestamp, for example:

  + on console:

      <level>[timestamp][callerid][cl cycles] message
      <6>[    0.000000][    T0][cl  345678] BIOS-provided physical RAM map:
      <6>[    0.000000][    T0][cl 1036890] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
      <6>[    0.000000][    T0][cl 1129452] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved

  + via /dev/kmsg

     <level>,<sequnum>,<timestamp>,<contflag>[,additional_values, ... ];<message text>
     6,2,0,-,caller=T0,cycle=345678;BIOS-provided physical RAM map:
     6,3,0,-,caller=T0,cycle=1036890;BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
     6,4,0,-,caller=T0,cycle=1129452;BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved

The extra field would disappear after time_init().

The value might be stored in struct printk_info in the same field .ts_nsec.
It might be distinguished from a real timestamp using a flag in
enum printk_info_flags. The official timestamp would be zero when
this flag is set.

It will not require the two CONFIG_ values for calibrating the
computation.

The output on the console is a bit messy. But I guess that this
feature is rather for tuning and it won't be enabled on production
systems. So it might be acceptable.

time_init() might even print a message with the cycle value
and the official timestamp on the same line. It can be used
for post-processing and translating cycles back to ns.
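
As a rough sketch of that post-processing step (not part of any patch in
this thread): assuming the tool knows the counter frequency from some
other source, a single (cycles, timestamp) pair printed by time_init()
would be enough to place earlier cycle values on the official clock.
The names ref_point and cycles_to_clock_ns() are hypothetical, as is the
1 MHz frequency used in the note below.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Hypothetical calibration sample: a counter value and the official
 * timestamp captured together, e.g. on one line printed by time_init().
 */
struct ref_point {
	uint64_t cycles;
	int64_t ns;
};

/*
 * Translate a raw cycle value onto the official clock's timeline.
 * freq_hz is assumed to be known to the tool; it cannot be derived
 * from a single reference pair.
 */
static int64_t cycles_to_clock_ns(uint64_t cycles, struct ref_point ref,
				  uint64_t freq_hz)
{
	int64_t delta;

	assert(freq_hz != 0);
	/* Signed distance from the reference; negative for earlier messages. */
	delta = (int64_t)(cycles - ref.cycles);
	/* Adequate for boot-sized intervals; the product overflows for
	 * distances beyond roughly 9.2e9 cycles. */
	return ref.ns + delta * NSEC_PER_SEC / (int64_t)freq_hz;
}
```

With a reference of 2000000 cycles at timestamp 0 and a 1 MHz counter,
the 345678-cycle message from the example above would map to
-1654322000 ns, that is, about 1.65 seconds before the official clock
started.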

> Also,  I find that post-processing tools often get overlooked.
> I asked at ELC this year how many people are using show_delta, which 
> has been upstream for years, and can do a few neat things with printk timestamps,
> and not a single person had even heard of it.

Yeah. We need to make sure that the post processing tool won't get mad,
for example, crash or show garbage.

Best Regards,
Petr


* Re: [PATCH] printk: add early_counter_ns routine for printk blind spot
  2025-11-25 13:08 ` Francesco Valla
  2025-11-26  7:38   ` Geert Uytterhoeven
@ 2025-11-26 12:55   ` Petr Mladek
  2025-11-27  0:03     ` Bird, Tim
  1 sibling, 1 reply; 36+ messages in thread
From: Petr Mladek @ 2025-11-26 12:55 UTC (permalink / raw)
  To: Francesco Valla
  Cc: Tim Bird, Steve Rostedt, john.ogness, senozhatsky, Tim Bird,
	Andrew Morton, LKML, Anna-Maria Behnsen, Frederic Weisbecker,
	Thomas Gleixner, Linux Embedded

On Tue 2025-11-25 14:08:40, Francesco Valla wrote:
> Hi Tim,
> 
> I tested this on my i.MX93 FRDM (arm64) board and after a bit of
> fiddling with the MULT/SHIFT values I got it working. It can be a very
> valuable addition.
> 
> Some comments follow.
> 
> On Mon, Nov 24, 2025 at 10:30:52PM -0700, Tim Bird wrote:
> > From: Tim Bird <tim.bird@sony.com>
> > 
> > During early boot, printk timestamps are reported as zero,
> > which creates a blind spot in early boot timings.  This blind
> > spot hinders timing and optimization efforts for code that
> > executes before time_init(), which is when local_clock() is
> > initialized sufficiently to start returning non-zero timestamps.
> > This period is about 400 milliseconds for many current desktop
> > and embedded machines running Linux.
> > 
> > Add an early_counter_ns function that returns nanosecond
> > timestamps based on get_cycles().  get_cycles() is operational
> > on arm64 and x86_64 from kernel start.  Add some calibration
> > printks to allow setting configuration variables that are used
> > to convert cycle counts to nanoseconds (which are then used
> > in early printks).  Add CONFIG_EARLY_COUNTER_NS, as well as
> > some associated conversion variables, as new kernel config
> > variables.
> > 
> > After proper configuration, this yields non-zero timestamps for
> > printks from the very start of kernel execution.  The timestamps
> > are relative to the start of the architecture-specific counter
> > used in get_cycles (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > This means that the time reported reflects time-from-power-on for
> > most embedded products.  This is also a useful data point for
> > boot-time optimization work.
> > 
> > Note that there is a discontinuity in the timestamp sequencing
> > when standard clocks are finally initialized in time_init().
> > The printk timestamps are thus not monotonically increasing
> > through the entire boot.
> 
> This is... not going to work, IMO, and might lead to breakages in
> userspace tools (are printk timings a userspace API?).

Honestly, I am not sure if it would break anything. The fact is
that printk() always used monotonic timers. And it is possible
that some userspace depends on it.

I personally think that non-monotonic time stamps might be confusing
but they should not cause any serious breakage. But I might be wrong.
People are creative...

> I actually have a counter-proposal: the time obtained through cycle
> evaluation is used as an offset to be added to the printk time after
> time_init() is called. A (working, but maybe sub-optimal) patch to
> obtain this is attached at the end.

I am not sure if this is a good idea. The offset would make all
post-timer-init printk timestamps differ from the values provided
by the timer API. That might cause confusion, for example, when
timer values are printed as part of the message, or when analyzing
a crash dump.

On the other hand, there are various clock sources in the kernel
which are not comparable anyway. So maybe I am too cautious.

> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -770,6 +770,53 @@ config IKHEADERS
> >  	  or similar programs.  If you build the headers as a module, a module called
> >  	  kheaders.ko is built which can be loaded on-demand to get access to headers.
> >  
> > +config EARLY_COUNTER_NS
> > +	bool "Use counter for early printk timestamps"
> > +	default y
> > +	depends on PRINTK
> > +	help
> > +	  Use a cycle-counter to provide printk timestamps during early
> > +	  boot.  This allows seeing timing information that would
> > +	  otherwise be displayed with 0-valued timestamps.
> > +
> > +	  In order for this to work, you need to specify values for
> > +	  EARLY_COUNTER_MULT and EARLY_COUNTER_SHIFT, used to convert
> > +	  from the cycle count to nanoseconds.
> > +
> > +config EARLY_COUNTER_MULT
> > +	int "Multiplier for early cycle counter"
> > +	depends on PRINTK && EARLY_COUNTER_NS
> > +	default 1
> > +	help
> > +	  This value specifies a multiplier to be used when converting
> > +	  cycle counts to nanoseconds.  The formula used is:
> > +		  ns = (cycles * mult) >> shift
> > +
> > +	  Use a multiplier that will bring the value of (cycles * mult)
> > +	  to near a power of two, that is greater than 1000.  The
> > +	  nanoseconds returned by this conversion are divided by 1000
> > +	  to be used as the printk timestamp counter (with resolution
> > +	  of microseconds).
> > +
> > +	  As an example, for a cycle-counter with a frequency of 200 Mhz,
> > +	  the multiplier would be: 10485760, and the shift would be 21.
> > +
> 
> If I got this correctly:
> 
> 	EARLY_COUNTER_MULT = (10^9 / freq) << EARLY_COUNTER_SHIFT
> 
> where EARLY_COUNTER_SHIFT can be chosen at will, provided it is big
> enough to survive the ns->us conversion but small enough not to overflow
> the u64 container. 
> 
> > +config EARLY_COUNTER_SHIFT
> > +	int "Shift value for early cycle counter"
> > +	range 0 63
> > +	depends on PRINTK && EARLY_COUNTER_NS
> > +	default 0
> > +	help
> > +	  This value specifies a shift value to be used when converting
> > +	  cycle counts to nanoseconds.  The formula used is:
> > +		  ns = (cycles * mult) >> shift
> > +
> > +	  Use a shift that will bring the result to a value
> > +	  in nanoseconds.
> > +
> > +	  As an example, for a cycle-counter with a frequency of 200 Mhz,
> > +	  the multiplier would be: 10485760, and the shift would be 21.
> > +
> >  config LOG_BUF_SHIFT
> >  	int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
> >  	range 12 25

So, it is usable only for a particular HW. It is not usable for a
generic kernel which is supposed to run on misc HW.

I guess that there is no way to detect the CPU frequency at runtime?

Best Regards,
Petr


* RE: [PATCH] printk: add early_counter_ns routine for printk blind spot
  2025-11-26 12:55   ` Petr Mladek
@ 2025-11-27  0:03     ` Bird, Tim
  0 siblings, 0 replies; 36+ messages in thread
From: Bird, Tim @ 2025-11-27  0:03 UTC (permalink / raw)
  To: Petr Mladek, Francesco Valla
  Cc: Tim Bird, Steve Rostedt, john.ogness@linutronix.de,
	senozhatsky@chromium.org, Andrew Morton, LKML, Anna-Maria Behnsen,
	Frederic Weisbecker, Thomas Gleixner, Linux Embedded

> -----Original Message-----
> From: Petr Mladek <pmladek@suse.com>
> On Tue 2025-11-25 14:08:40, Francesco Valla wrote:
> > Hi Tim,
> >
> > I tested this on my i.MX93 FRDM (arm64) board and after a bit of
> > fiddling with the MULT/SHIFT values I got it working. It can be a very
> > valuable addition.
> >
> > Some comments follow.
> >
> > On Mon, Nov 24, 2025 at 10:30:52PM -0700, Tim Bird wrote:
> > > From: Tim Bird <tim.bird@sony.com>
> > >
> > > During early boot, printk timestamps are reported as zero,
> > > which creates a blind spot in early boot timings.  This blind
> > > spot hinders timing and optimization efforts for code that
> > > executes before time_init(), which is when local_clock() is
> > > initialized sufficiently to start returning non-zero timestamps.
> > > This period is about 400 milliseconds for many current desktop
> > > and embedded machines running Linux.
> > >
> > > Add an early_counter_ns function that returns nanosecond
> > > timestamps based on get_cycles().  get_cycles() is operational
> > > on arm64 and x86_64 from kernel start.  Add some calibration
> > > printks to allow setting configuration variables that are used
> > > to convert cycle counts to nanoseconds (which are then used
> > > in early printks).  Add CONFIG_EARLY_COUNTER_NS, as well as
> > > some associated conversion variables, as new kernel config
> > > variables.
> > >
> > > After proper configuration, this yields non-zero timestamps for
> > > printks from the very start of kernel execution.  The timestamps
> > > are relative to the start of the architecture-specific counter
> > > used in get_cycles (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > > This means that the time reported reflects time-from-power-on for
> > > most embedded products.  This is also a useful data point for
> > > boot-time optimization work.
> > >
> > > Note that there is a discontinuity in the timestamp sequencing
> > > when standard clocks are finally initialized in time_init().
> > > The printk timestamps are thus not monotonically increasing
> > > through the entire boot.
> >
> > This is... not going to work, IMO, and might lead to breakages in
> > userspace tools (are printk timings a userspace API?).
> 
> Honestly, I am not sure if it would break anything. The fact is
> that printk() always used monotonic timers. And it is possible
> that some userspace depends on it.
> 
> I personally think that non-monotonic time stamps might be confusing
> but they should not cause any serious breakage. But I might be wrong.
> People are creative...

I worried about this, but I'm skeptical it's a big deal.  Humans might be
a little confused, but it's not difficult to see what's going on just by looking
at the timestamps.   If a tool breaks, especially something that's used
in automation, e.g. it's used to report results, or is in some sort of CI
loop where the break will cascade into a test failure, then that's a bigger issue. 
But right now I'm not aware of any boot-time tests where that would be the case.

I'll comment more on different fixes for this below.

> 
> > I actually have a counter-proposal: the time obtained through cycle
> > evaluation is used as an offset to be added to the printk time after
> > time_init() is called. A (working, but maybe sub-optimal) patch to
> > obtain this is attached at the end.
> 
> I am not sure if this is a good idea. The offset would make all
> post-timer-init printk timestamps differ from the values provided
> by the timer API. That might cause confusion, for example, when
> timer values are printed as part of the message, or when analyzing
> a crash dump.
> 
> On the other hand, there are various clock sources in the kernel
> which are not comparable anyway. So maybe I am too cautious.

I thought of adding an offset, but I didn't want to disturb anything past
the time_init() call.  As it is now, the early_counter_ns feature only
changes the zero-valued timestamps.  So anything relying on the absolute
value of an existing timestamp later in the boot would not be affected.
I thought that if people suddenly saw the timestamps jump by 10 to 30 seconds
(since they are now relative to machine start instead of to kernel clock start
(time_init())), it would be very jarring.  I suppose they would get used to it,
though, and all relative timings should stay the same.

I also didn't want to add additional overhead (even a single add) in the case
where CONFIG_EARLY_COUNTER_NS was disabled.  But, realistically, I
don't think an additional add in the printk path is going to be noticeable,
and I can probably structure it so that there's absolutely no overhead when
CONFIG_EARLY_COUNTER_NS is disabled.

I considered embedding an early counter offset into local_clock(), and thus
not modifying the printk code at all.  This would have the benefit
of keeping the printk timestamps consistent with other uses of local_clock()
data (such as crash dumps or inside other messages). But then that would embed the
early_counter overhead into every user of local_clock() (whether humans
saw the timestamp values or not). And some of those users might be more
performance sensitive than printk is.

Finally, I considered adding another config option to control adding the offset,
but at 3 configs for this fairly niche functionality, I thought I was already
pushing my luck.  But a new config would be easy to add.

My plan is to add an offset for the early_counter_ns value to printk, without
a config, in the next patch version, and see what people think.

> 
> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -770,6 +770,53 @@ config IKHEADERS
> > >  	  or similar programs.  If you build the headers as a module, a module called
> > >  	  kheaders.ko is built which can be loaded on-demand to get access to headers.
> > >
> > > +config EARLY_COUNTER_NS
> > > +	bool "Use counter for early printk timestamps"
> > > +	default y
> > > +	depends on PRINTK
> > > +	help
> > > +	  Use a cycle-counter to provide printk timestamps during early
> > > +	  boot.  This allows seeing timing information that would
> > > +	  otherwise be displayed with 0-valued timestamps.
> > > +
> > > +	  In order for this to work, you need to specify values for
> > > +	  EARLY_COUNTER_MULT and EARLY_COUNTER_SHIFT, used to convert
> > > +	  from the cycle count to nanoseconds.
> > > +
> > > +config EARLY_COUNTER_MULT
> > > +	int "Multiplier for early cycle counter"
> > > +	depends on PRINTK && EARLY_COUNTER_NS
> > > +	default 1
> > > +	help
> > > +	  This value specifies a multiplier to be used when converting
> > > +	  cycle counts to nanoseconds.  The formula used is:
> > > +		  ns = (cycles * mult) >> shift
> > > +
> > > +	  Use a multiplier that will bring the value of (cycles * mult)
> > > +	  to near a power of two, that is greater than 1000.  The
> > > +	  nanoseconds returned by this conversion are divided by 1000
> > > +	  to be used as the printk timestamp counter (with resolution
> > > +	  of microseconds).
> > > +
> > > +	  As an example, for a cycle-counter with a frequency of 200 Mhz,
> > > +	  the multiplier would be: 10485760, and the shift would be 21.
> > > +
> >
> > If I got this correctly:
> >
> > 	EARLY_COUNTER_MULT = (10^9 / freq) << EARLY_COUNTER_SHIFT
> >
> > where EARLY_COUNTER_SHIFT can be chosen at will, provided it is big
> > enough to survive the ns->us conversion but small enough not to overflow
> > the u64 container.

Yeah, I was worried about these Kconfig explanations.  I think it will be easier
to just explain how I recommend this should be configured, which is something like:

Turn on CONFIG_EARLY_COUNTER_NS, run the kernel once, get the values for MULT
and SHIFT (printed by the calibration code), enter them in the appropriate configs, and
then build and run the kernel again.  This only needs to be done once per platform,
and could even be put into the defconfig (or a config fragment) for a platform.
(More on having hardcoded config for this below).  It was not my intent to have
kernel developers doing weird (shift) math to enable this feature.  Note that I can't
get the values from devicetree or the kernel command line, as this is used before
either of those is initialized or parsed.
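
For reference, the relationship between the two values can be sketched
in plain C, outside the kernel.  This only illustrates the formula from
the Kconfig help, ns = (cycles * mult) >> shift.  calc_mult() and
cycles_to_ns() are made-up helper names; the patch itself obtains its
values from clocks_calc_mult_shift().

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* mult = (10^9 << shift) / freq: nanoseconds per cycle, scaled by 2^shift. */
static uint64_t calc_mult(uint64_t freq_hz, unsigned int shift)
{
	/* shift < 34 keeps NSEC_PER_SEC << shift within 64 bits. */
	assert(freq_hz != 0 && shift < 34);
	return (NSEC_PER_SEC << shift) / freq_hz;
}

/* The conversion from the Kconfig help text. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t mult, unsigned int shift)
{
	return (cycles * mult) >> shift;
}
```

With the 200 MHz example from the help text, calc_mult(200000000, 21)
gives 10485760, and one second's worth of cycles (200000000) converts
back to 1000000000 ns.  The intermediate product wraps once
cycles * mult no longer fits in 64 bits, which for these values is
after roughly 2.4 hours of counter time.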

> >
> > > +config EARLY_COUNTER_SHIFT
> > > +	int "Shift value for early cycle counter"
> > > +	range 0 63
> > > +	depends on PRINTK && EARLY_COUNTER_NS
> > > +	default 0
> > > +	help
> > > +	  This value specifies a shift value to be used when converting
> > > +	  cycle counts to nanoseconds.  The formula used is:
> > > +		  ns = (cycles * mult) >> shift
> > > +
> > > +	  Use a shift that will bring the result to a value
> > > +	  in nanoseconds.
> > > +
> > > +	  As an example, for a cycle-counter with a frequency of 200 Mhz,
> > > +	  the multiplier would be: 10485760, and the shift would be 21.
> > > +
> > >  config LOG_BUF_SHIFT
> > >  	int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
> > >  	range 12 25
> 
> So, it is usable only for a particular HW. It is not usable for a
> generic kernel which is supposed to run on misc HW.

That's correct.  It is mostly targeted at embedded products,
where shaving off 10 to 200 milliseconds in the pre-clock-initialization
region of boot would be valuable.  For people doing aggressive
boot time optimization, they will have a custom kernel anyway
(and probably a custom bootloader, device tree, initramfs, udev rules,
SELinux rules, module load ordering, systemd config, etc.)

Basically, if you're optimizing the code in this kernel boot time blind
spot, you are very likely not using a generic kernel.  (That's not to
say that the optimizations made won't ultimately be valuable to people using
generic kernels).

> 
> I guess that there is no way to detect the CPU frequency at runtime?
This "feature" is used before clock initialization, which is what
would be used to calibrate the CPU frequency at runtime.  This runs so 
early that doing the calibration inline doesn't work.  Not enough kernel
services are available (Actually, zero services are available, as this
can be used in the very first printk in start_kernel.)

Plan from here...

I got a compile error from 0-day on powerpc, so I need to re-spin to fix that.
I'll address the other issues raised and submit a new version when I can.
I'm off to Japan this week, and between business travel and the holidays,
and being away from my lab where I can do hardware testing, it will
probably be some time in January before I send the next version.

Thanks very much for the feedback!
 -- Tim


* Re: [PATCH] printk: add early_counter_ns routine for printk blind spot
  2025-11-25  5:30 [PATCH] printk: add early_counter_ns routine for printk blind spot Tim Bird
  2025-11-25  7:52 ` kernel test robot
  2025-11-25 13:08 ` Francesco Valla
@ 2025-11-26 11:13 ` Petr Mladek
  2025-11-27  9:13 ` kernel test robot
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: Petr Mladek @ 2025-11-26 11:13 UTC (permalink / raw)
  To: Tim Bird
  Cc: Steve Rostedt, john.ogness, senozhatsky, Tim Bird, Andrew Morton,
	Francesco Valla, LKML, Anna-Maria Behnsen, Frederic Weisbecker,
	Thomas Gleixner, Linux Embedded

Adding some people from the time subsystem to Cc.
Please keep them in the loop for any future version of the
patch.

For the new people, please note the discussion has already started,
see
https://lore.kernel.org/r/39b09edb-8998-4ebd-a564-7d594434a981@bird.org

Best Regards,
Petr

On Mon 2025-11-24 22:30:52, Tim Bird wrote:
> From: Tim Bird <tim.bird@sony.com>
> 
> During early boot, printk timestamps are reported as zero,
> which creates a blind spot in early boot timings.  This blind
> spot hinders timing and optimization efforts for code that
> executes before time_init(), which is when local_clock() is
> initialized sufficiently to start returning non-zero timestamps.
> This period is about 400 milliseconds for many current desktop
> and embedded machines running Linux.
> 
> Add an early_counter_ns function that returns nanosecond
> timestamps based on get_cycles().  get_cycles() is operational
> on arm64 and x86_64 from kernel start.  Add some calibration
> printks to allow setting configuration variables that are used
> to convert cycle counts to nanoseconds (which are then used
> in early printks).  Add CONFIG_EARLY_COUNTER_NS, as well as
> some associated conversion variables, as new kernel config
> variables.
> 
> After proper configuration, this yields non-zero timestamps for
> printks from the very start of kernel execution.  The timestamps
> are relative to the start of the architecture-specific counter
> used in get_cycles (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> This means that the time reported reflects time-from-power-on for
> most embedded products.  This is also a useful data point for
> boot-time optimization work.
> 
> Note that there is a discontinuity in the timestamp sequencing
> when standard clocks are finally initialized in time_init().
> The printk timestamps are thus not monotonically increasing
> through the entire boot.
> 
> Signed-off-by: Tim Bird <tim.bird@sony.com>
> ---
>  init/Kconfig           | 47 ++++++++++++++++++++++++++++++++++++++++++
>  init/main.c            | 25 ++++++++++++++++++++++
>  kernel/printk/printk.c | 15 ++++++++++++++
>  3 files changed, 87 insertions(+)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index cab3ad28ca49..5352567c43ed 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -770,6 +770,53 @@ config IKHEADERS
>  	  or similar programs.  If you build the headers as a module, a module called
>  	  kheaders.ko is built which can be loaded on-demand to get access to headers.
>  
> +config EARLY_COUNTER_NS
> +	bool "Use counter for early printk timestamps"
> +	default y
> +	depends on PRINTK
> +	help
> +	  Use a cycle-counter to provide printk timestamps during early
> +	  boot.  This allows seeing timing information that would
> +	  otherwise be displayed with 0-valued timestamps.
> +
> +	  In order for this to work, you need to specify values for
> +	  EARLY_COUNTER_MULT and EARLY_COUNTER_SHIFT, used to convert
> +	  from the cycle count to nanoseconds.
> +
> +config EARLY_COUNTER_MULT
> +	int "Multiplier for early cycle counter"
> +	depends on PRINTK && EARLY_COUNTER_NS
> +	default 1
> +	help
> +	  This value specifies a multiplier to be used when converting
> +	  cycle counts to nanoseconds.  The formula used is:
> +		  ns = (cycles * mult) >> shift
> +
> +	  Use a multiplier that will bring the value of (cycles * mult)
> +	  to near a power of two, that is greater than 1000.  The
> +	  nanoseconds returned by this conversion are divided by 1000
> +	  to be used as the printk timestamp counter (with resolution
> +	  of microseconds).
> +
> +	  As an example, for a cycle-counter with a frequency of 200 Mhz,
> +	  the multiplier would be: 10485760, and the shift would be 21.
> +
> +config EARLY_COUNTER_SHIFT
> +	int "Shift value for early cycle counter"
> +	range 0 63
> +	depends on PRINTK && EARLY_COUNTER_NS
> +	default 0
> +	help
> +	  This value specifies a shift value to be used when converting
> +	  cycle counts to nanoseconds.  The formula used is:
> +		  ns = (cycles * mult) >> shift
> +
> +	  Use a shift that will bring the result to a value
> +	  in nanoseconds.
> +
> +	  As an example, for a cycle-counter with a frequency of 200 Mhz,
> +	  the multiplier would be: 10485760, and the shift would be 21.
> +
>  config LOG_BUF_SHIFT
>  	int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
>  	range 12 25
> diff --git a/init/main.c b/init/main.c
> index 07a3116811c5..587aaaad22d1 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -105,6 +105,8 @@
>  #include <linux/ptdump.h>
>  #include <linux/time_namespace.h>
>  #include <net/net_namespace.h>
> +#include <linux/timex.h>
> +#include <linux/sched/clock.h>
>  
>  #include <asm/io.h>
>  #include <asm/setup.h>
> @@ -906,6 +908,8 @@ static void __init early_numa_node_init(void)
>  #endif
>  }
>  
> +static u64 start_cycles, start_ns;
> +
>  asmlinkage __visible __init __no_sanitize_address __noreturn __no_stack_protector
>  void start_kernel(void)
>  {
> @@ -1023,6 +1027,10 @@ void start_kernel(void)
>  	timekeeping_init();
>  	time_init();
>  
> +	/* used to calibrate early_counter_ns */
> +	start_cycles = get_cycles();
> +	start_ns = local_clock();
> +
>  	/* This must be after timekeeping is initialized */
>  	random_init();
>  
> @@ -1474,6 +1482,8 @@ void __weak free_initmem(void)
>  static int __ref kernel_init(void *unused)
>  {
>  	int ret;
> +	u64 end_cycles, end_ns;
> +	u32 early_mult, early_shift;
>  
>  	/*
>  	 * Wait until kthreadd is all set-up.
> @@ -1505,6 +1515,21 @@ static int __ref kernel_init(void *unused)
>  
>  	do_sysctl_args();
>  
> +	/* show calibration data for early_counter_ns */
> +	end_cycles = get_cycles();
> +	end_ns = local_clock();
> +	clocks_calc_mult_shift(&early_mult, &early_shift,
> +		((end_cycles - start_cycles) * NSEC_PER_SEC)/(end_ns - start_ns),
> +		NSEC_PER_SEC, 50);
> +
> +#ifdef CONFIG_EARLY_COUNTER_NS
> +	pr_info("Early Counter: start_cycles=%llu, end_cycles=%llu, cycles=%llu\n",
> +		start_cycles, end_cycles, (end_cycles - start_cycles));
> +	pr_info("Early Counter: start_ns=%llu, end_ns=%llu, ns=%llu\n",
> +		start_ns, end_ns, (end_ns - start_ns));
> +	pr_info("Early Counter: MULT=%u, SHIFT=%u\n", early_mult, early_shift);
> +#endif
> +
>  	if (ramdisk_execute_command) {
>  		ret = run_init_process(ramdisk_execute_command);
>  		if (!ret)
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 5aee9ffb16b9..522dd24cd534 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2210,6 +2210,19 @@ static u16 printk_sprint(char *text, u16 size, int facility,
>  	return text_len;
>  }
>  
> +#ifdef CONFIG_EARLY_COUNTER_NS
> +static inline u64 early_counter_ns(void)
> +{
> +	return ((u64)get_cycles() * CONFIG_EARLY_COUNTER_MULT)
> +		>> CONFIG_EARLY_COUNTER_SHIFT;
> +}
> +#else
> +static inline u64 early_counter_ns(void)
> +{
> +	return 0;
> +}
> +#endif
> +
>  __printf(4, 0)
>  int vprintk_store(int facility, int level,
>  		  const struct dev_printk_info *dev_info,
> @@ -2239,6 +2252,8 @@ int vprintk_store(int facility, int level,
>  	 * timestamp with respect to the caller.
>  	 */
>  	ts_nsec = local_clock();
> +	if (!ts_nsec)
> +		ts_nsec = early_counter_ns();
>  
>  	caller_id = printk_caller_id();
>  
> -- 
> 2.43.0


* Re: [PATCH] printk: add early_counter_ns routine for printk blind spot
  2025-11-25  5:30 [PATCH] printk: add early_counter_ns routine for printk blind spot Tim Bird
                   ` (2 preceding siblings ...)
  2025-11-26 11:13 ` Petr Mladek
@ 2025-11-27  9:13 ` kernel test robot
  2026-01-24 19:40 ` [PATCH v2] printk: fix zero-valued printk timestamps in early boot Tim Bird
  2026-02-10 23:47 ` [PATCH v3] " Tim Bird
  5 siblings, 0 replies; 36+ messages in thread
From: kernel test robot @ 2025-11-27  9:13 UTC (permalink / raw)
  To: Tim Bird, pmladek, Steve Rostedt, john.ogness, senozhatsky
  Cc: oe-kbuild-all, Tim Bird, Andrew Morton,
	Linux Memory Management List, Francesco Valla, LKML,
	Linux Embedded

Hi Tim,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.18-rc7]
[cannot apply to akpm-mm/mm-everything next-20251127]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Tim-Bird/printk-add-early_counter_ns-routine-for-printk-blind-spot/20251125-133242
base:   linus/master
patch link:    https://lore.kernel.org/r/39b09edb-8998-4ebd-a564-7d594434a981%40bird.org
patch subject: [PATCH] printk: add early_counter_ns routine for printk blind spot
config: i386-randconfig-2006-20250825 (https://download.01.org/0day-ci/archive/20251127/202511271051.yfp2O98B-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251127/202511271051.yfp2O98B-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511271051.yfp2O98B-lkp@intel.com/

All errors (new ones prefixed by >>):

   ld: init/main.o: in function `kernel_init':
>> init/main.c:1522:(.ref.text+0x201): undefined reference to `__udivdi3'


vim +1522 init/main.c

  1481	
  1482	static int __ref kernel_init(void *unused)
  1483	{
  1484		int ret;
  1485		u64 end_cycles, end_ns;
  1486		u32 early_mult, early_shift;
  1487	
  1488		/*
  1489		 * Wait until kthreadd is all set-up.
  1490		 */
  1491		wait_for_completion(&kthreadd_done);
  1492	
  1493		kernel_init_freeable();
  1494		/* need to finish all async __init code before freeing the memory */
  1495		async_synchronize_full();
  1496	
  1497		system_state = SYSTEM_FREEING_INITMEM;
  1498		kprobe_free_init_mem();
  1499		ftrace_free_init_mem();
  1500		kgdb_free_init_mem();
  1501		exit_boot_config();
  1502		free_initmem();
  1503		mark_readonly();
  1504	
  1505		/*
  1506		 * Kernel mappings are now finalized - update the userspace page-table
  1507		 * to finalize PTI.
  1508		 */
  1509		pti_finalize();
  1510	
  1511		system_state = SYSTEM_RUNNING;
  1512		numa_default_policy();
  1513	
  1514		rcu_end_inkernel_boot();
  1515	
  1516		do_sysctl_args();
  1517	
  1518		/* show calibration data for early_counter_ns */
  1519		end_cycles = get_cycles();
  1520		end_ns = local_clock();
  1521		clocks_calc_mult_shift(&early_mult, &early_shift,
> 1522			((end_cycles - start_cycles) * NSEC_PER_SEC)/(end_ns - start_ns),
  1523			NSEC_PER_SEC, 50);
  1524	
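
For context: line 1522 divides one u64 by another with the plain `/`
operator.  On 32-bit targets such as this i386 config, gcc lowers that
to a call to the libgcc helper __udivdi3, which the kernel does not
provide here; in-kernel code normally goes through div64_u64() from
<linux/math64.h> instead.  The user-space sketch below shows the
calibration arithmetic in isolation.  counter_freq_hz() is a
hypothetical name, and the comment marks where the kernel helper would
go.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Estimate the counter frequency from a cycle delta and a time delta. */
static uint64_t counter_freq_hz(uint64_t delta_cycles, uint64_t delta_ns)
{
	/* A zero interval would divide by zero. */
	assert(delta_ns != 0);
	/*
	 * In the kernel this would read:
	 *	div64_u64(delta_cycles * NSEC_PER_SEC, delta_ns)
	 * Note that the product overflows 64 bits once delta_cycles
	 * exceeds about 1.8e10.
	 */
	return delta_cycles * NSEC_PER_SEC / delta_ns;
}
```

For example, 400000000 cycles observed over 2000000000 ns yields
200000000, i.e. a 200 MHz counter.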

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* [PATCH v2] printk: fix zero-valued printk timestamps in early boot
  2025-11-25  5:30 [PATCH] printk: add early_counter_ns routine for printk blind spot Tim Bird
                   ` (3 preceding siblings ...)
  2025-11-27  9:13 ` kernel test robot
@ 2026-01-24 19:40 ` Tim Bird
  2026-01-25 14:41   ` Francesco Valla
  2026-01-26 10:12   ` Geert Uytterhoeven
  2026-02-10 23:47 ` [PATCH v3] " Tim Bird
  5 siblings, 2 replies; 36+ messages in thread
From: Tim Bird @ 2026-01-24 19:40 UTC (permalink / raw)
  To: pmladek, rostedt, john.ogness, senozhatsky
  Cc: francesco, linux-embedded, linux-kernel, Tim Bird

During early boot, printk timestamps are reported as zero before
kernel timekeeping starts (e.g. before time_init()).  This
hinders boot-time optimization efforts.  This period is about 400
milliseconds for many current desktop and embedded machines
running Linux.

Add support to save cycles during early boot, and output correct
timestamp values after timekeeping is initialized.  get_cycles()
is operational on arm64 and x86_64 from kernel start.  Add code
and variables to save calibration values used to later convert
cycle counts to time values in the early printks.  Add a config
to control the feature.

This yields non-zero timestamps for printks from the very start
of kernel execution.  The timestamps are relative to the start of
the architecture-specified counter used in get_cycles
(e.g. the TSC on x86_64 and cntvct_el0 on arm64).

All timestamps reflect time from power-on instead of time from
the kernel's timekeeping initialization.

Signed-off-by: Tim Bird <tim.bird@sony.com>
---
V1 -> V2
  Remove calibration CONFIG vars
  Add 'depends on' to restrict arches (to handle ppc bug)
  Add early_ts_offset to avoid discontinuity
  Save cycles in ts_nsec, and convert on output
  Move conditional code to include file (early_times.h)
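
A user-space sketch of the ts_nsec encoding listed above, for trying
the idea outside the kernel.  It follows the early_cycles() and
adjust_early_ts() logic of this patch, but EARLY_TS_FLAG, tag_cycles()
and adjust_ts() are illustrative names, the globals are plain variables,
and the calibration values (the 200 MHz MULT/SHIFT pair from the V1 help
text, a 1 second offset) are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define EARLY_TS_FLAG (1ULL << 63)	/* high bit marks a raw cycle count */

/* Assumed calibration: 200 MHz counter, 1 s elapsed before timekeeping. */
static uint32_t early_mult = 10485760, early_shift = 21;
static uint64_t early_ts_offset = 1000000000ULL;

/* Store a cycle count in a timestamp field, tagged via the high bit. */
static uint64_t tag_cycles(uint64_t cycles)
{
	/* The scheme relies on the counter never reaching the top bit. */
	assert(!(cycles & EARLY_TS_FLAG));
	return cycles | EARLY_TS_FLAG;
}

/* Convert a stored timestamp to nanoseconds at output time. */
static uint64_t adjust_ts(uint64_t ts)
{
	if (ts & EARLY_TS_FLAG)	/* cycle count: strip the tag, scale to ns */
		return ((ts & ~EARLY_TS_FLAG) * early_mult) >> early_shift;
	return ts + early_ts_offset;	/* already ns: shift to power-on base */
}
```

Under these assumptions a record stored at 100000000 cycles prints as
500000000 ns, and a regular timestamp of 250 ns prints as 1000000250 ns,
so both kinds of record end up on the same power-on-relative timeline.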
---
 include/linux/early_times.h | 48 +++++++++++++++++++++++++++++++++++++
 init/Kconfig                | 12 ++++++++++
 init/main.c                 | 26 ++++++++++++++++++++
 kernel/printk/printk.c      | 16 +++++++++++--
 4 files changed, 100 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/early_times.h

diff --git a/include/linux/early_times.h b/include/linux/early_times.h
new file mode 100644
index 000000000000..9dc31eb442c2
--- /dev/null
+++ b/include/linux/early_times.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _KERNEL_PRINTK_EARLY_TIMES_H
+#define _KERNEL_PRINTK_EARLY_TIMES_H
+
+#include <linux/timex.h>
+
+#if defined(CONFIG_EARLY_PRINTK_TIMES)
+extern u32 early_mult, early_shift;
+extern u64 early_ts_offset;
+
+static inline u64 early_cycles(void)
+{
+	return ((u64)get_cycles() | (1ULL << 63));
+}
+
+static inline u64 adjust_early_ts(u64 ts)
+{
+	/* High bit means ts is a cycle count */
+	if (unlikely(ts & (1ULL << 63)))
+		/*
+		 * mask high bit and convert to ns
+		 * Note that early_mult may be 0, but that's OK because
+		 * we'll just multiply by 0 and return 0. This will
+		 * only occur if we're outputting a printk message
+		 * before the calibration of the early timestamp.
+		 * Any output after user space start (eg. from dmesg or
+		 * journalctl) will show correct values.
+		 */
+		return (((ts & ~(1ULL << 63)) * early_mult) >> early_shift);
+
+	/* If timestamp is already in ns, just add offset */
+	return ts + early_ts_offset;
+}
+#else
+static inline u64 early_cycles(void)
+{
+	return 0;
+}
+
+static inline u64 adjust_early_ts(u64 ts)
+{
+	return ts;
+}
+#endif /* CONFIG_EARLY_PRINTK_TIMES */
+
+#endif /* _KERNEL_PRINTK_EARLY_TIMES_H */
+
diff --git a/init/Kconfig b/init/Kconfig
index fa79feb8fe57..060a22cddd17 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -777,6 +777,18 @@ config IKHEADERS
 	  or similar programs.  If you build the headers as a module, a module called
 	  kheaders.ko is built which can be loaded on-demand to get access to headers.
 
+config EARLY_PRINTK_TIMES
+	bool "Show non-zero printk timestamps early in boot"
+	default y
+	depends on PRINTK
+	depends on ARM64 || X86_64
+	help
+	  Use a cycle-counter to provide printk timestamps during
+	  early boot.  This allows seeing timestamps for printks that
+	  would otherwise show as 0.  Note that this will shift the
+	  printk timestamps to be relative to machine power on, instead
+	  of relative to the start of kernel timekeeping.
+
 config LOG_BUF_SHIFT
 	int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
 	range 12 25
diff --git a/init/main.c b/init/main.c
index b84818ad9685..cc1af26933f7 100644
--- a/init/main.c
+++ b/init/main.c
@@ -104,6 +104,9 @@
 #include <linux/pidfs.h>
 #include <linux/ptdump.h>
 #include <linux/time_namespace.h>
+#include <linux/timex.h>
+#include <linux/sched/clock.h>
+#include <linux/early_times.h>
 #include <net/net_namespace.h>
 
 #include <asm/io.h>
@@ -160,6 +163,10 @@ static size_t initargs_offs;
 # define initargs_offs 0
 #endif
 
+#ifdef CONFIG_EARLY_PRINTK_TIMES
+static u64 start_cycles, start_ns;
+#endif
+
 static char *execute_command;
 static char *ramdisk_execute_command = "/init";
 
@@ -1118,6 +1125,11 @@ void start_kernel(void)
 	timekeeping_init();
 	time_init();
 
+#ifdef CONFIG_EARLY_PRINTK_TIMES
+	start_cycles = get_cycles();
+	start_ns = local_clock();
+#endif
+
 	/* This must be after timekeeping is initialized */
 	random_init();
 
@@ -1600,6 +1612,20 @@ static int __ref kernel_init(void *unused)
 
 	do_sysctl_args();
 
+#ifdef CONFIG_EARLY_PRINTK_TIMES
+	u64 end_cycles, end_ns;
+
+	/* set calibration data for early_printk_times */
+	end_cycles = get_cycles();
+	end_ns = local_clock();
+	clocks_calc_mult_shift(&early_mult, &early_shift,
+		((end_cycles - start_cycles) * NSEC_PER_SEC)/(end_ns - start_ns),
+		NSEC_PER_SEC, 50);
+	early_ts_offset = ((start_cycles * early_mult) >> early_shift) - start_ns;
+	pr_debug("Early printk times: mult=%u, shift=%u, offset=%llu ns\n",
+		early_mult, early_shift, early_ts_offset);
+#endif
+
 	if (ramdisk_execute_command) {
 		ret = run_init_process(ramdisk_execute_command);
 		if (!ret)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1d765ad242b8..f17877337735 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -46,6 +46,7 @@
 #include <linux/ctype.h>
 #include <linux/uio.h>
 #include <linux/sched/clock.h>
+#include <linux/early_times.h>
 #include <linux/sched/debug.h>
 #include <linux/sched/task_stack.h>
 #include <linux/panic.h>
@@ -75,6 +76,11 @@ EXPORT_SYMBOL(ignore_console_lock_warning);
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(console);
 
+#ifdef CONFIG_EARLY_PRINTK_TIMES
+u32 early_mult, early_shift;
+u64 early_ts_offset;
+#endif
+
 /*
  * Low level drivers may need that to know if they can schedule in
  * their unblank() callback or not. So let's export it.
@@ -639,7 +645,7 @@ static void append_char(char **pp, char *e, char c)
 static ssize_t info_print_ext_header(char *buf, size_t size,
 				     struct printk_info *info)
 {
-	u64 ts_usec = info->ts_nsec;
+	u64 ts_usec = adjust_early_ts(info->ts_nsec);
 	char caller[20];
 #ifdef CONFIG_PRINTK_CALLER
 	u32 id = info->caller_id;
@@ -1352,7 +1358,11 @@ static size_t print_syslog(unsigned int level, char *buf)
 
 static size_t print_time(u64 ts, char *buf)
 {
-	unsigned long rem_nsec = do_div(ts, 1000000000);
+	unsigned long rem_nsec;
+
+	ts = adjust_early_ts(ts);
+
+	rem_nsec = do_div(ts, 1000000000);
 
 	return sprintf(buf, "[%5lu.%06lu]",
 		       (unsigned long)ts, rem_nsec / 1000);
@@ -2242,6 +2252,8 @@ int vprintk_store(int facility, int level,
 	 * timestamp with respect to the caller.
 	 */
 	ts_nsec = local_clock();
+	if (!ts_nsec)
+		ts_nsec = early_cycles();
 
 	caller_id = printk_caller_id();
 
-- 
2.43.0


* Re: [PATCH v2] printk: fix zero-valued printk timestamps in early boot
  2026-01-24 19:40 ` [PATCH v2] printk: fix zero-valued printk timestamps in early boot Tim Bird
@ 2026-01-25 14:41   ` Francesco Valla
  2026-01-26 16:52     ` Bird, Tim
  2026-01-26 10:12   ` Geert Uytterhoeven
  1 sibling, 1 reply; 36+ messages in thread
From: Francesco Valla @ 2026-01-25 14:41 UTC (permalink / raw)
  To: Tim Bird
  Cc: pmladek, rostedt, john.ogness, senozhatsky, linux-embedded,
	linux-kernel

Hi Tim,

I tested this both on X86_64 QEMU and on an i.MX93 (ARM64) and can
confirm it is working as expected. Auto-calc of calibration data is far
better than the configuration parameters in v1.

It is slightly confusing to see a time value printed to serial output
and another one inside kmsg, but that's a human thing and should not
confuse any tool.

Some notes follow.

On Sat, Jan 24, 2026 at 12:40:27PM -0700, Tim Bird wrote:
> During early boot, printk timestamps are reported as zero before
> kernel timekeeping starts (e.g. before time_init()).  This
> hinders boot-time optimization efforts.  This period is about 400
> milliseconds for many current desktop and embedded machines
> running Linux.
> 
> Add support to save cycles during early boot, and output correct
> timestamp values after timekeeping is initialized.  get_cycles()
> is operational on arm64 and x86_64 from kernel start.  Add code
> and variables to save calibration values used to later convert
> cycle counts to time values in the early printks.  Add a config
> to control the feature.
> 
> This yields non-zero timestamps for printks from the very start
> of kernel execution.  The timestamps are relative to the start of
> the architecture-specified counter used in get_cycles
> (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> 
> All timestamps reflect time from power-on instead of time from
> the kernel's timekeeping initialization.
> 
> Signed-off-by: Tim Bird <tim.bird@sony.com>
> ---
> V1 -> V2
>   Remove calibration CONFIG vars
>   Add 'depends on' to restrict arches (to handle ppc bug)
>   Add early_ts_offset to avoid discontinuity
>   Save cycles in ts_nsec, and convert on output
>   Move conditional code to include file (early_times.h)
> ---
>  include/linux/early_times.h | 48 +++++++++++++++++++++++++++++++++++++
>  init/Kconfig                | 12 ++++++++++
>  init/main.c                 | 26 ++++++++++++++++++++
>  kernel/printk/printk.c      | 16 +++++++++++--
>  4 files changed, 100 insertions(+), 2 deletions(-)
>  create mode 100644 include/linux/early_times.h
> 
> diff --git a/include/linux/early_times.h b/include/linux/early_times.h
> new file mode 100644
> index 000000000000..9dc31eb442c2
> --- /dev/null
> +++ b/include/linux/early_times.h
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _KERNEL_PRINTK_EARLY_TIMES_H
> +#define _KERNEL_PRINTK_EARLY_TIMES_H
> +
> +#include <linux/timex.h>
> +
> +#if defined(CONFIG_EARLY_PRINTK_TIMES)
> +extern u32 early_mult, early_shift;
> +extern u64 early_ts_offset;
> +
> +static inline u64 early_cycles(void)
> +{
> +	return ((u64)get_cycles() | (1ULL << 63));
> +}
> +
> +static inline u64 adjust_early_ts(u64 ts)
> +{
> +	/* High bit means ts is a cycle count */
> +	if (unlikely(ts & (1ULL << 63)))
> +		/*
> +		 * mask high bit and convert to ns
> +		 * Note that early_mult may be 0, but that's OK because
> +		 * we'll just multiply by 0 and return 0. This will
> +		 * only occur if we're outputting a printk message
> +		 * before the calibration of the early timestamp.
> +		 * Any output after user space start (eg. from dmesg or
> +		 * journalctl) will show correct values.
> +		 */
> +		return (((ts & ~(1ULL << 63)) * early_mult) >> early_shift);
> +
> +	/* If timestamp is already in ns, just add offset */
> +	return ts + early_ts_offset;
> +}
> +#else
> +static inline u64 early_cycles(void)
> +{
> +	return 0;
> +}
> +
> +static inline u64 adjust_early_ts(u64 ts)
> +{
> +	return ts;
> +}
> +#endif /* CONFIG_EARLY_PRINTK_TIMES */
> +
> +#endif /* _KERNEL_PRINTK_EARLY_TIMES_H */
> +
> diff --git a/init/Kconfig b/init/Kconfig
> index fa79feb8fe57..060a22cddd17 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -777,6 +777,18 @@ config IKHEADERS
>  	  or similar programs.  If you build the headers as a module, a module called
>  	  kheaders.ko is built which can be loaded on-demand to get access to headers.
>  
> +config EARLY_PRINTK_TIMES
> +	bool "Show non-zero printk timestamps early in boot"
> +	default y

Considering that this might have a significant impact on monitoring
mechanisms already in place (that e.g. expect a specific dmesg print to
have a maximum associated time value), please consider a N default here.

> +	depends on PRINTK
> +	depends on ARM64 || X86_64
> +	help
> +	  Use a cycle-counter to provide printk timestamps during
> +	  early boot.  This allows seeing timestamps for printks that
> +	  would otherwise show as 0.  Note that this will shift the
> +	  printk timestamps to be relative to machine power on, instead
> +	  of relative to the start of kernel timekeeping.
> +

To be precise, the timestamps will be relative to processor power on;
the machine might have other processors that run before the Linux
one (as is the case, for example, on i.MX9 or AM62 SoCs), and their run
time will be unaccounted for even by this mechanism.

>  config LOG_BUF_SHIFT
>  	int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
>  	range 12 25
> diff --git a/init/main.c b/init/main.c
> index b84818ad9685..cc1af26933f7 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -104,6 +104,9 @@
>  #include <linux/pidfs.h>
>  #include <linux/ptdump.h>
>  #include <linux/time_namespace.h>
> +#include <linux/timex.h>
> +#include <linux/sched/clock.h>
> +#include <linux/early_times.h>
>  #include <net/net_namespace.h>
>  
>  #include <asm/io.h>
> @@ -160,6 +163,10 @@ static size_t initargs_offs;
>  # define initargs_offs 0
>  #endif
>  
> +#ifdef CONFIG_EARLY_PRINTK_TIMES
> +static u64 start_cycles, start_ns;
> +#endif
> +
>  static char *execute_command;
>  static char *ramdisk_execute_command = "/init";
>  
> @@ -1118,6 +1125,11 @@ void start_kernel(void)
>  	timekeeping_init();
>  	time_init();
>  
> +#ifdef CONFIG_EARLY_PRINTK_TIMES
> +	start_cycles = get_cycles();
> +	start_ns = local_clock();
> +#endif
> +

I was wondering if it wouldn't make more sense to move this logic to its
own file, and have a plain call e.g. to early_times_init() here
(continue...)

>  	/* This must be after timekeeping is initialized */
>  	random_init();
>  
> @@ -1600,6 +1612,20 @@ static int __ref kernel_init(void *unused)
>  
>  	do_sysctl_args();
>  
> +#ifdef CONFIG_EARLY_PRINTK_TIMES
> +	u64 end_cycles, end_ns;
> +
> +	/* set calibration data for early_printk_times */
> +	end_cycles = get_cycles();
> +	end_ns = local_clock();
> +	clocks_calc_mult_shift(&early_mult, &early_shift,
> +		((end_cycles - start_cycles) * NSEC_PER_SEC)/(end_ns - start_ns),
> +		NSEC_PER_SEC, 50);
> +	early_ts_offset = ((start_cycles * early_mult) >> early_shift) - start_ns;
> +	pr_debug("Early printk times: mult=%u, shift=%u, offset=%llu ns\n",
> +		early_mult, early_shift, early_ts_offset);
> +#endif
> +

(...continue) and to early_times_calc() or something like that here.

In this way, all related variables (i.e.: start_cycles, start_ns from
this file, but also early_mult, early_shift, and early_ts_offset from
kernel/printk/printk.c) can be confined to their own file and not add
noise here.

>  	if (ramdisk_execute_command) {
>  		ret = run_init_process(ramdisk_execute_command);
>  		if (!ret)
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 1d765ad242b8..f17877337735 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -46,6 +46,7 @@
>  #include <linux/ctype.h>
>  #include <linux/uio.h>
>  #include <linux/sched/clock.h>
> +#include <linux/early_times.h>
>  #include <linux/sched/debug.h>
>  #include <linux/sched/task_stack.h>
>  #include <linux/panic.h>
> @@ -75,6 +76,11 @@ EXPORT_SYMBOL(ignore_console_lock_warning);
>  
>  EXPORT_TRACEPOINT_SYMBOL_GPL(console);
>  
> +#ifdef CONFIG_EARLY_PRINTK_TIMES
> +u32 early_mult, early_shift;
> +u64 early_ts_offset;
> +#endif
> +
>  /*
>   * Low level drivers may need that to know if they can schedule in
>   * their unblank() callback or not. So let's export it.
> @@ -639,7 +645,7 @@ static void append_char(char **pp, char *e, char c)
>  static ssize_t info_print_ext_header(char *buf, size_t size,
>  				     struct printk_info *info)
>  {
> -	u64 ts_usec = info->ts_nsec;
> +	u64 ts_usec = adjust_early_ts(info->ts_nsec);
>  	char caller[20];
>  #ifdef CONFIG_PRINTK_CALLER
>  	u32 id = info->caller_id;
> @@ -1352,7 +1358,11 @@ static size_t print_syslog(unsigned int level, char *buf)
>  
>  static size_t print_time(u64 ts, char *buf)
>  {
> -	unsigned long rem_nsec = do_div(ts, 1000000000);
> +	unsigned long rem_nsec;
> +
> +	ts = adjust_early_ts(ts);
> +
> +	rem_nsec = do_div(ts, 1000000000);
>  
>  	return sprintf(buf, "[%5lu.%06lu]",
>  		       (unsigned long)ts, rem_nsec / 1000);
> @@ -2242,6 +2252,8 @@ int vprintk_store(int facility, int level,
>  	 * timestamp with respect to the caller.
>  	 */
>  	ts_nsec = local_clock();
> +	if (!ts_nsec)
> +		ts_nsec = early_cycles();
>  
>  	caller_id = printk_caller_id();
>  
> -- 
> 2.43.0
>

Regards,
Francesco


* RE: [PATCH v2] printk: fix zero-valued printk timestamps in early boot
  2026-01-25 14:41   ` Francesco Valla
@ 2026-01-26 16:52     ` Bird, Tim
  2026-02-02 16:23       ` Petr Mladek
  0 siblings, 1 reply; 36+ messages in thread
From: Bird, Tim @ 2026-01-26 16:52 UTC (permalink / raw)
  To: Francesco Valla
  Cc: pmladek@suse.com, rostedt@goodmis.org, john.ogness@linuxtronix.de,
	senozhatsky@chromium.org, linux-embedded@vger.kernel.org,
	linux-kernel@vger.kernel.org



> -----Original Message-----
> From: Francesco Valla <francesco@valla.it>
> 
> Hi Tim,
> 
> I tested this both on X86_64 QEMU and on an i.MX93 (ARM64) and can
> confirm it is working as expected. Auto-calc of calibration data is far
> better than the configuration parameters in v1.
> 
> It is slightly confusing to see a time value printed to serial output
> and another one inside kmsg, but that's a human thing and should not
> confuse any tool.

Agreed.  I wasn't too worried about it, because most serious developers
working on boot time will not be watching early messages over a serial
console (usually they use 'quiet' or some lower log level).  But on qemu,
it does look strange to see 0s in the first output sequence, and then
non-zero values when using dmesg later in the same boot.

I just realized, though, that I should go back and see if there's a
discontinuity in the serial output (before and after calibration), and
possibly put a note about that in the config description.

I'll think about what I can do here to reduce the confusion.

> 
> Some notes follow.
> 
> On Sat, Jan 24, 2026 at 12:40:27PM -0700, Tim Bird wrote:
> > During early boot, printk timestamps are reported as zero before
> > kernel timekeeping starts (e.g. before time_init()).  This
> > hinders boot-time optimization efforts.  This period is about 400
> > milliseconds for many current desktop and embedded machines
> > running Linux.
> >
> > Add support to save cycles during early boot, and output correct
> > timestamp values after timekeeping is initialized.  get_cycles()
> > is operational on arm64 and x86_64 from kernel start.  Add code
> > and variables to save calibration values used to later convert
> > cycle counts to time values in the early printks.  Add a config
> > to control the feature.
> >
> > This yields non-zero timestamps for printks from the very start
> > of kernel execution.  The timestamps are relative to the start of
> > the architecture-specified counter used in get_cycles
> > (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> >
> > All timestamps reflect time from power-on instead of time from
> > the kernel's timekeeping initialization.
> >
> > Signed-off-by: Tim Bird <tim.bird@sony.com>
> > ---
> > V1 -> V2
> >   Remove calibration CONFIG vars
> >   Add 'depends on' to restrict arches (to handle ppc bug)
> >   Add early_ts_offset to avoid discontinuity
> >   Save cycles in ts_nsec, and convert on output
> >   Move conditional code to include file (early_times.h)
> > ---
> >  include/linux/early_times.h | 48 +++++++++++++++++++++++++++++++++++++
> >  init/Kconfig                | 12 ++++++++++
> >  init/main.c                 | 26 ++++++++++++++++++++
> >  kernel/printk/printk.c      | 16 +++++++++++--
> >  4 files changed, 100 insertions(+), 2 deletions(-)
> >  create mode 100644 include/linux/early_times.h
> >
> > diff --git a/include/linux/early_times.h b/include/linux/early_times.h
> > new file mode 100644
> > index 000000000000..9dc31eb442c2
> > --- /dev/null
> > +++ b/include/linux/early_times.h
> > @@ -0,0 +1,48 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _KERNEL_PRINTK_EARLY_TIMES_H
> > +#define _KERNEL_PRINTK_EARLY_TIMES_H
> > +
> > +#include <linux/timex.h>
> > +
> > +#if defined(CONFIG_EARLY_PRINTK_TIMES)
> > +extern u32 early_mult, early_shift;
> > +extern u64 early_ts_offset;
> > +
> > +static inline u64 early_cycles(void)
> > +{
> > +	return ((u64)get_cycles() | (1ULL << 63));
> > +}
> > +
> > +static inline u64 adjust_early_ts(u64 ts)
> > +{
> > +	/* High bit means ts is a cycle count */
> > +	if (unlikely(ts & (1ULL << 63)))
> > +		/*
> > +		 * mask high bit and convert to ns
> > +		 * Note that early_mult may be 0, but that's OK because
> > +		 * we'll just multiply by 0 and return 0. This will
> > +		 * only occur if we're outputting a printk message
> > +		 * before the calibration of the early timestamp.
> > +		 * Any output after user space start (eg. from dmesg or
> > +		 * journalctl) will show correct values.
> > +		 */
> > +		return (((ts & ~(1ULL << 63)) * early_mult) >> early_shift);
> > +
> > +	/* If timestamp is already in ns, just add offset */
> > +	return ts + early_ts_offset;
> > +}
> > +#else
> > +static inline u64 early_cycles(void)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline u64 adjust_early_ts(u64 ts)
> > +{
> > +	return ts;
> > +}
> > +#endif /* CONFIG_EARLY_PRINTK_TIMES */
> > +
> > +#endif /* _KERNEL_PRINTK_EARLY_TIMES_H */
> > +
> > diff --git a/init/Kconfig b/init/Kconfig
> > index fa79feb8fe57..060a22cddd17 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -777,6 +777,18 @@ config IKHEADERS
> >  	  or similar programs.  If you build the headers as a module, a module called
> >  	  kheaders.ko is built which can be loaded on-demand to get access to headers.
> >
> > +config EARLY_PRINTK_TIMES
> > +	bool "Show non-zero printk timestamps early in boot"
> > +	default y
> 
> Considering that this might have a significant impact on monitoring
> mechanisms already in place (that e.g. expect a specific dmesg print to
> have a maximum associated time value), please consider a N default here.

Oops!  Sorry, that was supposed to be 'default n'.  You're right.  I know I had
this as default N, and I think I switched it temporarily for testing, and forgot
to switch it back (and never caught it the numerous times I reviewed the
patch before sending it out again, ugh).  Thanks for catching this.

If people like this, and we don't see any problems with tooling or virtualization, I
could see it switching to default Y in the future.  But for now this should definitely
be 'default n'.

> 
> > +	depends on PRINTK
> > +	depends on ARM64 || X86_64
> > +	help
> > +	  Use a cycle-counter to provide printk timestamps during
> > +	  early boot.  This allows seeing timestamps for printks that
> > +	  would otherwise show as 0.  Note that this will shift the
> > +	  printk timestamps to be relative to machine power on, instead
> > +	  of relative to the start of kernel timekeeping.
> > +
> 
> To be precise, the timestamps will be relative to processor power on;
> the machine might have some other processors that run before the Linux
> one (this is the case for example of i.MX9 or AM62 SoCs) and will be
> unaccounted for even by this mechanism.

Good point.   Even more precisely, it will be relative to
cycle-counter value initialization or reset, which often (but not always)
corresponds to processor power on.

I'll adjust the wording.

I'm still a bit unsure what happens in the virtualization case.  qemu seems to initialize
the TSC at qemu start, but I'm not sure what happens for e.g. client VMs on cloud servers.

> 
> >  config LOG_BUF_SHIFT
> >  	int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
> >  	range 12 25
> > diff --git a/init/main.c b/init/main.c
> > index b84818ad9685..cc1af26933f7 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -104,6 +104,9 @@
> >  #include <linux/pidfs.h>
> >  #include <linux/ptdump.h>
> >  #include <linux/time_namespace.h>
> > +#include <linux/timex.h>
> > +#include <linux/sched/clock.h>
> > +#include <linux/early_times.h>
> >  #include <net/net_namespace.h>
> >
> >  #include <asm/io.h>
> > @@ -160,6 +163,10 @@ static size_t initargs_offs;
> >  # define initargs_offs 0
> >  #endif
> >
> > +#ifdef CONFIG_EARLY_PRINTK_TIMES
> > +static u64 start_cycles, start_ns;
> > +#endif
> > +
> >  static char *execute_command;
> >  static char *ramdisk_execute_command = "/init";
> >
> > @@ -1118,6 +1125,11 @@ void start_kernel(void)
> >  	timekeeping_init();
> >  	time_init();
> >
> > +#ifdef CONFIG_EARLY_PRINTK_TIMES
> > +	start_cycles = get_cycles();
> > +	start_ns = local_clock();
> > +#endif
> > +
> 
> I was wondering if it wouldn't make more sense to move this logic to its
> own file, and have a plain call e.g. to early_times_init() here
> (continue...)
> 
> >  	/* This must be after timekeeping is initialized */
> >  	random_init();
> >
> > @@ -1600,6 +1612,20 @@ static int __ref kernel_init(void *unused)
> >
> >  	do_sysctl_args();
> >
> > +#ifdef CONFIG_EARLY_PRINTK_TIMES
> > +	u64 end_cycles, end_ns;
> > +
> > +	/* set calibration data for early_printk_times */
> > +	end_cycles = get_cycles();
> > +	end_ns = local_clock();
> > +	clocks_calc_mult_shift(&early_mult, &early_shift,
> > +		((end_cycles - start_cycles) * NSEC_PER_SEC)/(end_ns - start_ns),
> > +		NSEC_PER_SEC, 50);
> > +	early_ts_offset = ((start_cycles * early_mult) >> early_shift) - start_ns;
> > +	pr_debug("Early printk times: mult=%u, shift=%u, offset=%llu ns\n",
> > +		early_mult, early_shift, early_ts_offset);
> > +#endif
> > +
> 
> (...continue) and to early_times_calc() or something like that here.
> 
> In this way, all related variables (i.e.: start_cycles, start_ns from
> this file, but also early_mult, early_shift, and early_ts_offset from
> kernel/printk/printk.c) can be confined to their own file and not add
> noise here.

I thought a lot about this, and wasn't sure whether to use that approach or
not.  I would need two routines: early_times_start_calibration() and
early_times_finish_calibration().  I agree with the idea of keeping
all variables in the printk module, so this is more self-contained.  It also
reduces the number of visible #ifdefs.

I think I'll go ahead and do this, and see what the code looks like. I suspect
that what you recommend is the better way to go.
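
For illustration, the split could look roughly like the userspace model
below.  This is only a sketch under stated assumptions, not the code that
will be in the next version: the cycle and nanosecond readings are passed
in as parameters (in the kernel they would come from get_cycles() and
local_clock()), the shift is fixed at 20 instead of being computed by
clocks_calc_mult_shift(), and early_times_adjust() is a hypothetical name
for the conversion helper.

```c
#include <stdint.h>

#define EARLY_TS_FLAG	(1ULL << 63)	/* marks a stored value as raw cycles */
#define EARLY_SHIFT	20		/* fixed shift: a simplification */

/* All calibration state stays private to this one file. */
static uint64_t start_cycles, start_ns;
static uint32_t early_mult, early_shift;
static uint64_t early_ts_offset;

/* Would be called right after time_init(): record the first sample. */
void early_times_start_calibration(uint64_t cycles_now, uint64_t ns_now)
{
	start_cycles = cycles_now;
	start_ns = ns_now;
}

/* Would be called late in boot: derive mult/shift and the offset. */
void early_times_finish_calibration(uint64_t cycles_now, uint64_t ns_now)
{
	uint64_t delta_cycles = cycles_now - start_cycles;
	uint64_t delta_ns = ns_now - start_ns;

	if (!delta_cycles)
		return;		/* counter did not advance; stay uncalibrated */

	/* ns = (cycles * mult) >> shift, so mult = (ns << shift) / cycles */
	early_shift = EARLY_SHIFT;
	early_mult = (uint32_t)((delta_ns << EARLY_SHIFT) / delta_cycles);
	early_ts_offset = ((start_cycles * early_mult) >> early_shift) - start_ns;
}

/* Convert a stored timestamp (flagged cycles or plain ns) for output. */
uint64_t early_times_adjust(uint64_t ts)
{
	if (ts & EARLY_TS_FLAG)
		return ((ts & ~EARLY_TS_FLAG) * early_mult) >> early_shift;
	return ts + early_ts_offset;
}
```

With this shape, init/main.c would only need the two calls, and the
variables would no longer be visible outside the file that owns them.
Before the second call runs, early_mult is still 0, so flagged values
convert to 0, which matches the behavior described in the patch comment.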

> 
> >  	if (ramdisk_execute_command) {
> >  		ret = run_init_process(ramdisk_execute_command);
> >  		if (!ret)
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 1d765ad242b8..f17877337735 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -46,6 +46,7 @@
> >  #include <linux/ctype.h>
> >  #include <linux/uio.h>
> >  #include <linux/sched/clock.h>
> > +#include <linux/early_times.h>
> >  #include <linux/sched/debug.h>
> >  #include <linux/sched/task_stack.h>
> >  #include <linux/panic.h>
> > @@ -75,6 +76,11 @@ EXPORT_SYMBOL(ignore_console_lock_warning);
> >
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(console);
> >
> > +#ifdef CONFIG_EARLY_PRINTK_TIMES
> > +u32 early_mult, early_shift;
> > +u64 early_ts_offset;
> > +#endif
> > +
> >  /*
> >   * Low level drivers may need that to know if they can schedule in
> >   * their unblank() callback or not. So let's export it.
> > @@ -639,7 +645,7 @@ static void append_char(char **pp, char *e, char c)
> >  static ssize_t info_print_ext_header(char *buf, size_t size,
> >  				     struct printk_info *info)
> >  {
> > -	u64 ts_usec = info->ts_nsec;
> > +	u64 ts_usec = adjust_early_ts(info->ts_nsec);
> >  	char caller[20];
> >  #ifdef CONFIG_PRINTK_CALLER
> >  	u32 id = info->caller_id;
> > @@ -1352,7 +1358,11 @@ static size_t print_syslog(unsigned int level, char *buf)
> >
> >  static size_t print_time(u64 ts, char *buf)
> >  {
> > -	unsigned long rem_nsec = do_div(ts, 1000000000);
> > +	unsigned long rem_nsec;
> > +
> > +	ts = adjust_early_ts(ts);
> > +
> > +	rem_nsec = do_div(ts, 1000000000);
> >
> >  	return sprintf(buf, "[%5lu.%06lu]",
> >  		       (unsigned long)ts, rem_nsec / 1000);
> > @@ -2242,6 +2252,8 @@ int vprintk_store(int facility, int level,
> >  	 * timestamp with respect to the caller.
> >  	 */
> >  	ts_nsec = local_clock();
> > +	if (!ts_nsec)
> > +		ts_nsec = early_cycles();
> >
> >  	caller_id = printk_caller_id();
> >
> > --
> > 2.43.0
> >
> 
> Regards,
> Francesco

Thanks very much for the review and test!!
 -- Tim


* Re: [PATCH v2] printk: fix zero-valued printk timestamps in early boot
  2026-01-26 16:52     ` Bird, Tim
@ 2026-02-02 16:23       ` Petr Mladek
  0 siblings, 0 replies; 36+ messages in thread
From: Petr Mladek @ 2026-02-02 16:23 UTC (permalink / raw)
  To: Bird, Tim
  Cc: Francesco Valla, rostedt@goodmis.org, john.ogness@linuxtronix.de,
	senozhatsky@chromium.org, linux-embedded@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Mon 2026-01-26 16:52:57, Bird, Tim wrote:
> 
> 
> > -----Original Message-----
> > From: Francesco Valla <francesco@valla.it>
> > 
> > Hi Tim,
> > 
> > I tested this both on X86_64 QEMU and on an i.MX93 (ARM64) and can
> > confirm it is working as expected. Auto-calc of calibration data is far
> > better than the configuration parameters in v1.
> > 
> > It is slightly confusing to see a time value printed to serial output
> > and another one inside kmsg, but that's a human thing and should not
> > confuse any tool.
> Agreed.  I wasn't too worried about it, because most serious developers working
> on boot-time will not be watching early messages over serial console.  (Usually they
> use 'quiet' or some lower log level).  But on qemu, it does look strange to see 0s
> on the first output sequence, and then non-zeroes when using dmesg later in the same
> boot.
> 
> I just realized though, that I should go back and see if there's a discontinuity on the output via serial
> (before and after calibration), and possibly put a note about that in the config description.

I see the following in the serial console output:

[    3.288049][    T1] Write protecting the kernel read-only data: 36864k
[    3.298554][    T1] Freeing unused kernel image (text/rodata gap) memory: 1656K
[    3.318942][    T1] Freeing unused kernel image (rodata/data gap) memory: 1540K
[   12.230014][    T1] Early printk times: mult=38775352, shift=27, offset=8891950261 ns
[   12.246008][    T1] Run /init as init process
[   12.254944][    T1]   with arguments:
[   12.264341][    T1]     /init
[   12.272184][    T1]     nosplash

And this is from dmesg -S

[   12.179999] [    T1] Write protecting the kernel read-only data: 36864k
[   12.190505] [    T1] Freeing unused kernel image (text/rodata gap) memory: 1656K
[   12.210893] [    T1] Freeing unused kernel image (rodata/data gap) memory: 1540K
[   12.230014] [    T1] Early printk times: mult=38775352, shift=27, offset=8891950261 ns
[   12.246008] [    T1] Run /init as init process
[   12.254944] [    T1]   with arguments:
[   12.264341] [    T1]     /init
[   12.272184] [    T1]     nosplash
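
The two listings can be cross-checked against the values in the
"Early printk times" line.  Below is a minimal userspace sketch that
mirrors adjust_early_ts() from the patch, with mult, shift and offset
hard-coded from the log above.  It assumes the sub-microsecond digits of
the console timestamp are zero (the log does not show them), and that
cycles * mult fits in 64 bits.

```c
#include <stdint.h>

#define EARLY_TS_FLAG	(1ULL << 63)	/* high bit: value is a cycle count */

/* Calibration values copied from the pr_debug line in the log. */
static const uint32_t early_mult = 38775352;
static const uint32_t early_shift = 27;
static const uint64_t early_ts_offset = 8891950261ULL;	/* ns */

/* Same logic as adjust_early_ts() in the patch, in userspace types. */
uint64_t adjust_early_ts(uint64_t ts)
{
	/* Flagged value: strip the flag and scale cycles to ns. */
	if (ts & EARLY_TS_FLAG)
		return ((ts & ~EARLY_TS_FLAG) * early_mult) >> early_shift;

	/* Already in ns: shift onto the cycle-counter timeline. */
	return ts + early_ts_offset;
}
```

For the first console line, adjust_early_ts(3288049000) gives
12179999261 ns, which prints as 12.179999 and matches the dmesg listing.
The messages from 12.230014 onwards were printed to the console after
the offset had been set, so they already agree in both listings.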

> I'll think about what I can do here to reduce the confusion.

I thought about showing the non-adjusted timestamp with '?',
something like:

[    3.288049?][    T1] Write protecting the kernel read-only data: 36864k
[    3.298554?][    T1] Freeing unused kernel image (text/rodata gap) memory: 1656K
[    3.318942?][    T1] Freeing unused kernel image (rodata/data gap) memory: 1540K
[   12.230014][    T1] Early printk times: mult=38775352, shift=27, offset=8891950261 ns
[   12.246008][    T1] Run /init as init process
[   12.254944][    T1]   with arguments:
[   12.264341][    T1]     /init
[   12.272184][    T1]     nosplash

But I am afraid that it might break some monitoring tools.

Well, it might be acceptable when this feature is not enabled
in production systems.
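
To make the idea concrete, here is a rough userspace sketch of a
print_time() variant that appends the marker.  It is not a proposed
patch: print_time_marked() and its 'adjusted' parameter are hypothetical
names, and how printk would know that a record is not adjusted yet is
left open.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a nanosecond timestamp as "[sssss.uuuuuu]".  When the value
 * has not been adjusted yet, put a '?' before the closing bracket.
 * Returns the length snprintf() reports for the formatted string.
 */
size_t print_time_marked(uint64_t ts_ns, bool adjusted, char *buf, size_t size)
{
	unsigned long sec = (unsigned long)(ts_ns / 1000000000ULL);
	unsigned long usec = (unsigned long)((ts_ns % 1000000000ULL) / 1000);
	int n = snprintf(buf, size, "[%5lu.%06lu%s]", sec, usec,
			 adjusted ? "" : "?");

	return n < 0 ? 0 : (size_t)n;
}
```

The marked prefix is one character longer than the normal one, which is
exactly the kind of change that a parser with a fixed timestamp pattern
could trip over.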

> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -777,6 +777,18 @@ config IKHEADERS
> > >  	  or similar programs.  If you build the headers as a module, a module called
> > >  	  kheaders.ko is built which can be loaded on-demand to get access to headers.
> > >
> > > +config EARLY_PRINTK_TIMES
> > > +	bool "Show non-zero printk timestamps early in boot"
> > > +	default y
> > 
> > Considering that this might have a significant impact on monitoring
> > mechanisms already in place (that e.g. expect a specific dmesg print to
> > have a maximum associated time value), please consider a N default here.
> 
> Oops!  Sorry, that was supposed to be 'default n'.  You're right.  I know I had
> this as default N, and I think I switched it temporarily for testing, and forgot
> to switch it back (and never caught it the numerous times I reviewed the
> patch before sending it out again, ugh).  Thanks for catching this.
> 
> If people like this, and we don't see any problems with tooling or virtualization, I
> could see it switching to default Y in the future.  But for now this should definitely
> be 'default n'.

We need to be careful. The different output on the console and via dmesg
might confuse people. The extra '?' might help people, but it might confuse
tools.

> > 
> > > +	depends on PRINTK
> > > +	depends on ARM64 || X86_64
> > > +	help
> > > +	  Use a cycle-counter to provide printk timestamps during
> > > +	  early boot.  This allows seeing timestamps for printks that
> > > +	  would otherwise show as 0.  Note that this will shift the
> > > +	  printk timestamps to be relative to machine power on, instead
> > > +	  of relative to the start of kernel timekeeping.
> > > +
> > 
> > To be precise, the timestamps will be relative to processor power on;
> > the machine might have some other processors that run before the Linux
> > one (this is the case for example of i.MX9 or AM62 SoCs) and will be
> > unaccounted for even by this mechanism.
> 
> Good point.   Even more precisely, it will be relative to
> cycle-counter value initialization or reset, which often (but not always)
> corresponds to processor power on.
> 
> I'll adjust the wording.
> 
> I'm still a bit unsure what happens in the virtualization case.  qemu seems to initialize
> the TSC at qemu start, but I'm not sure what happens for e.g. client VMs on cloud servers.

I see the following via QEMU (from dmesg):

[    8.853613] Linux version 6.19.0-rc7-default+ (pmladek@pathway) (gcc (SUSE Linux) 15.2.1 20251006, GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.45.0.20251103-2) #521 SMP PREEMPT_DYNAMIC Mon Feb  2 16:36:53 CET 2026
[    8.853617] Command line: BOOT_IMAGE=/boot/vmlinuz-6.19.0-rc7-default+ root=UUID=587ae802-e330-4059-9b48-d5b845e1075a resume=/dev/disk/by-uuid/369c7453-3d16-409d-88b2-5de027891a12 mitigations=auto nosplash earlycon=uart8250,io,0x3f8,115200 console=ttyS0,115200 console=ttynull console=tty0 debug_non_panic_cpus=1 panic=10 ignore_loglevel log_buf_len=1M
[    8.865086] BIOS-provided physical RAM map:
[    8.865087] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    8.865089] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    8.865090] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    8.865090] BIOS-e820: [mem 0x0000000000100000-0x000000007ffdbfff] usable
[    8.865091] BIOS-e820: [mem 0x000000007ffdc000-0x000000007fffffff] reserved
[    8.865092] BIOS-e820: [mem 0x00000000b0000000-0x00000000bfffffff] reserved
[    8.865092] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[    8.865093] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    8.865093] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    8.865094] BIOS-e820: [mem 0x0000000100000000-0x000000017fffffff] usable
[    8.865094] BIOS-e820: [mem 0x000000fd00000000-0x000000ffffffffff] reserved
[    8.865171] earlycon: uart8250 at I/O port 0x3f8 (options '115200')
[    8.865176] printk: legacy bootconsole [uart8250] enabled
[    8.892181] printk: allow messages from non-panic CPUs in panic()
[    8.893327] printk: debug: ignoring loglevel setting.
[...]
[   12.162011] Freeing unused decrypted memory: 2036K
[   12.171970] Freeing unused kernel image (initmem) memory: 7120K
[   12.179999] Write protecting the kernel read-only data: 36864k
[   12.190505] Freeing unused kernel image (text/rodata gap) memory: 1656K
[   12.210893] Freeing unused kernel image (rodata/data gap) memory: 1540K
[   12.230014] Early printk times: mult=38775352, shift=27, offset=8891950261 ns
[   12.246008] Run /init as init process
[   12.254944]   with arguments:
[   12.264341]     /init
[   12.272184]     nosplash
[   12.280738]   with environment:
[   12.288728]     HOME=/
[   12.296319]     TERM=linux
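
For reference, the mult/shift pair in the "Early printk times" line above
can be read as a fixed-point scale factor: ns = (cycles * mult) >> shift.
The following is only a userspace sketch, not the kernel code. It uses
unsigned __int128 (a GCC/Clang extension) as a stand-in for the kernel's
mul_u64_u32_shr(), and the names cycles_to_ns() and implied_hz() are made
up for illustration.

```c
#include <stdint.h>

/* Values copied from the "Early printk times" log line above. */
#define EARLY_MULT  38775352u
#define EARLY_SHIFT 27u

/*
 * Stand-in for mul_u64_u32_shr(): multiply with a 128-bit intermediate
 * so the product cannot overflow, then shift right.
 */
static uint64_t cycles_to_ns(uint64_t cycles)
{
	return (uint64_t)(((unsigned __int128)cycles * EARLY_MULT) >> EARLY_SHIFT);
}

/* Counter frequency in Hz implied by mult/shift: 1e9 * 2^shift / mult. */
static uint64_t implied_hz(void)
{
	return (1000000000ull << EARLY_SHIFT) / EARLY_MULT;
}
```

With these values, mult / 2^shift works out to roughly 0.289 ns per cycle,
which corresponds to a counter running at about 3.46 GHz in this QEMU guest.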

Best Regards,
Petr


* Re: [PATCH v2] printk: fix zero-valued printk timestamps in early boot
  2026-01-24 19:40 ` [PATCH v2] printk: fix zero-valued printk timestamps in early boot Tim Bird
  2026-01-25 14:41   ` Francesco Valla
@ 2026-01-26 10:12   ` Geert Uytterhoeven
  2026-01-26 17:11     ` Bird, Tim
  1 sibling, 1 reply; 36+ messages in thread
From: Geert Uytterhoeven @ 2026-01-26 10:12 UTC (permalink / raw)
  To: Tim Bird
  Cc: pmladek, rostedt, john.ogness, senozhatsky, francesco,
	linux-embedded, linux-kernel

Hi Tim,

On Sat, 24 Jan 2026 at 20:41, Tim Bird <tim.bird@sony.com> wrote:
> During early boot, printk timestamps are reported as zero before
> kernel timekeeping starts (e.g. before time_init()).  This
> hinders boot-time optimization efforts.  This period is about 400
> milliseconds for many current desktop and embedded machines
> running Linux.
>
> Add support to save cycles during early boot, and output correct
> timestamp values after timekeeping is initialized.  get_cycles()
> is operational on arm64 and x86_64 from kernel start.  Add code
> and variables to save calibration values used to later convert
> cycle counts to time values in the early printks.  Add a config
> to control the feature.
>
> This yields non-zero timestamps for printks from the very start
> of kernel execution.  The timestamps are relative to the start of
> the architecture-specified counter used in get_cycles
> (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
>
> All timestamps reflect time from power-on instead of time from
> the kernel's timekeeping initialization.
>
> Signed-off-by: Tim Bird <tim.bird@sony.com>
> ---
> V1 -> V2
>   Remove calibration CONFIG vars
>   Add 'depends on' to restrict arches (to handle ppc bug)
>   Add early_ts_offset to avoid discontinuity
>   Save cycles in ts_nsec, and convert on output
>   Move conditional code to include file (early_times.h)

Thanks for the update!

> --- /dev/null
> +++ b/include/linux/early_times.h
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _KERNEL_PRINTK_EARLY_TIMES_H
> +#define _KERNEL_PRINTK_EARLY_TIMES_H
> +
> +#include <linux/timex.h>
> +
> +#if defined(CONFIG_EARLY_PRINTK_TIMES)
> +extern u32 early_mult, early_shift;
> +extern u64 early_ts_offset;
> +
> +static inline u64 early_cycles(void)
> +{
> +       return ((u64)get_cycles() | (1ULL << 63));

No need to cast to u64, as the second operand of the OR is u64 anyway.
BIT_ULL(63)

I think it would be good to have a #define for this at the top.

> +}
> +
> +static inline u64 adjust_early_ts(u64 ts)
> +{
> +       /* High bit means ts is a cycle count */
> +       if (unlikely(ts & (1ULL << 63)))
> +               /*
> +                * mask high bit and convert to ns
> +                * Note that early_mult may be 0, but that's OK because
> +                * we'll just multiply by 0 and return 0. This will
> +                * only occur if we're outputting a printk message
> +                * before the calibration of the early timestamp.
> +                * Any output after user space start (eg. from dmesg or
> +                * journalctl) will show correct values.
> +                */
> +               return (((ts & ~(1ULL << 63)) * early_mult) >> early_shift);

Please use the mul_u64_u32_shr() helper.

Please wrap this block in curly braces.
Alternatively, you can invert the logic:

    if (likely(!(ts & DEFINITION_FOR_1ULL_LSH63)))
            return ts + early_ts_offset;

> +
> +       /* If timestamp is already in ns, just add offset */
> +       return ts + early_ts_offset;
> +}
> +#else
> +static inline u64 early_cycles(void)
> +{
> +       return 0;
> +}
> +
> +static inline u64 adjust_early_ts(u64 ts)
> +{
> +       return ts;
> +}
> +#endif /* CONFIG_EARLY_PRINTK_TIMES */
> +
> +#endif /* _KERNEL_PRINTK_EARLY_TIMES_H */
> +
> diff --git a/init/Kconfig b/init/Kconfig
> index fa79feb8fe57..060a22cddd17 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -777,6 +777,18 @@ config IKHEADERS
>           or similar programs.  If you build the headers as a module, a module called
>           kheaders.ko is built which can be loaded on-demand to get access to headers.
>
> +config EARLY_PRINTK_TIMES
> +       bool "Show non-zero printk timestamps early in boot"
> +       default y
> +       depends on PRINTK
> +       depends on ARM64 || X86_64

So for now this is limited to (a few) 64-bit platforms...

> +       help
> +         Use a cycle-counter to provide printk timestamps during
> +         early boot.  This allows seeing timestamps for printks that
> +         would otherwise show as 0.  Note that this will shift the
> +         printk timestamps to be relative to machine power on, instead
> +         of relative to the start of kernel timekeeping.
> +
>  config LOG_BUF_SHIFT
>         int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
>         range 12 25
> diff --git a/init/main.c b/init/main.c
> index b84818ad9685..cc1af26933f7 100644
> --- a/init/main.c
> +++ b/init/main.c

> @@ -160,6 +163,10 @@ static size_t initargs_offs;
>  # define initargs_offs 0
>  #endif
>
> +#ifdef CONFIG_EARLY_PRINTK_TIMES
> +static u64 start_cycles, start_ns;

cycles_t start_cycles;

(cycles_t is unsigned long, i.e. either 32- or 64-bit).

> +#endif
> +
>  static char *execute_command;
>  static char *ramdisk_execute_command = "/init";
>
> @@ -1118,6 +1125,11 @@ void start_kernel(void)
>         timekeeping_init();
>         time_init();
>
> +#ifdef CONFIG_EARLY_PRINTK_TIMES
> +       start_cycles = get_cycles();
> +       start_ns = local_clock();
> +#endif
> +
>         /* This must be after timekeeping is initialized */
>         random_init();
>
> @@ -1600,6 +1612,20 @@ static int __ref kernel_init(void *unused)
>
>         do_sysctl_args();
>
> +#ifdef CONFIG_EARLY_PRINTK_TIMES
> +       u64 end_cycles, end_ns;

cycles_t end_cycles;

> +
> +       /* set calibration data for early_printk_times */
> +       end_cycles = get_cycles();
> +       end_ns = local_clock();
> +       clocks_calc_mult_shift(&early_mult, &early_shift,
> +               ((end_cycles - start_cycles) * NSEC_PER_SEC)/(end_ns - start_ns),
> +               NSEC_PER_SEC, 50);

mul_u64_u64_div_u64() is probably the best helper that is available
(there is no mul_ulong_u32_div_u64()).

> +       early_ts_offset = ((start_cycles * early_mult) >> early_shift) - start_ns;
> +       pr_debug("Early printk times: mult=%u, shift=%u, offset=%llu ns\n",
> +               early_mult, early_shift, early_ts_offset);
> +#endif
> +
>         if (ramdisk_execute_command) {
>                 ret = run_init_process(ramdisk_execute_command);
>                 if (!ret)

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* RE: [PATCH v2] printk: fix zero-valued printk timestamps in early boot
  2026-01-26 10:12   ` Geert Uytterhoeven
@ 2026-01-26 17:11     ` Bird, Tim
  2026-01-27  8:10       ` Geert Uytterhoeven
  0 siblings, 1 reply; 36+ messages in thread
From: Bird, Tim @ 2026-01-26 17:11 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: pmladek@suse.com, rostedt@goodmis.org, john.ogness@linuxtronix.de,
	senozhatsky@chromium.org, francesco@valla.it,
	linux-embedded@vger.kernel.org, linux-kernel@vger.kernel.org



> -----Original Message-----
> From: Geert Uytterhoeven <geert@linux-m68k.org>
> Hi Tim,
> 
> On Sat, 24 Jan 2026 at 20:41, Tim Bird <tim.bird@sony.com> wrote:
> > During early boot, printk timestamps are reported as zero before
> > kernel timekeeping starts (e.g. before time_init()).  This
> > hinders boot-time optimization efforts.  This period is about 400
> > milliseconds for many current desktop and embedded machines
> > running Linux.
> >
> > Add support to save cycles during early boot, and output correct
> > timestamp values after timekeeping is initialized.  get_cycles()
> > is operational on arm64 and x86_64 from kernel start.  Add code
> > and variables to save calibration values used to later convert
> > cycle counts to time values in the early printks.  Add a config
> > to control the feature.
> >
> > This yields non-zero timestamps for printks from the very start
> > of kernel execution.  The timestamps are relative to the start of
> > the architecture-specified counter used in get_cycles
> > (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> >
> > All timestamps reflect time from power-on instead of time from
> > the kernel's timekeeping initialization.
> >
> > Signed-off-by: Tim Bird <tim.bird@sony.com>
> > ---
> > V1 -> V2
> >   Remove calibration CONFIG vars
> >   Add 'depends on' to restrict arches (to handle ppc bug)
> >   Add early_ts_offset to avoid discontinuity
> >   Save cycles in ts_nsec, and convert on output
> >   Move conditional code to include file (early_times.h)
> 
> Thanks for the update!
> 
> > --- /dev/null
> > +++ b/include/linux/early_times.h
> > @@ -0,0 +1,48 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _KERNEL_PRINTK_EARLY_TIMES_H
> > +#define _KERNEL_PRINTK_EARLY_TIMES_H
> > +
> > +#include <linux/timex.h>
> > +
> > +#if defined(CONFIG_EARLY_PRINTK_TIMES)
> > +extern u32 early_mult, early_shift;
> > +extern u64 early_ts_offset;
> > +
> > +static inline u64 early_cycles(void)
> > +{
> > +       return ((u64)get_cycles() | (1ULL << 63));
> 
> No need to cast to u64, as the second operand of the OR is u64 anyway.
> BIT_ULL(63)
> 
> I think it would be good to have a #define for this at the top.

I'll look at this.  Is BIT_ULL(63) preferred over (1ULL << 63)?

Do you think something like "HIGH_BIT63" would be good enough?

> 
> > +}
> > +
> > +static inline u64 adjust_early_ts(u64 ts)
> > +{
> > +       /* High bit means ts is a cycle count */
> > +       if (unlikely(ts & (1ULL << 63)))
> > +               /*
> > +                * mask high bit and convert to ns
> > +                * Note that early_mult may be 0, but that's OK because
> > +                * we'll just multiply by 0 and return 0. This will
> > +                * only occur if we're outputting a printk message
> > +                * before the calibration of the early timestamp.
> > +                * Any output after user space start (eg. from dmesg or
> > +                * journalctl) will show correct values.
> > +                */
> > +               return (((ts & ~(1ULL << 63)) * early_mult) >> early_shift);
> 
> Please use the mul_u64_u32_shr() helper.
OK.  I did not know about that.

I can check, but do you know offhand if timestamps from local_clock() on 32-bit systems are
always 64-bit nanoseconds?  I assume so looking at the printk code and
making some assumptions.  (But that's dangerous.)

> 
> Please wrap this block in curly braces.
> Alternatively, you can invert the logic:
> 
>     if (likely(!(ts & DEFINITION_FOR_1ULL_LSH63)))
>             return ts + early_ts_offset;

This is probably a better structure anyway.  Will do.

> 
> > +
> > +       /* If timestamp is already in ns, just add offset */
> > +       return ts + early_ts_offset;
> > +}
> > +#else
> > +static inline u64 early_cycles(void)
> > +{
> > +       return 0;
> > +}
> > +
> > +static inline u64 adjust_early_ts(u64 ts)
> > +{
> > +       return ts;
> > +}
> > +#endif /* CONFIG_EARLY_PRINTK_TIMES */
> > +
> > +#endif /* _KERNEL_PRINTK_EARLY_TIMES_H */
> > +
> > diff --git a/init/Kconfig b/init/Kconfig
> > index fa79feb8fe57..060a22cddd17 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -777,6 +777,18 @@ config IKHEADERS
> >           or similar programs.  If you build the headers as a module, a module called
> >           kheaders.ko is built which can be loaded on-demand to get access to headers.
> >
> > +config EARLY_PRINTK_TIMES
> > +       bool "Show non-zero printk timestamps early in boot"
> > +       default y
> > +       depends on PRINTK
> > +       depends on ARM64 || X86_64
> 
> So for now this is limited to (a few) 64-bit platforms...

Yes, but it really shouldn't be.  I got spooked when 0-day told me that the code
wouldn't link on powerpc, so I restricted it to just machines I was actually testing.
But this should work on some ARM32 and most x86_32 platforms.  Actually, it should
work on anything that has a cycle-counter from kernel start (some RISC machines
might qualify as well).  However, I've seen a few cases where some platforms require
kernel initialization of their cycle-counter, and I wanted to play it safe.

If such platforms are well-behaved and return 0 before they are initialized, it should
still be safe to turn this on - it just won't have any effect.

I plan to do some more investigation of the powerpc error.  It was 
powerpc-linux-ld: init/main.o: in function `kernel_init':
main.c:(.ref.text+0x144): undefined reference to `__udivdi3'

from the definition of clocks_calc_mult_shift(), which seems to indicate a bug
in the powerpc code.  In the long run, I should try to track down that bug rather than
exclude a bunch of other (likely-working) arches.  But I was playing it conservative for now.

> 
> > +       help
> > +         Use a cycle-counter to provide printk timestamps during
> > +         early boot.  This allows seeing timestamps for printks that
> > +         would otherwise show as 0.  Note that this will shift the
> > +         printk timestamps to be relative to machine power on, instead
> > +         of relative to the start of kernel timekeeping.
> > +
> >  config LOG_BUF_SHIFT
> >         int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
> >         range 12 25
> > diff --git a/init/main.c b/init/main.c
> > index b84818ad9685..cc1af26933f7 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> 
> > @@ -160,6 +163,10 @@ static size_t initargs_offs;
> >  # define initargs_offs 0
> >  #endif
> >
> > +#ifdef CONFIG_EARLY_PRINTK_TIMES
> > +static u64 start_cycles, start_ns;
> 
> cycles_t start_cycles;
> 
> (cycles_t is unsigned long, i.e. either 32- or 64-bit).
OK, I’ll use this type.

> 
> > +#endif
> > +
> >  static char *execute_command;
> >  static char *ramdisk_execute_command = "/init";
> >
> > @@ -1118,6 +1125,11 @@ void start_kernel(void)
> >         timekeeping_init();
> >         time_init();
> >
> > +#ifdef CONFIG_EARLY_PRINTK_TIMES
> > +       start_cycles = get_cycles();
> > +       start_ns = local_clock();
> > +#endif
> > +
> >         /* This must be after timekeeping is initialized */
> >         random_init();
> >
> > @@ -1600,6 +1612,20 @@ static int __ref kernel_init(void *unused)
> >
> >         do_sysctl_args();
> >
> > +#ifdef CONFIG_EARLY_PRINTK_TIMES
> > +       u64 end_cycles, end_ns;
> 
> cycles_t end_cycles;
ack.
> 
> > +
> > +       /* set calibration data for early_printk_times */
> > +       end_cycles = get_cycles();
> > +       end_ns = local_clock();
> > +       clocks_calc_mult_shift(&early_mult, &early_shift,
> > +               ((end_cycles - start_cycles) * NSEC_PER_SEC)/(end_ns - start_ns),
> > +               NSEC_PER_SEC, 50);
> 
> mul_u64_u64_div_u64() is probably the best helper that is available
> (there is no mul_ulong_u32_div_u64()).

What would the mul_u64_u64_div_u64 do if cycles_t is u32?
Should it still work, just not optimized?

> 
> > +       early_ts_offset = ((start_cycles * early_mult) >> early_shift) - start_ns;
> > +       pr_debug("Early printk times: mult=%u, shift=%u, offset=%llu ns\n",
> > +               early_mult, early_shift, early_ts_offset);
> > +#endif
> > +
> >         if (ramdisk_execute_command) {
> >                 ret = run_init_process(ramdisk_execute_command);
> >                 if (!ret)
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
Thanks very much for the review and suggestions.  I'm away from my lab for the next week,
but I'd really like to test this on a 32-bit platform that has an early cycle-counter available.
I'll have to do some research.  I have some 32-bit platforms in my lab, but I'm not sure which ones
have early-cycle-counter support.   Do any of the 32-bit platforms you're familiar with support
early cycle-counters  (that is, cycle-counters that are running when the kernel starts, and don't
need any kernel initialization at all)?
 -- Tim



* Re: [PATCH v2] printk: fix zero-valued printk timestamps in early boot
  2026-01-26 17:11     ` Bird, Tim
@ 2026-01-27  8:10       ` Geert Uytterhoeven
  0 siblings, 0 replies; 36+ messages in thread
From: Geert Uytterhoeven @ 2026-01-27  8:10 UTC (permalink / raw)
  To: Bird, Tim
  Cc: pmladek@suse.com, rostedt@goodmis.org, john.ogness@linuxtronix.de,
	senozhatsky@chromium.org, francesco@valla.it,
	linux-embedded@vger.kernel.org, linux-kernel@vger.kernel.org

Hi Tim,

On Mon, 26 Jan 2026 at 18:11, Bird, Tim <Tim.Bird@sony.com> wrote:
> > From: Geert Uytterhoeven <geert@linux-m68k.org>
> > On Sat, 24 Jan 2026 at 20:41, Tim Bird <tim.bird@sony.com> wrote:
> > > During early boot, printk timestamps are reported as zero before
> > > kernel timekeeping starts (e.g. before time_init()).  This
> > > hinders boot-time optimization efforts.  This period is about 400
> > > milliseconds for many current desktop and embedded machines
> > > running Linux.
> > >
> > > Add support to save cycles during early boot, and output correct
> > > timestamp values after timekeeping is initialized.  get_cycles()
> > > is operational on arm64 and x86_64 from kernel start.  Add code
> > > and variables to save calibration values used to later convert
> > > cycle counts to time values in the early printks.  Add a config
> > > to control the feature.
> > >
> > > This yields non-zero timestamps for printks from the very start
> > > of kernel execution.  The timestamps are relative to the start of
> > > the architecture-specified counter used in get_cycles
> > > (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > >
> > > All timestamps reflect time from power-on instead of time from
> > > the kernel's timekeeping initialization.
> > >
> > > Signed-off-by: Tim Bird <tim.bird@sony.com>
> > > ---
> > > V1 -> V2
> > >   Remove calibration CONFIG vars
> > >   Add 'depends on' to restrict arches (to handle ppc bug)
> > >   Add early_ts_offset to avoid discontinuity
> > >   Save cycles in ts_nsec, and convert on output
> > >   Move conditional code to include file (early_times.h)
> >
> > Thanks for the update!
> >
> > > --- /dev/null
> > > +++ b/include/linux/early_times.h
> > > @@ -0,0 +1,48 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +
> > > +#ifndef _KERNEL_PRINTK_EARLY_TIMES_H
> > > +#define _KERNEL_PRINTK_EARLY_TIMES_H
> > > +
> > > +#include <linux/timex.h>
> > > +
> > > +#if defined(CONFIG_EARLY_PRINTK_TIMES)
> > > +extern u32 early_mult, early_shift;
> > > +extern u64 early_ts_offset;
> > > +
> > > +static inline u64 early_cycles(void)
> > > +{
> > > +       return ((u64)get_cycles() | (1ULL << 63));
> >
> > No need to cast to u64, as the second operand of the OR is u64 anyway.
> > BIT_ULL(63)
> >
> > I think it would be good to have a #define for this at the top.
>
> I'll look at this.  Is BIT_ULL(63) preferred over (1ULL << 63)?

When you refer to the bit value, yes: BIT() for unsigned long, BIT_ULL()
for unsigned long long (recently we got BIT_U{8,16,32,64}(), too).

> Do you think something like "HIGH_BIT63" would be good enough?

I'd name it for what it means, not what it does, e.g. EARLY_TS_FLAG?
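
A minimal userspace sketch of that naming advice, assuming bit 63 of the
stored u64 is the marker. BIT_ULL() is redefined locally as a stand-in for
the kernel macro, and tag_cycles(), is_early_cycles() and untag_cycles() are
hypothetical helper names, not anything from the patch:

```c
#include <stdint.h>
#include <stdbool.h>

/* Local stand-in for the kernel's BIT_ULL() macro. */
#define BIT_ULL(nr)	(1ULL << (nr))

/* Named for what the bit means, not for where it sits. */
#define EARLY_TS_FLAG	BIT_ULL(63)

/* Tag a raw cycle count so it can be told apart from a ns timestamp. */
static uint64_t tag_cycles(uint64_t cycles)
{
	return cycles | EARLY_TS_FLAG;
}

/* True if the stored value is a tagged cycle count. */
static bool is_early_cycles(uint64_t ts)
{
	return (ts & EARLY_TS_FLAG) != 0;
}

/* Strip the flag to recover the raw cycle count. */
static uint64_t untag_cycles(uint64_t ts)
{
	return ts & ~EARLY_TS_FLAG;
}
```

The scheme relies on real nanosecond timestamps never reaching bit 63,
which would take about 292 years of uptime.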

> > > +}
> > > +
> > > +static inline u64 adjust_early_ts(u64 ts)
> > > +{
> > > +       /* High bit means ts is a cycle count */
> > > +       if (unlikely(ts & (1ULL << 63)))
> > > +               /*
> > > +                * mask high bit and convert to ns
> > > +                * Note that early_mult may be 0, but that's OK because
> > > +                * we'll just multiply by 0 and return 0. This will
> > > +                * only occur if we're outputting a printk message
> > > +                * before the calibration of the early timestamp.
> > > +                * Any output after user space start (eg. from dmesg or
> > > +                * journalctl) will show correct values.
> > > +                */
> > > +               return (((ts & ~(1ULL << 63)) * early_mult) >> early_shift);
> >
> > Please use the mul_u64_u32_shr() helper.
> OK.  I did not know about that.
>
> I can check, but do you know offhand if timestamps from local_clock() on 32-bit systems are
> always 64-bit nanoseconds?  I assume so looking at the printk code and
> making some assumptions.  (But that's dangerous.)

I am not 100% sure, but I think they are.  Note that on systems
without high-resolution timers, they may increment in large steps,
e.g. by 10000000 if HZ=100.

> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -777,6 +777,18 @@ config IKHEADERS
> > >           or similar programs.  If you build the headers as a module, a module called
> > >           kheaders.ko is built which can be loaded on-demand to get access to headers.
> > >
> > > +config EARLY_PRINTK_TIMES
> > > +       bool "Show non-zero printk timestamps early in boot"
> > > +       default y
> > > +       depends on PRINTK
> > > +       depends on ARM64 || X86_64
> >
> > So for now this is limited to (a few) 64-bit platforms...
>
> Yes, but it really shouldn't be.  I got spooked when 0-day told me that the code
> wouldn't link on powerpc, so I restricted it to just machines I was actually testing.
> But this should work on some ARM32 and most x86_32 platforms.  Actually, it should
> work on anything that has a cycle-counter from kernel start (some RISC machines
> might qualify as well).  However, I've seen a few cases where some platforms require
> kernel initialization of their cycle-counter, and I wanted to play it safe.
>
> If such platforms are well-behaved and return 0 before they are initialized, it should
> still be safe to turn this on - it just won't have any effect.

At least on m68k, get_cycles() always returns zero ;-)

> I plan to do some more investigation of the powerpc error.  It was
> powerpc-linux-ld: init/main.o: in function `kernel_init':
> main.c:(.ref.text+0x144): undefined reference to `__udivdi3'
>
> from the definition of clocks_calc_mult_shift(), which seems to indicate a bug
> in the powerpc code.  In the long run, I should try to track down that bug rather than
> exclude a bunch of other (likely-working) arches.  But I was playing it conservative for now.

An undefined reference to __udivdi3 means you are using a 64-by-32
division, for which you should always use the helpers from
<linux/math64.h>.  Even on 64-bit, the helpers may generate better code.

> > > --- a/init/main.c
> > > +++ b/init/main.c

> > > +
> > > +       /* set calibration data for early_printk_times */
> > > +       end_cycles = get_cycles();
> > > +       end_ns = local_clock();
> > > +       clocks_calc_mult_shift(&early_mult, &early_shift,
> > > +               ((end_cycles - start_cycles) * NSEC_PER_SEC)/(end_ns - start_ns),
> > > +               NSEC_PER_SEC, 50);
> >
> > mul_u64_u64_div_u64() is probably the best helper that is available
> > (there is no mul_ulong_u32_div_u64()).
>
> What would the mul_u64_u64_div_u64 do if cycles_t is u32?
> Should it still work, just not optimized?

It should still work.
arch/x86/include/asm/div64.h uses divq, so it is up to Intel or AMD.
Everything else uses lib/math/div64.c.
If needed, you can add an optimized version for e.g. arm64, too.
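
As a rough userspace illustration of why the helper is the safer choice
for the calibration expression: a narrower cycles_t value is simply widened
to 64 bits when passed in, and the 128-bit intermediate product avoids the
wrap-around that the open-coded v2 expression can hit on long calibration
windows. This sketch is an assumption-laden stand-in, not the kernel
implementation: mul_div_u64(), calibrate_hz() and calibrate_hz_naive() are
hypothetical names, and unsigned __int128 is a GCC/Clang extension.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/*
 * Stand-in for mul_u64_u64_div_u64(): a * b / c with a 128-bit
 * intermediate product.  Assumes c != 0 and that the quotient fits
 * in 64 bits.
 */
static uint64_t mul_div_u64(uint64_t a, uint64_t b, uint64_t c)
{
	return (uint64_t)(((unsigned __int128)a * b) / c);
}

/* Counter frequency in Hz from a (cycles, ns) calibration window. */
static uint64_t calibrate_hz(uint64_t cycle_delta, uint64_t ns_delta)
{
	return mul_div_u64(cycle_delta, NSEC_PER_SEC, ns_delta);
}

/* Open-coded form: the product wraps once it exceeds 2^64. */
static uint64_t calibrate_hz_naive(uint64_t cycle_delta, uint64_t ns_delta)
{
	return (cycle_delta * NSEC_PER_SEC) / ns_delta;
}
```

For a 3 GHz counter, the open-coded product wraps after roughly six seconds
of calibration window (2^64 / 1e9 is about 1.8e10 cycles), while the
128-bit variant keeps returning the right frequency.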

Gr{oetje,eeting}s,

                        Geert



* [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2025-11-25  5:30 [PATCH] printk: add early_counter_ns routine for printk blind spot Tim Bird
                   ` (4 preceding siblings ...)
  2026-01-24 19:40 ` [PATCH v2] printk: fix zero-valued printk timestamps in early boot Tim Bird
@ 2026-02-10 23:47 ` Tim Bird
  2026-03-04 11:23   ` Petr Mladek
                     ` (4 more replies)
  5 siblings, 5 replies; 36+ messages in thread
From: Tim Bird @ 2026-02-10 23:47 UTC (permalink / raw)
  To: pmladek, rostedt, john.ogness, senozhatsky
  Cc: francesco, geert, linux-embedded, linux-kernel, Tim Bird

During early boot, printk timestamps are reported as zero before
kernel timekeeping starts (e.g. before time_init()).  This
hinders boot-time optimization efforts.  This period is about 400
milliseconds for many current desktop and embedded machines
running Linux.

Add support to save cycles during early boot, and output correct
timestamp values after timekeeping is initialized.  get_cycles()
is operational on arm64 and x86_64 from kernel start.  Add code
and variables to save calibration values used to later convert
cycle counts to time values in the early printks.  Add a config
to control the feature.

This yields non-zero timestamps for printks from the very start
of kernel execution.  The timestamps are relative to the start of
the architecture-specified counter used in get_cycles
(e.g. the TSC on x86_64 and cntvct_el0 on arm64).

All timestamps reflect time from processor power-on instead of
time from the kernel's timekeeping initialization.

Signed-off-by: Tim Bird <tim.bird@sony.com>
---
V2->V3
 Default CONFIG option to 'n'
 Move more code into early_times.h
  (reducing ifdefs in init/main.c)
 Use math64 helper routines
 Use cycles_t instead of u64 type
 Add #defines for EARLY_CYCLES_BIT and MASK
 Invert if logic in adjust_early_ts()
 (note: no change to 'depends on' in Kconfig entry)

V1->V2
 Remove calibration CONFIG vars
 Add 'depends on' to restrict arches (to handle ppc bug)
 Add early_ts_offset to avoid discontinuity
 Save cycles in ts_nsec, and convert on output
 Move conditional code to include file (early_times.h)
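
The way early_ts_offset avoids the discontinuity can be sketched in
userspace as follows.  This is only an illustration under stated
assumptions: struct early_cal and the helpers cyc_to_ns(), set_offset() and
adjust_ts() are hypothetical names that loosely mirror the patch, and
unsigned __int128 (a GCC/Clang extension) stands in for mul_u64_u32_shr().

```c
#include <stdint.h>

#define EARLY_CYCLES_BIT (1ULL << 63)

/* Hypothetical calibration state, loosely mirroring the patch variables. */
struct early_cal {
	uint32_t mult;      /* cycles -> ns multiplier */
	uint32_t shift;     /* cycles -> ns right shift */
	uint64_t ts_offset; /* ns between counter start and clock start */
};

static uint64_t cyc_to_ns(const struct early_cal *cal, uint64_t cycles)
{
	return (uint64_t)(((unsigned __int128)cycles * cal->mult) >> cal->shift);
}

/* Pick the offset so both time bases agree at the calibration start. */
static void set_offset(struct early_cal *cal, uint64_t start_cycles,
		       uint64_t start_ns)
{
	cal->ts_offset = cyc_to_ns(cal, start_cycles) - start_ns;
}

/* Convert a stored timestamp (tagged cycles or plain ns) to output ns. */
static uint64_t adjust_ts(const struct early_cal *cal, uint64_t ts)
{
	if (!(ts & EARLY_CYCLES_BIT))
		return ts + cal->ts_offset;          /* already ns */
	return cyc_to_ns(cal, ts & ~EARLY_CYCLES_BIT); /* tagged cycles */
}
```

With this arrangement a record stamped in cycles just before the clock
starts and a record stamped in nanoseconds just after it land on the same
timeline.  Before calibration, mult is 0 and tagged records print as 0, as
the comment in the patch notes.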

 include/linux/early_times.h | 85 +++++++++++++++++++++++++++++++++++++
 init/Kconfig                | 14 ++++++
 init/main.c                 |  6 +++
 kernel/printk/printk.c      | 18 +++++++-
 4 files changed, 121 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/early_times.h

diff --git a/include/linux/early_times.h b/include/linux/early_times.h
new file mode 100644
index 000000000000..05388dcb8573
--- /dev/null
+++ b/include/linux/early_times.h
@@ -0,0 +1,85 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _KERNEL_PRINTK_EARLY_TIMES_H
+#define _KERNEL_PRINTK_EARLY_TIMES_H
+
+#include <linux/timex.h>
+#include <linux/clocksource.h>
+
+/* use high bit of a u64 to indicate cycles instead of a timestamp */
+#define EARLY_CYCLES_BIT	BIT_ULL(63)
+#define EARLY_CYCLES_MASK	~(BIT_ULL(63))
+
+#if defined(CONFIG_EARLY_PRINTK_TIMES)
+extern cycles_t start_cycles;
+extern u64 start_ns;
+extern u32 early_mult, early_shift;
+extern u64 early_ts_offset;
+
+static inline void early_times_start_calibration(void)
+{
+	start_cycles = get_cycles();
+	start_ns = local_clock();
+}
+
+static inline void early_times_finish_calibration(void)
+{
+	cycles_t end_cycles;
+	u64 end_ns;
+
+	/* set calibration data for early_printk_times */
+	end_cycles = get_cycles();
+	end_ns = local_clock();
+	clocks_calc_mult_shift(&early_mult, &early_shift,
+		mul_u64_u64_div_u64(end_cycles - start_cycles,
+			NSEC_PER_SEC, end_ns - start_ns),
+		NSEC_PER_SEC, 100);
+	early_ts_offset = mul_u64_u32_shr(start_cycles, early_mult, early_shift) - start_ns;
+
+	pr_debug("Early printk times: mult=%u, shift=%u, offset=%llu ns\n",
+		early_mult, early_shift, early_ts_offset);
+}
+
+static inline u64 early_cycles(void)
+{
+	return (get_cycles() | EARLY_CYCLES_BIT);
+}
+
+/*
+ * adjust_early_ts detects whether ts is in cycles or nanoseconds
+ * and converts it or adjusts it, taking into account the offset
+ * from cycle-counter start.
+ *
+ * Note that early_mult may be 0, but that's OK because
+ * we'll just multiply by 0 and return 0. This will
+ * only occur if we're outputting a printk message
+ * before the calibration of the early timestamp.
+ * Any output after user space start (eg. from dmesg or
+ * journalctl) will show correct values.
+ */
+static inline u64 adjust_early_ts(u64 ts)
+{
+	if (likely(!(ts & EARLY_CYCLES_BIT)))
+		/* if timestamp is not in cycles, just add offset */
+		return ts + early_ts_offset;
+
+	/* mask high bit and convert to nanoseconds */
+	return mul_u64_u32_shr(ts & EARLY_CYCLES_MASK, early_mult, early_shift);
+}
+
+#else
+# define early_times_start_calibration() do { } while (0)
+# define early_times_finish_calibration() do { } while (0)
+
+static inline u64 early_cycles(void)
+{
+	return 0;
+}
+
+static inline u64 adjust_early_ts(u64 ts)
+{
+	return ts;
+}
+#endif /* CONFIG_EARLY_PRINTK_TIMES */
+
+#endif /* _KERNEL_PRINTK_EARLY_TIMES_H */
diff --git a/init/Kconfig b/init/Kconfig
index fa79feb8fe57..a928c1efb09d 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -777,6 +777,20 @@ config IKHEADERS
 	  or similar programs.  If you build the headers as a module, a module called
 	  kheaders.ko is built which can be loaded on-demand to get access to headers.
 
+config EARLY_PRINTK_TIMES
+	bool "Show non-zero printk timestamps early in boot"
+	default n
+	depends on PRINTK
+	depends on ARM64 || X86_64
+	help
+	  Use a cycle-counter to provide printk timestamps during
+	  early boot.  This allows seeing timestamps for printks that
+	  would otherwise show as 0.  Note that this will shift the
+	  printk timestamps to be relative to processor power on, instead
+	  of relative to the start of kernel timekeeping.  This should be
+	  closer to machine power on, giving a better indication of
+	  overall boot time.
+
 config LOG_BUF_SHIFT
 	int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
 	range 12 25
diff --git a/init/main.c b/init/main.c
index b84818ad9685..d5774aec1aff 100644
--- a/init/main.c
+++ b/init/main.c
@@ -104,6 +104,7 @@
 #include <linux/pidfs.h>
 #include <linux/ptdump.h>
 #include <linux/time_namespace.h>
+#include <linux/early_times.h>
 #include <net/net_namespace.h>
 
 #include <asm/io.h>
@@ -1118,6 +1119,9 @@ void start_kernel(void)
 	timekeeping_init();
 	time_init();
 
+	/* This must be after timekeeping is initialized */
+	early_times_start_calibration();
+
 	/* This must be after timekeeping is initialized */
 	random_init();
 
@@ -1600,6 +1604,8 @@ static int __ref kernel_init(void *unused)
 
 	do_sysctl_args();
 
+	early_times_finish_calibration();
+
 	if (ramdisk_execute_command) {
 		ret = run_init_process(ramdisk_execute_command);
 		if (!ret)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1d765ad242b8..5afd31c3345c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -46,6 +46,7 @@
 #include <linux/ctype.h>
 #include <linux/uio.h>
 #include <linux/sched/clock.h>
+#include <linux/early_times.h>
 #include <linux/sched/debug.h>
 #include <linux/sched/task_stack.h>
 #include <linux/panic.h>
@@ -75,6 +76,13 @@ EXPORT_SYMBOL(ignore_console_lock_warning);
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(console);
 
+#ifdef CONFIG_EARLY_PRINTK_TIMES
+cycles_t start_cycles;
+u64 start_ns;
+u32 early_mult, early_shift;
+u64 early_ts_offset;
+#endif
+
 /*
  * Low level drivers may need that to know if they can schedule in
  * their unblank() callback or not. So let's export it.
@@ -639,7 +647,7 @@ static void append_char(char **pp, char *e, char c)
 static ssize_t info_print_ext_header(char *buf, size_t size,
 				     struct printk_info *info)
 {
-	u64 ts_usec = info->ts_nsec;
+	u64 ts_usec = adjust_early_ts(info->ts_nsec);
 	char caller[20];
 #ifdef CONFIG_PRINTK_CALLER
 	u32 id = info->caller_id;
@@ -1352,7 +1360,11 @@ static size_t print_syslog(unsigned int level, char *buf)
 
 static size_t print_time(u64 ts, char *buf)
 {
-	unsigned long rem_nsec = do_div(ts, 1000000000);
+	unsigned long rem_nsec;
+
+	ts = adjust_early_ts(ts);
+
+	rem_nsec = do_div(ts, 1000000000);
 
 	return sprintf(buf, "[%5lu.%06lu]",
 		       (unsigned long)ts, rem_nsec / 1000);
@@ -2242,6 +2254,8 @@ int vprintk_store(int facility, int level,
 	 * timestamp with respect to the caller.
 	 */
 	ts_nsec = local_clock();
+	if (!ts_nsec)
+		ts_nsec = early_cycles();
 
 	caller_id = printk_caller_id();
 
-- 
2.43.0



* Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2026-02-10 23:47 ` [PATCH v3] " Tim Bird
@ 2026-03-04 11:23   ` Petr Mladek
  2026-03-09 17:27   ` Shashank Balaji
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 36+ messages in thread
From: Petr Mladek @ 2026-03-04 11:23 UTC (permalink / raw)
  To: Tim Bird
  Cc: rostedt, john.ogness, senozhatsky, francesco, geert,
	linux-embedded, linux-kernel

On Tue 2026-02-10 16:47:41, Tim Bird wrote:
> During early boot, printk timestamps are reported as zero before
> kernel timekeeping starts (e.g. before time_init()).  This
> hinders boot-time optimization efforts.  This period is about 400
> milliseconds for many current desktop and embedded machines
> running Linux.
> 
> Add support to save cycles during early boot, and output correct
> timestamp values after timekeeping is initialized.  get_cycles()
> is operational on arm64 and x86_64 from kernel start.  Add code
> and variables to save calibration values used to later convert
> cycle counts to time values in the early printks.  Add a config
> to control the feature.
> 
> This yields non-zero timestamps for printks from the very start
> of kernel execution.  The timestamps are relative to the start of
> the architecture-specified counter used in get_cycles
> (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> 
> All timestamps reflect time from processor power-on instead of
> time from the kernel's timekeeping initialization.
> 
> Signed-off-by: Tim Bird <tim.bird@sony.com>

It looks good to me and seems to work fine. Feel free to use:

Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>

See a note below.

> --- a/init/main.c
> +++ b/init/main.c
> @@ -104,6 +104,7 @@
>  #include <linux/pidfs.h>
>  #include <linux/ptdump.h>
>  #include <linux/time_namespace.h>
> +#include <linux/early_times.h>

JFYI, I tried this patch on top of Linus' current tree (v7.0-rc2+)
and it conflicted with commit 499f86de4f8c34e19 ("init/main: read
bootconfig header with get_unaligned_le32()"), which added this line here:

 #include <linux/unaligned.h>

>  #include <net/net_namespace.h>
>  
>  #include <asm/io.h>

Best Regards,
Petr


* Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2026-02-10 23:47 ` [PATCH v3] " Tim Bird
  2026-03-04 11:23   ` Petr Mladek
@ 2026-03-09 17:27   ` Shashank Balaji
  2026-03-10 10:43     ` Petr Mladek
  2026-03-10 19:17     ` Bird, Tim
  2026-03-09 19:25   ` Shashank Balaji
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 36+ messages in thread
From: Shashank Balaji @ 2026-03-09 17:27 UTC (permalink / raw)
  To: Tim Bird
  Cc: pmladek, rostedt, john.ogness, senozhatsky, francesco, geert,
	linux-embedded, linux-kernel

Hi Tim,

Tested-by: Shashank Balaji <shashankbalaji02@gmail.com>

...on top of rc3 on an AMD Ryzen 7 4800H laptop. This patch conflicts
with these commits with trivial fixes:

032a730268a3 init/main.c: wrap long kernel cmdline when printing to logs
60325c27d3cf printk: Add execution context (task name/CPU) to printk_info
499f86de4f8c init/main: read bootconfig header with get_unaligned_le32()

Comment below.

On Tue, Feb 10, 2026 at 04:47:41PM -0700, Tim Bird wrote:
> During early boot, printk timestamps are reported as zero before
	<snip>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 1d765ad242b8..5afd31c3345c 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -46,6 +46,7 @@
>  #include <linux/ctype.h>
>  #include <linux/uio.h>
>  #include <linux/sched/clock.h>
> +#include <linux/early_times.h>
>  #include <linux/sched/debug.h>
>  #include <linux/sched/task_stack.h>
>  #include <linux/panic.h>
> @@ -75,6 +76,13 @@ EXPORT_SYMBOL(ignore_console_lock_warning);
>  
>  EXPORT_TRACEPOINT_SYMBOL_GPL(console);
>  
> +#ifdef CONFIG_EARLY_PRINTK_TIMES
> +cycles_t start_cycles;
> +u64 start_ns;
> +u32 early_mult, early_shift;
> +u64 early_ts_offset;
> +#endif
> +
>  /*
>   * Low level drivers may need that to know if they can schedule in
>   * their unblank() callback or not. So let's export it.
> @@ -639,7 +647,7 @@ static void append_char(char **pp, char *e, char c)
>  static ssize_t info_print_ext_header(char *buf, size_t size,
>  				     struct printk_info *info)
>  {
> -	u64 ts_usec = info->ts_nsec;
> +	u64 ts_usec = adjust_early_ts(info->ts_nsec);
>  	char caller[20];
>  #ifdef CONFIG_PRINTK_CALLER
>  	u32 id = info->caller_id;
> @@ -1352,7 +1360,11 @@ static size_t print_syslog(unsigned int level, char *buf)
>  
>  static size_t print_time(u64 ts, char *buf)
>  {
> -	unsigned long rem_nsec = do_div(ts, 1000000000);
> +	unsigned long rem_nsec;
> +
> +	ts = adjust_early_ts(ts);
> +
> +	rem_nsec = do_div(ts, 1000000000);
>  
>  	return sprintf(buf, "[%5lu.%06lu]",
>  		       (unsigned long)ts, rem_nsec / 1000);
> @@ -2242,6 +2254,8 @@ int vprintk_store(int facility, int level,
>  	 * timestamp with respect to the caller.
>  	 */
>  	ts_nsec = local_clock();
> +	if (!ts_nsec)
> +		ts_nsec = early_cycles();

ts_nsec goes on to be stored in a struct printk_info's ts_nsec which is
documented to be "timestamp in nanoseconds":

	/*
	 * Meta information about each stored message.
	 *
	 * All fields are set by the printk code except for @seq, which is
	 * set by the ringbuffer code.
	 */
	struct printk_info {
		u64	seq;		/* sequence number */
		u64	ts_nsec;	/* timestamp in nanoseconds */
		u16	text_len;	/* length of text message */
		u8	facility;	/* syslog facility */
		u8	flags:5;	/* internal record flags */
		u8	level:3;	/* syslog level */
		u32	caller_id;	/* thread id or processor id */
	#ifdef CONFIG_PRINTK_EXECUTION_CTX
		u32	caller_id2;	/* caller_id complement */
		/* name of the task that generated the message */
		char	comm[TASK_COMM_LEN];
	#endif

		struct dev_printk_info	dev_info;
	};

Since with this patch, ts_nsec can either be a timestamp in ns or a
cycle count, the comment should be updated. Ideally, I'd like the member
name to be changed as well to reflect the new semantic. I'm thinking 
ts_raw or ts_ns_or_cyc... naming is hard :)
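
To make the dual meaning concrete, here is a minimal userspace sketch of
the tagging scheme (not kernel code). Only the use of bit 63 as the
marker comes from the patch; the names CYCLES_FLAG, tag_cycles(),
holds_cycles() and untag_cycles() are made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only; plays the role of EARLY_CYCLES_BIT / EARLY_CYCLES_MASK. */
#define CYCLES_FLAG	(UINT64_C(1) << 63)
#define CYCLES_MASK	(~CYCLES_FLAG)

/* Tag a raw cycle count so a reader can tell it apart from nanoseconds. */
static inline uint64_t tag_cycles(uint64_t cycles)
{
	return cycles | CYCLES_FLAG;
}

/* True if the stored value is a tagged cycle count rather than nanoseconds. */
static inline bool holds_cycles(uint64_t stored)
{
	return (stored & CYCLES_FLAG) != 0;
}

/* Strip the tag to recover the cycle count. */
static inline uint64_t untag_cycles(uint64_t stored)
{
	return stored & CYCLES_MASK;
}
```

The scheme relies on a nanosecond timestamp never having bit 63 set,
which would take roughly 292 years of uptime. Any reader that treats the
stored value as plain nanoseconds without checking the flag would see a
huge number for the early records.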

Thanks,
Shashank


* Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2026-03-09 17:27   ` Shashank Balaji
@ 2026-03-10 10:43     ` Petr Mladek
  2026-03-10 19:17     ` Bird, Tim
  1 sibling, 0 replies; 36+ messages in thread
From: Petr Mladek @ 2026-03-10 10:43 UTC (permalink / raw)
  To: Shashank Balaji
  Cc: Tim Bird, rostedt, john.ogness, senozhatsky, francesco, geert,
	linux-embedded, linux-kernel

On Tue 2026-03-10 02:27:27, Shashank Balaji wrote:
> Hi Tim,
> 
> Tested-by: Shashank Balaji <shashankbalaji02@gmail.com>
> 
> ...on top of rc3 on an AMD Ryzen 7 4800H laptop. This patch conflicts
> with these commits with trivial fixes:
> 
> 032a730268a3 init/main.c: wrap long kernel cmdline when printing to logs
> 60325c27d3cf printk: Add execution context (task name/CPU) to printk_info
> 499f86de4f8c init/main: read bootconfig header with get_unaligned_le32()

Good to know.

> On Tue, Feb 10, 2026 at 04:47:41PM -0700, Tim Bird wrote:
> > During early boot, printk timestamps are reported as zero before
> 	<snip>
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 1d765ad242b8..5afd31c3345c 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2242,6 +2254,8 @@ int vprintk_store(int facility, int level,
> >  	 * timestamp with respect to the caller.
> >  	 */
> >  	ts_nsec = local_clock();
> > +	if (!ts_nsec)
> > +		ts_nsec = early_cycles();
> 
> ts_nsec goes on to be stored in a struct printk_info's ts_nsec which is
> documented to be "timestamp in nanoseconds":
> 
> 	/*
> 	 * Meta information about each stored message.
> 	 *
> 	 * All fields are set by the printk code except for @seq, which is
> 	 * set by the ringbuffer code.
> 	 */
> 	struct printk_info {
> 		u64	seq;		/* sequence number */
> 		u64	ts_nsec;	/* timestamp in nanoseconds */
> 		u16	text_len;	/* length of text message */
> 		u8	facility;	/* syslog facility */
> 		u8	flags:5;	/* internal record flags */
> 		u8	level:3;	/* syslog level */
> 		u32	caller_id;	/* thread id or processor id */
> 	#ifdef CONFIG_PRINTK_EXECUTION_CTX
> 		u32	caller_id2;	/* caller_id complement */
> 		/* name of the task that generated the message */
> 		char	comm[TASK_COMM_LEN];
> 	#endif
> 
> 		struct dev_printk_info	dev_info;
> 	};
> 
> Since with this patch, ts_nsec can either be a timestamp in ns or a
> cycle count, the comment should be updated.

Yup, great catch!

> Ideally, I'd like the member
> name to be changed as well to reflect the new semantic. I'm thinking 
> ts_raw or ts_ns_or_cyc... naming is hard :)

Hmm, we cannot change it easily because it would break user space
tools that read kernel crash dumps.

An alternative solution would be using a union.

	union {
		u64	ts_nsec;
		u64	ts_cycles;
	};
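
As a rough check of the layout argument, the userspace sketch below
compares a cut-down record with and without the anonymous union. The
struct names info_plain and info_union are hypothetical stand-ins, not
the real struct printk_info, and the sketch only verifies size and
offsets; it says nothing about how a particular dump tool looks the
field up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for struct printk_info (userspace model). */
struct info_plain {
	uint64_t seq;
	uint64_t ts_nsec;
	uint16_t text_len;
};

struct info_union {
	uint64_t seq;
	union {
		uint64_t ts_nsec;	/* timestamp in nanoseconds */
		uint64_t ts_cycles;	/* tagged early-boot cycle count */
	};
	uint16_t text_len;
};

/* The union must not change the size or the position of ts_nsec. */
static_assert(sizeof(struct info_plain) == sizeof(struct info_union),
	      "size changed");
static_assert(offsetof(struct info_plain, ts_nsec) ==
	      offsetof(struct info_union, ts_nsec), "offset changed");
static_assert(offsetof(struct info_union, ts_nsec) ==
	      offsetof(struct info_union, ts_cycles), "aliases differ");
```

Both members name the same eight bytes, so code that keeps using
ts_nsec is unaffected, while new code can use ts_cycles where the value
is known to be a cycle count.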

Best Regards,
Petr


* RE: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2026-03-09 17:27   ` Shashank Balaji
  2026-03-10 10:43     ` Petr Mladek
@ 2026-03-10 19:17     ` Bird, Tim
  1 sibling, 0 replies; 36+ messages in thread
From: Bird, Tim @ 2026-03-10 19:17 UTC (permalink / raw)
  To: Shashank Balaji
  Cc: pmladek@suse.com, rostedt@goodmis.org, John Ogness,
	senozhatsky@chromium.org, francesco@valla.it,
	geert@linux-m68k.org, linux-embedded@vger.kernel.org,
	linux-kernel@vger.kernel.org



> -----Original Message-----
> From: Shashank Balaji <shashankbalaji02@gmail.com>
> Hi Tim,
> 
> Tested-by: Shashank Balaji <shashankbalaji02@gmail.com>

Thanks for the testing!!

> 
> ...on top of rc3 on an AMD Ryzen 7 4800H laptop. This patch conflicts
> with these commits with trivial fixes:
> 
> 032a730268a3 init/main.c: wrap long kernel cmdline when printing to logs
> 60325c27d3cf printk: Add execution context (task name/CPU) to printk_info
> 499f86de4f8c init/main: read bootconfig header with get_unaligned_le32()
> 
> Comment below.
> 
> On Tue, Feb 10, 2026 at 04:47:41PM -0700, Tim Bird wrote:
> > During early boot, printk timestamps are reported as zero before
> 	<snip>
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 1d765ad242b8..5afd31c3345c 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -46,6 +46,7 @@
> >  #include <linux/ctype.h>
> >  #include <linux/uio.h>
> >  #include <linux/sched/clock.h>
> > +#include <linux/early_times.h>
> >  #include <linux/sched/debug.h>
> >  #include <linux/sched/task_stack.h>
> >  #include <linux/panic.h>
> > @@ -75,6 +76,13 @@ EXPORT_SYMBOL(ignore_console_lock_warning);
> >
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(console);
> >
> > +#ifdef CONFIG_EARLY_PRINTK_TIMES
> > +cycles_t start_cycles;
> > +u64 start_ns;
> > +u32 early_mult, early_shift;
> > +u64 early_ts_offset;
> > +#endif
> > +
> >  /*
> >   * Low level drivers may need that to know if they can schedule in
> >   * their unblank() callback or not. So let's export it.
> > @@ -639,7 +647,7 @@ static void append_char(char **pp, char *e, char c)
> >  static ssize_t info_print_ext_header(char *buf, size_t size,
> >  				     struct printk_info *info)
> >  {
> > -	u64 ts_usec = info->ts_nsec;
> > +	u64 ts_usec = adjust_early_ts(info->ts_nsec);
> >  	char caller[20];
> >  #ifdef CONFIG_PRINTK_CALLER
> >  	u32 id = info->caller_id;
> > @@ -1352,7 +1360,11 @@ static size_t print_syslog(unsigned int level, char *buf)
> >
> >  static size_t print_time(u64 ts, char *buf)
> >  {
> > -	unsigned long rem_nsec = do_div(ts, 1000000000);
> > +	unsigned long rem_nsec;
> > +
> > +	ts = adjust_early_ts(ts);
> > +
> > +	rem_nsec = do_div(ts, 1000000000);
> >
> >  	return sprintf(buf, "[%5lu.%06lu]",
> >  		       (unsigned long)ts, rem_nsec / 1000);
> > @@ -2242,6 +2254,8 @@ int vprintk_store(int facility, int level,
> >  	 * timestamp with respect to the caller.
> >  	 */
> >  	ts_nsec = local_clock();
> > +	if (!ts_nsec)
> > +		ts_nsec = early_cycles();
> 
> ts_nsec goes on to be stored in a struct printk_info's ts_nsec which is
> documented to be "timestamp in nanoseconds":
> 
> 	/*
> 	 * Meta information about each stored message.
> 	 *
> 	 * All fields are set by the printk code except for @seq, which is
> 	 * set by the ringbuffer code.
> 	 */
> 	struct printk_info {
> 		u64	seq;		/* sequence number */
> 		u64	ts_nsec;	/* timestamp in nanoseconds */
> 		u16	text_len;	/* length of text message */
> 		u8	facility;	/* syslog facility */
> 		u8	flags:5;	/* internal record flags */
> 		u8	level:3;	/* syslog level */
> 		u32	caller_id;	/* thread id or processor id */
> 	#ifdef CONFIG_PRINTK_EXECUTION_CTX
> 		u32	caller_id2;	/* caller_id complement */
> 		/* name of the task that generated the message */
> 		char	comm[TASK_COMM_LEN];
> 	#endif
> 
> 		struct dev_printk_info	dev_info;
> 	};
> 
> Since with this patch, ts_nsec can either be a timestamp in ns or a
> cycle count, the comment should be updated. Ideally, I'd like the member
> name to be changed as well to reflect the new semantic. I'm thinking
> ts_raw or ts_ns_or_cyc... naming is hard :)

Nice catch!  I'm considering either changing the comment, or
using a union here (or maybe both).

Thanks,
 -- Tim



* Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2026-02-10 23:47 ` [PATCH v3] " Tim Bird
  2026-03-04 11:23   ` Petr Mladek
  2026-03-09 17:27   ` Shashank Balaji
@ 2026-03-09 19:25   ` Shashank Balaji
  2026-03-10 11:39     ` Petr Mladek
  2026-03-11 15:47   ` Michael Kelley
  2026-03-26 13:17   ` Thomas Gleixner
  4 siblings, 1 reply; 36+ messages in thread
From: Shashank Balaji @ 2026-03-09 19:25 UTC (permalink / raw)
  To: Tim Bird
  Cc: pmladek, rostedt, john.ogness, senozhatsky, francesco, geert,
	linux-embedded, linux-kernel

Hi again,

On Tue, Feb 10, 2026 at 04:47:41PM -0700, Tim Bird wrote:
> During early boot, printk timestamps are reported as zero before
> kernel timekeeping starts (e.g. before time_init()).  This
> hinders boot-time optimization efforts.  This period is about 400
> milliseconds for many current desktop and embedded machines
> running Linux.
> 
> Add support to save cycles during early boot, and output correct
> timestamp values after timekeeping is initialized.  get_cycles()
> is operational on arm64 and x86_64 from kernel start.  Add code
> and variables to save calibration values used to later convert
> cycle counts to time values in the early printks.  Add a config
> to control the feature.
> 
> This yields non-zero timestamps for printks from the very start
> of kernel execution.  The timestamps are relative to the start of
> the architecture-specified counter used in get_cycles
> (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> 
> All timestamps reflect time from processor power-on instead of
> time from the kernel's timekeeping initialization.
> 
> Signed-off-by: Tim Bird <tim.bird@sony.com>

So if a console is read before the cycles -> timestamp conversion can
happen, then they'll see 0. But reading from userspace will give the
right timestamps.
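
The effect can be modelled in userspace. The sketch below is not the
patch's code: it uses a fixed shift instead of clocks_calc_mult_shift(),
relies on the GCC/Clang unsigned __int128 extension instead of the
kernel's math64 helpers, and the names struct calib, cycles_to_ns() and
calibrate() are invented for the example. It assumes d_cycles is
non-zero and a counter faster than about 4 MHz so that mult fits in 32
bits.

```c
#include <stdint.h>

#define SKETCH_SHIFT	24	/* fixed shift, chosen for this sketch only */

/* Calibration state; all zero until calibrate() runs, like early_mult. */
struct calib {
	uint32_t mult;
	uint32_t shift;
};

/* ns = (cycles * mult) >> shift, with a 128-bit intermediate product. */
static uint64_t cycles_to_ns(const struct calib *c, uint64_t cycles)
{
	return (uint64_t)(((unsigned __int128)cycles * c->mult) >> c->shift);
}

/* Derive mult from the cycles and nanoseconds elapsed between two samples. */
static void calibrate(struct calib *c, uint64_t d_cycles, uint64_t d_ns)
{
	c->shift = SKETCH_SHIFT;
	c->mult = (uint32_t)(((unsigned __int128)d_ns << SKETCH_SHIFT) / d_cycles);
}
```

With a zeroed struct calib, cycles_to_ns() returns 0 for any input,
which corresponds to a console that formats a record before
calibration. Once calibrate() has run, the same stored cycle count
converts to a non-zero time, which corresponds to a later read from
user space.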

Based on the previous discussions, to address this possible confusion,
if changing the timestamp format, like adding '?', is a no-go because
of concerns of breaking existing monitoring tools, what about appending
something to the printk string after the timestamp? Hmm, no, that'll
affect grep'ability _and_ may break monitoring tools. Or what about a
pr_warn() early in boot to warn about the possible timestamp difference?

At the very least the possibility of this difference should be
documented in the Kconfig description.

Continuing below...

> ---
> V2->V3
>  Default CONFIG option to 'n'
>  Move more code into early_times.h
>   (reducing ifdefs in init/main.c)
>  Use math64 helper routines
>  Use cycles_t instead of u64 type
>  Add #defines for EARLY_CYCLES_BIT and MASK
>  Invert if logic in adjust_early_ts()
>  (note: no change to 'depends on' in Kconfig entry)
> 
> V1->V2
>  Remove calibration CONFIG vars
>  Add 'depends on' to restrict arches (to handle ppc bug)
>  Add early_ts_offset to avoid discontinuity
>  Save cycles in ts_nsec, and convert on output
>  Move conditional code to include file (early_times.h)
> 
>  include/linux/early_times.h | 85 +++++++++++++++++++++++++++++++++++++
>  init/Kconfig                | 14 ++++++
>  init/main.c                 |  6 +++
>  kernel/printk/printk.c      | 18 +++++++-
>  4 files changed, 121 insertions(+), 2 deletions(-)
>  create mode 100644 include/linux/early_times.h
> 
> diff --git a/include/linux/early_times.h b/include/linux/early_times.h
> new file mode 100644
> index 000000000000..05388dcb8573
> --- /dev/null
> +++ b/include/linux/early_times.h
> @@ -0,0 +1,85 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _KERNEL_PRINTK_EARLY_TIMES_H
> +#define _KERNEL_PRINTK_EARLY_TIMES_H
> +
> +#include <linux/timex.h>
> +#include <linux/clocksource.h>
> +
> +/* use high bit of a u64 to indicate cycles instead of a timestamp */
> +#define EARLY_CYCLES_BIT	BIT_ULL(63)
> +#define EARLY_CYCLES_MASK	~(BIT_ULL(63))
> +
> +#if defined(CONFIG_EARLY_PRINTK_TIMES)
> +extern cycles_t start_cycles;
> +extern u64 start_ns;
> +extern u32 early_mult, early_shift;
> +extern u64 early_ts_offset;
> +
> +static inline void early_times_start_calibration(void)
> +{
> +	start_cycles = get_cycles();
> +	start_ns = local_clock();
> +}
> +
> +static inline void early_times_finish_calibration(void)
> +{
> +	cycles_t end_cycles;
> +	u64 end_ns;
> +
> +	/* set calibration data for early_printk_times */
> +	end_cycles = get_cycles();
> +	end_ns = local_clock();
> +	clocks_calc_mult_shift(&early_mult, &early_shift,
> +		mul_u64_u64_div_u64(end_cycles - start_cycles,
> +			NSEC_PER_SEC, end_ns - start_ns),
> +		NSEC_PER_SEC, 100);
> +	early_ts_offset = mul_u64_u32_shr(start_cycles, early_mult, early_shift) - start_ns;
> +
> +	pr_debug("Early printk times: mult=%u, shift=%u, offset=%llu ns\n",
> +		early_mult, early_shift, early_ts_offset);
> +}
> +
> +static inline u64 early_cycles(void)
> +{
> +	return (get_cycles() | EARLY_CYCLES_BIT);
> +}
> +
> +/*
> + * adjust_early_ts detects whether ts is in cycles or nanoseconds
> + * and converts it or adjusts it, taking into account the offset
> + * from cycle-counter start.
> + *
> + * Note that early_mult may be 0, but that's OK because
> + * we'll just multiply by 0 and return 0. This will
> + * only occur if we're outputting a printk message
> + * before the calibration of the early timestamp.
> + * Any output after user space start (eg. from dmesg or
> + * journalctl) will show correct values.
> + */
> +static inline u64 adjust_early_ts(u64 ts)
> +{
> +	if (likely(!(ts & EARLY_CYCLES_BIT)))
> +		/* if timestamp is not in cycles, just add offset */
> +		return ts + early_ts_offset;
> +
> +	/* mask high bit and convert to nanoseconds */
> +	return mul_u64_u32_shr(ts & EARLY_CYCLES_MASK, early_mult, early_shift);
> +}
> +
> +#else
> +# define early_times_start_calibration() do { } while (0)
> +# define early_times_finish_calibration() do { } while (0)
> +
> +static inline u64 early_cycles(void)
> +{
> +	return 0;
> +}
> +
> +static inline u64 adjust_early_ts(u64 ts)
> +{
> +	return ts;
> +}
> +#endif /* CONFIG_EARLY_PRINTK_TIMES */
> +
> +#endif /* _KERNEL_PRINTK_EARLY_TIMES_H */
> diff --git a/init/Kconfig b/init/Kconfig
> index fa79feb8fe57..a928c1efb09d 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -777,6 +777,20 @@ config IKHEADERS
>  	  or similar programs.  If you build the headers as a module, a module called
>  	  kheaders.ko is built which can be loaded on-demand to get access to headers.
>  
> +config EARLY_PRINTK_TIMES
> +	bool "Show non-zero printk timestamps early in boot"
> +	default n
> +	depends on PRINTK
> +	depends on ARM64 || X86_64
> +	help
> +	  Use a cycle-counter to provide printk timestamps during
> +	  early boot.  This allows seeing timestamps for printks that
> +	  would otherwise show as 0.  Note that this will shift the
> +	  printk timestamps to be relative to processor power on, instead
> +	  of relative to the start of kernel timekeeping.  This should be
> +	  closer to machine power on, giving a better indication of
> +	  overall boot time.
> +
>  config LOG_BUF_SHIFT
>  	int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
>  	range 12 25
> diff --git a/init/main.c b/init/main.c
> index b84818ad9685..d5774aec1aff 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -104,6 +104,7 @@
>  #include <linux/pidfs.h>
>  #include <linux/ptdump.h>
>  #include <linux/time_namespace.h>
> +#include <linux/early_times.h>
>  #include <net/net_namespace.h>
>  
>  #include <asm/io.h>
> @@ -1118,6 +1119,9 @@ void start_kernel(void)
>  	timekeeping_init();
>  	time_init();
>  
> +	/* This must be after timekeeping is initialized */
> +	early_times_start_calibration();
> +
>  	/* This must be after timekeeping is initialized */
>  	random_init();
>  
> @@ -1600,6 +1604,8 @@ static int __ref kernel_init(void *unused)
>  
>  	do_sysctl_args();
>  
> +	early_times_finish_calibration();
> +
>  	if (ramdisk_execute_command) {
>  		ret = run_init_process(ramdisk_execute_command);
>  		if (!ret)
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 1d765ad242b8..5afd31c3345c 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -46,6 +46,7 @@
>  #include <linux/ctype.h>
>  #include <linux/uio.h>
>  #include <linux/sched/clock.h>
> +#include <linux/early_times.h>
>  #include <linux/sched/debug.h>
>  #include <linux/sched/task_stack.h>
>  #include <linux/panic.h>
> @@ -75,6 +76,13 @@ EXPORT_SYMBOL(ignore_console_lock_warning);
>  
>  EXPORT_TRACEPOINT_SYMBOL_GPL(console);
>  
> +#ifdef CONFIG_EARLY_PRINTK_TIMES
> +cycles_t start_cycles;
> +u64 start_ns;
> +u32 early_mult, early_shift;
> +u64 early_ts_offset;
> +#endif
> +
>  /*
>   * Low level drivers may need that to know if they can schedule in
>   * their unblank() callback or not. So let's export it.
> @@ -639,7 +647,7 @@ static void append_char(char **pp, char *e, char c)
>  static ssize_t info_print_ext_header(char *buf, size_t size,
>  				     struct printk_info *info)
>  {
> -	u64 ts_usec = info->ts_nsec;
> +	u64 ts_usec = adjust_early_ts(info->ts_nsec);

printk_get_next_message() calls info_print_ext_header() for an extended
console (/dev/kmsg and netcon_ext use this), whereas for
non-extended consoles, record_print_text() -> info_print_prefix() ->
print_time() is called. So, this adjustment should be made in
print_time() too, otherwise non-extended console users are gonna be
spooked with insane timestamps. This may explain the non-zero early
timestamps Petr saw in his serial console output [1].

An accessor can be implemented for (struct printk_info).ts_nsec, say
get_timestamp(), which can be called from both the places.
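
A rough userspace sketch of such an accessor is shown below. It is not
kernel code: struct record and struct calib are hypothetical cut-down
types, the two formatters only imitate the timestamp part of the syslog
prefix and of the extended header, and unsigned __int128 is a GCC/Clang
extension used here in place of the kernel's math64 helpers.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define CYCLES_FLAG	(UINT64_C(1) << 63)

/* Hypothetical cut-down record and calibration state (userspace model). */
struct record { uint64_t ts_nsec; };
struct calib { uint32_t mult, shift; uint64_t offset_ns; };

/* The single place that turns the stored value into nanoseconds. */
static uint64_t get_timestamp(const struct record *r, const struct calib *c)
{
	uint64_t cycles;

	if (!(r->ts_nsec & CYCLES_FLAG))
		return r->ts_nsec + c->offset_ns;

	cycles = r->ts_nsec & ~CYCLES_FLAG;
	return (uint64_t)(((unsigned __int128)cycles * c->mult) >> c->shift);
}

/* Syslog-style prefix in the "[sssss.uuuuuu]" form. */
static int format_syslog(char *buf, size_t n, const struct record *r,
			 const struct calib *c)
{
	uint64_t ns = get_timestamp(r, c);

	return snprintf(buf, n, "[%5" PRIu64 ".%06" PRIu64 "]",
			ns / 1000000000, (ns % 1000000000) / 1000);
}

/* Extended-header style: microseconds as a plain integer. */
static int format_ext(char *buf, size_t n, const struct record *r,
		      const struct calib *c)
{
	return snprintf(buf, n, "%" PRIu64, get_timestamp(r, c) / 1000);
}
```

Because both formatters go through get_timestamp(), a record holding a
tagged cycle count and a record holding nanoseconds are converted the
same way on every output path.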

[1] https://lore.kernel.org/all/aYDPn2EJgJIWGDhM@pathway/

Thanks,
Shashank


* Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2026-03-09 19:25   ` Shashank Balaji
@ 2026-03-10 11:39     ` Petr Mladek
  2026-03-10 18:54       ` Bird, Tim
  0 siblings, 1 reply; 36+ messages in thread
From: Petr Mladek @ 2026-03-10 11:39 UTC (permalink / raw)
  To: Shashank Balaji
  Cc: Tim Bird, rostedt, john.ogness, senozhatsky, francesco, geert,
	linux-embedded, linux-kernel

On Tue 2026-03-10 04:25:33, Shashank Balaji wrote:
> Hi again,
> 
> On Tue, Feb 10, 2026 at 04:47:41PM -0700, Tim Bird wrote:
> > During early boot, printk timestamps are reported as zero before
> > kernel timekeeping starts (e.g. before time_init()).  This
> > hinders boot-time optimization efforts.  This period is about 400
> > milliseconds for many current desktop and embedded machines
> > running Linux.
> > 
> > Add support to save cycles during early boot, and output correct
> > timestamp values after timekeeping is initialized.  get_cycles()
> > is operational on arm64 and x86_64 from kernel start.  Add code
> > and variables to save calibration values used to later convert
> > cycle counts to time values in the early printks.  Add a config
> > to control the feature.
> > 
> > This yields non-zero timestamps for printks from the very start
> > of kernel execution.  The timestamps are relative to the start of
> > the architecture-specified counter used in get_cycles
> > (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > 
> > All timestamps reflect time from processor power-on instead of
> > time from the kernel's timekeeping initialization.
> > 
> > Signed-off-by: Tim Bird <tim.bird@sony.com>
> 
> So if a console is read before the cycles -> timestamp conversion can
> happen, then they'll see 0. But reading from userspace will give the
> right timestamps.
> 
> Based on the previous discussions, to address this possible confusion,
> if changing the timestamp format, like adding '?', is a no-go because
> of concerns of breaking existing monitoring tools, what about appending
> something to the printk string after the timestamp? Hmm, no, that'll
> affect grep'ability _and_ may break monitoring tools. Or what about a
> pr_warn() early in boot to warn about the possible timestamp difference?

Or we could make it more obvious from the message in
early_times_finish_calibration(), see below.

> At the very least the possibility of this difference should be
> documented in the Kconfig description.

Yeah, it would be nice to mention this in the Kconfig description.

> > --- /dev/null
> > +++ b/include/linux/early_times.h
> > @@ -0,0 +1,85 @@
> > +static inline void early_times_finish_calibration(void)
> > +{
> > +	cycles_t end_cycles;
> > +	u64 end_ns;
> > +
> > +	/* set calibration data for early_printk_times */
> > +	end_cycles = get_cycles();
> > +	end_ns = local_clock();
> > +	clocks_calc_mult_shift(&early_mult, &early_shift,
> > +		mul_u64_u64_div_u64(end_cycles - start_cycles,
> > +			NSEC_PER_SEC, end_ns - start_ns),
> > +		NSEC_PER_SEC, 100);
> > +	early_ts_offset = mul_u64_u32_shr(start_cycles, early_mult, early_shift) - start_ns;
> > +
> > +	pr_debug("Early printk times: mult=%u, shift=%u, offset=%llu ns\n",
> > +		early_mult, early_shift, early_ts_offset);

We might make it more obvious that an offset will be added to the
existing timestamps from this point on.

Also, it has a "surprising" user-visible effect, so it should be
pr_info() instead of pr_debug(). Note that pr_debug() messages might be hidden.

A minimalist change would be:

	pr_info("Calibrated offset for early printk times: mult=%u, shift=%u, offset=%llu ns\n",
		early_mult, early_shift, early_ts_offset);

And/Or we might add one more line:

	pr_info("The time offset is added to existing and newly added printk messages from now on!\n");


> > +}
> > +
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -639,7 +647,7 @@ static void append_char(char **pp, char *e, char c)
> >  static ssize_t info_print_ext_header(char *buf, size_t size,
> >  				     struct printk_info *info)
> >  {
> > -	u64 ts_usec = info->ts_nsec;
> > +	u64 ts_usec = adjust_early_ts(info->ts_nsec);
> 
> printk_get_next_message() calls info_print_ext_header() for an extended
> console (/dev/kmsg and netcon_ext use this), whereas for
> non-extended consoles, record_print_text() -> info_print_prefix() ->
> print_time() is called. So, this adjustment should be made in
> print_time() too, otherwise non-extended console users are gonna be
> spooked with insane timestamps.

The v3 patch already modifies print_time().

> This may explain the non-zero early
> timestamps Petr saw in his serial console output [1].

I am a bit confused now. There are three stages:

   1. Early messages where the cycles are stored.

      The serial console shows zero time stamp because
      it reads the messages _before the calibration_, e.g.

	[    0.000000] Linux version 6.19.0-rc7-default+ (pmladek@pathway) (gcc (SUSE Linux) 15.2.1 20251006, GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.45.0.20251103-2) #521 SMP PREEMPT_DYNAMIC Mon Feb  2 16:36:53 CET 2026
	[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.19.0-rc7-default+ root=UUID=587ae802-e330-4059-9b48-d5b845e1075a resume=/dev/disk/by-uuid/369c7453-3d16-409d-88b2-5de027891a12 mitigations=auto nosplash earlycon=uart8250,io,0x3f8,115200 console=ttyS0,115200 console=ttynull console=tty0 debug_non_panic_cpus=1 panic=10 ignore_loglevel log_buf_len=1M
	[    0.000000] BIOS-provided physical RAM map:
	[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
	[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved

      But "dmesg" shows some numbers because it reads the messages
      _after the calibration_:

	[    8.853613] Linux version 6.19.0-rc7-default+ (pmladek@pathway) (gcc (SUSE Linux) 15.2.1 20251006, GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.45.0.20251103-2) #521 SMP PREEMPT_DYNAMIC Mon Feb  2 16:36:53 CET 2026
	[    8.853617] Command line: BOOT_IMAGE=/boot/vmlinuz-6.19.0-rc7-default+ root=UUID=587ae802-e330-4059-9b48-d5b845e1075a resume=/dev/disk/by-uuid/369c7453-3d16-409d-88b2-5de027891a12 mitigations=auto nosplash earlycon=uart8250,io,0x3f8,115200 console=ttyS0,115200 console=ttynull console=tty0 debug_non_panic_cpus=1 panic=10 ignore_loglevel log_buf_len=1M
	[    8.865086] BIOS-provided physical RAM map:
	[    8.865087] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
	[    8.865089] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved


   2. Early messages added _after the timekeeping_ is initialized
      but _before the early cycles calibration_.

      The serial console prints them _without the offset_ because
      it reads them _before the calibration_, e.g.

	[    3.288049][    T1] Write protecting the kernel read-only data: 36864k
	[    3.298554][    T1] Freeing unused kernel image (text/rodata gap) memory: 1656K
	[    3.318942][    T1] Freeing unused kernel image (rodata/data gap) memory: 1540K

       But "dmesg" prints them _with the offset_ because it reads them
       _after the calibration_, e.g.

	[   12.179999] [    T1] Write protecting the kernel read-only data: 36864k
	[   12.190505] [    T1] Freeing unused kernel image (text/rodata gap) memory: 1656K
	[   12.210893] [    T1] Freeing unused kernel image (rodata/data gap) memory: 1540K


   3. Messages added after the calibration of the early cycles.

      They are printed with the offset by both serial console and
      dmesg, e.g.

	[   12.230014][    T1] Early printk times: mult=38775352, shift=27, offset=8891950261 ns
	[   12.246008][    T1] Run /init as init process
	[   12.254944][    T1]   with arguments:
	[   12.264341][    T1]     /init
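
The three stages above can be reproduced with a small userspace model.
This is only a sketch under assumptions: struct early_cal and shown_ns()
are hypothetical names, the calibration values are copied from the
"Early printk times" line above, and unsigned __int128 is a GCC/Clang
extension standing in for mul_u64_u32_shr().

```c
#include <assert.h>
#include <stdint.h>

#define CYCLES_BIT (1ULL << 63)	/* marks a stored value as raw cycles */

/* Calibration state; everything is zero until the calibration has run. */
struct early_cal {
	uint32_t mult;
	uint32_t shift;
	uint64_t offset_ns;
};

/* State seen by a console that prints before the calibration. */
static const struct early_cal uncalibrated = { 0, 0, 0 };

/* Values taken from the log line quoted above. */
static const struct early_cal calibrated = { 38775352, 27, 8891950261ULL };

/* What a reader sees for one stored timestamp, given the calibration state. */
static uint64_t shown_ns(uint64_t stored, const struct early_cal *cal)
{
	assert(cal);
	if (!(stored & CYCLES_BIT))
		return stored + cal->offset_ns;	/* stages 2 and 3 */

	/* stage 1: convert cycles; yields 0 while mult is still 0 */
	return (uint64_t)(((unsigned __int128)(stored & ~CYCLES_BIT) *
			   cal->mult) >> cal->shift);
}
```

For example, the stage 2 message stored as 3288049000 ns is shown as
3.288049 with the uncalibrated state and as 12.179999 with the calibrated
one, which matches the serial console and dmesg outputs above.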


> An accessor can be implemented for (struct printk_info).ts_nsec, say
> get_timestamp(), which can be called from both the places.

Yeah, a helper function for reading the timestamp might be a cleaner solution.

> [1] https://lore.kernel.org/all/aYDPn2EJgJIWGDhM@pathway/

Everything seems to be as expected there. The non-zero timestamps
on the serial console are from messages added after the timekeeping
was initialized.

Best Regards,
Petr 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2026-03-10 11:39     ` Petr Mladek
@ 2026-03-10 18:54       ` Bird, Tim
  2026-03-11 15:45         ` Petr Mladek
  0 siblings, 1 reply; 36+ messages in thread
From: Bird, Tim @ 2026-03-10 18:54 UTC (permalink / raw)
  To: Petr Mladek, Shashank Balaji
  Cc: rostedt@goodmis.org, john.ogness@linuxtronix.de,
	senozhatsky@chromium.org, francesco@valla.it,
	geert@linux-m68k.org, linux-embedded@vger.kernel.org,
	linux-kernel@vger.kernel.org

Thank you both Petr and Shashank for the review and feedback.

See my responses inline below.

> -----Original Message-----
> From: Petr Mladek <pmladek@suse.com>
> 
> On Tue 2026-03-10 04:25:33, Shashank Balaji wrote:
> > Hi again,
> >
> > On Tue, Feb 10, 2026 at 04:47:41PM -0700, Tim Bird wrote:
> > > During early boot, printk timestamps are reported as zero before
> > > kernel timekeeping starts (e.g. before time_init()).  This
> > > hinders boot-time optimization efforts.  This period is about 400
> > > milliseconds for many current desktop and embedded machines
> > > running Linux.
> > >
> > > Add support to save cycles during early boot, and output correct
> > > timestamp values after timekeeping is initialized.  get_cycles()
> > > is operational on arm64 and x86_64 from kernel start.  Add code
> > > and variables to save calibration values used to later convert
> > > cycle counts to time values in the early printks.  Add a config
> > > to control the feature.
> > >
> > > This yields non-zero timestamps for printks from the very start
> > > of kernel execution.  The timestamps are relative to the start of
> > > the architecture-specified counter used in get_cycles
> > > (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > >
> > > All timestamps reflect time from processor power-on instead of
> > > time from the kernel's timekeeping initialization.
> > >
> > > Signed-off-by: Tim Bird <tim.bird@sony.com>
> >
> > So if a console is read before the cycles -> timestamp conversion can
> > happen, then they'll see 0. But reading from userspace will give the
> > right timestamps.
> >
> > Based on the previous discussions, to address this possible confusion,
> > if changing the timestamp format, like adding '?', is a no-go because
> > of concerns of breaking existing monitoring tools, what about appending
> > something to the printk string after the timestamp? Hmm, no, that'll
> > affect grep'ability _and_ may break monitoring tools. Or what about a
> > pr_warn() early in boot to warn about the possible timestamp difference?
> 
> Or we could make it more obvious from the message in
> early_times_finish_calibration(), see below.
> 
> > At the very least the possibility of this difference should be
> > documented in the Kconfig description.
> 
> Yeah, it would be nice to mention this in the Kconfig description.

I'll work on a v4 version of the patch, and add some text
to mention this.

> 
> > > --- /dev/null
> > > +++ b/include/linux/early_times.h
> > > @@ -0,0 +1,85 @@
> > > +static inline void early_times_finish_calibration(void)
> > > +{
> > > +	cycles_t end_cycles;
> > > +	u64 end_ns;
> > > +
> > > +	/* set calibration data for early_printk_times */
> > > +	end_cycles = get_cycles();
> > > +	end_ns = local_clock();
> > > +	clocks_calc_mult_shift(&early_mult, &early_shift,
> > > +		mul_u64_u64_div_u64(end_cycles - start_cycles,
> > > +			NSEC_PER_SEC, end_ns - start_ns),
> > > +		NSEC_PER_SEC, 100);
> > > +	early_ts_offset = mul_u64_u32_shr(start_cycles, early_mult, early_shift) - start_ns;
> > > +
> > > +	pr_debug("Early printk times: mult=%u, shift=%u, offset=%llu ns\n",
> > > +		early_mult, early_shift, early_ts_offset);
> 
> We might make it more obvious that an offset will get added to the
> existing timestamp since this point.
> 
> Also it has a "surprising" user visible effect so that it should
> be pr_info() instead of pr_debug(). Note pr_debug() messages might be hidden.
> 
> A minimalist change would be:
> 
> 	pr_info("Calibrated offset for early printk times: mult=%u, shift=%u, offset=%llu ns\n",
> 		early_mult, early_shift, early_ts_offset);

This sounds like a good idea.  Will do.

 
> And/Or we might add one more line:
> 
> 	pr_info("The time offset is added for existing and newly added printk messages since now!");

I think adding a message to explain the offset is a possibility.

Let me consider the wording for this.  Using 'now' is a bit ambiguous, because the offset is
always used in post-boot output, such as from dmesg or journalctl, even for messages earlier
in the log.
 
> 
> > > +}
> > > +
> > > --- a/kernel/printk/printk.c
> > > +++ b/kernel/printk/printk.c
> > > @@ -639,7 +647,7 @@ static void append_char(char **pp, char *e, char c)
> > >  static ssize_t info_print_ext_header(char *buf, size_t size,
> > >  				     struct printk_info *info)
> > >  {
> > > -	u64 ts_usec = info->ts_nsec;
> > > +	u64 ts_usec = adjust_early_ts(info->ts_nsec);
> >
> > printk_get_next_message() calls info_print_ext_header() for an extended
> > console (/dev/kmsg and netcon_ext use this), whereas for
> > non-extended consoles, record_print_text() -> info_print_prefix() ->
> > print_time() is called. So, this adjustment should be made in
> > print_time() too, otherwise non-extended console users are gonna be
> > spooked with insane timestamps.
> 
> The v3 patch already modifies print_time().
> 
> > This may explain the non-zero early
> > timestamps Petr saw in his serial console output [1].
> 
> I am a bit confused now. There are three stages:
> 
>    1. Early messages where the cycles are stored.
> 
>       The serial console shows zero time stamp because
>       it reads the messages _before the calibration_, e.g.
> 
> 	[    0.000000] Linux version 6.19.0-rc7-default+ (pmladek@pathway) (gcc (SUSE Linux) 15.2.1 20251006, GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.45.0.20251103-2) #521 SMP PREEMPT_DYNAMIC Mon Feb  2 16:36:53 CET 2026
> 	[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.19.0-rc7-default+ root=UUID=587ae802-e330-4059-9b48-d5b845e1075a resume=/dev/disk/by-uuid/369c7453-3d16-409d-88b2-5de027891a12 mitigations=auto nosplash earlycon=uart8250,io,0x3f8,115200 console=ttyS0,115200 console=ttynull console=tty0 debug_non_panic_cpus=1 panic=10 ignore_loglevel log_buf_len=1M
> 	[    0.000000] BIOS-provided physical RAM map:
> 	[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> 	[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> 
>       But "dmesg" shows some numbers because it reads the messages
>       _after the calibration_:
> 
> 	[    8.853613] Linux version 6.19.0-rc7-default+ (pmladek@pathway) (gcc (SUSE Linux) 15.2.1 20251006, GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.45.0.20251103-2) #521 SMP PREEMPT_DYNAMIC Mon Feb  2 16:36:53 CET 2026
> 	[    8.853617] Command line: BOOT_IMAGE=/boot/vmlinuz-6.19.0-rc7-default+ root=UUID=587ae802-e330-4059-9b48-d5b845e1075a resume=/dev/disk/by-uuid/369c7453-3d16-409d-88b2-5de027891a12 mitigations=auto nosplash earlycon=uart8250,io,0x3f8,115200 console=ttyS0,115200 console=ttynull console=tty0 debug_non_panic_cpus=1 panic=10 ignore_loglevel log_buf_len=1M
> 	[    8.865086] BIOS-provided physical RAM map:
> 	[    8.865087] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> 	[    8.865089] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> 
> 
>    2. Early messages added _after the timekeeping_ is initialized
>       but _before the early cycles calibration_.
> 
>       The serial console prints them _without the offset_ because
>       it reads them _before the calibration_, e.g.
> 
> 	[    3.288049][    T1] Write protecting the kernel read-only data: 36864k
> 	[    3.298554][    T1] Freeing unused kernel image (text/rodata gap) memory: 1656K
> 	[    3.318942][    T1] Freeing unused kernel image (rodata/data gap) memory: 1540K
> 
>        But "dmesg" prints them _with the offset_ because it reads them
>        _after the calibration_, e.g.
> 
> 	[   12.179999] [    T1] Write protecting the kernel read-only data: 36864k
> 	[   12.190505] [    T1] Freeing unused kernel image (text/rodata gap) memory: 1656K
> 	[   12.210893] [    T1] Freeing unused kernel image (rodata/data gap) memory: 1540K
> 
> 
>    3. Messages added after the calibration of the early cycles.
> 
>       They are printed with the offset by both serial console and
>       dmesg, e.g.
> 
> 	[   12.230014][    T1] Early printk times: mult=38775352, shift=27, offset=8891950261 ns
> 	[   12.246008][    T1] Run /init as init process
> 	[   12.254944][    T1]   with arguments:
> 	[   12.264341][    T1]     /init

This is correct.  I don't want to overwhelm users of this, but there are three time-gathering periods, and roughly
2 output times (before calibration and after calibration).

early boot = before time init, before cycles calibration and offset calculation
mid boot = after time init, before cycles calibration and offset calculation
late boot = after time init, after cycles calibration and offset calculation
All of these are before the start of user space processes.

time of printk  output time  timestamp type stored    timestamp output
--------------  -----------  -----------------------  --------------------------------------------
early boot      early boot   cycles                   0
mid boot        mid boot     nanosecs                 seconds, with offset from time_init
late boot       late boot    nanosecs                 seconds with offset from cycle counter start
-----
early boot      post-boot    (cycles already stored)  seconds with offset from cycle counter start
mid boot        post-boot    (ns already stored)      seconds with offset from cycle counter start
late boot       post-boot    (ns already stored)      seconds with offset from cycle counter start
all others      post-boot    nanosecs                 seconds with offset from cycle counter start

The confusing part is the messages that are output to the console before calibration and offset calculation.
Reports from all user space tools (i.e., from dmesg or journalctl) should be correct and consistent.

> 
> > An accessor can be implemented for (struct printk_info).ts_nsec, say
> > get_timestamp(), which can be called from both the places.
> 
> Yeah, a helper function for reading the timestamp might be a cleaner solution.
> 

I consider adjust_early_ts() to be such an accessor function.  It's supposed
to hide the details of the type of the timestamp (cycles or ns) and the offset.
Maybe this could be renamed to something better, like: get_adjusted_ts()?
Let me know what you think.

> > [1] https://lore.kernel.org/all/aYDPn2EJgJIWGDhM@pathway/
> 
> Everything seems to be as expected there. The non-zero timestamps
> on the serial console are from messages added after the timekeeping
> was initialized.

I'll rebase this patch to resolve the #include conflicts, and address
this feedback, and hopefully get a new version out this week.

Thanks again for the discussion and ideas!
 -- Tim


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2026-03-10 18:54       ` Bird, Tim
@ 2026-03-11 15:45         ` Petr Mladek
  0 siblings, 0 replies; 36+ messages in thread
From: Petr Mladek @ 2026-03-11 15:45 UTC (permalink / raw)
  To: Bird, Tim
  Cc: Shashank Balaji, rostedt@goodmis.org, john.ogness@linuxtronix.de,
	senozhatsky@chromium.org, francesco@valla.it,
	geert@linux-m68k.org, linux-embedded@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Tue 2026-03-10 18:54:22, Bird, Tim wrote:
> > From: Petr Mladek <pmladek@suse.com>
> > There are three stages:
> > 
> >    1. Early messages where the cycles are stored.
> > 
> >       The serial console shows zero time stamp because
> >       it reads the messages _before the calibration_, e.g.
> > 
> > 	[    0.000000] Linux version 6.19.0-rc7-default+ (pmladek@pathway) (gcc (SUSE Linux) 15.2.1 20251006, GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.45.0.20251103-2) #521 SMP PREEMPT_DYNAMIC Mon Feb  2 16:36:53 CET 2026
> > 	[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.19.0-rc7-default+ root=UUID=587ae802-e330-4059-9b48-d5b845e1075a resume=/dev/disk/by-uuid/369c7453-3d16-409d-88b2-5de027891a12 mitigations=auto nosplash earlycon=uart8250,io,0x3f8,115200 console=ttyS0,115200 console=ttynull console=tty0 debug_non_panic_cpus=1 panic=10 ignore_loglevel log_buf_len=1M
> > 	[    0.000000] BIOS-provided physical RAM map:
> > 	[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> > 	[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> > 
> >       But "dmesg" shows some numbers because it reads the messages
> >       _after the calibration_:
> > 
> > 	[    8.853613] Linux version 6.19.0-rc7-default+ (pmladek@pathway) (gcc (SUSE Linux) 15.2.1 20251006, GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.45.0.20251103-2) #521 SMP PREEMPT_DYNAMIC Mon Feb  2 16:36:53 CET 2026
> > 	[    8.853617] Command line: BOOT_IMAGE=/boot/vmlinuz-6.19.0-rc7-default+ root=UUID=587ae802-e330-4059-9b48-d5b845e1075a resume=/dev/disk/by-uuid/369c7453-3d16-409d-88b2-5de027891a12 mitigations=auto nosplash earlycon=uart8250,io,0x3f8,115200 console=ttyS0,115200 console=ttynull console=tty0 debug_non_panic_cpus=1 panic=10 ignore_loglevel log_buf_len=1M
> > 	[    8.865086] BIOS-provided physical RAM map:
> > 	[    8.865087] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> > 	[    8.865089] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> > 
> > 
> >    2. Early messages added _after the timekeeping_ is initialized
> >       but _before the early cycles calibration_.
> > 
> >       The serial console prints them _without the offset_ because
> >       it reads them _before the calibration_, e.g.
> > 
> > 	[    3.288049][    T1] Write protecting the kernel read-only data: 36864k
> > 	[    3.298554][    T1] Freeing unused kernel image (text/rodata gap) memory: 1656K
> > 	[    3.318942][    T1] Freeing unused kernel image (rodata/data gap) memory: 1540K
> > 
> >        But "dmesg" prints them _with the offset_ because it reads them
> >        _after the calibration_, e.g.
> > 
> > 	[   12.179999] [    T1] Write protecting the kernel read-only data: 36864k
> > 	[   12.190505] [    T1] Freeing unused kernel image (text/rodata gap) memory: 1656K
> > 	[   12.210893] [    T1] Freeing unused kernel image (rodata/data gap) memory: 1540K
> > 
> > 
> >    3. Messages added after the calibration of the early cycles.
> > 
> >       They are printed with the offset by both serial console and
> >       dmesg, e.g.
> > 
> > 	[   12.230014][    T1] Early printk times: mult=38775352, shift=27, offset=8891950261 ns
> > 	[   12.246008][    T1] Run /init as init process
> > 	[   12.254944][    T1]   with arguments:
> > 	[   12.264341][    T1]     /init
> 
> This is correct.  I don't want to overwhelm users of this, but there are three time-gathering periods, and roughly
> 2 output times (before calibration and after calibration).
> 
> early boot = before time init, before cycles calibration and offset calculation
> mid boot = after time init, before cycles calibration and offset calculation
> late boot = after time init, after cycles calibration and offset calculation
> All of these are before the start of user space processes.
>
> time of printk  output time  timestamp type stored    timestamp output
> --------------  -----------  -----------------------  --------------------------------------------
> early boot      early boot   cycles                   0
> mid boot        mid boot     nanosecs                 seconds, with offset from time_init
> late boot       late boot    nanosecs                 seconds with offset from cycle counter start

This is a bit confusing. It looks like the offset from time_init() is
no longer added.

> -----
> early boot      post-boot    (cycles already stored)  seconds with offset from cycle counter start
> mid boot        post-boot    (ns already stored)      seconds with offset from cycle counter start
> late boot       post-boot    (ns already stored)      seconds with offset from cycle counter start
> all others      post-boot    nanosecs                 seconds with offset from cycle counter start

> The confusing thing is messages that are output to the console before calibration and offset calculation.
> Reports from all user space tools (ie from dmesg or journalctl) should be correct and consistent.

Yup.

I like that table. I just wonder how to better distinguish the offset from
timekeeping and calibrated cycles. My variant:

<proposal>
The printk timestamps are stored and interpreted differently in
the following periods:

  - early boot: before timekeeping init, before cycles calibration
  - mid boot:   after timekeeping init, before cycles calibration
  - late boot:  after timekeeping init, after cycles calibration

Console output (immediately):

printk() time	stored value	immediate output (sec)
-----------------------------------------------------------------------------
early boot	cycles		0
mid boot	get_time()	get_time()
late boot	get_time()	get_time() + calibrated(cycles offset)

User space tools and late registered consoles:

printk() time	stored value	output after calibration (sec)
-----------------------------------------------------------------------------
early boot	cycles		calibrated(cycles)
mid boot	get_time()	get_time() + calibrated(cycles offset)
late boot	get_time()	get_time() + calibrated(cycles offset)
</proposal>

I am not sure where to put this. One place might be
Documentation/core-api/printk-basics.rst. But it might be better
to add a separate file either under core-api/ or under admin-guide/.

> > 
> > > An accessor can be implemented for (struct printk_info).ts_nsec, say
> > > get_timestamp(), which can be called from both the places.
> > 
> > Yeah, a helper function for reading the timestamp might be a cleaner solution.
> > 
> I consider adjust_early_ts() to be such an accessor function.  It's supposed
> to hide the details of the type of the timestamp (cycles or ns) and
> the offset. Maybe this could be renamed to something better,
> like: get_adjusted_ts()? Let me know what you think.

This is a bike-shedding area ;-)

I personally find

    u64 ts = get_printk_info_ts(info);

a bit cleaner than

    u64 ts = adjust_early_ts(info->ts_nsec);

because you might add a comment into struct printk_info
definition that nobody should read the timestamp directly.
They should use the helper instead.

The helper would do something like:

/*
 * The number of early cycles is stored before the timekeeping gets initialized.
 * The local_clock() value is stored later.
 *
 * Note that early_ts_offset, early_mult, and early_shift are 0
 * before the cycles get calibrated against the official time keeping.
 *
 * Any output after user space start (eg. from dmesg or journalctl)
 * will show consistent values with calibrated cycles and offset.
 */
static inline u64 get_printk_info_ts(const struct printk_info *info)
{
	if (likely(!(info->ts_cycles & EARLY_CYCLES_BIT)))
		/* if timestamp is not in cycles, just add offset */
		return info->ts_nsec + early_ts_offset;

	/* mask high bit and convert to nanoseconds */
	return mul_u64_u32_shr(info->ts_cycles & EARLY_CYCLES_MASK,
			       early_mult, early_shift);
}

Note that I have already used two names (ts_cycles and ts_nsec) which
would point to the same data via a union.
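
A minimal sketch of how the two names could alias the same storage,
assuming an anonymous union (C11). The struct printk_info_sketch type and
the ts_is_cycles() helper are hypothetical and only model the timestamp
field; the real struct printk_info has more members.

```c
#include <assert.h>
#include <stdint.h>

#define EARLY_CYCLES_BIT  (1ULL << 63)
#define EARLY_CYCLES_MASK (~EARLY_CYCLES_BIT)

/* Simplified stand-in for struct printk_info; only the timestamp is modeled. */
struct printk_info_sketch {
	/* Do not read these directly; go through an accessor instead. */
	union {
		uint64_t ts_nsec;	/* local_clock() value, bit 63 clear */
		uint64_t ts_cycles;	/* raw cycle count with bit 63 set */
	};
};

/* Hypothetical helper: tells which interpretation of the field applies. */
static inline int ts_is_cycles(const struct printk_info_sketch *info)
{
	assert(info);
	return (info->ts_cycles & EARLY_CYCLES_BIT) != 0;
}
```

The union does not grow the structure. Both names refer to the same
64 bits, so the high bit alone decides which interpretation applies.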

> I'll rebase this patch to resolve the #include conflicts, and address
> this feedback, and hopefully get a new version out this week.

Thanks a lot. v3 looked good enough to me. But v4 will be even better
after the feedback.

Take your time ;-)

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2026-02-10 23:47 ` [PATCH v3] " Tim Bird
                     ` (2 preceding siblings ...)
  2026-03-09 19:25   ` Shashank Balaji
@ 2026-03-11 15:47   ` Michael Kelley
  2026-03-13  4:52     ` Bird, Tim
  2026-03-26 13:17   ` Thomas Gleixner
  4 siblings, 1 reply; 36+ messages in thread
From: Michael Kelley @ 2026-03-11 15:47 UTC (permalink / raw)
  To: Tim Bird, pmladek@suse.com, rostedt@goodmis.org,
	john.ogness@linuxtronix.de, senozhatsky@chromium.org
  Cc: francesco@valla.it, geert@linux-m68k.org,
	linux-embedded@vger.kernel.org, linux-kernel@vger.kernel.org

From: Tim Bird <tim.bird@sony.com> Sent: Tuesday, February 10, 2026 3:48 PM
> 
> During early boot, printk timestamps are reported as zero before
> kernel timekeeping starts (e.g. before time_init()).  This
> hinders boot-time optimization efforts.  This period is about 400
> milliseconds for many current desktop and embedded machines
> running Linux.
> 
> Add support to save cycles during early boot, and output correct
> timestamp values after timekeeping is initialized.  get_cycles()
> is operational on arm64 and x86_64 from kernel start.  Add code
> and variables to save calibration values used to later convert
> cycle counts to time values in the early printks.  Add a config
> to control the feature.
> 
> This yields non-zero timestamps for printks from the very start
> of kernel execution.  The timestamps are relative to the start of
> the architecture-specified counter used in get_cycles
> (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> 
> All timestamps reflect time from processor power-on instead of
> time from the kernel's timekeeping initialization.

I tried this patch in a linux-next20260302 kernel running as a guest VM
on a Hyper-V host. Two things:

1) In the dmesg output, I'm seeing a place where the timestamps briefly go
backwards -- i.e., they are not monotonically increasing. Here's a snippet,
where there's a smaller timestamp immediately after the tsc is detected:

[   27.994891] SMBIOS 3.1.0 present.
[   27.994893] DMI: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025
[   27.994898] DMI: Memory slots populated: 2/2
[   27.995202] Hypervisor detected: Microsoft Hyper-V
[   27.995205] Hyper-V: privilege flags low 0xae7f, high 0x3b8030, ext 0x62, hints 0xa0e24, misc 0xe0bed7b2
[   27.995208] Hyper-V: Nested features: 0x0
[   27.995209] Hyper-V: LAPIC Timer Frequency: 0xc3500
[   27.995210] Hyper-V: Using hypercall for remote TLB flush
[   27.995216] clocksource: hyperv_clocksource_tsc_page: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
[   27.995218] clocksource: hyperv_clocksource_msr: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
[   27.995220] tsc: Detected 2918.401 MHz processor
[   27.991060] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
[   27.991062] e820: remove [mem 0x000a0000-0x000fffff] System RAM
[   27.991064] last_pfn = 0x210000 max_arch_pfn = 0x400000000
[   27.991065] x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.
[   27.991066] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC

Hyper-V provides a synthetic clocksource (two actually), and perhaps they
are the cause of the problem, though I haven't spent any time debugging.
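
A quick way to spot such regressions in a captured log is a small checker
like the sketch below. It is only an illustration: parse_ts_usec() and
count_backward_steps() are hypothetical helper names, and the code assumes
the usual "[seconds.microseconds]" prefix with six fractional digits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Parse a "[ssss.uuuuuu]" prefix into microseconds; returns 0 on success. */
static int parse_ts_usec(const char *line, uint64_t *usec)
{
	unsigned long long sec, frac;

	assert(line && usec);
	/* %llu skips the padding spaces after '['; assumes 6 fractional digits. */
	if (sscanf(line, "[%llu.%6llu]", &sec, &frac) != 2)
		return -1;
	*usec = sec * 1000000ULL + frac;
	return 0;
}

/* Count lines whose timestamp is smaller than the previous parsed one. */
static size_t count_backward_steps(const char *const *lines, size_t n)
{
	uint64_t prev = 0, cur;
	size_t i, steps = 0;
	int have_prev = 0;

	for (i = 0; i < n; i++) {
		if (parse_ts_usec(lines[i], &cur))
			continue;	/* no timestamp prefix, skip the line */
		if (have_prev && cur < prev)
			steps++;
		prev = cur;
		have_prev = 1;
	}
	return steps;
}
```

Fed the lines around "tsc: Detected 2918.401 MHz processor" from the
snippet above, it reports a single backward step, at the first e820 line.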

2) A Linux VM running in the Azure cloud is also running on Hyper-V. Such a
VM typically uses cloud-init to set everything up at boot time, and cloud-init
is outputting lines to the serial console with a timestamp that looks like the
printk() timestamp, but apparently is not adjusted for the early timestamping
that this patch adds. Again, I haven't debugged what's going on -- I'm not
immediately sure of the mechanism that cloud-init uses to do output to the
serial console. The use of the Hyper-V synthetic clock source might be the cause
of the problem here as well. Here's an output snippet from the serial console:

[   20.330414] systemd[1]: Condition check resulted in OpenVSwitch configuration for cleanup being skipped.
[   20.332911] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
[   20.333257] pstore: Registered efi_pstore as persistent store backend
[   20.334360] systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
[   20.338319] systemd[1]: Starting Load Kernel Modules...
[   20.341094] systemd[1]: Starting Remount Root and Kernel File Systems...
[   20.350993] systemd[1]: Starting udev Coldplug all Devices...
[   20.356255] systemd[1]: Starting Uncomplicated firewall...
[   20.361536] systemd[1]: Started Journal Service.
[   20.386902] EXT4-fs (sda1): re-mounted c02dce0c-0c40-4e6e-88af-c5a0987b0adb r/w.
[   22.532033] /dev/sr0: Can't lookup blockdev
[    7.955973] cloud-init[783]: Cloud-init v. 24.3.1-0ubuntu0~20.04.1 running 'init-local' at Wed, 11 Mar 2026 15:27:06 +0000. Up 7.48 seconds.
[    9.933120] cloud-init[822]: Cloud-init v. 24.3.1-0ubuntu0~20.04.1 running 'init' at Wed, 11 Mar 2026 15:27:08 +0000. Up 9.82 seconds.
[    9.935483] cloud-init[822]: ci-info: ++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++
[    9.937726] cloud-init[822]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
[    9.939905] cloud-init[822]: ci-info: | Device |  Up  |           Address           |      Mask     | Scope  |     Hw-Address    |
[    9.942059] cloud-init[822]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+

The cloud-init lines don't show up in dmesg, so there's no problem there.

I will look into both issues further, but probably not today.

Michael

> 
> Signed-off-by: Tim Bird <tim.bird@sony.com>
> ---
> V2->V3
>  Default CONFIG option to 'n'
>  Move more code into early_times.h
>   (reducing ifdefs in init/main.c)
>  Use math64 helper routines
>  Use cycles_t instead of u64 type
>  Add #defines for EARLY_CYCLES_BIT and MASK
>  Invert if logic in adjust_early_ts()
>  (note: no change to 'depends on' in Kconfig entry)
> 
> V1->V2
>  Remove calibration CONFIG vars
>  Add 'depends on' to restrict arches (to handle ppc bug)
>  Add early_ts_offset to avoid discontinuity
>  Save cycles in ts_nsec, and convert on output
>  Move conditional code to include file (early_times.h)
> 
>  include/linux/early_times.h | 85 +++++++++++++++++++++++++++++++++++++
>  init/Kconfig                | 14 ++++++
>  init/main.c                 |  6 +++
>  kernel/printk/printk.c      | 18 +++++++-
>  4 files changed, 121 insertions(+), 2 deletions(-)
>  create mode 100644 include/linux/early_times.h
> 
> diff --git a/include/linux/early_times.h b/include/linux/early_times.h
> new file mode 100644
> index 000000000000..05388dcb8573
> --- /dev/null
> +++ b/include/linux/early_times.h
> @@ -0,0 +1,85 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _KERNEL_PRINTK_EARLY_TIMES_H
> +#define _KERNEL_PRINTK_EARLY_TIMES_H
> +
> +#include <linux/timex.h>
> +#include <linux/clocksource.h>
> +
> +/* use high bit of a u64 to indicate cycles instead of a timestamp */
> +#define EARLY_CYCLES_BIT	BIT_ULL(63)
> +#define EARLY_CYCLES_MASK	~(BIT_ULL(63))
> +
> +#if defined(CONFIG_EARLY_PRINTK_TIMES)
> +extern cycles_t start_cycles;
> +extern u64 start_ns;
> +extern u32 early_mult, early_shift;
> +extern u64 early_ts_offset;
> +
> +static inline void early_times_start_calibration(void)
> +{
> +	start_cycles = get_cycles();
> +	start_ns = local_clock();
> +}
> +
> +static inline void early_times_finish_calibration(void)
> +{
> +	cycles_t end_cycles;
> +	u64 end_ns;
> +
> +	/* set calibration data for early_printk_times */
> +	end_cycles = get_cycles();
> +	end_ns = local_clock();
> +	clocks_calc_mult_shift(&early_mult, &early_shift,
> +		mul_u64_u64_div_u64(end_cycles - start_cycles,
> +			NSEC_PER_SEC, end_ns - start_ns),
> +		NSEC_PER_SEC, 100);
> +	early_ts_offset = mul_u64_u32_shr(start_cycles, early_mult, early_shift) - start_ns;
> +
> +	pr_debug("Early printk times: mult=%u, shift=%u, offset=%llu ns\n",
> +		early_mult, early_shift, early_ts_offset);
> +}
> +
> +static inline u64 early_cycles(void)
> +{
> +	return (get_cycles() | EARLY_CYCLES_BIT);
> +}
> +
> +/*
> + * adjust_early_ts detects whether ts is in cycles or nanoseconds
> + * and converts it or adjusts it, taking into account the offset
> + * from cycle-counter start.
> + *
> + * Note that early_mult may be 0, but that's OK because
> + * we'll just multiply by 0 and return 0. This will
> + * only occur if we're outputting a printk message
> + * before the calibration of the early timestamp.
> + * Any output after user space start (eg. from dmesg or
> + * journalctl) will show correct values.
> + */
> +static inline u64 adjust_early_ts(u64 ts)
> +{
> +	if (likely(!(ts & EARLY_CYCLES_BIT)))
> +		/* if timestamp is not in cycles, just add offset */
> +		return ts + early_ts_offset;
> +
> +	/* mask high bit and convert to nanoseconds */
> +	return mul_u64_u32_shr(ts & EARLY_CYCLES_MASK, early_mult, early_shift);
> +}
> +
> +#else
> +# define early_times_start_calibration() do { } while (0)
> +# define early_times_finish_calibration() do { } while (0)
> +
> +static inline u64 early_cycles(void)
> +{
> +	return 0;
> +}
> +
> +static inline u64 adjust_early_ts(u64 ts)
> +{
> +	return ts;
> +}
> +#endif /* CONFIG_EARLY_PRINTK_TIMES */
> +
> +#endif /* _KERNEL_PRINTK_EARLY_TIMES_H */
> diff --git a/init/Kconfig b/init/Kconfig
> index fa79feb8fe57..a928c1efb09d 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -777,6 +777,20 @@ config IKHEADERS
>  	  or similar programs.  If you build the headers as a module, a module called
>  	  kheaders.ko is built which can be loaded on-demand to get access to headers.
> 
> +config EARLY_PRINTK_TIMES
> +	bool "Show non-zero printk timestamps early in boot"
> +	default n
> +	depends on PRINTK
> +	depends on ARM64 || X86_64
> +	help
> +	  Use a cycle-counter to provide printk timestamps during
> +	  early boot.  This allows seeing timestamps for printks that
> +	  would otherwise show as 0.  Note that this will shift the
> +	  printk timestamps to be relative to processor power on, instead
> +	  of relative to the start of kernel timekeeping.  This should be
> +	  closer to machine power on, giving a better indication of
> +	  overall boot time.
> +
>  config LOG_BUF_SHIFT
>  	int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
>  	range 12 25
> diff --git a/init/main.c b/init/main.c
> index b84818ad9685..d5774aec1aff 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -104,6 +104,7 @@
>  #include <linux/pidfs.h>
>  #include <linux/ptdump.h>
>  #include <linux/time_namespace.h>
> +#include <linux/early_times.h>
>  #include <net/net_namespace.h>
> 
>  #include <asm/io.h>
> @@ -1118,6 +1119,9 @@ void start_kernel(void)
>  	timekeeping_init();
>  	time_init();
> 
> +	/* This must be after timekeeping is initialized */
> +	early_times_start_calibration();
> +
>  	/* This must be after timekeeping is initialized */
>  	random_init();
> 
> @@ -1600,6 +1604,8 @@ static int __ref kernel_init(void *unused)
> 
>  	do_sysctl_args();
> 
> +	early_times_finish_calibration();
> +
>  	if (ramdisk_execute_command) {
>  		ret = run_init_process(ramdisk_execute_command);
>  		if (!ret)
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 1d765ad242b8..5afd31c3345c 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -46,6 +46,7 @@
>  #include <linux/ctype.h>
>  #include <linux/uio.h>
>  #include <linux/sched/clock.h>
> +#include <linux/early_times.h>
>  #include <linux/sched/debug.h>
>  #include <linux/sched/task_stack.h>
>  #include <linux/panic.h>
> @@ -75,6 +76,13 @@ EXPORT_SYMBOL(ignore_console_lock_warning);
> 
>  EXPORT_TRACEPOINT_SYMBOL_GPL(console);
> 
> +#ifdef CONFIG_EARLY_PRINTK_TIMES
> +cycles_t start_cycles;
> +u64 start_ns;
> +u32 early_mult, early_shift;
> +u64 early_ts_offset;
> +#endif
> +
>  /*
>   * Low level drivers may need that to know if they can schedule in
>   * their unblank() callback or not. So let's export it.
> @@ -639,7 +647,7 @@ static void append_char(char **pp, char *e, char c)
>  static ssize_t info_print_ext_header(char *buf, size_t size,
>  				     struct printk_info *info)
>  {
> -	u64 ts_usec = info->ts_nsec;
> +	u64 ts_usec = adjust_early_ts(info->ts_nsec);
>  	char caller[20];
>  #ifdef CONFIG_PRINTK_CALLER
>  	u32 id = info->caller_id;
> @@ -1352,7 +1360,11 @@ static size_t print_syslog(unsigned int level, char *buf)
> 
>  static size_t print_time(u64 ts, char *buf)
>  {
> -	unsigned long rem_nsec = do_div(ts, 1000000000);
> +	unsigned long rem_nsec;
> +
> +	ts = adjust_early_ts(ts);
> +
> +	rem_nsec = do_div(ts, 1000000000);
> 
>  	return sprintf(buf, "[%5lu.%06lu]",
>  		       (unsigned long)ts, rem_nsec / 1000);
> @@ -2242,6 +2254,8 @@ int vprintk_store(int facility, int level,
>  	 * timestamp with respect to the caller.
>  	 */
>  	ts_nsec = local_clock();
> +	if (!ts_nsec)
> +		ts_nsec = early_cycles();
> 
>  	caller_id = printk_caller_id();
> 
> --
> 2.43.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2026-03-11 15:47   ` Michael Kelley
@ 2026-03-13  4:52     ` Bird, Tim
  2026-03-13 10:45       ` Petr Mladek
  0 siblings, 1 reply; 36+ messages in thread
From: Bird, Tim @ 2026-03-13  4:52 UTC (permalink / raw)
  To: Michael Kelley, pmladek@suse.com, rostedt@goodmis.org,
	senozhatsky@chromium.org
  Cc: francesco@valla.it, geert@linux-m68k.org,
	linux-embedded@vger.kernel.org, linux-kernel@vger.kernel.org

Hey Michael,

This report is very interesting. 

Thanks very much for trying it out!
 

> -----Original Message-----
> From: Michael Kelley <mhklinux@outlook.com>
> Sent: Wednesday, March 11, 2026 9:47 AM
> From: Tim Bird <tim.bird@sony.com> Sent: Tuesday, February 10, 2026 3:48 PM
> >
> > During early boot, printk timestamps are reported as zero before
> > kernel timekeeping starts (e.g. before time_init()).  This
> > hinders boot-time optimization efforts.  This period is about 400
> > milliseconds for many current desktop and embedded machines
> > running Linux.
> >
> > Add support to save cycles during early boot, and output correct
> > timestamp values after timekeeping is initialized.  get_cycles()
> > is operational on arm64 and x86_64 from kernel start.  Add code
> > and variables to save calibration values used to later convert
> > cycle counts to time values in the early printks.  Add a config
> > to control the feature.
> >
> > This yields non-zero timestamps for printks from the very start
> > of kernel execution.  The timestamps are relative to the start of
> > the architecture-specified counter used in get_cycles
> > (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> >
> > All timestamps reflect time from processor power-on instead of
> > time from the kernel's timekeeping initialization.
> 
> I tried this patch in linux-next20260302 kernel running as a guest VM
> on a Hyper-V host. Two things:
> 
> 1) In the dmesg output, I'm seeing a place where the timestamps briefly go
> backwards -- i.e., they are not monotonically increasing. Here's a snippet,
> where there's a smaller timestamp immediately after the tsc is detected:
> 
> [   27.994891] SMBIOS 3.1.0 present.
> [   27.994893] DMI: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025
> [   27.994898] DMI: Memory slots populated: 2/2
> [   27.995202] Hypervisor detected: Microsoft Hyper-V
> [   27.995205] Hyper-V: privilege flags low 0xae7f, high 0x3b8030, ext 0x62, hints 0xa0e24, misc 0xe0bed7b2
> [   27.995208] Hyper-V: Nested features: 0x0
> [   27.995209] Hyper-V: LAPIC Timer Frequency: 0xc3500
> [   27.995210] Hyper-V: Using hypercall for remote TLB flush
> [   27.995216] clocksource: hyperv_clocksource_tsc_page: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
> [   27.995218] clocksource: hyperv_clocksource_msr: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
> [   27.995220] tsc: Detected 2918.401 MHz processor

I wonder if the tsc is getting fiddled with or virtualized somewhere in here, as part of clocksource initialization.
I believe each clocksource in the kernel maintains its own internal offset, and maybe the offset that is
being used ends up being slightly different from the cycle-counter offset that the early_times feature uses.
I'm just throwing out guesses.  It's about a 4ms delta, which is pretty big.
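
To make the two offsets concrete, here is a minimal user-space sketch of how the patch puts both kinds of stored values on one timeline. It mirrors adjust_early_ts() and the offset computation in early_times_finish_calibration(), but the names struct early_cal, mul_shr(), calc_offset() and adjust_ts() are hypothetical stand-ins, and unsigned __int128 (a GCC/Clang extension) replaces the kernel's mul_u64_u32_shr() helper.

```c
#include <stdint.h>

/* Bit 63 marks a stored value as raw cycles rather than nanoseconds. */
#define EARLY_CYCLES_BIT	(1ULL << 63)
#define EARLY_CYCLES_MASK	(~EARLY_CYCLES_BIT)

/* Hypothetical container for the calibration results. */
struct early_cal {
	uint32_t mult;
	uint32_t shift;
	uint64_t offset_ns;
};

/* User-space stand-in for mul_u64_u32_shr(): (a * mult) >> shift. */
static uint64_t mul_shr(uint64_t a, uint32_t mult, uint32_t shift)
{
	return (uint64_t)(((unsigned __int128)a * mult) >> shift);
}

/* Offset that lines both timelines up at the calibration start point. */
static uint64_t calc_offset(const struct early_cal *cal,
			    uint64_t start_cycles, uint64_t start_ns)
{
	return mul_shr(start_cycles, cal->mult, cal->shift) - start_ns;
}

/* Mirrors adjust_early_ts(): convert tagged cycles, shift plain ns. */
static uint64_t adjust_ts(const struct early_cal *cal, uint64_t ts)
{
	if (!(ts & EARLY_CYCLES_BIT))
		return ts + cal->offset_ns;

	return mul_shr(ts & EARLY_CYCLES_MASK, cal->mult, cal->shift);
}
```

With this arrangement the two paths agree exactly only at the (start_cycles, start_ns) pair sampled at calibration start. Anywhere else they agree only as well as mult/shift matches the rate at which the counter advanced relative to local_clock() at that time.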

> [   27.991060] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
> [   27.991062] e820: remove [mem 0x000a0000-0x000fffff] System RAM
> [   27.991064] last_pfn = 0x210000 max_arch_pfn = 0x400000000
> [   27.991065] x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.
> [   27.991066] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC
> 
> Hyper-V provides a synthetic clocksource (two actually), and perhaps they
> are the cause of the problem, though I haven't spent any time debugging.
> 
> 2) A Linux VM running in the Azure cloud is also running on Hyper-V. Such a
> VM typically uses cloud-init to set everything up at boot time, and cloud-init
> is outputting lines to the serial console with a timestamp that looks like the
> printk() timestamp, but apparently is not adjusted for the early timestamping
> that this patch adds. Again, I haven't debugged what's going on -- I'm not
> immediately sure of the mechanism that cloud-init uses to do output to the
> serial console. The use of the Hyper-V synthetic clock source might be the cause
> of the problem here as well. Here's an output snippet from the serial console:
> 
> [   20.330414] systemd[1]: Condition check resulted in OpenVSwitch configuration for cleanup being skipped.
> [   20.332911] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
> [   20.333257] pstore: Registered efi_pstore as persistent store backend
> [   20.334360] systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
> [   20.338319] systemd[1]: Starting Load Kernel Modules...
> [   20.341094] systemd[1]: Starting Remount Root and Kernel File Systems...
> [   20.350993] systemd[1]: Starting udev Coldplug all Devices...
> [   20.356255] systemd[1]: Starting Uncomplicated firewall...
> [   20.361536] systemd[1]: Started Journal Service.
> [   20.386902] EXT4-fs (sda1): re-mounted c02dce0c-0c40-4e6e-88af-c5a0987b0adb r/w.
> [   22.532033] /dev/sr0: Can't lookup blockdev
> [    7.955973] cloud-init[783]: Cloud-init v. 24.3.1-0ubuntu0~20.04.1 running 'init-local' at Wed, 11 Mar 2026 15:27:06 +0000. Up 7.48 seconds.
> [    9.933120] cloud-init[822]: Cloud-init v. 24.3.1-0ubuntu0~20.04.1 running 'init' at Wed, 11 Mar 2026 15:27:08 +0000. Up 9.82 seconds.
> [    9.935483] cloud-init[822]: ci-info: ++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++
> [    9.937726] cloud-init[822]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
> [    9.939905] cloud-init[822]: ci-info: | Device |  Up  |           Address           |      Mask     | Scope  |     Hw-Address    |
> [    9.942059] cloud-init[822]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
> 
> The cloud-init lines don't show up in dmesg, so there's no problem there.

Are the timestamp values for cloud-init usually consistent with the ones from the kernel?
That is, without the early_times patch, do they usually match the kernel's printk timestamps?

It sounds like these cloud-init messages are emitted directly to the serial console, and do
not go through the printk system.  Is that right?  How do they avoid having the printk
messages and cloud-init messages intermingle?  I assume cloud-init is a systemd service?

Can you share the calibration lines from this boot?  I'm curious whether those timestamps
are just relative to local_clock().

> I will look into both issues further, but probably not today.

Thanks for trying the code out.  Let me know if you find out anything.
 -- Tim

> Michael
> 
> >
> > Signed-off-by: Tim Bird <tim.bird@sony.com>
> > ---
> > V2->V3
> >  Default CONFIG option to 'n'
> >  Move more code from into early_times.h
> >   (reducing ifdefs in init/main.c)
> >  Use math64 helper routines
> >  Use cycles_t instead of u64 type
> >  Add #defines for EARLY_CYCLES_BIT and MASK
> >  Invert if logic in adjust_early_ts()
> >  (note: no change to 'depends on' in Kconfig entry)
> >
> > V1->V2
> >  Remove calibration CONFIG vars
> >  Add 'depends on' to restrict arches (to handle ppc bug)
> >  Add early_ts_offset to avoid discontinuity
> >  Save cycles in ts_nsec, and convert on output
> >  Move conditional code to include file (early_times.h)
> >
> >  include/linux/early_times.h | 85 +++++++++++++++++++++++++++++++++++++
> >  init/Kconfig                | 14 ++++++
> >  init/main.c                 |  6 +++
> >  kernel/printk/printk.c      | 18 +++++++-
> >  4 files changed, 121 insertions(+), 2 deletions(-)
> >  create mode 100644 include/linux/early_times.h
> >
> > diff --git a/include/linux/early_times.h b/include/linux/early_times.h
> > new file mode 100644
> > index 000000000000..05388dcb8573
> > --- /dev/null
> > +++ b/include/linux/early_times.h
> > @@ -0,0 +1,85 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _KERNEL_PRINTK_EARLY_TIMES_H
> > +#define _KERNEL_PRINTK_EARLY_TIMES_H
> > +
> > +#include <linux/timex.h>
> > +#include <linux/clocksource.h>
> > +
> > +/* use high bit of a u64 to indicate cycles instead of a timestamp */
> > +#define EARLY_CYCLES_BIT	BIT_ULL(63)
> > +#define EARLY_CYCLES_MASK	~(BIT_ULL(63))
> > +
> > +#if defined(CONFIG_EARLY_PRINTK_TIMES)
> > +extern cycles_t start_cycles;
> > +extern u64 start_ns;
> > +extern u32 early_mult, early_shift;
> > +extern u64 early_ts_offset;
> > +
> > +static inline void early_times_start_calibration(void)
> > +{
> > +	start_cycles = get_cycles();
> > +	start_ns = local_clock();
> > +}
> > +
> > +static inline void early_times_finish_calibration(void)
> > +{
> > +	cycles_t end_cycles;
> > +	u64 end_ns;
> > +
> > +	/* set calibration data for early_printk_times */
> > +	end_cycles = get_cycles();
> > +	end_ns = local_clock();
> > +	clocks_calc_mult_shift(&early_mult, &early_shift,
> > +		mul_u64_u64_div_u64(end_cycles - start_cycles,
> > +			NSEC_PER_SEC, end_ns - start_ns),
> > +		NSEC_PER_SEC, 100);
> > +	early_ts_offset = mul_u64_u32_shr(start_cycles, early_mult, early_shift) - start_ns;
> > +
> > +	pr_debug("Early printk times: mult=%u, shift=%u, offset=%llu ns\n",
> > +		early_mult, early_shift, early_ts_offset);
> > +}
> > +
> > +static inline u64 early_cycles(void)
> > +{
> > +	return (get_cycles() | EARLY_CYCLES_BIT);
> > +}
> > +
> > +/*
> > + * adjust_early_ts detects whether ts is in cycles or nanoseconds
> > + * and converts it or adjusts it, taking into account the offset
> > + * from cycle-counter start.
> > + *
> > + * Note that early_mult may be 0, but that's OK because
> > + * we'll just multiply by 0 and return 0. This will
> > + * only occur if we're outputting a printk message
> > + * before the calibration of the early timestamp.
> > + * Any output after user space start (eg. from dmesg or
> > + * journalctl) will show correct values.
> > + */
> > +static inline u64 adjust_early_ts(u64 ts)
> > +{
> > +	if (likely(!(ts & EARLY_CYCLES_BIT)))
> > +		/* if timestamp is not in cycles, just add offset */
> > +		return ts + early_ts_offset;
> > +
> > +	/* mask high bit and convert to nanoseconds */
> > +	return mul_u64_u32_shr(ts & EARLY_CYCLES_MASK, early_mult, early_shift);
> > +}
> > +
> > +#else
> > +# define early_times_start_calibration() do { } while (0)
> > +# define early_times_finish_calibration() do { } while (0)
> > +
> > +static inline u64 early_cycles(void)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline u64 adjust_early_ts(u64 ts)
> > +{
> > +	return ts;
> > +}
> > +#endif /* CONFIG_EARLY_PRINTK_TIMES */
> > +
> > +#endif /* _KERNEL_PRINTK_EARLY_TIMES_H */
> > diff --git a/init/Kconfig b/init/Kconfig
> > index fa79feb8fe57..a928c1efb09d 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -777,6 +777,20 @@ config IKHEADERS
> >  	  or similar programs.  If you build the headers as a module, a module called
> >  	  kheaders.ko is built which can be loaded on-demand to get access to headers.
> >
> > +config EARLY_PRINTK_TIMES
> > +	bool "Show non-zero printk timestamps early in boot"
> > +	default n
> > +	depends on PRINTK
> > +	depends on ARM64 || X86_64
> > +	help
> > +	  Use a cycle-counter to provide printk timestamps during
> > +	  early boot.  This allows seeing timestamps for printks that
> > +	  would otherwise show as 0.  Note that this will shift the
> > +	  printk timestamps to be relative to processor power on, instead
> > +	  of relative to the start of kernel timekeeping.  This should be
> > +	  closer to machine power on, giving a better indication of
> > +	  overall boot time.
> > +
> >  config LOG_BUF_SHIFT
> >  	int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
> >  	range 12 25
> > diff --git a/init/main.c b/init/main.c
> > index b84818ad9685..d5774aec1aff 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -104,6 +104,7 @@
> >  #include <linux/pidfs.h>
> >  #include <linux/ptdump.h>
> >  #include <linux/time_namespace.h>
> > +#include <linux/early_times.h>
> >  #include <net/net_namespace.h>
> >
> >  #include <asm/io.h>
> > @@ -1118,6 +1119,9 @@ void start_kernel(void)
> >  	timekeeping_init();
> >  	time_init();
> >
> > +	/* This must be after timekeeping is initialized */
> > +	early_times_start_calibration();
> > +
> >  	/* This must be after timekeeping is initialized */
> >  	random_init();
> >
> > @@ -1600,6 +1604,8 @@ static int __ref kernel_init(void *unused)
> >
> >  	do_sysctl_args();
> >
> > +	early_times_finish_calibration();
> > +
> >  	if (ramdisk_execute_command) {
> >  		ret = run_init_process(ramdisk_execute_command);
> >  		if (!ret)
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 1d765ad242b8..5afd31c3345c 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -46,6 +46,7 @@
> >  #include <linux/ctype.h>
> >  #include <linux/uio.h>
> >  #include <linux/sched/clock.h>
> > +#include <linux/early_times.h>
> >  #include <linux/sched/debug.h>
> >  #include <linux/sched/task_stack.h>
> >  #include <linux/panic.h>
> > @@ -75,6 +76,13 @@ EXPORT_SYMBOL(ignore_console_lock_warning);
> >
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(console);
> >
> > +#ifdef CONFIG_EARLY_PRINTK_TIMES
> > +cycles_t start_cycles;
> > +u64 start_ns;
> > +u32 early_mult, early_shift;
> > +u64 early_ts_offset;
> > +#endif
> > +
> >  /*
> >   * Low level drivers may need that to know if they can schedule in
> >   * their unblank() callback or not. So let's export it.
> > @@ -639,7 +647,7 @@ static void append_char(char **pp, char *e, char c)
> >  static ssize_t info_print_ext_header(char *buf, size_t size,
> >  				     struct printk_info *info)
> >  {
> > -	u64 ts_usec = info->ts_nsec;
> > +	u64 ts_usec = adjust_early_ts(info->ts_nsec);
> >  	char caller[20];
> >  #ifdef CONFIG_PRINTK_CALLER
> >  	u32 id = info->caller_id;
> > @@ -1352,7 +1360,11 @@ static size_t print_syslog(unsigned int level, char *buf)
> >
> >  static size_t print_time(u64 ts, char *buf)
> >  {
> > -	unsigned long rem_nsec = do_div(ts, 1000000000);
> > +	unsigned long rem_nsec;
> > +
> > +	ts = adjust_early_ts(ts);
> > +
> > +	rem_nsec = do_div(ts, 1000000000);
> >
> >  	return sprintf(buf, "[%5lu.%06lu]",
> >  		       (unsigned long)ts, rem_nsec / 1000);
> > @@ -2242,6 +2254,8 @@ int vprintk_store(int facility, int level,
> >  	 * timestamp with respect to the caller.
> >  	 */
> >  	ts_nsec = local_clock();
> > +	if (!ts_nsec)
> > +		ts_nsec = early_cycles();
> >
> >  	caller_id = printk_caller_id();
> >
> > --
> > 2.43.0
> >


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2026-03-13  4:52     ` Bird, Tim
@ 2026-03-13 10:45       ` Petr Mladek
  2026-03-14 14:16         ` Shashank Balaji
                           ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Petr Mladek @ 2026-03-13 10:45 UTC (permalink / raw)
  To: Bird, Tim
  Cc: Michael Kelley, rostedt@goodmis.org, senozhatsky@chromium.org,
	francesco@valla.it, geert@linux-m68k.org,
	linux-embedded@vger.kernel.org, linux-kernel@vger.kernel.org,
	Thomas Gleixner, John Stultz, Stephen Boyd, John Ogness

Finally added the timekeeping maintainers and John to Cc.
We should have added them since v1.

Anyway, you can see the entire history at
https://lore.kernel.org/all/39b09edb-8998-4ebd-a564-7d594434a981@bird.org/


On Fri 2026-03-13 04:52:40, Bird, Tim wrote:
> Hey Michael,
> 
> This report is very interesting. 
> 
> Thanks very much for trying it out!
> 
> > -----Original Message-----
> > From: Michael Kelley <mhklinux@outlook.com>
> > Sent: Wednesday, March 11, 2026 9:47 AM
> > From: Tim Bird <tim.bird@sony.com> Sent: Tuesday, February 10, 2026 3:48 PM
> > >
> > > During early boot, printk timestamps are reported as zero before
> > > kernel timekeeping starts (e.g. before time_init()).  This
> > > hinders boot-time optimization efforts.  This period is about 400
> > > milliseconds for many current desktop and embedded machines
> > > running Linux.
> > >
> > > Add support to save cycles during early boot, and output correct
> > > timestamp values after timekeeping is initialized.  get_cycles()
> > > is operational on arm64 and x86_64 from kernel start.  Add code
> > > and variables to save calibration values used to later convert
> > > cycle counts to time values in the early printks.  Add a config
> > > to control the feature.
> > >
> > > This yields non-zero timestamps for printks from the very start
> > > of kernel execution.  The timestamps are relative to the start of
> > > the architecture-specified counter used in get_cycles
> > > (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > >
> > > All timestamps reflect time from processor power-on instead of
> > > time from the kernel's timekeeping initialization.
> > 
> > I tried this patch in linux-next20260302 kernel running as a guest VM
> > on a Hyper-V host. Two things:
> > 
> > 1) In the dmesg output, I'm seeing a place where the timestamps briefly go
> > backwards -- i.e., they are not monotonically increasing. Here's a snippet,
> > where there's a smaller timestamp immediately after the tsc is detected:
> > 
> > [   27.994891] SMBIOS 3.1.0 present.
> > [   27.994893] DMI: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025
> > [   27.994898] DMI: Memory slots populated: 2/2
> > [   27.995202] Hypervisor detected: Microsoft Hyper-V
> > [   27.995205] Hyper-V: privilege flags low 0xae7f, high 0x3b8030, ext 0x62, hints 0xa0e24, misc 0xe0bed7b2
> > [   27.995208] Hyper-V: Nested features: 0x0
> > [   27.995209] Hyper-V: LAPIC Timer Frequency: 0xc3500
> > [   27.995210] Hyper-V: Using hypercall for remote TLB flush
> > [   27.995216] clocksource: hyperv_clocksource_tsc_page: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
> > [   27.995218] clocksource: hyperv_clocksource_msr: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
> > [   27.995220] tsc: Detected 2918.401 MHz processor
> 
> I wonder if the tsc is getting fiddled with or virtualized somewhere in here, as part of clocksource initialization.
> I believe each clocksource in the kernel maintains its own internal offset, and maybe the offset that is
> being used ends up being slightly different from the cycle-counter offset that the early_times feature uses.
> I'm just throwing out guesses.  It's about a 4ms delta, which is pretty big.
> 
> > [   27.991060] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
> > [   27.991062] e820: remove [mem 0x000a0000-0x000fffff] System RAM
> > [   27.991064] last_pfn = 0x210000 max_arch_pfn = 0x400000000
> > [   27.991065] x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.
> > [   27.991066] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC

I wonder how precise the calibration of the cycles is. The problem
might be that the cycle counter ran faster right after boot than it
did later, during the calibration.

I added the following debug output on top of this patch:

diff --git a/include/linux/early_times.h b/include/linux/early_times.h
index 05388dcb8573..cdb467345bcc 100644
--- a/include/linux/early_times.h
+++ b/include/linux/early_times.h
@@ -20,6 +20,7 @@ static inline void early_times_start_calibration(void)
 {
 	start_cycles = get_cycles();
 	start_ns = local_clock();
+	pr_info("Early printk times: started callibration: %llu ns\n", start_ns);
 }
 
 static inline void early_times_finish_calibration(void)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 774ffb1fa5ac..836cb03aaa6d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2308,6 +2308,8 @@ int vprintk_store(int facility, int level,
 	ts_nsec = local_clock();
 	if (!ts_nsec)
 		ts_nsec = early_cycles();
+	else
+		pr_info_once("local_clock() returned non-zero timestamp: %llu nsec\n", ts_nsec);
 
 	caller_id = printk_caller_id();
 

And it produced the following in my KVM guest:

Let's say that start of the cycle counter is

Start of stage A

[    8.684438] Linux version 7.0.0-rc2-default+ (pmladek@pathway) (gcc (SUSE Linux) 15.2.1 20260202, GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.45.0.20251103-2) #571 SMP PREEMPT_DYNAMIC Fri Mar 13 10:23:54 CET 2026
[    8.684442] Command line: BOOT_IMAGE=/boot/vmlinuz-7.0.0-rc2-default+ root=/dev/vda2 resume=/dev/disk/by-uuid/369c7453-3d16-409d-88b2-5de027891a12 mitigations=auto nosplash earlycon=uart8250,io,0x3f8,115200 console=ttyS0,115200 console=tty0 ignore_loglevel log_buf_len=1M crashkernel=512M,high crashkernel=72M,low
[...]
[    8.696633] earlycon: uart8250 at I/O port 0x3f8 (options '115200')
[    8.696639] printk: legacy bootconsole [uart8250] enabled
[    8.731303] printk: debug: ignoring loglevel setting.
[    8.732349] NX (Execute Disable) protection: active
[    8.733447] APIC: Static calls initialized
[    8.734667] SMBIOS 2.8 present.
[    8.735358] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-2-g4f253b9b-prebuilt.qemu.org 04/01/2014
[    8.737285] DMI: Memory slots populated: 1/1
[    8.738380] Hypervisor detected: KVM
[    8.739151] last_pfn = 0x7ffdc max_arch_pfn = 0x400000000
[    8.740254] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    8.732971] printk: local_clock() returned non-zero timestamp: 3486 nsec

End of stage A

This is the point where printk() started storing the values from
local_clock() instead of cycles.

Start of stage B

[    8.732971] kvm-clock: using sched offset of 252367014082295 cycles
[    8.735471] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    8.738880] tsc: Detected 3293.776 MHz processor
[    8.740474] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
[...]
[    8.932671] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    8.934047] Early printk times: started callibration: 201079507 ns

End of stage B

This is where we started calibration of early cycles.

Start of stage C

[    8.935571] Console: colour dummy device 80x25
[...]
[    9.289673] thermal_sys: Registered thermal governor 'fair_share'
[    9.290077] thermal_sys: Registered thermal governor 'bang_bang'
[    9.292290] thermal_sys: Registered thermal governor 'step_wise'
[    9.293530] thermal_sys: Registered thermal governor 'user_space'
[    9.294856] cpuidle: using governor ladder
[    9.296302] cpuidle: using governor menu

Here the thermal governors are registered. I guess that they might
reduce the speed of some HW.

[...]
[   11.974147] clk: Disabling unused clocks

Some unused clocks are disabled. I wonder if this might affect
counting the cycles.

[   12.330852] Freeing unused kernel image (rodata/data gap) memory: 1500K
[   12.351191] Early printk times: mult=19634245, shift=26, offset=8732967929 ns

End of stage C

This is the end of calibration.

Now, if the frequency of the cycle counter:

   + was higher in stage A, when only cycles were stored
   + was lower in stage C, when it was calibrated against local_clock()

then it might result in higher (calibrated) timestamps in stage A
and a step back in stage B.

Or something like this. It is possible that even local_clock() does
not have a stable frequency during early boot.
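
A rough arithmetic check of this kind of rate mismatch, as a user-space sketch using the numbers from the log above. It rests on assumptions that are not established in this thread: that get_cycles() reads the TSC, and that the TSC ticked at the reported 3293.776 MHz between the first non-zero local_clock() value (3486 ns) and the start of calibration (201079507 ns). The helper names ns_per_cycle(), implied_mhz() and step_back_ns() are hypothetical. Because early_ts_offset aligns the two timelines at calibration start, the step at the switch depends only on the mismatch over that window.

```c
#include <stdint.h>

/* Nanoseconds per cycle implied by a mult/shift pair (shift < 64). */
static double ns_per_cycle(uint32_t mult, uint32_t shift)
{
	return (double)mult / (double)(1ULL << shift);
}

/* Counter frequency in MHz implied by a mult/shift pair. */
static double implied_mhz(uint32_t mult, uint32_t shift)
{
	return 1000.0 / ns_per_cycle(mult, shift);
}

/*
 * Size of the backwards step at the cycles-to-local_clock() switch.
 * window_ns:  local_clock() time between the switch and calibration start
 * actual_mhz: assumed real counter rate during that window
 * A positive result means the cycle-based timestamps run ahead of the
 * nanosecond-based ones at the switch, so the printed time steps back.
 */
static double step_back_ns(double window_ns, double actual_mhz,
			   uint32_t mult, uint32_t shift)
{
	/* Cycles the counter would have advanced during the window. */
	double cycles = window_ns * actual_mhz / 1000.0;

	return window_ns - cycles * ns_per_cycle(mult, shift);
}
```

With mult=19634245 and shift=26, implied_mhz() gives roughly 3418 MHz, while the kernel reported 3293.776 MHz. Under the assumptions above, step_back_ns(201079507.0 - 3486.0, 3293.776, 19634245, 26) comes out at roughly 7.3 ms, close to the 8.740254 -> 8.732971 step in the log. That may be a coincidence, and it does not say which of the two clocks was off. As a separate consistency check, 3486 ns + 8732967929 ns = 8732971415 ns, which is the [    8.732971] printed for the first local_clock()-based message.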

Idea: A solution might be to start calibration when printk()
      gets the first non-zero time from local_clock().

      Something like:

diff --git a/include/linux/early_times.h b/include/linux/early_times.h
index 05388dcb8573..09d278996184 100644
--- a/include/linux/early_times.h
+++ b/include/linux/early_times.h
@@ -16,10 +16,13 @@ extern u64 start_ns;
 extern u32 early_mult, early_shift;
 extern u64 early_ts_offset;
 
-static inline void early_times_start_calibration(void)
+static inline void early_times_may_start_calibration(u64 ts_ns)
 {
+	if (start_ns)
+		return;
+
+	start_ns = ts_ns;
 	start_cycles = get_cycles();
-	start_ns = local_clock();
 }
 
 static inline void early_times_finish_calibration(void)
diff --git a/init/main.c b/init/main.c
index 27835270dfb5..a333b0da69cf 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1123,9 +1123,6 @@ void start_kernel(void)
 	timekeeping_init();
 	time_init();
 
-	/* This must be after timekeeping is initialized */
-	early_times_start_calibration();
-
 	/* This must be after timekeeping is initialized */
 	random_init();
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 774ffb1fa5ac..19330b6b4eb2 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2306,7 +2306,9 @@ int vprintk_store(int facility, int level,
 	 * timestamp with respect to the caller.
 	 */
 	ts_nsec = local_clock();
-	if (!ts_nsec)
+	if (ts_nsec)
+		early_times_may_start_calibration(ts_nsec);
+	else
 		ts_nsec = early_cycles();
 
 	caller_id = printk_caller_id();


> > 
> > 2) A Linux VM running in the Azure cloud is also running on Hyper-V. Such a
> > VM typically uses cloud-init to set everything up at boot time, and cloud-init
> > is outputting lines to the serial console with a timestamp that looks like the
> > printk() timestamp, but apparently is not adjusted for the early timestamping
> > that this patch adds. Again, I haven't debugged what's going on -- I'm not
> > immediately sure of the mechanism that cloud-init uses to do output to the
> > serial console. The use of the Hyper-V synthetic clock source might be the cause
> > of the problem here as well. Here's an output snippet from the serial console:
> > 
> > [   20.330414] systemd[1]: Condition check resulted in OpenVSwitch configuration for cleanup being skipped.
> > [   20.332911] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
> > [   20.333257] pstore: Registered efi_pstore as persistent store backend
> > [   20.334360] systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
> > [   20.338319] systemd[1]: Starting Load Kernel Modules...
> > [   20.341094] systemd[1]: Starting Remount Root and Kernel File Systems...
> > [   20.350993] systemd[1]: Starting udev Coldplug all Devices...
> > [   20.356255] systemd[1]: Starting Uncomplicated firewall...
> > [   20.361536] systemd[1]: Started Journal Service.
> > [   20.386902] EXT4-fs (sda1): re-mounted c02dce0c-0c40-4e6e-88af-c5a0987b0adb r/w.
> > [   22.532033] /dev/sr0: Can't lookup blockdev
> > [    7.955973] cloud-init[783]: Cloud-init v. 24.3.1-0ubuntu0~20.04.1 running 'init-local' at Wed, 11 Mar 2026 15:27:06 +0000. Up 7.48 seconds.
> > [    9.933120] cloud-init[822]: Cloud-init v. 24.3.1-0ubuntu0~20.04.1 running 'init' at Wed, 11 Mar 2026 15:27:08 +0000. Up 9.82 seconds.
> > [    9.935483] cloud-init[822]: ci-info: ++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++
> > [    9.937726] cloud-init[822]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
> > [    9.939905] cloud-init[822]: ci-info: | Device |  Up  |           Address           |      Mask     | Scope  |     Hw-Address    |
> > [    9.942059] cloud-init[822]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+

This is more complicated. I wonder if the timestamps from cloud-init
are somehow synchronized with local_clock().

We might need to synchronize local_clock() with the cycles as well.
But there is a chicken-and-egg problem. We need:

    + to know the offset caused by cycles when local_clock() gets initialized.
    + local_clock() running for some time to calibrate cycles.
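
One way to picture how the two needs could be decoupled: remember a
single (cycles, ns) pair at the moment local_clock() first returns
non-zero, finish the calibration later, and compute the offset only
then. This is a userspace sketch, not the patch itself. struct
early_sync, early_sync_record() and early_sync_offset() are hypothetical
names, and mult/shift are assumed to come from whichever calibration
ends up being used. unsigned __int128 is a GCC/Clang extension; the
kernel has mul_u64_u32_shr() for the same job.

```c
#include <stdint.h>

/* Hypothetical state: one sample taken when local_clock() first ticks. */
struct early_sync {
	uint64_t cycles;	/* counter value at the switch point */
	uint64_t ns;		/* local_clock() value at the same point */
	int valid;
};

/* Scaled conversion: ns = (cycles * mult) >> shift. */
static uint64_t cyc_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (uint64_t)(((unsigned __int128)cycles * mult) >> shift);
}

/* Record the pair once; zero timestamps and later calls are ignored. */
static void early_sync_record(struct early_sync *s, uint64_t cycles,
			      uint64_t ns)
{
	if (s->valid || !ns)
		return;
	s->cycles = cycles;
	s->ns = ns;
	s->valid = 1;
}

/*
 * Offset to add to local_clock() so that later timestamps share the
 * counter's origin. Only meaningful once mult/shift are known.
 */
static uint64_t early_sync_offset(const struct early_sync *s,
				  uint32_t mult, uint32_t shift)
{
	if (!s->valid)
		return 0;
	return cyc_to_ns(s->cycles, mult, shift) - s->ns;
}
```

The numbers in my KVM log are consistent with this picture: the
reported offset of 8732967929 ns plus the first local_clock() value of
3486 ns is 8732971415 ns, which is the [    8.732971] stamp on that line.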

Hmm, I see that the timekeeping people are not in Cc. We should have
added them back in v1.

I am adding them now. Better late than never.

Best Regards,
Petr


* Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2026-03-13 10:45       ` Petr Mladek
@ 2026-03-14 14:16         ` Shashank Balaji
  2026-03-24 20:07           ` Bird, Tim
  2026-03-14 16:15         ` Michael Kelley
  2026-03-20 18:15         ` Bird, Tim
  2 siblings, 1 reply; 36+ messages in thread
From: Shashank Balaji @ 2026-03-14 14:16 UTC (permalink / raw)
  To: Petr Mladek, Bird, Tim
  Cc: Michael Kelley, rostedt@goodmis.org, senozhatsky@chromium.org,
	francesco@valla.it, geert@linux-m68k.org,
	linux-embedded@vger.kernel.org, linux-kernel@vger.kernel.org,
	Thomas Gleixner, John Stultz, Stephen Boyd, John Ogness

Hi Tim, Petr,

On Fri, Mar 13, 2026 at 11:45:02AM +0100, Petr Mladek wrote:
> Finally added timekeeping maintainers and John into Cc.
> We should have added time since v1.
> 
> Anyway, you might see the entire history at
> https://lore.kernel.org/all/39b09edb-8998-4ebd-a564-7d594434a981@bird.org/
> 
> 
> On Fri 2026-03-13 04:52:40, Bird, Tim wrote:
> > Hey Micheal,
> > 
> > This report is very interesting. 
> > 
> > Thanks very much for trying it out!
> > 
> > > -----Original Message-----
> > > From: Michael Kelley <mhklinux@outlook.com>
> > > Sent: Wednesday, March 11, 2026 9:47 AM
> > > From: Tim Bird <tim.bird@sony.com> Sent: Tuesday, February 10, 2026 3:48 PM
> > > >
> > > > During early boot, printk timestamps are reported as zero before
> > > > kernel timekeeping starts (e.g. before time_init()).  This
> > > > hinders boot-time optimization efforts.  This period is about 400
> > > > milliseconds for many current desktop and embedded machines
> > > > running Linux.
> > > >
> > > > Add support to save cycles during early boot, and output correct
> > > > timestamp values after timekeeping is initialized.  get_cycles()
> > > > is operational on arm64 and x86_64 from kernel start.  Add code
> > > > and variables to save calibration values used to later convert
> > > > cycle counts to time values in the early printks.  Add a config
> > > > to control the feature.
> > > >
> > > > This yields non-zero timestamps for printks from the very start
> > > > of kernel execution.  The timestamps are relative to the start of
> > > > the architecture-specified counter used in get_cycles
> > > > (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > > >
> > > > All timestamps reflect time from processor power-on instead of
> > > > time from the kernel's timekeeping initialization.
> > > 
> > > I tried this patch in linux-next20260302 kernel running as a guest VM
> > > on a Hyper-V host. Two things:
> > > 
> > > 1) In the dmesg output, I'm seeing a place where the timestamps briefly go
> > > backwards -- i.e., they are not monotonically increasing. Here's a snippet,
> > > where there's a smaller timestamp immediately after the tsc is detected:
> > > 
> > > [   27.994891] SMBIOS 3.1.0 present.
> > > [   27.994893] DMI: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025
> > > [   27.994898] DMI: Memory slots populated: 2/2
> > > [   27.995202] Hypervisor detected: Microsoft Hyper-V
> > > [   27.995205] Hyper-V: privilege flags low 0xae7f, high 0x3b8030, ext 0x62, hints 0xa0e24, misc 0xe0bed7b2
> > > [   27.995208] Hyper-V: Nested features: 0x0
> > > [   27.995209] Hyper-V: LAPIC Timer Frequency: 0xc3500
> > > [   27.995210] Hyper-V: Using hypercall for remote TLB flush
> > > [   27.995216] clocksource: hyperv_clocksource_tsc_page: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
> > > [   27.995218] clocksource: hyperv_clocksource_msr: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
> > > [   27.995220] tsc: Detected 2918.401 MHz processor
> > 
> > I wonder if the tsc is getting fiddled with or virtualized somewhere in here, as part of clocksource initialization.
> > I believe each clocksource in the kernel maintains it's own internal offset, and maybe the offset that is
> > being used ends up being slightly different from the cycle-counter offset that the early_times feature uses.
> > I'm just throwing out guesses.  It's about a 4ms delta, which is pretty big.
> > 
> > > [   27.991060] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
> > > [   27.991062] e820: remove [mem 0x000a0000-0x000fffff] System RAM
> > > [   27.991064] last_pfn = 0x210000 max_arch_pfn = 0x400000000
> > > [   27.991065] x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.
> > > [   27.991066] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC
> 
> I wonder how the calibration of the cycles is precise. I wonder if
> the problem might be that cycles were faster right after boot than
> later during the calibration.
> 
> I added the following debug output on top of this patch:
> 
> diff --git a/include/linux/early_times.h b/include/linux/early_times.h
> index 05388dcb8573..cdb467345bcc 100644
> --- a/include/linux/early_times.h
> +++ b/include/linux/early_times.h
> @@ -20,6 +20,7 @@ static inline void early_times_start_calibration(void)
>  {
>  	start_cycles = get_cycles();
>  	start_ns = local_clock();
> +	pr_info("Early printk times: started callibration: %llu ns\n", start_ns);
>  }
>  
>  static inline void early_times_finish_calibration(void)
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 774ffb1fa5ac..836cb03aaa6d 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2308,6 +2308,8 @@ int vprintk_store(int facility, int level,
>  	ts_nsec = local_clock();
>  	if (!ts_nsec)
>  		ts_nsec = early_cycles();
> +	else
> +		pr_info_once("local_clock() returned non-zero timestamp: %llu nsec\n", ts_nsec);
>  
>  	caller_id = printk_caller_id();
>  
> 
> And it produced in my kvm:
> 
> Let's say that start of the cycle counter is
> 
> Start of stage A
> 
> [    8.684438] Linux version 7.0.0-rc2-default+ (pmladek@pathway) (gcc (SUSE Linux) 15.2.1 20260202, GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.45.0.20251103-2) #571 SMP PREEMPT_DYNAMIC Fri Mar 13 10:23:54 CET 2026
> [    8.684442] Command line: BOOT_IMAGE=/boot/vmlinuz-7.0.0-rc2-default+ root=/dev/vda2 resume=/dev/disk/by-uuid/369c7453-3d16-409d-88b2-5de027891a12 mitigations=auto nosplash earlycon=uart8250,io,0x3f8,115200 console=ttyS0,115200 console=tty0 ignore_loglevel log_buf_len=1M crashkernel=512M,high crashkernel=72M,low
> [...]
> [    8.696633] earlycon: uart8250 at I/O port 0x3f8 (options '115200')
> [    8.696639] printk: legacy bootconsole [uart8250] enabled
> [    8.731303] printk: debug: ignoring loglevel setting.
> [    8.732349] NX (Execute Disable) protection: active
> [    8.733447] APIC: Static calls initialized
> [    8.734667] SMBIOS 2.8 present.
> [    8.735358] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-2-g4f253b9b-prebuilt.qemu.org 04/01/2014
> [    8.737285] DMI: Memory slots populated: 1/1
> [    8.738380] Hypervisor detected: KVM
> [    8.739151] last_pfn = 0x7ffdc max_arch_pfn = 0x400000000
> [    8.740254] kvm-clock: Using msrs 4b564d01 and 4b564d00
> [    8.732971] printk: local_clock() returned non-zero timestamp: 3486 nsec
> 
> End of stage A
> 
> This is the point where printk() started storing the values from
> local_clock() instead of cycles.
> 
> Start of stage B
> 
> [    8.732971] kvm-clock: using sched offset of 252367014082295 cycles
> [    8.735471] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
> [    8.738880] tsc: Detected 3293.776 MHz processor
> [    8.740474] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
> [...]
> [    8.932671] rcu: srcu_init: Setting srcu_struct sizes based on contention.
> [    8.934047] Early printk times: started callibration: 201079507 ns
> 
> End of stage B
> 
> This is where we started calibration of early cycles.
> 
> Start of stage C
> 
> [    8.935571] Console: colour dummy device 80x25
> [...]
> [    9.289673] thermal_sys: Registered thermal governor 'fair_share'
> [    9.290077] thermal_sys: Registered thermal governor 'bang_bang'
> [    9.292290] thermal_sys: Registered thermal governor 'step_wise'
> [    9.293530] thermal_sys: Registered thermal governor 'user_space'
> [    9.294856] cpuidle: using governor ladder
> [    9.296302] cpuidle: using governor menu
> 
> Here the thermal governors are registered. I guess that they might
> reduce speed of some HW.
> 
> [...]
> [   11.974147] clk: Disabling unused clocks
> 
> Some unused clocks are disabled. I wonder if this might affect
> counting the cycles.
> 
> [   12.330852] Freeing unused kernel image (rodata/data gap) memory: 1500K
> [   12.351191] Early printk times: mult=19634245, shift=26, offset=8732967929 ns
> 
> End of stage C
> 
> Here is the end on calibration.
> 
> Now, if the frequence of the cycles was:
> 
>    + was higher in the stage A when only cycles were stored
>    + was lower in stage C when it was calibrated against local_clock()
> 
> Then it might result in higher (calibrated) timestamps in stage A
> and step back in stage B.
> 
> Or something like this. It is possible that even local_clock() does
> not have a stable frequence during the early boot.
> 
> Idea: A solution might be to start calibration when printk()
>       gets first non-zero time from local_clock.
> 
>       Something like:
> 
> diff --git a/include/linux/early_times.h b/include/linux/early_times.h
> index 05388dcb8573..09d278996184 100644
> --- a/include/linux/early_times.h
> +++ b/include/linux/early_times.h
> @@ -16,10 +16,13 @@ extern u64 start_ns;
>  extern u32 early_mult, early_shift;
>  extern u64 early_ts_offset;
>  
> -static inline void early_times_start_calibration(void)
> +static inline void early_times_may_start_calibration(u64 ts_ns)
>  {
> +	if (start_ns)
> +		return;
> +
> +	start_ns = ts_ns;
>  	start_cycles = get_cycles();
> -	start_ns = local_clock();
>  }
>  
>  static inline void early_times_finish_calibration(void)
> diff --git a/init/main.c b/init/main.c
> index 27835270dfb5..a333b0da69cf 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1123,9 +1123,6 @@ void start_kernel(void)
>  	timekeeping_init();
>  	time_init();
>  
> -	/* This must be after timekeeping is initialized */
> -	early_times_start_calibration();
> -
>  	/* This must be after timekeeping is initialized */
>  	random_init();
>  
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 774ffb1fa5ac..19330b6b4eb2 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2306,7 +2306,9 @@ int vprintk_store(int facility, int level,
>  	 * timestamp with respect to the caller.
>  	 */
>  	ts_nsec = local_clock();
> -	if (!ts_nsec)
> +	if (ts_nsec)
> +		early_times_may_start_calibration(ts_nsec);
> +	else
>  		ts_nsec = early_cycles();
>  
>  	caller_id = printk_caller_id();
> 
> 
> > > 
> > > 2) A Linux VM running in the Azure cloud is also running on Hyper-V. Such a
> > > VM typically uses cloud-init to set everything up at boot time, and cloud-init
> > > is outputting lines to the serial console with a timestamp that looks like the
> > > printk() timestamp, but apparently is not adjusted for the early timestamping
> > > that this patch adds. Again, I haven't debugged what's going on -- I'm not
> > > immediately sure of the mechanism that cloud-init uses to do output to the
> > > serial console. The use of the Hyper-V synthetic clock source might the cause
> > > of the problem here as well. Here's an output snippet from the serial console:
> > > 
> > > [   20.330414] systemd[1]: Condition check resulted in OpenVSwitch configuration for cleanup being skipped.
> > > [   20.332911] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
> > > [   20.333257] pstore: Registered efi_pstore as persistent store backend
> > > [   20.334360] systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
> > > [   20.338319] systemd[1]: Starting Load Kernel Modules...
> > > [   20.341094] systemd[1]: Starting Remount Root and Kernel File Systems...
> > > [   20.350993] systemd[1]: Starting udev Coldplug all Devices...
> > > [   20.356255] systemd[1]: Starting Uncomplicated firewall...
> > > [   20.361536] systemd[1]: Started Journal Service.
> > > [   20.386902] EXT4-fs (sda1): re-mounted c02dce0c-0c40-4e6e-88af-c5a0987b0adb r/w.
> > > [   22.532033] /dev/sr0: Can't lookup blockdev
> > > [    7.955973] cloud-init[783]: Cloud-init v. 24.3.1-0ubuntu0~20.04.1 running 'init-local' at Wed, 11 Mar 2026 15:27:06 +0000. Up 7.48
> > > seconds.
> > > [    9.933120] cloud-init[822]: Cloud-init v. 24.3.1-0ubuntu0~20.04.1 running 'init' at Wed, 11 Mar 2026 15:27:08 +0000. Up 9.82 seconds.
> > > [    9.935483] cloud-init[822]: ci-info: ++++++++++++++++++++++++++++++++++++++Net device
> > > info+++++++++++++++++++++++++++++++++++++++
> > > [    9.937726] cloud-init[822]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
> > > [    9.939905] cloud-init[822]: ci-info: | Device |  Up  |           Address           |      Mask     | Scope  |     Hw-Address    |
> > > [    9.942059] cloud-init[822]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
> 
> This is more complicated. I wonder if the timestamps from cloud-init()
> are somehow synchronized with local_clock().
> 
> We might need to synchronize local_clock() with the cycles as well.
> But there is the chicken&egg problem. We need:
> 
>     + to know the offset caused by cycles when local_clock() gets initialized.
>     + local_clock() running for some time to calibrate cycles.
>

Based on the discussion so far, I see three problems:

1. Three phases of timestamps on consoles that print synchronously is awkward.
2. Possible userspace breakage:

    I don't know how big a concern this is, but if the timestamps no longer start at zero, any
    tools that expect them to may break. Comparing printk timestamps with CLOCK_BOOTTIME
    wouldn't work anymore.

3. Counter frequency constancy requirement:

    For the early time calibration to be accurate, the counter frequency must be constant from
    processor power-on to the end of the calibration. We don't know what the firmware or the bootloader
    may do in between. If the counter frequency depends on the CPU frequency or is otherwise unstable, then we
    should err on the side of caution and bail out of early times. On the other hand, if a constant counter
    frequency is architecturally guaranteed, or if the CPU advertises one, then we should be good. For example,
    x86 has one only if boot_cpu_has(X86_FEATURE_CONSTANT_TSC); the rest are dubious.

Idea: Making the early timestamps start at 0 would partially address these problems:

1. No awkward three phases of timestamps:

    The only difference would be that the serial console would have a number of timestamps at exactly 0, while userspace
    wouldn't. The early time calibration data would only be used for userspace, to convert the counter values
    to timestamps with a zero origin.

2. Userspace breakage:

    CLOCK_BOOTTIME's origin is timekeeping init. So relative to that, the origin of early timestamps would be
    closer, but still different.

Making the early timestamps start at 0 would also relax the counter frequency constancy requirement. The constancy
requirement would then only cover the period from kernel start to calibration end, a phase in which the kernel is
in full control. Maybe even if !boot_cpu_has(X86_FEATURE_CONSTANT_TSC), the calibration data would be fine to use?
This is debatable.

A disadvantage of this method is that visibility into the pre-kernel duration is lost. But that can be made up for by
adding something like:

    pr_info("Pre-kernel duration: %llu ms\n", pre_kernel_time);

But this information should be taken with a grain of salt if constant counter frequency is not guaranteed.
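
To make the zero-origin idea concrete, here is a rough userspace sketch. It assumes the counter value at kernel
entry has been saved somewhere (kernel_start_cycles is a hypothetical name) and that the counter frequency is known
in kHz, as tsc_khz is on x86. None of these helpers exist in the patch; unsigned __int128 is a GCC/Clang extension.

```c
#include <stdint.h>

/* Convert a cycle count to nanoseconds for a counter running at freq_khz. */
static uint64_t cycles_to_ns_khz(uint64_t cycles, uint32_t freq_khz)
{
	if (!freq_khz)
		return 0;
	/* cycles / kHz is milliseconds; scale by 1e6 to get nanoseconds. */
	return (uint64_t)(((unsigned __int128)cycles * 1000000u) / freq_khz);
}

/* Early timestamp with kernel entry, not power-on, as the origin. */
static uint64_t early_ts_ns(uint64_t cycles, uint64_t kernel_start_cycles,
			    uint32_t freq_khz)
{
	return cycles_to_ns_khz(cycles - kernel_start_cycles, freq_khz);
}

/* Time the counter ran before the kernel started, in milliseconds. */
static uint64_t pre_kernel_ms(uint64_t kernel_start_cycles, uint32_t freq_khz)
{
	if (!freq_khz)
		return 0;
	return kernel_start_cycles / freq_khz;
}
```

With this split, only early_ts_ns() feeds the printk timestamps, and pre_kernel_ms() would supply the value for the
pr_info() above.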

Another point: since constancy of the counter frequency is being relied upon anyway, the conversion of counter values to
timestamps can be done later, once the architecture's counter frequency calibration is complete, instead of doing our own calibration.
An arch hook like arch_counter_freq() could be provided from which we can get this info. On x86, it would be wired
up to tsc_khz. If arch_counter_freq() returns 0, that means a constant frequency is not guaranteed, so we'd bail
out of the counter value to timestamp conversion.
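
Roughly what I have in mind, as a userspace sketch. The hook is modelled as a function pointer here only so the
example is self-contained; arch_counter_freq_fn, the two example hooks and early_cycles_to_ns() are all hypothetical,
and the unit (kHz) is an assumption chosen to match tsc_khz.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical hook: counter frequency in kHz, or 0 when the
 * architecture cannot promise a constant rate.
 */
typedef uint32_t (*arch_counter_freq_fn)(void);

/* Example hook for an arch without a constant-rate counter. */
static uint32_t no_constant_counter(void)
{
	return 0;
}

/* Example hook for a 3 GHz constant-rate counter. */
static uint32_t example_3ghz_counter(void)
{
	return 3000000;
}

/*
 * Convert a saved counter value to nanoseconds. Returns false, leaving
 * *ns untouched, when the hook reports no constant frequency.
 */
static bool early_cycles_to_ns(arch_counter_freq_fn freq_hook,
			       uint64_t cycles, uint64_t *ns)
{
	uint32_t khz = freq_hook();

	if (!khz)
		return false;
	*ns = (uint64_t)(((unsigned __int128)cycles * 1000000u) / khz);
	return true;
}
```

A caller that gets false back would leave the early records at their zero timestamps, which is today's behaviour.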

Thanks,
Shashank


* RE: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2026-03-14 14:16         ` Shashank Balaji
@ 2026-03-24 20:07           ` Bird, Tim
  0 siblings, 0 replies; 36+ messages in thread
From: Bird, Tim @ 2026-03-24 20:07 UTC (permalink / raw)
  To: Shashank Balaji, Petr Mladek
  Cc: Michael Kelley, rostedt@goodmis.org, senozhatsky@chromium.org,
	francesco@valla.it, geert@linux-m68k.org,
	linux-embedded@vger.kernel.org, linux-kernel@vger.kernel.org,
	Thomas Gleixner, John Stultz, Stephen Boyd, John Ogness

Hey Shashank,

Thanks for looking into this, and providing some ideas.  See my responses below.

> -----Original Message-----
> From: Shashank Balaji <shashankbalaji02@gmail.com>
> 
> Hi Tim, Petr,
> 
> On Fri, Mar 13, 2026 at 11:45:02AM +0100, Petr Mladek wrote:
> > Finally added timekeeping maintainers and John into Cc.
> > We should have added time since v1.
> >
> > Anyway, you might see the entire history at
> > https://lore.kernel.org/all/39b09edb-8998-4ebd-a564-7d594434a981@bird.org/
> >
> >
...

> 
> Based on the discussions thus far, I see three problems currently:
> 
> 1. Three phases of timestamps on consoles printing synchronously is awkward
Agreed.

> 2. Possible userspace breakage:
> 
>     I don't know how big of a concern this is, but if the timestamps were to not start at zero, any
>     tools expecting them to, would be shocked. printk timestamp comparison with CLOCK_BOOTTIME
>     wouldn't work anymore.
> 
> 3. Counter frequency constancy requirement:
> 
>     For the early time calibration to be accurate, the counter frequency should be constant from
>     processor power on to the end of the calibration. We don't know what the firmware or the bootloader
>     may do in-between. If the counter frequency depends on cpu frequency or is just unstable, then we
>     should err on the side of caution and bail out of early times. On the other hand, if constant counter
>     frequency is architecturally guaranteed or if the cpu advertises so, then we should be good. Like,
>     x86 has one only if boot_cpu_has(X86_FEATURE_CONSTANT_TSC), rest are dubious.

I'll need to check when the data for boot_cpu_has() is initialized.  I believe it's not available
from the very first kernel instruction, but I may be misremembering.  I think almost all
x86_64 processors have had invariant TSCs since about 2008, so it might be better to just
document this issue in the config.  I don't think a lot of people are optimizing this
100ms to 400ms region of early boot (that this patch is targeted at) on older processors.

> 
> Idea: Making the early timestamps start at 0 would partially address these problems:
> 
> 1. No awkward three phases of timestamps:
> 
>     The only difference would be that serial would have a number of timestamps exactly at 0, while userspace
>     wouldn't. The early time calibration data would only be used for userspace to convert the counter values
>     to timestamps with a zero origin.
> 
> 2. Userspace breakage:
> 
>     CLOCK_BOOTTIME's origin is timekeeping init. So relative to that, the origin of early timestamps would be
>     closer, but still different.
> 
> Making the early timestamps start at 0 would also lower counter frequency constancy requirement. Now the constancy
> requirement would only be between the period of kernel start to calibration end, a phase in which the kernel is
> in full control. Maybe even if !boot_cpu_has(X86_FEATURE_CONSTANT_TSC), the calibration data would be fine to use?
> This is debatable.
> 
> A disadvantage of this method is that visibility into per-kernel duration is lost. But that can be made up for by
> adding something like:
> 
>     pr_info("Pre-kernel duration: %llu ms\n", pre_kernel_time);
> 
> But this information should be taken with a grain of salt if constant counter frequency is not guaranteed.
> 
> Another point, since constancy of counter frequency is being relied upon anyway, conversion of counter value to
> timestamp can be done later once counter frequency calibration is complete, instead of doing our own calibration.
> An arch hook like arch_counter_freq() could be provided from which we can get this info. On x86, it would be wired
> up to tsc_khz. If arch_counter_freq() returns 0, that means constant frequency is not guaranteed, so we'd bail
> out of the counter value to timestamp conversion.

Interesting.  If I can validate that local_clock() is using the same clock source (TSC) as my early_times calibration,
maybe I *can* just steal the calibration already done when initializing that clock.  I'm using the same
routine 'clocks_calc_mult_shift', and maybe it would be better to just grab the numbers that result from
the TSC clock initialization, rather than calling it again (with different numbers) myself.  I'll look into this.
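
For reference, this is the shape of the conversion I mean, as a simplified userspace sketch.  It is not
clocks_calc_mult_shift() itself: that routine also picks the shift from a maximum conversion range, while
this version takes the shift as given.  freq_khz_to_mult() and cycles_to_ns() are illustrative names only,
and unsigned __int128 is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Multiplier such that ns = (cycles * mult) >> shift for a counter
 * running at freq_khz. The caller chooses a shift small enough that
 * the result fits in 32 bits.
 */
static uint32_t freq_khz_to_mult(uint32_t freq_khz, uint32_t shift)
{
	uint64_t tmp;

	assert(freq_khz != 0 && shift < 32);
	/* 1e6 ns per ms over freq_khz cycles per ms, rounded to nearest. */
	tmp = (((uint64_t)1000000 << shift) + freq_khz / 2) / freq_khz;
	assert(tmp <= UINT32_MAX);
	return (uint32_t)tmp;
}

/* Apply the scaled conversion without overflowing 64 bits. */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (uint64_t)(((unsigned __int128)cycles * mult) >> shift);
}
```

If the TSC clock setup already holds a mult/shift pair for the same counter, only the second function would be needed.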

Thanks very much for the feedback and ideas.
 -- Tim



* RE: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2026-03-13 10:45       ` Petr Mladek
  2026-03-14 14:16         ` Shashank Balaji
@ 2026-03-14 16:15         ` Michael Kelley
  2026-03-24 19:47           ` Bird, Tim
  2026-03-20 18:15         ` Bird, Tim
  2 siblings, 1 reply; 36+ messages in thread
From: Michael Kelley @ 2026-03-14 16:15 UTC (permalink / raw)
  To: Petr Mladek, Bird, Tim, Shashank Balaji
  Cc: Michael Kelley, rostedt@goodmis.org, senozhatsky@chromium.org,
	francesco@valla.it, geert@linux-m68k.org,
	linux-embedded@vger.kernel.org, linux-kernel@vger.kernel.org,
	Thomas Gleixner, John Stultz, Stephen Boyd, John Ogness

From: Petr Mladek <pmladek@suse.com> Sent: Friday, March 13, 2026 3:45 AM
> 
> Finally added timekeeping maintainers and John into Cc.
> We should have added time since v1.
> 
> Anyway, you might see the entire history at
> https://lore.kernel.org/all/39b09edb-8998-4ebd-a564-7d594434a981@bird.org/
> 
> On Fri 2026-03-13 04:52:40, Bird, Tim wrote:
> > Hey Micheal,
> >
> > This report is very interesting.
> >
> > Thanks very much for trying it out!
> >
> > > -----Original Message-----
> > > From: Michael Kelley <mhklinux@outlook.com>
> > > Sent: Wednesday, March 11, 2026 9:47 AM
> > > From: Tim Bird <tim.bird@sony.com> Sent: Tuesday, February 10, 2026 3:48 PM
> > > >
> > > > During early boot, printk timestamps are reported as zero before
> > > > kernel timekeeping starts (e.g. before time_init()).  This
> > > > hinders boot-time optimization efforts.  This period is about 400
> > > > milliseconds for many current desktop and embedded machines
> > > > running Linux.
> > > >
> > > > Add support to save cycles during early boot, and output correct
> > > > timestamp values after timekeeping is initialized.  get_cycles()
> > > > is operational on arm64 and x86_64 from kernel start.  Add code
> > > > and variables to save calibration values used to later convert
> > > > cycle counts to time values in the early printks.  Add a config
> > > > to control the feature.
> > > >
> > > > This yields non-zero timestamps for printks from the very start
> > > > of kernel execution.  The timestamps are relative to the start of
> > > > the architecture-specified counter used in get_cycles
> > > > (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > > >
> > > > All timestamps reflect time from processor power-on instead of
> > > > time from the kernel's timekeeping initialization.
> > >
> > > I tried this patch in linux-next20260302 kernel running as a guest VM
> > > on a Hyper-V host. Two things:
> > >
> > > 1) In the dmesg output, I'm seeing a place where the timestamps briefly go
> > > backwards -- i.e., they are not monotonically increasing. Here's a snippet,
> > > where there's a smaller timestamp immediately after the tsc is detected:
> > >
> > > [   27.994891] SMBIOS 3.1.0 present.
> > > [   27.994893] DMI: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025
> > > [   27.994898] DMI: Memory slots populated: 2/2
> > > [   27.995202] Hypervisor detected: Microsoft Hyper-V
> > > [   27.995205] Hyper-V: privilege flags low 0xae7f, high 0x3b8030, ext 0x62, hints 0xa0e24, misc 0xe0bed7b2
> > > [   27.995208] Hyper-V: Nested features: 0x0
> > > [   27.995209] Hyper-V: LAPIC Timer Frequency: 0xc3500
> > > [   27.995210] Hyper-V: Using hypercall for remote TLB flush
> > > [   27.995216] clocksource: hyperv_clocksource_tsc_page: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
> > > [   27.995218] clocksource: hyperv_clocksource_msr: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
> > > [   27.995220] tsc: Detected 2918.401 MHz processor
> >
> > I wonder if the tsc is getting fiddled with or virtualized somewhere
> > in here, as part of clocksource initialization. I believe each clocksource in
> > the kernel maintains it's own internal offset, and maybe the offset that is
> > being used ends up being slightly different from the cycle-counter offset
> > that the early_times feature uses. I'm just throwing out guesses.  It's about
> > a 4ms delta, which is pretty big.

I'm fairly certain the TSC frequency is not being fiddled with. In a guest VM on
Hyper-V, the x86 instruction to read the TSC executes directly in hardware and
is not virtualized. There *is* per-VM scaling of the TSC value to handle live
migrations across virtualization hosts with different TSC frequencies, but that's
not in play during my experiments.

> >
> > > [   27.991060] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
> > > [   27.991062] e820: remove [mem 0x000a0000-0x000fffff] System RAM
> > > [   27.991064] last_pfn = 0x210000 max_arch_pfn = 0x400000000
> > > [   27.991065] x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.
> > > [   27.991066] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC
> 
> I wonder how the calibration of the cycles is precise. I wonder if
> the problem might be that cycles were faster right after boot than
> later during the calibration.
> 
> I added the following debug output on top of this patch:
> 
> diff --git a/include/linux/early_times.h b/include/linux/early_times.h
> index 05388dcb8573..cdb467345bcc 100644
> --- a/include/linux/early_times.h
> +++ b/include/linux/early_times.h
> @@ -20,6 +20,7 @@ static inline void early_times_start_calibration(void)
>  {
>  	start_cycles = get_cycles();
>  	start_ns = local_clock();
> +	pr_info("Early printk times: started callibration: %llu ns\n", start_ns);
>  }
> 
>  static inline void early_times_finish_calibration(void)
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 774ffb1fa5ac..836cb03aaa6d 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2308,6 +2308,8 @@ int vprintk_store(int facility, int level,
>  	ts_nsec = local_clock();
>  	if (!ts_nsec)
>  		ts_nsec = early_cycles();
> +	else
> +		pr_info_once("local_clock() returned non-zero timestamp: %llu nsec\n", ts_nsec);
> 
>  	caller_id = printk_caller_id();
> 
> 
> And it produced in my kvm:
> 
> Let's say that start of the cycle counter is
> 
> Start of stage A
> 
> [    8.684438] Linux version 7.0.0-rc2-default+ (pmladek@pathway) (gcc (SUSE Linux)
> 15.2.1 20260202, GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.45.0.20251103-2)
> #571 SMP PREEMPT_DYNAMIC Fri Mar 13 10:23:54 CET 2026
> [    8.684442] Command line: BOOT_IMAGE=/boot/vmlinuz-7.0.0-rc2-default+
> root=/dev/vda2 resume=/dev/disk/by-uuid/369c7453-3d16-409d-88b2-
> 5de027891a12 mitigations=auto nosplash earlycon=uart8250,io,0x3f8,115200
> console=ttyS0,115200 console=tty0 ignore_loglevel log_buf_len=1M
> crashkernel=512M,high crashkernel=72M,low
> [...]
> [    8.696633] earlycon: uart8250 at I/O port 0x3f8 (options '115200')
> [    8.696639] printk: legacy bootconsole [uart8250] enabled
> [    8.731303] printk: debug: ignoring loglevel setting.
> [    8.732349] NX (Execute Disable) protection: active
> [    8.733447] APIC: Static calls initialized
> [    8.734667] SMBIOS 2.8 present.
> [    8.735358] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-2-g4f253b9b-prebuilt.qemu.org 04/01/2014
> [    8.737285] DMI: Memory slots populated: 1/1
> [    8.738380] Hypervisor detected: KVM
> [    8.739151] last_pfn = 0x7ffdc max_arch_pfn = 0x400000000
> [    8.740254] kvm-clock: Using msrs 4b564d01 and 4b564d00
> [    8.732971] printk: local_clock() returned non-zero timestamp: 3486 nsec
> 
> End of stage A
> 
> This is the point where printk() started storing the values from
> local_clock() instead of cycles.
> 
> Start of stage B
> 
> [    8.732971] kvm-clock: using sched offset of 252367014082295 cycles
> [    8.735471] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
> [    8.738880] tsc: Detected 3293.776 MHz processor
> [    8.740474] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
> [...]
> [    8.932671] rcu: srcu_init: Setting srcu_struct sizes based on contention.
> [    8.934047] Early printk times: started callibration: 201079507 ns
> 
> End of stage B
> 
> This is where we started calibration of early cycles.
> 
> Start of stage C
> 
> [    8.935571] Console: colour dummy device 80x25
> [...]
> [    9.289673] thermal_sys: Registered thermal governor 'fair_share'
> [    9.290077] thermal_sys: Registered thermal governor 'bang_bang'
> [    9.292290] thermal_sys: Registered thermal governor 'step_wise'
> [    9.293530] thermal_sys: Registered thermal governor 'user_space'
> [    9.294856] cpuidle: using governor ladder
> [    9.296302] cpuidle: using governor menu
> 
> Here the thermal governors are registered. I guess that they might
> reduce speed of some HW.
> 
> [...]
> [   11.974147] clk: Disabling unused clocks
> 
> Some unused clocks are disabled. I wonder if this might affect
> counting the cycles.
> 
> [   12.330852] Freeing unused kernel image (rodata/data gap) memory: 1500K
> [   12.351191] Early printk times: mult=19634245, shift=26, offset=8732967929 ns
> 
> End of stage C
> 
> Here is the end on calibration.
> 
> Now, if the frequence of the cycles was:
> 
>    + was higher in the stage A when only cycles were stored
>    + was lower in stage C when it was calibrated against local_clock()
> 
> Then it might result in higher (calibrated) timestamps in stage A
> and step back in stage B.
> 
> Or something like this. It is possible that even local_clock() does
> not have a stable frequence during the early boot.

In my VM on Hyper-V, I do see a problem with the results of your
calibration code. Over the calibration interval, you calculate the delta
number of nanoseconds from local_clock() and the delta number of
TSC cycles. The delta TSC cycles divided by the delta nanoseconds
should yield the TSC frequency. But the result of your calibration code
is about 3.05 cycles/nsec, when the actual TSC frequency is 2.918
cycles/nsec for the hardware I'm running on.
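
As a rough illustration of the check described above, here is a minimal
userspace sketch. The helper names are hypothetical and are not part of the
patch, and the second helper assumes that the printed mult/shift pair follows
the usual convention ns = (cycles * mult) >> shift.

```c
#include <stdint.h>

/*
 * Implied counter frequency (cycles per nanosecond) over one calibration
 * window. Hypothetical helper for illustration; not part of the patch.
 */
static double cycles_per_ns(uint64_t start_cycles, uint64_t end_cycles,
			    uint64_t start_ns, uint64_t end_ns)
{
	uint64_t delta_cycles = end_cycles - start_cycles;
	uint64_t delta_ns = end_ns - start_ns;

	if (delta_ns == 0)
		return 0.0;	/* window too short to say anything */
	return (double)delta_cycles / (double)delta_ns;
}

/*
 * Frequency implied by a (mult, shift) pair, ASSUMING the convention
 * ns = (cycles * mult) >> shift, i.e. one cycle lasts mult / 2^shift ns.
 */
static double mult_shift_cycles_per_ns(uint32_t mult, uint32_t shift)
{
	if (mult == 0)
		return 0.0;
	return (double)((uint64_t)1 << shift) / (double)mult;
}
```

Under that assumption, the values in the KVM log quoted above (mult=19634245,
shift=26) work out to roughly 3.418 cycles/nsec, while the same log reports a
3293.776 MHz TSC, so the same comparison can be made there.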

In a Linux VM where CONFIG_PARAVIRT=y, local_clock() eventually
comes down to native_sched_clock(), which just reads the TSC and
then converts to nanoseconds based on the kernel's understanding
of the TSC frequency. So I don't think the local_clock() frequency is
varying. But I'm thinking there are some adjustments being made
to the value returned by local_clock() during early initialization, and
I didn't try to track those down.

Hyper-V provides guest VMs with a synthetic clock (that is based
on the TSC). As an experiment, I used that clock in the early time
calibration, and everything worked properly. The calibration code
produced delta nanoseconds and delta cycles that were exactly
2.918 cycles/nsec, and the transition from Stage A to Stage B was
correct -- no cases of a smaller timestamp following a larger
timestamp. So my conclusion is that the calibration is indeed
problematic, though I haven't identified why the nanoseconds
delta from local_clock() is smaller than it should be.

> 
> Idea: A solution might be to start calibration when printk()
>       gets first non-zero time from local_clock.
> 
>       Something like:
> 
> diff --git a/include/linux/early_times.h b/include/linux/early_times.h
> index 05388dcb8573..09d278996184 100644
> --- a/include/linux/early_times.h
> +++ b/include/linux/early_times.h
> @@ -16,10 +16,13 @@ extern u64 start_ns;
>  extern u32 early_mult, early_shift;
>  extern u64 early_ts_offset;
> 
> -static inline void early_times_start_calibration(void)
> +static inline void early_times_may_start_calibration(u64 ts_ns)
>  {
> +	if (start_ns)
> +		return;
> +
> +	start_ns = ts_ns;
>  	start_cycles = get_cycles();
> -	start_ns = local_clock();
>  }
> 
>  static inline void early_times_finish_calibration(void)
> diff --git a/init/main.c b/init/main.c
> index 27835270dfb5..a333b0da69cf 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1123,9 +1123,6 @@ void start_kernel(void)
>  	timekeeping_init();
>  	time_init();
> 
> -	/* This must be after timekeeping is initialized */
> -	early_times_start_calibration();
> -
>  	/* This must be after timekeeping is initialized */
>  	random_init();
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 774ffb1fa5ac..19330b6b4eb2 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2306,7 +2306,9 @@ int vprintk_store(int facility, int level,
>  	 * timestamp with respect to the caller.
>  	 */
>  	ts_nsec = local_clock();
> -	if (!ts_nsec)
> +	if (ts_nsec)
> +		early_times_may_start_calibration(ts_nsec);
> +	else
>  		ts_nsec = early_cycles();
> 
>  	caller_id = printk_caller_id();
> 
> 
> > >
> > > 2) A Linux VM running in the Azure cloud is also running on Hyper-V. Such a
> > > VM typically uses cloud-init to set everything up at boot time, and cloud-init
> > > is outputting lines to the serial console with a timestamp that looks like the
> > > printk() timestamp, but apparently is not adjusted for the early timestamping
> > > that this patch adds. Again, I haven't debugged what's going on -- I'm not
> > > immediately sure of the mechanism that cloud-init uses to do output to the
> > > serial console. The use of the Hyper-V synthetic clock source might the cause
> > > of the problem here as well. Here's an output snippet from the serial console:
> > >
> > > [   20.330414] systemd[1]: Condition check resulted in OpenVSwitch configuration for cleanup being skipped.
> > > [   20.332911] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
> > > [   20.333257] pstore: Registered efi_pstore as persistent store backend
> > > [   20.334360] systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
> > > [   20.338319] systemd[1]: Starting Load Kernel Modules...
> > > [   20.341094] systemd[1]: Starting Remount Root and Kernel File Systems...
> > > [   20.350993] systemd[1]: Starting udev Coldplug all Devices...
> > > [   20.356255] systemd[1]: Starting Uncomplicated firewall...
> > > [   20.361536] systemd[1]: Started Journal Service.
> > > [   20.386902] EXT4-fs (sda1): re-mounted c02dce0c-0c40-4e6e-88af-c5a0987b0adb r/w.
> > > [   22.532033] /dev/sr0: Can't lookup blockdev
> > > [    7.955973] cloud-init[783]: Cloud-init v. 24.3.1-0ubuntu0~20.04.1 running 'init-local' at Wed, 11 Mar 2026 15:27:06 +0000. Up 7.48 seconds.
> > > [    9.933120] cloud-init[822]: Cloud-init v. 24.3.1-0ubuntu0~20.04.1 running 'init' at Wed, 11 Mar 2026 15:27:08 +0000. Up 9.82 seconds.
> > > [    9.935483] cloud-init[822]: ci-info: ++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++
> > > [    9.937726] cloud-init[822]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
> > > [    9.939905] cloud-init[822]: ci-info: | Device |  Up  |           Address           |      Mask | Scope  |     Hw-Address    |
> > > [    9.942059] cloud-init[822]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+

The cloud-init output to the serial console is coming from syslog, which is
presumably writing directly to /dev/console. This output is also recorded
in the syslog log files (/var/log/syslog in my Ubuntu system), though with
timestamps in a text format like

2026-03-14T08:44:24.781241-07:00

The timestamps recorded in /var/log/syslog, and as shown with the
'journalctl' command, are monotonic using that full date/time format. But
'journalctl' with the "-o short-monotonic" option shows the seconds-since-boot
format, and in that case, the cloud-init timestamps are discontinuous with the
kernel messages, like in the serial console output. I don't know exactly where
journalctl gets its knowledge of the boot time, but among the possibilities are:

/proc/uptime
/proc/stat (the "btime" field)

These are not adjusted for using early boot times. And it's not clear whether
they should be -- I don't know what the big picture implications would be.
And there are probably other places the boot time is available to user space.
If using early boot times is intended to be only for occasional diagnostic use,
then maybe living with the discontinuity is OK. I see that Shashank Balaji has
also commented about userspace issues, which covers this syslog case.
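
For reference, a small sketch of how user space could read the two boot-time
sources listed above. It only parses text that the caller has already read
from /proc/stat and /proc/uptime; the function names are made up for this
example, and nothing here is known to be what journalctl actually does.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the "btime" value (boot time in seconds since the epoch) from the
 * text of /proc/stat, or -1 if no such line is found.
 */
static long long parse_btime(const char *stat_text)
{
	const char *line = stat_text;

	while (line && *line) {
		long long btime;

		/* Only a line starting with the literal "btime" matches. */
		if (sscanf(line, "btime %lld", &btime) == 1)
			return btime;
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline to the next line */
	}
	return -1;
}

/*
 * Return the first field of /proc/uptime (seconds since boot),
 * or -1.0 if the text cannot be parsed.
 */
static double parse_uptime(const char *uptime_text)
{
	double up;

	if (sscanf(uptime_text, "%lf", &up) != 1)
		return -1.0;
	return up;
}
```

Neither value knows about the early cycle-counter offset, which is consistent
with the discontinuity described above.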

I think all this gives a first-level explanation of what I'm seeing in a
Hyper-V guest. I don't think any of it is specific to Hyper-V guests
or the Hyper-V synthetic clock sources. The issues are more generic.
Sorry. :-(

Michael

> 
> This is more complicated. I wonder if the timestamps from cloud-init()
> are somehow synchronized with local_clock().
> 
> We might need to synchronize local_clock() with the cycles as well.
> But there is the chicken&egg problem. We need:
> 
>     + to know the offset caused by cycles when local_clock() gets initialized.
>     + local_clock() running for some time to calibrate cycles.
> 
> Hmm, I see that time-management people are not in Cc. We should have
> added them since v1.
> 
> I add them now. Better late than never.
> 
> Best Regards,
> Petr


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2026-03-14 16:15         ` Michael Kelley
@ 2026-03-24 19:47           ` Bird, Tim
  2026-03-26  9:24             ` John Ogness
  0 siblings, 1 reply; 36+ messages in thread
From: Bird, Tim @ 2026-03-24 19:47 UTC (permalink / raw)
  To: Michael Kelley, Petr Mladek, Shashank Balaji
  Cc: rostedt@goodmis.org, senozhatsky@chromium.org, francesco@valla.it,
	geert@linux-m68k.org, linux-embedded@vger.kernel.org,
	linux-kernel@vger.kernel.org, Thomas Gleixner, John Stultz,
	Stephen Boyd, John Ogness

Sorry for the slow response.  See my response and plan inline below.

> -----Original Message-----
> From: Michael Kelley <mhklinux@outlook.com>
> 
> From: Petr Mladek <pmladek@suse.com> Sent: Friday, March 13, 2026 3:45 AM
> >
> > Finally added timekeeping maintainers and John into Cc.
> > We should have added time since v1.
> >
> > Anyway, you might see the entire history at
> > https://lore.kernel.org/all/39b09edb-8998-4ebd-a564-7d594434a981@bird.org/
> >
> > On Fri 2026-03-13 04:52:40, Bird, Tim wrote:
> > > Hey Micheal,
> > >
> > > This report is very interesting.
> > >
> > > Thanks very much for trying it out!
> > >
> > > > -----Original Message-----
> > > > From: Michael Kelley <mhklinux@outlook.com>
> > > > Sent: Wednesday, March 11, 2026 9:47 AM
> > > > From: Tim Bird <tim.bird@sony.com> Sent: Tuesday, February 10, 2026 3:48 PM
> > > > >
> > > > > During early boot, printk timestamps are reported as zero before
> > > > > kernel timekeeping starts (e.g. before time_init()).  This
> > > > > hinders boot-time optimization efforts.  This period is about 400
> > > > > milliseconds for many current desktop and embedded machines
> > > > > running Linux.
> > > > >
> > > > > Add support to save cycles during early boot, and output correct
> > > > > timestamp values after timekeeping is initialized.  get_cycles()
> > > > > is operational on arm64 and x86_64 from kernel start.  Add code
> > > > > and variables to save calibration values used to later convert
> > > > > cycle counts to time values in the early printks.  Add a config
> > > > > to control the feature.
> > > > >
> > > > > This yields non-zero timestamps for printks from the very start
> > > > > of kernel execution.  The timestamps are relative to the start of
> > > > > the architecture-specified counter used in get_cycles
> > > > > (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > > > >
> > > > > All timestamps reflect time from processor power-on instead of
> > > > > time from the kernel's timekeeping initialization.
> > > >
> > > > I tried this patch in linux-next20260302 kernel running as a guest VM
> > > > on a Hyper-V host. Two things:
> > > >
> > > > 1) In the dmesg output, I'm seeing a place where the timestamps briefly go
> > > > backwards -- i.e., they are not monotonically increasing. Here's a snippet,
> > > > where there's a smaller timestamp immediately after the tsc is detected:
> > > >
> > > > [   27.994891] SMBIOS 3.1.0 present.
> > > > [   27.994893] DMI: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025
> > > > [   27.994898] DMI: Memory slots populated: 2/2
> > > > [   27.995202] Hypervisor detected: Microsoft Hyper-V
> > > > [   27.995205] Hyper-V: privilege flags low 0xae7f, high 0x3b8030, ext 0x62, hints 0xa0e24, misc 0xe0bed7b2
> > > > [   27.995208] Hyper-V: Nested features: 0x0
> > > > [   27.995209] Hyper-V: LAPIC Timer Frequency: 0xc3500
> > > > [   27.995210] Hyper-V: Using hypercall for remote TLB flush
> > > > [   27.995216] clocksource: hyperv_clocksource_tsc_page: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
> > > > [   27.995218] clocksource: hyperv_clocksource_msr: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
> > > > [   27.995220] tsc: Detected 2918.401 MHz processor
> > >
> > > I wonder if the tsc is getting fiddled with or virtualized somewhere
> > > in here, as part of clocksource initialization. I believe each clocksource in
> > > the kernel maintains it's own internal offset, and maybe the offset that is
> > > being used ends up being slightly different from the cycle-counter offset
> > > that the early_times feature uses. I'm just throwing out guesses.  It's about
> > > a 4ms delta, which is pretty big.
> 
> I'm fairly certain the TSC frequency is not being fiddled with. In a guest VM on
> Hyper-V, the x86 instruction to read the TSC executes directly in hardware and
> is not virtualized. There *is* per-VM scaling of the TSC value to handle live
> migrations across virtualization hosts with different TSC frequencies, but that's
> not in play during my experiments.

OK - thanks.  I suspected this was the case.
> 
> > >
> > > > [   27.991060] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
> > > > [   27.991062] e820: remove [mem 0x000a0000-0x000fffff] System RAM
> > > > [   27.991064] last_pfn = 0x210000 max_arch_pfn = 0x400000000
> > > > [   27.991065] x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.
> > > > [   27.991066] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC
> >
> > I wonder how the calibration of the cycles is precise. I wonder if
> > the problem might be that cycles were faster right after boot than
> > later during the calibration.
> >
> > I added the following debug output on top of this patch:
> >
> > diff --git a/include/linux/early_times.h b/include/linux/early_times.h
> > index 05388dcb8573..cdb467345bcc 100644
> > --- a/include/linux/early_times.h
> > +++ b/include/linux/early_times.h
> > @@ -20,6 +20,7 @@ static inline void early_times_start_calibration(void)
> >  {
> >  	start_cycles = get_cycles();
> >  	start_ns = local_clock();
> > +	pr_info("Early printk times: started callibration: %llu ns\n", start_ns);
> >  }
> >
> >  static inline void early_times_finish_calibration(void)
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 774ffb1fa5ac..836cb03aaa6d 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2308,6 +2308,8 @@ int vprintk_store(int facility, int level,
> >  	ts_nsec = local_clock();
> >  	if (!ts_nsec)
> >  		ts_nsec = early_cycles();
> > +	else
> > +		pr_info_once("local_clock() returned non-zero timestamp: %llu nsec\n", ts_nsec);
> >
> >  	caller_id = printk_caller_id();
> >
> >
> > And it produced in my kvm:
> >
> > Let's say that start of the cycle counter is
> >
> > Start of stage A
> >
> > [    8.684438] Linux version 7.0.0-rc2-default+ (pmladek@pathway) (gcc (SUSE Linux)
> > 15.2.1 20260202, GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.45.0.20251103-2)
> > #571 SMP PREEMPT_DYNAMIC Fri Mar 13 10:23:54 CET 2026
> > [    8.684442] Command line: BOOT_IMAGE=/boot/vmlinuz-7.0.0-rc2-default+
> > root=/dev/vda2 resume=/dev/disk/by-uuid/369c7453-3d16-409d-88b2-
> > 5de027891a12 mitigations=auto nosplash earlycon=uart8250,io,0x3f8,115200
> > console=ttyS0,115200 console=tty0 ignore_loglevel log_buf_len=1M
> > crashkernel=512M,high crashkernel=72M,low
> > [...]
> > [    8.696633] earlycon: uart8250 at I/O port 0x3f8 (options '115200')
> > [    8.696639] printk: legacy bootconsole [uart8250] enabled
> > [    8.731303] printk: debug: ignoring loglevel setting.
> > [    8.732349] NX (Execute Disable) protection: active
> > [    8.733447] APIC: Static calls initialized
> > [    8.734667] SMBIOS 2.8 present.
> > [    8.735358] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-2-g4f253b9b-prebuilt.qemu.org 04/01/2014
> > [    8.737285] DMI: Memory slots populated: 1/1
> > [    8.738380] Hypervisor detected: KVM
> > [    8.739151] last_pfn = 0x7ffdc max_arch_pfn = 0x400000000
> > [    8.740254] kvm-clock: Using msrs 4b564d01 and 4b564d00
> > [    8.732971] printk: local_clock() returned non-zero timestamp: 3486 nsec
> >
> > End of stage A
> >
> > This is the point where printk() started storing the values from
> > local_clock() instead of cycles.
> >
> > Start of stage B
> >
> > [    8.732971] kvm-clock: using sched offset of 252367014082295 cycles
> > [    8.735471] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
> > [    8.738880] tsc: Detected 3293.776 MHz processor
> > [    8.740474] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
> > [...]
> > [    8.932671] rcu: srcu_init: Setting srcu_struct sizes based on contention.
> > [    8.934047] Early printk times: started callibration: 201079507 ns
> >
> > End of stage B
> >
> > This is where we started calibration of early cycles.
> >
> > Start of stage C
> >
> > [    8.935571] Console: colour dummy device 80x25
> > [...]
> > [    9.289673] thermal_sys: Registered thermal governor 'fair_share'
> > [    9.290077] thermal_sys: Registered thermal governor 'bang_bang'
> > [    9.292290] thermal_sys: Registered thermal governor 'step_wise'
> > [    9.293530] thermal_sys: Registered thermal governor 'user_space'
> > [    9.294856] cpuidle: using governor ladder
> > [    9.296302] cpuidle: using governor menu
> >
> > Here the thermal governors are registered. I guess that they might
> > reduce speed of some HW.
> >
> > [...]
> > [   11.974147] clk: Disabling unused clocks
> >
> > Some unused clocks are disabled. I wonder if this might affect
> > counting the cycles.
> >
> > [   12.330852] Freeing unused kernel image (rodata/data gap) memory: 1500K
> > [   12.351191] Early printk times: mult=19634245, shift=26, offset=8732967929 ns
> >
> > End of stage C
> >
> > Here is the end on calibration.
> >
> > Now, if the frequence of the cycles was:
> >
> >    + was higher in the stage A when only cycles were stored
> >    + was lower in stage C when it was calibrated against local_clock()
> >
> > Then it might result in higher (calibrated) timestamps in stage A
> > and step back in stage B.
> >
> > Or something like this. It is possible that even local_clock() does
> > not have a stable frequence during the early boot.
> 
> In my VM on Hyper-V, I do see a problem with the results of your
> calibration code. Over the calibration interval, you calculate the delta
> number of nanoseconds from local_clock() and the delta number of
> TSC cycles. The delta TSC cycles divided by the delta nanoseconds
> should yield the TSC frequency. But the result of your calibration code
> is about 3.05 cycles/nsec, when the actual TSC frequency is 2.918
> cycles/nsec for the hardware I'm running on.
> 
> In a Linux VM where CONFIG_PARAVIRT=y, local_clock() eventually
> comes down to native_sched_clock(), which just reads the TSC and
> then converts to nanoseconds based on the kernel's understanding
> of the TSC frequency. So I don't think the local_clock() frequency is
> varying. But I'm thinking there are some adjustments being made
> to the value returned by local_clock() during early initialization, and
> I didn't try to track those down.

Thanks very much for this data point and information!

> 
> Hyper-V provides guest VMs with a synthetic clock (that is based
> on the TSC). As an experiment, I used that clock in the early time
> calibration, and everything worked properly. The calibration code
> produced delta nanoseconds and delta cycles that were exactly
> 2.918 cycles/nsec, and the transition from Stage A to Stage B was
> correct -- no cases of a smaller timestamp following a larger
> timestamp. So my conclusion is that the calibration is indeed
> problematic, though I haven't identified why the nanoseconds
> delta from local_clock() is smaller than it should be.

I'll do a deep dive on my calibration math and see if I can
figure out the problem.  One guess is that it could be a precision
error caused by too little time having accumulated in local_clock()
before the start of calibration.  I plan to move the calibration start
around, and to compare my calibration math with the conversion done by
local_clock(), to try to figure out the discrepancy.
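
To make the precision guess concrete, here is a minimal sketch of how a
multiplier could be derived from one calibration window, and how endpoint
error scales with the window length. Both helpers are hypothetical and
written for illustration only; they are not the code in the patch, and the
fixed-shift approach is an assumption.

```c
#include <stdint.h>

/*
 * Compute mult such that ns ~= (cycles * mult) >> shift, from the deltas
 * measured over one calibration window. Assumes delta_ns << shift fits in
 * 64 bits (true for windows of a few minutes with shift = 26).
 */
static uint32_t calc_mult(uint64_t delta_cycles, uint64_t delta_ns,
			  uint32_t shift)
{
	if (delta_cycles == 0)
		return 0;
	return (uint32_t)((delta_ns << shift) / delta_cycles);
}

/*
 * Worst-case relative error of delta_ns if each of the two endpoints may
 * be off by up to err_ns. The bound shrinks as the window grows.
 */
static double window_rel_error(uint64_t delta_ns, uint64_t err_ns)
{
	if (delta_ns == 0)
		return 1.0;
	return (2.0 * (double)err_ns) / (double)delta_ns;
}
```

Moving the calibration start changes delta_ns, so comparing the resulting
mult values across a few start points is one way to see whether the window
length matters.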
 
> >
> > Idea: A solution might be to start calibration when printk()
> >       gets first non-zero time from local_clock.
> >
> >       Something like:
> >
> > diff --git a/include/linux/early_times.h b/include/linux/early_times.h
> > index 05388dcb8573..09d278996184 100644
> > --- a/include/linux/early_times.h
> > +++ b/include/linux/early_times.h
> > @@ -16,10 +16,13 @@ extern u64 start_ns;
> >  extern u32 early_mult, early_shift;
> >  extern u64 early_ts_offset;
> >
> > -static inline void early_times_start_calibration(void)
> > +static inline void early_times_may_start_calibration(u64 ts_ns)
> >  {
> > +	if (start_ns)
> > +		return;
> > +
> > +	start_ns = ts_ns;
> >  	start_cycles = get_cycles();
> > -	start_ns = local_clock();
> >  }
> >
> >  static inline void early_times_finish_calibration(void)
> > diff --git a/init/main.c b/init/main.c
> > index 27835270dfb5..a333b0da69cf 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -1123,9 +1123,6 @@ void start_kernel(void)
> >  	timekeeping_init();
> >  	time_init();
> >
> > -	/* This must be after timekeeping is initialized */
> > -	early_times_start_calibration();
> > -
> >  	/* This must be after timekeeping is initialized */
> >  	random_init();
> >
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 774ffb1fa5ac..19330b6b4eb2 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2306,7 +2306,9 @@ int vprintk_store(int facility, int level,
> >  	 * timestamp with respect to the caller.
> >  	 */
> >  	ts_nsec = local_clock();
> > -	if (!ts_nsec)
> > +	if (ts_nsec)
> > +		early_times_may_start_calibration(ts_nsec);
> > +	else
> >  		ts_nsec = early_cycles();
> >
> >  	caller_id = printk_caller_id();
> >
> >
> > > >
> > > > 2) A Linux VM running in the Azure cloud is also running on Hyper-V. Such a
> > > > VM typically uses cloud-init to set everything up at boot time, and cloud-init
> > > > is outputting lines to the serial console with a timestamp that looks like the
> > > > printk() timestamp, but apparently is not adjusted for the early timestamping
> > > > that this patch adds. Again, I haven't debugged what's going on -- I'm not
> > > > immediately sure of the mechanism that cloud-init uses to do output to the
> > > > serial console. The use of the Hyper-V synthetic clock source might the cause
> > > > of the problem here as well. Here's an output snippet from the serial console:
> > > >
> > > > [   20.330414] systemd[1]: Condition check resulted in OpenVSwitch configuration for cleanup being skipped.
> > > > [   20.332911] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
> > > > [   20.333257] pstore: Registered efi_pstore as persistent store backend
> > > > [   20.334360] systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
> > > > [   20.338319] systemd[1]: Starting Load Kernel Modules...
> > > > [   20.341094] systemd[1]: Starting Remount Root and Kernel File Systems...
> > > > [   20.350993] systemd[1]: Starting udev Coldplug all Devices...
> > > > [   20.356255] systemd[1]: Starting Uncomplicated firewall...
> > > > [   20.361536] systemd[1]: Started Journal Service.
> > > > [   20.386902] EXT4-fs (sda1): re-mounted c02dce0c-0c40-4e6e-88af-c5a0987b0adb r/w.
> > > > [   22.532033] /dev/sr0: Can't lookup blockdev
> > > > [    7.955973] cloud-init[783]: Cloud-init v. 24.3.1-0ubuntu0~20.04.1 running 'init-local' at Wed, 11 Mar 2026 15:27:06 +0000. Up 7.48 seconds.
> > > > [    9.933120] cloud-init[822]: Cloud-init v. 24.3.1-0ubuntu0~20.04.1 running 'init' at Wed, 11 Mar 2026 15:27:08 +0000. Up 9.82 seconds.
> > > > [    9.935483] cloud-init[822]: ci-info: ++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++
> > > > [    9.937726] cloud-init[822]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
> > > > [    9.939905] cloud-init[822]: ci-info: | Device |  Up  |           Address           |      Mask | Scope  |     Hw-Address    |
> > > > [    9.942059] cloud-init[822]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
> 
> The cloud-init output to the serial console is coming from syslog, which is
> presumably writing directly to /dev/console. This output is also recorded
> in the syslog log files (/var/log/syslog in my Ubuntu system), though with
> timestamps in a text format like
> 
> 2026-03-14T08:44:24.781241-07:00
> 
> The timestamps recorded in /var/log/syslog, and as shown with the
> 'journalctl' command, are monotonic using that full date/time format. But
> 'journalctl' with the "-o short-monotonic" option shows the seconds-since-boot
> format, and in that case, the cloud-init timestamps are discontinuous with the
> kernel messages, like in the serial console output. I don't know exactly where
> journalctl gets its knowledge of the boot time, but among the possibilities are:
> 
> /proc/uptime
> /proc/stat (the "btime" field)
> 
> These are not adjusted for using early boot times. And it's not clear whether
> they should be -- I don't know what the big picture implications would be.
> And there are probably other places the boot time is available to user space.
> If using early boot times is intended to be only for occasional diagnostic use,
> then maybe living with the discontinuity is OK. I see that Shashank Balaji has
> also commented about userspace issues, which covers this syslog case.
> 
> I think all this gives a first-level explanation of what I'm seeing in a
> Hyper-V guest. I don't think any of it is specific to Hyper-V guests
> or the Hyper-V synthetic clock sources. The issues are more generic.
> Sorry. :-(

Yeah - I was worried about other timestamps not being in synchronization
with my adjusted printk timestamps.  Looks like that worry was justified.

At this point, there are two avenues:

 1) double down and embed the offset (from power-on rather than from time_init())
    all the way into local_clock() and/or whatever is providing CLOCK_MONOTONIC
    and CLOCK_BOOTTIME, or

 2) back off, and abandon adding the offset to local_clock()-based printk timestamps.
    This would leave a discontinuity, when EARLY_PRINTK_TIMES is enabled,
    between the (now) non-zero early printk timestamps and the ones following
    time_init().  This has the benefit of changing less code, and of affecting only
    the early printk timestamps (and none of the rest of the system).  It has
    the downside of leaving a possibly confusing time discontinuity
    early in the kernel log.  So far, I haven't seen any tools confused by this, and
    I can put a message before time_init() to inform humans about the switch.

The purpose of this patch is really focused on that early period of boot, where all
other timing and tracing mechanisms are unavailable, and limiting the impact to
just those early (currently zero) timestamps seems like the best course.
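
A rough userspace sketch of what option 2 amounts to: once local_clock()
returns non-zero, its value is used unchanged, and only the earlier entries
get a time converted from the cycle counter. The function names and the
split multiplication are assumptions made for this illustration (the real
code would live in printk and use kernel helpers), and mult/shift are
assumed to be already known.

```c
#include <stdint.h>

/*
 * (cycles * mult) >> shift without overflowing 64 bits in the
 * intermediate product. Assumes shift <= 32.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	uint64_t lo = (uint32_t)cycles;	/* low 32 bits of the counter */
	uint64_t hi = cycles >> 32;	/* high 32 bits of the counter */
	uint64_t ns = (lo * mult) >> shift;

	if (hi)
		ns += (hi * mult) << (32 - shift);
	return ns;
}

/*
 * Option 2: no offset is added to local_clock() values. Timestamps from
 * before timekeeping starts are derived from the cycle counter alone, so
 * the log shows a step where the two time bases meet.
 */
static uint64_t printk_ts_option2(uint64_t local_ns, uint64_t cycles,
				  uint32_t mult, uint32_t shift)
{
	if (local_ns)
		return local_ns;
	return cycles_to_ns(cycles, mult, shift);
}
```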

Of course I still need to get the calibration correct, and I'll work on that
before sending another update.
 -- Tim

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2026-03-24 19:47           ` Bird, Tim
@ 2026-03-26  9:24             ` John Ogness
  0 siblings, 0 replies; 36+ messages in thread
From: John Ogness @ 2026-03-26  9:24 UTC (permalink / raw)
  To: Bird, Tim, Michael Kelley, Petr Mladek, Shashank Balaji
  Cc: rostedt@goodmis.org, senozhatsky@chromium.org, francesco@valla.it,
	geert@linux-m68k.org, linux-embedded@vger.kernel.org,
	linux-kernel@vger.kernel.org, Thomas Gleixner, John Stultz,
	Stephen Boyd

Hi Tim,

On 2026-03-24, "Bird, Tim" <Tim.Bird@sony.com> wrote:
> Yeah - I was worried about other timestamps not being in synchronization
> with my adjusted printk timestamps.  Looks like that worry was justified.
>
> At this point, there are two avenues:
>
>  1) double-down and embed the offset (from power-on rather than from time_init)
>     all the way into local_clock() and/or whatever is providing CLOCK_MONOTONIC
>     and CLOCK_BOOTTIME, or
>
>  2) back off, and abandon adding the offset to local_clock()-based printk timestamps.
>     This would leave a discontinuity when EARLY_PRINTK_TIMES was enabled,
>     between the (now) non-zero early printk timestamps and the ones following
>     time_init().  This has the benefit of changing less code, and only affecting
>     the early printk timestamps (and none of the rest of the system).  And it has
>     the downside of leaving the possibly confusing time discontinuity
>     early in the kernel log.  So far, I haven't seen any tools confused by this, and
>     I can put a message before time_init() to inform humans about the switch.
>
> The purpose of this patch is really focused on that early period of boot, where all
> other timing and tracing mechanisms are unavailable, and limiting the impact to
> just those early (currently zero) timestamps seems like the best course.

I have always felt that printk timestamping should be using a
userspace-accessible clock, such as CLOCK_MONOTONIC, rather than the CPU
local clock. This simplifies applications coordinating their own logs
with raw kernel logs.

I was wondering if your pre-boot timing could be used as the init values
for CLOCK_MONOTONIC, so that CLOCK_MONOTONIC is a clean continuation of
your pre-boot clocking.

And then we could use this opportunity to switch printk to
CLOCK_MONOTONIC.

This might also make sense if initializing CLOCK_MONOTONIC is somehow
more straightforward than tracking an extra CPU local clock diff.
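
To illustrate the userspace side of this, here is a minimal sketch of an
application stamping its own log lines from CLOCK_MONOTONIC in the same
bracketed seconds.microseconds layout seen in the kernel log. The helper
names are invented for the example; clock_gettime() and CLOCK_MONOTONIC are
the standard POSIX interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds, or 0 on failure. */
static uint64_t monotonic_ns(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return 0;
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Format a log prefix as "[seconds.microseconds]" with the seconds
 * right-aligned in five columns, mirroring the layout of the dmesg lines
 * quoted in this thread. Returns what snprintf() returns.
 */
static int format_prefix(char *buf, size_t len, uint64_t ns)
{
	return snprintf(buf, len, "[%5llu.%06llu]",
			(unsigned long long)(ns / 1000000000ULL),
			(unsigned long long)((ns % 1000000000ULL) / 1000));
}
```

If printk used the same clock, prefixes produced this way would line up
directly with raw kernel log entries.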

John Ogness

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2026-03-13 10:45       ` Petr Mladek
  2026-03-14 14:16         ` Shashank Balaji
  2026-03-14 16:15         ` Michael Kelley
@ 2026-03-20 18:15         ` Bird, Tim
  2 siblings, 0 replies; 36+ messages in thread
From: Bird, Tim @ 2026-03-20 18:15 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Michael Kelley, rostedt@goodmis.org, senozhatsky@chromium.org,
	francesco@valla.it, geert@linux-m68k.org,
	linux-embedded@vger.kernel.org, linux-kernel@vger.kernel.org,
	Thomas Gleixner, John Stultz, Stephen Boyd, John Ogness



> -----Original Message-----
> From: Petr Mladek <pmladek@suse.com>
> Finally added timekeeping maintainers and John into Cc.
> We should have added time since v1.

Thanks for catching that.  I'll try to remember to add those (and use John Ogness' correct
email) going forward.

> 
> Anyway, you might see the entire history at
> https://lore.kernel.org/all/39b09edb-8998-4ebd-a564-7d594434a981@bird.org/
> 
> 
> On Fri 2026-03-13 04:52:40, Bird, Tim wrote:
> > Hey Micheal,
> >
> > This report is very interesting.
> >
> > Thanks very much for trying it out!
> >
> > > -----Original Message-----
> > > From: Michael Kelley <mhklinux@outlook.com>
> > > Sent: Wednesday, March 11, 2026 9:47 AM
> > > From: Tim Bird <tim.bird@sony.com> Sent: Tuesday, February 10, 2026 3:48 PM
> > > >
> > > > During early boot, printk timestamps are reported as zero before
> > > > kernel timekeeping starts (e.g. before time_init()).  This
> > > > hinders boot-time optimization efforts.  This period is about 400
> > > > milliseconds for many current desktop and embedded machines
> > > > running Linux.
> > > >
> > > > Add support to save cycles during early boot, and output correct
> > > > timestamp values after timekeeping is initialized.  get_cycles()
> > > > is operational on arm64 and x86_64 from kernel start.  Add code
> > > > and variables to save calibration values used to later convert
> > > > cycle counts to time values in the early printks.  Add a config
> > > > to control the feature.
> > > >
> > > > This yields non-zero timestamps for printks from the very start
> > > > of kernel execution.  The timestamps are relative to the start of
> > > > the architecture-specified counter used in get_cycles
> > > > (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > > >
> > > > All timestamps reflect time from processor power-on instead of
> > > > time from the kernel's timekeeping initialization.
> > >
> > > I tried this patch in linux-next20260302 kernel running as a guest VM
> > > on a Hyper-V host. Two things:
> > >
> > > 1) In the dmesg output, I'm seeing a place where the timestamps briefly go
> > > backwards -- i.e., they are not monotonically increasing. Here's a snippet,
> > > where there's a smaller timestamp immediately after the tsc is detected:
> > >
> > > [   27.994891] SMBIOS 3.1.0 present.
> > > [   27.994893] DMI: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025
> > > [   27.994898] DMI: Memory slots populated: 2/2
> > > [   27.995202] Hypervisor detected: Microsoft Hyper-V
> > > [   27.995205] Hyper-V: privilege flags low 0xae7f, high 0x3b8030, ext 0x62, hints 0xa0e24, misc 0xe0bed7b2
> > > [   27.995208] Hyper-V: Nested features: 0x0
> > > [   27.995209] Hyper-V: LAPIC Timer Frequency: 0xc3500
> > > [   27.995210] Hyper-V: Using hypercall for remote TLB flush
> > > [   27.995216] clocksource: hyperv_clocksource_tsc_page: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
> > > [   27.995218] clocksource: hyperv_clocksource_msr: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
> > > [   27.995220] tsc: Detected 2918.401 MHz processor
> >
> > I wonder if the tsc is getting fiddled with or virtualized somewhere in here, as part of clocksource initialization.
> > I believe each clocksource in the kernel maintains its own internal offset, and maybe the offset that is
> > being used ends up being slightly different from the cycle-counter offset that the early_times feature uses.
> > I'm just throwing out guesses.  It's about a 4ms delta, which is pretty big.
> >
> > > [   27.991060] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
> > > [   27.991062] e820: remove [mem 0x000a0000-0x000fffff] System RAM
> > > [   27.991064] last_pfn = 0x210000 max_arch_pfn = 0x400000000
> > > [   27.991065] x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.
> > > [   27.991066] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC
> 
> I wonder how precise the calibration of the cycles is. I wonder if
> the problem might be that cycles were faster right after boot than
> later during the calibration.

On x86_64, the TSC has been stable on all processors produced since about
2008, when invariant TSCs were introduced.  They shouldn't be changing
speeds or doing weird things.

> 
> I added the following debug output on top of this patch:
> 
> diff --git a/include/linux/early_times.h b/include/linux/early_times.h
> index 05388dcb8573..cdb467345bcc 100644
> --- a/include/linux/early_times.h
> +++ b/include/linux/early_times.h
> @@ -20,6 +20,7 @@ static inline void early_times_start_calibration(void)
>  {
>  	start_cycles = get_cycles();
>  	start_ns = local_clock();
> +	pr_info("Early printk times: started callibration: %llu ns\n", start_ns);
>  }
> 
>  static inline void early_times_finish_calibration(void)
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 774ffb1fa5ac..836cb03aaa6d 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2308,6 +2308,8 @@ int vprintk_store(int facility, int level,
>  	ts_nsec = local_clock();
>  	if (!ts_nsec)
>  		ts_nsec = early_cycles();
> +	else
> +		pr_info_once("local_clock() returned non-zero timestamp: %llu nsec\n", ts_nsec);
> 
>  	caller_id = printk_caller_id();
> 
> 
> And it produced in my kvm:
> 
> Let's say that start of the cycle counter is
> 
> Start of stage A
> 
> [    8.684438] Linux version 7.0.0-rc2-default+ (pmladek@pathway) (gcc (SUSE Linux) 15.2.1 20260202, GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.45.0.20251103-2) #571 SMP PREEMPT_DYNAMIC Fri Mar 13 10:23:54 CET 2026
> [    8.684442] Command line: BOOT_IMAGE=/boot/vmlinuz-7.0.0-rc2-default+ root=/dev/vda2 resume=/dev/disk/by-uuid/369c7453-3d16-409d-88b2-5de027891a12 mitigations=auto nosplash earlycon=uart8250,io,0x3f8,115200 console=ttyS0,115200 console=tty0 ignore_loglevel log_buf_len=1M crashkernel=512M,high crashkernel=72M,low
> [...]
> [    8.696633] earlycon: uart8250 at I/O port 0x3f8 (options '115200')
> [    8.696639] printk: legacy bootconsole [uart8250] enabled
> [    8.731303] printk: debug: ignoring loglevel setting.
> [    8.732349] NX (Execute Disable) protection: active
> [    8.733447] APIC: Static calls initialized
> [    8.734667] SMBIOS 2.8 present.
> [    8.735358] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-2-g4f253b9b-prebuilt.qemu.org 04/01/2014
> [    8.737285] DMI: Memory slots populated: 1/1
> [    8.738380] Hypervisor detected: KVM
> [    8.739151] last_pfn = 0x7ffdc max_arch_pfn = 0x400000000
> [    8.740254] kvm-clock: Using msrs 4b564d01 and 4b564d00
> [    8.732971] printk: local_clock() returned non-zero timestamp: 3486 nsec
> 
> End of stage A
> 
> This is the point where printk() started storing the values from
> local_clock() instead of cycles.

Thanks for measuring this.

That's suspiciously close to the approximately 4ms difference that Michael saw.
It may be that my offset calculation is off.  Michael also mentioned that the
math seemed to be off on the calibration.  I'll review the math and run some tests
and see if I can figure out where the problem is.  Another option
is to just eliminate the early_ts_offset, and live with the discontinuity going
from cycle-based timestamps to local_clock()-based timestamps.
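
For anyone who wants to measure these backward steps from a saved dmesg,
here is a minimal user-space sketch.  The helper names are made up for
illustration and it assumes the usual "[seconds.microseconds]" printk prefix.
It only reports the size of the step between two adjacent lines; it says
nothing about the cause.

```c
#include <stdio.h>

/*
 * Parse the "[  sec.usec]" prefix of a dmesg line into microseconds.
 * Returns 0 on success, -1 if the line has no such prefix.
 * (Hypothetical helper, not part of the patch.)
 */
static int parse_dmesg_us(const char *line, long long *us)
{
	long long sec, frac;

	if (sscanf(line, "[ %lld.%6lld]", &sec, &frac) != 2)
		return -1;
	*us = sec * 1000000LL + frac;
	return 0;
}

/*
 * Size of the backward step, in microseconds, between two adjacent
 * dmesg lines.  Returns 0 if time did not go backwards and -1 if
 * either line could not be parsed.
 */
static long long backward_step_us(const char *prev, const char *next)
{
	long long a, b;

	if (parse_dmesg_us(prev, &a) || parse_dmesg_us(next, &b))
		return -1;
	return a > b ? a - b : 0;
}
```

Fed the two lines around the step in Michael's Hyper-V log (27.995220 and
27.991060) this gives 4160 us, and for the KVM log above (8.740254 and
8.732971) it gives 7283 us.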

> 
> Start of stage B
> 
> [    8.732971] kvm-clock: using sched offset of 252367014082295 cycles
> [    8.735471] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
> [    8.738880] tsc: Detected 3293.776 MHz processor
> [    8.740474] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
> [...]
> [    8.932671] rcu: srcu_init: Setting srcu_struct sizes based on contention.
> [    8.934047] Early printk times: started callibration: 201079507 ns
> 
> End of stage B
> 
> This is where we started calibration of early cycles.
> 
> Start of stage C
> 
> [    8.935571] Console: colour dummy device 80x25
> [...]
> [    9.289673] thermal_sys: Registered thermal governor 'fair_share'
> [    9.290077] thermal_sys: Registered thermal governor 'bang_bang'
> [    9.292290] thermal_sys: Registered thermal governor 'step_wise'
> [    9.293530] thermal_sys: Registered thermal governor 'user_space'
> [    9.294856] cpuidle: using governor ladder
> [    9.296302] cpuidle: using governor menu
> 
> Here the thermal governors are registered. I guess that they might
> reduce speed of some HW.
> 
> [...]
> [   11.974147] clk: Disabling unused clocks
> 
> Some unused clocks are disabled. I wonder if this might affect
> counting the cycles.
> 
> [   12.330852] Freeing unused kernel image (rodata/data gap) memory: 1500K
> [   12.351191] Early printk times: mult=19634245, shift=26, offset=8732967929 ns
> 
> End of stage C
> 
> Here is the end of calibration.
> 
> Now, if the frequency of the cycles:
> 
>    + was higher in the stage A when only cycles were stored
>    + was lower in stage C when it was calibrated against local_clock()
> 
> Then it might result in higher (calibrated) timestamps in stage A
> and step back in stage B.
> 
> Or something like this. It is possible that even local_clock() does
> not have a stable frequency during the early boot.

Most x86_64 platforms are using TSC as the basis for local_clock(), and it should
be stable (have invariant frequency) on all modern processors.
In the vast majority of cases, I expect that the cycles being used for
early_cycles is coming from the exact same base hardware (the TSC) as local_clock(), so
there shouldn't be discrepancies.
(Someone with more clock expertise than me, please correct me if I'm wrong.
Even on virtualized systems, the underlying hardware for the TSC instruction is still just the real
TSC hardware on the physical platform).

I suspect that I just got the math wrong somewhere, or there's an unaccounted-for delay
somewhere I didn't take into account when calculating the offset.  I'll review it and let
you know what I find.
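
As a starting point for checking the math, here is a small user-space sketch
of the conversion.  It assumes the calibration values follow the usual
clocksource-style convention, ns = (cycles * mult) >> shift; that convention
is an assumption here, not something confirmed from the patch.  The function
names are made up, and unsigned __int128 is a GCC/Clang extension.

```c
#include <stdint.h>

/*
 * Convert a cycle count to nanoseconds, assuming the conventional
 * fixed-point scaling ns = (cycles * mult) >> shift.
 * The 128-bit intermediate avoids overflow for large cycle counts.
 */
static uint64_t cyc_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (uint64_t)(((unsigned __int128)cycles * mult) >> shift);
}

/*
 * Counter frequency in kHz implied by a mult/shift pair:
 * cycles per ns is 2^shift / mult, and 1 cycle/ns is 1e6 kHz.
 */
static uint64_t implied_khz(uint32_t mult, uint32_t shift)
{
	return (((uint64_t)1 << shift) * 1000000ULL) / mult;
}
```

If that convention applies, the mult=19634245, shift=26 pair from the KVM log
corresponds to a counter running at about 3.418 GHz, while the kernel reported
"tsc: Detected 3293.776 MHz processor" on the same boot.  That is a gap of
a bit under 4%, which may or may not be related to the backward step.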

> 
> Idea: A solution might be to start calibration when printk()
>       gets first non-zero time from local_clock.

Actually, I want the values used in the calibration to have accumulated enough
time for some accuracy to set in.  If I grab local_clock() too close to zero,
then the math is less accurate.
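
To put rough numbers on that, here is a sketch of how mult could be derived
from a cycle delta and a nanosecond delta, and of how a fixed uncertainty in a
timestamp scales with the amount of time that has accumulated.  This is one
reading of the concern, not the patch's code; the names and the 1 us
uncertainty are invented for illustration.

```c
#include <stdint.h>

/*
 * Derive mult so that ns ~= (cycles * mult) >> shift, from the
 * nanoseconds and cycles that elapsed over the same interval.
 * delta_ns << shift must fit in 64 bits (fine for a few seconds
 * of boot time with shift = 26).
 */
static uint32_t calc_mult(uint64_t delta_ns, uint64_t delta_cycles,
			  uint32_t shift)
{
	return (uint32_t)((delta_ns << shift) / delta_cycles);
}

/*
 * Relative error, in parts per billion, that an uncertainty of
 * err_ns contributes when the accumulated time is elapsed_ns.
 */
static uint64_t rel_error_ppb(uint64_t err_ns, uint64_t elapsed_ns)
{
	return (err_ns * 1000000000ULL) / elapsed_ns;
}
```

With a hypothetical 1 us uncertainty, a reading taken at 3486 ns (the first
non-zero local_clock() value in Petr's log) is off by about 29%, while a
reading taken at 201079507 ns (where v3 started calibration in the same log)
is off by about 5 parts per million.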

> 
>       Something like:
> 
> diff --git a/include/linux/early_times.h b/include/linux/early_times.h
> index 05388dcb8573..09d278996184 100644
> --- a/include/linux/early_times.h
> +++ b/include/linux/early_times.h
> @@ -16,10 +16,13 @@ extern u64 start_ns;
>  extern u32 early_mult, early_shift;
>  extern u64 early_ts_offset;
> 
> -static inline void early_times_start_calibration(void)
> +static inline void early_times_may_start_calibration(u64 ts_ns)
>  {
> +	if (start_ns)
> +		return;
> +
> +	start_ns = ts_ns;
>  	start_cycles = get_cycles();
> -	start_ns = local_clock();
>  }
> 
>  static inline void early_times_finish_calibration(void)
> diff --git a/init/main.c b/init/main.c
> index 27835270dfb5..a333b0da69cf 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1123,9 +1123,6 @@ void start_kernel(void)
>  	timekeeping_init();
>  	time_init();
> 
> -	/* This must be after timekeeping is initialized */
> -	early_times_start_calibration();
> -
>  	/* This must be after timekeeping is initialized */
>  	random_init();
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 774ffb1fa5ac..19330b6b4eb2 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2306,7 +2306,9 @@ int vprintk_store(int facility, int level,
>  	 * timestamp with respect to the caller.
>  	 */
>  	ts_nsec = local_clock();
> -	if (!ts_nsec)
> +	if (ts_nsec)
> +		early_times_may_start_calibration(ts_nsec);
> +	else
>  		ts_nsec = early_cycles();
> 
>  	caller_id = printk_caller_id();
> 
> 
> > >
> > > 2) A Linux VM running in the Azure cloud is also running on Hyper-V. Such a
> > > VM typically uses cloud-init to set everything up at boot time, and cloud-init
> > > is outputting lines to the serial console with a timestamp that looks like the
> > > printk() timestamp, but apparently is not adjusted for the early timestamping
> > > that this patch adds. Again, I haven't debugged what's going on -- I'm not
> > > immediately sure of the mechanism that cloud-init uses to do output to the
> > > serial console. The use of the Hyper-V synthetic clock source might be the cause
> > > of the problem here as well. Here's an output snippet from the serial console:
> > >
> > > [   20.330414] systemd[1]: Condition check resulted in OpenVSwitch configuration for cleanup being skipped.
> > > [   20.332911] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
> > > [   20.333257] pstore: Registered efi_pstore as persistent store backend
> > > [   20.334360] systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
> > > [   20.338319] systemd[1]: Starting Load Kernel Modules...
> > > [   20.341094] systemd[1]: Starting Remount Root and Kernel File Systems...
> > > [   20.350993] systemd[1]: Starting udev Coldplug all Devices...
> > > [   20.356255] systemd[1]: Starting Uncomplicated firewall...
> > > [   20.361536] systemd[1]: Started Journal Service.
> > > [   20.386902] EXT4-fs (sda1): re-mounted c02dce0c-0c40-4e6e-88af-c5a0987b0adb r/w.
> > > [   22.532033] /dev/sr0: Can't lookup blockdev
> > > [    7.955973] cloud-init[783]: Cloud-init v. 24.3.1-0ubuntu0~20.04.1 running 'init-local' at Wed, 11 Mar 2026 15:27:06 +0000. Up 7.48 seconds.
> > > [    9.933120] cloud-init[822]: Cloud-init v. 24.3.1-0ubuntu0~20.04.1 running 'init' at Wed, 11 Mar 2026 15:27:08 +0000. Up 9.82 seconds.
> > > [    9.935483] cloud-init[822]: ci-info: ++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++
> > > [    9.937726] cloud-init[822]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
> > > [    9.939905] cloud-init[822]: ci-info: | Device |  Up  |           Address           |      Mask     | Scope  |     Hw-Address    |
> > > [    9.942059] cloud-init[822]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
> 
> This is more complicated. I wonder if the timestamps from cloud-init()
> are somehow synchronized with local_clock().

This raises an important issue, which is that anyone using local_clock() will get timestamps
relative to time_init(), whereas with the early times patch applied, printk is using timestamps
relative to power on (well, TSC init, which is almost always the same thing).  I'm still a bit
confused about how cloud-init is getting its messages intermingled with the printk
messages.  Is this a feature of systemd?

> 
> We might need to synchronize local_clock() with the cycles as well.

I thought about modifying local_clock to use early_ts_offset, so that all users
of local_clock() would get timestamps with an offset adjusted so they were
relative to power on, but that was a much more intrusive patch.
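
To illustrate the two time bases, here is a small user-space sketch using the
numbers from Petr's KVM log (a different machine from the cloud-init output
above).  It assumes the printed timestamp is simply local_clock() plus the
calibrated offset, which is what those numbers are consistent with; the helper
names are made up.

```c
#include <stdint.h>
#include <stdio.h>

/* Shift a local_clock() value onto the power-on-relative time base. */
static uint64_t power_on_ns(uint64_t local_clock_ns, uint64_t offset_ns)
{
	return local_clock_ns + offset_ns;
}

/*
 * Format nanoseconds the way the printk prefix looks in dmesg,
 * i.e. "[%5lu.%06lu]" with seconds and microseconds.
 * Returns the number of characters written (snprintf semantics).
 */
static int format_ts(char *buf, size_t len, uint64_t ns)
{
	return snprintf(buf, len, "[%5llu.%06llu]",
			(unsigned long long)(ns / 1000000000ULL),
			(unsigned long long)((ns % 1000000000ULL) / 1000));
}
```

With offset=8732967929 ns, a local_clock() value of 3486 ns formats as
"[    8.732971]" and 201079507 ns formats as "[    8.934047]", matching the
timestamps on the two debug lines in that log.  Anything that reads
local_clock() directly, without the offset, would report 0.000003 and
0.201079 for the same two events.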

> But there is the chicken&egg problem. We need:
> 
>     + to know the offset caused by cycles when local_clock() gets initialized.
>     + local_clock() running for some time to calibrate cycles.
> 
> Hmm, I see that time-management people are not in Cc. We should have
> added them since v1.
> 
> I add them now. Better late than never.

Thanks for the feedback and testing!  I really appreciate it.
 -- Tim



* Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
  2026-02-10 23:47 ` [PATCH v3] " Tim Bird
                     ` (3 preceding siblings ...)
  2026-03-11 15:47   ` Michael Kelley
@ 2026-03-26 13:17   ` Thomas Gleixner
  4 siblings, 0 replies; 36+ messages in thread
From: Thomas Gleixner @ 2026-03-26 13:17 UTC (permalink / raw)
  To: Tim Bird, pmladek, rostedt, john.ogness, senozhatsky
  Cc: francesco, geert, linux-embedded, linux-kernel, Tim Bird

On Tue, Feb 10 2026 at 16:47, Tim Bird wrote:
> During early boot, printk timestamps are reported as zero before
> kernel timekeeping starts (e.g. before time_init()).  This
> hinders boot-time optimization efforts.  This period is about 400
> milliseconds for many current desktop and embedded machines
> running Linux.
>
> Add support to save cycles during early boot, and output correct
> timestamp values after timekeeping is initialized.  get_cycles()
> is operational on arm64 and x86_64 from kernel start.  Add code
> and variables to save calibration values used to later convert
> cycle counts to time values in the early printks.  Add a config
> to control the feature.
>
> This yields non-zero timestamps for printks from the very start
> of kernel execution.  The timestamps are relative to the start of
> the architecture-specified counter used in get_cycles
> (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
>
> All timestamps reflect time from processor power-on instead of
> time from the kernel's timekeeping initialization.

Can we pretty please _not_ introduce yet another side channel to
generate time stamps?

printk()

   time_ns = local_clock();

local_clock()
  local_clock_noinstr()
  	// After boot
	if (static_branch_likely(&__sched_clock_stable))
		return sched_clock_noinstr() + __sched_clock_offset;

	// Before sched_clock_init()
	if (!static_branch_likely(&sched_clock_running))
		return sched_clock_noinstr();

	clock = sched_clock_local(this_scd());

On x86:
sched_clock_noinstr()
  // bare metal
  native_sched_clock()
     // After TSC calibration
     if (static_branch_likely(&__use_tsc)) {
       ...
     }

  // Jiffies fallback.

So the obvious solution is to expand the fallback with:

    if (tsc_available())
    	return tsc_early_uncalibrated();

    return jiffies ....;

As this needs to be supported by the architecture/platform in any case
there is close to zero benefit from creating complicated generic
infrastructure for this.
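
A user-space model of that fallback order, as a rough sketch only.  The struct
and function names below are invented stand-ins for the kernel's static
branches and for the tsc_available()/tsc_early_uncalibrated() placeholders
above, and the idea that the early path scales the raw counter with a rough
mult/shift guess is an assumption; the mail does not say what the early path
returns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model state; every field is a stand-in for real kernel state. */
struct clock_state {
	bool tsc_calibrated;	/* models the __use_tsc static branch */
	bool tsc_available;	/* models tsc_available() */
	uint64_t tsc;		/* raw counter value */
	uint32_t mult, shift;	/* calibrated cycles -> ns scaling */
	uint32_t early_mult, early_shift; /* rough pre-calibration guess */
	uint64_t jiffies_ns;	/* jiffies-based time in ns */
};

/* Fixed-point scaling; unsigned __int128 is a GCC/Clang extension. */
static uint64_t scale(uint64_t cyc, uint32_t mult, uint32_t shift)
{
	return (uint64_t)(((unsigned __int128)cyc * mult) >> shift);
}

/* Calibrated TSC first, then uncalibrated early TSC, then jiffies. */
static uint64_t model_sched_clock(const struct clock_state *s)
{
	if (s->tsc_calibrated)
		return scale(s->tsc, s->mult, s->shift);
	if (s->tsc_available)
		return scale(s->tsc, s->early_mult, s->early_shift);
	return s->jiffies_ns;
}
```

The point of the ordering is that every caller of the clock gets the early
values through the same path, so no separate timestamp source is needed for
printk.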

Thanks,

        tglx


Thread overview: 36+ messages
2025-11-25  5:30 [PATCH] printk: add early_counter_ns routine for printk blind spot Tim Bird
2025-11-25  7:52 ` kernel test robot
2025-11-25 13:08 ` Francesco Valla
2025-11-26  7:38   ` Geert Uytterhoeven
2025-11-27  0:16     ` Bird, Tim
2025-11-27 16:16       ` Petr Mladek
2025-11-26 12:55   ` Petr Mladek
2025-11-27  0:03     ` Bird, Tim
2025-11-26 11:13 ` Petr Mladek
2025-11-27  9:13 ` kernel test robot
2026-01-24 19:40 ` [PATCH v2] printk: fix zero-valued printk timestamps in early boot Tim Bird
2026-01-25 14:41   ` Francesco Valla
2026-01-26 16:52     ` Bird, Tim
2026-02-02 16:23       ` Petr Mladek
2026-01-26 10:12   ` Geert Uytterhoeven
2026-01-26 17:11     ` Bird, Tim
2026-01-27  8:10       ` Geert Uytterhoeven
2026-02-10 23:47 ` [PATCH v3] " Tim Bird
2026-03-04 11:23   ` Petr Mladek
2026-03-09 17:27   ` Shashank Balaji
2026-03-10 10:43     ` Petr Mladek
2026-03-10 19:17     ` Bird, Tim
2026-03-09 19:25   ` Shashank Balaji
2026-03-10 11:39     ` Petr Mladek
2026-03-10 18:54       ` Bird, Tim
2026-03-11 15:45         ` Petr Mladek
2026-03-11 15:47   ` Michael Kelley
2026-03-13  4:52     ` Bird, Tim
2026-03-13 10:45       ` Petr Mladek
2026-03-14 14:16         ` Shashank Balaji
2026-03-24 20:07           ` Bird, Tim
2026-03-14 16:15         ` Michael Kelley
2026-03-24 19:47           ` Bird, Tim
2026-03-26  9:24             ` John Ogness
2026-03-20 18:15         ` Bird, Tim
2026-03-26 13:17   ` Thomas Gleixner
