* [PATCH] printk: add early_counter_ns routine for printk blind spot
@ 2025-11-25 5:30 Tim Bird
2025-11-25 7:52 ` kernel test robot
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Tim Bird @ 2025-11-25 5:30 UTC (permalink / raw)
To: pmladek, Steve Rostedt, john.ogness, senozhatsky
Cc: Tim Bird, Andrew Morton, Francesco Valla, LKML, Linux Embedded
From: Tim Bird <tim.bird@sony.com>
During early boot, printk timestamps are reported as zero,
which creates a blind spot in early boot timings. This blind
spot hinders timing and optimization efforts for code that
executes before time_init(), which is when local_clock() is
initialized sufficiently to start returning non-zero timestamps.
This period is about 400 milliseconds for many current desktop
and embedded machines running Linux.
Add an early_counter_ns function that returns nanosecond
timestamps based on get_cycles(). get_cycles() is operational
on arm64 and x86_64 from kernel start. Add some calibration
printks to allow setting configuration variables that are used
to convert cycle counts to nanoseconds (which are then used
in early printks). Add CONFIG_EARLY_COUNTER_NS, as well as
some associated conversion variables, as new kernel config
variables.
After proper configuration, this yields non-zero timestamps for
printks from the very start of kernel execution. The timestamps
are relative to the start of the architecture-specific counter
used in get_cycles (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
This means that the time reported reflects time-from-power-on for
most embedded products. This is also a useful data point for
boot-time optimization work.
Note that there is a discontinuity in the timestamp sequencing
when standard clocks are finally initialized in time_init().
The printk timestamps are thus not monotonically increasing
through the entire boot.
Signed-off-by: Tim Bird <tim.bird@sony.com>
---
init/Kconfig | 47 ++++++++++++++++++++++++++++++++++++++++++
init/main.c | 25 ++++++++++++++++++++++
kernel/printk/printk.c | 15 ++++++++++++++
3 files changed, 87 insertions(+)
diff --git a/init/Kconfig b/init/Kconfig
index cab3ad28ca49..5352567c43ed 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -770,6 +770,53 @@ config IKHEADERS
or similar programs. If you build the headers as a module, a module called
kheaders.ko is built which can be loaded on-demand to get access to headers.
+config EARLY_COUNTER_NS
+ bool "Use counter for early printk timestamps"
+ default y
+ depends on PRINTK
+ help
+ Use a cycle-counter to provide printk timestamps during early
+ boot. This allows seeing timing information that would
+ otherwise be displayed with 0-valued timestamps.
+
+ In order for this to work, you need to specify values for
+ EARLY_COUNTER_MULT and EARLY_COUNTER_SHIFT, used to convert
+ from the cycle count to nanoseconds.
+
+config EARLY_COUNTER_MULT
+ int "Multiplier for early cycle counter"
+ depends on PRINTK && EARLY_COUNTER_NS
+ default 1
+ help
+ This value specifies a multiplier to be used when converting
+ cycle counts to nanoseconds. The formula used is:
+ ns = (cycles * mult) >> shift
+
+ Use a multiplier that will bring the value of (cycles * mult)
+ to near a power of two, that is greater than 1000. The
+ nanoseconds returned by this conversion are divided by 1000
+ to be used as the printk timestamp counter (with resolution
+ of microseconds).
+
+ As an example, for a cycle-counter with a frequency of 200 Mhz,
+ the multiplier would be: 10485760, and the shift would be 21.
+
+config EARLY_COUNTER_SHIFT
+ int "Shift value for early cycle counter"
+ range 0 63
+ depends on PRINTK && EARLY_COUNTER_NS
+ default 0
+ help
+ This value specifies a shift value to be used when converting
+ cycle counts to nanoseconds. The formula used is:
+ ns = (cycles * mult) >> shift
+
+ Use a shift that will bring the result to a value
+ in nanoseconds.
+
+ As an example, for a cycle-counter with a frequency of 200 Mhz,
+ the multiplier would be: 10485760, and the shift would be 21.
+
config LOG_BUF_SHIFT
int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
range 12 25
diff --git a/init/main.c b/init/main.c
index 07a3116811c5..587aaaad22d1 100644
--- a/init/main.c
+++ b/init/main.c
@@ -105,6 +105,8 @@
#include <linux/ptdump.h>
#include <linux/time_namespace.h>
#include <net/net_namespace.h>
+#include <linux/timex.h>
+#include <linux/sched/clock.h>
#include <asm/io.h>
#include <asm/setup.h>
@@ -906,6 +908,8 @@ static void __init early_numa_node_init(void)
#endif
}
+static u64 start_cycles, start_ns;
+
asmlinkage __visible __init __no_sanitize_address __noreturn __no_stack_protector
void start_kernel(void)
{
@@ -1023,6 +1027,10 @@ void start_kernel(void)
timekeeping_init();
time_init();
+ /* used to calibrate early_counter_ns */
+ start_cycles = get_cycles();
+ start_ns = local_clock();
+
/* This must be after timekeeping is initialized */
random_init();
@@ -1474,6 +1482,8 @@ void __weak free_initmem(void)
static int __ref kernel_init(void *unused)
{
int ret;
+ u64 end_cycles, end_ns;
+ u32 early_mult, early_shift;
/*
* Wait until kthreadd is all set-up.
@@ -1505,6 +1515,21 @@ static int __ref kernel_init(void *unused)
do_sysctl_args();
+ /* show calibration data for early_counter_ns */
+ end_cycles = get_cycles();
+ end_ns = local_clock();
+ clocks_calc_mult_shift(&early_mult, &early_shift,
+ ((end_cycles - start_cycles) * NSEC_PER_SEC)/(end_ns - start_ns),
+ NSEC_PER_SEC, 50);
+
+#ifdef CONFIG_EARLY_COUNTER_NS
+ pr_info("Early Counter: start_cycles=%llu, end_cycles=%llu, cycles=%llu\n",
+ start_cycles, end_cycles, (end_cycles - start_cycles));
+ pr_info("Early Counter: start_ns=%llu, end_ns=%llu, ns=%llu\n",
+ start_ns, end_ns, (end_ns - start_ns));
+ pr_info("Early Counter: MULT=%u, SHIFT=%u\n", early_mult, early_shift);
+#endif
+
if (ramdisk_execute_command) {
ret = run_init_process(ramdisk_execute_command);
if (!ret)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 5aee9ffb16b9..522dd24cd534 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2210,6 +2210,19 @@ static u16 printk_sprint(char *text, u16 size, int facility,
return text_len;
}
+#ifdef CONFIG_EARLY_COUNTER_NS
+static inline u64 early_counter_ns(void)
+{
+ return ((u64)get_cycles() * CONFIG_EARLY_COUNTER_MULT)
+ >> CONFIG_EARLY_COUNTER_SHIFT;
+}
+#else
+static inline u64 early_counter_ns(void)
+{
+ return 0;
+}
+#endif
+
__printf(4, 0)
int vprintk_store(int facility, int level,
const struct dev_printk_info *dev_info,
@@ -2239,6 +2252,8 @@ int vprintk_store(int facility, int level,
* timestamp with respect to the caller.
*/
ts_nsec = local_clock();
+ if (!ts_nsec)
+ ts_nsec = early_counter_ns();
caller_id = printk_caller_id();
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] printk: add early_counter_ns routine for printk blind spot
2025-11-25 5:30 [PATCH] printk: add early_counter_ns routine for printk blind spot Tim Bird
@ 2025-11-25 7:52 ` kernel test robot
2025-11-25 13:08 ` Francesco Valla
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: kernel test robot @ 2025-11-25 7:52 UTC (permalink / raw)
To: Tim Bird, pmladek, Steve Rostedt, john.ogness, senozhatsky
Cc: oe-kbuild-all, Tim Bird, Andrew Morton,
Linux Memory Management List, Francesco Valla, LKML,
Linux Embedded
Hi Tim,
kernel test robot noticed the following build errors:
[auto build test ERROR on linus/master]
[also build test ERROR on v6.18-rc7]
[cannot apply to akpm-mm/mm-everything next-20251124]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Tim-Bird/printk-add-early_counter_ns-routine-for-printk-blind-spot/20251125-133242
base: linus/master
patch link: https://lore.kernel.org/r/39b09edb-8998-4ebd-a564-7d594434a981%40bird.org
patch subject: [PATCH] printk: add early_counter_ns routine for printk blind spot
config: powerpc-allnoconfig (https://download.01.org/0day-ci/archive/20251125/202511251534.9kMSsAH6-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251125/202511251534.9kMSsAH6-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511251534.9kMSsAH6-lkp@intel.com/
All errors (new ones prefixed by >>):
powerpc-linux-ld: init/main.o: in function `kernel_init':
>> main.c:(.ref.text+0x144): undefined reference to `__udivdi3'
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] printk: add early_counter_ns routine for printk blind spot
2025-11-25 5:30 [PATCH] printk: add early_counter_ns routine for printk blind spot Tim Bird
2025-11-25 7:52 ` kernel test robot
@ 2025-11-25 13:08 ` Francesco Valla
2025-11-26 7:38 ` Geert Uytterhoeven
2025-11-26 12:55 ` Petr Mladek
2025-11-26 11:13 ` Petr Mladek
2025-11-27 9:13 ` kernel test robot
3 siblings, 2 replies; 10+ messages in thread
From: Francesco Valla @ 2025-11-25 13:08 UTC (permalink / raw)
To: Tim Bird
Cc: pmladek, Steve Rostedt, john.ogness, senozhatsky, Tim Bird,
Andrew Morton, LKML, Linux Embedded
Hi Tim,
I tested this on my i.MX93 FRDM (arm64) board and after a bit of
fiddling with the MULT/SHIFT values I got it working. It can be a very
valuable addition.
Some comments follow.
On Mon, Nov 24, 2025 at 10:30:52PM -0700, Tim Bird wrote:
> From: Tim Bird <tim.bird@sony.com>
>
> During early boot, printk timestamps are reported as zero,
> which creates a blind spot in early boot timings. This blind
> spot hinders timing and optimization efforts for code that
> executes before time_init(), which is when local_clock() is
> initialized sufficiently to start returning non-zero timestamps.
> This period is about 400 milliseconds for many current desktop
> and embedded machines running Linux.
>
> Add an early_counter_ns function that returns nanosecond
> timestamps based on get_cycles(). get_cycles() is operational
> on arm64 and x86_64 from kernel start. Add some calibration
> printks to allow setting configuration variables that are used
> to convert cycle counts to nanoseconds (which are then used
> in early printks). Add CONFIG_EARLY_COUNTER_NS, as well as
> some associated conversion variables, as new kernel config
> variables.
>
> After proper configuration, this yields non-zero timestamps for
> printks from the very start of kernel execution. The timestamps
> are relative to the start of the architecture-specific counter
> used in get_cycles (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> This means that the time reported reflects time-from-power-on for
> most embedded products. This is also a useful data point for
> boot-time optimization work.
>
> Note that there is a discontinuity in the timestamp sequencing
> when standard clocks are finally initialized in time_init().
> The printk timestamps are thus not monotonically increasing
> through the entire boot.
This is... not going to work, IMO, and might lead to breakages in
userspace tools (are printk timings a userspace API?).
I actually have a counter-proposal: the time obtained through cycle
evaluation is used as an offset to be added to the printk time after
time_init() is called. A (working, but maybe sub-optimal) patch to
obtain this is attached at the end.
>
> Signed-off-by: Tim Bird <tim.bird@sony.com>
> ---
> init/Kconfig | 47 ++++++++++++++++++++++++++++++++++++++++++
> init/main.c | 25 ++++++++++++++++++++++
> kernel/printk/printk.c | 15 ++++++++++++++
> 3 files changed, 87 insertions(+)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index cab3ad28ca49..5352567c43ed 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -770,6 +770,53 @@ config IKHEADERS
> or similar programs. If you build the headers as a module, a module called
> kheaders.ko is built which can be loaded on-demand to get access to headers.
>
> +config EARLY_COUNTER_NS
> + bool "Use counter for early printk timestamps"
> + default y
> + depends on PRINTK
> + help
> + Use a cycle-counter to provide printk timestamps during early
> + boot. This allows seeing timing information that would
> + otherwise be displayed with 0-valued timestamps.
> +
> + In order for this to work, you need to specify values for
> + EARLY_COUNTER_MULT and EARLY_COUNTER_SHIFT, used to convert
> + from the cycle count to nanoseconds.
> +
> +config EARLY_COUNTER_MULT
> + int "Multiplier for early cycle counter"
> + depends on PRINTK && EARLY_COUNTER_NS
> + default 1
> + help
> + This value specifies a multiplier to be used when converting
> + cycle counts to nanoseconds. The formula used is:
> + ns = (cycles * mult) >> shift
> +
> + Use a multiplier that will bring the value of (cycles * mult)
> + to near a power of two, that is greater than 1000. The
> + nanoseconds returned by this conversion are divided by 1000
> + to be used as the printk timestamp counter (with resolution
> + of microseconds).
> +
> + As an example, for a cycle-counter with a frequency of 200 Mhz,
> + the multiplier would be: 10485760, and the shift would be 21.
> +
If I got this correclty:
EARLY_COUNTER_MULT = (10^9 / freq) << EARLY_COUNTER_SHIFT
where EARLY_COUNTER_SHIFT can be chosen at will, provided it is big
enough to survice the ns->us conversion but small enough not to overflow
the u64 container.
> +config EARLY_COUNTER_SHIFT
> + int "Shift value for early cycle counter"
> + range 0 63
> + depends on PRINTK && EARLY_COUNTER_NS
> + default 0
> + help
> + This value specifies a shift value to be used when converting
> + cycle counts to nanoseconds. The formula used is:
> + ns = (cycles * mult) >> shift
> +
> + Use a shift that will bring the result to a value
> + in nanoseconds.
> +
> + As an example, for a cycle-counter with a frequency of 200 Mhz,
> + the multiplier would be: 10485760, and the shift would be 21.
> +
> config LOG_BUF_SHIFT
> int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
> range 12 25
> diff --git a/init/main.c b/init/main.c
> index 07a3116811c5..587aaaad22d1 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -105,6 +105,8 @@
> #include <linux/ptdump.h>
> #include <linux/time_namespace.h>
> #include <net/net_namespace.h>
> +#include <linux/timex.h>
> +#include <linux/sched/clock.h>
>
> #include <asm/io.h>
> #include <asm/setup.h>
> @@ -906,6 +908,8 @@ static void __init early_numa_node_init(void)
> #endif
> }
>
> +static u64 start_cycles, start_ns;
> +
> asmlinkage __visible __init __no_sanitize_address __noreturn __no_stack_protector
> void start_kernel(void)
> {
> @@ -1023,6 +1027,10 @@ void start_kernel(void)
> timekeeping_init();
> time_init();
>
> + /* used to calibrate early_counter_ns */
> + start_cycles = get_cycles();
> + start_ns = local_clock();
> +
> /* This must be after timekeeping is initialized */
> random_init();
>
> @@ -1474,6 +1482,8 @@ void __weak free_initmem(void)
> static int __ref kernel_init(void *unused)
> {
> int ret;
> + u64 end_cycles, end_ns;
> + u32 early_mult, early_shift;
>
> /*
> * Wait until kthreadd is all set-up.
> @@ -1505,6 +1515,21 @@ static int __ref kernel_init(void *unused)
>
> do_sysctl_args();
>
> + /* show calibration data for early_counter_ns */
> + end_cycles = get_cycles();
> + end_ns = local_clock();
> + clocks_calc_mult_shift(&early_mult, &early_shift,
> + ((end_cycles - start_cycles) * NSEC_PER_SEC)/(end_ns - start_ns),
> + NSEC_PER_SEC, 50);
> +
> +#ifdef CONFIG_EARLY_COUNTER_NS
> + pr_info("Early Counter: start_cycles=%llu, end_cycles=%llu, cycles=%llu\n",
> + start_cycles, end_cycles, (end_cycles - start_cycles));
> + pr_info("Early Counter: start_ns=%llu, end_ns=%llu, ns=%llu\n",
> + start_ns, end_ns, (end_ns - start_ns));
> + pr_info("Early Counter: MULT=%u, SHIFT=%u\n", early_mult, early_shift);
> +#endif
> +
I don't get the need to have these here - should they be an help for the
integrator to calibrate and choose EARLY_COUNTER_MULT and
EARLY_COUNTER_SHIFT? The ns values printed here have some meaning only if
these two parameters are already set correctly in the first place -
what's the foreseen calibration procedure?
Moreover, if they are only required for calibration, maybe pr_debugi()
would be a better choice?
> if (ramdisk_execute_command) {
> ret = run_init_process(ramdisk_execute_command);
> if (!ret)
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 5aee9ffb16b9..522dd24cd534 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2210,6 +2210,19 @@ static u16 printk_sprint(char *text, u16 size, int facility,
> return text_len;
> }
>
> +#ifdef CONFIG_EARLY_COUNTER_NS
> +static inline u64 early_counter_ns(void)
> +{
> + return ((u64)get_cycles() * CONFIG_EARLY_COUNTER_MULT)
> + >> CONFIG_EARLY_COUNTER_SHIFT;
> +}
> +#else
> +static inline u64 early_counter_ns(void)
> +{
> + return 0;
> +}
> +#endif
> +
> __printf(4, 0)
> int vprintk_store(int facility, int level,
> const struct dev_printk_info *dev_info,
> @@ -2239,6 +2252,8 @@ int vprintk_store(int facility, int level,
> * timestamp with respect to the caller.
> */
> ts_nsec = local_clock();
> + if (!ts_nsec)
> + ts_nsec = early_counter_ns();
>
> caller_id = printk_caller_id();
>
> --
> 2.43.0
>
>
Best regards,
Francesco
---
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 522dd24cd534..b4108f215c5e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2216,11 +2216,26 @@ static inline u64 early_counter_ns(void)
return ((u64)get_cycles() * CONFIG_EARLY_COUNTER_MULT)
>> CONFIG_EARLY_COUNTER_SHIFT;
}
+
+static u64 early_counter_ns_offset(void)
+{
+ static u64 early_counter_ns_start = 0;
+
+ if (!early_counter_ns_start)
+ early_counter_ns_start = early_counter_ns();
+
+ return early_counter_ns_start;
+}
#else
static inline u64 early_counter_ns(void)
{
return 0;
}
+
+static inline u64 early_counter_ns_offset(void)
+{
+ return 0;
+}
#endif
__printf(4, 0)
@@ -2254,6 +2269,8 @@ int vprintk_store(int facility, int level,
ts_nsec = local_clock();
if (!ts_nsec)
ts_nsec = early_counter_ns();
+ else
+ ts_nsec += early_counter_ns_offset();
caller_id = printk_caller_id();
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] printk: add early_counter_ns routine for printk blind spot
2025-11-25 13:08 ` Francesco Valla
@ 2025-11-26 7:38 ` Geert Uytterhoeven
2025-11-27 0:16 ` Bird, Tim
2025-11-26 12:55 ` Petr Mladek
1 sibling, 1 reply; 10+ messages in thread
From: Geert Uytterhoeven @ 2025-11-26 7:38 UTC (permalink / raw)
To: Francesco Valla
Cc: Tim Bird, pmladek, Steve Rostedt, john.ogness, senozhatsky,
Tim Bird, Andrew Morton, LKML, Linux Embedded
Hi all,
On Wed, 26 Nov 2025 at 03:24, Francesco Valla <francesco@valla.it> wrote:
> On Mon, Nov 24, 2025 at 10:30:52PM -0700, Tim Bird wrote:
> > From: Tim Bird <tim.bird@sony.com>
> >
> > During early boot, printk timestamps are reported as zero,
> > which creates a blind spot in early boot timings. This blind
> > spot hinders timing and optimization efforts for code that
> > executes before time_init(), which is when local_clock() is
> > initialized sufficiently to start returning non-zero timestamps.
> > This period is about 400 milliseconds for many current desktop
> > and embedded machines running Linux.
> >
> > Add an early_counter_ns function that returns nanosecond
> > timestamps based on get_cycles(). get_cycles() is operational
> > on arm64 and x86_64 from kernel start. Add some calibration
> > printks to allow setting configuration variables that are used
> > to convert cycle counts to nanoseconds (which are then used
> > in early printks). Add CONFIG_EARLY_COUNTER_NS, as well as
> > some associated conversion variables, as new kernel config
> > variables.
> >
> > After proper configuration, this yields non-zero timestamps for
> > printks from the very start of kernel execution. The timestamps
> > are relative to the start of the architecture-specific counter
> > used in get_cycles (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > This means that the time reported reflects time-from-power-on for
> > most embedded products. This is also a useful data point for
> > boot-time optimization work.
> >
> > Note that there is a discontinuity in the timestamp sequencing
> > when standard clocks are finally initialized in time_init().
> > The printk timestamps are thus not monotonically increasing
> > through the entire boot.
>
> This is... not going to work, IMO, and might lead to breakages in
> userspace tools (are printk timings a userspace API?).
I think they are.
Another approach would be to defer the calibration/conversion to
userspace, and make sure the early part stands out.
I.e. when real timekeeping is available, kernel messages are prefixed by
"[%5lu.%06lu]". Early messages could be prefixed by a plain integer
"[%12u]", containing the raw cycle counter value.
The presence of the decimal point would make the difference obvious.
> I actually have a counter-proposal: the time obtained through cycle
> evaluation is used as an offset to be added to the printk time after
> time_init() is called. A (working, but maybe sub-optimal) patch to
> obtain this is attached at the end.
Oh, that's a nice idea, too!
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] printk: add early_counter_ns routine for printk blind spot
2025-11-25 5:30 [PATCH] printk: add early_counter_ns routine for printk blind spot Tim Bird
2025-11-25 7:52 ` kernel test robot
2025-11-25 13:08 ` Francesco Valla
@ 2025-11-26 11:13 ` Petr Mladek
2025-11-27 9:13 ` kernel test robot
3 siblings, 0 replies; 10+ messages in thread
From: Petr Mladek @ 2025-11-26 11:13 UTC (permalink / raw)
To: Tim Bird
Cc: Steve Rostedt, john.ogness, senozhatsky, Tim Bird, Andrew Morton,
Francesco Valla, LKML, Anna-Maria Behnsen, Frederic Weisbecker,
Thomas Gleixner, Linux Embedded
Adding some people from the time subsystem into Cc.
Please, keep them in the loop in the eventual next version of the
patch.
For the new people, please note the discussion has already started,
see
https://lore.kernel.org/r/39b09edb-8998-4ebd-a564-7d594434a981@bird.org
Best Regards,
Petr
On Mon 2025-11-24 22:30:52, Tim Bird wrote:
> From: Tim Bird <tim.bird@sony.com>
>
> During early boot, printk timestamps are reported as zero,
> which creates a blind spot in early boot timings. This blind
> spot hinders timing and optimization efforts for code that
> executes before time_init(), which is when local_clock() is
> initialized sufficiently to start returning non-zero timestamps.
> This period is about 400 milliseconds for many current desktop
> and embedded machines running Linux.
>
> Add an early_counter_ns function that returns nanosecond
> timestamps based on get_cycles(). get_cycles() is operational
> on arm64 and x86_64 from kernel start. Add some calibration
> printks to allow setting configuration variables that are used
> to convert cycle counts to nanoseconds (which are then used
> in early printks). Add CONFIG_EARLY_COUNTER_NS, as well as
> some associated conversion variables, as new kernel config
> variables.
>
> After proper configuration, this yields non-zero timestamps for
> printks from the very start of kernel execution. The timestamps
> are relative to the start of the architecture-specific counter
> used in get_cycles (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> This means that the time reported reflects time-from-power-on for
> most embedded products. This is also a useful data point for
> boot-time optimization work.
>
> Note that there is a discontinuity in the timestamp sequencing
> when standard clocks are finally initialized in time_init().
> The printk timestamps are thus not monotonically increasing
> through the entire boot.
>
> Signed-off-by: Tim Bird <tim.bird@sony.com>
> ---
> init/Kconfig | 47 ++++++++++++++++++++++++++++++++++++++++++
> init/main.c | 25 ++++++++++++++++++++++
> kernel/printk/printk.c | 15 ++++++++++++++
> 3 files changed, 87 insertions(+)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index cab3ad28ca49..5352567c43ed 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -770,6 +770,53 @@ config IKHEADERS
> or similar programs. If you build the headers as a module, a module called
> kheaders.ko is built which can be loaded on-demand to get access to headers.
>
> +config EARLY_COUNTER_NS
> + bool "Use counter for early printk timestamps"
> + default y
> + depends on PRINTK
> + help
> + Use a cycle-counter to provide printk timestamps during early
> + boot. This allows seeing timing information that would
> + otherwise be displayed with 0-valued timestamps.
> +
> + In order for this to work, you need to specify values for
> + EARLY_COUNTER_MULT and EARLY_COUNTER_SHIFT, used to convert
> + from the cycle count to nanoseconds.
> +
> +config EARLY_COUNTER_MULT
> + int "Multiplier for early cycle counter"
> + depends on PRINTK && EARLY_COUNTER_NS
> + default 1
> + help
> + This value specifies a multiplier to be used when converting
> + cycle counts to nanoseconds. The formula used is:
> + ns = (cycles * mult) >> shift
> +
> + Use a multiplier that will bring the value of (cycles * mult)
> + to near a power of two, that is greater than 1000. The
> + nanoseconds returned by this conversion are divided by 1000
> + to be used as the printk timestamp counter (with resolution
> + of microseconds).
> +
> + As an example, for a cycle-counter with a frequency of 200 Mhz,
> + the multiplier would be: 10485760, and the shift would be 21.
> +
> +config EARLY_COUNTER_SHIFT
> + int "Shift value for early cycle counter"
> + range 0 63
> + depends on PRINTK && EARLY_COUNTER_NS
> + default 0
> + help
> + This value specifies a shift value to be used when converting
> + cycle counts to nanoseconds. The formula used is:
> + ns = (cycles * mult) >> shift
> +
> + Use a shift that will bring the result to a value
> + in nanoseconds.
> +
> + As an example, for a cycle-counter with a frequency of 200 Mhz,
> + the multiplier would be: 10485760, and the shift would be 21.
> +
> config LOG_BUF_SHIFT
> int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
> range 12 25
> diff --git a/init/main.c b/init/main.c
> index 07a3116811c5..587aaaad22d1 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -105,6 +105,8 @@
> #include <linux/ptdump.h>
> #include <linux/time_namespace.h>
> #include <net/net_namespace.h>
> +#include <linux/timex.h>
> +#include <linux/sched/clock.h>
>
> #include <asm/io.h>
> #include <asm/setup.h>
> @@ -906,6 +908,8 @@ static void __init early_numa_node_init(void)
> #endif
> }
>
> +static u64 start_cycles, start_ns;
> +
> asmlinkage __visible __init __no_sanitize_address __noreturn __no_stack_protector
> void start_kernel(void)
> {
> @@ -1023,6 +1027,10 @@ void start_kernel(void)
> timekeeping_init();
> time_init();
>
> + /* used to calibrate early_counter_ns */
> + start_cycles = get_cycles();
> + start_ns = local_clock();
> +
> /* This must be after timekeeping is initialized */
> random_init();
>
> @@ -1474,6 +1482,8 @@ void __weak free_initmem(void)
> static int __ref kernel_init(void *unused)
> {
> int ret;
> + u64 end_cycles, end_ns;
> + u32 early_mult, early_shift;
>
> /*
> * Wait until kthreadd is all set-up.
> @@ -1505,6 +1515,21 @@ static int __ref kernel_init(void *unused)
>
> do_sysctl_args();
>
> + /* show calibration data for early_counter_ns */
> + end_cycles = get_cycles();
> + end_ns = local_clock();
> + clocks_calc_mult_shift(&early_mult, &early_shift,
> + ((end_cycles - start_cycles) * NSEC_PER_SEC)/(end_ns - start_ns),
> + NSEC_PER_SEC, 50);
> +
> +#ifdef CONFIG_EARLY_COUNTER_NS
> + pr_info("Early Counter: start_cycles=%llu, end_cycles=%llu, cycles=%llu\n",
> + start_cycles, end_cycles, (end_cycles - start_cycles));
> + pr_info("Early Counter: start_ns=%llu, end_ns=%llu, ns=%llu\n",
> + start_ns, end_ns, (end_ns - start_ns));
> + pr_info("Early Counter: MULT=%u, SHIFT=%u\n", early_mult, early_shift);
> +#endif
> +
> if (ramdisk_execute_command) {
> ret = run_init_process(ramdisk_execute_command);
> if (!ret)
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 5aee9ffb16b9..522dd24cd534 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2210,6 +2210,19 @@ static u16 printk_sprint(char *text, u16 size, int facility,
> return text_len;
> }
>
> +#ifdef CONFIG_EARLY_COUNTER_NS
> +static inline u64 early_counter_ns(void)
> +{
> + return ((u64)get_cycles() * CONFIG_EARLY_COUNTER_MULT)
> + >> CONFIG_EARLY_COUNTER_SHIFT;
> +}
> +#else
> +static inline u64 early_counter_ns(void)
> +{
> + return 0;
> +}
> +#endif
> +
> __printf(4, 0)
> int vprintk_store(int facility, int level,
> const struct dev_printk_info *dev_info,
> @@ -2239,6 +2252,8 @@ int vprintk_store(int facility, int level,
> * timestamp with respect to the caller.
> */
> ts_nsec = local_clock();
> + if (!ts_nsec)
> + ts_nsec = early_counter_ns();
>
> caller_id = printk_caller_id();
>
> --
> 2.43.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] printk: add early_counter_ns routine for printk blind spot
2025-11-25 13:08 ` Francesco Valla
2025-11-26 7:38 ` Geert Uytterhoeven
@ 2025-11-26 12:55 ` Petr Mladek
2025-11-27 0:03 ` Bird, Tim
1 sibling, 1 reply; 10+ messages in thread
From: Petr Mladek @ 2025-11-26 12:55 UTC (permalink / raw)
To: Francesco Valla
Cc: Tim Bird, Steve Rostedt, john.ogness, senozhatsky, Tim Bird,
Andrew Morton, LKML, Anna-Maria Behnsen, Frederic Weisbecker,
Thomas Gleixner, Linux Embedded
On Tue 2025-11-25 14:08:40, Francesco Valla wrote:
> Hi Tim,
>
> I tested this on my i.MX93 FRDM (arm64) board and after a bit of
> fiddling with the MULT/SHIFT values I got it working. It can be a very
> valuable addition.
>
> Some comments follow.
>
> On Mon, Nov 24, 2025 at 10:30:52PM -0700, Tim Bird wrote:
> > From: Tim Bird <tim.bird@sony.com>
> >
> > During early boot, printk timestamps are reported as zero,
> > which creates a blind spot in early boot timings. This blind
> > spot hinders timing and optimization efforts for code that
> > executes before time_init(), which is when local_clock() is
> > initialized sufficiently to start returning non-zero timestamps.
> > This period is about 400 milliseconds for many current desktop
> > and embedded machines running Linux.
> >
> > Add an early_counter_ns function that returns nanosecond
> > timestamps based on get_cycles(). get_cycles() is operational
> > on arm64 and x86_64 from kernel start. Add some calibration
> > printks to allow setting configuration variables that are used
> > to convert cycle counts to nanoseconds (which are then used
> > in early printks). Add CONFIG_EARLY_COUNTER_NS, as well as
> > some associated conversion variables, as new kernel config
> > variables.
> >
> > After proper configuration, this yields non-zero timestamps for
> > printks from the very start of kernel execution. The timestamps
> > are relative to the start of the architecture-specific counter
> > used in get_cycles (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > This means that the time reported reflects time-from-power-on for
> > most embedded products. This is also a useful data point for
> > boot-time optimization work.
> >
> > Note that there is a discontinuity in the timestamp sequencing
> > when standard clocks are finally initialized in time_init().
> > The printk timestamps are thus not monotonically increasing
> > through the entire boot.
>
> This is... not going to work, IMO, and might lead to breakages in
> userspace tools (are printk timings a userspace API?).
Honestly, I am not sure if it would break anything. The fact is
that printk() always used monotonic timers. And it is possible
that some userspace depends on it.
I personally thing that non-monotonic time stamps might be confusing
but they should not cause any serious breakage. But I might be wrong.
People are creative...
> I actually have a counter-proposal: the time obtained through cycle
> evaluation is used as an offset to be added to the printk time after
> time_init() is called. A (working, but maybe sub-optimal) patch to
> obtain this is attached at the end.
I am not sure if this is a good idea. The offset would cause
that all post-timer-init printk timestamps differ from values
provided by the timer API. And it might cause confusion,
for example, when they are printed as part of the message,
or when analyzing a crash dump.
On the other hand, there are various clock sources in the kernel
which are not comparable anyway. So maybe I am too cautious.
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -770,6 +770,53 @@ config IKHEADERS
> > or similar programs. If you build the headers as a module, a module called
> > kheaders.ko is built which can be loaded on-demand to get access to headers.
> >
> > +config EARLY_COUNTER_NS
> > + bool "Use counter for early printk timestamps"
> > + default y
> > + depends on PRINTK
> > + help
> > + Use a cycle-counter to provide printk timestamps during early
> > + boot. This allows seeing timing information that would
> > + otherwise be displayed with 0-valued timestamps.
> > +
> > + In order for this to work, you need to specify values for
> > + EARLY_COUNTER_MULT and EARLY_COUNTER_SHIFT, used to convert
> > + from the cycle count to nanoseconds.
> > +
> > +config EARLY_COUNTER_MULT
> > + int "Multiplier for early cycle counter"
> > + depends on PRINTK && EARLY_COUNTER_NS
> > + default 1
> > + help
> > + This value specifies a multiplier to be used when converting
> > + cycle counts to nanoseconds. The formula used is:
> > + ns = (cycles * mult) >> shift
> > +
> > + Use a multiplier that will bring the value of (cycles * mult)
> > + to near a power of two, that is greater than 1000. The
> > + nanoseconds returned by this conversion are divided by 1000
> > + to be used as the printk timestamp counter (with resolution
> > + of microseconds).
> > +
> > + As an example, for a cycle-counter with a frequency of 200 Mhz,
> > + the multiplier would be: 10485760, and the shift would be 21.
> > +
>
> If I got this correclty:
>
> EARLY_COUNTER_MULT = (10^9 / freq) << EARLY_COUNTER_SHIFT
>
> where EARLY_COUNTER_SHIFT can be chosen at will, provided it is big
> enough to survice the ns->us conversion but small enough not to overflow
> the u64 container.
>
> > +config EARLY_COUNTER_SHIFT
> > + int "Shift value for early cycle counter"
> > + range 0 63
> > + depends on PRINTK && EARLY_COUNTER_NS
> > + default 0
> > + help
> > + This value specifies a shift value to be used when converting
> > + cycle counts to nanoseconds. The formula used is:
> > + ns = (cycles * mult) >> shift
> > +
> > + Use a shift that will bring the result to a value
> > + in nanoseconds.
> > +
> > + As an example, for a cycle-counter with a frequency of 200 Mhz,
> > + the multiplier would be: 10485760, and the shift would be 21.
> > +
> > config LOG_BUF_SHIFT
> > int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
> > range 12 25
So, it is usable only for a particular HW. It is not usable for a
generic kernel which is supposed to run on misc HW.
I guess that there is no way to detect the CPU frequency at runtime?
Best Regards,
Petr
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [PATCH] printk: add early_counter_ns routine for printk blind spot
2025-11-26 12:55 ` Petr Mladek
@ 2025-11-27 0:03 ` Bird, Tim
0 siblings, 0 replies; 10+ messages in thread
From: Bird, Tim @ 2025-11-27 0:03 UTC (permalink / raw)
To: Petr Mladek, Francesco Valla
Cc: Tim Bird, Steve Rostedt, john.ogness@linutronix.de,
senozhatsky@chromium.org, Andrew Morton, LKML, Anna-Maria Behnsen,
Frederic Weisbecker, Thomas Gleixner, Linux Embedded
> -----Original Message-----
> From: Petr Mladek <pmladek@suse.com>
> On Tue 2025-11-25 14:08:40, Francesco Valla wrote:
> > Hi Tim,
> >
> > I tested this on my i.MX93 FRDM (arm64) board and after a bit of
> > fiddling with the MULT/SHIFT values I got it working. It can be a very
> > valuable addition.
> >
> > Some comments follow.
> >
> > On Mon, Nov 24, 2025 at 10:30:52PM -0700, Tim Bird wrote:
> > > From: Tim Bird <tim.bird@sony.com>
> > >
> > > During early boot, printk timestamps are reported as zero,
> > > which creates a blind spot in early boot timings. This blind
> > > spot hinders timing and optimization efforts for code that
> > > executes before time_init(), which is when local_clock() is
> > > initialized sufficiently to start returning non-zero timestamps.
> > > This period is about 400 milliseconds for many current desktop
> > > and embedded machines running Linux.
> > >
> > > Add an early_counter_ns function that returns nanosecond
> > > timestamps based on get_cycles(). get_cycles() is operational
> > > on arm64 and x86_64 from kernel start. Add some calibration
> > > printks to allow setting configuration variables that are used
> > > to convert cycle counts to nanoseconds (which are then used
> > > in early printks). Add CONFIG_EARLY_COUNTER_NS, as well as
> > > some associated conversion variables, as new kernel config
> > > variables.
> > >
> > > After proper configuration, this yields non-zero timestamps for
> > > printks from the very start of kernel execution. The timestamps
> > > are relative to the start of the architecture-specific counter
> > > used in get_cycles (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > > This means that the time reported reflects time-from-power-on for
> > > most embedded products. This is also a useful data point for
> > > boot-time optimization work.
> > >
> > > Note that there is a discontinuity in the timestamp sequencing
> > > when standard clocks are finally initialized in time_init().
> > > The printk timestamps are thus not monotonically increasing
> > > through the entire boot.
> >
> > This is... not going to work, IMO, and might lead to breakages in
> > userspace tools (are printk timings a userspace API?).
>
> Honestly, I am not sure if it would break anything. The fact is
> that printk() always used monotonic timers. And it is possible
> that some userspace depends on it.
>
> I personally thing that non-monotonic time stamps might be confusing
> but they should not cause any serious breakage. But I might be wrong.
> People are creative...
I worried about this, but I'm skeptical it's a big deal. Humans might be
a little confused, but it's not difficult to see what's going on just by looking
at the timestamps. If a tool breaks, especially something that's used
in automation, e.g. it's used to report results, or is in some sort of CI
loop where the break will cascade into a test failure, then that's a bigger issue.
But right now I'm not aware of any boot-time tests where that would be the case.
I'll comment more on different fixes for this below.
>
> > I actually have a counter-proposal: the time obtained through cycle
> > evaluation is used as an offset to be added to the printk time after
> > time_init() is called. A (working, but maybe sub-optimal) patch to
> > obtain this is attached at the end.
>
> I am not sure if this is a good idea. The offset would cause
> that all post-timer-init printk timestamps differ from values
> provided by the timer API. And it might cause confusion,
> for example, when they are printed as part of the message,
> or when analyzing a crash dump.
>
> On the other hand, there are various clock sources in the kernel
> which are not comparable anyway. So maybe I am too cautious.
I thought of adding an offset, but I didn't want to disturb anything past
the time_init() call. As it is now, the early_counter_ns feature only
changes the zero-valued timestamps. So anything relying on the absolute
value of an existing timestamp later in the boot would not be affected.
I thought that if people suddenly saw the timestamps jump by 10 to 30 seconds
(since they are now relative to machine start instead of to kernel clock start
(time_init()), it would be very jarring. I suppose they would get used to it,
though, and all relative timings should stay the same.
I also didn't want to add additional overhead (even a single add) in the case
where CONFIG_EARLY_COUNTER_NS was disabled. But, realistically, I
don't think an additional add in the printk path is not going to be noticeable,
and I can probably structure it so that there's absolutely no overhead when
CONFIG_EARLY_COUNTER_NS is disabled.
I considered embedding an early counter offset into local_clock(), and thus
not modifying the printk code at all. This would have the benefit
of keeping the printk timestamps consistent with other uses of local_clock()
data (such as crash dumps or inside other messages). But then that would embed the
early_counter overhead into every user of local_clock() (whether humans
saw the timestamp values or not). And some of those users might be more
performance sensitive than printk is.
Finally, I considered adding another config option to control adding the offset,
but at 3 configs for this fairly niche functionality, I thought I was already
pushing my luck. But a new config would be easy to add.
My plan is to add an offset for the early_counter_ns value to printk, without
a config, in the next patch version, and see what people think.
>
> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -770,6 +770,53 @@ config IKHEADERS
> > > or similar programs. If you build the headers as a module, a module called
> > > kheaders.ko is built which can be loaded on-demand to get access to headers.
> > >
> > > +config EARLY_COUNTER_NS
> > > + bool "Use counter for early printk timestamps"
> > > + default y
> > > + depends on PRINTK
> > > + help
> > > + Use a cycle-counter to provide printk timestamps during early
> > > + boot. This allows seeing timing information that would
> > > + otherwise be displayed with 0-valued timestamps.
> > > +
> > > + In order for this to work, you need to specify values for
> > > + EARLY_COUNTER_MULT and EARLY_COUNTER_SHIFT, used to convert
> > > + from the cycle count to nanoseconds.
> > > +
> > > +config EARLY_COUNTER_MULT
> > > + int "Multiplier for early cycle counter"
> > > + depends on PRINTK && EARLY_COUNTER_NS
> > > + default 1
> > > + help
> > > + This value specifies a multiplier to be used when converting
> > > + cycle counts to nanoseconds. The formula used is:
> > > + ns = (cycles * mult) >> shift
> > > +
> > > + Use a multiplier that will bring the value of (cycles * mult)
> > > + to near a power of two, that is greater than 1000. The
> > > + nanoseconds returned by this conversion are divided by 1000
> > > + to be used as the printk timestamp counter (with resolution
> > > + of microseconds).
> > > +
> > > + As an example, for a cycle-counter with a frequency of 200 Mhz,
> > > + the multiplier would be: 10485760, and the shift would be 21.
> > > +
> >
> > If I got this correclty:
> >
> > EARLY_COUNTER_MULT = (10^9 / freq) << EARLY_COUNTER_SHIFT
> >
> > where EARLY_COUNTER_SHIFT can be chosen at will, provided it is big
> > enough to survice the ns->us conversion but small enough not to overflow
> > the u64 container.
Yeah, I was worried about these Kconfig explanations. I think it will be easier
to just explain how I recommend this should be configured, which is something like:
Turn on CONFIG_EARLY_COUNTER_NS, run the kernel once, get the values for MULT
and SHIFT (printed by the calibration code), enter them in the appropriate configs, and
then build and run the kernel again. This only needs to be done once per platform,
and could even be put into the defconfig (or a config fragment) for a platform.
(More on having hardcoded config for this below). It was not my intent to have
kernel developers doing weird (shift) math to enable this feature. Note that I can't
get the values from devicetree or the kernel command line, as this is used before
either of those is initialized or parsed.
> >
> > > +config EARLY_COUNTER_SHIFT
> > > + int "Shift value for early cycle counter"
> > > + range 0 63
> > > + depends on PRINTK && EARLY_COUNTER_NS
> > > + default 0
> > > + help
> > > + This value specifies a shift value to be used when converting
> > > + cycle counts to nanoseconds. The formula used is:
> > > + ns = (cycles * mult) >> shift
> > > +
> > > + Use a shift that will bring the result to a value
> > > + in nanoseconds.
> > > +
> > > + As an example, for a cycle-counter with a frequency of 200 Mhz,
> > > + the multiplier would be: 10485760, and the shift would be 21.
> > > +
> > > config LOG_BUF_SHIFT
> > > int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
> > > range 12 25
>
> So, it is usable only for a particular HW. It is not usable for a
> generic kernel which is supposed to run on misc HW.
That's correct. It is mostly targeted at embedded products,
where shaving off 10 to 200 milliseconds in the pre-clock-initialization
region of boot would be valuable. For people doing aggressive
boot time optimization, they will have a custom kernel anyway
(and probably a custom bootloader, device tree, initramfs, udev rules,
SE linux rules, module load ordering, systemd config, etc.)
Basically, if you're optimizing the code in this kernel boot time blind
spot, you are very likely not using a generic kernel. (That's not to
say that the optimizations made won't ultimately be valuable to people using
generic kernels).
>
> I guess that there is no way to detect the CPU frequency at runtime?
This "feature" is used before clock initialization, which is what
would be used to calibrate the CPU frequency at runtime. This runs so
early that doing the calibration inline doesn't work. Not enough kernel
services are available (Actually, zero services are available, as this
can be used in the very first printk in start_kernel.)
Plan from here...
I got a compile error from 0-day on powerpc, so I need to re-spin to fix that.
I'll address the other issues raised and submit a new version when I can.
I'm off to Japan this week, and between business travel and the holidays,
and being away from my lab where I can do hardware testing, it will
probably be some time in January before I send the next version.
Thanks very much for the feedback!
-- Tim
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [PATCH] printk: add early_counter_ns routine for printk blind spot
2025-11-26 7:38 ` Geert Uytterhoeven
@ 2025-11-27 0:16 ` Bird, Tim
2025-11-27 16:16 ` Petr Mladek
0 siblings, 1 reply; 10+ messages in thread
From: Bird, Tim @ 2025-11-27 0:16 UTC (permalink / raw)
To: Geert Uytterhoeven, Francesco Valla
Cc: Tim Bird, pmladek@suse.com, Steve Rostedt,
john.ogness@linutronix.de, senozhatsky@chromium.org,
Andrew Morton, LKML, Linux Embedded
> -----Original Message-----
> From: Geert Uytterhoeven <geert@linux-m68k.org>
> Hi all,
>
> On Wed, 26 Nov 2025 at 03:24, Francesco Valla <francesco@valla.it> wrote:
> > On Mon, Nov 24, 2025 at 10:30:52PM -0700, Tim Bird wrote:
> > > From: Tim Bird <tim.bird@sony.com>
> > >
> > > During early boot, printk timestamps are reported as zero,
> > > which creates a blind spot in early boot timings. This blind
> > > spot hinders timing and optimization efforts for code that
> > > executes before time_init(), which is when local_clock() is
> > > initialized sufficiently to start returning non-zero timestamps.
> > > This period is about 400 milliseconds for many current desktop
> > > and embedded machines running Linux.
> > >
> > > Add an early_counter_ns function that returns nanosecond
> > > timestamps based on get_cycles(). get_cycles() is operational
> > > on arm64 and x86_64 from kernel start. Add some calibration
> > > printks to allow setting configuration variables that are used
> > > to convert cycle counts to nanoseconds (which are then used
> > > in early printks). Add CONFIG_EARLY_COUNTER_NS, as well as
> > > some associated conversion variables, as new kernel config
> > > variables.
> > >
> > > After proper configuration, this yields non-zero timestamps for
> > > printks from the very start of kernel execution. The timestamps
> > > are relative to the start of the architecture-specific counter
> > > used in get_cycles (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > > This means that the time reported reflects time-from-power-on for
> > > most embedded products. This is also a useful data point for
> > > boot-time optimization work.
> > >
> > > Note that there is a discontinuity in the timestamp sequencing
> > > when standard clocks are finally initialized in time_init().
> > > The printk timestamps are thus not monotonically increasing
> > > through the entire boot.
> >
> > This is... not going to work, IMO, and might lead to breakages in
> > userspace tools (are printk timings a userspace API?).
>
> I think they are.
>
> Another approach would be to defer the calibration/conversion to
> userspace, and make sure the early part stands out.
> I.e. when real timekeeping is available, kernel messages are prefixed by
> "[%5lu.%06lu]". Early messages could be prefixed by a plain integer
> "[%12u]", containing the raw cycle counter value.
> The presence of the decimal point would make the difference obvious.
I thought about this while I was creating this.
It wouldn't require the extra configuration for MULT and SHIFT (which would be nice),
and it would be, as you say, very obvious that this was not a regular timestamp.
This means it could be enabled on a generic kernel (making more likely it could be
enabled by default). And really only boot-time optimizers would care enough to
decode the data, so the onus would be on them to run the tool. Everyone else
could ignore them.
I'm not sure if it would break existing printk-processing tools. I suspect it would.
Also, I find that post-processing tools often get overlooked.
I asked at ELC this year how many people are using show_delta, which
has been upstream for years, and can do a few neat things with printk timestamps,
and not a single person had even heard of it.
In this scenario, you would still need to have the calibration printks in
the code so that the tool could pull them out to then convert the cycle-valued
printks into printks with regular timestamps.
I could see doing this if people object to the non-genericity of the current
solution.
>
> > I actually have a counter-proposal: the time obtained through cycle
> > evaluation is used as an offset to be added to the printk time after
> > time_init() is called. A (working, but maybe sub-optimal) patch to
> > obtain this is attached at the end.
>
> Oh, that's a nice idea, too!
Thanks for the feedback!
-- Tim
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] printk: add early_counter_ns routine for printk blind spot
2025-11-25 5:30 [PATCH] printk: add early_counter_ns routine for printk blind spot Tim Bird
` (2 preceding siblings ...)
2025-11-26 11:13 ` Petr Mladek
@ 2025-11-27 9:13 ` kernel test robot
3 siblings, 0 replies; 10+ messages in thread
From: kernel test robot @ 2025-11-27 9:13 UTC (permalink / raw)
To: Tim Bird, pmladek, Steve Rostedt, john.ogness, senozhatsky
Cc: oe-kbuild-all, Tim Bird, Andrew Morton,
Linux Memory Management List, Francesco Valla, LKML,
Linux Embedded
Hi Tim,
kernel test robot noticed the following build errors:
[auto build test ERROR on linus/master]
[also build test ERROR on v6.18-rc7]
[cannot apply to akpm-mm/mm-everything next-20251127]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Tim-Bird/printk-add-early_counter_ns-routine-for-printk-blind-spot/20251125-133242
base: linus/master
patch link: https://lore.kernel.org/r/39b09edb-8998-4ebd-a564-7d594434a981%40bird.org
patch subject: [PATCH] printk: add early_counter_ns routine for printk blind spot
config: i386-randconfig-2006-20250825 (https://download.01.org/0day-ci/archive/20251127/202511271051.yfp2O98B-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251127/202511271051.yfp2O98B-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511271051.yfp2O98B-lkp@intel.com/
All errors (new ones prefixed by >>):
ld: init/main.o: in function `kernel_init':
>> init/main.c:1522:(.ref.text+0x201): undefined reference to `__udivdi3'
vim +1522 init/main.c
1481
1482 static int __ref kernel_init(void *unused)
1483 {
1484 int ret;
1485 u64 end_cycles, end_ns;
1486 u32 early_mult, early_shift;
1487
1488 /*
1489 * Wait until kthreadd is all set-up.
1490 */
1491 wait_for_completion(&kthreadd_done);
1492
1493 kernel_init_freeable();
1494 /* need to finish all async __init code before freeing the memory */
1495 async_synchronize_full();
1496
1497 system_state = SYSTEM_FREEING_INITMEM;
1498 kprobe_free_init_mem();
1499 ftrace_free_init_mem();
1500 kgdb_free_init_mem();
1501 exit_boot_config();
1502 free_initmem();
1503 mark_readonly();
1504
1505 /*
1506 * Kernel mappings are now finalized - update the userspace page-table
1507 * to finalize PTI.
1508 */
1509 pti_finalize();
1510
1511 system_state = SYSTEM_RUNNING;
1512 numa_default_policy();
1513
1514 rcu_end_inkernel_boot();
1515
1516 do_sysctl_args();
1517
1518 /* show calibration data for early_counter_ns */
1519 end_cycles = get_cycles();
1520 end_ns = local_clock();
1521 clocks_calc_mult_shift(&early_mult, &early_shift,
> 1522 ((end_cycles - start_cycles) * NSEC_PER_SEC)/(end_ns - start_ns),
1523 NSEC_PER_SEC, 50);
1524
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] printk: add early_counter_ns routine for printk blind spot
2025-11-27 0:16 ` Bird, Tim
@ 2025-11-27 16:16 ` Petr Mladek
0 siblings, 0 replies; 10+ messages in thread
From: Petr Mladek @ 2025-11-27 16:16 UTC (permalink / raw)
To: Bird, Tim
Cc: Geert Uytterhoeven, Francesco Valla, Tim Bird, Steve Rostedt,
john.ogness@linutronix.de, senozhatsky@chromium.org,
Andrew Morton, LKML, Linux Embedded
On Thu 2025-11-27 00:16:23, Bird, Tim wrote:
>
>
> > -----Original Message-----
> > From: Geert Uytterhoeven <geert@linux-m68k.org>
> > Hi all,
> >
> > On Wed, 26 Nov 2025 at 03:24, Francesco Valla <francesco@valla.it> wrote:
> > > On Mon, Nov 24, 2025 at 10:30:52PM -0700, Tim Bird wrote:
> > > > From: Tim Bird <tim.bird@sony.com>
> > > >
> > > > During early boot, printk timestamps are reported as zero,
> > > > which creates a blind spot in early boot timings. This blind
> > > > spot hinders timing and optimization efforts for code that
> > > > executes before time_init(), which is when local_clock() is
> > > > initialized sufficiently to start returning non-zero timestamps.
> > > > This period is about 400 milliseconds for many current desktop
> > > > and embedded machines running Linux.
> > > >
> > > > Add an early_counter_ns function that returns nanosecond
> > > > timestamps based on get_cycles(). get_cycles() is operational
> > > > on arm64 and x86_64 from kernel start. Add some calibration
> > > > printks to allow setting configuration variables that are used
> > > > to convert cycle counts to nanoseconds (which are then used
> > > > in early printks). Add CONFIG_EARLY_COUNTER_NS, as well as
> > > > some associated conversion variables, as new kernel config
> > > > variables.
> > > >
> > > > After proper configuration, this yields non-zero timestamps for
> > > > printks from the very start of kernel execution. The timestamps
> > > > are relative to the start of the architecture-specific counter
> > > > used in get_cycles (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > > > This means that the time reported reflects time-from-power-on for
> > > > most embedded products. This is also a useful data point for
> > > > boot-time optimization work.
> > > >
> > > > Note that there is a discontinuity in the timestamp sequencing
> > > > when standard clocks are finally initialized in time_init().
> > > > The printk timestamps are thus not monotonically increasing
> > > > through the entire boot.
> > >
> > > This is... not going to work, IMO, and might lead to breakages in
> > > userspace tools (are printk timings a userspace API?).
> >
> > I think they are.
> >
> > Another approach would be to defer the calibration/conversion to
> > userspace, and make sure the early part stands out.
> > I.e. when real timekeeping is available, kernel messages are prefixed by
> > "[%5lu.%06lu]". Early messages could be prefixed by a plain integer
> > "[%12u]", containing the raw cycle counter value.
> > The presence of the decimal point would make the difference obvious.
>
> I thought about this while I was creating this.
> It wouldn't require the extra configuration for MULT and SHIFT (which would be nice),
> and it would be, as you say, very obvious that this was not a regular timestamp.
> This means it could be enabled on a generic kernel (making more likely it could be
> enabled by default). And really only boot-time optimizers would care enough to
> decode the data, so the onus would be on them to run the tool. Everyone else
> could ignore them.
>
> I'm not sure if it would break existing printk-processing tools. I suspect it would.
I guess that it might break even basic tools, like dmesg, journalctl,
or crash.
A solution might be to pass it as an extra information to the official
timestamp, for example:
+ on console:
<level>[timestamp][callerid][cl cycles] message
<6>[ 0.000000][ T0][cl 345678] BIOS-provided physical RAM map:
<6>[ 0.000000][ T0][cl 1036890] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
<6>[ 0.000000][ T0][cl 1129452] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
+ via /dev/kmsg
<level>,<sequnum>,<timestamp>,<contflag>[,additional_values, ... ];<message text>
6,2,0,-,caller=T0,cycle=345678;BIOS-provided physical RAM map:
6,3,0,-,caller=T0,cycle=1036890;BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
6,4,0,-,caller=T0,cycle=1129452;BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
The extra field would disappear after time_init().
The value might be stored in struct printk_info in the same field .ts_nsec.
It might be distinguished from a real timestamp using a flag in
enum printk_info_flags. The official timestamp would be zero when
this flag is set.
It will not require the two CONFIG_ values for calibrating the
computation.
The output on the console is a bit messy. But I guess that this
feature is rather for tuning and it won't be enabled on production
systems. So it might be acceptable.
time_init() might even print a message with the cycle value
and the official timestamp on the same line. It can be used
for post-processing and translating cycles back to ns.
> Also, I find that post-processing tools often get overlooked.
> I asked at ELC this year how many people are using show_delta, which
> has been upstream for years, and can do a few neat things with printk timestamps,
> and not a single person had even heard of it.
Yeah. We need to make sure that the post processing tool won't get mad,
for example, crash or show garbage.
Best Regards,
Petr
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-11-27 16:16 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-25 5:30 [PATCH] printk: add early_counter_ns routine for printk blind spot Tim Bird
2025-11-25 7:52 ` kernel test robot
2025-11-25 13:08 ` Francesco Valla
2025-11-26 7:38 ` Geert Uytterhoeven
2025-11-27 0:16 ` Bird, Tim
2025-11-27 16:16 ` Petr Mladek
2025-11-26 12:55 ` Petr Mladek
2025-11-27 0:03 ` Bird, Tim
2025-11-26 11:13 ` Petr Mladek
2025-11-27 9:13 ` kernel test robot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).