* Re: [PATCH] rtc: interface: Add rtc time jump debug in rtc_timer_do_work()
2026-05-25 13:08 [PATCH] rtc: interface: Add rtc time jump debug in rtc_timer_do_work() Jinjie Ruan
@ 2026-06-11 1:40 ` Jinjie Ruan
2026-06-11 2:01 ` Jinjie Ruan
2026-06-15 15:22 ` Alexandre Belloni
2 siblings, 0 replies; 4+ messages in thread
From: Jinjie Ruan @ 2026-06-11 1:40 UTC (permalink / raw)
To: alexandre.belloni, linux-rtc, linux-kernel
Gentle ping.
On 5/25/2026 9:08 PM, Jinjie Ruan wrote:
> In virtualization environments like QEMU [1], or during hardware
> clocksource anomalies, an extreme time-warp event can occur. When
> the system time abruptly jumps forward, the rtc_timer_do_work() handler
> falls into a prolonged processing loop to clear accumulated historical
> timers via timerqueue_getnext(). Running this loop indefinitely under
> the rtc->ops_lock mutex triggers a kernel softlockup, stalling
> the system.
>
> Introduce an adaptive telemetry and loop guard mechanism to enhance debug
> visibility and prevent softlockups:
>
> 1. Record `start_jiffies` upon entry and leverage `time_after()` to
> check if the loop has monopolized the CPU for more than 1s (HZ). If so,
> the handler prints a telemetry warning, triggers a WARN stack dump, and
> breaks the loop to safely yield the CPU.
>
> 2. Track the execution via a `loop_count` metric. Printing this counter
> in the warning log provides vital diagnostics to distinguish
> an aggressive time-warp storm (high count) from a bogged-down callback
> bug (low count).
>
> 3. Utilize the kernel format specifier `%ptR` to convert the raw ktime
> into a human-readable timestamp (YYYY-MM-DD HH:MM:SS), allowing
> developers to instantly pinpoint the exact boundary of the time
> jump in dmesg.
>
> This non-destructive telemetry guard provides precise hardware/emulator
> diagnostic visibility while ensuring core kernel availability.
>
> [1]: https://lore.kernel.org/all/20260114013257.3500578-1-ruanjinjie@huawei.com/
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
> drivers/rtc/interface.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c
> index 1906f4884a83..f6c5fd16cc4e 100644
> --- a/drivers/rtc/interface.c
> +++ b/drivers/rtc/interface.c
> @@ -927,10 +927,12 @@ static void rtc_timer_remove(struct rtc_device *rtc, struct rtc_timer *timer)
> */
> void rtc_timer_do_work(struct work_struct *work)
> {
> - struct rtc_timer *timer;
> + unsigned long start_jiffies = jiffies;
> struct timerqueue_node *next;
> - ktime_t now;
> + struct rtc_timer *timer;
> struct rtc_time tm;
> + int loop_count = 0;
> + ktime_t now;
> int err;
>
> struct rtc_device *rtc =
> @@ -945,6 +947,15 @@ void rtc_timer_do_work(struct work_struct *work)
> }
> now = rtc_tm_to_ktime(tm);
> while ((next = timerqueue_getnext(&rtc->timerqueue))) {
> + loop_count++;
> +
> + if (unlikely(time_after(jiffies, start_jiffies + HZ))) {
> + dev_warn(&rtc->dev, "RTC time jump (loop: %d) to %ptR.\n",
> + loop_count, &tm);
> + WARN_ON_ONCE(1);
> + break;
> + }
> +
> if (next->expires > now)
> break;
>
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH] rtc: interface: Add rtc time jump debug in rtc_timer_do_work()
2026-05-25 13:08 [PATCH] rtc: interface: Add rtc time jump debug in rtc_timer_do_work() Jinjie Ruan
2026-06-11 1:40 ` Jinjie Ruan
@ 2026-06-11 2:01 ` Jinjie Ruan
2026-06-15 15:22 ` Alexandre Belloni
2 siblings, 0 replies; 4+ messages in thread
From: Jinjie Ruan @ 2026-06-11 2:01 UTC (permalink / raw)
To: alexandre.belloni, linux-rtc, linux-kernel, tglx
+cc Thomas Gleixner
On 5/25/2026 9:08 PM, Jinjie Ruan wrote:
> In virtualization environments like QEMU [1], or during hardware
> clocksource anomalies, an extreme time-warp event can occur. When
> the system time abruptly jumps forward, the rtc_timer_do_work() handler
> falls into a prolonged processing loop to clear accumulated historical
> timers via timerqueue_getnext(). Running this loop indefinitely under
> the rtc->ops_lock mutex triggers a kernel softlockup, stalling
> the system.
>
> Introduce an adaptive telemetry and loop guard mechanism to enhance debug
> visibility and prevent softlockups:
>
> 1. Record `start_jiffies` upon entry and leverage `time_after()` to
> check if the loop has monopolized the CPU for more than 1s (HZ). If so,
> the handler prints a telemetry warning, triggers a WARN stack dump, and
> breaks the loop to safely yield the CPU.
>
> 2. Track the execution via a `loop_count` metric. Printing this counter
> in the warning log provides vital diagnostics to distinguish
> an aggressive time-warp storm (high count) from a bogged-down callback
> bug (low count).
>
> 3. Utilize the kernel format specifier `%ptR` to convert the raw ktime
> into a human-readable timestamp (YYYY-MM-DD HH:MM:SS), allowing
> developers to instantly pinpoint the exact boundary of the time
> jump in dmesg.
>
> This non-destructive telemetry guard provides precise hardware/emulator
> diagnostic visibility while ensuring core kernel availability.
>
> [1]: https://lore.kernel.org/all/20260114013257.3500578-1-ruanjinjie@huawei.com/
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
> drivers/rtc/interface.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c
> index 1906f4884a83..f6c5fd16cc4e 100644
> --- a/drivers/rtc/interface.c
> +++ b/drivers/rtc/interface.c
> @@ -927,10 +927,12 @@ static void rtc_timer_remove(struct rtc_device *rtc, struct rtc_timer *timer)
> */
> void rtc_timer_do_work(struct work_struct *work)
> {
> - struct rtc_timer *timer;
> + unsigned long start_jiffies = jiffies;
> struct timerqueue_node *next;
> - ktime_t now;
> + struct rtc_timer *timer;
> struct rtc_time tm;
> + int loop_count = 0;
> + ktime_t now;
> int err;
>
> struct rtc_device *rtc =
> @@ -945,6 +947,15 @@ void rtc_timer_do_work(struct work_struct *work)
> }
> now = rtc_tm_to_ktime(tm);
> while ((next = timerqueue_getnext(&rtc->timerqueue))) {
> + loop_count++;
> +
> + if (unlikely(time_after(jiffies, start_jiffies + HZ))) {
> + dev_warn(&rtc->dev, "RTC time jump (loop: %d) to %ptR.\n",
> + loop_count, &tm);
> + WARN_ON_ONCE(1);
> + break;
> + }
> +
> if (next->expires > now)
> break;
>
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH] rtc: interface: Add rtc time jump debug in rtc_timer_do_work()
2026-05-25 13:08 [PATCH] rtc: interface: Add rtc time jump debug in rtc_timer_do_work() Jinjie Ruan
2026-06-11 1:40 ` Jinjie Ruan
2026-06-11 2:01 ` Jinjie Ruan
@ 2026-06-15 15:22 ` Alexandre Belloni
2 siblings, 0 replies; 4+ messages in thread
From: Alexandre Belloni @ 2026-06-15 15:22 UTC (permalink / raw)
To: Jinjie Ruan; +Cc: linux-rtc, linux-kernel
Hello,
On 25/05/2026 21:08:25+0800, Jinjie Ruan wrote:
> In virtualization environments like QEMU [1], or during hardware
> clocksource anomalies, an extreme time-warp event can occur. When
> the system time abruptly jumps forward, the rtc_timer_do_work() handler
> falls into a prolonged processing loop to clear accumulated historical
> timers via timerqueue_getnext(). Running this loop indefinitely under
> the rtc->ops_lock mutex triggers a kernel softlockup, stalling
> the system.
>
> Introduce an adaptive telemetry and loop guard mechanism to enhance debug
> visibility and prevent softlockups:
>
> 1. Record `start_jiffies` upon entry and leverage `time_after()` to
> check if the loop has monopolized the CPU for more than 1s (HZ). If so,
> the handler prints a telemetry warning, triggers a WARN stack dump, and
> breaks the loop to safely yield the CPU.
>
> 2. Track the execution via a `loop_count` metric. Printing this counter
> in the warning log provides vital diagnostics to distinguish
> an aggressive time-warp storm (high count) from a bogged-down callback
> bug (low count).
>
> 3. Utilize the kernel format specifier `%ptR` to convert the raw ktime
> into a human-readable timestamp (YYYY-MM-DD HH:MM:SS), allowing
> developers to instantly pinpoint the exact boundary of the time
> jump in dmesg.
>
> This non-destructive telemetry guard provides precise hardware/emulator
> diagnostic visibility while ensuring core kernel availability.
>
> [1]: https://lore.kernel.org/all/20260114013257.3500578-1-ruanjinjie@huawei.com/
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
> drivers/rtc/interface.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c
> index 1906f4884a83..f6c5fd16cc4e 100644
> --- a/drivers/rtc/interface.c
> +++ b/drivers/rtc/interface.c
> @@ -927,10 +927,12 @@ static void rtc_timer_remove(struct rtc_device *rtc, struct rtc_timer *timer)
> */
> void rtc_timer_do_work(struct work_struct *work)
> {
> - struct rtc_timer *timer;
> + unsigned long start_jiffies = jiffies;
> struct timerqueue_node *next;
> - ktime_t now;
> + struct rtc_timer *timer;
> struct rtc_time tm;
> + int loop_count = 0;
> + ktime_t now;
> int err;
>
> struct rtc_device *rtc =
> @@ -945,6 +947,15 @@ void rtc_timer_do_work(struct work_struct *work)
> }
> now = rtc_tm_to_ktime(tm);
> while ((next = timerqueue_getnext(&rtc->timerqueue))) {
> + loop_count++;
> +
> + if (unlikely(time_after(jiffies, start_jiffies + HZ))) {
> + dev_warn(&rtc->dev, "RTC time jump (loop: %d) to %ptR.\n",
> + loop_count, &tm);
> + WARN_ON_ONCE(1);
So, your issue is that it is too slow so you make it even slower? There
are already plenty of tracepoints that allow proper debugging in this
loop, I'm pretty sure we don't want to bloat the kernel with more
messages.
--
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
^ permalink raw reply [flat|nested] 4+ messages in thread