Linux RTC
 help / color / mirror / Atom feed
From: Alexandre Belloni <alexandre.belloni@bootlin.com>
To: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: linux-rtc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] rtc: interface: Add rtc time jump debug in rtc_timer_do_work()
Date: Tue, 16 Jun 2026 08:57:46 +0200	[thread overview]
Message-ID: <202606160657465eab6e41@mail.local> (raw)
In-Reply-To: <06cdb8b3-8a5d-4f1d-b686-6122fa6f7af9@huawei.com>

Hi,

On 16/06/2026 10:10:19+0800, Jinjie Ruan wrote:
> 
> 
> On 6/15/2026 11:22 PM, Alexandre Belloni wrote:
> > Hello,
> > 
> > On 25/05/2026 21:08:25+0800, Jinjie Ruan wrote:
> >> In virtualization environments like QEMU [1], or during hardware
> >> clocksource anomalies, an extreme time-warp event can occur. When
> >> the system time abruptly jumps forward, the rtc_timer_do_work() handler
> >> falls into a prolonged processing loop to clear accumulated historical
> >> timers via timerqueue_getnext(). Running this loop indefinitely under
> >> the rtc->ops_lock mutex triggers a kernel softlockup, stalling
> >> the system.
> >>
> >> Introduce an adaptive telemetry and loop guard mechanism to enhance debug
> >> visibility and prevent softlockups:
> >>
> >> 1. Record `start_jiffies` upon entry and leverage `time_after()` to
> >>    check if the loop has monopolized the CPU for more than 1s (HZ). If so,
> >>    the handler prints a telemetry warning, triggers a WARN stack dump, and
> >>    breaks the loop to safely yield the CPU.
> >>
> >> 2. Track the execution via a `loop_count` metric. Printing this counter
> >>    in the warning log provides vital diagnostics to distinguish
> >>    an aggressive time-warp storm (high count) from a bogged-down callback
> >>    bug (low count).
> >>
> >> 3. Utilize the kernel format specifier `%ptR` to convert the raw ktime
> >>    into a human-readable timestamp (YYYY-MM-DD HH:MM:SS), allowing
> >>    developers to instantly pinpoint the exact boundary of the time
> >>    jump in dmesg.
> >>
> >> This non-destructive telemetry guard provides precise hardware/emulator
> >> diagnostic visibility while ensuring core kernel availability.
> >>
> >> [1]: https://lore.kernel.org/all/20260114013257.3500578-1-ruanjinjie@huawei.com/
> >> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> >> ---
> >>  drivers/rtc/interface.c | 15 +++++++++++++--
> >>  1 file changed, 13 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c
> >> index 1906f4884a83..f6c5fd16cc4e 100644
> >> --- a/drivers/rtc/interface.c
> >> +++ b/drivers/rtc/interface.c
> >> @@ -927,10 +927,12 @@ static void rtc_timer_remove(struct rtc_device *rtc, struct rtc_timer *timer)
> >>   */
> >>  void rtc_timer_do_work(struct work_struct *work)
> >>  {
> >> -	struct rtc_timer *timer;
> >> +	unsigned long start_jiffies = jiffies;
> >>  	struct timerqueue_node *next;
> >> -	ktime_t now;
> >> +	struct rtc_timer *timer;
> >>  	struct rtc_time tm;
> >> +	int loop_count = 0;
> >> +	ktime_t now;
> >>  	int err;
> >>  
> >>  	struct rtc_device *rtc =
> >> @@ -945,6 +947,15 @@ void rtc_timer_do_work(struct work_struct *work)
> >>  	}
> >>  	now = rtc_tm_to_ktime(tm);
> >>  	while ((next = timerqueue_getnext(&rtc->timerqueue))) {
> >> +		loop_count++;
> >> +
> >> +		if (unlikely(time_after(jiffies, start_jiffies + HZ))) {
> >> +			dev_warn(&rtc->dev, "RTC time jump (loop: %d) to %ptR.\n",
> >> +				 loop_count, &tm);
> >> +			WARN_ON_ONCE(1);
> > 
> > So, your issue is that it is too slow so you make it even slower? There
> > are already plenty of tracepoints that allow proper debugging in this
> > loop, I'm pretty sure we don't want to bloat the kernel with more
> > messages.
> 
> Hi, Alexandre,
> 
> The point here is not about the performance of the rtc_timer_do_work()
> loop — it’s about making the problem debuggable when things go wrong.
> And we can put it under a debug Kconfig option, so production kernels
> see no extra overhead at all.


But then aren't the tracepoint enough? There are 3 tracepoints in the
loop that are exactly for debugging your issue.

> 
> The patch is installed in the following scenarios:
> 
> If the RTC hardware fails, or if the QEMU-emulated RTC device code in a
> KVM virtual machine has a problem (for example, the x86 RTC emulation
> hardware mc146818 has an overflow issue[1]), the time may jump as shown
> in the log below, which can cause a soft lockup.

This is not clear, you are not explaining the issue. You seem to mix
system time and RTC time. What I understand is that there was a periodic
timer enqueued and then for some reason, the system time jumped forward
by a large amount and now rtc_timer_do_work is firing events for each of
the missed timers. You are not explaining the relationship with the RTC
hardware (I see none). 

> 
> However, when the issue occurs, it is only possible to know that too
> many pending timers have accumulated in the timerqueue (for example, the
> log shows that tens of millions of timer nodes have been processed) by
> temporarily adding diagnostic code in rtc_timer_do_work().
> 

This is not true, there are 3 tracepoints to know what is happening with
the timers. Also, there are always exactly zero, one or two timers in
the queue, never tens of millions.

> To determine whether the root cause is hardware, kernel RTC code, an RTC
> driver issue, or an RTC hardware problem, more debugging is needed. But
> if the problem is indeed caused by RTC hardware, adding a diagnostic
> print of the current RTC time when the loop takes too long (as this
> diagnostic patch does) would make it easy to tell whether QEMU or the
> hardware is faulty.
> 
> [1]: https://lore.kernel.org/all/20260613195116.1807273-21-mjt@tls.msk.ru/
> 
>      kworker/0:1-37      [000] .N..   489.159634: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281423
>      kworker/0:1-37      [000] .N..   489.159635: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281424
>      kworker/0:1-37      [000] .N..   489.159635: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281425
>      kworker/0:1-37      [000] .N..   489.159636: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281426
>      kworker/0:1-37      [000] .N..   489.159637: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281427
>      kworker/0:1-37      [000] .N..   489.159638: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281428
>      kworker/0:1-37      [000] .N..   489.159638: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281429
>      kworker/0:1-37      [000] .N..   489.159639: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281430
>      kworker/0:1-37      [000] .N..   489.159640: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281431
>      kworker/0:1-37      [000] .N..   489.159641: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281432
> 
> 
>    swapper/0-1       [001] .N..    11.579334: __rtc_read_time: rtc:
> 0xff11000109896800, ops->read_time:2026:01:05:09:36:21
>        swapper/0-1       [001] .N..    11.579421: __rtc_read_time: rtc:
> 0xff11000109896800, ops->read_time:2026:01:05:09:36:21
>        swapper/0-1       [001] .N..    11.579469: __rtc_read_time: rtc:
> 0xff11000109896800, ops->read_time:2026:01:05:09:36:21
>        swapper/0-1       [001] .N..    11.580816: __rtc_read_time: rtc:
> 0xff11000109896800, ops->read_time:2026:01:05:09:36:21
>   syz-executor.5-7492    [003] ....   129.807406: __rtc_read_time: rtc:
> 0xff11000109896800, ops->read_time:2033:05:04:07:03:51
>   syz-executor.5-7492    [003] ....   129.807419:
> __rtc_update_irq_enable.part.8: rtc uie on: 0xff11000109896800, now:
> 2033:05:04:07:03:51, expire: 2033:05:04:07:03:52
> 
> 
> Best regards,
> Jinjie
> 
> > 
> > 
> 

-- 
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

  reply	other threads:[~2026-06-16  6:57 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-25 13:08 [PATCH] rtc: interface: Add rtc time jump debug in rtc_timer_do_work() Jinjie Ruan
2026-06-11  1:40 ` Jinjie Ruan
2026-06-11  2:01 ` Jinjie Ruan
2026-06-15 15:22 ` Alexandre Belloni
2026-06-16  2:10   ` Jinjie Ruan
2026-06-16  6:57     ` Alexandre Belloni [this message]
2026-06-16  8:12       ` Jinjie Ruan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=202606160657465eab6e41@mail.local \
    --to=alexandre.belloni@bootlin.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rtc@vger.kernel.org \
    --cc=ruanjinjie@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox