From: Alexandre Belloni <alexandre.belloni@bootlin.com>
To: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: linux-rtc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] rtc: interface: Add rtc time jump debug in rtc_timer_do_work()
Date: Tue, 16 Jun 2026 08:57:46 +0200 [thread overview]
Message-ID: <202606160657465eab6e41@mail.local> (raw)
In-Reply-To: <06cdb8b3-8a5d-4f1d-b686-6122fa6f7af9@huawei.com>
Hi,
On 16/06/2026 10:10:19+0800, Jinjie Ruan wrote:
>
>
> On 6/15/2026 11:22 PM, Alexandre Belloni wrote:
> > Hello,
> >
> > On 25/05/2026 21:08:25+0800, Jinjie Ruan wrote:
> >> In virtualization environments like QEMU [1], or during hardware
> >> clocksource anomalies, an extreme time-warp event can occur. When
> >> the system time abruptly jumps forward, the rtc_timer_do_work() handler
> >> falls into a prolonged processing loop to clear accumulated historical
> >> timers via timerqueue_getnext(). Running this loop indefinitely under
> >> the rtc->ops_lock mutex triggers a kernel softlockup, stalling
> >> the system.
> >>
> >> Introduce an adaptive telemetry and loop guard mechanism to enhance debug
> >> visibility and prevent softlockups:
> >>
> >> 1. Record `start_jiffies` upon entry and leverage `time_after()` to
> >> check if the loop has monopolized the CPU for more than 1s (HZ). If so,
> >> the handler prints a telemetry warning, triggers a WARN stack dump, and
> >> breaks the loop to safely yield the CPU.
> >>
> >> 2. Track the execution via a `loop_count` metric. Printing this counter
> >> in the warning log provides vital diagnostics to distinguish
> >> an aggressive time-warp storm (high count) from a bogged-down callback
> >> bug (low count).
> >>
> >> 3. Utilize the kernel format specifier `%ptR` to convert the raw ktime
> >> into a human-readable timestamp (YYYY-MM-DD HH:MM:SS), allowing
> >> developers to instantly pinpoint the exact boundary of the time
> >> jump in dmesg.
> >>
> >> This non-destructive telemetry guard provides precise hardware/emulator
> >> diagnostic visibility while ensuring core kernel availability.
> >>
> >> [1]: https://lore.kernel.org/all/20260114013257.3500578-1-ruanjinjie@huawei.com/
> >> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> >> ---
> >> drivers/rtc/interface.c | 15 +++++++++++++--
> >> 1 file changed, 13 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c
> >> index 1906f4884a83..f6c5fd16cc4e 100644
> >> --- a/drivers/rtc/interface.c
> >> +++ b/drivers/rtc/interface.c
> >> @@ -927,10 +927,12 @@ static void rtc_timer_remove(struct rtc_device *rtc, struct rtc_timer *timer)
> >> */
> >> void rtc_timer_do_work(struct work_struct *work)
> >> {
> >> - struct rtc_timer *timer;
> >> + unsigned long start_jiffies = jiffies;
> >> struct timerqueue_node *next;
> >> - ktime_t now;
> >> + struct rtc_timer *timer;
> >> struct rtc_time tm;
> >> + int loop_count = 0;
> >> + ktime_t now;
> >> int err;
> >>
> >> struct rtc_device *rtc =
> >> @@ -945,6 +947,15 @@ void rtc_timer_do_work(struct work_struct *work)
> >> }
> >> now = rtc_tm_to_ktime(tm);
> >> while ((next = timerqueue_getnext(&rtc->timerqueue))) {
> >> + loop_count++;
> >> +
> >> + if (unlikely(time_after(jiffies, start_jiffies + HZ))) {
> >> + dev_warn(&rtc->dev, "RTC time jump (loop: %d) to %ptR.\n",
> >> + loop_count, &tm);
> >> + WARN_ON_ONCE(1);
> >
> > So, your issue is that it is too slow so you make it even slower? There
> > are already plenty of tracepoints that allow proper debugging in this
> > loop, I'm pretty sure we don't want to bloat the kernel with more
> > messages.
>
> Hi, Alexandre,
>
> The point here is not about the performance of the rtc_timer_do_work()
> loop — it’s about making the problem debuggable when things go wrong.
> And we can put it under a debug Kconfig option, so production kernels
> see no extra overhead at all.
But then aren't the tracepoint enough? There are 3 tracepoints in the
loop that are exactly for debugging your issue.
>
> The patch is installed in the following scenarios:
>
> If the RTC hardware fails, or if the QEMU-emulated RTC device code in a
> KVM virtual machine has a problem (for example, the x86 RTC emulation
> hardware mc146818 has an overflow issue[1]), the time may jump as shown
> in the log below, which can cause a soft lockup.
This is not clear, you are not explaining the issue. You seem to mix
system time and RTC time. What I understand is that there was a periodic
timer enqueued and then for some reason, the system time jumped forward
by a large amount and now rtc_timer_do_work is firing events for each of
the missed timers. You are not explaining the relationship with the RTC
hardware (I see none).
>
> However, when the issue occurs, it is only possible to know that too
> many pending timers have accumulated in the timerqueue (for example, the
> log shows that tens of millions of timer nodes have been processed) by
> temporarily adding diagnostic code in rtc_timer_do_work().
>
This is not true, there are 3 tracepoints to know what is happening with
the timers. Also, there are always exactly zero, one or two timers in
the queue, never tens of millions.
> To determine whether the root cause is hardware, kernel RTC code, an RTC
> driver issue, or an RTC hardware problem, more debugging is needed. But
> if the problem is indeed caused by RTC hardware, adding a diagnostic
> print of the current RTC time when the loop takes too long (as this
> diagnostic patch does) would make it easy to tell whether QEMU or the
> hardware is faulty.
>
> [1]: https://lore.kernel.org/all/20260613195116.1807273-21-mjt@tls.msk.ru/
>
> kworker/0:1-37 [000] .N.. 489.159634: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281423
> kworker/0:1-37 [000] .N.. 489.159635: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281424
> kworker/0:1-37 [000] .N.. 489.159635: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281425
> kworker/0:1-37 [000] .N.. 489.159636: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281426
> kworker/0:1-37 [000] .N.. 489.159637: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281427
> kworker/0:1-37 [000] .N.. 489.159638: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281428
> kworker/0:1-37 [000] .N.. 489.159638: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281429
> kworker/0:1-37 [000] .N.. 489.159639: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281430
> kworker/0:1-37 [000] .N.. 489.159640: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281431
> kworker/0:1-37 [000] .N.. 489.159641: rtc_timer_do_work:
> timerqueue_getnext handle timer node count: 13281432
>
>
> swapper/0-1 [001] .N.. 11.579334: __rtc_read_time: rtc:
> 0xff11000109896800, ops->read_time:2026:01:05:09:36:21
> swapper/0-1 [001] .N.. 11.579421: __rtc_read_time: rtc:
> 0xff11000109896800, ops->read_time:2026:01:05:09:36:21
> swapper/0-1 [001] .N.. 11.579469: __rtc_read_time: rtc:
> 0xff11000109896800, ops->read_time:2026:01:05:09:36:21
> swapper/0-1 [001] .N.. 11.580816: __rtc_read_time: rtc:
> 0xff11000109896800, ops->read_time:2026:01:05:09:36:21
> syz-executor.5-7492 [003] .... 129.807406: __rtc_read_time: rtc:
> 0xff11000109896800, ops->read_time:2033:05:04:07:03:51
> syz-executor.5-7492 [003] .... 129.807419:
> __rtc_update_irq_enable.part.8: rtc uie on: 0xff11000109896800, now:
> 2033:05:04:07:03:51, expire: 2033:05:04:07:03:52
>
>
> Best regards,
> Jinjie
>
> >
> >
>
--
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
next prev parent reply other threads:[~2026-06-16 6:57 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-25 13:08 [PATCH] rtc: interface: Add rtc time jump debug in rtc_timer_do_work() Jinjie Ruan
2026-06-11 1:40 ` Jinjie Ruan
2026-06-11 2:01 ` Jinjie Ruan
2026-06-15 15:22 ` Alexandre Belloni
2026-06-16 2:10 ` Jinjie Ruan
2026-06-16 6:57 ` Alexandre Belloni [this message]
2026-06-16 8:12 ` Jinjie Ruan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=202606160657465eab6e41@mail.local \
--to=alexandre.belloni@bootlin.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rtc@vger.kernel.org \
--cc=ruanjinjie@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.