Re: [PATCH V2] hrtimer: Check running timer state

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: "enlin.mu" <enlin.mu@linux.dev>
To: Thomas Gleixner <tglx@linutronix.de>,
	anna-maria@linutronix.de, frederic@kernel.org,
	linux-kernel@vger.kernel.org, enlin.mu@unisoc.com
Subject: Re: [PATCH V2] hrtimer: Check running timer state
Date: Mon, 17 Nov 2025 07:58:57 +0800	[thread overview]
Message-ID: <68734d1a-b77e-495e-922a-7fce3a122789@linux.dev> (raw)
In-Reply-To: <87o6p4bgzn.ffs@tglx>



On 2025/11/15 02:34, Thomas Gleixner wrote:
> On Fri, Nov 14 2025 at 19:42, Enlin Mu wrote:
>> When the running timer is not NULL, print debugging information.
> 
> What for?
> 
>> The test code is roughly as follows:
>>
>> static struct hrtimer serial_timer;
>> enum hrtimer_restart serial_timer_handler(struct hrtimer * timer)
>> {
>> 	local_irq_disable();
> 
> What is this for?
Hi Thomas

This code comes from my colleauge who is not familiar with the
running machanism of hrtimer, which is why there is this logical error.

> 
>> 	......
>> 	do_someting();
>> 	copy_data_with_dma();
>> 	......
>> 	hrtimer_forward_now(*serial_timer, ns_to_ktime(1000*2000));
>> 	local_irq_enable();
> 
> And this?
> 
>> 	return HRTIMER_RESTART;
>> }
>>
>> static int serial_start(struct uart_port *port)
>> {
>> 	......
>> 	......
>> 	hrtime_init(&serial_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> 
> That function does not exist.
> 
>> 	ktime = no_to_ktime(1000*2000);
>> 	serial_timer.function = serial_timer_handler;
>> 	hrtimer_start(&serial_timer, ktime, HRTIMER_MODE_REL);
>> 	......
>> 	return 0;
>> }
>>
>> static void serial_shutdown(struct uart_port *port)
>> {
>> 	......
>> 	hrtimer_cancle(&serial_timer);
>> 	......
>> 	serial_release_dma(port);
>> 	......
>> }
> 
>> The cpu6 canceled serial_timer and released dma,but the
>> serial_timer still ran many times on CPU7 until a panic occurred.
>> The reason for the panic is that serial_timer accessed the
>> released dma,But the serial_timer had been canceled for
>> some time now on cpu6.
> 
> You still fail to explain how the timer can still run after being
> canceled.

Hi Thomas

The serial_timer on cpu6 has not been properly cancelled.
The serial_timer is once again entered into the active queue on cpu7.
---------------------------
on cpu7

step 1:  preempted by hrtimer_usb

static void __run_hrtimer(.....)
{
	......
	base->running = timer;   //timer is serial_timer;

	.....
	.....
	restart = fn(timer);   // fn is serial_timer_handler
	....  ==> irq is enable.  serial_timer is preempted by hrtimer_usb
	....
}


step 2: hrtimer_usb return

static void __run_hrtimer(.....)
{
	......
	base->running = timer;   //timer is hrtimer_usb;

	.....
	.....
	restart = fn(timer);
	....
}


step 3: serial_timer continues to execute

static void __run_hrtimer(.....)
{
	......
	 (restart != HRTIMER_NORESTART &&
		timer->state & HRTIMER_STATE_ENQUEUED))
			(timer, base, HRTIMER_MODE_ABS);  //timer is serial_timer

}

step 4: serial_timer run again.
	.......

........

step n: serial_timer run again.
	.......

step n+1: serial_timer run again.
	.......
	panic from serial_timer;

---------------------------
on cpu6

step 1:
     hrtimer_cancel


step 2:
     hrtimer_try_to_cancel

step 3:
     hrtimer_active return false, because running timer is hrtimer_usb 
on cpu7(step 2 on cpu7)

step 4:
     hrtimer_cancel returns 0

step 5:
     serial_timer is added active queue on cpu 7

---------------------------
Due to English not being my native language, I am unable to
explain this exception with a more complex description, and
can only provide a simple step-by-step explanation.

> 
>> The cpu6 can successfully cancel the serial_timer because the
>> running timer has changed and it is another timer(such as
>> hrtimer_usb).
> 
> After that the timer _cannot_ be running anymore unless some other code
> re-arms it afterwards.
> 
>> When the serial_timer is enable to interrupt, the next hrtimer
>> (such as hrtimer_usb) on cpu7 preempts the return of ther serial_timer,
>> causing a change in the running timer.
> 
> Then fix your timer callback. The callback is invoked in hard interrupt
> context and the callback enables interrupts, which is a NONO. You
> clearly never ran your code with lockdep enabled. It would have told you
> so.
> 
>> Signed-off-by: Enlin Mu <enlin.mu@linux.dev>
>> Signed-off-by: Enlin Mu <enlin.mu@unisoc.com>
>> Signed-off-by: Enlin Mu <enlin.mu@linux.dev>
> 
> Interesting Signed-off-by chain. Seems you're co-developing this patch
> with your Alter ego.
> 
> Thanks,
> 
>          tglx

next prev parent reply	other threads:[~2025-11-16 23:59 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-14 11:42 [PATCH V2] hrtimer: Check running timer state Enlin Mu
2025-11-14 18:34 ` Thomas Gleixner
2025-11-16 23:58   ` enlin.mu [this message]
2025-11-17  7:31     ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=68734d1a-b77e-495e-922a-7fce3a122789@linux.dev \
    --to=enlin.mu@linux.dev \
    --cc=anna-maria@linutronix.de \
    --cc=enlin.mu@unisoc.com \
    --cc=frederic@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox