Low-res tick handler device not going to ONESHOT_STOPPED when tick is stopped (was: rcu_sched self-detected stall on CPU)

public inbox for linux-kernel@vger.kernel.org

From: Nicholas Piggin <npiggin@gmail.com>
To: Michael Ellerman <mpe@ellerman.id.au>,
	paulmck@kernel.org, Zhouyi Zhou <zhouzhouyi@gmail.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>,
	linux-kernel@vger.kernel.org,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>,
	rcu <rcu@vger.kernel.org>, Thomas Gleixner <tglx@linutronix.de>,
	Viresh Kumar <viresh.kumar@linaro.org>
Subject: Low-res tick handler device not going to ONESHOT_STOPPED when tick is stopped (was: rcu_sched self-detected stall on CPU)
Date: Wed, 13 Apr 2022 16:10:02 +1000
Message-ID: <1649829917.xni78o33uo.astroid@bobo.none>
In-Reply-To: <1649818529.j46672mh2p.astroid@bobo.none>

Oops, fixed subject...

Excerpts from Nicholas Piggin's message of April 13, 2022 3:11 pm:
> +Daniel, Thomas, Viresh
> 
> Subject: Re: rcu_sched self-detected stall on CPU
> 
> Excerpts from Michael Ellerman's message of April 9, 2022 12:42 am:
>> Michael Ellerman <mpe@ellerman.id.au> writes:
>>> "Paul E. McKenney" <paulmck@kernel.org> writes:
>>>> On Wed, Apr 06, 2022 at 05:31:10PM +0800, Zhouyi Zhou wrote:
>>>>> Hi
>>>>> 
>>>>> I can reproduce it on a ppc virtual cloud server provided by Oregon
>>>>> State University. Here is what I did:
>>>>> 1) curl -l https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/snapshot/linux-5.18-rc1.tar.gz -o linux-5.18-rc1.tar.gz
>>>>> 2) tar zxf linux-5.18-rc1.tar.gz
>>>>> 3) cp config linux-5.18-rc1/.config
>>>>> 4) cd linux-5.18-rc1
>>>>> 5) make vmlinux -j 8
>>>>> 6) qemu-system-ppc64 -kernel vmlinux -nographic -vga none -no-reboot -smp 2 (QEMU 4.2.1)
>>>>> 7) after 12 rounds, the bug was reproduced:
>>>>> (http://154.223.142.244/logs/20220406/qemu.log.txt)
>>>>
>>>> Just to make sure, are you both seeing the same thing?  Last I knew,
>>>> Zhouyi was chasing an RCU-tasks issue that appears only in kernels
>>>> built with CONFIG_PROVE_RCU=y, which Miguel does not have set.  Or did
>>>> I miss something?
>>>>
>>>> Miguel is instead seeing an RCU CPU stall warning where RCU's grace-period
>>>> kthread slept for three milliseconds, but did not wake up for more than
>>>> 20 seconds.  This kthread would normally have awakened on CPU 1, but
>>>> CPU 1 looks to me to be very unhealthy, as can be seen in your console
>>>> output below (but maybe my idea of what is healthy for powerpc systems
>>>> is outdated).  Please see also the inline annotations.
>>>>
>>>> Thoughts from the PPC guys?
>>>
>>> I haven't seen it in my testing. But using Miguel's config I can
>>> reproduce it on seemingly every boot.
>>>
>>> For me it bisects to:
>>>
>>>   35de589cb879 ("powerpc/time: improve decrementer clockevent processing")
>>>
>>> Which seems plausible.
>>>
>>> Reverting that on mainline makes the bug go away.
>>>
>>> I don't see an obvious bug in the diff, but I could be wrong, or the old
>>> code was papering over an existing bug?
>>>
>>> I'll try and work out what it is about Miguel's config that exposes
>>> this vs our defconfig, that might give us a clue.
>> 
>> It's CONFIG_HIGH_RES_TIMERS=n which triggers the stall.
>> 
>> I can reproduce just with:
>> 
>>   $ make ppc64le_guest_defconfig
>>   $ ./scripts/config -d HIGH_RES_TIMERS
>> 
>> We have no defconfigs that disable HIGH_RES_TIMERS; I didn't even
>> realise you could disable it TBH :)
>> 
>> The Rust CI has it disabled because I copied that from the x86 defconfig
>> they were using back when I added the Rust support. I think that was
>> meant to be a stripped-down fast config for CI, but the result is that
>> it's using a poorly tested combination, which is not helpful.
>> 
>> So I'll send a patch to turn HIGH_RES_TIMERS on for the Rust CI, and we
>> can debug this further without blocking them.
> 
> So we traced the problem to what may be a misunderstanding between the
> decrementer clock event device and the core code.
> 
> The decrementer is only oneshot*ish*. It needs to be either
> reprogrammed or shut down; otherwise it just keeps raising
> interrupts.
> 
> Before commit 35de589cb879, it was sort of two-shot. The initial
> interrupt at the programmed time would set its internal next_tb variable
> to ~0 and call the ->event_handler(). If the handler neither programmed
> the next event nor stopped the timer, the interrupt would fire again
> immediately, notice next_tb is ~0, and only then stop the decrementer
> interrupt.
> 
> That was already kind of ugly; commit 35de589cb879 just turned it into a hang.
> 
> The problem happens when the tick is stopped while an event is still
> pending. When that event fires, tick_nohz_handler() is called, but it
> bails out because ts->tick_stopped == 1, so the device is never
> reprogrammed and keeps firing.
> 
> How to fix it? Before commit a7cba02deced, powerpc's decrementer was
> truly oneshot, but we would like to avoid going back to that because it
> requires additional programming of the hardware on each timer interrupt.
> The ONESHOT_STOPPED state seems to be just about what we want.
> 
> Did the ONESHOT_STOPPED patch just miss this case, or is there a reason 
> we don't stop it here? This patch seems to fix the hang (not heavily
> tested though).
>  
> Thanks,
> Nick
> 
> ---
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 2d76c91b85de..7e13a55b6b71 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -1364,9 +1364,11 @@ static void tick_nohz_handler(struct clock_event_device *dev)
>  	tick_sched_do_timer(ts, now);
>  	tick_sched_handle(ts, regs);
>  
> -	/* No need to reprogram if we are running tickless  */
> -	if (unlikely(ts->tick_stopped))
> +	if (unlikely(ts->tick_stopped)) {
> +		/* If we are tickless, change the clock event to stopped */
> +		tick_program_event(KTIME_MAX, 1);
>  		return;
> +	}
>  
>  	hrtimer_forward(&ts->sched_timer, now, TICK_NSEC);
>  	tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
>

Thread overview: 5+ messages
     [not found] <CANiq72k+5Rdj7i3Df2dcE6_OPYPXK3z5EWLKnY56sSMz4G3OvA@mail.gmail.com>
     [not found] ` <CAABZP2z64aYWfVSdXHaQopWc+BAbJJUGqtrju2iWER3DDTDFWg@mail.gmail.com>
     [not found]   ` <20220406170012.GO4285@paulmck-ThinkPad-P17-Gen-1>
     [not found]     ` <87pmls6nt7.fsf@mpe.ellerman.id.au>
     [not found]       ` <87k0bz7i1s.fsf@mpe.ellerman.id.au>
2022-04-13  5:11         ` Nicholas Piggin
2022-04-13  6:10           ` Nicholas Piggin [this message]
2022-04-14 17:15             ` Low-res tick handler device not going to ONESHOT_STOPPED when tick is stopped (was: rcu_sched self-detected stall on CPU) Paul E. McKenney
2022-04-22 15:53           ` Thomas Gleixner
2022-04-23  2:29             ` Re: Nicholas Piggin
