* [PATCH] powerpc/watchdog: Use hrtimers for per-CPU heartbeat
@ 2019-04-02 11:25 Nicholas Piggin
2019-04-03 3:42 ` Ravi Bangoria
2019-04-04 11:19 ` Gautham R Shenoy
0 siblings, 2 replies; 4+ messages in thread
From: Nicholas Piggin @ 2019-04-02 11:25 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Ravikumar Bangoria, Nicholas Piggin
Using a jiffies timer creates a dependency on the tick_do_timer_cpu
incrementing jiffies. If that CPU has locked up and jiffies is not
incrementing, the watchdog heartbeat timer for all CPUs stops and
creates false positives and confusing warnings on local CPUs, and
also causes the SMP detector to stop, so the root cause is never
detected.
Fix this by using hrtimer based timers for the watchdog heartbeat,
like the generic kernel hardlockup detector.
Reported-by: Ravikumar Bangoria <ravi.bangoria@in.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kernel/watchdog.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 3c6ab22a0c4e..59a0e5942f6b 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -77,7 +77,7 @@ static u64 wd_smp_panic_timeout_tb __read_mostly; /* panic other CPUs */
static u64 wd_timer_period_ms __read_mostly; /* interval between heartbeat */
-static DEFINE_PER_CPU(struct timer_list, wd_timer);
+static DEFINE_PER_CPU(struct hrtimer, wd_hrtimer);
static DEFINE_PER_CPU(u64, wd_timer_tb);
/* SMP checker bits */
@@ -293,21 +293,21 @@ void soft_nmi_interrupt(struct pt_regs *regs)
nmi_exit();
}
-static void wd_timer_reset(unsigned int cpu, struct timer_list *t)
-{
- t->expires = jiffies + msecs_to_jiffies(wd_timer_period_ms);
- if (wd_timer_period_ms > 1000)
- t->expires = __round_jiffies_up(t->expires, cpu);
- add_timer_on(t, cpu);
-}
-
-static void wd_timer_fn(struct timer_list *t)
+static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
{
int cpu = smp_processor_id();
+ if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
+ return HRTIMER_NORESTART;
+
+ if (!cpumask_test_cpu(cpu, &watchdog_cpumask))
+ return HRTIMER_NORESTART;
+
watchdog_timer_interrupt(cpu);
- wd_timer_reset(cpu, t);
+ hrtimer_forward_now(hrtimer, ms_to_ktime(wd_timer_period_ms));
+
+ return HRTIMER_RESTART;
}
void arch_touch_nmi_watchdog(void)
@@ -325,19 +325,21 @@ EXPORT_SYMBOL(arch_touch_nmi_watchdog);
static void start_watchdog_timer_on(unsigned int cpu)
{
- struct timer_list *t = per_cpu_ptr(&wd_timer, cpu);
+ struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer);
per_cpu(wd_timer_tb, cpu) = get_tb();
- timer_setup(t, wd_timer_fn, TIMER_PINNED);
- wd_timer_reset(cpu, t);
+ hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ hrtimer->function = watchdog_timer_fn;
+ hrtimer_start(hrtimer, ms_to_ktime(wd_timer_period_ms),
+ HRTIMER_MODE_REL_PINNED);
}
static void stop_watchdog_timer_on(unsigned int cpu)
{
- struct timer_list *t = per_cpu_ptr(&wd_timer, cpu);
+ struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer);
- del_timer_sync(t);
+ hrtimer_cancel(hrtimer);
}
static int start_wd_on_cpu(unsigned int cpu)
--
2.20.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] powerpc/watchdog: Use hrtimers for per-CPU heartbeat
2019-04-02 11:25 [PATCH] powerpc/watchdog: Use hrtimers for per-CPU heartbeat Nicholas Piggin
@ 2019-04-03 3:42 ` Ravi Bangoria
2019-04-04 11:19 ` Gautham R Shenoy
1 sibling, 0 replies; 4+ messages in thread
From: Ravi Bangoria @ 2019-04-03 3:42 UTC (permalink / raw)
To: Nicholas Piggin; +Cc: Ravikumar Bangoria, linuxppc-dev
On 4/2/19 4:55 PM, Nicholas Piggin wrote:
> Using a jiffies timer creates a dependency on the tick_do_timer_cpu
> incrementing jiffies. If that CPU has locked up and jiffies is not
> incrementing, the watchdog heartbeat timer for all CPUs stops and
> creates false positives and confusing warnings on local CPUs, and
> also causes the SMP detector to stop, so the root cause is never
> detected.
>
> Fix this by using hrtimer based timers for the watchdog heartbeat,
> like the generic kernel hardlockup detector.
>
> Reported-by: Ravikumar Bangoria <ravi.bangoria@in.ibm.com>
Reported-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Thanks,
Ravi
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] powerpc/watchdog: Use hrtimers for per-CPU heartbeat
2019-04-02 11:25 [PATCH] powerpc/watchdog: Use hrtimers for per-CPU heartbeat Nicholas Piggin
2019-04-03 3:42 ` Ravi Bangoria
@ 2019-04-04 11:19 ` Gautham R Shenoy
2019-04-04 16:03 ` Nicholas Piggin
1 sibling, 1 reply; 4+ messages in thread
From: Gautham R Shenoy @ 2019-04-04 11:19 UTC (permalink / raw)
To: Nicholas Piggin; +Cc: Ravikumar Bangoria, linuxppc-dev
Hello Nicholas,
On Tue, Apr 2, 2019 at 4:57 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> Using a jiffies timer creates a dependency on the tick_do_timer_cpu
> incrementing jiffies. If that CPU has locked up and jiffies is not
> incrementing, the watchdog heartbeat timer for all CPUs stops and
> creates false positives and confusing warnings on local CPUs, and
> also causes the SMP detector to stop, so the root cause is never
> detected.
>
> Fix this by using hrtimer based timers for the watchdog heartbeat,
> like the generic kernel hardlockup detector.
>
> Reported-by: Ravikumar Bangoria <ravi.bangoria@in.ibm.com>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[..snip..]
> @@ -325,19 +325,21 @@ EXPORT_SYMBOL(arch_touch_nmi_watchdog);
>
> static void start_watchdog_timer_on(unsigned int cpu)
> {
> - struct timer_list *t = per_cpu_ptr(&wd_timer, cpu);
> + struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer);
This function can be called during the initialization via
watchdog_nmi_start -->
for_each_online_cpu(cpu)
start_wd_on_cpu(cpu) -->
start_watchdog_timer_on(cpu)
Thus, it is not guarateed that we are always calling
start_watchdog_timer_on() from the CPU where
we want to start the watchdog timer.
Thus, should we be calling this function from start_wd_on_cpu() via an
smp_call_function_single() ?
>
> per_cpu(wd_timer_tb, cpu) = get_tb();
>
> - timer_setup(t, wd_timer_fn, TIMER_PINNED);
> - wd_timer_reset(cpu, t);
> + hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> + hrtimer->function = watchdog_timer_fn;
> + hrtimer_start(hrtimer, ms_to_ktime(wd_timer_period_ms),
> + HRTIMER_MODE_REL_PINNED);
> }
>
> static void stop_watchdog_timer_on(unsigned int cpu)
> {
> - struct timer_list *t = per_cpu_ptr(&wd_timer, cpu);
> + struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer);
>
> - del_timer_sync(t);
> + hrtimer_cancel(hrtimer);
> }
>
> static int start_wd_on_cpu(unsigned int cpu)
> --
> 2.20.1
>
--
Thanks and Regards
gautham.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] powerpc/watchdog: Use hrtimers for per-CPU heartbeat
2019-04-04 11:19 ` Gautham R Shenoy
@ 2019-04-04 16:03 ` Nicholas Piggin
0 siblings, 0 replies; 4+ messages in thread
From: Nicholas Piggin @ 2019-04-04 16:03 UTC (permalink / raw)
To: ego; +Cc: Ravikumar Bangoria, linuxppc-dev
Gautham R Shenoy's on April 4, 2019 9:19 pm:
> Hello Nicholas,
>
> On Tue, Apr 2, 2019 at 4:57 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>>
>> Using a jiffies timer creates a dependency on the tick_do_timer_cpu
>> incrementing jiffies. If that CPU has locked up and jiffies is not
>> incrementing, the watchdog heartbeat timer for all CPUs stops and
>> creates false positives and confusing warnings on local CPUs, and
>> also causes the SMP detector to stop, so the root cause is never
>> detected.
>>
>> Fix this by using hrtimer based timers for the watchdog heartbeat,
>> like the generic kernel hardlockup detector.
>>
>> Reported-by: Ravikumar Bangoria <ravi.bangoria@in.ibm.com>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>
> [..snip..]
>
>> @@ -325,19 +325,21 @@ EXPORT_SYMBOL(arch_touch_nmi_watchdog);
>>
>> static void start_watchdog_timer_on(unsigned int cpu)
>> {
>> - struct timer_list *t = per_cpu_ptr(&wd_timer, cpu);
>> + struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer);
>
> This function can be called during the initialization via
>
> watchdog_nmi_start -->
> for_each_online_cpu(cpu)
> start_wd_on_cpu(cpu) -->
> start_watchdog_timer_on(cpu)
>
> Thus, it is not guarateed that we are always calling
> start_watchdog_timer_on() from the CPU where
> we want to start the watchdog timer.
>
> Thus, should we be calling this function from start_wd_on_cpu() via an
> smp_call_function_single() ?
Good catch, yes I think we need that change (like kernel/watchdog.c).
I'll resend.
Thanks,
Nick
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2019-04-04 16:05 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-04-02 11:25 [PATCH] powerpc/watchdog: Use hrtimers for per-CPU heartbeat Nicholas Piggin
2019-04-03 3:42 ` Ravi Bangoria
2019-04-04 11:19 ` Gautham R Shenoy
2019-04-04 16:03 ` Nicholas Piggin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).