From: Nicholas Piggin <npiggin@gmail.com>
To: benh@kernel.crashing.org, Laurent Dufour <ldufour@linux.ibm.com>,
mpe@ellerman.id.au, paulus@samba.org
Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 1/2] powerpc/watchdog: prevent printk and send IPI while holding the wd lock
Date: Wed, 27 Oct 2021 13:29:10 +1000
Message-ID: <1635303699.wgz87uxy4c.astroid@bobo.none>
In-Reply-To: <20211026162740.16283-2-ldufour@linux.ibm.com>
Excerpts from Laurent Dufour's message of October 27, 2021 2:27 am:
> When handling the Watchdog interrupt, long processing should not be done
> while holding the __wd_smp_lock. This prevents the other CPUs from grabbing
> it and processing Watchdog timer interrupts. Furthermore, this could lead to
> the following situation:
>
> CPU x detects a lockup on CPU y and grabs the __wd_smp_lock
> in watchdog_smp_panic()
> CPU y catches the watchdog interrupt and tries to grab the __wd_smp_lock
> in soft_nmi_interrupt()
> CPU x waits for CPU y to catch the IPI for 1s in __smp_send_nmi_ipi()

CPU y should get the IPI here if it's an NMI IPI (which will be true for
64s on POWER9 and later).

That said, not all platforms support it and the console lock problem
seems real, so okay.

> CPU x will time out, having spent 1s waiting while holding the
> __wd_smp_lock.
>
> A deadlock may also happen between the __wd_smp_lock and the console_owner
> 'lock' this way:
> CPU x grabs the console_owner
> CPU y grabs the __wd_smp_lock
> CPU x catches the watchdog timer interrupt and needs to grab __wd_smp_lock
> CPU y wants to print something and waits for console_owner
> -> deadlock
>
> Doing all the long processing without holding the __wd_smp_lock prevents
> these situations.

The intention of printing under the lock was to avoid logs getting
garbled, e.g., if multiple different CPUs fire at once.

I wonder if instead we could deal with that by protecting the IPI
sending and printing with a trylock: if you don't get the trylock, just
return, and you'll come back with the next timer interrupt.

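To make the trylock idea concrete, here is a rough userspace sketch, not kernel code and not tested against watchdog.c. It assumes C11 atomics as a stand-in for the kernel locking primitives, and `wd_report_busy`, `wd_report_trylock()`, `wd_report_unlock()`, `wd_try_report()` and `wd_reports_emitted` are hypothetical names made up for illustration. The printf() stands in for the pr_emerg() and smp_send_nmi_ipi() calls.

```c
#include <assert.h>	/* only needed by callers that check the result */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag guarding the slow reporting path (printing, IPIs). */
static atomic_flag wd_report_busy = ATOMIC_FLAG_INIT;

/* Counts reports actually emitted; only modified while the flag is held. */
static int wd_reports_emitted;

/* Try to become the single reporter. Never spins and never blocks. */
static bool wd_report_trylock(void)
{
	/* test_and_set returns the previous value: false means we won. */
	return !atomic_flag_test_and_set_explicit(&wd_report_busy,
						  memory_order_acquire);
}

static void wd_report_unlock(void)
{
	atomic_flag_clear_explicit(&wd_report_busy, memory_order_release);
}

/*
 * Model of the detection path. Returns true if this CPU did the report,
 * false if another CPU was already reporting and we backed off. In the
 * backed-off case the caller simply returns and retries on its next
 * timer interrupt.
 */
static bool wd_try_report(int cpu, int stuck_cpu)
{
	if (!wd_report_trylock())
		return false;

	/* Stand-in for the pr_emerg() and smp_send_nmi_ipi() work. */
	printf("CPU %d detected hard LOCKUP on CPU %d\n", cpu, stuck_cpu);
	wd_reports_emitted++;

	wd_report_unlock();
	return true;
}
```

A caller that loses the race gets `false` back and does nothing further, so only one CPU prints at a time and nobody waits on a lock while another CPU is stuck in the console path. Whether a plain flag like this is enough in the real code, or whether it should be a proper lock, is an open question.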
Thanks,
Nick
>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
> ---
> arch/powerpc/kernel/watchdog.c | 31 +++++++++++++++++--------------
> 1 file changed, 17 insertions(+), 14 deletions(-)
>
> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
> index f9ea0e5357f9..bc7411327066 100644
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -149,6 +149,8 @@ static void set_cpu_stuck(int cpu, u64 tb)
>
>  static void watchdog_smp_panic(int cpu, u64 tb)
>  {
> +	cpumask_t cpus_pending_copy;
> +	u64 last_reset_tb_copy;
>  	unsigned long flags;
>  	int c;
>
> @@ -161,29 +163,32 @@ static void watchdog_smp_panic(int cpu, u64 tb)
>  	if (cpumask_weight(&wd_smp_cpus_pending) == 0)
>  		goto out;
>
> +	cpumask_copy(&cpus_pending_copy, &wd_smp_cpus_pending);
> +	last_reset_tb_copy = wd_smp_last_reset_tb;
> +
> +	/* Take the stuck CPUs out of the watch group */
> +	set_cpumask_stuck(&wd_smp_cpus_pending, tb);
> +
> +	wd_smp_unlock(&flags);
> +
>  	pr_emerg("CPU %d detected hard LOCKUP on other CPUs %*pbl\n",
> -		 cpu, cpumask_pr_args(&wd_smp_cpus_pending));
> +		 cpu, cpumask_pr_args(&cpus_pending_copy));
>  	pr_emerg("CPU %d TB:%lld, last SMP heartbeat TB:%lld (%lldms ago)\n",
> -		 cpu, tb, wd_smp_last_reset_tb,
> -		 tb_to_ns(tb - wd_smp_last_reset_tb) / 1000000);
> +		 cpu, tb, last_reset_tb_copy,
> +		 tb_to_ns(tb - last_reset_tb_copy) / 1000000);
>
>  	if (!sysctl_hardlockup_all_cpu_backtrace) {
>  		/*
>  		 * Try to trigger the stuck CPUs, unless we are going to
>  		 * get a backtrace on all of them anyway.
>  		 */
> -		for_each_cpu(c, &wd_smp_cpus_pending) {
> +		for_each_cpu(c, &cpus_pending_copy) {
>  			if (c == cpu)
>  				continue;
>  			smp_send_nmi_ipi(c, wd_lockup_ipi, 1000000);
>  		}
>  	}
>
> -	/* Take the stuck CPUs out of the watch group */
> -	set_cpumask_stuck(&wd_smp_cpus_pending, tb);
> -
> -	wd_smp_unlock(&flags);
> -
>  	if (sysctl_hardlockup_all_cpu_backtrace)
>  		trigger_allbutself_cpu_backtrace();
>
> @@ -204,6 +209,8 @@ static void wd_smp_clear_cpu_pending(int cpu, u64 tb)
>  			unsigned long flags;
>
>  			wd_smp_lock(&flags);
> +			cpumask_clear_cpu(cpu, &wd_smp_cpus_stuck);
> +			wd_smp_unlock(&flags);
>
>  			pr_emerg("CPU %d became unstuck TB:%lld\n",
>  				 cpu, tb);
> @@ -212,9 +219,6 @@ static void wd_smp_clear_cpu_pending(int cpu, u64 tb)
>  				show_regs(regs);
>  			else
>  				dump_stack();
> -
> -			cpumask_clear_cpu(cpu, &wd_smp_cpus_stuck);
> -			wd_smp_unlock(&flags);
>  		}
>  		return;
>  	}
> @@ -267,6 +271,7 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
>  			return 0;
>  		}
>  		set_cpu_stuck(cpu, tb);
> +		wd_smp_unlock(&flags);
>
>  		pr_emerg("CPU %d self-detected hard LOCKUP @ %pS\n",
>  			 cpu, (void *)regs->nip);
> @@ -277,8 +282,6 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
>  		print_irqtrace_events(current);
>  		show_regs(regs);
>
> -		wd_smp_unlock(&flags);
> -
>  		if (sysctl_hardlockup_all_cpu_backtrace)
>  			trigger_allbutself_cpu_backtrace();
>
> --
> 2.33.1
>
>