From: Michael Ellerman <michaele@au1.ibm.com>
To: Nathan Lynch <nathanl@linux.ibm.com>,
Laurent Dufour <ldufour@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org, npiggin@gmail.com,
paulus@samba.org, linuxppc-dev@lists.ozlabs.org,
haren@linux.vnet.ibm.com
Subject: Re: [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer
Date: Thu, 09 Jun 2022 17:45:49 +1000
Message-ID: <87zgimfff6.fsf@mpe.ellerman.id.au>
In-Reply-To: <874k0x1s1d.fsf@linux.ibm.com>

Nathan Lynch <nathanl@linux.ibm.com> writes:
> Laurent Dufour <ldufour@linux.ibm.com> writes:
...
>
>> There are ongoing investigations to clarify where and how this latency is
>> happening. I'm not excluding any other issue in the Linux kernel, but right
>> now, this looks to be the best option to prevent system crash during
>> LPM.
>
> It will prevent the likely crash mode for enterprise distros with
> default watchdog tunables that our internal test environments happen to
> use. But if someone were to run the same scenario with softlockup_panic
> enabled, or with the RCU stall timeout lower than the watchdog
> threshold, the failure mode would be different.
>
> Basically I'm saying:
> * Some users may actually want the OS to panic when it's in this state,
> because their applications can't work correctly.
> * But if we're going to inhibit one watchdog, we should inhibit them
> all.

I'm sympathetic to both of your arguments.

But I think there is a key difference between the NMI watchdog and other
watchdogs, which is that the NMI watchdog will use the unsafe NMI to
interrupt other CPUs, and that can cause the system to crash when other
watchdogs would just print a backtrace.

We had the same problem with the rcu_sched stall detector until we
changed it to use the "safe" NMI, see:

  5cc05910f26e ("powerpc/64s: Wire up arch_trigger_cpumask_backtrace()")

So even if the NMI watchdog is disabled there are still the other
watchdogs enabled, which should print backtraces by default, and if
desired can also be configured to cause a panic.

Rather than disabling the NMI watchdog, can we increase the timeout
(by how much?) during LPM, so that it is less likely to fire in normal
usage but is still there as a backup if the system is completely
clogged?

cheers
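
The suggestion above (stretch the NMI watchdog timeout for the duration
of LPM rather than switching the watchdog off) can be illustrated with a
small user-space model. This is only a sketch under stated assumptions:
it is not kernel code, the names wd_model, wd_lpm_begin(), wd_lpm_end()
and wd_effective_thresh() are hypothetical, and the percentage passed to
wd_lpm_begin() is a placeholder, since the message leaves "by how much?"
open.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of a watchdog whose timeout is
 * stretched during LPM. None of these names exist in the kernel.
 */
struct wd_model {
	unsigned int base_thresh_s;	/* normal threshold, in seconds */
	unsigned int factor_pct;	/* extra percentage while LPM runs; 0 = none */
};

/* Effective threshold: base plus factor_pct percent of base. */
static unsigned int wd_effective_thresh(const struct wd_model *wd)
{
	assert(wd != NULL);
	return wd->base_thresh_s + (wd->base_thresh_s * wd->factor_pct) / 100;
}

/* Entering LPM: keep the watchdog armed, but make it slower to fire. */
static void wd_lpm_begin(struct wd_model *wd, unsigned int pct)
{
	assert(wd != NULL);
	wd->factor_pct = pct;
}

/* Leaving LPM: return to the normal threshold. */
static void wd_lpm_end(struct wd_model *wd)
{
	assert(wd != NULL);
	wd->factor_pct = 0;
}
```

With a 10 second base threshold and an assumed factor of 200, the model
gives an effective threshold of 30 seconds between wd_lpm_begin() and
wd_lpm_end(), and 10 seconds otherwise. The point of the design is that
the watchdog still fires if the system is truly stuck, just later.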

Thread overview: 9+ messages
2022-06-01 15:53 [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer Laurent Dufour
2022-06-01 15:53 ` [PATCH 1/2] powerpc/mobility: Wait for memory transfer to complete Laurent Dufour
2022-06-01 15:53 ` [PATCH 2/2] powerpc/mobility: disabling hard lockup watchdog during LPM Laurent Dufour
2022-06-06 1:41 ` kernel test robot
2022-06-02 17:58 ` [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer Nathan Lynch
2022-06-03 8:59 ` Laurent Dufour
2022-06-06 20:00 ` Nathan Lynch
2022-06-09 7:45 ` Michael Ellerman [this message]
2022-06-09 9:09 ` Laurent Dufour