From: Artem Senichev <artemsen@gmail.com>
To: Neeraj Ladkani <neladk@microsoft.com>
Cc: Jayanth Othayoth <ojayanth@gmail.com>,
"openbmc@lists.ozlabs.org" <openbmc@lists.ozlabs.org>,
"geissonator@gmail.com" <geissonator@gmail.com>,
"bradleyb@fuzziesquirrel.com" <bradleyb@fuzziesquirrel.com>
Subject: Re: Add support to debug unresponsive host
Date: Thu, 16 May 2019 12:11:32 +0300
Message-ID: <20190516091132.5233hakhib52bf7x@gmail.com>
In-Reply-To: <BL0PR2101MB093284F40176FBC801851059C8090@BL0PR2101MB0932.namprd21.prod.outlook.com>

We solved a similar problem for our VESNIN servers, which are based on
POWER8 CPUs (OpenPOWER platform).
OpenBMC includes the pdbg debugger (meta-openpower/recipes-bsp/pdbg).
Among other things, this utility can send an SRESET signal from OpenBMC
to the host's CPU. When the host-side Linux kernel handles the signal,
it initiates kdump.
This procedure inevitably reboots the host, whether it is still working
or already hung, so triggering it automatically is not a good idea.
Instead, a system administrator initiates the procedure manually from
the OpenBMC console.
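
The manual procedure above could be wrapped as follows. This is only a minimal sketch in Python: it builds the pdbg command line and refuses to do so without explicit operator confirmation, since the resulting kdump reboots the host. The helper name `build_sreset_command` is hypothetical, and the `-a sreset` invocation is an assumption about the pdbg command line that should be checked against the pdbg version in your image.

```python
import shlex
import sys


def build_sreset_command(confirmed: bool, pdbg_path: str = "pdbg") -> list[str]:
    """Return the argv for sending SRESET from the BMC to the host.

    Raises PermissionError unless the operator explicitly confirmed,
    because handling SRESET ends in kdump and a host reboot.
    """
    if not confirmed:
        raise PermissionError("SRESET reboots the host; confirmation required")
    # Assumption: `pdbg -a sreset` selects all targets; verify with `pdbg --help`.
    return [pdbg_path, "-a", "sreset"]


if __name__ == "__main__":
    try:
        argv = build_sreset_command(confirmed="--yes" in sys.argv[1:])
        # Print instead of executing, so the sketch is safe to run anywhere.
        print(shlex.join(argv))
    except PermissionError as err:
        print(f"refused: {err}")
```

Run without `--yes`, the script only reports the refusal, which keeps the procedure an explicit manual decision.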
--
Regards,
Artem Senichev
Software Engineer, YADRO.
On Wed, May 15, 2019 at 06:26:08PM +0000, Neeraj Ladkani wrote:
> Some questions.
>
>
> 1. How does the BMC know when to trigger an NMI? Are we relying on agents to run and send a heartbeat? Can this be done agentless?
> 2. How do we trigger an NMI on non-x86 platforms?
>
> We should brainstorm to create a generic framework to solve this problem.
>
> What
> Neeraj
>
> From: openbmc <openbmc-bounces+neladk=microsoft.com@lists.ozlabs.org> On Behalf Of Jayanth Othayoth
> Sent: Wednesday, May 15, 2019 5:40 AM
> To: openbmc@lists.ozlabs.org; geissonator@gmail.com; bradleyb@fuzziesquirrel.com
> Subject: Add support to debug unresponsive host
>
> ## Problem Description
> Issue #457: Add support to debug unresponsive host.
>
> Scope: High-level design direction to solve this problem.
>
> ## Background and References
> There are situations at customer sites where OPAL/Linux becomes unresponsive, causing a system hang, and there is no way to figure out what went wrong with the Linux kernel or OPAL. We are looking for a way to trigger a dump capture on the Linux host so that the OS dump can be analyzed afterwards.
>
> ## Proposed Design for POWER processor based systems:
> Get all host CPUs into the reset vector; Linux then has a mechanism to patch this into the panic-kdump path to trigger dump capture. This will enable us to analyze and fix customer issues where Linux hangs and the system is unresponsive.
>
> ### Redfish Schema used:
> * Reference: DSP2046 2018.3
> * The ComputerSystem 1.6.0 schema provides an action called "#ComputerSystem.Reset". This action is used to reset the system. The ResetType parameter indicates the type of reset to perform. In this use case we can use the "Nmi" type.
>   * Nmi: Generate a Diagnostic Interrupt (usually an NMI on x86 systems) to cease normal operations, perform diagnostic actions and typically halt the system.
>
> ### D-Bus:
>
> Option 1: Extend the existing D-Bus interface in the state.Host namespace ( /openbmc/phosphor-dbus-interfaces/xyz/openbmc_project/State/Host.interface.yaml ) to support a new RequestedHostTransition value called "Nmi". The D-Bus backend can internally invoke a processor-specific target to do the SRESET (equivalent to an x86 NMI) and associated actions.
>
> Option 2: Introduce a new D-Bus interface in the control.state namespace ( /openbmc/phosphor-dbus-interfaces/xyz/openbmc_project/Control/Host/NMI.interface.yaml ) and implement a new D-Bus backend for the respective processor-specific targets.
>
> ## Alternatives Considered
> NA
>
> ## Impacts:
> NA
>
> ## Testing
> NA
>
> Looking for input from the team on this high-level design direction.
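
For illustration, the Redfish action quoted above could be requested roughly as follows. This is a sketch under assumptions, not part of the proposal: the helper name `build_nmi_reset_request`, the host `bmc.example.com`, and the system id `system` are hypothetical, and authentication is omitted. The code only builds a description of the HTTP request and does not send it.

```python
import json


def build_nmi_reset_request(bmc_host: str, system_id: str = "system") -> dict:
    """Describe the POST for the ComputerSystem.Reset action with ResetType "Nmi"."""
    # Assumption: the BMC exposes the system under /redfish/v1/Systems/<system_id>.
    url = (
        f"https://{bmc_host}/redfish/v1/Systems/{system_id}"
        "/Actions/ComputerSystem.Reset"
    )
    return {
        "method": "POST",
        "url": url,
        "headers": {"Content-Type": "application/json"},
        # "Nmi" is the ResetType named in the proposal for the diagnostic interrupt.
        "body": json.dumps({"ResetType": "Nmi"}),
    }


if __name__ == "__main__":
    request = build_nmi_reset_request("bmc.example.com")
    print(request["method"], request["url"])
    print(request["body"])
```

Whichever D-Bus option is chosen, a Redfish front end would translate this request body into the corresponding D-Bus call on the BMC.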