* Re: Out-of-band SRESET
From: Rick Altherr @ 2017-03-16 16:06 UTC (permalink / raw)
To: Patrick Williams
Cc: Ananth N Mavinakayanahalli, mahesh, OpenBMC Maillist, vsainath
I know x86 has debug modes but I'm unfamiliar with them. I'll ask my
teammates who know more for some details to see if and how the BMC is
involved.
On Thu, Mar 16, 2017 at 8:37 AM, Patrick Williams <patrick@stwcx.xyz> wrote:
> On Thu, Mar 16, 2017 at 01:32:52PM +0530, Ananth N Mavinakayanahalli wrote:
>> Hi,
>>
>> One requirement from an OpenPOWER service point-of-view is to be able to
>> trigger an out-of-band SRESET on an unresponsive system. We can then have
>> the necessary plumbing in the host Linux kernel to either drop the
>> machine into a debugger or trigger a dump capture, if configured.
>>
>> On P9, this would translate to a series of SCOM operations for the SBE.
>> It would be good to have a REST API defined to cater to this specific
>> purpose.
>>
>> The API should cater to:
>> - SRESET a core
>> - SRESET a chip
>> - SRESET all cores
>>
>> Thoughts?
>>
>> Regards,
>> Ananth
>>
>
> Ananth,
>
> I understand the desire from your end with respect to debugging the
> host. Is there something we can do to model this better from a REST
> perspective to make this less Power-specific? Do other architectures
> also have a "send debug interrupt"?
>
> Do you need to SRESET targeting an SMT thread? We will need to come up
> with some kind of identifier for sending the debug interrupts.
>
> --
> Patrick Williams
* Re: Out-of-band SRESET
From: Stewart Smith @ 2017-03-16 21:54 UTC (permalink / raw)
To: Patrick Williams, Ananth N Mavinakayanahalli; +Cc: mahesh, openbmc, vsainath
Patrick Williams <patrick@stwcx.xyz> writes:
> On Thu, Mar 16, 2017 at 01:32:52PM +0530, Ananth N Mavinakayanahalli wrote:
>> Hi,
>>
>> One requirement from an OpenPOWER service point-of-view is to be able to
>> trigger an out-of-band SRESET on an unresponsive system. We can then have
>> the necessary plumbing in the host Linux kernel to either drop the
>> machine into a debugger or trigger a dump capture, if configured.
>>
>> On P9, this would translate to a series of SCOM operations for the SBE.
>> It would be good to have a REST API defined to cater to this specific
>> purpose.
>>
>> The API should cater to:
>> - SRESET a core
>> - SRESET a chip
>> - SRESET all cores
>>
>> Thoughts?
>>
>> Regards,
>> Ananth
>>
>
> Ananth,
>
> I understand the desire from your end with respect to debugging the
> host. Is there something we can do to model this better from a REST
> perspective to make this less Power-specific? Do other architectures
> also have a "send debug interrupt"?
On x86 there's the NMI, which can be sent via "ipmitool power diag".
It also exists in Redfish as a type of reset (On, ForceOff,
GracefulRestart, Nmi, etc.).
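For illustration, here is a minimal sketch of what the Redfish request for
an NMI could look like. It only builds the request and does not send it.
The BMC host name "bmc.example.com" and the system ID "system" are
placeholders assumed for the example, the helper name build_nmi_request is
hypothetical, and authentication is left out.

```python
import json
import urllib.request


def build_nmi_request(bmc_host: str, system_id: str) -> urllib.request.Request:
    """Build (but do not send) a Redfish ComputerSystem.Reset request asking for an NMI."""
    # Standard Redfish action URI; host and system ID are caller-supplied placeholders.
    url = (
        f"https://{bmc_host}/redfish/v1/Systems/{system_id}"
        "/Actions/ComputerSystem.Reset"
    )
    # "Nmi" is one of the Redfish ResetType values, alongside On, ForceOff, etc.
    body = json.dumps({"ResetType": "Nmi"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_nmi_request("bmc.example.com", "system")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

An SRESET on Power could plausibly be exposed behind the same reset type,
which would keep the REST interface architecture-neutral.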
--
Stewart Smith
OPAL Architect, IBM.
* Re: Out-of-band SRESET
From: Ananth N Mavinakayanahalli @ 2017-03-17 5:25 UTC (permalink / raw)
To: Patrick Williams; +Cc: openbmc, mahesh, vsainath
On Thu, Mar 16, 2017 at 10:37:56AM -0500, Patrick Williams wrote:
> On Thu, Mar 16, 2017 at 01:32:52PM +0530, Ananth N Mavinakayanahalli wrote:
> > Hi,
> >
> > One requirement from an OpenPOWER service point-of-view is to be able to
> > trigger an out-of-band SRESET on an unresponsive system. We can then have
> > the necessary plumbing in the host Linux kernel to either drop the
> > machine into a debugger or trigger a dump capture, if configured.
> >
> > On P9, this would translate to a series of SCOM operations for the SBE.
> > It would be good to have a REST API defined to cater to this specific
> > purpose.
> >
> > The API should cater to:
> > - SRESET a core
> > - SRESET a chip
> > - SRESET all cores
> >
> > Thoughts?
> >
> > Regards,
> > Ananth
> >
>
> Ananth,
>
> I understand the desire from your end with respect to debugging the
> host. Is there something we can do to model this better from a REST
> perspective to make this less Power-specific? Do other architectures
> also have a "send debug interrupt"?
Any option that says NMI for x86 can apply here, IMO.
> Do you need to SRESET targeting an SMT thread? We will need to come up
> with some kind of identifier for sending the debug interrupts.
For starters, we will be treating SRESET as unrecoverable, an option of
last resort. SRESET of all cores will be the most used, but I can
envisage cases where we would need specific cores/threads to be forced
into xmon or similar. While it would be good for the design to be able
to accommodate it, targeted SMT thread reset isn't a 'must have' to
begin with.
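To make the identifier question concrete, here is a rough sketch of one
way a target could be modelled so that "all cores" is the default and
chip/core/thread targeting can be added later. All names here (Scope,
SresetTarget, the "chip0/core3" path format) are hypothetical and only
meant as a starting point for discussion, not an agreed interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scope(Enum):
    ALL = "all"
    CHIP = "chip"
    CORE = "core"
    THREAD = "thread"


# Identifiers each scope requires, outermost first.
_REQUIRED = {
    Scope.ALL: (),
    Scope.CHIP: ("chip",),
    Scope.CORE: ("chip", "core"),
    Scope.THREAD: ("chip", "core", "thread"),
}


@dataclass(frozen=True)
class SresetTarget:
    scope: Scope = Scope.ALL
    chip: Optional[int] = None
    core: Optional[int] = None
    thread: Optional[int] = None

    def __post_init__(self):
        # Reject targets that are under- or over-specified for their scope.
        required = _REQUIRED[self.scope]
        for name in ("chip", "core", "thread"):
            value = getattr(self, name)
            if name in required and value is None:
                raise ValueError(f"{self.scope.value} scope requires '{name}'")
            if name not in required and value is not None:
                raise ValueError(f"{self.scope.value} scope does not take '{name}'")

    def path(self) -> str:
        """Return a hypothetical path-style identifier, e.g. 'chip0/core3'."""
        parts = [f"{name}{getattr(self, name)}" for name in _REQUIRED[self.scope]]
        return "/".join(parts) if parts else "all"


print(SresetTarget().path())
print(SresetTarget(Scope.CORE, chip=0, core=3).path())
```

With this shape, the initial implementation only has to accept the
default "all" target, and thread-level targeting stays optional.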
Ananth