From: Stewart Smith <stewart@linux.vnet.ibm.com>
To: Oliver <oohall@gmail.com>, Joel Stanley <joel@jms.id.au>
Cc: Sergey Kachkin <s.kachkin@gmail.com>,
Alistair Popple <alistair@popple.id.au>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Balbir Singh <bsingharora@gmail.com>,
OpenBMC Maillist <openbmc@lists.ozlabs.org>
Subject: Re: checkstop processing
Date: Tue, 14 Nov 2017 17:01:49 +1100
Message-ID: <87zi7ptn3m.fsf@linux.vnet.ibm.com>
In-Reply-To: <CAOSf1CHTeuy0V+JvFr8WWdxuvVcoQMLjXNjR8WSgThGcATYOdg@mail.gmail.com>
Oliver <oohall@gmail.com> writes:
> On Tue, Nov 14, 2017 at 2:42 PM, Joel Stanley <joel@jms.id.au> wrote:
>> On Tue, Nov 14, 2017 at 8:04 AM, Sergey Kachkin <s.kachkin@gmail.com> wrote:
>>> Hi all,
>>>
>>> I'm investigating checkstop processing and looking for a way to isolate
>>> a faulty component with OpenBMC.
>>> So far SEL logs available via REST are not really helpful.
>>>
>>> Is there any data source in the openbmc to troubleshoot checkstops?
>>>
>>> I guess eSEL binary data parsed with eSEL.pl can be more informative but do
>>> we have any procedure to grab the binary sel data and parse it with the
>>> latest obmc?
>>>
>>> Currently it seems that IPL checkstop analysis is not really working. I mean
>>> that the faulty component is not deconfigured on the next boot and the gard
>>> list is empty.
>>> It can be easily duplicated by injecting an error manually via putscom.
>>
>> I think you've identified an area that would be great for improvement.
>>
>> I'd like to expand the scope beyond just checkstop to other boot
>> failures: I've tried to boot machines recently that have failed to
>> even start hostboot, and I haven't known what has failed.
>>
>> A tool that inspects recent error logs, and the state of the SBE would
>> be useful. We can leverage libpdbg to talk to the host.
>
> The SBE stores some state information in cfam 2809 that we can use to
> find out the current istep. I think we can also dump the SBE trace
> buffer out of PIB memory on non-secure systems too. Parsing the trace
> buffer requires the tracehash file from the SBE build, but we can
> probably add that to the squashfs file for the host firmware.
This would be ideal to put in a sensor for boot progress.
--
Stewart Smith
OPAL Architect, IBM.
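
As a rough illustration of the boot-progress idea discussed above, here is a minimal sketch of decoding a raw 32-bit value read from cfam 2809 into fields such as the current istep. The bit layout in ASSUMED_LAYOUT is an assumption for illustration only (it is not given in this thread and should be checked against the SBE source), and the names decode_sbe_msg and ASSUMED_LAYOUT are hypothetical. How the raw value is obtained (for example via libpdbg) is left out.

```python
# Sketch only: the field positions below are ASSUMED, not taken from this
# thread. Verify them against the SBE source before relying on them.
# Each entry is (name, start_bit, width) using IBM bit numbering,
# where bit 0 is the most significant bit of the 32-bit register.
ASSUMED_LAYOUT = [
    ("booted", 0, 1),
    ("async_ffdc", 1, 1),
    ("prev_state", 4, 4),
    ("curr_state", 8, 4),
    ("major_istep", 12, 8),
    ("minor_istep", 20, 6),
]


def decode_sbe_msg(value, layout=ASSUMED_LAYOUT):
    """Split a 32-bit register value into named fields per the given layout."""
    fields = {}
    for name, start, width in layout:
        # Convert MSB-0 numbering into a right shift from the LSB.
        shift = 32 - start - width
        fields[name] = (value >> shift) & ((1 << width) - 1)
    return fields


if __name__ == "__main__":
    # Example value constructed by hand for the assumed layout.
    raw = 0x00203140
    for name, val in decode_sbe_msg(raw).items():
        print(f"{name}: {val}")
```

A boot-progress sensor could then publish major_istep and minor_istep from such a decode, provided the real layout is substituted for the assumed one.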