From: Patrick Williams <patrick@stwcx.xyz>
To: Amithash Prasad <amithash@meta.com>
Cc: LF/OpenBMC Mailing List <openbmc@lists.ozlabs.org>,
"wangkuiying.wky@alibaba-inc.com"
<wangkuiying.wky@alibaba-inc.com>,
"zhikui.ren@intel.com" <zhikui.ren@intel.com>
Subject: Re: [RFC] Special handlers for post-codes
Date: Fri, 30 May 2025 09:17:44 -0400
Message-ID: <aDmv-MAXX2QFsLlp@heinlein>
In-Reply-To: <SJ2PR15MB5801C8B07E960251A53DDF98AB61A@SJ2PR15MB5801.namprd15.prod.outlook.com>
On Fri, May 30, 2025 at 02:02:20AM +0000, Amithash Prasad wrote:
> Hello,
>
> There are many occasions when a post code from a server actually means something is wrong. This is especially crucial if a boot failure occurs before the part of the system firmware capable of sending a SEL to the BMC is loaded. To support this, I am proposing enhancing phosphor-post-code-manager to support configurable special handling of post codes.

Thanks, this looks like interesting work. I know some processors that
have magic postcodes that mean things like memory training has failed.

How do you anticipate these configurations are managed? I see 3
options:

    1. People add them to their meta-layer for a particular machine
       and/or processor.
    2. The configuration files are part of phosphor-post-code-manager
       (and enabled via CompatibleHardware matching from entity-manager?).
    3. The configuration is part of the entity-manager config instead.

My initial impression is that we have two different kinds of configs:

    - Configuration that is entirely processor dependent; any system
      using a particular processor version will have the same postcode
      handling.
    - Configuration that is vendor / BIOS / machine specific.

For configuration that is processor dependent, option (1) does not
seem like a good direction, since it means we're going to be
duplicating this work. I would lean towards option (2) here, but you
probably need a method to load multiple configs: "processor.json" and
"system.json".

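As a rough illustration of loading multiple configs, here is a minimal
Python sketch. The function name load_handlers, the "source" key, and the
idea of simply concatenating the lists are assumptions for illustration
only; nothing here reflects how phosphor-post-code-manager is or would be
implemented.

```python
import json
from pathlib import Path


def load_handlers(config_dir, names=("processor.json", "system.json")):
    """Concatenate handler lists from each config file that exists.

    Files are read in the order given, so entries from the
    system-specific file end up after the processor-wide entries.
    """
    handlers = []
    for name in names:
        path = Path(config_dir) / name
        if not path.is_file():
            continue  # each layer is optional; skip it when absent
        with path.open() as f:
            entries = json.load(f)
        for entry in entries:
            # Record which file an entry came from (hypothetical key).
            handlers.append({**entry, "source": name})
    return handlers
```

Whether a system-level entry should override, extend, or merely follow a
processor-level entry with the same codes is a design question this sketch
leaves open.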
I don't think this needs to be solved immediately, but "which processor
type is installed in a socket" is not necessarily fixed. For example,
AMD socket SP5[1] supports both "Genoa" and "Bergamo" processor variants,
which could require different post code handling. There is little
reason why a system with an SP5 socket couldn't have a BMC that is
able to handle both Genoa and Bergamo chips.

>
> Example configuration:
> [
> {
Please add a name and/or description field.
> "primary": [123],
> "secondary": [234, 123],
This is a bit awkward to me; you should probably look at what
entity-manager does. People tend to think of postcodes as hex and not
decimal. I don't think we should do conversion to decimal just to make
it JSON-native; optimize for humans and (especially) reviewers.
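JSON has no hex integer literal, so one way to keep the config
human-readable is to write codes as strings such as "0x7B" and convert
them when loading. A minimal Python sketch of that conversion follows; the
function name parse_post_code and the choice to also accept plain integers
are assumptions, not an existing convention in any OpenBMC repository.

```python
def parse_post_code(value):
    """Accept either a JSON integer or a hex string such as "0x7B"."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("post code must be an integer or a hex string")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        return int(value, 16)  # accepts "0x7B", "0X7b" and "7b"
    raise TypeError("post code must be an integer or a hex string")
```

With this, the quoted "primary": [123] could be written as
"primary": ["0x7B"], and "secondary": [234, 123] as
"secondary": ["0xEA", "0x7B"].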
> "targets": ["my_special.service"]
Why do we need to be able to trigger custom systemd services? This
isn't clear. To me, this starts to cause the configs to be system
specific, which is far less ideal.
I'd rather see some well-defined "actions" like "catastrophic failure
that requires a server reboot".
You should also consider how multi-host systems, like yosemite4, might
be handled here. We will have multiple instances of phosphor-post-code-manager
running, one for each host. If you do have systemd targets, they have
to be templated.
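To make the templating point concrete: systemd template units are named
"name@.service" and instantiated as "name@<instance>.service". A minimal
Python sketch of filling in the host instance is below. The helper name
instantiate_unit and the use of a host number as the instance are
assumptions for illustration.

```python
def instantiate_unit(unit, host_instance):
    """Fill in the instance of a systemd template unit name.

    "my_special@.service" with host 2 becomes "my_special@2.service".
    Names that are not templates (no "@.") are returned unchanged.
    """
    name, sep, suffix = unit.partition("@.")
    if not sep:
        return unit
    return f"{name}@{host_instance}.{suffix}"
```

A non-templated entry such as "power_failure.service" passes through
unchanged, which is exactly the case that would be ambiguous on a
multi-host system.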
> },
> {
> "primary": [999],
> "targets": ["power_failure.service"],
> "event": {
> "name": "xyz.openbmc_project.State.Power.PowerRailFault",
> "arguments": {
> "POWER_RAIL": "MyDevice",
> "FAILURE_DATA": ""
> }
> }
> }
> ]
>
I'd like to see a jsonschema validation of whatever the config ends up
being. We do that in at least entity-manager and sdbusplus if you need
examples (EM uses JSON for the schema, sdbusplus uses YAML).
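As a sketch of what such a schema might look like for the config quoted
above, here it is as a Python dict, together with a deliberately tiny
stand-in checker. The checker understands only "type", "required",
"properties" and "items"; it is not a JSON Schema implementation, and a
real change would use an existing validator. Making "name" required is an
assumption based on the request for a name field earlier in this reply.

```python
SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "required": ["name", "primary"],
        "properties": {
            "name": {"type": "string"},
            "primary": {"type": "array", "items": {"type": "integer"}},
            "secondary": {"type": "array", "items": {"type": "integer"}},
            "targets": {"type": "array", "items": {"type": "string"}},
            "event": {"type": "object", "required": ["name"]},
        },
    },
}

TYPES = {"array": list, "object": dict, "string": str}


def has_type(value, name):
    """Check a value against one of the type names used in SCHEMA."""
    if name == "integer":
        # bool is a subclass of int in Python; JSON true/false are not integers.
        return isinstance(value, int) and not isinstance(value, bool)
    return isinstance(value, TYPES[name])


def check(instance, schema, path="$"):
    """Return a list of error strings; an empty list means it passed."""
    expected = schema.get("type")
    if expected and not has_type(instance, expected):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key '{key}'")
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors += check(instance[key], sub, f"{path}.{key}")
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors += check(item, schema["items"], f"{path}[{i}]")
    return errors
```

Run against the first quoted entry as written, this reports the missing
"name" key, which is the kind of review feedback a schema check can catch
automatically.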
> I would love to get feedback before I continue down this path.
>
>
> Thanks,
>
> Amithash
[1]: https://en.wikipedia.org/wiki/Socket_SP5
--
Patrick Williams