* [RFC] Special handlers for post-codes
@ 2025-05-30 2:02 Amithash Prasad
2025-05-30 13:17 ` Patrick Williams
0 siblings, 1 reply; 3+ messages in thread
From: Amithash Prasad @ 2025-05-30 2:02 UTC (permalink / raw)
To: LF/OpenBMC Mailing List
Cc: wangkuiying.wky@alibaba-inc.com, zhikui.ren@intel.com
[-- Attachment #1: Type: text/plain, Size: 1500 bytes --]
Hello,
There are many occasions when a post code from a server actually means something is wrong. This is especially crucial if a boot failure occurs before the part of the system firmware capable of sending a SEL to the BMC is loaded. To support this, I am proposing enhancing phosphor-post-code-manager to support configurable special handling of post codes.
Example configuration:
[
  {
    "primary": [123],
    "secondary": [234, 123],
    "targets": ["my_special.service"]
  },
  {
    "primary": [999],
    "targets": ["power_failure.service"],
    "event": {
      "name": "xyz.openbmc_project.State.Power.PowerRailFault",
      "arguments": {
        "POWER_RAIL": "MyDevice",
        "FAILURE_DATA": ""
      }
    }
  }
]
The first config will start my_special.service if we receive the post code {{123}, {234, 123}}. The second will start power_failure.service and also create a phosphor-logging event xyz.openbmc_project.State.Power.PowerRailFault with arguments {"POWER_RAIL": "MyDevice", "FAILURE_DATA": ""} when we receive a post code {{999}, {anything}}.
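As a rough illustration of the matching rule described above (this is not the code in the Gerrit change), a minimal Python sketch could look like the following. The function names `matching_handlers` and `targets_for` are hypothetical, and treating a missing "secondary" key as "match anything" is an assumption read off the {{999}, {anything}} example.

```python
import json

# The two example handlers from this mail; the "event" block is omitted
# because this sketch only looks at matching and targets.
CONFIG = json.loads("""
[
  {"primary": [123], "secondary": [234, 123], "targets": ["my_special.service"]},
  {"primary": [999], "targets": ["power_failure.service"]}
]
""")


def matching_handlers(config, primary, secondary):
    """Return the handler entries that match a received post code."""
    matches = []
    for entry in config:
        if entry["primary"] != list(primary):
            continue
        # Assumption: an entry without "secondary" matches any secondary data.
        if "secondary" in entry and entry["secondary"] != list(secondary):
            continue
        matches.append(entry)
    return matches


def targets_for(config, primary, secondary):
    """Collect the systemd targets of every matching handler, in config order."""
    targets = []
    for entry in matching_handlers(config, primary, secondary):
        targets.extend(entry.get("targets", []))
    return targets
```

With this sketch, `targets_for(CONFIG, [123], [234, 123])` yields only my_special.service, while `targets_for(CONFIG, [999], ...)` yields power_failure.service regardless of the secondary data.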
Any feedback is appreciated, especially on the secondary post code: I don't have experience with machines that have a secondary post code, so that part of my proposal is just a guess.
I have an untested proof of concept I am working on: https://gerrit.openbmc.org/c/openbmc/phosphor-post-code-manager/+/80646
I would love to get feedback before I continue down this path.
Thanks,
Amithash
* Re: [RFC] Special handlers for post-codes
2025-05-30 2:02 [RFC] Special handlers for post-codes Amithash Prasad
@ 2025-05-30 13:17 ` Patrick Williams
2025-07-17 21:00 ` Amithash Prasad
0 siblings, 1 reply; 3+ messages in thread
From: Patrick Williams @ 2025-05-30 13:17 UTC (permalink / raw)
To: Amithash Prasad
Cc: LF/OpenBMC Mailing List, wangkuiying.wky@alibaba-inc.com,
zhikui.ren@intel.com
[-- Attachment #1: Type: text/plain, Size: 3662 bytes --]
On Fri, May 30, 2025 at 02:02:20AM +0000, Amithash Prasad wrote:
> Hello,
>
> There are many occasions when a post code from a server actually means something is wrong. This is especially crucial if a boot failure occurs before the part of the system firmware capable of sending a SEL to the BMC is loaded. To support this, I am proposing enhancing phosphor-post-code-manager to support configurable special handling of post codes.
Thanks, this looks like interesting work. I know some processors that
have magic postcodes that mean things like memory training has failed.
How do you anticipate these configurations are managed? I see 3
options:
1. People add them to their meta-layer for a particular machine
and/or processor.
2. The configuration files are part of phosphor-post-code-manager
(and enabled via CompatibleHardware matching from entity-manager?).
3. The configuration is part of the entity-manager config instead.
My initial impression is that we have two different kinds of configs:
- Configuration that is entirely processor dependent; any system
using a particular processor version will have the same postcode
handling.
- Configuration that is vendor / BIOS / machine specific.
For configuration that is processor dependent, install option (1) does
not seem like a good direction, since it means we're going to be
duplicating this work. I would lean towards option (2) here, but you
probably need a method to load multiple configs: "processor.json" and
"system.json".
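One way to picture the "load multiple configs" idea is the Python sketch below. It is only an illustration under assumptions: the function name `load_handlers` is hypothetical, each file is assumed to hold a JSON array of handlers as in the example configuration, and a missing file is assumed to simply contribute nothing.

```python
import json
from pathlib import Path


def load_handlers(paths):
    """Concatenate the handler lists from several config files, in order."""
    handlers = []
    for path in map(Path, paths):
        if not path.is_file():
            # Assumption: an absent file (e.g. no system.json) is not an error.
            continue
        entries = json.loads(path.read_text())
        if not isinstance(entries, list):
            raise ValueError(f"{path}: expected a JSON array of handlers")
        handlers.extend(entries)
    return handlers
```

Calling `load_handlers(["processor.json", "system.json"])` would then give processor-wide handlers first and machine-specific ones after them; whether later entries should override earlier ones is left open here.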
I don't think this needs to be solved immediately but "which processor
type is installed in a socket" is not necessarily fixed. For example,
AMD socket SP5[1] supports both "Genoa" and "Bergamo" processor variants,
which could require different post code handling. There is little
reason why a system with an SP5 socket couldn't have a BMC that should
be able to handle both Genoa and Bergamo chips.
>
> Example configuration:
> [
> {
Please add a name and/or description field.
> "primary": [123],
> "secondary": [234, 123],
This is a bit awkward to me; you should probably look at what
entity-manager does. People tend to think of postcodes as hex and not
decimal. I don't think we should do conversion to decimal just to make
it JSON-native; optimize for humans and (especially) reviewers.
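To make the hex suggestion concrete, here is a small Python sketch of how a parser might accept hex strings such as "0x7B" in place of JSON integers. The "0x"-prefixed string format and the function name `parse_codes` are assumptions for illustration, not a settled config format.

```python
def parse_codes(values):
    """Convert a list of hex strings such as ["0xEA", "0x7B"] to integers."""
    parsed = []
    for value in values:
        # Assumption: only "0x"-prefixed strings are valid, so that a bare
        # decimal number cannot be mistaken for a hex post code in review.
        if not isinstance(value, str) or not value.lower().startswith("0x"):
            raise ValueError(f"expected a hex string such as '0x7B', got {value!r}")
        parsed.append(int(value, 16))
    return parsed
```

Under this sketch the first example handler would read `"primary": ["0x7B"]` and `"secondary": ["0xEA", "0x7B"]`, which are the hex forms of 123 and 234, 123.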
> "targets": ["my_special.service"]
Why do we need to be able to trigger custom systemd services? This
isn't clear. To me, this starts to cause the configs to be system
specific, which is far less ideal.
I'd rather see some well-defined "actions" like "catastrophic failure
that requires a server reboot".
You should also consider how multi-host systems, like yosemite4, might
be handled here. We will have multiple instances of phosphor-post-code-manager
running, one for each host. If you do have systemd targets, they have
to be templated.
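For illustration, templating could follow the usual systemd convention where a template unit "name@.service" is started as "name@<instance>.service". The Python sketch below assumes that convention and a hypothetical helper name `instantiate_unit`; it is not taken from phosphor-post-code-manager.

```python
def instantiate_unit(template_unit, host_index):
    """Turn "my_special@.service" into "my_special@<host_index>.service"."""
    name, dot, suffix = template_unit.rpartition(".")
    # A systemd template unit has an empty instance: the name ends in "@".
    if not dot or not name.endswith("@"):
        raise ValueError(f"{template_unit!r} is not a template unit")
    return f"{name}{host_index}.{suffix}"
```

Each per-host instance of the daemon would pass its own host index, so host 2 starting "my_special@.service" ends up starting "my_special@2.service".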
> },
> {
> "primary": [999],
> "targets": ["power_failure.service"],
> "event": {
> "name": "xyz.openbmc_project.State.Power.PowerRailFault",
> "arguments": {
> "POWER_RAIL": "MyDevice",
> "FAILURE_DATA": ""
> }
> }
> }
> ]
>
I'd like to see a jsonschema validation of whatever the config ends up
being. We do that in at least entity-manager and sdbusplus if you need
examples (EM uses JSON for the schema, sdbusplus uses YAML).
> I would love to get feedback before I continue down this path.
>
>
> Thanks,
>
> Amithash
[1]: https://en.wikipedia.org/wiki/Socket_SP5
--
Patrick Williams
* Re: [RFC] Special handlers for post-codes
2025-05-30 13:17 ` Patrick Williams
@ 2025-07-17 21:00 ` Amithash Prasad
0 siblings, 0 replies; 3+ messages in thread
From: Amithash Prasad @ 2025-07-17 21:00 UTC (permalink / raw)
To: Patrick Williams
Cc: LF/OpenBMC Mailing List, wangkuiying.wky@alibaba-inc.com,
zhikui.ren@intel.com
[-- Attachment #1: Type: text/plain, Size: 6435 bytes --]
>> I don't think this needs to be solved immediately but "which processor
>> type is installed in a socket" is not necessarily fixed. For example,
>> AMD socket SP5[1] supports both "Genoa" and "Bergamo" processor variants,
>> which could require different post code handling. There is little
>> reason why a system with an SP5 socket couldn't have a BMC that should
>> be able to handle both Genoa and Bergamo chips.
Agree with the general direction. We would need to accomplish a couple of things before we get there:
1. We would need EM or other runtime detection of CPU types.
2. To extend your example, I would not be surprised if Genoa and Bergamo are more similar than different, so separate handler JSONs for them might end up nearly identical. But we can do that optimization when we cross that bridge.
3. I believe there are post codes defined by system software (BIOS) vendors in addition to CPU vendors, and I would not be surprised if there are additional codes which are OEM defined. This might need platform-layer extensions, because I would be surprised if these are universally consistent.
At the moment, I think we can go with machine owners packaging the configuration in their meta-layer while the nuances are developed for CPU type detection along with handling of SW/OEM bits.
>> Please add a name and/or description field.
>> I don't think we should do conversion to decimal just to make it JSON-native; optimize for humans and (especially) reviewers.
Ah yes, this will make the configuration easier to review and more human readable. I will go ahead and push updates to change this (add name and description fields, and convert primary/secondary to hex strings).
>> I'd like to see a jsonschema validation of whatever the config ends up being
Yes. This will really help catch a lot of things at compile time.
>> You should also consider how multi-host systems, like yosemite4, might be handled here.
+1. I was thinking of extending the code to use magic fields in the configuration for the service to insert the "host" index.
Is there a general common approach taken for this by other services? I see EM uses $ to indicate variables.
Example:
```
"arguments": {
"POWER_RAIL": "/path/to/host$HOST",
```
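As a sketch of what such a substitution might look like, the Python snippet below expands a `$HOST` placeholder in the event arguments using the standard library's `string.Template`. This is only an illustration of the idea: the helper name `expand_arguments` is hypothetical, all argument values are assumed to be strings, and it does not reflect how entity-manager implements its own `$` variables.

```python
from string import Template


def expand_arguments(arguments, host_index):
    """Replace $HOST in every argument value with this instance's host index."""
    return {
        # safe_substitute leaves any other $name placeholders untouched
        # instead of raising, so unrelated text is passed through as-is.
        key: Template(value).safe_substitute(HOST=host_index)
        for key, value in arguments.items()
    }
```

For host 2, `{"POWER_RAIL": "/path/to/host$HOST", "FAILURE_DATA": ""}` would become `{"POWER_RAIL": "/path/to/host2", "FAILURE_DATA": ""}`.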
>> Why do we need to be able to trigger custom systemd services?
I was considering cases where certain platforms could do some platform-specific debug collection when they receive certain post codes.
Thanks,
Amithash