* Preventing a system power on before BMC Ready
From: Andrew Geissler @ 2023-05-02 20:48 UTC
To: OpenBMC List

About once a month a bug arrives internally where someone has powered on the
host without waiting for the BMC to reach its Ready state. Our systems for a
variety of reasons require the BMC to be at Ready before initiating a system
power on.

The defects are usually returned as user error in that users are supposed to
know to wait. Our Redfish clients (including the web UI) know to not allow a
power on operation until Ready. Recently however we had a bug where our external
Redfish client allowed a power on before Ready. That client is event driven once
connected to the BMC and because they never got an event about an unexpected BMC
reboot, they allowed a power on before Ready when the BMC came back up. Granted
there is only about a 30s window where we have a problem here, but as we all
know, when there's a window, someone finds it.

That got us brainstorming about some possible solutions:
- Write some code in bmcweb to send a “bmc state change event” anytime bmcweb
  comes up to ensure listening clients know “something” has happened
- Add an optional compile option to bmcweb (or PSM/x86-power-control) to require
  BMC Ready before issuing chassis or system POST requests (return error if not
  at Ready)
- Queue up the power on request and execute it once we reach BMC Ready (not sure
  what type of response that would be to Redfish clients or what error path
  looks like if we never reach Ready?)
- Find a way in the client to better detect an unexpected bmc reboot (heartbeat
  of some sort)
- Push bmcweb further in the startup to BMC Ready, ensuring clients can't talk
  to the BMC until it's near Ready state

Thoughts?
Andrew

* Re: Preventing a system power on before BMC Ready
From: Michael Richardson @ 2023-05-02 21:50 UTC
To: Andrew Geissler, OpenBMC List

Andrew Geissler <geissonator@gmail.com> wrote:
> That got us brainstorming about some possible solutions:
> - Write some code in bmcweb to send a “bmc state change event” anytime
>   bmcweb comes up to ensure listening clients know “something” has happened

useful, but not foolproof.

> - Queue up the power on request and execute it once we reach BMC Ready
>   (not sure what type of response that would be to Redfish clients or what
>   error path looks like if we never reach Ready?)

this seems like the best plan.

> - Push bmcweb further in the startup to BMC Ready, ensuring clients can't
>   talk to the BMC until it's near Ready state

The problem with this is that if you can't talk to the BMC, then you can't
find out why it was never Ready.
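
As a rough illustration of the queueing option discussed above, here is a minimal Python sketch. It is not OpenBMC code: the class `DeferredPowerOn`, its `power_on` hook, and the `"Pending"`/`"Completed"` strings are all hypothetical names chosen for this example. It assumes something else tells the object when the BMC reaches Ready and when a give-up timeout expires, which is one possible answer to the "what if we never reach Ready" question raised in the thread.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DeferredPowerOn:
    """Holds power-on requests until the BMC reports Ready (hypothetical sketch)."""

    power_on: Callable[[], None]  # hypothetical hook that actually starts the host
    bmc_ready: bool = False
    pending: int = 0              # number of requests queued while not Ready

    def request(self) -> str:
        """Run immediately if Ready, otherwise queue the request."""
        if self.bmc_ready:
            self.power_on()
            return "Completed"
        self.pending += 1
        return "Pending"

    def on_bmc_ready(self) -> None:
        """BMC reached Ready: issue a single power on for all queued requests."""
        self.bmc_ready = True
        if self.pending:
            self.pending = 0
            self.power_on()

    def on_timeout(self) -> int:
        """Ready was never reached: drop queued requests, return how many failed."""
        failed, self.pending = self.pending, 0
        return failed
```

Collapsing several queued requests into one power on is a design choice of this sketch; a real implementation would also need to report the failure from `on_timeout` back to each waiting client.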

* Re: Preventing a system power on before BMC Ready
From: Ed Tanous @ 2023-05-03 0:48 UTC
To: Andrew Geissler
Cc: OpenBMC List

On Tue, May 2, 2023 at 1:49 PM Andrew Geissler <geissonator@gmail.com> wrote:
>
> About once a month a bug arrives internally where someone has powered on the
> host without waiting for the BMC to reach its Ready state. Our systems for a
> variety of reasons require the BMC to be at Ready before initiating a system
> power on.
>
> The defects are usually returned as user error in that users are supposed to
> know to wait. Our Redfish clients (including the web UI) know to not allow a
> power on operation until Ready. Recently however we had a bug where our external
> Redfish client allowed a power on before Ready. That client is event driven once
> connected to the BMC and because they never got an event about an unexpected BMC
> reboot, they allowed a power on before Ready when the BMC came back up. Granted
> there is only about a 30s window where we have a problem here, but as we all
> know, when there's a window, someone finds it.
>
> That got us brainstorming about some possible solutions:
> - Write some code in bmcweb to send a “bmc state change event” anytime bmcweb
>   comes up to ensure listening clients know “something” has happened
> - Add an optional compile option to bmcweb (or PSM/x86-power-control) to require
>   BMC Ready before issuing chassis or system POST requests (return error if not
>   at Ready)

PSM or x86-power-control mods would be my preference. bmcweb should not be in
charge of business logic. If the system shouldn't allow power on while the BMC
is not in the Ready state, then the daemons that handle power on need to have
that as a constraint, otherwise you'd have the same problem if a user tried
from IPMI.

> - Queue up the power on request and execute it once we reach BMC Ready (not sure
>   what type of response that would be to Redfish clients or what error path
>   looks like if we never reach Ready?)

Redfish has async tasks for this exact use case, and we already have code to do
them. Alternatively you could just return an error that the operation is not
possible, along with a Retry-After header instructing the user when to retry
their request. We do this in a few of the update APIs already.

> - Find a way in the client to better detect an unexpected bmc reboot (heartbeat
>   of some sort)
> - Push bmcweb further in the startup to BMC Ready, ensuring clients can't talk
>   to the BMC until it's near Ready state

For your use case, if this is possible, that’s probably easiest and most client
friendly, so long as you can handle the case where the bmc never hits “ready”

>
> Thoughts?
> Andrew

-Ed
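
To make the Retry-After suggestion above concrete from the client's side, here is a hedged Python sketch. The function name `power_on_with_retry`, the injectable `send` callable, and the choice of HTTP 503 as the "not possible right now" status are assumptions made for illustration; the thread does not say which status code would be returned. The sketch only handles the delay-in-seconds form of Retry-After and treats header names as case-sensitive, both simplifications.

```python
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str]]  # (HTTP status, response headers)


def power_on_with_retry(
    send: Callable[[], Response],       # hypothetical: performs one power-on request
    max_attempts: int = 5,
    default_delay: float = 10.0,        # used when Retry-After is absent or not a number
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send a power-on request, waiting as instructed after each 503 reply.

    Returns the final HTTP status code seen.
    """
    status = 0
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 503:
            return status
        if attempt == max_attempts - 1:
            break  # out of attempts, do not sleep again
        # Retry-After may also be an HTTP date; this sketch only parses seconds.
        value = headers.get("Retry-After", "")
        delay = float(value) if value.isdigit() else default_delay
        sleep(delay)
    return status
```

Passing `sleep` as a parameter keeps the function easy to exercise without real waiting, for example by supplying `list.append` to record the requested delays.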

* Re: Preventing a system power on before BMC Ready
From: Andrew Geissler @ 2023-05-09 20:00 UTC
To: Ed Tanous, Michael Richardson
Cc: OpenBMC List

> On May 2, 2023, at 7:48 PM, Ed Tanous <ed@tanous.net> wrote:
>
> On Tue, May 2, 2023 at 1:49 PM Andrew Geissler <geissonator@gmail.com> wrote:
> >
> > That got us brainstorming about some possible solutions:
> > - Write some code in bmcweb to send a “bmc state change event” anytime bmcweb
> >   comes up to ensure listening clients know “something” has happened
> > - Add an optional compile option to bmcweb (or PSM/x86-power-control) to require
> >   BMC Ready before issuing chassis or system POST requests (return error if not
> >   at Ready)
>
> PSM or x86-power-control mods would be my preference. bmcweb should not be in
> charge of business logic. If the system shouldn't allow power on while the BMC
> is not in the Ready state, then the daemons that handle power on need to have
> that as a constraint, otherwise you'd have the same problem if a user tried
> from IPMI.

Thanks for the responses guys. I’m going to go down the path of an optional
config option to PSM that will require BMC Ready for chassis or host operations.
It will return a well-defined D-Bus error that bmcweb can look at and return an
error to the Redfish client indicating the operation is not possible (and when
they should retry).

Long term, we’d really like to see the power on/off operations return a Redfish
task so clients could track the power operation, rather than relying on the
polling and/or boot event notifications they need today. That timeline for us
is out there a bit though.

> > - Queue up the power on request and execute it once we reach BMC Ready (not sure
> >   what type of response that would be to Redfish clients or what error path
> >   looks like if we never reach Ready?)
>
> Redfish has async tasks for this exact use case, and we already have code to do
> them. Alternatively you could just return an error that the operation is not
> possible, along with a Retry-After header instructing the user when to retry
> their request. We do this in a few of the update APIs already.

Yep, I like the alternative here medium term.

> > - Find a way in the client to better detect an unexpected bmc reboot (heartbeat
> >   of some sort)
> > - Push bmcweb further in the startup to BMC Ready, ensuring clients can't talk
> >   to the BMC until it's near Ready state
>
> For your use case, if this is possible, that’s probably easiest and most client
> friendly, so long as you can handle the case where the bmc never hits “ready”

Possible, but our Redfish client does potentially manage a lot of systems, so
anything that increases repeated traffic is frowned upon. And since this seems
like something that could affect any Redfish client with similar event driven
requirements, it seems best to ensure the openbmc back end provides an adequate
error in this situation.

> >
> > Thoughts?
> > Andrew
>
> -Ed
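
The plan described in this message (an optional Ready gate in the power-control daemon, a well-defined error, and a web layer that turns it into a retryable reply) might look roughly like the Python model below. Everything here is a stand-in: `BmcNotReadyError`, `PowerControl`, `require_bmc_ready`, and `handle_power_on` are hypothetical names, not the real PSM or bmcweb interfaces, and the 503 status with a 30-second Retry-After is an assumption that simply mirrors the roughly 30s window mentioned at the start of the thread.

```python
from typing import Callable, Dict, Tuple


class BmcNotReadyError(Exception):
    """Stand-in for the well-defined D-Bus error (the name is hypothetical)."""


class PowerControl:
    """Toy model of a power-control daemon with an optional Ready gate."""

    def __init__(self, is_bmc_ready: Callable[[], bool], require_bmc_ready: bool = False):
        self.is_bmc_ready = is_bmc_ready
        self.require_bmc_ready = require_bmc_ready  # models the optional config option
        self.host_on = False

    def power_on(self) -> None:
        # The constraint lives in the daemon, so every front end hits the same check.
        if self.require_bmc_ready and not self.is_bmc_ready():
            raise BmcNotReadyError("BMC has not reached Ready")
        self.host_on = True


def handle_power_on(daemon: PowerControl, retry_after_s: int = 30) -> Tuple[int, Dict[str, str]]:
    """Toy model of the web layer: translate the daemon error into an HTTP reply."""
    try:
        daemon.power_on()
    except BmcNotReadyError:
        # Assumed mapping: 503 plus a Retry-After hint for the client.
        return 503, {"Retry-After": str(retry_after_s)}
    return 204, {}
```

Keeping the check in the daemon rather than the web layer reflects the point made earlier in the thread that other interfaces, such as IPMI, would otherwise bypass it.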