public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly
@ 2026-04-21 12:42 Corey Minyard
  2026-04-21 12:42 ` [PATCH 1/2] ipmi: Check event message buffer response for bad data Corey Minyard
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Corey Minyard @ 2026-04-21 12:42 UTC (permalink / raw)
  To: Matt Fleming; +Cc: openipmi-developer, Tony Camuso, linux-kernel, kernel-team

Matt reported that there were issues with the IPMI driver getting wedged
in some cases.  It turns out that the BMC was not reporting an error as
it should have (per the spec) when the event queue was empty.  The IPMI
driver would then request the next event, and so on, wedging the driver.

The BMC sits on a fuzzy line between a trusted device and a remote and
possibly untrusted device.  If you compromise a BMC you have all sorts
of tools you can use to attack the host: the reset line, interrupts,
and usually access to write the system firmware and possibly devices
like disk drives, serial ports and VGA consoles.  So attacking through
this interface would not be the first thing you would do.  But it is a
possible attack point.

I'm assuming that the BMC was delivering an empty message when this
happens, so the first patch checks the message length to make sure it's
a valid message.  It's a good check no matter what, so it stays in
whether or not that's the issue.
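
A minimal sketch of that kind of length check, not the actual patch.
The function name, the constants, and the assumed response layout
(netfn byte, command byte, completion code, then a 16-byte event
record) are all illustrative assumptions:

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout: netfn byte, command byte, completion code, then data. */
#define RSP_HEADER_LEN   3
#define EVENT_DATA_LEN  16   /* assumed size of one event record */

/*
 * Hypothetical helper: accept an event response only if it completed
 * without error AND is long enough to hold a full event record.
 */
bool event_rsp_is_valid(const unsigned char *rsp, size_t len)
{
    if (rsp == NULL || len < RSP_HEADER_LEN)
        return false;               /* too short to even hold a header */
    if (rsp[2] != 0)
        return false;               /* non-zero completion code */
    return len >= RSP_HEADER_LEN + EVENT_DATA_LEN;
}
```

With a check like this, a "success" response that carries no event
data is treated the same as an error, so the caller stops asking for
more events.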

The second patch limits the number of events or messages that can
be fetched at a time to 10.  This is a good thing to do, anyway.
If more messages or events are present, the next flag check should
get them.  So it's a more general fix.
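
The effect of such a cap can be sketched as follows.  This is an
illustration only: struct fake_bmc and both functions are hypothetical
stand-ins, not driver code, and only the limit of 10 comes from the
description above.

```c
#include <stdbool.h>

#define MAX_FETCH_PER_PASS 10   /* cap described in the cover letter */

/* Hypothetical stand-in for a BMC; not a kernel structure. */
struct fake_bmc {
    int  pending;        /* events really queued */
    bool always_ready;   /* misbehaving: never reports an empty queue */
};

/* Returns true if the BMC claims to have delivered an event. */
static bool fake_bmc_fetch(struct fake_bmc *bmc)
{
    if (bmc->always_ready)
        return true;
    if (bmc->pending > 0) {
        bmc->pending--;
        return true;
    }
    return false;
}

/*
 * Fetch events until the queue reports empty or the cap is hit.
 * Returns the number fetched in this pass; anything left over is
 * picked up on a later pass.
 */
int fetch_events_bounded(struct fake_bmc *bmc)
{
    int fetched = 0;

    while (fetched < MAX_FETCH_PER_PASS && fake_bmc_fetch(bmc))
        fetched++;
    return fetched;
}
```

Even when the fake BMC never reports an empty queue, each pass ends
after 10 fetches, so the caller gets control back instead of looping
forever.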

I looked at adding the patch Matt suggested, doing a timeout on the
wait, but that introduces some race conditions if the response comes
back late.  That will require some more thought.

The timeouts with IPMI can be pretty long; the spec specifies fairly
long timeouts, 5 seconds waiting for the BMC to respond to anything.
So failing an operation can take some time, and reducing the timeouts
is probably a bad idea.  No rationale is given in the spec, but I'm
guessing it expects that a BMC in restart can recover within 5 seconds,
so it sets the timeouts so the BMC is always available within that time.

The spec gives the impression that the BMC should always be available
on a system that has one.  So the driver (at the beginning) followed
that.

Thus the driver tries 10 times for a message before it gives up, giving
50 seconds total failure time for a message.  That is not in the spec (I
don't think) so that could be made selectable on a per-message basis.
There are already mechanisms for this available in the APIs; I'll look
at that.
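
The arithmetic behind the 50 second figure, and what a per-message
setting would change, can be sketched like this.  The function and
constant names are hypothetical; only the values 10 and 5 seconds come
from the text above.

```c
#define SPEC_TIMEOUT_MS  5000U  /* 5 second response timeout */
#define DEFAULT_TRIES      10U  /* attempts before giving up */

/*
 * Worst-case time before a message is reported as failed, assuming
 * every attempt runs out its full timeout.
 */
unsigned long worst_case_failure_ms(unsigned int tries,
                                    unsigned int timeout_ms)
{
    return (unsigned long)tries * timeout_ms;
}
```

With the defaults this gives 10 * 5000 ms = 50000 ms.  A caller that
asked for only 2 tries on a given message would fail in at most
10000 ms while leaving the 5 second timeout itself untouched.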

-corey



Thread overview: 10+ messages
2026-04-21 12:42 [PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly Corey Minyard
2026-04-21 12:42 ` [PATCH 1/2] ipmi: Check event message buffer response for bad data Corey Minyard
2026-04-21 12:42 ` [PATCH 2/2] ipmi: Add limits to event and receive message requests Corey Minyard
2026-04-25  9:36   ` Matt Fleming
2026-04-25 23:58     ` Corey Minyard
2026-04-28 10:15       ` Matt Fleming
2026-04-28 11:45         ` Corey Minyard
2026-04-28 13:06           ` Corey Minyard
2026-04-21 22:24 ` [PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly Matt Fleming
2026-04-22  4:44   ` [Openipmi-developer] " Jian Zhang
