[PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly
@ 2026-04-21 12:42 Corey Minyard
  2026-04-21 12:42 ` [PATCH 1/2] ipmi: Check event message buffer response for bad data Corey Minyard
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Corey Minyard @ 2026-04-21 12:42 UTC (permalink / raw)
  To: Matt Fleming; +Cc: openipmi-developer, Tony Camuso, linux-kernel, kernel-team

Matt reported that there were issues with the IPMI driver getting wedged
in some cases.  It turns out that the BMC was not reporting an error as
it should have (per the spec) when the event queue was empty.  The IPMI
driver would then request the next event, and so on, wedging the driver.

The BMC sits on a fuzzy line between a trusted devices and a remote and
possibly untrusted device.  If you compromised a BMC you have all sorts
of tools you can use to attack the host: the reset line, interrupts,
and usually access to write the system firmware and possibly devices
like disk drives, serial ports and VGA consoles.  So attacking through
this interface would not be the first thing you would do.  But it is an
possible attack point.

I'm assuming that the BMC was delivering an empty message when this
happens, so the first patch checks the message length to make sure it's
a valid message.  It's a good check no matter what, so it's in
whether that's the issue or not.

The second patch limits the number of events or messages that can
be fetched at a time to 10.  This is a good thing to do, anyway.
If more message or events were present, the next flag check should
get them.  So it's a more general fix.

I looked at adding the patch Matt suggested, doing a timeout on the
wait, but that introduces some race conditions if the response comes
back late.  That will require some more thought.

The timeouts with IPMI can be pretty long, the spec specifies fairly
long timeouts, 5 seconds waiting for the BMC to respond to anything.
So failing an operation can take some time, and reducing the timeouts
is probably a bad idea.  No rationale is given in the spec, but I'm
guessing it expects that a BMC in restart can recover within 5 seconds,
so it gives timeouts so the BMC is always available within that tie.

The spec gives you the gist that the BMC should always be available
on a system that has one.  So the driver (at the beginning) followed
that.

Thus the driver tries 10 times for a message before it gives up, giving
50 seconds total failure time for a message.  That is not in the spec (I
don't think) so that could be made selectable on a per-message basis.
There are already mechanisms for this available in the APIs; I'll look
at that.

-corey

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] ipmi: Check event message buffer response for bad data
  2026-04-21 12:42 [PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly Corey Minyard
@ 2026-04-21 12:42 ` Corey Minyard
  2026-04-21 12:42 ` [PATCH 2/2] ipmi: Add limits to event and receive message requests Corey Minyard
  2026-04-21 22:24 ` [PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly Matt Fleming
  2 siblings, 0 replies; 10+ messages in thread
From: Corey Minyard @ 2026-04-21 12:42 UTC (permalink / raw)
  To: Matt Fleming
  Cc: openipmi-developer, Tony Camuso, linux-kernel, kernel-team,
	Corey Minyard, stable

The event message buffer response data size got checked later when
processing, but check it right after the response comes back.  It
appears some BMCs may return an empty message instead of an error
when fetching events.

There are apparently some new BMCs that make this error, so we need to
compensate.

Reported-by: Matt Fleming <mfleming@cloudflare.com>
Closes: https://lore.kernel.org/lkml/20260415115930.3428942-1-matt@readmodwrite.com/
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: <stable@vger.kernel.org>
Signed-off-by: Corey Minyard <corey@minyard.net>
---
 drivers/char/ipmi/ipmi_si_intf.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index 4a9e9de4d684..08c208cc64c5 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -630,7 +630,13 @@ static void handle_transaction_done(struct smi_info *smi_info)
 		 */
 		msg = smi_info->curr_msg;
 		smi_info->curr_msg = NULL;
-		if (msg->rsp[2] != 0) {
+		/*
+		 * It appears some BMCs, with no event data, return no
+		 * data in the message and not a 0x80 error as the
+		 * spec says they should.  Shut down processing if
+		 * the data is not the right length.
+		 */
+		if (msg->rsp[2] != 0 || msg->rsp_size != 19) {
 			/* Error getting event, probably done. */
 			msg->done(msg);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] ipmi: Add limits to event and receive message requests
  2026-04-21 12:42 [PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly Corey Minyard
  2026-04-21 12:42 ` [PATCH 1/2] ipmi: Check event message buffer response for bad data Corey Minyard
@ 2026-04-21 12:42 ` Corey Minyard
  2026-04-25  9:36   ` Matt Fleming
  2026-04-21 22:24 ` [PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly Matt Fleming
  2 siblings, 1 reply; 10+ messages in thread
From: Corey Minyard @ 2026-04-21 12:42 UTC (permalink / raw)
  To: Matt Fleming
  Cc: openipmi-developer, Tony Camuso, linux-kernel, kernel-team,
	Corey Minyard, stable

The driver would just fetch events and receive messages until the
BMC said it was done.  To avoid issues with BMCs that never say they are
done, add a limit of 10 fetches at a time.

This is a more general fix than the previous fix for the specific bad
BMC, but should fix the more general issue of a BMC that won't stop
saying it has data.

This has been there from the beginning of the driver.

Reported-by: Matt Fleming <mfleming@cloudflare.com>
Closes: https://lore.kernel.org/lkml/20260415115930.3428942-1-matt@readmodwrite.com/
Fixes: <1da177e4c3f4> ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Corey Minyard <corey@minyard.net>
---
 drivers/char/ipmi/ipmi_si_intf.c | 15 +++++++++++++++
 drivers/char/ipmi/ipmi_ssif.c    | 15 +++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index 08c208cc64c5..a705aae29867 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -168,6 +168,9 @@ struct smi_info {
 			     OEM2_DATA_AVAIL)
 	unsigned char       msg_flags;
 
+	/* When requesting events and messages, don't do it forever. */
+	unsigned int        num_requests_in_a_row;
+
 	/* Does the BMC have an event buffer? */
 	bool		    has_event_buffer;
 
@@ -410,6 +413,7 @@ static void start_getting_msg_queue(struct smi_info *smi_info)
 
 	start_new_msg(smi_info, smi_info->curr_msg->data,
 		      smi_info->curr_msg->data_size);
+	smi_info->num_requests_in_a_row = 0;
 	smi_info->si_state = SI_GETTING_MESSAGES;
 }
 
@@ -421,6 +425,7 @@ static void start_getting_events(struct smi_info *smi_info)
 
 	start_new_msg(smi_info, smi_info->curr_msg->data,
 		      smi_info->curr_msg->data_size);
+	smi_info->num_requests_in_a_row = 0;
 	smi_info->si_state = SI_GETTING_EVENTS;
 }
 
@@ -646,6 +651,11 @@ static void handle_transaction_done(struct smi_info *smi_info)
 		} else {
 			smi_inc_stat(smi_info, events);
 
+			smi_info->num_requests_in_a_row++;
+			if (smi_info->num_requests_in_a_row > 10)
+				/* Stop if we do this too many times. */
+				smi_info->msg_flags &= ~EVENT_MSG_BUFFER_FULL;
+
 			/*
 			 * Do this before we deliver the message
 			 * because delivering the message releases the
@@ -684,6 +694,11 @@ static void handle_transaction_done(struct smi_info *smi_info)
 		} else {
 			smi_inc_stat(smi_info, incoming_messages);
 
+			smi_info->num_requests_in_a_row++;
+			if (smi_info->num_requests_in_a_row > 10)
+				/* Stop if we do this too many times. */
+				smi_info->msg_flags &= ~RECEIVE_MSG_AVAIL;
+
 			/*
 			 * Do this before we deliver the message
 			 * because delivering the message releases the
diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c
index b49500a1bd36..547447f304ba 100644
--- a/drivers/char/ipmi/ipmi_ssif.c
+++ b/drivers/char/ipmi/ipmi_ssif.c
@@ -225,6 +225,9 @@ struct ssif_info {
 	bool		    has_event_buffer;
 	bool		    supports_alert;
 
+	/* When requesting events and messages, don't do it forever. */
+	unsigned int        num_requests_in_a_row;
+
 	/*
 	 * Used to tell what we should do with alerts.  If we are
 	 * waiting on a response, read the data immediately.
@@ -413,6 +416,7 @@ static void start_event_fetch(struct ssif_info *ssif_info, unsigned long *flags)
 	}
 
 	ssif_info->curr_msg = msg;
+	ssif_info->num_requests_in_a_row = 0;
 	ssif_info->ssif_state = SSIF_GETTING_EVENTS;
 	ipmi_ssif_unlock_cond(ssif_info, flags);
 
@@ -436,6 +440,7 @@ static void start_recv_msg_fetch(struct ssif_info *ssif_info,
 	}
 
 	ssif_info->curr_msg = msg;
+	ssif_info->num_requests_in_a_row = 0;
 	ssif_info->ssif_state = SSIF_GETTING_MESSAGES;
 	ipmi_ssif_unlock_cond(ssif_info, flags);
 
@@ -843,6 +848,11 @@ static void msg_done_handler(struct ssif_info *ssif_info, int result,
 			ssif_info->msg_flags &= ~EVENT_MSG_BUFFER_FULL;
 			handle_flags(ssif_info, flags);
 		} else {
+			ssif_info->num_requests_in_a_row++;
+			if (ssif_info->num_requests_in_a_row > 10)
+				/* Stop if we do this too many times. */
+				ssif_info->msg_flags &= ~EVENT_MSG_BUFFER_FULL;
+
 			handle_flags(ssif_info, flags);
 			ssif_inc_stat(ssif_info, events);
 			deliver_recv_msg(ssif_info, msg);
@@ -876,6 +886,11 @@ static void msg_done_handler(struct ssif_info *ssif_info, int result,
 			ssif_info->msg_flags &= ~RECEIVE_MSG_AVAIL;
 			handle_flags(ssif_info, flags);
 		} else {
+			ssif_info->num_requests_in_a_row++;
+			if (ssif_info->num_requests_in_a_row > 10)
+				/* Stop if we do this too many times. */
+				ssif_info->msg_flags &= ~RECEIVE_MSG_AVAIL;
+
 			ssif_inc_stat(ssif_info, incoming_messages);
 			handle_flags(ssif_info, flags);
 			deliver_recv_msg(ssif_info, msg);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] ipmi: Add limits to event and receive message requests
  2026-04-21 12:42 ` [PATCH 2/2] ipmi: Add limits to event and receive message requests Corey Minyard
@ 2026-04-25  9:36   ` Matt Fleming
  2026-04-25 23:58     ` Corey Minyard
  0 siblings, 1 reply; 10+ messages in thread
From: Matt Fleming @ 2026-04-25  9:36 UTC (permalink / raw)
  To: Corey Minyard
  Cc: Matt Fleming, openipmi-developer, Tony Camuso, linux-kernel,
	kernel-team, stable

On Tue, Apr 21, 2026 at 07:42:44AM -0500, Corey Minyard wrote:
> The driver would just fetch events and receive messages until the
> BMC said it was done.  To avoid issues with BMCs that never say they are
> done, add a limit of 10 fetches at a time.
> 
> This is a more general fix than the previous fix for the specific bad
> BMC, but should fix the more general issue of a BMC that won't stop
> saying it has data.
> 
> This has been there from the beginning of the driver.
> 
> Reported-by: Matt Fleming <mfleming@cloudflare.com>
> Closes: https://lore.kernel.org/lkml/20260415115930.3428942-1-matt@readmodwrite.com/
> Fixes: <1da177e4c3f4> ("Linux-2.6.12-rc2")
> Cc: stable@vger.kernel.org
> Signed-off-by: Corey Minyard <corey@minyard.net>
> ---
>  drivers/char/ipmi/ipmi_si_intf.c | 15 +++++++++++++++
>  drivers/char/ipmi/ipmi_ssif.c    | 15 +++++++++++++++
>  2 files changed, 30 insertions(+)
 
[...]

> @@ -410,6 +413,7 @@ static void start_getting_msg_queue(struct smi_info *smi_info)
>  
>  	start_new_msg(smi_info, smi_info->curr_msg->data,
>  		      smi_info->curr_msg->data_size);
> +	smi_info->num_requests_in_a_row = 0;
>  	smi_info->si_state = SI_GETTING_MESSAGES;
>  }
>  
> @@ -421,6 +425,7 @@ static void start_getting_events(struct smi_info *smi_info)
>  
>  	start_new_msg(smi_info, smi_info->curr_msg->data,
>  		      smi_info->curr_msg->data_size);
> +	smi_info->num_requests_in_a_row = 0;
>  	smi_info->si_state = SI_GETTING_EVENTS;
>  }
>  

Would it be better to move this zeroing to handle_transaction_done()?
Otherwise we reset the counter in handle_flags() ->
start_getting_events() and the threshold is never reached.

Thanks,
Matt

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] ipmi: Add limits to event and receive message requests
  2026-04-25  9:36   ` Matt Fleming
@ 2026-04-25 23:58     ` Corey Minyard
  2026-04-28 10:15       ` Matt Fleming
  0 siblings, 1 reply; 10+ messages in thread
From: Corey Minyard @ 2026-04-25 23:58 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Matt Fleming, openipmi-developer, Tony Camuso, linux-kernel,
	kernel-team, stable

On Sat, Apr 25, 2026 at 10:36:05AM +0100, Matt Fleming wrote:
> On Tue, Apr 21, 2026 at 07:42:44AM -0500, Corey Minyard wrote:
> > The driver would just fetch events and receive messages until the
> > BMC said it was done.  To avoid issues with BMCs that never say they are
> > done, add a limit of 10 fetches at a time.
> > 
> > This is a more general fix than the previous fix for the specific bad
> > BMC, but should fix the more general issue of a BMC that won't stop
> > saying it has data.
> > 
> > This has been there from the beginning of the driver.
> > 
> > Reported-by: Matt Fleming <mfleming@cloudflare.com>
> > Closes: https://lore.kernel.org/lkml/20260415115930.3428942-1-matt@readmodwrite.com/
> > Fixes: <1da177e4c3f4> ("Linux-2.6.12-rc2")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Corey Minyard <corey@minyard.net>
> > ---
> >  drivers/char/ipmi/ipmi_si_intf.c | 15 +++++++++++++++
> >  drivers/char/ipmi/ipmi_ssif.c    | 15 +++++++++++++++
> >  2 files changed, 30 insertions(+)
>  
> [...]
> 
> > @@ -410,6 +413,7 @@ static void start_getting_msg_queue(struct smi_info *smi_info)
> >  
> >  	start_new_msg(smi_info, smi_info->curr_msg->data,
> >  		      smi_info->curr_msg->data_size);
> > +	smi_info->num_requests_in_a_row = 0;
> >  	smi_info->si_state = SI_GETTING_MESSAGES;
> >  }
> >  
> > @@ -421,6 +425,7 @@ static void start_getting_events(struct smi_info *smi_info)
> >  
> >  	start_new_msg(smi_info, smi_info->curr_msg->data,
> >  		      smi_info->curr_msg->data_size);
> > +	smi_info->num_requests_in_a_row = 0;
> >  	smi_info->si_state = SI_GETTING_EVENTS;
> >  }
> >  
> 
> Would it be better to move this zeroing to handle_transaction_done()?
> Otherwise we reset the counter in handle_flags() ->
> start_getting_events() and the threshold is never reached.

Oh, yeah.

Moving it to handle_transaction_done() is not ideal, though.  If
something was spewing receive messages, it would never get to events,
which is why I did it like I did.

The following should fix this.  You could also have different limits for
receive messages and events, but I think the following is more clear.

diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index 2a739123270c..e46f4150ceb5 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -413,8 +413,10 @@ static void start_getting_msg_queue(struct smi_info *smi_info)

 	start_new_msg(smi_info, smi_info->curr_msg->data,
 		      smi_info->curr_msg->data_size);
-	smi_info->num_requests_in_a_row = 0;
-	smi_info->si_state = SI_GETTING_MESSAGES;
+	if (smi_info->si_state != SI_GETTING_MESSAGES) {
+	    smi_info->num_requests_in_a_row = 0;
+	    smi_info->si_state = SI_GETTING_MESSAGES;
+	}
 }

 static void start_getting_events(struct smi_info *smi_info)
@@ -425,8 +427,10 @@ static void start_getting_events(struct smi_info *smi_info)

 	start_new_msg(smi_info, smi_info->curr_msg->data,
 		      smi_info->curr_msg->data_size);
-	smi_info->num_requests_in_a_row = 0;
-	smi_info->si_state = SI_GETTING_EVENTS;
+	if (smi_info->si_state != SI_GETTING_EVENTS) {
+	    smi_info->num_requests_in_a_row = 0;
+	    smi_info->si_state = SI_GETTING_EVENTS;
+	}
 }

 /*


> 
> Thanks,
> Matt

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] ipmi: Add limits to event and receive message requests
  2026-04-25 23:58     ` Corey Minyard
@ 2026-04-28 10:15       ` Matt Fleming
  2026-04-28 11:45         ` Corey Minyard
  0 siblings, 1 reply; 10+ messages in thread
From: Matt Fleming @ 2026-04-28 10:15 UTC (permalink / raw)
  To: Corey Minyard
  Cc: Matt Fleming, openipmi-developer, Tony Camuso, linux-kernel,
	kernel-team, stable

On Sat, Apr 25, 2026 at 06:58:48PM -0500, Corey Minyard wrote:
> 
> Oh, yeah.
> 
> Moving it to handle_transaction_done() is not ideal, though.  If
> something was spewing receive messages, it would never get to events,
> which is why I did it like I did.
> 
> The following should fix this.  You could also have different limits for
> receive messages and events, but I think the following is more clear.
> 
> diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
> index 2a739123270c..e46f4150ceb5 100644
> --- a/drivers/char/ipmi/ipmi_si_intf.c
> +++ b/drivers/char/ipmi/ipmi_si_intf.c
> @@ -413,8 +413,10 @@ static void start_getting_msg_queue(struct smi_info *smi_info)
> 
>  	start_new_msg(smi_info, smi_info->curr_msg->data,
>  		      smi_info->curr_msg->data_size);
> -	smi_info->num_requests_in_a_row = 0;
> -	smi_info->si_state = SI_GETTING_MESSAGES;
> +	if (smi_info->si_state != SI_GETTING_MESSAGES) {
> +	    smi_info->num_requests_in_a_row = 0;
> +	    smi_info->si_state = SI_GETTING_MESSAGES;
> +	}
>  }
> 
>  static void start_getting_events(struct smi_info *smi_info)
> @@ -425,8 +427,10 @@ static void start_getting_events(struct smi_info *smi_info)
> 
>  	start_new_msg(smi_info, smi_info->curr_msg->data,
>  		      smi_info->curr_msg->data_size);
> -	smi_info->num_requests_in_a_row = 0;
> -	smi_info->si_state = SI_GETTING_EVENTS;
> +	if (smi_info->si_state != SI_GETTING_EVENTS) {
> +	    smi_info->num_requests_in_a_row = 0;
> +	    smi_info->si_state = SI_GETTING_EVENTS;
> +	}

Thanks. Does this correctly handle a stream of mixed receive+event
flags, though? If we bounce between MESSAGES and EVENTS, won't we keep
resetting the counter on each state transition and never hit the limit?

I was able to cook up a simple repro in Qemu for this class of bug.

---->8----

diff --git a/hw/ipmi/ipmi_bmc_sim.c b/hw/ipmi/ipmi_bmc_sim.c
index fd875491f5..127db30c03 100644
--- a/hw/ipmi/ipmi_bmc_sim.c
+++ b/hw/ipmi/ipmi_bmc_sim.c
@@ -249,6 +249,21 @@ struct IPMIBmcSim {
     uint8_t evtbuf[16];
 
     QTAILQ_HEAD(, IPMIRcvBufEntry) rcvbufs;
+
+    /*
+     * Fault injection: sticky EVENT_MSG_BUFFER_FULL.
+     *
+     * Simulates a BMC that continuously generates events (e.g. after a
+     * cold reset causes sensor state changes).  Once armed, every
+     * READ_EVENT_MSG_BUFFER returns success but never clears the
+     * EVT_BUF_FULL flag, starving waiting_msg in the SI layer's
+     * handle_flags() loop.  Reproduces the 517m277 / KRN-1233 wedge.
+     */
+    bool     fi_sticky_events;   /* enable via property */
+    uint32_t fi_evt_arm_after;   /* arm after N completed responses */
+    uint32_t fi_evt_rsp_count;   /* responses completed so far */
+    bool     fi_evt_armed;       /* fault is active */
+    uint32_t fi_evt_read_count;  /* READ_EVENT_MSG_BUFFER calls served */
 };
 
 #define IPMI_BMC_MSG_FLAG_WATCHDOG_TIMEOUT_MASK        (1 << 3)
@@ -494,7 +509,7 @@ static int sel_add_event(IPMIBmcSim *ibs, uint8_t *event)
 static int attn_set(IPMIBmcSim *ibs)
 {
     return IPMI_BMC_MSG_FLAG_RCV_MSG_QUEUE_SET(ibs)
-        || IPMI_BMC_MSG_FLAG_EVT_BUF_FULL_SET(ibs)
+        || (!ibs->fi_evt_armed && IPMI_BMC_MSG_FLAG_EVT_BUF_FULL_SET(ibs))
         || IPMI_BMC_MSG_FLAG_WATCHDOG_TIMEOUT_MASK_SET(ibs);
 }
 
@@ -750,6 +765,48 @@ static void ipmi_sim_handle_command(IPMIBmc *b,
  out:
     k->handle_rsp(s, msg_id, rsp.buffer, rsp.len);
 
+    /*
+     * Fault injection: count completed responses and arm sticky
+     * EVENT_MSG_BUFFER_FULL after the configured threshold.
+     */
+    if (ibs->fi_sticky_events && !ibs->fi_evt_armed) {
+        ibs->fi_evt_rsp_count++;
+        if (ibs->fi_evt_rsp_count >= ibs->fi_evt_arm_after) {
+            ibs->fi_evt_armed = true;
+            ibs->fi_evt_read_count = 0;
+
+            /*
+             * Seed the event buffer with a synthetic sensor event
+             * (sensor type 0x01 = Temperature, event type 0x01 =
+             * threshold, evd1 = upper critical going high).
+             * This matches what real BMCs generate after a cold reset.
+             */
+            memset(ibs->evtbuf, 0, 16);
+            ibs->evtbuf[2]  = 0x02; /* System event record */
+            ibs->evtbuf[7]  = ibs->parent.slave_addr;
+            ibs->evtbuf[9]  = 0x04; /* Format version */
+            ibs->evtbuf[10] = 0x01; /* Sensor type: Temperature */
+            ibs->evtbuf[11] = 0x01; /* Sensor number */
+            ibs->evtbuf[12] = 0x01; /* Event type: threshold */
+            ibs->evtbuf[13] = 0x09; /* Upper critical going high */
+            ibs->evtbuf[14] = 0x57; /* Threshold value */
+            ibs->evtbuf[15] = 0x00; /* Sequence (incremented on reads) */
+
+            ibs->msg_flags |= IPMI_BMC_MSG_FLAG_EVT_BUF_FULL;
+
+            /* Ensure event message buffer is enabled in global enables
+             * so the kernel sees it.  Also enable event logging.
+             */
+            ibs->bmc_global_enables |= (1 << IPMI_BMC_EVENT_MSG_BUF_BIT)
+                                     | (1 << IPMI_BMC_EVENT_LOG_BIT);
+
+            k->set_atn(s, 1, attn_irq_enabled(ibs));
+
+            fprintf(stderr, "[FI] sticky-events armed after %u responses\n",
+                    ibs->fi_evt_rsp_count);
+        }
+    }
+
     next_timeout(ibs);
 }
 
@@ -1013,8 +1070,14 @@ static void clr_msg_flags(IPMIBmcSim *ibs,
 {
     IPMIInterface *s = ibs->parent.intf;
     IPMIInterfaceClass *k = IPMI_INTERFACE_GET_CLASS(s);
+    uint8_t clear_mask = cmd[2];
+
+    if (ibs->fi_evt_armed) {
+        /* Don't allow clearing EVT_BUF_FULL when sticky events active */
+        clear_mask &= ~IPMI_BMC_MSG_FLAG_EVT_BUF_FULL;
+    }
 
-    ibs->msg_flags &= ~cmd[2];
+    ibs->msg_flags &= ~clear_mask;
     k->set_atn(s, attn_set(ibs), attn_irq_enabled(ibs));
 }
 
@@ -1040,7 +1103,19 @@ static void read_evt_msg_buf(IPMIBmcSim *ibs,
     for (i = 0; i < 16; i++) {
         rsp_buffer_push(rsp, ibs->evtbuf[i]);
     }
-    ibs->msg_flags &= ~IPMI_BMC_MSG_FLAG_EVT_BUF_FULL;
+
+    if (ibs->fi_evt_armed) {
+        /*
+         * Sticky mode: return success but keep EVT_BUF_FULL set.
+         * Vary the event data slightly so the kernel doesn't
+         * de-duplicate (increment evd3 as a sequence number).
+         */
+        ibs->fi_evt_read_count++;
+        ibs->evtbuf[15] = (uint8_t)(ibs->fi_evt_read_count & 0xff);
+        /* Don't clear the flag — starvation continues */
+    } else {
+        ibs->msg_flags &= ~IPMI_BMC_MSG_FLAG_EVT_BUF_FULL;
+    }
     k->set_atn(s, attn_set(ibs), attn_irq_enabled(ibs));
 }
 
@@ -2670,6 +2745,8 @@ static const Property ipmi_sim_properties[] = {
     DEFINE_PROP_STRING("lan.netmask", IPMIBmcSim, lan.arg_netmask),
     DEFINE_PROP_STRING("lan.defgw_ipaddr", IPMIBmcSim, lan.arg_defgw_ipaddr),
     DEFINE_PROP_MACADDR("lan.defgw_macaddr", IPMIBmcSim, lan.defgw_macaddr),
+    DEFINE_PROP_BOOL("fi_sticky_events", IPMIBmcSim, fi_sticky_events, false),
+    DEFINE_PROP_UINT32("fi_evt_arm_after", IPMIBmcSim, fi_evt_arm_after, 40),
 };
 
 static void ipmi_sim_class_init(ObjectClass *oc, const void *data)

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] ipmi: Add limits to event and receive message requests
  2026-04-28 10:15       ` Matt Fleming
@ 2026-04-28 11:45         ` Corey Minyard
  2026-04-28 13:06           ` Corey Minyard
  0 siblings, 1 reply; 10+ messages in thread
From: Corey Minyard @ 2026-04-28 11:45 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Matt Fleming, openipmi-developer, Tony Camuso, linux-kernel,
	kernel-team, stable

On Tue, Apr 28, 2026 at 11:15:46AM +0100, Matt Fleming wrote:
> On Sat, Apr 25, 2026 at 06:58:48PM -0500, Corey Minyard wrote:
> > 
> > Oh, yeah.
> > 
> > Moving it to handle_transaction_done() is not ideal, though.  If
> > something was spewing receive messages, it would never get to events,
> > which is why I did it like I did.
> > 
> > The following should fix this.  You could also have different limits for
> > receive messages and events, but I think the following is more clear.
> > 
> > diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
> > index 2a739123270c..e46f4150ceb5 100644
> > --- a/drivers/char/ipmi/ipmi_si_intf.c
> > +++ b/drivers/char/ipmi/ipmi_si_intf.c
> > @@ -413,8 +413,10 @@ static void start_getting_msg_queue(struct smi_info *smi_info)
> > 
> >  	start_new_msg(smi_info, smi_info->curr_msg->data,
> >  		      smi_info->curr_msg->data_size);
> > -	smi_info->num_requests_in_a_row = 0;
> > -	smi_info->si_state = SI_GETTING_MESSAGES;
> > +	if (smi_info->si_state != SI_GETTING_MESSAGES) {
> > +	    smi_info->num_requests_in_a_row = 0;
> > +	    smi_info->si_state = SI_GETTING_MESSAGES;
> > +	}
> >  }
> > 
> >  static void start_getting_events(struct smi_info *smi_info)
> > @@ -425,8 +427,10 @@ static void start_getting_events(struct smi_info *smi_info)
> > 
> >  	start_new_msg(smi_info, smi_info->curr_msg->data,
> >  		      smi_info->curr_msg->data_size);
> > -	smi_info->num_requests_in_a_row = 0;
> > -	smi_info->si_state = SI_GETTING_EVENTS;
> > +	if (smi_info->si_state != SI_GETTING_EVENTS) {
> > +	    smi_info->num_requests_in_a_row = 0;
> > +	    smi_info->si_state = SI_GETTING_EVENTS;
> > +	}
> 
> Thanks. Does this correctly handle a stream of mixed receive+event
> flags, though? If we bounce between MESSAGES and EVENTS, won't we keep
> resetting the counter on each state transition and never hit the limit?

It should.  Once the limit is reached it clears the bit in msg_flags.
That should prevent the messages or events from being re-requested
until the next flags fetch.  So if a continuous stream of messages
and events was coming in, it should fetch 10 messages, clear the bit,
then fetch 10 events, then clear that bit, then go back to normal
operation.

Of course, the next flag fetch will start the process over.

> 
> I was able to cook up a simple repro in Qemu for this class of bug.

I was thinking about how to do an automated test for this.  I use an
external simulated BMC for the automated tests I have.  So I'll work in
that direction.

But thanks, this should help me develop that test.

-corey

> 
> ---->8----
> 
> diff --git a/hw/ipmi/ipmi_bmc_sim.c b/hw/ipmi/ipmi_bmc_sim.c
> index fd875491f5..127db30c03 100644
> --- a/hw/ipmi/ipmi_bmc_sim.c
> +++ b/hw/ipmi/ipmi_bmc_sim.c
> @@ -249,6 +249,21 @@ struct IPMIBmcSim {
>      uint8_t evtbuf[16];
>  
>      QTAILQ_HEAD(, IPMIRcvBufEntry) rcvbufs;
> +
> +    /*
> +     * Fault injection: sticky EVENT_MSG_BUFFER_FULL.
> +     *
> +     * Simulates a BMC that continuously generates events (e.g. after a
> +     * cold reset causes sensor state changes).  Once armed, every
> +     * READ_EVENT_MSG_BUFFER returns success but never clears the
> +     * EVT_BUF_FULL flag, starving waiting_msg in the SI layer's
> +     * handle_flags() loop.  Reproduces the 517m277 / KRN-1233 wedge.
> +     */
> +    bool     fi_sticky_events;   /* enable via property */
> +    uint32_t fi_evt_arm_after;   /* arm after N completed responses */
> +    uint32_t fi_evt_rsp_count;   /* responses completed so far */
> +    bool     fi_evt_armed;       /* fault is active */
> +    uint32_t fi_evt_read_count;  /* READ_EVENT_MSG_BUFFER calls served */
>  };
>  
>  #define IPMI_BMC_MSG_FLAG_WATCHDOG_TIMEOUT_MASK        (1 << 3)
> @@ -494,7 +509,7 @@ static int sel_add_event(IPMIBmcSim *ibs, uint8_t *event)
>  static int attn_set(IPMIBmcSim *ibs)
>  {
>      return IPMI_BMC_MSG_FLAG_RCV_MSG_QUEUE_SET(ibs)
> -        || IPMI_BMC_MSG_FLAG_EVT_BUF_FULL_SET(ibs)
> +        || (!ibs->fi_evt_armed && IPMI_BMC_MSG_FLAG_EVT_BUF_FULL_SET(ibs))
>          || IPMI_BMC_MSG_FLAG_WATCHDOG_TIMEOUT_MASK_SET(ibs);
>  }
>  
> @@ -750,6 +765,48 @@ static void ipmi_sim_handle_command(IPMIBmc *b,
>   out:
>      k->handle_rsp(s, msg_id, rsp.buffer, rsp.len);
>  
> +    /*
> +     * Fault injection: count completed responses and arm sticky
> +     * EVENT_MSG_BUFFER_FULL after the configured threshold.
> +     */
> +    if (ibs->fi_sticky_events && !ibs->fi_evt_armed) {
> +        ibs->fi_evt_rsp_count++;
> +        if (ibs->fi_evt_rsp_count >= ibs->fi_evt_arm_after) {
> +            ibs->fi_evt_armed = true;
> +            ibs->fi_evt_read_count = 0;
> +
> +            /*
> +             * Seed the event buffer with a synthetic sensor event
> +             * (sensor type 0x01 = Temperature, event type 0x01 =
> +             * threshold, evd1 = upper critical going high).
> +             * This matches what real BMCs generate after a cold reset.
> +             */
> +            memset(ibs->evtbuf, 0, 16);
> +            ibs->evtbuf[2]  = 0x02; /* System event record */
> +            ibs->evtbuf[7]  = ibs->parent.slave_addr;
> +            ibs->evtbuf[9]  = 0x04; /* Format version */
> +            ibs->evtbuf[10] = 0x01; /* Sensor type: Temperature */
> +            ibs->evtbuf[11] = 0x01; /* Sensor number */
> +            ibs->evtbuf[12] = 0x01; /* Event type: threshold */
> +            ibs->evtbuf[13] = 0x09; /* Upper critical going high */
> +            ibs->evtbuf[14] = 0x57; /* Threshold value */
> +            ibs->evtbuf[15] = 0x00; /* Sequence (incremented on reads) */
> +
> +            ibs->msg_flags |= IPMI_BMC_MSG_FLAG_EVT_BUF_FULL;
> +
> +            /* Ensure event message buffer is enabled in global enables
> +             * so the kernel sees it.  Also enable event logging.
> +             */
> +            ibs->bmc_global_enables |= (1 << IPMI_BMC_EVENT_MSG_BUF_BIT)
> +                                     | (1 << IPMI_BMC_EVENT_LOG_BIT);
> +
> +            k->set_atn(s, 1, attn_irq_enabled(ibs));
> +
> +            fprintf(stderr, "[FI] sticky-events armed after %u responses\n",
> +                    ibs->fi_evt_rsp_count);
> +        }
> +    }
> +
>      next_timeout(ibs);
>  }
>  
> @@ -1013,8 +1070,14 @@ static void clr_msg_flags(IPMIBmcSim *ibs,
>  {
>      IPMIInterface *s = ibs->parent.intf;
>      IPMIInterfaceClass *k = IPMI_INTERFACE_GET_CLASS(s);
> +    uint8_t clear_mask = cmd[2];
> +
> +    if (ibs->fi_evt_armed) {
> +        /* Don't allow clearing EVT_BUF_FULL when sticky events active */
> +        clear_mask &= ~IPMI_BMC_MSG_FLAG_EVT_BUF_FULL;
> +    }
>  
> -    ibs->msg_flags &= ~cmd[2];
> +    ibs->msg_flags &= ~clear_mask;
>      k->set_atn(s, attn_set(ibs), attn_irq_enabled(ibs));
>  }
>  
> @@ -1040,7 +1103,19 @@ static void read_evt_msg_buf(IPMIBmcSim *ibs,
>      for (i = 0; i < 16; i++) {
>          rsp_buffer_push(rsp, ibs->evtbuf[i]);
>      }
> -    ibs->msg_flags &= ~IPMI_BMC_MSG_FLAG_EVT_BUF_FULL;
> +
> +    if (ibs->fi_evt_armed) {
> +        /*
> +         * Sticky mode: return success but keep EVT_BUF_FULL set.
> +         * Vary the event data slightly so the kernel doesn't
> +         * de-duplicate (increment evd3 as a sequence number).
> +         */
> +        ibs->fi_evt_read_count++;
> +        ibs->evtbuf[15] = (uint8_t)(ibs->fi_evt_read_count & 0xff);
> +        /* Don't clear the flag — starvation continues */
> +    } else {
> +        ibs->msg_flags &= ~IPMI_BMC_MSG_FLAG_EVT_BUF_FULL;
> +    }
>      k->set_atn(s, attn_set(ibs), attn_irq_enabled(ibs));
>  }
>  
> @@ -2670,6 +2745,8 @@ static const Property ipmi_sim_properties[] = {
>      DEFINE_PROP_STRING("lan.netmask", IPMIBmcSim, lan.arg_netmask),
>      DEFINE_PROP_STRING("lan.defgw_ipaddr", IPMIBmcSim, lan.arg_defgw_ipaddr),
>      DEFINE_PROP_MACADDR("lan.defgw_macaddr", IPMIBmcSim, lan.defgw_macaddr),
> +    DEFINE_PROP_BOOL("fi_sticky_events", IPMIBmcSim, fi_sticky_events, false),
> +    DEFINE_PROP_UINT32("fi_evt_arm_after", IPMIBmcSim, fi_evt_arm_after, 40),
>  };
>  
>  static void ipmi_sim_class_init(ObjectClass *oc, const void *data)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] ipmi: Add limits to event and receive message requests
  2026-04-28 11:45         ` Corey Minyard
@ 2026-04-28 13:06           ` Corey Minyard
  0 siblings, 0 replies; 10+ messages in thread
From: Corey Minyard @ 2026-04-28 13:06 UTC (permalink / raw)
  To: Matt Fleming, Matt Fleming, openipmi-developer, Tony Camuso,
	linux-kernel, kernel-team, stable

On Tue, Apr 28, 2026 at 06:45:15AM -0500, Corey Minyard wrote:
> On Tue, Apr 28, 2026 at 11:15:46AM +0100, Matt Fleming wrote:
> > On Sat, Apr 25, 2026 at 06:58:48PM -0500, Corey Minyard wrote:
> > > 
> > > Oh, yeah.
> > > 
> > > Moving it to handle_transaction_done() is not ideal, though.  If
> > > something was spewing receive messages, it would never get to events,
> > > which is why I did it like I did.
> > > 
> > > The following should fix this.  You could also have different limits for
> > > receive messages and events, but I think the following is more clear.
> > > 
> > > diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
> > > index 2a739123270c..e46f4150ceb5 100644
> > > --- a/drivers/char/ipmi/ipmi_si_intf.c
> > > +++ b/drivers/char/ipmi/ipmi_si_intf.c
> > > @@ -413,8 +413,10 @@ static void start_getting_msg_queue(struct smi_info *smi_info)
> > > 
> > >  	start_new_msg(smi_info, smi_info->curr_msg->data,
> > >  		      smi_info->curr_msg->data_size);
> > > -	smi_info->num_requests_in_a_row = 0;
> > > -	smi_info->si_state = SI_GETTING_MESSAGES;
> > > +	if (smi_info->si_state != SI_GETTING_MESSAGES) {
> > > +	    smi_info->num_requests_in_a_row = 0;
> > > +	    smi_info->si_state = SI_GETTING_MESSAGES;
> > > +	}
> > >  }
> > > 
> > >  static void start_getting_events(struct smi_info *smi_info)
> > > @@ -425,8 +427,10 @@ static void start_getting_events(struct smi_info *smi_info)
> > > 
> > >  	start_new_msg(smi_info, smi_info->curr_msg->data,
> > >  		      smi_info->curr_msg->data_size);
> > > -	smi_info->num_requests_in_a_row = 0;
> > > -	smi_info->si_state = SI_GETTING_EVENTS;
> > > +	if (smi_info->si_state != SI_GETTING_EVENTS) {
> > > +	    smi_info->num_requests_in_a_row = 0;
> > > +	    smi_info->si_state = SI_GETTING_EVENTS;
> > > +	}
> > 
> > Thanks. Does this correctly handle a stream of mixed receive+event
> > flags, though? If we bounce between MESSAGES and EVENTS, won't we keep
> > resetting the counter on each state transition and never hit the limit?
> 
> It should.  Once the limit is reached it clears the bit in msg_flags.
> That should prevent the messages or events from being re-requested
> until the next flags fetch.  So if a continuous stream of messages
> and events was coming in, it should fetch 10 messages, clear the bit,
> then fetch 10 events, then clear that bit, then go back to normal
> operation.
> 
> Of course, the next flag fetch will start the process over.

Actually, probing deeper, this probably won't work.  And I'm not sure
there is much I can do to fix it.  It will be much harder.  But it
depends on how the BMC handles this.

If there is something in the event queue or receive message queue, the
BMC is supposed to set an attention bit (ATTN) on the interface  If
ATTN is set, the driver is supposed to fetch flags to know what it
needs to do.

I haven't tried, but in the qemu changes below, I'm fairly sure the ATTN
bit will never get cleared, thus when it goes through all this the
KCS state machine will return SI_SM_ATTN at the end and the flag fetch
will start again.  You are still wedged.

The qemu version also runs with interrupts by default, which only
magnifies the problem.  In that case, when ATTN is set and you aren't
running transactions, the interrupt is enabled.  On KCS there is no way
in the KCS interface to disable the interrupt at the interface, you have
to send it a message or disable it with disable_irq().

But the actual failing BMC may not work this way.  It may or may not
clear the Event Message Buffer Full flag.  It may or may not do anything
with ATTN.

A driver can only do so much to account for broken hardware.  The driver
is already too complex, a lot of it due to having to handle broken
hardware.  Fixing this adds more complexity and penalizes hardware
that works properly.

Anyway, I'm going to need to get this failing in simulation and figure
out how to handle this.  Yet more issues may come up, especially with
interrupts.

Is there any way you can just get the hardware fixed?  It's never going
to work very well as it is.  I'd be inclined to just denylist it.

-corey

> 
> > 
> > I was able to cook up a simple repro in Qemu for this class of bug.
> 
> I was thinking about how to do an automated test for this.  I use an
> external simulated BMC for the automated tests I have.  So I'll work in
> that direction.
> 
> But thanks, this should help me develop that test.
> 
> -corey
> 
> > 
> > ---->8----
> > 
> > diff --git a/hw/ipmi/ipmi_bmc_sim.c b/hw/ipmi/ipmi_bmc_sim.c
> > index fd875491f5..127db30c03 100644
> > --- a/hw/ipmi/ipmi_bmc_sim.c
> > +++ b/hw/ipmi/ipmi_bmc_sim.c
> > @@ -249,6 +249,21 @@ struct IPMIBmcSim {
> >      uint8_t evtbuf[16];
> >  
> >      QTAILQ_HEAD(, IPMIRcvBufEntry) rcvbufs;
> > +
> > +    /*
> > +     * Fault injection: sticky EVENT_MSG_BUFFER_FULL.
> > +     *
> > +     * Simulates a BMC that continuously generates events (e.g. after a
> > +     * cold reset causes sensor state changes).  Once armed, every
> > +     * READ_EVENT_MSG_BUFFER returns success but never clears the
> > +     * EVT_BUF_FULL flag, starving waiting_msg in the SI layer's
> > +     * handle_flags() loop.  Reproduces the 517m277 / KRN-1233 wedge.
> > +     */
> > +    bool     fi_sticky_events;   /* enable via property */
> > +    uint32_t fi_evt_arm_after;   /* arm after N completed responses */
> > +    uint32_t fi_evt_rsp_count;   /* responses completed so far */
> > +    bool     fi_evt_armed;       /* fault is active */
> > +    uint32_t fi_evt_read_count;  /* READ_EVENT_MSG_BUFFER calls served */
> >  };
> >  
> >  #define IPMI_BMC_MSG_FLAG_WATCHDOG_TIMEOUT_MASK        (1 << 3)
> > @@ -494,7 +509,7 @@ static int sel_add_event(IPMIBmcSim *ibs, uint8_t *event)
> >  static int attn_set(IPMIBmcSim *ibs)
> >  {
> >      return IPMI_BMC_MSG_FLAG_RCV_MSG_QUEUE_SET(ibs)
> > -        || IPMI_BMC_MSG_FLAG_EVT_BUF_FULL_SET(ibs)
> > +        || (!ibs->fi_evt_armed && IPMI_BMC_MSG_FLAG_EVT_BUF_FULL_SET(ibs))
> >          || IPMI_BMC_MSG_FLAG_WATCHDOG_TIMEOUT_MASK_SET(ibs);
> >  }
> >  
> > @@ -750,6 +765,48 @@ static void ipmi_sim_handle_command(IPMIBmc *b,
> >   out:
> >      k->handle_rsp(s, msg_id, rsp.buffer, rsp.len);
> >  
> > +    /*
> > +     * Fault injection: count completed responses and arm sticky
> > +     * EVENT_MSG_BUFFER_FULL after the configured threshold.
> > +     */
> > +    if (ibs->fi_sticky_events && !ibs->fi_evt_armed) {
> > +        ibs->fi_evt_rsp_count++;
> > +        if (ibs->fi_evt_rsp_count >= ibs->fi_evt_arm_after) {
> > +            ibs->fi_evt_armed = true;
> > +            ibs->fi_evt_read_count = 0;
> > +
> > +            /*
> > +             * Seed the event buffer with a synthetic sensor event
> > +             * (sensor type 0x01 = Temperature, event type 0x01 =
> > +             * threshold, evd1 = upper critical going high).
> > +             * This matches what real BMCs generate after a cold reset.
> > +             */
> > +            memset(ibs->evtbuf, 0, 16);
> > +            ibs->evtbuf[2]  = 0x02; /* System event record */
> > +            ibs->evtbuf[7]  = ibs->parent.slave_addr;
> > +            ibs->evtbuf[9]  = 0x04; /* Format version */
> > +            ibs->evtbuf[10] = 0x01; /* Sensor type: Temperature */
> > +            ibs->evtbuf[11] = 0x01; /* Sensor number */
> > +            ibs->evtbuf[12] = 0x01; /* Event type: threshold */
> > +            ibs->evtbuf[13] = 0x09; /* Upper critical going high */
> > +            ibs->evtbuf[14] = 0x57; /* Threshold value */
> > +            ibs->evtbuf[15] = 0x00; /* Sequence (incremented on reads) */
> > +
> > +            ibs->msg_flags |= IPMI_BMC_MSG_FLAG_EVT_BUF_FULL;
> > +
> > +            /* Ensure event message buffer is enabled in global enables
> > +             * so the kernel sees it.  Also enable event logging.
> > +             */
> > +            ibs->bmc_global_enables |= (1 << IPMI_BMC_EVENT_MSG_BUF_BIT)
> > +                                     | (1 << IPMI_BMC_EVENT_LOG_BIT);
> > +
> > +            k->set_atn(s, 1, attn_irq_enabled(ibs));
> > +
> > +            fprintf(stderr, "[FI] sticky-events armed after %u responses\n",
> > +                    ibs->fi_evt_rsp_count);
> > +        }
> > +    }
> > +
> >      next_timeout(ibs);
> >  }
> >  
> > @@ -1013,8 +1070,14 @@ static void clr_msg_flags(IPMIBmcSim *ibs,
> >  {
> >      IPMIInterface *s = ibs->parent.intf;
> >      IPMIInterfaceClass *k = IPMI_INTERFACE_GET_CLASS(s);
> > +    uint8_t clear_mask = cmd[2];
> > +
> > +    if (ibs->fi_evt_armed) {
> > +        /* Don't allow clearing EVT_BUF_FULL when sticky events active */
> > +        clear_mask &= ~IPMI_BMC_MSG_FLAG_EVT_BUF_FULL;
> > +    }
> >  
> > -    ibs->msg_flags &= ~cmd[2];
> > +    ibs->msg_flags &= ~clear_mask;
> >      k->set_atn(s, attn_set(ibs), attn_irq_enabled(ibs));
> >  }
> >  
> > @@ -1040,7 +1103,19 @@ static void read_evt_msg_buf(IPMIBmcSim *ibs,
> >      for (i = 0; i < 16; i++) {
> >          rsp_buffer_push(rsp, ibs->evtbuf[i]);
> >      }
> > -    ibs->msg_flags &= ~IPMI_BMC_MSG_FLAG_EVT_BUF_FULL;
> > +
> > +    if (ibs->fi_evt_armed) {
> > +        /*
> > +         * Sticky mode: return success but keep EVT_BUF_FULL set.
> > +         * Vary the event data slightly so the kernel doesn't
> > +         * de-duplicate (increment evd3 as a sequence number).
> > +         */
> > +        ibs->fi_evt_read_count++;
> > +        ibs->evtbuf[15] = (uint8_t)(ibs->fi_evt_read_count & 0xff);
> > +        /* Don't clear the flag — starvation continues */
> > +    } else {
> > +        ibs->msg_flags &= ~IPMI_BMC_MSG_FLAG_EVT_BUF_FULL;
> > +    }
> >      k->set_atn(s, attn_set(ibs), attn_irq_enabled(ibs));
> >  }
> >  
> > @@ -2670,6 +2745,8 @@ static const Property ipmi_sim_properties[] = {
> >      DEFINE_PROP_STRING("lan.netmask", IPMIBmcSim, lan.arg_netmask),
> >      DEFINE_PROP_STRING("lan.defgw_ipaddr", IPMIBmcSim, lan.arg_defgw_ipaddr),
> >      DEFINE_PROP_MACADDR("lan.defgw_macaddr", IPMIBmcSim, lan.defgw_macaddr),
> > +    DEFINE_PROP_BOOL("fi_sticky_events", IPMIBmcSim, fi_sticky_events, false),
> > +    DEFINE_PROP_UINT32("fi_evt_arm_after", IPMIBmcSim, fi_evt_arm_after, 40),
> >  };
> >  
> >  static void ipmi_sim_class_init(ObjectClass *oc, const void *data)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly
  2026-04-21 12:42 [PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly Corey Minyard
  2026-04-21 12:42 ` [PATCH 1/2] ipmi: Check event message buffer response for bad data Corey Minyard
  2026-04-21 12:42 ` [PATCH 2/2] ipmi: Add limits to event and receive message requests Corey Minyard
@ 2026-04-21 22:24 ` Matt Fleming
  2026-04-22  4:44   ` [Openipmi-developer] " Jian Zhang
  2 siblings, 1 reply; 10+ messages in thread
From: Matt Fleming @ 2026-04-21 22:24 UTC (permalink / raw)
  To: Corey Minyard
  Cc: Matt Fleming, openipmi-developer, Tony Camuso, linux-kernel,
	kernel-team

On Tue, Apr 21, 2026 at 07:42:42AM -0500, Corey Minyard wrote:
> Matt reported that there were issues with the IPMI driver getting wedged
> in some cases.  It turns out that the BMC was not reporting an error as
> it should have (per the spec) when the event queue was empty.  The IPMI
> driver would then request the next event, and so on, wedging the driver.

Thanks for replying so quickly, Corey. I'll test these out.

One bit of info I pulled out of the stuck machine is that the response
looks properly formed.

I sampled the first 8 entries and they were all identical 19-byte
successful READ_EVENT_MSG_BUFFER responses:

  1c 35 00 55 55 c0 41 a7 00 00 00 00 00 3a ff 00 ff ff ff

So on this machine, the event replies do not look short or malformed;
they look like repeated successful event-buffer reads with the same
payload.

Thanks,
Matt

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Openipmi-developer] [PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly
  2026-04-21 22:24 ` [PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly Matt Fleming
@ 2026-04-22  4:44   ` Jian Zhang
  0 siblings, 0 replies; 10+ messages in thread
From: Jian Zhang @ 2026-04-22  4:44 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Corey Minyard, linux-kernel, Matt Fleming, kernel-team,
	openipmi-developer, Tony Camuso

> 2026年4月22日 06:24，Matt Fleming <matt@readmodwrite.com> 写道：
> 
> On Tue, Apr 21, 2026 at 07:42:42AM -0500, Corey Minyard wrote:
>> Matt reported that there were issues with the IPMI driver getting wedged
>> in some cases.  It turns out that the BMC was not reporting an error as
>> it should have (per the spec) when the event queue was empty.  The IPMI
>> driver would then request the next event, and so on, wedging the driver.
> 
> Thanks for replying so quickly, Corey. I'll test these out.
> 
> One bit of info I pulled out of the stuck machine is that the response
> looks properly formed.
> 
> I sampled the first 8 entries and they were all identical 19-byte
> successful READ_EVENT_MSG_BUFFER responses:
> 
>  1c 35 00 55 55 c0 41 a7 00 00 00 00 00 3a ff 00 ff ff ff
> 

Perhaps I know where this data comes from. 
During a previous debugging session (where ipmitool v1.8.19 failed on sensor list due to an underflow in
nr_numbers, which has since been fixed), I noticed this behavior. However, I
’m not sure why it is implemented this way or what exactly this command is intended to do.
If you are running on OpenBMC, it is very likely related to this part, 
where a fixed value is always returned (especially if the KCS channel happens to be configured as 15):

See: https://github.com/openbmc/phosphor-host-ipmid/blob/master/systemintfcmds.cpp#L35

Jian.

> So on this machine, the event replies do not look short or malformed;
> they look like repeated successful event-buffer reads with the same
> payload.
> 
> Thanks,
> Matt
> 
> 
> _______________________________________________
> Openipmi-developer mailing list
> Openipmi-developer@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/openipmi-developer

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2026-04-28 13:06 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-21 12:42 [PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly Corey Minyard
2026-04-21 12:42 ` [PATCH 1/2] ipmi: Check event message buffer response for bad data Corey Minyard
2026-04-21 12:42 ` [PATCH 2/2] ipmi: Add limits to event and receive message requests Corey Minyard
2026-04-25  9:36   ` Matt Fleming
2026-04-25 23:58     ` Corey Minyard
2026-04-28 10:15       ` Matt Fleming
2026-04-28 11:45         ` Corey Minyard
2026-04-28 13:06           ` Corey Minyard
2026-04-21 22:24 ` [PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly Matt Fleming
2026-04-22  4:44   ` [Openipmi-developer] " Jian Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox