public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Corey Minyard <corey@minyard.net>
To: Matt Fleming <matt@readmodwrite.com>
Cc: Matt Fleming <mfleming@cloudflare.com>,
	openipmi-developer@lists.sourceforge.net,
	Tony Camuso <tcamuso@redhat.com>,
	linux-kernel@vger.kernel.org, kernel-team@cloudflare.com,
	stable@vger.kernel.org
Subject: Re: [PATCH 2/2] ipmi: Add limits to event and receive message requests
Date: Tue, 28 Apr 2026 06:45:15 -0500	[thread overview]
Message-ID: <afCdy0Nu8glFCzqk@mail.minyard.net> (raw)
In-Reply-To: <afCHrNN-PuXh1WzG@matt-Precision-5490>

On Tue, Apr 28, 2026 at 11:15:46AM +0100, Matt Fleming wrote:
> On Sat, Apr 25, 2026 at 06:58:48PM -0500, Corey Minyard wrote:
> > 
> > Oh, yeah.
> > 
> > Moving it to handle_transaction_done() is not ideal, though.  If
> > something was spewing receive messages, it would never get to events,
> > which is why I did it like I did.
> > 
> > The following should fix this.  You could also have different limits for
> > receive messages and events, but I think the following is more clear.
> > 
> > diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
> > index 2a739123270c..e46f4150ceb5 100644
> > --- a/drivers/char/ipmi/ipmi_si_intf.c
> > +++ b/drivers/char/ipmi/ipmi_si_intf.c
> > @@ -413,8 +413,10 @@ static void start_getting_msg_queue(struct smi_info *smi_info)
> > 
> >  	start_new_msg(smi_info, smi_info->curr_msg->data,
> >  		      smi_info->curr_msg->data_size);
> > -	smi_info->num_requests_in_a_row = 0;
> > -	smi_info->si_state = SI_GETTING_MESSAGES;
> > +	if (smi_info->si_state != SI_GETTING_MESSAGES) {
> > +	    smi_info->num_requests_in_a_row = 0;
> > +	    smi_info->si_state = SI_GETTING_MESSAGES;
> > +	}
> >  }
> > 
> >  static void start_getting_events(struct smi_info *smi_info)
> > @@ -425,8 +427,10 @@ static void start_getting_events(struct smi_info *smi_info)
> > 
> >  	start_new_msg(smi_info, smi_info->curr_msg->data,
> >  		      smi_info->curr_msg->data_size);
> > -	smi_info->num_requests_in_a_row = 0;
> > -	smi_info->si_state = SI_GETTING_EVENTS;
> > +	if (smi_info->si_state != SI_GETTING_EVENTS) {
> > +	    smi_info->num_requests_in_a_row = 0;
> > +	    smi_info->si_state = SI_GETTING_EVENTS;
> > +	}
> 
> Thanks. Does this correctly handle a stream of mixed receive+event
> flags, though? If we bounce between MESSAGES and EVENTS, won't we keep
> resetting the counter on each state transition and never hit the limit?

It should.  Once the limit is reached it clears the bit in msg_flags.
That should prevent the messages or events from being re-requested
until the next flags fetch.  So if a continuous stream of messages
and events was coming in, it should fetch 10 messages, clear the bit,
then fetch 10 events, then clear that bit, then go back to normal
operation.

Of course, the next flag fetch will start the process over.

> 
> I was able to cook up a simple repro in Qemu for this class of bug.

I was thinking about how to do an automated test for this.  I use an
external simulated BMC for the automated tests I have.  So I'll work in
that direction.

But thanks, this should help me develop that test.

-corey

> 
> ---->8----
> 
> diff --git a/hw/ipmi/ipmi_bmc_sim.c b/hw/ipmi/ipmi_bmc_sim.c
> index fd875491f5..127db30c03 100644
> --- a/hw/ipmi/ipmi_bmc_sim.c
> +++ b/hw/ipmi/ipmi_bmc_sim.c
> @@ -249,6 +249,21 @@ struct IPMIBmcSim {
>      uint8_t evtbuf[16];
>  
>      QTAILQ_HEAD(, IPMIRcvBufEntry) rcvbufs;
> +
> +    /*
> +     * Fault injection: sticky EVENT_MSG_BUFFER_FULL.
> +     *
> +     * Simulates a BMC that continuously generates events (e.g. after a
> +     * cold reset causes sensor state changes).  Once armed, every
> +     * READ_EVENT_MSG_BUFFER returns success but never clears the
> +     * EVT_BUF_FULL flag, starving waiting_msg in the SI layer's
> +     * handle_flags() loop.  Reproduces the 517m277 / KRN-1233 wedge.
> +     */
> +    bool     fi_sticky_events;   /* enable via property */
> +    uint32_t fi_evt_arm_after;   /* arm after N completed responses */
> +    uint32_t fi_evt_rsp_count;   /* responses completed so far */
> +    bool     fi_evt_armed;       /* fault is active */
> +    uint32_t fi_evt_read_count;  /* READ_EVENT_MSG_BUFFER calls served */
>  };
>  
>  #define IPMI_BMC_MSG_FLAG_WATCHDOG_TIMEOUT_MASK        (1 << 3)
> @@ -494,7 +509,7 @@ static int sel_add_event(IPMIBmcSim *ibs, uint8_t *event)
>  static int attn_set(IPMIBmcSim *ibs)
>  {
>      return IPMI_BMC_MSG_FLAG_RCV_MSG_QUEUE_SET(ibs)
> -        || IPMI_BMC_MSG_FLAG_EVT_BUF_FULL_SET(ibs)
> +        || (!ibs->fi_evt_armed && IPMI_BMC_MSG_FLAG_EVT_BUF_FULL_SET(ibs))
>          || IPMI_BMC_MSG_FLAG_WATCHDOG_TIMEOUT_MASK_SET(ibs);
>  }
>  
> @@ -750,6 +765,48 @@ static void ipmi_sim_handle_command(IPMIBmc *b,
>   out:
>      k->handle_rsp(s, msg_id, rsp.buffer, rsp.len);
>  
> +    /*
> +     * Fault injection: count completed responses and arm sticky
> +     * EVENT_MSG_BUFFER_FULL after the configured threshold.
> +     */
> +    if (ibs->fi_sticky_events && !ibs->fi_evt_armed) {
> +        ibs->fi_evt_rsp_count++;
> +        if (ibs->fi_evt_rsp_count >= ibs->fi_evt_arm_after) {
> +            ibs->fi_evt_armed = true;
> +            ibs->fi_evt_read_count = 0;
> +
> +            /*
> +             * Seed the event buffer with a synthetic sensor event
> +             * (sensor type 0x01 = Temperature, event type 0x01 =
> +             * threshold, evd1 = upper critical going high).
> +             * This matches what real BMCs generate after a cold reset.
> +             */
> +            memset(ibs->evtbuf, 0, 16);
> +            ibs->evtbuf[2]  = 0x02; /* System event record */
> +            ibs->evtbuf[7]  = ibs->parent.slave_addr;
> +            ibs->evtbuf[9]  = 0x04; /* Format version */
> +            ibs->evtbuf[10] = 0x01; /* Sensor type: Temperature */
> +            ibs->evtbuf[11] = 0x01; /* Sensor number */
> +            ibs->evtbuf[12] = 0x01; /* Event type: threshold */
> +            ibs->evtbuf[13] = 0x09; /* Upper critical going high */
> +            ibs->evtbuf[14] = 0x57; /* Threshold value */
> +            ibs->evtbuf[15] = 0x00; /* Sequence (incremented on reads) */
> +
> +            ibs->msg_flags |= IPMI_BMC_MSG_FLAG_EVT_BUF_FULL;
> +
> +            /* Ensure event message buffer is enabled in global enables
> +             * so the kernel sees it.  Also enable event logging.
> +             */
> +            ibs->bmc_global_enables |= (1 << IPMI_BMC_EVENT_MSG_BUF_BIT)
> +                                     | (1 << IPMI_BMC_EVENT_LOG_BIT);
> +
> +            k->set_atn(s, 1, attn_irq_enabled(ibs));
> +
> +            fprintf(stderr, "[FI] sticky-events armed after %u responses\n",
> +                    ibs->fi_evt_rsp_count);
> +        }
> +    }
> +
>      next_timeout(ibs);
>  }
>  
> @@ -1013,8 +1070,14 @@ static void clr_msg_flags(IPMIBmcSim *ibs,
>  {
>      IPMIInterface *s = ibs->parent.intf;
>      IPMIInterfaceClass *k = IPMI_INTERFACE_GET_CLASS(s);
> +    uint8_t clear_mask = cmd[2];
> +
> +    if (ibs->fi_evt_armed) {
> +        /* Don't allow clearing EVT_BUF_FULL when sticky events active */
> +        clear_mask &= ~IPMI_BMC_MSG_FLAG_EVT_BUF_FULL;
> +    }
>  
> -    ibs->msg_flags &= ~cmd[2];
> +    ibs->msg_flags &= ~clear_mask;
>      k->set_atn(s, attn_set(ibs), attn_irq_enabled(ibs));
>  }
>  
> @@ -1040,7 +1103,19 @@ static void read_evt_msg_buf(IPMIBmcSim *ibs,
>      for (i = 0; i < 16; i++) {
>          rsp_buffer_push(rsp, ibs->evtbuf[i]);
>      }
> -    ibs->msg_flags &= ~IPMI_BMC_MSG_FLAG_EVT_BUF_FULL;
> +
> +    if (ibs->fi_evt_armed) {
> +        /*
> +         * Sticky mode: return success but keep EVT_BUF_FULL set.
> +         * Vary the event data slightly so the kernel doesn't
> +         * de-duplicate (increment evd3 as a sequence number).
> +         */
> +        ibs->fi_evt_read_count++;
> +        ibs->evtbuf[15] = (uint8_t)(ibs->fi_evt_read_count & 0xff);
> +        /* Don't clear the flag — starvation continues */
> +    } else {
> +        ibs->msg_flags &= ~IPMI_BMC_MSG_FLAG_EVT_BUF_FULL;
> +    }
>      k->set_atn(s, attn_set(ibs), attn_irq_enabled(ibs));
>  }
>  
> @@ -2670,6 +2745,8 @@ static const Property ipmi_sim_properties[] = {
>      DEFINE_PROP_STRING("lan.netmask", IPMIBmcSim, lan.arg_netmask),
>      DEFINE_PROP_STRING("lan.defgw_ipaddr", IPMIBmcSim, lan.arg_defgw_ipaddr),
>      DEFINE_PROP_MACADDR("lan.defgw_macaddr", IPMIBmcSim, lan.defgw_macaddr),
> +    DEFINE_PROP_BOOL("fi_sticky_events", IPMIBmcSim, fi_sticky_events, false),
> +    DEFINE_PROP_UINT32("fi_evt_arm_after", IPMIBmcSim, fi_evt_arm_after, 40),
>  };
>  
>  static void ipmi_sim_class_init(ObjectClass *oc, const void *data)

  reply	other threads:[~2026-04-28 11:45 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-21 12:42 [PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly Corey Minyard
2026-04-21 12:42 ` [PATCH 1/2] ipmi: Check event message buffer response for bad data Corey Minyard
2026-04-21 12:42 ` [PATCH 2/2] ipmi: Add limits to event and receive message requests Corey Minyard
2026-04-25  9:36   ` Matt Fleming
2026-04-25 23:58     ` Corey Minyard
2026-04-28 10:15       ` Matt Fleming
2026-04-28 11:45         ` Corey Minyard [this message]
2026-04-28 13:06           ` Corey Minyard
2026-04-21 22:24 ` [PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly Matt Fleming
2026-04-22  4:44   ` [Openipmi-developer] " Jian Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=afCdy0Nu8glFCzqk@mail.minyard.net \
    --to=corey@minyard.net \
    --cc=kernel-team@cloudflare.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=matt@readmodwrite.com \
    --cc=mfleming@cloudflare.com \
    --cc=openipmi-developer@lists.sourceforge.net \
    --cc=stable@vger.kernel.org \
    --cc=tcamuso@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox