From: Corey Minyard <corey@minyard.net>
To: Matt Fleming <matt@readmodwrite.com>,
Tony Camuso <tcamuso@redhat.com>,
openipmi-developer@lists.sourceforge.net,
linux-kernel@vger.kernel.org, kernel-team@cloudflare.com,
Matt Fleming <mfleming@cloudflare.com>,
Frederick Lawler <fred@cloudflare.com>
Subject: Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id()
Date: Mon, 20 Apr 2026 13:11:25 -0500 [thread overview]
Message-ID: <aeZsTQdBNBzvmNG2@mail.minyard.net> (raw)
In-Reply-To: <aeZVPUV_gPrgITJQ@mail.minyard.net>
On Mon, Apr 20, 2026 at 11:33:01AM -0500, Corey Minyard wrote:
> On Sun, Apr 19, 2026 at 09:50:38PM +0100, Matt Fleming wrote:
> > On Fri, Apr 17, 2026 at 06:53:55PM -0500, Corey Minyard wrote:
> > >
> > > The EVENT_MSG_BUFFER_FULL flag only gets cleared when a unsuccessful
> > > READ_EVENT_MSG_BUFFER command completes. Getting data from the
> > > BMC has higher priority than sending data to the BMC.
> > >
> > > If the BMC continually reports success from READ_EVENT_MSG_BUFFER, then
> > > that would certainly wedge the driver. But it would have to continually
> > > report success for that command, which would be strange as its supposed
> > > to error out when the queue is empty.
> >
> > That does indeed appear to be what's happening.
> >
> > The implementation of intel-ipmi-oem's OpenBMC READ_EVENT_MSG_BUFFER
> > handler does not fail when there is nothing to read,
> >
> > https://github.com/openbmc/intel-ipmi-oem/blob/master/src/bridgingcommands.cpp#L704
>
> Actually, that is so clearly wrong that it's hard to imagine they made
> this mistake. Anyway, a defect needs to be filed against it. It will
> certainly break other software on other OSes.
>
> I think I have an easy workaround, though I'm guessing. I'm guessing
> they are returning zero data bytes. There's no check on the size at that
> point in the code (it's later).
>
> Can you try the following patch?
Actually ignore that, I was thinking too much about the other stuff and
didn't actually read my patch.
Here's a working patch:
diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index 4a9e9de4d684..08c208cc64c5 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -630,7 +630,13 @@ static void handle_transaction_done(struct smi_info *smi_info)
*/
msg = smi_info->curr_msg;
smi_info->curr_msg = NULL;
- if (msg->rsp[2] != 0) {
+ /*
+ * It appears some BMCs, with no event data, return no
+ * data in the message and not a 0x80 error as the
+ * spec says they should. Shut down processing if
+ * the data is not the right length.
+ */
+ if (msg->rsp[2] != 0 || msg->rsp_size != 19) {
/* Error getting event, probably done. */
msg->done(msg);
>
> diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
> index 4a9e9de4d684..cf8674a93af1 100644
> --- a/drivers/char/ipmi/ipmi_si_intf.c
> +++ b/drivers/char/ipmi/ipmi_si_intf.c
> @@ -630,7 +630,13 @@ static void handle_transaction_done(struct smi_info *smi_info)
> */
> msg = smi_info->curr_msg;
> smi_info->curr_msg = NULL;
> - if (msg->rsp[2] != 0) {
> + /*
> + * It appears some BMCs, with no event data, return no
> + * data in the message and not a 0x80 error as the
> + * spec says they should. Shut down processing if
> + * the data is not the right length.
> + */
> + if (msg->rsp[2] != 0 || smi_info->curr_msg->rsp_size != 19) {
> /* Error getting event, probably done. */
> msg->done(msg);
>
>
> With your approval on that, I'll send it to Linus after letting it sit
> in the next tree for a bit. Actually, I'll probably add it in any case,
> as it's a good check to do.
>
> >
> > > If it's really something like that, I could also look at adding limits
> > > for those operations.
> >
> > That would be great. Me and Fred would be happy to test out any patch.
> >
> > I still think the original patch I sent is a worthwhile defense.
> >
> > Our periodic monitoring scripts cause TASK_UNINTERRUPTIBLE tasks to
> > block behind one another when we hit these kinds of issues in the IPMI
> > code. Untangling that across thousands of machines can be time
> > consuming and a more explicit EIO or ETIMEDOUT would help with triage.
>
> Unfortunately, that might have other issues, similar to the ones the
> people with the watchdog issue found. I'll look at it a bit, but those
> sorts of things would have to be scattered all over the code, not just
> in that one place. As you say, it would make debugging easier.
>
> I think adding a counter to the number of operations occuring from a
> single flag fetch would be a way to avoid this issue. That's going to
> take a little more time, but I'll definitely work on that.
>
> Thanks,
>
> -corey
prev parent reply other threads:[~2026-04-20 18:11 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-15 11:59 [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id() Matt Fleming
2026-04-15 12:16 ` Corey Minyard
2026-04-15 15:46 ` Tony Camuso
2026-04-15 21:22 ` Frederick Lawler
2026-04-16 14:28 ` Tony Camuso
2026-04-17 16:01 ` Matt Fleming
2026-04-17 15:41 ` Matt Fleming
2026-04-17 22:23 ` Matt Fleming
2026-04-17 23:53 ` Corey Minyard
2026-04-19 20:50 ` Matt Fleming
2026-04-20 16:33 ` Corey Minyard
2026-04-20 18:11 ` Corey Minyard [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aeZsTQdBNBzvmNG2@mail.minyard.net \
--to=corey@minyard.net \
--cc=fred@cloudflare.com \
--cc=kernel-team@cloudflare.com \
--cc=linux-kernel@vger.kernel.org \
--cc=matt@readmodwrite.com \
--cc=mfleming@cloudflare.com \
--cc=openipmi-developer@lists.sourceforge.net \
--cc=tcamuso@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox