From: Matt Fleming <matt@readmodwrite.com>
To: Corey Minyard <corey@minyard.net>
Cc: Tony Camuso <tcamuso@redhat.com>,
openipmi-developer@lists.sourceforge.net,
linux-kernel@vger.kernel.org, kernel-team@cloudflare.com,
Matt Fleming <mfleming@cloudflare.com>
Subject: Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id()
Date: Fri, 17 Apr 2026 23:23:03 +0100
Message-ID: <aeKwa4napKfBerJM@matt-Precision-5490>
In-Reply-To: <ad-BtS5b3qiowqb7@mail.minyard.net>
On Wed, Apr 15, 2026 at 07:16:53AM -0500, Corey Minyard wrote:
>
> The lower level driver should never not return an answer, it is supposed
> to guarantee that it returns an error if the BMC doesn't respond.
>
> So the bug is not here, the bug is elsewhere. My guess is that there
> is some new failure mode where a BMC is not working but it responds well
> enough that it sort of works and fools the driver. But that's only a
> guess.
I can now reproduce this pretty reliably by running concurrent
ipmitool commands (sensor, sel, mc info) alongside sysfs readers and a
periodic "ipmitool mc reset cold". The interface wedges within a few
minutes.
My working theory is that handle_flags() in ipmi_si_intf.c can loop on
flag-driven commands (e.g. READ_EVENT_MSG_BUFFER) without ever calling
start_next_msg(), starving waiting_msg indefinitely.
Captured state at wedge:
  si_state=SI_GETTING_EVENTS msg_flags=0x02
  si_curr  cycling cmd=0x35 (READ_EVENT_MSG_BUFFER)
  si_wait  frozen  cmd=0x08 (GET_DEVICE_GUID, never promoted)
The cold reset makes the BMC report EVENT_MSG_BUFFER_FULL during
re-init, which drives the flag loop.
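
To make the theory concrete, here is a toy userspace model of the
suspected starvation. It is only a sketch and is not the code in
ipmi_si_intf.c: struct model_si, steps_until_promoted(), flag_stuck and
queue_first are all hypothetical names invented for illustration. Only
the numeric values (0x02, 0x08, 0x35) come from the captured state
above. The queue_first switch exists to contrast the two orderings and
is not a proposed fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Values taken from the captured state; everything else is hypothetical. */
#define EVENT_MSG_BUFFER_FULL   0x02
#define CMD_GET_DEVICE_GUID     0x08
#define CMD_READ_EVENT_MSG_BUF  0x35

/* Heavily simplified stand-in for the interface state. */
struct model_si {
	unsigned char msg_flags; /* flags last reported by the BMC */
	int curr_cmd;            /* command in flight, 0 if none */
	int waiting_cmd;         /* queued user command, 0 if none */
	bool flag_stuck;         /* BMC keeps re-asserting the flag */
};

/*
 * Run up to max_steps transaction completions. Returns the step at
 * which waiting_cmd became the current command, or -1 if it never did.
 * With queue_first == false, flag servicing always wins, which is the
 * ordering the theory suspects.
 */
static int steps_until_promoted(struct model_si *si, int max_steps,
				bool queue_first)
{
	assert(max_steps > 0);

	for (int step = 1; step <= max_steps; step++) {
		if (queue_first && si->waiting_cmd) {
			si->curr_cmd = si->waiting_cmd;
			si->waiting_cmd = 0;
			return step;
		}
		if (si->msg_flags & EVENT_MSG_BUFFER_FULL) {
			/* Flag-driven command; the queue is not consulted. */
			si->curr_cmd = CMD_READ_EVENT_MSG_BUF;
			if (!si->flag_stuck)
				si->msg_flags &= ~EVENT_MSG_BUFFER_FULL;
			continue;
		}
		if (si->waiting_cmd) {
			si->curr_cmd = si->waiting_cmd;
			si->waiting_cmd = 0;
			return step;
		}
	}
	return -1;
}
```

With flag_stuck set and queue_first clear, the model never promotes the
waiting command and curr_cmd stays at 0x35, which matches the shape of
the captured state. If the flag clears after one read, the waiting
command is promoted on the following step.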
Thanks,
Matt
Thread overview: 9+ messages
2026-04-15 11:59 [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id() Matt Fleming
2026-04-15 12:16 ` Corey Minyard
2026-04-15 15:46 ` Tony Camuso
2026-04-15 21:22 ` Frederick Lawler
2026-04-16 14:28 ` Tony Camuso
2026-04-17 16:01 ` Matt Fleming
2026-04-17 15:41 ` Matt Fleming
2026-04-17 22:23 ` Matt Fleming [this message]
2026-04-17 23:53 ` Corey Minyard