From: Corey Minyard <corey@minyard.net>
To: Matt Fleming <matt@readmodwrite.com>
Cc: Tony Camuso <tcamuso@redhat.com>,
openipmi-developer@lists.sourceforge.net,
linux-kernel@vger.kernel.org, kernel-team@cloudflare.com,
Matt Fleming <mfleming@cloudflare.com>
Subject: Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id()
Date: Fri, 17 Apr 2026 18:53:55 -0500
Message-ID: <aeLIE0Psdlvr9l7j@mail.minyard.net>
In-Reply-To: <aeKwa4napKfBerJM@matt-Precision-5490>

On Fri, Apr 17, 2026 at 11:23:03PM +0100, Matt Fleming wrote:
> On Wed, Apr 15, 2026 at 07:16:53AM -0500, Corey Minyard wrote:
> >
> > The lower level driver should never not return an answer, it is supposed
> > to guarantee that it returns an error if the BMC doesn't respond.
> >
> > So the bug is not here, the bug is elsewhere. My guess is that there
> > is some new failure mode where a BMC is not working but it responds well
> > enough that it sort of works and fools the driver. But that's only a
> > guess.
>
> I can now reproduce this pretty reliably by running concurrent
> ipmitool commands (sensor/sel/mc info) + sysfs readers + periodic
> ipmitool mc reset cold. It wedges in a few minutes.

Hmm. If you are sending cold resets, then the driver is going into
reset maintenance mode and it should be rejecting messages for 30
seconds after you send that command.

You can disable that by changing is_maintenance_mode_cmd() in
ipmi_msghandler.c to always return false.
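
For illustration only, here is a minimal userspace sketch of that debug
hack. It is not the kernel code: the struct, the sketch_ prefix and the
debug_disable parameter are hypothetical, and the classification (cold
and warm reset on the App netfn, plus anything on the Firmware netfn) is
an assumption about what is_maintenance_mode_cmd() checks, so verify it
against your tree. The netfn and command values are the IPMI-defined
ones.

```c
#include <stdbool.h>

/* IPMI-defined network function and command codes. */
#define IPMI_NETFN_APP_REQUEST       0x06
#define IPMI_NETFN_FIRMWARE_REQUEST  0x08
#define IPMI_COLD_RESET_CMD          0x02
#define IPMI_WARM_RESET_CMD          0x03

/* Hypothetical stand-in for the kernel's message structure. */
struct sketch_msg {
	unsigned char netfn;
	unsigned char cmd;
};

/*
 * Assumed classification: resets and firmware requests put the
 * interface into maintenance mode.  With debug_disable set, nothing
 * does, which models "always return false".
 */
static bool sketch_is_maintenance_mode_cmd(const struct sketch_msg *msg,
					   bool debug_disable)
{
	if (debug_disable)
		return false;

	return (msg->netfn == IPMI_NETFN_APP_REQUEST &&
		(msg->cmd == IPMI_COLD_RESET_CMD ||
		 msg->cmd == IPMI_WARM_RESET_CMD)) ||
	       msg->netfn == IPMI_NETFN_FIRMWARE_REQUEST;
}
```

With the check disabled, a cold reset no longer triggers the 30 second
rejection window, which should make it easier to tell whether
maintenance mode is involved in the wedge at all.
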
>
> My working theory is handle_flags() in ipmi_si_intf.c can loop on
> flag-driven commands (e.g. READ_EVENT_MSG_BUFFER) without ever calling
> start_next_msg(), starving waiting_msg indefinitely.
>
> Captured state at wedge:
>
> si_state=SI_GETTING_EVENTS msg_flags=0x02
> si_curr cycling cmd=0x35 (READ_EVENT_MSG_BUFFER)
> si_wait frozen cmd=0x08 (GET_DEVICE_GUID, never promoted)
>
> The cold reset makes the BMC report EVENT_MSG_BUFFER_FULL during
> re-init, which drives the flag loop.

The EVENT_MSG_BUFFER_FULL flag only gets cleared when an unsuccessful
READ_EVENT_MSG_BUFFER command completes. Getting data from the
BMC has higher priority than sending data to the BMC.

If the BMC continually reports success from READ_EVENT_MSG_BUFFER, then
that would certainly wedge the driver. But it would have to continually
report success for that command, which would be strange as it's supposed
to error out when the queue is empty.

If it's really something like that, I could also look at adding limits
for those operations.
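
As a rough idea of what such a limit could look like, here is a
userspace model, not driver code. Everything in it is hypothetical
(struct fake_bmc, fake_read_event(), drain_events_bounded() and the
max_reads cap); it only demonstrates the behaviour described above: the
flag is cleared by an unsuccessful read, and a BMC that never fails the
read would keep the loop going forever unless something bounds it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a BMC's event message buffer. */
struct fake_bmc {
	int events_left;	/* negative: the read never errors out */
};

/*
 * Model of READ_EVENT_MSG_BUFFER: true means an event was returned,
 * false is the error a well-behaved BMC gives once its queue is empty.
 */
static bool fake_read_event(struct fake_bmc *bmc)
{
	if (bmc->events_left < 0)
		return true;	/* misbehaving BMC: always "success" */
	if (bmc->events_left == 0)
		return false;
	bmc->events_left--;
	return true;
}

/*
 * Issue reads until one fails (which is what clears the "buffer full"
 * flag) or until max_reads have been issued, so that a waiting message
 * gets a turn.  Returns the number of reads issued and reports through
 * *flag_cleared whether the flag was cleared.
 */
static int drain_events_bounded(struct fake_bmc *bmc, int max_reads,
				bool *flag_cleared)
{
	int reads = 0;

	*flag_cleared = false;
	while (reads < max_reads) {
		reads++;
		if (!fake_read_event(bmc)) {
			*flag_cleared = true;
			break;
		}
	}
	return reads;
}
```

With a well-behaved model holding three events and a cap of 16, the
loop issues four reads and the fourth clears the flag. With
events_left set to -1 it stops at the cap with the flag still set,
which is the point where a real implementation would have to decide
whether to let the waiting message run before retrying.
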

To debug things like this I often add module_params that let me see what
is going on. But you can look at the "invalid_events" counter to see
if the data is bogus. Or there should be an "Event queue full,
discarding incoming events" log coming out once at the beginning of when
this happens.

-corey
>
> Thanks,
> Matt