From: Matt Fleming <matt@readmodwrite.com>
To: Corey Minyard <corey@minyard.net>
Cc: Tony Camuso <tcamuso@redhat.com>,
	 openipmi-developer@lists.sourceforge.net,
	linux-kernel@vger.kernel.org, kernel-team@cloudflare.com,
	 Matt Fleming <mfleming@cloudflare.com>
Subject: Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id()
Date: Fri, 17 Apr 2026 23:23:03 +0100
Message-ID: <aeKwa4napKfBerJM@matt-Precision-5490>
In-Reply-To: <ad-BtS5b3qiowqb7@mail.minyard.net>

On Wed, Apr 15, 2026 at 07:16:53AM -0500, Corey Minyard wrote:
> 
> The lower level driver should never not return an answer, it is supposed
> to guarantee that it returns an error if the BMC doesn't respond.
> 
> So the bug is not here, the bug is elsewhere.  My guess is that there
> is some new failure mode where a BMC is not working but it responds well
> enough that it sort of works and fools the driver.  But that's only a
> guess.

I can now reproduce this fairly reliably by running concurrent
ipmitool commands (sensor, sel, mc info) alongside sysfs readers and a
periodic "ipmitool mc reset cold". The interface wedges within a few
minutes.

My working theory is that handle_flags() in ipmi_si_intf.c can keep
looping on flag-driven commands (e.g. READ_EVENT_MSG_BUFFER) without
ever calling start_next_msg(), so waiting_msg is starved indefinitely.

State captured at the time of the wedge:

  si_state=SI_GETTING_EVENTS  msg_flags=0x02
  si_curr cycling cmd=0x35 (READ_EVENT_MSG_BUFFER)
  si_wait frozen cmd=0x08 (GET_DEVICE_GUID, never promoted)

The cold reset makes the BMC report EVENT_MSG_BUFFER_FULL during
re-init, which drives the flag loop.

Thanks,
Matt


Thread overview: 12+ messages
2026-04-15 11:59 [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id() Matt Fleming
2026-04-15 12:16 ` Corey Minyard
2026-04-15 15:46   ` Tony Camuso
2026-04-15 21:22     ` Frederick Lawler
2026-04-16 14:28       ` Tony Camuso
2026-04-17 16:01         ` Matt Fleming
2026-04-17 15:41   ` Matt Fleming
2026-04-17 22:23   ` Matt Fleming [this message]
2026-04-17 23:53     ` Corey Minyard
2026-04-19 20:50       ` Matt Fleming
2026-04-20 16:33         ` Corey Minyard
2026-04-20 18:11           ` Corey Minyard