From: Frederick Lawler <fred@cloudflare.com>
To: Tony Camuso <tcamuso@redhat.com>
Cc: corey@minyard.net, Matt Fleming <matt@readmodwrite.com>,
openipmi-developer@lists.sourceforge.net,
linux-kernel@vger.kernel.org, kernel-team@cloudflare.com,
Matt Fleming <mfleming@cloudflare.com>
Subject: Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id()
Date: Wed, 15 Apr 2026 16:22:45 -0500
Message-ID: <aeABpewNzo4MURpO@CMGLRV3>
In-Reply-To: <9b6af9ab-79f9-4f87-ab7c-8ad6efeb18ed@redhat.com>

Hi Corey & Tony,

On Wed, Apr 15, 2026 at 11:46:27AM -0400, 'Tony Camuso' via kernel-team wrote:
> On Wed, Apr 15, 2026 at 12:59:30PM +0100, Matt Fleming wrote:
> > From: Matt Fleming <mfl...@cl...>
> >
> > When the BMC does not respond to a "Get Device ID" command, the
> > wait_event() in __get_device_id() blocks forever in
> > TASK_UNINTERRUPTIBLE while holding bmc->dyn_mutex. Every subsequent
> > sysfs reader then piles up in D state. Replace with
> > wait_event_timeout() to return -EIO after 1 second.
>
> On Wed, Apr 15, 2026 at 12:17:04PM, Corey Minyard wrote:
> > This is the second report I have of something like this. So
> > something is up. I'm adding Tony, who reported something like this
> > dealing with the watchdog.
> >
> > The lower level driver should never not return an answer, it is
> > supposed to guarantee that it returns an error if the BMC doesn't
> > respond. So the bug is not here, the bug is elsewhere.

This is a bit of a throwback to our previous discussions around [1].
I did end up applying [2] based on that discussion, and had limited
success, but we still have external resets that cause us to enter
this undesirable state :(

[1]: https://lore.kernel.org/all/aJUMlAG17c6lCgFq@mail.minyard.net/
[2]: https://lore.kernel.org/all/20250807230648.1112569-2-corey@minyard.net/

>
> I've been tracking a related issue (RHEL customer case) where BMC
> reset while the IPMI watchdog is active causes D-state hangs. This
> appears to be the same root cause Matt is hitting.
>
> I backported the recent upstream KCS/SI fixes to a RHEL 9 test kernel
> (54 patches bringing it to mainline parity) and tested today on a
> Dell R640.

I assume this patch series: "ipmi:watchdog: Fix panic, D-state hang, and
lost protection on BMC reset" [3]?

[3]: https://lore.kernel.org/all/20260407175134.3367345-1-tcamuso@redhat.com/