* [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id() @ 2026-04-15 11:59 Matt Fleming 2026-04-15 12:16 ` Corey Minyard 0 siblings, 1 reply; 9+ messages in thread From: Matt Fleming @ 2026-04-15 11:59 UTC (permalink / raw) To: Corey Minyard Cc: openipmi-developer, linux-kernel, kernel-team, Matt Fleming, Matt Fleming From: Matt Fleming <mfleming@cloudflare.com> When the BMC does not respond to a "Get Device ID" command, the wait_event() in __get_device_id() blocks forever in TASK_UNINTERRUPTIBLE while holding bmc->dyn_mutex. Every subsequent sysfs reader then piles up in D state. Replace with wait_event_timeout() to return -EIO after 1 second. Signed-off-by: Matt Fleming <matt@readmodwrite.com> --- drivers/char/ipmi/ipmi_msghandler.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index c41f51c82edd..efa9588e8210 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -2599,7 +2599,13 @@ static int __get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc) if (rv) goto out_reset_handler; - wait_event(intf->waitq, bmc->dyn_id_set != 2); + if (!wait_event_timeout(intf->waitq, bmc->dyn_id_set != 2, + msecs_to_jiffies(1000))) { + dev_warn(intf->si_dev, + "Timed out waiting for get bmc device id response\n"); + rv = -EIO; + goto out_reset_handler; + } if (!bmc->dyn_id_set) { if (bmc->cc != IPMI_CC_NO_ERROR && -- 2.43.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id() 2026-04-15 11:59 [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id() Matt Fleming @ 2026-04-15 12:16 ` Corey Minyard 2026-04-15 15:46 ` Tony Camuso ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Corey Minyard @ 2026-04-15 12:16 UTC (permalink / raw) To: Matt Fleming, Tony Camuso Cc: openipmi-developer, linux-kernel, kernel-team, Matt Fleming On Wed, Apr 15, 2026 at 12:59:30PM +0100, Matt Fleming wrote: > From: Matt Fleming <mfleming@cloudflare.com> > > When the BMC does not respond to a "Get Device ID" command, the > wait_event() in __get_device_id() blocks forever in TASK_UNINTERRUPTIBLE > while holding bmc->dyn_mutex. Every subsequent sysfs reader then piles > up in D state. Replace with wait_event_timeout() to return -EIO after 1 > second. This is the second report I have of something like this. So something is up. I'm adding Tony, who reported something like this dealing with the watchdog. The lower level driver should never not return an answer, it is supposed to guarantee that it returns an error if the BMC doesn't respond. So the bug is not here, the bug is elsewhere. My guess is that there is some new failure mode where a BMC is not working but it responds well enough that it sort of works and fools the driver. But that's only a guess. I've seen this before in several scenarios, including a system that put IPMI in the ACPI tree and it sort of worked but there was no BMC present. I had to disable that particular device. What hardware is involved here? Can you give a more detailed example of what's happening in the low-level hardware? If it's KCS there's a debug flag in the drivers/char/ipmi/ipmi_kcs_sm.c file that should help. Thanks, -corey > > Signed-off-by: Matt Fleming <matt@readmodwrite.com> > --- > drivers/char/ipmi/ipmi_msghandler.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c > index c41f51c82edd..efa9588e8210 100644 > --- a/drivers/char/ipmi/ipmi_msghandler.c > +++ b/drivers/char/ipmi/ipmi_msghandler.c > @@ -2599,7 +2599,13 @@ static int __get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc) > if (rv) > goto out_reset_handler; > > - wait_event(intf->waitq, bmc->dyn_id_set != 2); > + if (!wait_event_timeout(intf->waitq, bmc->dyn_id_set != 2, > + msecs_to_jiffies(1000))) { > + dev_warn(intf->si_dev, > + "Timed out waiting for get bmc device id response\n"); > + rv = -EIO; > + goto out_reset_handler; > + } > > if (!bmc->dyn_id_set) { > if (bmc->cc != IPMI_CC_NO_ERROR && > -- > 2.43.0 > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id() 2026-04-15 12:16 ` Corey Minyard @ 2026-04-15 15:46 ` Tony Camuso 2026-04-15 21:22 ` Frederick Lawler 2026-04-17 15:41 ` Matt Fleming 2026-04-17 22:23 ` Matt Fleming 2 siblings, 1 reply; 9+ messages in thread From: Tony Camuso @ 2026-04-15 15:46 UTC (permalink / raw) To: corey, Matt Fleming Cc: openipmi-developer, linux-kernel, kernel-team, Matt Fleming On Wed, Apr 15, 2026 at 12:59:30PM +0100, Matt Fleming wrote: > From: Matt Fleming <mfl...@cl...> > > When the BMC does not respond to a "Get Device ID" command, the > wait_event() in __get_device_id() blocks forever in > TASK_UNINTERRUPTIBLE while holding bmc->dyn_mutex. Every subsequent > sysfs reader then piles up in D state. Replace with > wait_event_timeout() to return -EIO after 1 second. On Wed, Apr 15, 2026 at 12:17:04PM, Corey Minyard wrote: > This is the second report I have of something like this. So > something is up. I'm adding Tony, who reported something like this > dealing with the watchdog. > > The lower level driver should never not return an answer, it is > supposed to guarantee that it returns an error if the BMC doesn't > respond. So the bug is not here, the bug is elsewhere. I've been tracking a related issue (RHEL customer case) where BMC reset while the IPMI watchdog is active causes D-state hangs. This appears to be the same root cause Matt is hitting. I backported the recent upstream KCS/SI fixes to a RHEL 9 test kernel (54 patches bringing it to mainline parity) and tested today on a Dell R640. Test: Trigger `ipmitool mc reset cold` while watchdog daemon is running. Results with backported fixes: [ 245.376402] IPMI Watchdog: heartbeat completion received [ 246.376392] IPMI Watchdog: heartbeat send failure: -16 [ 247.377560] IPMI Watchdog: heartbeat send failure: -16 ... [ 252.413240] IPMI Watchdog: set timeout error: -16 The watchdog daemon received error 16 (EBUSY) and eventually initiated orderly shutdown: write watchdog device gave error 16 = 'Device or resource busy' shutting down the system because of error 16 Key finding: With the upstream fixes, the driver returns -EBUSY instead of blocking forever. No D-state hang. The watchdog daemon handles the error and initiates orderly reboot. Note: There was still a delay of several minutes before the daemon timed out and triggered shutdown. The driver returned errors promptly, but the watchdog daemon's retry logic (error retry time-out = 120 seconds) extended the overall recovery time. This may warrant a separate look at whether the daemon's retry behavior is appropriate when the BMC is completely unresponsive. This confirms Corey's assessment - the bug is in the lower-level driver not returning errors, not in __get_device_id(). Matt's timeout patch would be a defensive fallback, but the real fix is ensuring KCS/SI properly returns errors when the BMC is unresponsive. Tony ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id() 2026-04-15 15:46 ` Tony Camuso @ 2026-04-15 21:22 ` Frederick Lawler 2026-04-16 14:28 ` Tony Camuso 0 siblings, 1 reply; 9+ messages in thread From: Frederick Lawler @ 2026-04-15 21:22 UTC (permalink / raw) To: Tony Camuso Cc: corey, Matt Fleming, openipmi-developer, linux-kernel, kernel-team, Matt Fleming Hi Corey & Tony, On Wed, Apr 15, 2026 at 11:46:27AM -0400, 'Tony Camuso' via kernel-team wrote: > On Wed, Apr 15, 2026 at 12:59:30PM +0100, Matt Fleming wrote: > > From: Matt Fleming <mfl...@cl...> > > > > When the BMC does not respond to a "Get Device ID" command, the > > wait_event() in __get_device_id() blocks forever in > > TASK_UNINTERRUPTIBLE while holding bmc->dyn_mutex. Every subsequent > > sysfs reader then piles up in D state. Replace with > > wait_event_timeout() to return -EIO after 1 second. > > On Wed, Apr 15, 2026 at 12:17:04PM, Corey Minyard wrote: > > This is the second report I have of something like this. So > > something is up. I'm adding Tony, who reported something like this > > dealing with the watchdog. > > > > The lower level driver should never not return an answer, it is > > supposed to guarantee that it returns an error if the BMC doesn't > > respond. So the bug is not here, the bug is elsewhere. This is a bit of a throwback to our previous discussions around [1]. I did end up applying [2] based on that discussion, and had limited success, but we still have external resets that cause us to enter this undesirable state :( [1]: https://lore.kernel.org/all/aJUMlAG17c6lCgFq@mail.minyard.net/ [2]: https://lore.kernel.org/all/20250807230648.1112569-2-corey@minyard.net/ > > I've been tracking a related issue (RHEL customer case) where BMC > reset while the IPMI watchdog is active causes D-state hangs. This > appears to be the same root cause Matt is hitting. > > I backported the recent upstream KCS/SI fixes to a RHEL 9 test kernel > (54 patches bringing it to mainline parity) and tested today on a > Dell R640. I assume this patch series: "ipmi:watchdog: Fix panic, D-state hang, and lost protection on BMC reset" [3]? [3]: https://lore.kernel.org/all/20260407175134.3367345-1-tcamuso@redhat.com/ ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id() 2026-04-15 21:22 ` Frederick Lawler @ 2026-04-16 14:28 ` Tony Camuso 2026-04-17 16:01 ` Matt Fleming 0 siblings, 1 reply; 9+ messages in thread From: Tony Camuso @ 2026-04-16 14:28 UTC (permalink / raw) To: Frederick Lawler Cc: corey, Matt Fleming, openipmi-developer, linux-kernel, kernel-team, Matt Fleming On 4/15/26 5:22 PM, Frederick Lawler wrote: > Hi Corey & Tony, > > On Wed, Apr 15, 2026 at 11:46:27AM -0400, 'Tony Camuso' via kernel-team wrote: >> On Wed, Apr 15, 2026 at 12:59:30PM +0100, Matt Fleming wrote: >>> From: Matt Fleming <mfl...@cl...> >>> >>> When the BMC does not respond to a "Get Device ID" command, the >>> wait_event() in __get_device_id() blocks forever in >>> TASK_UNINTERRUPTIBLE while holding bmc->dyn_mutex. Every subsequent >>> sysfs reader then piles up in D state. Replace with >>> wait_event_timeout() to return -EIO after 1 second. >> >> On Wed, Apr 15, 2026 at 12:17:04PM, Corey Minyard wrote: >>> This is the second report I have of something like this. So >>> something is up. I'm adding Tony, who reported something like this >>> dealing with the watchdog. >>> >>> The lower level driver should never not return an answer, it is >>> supposed to guarantee that it returns an error if the BMC doesn't >>> respond. So the bug is not here, the bug is elsewhere. > > This is a bit of a throwback to our previous discussions around [1]. > > I did end up applying [2] based on that discussion, and had limited > success, but we still have external resets that cause us to enter > this undesirable state :( In my testing with updates from the Linus tree, after a BMC cold reset: 1. The KCS driver returned -EBUSY to callers (good) 2. The watchdog daemon received the error and initiated shutdown 3. No D-state hang My tests, conducted on a Dell PER640, verified that Corey's upstream fixes cause the driver to properly return errors instead of blocking. At least on that platform. Which hich low-level driver are you using (KCS, BT, SSIF)? The PER640 uses KCS. # cat /sys/class/ipmi/ipmi0/device/params 2>/dev/null kcs,i/o,0xca8,rsp=4,rsi=1,rsh=0,irq=10,ipmb=32 > [1]: https://lore.kernel.org/all/aJUMlAG17c6lCgFq@mail.minyard.net/ > [2]: https://lore.kernel.org/all/20250807230648.1112569-2-corey@minyard.net/ >> >> I've been tracking a related issue (RHEL customer case) where BMC >> reset while the IPMI watchdog is active causes D-state hangs. This >> appears to be the same root cause Matt is hitting. >> >> I backported the recent upstream KCS/SI fixes to a RHEL 9 test kernel >> (54 patches bringing it to mainline parity) and tested today on a >> Dell R640. > > I assume this patch series: "ipmi:watchdog: Fix panic, D-state hang, and > lost protection on BMC reset" [3]? > > [3]: https://lore.kernel.org/all/20260407175134.3367345-1-tcamuso@redhat.com/ > Actually, no. The 54 commits I backported simply bring my RHEL-9 test kernel to parity with the Linus tree, which includes [2] and ... cae66f1a1dcd 2026-02-13 corey@minyard.net ipmi:si: Fix check for a misbehaving BMC ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id() 2026-04-16 14:28 ` Tony Camuso @ 2026-04-17 16:01 ` Matt Fleming 0 siblings, 0 replies; 9+ messages in thread From: Matt Fleming @ 2026-04-17 16:01 UTC (permalink / raw) To: Tony Camuso Cc: Frederick Lawler, corey, openipmi-developer, linux-kernel, kernel-team, Matt Fleming On Thu, Apr 16, 2026 at 10:28:50AM -0400, Tony Camuso wrote: > > In my testing with updates from the Linus tree, after a BMC cold reset: > 1. The KCS driver returned -EBUSY to callers (good) > 2. The watchdog daemon received the error and initiated shutdown > 3. No D-state hang > > My tests, conducted on a Dell PER640, verified that Corey's upstream fixes > cause the driver to properly return errors instead of blocking. > At least on that platform. > > Which hich low-level driver are you using (KCS, BT, SSIF)? > The PER640 uses KCS. > # cat /sys/class/ipmi/ipmi0/device/params 2>/dev/null > kcs,i/o,0xca8,rsp=4,rsi=1,rsh=0,irq=10,ipmb=32 $ cat /sys/class/ipmi/ipmi0/device/params kcs,i/o,0xca2,rsp=1,rsi=1,rsh=0,irq=0,ipmb=32 attentions 3 complete_transactions 7080342 events 3 flag_fetches 0 hosed_count 1 idles 25359147 incoming_messages 0 interrupts 0 long_timeouts 264790 short_timeouts 13723711 watchdog_pretimeouts 0 > Actually, no. The 54 commits I backported simply bring my RHEL-9 test kernel > to parity with the Linus tree, which includes [2] and ... > cae66f1a1dcd 2026-02-13 corey@minyard.net ipmi:si: Fix check for a misbehaving BMC Ah, I see we have some machines on v6.18.20 which includes this and they're still triggering this problem. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id() 2026-04-15 12:16 ` Corey Minyard 2026-04-15 15:46 ` Tony Camuso @ 2026-04-17 15:41 ` Matt Fleming 2026-04-17 22:23 ` Matt Fleming 2 siblings, 0 replies; 9+ messages in thread From: Matt Fleming @ 2026-04-17 15:41 UTC (permalink / raw) To: Corey Minyard Cc: Tony Camuso, openipmi-developer, linux-kernel, kernel-team, Matt Fleming On Wed, Apr 15, 2026 at 07:16:53AM -0500, Corey Minyard wrote: > > I've seen this before in several scenarios, including a system that put > IPMI in the ACPI tree and it sort of worked but there was no BMC > present. I had to disable that particular device. > > What hardware is involved here? I'm fairly sure we've seen this across a bunch of different BMCs, so it's not a BMC hardware thing. Almost certainly a driver issue. > Can you give a more detailed example of what's happening in the > low-level hardware? If it's KCS there's a debug flag in the > drivers/char/ipmi/ipmi_kcs_sm.c file that should help. Yep, it's KCS. Unfortunately I haven't found a way to reproduce this reliably yet. Looking at a wedged machine (cat /sys/class/ipmi/.../firmware_revision) with drgn I can see that there's 99,846 messages sat on intf->xmit_msgs and the KCS SM is idle (it's responding to internal traffic like Get Global Enables and Get Msg Flags). So it looks like completions are getting dropped. We're running a 6.18.18 kernel which includes c08ec55617cb ("ipmi: Fix use-after-free and list corruption on sender error"), so it's not that. Here's a dump of some of the data structures. intf = 0xffff9d2f4a5a0000 intf->curr_msg = 0xffff9d34f21a9c00 intf->xmit_msgs.next = 0xffff9d30c28e3c80 intf->waiting_rcv_msgs = empty intf->maintenance_mode = 0 intf->maintenance_mode_state = 0 intf->in_shutdown = false intf->seq_table = 0/64 slots used intf->smi_work.pending = 0 The stuck message itself — intf->curr_msg: msg @ 0xffff9d34f21a9c00 .data = { 0x18, 0x01 } # NetFn 0x06 (App), cmd 0x01 = Get Device ID .data_size = 2 .rsp_size = 38 .rsp[0..7] = 2c 01 00 00 ... .done = free_smi_msg .user_data = NULL .msgid = (internal GDI poll) .type = IPMI_SMI_MSG_TYPE_NORMAL smi_info = 0xffff9d2f4a010000 smi_info->si_state = SI_NORMAL (0) smi_info->curr_msg = 0xffff9d2f48c7b800 smi_info->waiting_msg = NULL smi_info->interrupt_disabled = false smi_info->supports_event_msg_buff = true smi_info->io.irq = 0 smi_info->run_to_completion = false smi_info->in_maintenance_mode = 0 Let me know if you want any other info. I'll try to trace this and catch it reproducing. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id() 2026-04-15 12:16 ` Corey Minyard 2026-04-15 15:46 ` Tony Camuso 2026-04-17 15:41 ` Matt Fleming @ 2026-04-17 22:23 ` Matt Fleming 2026-04-17 23:53 ` Corey Minyard 2 siblings, 1 reply; 9+ messages in thread From: Matt Fleming @ 2026-04-17 22:23 UTC (permalink / raw) To: Corey Minyard Cc: Tony Camuso, openipmi-developer, linux-kernel, kernel-team, Matt Fleming On Wed, Apr 15, 2026 at 07:16:53AM -0500, Corey Minyard wrote: > > The lower level driver should never not return an answer, it is supposed > to guarantee that it returns an error if the BMC doesn't respond. > > So the bug is not here, the bug is elsewhere. My guess is that there > is some new failure mode where a BMC is not working but it responds well > enough that it sort of works and fools the driver. But that's only a > guess. I can now reproduce this pretty reliably by running concurrent ipmitool commands (sensor/sel/mc info) + sysfs readers + periodic ipmitool mc reset cold. It wedges in a few minutes. My working theory is handle_flags() in ipmi_si_intf.c can loop on flag-driven commands (e.g. READ_EVENT_MSG_BUFFER) without ever calling start_next_msg(), starving waiting_msg indefinitely. Captured state at wedge: si_state=SI_GETTING_EVENTS msg_flags=0x02 si_curr cycling cmd=0x35 (READ_EVENT_MSG_BUFFER) si_wait frozen cmd=0x08 (GET_DEVICE_GUID, never promoted) The cold reset makes the BMC report EVENT_MSG_BUFFER_FULL during re-init, which drives the flag loop. Thanks, Matt ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id() 2026-04-17 22:23 ` Matt Fleming @ 2026-04-17 23:53 ` Corey Minyard 0 siblings, 0 replies; 9+ messages in thread From: Corey Minyard @ 2026-04-17 23:53 UTC (permalink / raw) To: Matt Fleming Cc: Tony Camuso, openipmi-developer, linux-kernel, kernel-team, Matt Fleming On Fri, Apr 17, 2026 at 11:23:03PM +0100, Matt Fleming wrote: > On Wed, Apr 15, 2026 at 07:16:53AM -0500, Corey Minyard wrote: > > > > The lower level driver should never not return an answer, it is supposed > > to guarantee that it returns an error if the BMC doesn't respond. > > > > So the bug is not here, the bug is elsewhere. My guess is that there > > is some new failure mode where a BMC is not working but it responds well > > enough that it sort of works and fools the driver. But that's only a > > guess. > > I can now reproduce this pretty reliably by running concurrent > ipmitool commands (sensor/sel/mc info) + sysfs readers + periodic > ipmitool mc reset cold. It wedges in a few minutes. Hmm. If you are sending cold resets, then the driver is going into reset maintenance mode and it should be rejecting messages for 30 seconds after you send that command. You can disable that by changing is_maintenance_mode_cmd() in ipmi_msghandler.c to always return false. > > My working theory is handle_flags() in ipmi_si_intf.c can loop on > flag-driven commands (e.g. READ_EVENT_MSG_BUFFER) without ever calling > start_next_msg(), starving waiting_msg indefinitely. > > Captured state at wedge: > > si_state=SI_GETTING_EVENTS msg_flags=0x02 > si_curr cycling cmd=0x35 (READ_EVENT_MSG_BUFFER) > si_wait frozen cmd=0x08 (GET_DEVICE_GUID, never promoted) > > The cold reset makes the BMC report EVENT_MSG_BUFFER_FULL during > re-init, which drives the flag loop. The EVENT_MSG_BUFFER_FULL flag only gets cleared when a unsuccessful READ_EVENT_MSG_BUFFER command completes. Getting data from the BMC has higher priority than sending data to the BMC. If the BMC continually reports success from READ_EVENT_MSG_BUFFER, then that would certainly wedge the driver. But it would have to continually report success for that command, which would be strange as its supposed to error out when the queue is empty. If it's really something like that, I could also look at adding limits for those operations. To debug things like this I often add module_params that let me see what is going on. But you can look at the "invalid_events" counter to see if the data is bogus. Or there should be an "Event queue full, discarding incoming events" log coming out once at the beginning of when this happens. -corey > > Thanks, > Matt ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-04-17 23:54 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-15 11:59 [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id() Matt Fleming 2026-04-15 12:16 ` Corey Minyard 2026-04-15 15:46 ` Tony Camuso 2026-04-15 21:22 ` Frederick Lawler 2026-04-16 14:28 ` Tony Camuso 2026-04-17 16:01 ` Matt Fleming 2026-04-17 15:41 ` Matt Fleming 2026-04-17 22:23 ` Matt Fleming 2026-04-17 23:53 ` Corey Minyard
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox