From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-yw1-f172.google.com (mail-yw1-f172.google.com [209.85.128.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CA69B3AEF4C for ; Tue, 28 Apr 2026 13:06:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.172 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777381607; cv=none; b=nDiBps9d0PYQWS7ikfl3OGosTJqLW5cg4LoKV80TN1mR1U3BcVuM0LeDwV5fXY75nPsDRf/VWUOdFt7AXb1ZBgoNlwCsfkuwW6UosO+Pq7ZhnPa/yIBDTazVLrNyManfnnUeagmxRjgjx5eAqMygH4a6bDgypNZQAzmphoQNBhE= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777381607; c=relaxed/simple; bh=ORFcg6Ls6kv7lhN6d3CaFlr88gghIN/aXrL+pZEkSK4=; h=Date:From:To:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=BFfW/VTOP67pAUdzn+JG0GmeIHHY4rDUfbUXB6YJFQXho7WmLfUIaYKcnwzaj+eSzPaSOWO1DnYf5b7mOHwL4ehStIbcirI4Uwx+JJSFs4L4YWQP5T+G1pZ2GzIh0GMtzYGdBPhTz/ZuIqvrYgFr8exNT3t+t22tyDx228OkU1A= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=minyard.net; spf=pass smtp.mailfrom=minyard.net; dkim=pass (2048-bit key) header.d=minyard.net header.i=@minyard.net header.b=cjzfuVW/; arc=none smtp.client-ip=209.85.128.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=minyard.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=minyard.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=minyard.net header.i=@minyard.net header.b="cjzfuVW/" Received: by mail-yw1-f172.google.com with SMTP id 00721157ae682-79ea87af213so174735357b3.0 for ; Tue, 28 Apr 2026 06:06:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=minyard.net; s=google; t=1777381605; x=1777986405; darn=vger.kernel.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:reply-to:message-id:subject:to:from:date :from:to:cc:subject:date:message-id:reply-to; bh=+tsU5INn4bJgwh3/96W1961MQFOx5hK1YCsF02Kefwg=; b=cjzfuVW/jmSrG/W/YmV42xU5MZqlKYV2sl4EIfGJ7Khuh/iqikrjsz5ptiATBh/vhv n4M2sz2y1OrGdhdcAklYO8uTHpzncTfQnxPYopE61v/mDPFNjZwhC9PNfxMfx+enSm9s 1H4gCAEz+zXCyX7OZUI4bZ5nYBFJLiByR9bsyCuhlP3F3Bdx7Zm487782jvmLKD4sb7o baHVCcrCR4LmSAHAl1sWnfZmBaa6ggYc/s6T4dVlk2qVYFKSpTWeouN3ZSfB4EjahKtt T23NcO8qAPRnS9SBRo6Fid/PNx8HXq+CslHDeIqJhnudHbjCrq9b37oPfcb6B3J7IpVj BT4Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1777381605; x=1777986405; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:reply-to:message-id:subject:to:from:date :x-gm-gg:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=+tsU5INn4bJgwh3/96W1961MQFOx5hK1YCsF02Kefwg=; b=QwZ2SF0ntDYk/fT6KojdUANskxC1ZB2k+9p453UR00DJyjZfCaoZTm3LNWX9LA//xF /AzeW56mMhiFu1X5JiBbD5b9l3e88FvkQlUzpsWKiL/Rg5F2quLf/ItCz85qzPmHiedh fIsheogAYg5wx+gH73QiGMue13goRrw+vsbLy2GPLZY6ZM+TrqRez4WCQpgHCgFGuv0p 3BJ4TWVwNkYzLv6ohT86KjeStbzTX1XSCBiAMIbndA640p8Duws56Cqzy577G2TpjIOF 13epFl9xiWf6Vx8szXO+pZXKvwAJp6zhYso8DVE8VLTvriioHZkI5Z7h1wps7qhlYAC3 fs6Q== X-Forwarded-Encrypted: i=1; AFNElJ+JaicM8m/vdp9b/RMWJky+AafAsiNDbI9g4XtXEIrk8g9xd6xcrdz96YzKRtzP+FgfmyxlAzZAsC0HW/k=@vger.kernel.org X-Gm-Message-State: AOJu0YxpDvqWBA/wKRhutnkSKjgKbN7KMtL4rWzWbrVQqL9R8OlbSpNx 0sJrTWmqYntoiU+/XR4qOH1q9GLjvelOJP94acj3iaiyy6EBkJlR0QEWN7Z7HFsf4pI= X-Gm-Gg: AeBDievkvVofgFXVJhRtIEZ0xFl7axGbTA020S4khKNshzkRxpgGDobnZZhA6M6uUYs 9SuN3DtimGfmvhrHyKeHsNAFOGbEulonBHfsPDR0jMh0teR7juS8M1bHQ4Ub8Diu65aNkO9new3 msYMrzKAbdKoDmtTZ2fRRreQlTWky6DW/f6IcDpaaE6+LKJOs2xjE1gCll/qNveMPkHqjC1Mc/D 3TecCe8WNWA/ltaskFBTZsNTxQD4BPKHA7uP8/x4emaFjXyxtoiyGMLh3LKq1dA1k2d0RmmpAA1 FxlmXPoOuxFlyl1kQ2GIgypkH/2tWDqf4ezz7jNy54hLWjitpQlkGWEAM3047U7b4IY9QcrXWOd +n4ZqpWEhcYGrROa1kFDTSBqMogPtch23AJpipMgNXEbnIO+usQEFyISLmOAb571UcWr8vOJ10F sJB0HRnWED6g3PxEUZQz9bH40sCt4L/si/HOKaJ3dtlsEZ9hzmDWzyvsUk0sw6IBVgYLo5ATcbF XJq8raPx5LsMgUqmfw3O+usrQ== X-Received: by 2002:a05:690e:140f:b0:651:c221:9649 with SMTP id 956f58d0204a3-65bee79ad34mr2152879d50.19.1777381604398; Tue, 28 Apr 2026 06:06:44 -0700 (PDT) Received: from mail.minyard.net ([2001:470:b8f6:1b:7bc4:3841:9e4c:e2ac]) by smtp.gmail.com with ESMTPSA id 956f58d0204a3-65bee29f1b4sm1645074d50.5.2026.04.28.06.06.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Apr 2026 06:06:43 -0700 (PDT) Date: Tue, 28 Apr 2026 08:06:39 -0500 From: Corey Minyard To: Matt Fleming , Matt Fleming , openipmi-developer@lists.sourceforge.net, Tony Camuso , linux-kernel@vger.kernel.org, kernel-team@cloudflare.com, stable@vger.kernel.org Subject: Re: [PATCH 2/2] ipmi: Add limits to event and receive message requests Message-ID: Reply-To: corey@minyard.net References: <20260421132544.2666174-1-corey@minyard.net> <20260421132544.2666174-3-corey@minyard.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: On Tue, Apr 28, 2026 at 06:45:15AM -0500, Corey Minyard wrote: > On Tue, Apr 28, 2026 at 11:15:46AM +0100, Matt Fleming wrote: > > On Sat, Apr 25, 2026 at 06:58:48PM -0500, Corey Minyard wrote: > > > > > > Oh, yeah. > > > > > > Moving it to handle_transaction_done() is not ideal, though. If > > > something was spewing receive messages, it would never get to events, > > > which is why I did it like I did. > > > > > > The following should fix this. You could also have different limits for > > > receive messages and events, but I think the following is more clear. > > > > > > diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c > > > index 2a739123270c..e46f4150ceb5 100644 > > > --- a/drivers/char/ipmi/ipmi_si_intf.c > > > +++ b/drivers/char/ipmi/ipmi_si_intf.c > > > @@ -413,8 +413,10 @@ static void start_getting_msg_queue(struct smi_info *smi_info) > > > > > > start_new_msg(smi_info, smi_info->curr_msg->data, > > > smi_info->curr_msg->data_size); > > > - smi_info->num_requests_in_a_row = 0; > > > - smi_info->si_state = SI_GETTING_MESSAGES; > > > + if (smi_info->si_state != SI_GETTING_MESSAGES) { > > > + smi_info->num_requests_in_a_row = 0; > > > + smi_info->si_state = SI_GETTING_MESSAGES; > > > + } > > > } > > > > > > static void start_getting_events(struct smi_info *smi_info) > > > @@ -425,8 +427,10 @@ static void start_getting_events(struct smi_info *smi_info) > > > > > > start_new_msg(smi_info, smi_info->curr_msg->data, > > > smi_info->curr_msg->data_size); > > > - smi_info->num_requests_in_a_row = 0; > > > - smi_info->si_state = SI_GETTING_EVENTS; > > > + if (smi_info->si_state != SI_GETTING_EVENTS) { > > > + smi_info->num_requests_in_a_row = 0; > > > + smi_info->si_state = SI_GETTING_EVENTS; > > > + } > > > > Thanks. Does this correctly handle a stream of mixed receive+event > > flags, though? If we bounce between MESSAGES and EVENTS, won't we keep > > resetting the counter on each state transition and never hit the limit? > > It should. Once the limit is reached it clears the bit in msg_flags. > That should prevent the messages or events from being re-requested > until the next flags fetch. So if a continuous stream of messages > and events was coming in, it should fetch 10 messages, clear the bit, > then fetch 10 events, then clear that bit, then go back to normal > operation. > > Of course, the next flag fetch will start the process over. Actually, probing deeper, this probably won't work. And I'm not sure there is much I can do to fix it. It will be much harder. But it depends on how the BMC handles this. If there is something in the event queue or receive message queue, the BMC is supposed to set an attention bit (ATTN) on the interface If ATTN is set, the driver is supposed to fetch flags to know what it needs to do. I haven't tried, but in the qemu changes below, I'm fairly sure the ATTN bit will never get cleared, thus when it goes through all this the KCS state machine will return SI_SM_ATTN at the end and the flag fetch will start again. You are still wedged. The qemu version also runs with interrupts by default, which only magnifies the problem. In that case, when ATTN is set and you aren't running transactions, the interrupt is enabled. On KCS there is no way in the KCS interface to disable the interrupt at the interface, you have to send it a message or disable it with disable_irq(). But the actual failing BMC may not work this way. It may or may not clear the Event Message Buffer Full flag. It may or may not do anything with ATTN. A driver can only do so much to account for broken hardware. The driver is already too complex, a lot of it due to having to handle broken hardware. Fixing this adds more complexity and penalizes hardware that works properly. Anyway, I'm going to need to get this failing in simulation and figure out how to handle this. Yet more issues may come up, especially with interrupts. Is there any way you can just get the hardware fixed? It's never going to work very well as it is. I'd be inclined to just denylist it. -corey > > > > > I was able to cook up a simple repro in Qemu for this class of bug. > > I was thinking about how to do an automated test for this. I use an > external simulated BMC for the automated tests I have. So I'll work in > that direction. > > But thanks, this should help me develop that test. > > -corey > > > > > ---->8---- > > > > diff --git a/hw/ipmi/ipmi_bmc_sim.c b/hw/ipmi/ipmi_bmc_sim.c > > index fd875491f5..127db30c03 100644 > > --- a/hw/ipmi/ipmi_bmc_sim.c > > +++ b/hw/ipmi/ipmi_bmc_sim.c > > @@ -249,6 +249,21 @@ struct IPMIBmcSim { > > uint8_t evtbuf[16]; > > > > QTAILQ_HEAD(, IPMIRcvBufEntry) rcvbufs; > > + > > + /* > > + * Fault injection: sticky EVENT_MSG_BUFFER_FULL. > > + * > > + * Simulates a BMC that continuously generates events (e.g. after a > > + * cold reset causes sensor state changes). Once armed, every > > + * READ_EVENT_MSG_BUFFER returns success but never clears the > > + * EVT_BUF_FULL flag, starving waiting_msg in the SI layer's > > + * handle_flags() loop. Reproduces the 517m277 / KRN-1233 wedge. > > + */ > > + bool fi_sticky_events; /* enable via property */ > > + uint32_t fi_evt_arm_after; /* arm after N completed responses */ > > + uint32_t fi_evt_rsp_count; /* responses completed so far */ > > + bool fi_evt_armed; /* fault is active */ > > + uint32_t fi_evt_read_count; /* READ_EVENT_MSG_BUFFER calls served */ > > }; > > > > #define IPMI_BMC_MSG_FLAG_WATCHDOG_TIMEOUT_MASK (1 << 3) > > @@ -494,7 +509,7 @@ static int sel_add_event(IPMIBmcSim *ibs, uint8_t *event) > > static int attn_set(IPMIBmcSim *ibs) > > { > > return IPMI_BMC_MSG_FLAG_RCV_MSG_QUEUE_SET(ibs) > > - || IPMI_BMC_MSG_FLAG_EVT_BUF_FULL_SET(ibs) > > + || (!ibs->fi_evt_armed && IPMI_BMC_MSG_FLAG_EVT_BUF_FULL_SET(ibs)) > > || IPMI_BMC_MSG_FLAG_WATCHDOG_TIMEOUT_MASK_SET(ibs); > > } > > > > @@ -750,6 +765,48 @@ static void ipmi_sim_handle_command(IPMIBmc *b, > > out: > > k->handle_rsp(s, msg_id, rsp.buffer, rsp.len); > > > > + /* > > + * Fault injection: count completed responses and arm sticky > > + * EVENT_MSG_BUFFER_FULL after the configured threshold. > > + */ > > + if (ibs->fi_sticky_events && !ibs->fi_evt_armed) { > > + ibs->fi_evt_rsp_count++; > > + if (ibs->fi_evt_rsp_count >= ibs->fi_evt_arm_after) { > > + ibs->fi_evt_armed = true; > > + ibs->fi_evt_read_count = 0; > > + > > + /* > > + * Seed the event buffer with a synthetic sensor event > > + * (sensor type 0x01 = Temperature, event type 0x01 = > > + * threshold, evd1 = upper critical going high). > > + * This matches what real BMCs generate after a cold reset. > > + */ > > + memset(ibs->evtbuf, 0, 16); > > + ibs->evtbuf[2] = 0x02; /* System event record */ > > + ibs->evtbuf[7] = ibs->parent.slave_addr; > > + ibs->evtbuf[9] = 0x04; /* Format version */ > > + ibs->evtbuf[10] = 0x01; /* Sensor type: Temperature */ > > + ibs->evtbuf[11] = 0x01; /* Sensor number */ > > + ibs->evtbuf[12] = 0x01; /* Event type: threshold */ > > + ibs->evtbuf[13] = 0x09; /* Upper critical going high */ > > + ibs->evtbuf[14] = 0x57; /* Threshold value */ > > + ibs->evtbuf[15] = 0x00; /* Sequence (incremented on reads) */ > > + > > + ibs->msg_flags |= IPMI_BMC_MSG_FLAG_EVT_BUF_FULL; > > + > > + /* Ensure event message buffer is enabled in global enables > > + * so the kernel sees it. Also enable event logging. > > + */ > > + ibs->bmc_global_enables |= (1 << IPMI_BMC_EVENT_MSG_BUF_BIT) > > + | (1 << IPMI_BMC_EVENT_LOG_BIT); > > + > > + k->set_atn(s, 1, attn_irq_enabled(ibs)); > > + > > + fprintf(stderr, "[FI] sticky-events armed after %u responses\n", > > + ibs->fi_evt_rsp_count); > > + } > > + } > > + > > next_timeout(ibs); > > } > > > > @@ -1013,8 +1070,14 @@ static void clr_msg_flags(IPMIBmcSim *ibs, > > { > > IPMIInterface *s = ibs->parent.intf; > > IPMIInterfaceClass *k = IPMI_INTERFACE_GET_CLASS(s); > > + uint8_t clear_mask = cmd[2]; > > + > > + if (ibs->fi_evt_armed) { > > + /* Don't allow clearing EVT_BUF_FULL when sticky events active */ > > + clear_mask &= ~IPMI_BMC_MSG_FLAG_EVT_BUF_FULL; > > + } > > > > - ibs->msg_flags &= ~cmd[2]; > > + ibs->msg_flags &= ~clear_mask; > > k->set_atn(s, attn_set(ibs), attn_irq_enabled(ibs)); > > } > > > > @@ -1040,7 +1103,19 @@ static void read_evt_msg_buf(IPMIBmcSim *ibs, > > for (i = 0; i < 16; i++) { > > rsp_buffer_push(rsp, ibs->evtbuf[i]); > > } > > - ibs->msg_flags &= ~IPMI_BMC_MSG_FLAG_EVT_BUF_FULL; > > + > > + if (ibs->fi_evt_armed) { > > + /* > > + * Sticky mode: return success but keep EVT_BUF_FULL set. > > + * Vary the event data slightly so the kernel doesn't > > + * de-duplicate (increment evd3 as a sequence number). > > + */ > > + ibs->fi_evt_read_count++; > > + ibs->evtbuf[15] = (uint8_t)(ibs->fi_evt_read_count & 0xff); > > + /* Don't clear the flag — starvation continues */ > > + } else { > > + ibs->msg_flags &= ~IPMI_BMC_MSG_FLAG_EVT_BUF_FULL; > > + } > > k->set_atn(s, attn_set(ibs), attn_irq_enabled(ibs)); > > } > > > > @@ -2670,6 +2745,8 @@ static const Property ipmi_sim_properties[] = { > > DEFINE_PROP_STRING("lan.netmask", IPMIBmcSim, lan.arg_netmask), > > DEFINE_PROP_STRING("lan.defgw_ipaddr", IPMIBmcSim, lan.arg_defgw_ipaddr), > > DEFINE_PROP_MACADDR("lan.defgw_macaddr", IPMIBmcSim, lan.defgw_macaddr), > > + DEFINE_PROP_BOOL("fi_sticky_events", IPMIBmcSim, fi_sticky_events, false), > > + DEFINE_PROP_UINT32("fi_evt_arm_after", IPMIBmcSim, fi_evt_arm_after, 40), > > }; > > > > static void ipmi_sim_class_init(ObjectClass *oc, const void *data)