Date: Mon, 20 Apr 2026 11:33:01 -0500
From: Corey Minyard
To: Matt Fleming
Cc: Tony Camuso, openipmi-developer@lists.sourceforge.net, linux-kernel@vger.kernel.org, kernel-team@cloudflare.com, Matt Fleming, Frederick Lawler
Subject: Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id()
Reply-To: corey@minyard.net
References: <20260415115930.3428942-1-matt@readmodwrite.com>

On Sun, Apr 19, 2026 at 09:50:38PM +0100, Matt Fleming wrote:
> On Fri, Apr 17, 2026 at 06:53:55PM -0500, Corey Minyard wrote:
> >
> > The EVENT_MSG_BUFFER_FULL flag only gets cleared when an unsuccessful
> > READ_EVENT_MSG_BUFFER command completes. Getting data from the
> > BMC has higher priority than sending data to the BMC.
> >
> > If the BMC continually reports success from READ_EVENT_MSG_BUFFER, then
> > that would certainly wedge the driver. But it would have to continually
> > report success for that command, which would be strange, as it's supposed
> > to error out when the queue is empty.
>
> That does indeed appear to be what's happening.
>
> The implementation of intel-ipmi-oem's OpenBMC READ_EVENT_MSG_BUFFER
> handler does not fail when there is nothing to read,
>
> https://github.com/openbmc/intel-ipmi-oem/blob/master/src/bridgingcommands.cpp#L704

Actually, that is so clearly wrong that it's hard to imagine they made
this mistake. Anyway, a defect needs to be filed against it. It will
certainly break other software on other OSes.

I think I have an easy workaround, though it rests on a guess: I'm
guessing they are returning zero data bytes. There's no check on the
response size at that point in the code (the check comes later).

Can you try the following patch?

diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index 4a9e9de4d684..cf8674a93af1 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -630,7 +630,13 @@ static void handle_transaction_done(struct smi_info *smi_info)
 		 */
 		msg = smi_info->curr_msg;
 		smi_info->curr_msg = NULL;
-		if (msg->rsp[2] != 0) {
+		/*
+		 * It appears some BMCs, with no event data, return no
+		 * data in the message and not a 0x80 error as the
+		 * spec says they should. Shut down processing if
+		 * the data is not the right length.
+		 */
+		if (msg->rsp[2] != 0 || msg->rsp_size != 19) {
 			/* Error getting event, probably done. */
 			msg->done(msg);
 

Once you confirm that it works, I'll send it to Linus after letting it
sit in the next tree for a bit. Actually, I'll probably add it in any
case, as it's a good check to do.

> >
> > If it's really something like that, I could also look at adding limits
> > for those operations.
>
> That would be great. Fred and I would be happy to test any patch.
>
> I still think the original patch I sent is a worthwhile defense.
>
> Our periodic monitoring scripts cause TASK_UNINTERRUPTIBLE tasks to
> block behind one another when we hit these kinds of issues in the IPMI
> code. Untangling that across thousands of machines can be
> time-consuming, and a more explicit EIO or ETIMEDOUT would help with
> triage.

Unfortunately, that might cause other problems, similar to the ones
the people with the watchdog issue found. I'll look at it a bit, but
those sorts of checks would have to be scattered all over the code,
not just in that one place. As you say, though, it would make
debugging easier.

I think adding a counter that limits the number of operations
occurring from a single flag fetch would be a way to avoid this issue.
That's going to take a little more time, but I'll definitely work on
that.

Thanks,

-corey