From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-ot1-f44.google.com (mail-ot1-f44.google.com [209.85.210.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B05223AA1BD for ; Mon, 20 Apr 2026 18:11:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.44 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776708692; cv=none; b=YsINk0K7pk4I+rd4FB+074DLt8sWdQSfpEQ76upfhx+ns5XQYfhgn1J15laEg4OY/8L6Nh4nwY4F3keuOCqPyIp+uXqYdxTSxj8wWTB+fWPoBMea/w0988lC0kiaOMF0cSvcDfNXlkKE9L7O7+eCNDppkDoN5lNFFy/uT3R08EI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776708692; c=relaxed/simple; bh=NIS000XOSsNXrXTmY4myNeJ1PDYxrF+KcIxSD6aFGKQ=; h=Date:From:To:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=c4rXgUI5jA07o0GnhuGVJLIvVHJFUlV7Z/I4Hu9BpA2D+yhZmKGkeKSH2YqzJX754VuR8F4/YQ2Y0WNPoS9InlWHdqFbDzY9OT+zNrW9QRuVsxdJ8rbQnXlwQPcRhoJY6VQBgR16iKRSIQutFRndUQ7pXWFDUs5otFg9cnu5+Kw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=minyard.net; spf=pass smtp.mailfrom=minyard.net; dkim=pass (2048-bit key) header.d=minyard.net header.i=@minyard.net header.b=EBhfOoZh; arc=none smtp.client-ip=209.85.210.44 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=minyard.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=minyard.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=minyard.net header.i=@minyard.net header.b="EBhfOoZh" Received: by mail-ot1-f44.google.com with SMTP id 46e09a7af769-7dbe07d3ec3so1685769a34.0 for ; Mon, 20 Apr 2026 11:11:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=minyard.net; s=google; t=1776708689; x=1777313489; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:reply-to :message-id:subject:to:from:date:from:to:cc:subject:date:message-id :reply-to; bh=NZqklhuDF1k0Dbjjx4jNSxPWDEioxX8SCeAanRedit4=; b=EBhfOoZhG6nai+ZhM2Bj7n5ukxyiHxH6LMxgkkw7FKhkazbQtRggOOoeTHi2nP/AuA PKH0Km8lPObTiHBKO7mSLIbrdYcgahJEwGffOD1x9UHssmItbm2g2yrJ+uQzIiS3Jdei Z/AAYyHC5uMItbhc5LVcwdtphRQSDjl2rq3+2xImvDEOvzD9uKYeb4FR+DNPdnVu5HO8 MFAlFDRa0q/bwRhMsXflUyk8DLgbTszpdH0XbmPxZQx3A5Losgch2pMtOBv7xVkfXeWX lgxyVAZSs6u8/rxciFg/5wQPI5k5LPMSW3+IfF8w/bJfx48/CvjSEAv6xYWy0vsPjk5B vBHA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1776708689; x=1777313489; h=in-reply-to:content-disposition:mime-version:references:reply-to :message-id:subject:to:from:date:x-gm-gg:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=NZqklhuDF1k0Dbjjx4jNSxPWDEioxX8SCeAanRedit4=; b=QbZIWI/vGTAkwc3OEwLR2v0iGOccg7D/fzQJWemBtu3s2UE+uht+KrK6VBsOhmWtI5 F2IYcZMzmdb4Rr6ZQkHSQnPB1Ou1KBp4Vzjj7c2MkezeMI9wU/Sto23EexqV7ilnOmuF GUked3ncMtXpw5E8MYs19hHGvcxxKFOJESP3twD/HzIbSyaaTqoW7LeFTomj4ljduQTF OI1lRI/bggTjj7P+prBHDDAumGKwaYiTNbPGlxQzezGeaduJQ4QUV/x2rse3JD6E/iwa PnTc1ZB7qF4KU2dktJ27iifoMaAKCgt/+VqAHZprdJhWblgRnOP75Ade4Jqsu5nIGvZK qdBQ== X-Forwarded-Encrypted: i=1; AFNElJ//at7TtuttndeHw4BaegNm7JoQdCD5aFUp2VuGQJcVcJh63c4fWMXchABBnaUtOgxlAl+c4ahvBjAryAg=@vger.kernel.org X-Gm-Message-State: AOJu0YxnFws4YHpEQ8ga7A2hWbCLVicxyNlHjaR9iDCjpIibH5cbZe5W bpJ1uX8tRcelHs0sHnbmFNDR8CmghLjv/CavSegpoDEtlnrZ2fP4u56iUWC4z9ZTIAo= X-Gm-Gg: AeBDiev1e0LkIRWnTtcF8Q8HRQ2xUqKWMdzH0G8Rt8StCFZAqJY9loluOFzIkqzG0lr MvB3SM8kXUqRPkNb+Ym5HIKF7cqoublPHsEWIgGt8sW4jz7X6f/MFp4dndwhqpdeg7q8757MXEZ YFqHpCoYn+yhNReSvJ7H2X8qeSIr5Funey4B48i36ES2qAzWCX+fIVV8LJ68JinreZrrBulD5T/ VOHkoH1U1BWa4MWOL2+4t+H3Hq/ftpuTjA0Xnz74jkQHY+4MEyKeSdAiqMckrgJt3/AOrncHGv4 GB2cP/tk9QS7bZz3P0ArZ/EgZ+RxKzmU2fOzYNS/DEH/S7f7OVCqWnv2BRFoLR5q5ekyiyvE0r4 h25wcL63s3y9nTr2CVMMvHUB/6369xYOckL9lRXEO97bjbOkhJB0wy8c0tVDM2ZeYI5GgIrk8LO GbdoAyUzVh6vExPszAiJTsH4AoYntHvLUFumIZdXHwCGkl7I4ApTisYoHitkfkFTU2cw51chMCn vojIy2g5lveHlO+HnGHPlmKBA== X-Received: by 2002:a05:6830:3149:b0:7d9:f50f:968a with SMTP id 46e09a7af769-7dc951177a9mr8609038a34.5.1776708689404; Mon, 20 Apr 2026 11:11:29 -0700 (PDT) Received: from mail.minyard.net ([2001:470:b8f6:1b:d47a:597e:1b35:c35f]) by smtp.gmail.com with ESMTPSA id 46e09a7af769-7dcc712b4easm2167871a34.23.2026.04.20.11.11.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Apr 2026 11:11:28 -0700 (PDT) Date: Mon, 20 Apr 2026 13:11:25 -0500 From: Corey Minyard To: Matt Fleming , Tony Camuso , openipmi-developer@lists.sourceforge.net, linux-kernel@vger.kernel.org, kernel-team@cloudflare.com, Matt Fleming , Frederick Lawler Subject: Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id() Message-ID: Reply-To: corey@minyard.net References: <20260415115930.3428942-1-matt@readmodwrite.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: On Mon, Apr 20, 2026 at 11:33:01AM -0500, Corey Minyard wrote: > On Sun, Apr 19, 2026 at 09:50:38PM +0100, Matt Fleming wrote: > > On Fri, Apr 17, 2026 at 06:53:55PM -0500, Corey Minyard wrote: > > > > > > The EVENT_MSG_BUFFER_FULL flag only gets cleared when a unsuccessful > > > READ_EVENT_MSG_BUFFER command completes. Getting data from the > > > BMC has higher priority than sending data to the BMC. > > > > > > If the BMC continually reports success from READ_EVENT_MSG_BUFFER, then > > > that would certainly wedge the driver. But it would have to continually > > > report success for that command, which would be strange as its supposed > > > to error out when the queue is empty. > > > > That does indeed appear to be what's happening. > > > > The implementation of intel-ipmi-oem's OpenBMC READ_EVENT_MSG_BUFFER > > handler does not fail when there is nothing to read, > > > > https://github.com/openbmc/intel-ipmi-oem/blob/master/src/bridgingcommands.cpp#L704 > > Actually, that is so clearly wrong that it's hard to imagine they made > this mistake. Anyway, a defect needs to be filed against it. It will > certainly break other software on other OSes. > > I think I have an easy workaround, though I'm guessing. I'm guessing > they are returning zero data bytes. There's no check on the size at that > point in the code (it's later). > > Can you try the following patch? Actually ignore that, I was thinking too much about the other stuff and didn't actually read my patch. Here's a working patch: diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 4a9e9de4d684..08c208cc64c5 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -630,7 +630,13 @@ static void handle_transaction_done(struct smi_info *smi_info) */ msg = smi_info->curr_msg; smi_info->curr_msg = NULL; - if (msg->rsp[2] != 0) { + /* + * It appears some BMCs, with no event data, return no + * data in the message and not a 0x80 error as the + * spec says they should. Shut down processing if + * the data is not the right length. + */ + if (msg->rsp[2] != 0 || msg->rsp_size != 19) { /* Error getting event, probably done. */ msg->done(msg); > > diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c > index 4a9e9de4d684..cf8674a93af1 100644 > --- a/drivers/char/ipmi/ipmi_si_intf.c > +++ b/drivers/char/ipmi/ipmi_si_intf.c > @@ -630,7 +630,13 @@ static void handle_transaction_done(struct smi_info *smi_info) > */ > msg = smi_info->curr_msg; > smi_info->curr_msg = NULL; > - if (msg->rsp[2] != 0) { > + /* > + * It appears some BMCs, with no event data, return no > + * data in the message and not a 0x80 error as the > + * spec says they should. Shut down processing if > + * the data is not the right length. > + */ > + if (msg->rsp[2] != 0 || smi_info->curr_msg->rsp_size != 19) { > /* Error getting event, probably done. */ > msg->done(msg); > > > With your approval on that, I'll send it to Linus after letting it sit > in the next tree for a bit. Actually, I'll probably add it in any case, > as it's a good check to do. > > > > > > If it's really something like that, I could also look at adding limits > > > for those operations. > > > > That would be great. Me and Fred would be happy to test out any patch. > > > > I still think the original patch I sent is a worthwhile defense. > > > > Our periodic monitoring scripts cause TASK_UNINTERRUPTIBLE tasks to > > block behind one another when we hit these kinds of issues in the IPMI > > code. Untangling that across thousands of machines can be time > > consuming and a more explicit EIO or ETIMEDOUT would help with triage. > > Unfortunately, that might have other issues, similar to the ones the > people with the watchdog issue found. I'll look at it a bit, but those > sorts of things would have to be scattered all over the code, not just > in that one place. As you say, it would make debugging easier. > > I think adding a counter to the number of operations occuring from a > single flag fetch would be a way to avoid this issue. That's going to > take a little more time, but I'll definitely work on that. > > Thanks, > > -corey