Date: Fri, 17 Apr 2026 23:23:03 +0100
From: Matt Fleming
To: Corey Minyard
Cc: Tony Camuso, openipmi-developer@lists.sourceforge.net,
    linux-kernel@vger.kernel.org, kernel-team@cloudflare.com,
    Matt Fleming
Subject: Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id()
References: <20260415115930.3428942-1-matt@readmodwrite.com>

On Wed, Apr 15, 2026 at 07:16:53AM -0500, Corey Minyard wrote:
>
> The lower level driver should never not return an answer, it is supposed
> to guarantee that it returns an error if the BMC doesn't respond.
>
> So the bug is not here, the bug is elsewhere. My guess is that there
> is some new failure mode where a BMC is not working but it responds well
> enough that it sort of works and fools the driver. But that's only a
> guess.

I can now reproduce this pretty reliably by running concurrent ipmitool
commands (sensor/sel/mc info) + sysfs readers + periodic ipmitool mc
reset cold. It wedges in a few minutes.

My working theory is handle_flags() in ipmi_si_intf.c can loop on
flag-driven commands (e.g. READ_EVENT_MSG_BUFFER) without ever calling
start_next_msg(), starving waiting_msg indefinitely.
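
To make the suspected starvation concrete, here is a toy userspace
model of the theory. It is only a sketch and not the code in
ipmi_si_intf.c: every toy_* name, the struct layout, and the
bmc_keeps_flag_set knob are assumptions made up for illustration. It
shows a handler that returns early whenever the flag is set, so the
queued command is only promoted once the flag clears.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. Names echo the thread but this is NOT the driver. */
#define TOY_EVENT_MSG_BUFFER_FULL      0x02
#define TOY_CMD_READ_EVENT_MSG_BUFFER  0x35
#define TOY_CMD_GET_DEVICE_GUID        0x08

struct toy_smi {
    unsigned int msg_flags;   /* flags the simulated BMC last reported */
    int curr_cmd;             /* command in flight, -1 if none */
    int waiting_cmd;          /* queued command, -1 if none */
    bool bmc_keeps_flag_set;  /* hypothetical: BMC re-asserts the flag forever */
};

/* Promote the queued command to current. Returns true if one was queued. */
bool toy_start_next_msg(struct toy_smi *smi)
{
    if (smi->waiting_cmd < 0)
        return false;
    smi->curr_cmd = smi->waiting_cmd;
    smi->waiting_cmd = -1;
    return true;
}

/*
 * One completion step. While the flag is set, a flag-driven command is
 * issued and the function returns early, so toy_start_next_msg() is not
 * reached on that path.
 */
void toy_handle_flags(struct toy_smi *smi)
{
    if (smi->msg_flags & TOY_EVENT_MSG_BUFFER_FULL) {
        smi->curr_cmd = TOY_CMD_READ_EVENT_MSG_BUFFER;
        if (!smi->bmc_keeps_flag_set)
            smi->msg_flags &= ~TOY_EVENT_MSG_BUFFER_FULL;
        return;
    }
    toy_start_next_msg(smi);
}

/*
 * Run up to max_steps completion steps. Returns the step at which the
 * queued command was promoted, or -1 if it never was.
 */
int toy_steps_until_promoted(bool bmc_keeps_flag_set, int max_steps)
{
    struct toy_smi smi = {
        .msg_flags = TOY_EVENT_MSG_BUFFER_FULL,
        .curr_cmd = -1,
        .waiting_cmd = TOY_CMD_GET_DEVICE_GUID,
        .bmc_keeps_flag_set = bmc_keeps_flag_set,
    };

    assert(max_steps > 0);
    for (int step = 1; step <= max_steps; step++) {
        toy_handle_flags(&smi);
        if (smi.waiting_cmd < 0)
            return step;
    }
    return -1;
}
```

In this model, toy_steps_until_promoted(false, 1000) promotes the
queued command on the second step, while
toy_steps_until_promoted(true, 1000) never does. Whether the real
driver behaves this way is exactly the open question.
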

Captured state at wedge:

  si_state=SI_GETTING_EVENTS
  msg_flags=0x02
  si_curr cycling cmd=0x35 (READ_EVENT_MSG_BUFFER)
  si_wait frozen  cmd=0x08 (GET_DEVICE_GUID, never promoted)

The cold reset makes the BMC report EVENT_MSG_BUFFER_FULL during
re-init, which drives the flag loop.

Thanks,
Matt