Date: Fri, 17 Apr 2026 16:41:10 +0100
From: Matt Fleming
To: Corey Minyard
Cc: Tony Camuso, openipmi-developer@lists.sourceforge.net, linux-kernel@vger.kernel.org, kernel-team@cloudflare.com, Matt Fleming
Subject: Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id()
References: <20260415115930.3428942-1-matt@readmodwrite.com>

On Wed, Apr 15, 2026 at 07:16:53AM -0500, Corey Minyard wrote:
>
> I've seen this before in several scenarios, including a system that put
> IPMI in the ACPI tree and it sort of worked but there was no BMC
> present. I had to disable that particular device.
>
> What hardware is involved here?

I'm fairly sure we've seen this across a bunch of different BMCs, so
it's not a BMC hardware thing. Almost certainly a driver issue.

> Can you give a more detailed example of what's happening in the
> low-level hardware? If it's KCS there's a debug flag in the
> drivers/char/ipmi/ipmi_kcs_sm.c file that should help.

Yep, it's KCS. Unfortunately I haven't found a way to reproduce this
reliably yet.
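Since the interface is KCS, the .data bytes of the stuck message in the dump that follows are a raw IPMI request. The first byte packs the network function (NetFn) into its upper six bits and the LUN into its lower two, and the second byte is the command; that is how { 0x18, 0x01 } reads as NetFn 0x06 (App), cmd 0x01 (Get Device ID). Below is a minimal decoding sketch; the ex_* struct, functions, and constants are hypothetical names made up for illustration, not kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the dump's annotation: NetFn 0x06 (App), cmd 0x01. */
#define EX_NETFN_APP          0x06
#define EX_CMD_GET_DEVICE_ID  0x01

/* Hypothetical decoded view of an IPMI request header. */
struct ex_req_hdr {
    uint8_t netfn; /* network function: upper 6 bits of byte 0 */
    uint8_t lun;   /* logical unit number: lower 2 bits of byte 0 */
    uint8_t cmd;   /* command: byte 1 */
};

/* Build byte 0 of a request from a NetFn and a LUN. */
static uint8_t ex_encode_netfn_lun(uint8_t netfn, uint8_t lun)
{
    return (uint8_t)((netfn << 2) | (lun & 0x03));
}

/* Split the first two request bytes into NetFn, LUN and command. */
static struct ex_req_hdr ex_decode_req(const uint8_t *data, size_t len)
{
    struct ex_req_hdr h;

    assert(len >= 2); /* need the NetFn/LUN byte and the command byte */
    (void)len;        /* silence unused warning when NDEBUG is set */
    h.netfn = (uint8_t)(data[0] >> 2);
    h.lun   = (uint8_t)(data[0] & 0x03);
    h.cmd   = data[1];
    return h;
}

static bool ex_is_get_device_id(struct ex_req_hdr h)
{
    return h.netfn == EX_NETFN_APP && h.cmd == EX_CMD_GET_DEVICE_ID;
}
```

Decoding { 0x18, 0x01 } with ex_decode_req() yields NetFn 0x06, LUN 0, cmd 0x01, and encoding NetFn 0x06 with LUN 0 gives back 0x18.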
Looking with drgn at a machine where cat /sys/class/ipmi/.../firmware_revision
is wedged, I can see that there are 99,846 messages sitting on
intf->xmit_msgs, while the KCS SM is idle and still responding to internal
traffic like Get Global Enables and Get Msg Flags. So it looks like
completions are getting dropped.

We're running a 6.18.18 kernel, which includes c08ec55617cb ("ipmi: Fix
use-after-free and list corruption on sender error"), so it's not that.

Here's a dump of some of the relevant data structures:

    intf = 0xffff9d2f4a5a0000
      intf->curr_msg               = 0xffff9d34f21a9c00
      intf->xmit_msgs.next         = 0xffff9d30c28e3c80
      intf->waiting_rcv_msgs       = empty
      intf->maintenance_mode       = 0
      intf->maintenance_mode_state = 0
      intf->in_shutdown            = false
      intf->seq_table              = 0/64 slots used
      intf->smi_work.pending       = 0

The stuck message itself, intf->curr_msg:

    msg @ 0xffff9d34f21a9c00
      .data      = { 0x18, 0x01 }  # NetFn 0x06 (App), cmd 0x01 = Get Device ID
      .data_size = 2
      .rsp_size  = 38
      .rsp[0..7] = 2c 01 00 00 ...
      .done      = free_smi_msg
      .user_data = NULL
      .msgid     = (internal GDI poll)
      .type      = IPMI_SMI_MSG_TYPE_NORMAL

    smi_info = 0xffff9d2f4a010000
      smi_info->si_state               = SI_NORMAL (0)
      smi_info->curr_msg               = 0xffff9d2f48c7b800
      smi_info->waiting_msg            = NULL
      smi_info->interrupt_disabled     = false
      smi_info->supports_event_msg_buff = true
      smi_info->io.irq                 = 0
      smi_info->run_to_completion      = false
      smi_info->in_maintenance_mode    = 0

Let me know if you want any other info. I'll try to trace this and catch
it reproducing.
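The inference that dropped completions explain the backlog can be illustrated with a toy model: one message is in flight at a time (playing the role of intf->curr_msg), later sends are parked on a FIFO (playing the role of intf->xmit_msgs), and in this model a queued message is dispatched only when the in-flight one completes. If that completion never arrives, every later send piles up behind it. This is a simplified sketch under those assumptions; all model_* names are hypothetical and this is not the ipmi_msghandler code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fixed cap for the model only; a kernel list_head has no such cap. */
#define MODEL_QUEUE_CAP 128

struct model_intf {
    int curr;                   /* id of the in-flight message, -1 if none */
    int queue[MODEL_QUEUE_CAP]; /* ring buffer of pending message ids */
    size_t head;                /* index of the oldest pending id */
    size_t count;               /* number of pending ids */
};

static void model_init(struct model_intf *m)
{
    m->curr = -1;
    m->head = 0;
    m->count = 0;
}

/* Send: dispatch at once if nothing is in flight, else queue behind it. */
static bool model_send(struct model_intf *m, int id)
{
    if (m->curr < 0) {
        m->curr = id;
        return true;
    }
    if (m->count == MODEL_QUEUE_CAP)
        return false; /* model-only limit */
    m->queue[(m->head + m->count) % MODEL_QUEUE_CAP] = id;
    m->count++;
    return true;
}

/* Completion: the only point where the next pending message is dispatched. */
static void model_complete(struct model_intf *m)
{
    assert(m->curr >= 0); /* a completion needs an in-flight message */
    if (m->count > 0) {
        m->curr = m->queue[m->head];
        m->head = (m->head + 1) % MODEL_QUEUE_CAP;
        m->count--;
    } else {
        m->curr = -1;
    }
}
```

Sending 100 messages without ever completing the first leaves message 0 in flight and 99 parked on the queue; a single model_complete() call then dispatches message 1 and leaves 98 pending.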