Date: Wed, 15 Apr 2026 16:22:45 -0500
From: Frederick Lawler
To: Tony Camuso
Cc: corey@minyard.net, Matt Fleming,
 openipmi-developer@lists.sourceforge.net,
 linux-kernel@vger.kernel.org, kernel-team@cloudflare.com,
 Matt Fleming
Subject: Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id()
References: <20260415115930.3428942-1-matt@readmodwrite.com>
 <9b6af9ab-79f9-4f87-ab7c-8ad6efeb18ed@redhat.com>
In-Reply-To: <9b6af9ab-79f9-4f87-ab7c-8ad6efeb18ed@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Corey & Tony,

On Wed, Apr 15, 2026 at 11:46:27AM -0400, 'Tony Camuso' via kernel-team wrote:
> On Wed, Apr 15, 2026 at 12:59:30PM +0100, Matt Fleming wrote:
> > From: Matt Fleming
> >
> > When the BMC does not respond to a "Get Device ID" command, the
> > wait_event() in __get_device_id() blocks forever in
> > TASK_UNINTERRUPTIBLE while holding bmc->dyn_mutex. Every subsequent
> > sysfs reader then piles up in D state. Replace with
> > wait_event_timeout() to return -EIO after 1 second.
>
> On Wed, Apr 15, 2026 at 12:17:04PM, Corey Minyard wrote:
> > This is the second report I have of something like this. So
> > something is up. I'm adding Tony, who reported something like this
> > dealing with the watchdog.
> >
> > The lower level driver should never not return an answer, it is
> > supposed to guarantee that it returns an error if the BMC doesn't
> > respond. So the bug is not here, the bug is elsewhere.

This is a bit of a throwback to our previous discussions around [1]. I
did end up applying [2] based on that discussion, and had limited
success, but we still have external resets that cause us to enter this
undesirable state :(

[1]: https://lore.kernel.org/all/aJUMlAG17c6lCgFq@mail.minyard.net/
[2]: https://lore.kernel.org/all/20250807230648.1112569-2-corey@minyard.net/

>
> I've been tracking a related issue (RHEL customer case) where BMC
> reset while the IPMI watchdog is active causes D-state hangs. This
> appears to be the same root cause Matt is hitting.
>
> I backported the recent upstream KCS/SI fixes to a RHEL 9 test kernel
> (54 patches bringing it to mainline parity) and tested today on a
> Dell R640.

I assume this patch series: "ipmi:watchdog: Fix panic, D-state hang, and
lost protection on BMC reset" [3]?

[3]: https://lore.kernel.org/all/20260407175134.3367345-1-tcamuso@redhat.com/