Date: Wed, 15 Apr 2026 07:16:53 -0500
From: Corey Minyard
To: Matt Fleming, Tony Camuso
Cc: openipmi-developer@lists.sourceforge.net, linux-kernel@vger.kernel.org,
    kernel-team@cloudflare.com, Matt Fleming
Subject: Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id()
Reply-To: corey@minyard.net
References: <20260415115930.3428942-1-matt@readmodwrite.com>
In-Reply-To: <20260415115930.3428942-1-matt@readmodwrite.com>

On Wed, Apr 15, 2026 at 12:59:30PM +0100, Matt Fleming wrote:
> From: Matt Fleming
>
> When the BMC does not respond to a "Get Device ID" command, the
> wait_event() in __get_device_id() blocks forever in TASK_UNINTERRUPTIBLE
> while holding bmc->dyn_mutex. Every subsequent sysfs reader then piles
> up in D state. Replace with wait_event_timeout() to return -EIO after 1
> second.

This is the second report I have of something like this, so something
is up. I'm adding Tony, who reported something similar involving the
watchdog.

The lower-level driver should never fail to return an answer; it is
supposed to guarantee that it returns an error if the BMC doesn't
respond. So the bug is not here; it is elsewhere.
My guess is that there is some new failure mode where a BMC is not
working but responds well enough that it sort of works and fools the
driver. But that's only a guess. I've seen this before in several
scenarios, including a system that put IPMI in the ACPI tree where it
sort of worked even though no BMC was present. I had to disable that
particular device.

What hardware is involved here? Can you give a more detailed example
of what's happening in the low-level hardware? If it's KCS, there's a
debug flag in drivers/char/ipmi/ipmi_kcs_sm.c that should help.

Thanks,

-corey

>
> Signed-off-by: Matt Fleming
> ---
>  drivers/char/ipmi/ipmi_msghandler.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> index c41f51c82edd..efa9588e8210 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -2599,7 +2599,13 @@ static int __get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc)
>  	if (rv)
>  		goto out_reset_handler;
>  
> -	wait_event(intf->waitq, bmc->dyn_id_set != 2);
> +	if (!wait_event_timeout(intf->waitq, bmc->dyn_id_set != 2,
> +				msecs_to_jiffies(1000))) {
> +		dev_warn(intf->si_dev,
> +			 "Timed out waiting for get bmc device id response\n");
> +		rv = -EIO;
> +		goto out_reset_handler;
> +	}
>  
>  	if (!bmc->dyn_id_set) {
>  		if (bmc->cc != IPMI_CC_NO_ERROR &&
> -- 
> 2.43.0
>