Date: Thu, 7 May 2026 06:11:53 -0700
From: Breno Leitao
To: Jiri Slaby
Cc: Tejun Heo, Lai Jiangshan, Andrew Morton, linux-kernel@vger.kernel.org,
    Omar Sandoval, Song Liu, Danielle Costantino, kasan-dev@googlegroups.com,
    Petr Mladek, kernel-team@meta.com
Subject: Re: [PATCH v2 4/5] workqueue: Show all busy workers in stall diagnostics
References: <20260305-wqstall_start-at-v2-0-b60863ee0899@debian.org>
    <20260305-wqstall_start-at-v2-4-b60863ee0899@debian.org>
    <4ed2b000-2306-49fe-87b8-8bfd7f6b6d43@kernel.org>
In-Reply-To: <4ed2b000-2306-49fe-87b8-8bfd7f6b6d43@kernel.org>

Hi Jiri,

On Thu, May 07, 2026 at 12:20:33PM +0200, Jiri Slaby wrote:
> On 05. 03. 26, 17:15, Breno Leitao wrote:
>
> BUG: workqueue lockup - pool cpus=144 node=0 flags=0x4 nice=0 stuck for
> 168224s!

That's an extremely long stall (~1.95 days).

> ...
> Showing busy workqueues and worker pools:
> workqueue rcu_gp: flags=0x108
>   pwq 578: cpus=144 node=0 flags=0x4 nice=0 active=3 refcnt=4
> in:
> https://bugzilla.suse.com/show_bug.cgi?id=1263947
> ?
>
> Can this (or other patch from the series) cause this?
> Should there be something like cpu_online() instead of task_is_running()
> somewhere?

This series only affects stall reporting, not detection. The changes run
after the watchdog has identified a stall, so the detection logic itself
remains unchanged.

To help diagnose this issue, could you provide some additional information:

1) Was CPU 144 online at any point? If so, when was it taken offline?

2) Does this message appear repeatedly? If you bring CPU 144 online, does
   the issue resolve?

3) Have you run similar tests on earlier kernel versions without seeing this
   behavior, or is this a clear regression?

Thanks for the report,
--breno
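
P.S. For questions 1 and 2, here is a minimal userspace sketch for checking
whether CPU 144 is currently online. It assumes the standard sysfs file
/sys/devices/system/cpu/online is readable on the affected machine. The helper
names (parse_cpu_list, cpu_is_online, stall_days) are hypothetical and not
part of the series. It also redoes the seconds-to-days conversion for the
reported stall.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel cpulist string such as "0-3,8,10-11" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty input or stray comma
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpu_is_online(cpu, online_path="/sys/devices/system/cpu/online"):
    """Return True if `cpu` is listed in the sysfs online mask (path is an assumption)."""
    return cpu in parse_cpu_list(Path(online_path).read_text())


def stall_days(seconds):
    """Convert a stall duration in seconds to days."""
    return seconds / 86400


if __name__ == "__main__":
    print(f"stall: {stall_days(168224):.2f} days")
    try:
        print("CPU 144 online:", cpu_is_online(144))
    except OSError as exc:
        print("could not read sysfs:", exc)
```

This only reports the current state. It cannot tell when the CPU went
offline, so the hotplug messages in dmesg would still be needed for that.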