Date: Tue, 16 Jun 2026 14:38:11 +0200
From: Petr Mladek
To: Breno Leitao
Cc: Tejun Heo, Lai Jiangshan, Andrew Morton, linux-kernel@vger.kernel.org, Omar Sandoval, Song Liu, Danielle Costantino, kasan-dev@googlegroups.com, kernel-team@meta.com
Subject: Re: [PATCH v2 4/5] workqueue: Show all busy workers in stall diagnostics
References: <20260305-wqstall_start-at-v2-0-b60863ee0899@debian.org> <20260305-wqstall_start-at-v2-4-b60863ee0899@debian.org>

On Thu 2026-06-11 07:50:04, Breno Leitao wrote:
> On Fri, Mar 20, 2026 at 03:41:13AM -0700, Breno Leitao wrote:
> > On Wed, Mar 18, 2026 at 04:11:54PM +0100, Petr Mladek wrote:
> > > On Wed 2026-03-18 04:31:08, Breno Leitao wrote:
> > > Otherwise, I like this patch.
> > >
> > > I still think what might be the reason that there is no worker
> > > in the running state. Let's see if this patch brings some useful info.
> > >
> > > One more idea. It might be useful to store a timestamp when the last
> > > worker was woken. And then print either the timestamp or delta.
> > > It would help to make sure that kick_pool() was really called
> > > during the reported stall.
> >
> > Ack, this is the following patch I will deploy in production, let's see
> > how useful it is.
>
> I got this running in production (backported to 6.16), and we finally got the culprit.
>
> 05:42:00 BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 115s!
> NMI backtrace for cpu 2
> CPU: 2 UID: 0 PID: 411 Comm: kworker/u288:2 Tainted: G O 6.16.1-0_fbk4_0_gb849430a436c #1 NONE
> Tainted: [O]=OOT_MODULE
> Hardware name:
> Workqueue: efi_rts_wq efi_call_rts
> pstate: 23401009 (nzCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
> pc : 0x4052f10900
> lr : 0x4052f10e94
> sp : ffff800088cefc90
> x29: ffff800088cefc90 x28: 0000000048524641 x27: 0000004052b60000
> x26: 0000000000010058 x25: 0000004043ba0000 x24: 0000000001280000
> x23: 000000405a02807f x22: 0000000000010080 x21: 0000004053ac0097
> x20: 000000405a028080 x19: 0000004053ac0098 x18: 0000000000000000
> x17: 0000000000000030 x16: 0000004052eb6de0 x15: 0000004042ba0030
> x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000001
> x11: 0000000001d00d09 x10: 0000004042ba0028 x9 : ffff800088cefc90
> x8 : 0000000001d00cd9 x7 : 0000000000000000 x6 : 0000004043ba0000
> x5 : 0000004043bb0000 x4 : 0000004053ac0098 x3 : 000000405a028080
> x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffffffffffe1e8
> Call trace:
> 0x4052f10900 (P)
> 0x4052f10e94
> 0x4052b00ed0
> 0x4052b02e38
> 0x4052b0175c
> 0x4052b517b4
> 0x4052a70b84
> 0x4052cb11d4
> __efi_rt_asm_wrapper+0x50/0x78
> efi_call_rts+0x178/0x240
> process_scheduled_works+0x17c/0x420
> worker_thread+0x184/0x4d8
> kthread+0xcc/0x1f8
> ret_from_fork+0x10/0x20
> 05:42:30 BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 145s!
> NMI backtrace for cpu 2
> CPU: 2 UID: 0 PID: 411 Comm: kworker/u288:2 Tainted: G O 6.16.1-0_fbk4_0_gb849430a436c #1 NONE
> Tainted: [O]=OOT_MODULE
> Hardware name:
> Workqueue: efi_rts_wq efi_call_rts
> pstate: 63401009 (nZCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
> pc : 0x4052f11ecc
> lr : 0x4052f10b8c
> sp : ffff800088cefc30
> x29: ffff800088cefc40 x28: 0000000048524641 x27: 0000004052b60000
> x26: 0000000000010058 x25: 0000004043fb0000 x24: 0000000001690000
> x23: 0000004053ab0040 x22: 0000000000010080 x21: ffff800088cefd00
>
> rinse and repeat..
>
> Unfortunately I didn't get the other pr_info(), because of console settings,
> but, I can say the following from this issue and previous code:
>
> 1) in show_cpu_pool_hog, found_running variable is set to false.
> 2) hash_for_each() never found any running task
> 3) The following code was trigger and was very helpful:
>
> 	if (!found_running)
> 		trigger_single_cpu_backtrace(cpu);

Great. So, the extra complexity was worth it.

Should I clean it up and send a proper patch?
Or would you like to do so?

Also, I wonder whether it would make sense to revert commit
8823eaef45da7f ("workqueue: Show all busy workers in stall diagnostics").
If I understand correctly, printing all busy workers was not that helpful.
Namely, the sleeping workers should not prevent progress.

Best Regards,
Petr