Date: Fri, 19 Jun 2026 17:40:42 +0200
From: Petr Mladek
To: Breno Leitao
Cc: Tejun Heo, Lai Jiangshan, Song Liu,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH RFC 3/3] workqueue: dump the last woken worker for stalled pools
References: <20260616-wq_dump_petr-v1-0-b57473ca6d18@debian.org>
	<20260616-wq_dump_petr-v1-3-b57473ca6d18@debian.org>
X-Mailing-List: linux-kernel@vger.kernel.org
In-Reply-To: <20260616-wq_dump_petr-v1-3-b57473ca6d18@debian.org>

On Tue 2026-06-16 09:44:41, Breno Leitao wrote:
> To identify the task most likely responsible for a stall, add
> last_woken_worker (L: pool->lock) to worker_pool and record it in
> kick_pool() just before wake_up_process(). This captures the idle
> worker that was kicked to take over when the last running worker went to
> sleep; if the pool is now stuck with no running worker, that task is the
> prime suspect and its backtrace is dumped by show_pool_no_running_worker().
>
> Using struct worker * rather than struct task_struct * avoids any
> lifetime concern: workers are only destroyed via set_worker_dying()
> which requires pool->lock, and set_worker_dying() clears
> last_woken_worker when the dying worker matches.
> show_cpu_pool_busy_workers() holds pool->lock while calling
> sched_show_task(), so last_woken_worker is either NULL or points to a
> live worker with a valid task.
> More precisely, set_worker_dying() clears
> last_woken_worker before setting WORKER_DIE, so a non-NULL
> last_woken_worker means the kthread has not yet exited and worker->task
> is still alive.
>
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -226,6 +226,7 @@ struct worker_pool {
>  /* L: hash of busy workers */
>
>  struct worker *manager; /* L: purely informational */
> + struct worker *last_woken_worker; /* L: last worker woken by kick_pool() */

I thought more about it. The "last_woken_worker" and "manager" are related
and they might eventually duplicate the information.

If I get it correctly, kick_pool() wakes a worker when needed. The last
worker becomes a "manager" and tries to create a new worker.

IMHO, in most situations "manager" will have the same value as
"last_woken_worker". But it is not guaranteed because "pool->lock"
is not taken all the time.

There are two questions:

1. Do we need both values?

   IMHO, we do:

     + "last_woken_worker" is the last woken worker. It is supposed
       to guarantee forward progress. Its backtrace is interesting
       because it might never get scheduled.

     + "manager" is the last "idle" worker which is actively trying
       to create a new worker. It is supposed to guarantee forward
       progress too. IMHO, it usually will be the "last_woken_worker"
       but it is not guaranteed, as mentioned above.

2. Should we print the backtrace of both?

   Probably not both at the same time:

     + We should print "manager" when it is set. It is set when
       a new worker has to be created. And the "manager" is
       definitely responsible for the forward progress.

     + We should print "last_woken_worker" when "manager" is not set.
       It is the only clue. And it likely got stuck for some reason.

     + IMHO, "last_woken_worker" need not be printed when "manager"
       is set, even when it is a different worker. The "manager" is
       the worker that is really responsible. And "last_woken_worker"
       likely just started processing work items because it somehow
       raced with the manager.

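To make point 2 concrete, here is a rough, untested userspace sketch of
the selection rule. The struct layouts are simplified stand-ins for the
real ones in kernel/workqueue.c, the "name" field exists only for
illustration, and pick_stall_suspect() is a hypothetical helper name,
not an existing function:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real layouts. */
struct worker {
	const char *name;	/* hypothetical label, for illustration only */
};

struct worker_pool {
	struct worker *manager;			/* worker trying to create a new worker */
	struct worker *last_woken_worker;	/* last worker woken by kick_pool() */
};

/*
 * Hypothetical helper: choose whose backtrace to dump for a stalled pool.
 * Prefer "manager" when it is set, otherwise fall back to
 * "last_woken_worker". Returns NULL when neither is set.
 */
static struct worker *pick_stall_suspect(const struct worker_pool *pool)
{
	assert(pool != NULL);

	if (pool->manager)
		return pool->manager;

	return pool->last_woken_worker;
}
```

In the real code, I would expect this to run with pool->lock held, and
the task of the chosen worker to be passed to sched_show_task(). That
part is left out here on purpose.
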
Does this make sense, please?

We could also add the "manager" printing in a separate patch later.
This patch is a good step forward. Feel free to use:

Reviewed-by: Petr Mladek

Best Regards,
Petr