From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 13 Mar 2026 17:27:40 +0100
From: Petr Mladek
To: Breno Leitao
Cc: Tejun Heo, Lai Jiangshan, Andrew Morton, linux-kernel@vger.kernel.org,
	Omar Sandoval, Song Liu, Danielle Costantino,
	kasan-dev@googlegroups.com, kernel-team@meta.com
Subject: Re: [PATCH v2 4/5] workqueue: Show all busy workers in stall diagnostics
References: <20260305-wqstall_start-at-v2-0-b60863ee0899@debian.org>
	<20260305-wqstall_start-at-v2-4-b60863ee0899@debian.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 2026-03-13 05:57:59, Breno Leitao wrote:
> On Thu, Mar 12, 2026 at 06:03:03PM +0100, Petr Mladek wrote:
> > On Thu 2026-03-05 08:15:40, Breno Leitao wrote:
> > > show_cpu_pool_hog() only prints workers whose task is currently running
> > > on the CPU (task_is_running()). This misses workers that are busy
> > > processing a work item but are sleeping or blocked — for example, a
> > > worker that clears PF_WQ_WORKER and enters wait_event_idle().
> >
> > IMHO, it is misleading. AFAIK, workers clear PF_WQ_WORKER flag only
> > when they are going to die. They never do so when going to sleep.
> >
> > > Such a
> > > worker still occupies a pool slot and prevents progress, yet produces
> > > an empty backtrace section in the watchdog output.
> >
> > > This is happening on real arm64 systems, where
> > > toggle_allocation_gate() IPIs every single CPU in the machine (which
> > > lacks NMI), causing workqueue stalls that show empty backtraces because
> > > toggle_allocation_gate() is sleeping in wait_event_idle().
> >
> > The wait_event_idle() called in toggle_allocation_gate() should not
> > cause a stall. The scheduler should call wq_worker_sleeping(tsk)
> > and wake up another idle worker. It should guarantee the progress.
> >
> > > Remove the task_is_running() filter so every in-flight worker in the
> > > pool's busy_hash is dumped. The busy_hash is protected by pool->lock,
> > > which is already held.
> >
> > As I explained in reply to the cover letter, sleeping workers should
> > not block forward progress. It seems that in this case, the system was
> > not able to wake up the other idle worker or it was the last idle
> > worker and was not able to fork a new one.
> >
> > IMHO, we should warn about this when there is no running worker.
> > It might be more useful than printing backtraces of the sleeping
> > workers because they likely did not cause the problem.
> >
> > I believe that the problem, in this particular situation, is that
> > the system can't schedule or fork new processes. It might help
> > to warn about it and maybe show backtrace of the currently
> > running process on the stalled CPU.
>
> Do you mean checking if pool->busy_hash is empty, and then warning?
>
> Commit fc36ad49ce7160907bcbe4f05c226595611ac293
> Author: Breno Leitao
> Date: Fri Mar 13 05:35:02 2026 -0700
>
>     workqueue: warn when stalled pool has no running workers
>
>     When the workqueue watchdog detects a pool stall and the pool's
>     busy_hash is empty (no workers executing any work item), print a
>     diagnostic warning with the pool state and trigger a backtrace of
>     the currently running task on the stalled CPU.
>
>     Signed-off-by: Breno Leitao
>     Suggested-by: Petr Mladek
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 6ee52ba9b14f7..d538067754123 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -7655,6 +7655,17 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
>
>  	raw_spin_lock_irqsave(&pool->lock, irq_flags);
>
> +	if (hash_empty(pool->busy_hash)) {

This would print it only when there is no in-flight work. But I think
that the problem is when there is no worker in the running state.
There should always be one to guarantee forward progress.

I took inspiration from your patch. This is what comes to my mind on
top of the current master (printing only running workers):

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index aeaec79bc09c..a044c7e42139 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7588,12 +7588,15 @@ static void show_cpu_pool_hog(struct worker_pool *pool)
 {
 	struct worker *worker;
 	unsigned long irq_flags;
+	bool found_running;
 	int bkt;

 	raw_spin_lock_irqsave(&pool->lock, irq_flags);

+	found_running = false;
 	hash_for_each(pool->busy_hash, bkt, worker, hentry) {
 		if (task_is_running(worker->task)) {
+			found_running = true;
 			/*
 			 * Defer printing to avoid deadlocks in console
 			 * drivers that queue work while holding locks
@@ -7609,6 +7612,19 @@ static void show_cpu_pool_hog(struct worker_pool *pool)
 	}

 	raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
+
+	if (!found_running) {
+		pr_info("pool %d: no worker in running state, cpu=%d is %s (nr_workers=%d nr_idle=%d)\n",
+			pool->id, pool->cpu,
+			idle_cpu(pool->cpu) ? "idle" : "busy",
+			pool->nr_workers, pool->nr_idle);
+		pr_info("The pool might have trouble waking up another idle worker.\n");
+		if (pool->manager) {
+			pr_info("Backtrace of the pool manager:\n");
+			sched_show_task(pool->manager->task);
+		}
+		trigger_single_cpu_backtrace(pool->cpu);
+	}
 }

 static void show_cpu_pools_hogs(void)

Warning: The code is not safe.
We would need to add some synchronization for the pool->manager
pointer. Even better might be to print the state and backtrace of the
process which was woken by kick_pool() when the last running worker
went to sleep.

Motivation: AFAIK, if there is pending work in a CPU-bound workqueue,
then at least one worker in the related worker pool should be in the
"task_is_running()" state to guarantee forward progress.

If we find a running worker then it is likely the culprit. Either it
has been running for too long, or it is the last idle worker and it
failed to create a new one.

If there is no worker in the running state then there is likely a
problem in the core workqueue code, or some work item shot the
workqueue in the foot. Either way, we might need to print many more
details to nail it down.

Best Regards,
Petr