From: Song Liu
To: linux-kernel@vger.kernel.org
Cc: tj@kernel.org, jiangshanlai@gmail.com, leitao@debian.org,
	pmladek@suse.com, kernel-team@meta.com, puranjay@kernel.org,
	Song Liu
Subject: [PATCH] workqueue: Fix false positive stall reports
Date: Fri, 20 Mar 2026 12:23:32 -0700
Message-ID: <20260320192332.1726079-1-song@kernel.org>
X-Mailer: git-send-email 2.52.0
X-Mailing-List: linux-kernel@vger.kernel.org

On weakly ordered architectures (e.g., arm64), the lockless check in
wq_watchdog_timer_fn() can observe the worklist insertion and the
last_progress_ts update out of order. Specifically, the watchdog can
see a non-empty worklist (from a list_add) while still reading a stale
last_progress_ts value, which causes a false positive stall report.

This was confirmed by reading pool->last_progress_ts again after
taking pool->lock in wq_watchdog_timer_fn():

  workqueue watchdog: pool 7 false positive detected!
    lockless_ts=4784580465 locked_ts=4785033728
    diff=453263ms worklist_empty=0

To avoid slowing down the hot path (queue_work, etc.), keep the
lockless check and recheck last_progress_ts with pool->lock held only
when that check indicates a stall. This eliminates the false positive
with minimal overhead.

While we are at it, remove two extra empty lines in
wq_watchdog_timer_fn().

Assisted-by: claude-code:claude-opus-4-6
Signed-off-by: Song Liu
---
 kernel/workqueue.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b77119d71641..5b501ff1223a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7701,6 +7701,23 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 
 		/* did we stall? */
 		if (time_after(now, ts + thresh)) {
+			unsigned long irq_flags;
+
+			raw_spin_lock_irqsave(&pool->lock, irq_flags);
+			/*
+			 * Recheck last_progress_ts with pool->lock, this
+			 * eliminates false positive where we report wq
+			 * stall for newly queued work.
+			 */
+			pool_ts = READ_ONCE(pool->last_progress_ts);
+			if (time_after(pool_ts, touched))
+				ts = pool_ts;
+			else
+				ts = touched;
+			raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
+			if (!time_after(now, ts + thresh))
+				continue;
+
 			lockup_detected = true;
 			stall_time = jiffies_to_msecs(now - pool_ts) / 1000;
 			max_stall_time = max(max_stall_time, stall_time);
@@ -7712,8 +7729,6 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 			pr_cont_pool_info(pool);
 			pr_cont(" stuck for %us!\n", stall_time);
 		}
-
-
 	}
 
 	if (lockup_detected)
-- 
2.52.0
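
For readers who want to experiment outside the kernel, the recheck
described above can be sketched in userspace C. This is only an
illustration of the pattern (cheap lockless check first, then a
confirming read under the lock), not the kernel code itself. All names
here (struct demo_pool, ts_after(), demo_pool_init(),
demo_pool_progress(), demo_pool_stalled()) are hypothetical, a pthread
mutex stands in for pool->lock, and the sketch assumes the writer
updates the timestamp while holding that same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Wraparound-safe "a is after b", modeled on the kernel's time_after(). */
static bool ts_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical stand-in for a worker pool; not the kernel's struct. */
struct demo_pool {
	pthread_mutex_t lock;
	_Atomic unsigned long last_progress_ts;
};

static void demo_pool_init(struct demo_pool *pool, unsigned long ts)
{
	int rc = pthread_mutex_init(&pool->lock, NULL);

	assert(rc == 0);
	(void)rc;
	atomic_init(&pool->last_progress_ts, ts);
}

/* Writer side: assumed to record progress while holding the lock. */
static void demo_pool_progress(struct demo_pool *pool, unsigned long now)
{
	pthread_mutex_lock(&pool->lock);
	atomic_store_explicit(&pool->last_progress_ts, now,
			      memory_order_relaxed);
	pthread_mutex_unlock(&pool->lock);
}

/* Pick the later of the pool timestamp and the "touched" timestamp. */
static unsigned long later_ts(unsigned long pool_ts, unsigned long touched)
{
	return ts_after(pool_ts, touched) ? pool_ts : touched;
}

/* Watchdog side: lockless check first, locked recheck only on a hit. */
static bool demo_pool_stalled(struct demo_pool *pool, unsigned long now,
			      unsigned long touched, unsigned long thresh)
{
	unsigned long pool_ts;

	/* Fast path: this relaxed read may return a stale timestamp. */
	pool_ts = atomic_load_explicit(&pool->last_progress_ts,
				       memory_order_relaxed);
	if (!ts_after(now, later_ts(pool_ts, touched) + thresh))
		return false;

	/*
	 * Slow path: taking the lock orders this read after any writer
	 * critical section that has already completed.
	 */
	pthread_mutex_lock(&pool->lock);
	pool_ts = atomic_load_explicit(&pool->last_progress_ts,
				       memory_order_relaxed);
	pthread_mutex_unlock(&pool->lock);

	return ts_after(now, later_ts(pool_ts, touched) + thresh);
}
```

The lock is only taken when the lockless check already suggests a
stall, so the common no-stall case costs a single relaxed load. A
caller would record progress with demo_pool_progress() and poll
demo_pool_stalled() from a timer thread.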