[PATCH v2] workqueue: Fix false positive stall reports

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2] workqueue: Fix false positive stall reports
@ 2026-03-22  3:30 Song Liu
  2026-03-22  4:41 ` Tejun Heo
  2026-03-23 14:09 ` Petr Mladek
  0 siblings, 2 replies; 11+ messages in thread
From: Song Liu @ 2026-03-22  3:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: tj, jiangshanlai, leitao, pmladek, kernel-team, puranjay,
	Song Liu

On weakly ordered architectures (e.g., arm64), the lockless check in
wq_watchdog_timer_fn() can observe a reordering between the worklist
insertion and the last_progress_ts update. Specifically, the watchdog
can see a non-empty worklist (from a list_add) while reading a stale
last_progress_ts value, causing a false positive stall report.

This was confirmed by reading pool->last_progress_ts again after holding
pool->lock in wq_watchdog_timer_fn():

  workqueue watchdog: pool 7 false positive detected!
    lockless_ts=4784580465 locked_ts=4785033728
    diff=453263ms worklist_empty=0

To avoid slowing down the hot path (queue_work, etc.), recheck
last_progress_ts with pool->lock held. This will eliminate the false
positive with minimal overhead.

Remove two extra empty lines in wq_watchdog_timer_fn() as we are on it.

Assisted-by: claude-code:claude-opus-4-6
Signed-off-by: Song Liu <song@kernel.org>
---
v1 -> v2:
- Use scoped_guard() instead of manual raw_spin_lock/unlock (Tejun)
- Drop READ_ONCE() for pool->last_progress_ts under pool->lock (Tejun)
- Expand comment with reordering scenario and function names (Tejun)
---
 kernel/workqueue.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b77119d71641..ff97b705f25e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7699,8 +7699,28 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 		else
 			ts = touched;
 
-		/* did we stall? */
+		/*
+		 * Did we stall?
+		 *
+		 * Do a lockless check first. On weakly ordered
+		 * architectures, the lockless check can observe a
+		 * reordering between worklist insert_work() and
+		 * last_progress_ts update from __queue_work(). Since
+		 * __queue_work() is a much hotter path than the timer
+		 * function, we handle false positive here by reading
+		 * last_progress_ts again with pool->lock held.
+		 */
 		if (time_after(now, ts + thresh)) {
+			scoped_guard(raw_spinlock_irqsave, &pool->lock) {
+				pool_ts = pool->last_progress_ts;
+				if (time_after(pool_ts, touched))
+					ts = pool_ts;
+				else
+					ts = touched;
+			}
+			if (!time_after(now, ts + thresh))
+				continue;
+
 			lockup_detected = true;
 			stall_time = jiffies_to_msecs(now - pool_ts) / 1000;
 			max_stall_time = max(max_stall_time, stall_time);
@@ -7712,8 +7732,6 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 			pr_cont_pool_info(pool);
 			pr_cont(" stuck for %us!\n", stall_time);
 		}
-
-
 	}
 
 	if (lockup_detected)
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] workqueue: Fix false positive stall reports
  2026-03-22  3:30 [PATCH v2] workqueue: Fix false positive stall reports Song Liu
@ 2026-03-22  4:41 ` Tejun Heo
  2026-03-23 14:09 ` Petr Mladek
  1 sibling, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2026-03-22  4:41 UTC (permalink / raw)
  To: Song Liu, linux-kernel
  Cc: jiangshanlai, leitao, pmladek, kernel-team, puranjay

Hello,

Applied to wq/for-7.0-fixes with the following Fixes and Cc tags added:

  Fixes: 82607adcf9cd ("workqueue: implement lockup detector")
  Cc: stable@vger.kernel.org # v4.5+

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] workqueue: Fix false positive stall reports
  2026-03-22  3:30 [PATCH v2] workqueue: Fix false positive stall reports Song Liu
  2026-03-22  4:41 ` Tejun Heo
@ 2026-03-23 14:09 ` Petr Mladek
  2026-03-23 18:31   ` Song Liu
  1 sibling, 1 reply; 11+ messages in thread
From: Petr Mladek @ 2026-03-23 14:09 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-kernel, tj, jiangshanlai, leitao, kernel-team, puranjay

On Sat 2026-03-21 20:30:45, Song Liu wrote:
> On weakly ordered architectures (e.g., arm64), the lockless check in
> wq_watchdog_timer_fn() can observe a reordering between the worklist
> insertion and the last_progress_ts update. Specifically, the watchdog
> can see a non-empty worklist (from a list_add) while reading a stale
> last_progress_ts value, causing a false positive stall report.
> 
> This was confirmed by reading pool->last_progress_ts again after holding
> pool->lock in wq_watchdog_timer_fn():
> 
>   workqueue watchdog: pool 7 false positive detected!
>     lockless_ts=4784580465 locked_ts=4785033728
>     diff=453263ms worklist_empty=0
> 
> To avoid slowing down the hot path (queue_work, etc.), recheck
> last_progress_ts with pool->lock held. This will eliminate the false
> positive with minimal overhead.
> 
> Remove two extra empty lines in wq_watchdog_timer_fn() as we are on it.
> 
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -7699,8 +7699,28 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
>  		else
>  			ts = touched;
>  
> -		/* did we stall? */
> +		/*
> +		 * Did we stall?
> +		 *
> +		 * Do a lockless check first. On weakly ordered
> +		 * architectures, the lockless check can observe a
> +		 * reordering between worklist insert_work() and
> +		 * last_progress_ts update from __queue_work(). Since
> +		 * __queue_work() is a much hotter path than the timer
> +		 * function, we handle false positive here by reading
> +		 * last_progress_ts again with pool->lock held.
> +		 */
>  		if (time_after(now, ts + thresh)) {
> +			scoped_guard(raw_spinlock_irqsave, &pool->lock) {
> +				pool_ts = pool->last_progress_ts;
> +				if (time_after(pool_ts, touched))
> +					ts = pool_ts;
> +				else
> +					ts = touched;
> +			}
> +			if (!time_after(now, ts + thresh))
> +				continue;

The new code is pretty hairy. It might make sense to take the lock
around the original check and keep it as is.

IMHO, if a contention on a pool->lock has ever been a problem than maybe
the non-trivial workqueue API was not a good choice for the affected
use-case. Or am I wrong?

Best Regards,
Petr

> +
>  			lockup_detected = true;
>  			stall_time = jiffies_to_msecs(now - pool_ts) / 1000;
>  			max_stall_time = max(max_stall_time, stall_time);


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] workqueue: Fix false positive stall reports
  2026-03-23 14:09 ` Petr Mladek
@ 2026-03-23 18:31   ` Song Liu
  2026-03-24 10:01     ` Petr Mladek
  0 siblings, 1 reply; 11+ messages in thread
From: Song Liu @ 2026-03-23 18:31 UTC (permalink / raw)
  To: Petr Mladek; +Cc: linux-kernel, tj, jiangshanlai, leitao, kernel-team, puranjay

On Mon, Mar 23, 2026 at 7:09 AM Petr Mladek <pmladek@suse.com> wrote:
>
> On Sat 2026-03-21 20:30:45, Song Liu wrote:
> > On weakly ordered architectures (e.g., arm64), the lockless check in
> > wq_watchdog_timer_fn() can observe a reordering between the worklist
> > insertion and the last_progress_ts update. Specifically, the watchdog
> > can see a non-empty worklist (from a list_add) while reading a stale
> > last_progress_ts value, causing a false positive stall report.
> >
> > This was confirmed by reading pool->last_progress_ts again after holding
> > pool->lock in wq_watchdog_timer_fn():
> >
> >   workqueue watchdog: pool 7 false positive detected!
> >     lockless_ts=4784580465 locked_ts=4785033728
> >     diff=453263ms worklist_empty=0
> >
> > To avoid slowing down the hot path (queue_work, etc.), recheck
> > last_progress_ts with pool->lock held. This will eliminate the false
> > positive with minimal overhead.
> >
> > Remove two extra empty lines in wq_watchdog_timer_fn() as we are on it.
> >
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -7699,8 +7699,28 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
> >               else
> >                       ts = touched;
> >
> > -             /* did we stall? */
> > +             /*
> > +              * Did we stall?
> > +              *
> > +              * Do a lockless check first. On weakly ordered
> > +              * architectures, the lockless check can observe a
> > +              * reordering between worklist insert_work() and
> > +              * last_progress_ts update from __queue_work(). Since
> > +              * __queue_work() is a much hotter path than the timer
> > +              * function, we handle false positive here by reading
> > +              * last_progress_ts again with pool->lock held.
> > +              */
> >               if (time_after(now, ts + thresh)) {
> > +                     scoped_guard(raw_spinlock_irqsave, &pool->lock) {
> > +                             pool_ts = pool->last_progress_ts;
> > +                             if (time_after(pool_ts, touched))
> > +                                     ts = pool_ts;
> > +                             else
> > +                                     ts = touched;
> > +                     }
> > +                     if (!time_after(now, ts + thresh))
> > +                             continue;
>
> The new code is pretty hairy. It might make sense to take the lock
> around the original check and keep it as is.
>
> IMHO, if a contention on a pool->lock has ever been a problem than maybe
> the non-trivial workqueue API was not a good choice for the affected
> use-case. Or am I wrong?

Given the false positive is really rare (a handful of them per day among a
fleet of thousands of hosts), I think taking hundreds of pool->lock per 30
second seems not necessary.

Thanks,
Song

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] workqueue: Fix false positive stall reports
  2026-03-23 18:31   ` Song Liu
@ 2026-03-24 10:01     ` Petr Mladek
  2026-03-24 14:15       ` Tejun Heo
  2026-03-24 18:22       ` Song Liu
  0 siblings, 2 replies; 11+ messages in thread
From: Petr Mladek @ 2026-03-24 10:01 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-kernel, tj, jiangshanlai, leitao, kernel-team, puranjay

On Mon 2026-03-23 11:31:14, Song Liu wrote:
> On Mon, Mar 23, 2026 at 7:09 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Sat 2026-03-21 20:30:45, Song Liu wrote:
> > > On weakly ordered architectures (e.g., arm64), the lockless check in
> > > wq_watchdog_timer_fn() can observe a reordering between the worklist
> > > insertion and the last_progress_ts update. Specifically, the watchdog
> > > can see a non-empty worklist (from a list_add) while reading a stale
> > > last_progress_ts value, causing a false positive stall report.
> > >
> > > This was confirmed by reading pool->last_progress_ts again after holding
> > > pool->lock in wq_watchdog_timer_fn():
> > >
> > >   workqueue watchdog: pool 7 false positive detected!
> > >     lockless_ts=4784580465 locked_ts=4785033728
> > >     diff=453263ms worklist_empty=0
> > >
> > > To avoid slowing down the hot path (queue_work, etc.), recheck
> > > last_progress_ts with pool->lock held. This will eliminate the false
> > > positive with minimal overhead.
> > >
> > > Remove two extra empty lines in wq_watchdog_timer_fn() as we are on it.
> > >
> > > --- a/kernel/workqueue.c
> > > +++ b/kernel/workqueue.c
> > > @@ -7699,8 +7699,28 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
> > >               else
> > >                       ts = touched;
> > >
> > > -             /* did we stall? */
> > > +             /*
> > > +              * Did we stall?
> > > +              *
> > > +              * Do a lockless check first. On weakly ordered
> > > +              * architectures, the lockless check can observe a
> > > +              * reordering between worklist insert_work() and
> > > +              * last_progress_ts update from __queue_work().

This explains why taking the lock is needed.

> > > + Since
> > > +              * __queue_work() is a much hotter path than the timer
> > > +              * function, we handle false positive here by reading
> > > +              * last_progress_ts again with pool->lock held.

But this is confusing. It says that __queue_work() is a much hotter path
but it already takes pool->lock. The sentence makes a feeling that
the watchdog patch is less hot. Then it is weird why the watchdog
path ignores the lock by default.

Instead, it would make sense to explain why taking the lock only
around "pool->last_progress_ts" read is safe.

You know, the code makes my head spin because it violates the usual
locking rules. The lock should serialize 2 or more writes/reads.
It works only when all the reads/writes are done under the lock.
But it is not the case here.

IMHO, the new code works because spin_lock() is a full memory
barrier. It makes sure that

       pool->worklist

is read before

       worker->pool->last_progress_ts

Plus, the worker->pool->last_progress_ts read has to wait for
any parallel modifications of both variables.

As a result, pool->worklist value might be wrong (set or empty)
but the timestamp is always more recent.

IMHO, it prevents false positives (reporting a stall because of
outdated timestamp). While it does not prevent false negatives
(missing a stall). But it is not a problem because the stall
will get caught in the next watchdog check round.

> > > +              */
> > >               if (time_after(now, ts + thresh)) {
> > > +                     scoped_guard(raw_spinlock_irqsave, &pool->lock) {
> > > +                             pool_ts = pool->last_progress_ts;
> > > +                             if (time_after(pool_ts, touched))
> > > +                                     ts = pool_ts;
> > > +                             else
> > > +                                     ts = touched;
> > > +                     }
> > > +                     if (!time_after(now, ts + thresh))
> > > +                             continue;
> >
> > The new code is pretty hairy. It might make sense to take the lock
> > around the original check and keep it as is.
> >
> > IMHO, if a contention on a pool->lock has ever been a problem than maybe
> > the non-trivial workqueue API was not a good choice for the affected
> > use-case. Or am I wrong?
> 
> Given the false positive is really rare (a handful of them per day among a
> fleet of thousands of hosts), I think taking hundreds of pool->lock per 30
> second seems not necessary.

This optimization might make sense. And it most likely works. My
primary problem is that the code violates standard locking rules.
And the comment does not explain why it works.

I am sorry if I am too meticulous. I just know that it is not trivial
to make lockless algorithms right. And the new code is still kind
of lockless.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] workqueue: Fix false positive stall reports
  2026-03-24 10:01     ` Petr Mladek
@ 2026-03-24 14:15       ` Tejun Heo
  2026-03-24 18:22       ` Song Liu
  1 sibling, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2026-03-24 14:15 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Song Liu, linux-kernel, jiangshanlai, leitao, kernel-team,
	puranjay

On Tue, Mar 24, 2026 at 11:01:30AM +0100, Petr Mladek wrote:
> You know, the code makes my head spin because it violates the usual
> locking rules. The lock should serialize 2 or more writes/reads.
> It works only when all the reads/writes are done under the lock.
> But it is not the case here.

It's unusual but not unique. We would have used spin_unlock_wait() here (w/
rmb()) here but we don't have that anymore. Maybe a clearer way to indicate
what's going on now is just doing lock/unlock() cycle without doing anything
inside, but, as long as the comment explains what's going on, I think either
way is fine. So, if anyone can improve the comment without writing a novel,
plesae go ahead.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] workqueue: Fix false positive stall reports
  2026-03-24 10:01     ` Petr Mladek
  2026-03-24 14:15       ` Tejun Heo
@ 2026-03-24 18:22       ` Song Liu
  2026-03-25 13:19         ` Petr Mladek
  1 sibling, 1 reply; 11+ messages in thread
From: Song Liu @ 2026-03-24 18:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Song Liu, linux-kernel, tj, jiangshanlai, leitao, kernel-team,
	puranjay

On Tue, Mar 24, 2026 at 3:01 AM Petr Mladek <pmladek@suse.com> wrote:
[...]
> This explains why taking the lock is needed.
>
> > > > + Since
> > > > +              * __queue_work() is a much hotter path than the timer
> > > > +              * function, we handle false positive here by reading
> > > > +              * last_progress_ts again with pool->lock held.
>
> But this is confusing. It says that __queue_work() is a much hotter path
> but it already takes pool->lock. The sentence makes a feeling that
> the watchdog patch is less hot. Then it is weird why the watchdog
> path ignores the lock by default.

This comment primarily concerns the cost of making the read lockless.
To do that, we need to add a memory barrier in __queue_work(),
which will slow down the hot path. __queue_work() does take
pool->lock, but in most cases,  __queue_work() only takes the lock
in local pool, which is faster than taking all the pool->locks from the
watchdog timer.

That said, I am open to other suggestions to get rid of this false
positive.

Thanks,
Song

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] workqueue: Fix false positive stall reports
  2026-03-24 18:22       ` Song Liu
@ 2026-03-25 13:19         ` Petr Mladek
  2026-03-25 14:12           ` Petr Mladek
  0 siblings, 1 reply; 11+ messages in thread
From: Petr Mladek @ 2026-03-25 13:19 UTC (permalink / raw)
  To: Song Liu
  Cc: Song Liu, linux-kernel, tj, jiangshanlai, leitao, kernel-team,
	puranjay

On Tue 2026-03-24 11:22:11, Song Liu wrote:
> On Tue, Mar 24, 2026 at 3:01 AM Petr Mladek <pmladek@suse.com> wrote:
> [...]
> > This explains why taking the lock is needed.
> >
> > > > > + Since
> > > > > +              * __queue_work() is a much hotter path than the timer
> > > > > +              * function, we handle false positive here by reading
> > > > > +              * last_progress_ts again with pool->lock held.
> >
> > But this is confusing. It says that __queue_work() is a much hotter path
> > but it already takes pool->lock. The sentence makes a feeling that
> > the watchdog patch is less hot. Then it is weird why the watchdog
> > path ignores the lock by default.
> 
> This comment primarily concerns the cost of making the read lockless.
> To do that, we need to add a memory barrier in __queue_work(),
> which will slow down the hot path. __queue_work() does take
> pool->lock, but in most cases,  __queue_work() only takes the lock
> in local pool, which is faster than taking all the pool->locks from the
> watchdog timer.
> 
> That said, I am open to other suggestions to get rid of this false
> positive.

See below a patch which tries to improve the comment.

Honestly, I am not sure if it is better than the original. It probably
better answers the questions which I had. But it might open another
questions. I feel that I have lost a detached view because I already
know the details. Also it is hard to mention everything and keep it
"short".

Feel free to take it, ignore it, or squash it into the original
commit.

From c65eeaac600f069c7f3398c87186f73f9c437735 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Wed, 25 Mar 2026 13:34:18 +0100
Subject: [PATCH] workqueue: Better describe stall check

Try to be more explicit why the workqueue watchdog does not take
pool->lock by default. Spin locks are full memory barriers which
delay anything. Obviously, they would primary delay operations
on the related worker pools.

Explain why it is enough to prevent the false positive by re-checking
the timestamp under the pool->lock.

Finally, make it clear what would be the alternative solution in
__queue_work() which is a hotter path.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/workqueue.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ff97b705f25e..e3e97d59d2f4 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7702,13 +7702,13 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 		/*
 		 * Did we stall?
 		 *
-		 * Do a lockless check first. On weakly ordered
-		 * architectures, the lockless check can observe a
-		 * reordering between worklist insert_work() and
-		 * last_progress_ts update from __queue_work(). Since
-		 * __queue_work() is a much hotter path than the timer
-		 * function, we handle false positive here by reading
-		 * last_progress_ts again with pool->lock held.
+		 * Try to block or slow down queue_work() as less as possible
+		 * because it might be used in hot paths. Just prevent false
+		 * positives.
+		 *
+		 * Do a lockless check first. The spin_lock() makes sure
+		 * that the re-check reads an updated pool->last_progress_ts
+		 * when this CPU saw an already updated pool->worklist.
 		 */
 		if (time_after(now, ts + thresh)) {
 			scoped_guard(raw_spinlock_irqsave, &pool->lock) {
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] workqueue: Fix false positive stall reports
  2026-03-25 13:19         ` Petr Mladek
@ 2026-03-25 14:12           ` Petr Mladek
  2026-03-25 15:36             ` Song Liu
  2026-03-25 15:53             ` [PATCH v2] workqueue: Better describe stall check Tejun Heo
  0 siblings, 2 replies; 11+ messages in thread
From: Petr Mladek @ 2026-03-25 14:12 UTC (permalink / raw)
  To: Song Liu
  Cc: Song Liu, linux-kernel, tj, jiangshanlai, leitao, kernel-team,
	puranjay

On Wed 2026-03-25 14:19:27, Petr Mladek wrote:
> On Tue 2026-03-24 11:22:11, Song Liu wrote:
> > On Tue, Mar 24, 2026 at 3:01 AM Petr Mladek <pmladek@suse.com> wrote:
> > [...]
> > > This explains why taking the lock is needed.
> > >
> > > > > > + Since
> > > > > > +              * __queue_work() is a much hotter path than the timer
> > > > > > +              * function, we handle false positive here by reading
> > > > > > +              * last_progress_ts again with pool->lock held.
> > >
> > > But this is confusing. It says that __queue_work() is a much hotter path
> > > but it already takes pool->lock. The sentence makes a feeling that
> > > the watchdog patch is less hot. Then it is weird why the watchdog
> > > path ignores the lock by default.
> > 
> > This comment primarily concerns the cost of making the read lockless.
> > To do that, we need to add a memory barrier in __queue_work(),
> > which will slow down the hot path. __queue_work() does take
> > pool->lock, but in most cases,  __queue_work() only takes the lock
> > in local pool, which is faster than taking all the pool->locks from the
> > watchdog timer.
> > 
> > That said, I am open to other suggestions to get rid of this false
> > positive.
> 
> See below a patch which tries to improve the comment.
> 
> Honestly, I am not sure if it is better than the original. It probably
> better answers the questions which I had. But it might open another
> questions. I feel that I have lost a detached view because I already
> know the details. Also it is hard to mention everything and keep it
> "short".
> 
> Feel free to take it, ignore it, or squash it into the original
> commit.

Grrr, I have sent a wrong version of the patch. I forgot to amend
last changes and refresh it. It is actually quite different.

Anyway, this is what I wanted to send:

From e7ab783f9a1dc5c0e8bd3b1a4bf16dd8856a59e5 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Wed, 25 Mar 2026 13:34:18 +0100
Subject: [PATCH v2] workqueue: Better describe stall check

Try to be more explicit why the workqueue watchdog does not take
pool->lock by default. Spin locks are full memory barriers which
delay anything. Obviously, they would primary delay operations
on the related worker pools.

Explain why it is enough to prevent the false positive by re-checking
the timestamp under the pool->lock.

Finally, make it clear what would be the alternative solution in
__queue_work() which is a hotter path.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/workqueue.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ff97b705f25e..eda756556341 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7702,13 +7702,14 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 		/*
 		 * Did we stall?
 		 *
-		 * Do a lockless check first. On weakly ordered
-		 * architectures, the lockless check can observe a
-		 * reordering between worklist insert_work() and
-		 * last_progress_ts update from __queue_work(). Since
-		 * __queue_work() is a much hotter path than the timer
-		 * function, we handle false positive here by reading
-		 * last_progress_ts again with pool->lock held.
+		 * Do a lockless check first to do not disturb the system.
+		 *
+		 * Prevent false positives by double checking the timestamp
+		 * under pool->lock. The lock makes sure that the check reads
+		 * an updated pool->last_progress_ts when this CPU saw
+		 * an already updated pool->worklist above. It seems better
+		 * than adding another barrier into __queue_work() which
+		 * is a hotter path.
 		 */
 		if (time_after(now, ts + thresh)) {
 			scoped_guard(raw_spinlock_irqsave, &pool->lock) {
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] workqueue: Fix false positive stall reports
  2026-03-25 14:12           ` Petr Mladek
@ 2026-03-25 15:36             ` Song Liu
  2026-03-25 15:53             ` [PATCH v2] workqueue: Better describe stall check Tejun Heo
  1 sibling, 0 replies; 11+ messages in thread
From: Song Liu @ 2026-03-25 15:36 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Song Liu, linux-kernel, tj, jiangshanlai, leitao, kernel-team,
	puranjay

On Wed, Mar 25, 2026 at 7:12 AM Petr Mladek <pmladek@suse.com> wrote:
[...]
> Anyway, this is what I wanted to send:
>
> From e7ab783f9a1dc5c0e8bd3b1a4bf16dd8856a59e5 Mon Sep 17 00:00:00 2001
> From: Petr Mladek <pmladek@suse.com>
> Date: Wed, 25 Mar 2026 13:34:18 +0100
> Subject: [PATCH v2] workqueue: Better describe stall check
>
> Try to be more explicit why the workqueue watchdog does not take
> pool->lock by default. Spin locks are full memory barriers which
> delay anything. Obviously, they would primary delay operations
> on the related worker pools.
>
> Explain why it is enough to prevent the false positive by re-checking
> the timestamp under the pool->lock.
>
> Finally, make it clear what would be the alternative solution in
> __queue_work() which is a hotter path.
>
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> ---
>  kernel/workqueue.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index ff97b705f25e..eda756556341 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -7702,13 +7702,14 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
>                 /*
>                  * Did we stall?
>                  *
> -                * Do a lockless check first. On weakly ordered
> -                * architectures, the lockless check can observe a
> -                * reordering between worklist insert_work() and
> -                * last_progress_ts update from __queue_work(). Since
> -                * __queue_work() is a much hotter path than the timer
> -                * function, we handle false positive here by reading
> -                * last_progress_ts again with pool->lock held.
> +                * Do a lockless check first to do not disturb the system.
> +                *
> +                * Prevent false positives by double checking the timestamp
> +                * under pool->lock. The lock makes sure that the check reads
> +                * an updated pool->last_progress_ts when this CPU saw
> +                * an already updated pool->worklist above. It seems better
> +                * than adding another barrier into __queue_work() which
> +                * is a hotter path.
>                  */

Looks good to me. Thanks for improving this comment!

Tejun,

If we want this as a separate commit, please add

Acked-by: Song Liu <song@kernel.org>

Or we can fold this in with Petr's Signed-off-by and/or
Co-developed-by.

Thanks,
Song

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] workqueue: Better describe stall check
  2026-03-25 14:12           ` Petr Mladek
  2026-03-25 15:36             ` Song Liu
@ 2026-03-25 15:53             ` Tejun Heo
  1 sibling, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2026-03-25 15:53 UTC (permalink / raw)
  To: Petr Mladek; +Cc: Song Liu, Song Liu, linux-kernel

Applied to wq/for-7.0-fixes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-03-25 15:53 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-22  3:30 [PATCH v2] workqueue: Fix false positive stall reports Song Liu
2026-03-22  4:41 ` Tejun Heo
2026-03-23 14:09 ` Petr Mladek
2026-03-23 18:31   ` Song Liu
2026-03-24 10:01     ` Petr Mladek
2026-03-24 14:15       ` Tejun Heo
2026-03-24 18:22       ` Song Liu
2026-03-25 13:19         ` Petr Mladek
2026-03-25 14:12           ` Petr Mladek
2026-03-25 15:36             ` Song Liu
2026-03-25 15:53             ` [PATCH v2] workqueue: Better describe stall check Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox