linux-doc.vger.kernel.org archive mirror
* [PATCH] workqueue: add workqueue.mayday_initial_timeout
@ 2025-11-11  2:52 ying chen
  2025-11-11  2:55 ` Matthew Wilcox
  2025-11-11 20:40 ` Tejun Heo
  0 siblings, 2 replies; 10+ messages in thread
From: ying chen @ 2025-11-11  2:52 UTC (permalink / raw)
  To: corbet, tj, jiangshanlai, linux-doc, linux-kernel, laoar.shao

If creating a new worker takes longer than MAYDAY_INITIAL_TIMEOUT,
the rescuer thread is woken up to process the works scheduled on
@pool, executing them all sequentially. One stalled work can then
block every work behind it. The initial rescue timeout defaults to
10 milliseconds, which is easily exceeded in heavy-load
environments.
---
 Documentation/admin-guide/kernel-parameters.txt | 4 ++++
 kernel/workqueue.c                              | 5 ++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 149bfa7..be3f488 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -7376,6 +7376,10 @@
                        When enabled, memory and cache locality will be
                        impacted.

+       workqueue.mayday_initial_timeout
+                       Set the initial timeout (jiffies) for the mayday timer.
+                       Default is MAYDAY_INITIAL_TIMEOUT.
+
        writecombine=   [LOONGARCH,EARLY] Control the MAT (Memory Access
                        Type) of ioremap_wc().

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 003474c..c810b61 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -481,6 +481,9 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct irq_work [NR_STD_WORKER_POOLS],
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
                                     bh_worker_pools);

+static unsigned long wq_mayday_initial_timeout = MAYDAY_INITIAL_TIMEOUT;
+module_param_named(mayday_initial_timeout, wq_mayday_initial_timeout, ulong, 0644);
+
 /* the per-cpu worker pools */
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
                                     cpu_worker_pools);
@@ -3050,7 +3053,7 @@ static void maybe_create_worker(struct worker_pool *pool)
        raw_spin_unlock_irq(&pool->lock);

        /* if we don't make progress in MAYDAY_INITIAL_TIMEOUT, call for help */
-       mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INITIAL_TIMEOUT);
+       mod_timer(&pool->mayday_timer, jiffies + wq_mayday_initial_timeout);

        while (true) {
                if (create_worker(pool) || !need_to_create_worker(pool))
--
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] workqueue: add workqueue.mayday_initial_timeout
  2025-11-11  2:52 [PATCH] workqueue: add workqueue.mayday_initial_timeout ying chen
@ 2025-11-11  2:55 ` Matthew Wilcox
  2025-11-11  7:19   ` ying chen
  2025-11-11  8:00   ` Lai Jiangshan
  2025-11-11 20:40 ` Tejun Heo
  1 sibling, 2 replies; 10+ messages in thread
From: Matthew Wilcox @ 2025-11-11  2:55 UTC (permalink / raw)
  To: ying chen; +Cc: corbet, tj, jiangshanlai, linux-doc, linux-kernel, laoar.shao

On Tue, Nov 11, 2025 at 10:52:44AM +0800, ying chen wrote:
> If creating a new worker takes longer than MAYDAY_INITIAL_TIMEOUT,
> the rescuer thread will be woken up to process works scheduled on
> @pool, resulting in sequential execution of all works. This may lead
> to a situation where one work blocks others. However, the initial
> rescue timeout defaults to 10 milliseconds, which can easily be
> triggered in heavy-load environments.

Then we should tune it automatically, not blame the admin for not tuning
this parameter.


* Re: [PATCH] workqueue: add workqueue.mayday_initial_timeout
  2025-11-11  2:55 ` Matthew Wilcox
@ 2025-11-11  7:19   ` ying chen
  2025-11-11  8:00   ` Lai Jiangshan
  1 sibling, 0 replies; 10+ messages in thread
From: ying chen @ 2025-11-11  7:19 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: corbet, tj, jiangshanlai, linux-doc, linux-kernel, laoar.shao

On Tue, Nov 11, 2025 at 10:55 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Nov 11, 2025 at 10:52:44AM +0800, ying chen wrote:
> > If creating a new worker takes longer than MAYDAY_INITIAL_TIMEOUT,
> > the rescuer thread will be woken up to process works scheduled on
> > @pool, resulting in sequential execution of all works. This may lead
> > to a situation where one work blocks others. However, the initial
> > rescue timeout defaults to 10 milliseconds, which can easily be
> > triggered in heavy-load environments.
>
> Then we should tune it automatically, not blame the admin for not tuning
> this parameter.
Currently, dynamic tuning of the mayday timer's initial timeout is not
supported.


* Re: [PATCH] workqueue: add workqueue.mayday_initial_timeout
  2025-11-11  2:55 ` Matthew Wilcox
  2025-11-11  7:19   ` ying chen
@ 2025-11-11  8:00   ` Lai Jiangshan
  1 sibling, 0 replies; 10+ messages in thread
From: Lai Jiangshan @ 2025-11-11  8:00 UTC (permalink / raw)
  To: ying chen; +Cc: Matthew Wilcox, corbet, tj, linux-doc, linux-kernel, laoar.shao

Hello, Chen

Thanks for trying to improve.

On Tue, Nov 11, 2025 at 10:55 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Nov 11, 2025 at 10:52:44AM +0800, ying chen wrote:
> > If creating a new worker takes longer than MAYDAY_INITIAL_TIMEOUT,
> > the rescuer thread will be woken up to process works scheduled on
> > @pool, resulting in sequential execution of all works. This may lead
> > to a situation where one work blocks others. However, the initial
> > rescue timeout defaults to 10 milliseconds, which can easily be
> > triggered in heavy-load environments.


This is what it was intended for. A workqueue equipped with a rescuer
is considered a single-threaded workqueue, where deadlock can occur
when there are works depending on each other.

It is recommended that the user modify the works to eliminate any
dependencies or consider using multiple workqueues. Increasing the
timeout doesn’t solve the problem; it merely hides it.

Changing the workqueue implementation to support concurrency for
the rescuer would complicate the design unnecessarily. The use of a
rescuer is only for memory reclamation, where the works should be
simple, focusing on freeing memory rather than waiting on each other.

Thanks
Lai

>
> Then we should tune it automatically, not blame the admin for not tuning
> this parameter.


* Re: [PATCH] workqueue: add workqueue.mayday_initial_timeout
  2025-11-11  2:52 [PATCH] workqueue: add workqueue.mayday_initial_timeout ying chen
  2025-11-11  2:55 ` Matthew Wilcox
@ 2025-11-11 20:40 ` Tejun Heo
  2025-11-12  2:01   ` ying chen
  1 sibling, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2025-11-11 20:40 UTC (permalink / raw)
  To: ying chen; +Cc: corbet, jiangshanlai, linux-doc, linux-kernel, laoar.shao

Hello,

On Tue, Nov 11, 2025 at 10:52:44AM +0800, ying chen wrote:
> If creating a new worker takes longer than MAYDAY_INITIAL_TIMEOUT,
> the rescuer thread will be woken up to process works scheduled on
> @pool, resulting in sequential execution of all works. This may lead
> to a situation where one work blocks others. However, the initial
> rescue timeout defaults to 10 milliseconds, which can easily be
> triggered in heavy-load environments.

This is not how workqueue works. Rescuer doesn't exclude other workers. If
other workers become available, they will run the workqueue concurrently.
All that initial timeout achieves is delaying the initial execution from the
rescuer.

Is this from observing real behavior? If so, what was the test case, and how
did the behavior change after the patch? It couldn't have gotten better.

Thanks.

-- 
tejun


* Re: [PATCH] workqueue: add workqueue.mayday_initial_timeout
  2025-11-11 20:40 ` Tejun Heo
@ 2025-11-12  2:01   ` ying chen
  2025-11-12 16:03     ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: ying chen @ 2025-11-12  2:01 UTC (permalink / raw)
  To: Tejun Heo; +Cc: corbet, jiangshanlai, linux-doc, linux-kernel, laoar.shao

On Wed, Nov 12, 2025 at 4:40 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Tue, Nov 11, 2025 at 10:52:44AM +0800, ying chen wrote:
> > If creating a new worker takes longer than MAYDAY_INITIAL_TIMEOUT,
> > the rescuer thread will be woken up to process works scheduled on
> > @pool, resulting in sequential execution of all works. This may lead
> > to a situation where one work blocks others. However, the initial
> > rescue timeout defaults to 10 milliseconds, which can easily be
> > triggered in heavy-load environments.
>
> This is not how workqueue works. Rescuer doesn't exclude other workers. If
> other workers become available, they will run the workqueue concurrently.
> All that initial timeout achieves is delaying the initial execution from the
> rescuer.
>
> Is this from observing real behaviors? If so, what was the test case and how
> did the behavior after the patch? It couldn't have gotten better.
>
> Thanks.
>
> --
> tejun

We encountered an XFS deadlock issue. However, unlike the scenario
described in that patch, in our case the rescuer thread was still woken
up even when memory was sufficient, likely due to heavy load.
patch: xfs: don't use BMBT btree split workers for IO completion
(c85007e2e3942da1f9361e4b5a9388ea3a8dcc5b)

Works that have already been scheduled will be executed sequentially
within the rescuer thread.
static int rescuer_thread(void *__rescuer)
{
                ......
                /*
                 * Slurp in all works issued via this workqueue and
                 * process'em.
                 */
                WARN_ON_ONCE(!list_empty(scheduled));
                list_for_each_entry_safe(work, n, &pool->worklist, entry) {
                        if (get_work_pwq(work) == pwq) {
                                if (first)
                                        pool->watchdog_ts = jiffies;
                                move_linked_works(work, scheduled, &n);
                        }
                        first = false;
                }

                if (!list_empty(scheduled)) {
                        process_scheduled_works(rescuer);
                        ......
                }


* Re: [PATCH] workqueue: add workqueue.mayday_initial_timeout
  2025-11-12  2:01   ` ying chen
@ 2025-11-12 16:03     ` Tejun Heo
  2025-11-13  2:34       ` ying chen
  2025-11-13 15:50       ` Lai Jiangshan
  0 siblings, 2 replies; 10+ messages in thread
From: Tejun Heo @ 2025-11-12 16:03 UTC (permalink / raw)
  To: ying chen; +Cc: corbet, jiangshanlai, linux-doc, linux-kernel, laoar.shao

Hello,

On Wed, Nov 12, 2025 at 10:01:10AM +0800, ying chen wrote:
> Works that have already been scheduled will be executed sequentially
> within the rescuer thread.
> static int rescuer_thread(void *__rescuer)
> {
>                 ......
>                 /*
>                  * Slurp in all works issued via this workqueue and
>                  * process'em.
>                  */
>                 WARN_ON_ONCE(!list_empty(scheduled));
>                 list_for_each_entry_safe(work, n, &pool->worklist, entry) {
>                         if (get_work_pwq(work) == pwq) {
>                                 if (first)
>                                         pool->watchdog_ts = jiffies;
>                                 move_linked_works(work, scheduled, &n);
>                         }
>                         first = false;
>                 }
> 
>                 if (!list_empty(scheduled)) {
>                         process_scheduled_works(rescuer);
>                         ......
>                 }

Ah, I see what you mean. The slurping is to avoid potentially O(N^2)
scanning, but that's probably the wrong trade-off to make here. I think the
right solution is to make it break out after finding the first matching work
item and loop outside, so that it processes work items one by one.

Thanks.

-- 
tejun


* Re: [PATCH] workqueue: add workqueue.mayday_initial_timeout
  2025-11-12 16:03     ` Tejun Heo
@ 2025-11-13  2:34       ` ying chen
  2025-11-13 16:31         ` Tejun Heo
  2025-11-13 15:50       ` Lai Jiangshan
  1 sibling, 1 reply; 10+ messages in thread
From: ying chen @ 2025-11-13  2:34 UTC (permalink / raw)
  To: Tejun Heo; +Cc: corbet, jiangshanlai, linux-doc, linux-kernel, laoar.shao

On Thu, Nov 13, 2025 at 12:03 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Wed, Nov 12, 2025 at 10:01:10AM +0800, ying chen wrote:
> > Works that have already been scheduled will be executed sequentially
> > within the rescuer thread.
> > static int rescuer_thread(void *__rescuer)
> > {
> >                 ......
> >                 /*
> >                  * Slurp in all works issued via this workqueue and
> >                  * process'em.
> >                  */
> >                 WARN_ON_ONCE(!list_empty(scheduled));
> >                 list_for_each_entry_safe(work, n, &pool->worklist, entry) {
> >                         if (get_work_pwq(work) == pwq) {
> >                                 if (first)
> >                                         pool->watchdog_ts = jiffies;
> >                                 move_linked_works(work, scheduled, &n);
> >                         }
> >                         first = false;
> >                 }
> >
> >                 if (!list_empty(scheduled)) {
> >                         process_scheduled_works(rescuer);
> >                         ......
> >                 }
>
> Ah, I see what you mean. The slurping is to avoid potentially O(N^2)
> scanning but that probably the wrong trade-off to make here. I think the
> right solution is making it break out after finding the first matching work
> item and loop outside so that it processes work item one by one.

Processing work items one-by-one is indeed an excellent solution.
However, wouldn't it also be necessary to provide a method for
adjusting the mayday initial timeout?

Thanks.


* Re: [PATCH] workqueue: add workqueue.mayday_initial_timeout
  2025-11-12 16:03     ` Tejun Heo
  2025-11-13  2:34       ` ying chen
@ 2025-11-13 15:50       ` Lai Jiangshan
  1 sibling, 0 replies; 10+ messages in thread
From: Lai Jiangshan @ 2025-11-13 15:50 UTC (permalink / raw)
  To: Tejun Heo; +Cc: ying chen, corbet, linux-doc, linux-kernel, laoar.shao

Hello.

On Thu, Nov 13, 2025 at 12:03 AM Tejun Heo <tj@kernel.org> wrote:

>
>
> Ah, I see what you mean. The slurping is to avoid potentially O(N^2)
> scanning but that probably the wrong trade-off to make here. I think the
> right solution is making it break out after finding the first matching work
> item and loop outside so that it processes work item one by one.
>

I'm implementing it. A dummy work item is inserted to mark where the
next scan should resume after the previous work is processed.

I’ll send it soon.

Still, I suggest that the workqueue user eliminate any dependencies among
the work items when using a rescuer. If the memory pressure is not relieved
and no normal workers come to help, it simply won’t work.


thanks
Lai


* Re: [PATCH] workqueue: add workqueue.mayday_initial_timeout
  2025-11-13  2:34       ` ying chen
@ 2025-11-13 16:31         ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2025-11-13 16:31 UTC (permalink / raw)
  To: ying chen; +Cc: corbet, jiangshanlai, linux-doc, linux-kernel, laoar.shao

Hello,

On Thu, Nov 13, 2025 at 10:34:43AM +0800, ying chen wrote:
> Processing work items one-by-one is indeed an excellent solution.
> However, wouldn't it also be necessary to provide a method for
> adjusting the mayday initial timeout?

Adding an interface like that isn't difficult but I'm not sure what that
would achieve. A rescuer is there to guarantee forward progress when the
system is under memory pressure and processing work items of the workqueue
may be required to free up memory. IOW, when that workqueue not making
forward progress can lead to system deadlock.

As such, this doesn't have that much system performance implications (aside
from the serialization effect that you raised). If rescuer is needed, the
system is in tatters anyway, especially in terms of latency response, so I'm
not sure what fine-tuning rescuer response time would help with.

Thanks.

-- 
tejun


