All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V5 0/3] workqueue: Factor the way to assign rescuer work
@ 2025-12-08 13:25 Lai Jiangshan
  2025-12-08 13:25 ` [PATCH V5 1/3] workqueue: Make send_mayday() take a PWQ argument directly Lai Jiangshan
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Lai Jiangshan @ 2025-12-08 13:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, ying chen, Lai Jiangshan

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

Previously, the rescuer scanned for all matching work items at once and
processed them within a single rescuer thread, which could cause one
blocking work item to stall all others.

Make the rescuer process work items one-by-one instead of slurping all
matches in a single pass using a cursor.

Changed from V4:

  Control the limit and call send_mayday() in rescuer_thread(), which
  makes rescuer_thread() responsible for ensuring the cursor is not left
  behind.

v4: https://lore.kernel.org/lkml/20251125063617.671199-1-jiangshanlai@gmail.com/

Lai Jiangshan (3):
  workqueue: Refactor send_mayday() to take pwq directly
  workqueue: Process rescuer work items one-by-one using a cursor
  workqueue: Process extra works in rescuer on memory pressure

 kernel/workqueue.c | 119 ++++++++++++++++++++++++++++++++++-----------
 1 file changed, 91 insertions(+), 28 deletions(-)

-- 
2.19.1.6.gb485710b


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH V5 1/3] workqueue: Make send_mayday() take a PWQ argument directly
  2025-12-08 13:25 [PATCH V5 0/3] workqueue: Factor the way to assign rescuer work Lai Jiangshan
@ 2025-12-08 13:25 ` Lai Jiangshan
  2025-12-08 13:25 ` [PATCH V5 2/3] workqueue: Process rescuer work items one-by-one using a cursor Lai Jiangshan
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Lai Jiangshan @ 2025-12-08 13:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, ying chen, Lai Jiangshan, Lai Jiangshan

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

Make send_mayday() operate on a PWQ directly instead of taking a work
item, so that rescuer_thread() now calls send_mayday(pwq) instead of
open-coding the mayday list manipulation.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 kernel/workqueue.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 253311af47c6..f8371aa54dca 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2976,9 +2976,8 @@ static void idle_cull_fn(struct work_struct *work)
 	reap_dying_workers(&cull_list);
 }
 
-static void send_mayday(struct work_struct *work)
+static void send_mayday(struct pool_workqueue *pwq)
 {
-	struct pool_workqueue *pwq = get_work_pwq(work);
 	struct workqueue_struct *wq = pwq->wq;
 
 	lockdep_assert_held(&wq_mayday_lock);
@@ -3016,7 +3015,7 @@ static void pool_mayday_timeout(struct timer_list *t)
 		 * rescuers.
 		 */
 		list_for_each_entry(work, &pool->worklist, entry)
-			send_mayday(work);
+			send_mayday(get_work_pwq(work));
 	}
 
 	raw_spin_unlock(&wq_mayday_lock);
@@ -3538,13 +3537,7 @@ static int rescuer_thread(void *__rescuer)
 			 */
 			if (pwq->nr_active && need_to_create_worker(pool)) {
 				raw_spin_lock(&wq_mayday_lock);
-				/*
-				 * Queue iff somebody else hasn't queued it already.
-				 */
-				if (list_empty(&pwq->mayday_node)) {
-					get_pwq(pwq);
-					list_add_tail(&pwq->mayday_node, &wq->maydays);
-				}
+				send_mayday(pwq);
 				raw_spin_unlock(&wq_mayday_lock);
 			}
 		}
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH V5 2/3] workqueue: Process rescuer work items one-by-one using a cursor
  2025-12-08 13:25 [PATCH V5 0/3] workqueue: Factor the way to assign rescuer work Lai Jiangshan
  2025-12-08 13:25 ` [PATCH V5 1/3] workqueue: Make send_mayday() take a PWQ argument directly Lai Jiangshan
@ 2025-12-08 13:25 ` Lai Jiangshan
  2025-12-08 13:25 ` [PATCH V5 3/3] workqueue: Process extra works in rescuer on memory pressure Lai Jiangshan
  2025-12-08 19:24 ` [PATCH V5 0/3] workqueue: Factor the way to assign rescuer work Tejun Heo
  3 siblings, 0 replies; 5+ messages in thread
From: Lai Jiangshan @ 2025-12-08 13:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, ying chen, Lai Jiangshan, Lai Jiangshan

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

Previously, the rescuer scanned for all matching work items at once and
processed them within a single rescuer thread, which could cause one
blocking work item to stall all others.

Make the rescuer process work items one-by-one instead of slurping all
matches in a single pass.

Break the rescuer loop after finding and processing the first matching
work item, then restart the search to pick up the next. This gives
normal worker threads a chance to process other items which gives them
the opportinity to be processed instead of waiting on the rescuer's
queue and prevents a blocking work item from stalling the rest once
memory pressure is relieved.

Introduce a dummy cursor  work item to avoid potentially O(N^2)
rescans of the work list.  The marker records the resume position for
the next scan, eliminating redundant traversals.

Also introduce RESCUER_BATCH to control the maximum number of work items
the rescuer processes in each turn, and move on to other PWQs when the
limit is reached.

Cc: ying chen <yc1082463@gmail.com>
Reported-by: ying chen <yc1082463@gmail.com>
Fixes: e22bee782b3b ("workqueue: implement concurrency managed dynamic worker pool")
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 kernel/workqueue.c | 75 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 59 insertions(+), 16 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f8371aa54dca..add236e0dac4 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -117,6 +117,8 @@ enum wq_internal_consts {
 	MAYDAY_INTERVAL		= HZ / 10,	/* and then every 100ms */
 	CREATE_COOLDOWN		= HZ,		/* time to breath after fail */
 
+	RESCUER_BATCH		= 16,		/* process items per turn */
+
 	/*
 	 * Rescue workers are used only on emergencies and shared by
 	 * all cpus.  Give MIN_NICE.
@@ -286,6 +288,7 @@ struct pool_workqueue {
 	struct list_head	pending_node;	/* LN: node on wq_node_nr_active->pending_pwqs */
 	struct list_head	pwqs_node;	/* WR: node on wq->pwqs */
 	struct list_head	mayday_node;	/* MD: node on wq->maydays */
+	struct work_struct	mayday_cursor;	/* L: cursor on pool->worklist */
 
 	u64			stats[PWQ_NR_STATS];
 
@@ -1120,6 +1123,12 @@ static struct worker *find_worker_executing_work(struct worker_pool *pool,
 	return NULL;
 }
 
+static void mayday_cursor_func(struct work_struct *work)
+{
+	/* should not be processed, only for marking position */
+	BUG();
+}
+
 /**
  * move_linked_works - move linked works to a list
  * @work: start of series of works to be scheduled
@@ -1182,6 +1191,16 @@ static bool assign_work(struct work_struct *work, struct worker *worker,
 
 	lockdep_assert_held(&pool->lock);
 
+	/* The cursor work should not be processed */
+	if (unlikely(work->func == mayday_cursor_func)) {
+		/* only worker_thread() can possibly take this branch */
+		WARN_ON_ONCE(worker->rescue_wq);
+		if (nextp)
+			*nextp = list_next_entry(work, entry);
+		list_del_init(&work->entry);
+		return false;
+	}
+
 	/*
 	 * A single work shouldn't be executed concurrently by multiple workers.
 	 * __queue_work() ensures that @work doesn't jump to a different pool
@@ -3439,22 +3458,30 @@ static int worker_thread(void *__worker)
 static bool assign_rescuer_work(struct pool_workqueue *pwq, struct worker *rescuer)
 {
 	struct worker_pool *pool = pwq->pool;
+	struct work_struct *cursor = &pwq->mayday_cursor;
 	struct work_struct *work, *n;
 
 	/* need rescue? */
 	if (!pwq->nr_active || !need_to_create_worker(pool))
 		return false;
 
-	/*
-	 * Slurp in all works issued via this workqueue and
-	 * process'em.
-	 */
-	list_for_each_entry_safe(work, n, &pool->worklist, entry) {
-		if (get_work_pwq(work) == pwq && assign_work(work, rescuer, &n))
+	/* search from the start or cursor if available */
+	if (list_empty(&cursor->entry))
+		work = list_first_entry(&pool->worklist, struct work_struct, entry);
+	else
+		work = list_next_entry(cursor, entry);
+
+	/* find the next work item to rescue */
+	list_for_each_entry_safe_from(work, n, &pool->worklist, entry) {
+		if (get_work_pwq(work) == pwq && assign_work(work, rescuer, &n)) {
 			pwq->stats[PWQ_STAT_RESCUED]++;
+			/* put the cursor for next search */
+			list_move_tail(&cursor->entry, &n->entry);
+			return true;
+		}
 	}
 
-	return !list_empty(&rescuer->scheduled);
+	return false;
 }
 
 /**
@@ -3511,6 +3538,7 @@ static int rescuer_thread(void *__rescuer)
 		struct pool_workqueue *pwq = list_first_entry(&wq->maydays,
 					struct pool_workqueue, mayday_node);
 		struct worker_pool *pool = pwq->pool;
+		unsigned int count = 0;
 
 		__set_current_state(TASK_RUNNING);
 		list_del_init(&pwq->mayday_node);
@@ -3523,25 +3551,27 @@ static int rescuer_thread(void *__rescuer)
 
 		WARN_ON_ONCE(!list_empty(&rescuer->scheduled));
 
-		if (assign_rescuer_work(pwq, rescuer)) {
+		while (assign_rescuer_work(pwq, rescuer)) {
 			process_scheduled_works(rescuer);
 
 			/*
-			 * The above execution of rescued work items could
-			 * have created more to rescue through
-			 * pwq_activate_first_inactive() or chained
-			 * queueing.  Let's put @pwq back on mayday list so
-			 * that such back-to-back work items, which may be
-			 * being used to relieve memory pressure, don't
-			 * incur MAYDAY_INTERVAL delay inbetween.
+			 * If the per-turn work item limit is reached and other
+			 * PWQs are in mayday, requeue mayday for this PWQ and
+			 * let the rescuer handle the other PWQs first.
 			 */
-			if (pwq->nr_active && need_to_create_worker(pool)) {
+			if (++count > RESCUER_BATCH && !list_empty(&pwq->wq->maydays) &&
+			    pwq->nr_active && need_to_create_worker(pool)) {
 				raw_spin_lock(&wq_mayday_lock);
 				send_mayday(pwq);
 				raw_spin_unlock(&wq_mayday_lock);
+				break;
 			}
 		}
 
+		/* The cursor can not be left behind without the rescuer watching it. */
+		if (!list_empty(&pwq->mayday_cursor.entry) && list_empty(&pwq->mayday_node))
+			list_del_init(&pwq->mayday_cursor.entry);
+
 		/*
 		 * Leave this pool. Notify regular workers; otherwise, we end up
 		 * with 0 concurrency and stalling the execution.
@@ -5160,6 +5190,19 @@ static void init_pwq(struct pool_workqueue *pwq, struct workqueue_struct *wq,
 	INIT_LIST_HEAD(&pwq->pwqs_node);
 	INIT_LIST_HEAD(&pwq->mayday_node);
 	kthread_init_work(&pwq->release_work, pwq_release_workfn);
+
+	/*
+	 * Set the dummy cursor work with valid function and get_work_pwq().
+	 *
+	 * The cursor work should only be in the pwq->pool->worklist, and
+	 * should not be treated as a processable work item.
+	 *
+	 * WORK_STRUCT_PENDING and WORK_STRUCT_INACTIVE just make it less
+	 * surprise for kernel debuging tools and reviewers.
+	 */
+	INIT_WORK(&pwq->mayday_cursor, mayday_cursor_func);
+	atomic_long_set(&pwq->mayday_cursor.data, (unsigned long)pwq |
+			WORK_STRUCT_PENDING | WORK_STRUCT_PWQ | WORK_STRUCT_INACTIVE);
 }
 
 /* sync @pwq with the current state of its associated wq and link it */
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH V5 3/3] workqueue: Process extra works in rescuer on memory pressure
  2025-12-08 13:25 [PATCH V5 0/3] workqueue: Factor the way to assign rescuer work Lai Jiangshan
  2025-12-08 13:25 ` [PATCH V5 1/3] workqueue: Make send_mayday() take a PWQ argument directly Lai Jiangshan
  2025-12-08 13:25 ` [PATCH V5 2/3] workqueue: Process rescuer work items one-by-one using a cursor Lai Jiangshan
@ 2025-12-08 13:25 ` Lai Jiangshan
  2025-12-08 19:24 ` [PATCH V5 0/3] workqueue: Factor the way to assign rescuer work Tejun Heo
  3 siblings, 0 replies; 5+ messages in thread
From: Lai Jiangshan @ 2025-12-08 13:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, ying chen, Lai Jiangshan, Lai Jiangshan

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

Make the rescuer process more work on the last pwq when there are no
more to rescue for the whole workqueue to help the regular workers in
case it is a temporary memory pressure relief and to reduce relapse.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 kernel/workqueue.c | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index add236e0dac4..fe5daa32d55e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3461,10 +3461,37 @@ static bool assign_rescuer_work(struct pool_workqueue *pwq, struct worker *rescu
 	struct work_struct *cursor = &pwq->mayday_cursor;
 	struct work_struct *work, *n;
 
-	/* need rescue? */
-	if (!pwq->nr_active || !need_to_create_worker(pool))
+	/* have work items to rescue? */
+	if (!pwq->nr_active)
 		return false;
 
+	/* need rescue? */
+	if (!need_to_create_worker(pool)) {
+		/*
+		 * The pool has idle workers and doesn't need the rescuer, so it
+		 * could simply return false here.
+		 *
+		 * However, the memory pressure might not be fully relieved.
+		 * In PERCPU pool with concurrency enabled, having idle workers
+		 * does not necessarily mean memory pressure is gone; it may
+		 * simply mean regular workers have woken up, completed their
+		 * work, and gone idle again due to concurrency limits.
+		 *
+		 * In this case, those working workers may later sleep again,
+		 * the pool may run out of idle workers, and it will have to
+		 * allocate new ones and wait for the timer to send mayday,
+		 * causing unnecessary delay - especially if memory pressure
+		 * was never resolved throughout.
+		 *
+		 * Do more work if memory pressure is still on to reduce
+		 * relapse, using (pool->flags & POOL_MANAGER_ACTIVE), though
+		 * not precisely, unless there are other PWQs needing help.
+		 */
+		if (!(pool->flags & POOL_MANAGER_ACTIVE) ||
+		    !list_empty(&pwq->wq->maydays))
+			return false;
+	}
+
 	/* search from the start or cursor if available */
 	if (list_empty(&cursor->entry))
 		work = list_first_entry(&pool->worklist, struct work_struct, entry);
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH V5 0/3] workqueue: Factor the way to assign rescuer work
  2025-12-08 13:25 [PATCH V5 0/3] workqueue: Factor the way to assign rescuer work Lai Jiangshan
                   ` (2 preceding siblings ...)
  2025-12-08 13:25 ` [PATCH V5 3/3] workqueue: Process extra works in rescuer on memory pressure Lai Jiangshan
@ 2025-12-08 19:24 ` Tejun Heo
  3 siblings, 0 replies; 5+ messages in thread
From: Tejun Heo @ 2025-12-08 19:24 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel, ying chen, Lai Jiangshan

> Lai Jiangshan (3):
>   workqueue: Refactor send_mayday() to take pwq directly
>   workqueue: Process rescuer work items one-by-one using a cursor
>   workqueue: Process extra works in rescuer on memory pressure

Applied 1-3 to wq/for-6.20 with minor typo fixes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-12-08 19:24 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-08 13:25 [PATCH V5 0/3] workqueue: Factor the way to assign rescuer work Lai Jiangshan
2025-12-08 13:25 ` [PATCH V5 1/3] workqueue: Make send_mayday() take a PWQ argument directly Lai Jiangshan
2025-12-08 13:25 ` [PATCH V5 2/3] workqueue: Process rescuer work items one-by-one using a cursor Lai Jiangshan
2025-12-08 13:25 ` [PATCH V5 3/3] workqueue: Process extra works in rescuer on memory pressure Lai Jiangshan
2025-12-08 19:24 ` [PATCH V5 0/3] workqueue: Factor the way to assign rescuer work Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.