From: Tejun Heo <tj@kernel.org>
To: jiangshanlai@gmail.com
Cc: linux-kernel@vger.kernel.org, Naohiro.Aota@wdc.com,
	kernel-team@meta.com, Tejun Heo <tj@kernel.org>
Subject: [PATCH 10/10] workqueue: Reimplement ordered workqueue using shared nr_active
Date: Wed, 20 Dec 2023 16:24:41 +0900
Message-ID: <20231220072529.1036099-11-tj@kernel.org>
In-Reply-To: <20231220072529.1036099-1-tj@kernel.org>

Because nr_active used to be tied to pwq, an ordered workqueue had to have a
single pwq to guarantee strict ordering. This led to several contortions to
avoid creating multiple pwqs.

Now that nr_active can be shared across multiple pwqs, we can simplify the
ordered workqueue implementation. All that's necessary is ensuring that a
single wq_node_nr_active is shared across all pwqs, which is achieved by
making wq_node_nr_active() always return wq->node_nr_active[nr_node_ids] for
ordered workqueues.

The new implementation is simpler and allows ordered workqueues to share
locality-aware worker_pools with other unbound workqueues, which should
improve execution locality.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/workqueue.c | 44 ++++++--------------------------------------
 1 file changed, 6 insertions(+), 38 deletions(-)
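
The lookup rule described in the commit message can be sketched in userspace
as follows. This is an illustrative model only, not the kernel code: every
demo_* / DEMO_* name, the flag values, and the node count of 2 are
assumptions made up for the example. It shows that an ordered workqueue
resolves every node to the single slot at index [nr_node_ids], while a plain
unbound workqueue only falls back to that slot when no node is given.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel definitions; values are illustrative. */
#define DEMO_NUMA_NO_NODE  (-1)
#define DEMO_WQ_UNBOUND    (1u << 0)
#define DEMO_WQ_ORDERED    (1u << 1)
#define DEMO_NR_NODE_IDS   2

/* Models struct wq_node_nr_active: one shared counter of active work items. */
struct demo_nna {
	int nr;
};

/* Models the part of struct workqueue_struct that the lookup touches. */
struct demo_wq {
	unsigned int flags;
	/* One slot per node plus the extra slot at [DEMO_NR_NODE_IDS]. */
	struct demo_nna *node_nr_active[DEMO_NR_NODE_IDS + 1];
};

/* Mirrors the selection logic of wq_node_nr_active() after this patch. */
static struct demo_nna *demo_wq_node_nr_active(struct demo_wq *wq, int node)
{
	assert(node == DEMO_NUMA_NO_NODE ||
	       (node >= 0 && node < DEMO_NR_NODE_IDS));

	/* Per-cpu workqueues do not use a shared nr_active. */
	if (!(wq->flags & DEMO_WQ_UNBOUND))
		return NULL;

	/* Ordered workqueues always use the one extra slot. */
	if ((wq->flags & DEMO_WQ_ORDERED) || node == DEMO_NUMA_NO_NODE)
		node = DEMO_NR_NODE_IDS;

	return wq->node_nr_active[node];
}
```

In this model, calling demo_wq_node_nr_active() with node 0 and node 1 on a
workqueue that has DEMO_WQ_ORDERED set returns the same pointer, so every pwq
would be limited by the same counter regardless of which pool backs it.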

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0017e9094034..bae7ed9cd1b4 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -441,9 +441,6 @@ static DEFINE_HASHTABLE(unbound_pool_hash, UNBOUND_POOL_HASH_ORDER);
 /* I: attributes used when instantiating standard unbound pools on demand */
 static struct workqueue_attrs *unbound_std_wq_attrs[NR_STD_WORKER_POOLS];
 
-/* I: attributes used when instantiating ordered pools on demand */
-static struct workqueue_attrs *ordered_wq_attrs[NR_STD_WORKER_POOLS];
-
 /*
  * I: kthread_worker to release pwq's. pwq release needs to be bounced to a
  * process context while holding a pool lock. Bounce to a dedicated kthread
@@ -1435,6 +1432,9 @@ work_func_t wq_worker_last_func(struct task_struct *task)
  *
  * - %NULL for per-cpu workqueues as they don't need to use shared nr_active.
  *
+ * - node_nr_active[nr_node_ids] if the associated workqueue is ordered so that
+ *   all pwq's are limited by the same nr_active.
+ *
  * - node_nr_active[nr_node_ids] if @node is %NUMA_NO_NODE.
  *
  * - Otherwise, node_nr_active[@node].
@@ -1445,7 +1445,7 @@ static struct wq_node_nr_active *wq_node_nr_active(struct workqueue_struct *wq,
 	if (!(wq->flags & WQ_UNBOUND))
 		return NULL;
 
-	if (node == NUMA_NO_NODE)
+	if ((wq->flags & __WQ_ORDERED) || node == NUMA_NO_NODE)
 		node = nr_node_ids;
 
 	return wq->node_nr_active[node];
@@ -4312,7 +4312,7 @@ static struct wq_node_nr_active **alloc_node_nr_active(void)
 		nna_ar[node] = nna;
 	}
 
-	/* [nr_node_ids] is used as the fallback */
+	/* [nr_node_ids] is used for ordered workqueues and as the fallback */
 	nna = kzalloc_node(sizeof(*nna), GFP_KERNEL, NUMA_NO_NODE);
 	if (!nna)
 		goto err_free;
@@ -4799,14 +4799,6 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
 	if (WARN_ON(!(wq->flags & WQ_UNBOUND)))
 		return -EINVAL;
 
-	/* creating multiple pwqs breaks ordering guarantee */
-	if (!list_empty(&wq->pwqs)) {
-		if (WARN_ON(wq->flags & __WQ_ORDERED_EXPLICIT))
-			return -EINVAL;
-
-		wq->flags &= ~__WQ_ORDERED;
-	}
-
 	ctx = apply_wqattrs_prepare(wq, attrs, wq_unbound_cpumask);
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
@@ -4955,15 +4947,7 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq)
 	}
 
 	cpus_read_lock();
-	if (wq->flags & __WQ_ORDERED) {
-		ret = apply_workqueue_attrs(wq, ordered_wq_attrs[highpri]);
-		/* there should only be single pwq for ordering guarantee */
-		WARN(!ret && (wq->pwqs.next != &wq->dfl_pwq->pwqs_node ||
-			      wq->pwqs.prev != &wq->dfl_pwq->pwqs_node),
-		     "ordering guarantee broken for workqueue %s\n", wq->name);
-	} else {
-		ret = apply_workqueue_attrs(wq, unbound_std_wq_attrs[highpri]);
-	}
+	ret = apply_workqueue_attrs(wq, unbound_std_wq_attrs[highpri]);
 	cpus_read_unlock();
 
 	/* for unbound pwq, flush the pwq_release_worker ensures that the
@@ -6220,13 +6204,6 @@ static int workqueue_apply_unbound_cpumask(const cpumask_var_t unbound_cpumask)
 		if (!(wq->flags & WQ_UNBOUND))
 			continue;
 
-		/* creating multiple pwqs breaks ordering guarantee */
-		if (!list_empty(&wq->pwqs)) {
-			if (wq->flags & __WQ_ORDERED_EXPLICIT)
-				continue;
-			wq->flags &= ~__WQ_ORDERED;
-		}
-
 		ctx = apply_wqattrs_prepare(wq, wq->unbound_attrs, unbound_cpumask);
 		if (IS_ERR(ctx)) {
 			ret = PTR_ERR(ctx);
@@ -7023,15 +7000,6 @@ void __init workqueue_init_early(void)
 		BUG_ON(!(attrs = alloc_workqueue_attrs()));
 		attrs->nice = std_nice[i];
 		unbound_std_wq_attrs[i] = attrs;
-
-		/*
-		 * An ordered wq should have only one pwq as ordering is
-		 * guaranteed by max_active which is enforced by pwqs.
-		 */
-		BUG_ON(!(attrs = alloc_workqueue_attrs()));
-		attrs->nice = std_nice[i];
-		attrs->ordered = true;
-		ordered_wq_attrs[i] = attrs;
 	}
 
 	system_wq = alloc_workqueue("events", 0, 0);
-- 
2.43.0

