io-uring.vger.kernel.org archive mirror
* [PATCHSET 0/3] Avoid unnecessary io-wq worker creation
@ 2025-05-23 12:15 Jens Axboe
  2025-05-23 12:15 ` [PATCH 1/3] io_uring/io-wq: move hash helpers to the top Jens Axboe
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Jens Axboe @ 2025-05-23 12:15 UTC (permalink / raw)
  To: io-uring; +Cc: changfengnan, lidiangang

Hi,

Based on a report and RFC posted here:

https://lore.kernel.org/io-uring/20250522090909.73212-1-changfengnan@bytedance.com/

this series attempts to avoid spurious and pointless io-wq worker
creation. Patch 2 skips worker creation when the worker going to sleep
is idle rather than blocked on a work item, and patch 3 skips it when
the worker is blocking on a hashed work item and the next pending item
hashes to the same bucket, since a new worker could not make progress
on that bucket anyway.

 io_uring/io-wq.c | 50 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 40 insertions(+), 10 deletions(-)

-- 
Jens Axboe



* [PATCH 1/3] io_uring/io-wq: move hash helpers to the top
  2025-05-23 12:15 [PATCHSET 0/3] Avoid unnecessary io-wq worker creation Jens Axboe
@ 2025-05-23 12:15 ` Jens Axboe
  2025-05-23 12:15 ` [PATCH 2/3] io_uring/io-wq: ignore non-busy worker going to sleep Jens Axboe
  2025-05-23 12:15 ` [PATCH 3/3] io_uring/io-wq: only create a new worker if it can make progress Jens Axboe
  2 siblings, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2025-05-23 12:15 UTC (permalink / raw)
  To: io-uring; +Cc: changfengnan, lidiangang, Jens Axboe

In preparation for using them higher up in the file, move these
generic hash helpers. No functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io-wq.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index d52069b1177b..d36d0bd9847d 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -150,6 +150,16 @@ static bool io_acct_cancel_pending_work(struct io_wq *wq,
 static void create_worker_cb(struct callback_head *cb);
 static void io_wq_cancel_tw_create(struct io_wq *wq);
 
+static inline unsigned int __io_get_work_hash(unsigned int work_flags)
+{
+	return work_flags >> IO_WQ_HASH_SHIFT;
+}
+
+static inline unsigned int io_get_work_hash(struct io_wq_work *work)
+{
+	return __io_get_work_hash(atomic_read(&work->flags));
+}
+
 static bool io_worker_get(struct io_worker *worker)
 {
 	return refcount_inc_not_zero(&worker->ref);
@@ -454,16 +464,6 @@ static void __io_worker_idle(struct io_wq_acct *acct, struct io_worker *worker)
 	}
 }
 
-static inline unsigned int __io_get_work_hash(unsigned int work_flags)
-{
-	return work_flags >> IO_WQ_HASH_SHIFT;
-}
-
-static inline unsigned int io_get_work_hash(struct io_wq_work *work)
-{
-	return __io_get_work_hash(atomic_read(&work->flags));
-}
-
 static bool io_wait_on_hash(struct io_wq *wq, unsigned int hash)
 {
 	bool ret = false;
-- 
2.49.0



* [PATCH 2/3] io_uring/io-wq: ignore non-busy worker going to sleep
  2025-05-23 12:15 [PATCHSET 0/3] Avoid unnecessary io-wq worker creation Jens Axboe
  2025-05-23 12:15 ` [PATCH 1/3] io_uring/io-wq: move hash helpers to the top Jens Axboe
@ 2025-05-23 12:15 ` Jens Axboe
  2025-05-23 12:15 ` [PATCH 3/3] io_uring/io-wq: only create a new worker if it can make progress Jens Axboe
  2 siblings, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2025-05-23 12:15 UTC (permalink / raw)
  To: io-uring; +Cc: changfengnan, lidiangang, Jens Axboe

When an io-wq worker goes to sleep, it checks whether there is pending
work. If there is, it creates a new worker to handle it. But if the
worker going to sleep is currently idle, creating a new worker is
pointless: either the idle worker will be woken right back up to run
the work itself, or someone else has already created the worker needed
to handle it.

Only go through the worker creation logic if the current worker is
actively handling a work item. That means it's being scheduled out as
part of processing that work, not just going to sleep on its own.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io-wq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index d36d0bd9847d..c4af99460399 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -429,6 +429,8 @@ static void io_wq_dec_running(struct io_worker *worker)
 
 	if (!atomic_dec_and_test(&acct->nr_running))
 		return;
+	if (!worker->cur_work)
+		return;
 	if (!io_acct_run_queue(acct))
 		return;
 
-- 
2.49.0



* [PATCH 3/3] io_uring/io-wq: only create a new worker if it can make progress
  2025-05-23 12:15 [PATCHSET 0/3] Avoid unnecessary io-wq worker creation Jens Axboe
  2025-05-23 12:15 ` [PATCH 1/3] io_uring/io-wq: move hash helpers to the top Jens Axboe
  2025-05-23 12:15 ` [PATCH 2/3] io_uring/io-wq: ignore non-busy worker going to sleep Jens Axboe
@ 2025-05-23 12:15 ` Jens Axboe
  2 siblings, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2025-05-23 12:15 UTC (permalink / raw)
  To: io-uring; +Cc: changfengnan, lidiangang, Jens Axboe

Hashed work is serialized by io-wq. It is intended for cases like
buffered writes to a regular file, where the file system would
serialize the workers anyway with a mutex or similar lock. Since such
items would be forcibly serialized and blocked on each other, it's more
efficient for io-wq to execute them one at a time rather than issue
them in parallel.

If a worker is currently handling a hashed work item and gets blocked,
don't create a new worker if the next work item is also hashed and
mapped to the same bucket. That new worker would not be able to make any
progress anyway.

Reported-by: Fengnan Chang <changfengnan@bytedance.com>
Reported-by: Diangang Li <lidiangang@bytedance.com>
Link: https://lore.kernel.org/io-uring/20250522090909.73212-1-changfengnan@bytedance.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io-wq.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index c4af99460399..cd1fcb115739 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -419,6 +419,30 @@ static bool io_queue_worker_create(struct io_worker *worker,
 	return false;
 }
 
+/* Defer if current and next work are both hashed to the same chain */
+static bool io_wq_hash_defer(struct io_wq_work *work, struct io_wq_acct *acct)
+{
+	unsigned int hash, work_flags;
+	struct io_wq_work *next;
+
+	lockdep_assert_held(&acct->lock);
+
+	work_flags = atomic_read(&work->flags);
+	if (!__io_wq_is_hashed(work_flags))
+		return false;
+
+	/* should not happen, io_acct_run_queue() said we had work */
+	if (wq_list_empty(&acct->work_list))
+		return true;
+
+	hash = __io_get_work_hash(work_flags);
+	next = container_of(acct->work_list.first, struct io_wq_work, list);
+	work_flags = atomic_read(&next->flags);
+	if (!__io_wq_is_hashed(work_flags))
+		return false;
+	return hash == __io_get_work_hash(work_flags);
+}
+
 static void io_wq_dec_running(struct io_worker *worker)
 {
 	struct io_wq_acct *acct = io_wq_get_acct(worker);
@@ -433,6 +457,10 @@ static void io_wq_dec_running(struct io_worker *worker)
 		return;
 	if (!io_acct_run_queue(acct))
 		return;
+	if (io_wq_hash_defer(worker->cur_work, acct)) {
+		raw_spin_unlock(&acct->lock);
+		return;
+	}
 
 	raw_spin_unlock(&acct->lock);
 	atomic_inc(&acct->nr_running);
-- 
2.49.0


