linux-fsdevel.vger.kernel.org archive mirror
* [PATCH] fuse: Wake requests on the same cpu
@ 2025-10-13 17:27 Bernd Schubert
  2025-10-14  7:25 ` Johannes Thumshirn
  0 siblings, 1 reply; 3+ messages in thread
From: Bernd Schubert @ 2025-10-13 17:27 UTC (permalink / raw)
  To: Miklos Szeredi, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider
  Cc: Joanne Koong, Luis Henriques, linux-fsdevel, Bernd Schubert

For io-uring it makes sense to wake the waiting application (synchronous
IO) on the same core that completes the request.
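
A simplified sketch of the waiter/waker pattern (not the literal
mainline code, request_wait_answer() handles more states):

    /* Application side: sleeps in request_wait_answer() on req->waitq
     * until the request is marked finished.
     */
    wait_event_interruptible(req->waitq,
                             test_bit(FR_FINISHED, &req->flags));

    /* Completion side in fuse_request_end(): for io-uring requests the
     * waiter is woken on the CPU running the completion (WF_CURRENT_CPU)
     * instead of wherever the scheduler decides.
     */
    if (test_bit(FR_URING, &req->flags))
        wake_up_on_current_cpu(&req->waitq);
    else
        wake_up(&req->waitq);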

With queue-per-core:

fio --directory=/tmp/dest --name=iops.\$jobnum --rw=randread --bs=4k \
    --size=1G --numjobs=1 --iodepth=1 --time_based --runtime=30s \
    --group_reporting --ioengine=psync --direct=1

no-io-uring
   READ: bw=116MiB/s (122MB/s), 116MiB/s-116MiB/s
no-io-uring wake on the same core (not part of this patch)
   READ: bw=115MiB/s (120MB/s), 115MiB/s-115MiB/s
unpatched
   READ: bw=260MiB/s (273MB/s), 260MiB/s-260MiB/s
patched
   READ: bw=345MiB/s (362MB/s), 345MiB/s-345MiB/s

Without io-uring and core-bound fuse-server queues there is almost
no difference. In fact, fio results fluctuate heavily during the run,
between 85MB/s and 205MB/s.

With --numjobs=8

unpatched
   READ: bw=2378MiB/s (2493MB/s), 2378MiB/s-2378MiB/s
patched
   READ: bw=2402MiB/s (2518MB/s), 2402MiB/s-2402MiB/s
(differences within the confidence interval)

With '-o io_uring_q_mask=0-3:8-11' (16-core / 32-SMT-thread system) and
--numjobs=8:

unpatched
   READ: bw=1286MiB/s (1348MB/s), 1286MiB/s-1286MiB/s
patched
   READ: bw=1561MiB/s (1637MB/s), 1561MiB/s-1561MiB/s

I.e. there is no difference with many application threads and
queue-per-core, but a perf gain with overloaded queues - a bit surprising.

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
This was already part of the RFC series and was then removed on
request, to keep optimizations out of the main fuse-io-uring
series.
Later I hesitated to add it back, as I was working on reducing the
required number of queues/rings and initially thought
wake-on-current-cpu would need to be conditional on whether
queue-per-core or a reduced number of queues is used.
Testing shows there is still a measurable benefit with a reduced
number of queues - no condition is needed and the patch can be
handled independently of the queue-count reduction.
---
 fs/fuse/dev.c        |  8 ++++++--
 include/linux/wait.h |  6 +++---
 kernel/sched/wait.c  | 12 ++++++++++++
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 132f38619d70720ce74eedc002a7b8f31e760a61..0f73ef9f77b463b6dfd07e35262dc3375648c56f 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -499,8 +499,12 @@ void fuse_request_end(struct fuse_req *req)
 		flush_bg_queue(fc);
 		spin_unlock(&fc->bg_lock);
 	} else {
-		/* Wake up waiter sleeping in request_wait_answer() */
-		wake_up(&req->waitq);
+		if (test_bit(FR_URING, &req->flags)) {
+			wake_up_on_current_cpu(&req->waitq);
+		} else {
+			/* Wake up waiter sleeping in request_wait_answer() */
+			wake_up(&req->waitq);
+		}
 	}
 
 	if (test_bit(FR_ASYNC, &req->flags))
diff --git a/include/linux/wait.h b/include/linux/wait.h
index f648044466d5f55f2d65a3aa153b4dfe39f0b6dc..831a187b3f68f0707c75ceee919fec338db410b3 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -219,6 +219,7 @@ void __wake_up_sync(struct wait_queue_head *wq_head, unsigned int mode);
 void __wake_up_pollfree(struct wait_queue_head *wq_head);
 
 #define wake_up(x)			__wake_up(x, TASK_NORMAL, 1, NULL)
+#define wake_up_on_current_cpu(x)	__wake_up_on_current_cpu(x, TASK_NORMAL, NULL)
 #define wake_up_nr(x, nr)		__wake_up(x, TASK_NORMAL, nr, NULL)
 #define wake_up_all(x)			__wake_up(x, TASK_NORMAL, 0, NULL)
 #define wake_up_locked(x)		__wake_up_locked((x), TASK_NORMAL, 1)
@@ -479,9 +480,8 @@ do {										\
 	__wait_event_cmd(wq_head, condition, cmd1, cmd2);			\
 } while (0)
 
-#define __wait_event_interruptible(wq_head, condition)				\
-	___wait_event(wq_head, condition, TASK_INTERRUPTIBLE, 0, 0,		\
-		      schedule())
+#define __wait_event_interruptible(wq_head, condition) \
+	___wait_event(wq_head, condition, TASK_INTERRUPTIBLE, 0, 0, schedule())
 
 /**
  * wait_event_interruptible - sleep until a condition gets true
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 20f27e2cf7aec691af040fcf2236a20374ec66bf..1c6943a620ae389590a9d06577b998c320310923 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -147,10 +147,22 @@ int __wake_up(struct wait_queue_head *wq_head, unsigned int mode,
 }
 EXPORT_SYMBOL(__wake_up);
 
+/**
+ * __wake_up - wake up threads blocked on a waitqueue, on the current cpu
+ * @wq_head: the waitqueue
+ * @mode: which threads
+ * @nr_exclusive: how many wake-one or wake-many threads to wake up
+ * @key: is directly passed to the wakeup function
+ *
+ * If this function wakes up a task, it executes a full memory barrier
+ * before accessing the task state.  Returns the number of exclusive
+ * tasks that were awaken.
+ */
 void __wake_up_on_current_cpu(struct wait_queue_head *wq_head, unsigned int mode, void *key)
 {
 	__wake_up_common_lock(wq_head, mode, 1, WF_CURRENT_CPU, key);
 }
+EXPORT_SYMBOL_GPL(__wake_up_on_current_cpu);
 
 /*
  * Same as __wake_up but called with the spinlock in wait_queue_head_t held.

---
base-commit: ec714e371f22f716a04e6ecb2a24988c92b26911
change-id: 20251013-wake-same-cpu-b7ddb0b0688e

Best regards,
-- 
Bernd Schubert <bschubert@ddn.com>



* Re: [PATCH] fuse: Wake requests on the same cpu
  2025-10-13 17:27 [PATCH] fuse: Wake requests on the same cpu Bernd Schubert
@ 2025-10-14  7:25 ` Johannes Thumshirn
  2025-10-14  9:12   ` Bernd Schubert
  0 siblings, 1 reply; 3+ messages in thread
From: Johannes Thumshirn @ 2025-10-14  7:25 UTC (permalink / raw)
  To: Bernd Schubert, Miklos Szeredi, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider
  Cc: Joanne Koong, Luis Henriques, linux-fsdevel@vger.kernel.org

On 10/13/25 9:01 PM, Bernd Schubert wrote:
> +/**
> + * __wake_up - wake up threads blocked on a waitqueue, on the current cpu
That needs to be __wake_up_on_current_cpu
[..]
>   void __wake_up_on_current_cpu(struct wait_queue_head *wq_head, unsigned int mode, void *key)
>   {
>   	__wake_up_common_lock(wq_head, mode, 1, WF_CURRENT_CPU, key);
>   }
> +EXPORT_SYMBOL_GPL(__wake_up_on_current_cpu);




* Re: [PATCH] fuse: Wake requests on the same cpu
  2025-10-14  7:25 ` Johannes Thumshirn
@ 2025-10-14  9:12   ` Bernd Schubert
  0 siblings, 0 replies; 3+ messages in thread
From: Bernd Schubert @ 2025-10-14  9:12 UTC (permalink / raw)
  To: Johannes Thumshirn, Miklos Szeredi, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider
  Cc: Joanne Koong, Luis Henriques, linux-fsdevel@vger.kernel.org

On 10/14/25 09:25, Johannes Thumshirn wrote:
> On 10/13/25 9:01 PM, Bernd Schubert wrote:
>> +/**
>> + * __wake_up - wake up threads blocked on a waitqueue, on the current cpu
> That needs to be __wake_up_on_current_cpu
> [..]
>>   void __wake_up_on_current_cpu(struct wait_queue_head *wq_head, unsigned int mode, void *key)
>>   {
>>       __wake_up_common_lock(wq_head, mode, 1, WF_CURRENT_CPU, key);
>>   }
>> +EXPORT_SYMBOL_GPL(__wake_up_on_current_cpu);
> 
> 

Oops, thanks for spotting! v2 is coming.


Thanks,
Bernd

