From: Bernd Schubert <bschubert@ddn.com>
To: Luis Henriques <luis@igalia.com>,
Bernd Schubert via B4 Relay
<devnull+bernd.bsbernd.com@kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
"bernd@bsbernd.com" <bernd@bsbernd.com>,
Joanne Koong <joannelkoong@gmail.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Gang He <dchg2000@gmail.com>
Subject: Re: [PATCH v4 6/8] fuse: {io-uring} Queue background requests on a different core
Date: Mon, 27 Apr 2026 12:08:55 +0000 [thread overview]
Message-ID: <ed5488cf-9f73-4337-9b06-57e3beddbaba@ddn.com> (raw)
In-Reply-To: <87v7dgmlb5.fsf@igalia.com>
On 4/24/26 17:26, Luis Henriques wrote:
> On Mon, Apr 13 2026, Bernd Schubert via B4 Relay wrote:
>
>> From: Bernd Schubert <bschubert@ddn.com>
>>
>> Running background IO on a different core makes quite a difference.
>>
>> fio --directory=/tmp/dest --name=iops.\$jobnum --rw=randread \
>> --bs=4k --size=1G --numjobs=1 --iodepth=4 --time_based\
>> --runtime=30s --group_reporting --ioengine=io_uring\
>> --direct=1
>>
>> unpatched
>> READ: bw=272MiB/s (285MB/s) ...
>> patched
>> READ: bw=650MiB/s (682MB/s)
>>
>> The reason is easily visible: the fio process migrates between CPUs
>> when requests are submitted on the queue for the same core.
>>
>> With --iodepth=8
>>
>> unpatched
>> READ: bw=466MiB/s (489MB/s)
>> patched
>> READ: bw=641MiB/s (672MB/s)
>>
>> Without io-uring (--iodepth=8)
>> READ: bw=729MiB/s (764MB/s)
>>
>> Without fuse (--iodepth=8)
>> READ: bw=2199MiB/s (2306MB/s)
>>
>> (Tests were done with
>> <libfuse>/example/passthrough_hp -o allow_other --nopassthrough \
>> [-o io_uring] /tmp/source /tmp/dest
>> )
>>
>> Additional notes:
>>
>> With FURING_NEXT_QUEUE_RETRIES=0 (--iodepth=8)
>> READ: bw=903MiB/s (946MB/s)
>>
>> With just a random qid (--iodepth=8)
>> READ: bw=429MiB/s (450MB/s)
>>
>> With --iodepth=1
>> unpatched
>> READ: bw=195MiB/s (204MB/s)
>> patched
>> READ: bw=232MiB/s (243MB/s)
>>
>> With --iodepth=1 --numjobs=2
>> unpatched
>> READ: bw=366MiB/s (384MB/s)
>> patched
>> READ: bw=472MiB/s (495MB/s)
>>
>> With --iodepth=1 --numjobs=8
>> unpatched
>> READ: bw=1437MiB/s (1507MB/s)
>> patched
>> READ: bw=1529MiB/s (1603MB/s)
>> fuse without io-uring
>> READ: bw=1314MiB/s (1378MB/s), 1314MiB/s-1314MiB/s ...
>> no-fuse
>> READ: bw=2566MiB/s (2690MB/s), 2566MiB/s-2566MiB/s ...
>>
>> In summary, for async requests the core doing application IO is busy
>> sending requests, and IO processing should be done on a different core.
>> Spreading the load across random cores is also not desirable, as the
>> core might be frequency scaled down and/or in C1 sleep states. Not
>> shown here, but differences are much smaller when the system uses the
>> performance governor instead of schedutil (the Ubuntu default) -
>> obviously at the cost of higher system power consumption, which is not
>> desirable either.
>>
>> Results without io-uring (which uses fixed libfuse threads per queue)
>> heavily depend on the current number of active threads. Libfuse
>> defaults to a maximum of 10 threads, but the actual maximum number of
>> threads is a parameter. Also, the fuse-without-io-uring results
>> heavily depend on whether another workload was already running, as
>> libfuse starts these threads dynamically - i.e. the more threads are
>> active, the worse the performance.
>>
>> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
>> ---
>> fs/fuse/dev_uring.c | 14 +++++++++++---
>> 1 file changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>> index e68089babaf89fb81741e4a5e605c6e36a137f9e..ed061e239b8ed70ff36deb51dd6957fe1704ec87 100644
>> --- a/fs/fuse/dev_uring.c
>> +++ b/fs/fuse/dev_uring.c
>> @@ -1306,13 +1306,21 @@ static void fuse_uring_send_in_task(struct io_tw_req tw_req, io_tw_token_t tw)
>> fuse_uring_send(ent, cmd, err, issue_flags);
>> }
>>
>> -static struct fuse_ring_queue *fuse_uring_select_queue(struct fuse_ring *ring)
>> +static struct fuse_ring_queue *fuse_uring_select_queue(struct fuse_ring *ring,
>> + bool background)
>> {
>> unsigned int qid;
>> int node;
>> unsigned int nr_queues;
>> unsigned int cpu = task_cpu(current);
>>
>> + /*
>> + * Background requests result in better performance on a different
>> + * CPU, unless CPUs are already busy.
>> + */
>> + if (background)
>> + cpu++;
>> +
>
> The performance number look great, but I was wondering if you get similar
> improvements for write operations.
I didn't test that yet, but this is still direct-io - what would be
different for writes here?
>
> Also, isn't 'cpu++' too arbitrary? I mean, isn't there some heuristics
> that could be used? I understand the goal is just to push the request
> somewhere else, but does it make sense to push it to the next cpu on the
> same node? Or to the next cpu in a different core? I'm just thinking out
> loud, and maybe this is non-sense ;-)
My thinking is that the scheduler will take care of it and migrate the
task running on cpu+1, if there is one. In principle we would need help
from the task scheduler here to pick another cpu.
>
> Finally, shouldn't this behaviour be behind some knob? Maybe it's
> over-complicating for no good reason, but being able to: 1) enable/disable
> it, 2) enable by pushing it to the next cpu (this behaviour), 3) enable by
> pushing to the next cpu on the same/different node, etc.
I think any further complex logic should go into userspace. We need to
try overriding the function with eBPF from libfuse. I don't think it
makes much sense to add very complex logic and settings into kernel
code - we can see that the patches have advantages, so what speaks
against taking them as they are? With the exception that
fuse_uring_select_queue() needs to return an integer (qid) to make it
eBPF overridable.
My personal issue here is rather simple: I was basically working through
the whole weekend on different things (as in past weekends), with no end
in sight (1 day per week is contractor work for DDN). Several reports
and reported issues in libfuse are waiting.
After working on the rather large sync FUSE_INIT for selinux, next comes
adding a generic interface to libfuse io-uring for coroutines - I don't
have the time to severely change the logic of reduced queues and add
knobs. Besides that, FUSE_INIT is almost running out of space.
Thanks,
Bernd
2026-04-13 9:41 [PATCH v4 0/8] fuse: {io-uring} Allow to reduce the number of queues and request distribution Bernd Schubert via B4 Relay
2026-04-13 9:41 ` [PATCH v4 1/8] fuse: {io-uring} Add queue length counters Bernd Schubert via B4 Relay
2026-04-13 9:41 ` [PATCH v4 2/8] fuse: {io-uring} Rename ring->nr_queues to max_nr_queues Bernd Schubert via B4 Relay
2026-04-27 15:35 ` Joanne Koong
2026-04-13 9:41 ` [PATCH v4 3/8] fuse: {io-uring} Use bitmaps to track registered queues Bernd Schubert via B4 Relay
2026-04-24 15:04 ` Luis Henriques
2026-04-24 15:33 ` Bernd Schubert
2026-04-27 8:02 ` Luis Henriques
2026-04-27 10:39 ` Bernd Schubert
2026-04-13 9:41 ` [PATCH v4 4/8] fuse: Fetch a queued fuse request on command registration Bernd Schubert via B4 Relay
2026-04-13 9:41 ` [PATCH v4 5/8] fuse: {io-uring} Allow reduced number of ring queues Bernd Schubert via B4 Relay
2026-04-24 15:15 ` Luis Henriques
2026-04-24 18:28 ` Joanne Koong
2026-04-24 22:00 ` Bernd Schubert
2026-04-27 13:10 ` Joanne Koong
2026-04-27 13:49 ` Bernd Schubert
2026-04-27 14:10 ` Joanne Koong
2026-04-27 14:42 ` Bernd Schubert
2026-04-27 15:10 ` Joanne Koong
2026-04-29 16:10 ` Joanne Koong
2026-04-29 16:24 ` Bernd Schubert
2026-04-29 16:32 ` Joanne Koong
2026-04-30 4:16 ` Darrick J. Wong
2026-04-13 9:41 ` [PATCH v4 6/8] fuse: {io-uring} Queue background requests on a different core Bernd Schubert via B4 Relay
2026-04-24 15:26 ` Luis Henriques
2026-04-27 12:08 ` Bernd Schubert [this message]
2026-04-29 14:43 ` Joanne Koong
2026-04-29 16:01 ` Bernd Schubert
2026-04-29 16:56 ` Joanne Koong
2026-04-29 20:19 ` Bernd Schubert
2026-04-13 9:41 ` [PATCH v4 7/8] fuse: Add retry attempts for numa local queues for load distribution Bernd Schubert via B4 Relay
2026-04-24 15:28 ` Luis Henriques
2026-04-29 15:03 ` Joanne Koong
2026-04-29 16:07 ` Bernd Schubert
2026-04-13 9:41 ` [PATCH v4 8/8] fuse: {io-uring} Prefer the current core over mapping Bernd Schubert via B4 Relay
2026-04-29 15:40 ` Joanne Koong
2026-04-29 16:11 ` Bernd Schubert
2026-04-29 16:15 ` [PATCH v4 0/8] fuse: {io-uring} Allow to reduce the number of queues and request distribution Joanne Koong