From: Bernd Schubert <bernd@bsbernd.com>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: "bernd@bsbernd.com" <bernd@bsbernd.com>,
Miklos Szeredi <miklos@szeredi.hu>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Luis Henriques <luis@igalia.com>, Gang He <dchg2000@gmail.com>
Subject: Re: [PATCH v4 5/8] fuse: {io-uring} Allow reduced number of ring queues
Date: Mon, 27 Apr 2026 15:49:37 +0200 [thread overview]
Message-ID: <e5a181d4-635f-4b91-b49a-e8d34b16f495@bsbernd.com> (raw)
In-Reply-To: <CAJnrk1aL7xW1Z+=ns=zaOtjcUvOZMRR9icz72B9Ye7d0jYgDeQ@mail.gmail.com>
On 4/27/26 15:10, Joanne Koong wrote:
> On Fri, Apr 24, 2026 at 11:01 PM Bernd Schubert <bschubert@ddn.com> wrote:
>>
>> On 4/24/26 20:28, Joanne Koong wrote:
>>> On Mon, Apr 13, 2026 at 2:41 AM Bernd Schubert via B4 Relay
>>> <devnull+bernd.bsbernd.com@kernel.org> wrote:
>>>>
>>>> From: Bernd Schubert <bschubert@ddn.com>
>>>>
>>>> Queue selection (fuse_uring_get_queue) can handle a reduced
>>>> number of queues - using io-uring is now possible even with a
>>>> single queue and entry.
>>>>
>>>> The FUSE_URING_REDUCED_Q flag is introduced to tell the fuse
>>>> server that reduced queues are possible, i.e. if the flag is set,
>>>> the fuse server is free to reduce the number of queues.
>>>>
>>>> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
>>>> ---
>>>> fs/fuse/dev_uring.c | 160 ++++++++++++++++++++++++----------------------
>>>> fs/fuse/inode.c | 2 +-
>>>> include/uapi/linux/fuse.h | 3 +
>>>> 3 files changed, 88 insertions(+), 77 deletions(-)
>>>>
>>>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>>>> index 9dcbc39531f0e019e5abf58a29cdf6c75fafdca1..e68089babaf89fb81741e4a5e605c6e36a137f9e 100644
>>>> --- a/fs/fuse/dev_uring.c
>>>> +++ b/fs/fuse/dev_uring.c
>>>>
>>>> -static struct fuse_ring_queue *fuse_uring_task_to_queue(struct fuse_ring *ring)
>>>> +static struct fuse_ring_queue *fuse_uring_select_queue(struct fuse_ring *ring)
>>>> {
>>>> unsigned int qid;
>>>> - struct fuse_ring_queue *queue;
>>>> + int node;
>>>> + unsigned int nr_queues;
>>>> + unsigned int cpu = task_cpu(current);
>>>>
>>>> - qid = task_cpu(current);
>>>> + cpu = cpu % ring->max_nr_queues;
>>>>
>>>> - if (WARN_ONCE(qid >= ring->max_nr_queues,
>>>> - "Core number (%u) exceeds nr queues (%zu)\n", qid,
>>>> - ring->max_nr_queues))
>>>> - qid = 0;
>>>> + /* numa local registered queue bitmap */
>>>> + node = cpu_to_node(cpu);
>>>> + if (WARN_ONCE(node >= ring->nr_numa_nodes,
>>>> + "Node number (%d) exceeds nr nodes (%d)\n",
>>>> + node, ring->nr_numa_nodes)) {
>>>> + node = 0;
>>>> + }
>>>>
>>>> - queue = ring->queues[qid];
>>>> - WARN_ONCE(!queue, "Missing queue for qid %d\n", qid);
>>>> + nr_queues = READ_ONCE(ring->numa_q_map[node].nr_queues);
>>>> + if (nr_queues) {
>>>> + qid = ring->numa_q_map[node].cpu_to_qid[cpu];
>>>> + if (WARN_ON_ONCE(qid >= ring->max_nr_queues))
>>>> + return NULL;
>>>> + return READ_ONCE(ring->queues[qid]);
>>>> + }
>>>
>>> Hi Bernd,
>>>
>>> Thanks for making the changes on this - I really like how much simpler
>>> the logic is now.
>>>
>>> I'm looking through how the block multiqueue code works
>>> (block/blk-mq.c and block/blk-mq-cpumap.c) because I think they
>>> basically have to do the same thing with figuring out which cpu to
>>> dispatch a request to.
>>>
>>> It looks like what they do is use group_cpus_evenly(), which as I
>>> understand it, will partition CPUs taking into account numa nodes (as
>>> well as clustering and SMT siblings). I think if we use this for fuse
>>> io-uring, it will make things a lot simpler and we could get rid of
>>> the per-numa state tracking (eg numa_q_map, registered_q_mask,
>>> nr_numa_nodes) and simplify queue selection where now that can just
>>> be a cpu to qid lookup instead of a two-level
>>> numa-then-global-fallback lookup.
>>>
>>> Do you think something like this makes sense?
>>
>> Maybe, I need to check that code. However, does this really need to be
>> done right now? Can't this be updated later? To me it looks a bit like
>> we would be replacing one piece of code with another, without a clear
>> advantage. I can look into group_cpus_evenly(), but I cannot promise
>> when that will happen.
>> My personal preference would be to work on real issues, like getting
>> rid of the two locks (queue->lock and bg->lock) and distributing max_bg
>> across queues. And that probably requires the distribution across
>> queues, which you didn't like in the previous series. Anyway, even
>> finding the time for that is hard.
>
> Ok, we should go with what you have then and I'll submit changes as a
> separate patch.
>
>>
>> My personal opinion is that queue selection needs to return the qid, so
>> that the function can be overridden with eBPF. I didn't have time yet
>> to try that out.
>>
>>>
>>> Additionally, as I understand it, in this series, the ring->q_map
>>> mapping has to get rebuilt every time a new queue gets created. What
>>> do you think about just having the server declare the total queue
>>> count upfront and then the mapping can just get established at ring
>>> creation time? group_cpus_evenly() would only need to be called once,
>>> the cpu_to_qid map would only have to be built once, and we could
>>> avoid the rebuild-on-each-queue-creation complexity entirely. Do you
>>> think something like this makes sense?
>>
>> That is why I said in another mail that a config SQE would make some
>> sense. However, the part where I disagree: with the current approach we
>> could make it all entirely dynamic.
>> Only the logic for that in libfuse is missing. I.e. it _could_ start
>> with a single queue, or one queue per numa node, and one ring entry -
>> basically no memory usage then.
>> Libfuse could then add logic: for many small requests, set up ring
>> entries with a smaller payload size (or a smaller pBuf); for many large
>> requests, add more entries with a larger payload size. And with the
>> current approach queues can be added dynamically.
>
> For registered buffers and zero copy, the queue count is needed up
> front at registration time. imo it would be nice to have the
> registration interface for these things be cohesive. Dynamic queues
> could still be added in the future (eg server sends a new uring cmd to
> add a queue, kernel rebuilds mapping, etc).
And can't that be the _max_ number of queues for registered buffers?
Thanks,
Bernd
2026-04-13 9:41 [PATCH v4 0/8] fuse: {io-uring} Allow to reduce the number of queues and request distribution Bernd Schubert via B4 Relay
2026-04-13 9:41 ` [PATCH v4 1/8] fuse: {io-uring} Add queue length counters Bernd Schubert via B4 Relay
2026-04-13 9:41 ` [PATCH v4 2/8] fuse: {io-uring} Rename ring->nr_queues to max_nr_queues Bernd Schubert via B4 Relay
2026-04-27 15:35 ` Joanne Koong
2026-04-13 9:41 ` [PATCH v4 3/8] fuse: {io-uring} Use bitmaps to track registered queues Bernd Schubert via B4 Relay
2026-04-24 15:04 ` Luis Henriques
2026-04-24 15:33 ` Bernd Schubert
2026-04-27 8:02 ` Luis Henriques
2026-04-27 10:39 ` Bernd Schubert
2026-04-13 9:41 ` [PATCH v4 4/8] fuse: Fetch a queued fuse request on command registration Bernd Schubert via B4 Relay
2026-04-13 9:41 ` [PATCH v4 5/8] fuse: {io-uring} Allow reduced number of ring queues Bernd Schubert via B4 Relay
2026-04-24 15:15 ` Luis Henriques
2026-04-24 18:28 ` Joanne Koong
2026-04-24 22:00 ` Bernd Schubert
2026-04-27 13:10 ` Joanne Koong
2026-04-27 13:49 ` Bernd Schubert [this message]
2026-04-27 14:10 ` Joanne Koong
2026-04-27 14:42 ` Bernd Schubert
2026-04-27 15:10 ` Joanne Koong
2026-05-04 8:25 ` Bernd Schubert
2026-04-29 16:10 ` Joanne Koong
2026-04-29 16:24 ` Bernd Schubert
2026-04-29 16:32 ` Joanne Koong
2026-04-30 4:16 ` Darrick J. Wong
2026-04-13 9:41 ` [PATCH v4 6/8] fuse: {io-uring} Queue background requests on a different core Bernd Schubert via B4 Relay
2026-04-24 15:26 ` Luis Henriques
2026-04-27 12:08 ` Bernd Schubert
2026-04-29 14:43 ` Joanne Koong
2026-04-29 16:01 ` Bernd Schubert
2026-04-29 16:56 ` Joanne Koong
2026-04-29 20:19 ` Bernd Schubert
2026-04-13 9:41 ` [PATCH v4 7/8] fuse: Add retry attempts for numa local queues for load distribution Bernd Schubert via B4 Relay
2026-04-24 15:28 ` Luis Henriques
2026-04-29 15:03 ` Joanne Koong
2026-04-29 16:07 ` Bernd Schubert
2026-04-13 9:41 ` [PATCH v4 8/8] fuse: {io-uring} Prefer the current core over mapping Bernd Schubert via B4 Relay
2026-04-29 15:40 ` Joanne Koong
2026-04-29 16:11 ` Bernd Schubert
2026-04-29 16:15 ` [PATCH v4 0/8] fuse: {io-uring} Allow to reduce the number of queues and request distribution Joanne Koong