public inbox for linux-fsdevel@vger.kernel.org
From: Bernd Schubert <bernd@bsbernd.com>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: "bernd@bsbernd.com" <bernd@bsbernd.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Luis Henriques <luis@igalia.com>, Gang He <dchg2000@gmail.com>
Subject: Re: [PATCH v4 5/8] fuse: {io-uring} Allow reduced number of ring queues
Date: Mon, 27 Apr 2026 15:49:37 +0200	[thread overview]
Message-ID: <e5a181d4-635f-4b91-b49a-e8d34b16f495@bsbernd.com> (raw)
In-Reply-To: <CAJnrk1aL7xW1Z+=ns=zaOtjcUvOZMRR9icz72B9Ye7d0jYgDeQ@mail.gmail.com>



On 4/27/26 15:10, Joanne Koong wrote:
> On Fri, Apr 24, 2026 at 11:01 PM Bernd Schubert <bschubert@ddn.com> wrote:
>>
>> On 4/24/26 20:28, Joanne Koong wrote:
>>> On Mon, Apr 13, 2026 at 2:41 AM Bernd Schubert via B4 Relay
>>> <devnull+bernd.bsbernd.com@kernel.org> wrote:
>>>>
>>>> From: Bernd Schubert <bschubert@ddn.com>
>>>>
>>>> Queue selection (fuse_uring_get_queue) can handle a reduced number
>>>> of queues - using io-uring is now possible even with a single
>>>> queue and entry.
>>>>
>>>> The FUSE_URING_REDUCED_Q flag is introduced to tell the fuse server
>>>> that reduced queues are possible, i.e. if the flag is set, the fuse
>>>> server is free to reduce the number of queues.
>>>>
>>>> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
>>>> ---
>>>>  fs/fuse/dev_uring.c       | 160 ++++++++++++++++++++++++----------------------
>>>>  fs/fuse/inode.c           |   2 +-
>>>>  include/uapi/linux/fuse.h |   3 +
>>>>  3 files changed, 88 insertions(+), 77 deletions(-)
>>>>
>>>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>>>> index 9dcbc39531f0e019e5abf58a29cdf6c75fafdca1..e68089babaf89fb81741e4a5e605c6e36a137f9e 100644
>>>> --- a/fs/fuse/dev_uring.c
>>>> +++ b/fs/fuse/dev_uring.c
>>>>
>>>> -static struct fuse_ring_queue *fuse_uring_task_to_queue(struct fuse_ring *ring)
>>>> +static struct fuse_ring_queue *fuse_uring_select_queue(struct fuse_ring *ring)
>>>>  {
>>>>         unsigned int qid;
>>>> -       struct fuse_ring_queue *queue;
>>>> +       int node;
>>>> +       unsigned int nr_queues;
>>>> +       unsigned int cpu = task_cpu(current);
>>>>
>>>> -       qid = task_cpu(current);
>>>> +       cpu = cpu % ring->max_nr_queues;
>>>>
>>>> -       if (WARN_ONCE(qid >= ring->max_nr_queues,
>>>> -                     "Core number (%u) exceeds nr queues (%zu)\n", qid,
>>>> -                     ring->max_nr_queues))
>>>> -               qid = 0;
>>>> +       /* numa local registered queue bitmap */
>>>> +       node = cpu_to_node(cpu);
>>>> +       if (WARN_ONCE(node >= ring->nr_numa_nodes,
>>>> +                     "Node number (%d) exceeds nr nodes (%d)\n",
>>>> +                     node, ring->nr_numa_nodes)) {
>>>> +               node = 0;
>>>> +       }
>>>>
>>>> -       queue = ring->queues[qid];
>>>> -       WARN_ONCE(!queue, "Missing queue for qid %d\n", qid);
>>>> +       nr_queues = READ_ONCE(ring->numa_q_map[node].nr_queues);
>>>> +       if (nr_queues) {
>>>> +               qid = ring->numa_q_map[node].cpu_to_qid[cpu];
>>>> +               if (WARN_ON_ONCE(qid >= ring->max_nr_queues))
>>>> +                       return NULL;
>>>> +               return READ_ONCE(ring->queues[qid]);
>>>> +       }
>>>
>>> Hi Bernd,
>>>
>>> Thanks for making the changes on this - I really like how much simpler
>>> the logic is now.
>>>
>>> I'm looking through how the block multiqueue code works
>>> (block/blk-mq.c and block/blk-mq-cpumap.c) because I think they
>>> basically have to do the same thing with figuring out which cpu to
>>> dispatch a request to.
>>>
>>> It looks like what they do is use group_cpus_evenly(), which as I
>>> understand it, will partition CPUs taking into account numa nodes (as
>>> well as clustering and SMT siblings). I think if we use this for fuse
>>> io-uring, it will make things a lot simpler and we could get rid of
>>> the per-numa state tracking (eg numa_q_map, registered_q_mask,
>>> nr_numa_nodes)  and simplify queue selection where now that can just
>>> be a cpu to qid lookup instead of a two-level
>>> numa-then-global-fallback lookup.
>>>
>>> Do you think something like this makes sense?
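The core of the suggestion above is a one-time cpu -> qid table built from an even partition of the CPUs. A rough user-space sketch of that partitioning idea (note this is not the actual `group_cpus_evenly()` implementation, which additionally balances across NUMA nodes, clusters and SMT siblings, and the helper name `build_cpu_to_qid` is invented for the example):

```c
#include <assert.h>

/* Split nr_cpus into nq contiguous groups whose sizes differ by at
 * most one, recording the resulting cpu -> qid table. */
static void build_cpu_to_qid(unsigned int nr_cpus, unsigned int nq,
			     unsigned int *cpu_to_qid)
{
	unsigned int base = nr_cpus / nq; /* minimum group size */
	unsigned int rem = nr_cpus % nq;  /* first 'rem' groups get +1 cpu */
	unsigned int cpu = 0;

	for (unsigned int q = 0; q < nq; q++) {
		unsigned int len = base + (q < rem ? 1 : 0);

		for (unsigned int i = 0; i < len; i++)
			cpu_to_qid[cpu++] = q;
	}
}
```

With such a table in place, queue selection collapses to a single array lookup per request, which is the simplification being proposed.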
>>
>> Maybe, I need to check that code. However, does this really need to be
>> done right now? Can't this be updated later? To me it looks a bit like
>> we would be replacing one piece of code with another, without a clear
>> advantage. I can look into group_cpus_evenly(), but I cannot promise
>> you when that will happen.
>> My personal preference would be to work on real issues, like getting
>> rid of the two locks (queue->lock and bg->lock) and distributing max_bg
>> across queues. And that probably requires the distribution across
>> queues, which you didn't like in the previous series. Anyway, already
>> finding the time for that is hard.
> 
> Ok, we should go with what you have then and I'll submit changes as a
> separate patch.
> 
>>
>> My personal opinion is that queue selection needs to return the qid, so
>> that the function can be overridden with eBPF. I didn't have time yet
>> to try that out.
>>
>>>
>>> Additionally, as I understand it, in this series, the ring->q_map
>>> mapping has to get rebuilt every time a new queue gets created. What
>>> do you think about just having the server declare the total queue
>>> count upfront and then the mapping can just get established at ring
>>> creation time? group_cpus_evenly() would only need to be called once,
>>> the cpu_to_qid map would only have to be built once, and we could
>>> avoid the rebuild-on-each-queue-creation complexity entirely. Do you
>>> think something like this makes sense?
>>
>> That is why I said in another mail that a config SQE would make sense
>> to some extent. However, the part where I disagree is that we could
>> make it all entirely dynamic with the current approach.
>> Only the logic for that in libfuse is missing. I.e. it _could_ start
>> with a single queue, or one queue per numa node, and one ring entry.
>> Basically no memory usage then.
>> And then libfuse could add logic - many small requests: set up ring
>> entries with a smaller payload size (or smaller pBuf); many large
>> requests: add more entries with a larger payload size. And with the
>> current approach queues can be added dynamically.
> 
> For registered buffers and zero copy, the queue count is needed up
> front at registration time. imo it would be nice to have the
> registration interface for these things be cohesive. Dynamic queues
> could still be added in the future (eg server sends a new uring cmd to
> add a queue, kernel rebuilds mapping, etc).

And can't that be the _max_ number of queues for registered buffers?

Thanks,
Bernd


Thread overview: 39+ messages
2026-04-13  9:41 [PATCH v4 0/8] fuse: {io-uring} Allow to reduce the number of queues and request distribution Bernd Schubert via B4 Relay
2026-04-13  9:41 ` [PATCH v4 1/8] fuse: {io-uring} Add queue length counters Bernd Schubert via B4 Relay
2026-04-13  9:41 ` [PATCH v4 2/8] fuse: {io-uring} Rename ring->nr_queues to max_nr_queues Bernd Schubert via B4 Relay
2026-04-27 15:35   ` Joanne Koong
2026-04-13  9:41 ` [PATCH v4 3/8] fuse: {io-uring} Use bitmaps to track registered queues Bernd Schubert via B4 Relay
2026-04-24 15:04   ` Luis Henriques
2026-04-24 15:33     ` Bernd Schubert
2026-04-27  8:02       ` Luis Henriques
2026-04-27 10:39         ` Bernd Schubert
2026-04-13  9:41 ` [PATCH v4 4/8] fuse: Fetch a queued fuse request on command registration Bernd Schubert via B4 Relay
2026-04-13  9:41 ` [PATCH v4 5/8] fuse: {io-uring} Allow reduced number of ring queues Bernd Schubert via B4 Relay
2026-04-24 15:15   ` Luis Henriques
2026-04-24 18:28   ` Joanne Koong
2026-04-24 22:00     ` Bernd Schubert
2026-04-27 13:10       ` Joanne Koong
2026-04-27 13:49         ` Bernd Schubert [this message]
2026-04-27 14:10           ` Joanne Koong
2026-04-27 14:42             ` Bernd Schubert
2026-04-27 15:10               ` Joanne Koong
2026-05-04  8:25         ` Bernd Schubert
2026-04-29 16:10       ` Joanne Koong
2026-04-29 16:24         ` Bernd Schubert
2026-04-29 16:32           ` Joanne Koong
2026-04-30  4:16             ` Darrick J. Wong
2026-04-13  9:41 ` [PATCH v4 6/8] fuse: {io-uring} Queue background requests on a different core Bernd Schubert via B4 Relay
2026-04-24 15:26   ` Luis Henriques
2026-04-27 12:08     ` Bernd Schubert
2026-04-29 14:43   ` Joanne Koong
2026-04-29 16:01     ` Bernd Schubert
2026-04-29 16:56       ` Joanne Koong
2026-04-29 20:19         ` Bernd Schubert
2026-04-13  9:41 ` [PATCH v4 7/8] fuse: Add retry attempts for numa local queues for load distribution Bernd Schubert via B4 Relay
2026-04-24 15:28   ` Luis Henriques
2026-04-29 15:03   ` Joanne Koong
2026-04-29 16:07     ` Bernd Schubert
2026-04-13  9:41 ` [PATCH v4 8/8] fuse: {io-uring} Prefer the current core over mapping Bernd Schubert via B4 Relay
2026-04-29 15:40   ` Joanne Koong
2026-04-29 16:11     ` Bernd Schubert
2026-04-29 16:15 ` [PATCH v4 0/8] fuse: {io-uring} Allow to reduce the number of queues and request distribution Joanne Koong
