Subject: Re: [PATCH v4 5/8] fuse: {io-uring} Allow reduced number of ring queues
From: Bernd Schubert
Date: Mon, 27 Apr 2026 15:49:37 +0200
To: Joanne Koong
Cc: "bernd@bsbernd.com", Miklos Szeredi, "linux-fsdevel@vger.kernel.org", Luis Henriques, Gang He
X-Mailing-List: linux-fsdevel@vger.kernel.org
References: <20260413-reduced-nr-ring-queues_3-v4-0-982b6414b723@bsbernd.com>
 <20260413-reduced-nr-ring-queues_3-v4-5-982b6414b723@bsbernd.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8

On 4/27/26 15:10, Joanne Koong wrote:
> On Fri, Apr 24, 2026 at 11:01 PM Bernd Schubert wrote:
>>
>> On 4/24/26 20:28, Joanne Koong wrote:
>>> On Mon, Apr 13, 2026 at 2:41 AM Bernd Schubert via B4 Relay wrote:
>>>>
>>>> From: Bernd Schubert
>>>>
>>>> Queue selection (fuse_uring_get_queue) can handle a reduced number
>>>> of queues - using io-uring is now possible even with a single queue
>>>> and entry.
>>>>
>>>> The FUSE_URING_REDUCED_Q flag is introduced to tell the fuse server
>>>> that reduced queues are possible, i.e. if the flag is set, the fuse
>>>> server is free to reduce the number of queues.
>>>>
>>>> Signed-off-by: Bernd Schubert
>>>> ---
>>>>  fs/fuse/dev_uring.c       | 160 ++++++++++++++++++++++++----------------------
>>>>  fs/fuse/inode.c           |   2 +-
>>>>  include/uapi/linux/fuse.h |   3 +
>>>>  3 files changed, 88 insertions(+), 77 deletions(-)
>>>>
>>>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>>>> index 9dcbc39531f0e019e5abf58a29cdf6c75fafdca1..e68089babaf89fb81741e4a5e605c6e36a137f9e 100644
>>>> --- a/fs/fuse/dev_uring.c
>>>> +++ b/fs/fuse/dev_uring.c
>>>>
>>>> -static struct fuse_ring_queue *fuse_uring_task_to_queue(struct fuse_ring *ring)
>>>> +static struct fuse_ring_queue *fuse_uring_select_queue(struct fuse_ring *ring)
>>>>  {
>>>>  	unsigned int qid;
>>>> -	struct fuse_ring_queue *queue;
>>>> +	int node;
>>>> +	unsigned int nr_queues;
>>>> +	unsigned int cpu = task_cpu(current);
>>>>
>>>> -	qid = task_cpu(current);
>>>> +	cpu = cpu % ring->max_nr_queues;
>>>>
>>>> -	if (WARN_ONCE(qid >= ring->max_nr_queues,
>>>> -		      "Core number (%u) exceeds nr queues (%zu)\n", qid,
>>>> -		      ring->max_nr_queues))
>>>> -		qid = 0;
>>>> +	/* numa local registered queue bitmap */
>>>> +	node = cpu_to_node(cpu);
>>>> +	if (WARN_ONCE(node >= ring->nr_numa_nodes,
>>>> +		      "Node number (%d) exceeds nr nodes (%d)\n",
>>>> +		      node, ring->nr_numa_nodes)) {
>>>> +		node = 0;
>>>> +	}
>>>>
>>>> -	queue = ring->queues[qid];
>>>> -	WARN_ONCE(!queue, "Missing queue for qid %d\n", qid);
>>>> +	nr_queues = READ_ONCE(ring->numa_q_map[node].nr_queues);
>>>> +	if (nr_queues) {
>>>> +		qid = ring->numa_q_map[node].cpu_to_qid[cpu];
>>>> +		if (WARN_ON_ONCE(qid >= ring->max_nr_queues))
>>>> +			return NULL;
>>>> +		return READ_ONCE(ring->queues[qid]);
>>>> +	}
>>>
>>> Hi Bernd,
>>>
>>> Thanks for making the changes on this - I really like how much simpler
>>> the logic is now.
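As an aside, the node-local lookup in the diff above can be modelled in
plain userspace C. This is only a sketch of the selection logic: the
struct layout, array sizes and the global-fallback path (here just a -1
return) are hypothetical stand-ins, not the kernel's actual types.

```c
#include <assert.h>

#define MAX_CPUS   8
#define MAX_NODES  2

/* Hypothetical userspace model of the per-NUMA-node state:
 * a queue count plus a cpu -> qid table for that node. */
struct numa_q_map {
	unsigned int nr_queues;               /* queues registered on this node */
	unsigned int cpu_to_qid[MAX_CPUS];    /* cpu -> qid for this node */
};

struct ring_model {
	unsigned int max_nr_queues;
	unsigned int nr_numa_nodes;
	int cpu_to_node[MAX_CPUS];            /* stand-in for cpu_to_node() */
	struct numa_q_map numa_q_map[MAX_NODES];
};

/* Two-level lookup modelled on the patch: prefer a queue registered on
 * the submitting CPU's NUMA node; return -1 when that node has no queue
 * yet (the kernel then falls back to a global search, omitted here). */
int select_qid(const struct ring_model *ring, unsigned int cpu)
{
	int node;

	cpu = cpu % ring->max_nr_queues;
	node = ring->cpu_to_node[cpu];
	if (node < 0 || (unsigned int)node >= ring->nr_numa_nodes)
		node = 0;                     /* mirrors the WARN_ONCE fallback */

	if (ring->numa_q_map[node].nr_queues) {
		unsigned int qid = ring->numa_q_map[node].cpu_to_qid[cpu];

		if (qid >= ring->max_nr_queues)
			return -1;            /* mirrors WARN_ON_ONCE -> NULL */
		return (int)qid;
	}
	return -1;                            /* no node-local queue registered */
}
```

The point of the shape: the modulo keeps any CPU id inside the table,
and a node whose nr_queues is still zero cleanly signals "take the
fallback path" instead of dereferencing an unregistered queue.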
>>>
>>> I'm looking through how the block multiqueue code works
>>> (block/blk-mq.c and block/blk-mq-cpumap.c) because I think they
>>> basically have to do the same thing with figuring out which cpu to
>>> dispatch a request to.
>>>
>>> It looks like what they do is use group_cpus_evenly(), which as I
>>> understand it, will partition CPUs taking into account numa nodes (as
>>> well as clustering and SMT siblings). I think if we use this for fuse
>>> io-uring, it will make things a lot simpler and we could get rid of
>>> the per-numa state tracking (eg numa_q_map, registered_q_mask,
>>> nr_numa_nodes) and simplify queue selection, which could then just
>>> be a cpu to qid lookup instead of a two-level
>>> numa-then-global-fallback lookup.
>>>
>>> Do you think something like this makes sense?
>>
>> Maybe, I need to check that code. However, does this really need to be
>> done right now? Can it not be updated later? To me it looks a bit like
>> we would be replacing one piece of code with another, without a clear
>> advantage. I can look into group_cpus_evenly(), but I cannot promise
>> when that will happen.
>> My personal preference would be to work on real issues, like getting
>> rid of two locks (queue->lock and bg->lock) and distributing max_bg
>> across queues. And that probably requires the distribution across
>> queues, which you didn't like in the previous series. Anyway, already
>> finding the time for that is hard.
>
> Ok, we should go with what you have then and I'll submit changes as a
> separate patch.
>
>> My personal opinion is that queue selection needs to return the qid,
>> so that the function can be overridden with eBPF. I didn't have time
>> yet to try that out.
>>
>>> Additionally, as I understand it, in this series, the ring->q_map
>>> mapping has to get rebuilt every time a new queue gets created.
>>> What do you think about just having the server declare the total
>>> queue count upfront, so that the mapping can be established at ring
>>> creation time? group_cpus_evenly() would only need to be called once,
>>> the cpu_to_qid map would only have to be built once, and we could
>>> avoid the rebuild-on-each-queue-creation complexity entirely. Do you
>>> think something like this makes sense?
>>
>> That is why I said in another mail that a config SQE would make some
>> sense. However, the part where I disagree is that we can make it all
>> entirely dynamic with the current approach.
>> Only the logic for that in libfuse is missing. I.e. it _could_ start
>> with a single queue or one queue per numa node and one ring entry.
>> Basically no memory usage then.
>> And then libfuse could add logic - many small requests - set up ring
>> entries with a smaller payload size (or a smaller pBuf). Many large
>> requests - add more requests with a larger payload size. And with the
>> current approach queues can be added dynamically.
>
> For registered buffers and zero copy, the queue count is needed up
> front at registration time. imo it would be nice to have the
> registration interface for these things be cohesive. Dynamic queues
> could still be added in the future (eg server sends a new uring cmd to
> add a queue, kernel rebuilds mapping, etc).

And could that not be the _max_ number of queues for registered buffers?

Thanks,
Bernd
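P.S. For reference, the build-once mapping discussed above would reduce
queue selection to a single flat lookup. A simplified, NUMA-unaware
sketch of such an even split - the real group_cpus_evenly() additionally
accounts for nodes, clusters and SMT siblings, so this only models the
"contiguous groups of near-equal size" part:

```c
#include <assert.h>

#define MAX_CPUS 16

/* Split nr_cpus CPUs into nr_queues contiguous groups of near-equal
 * size and record each CPU's queue id in a flat cpu -> qid table that
 * is built exactly once, at ring creation time. */
void build_cpu_to_qid(unsigned int nr_cpus, unsigned int nr_queues,
		      unsigned int cpu_to_qid[])
{
	unsigned int base = nr_cpus / nr_queues;
	unsigned int rem  = nr_cpus % nr_queues; /* first 'rem' groups get +1 */
	unsigned int cpu = 0;

	for (unsigned int q = 0; q < nr_queues; q++) {
		unsigned int len = base + (q < rem ? 1 : 0);

		for (unsigned int i = 0; i < len; i++)
			cpu_to_qid[cpu++] = q;
	}
}
```

With such a table, the submit path would only need
`queue = ring->queues[cpu_to_qid[task_cpu(current)]]` - no per-node
state and no fallback branch.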