* [PATCH 0/5] fuse: {io-uring} Allow to reduce the number of queues and request distribution
@ 2025-07-22 21:57 Bernd Schubert
2025-07-22 21:57 ` [PATCH 1/5] fuse: {io-uring} Add queue length counters Bernd Schubert
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Bernd Schubert @ 2025-07-22 21:57 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Joanne Koong, linux-fsdevel, Bernd Schubert
This adds bitmaps that track which queues are registered and which queues
do not have queued requests.
These bitmaps are then used to map from the submitting CPU core to a queue
and also to allow load distribution. NUMA affinity is taken into account,
and the fuse client/server protocol does not need any changes; everything
is handled internally in the fuse client.
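As an illustration (numbers invented for this example, not taken from the
patches): on a 64-core machine with only 4 ring queues registered, a request
submitted on any core gets routed to one of those 4 queues, preferring a
registered queue on the submitting core's NUMA node and preferring queues
that currently hold at most FUSE_URING_QUEUE_THRESHOLD requests; only if no
such queue exists does the request fall back to the statically mapped queue
for that core.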
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
Bernd Schubert (5):
fuse: {io-uring} Add queue length counters
fuse: {io-uring} Rename ring->nr_queues to max_nr_queues
fuse: {io-uring} Use bitmaps to track queue availability
fuse: {io-uring} Distribute load among queues
fuse: {io-uring} Allow reduced number of ring queues
fs/fuse/dev_uring.c | 308 ++++++++++++++++++++++++++++++++++++++++++--------
fs/fuse/dev_uring_i.h | 26 ++++-
2 files changed, 286 insertions(+), 48 deletions(-)
---
base-commit: 6832a9317eee280117cd695fa885b2b7a7a38daf
change-id: 20250722-reduced-nr-ring-queues_3-6acb79dad978
Best regards,
--
Bernd Schubert <bschubert@ddn.com>
* [PATCH 1/5] fuse: {io-uring} Add queue length counters
2025-07-22 21:57 [PATCH 0/5] fuse: {io-uring} Allow to reduce the number of queues and request distribution Bernd Schubert
@ 2025-07-22 21:57 ` Bernd Schubert
2025-07-24 21:07 ` Joanne Koong
2025-07-22 21:57 ` [PATCH 2/5] fuse: {io-uring} Rename ring->nr_queues to max_nr_queues Bernd Schubert
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Bernd Schubert @ 2025-07-22 21:57 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Joanne Koong, linux-fsdevel, Bernd Schubert
This is another preparation patch; the queue length counters will be
used to decide which queue to add a request to.
---
fs/fuse/dev_uring.c | 17 +++++++++++++++--
fs/fuse/dev_uring_i.h | 3 +++
2 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 249b210becb1cc2b40ae7b2fdf3a57dc57eaac42..2f2f7ff5e95a63a4df76f484d30cce1077b29123 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -85,13 +85,13 @@ static void fuse_uring_req_end(struct fuse_ring_ent *ent, struct fuse_req *req,
lockdep_assert_not_held(&queue->lock);
spin_lock(&queue->lock);
ent->fuse_req = NULL;
+ queue->nr_reqs--;
if (test_bit(FR_BACKGROUND, &req->flags)) {
queue->active_background--;
spin_lock(&fc->bg_lock);
fuse_uring_flush_bg(queue);
spin_unlock(&fc->bg_lock);
}
-
spin_unlock(&queue->lock);
if (error)
@@ -111,6 +111,7 @@ static void fuse_uring_abort_end_queue_requests(struct fuse_ring_queue *queue)
list_for_each_entry(req, &queue->fuse_req_queue, list)
clear_bit(FR_PENDING, &req->flags);
list_splice_init(&queue->fuse_req_queue, &req_list);
+ queue->nr_reqs = 0;
spin_unlock(&queue->lock);
/* must not hold queue lock to avoid order issues with fi->lock */
@@ -1280,10 +1281,13 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
req->ring_queue = queue;
ent = list_first_entry_or_null(&queue->ent_avail_queue,
struct fuse_ring_ent, list);
+ queue->nr_reqs++;
+
if (ent)
fuse_uring_add_req_to_ring_ent(ent, req);
else
list_add_tail(&req->list, &queue->fuse_req_queue);
+
spin_unlock(&queue->lock);
if (ent)
@@ -1319,6 +1323,7 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req)
set_bit(FR_URING, &req->flags);
req->ring_queue = queue;
list_add_tail(&req->list, &queue->fuse_req_bg_queue);
+ queue->nr_reqs++;
ent = list_first_entry_or_null(&queue->ent_avail_queue,
struct fuse_ring_ent, list);
@@ -1351,8 +1356,16 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req)
bool fuse_uring_remove_pending_req(struct fuse_req *req)
{
struct fuse_ring_queue *queue = req->ring_queue;
+ bool removed = fuse_remove_pending_req(req, &queue->lock);
- return fuse_remove_pending_req(req, &queue->lock);
+ if (removed) {
+ /* Update counters after successful removal */
+ spin_lock(&queue->lock);
+ queue->nr_reqs--;
+ spin_unlock(&queue->lock);
+ }
+
+ return removed;
}
static const struct fuse_iqueue_ops fuse_io_uring_ops = {
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 51a563922ce14158904a86c248c77767be4fe5ae..c63bed9f863d53d4ac2bed7bfbda61941cd99083 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -94,6 +94,9 @@ struct fuse_ring_queue {
/* background fuse requests */
struct list_head fuse_req_bg_queue;
+ /* number of requests queued or in userspace */
+ unsigned int nr_reqs;
+
struct fuse_pqueue fpq;
unsigned int active_background;
--
2.43.0
* [PATCH 2/5] fuse: {io-uring} Rename ring->nr_queues to max_nr_queues
2025-07-22 21:57 [PATCH 0/5] fuse: {io-uring} Allow to reduce the number of queues and request distribution Bernd Schubert
2025-07-22 21:57 ` [PATCH 1/5] fuse: {io-uring} Add queue length counters Bernd Schubert
@ 2025-07-22 21:57 ` Bernd Schubert
2025-07-22 21:58 ` [PATCH 3/5] fuse: {io-uring} Use bitmaps to track queue availability Bernd Schubert
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Bernd Schubert @ 2025-07-22 21:57 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Joanne Koong, linux-fsdevel, Bernd Schubert
This is preparation for follow-up commits that allow running with a
reduced number of queues.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/dev_uring.c | 24 ++++++++++++------------
fs/fuse/dev_uring_i.h | 2 +-
2 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 2f2f7ff5e95a63a4df76f484d30cce1077b29123..0f5ab27dacb66c9f5f10eac2713d9bd3eb4c26da 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -124,7 +124,7 @@ void fuse_uring_abort_end_requests(struct fuse_ring *ring)
struct fuse_ring_queue *queue;
struct fuse_conn *fc = ring->fc;
- for (qid = 0; qid < ring->nr_queues; qid++) {
+ for (qid = 0; qid < ring->max_nr_queues; qid++) {
queue = READ_ONCE(ring->queues[qid]);
if (!queue)
continue;
@@ -165,7 +165,7 @@ bool fuse_uring_request_expired(struct fuse_conn *fc)
if (!ring)
return false;
- for (qid = 0; qid < ring->nr_queues; qid++) {
+ for (qid = 0; qid < ring->max_nr_queues; qid++) {
queue = READ_ONCE(ring->queues[qid]);
if (!queue)
continue;
@@ -192,7 +192,7 @@ void fuse_uring_destruct(struct fuse_conn *fc)
if (!ring)
return;
- for (qid = 0; qid < ring->nr_queues; qid++) {
+ for (qid = 0; qid < ring->max_nr_queues; qid++) {
struct fuse_ring_queue *queue = ring->queues[qid];
struct fuse_ring_ent *ent, *next;
@@ -252,7 +252,7 @@ static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
init_waitqueue_head(&ring->stop_waitq);
- ring->nr_queues = nr_queues;
+ ring->max_nr_queues = nr_queues;
ring->fc = fc;
ring->max_payload_sz = max_payload_size;
smp_store_release(&fc->ring, ring);
@@ -404,7 +404,7 @@ static void fuse_uring_log_ent_state(struct fuse_ring *ring)
int qid;
struct fuse_ring_ent *ent;
- for (qid = 0; qid < ring->nr_queues; qid++) {
+ for (qid = 0; qid < ring->max_nr_queues; qid++) {
struct fuse_ring_queue *queue = ring->queues[qid];
if (!queue)
@@ -435,7 +435,7 @@ static void fuse_uring_async_stop_queues(struct work_struct *work)
container_of(work, struct fuse_ring, async_teardown_work.work);
/* XXX code dup */
- for (qid = 0; qid < ring->nr_queues; qid++) {
+ for (qid = 0; qid < ring->max_nr_queues; qid++) {
struct fuse_ring_queue *queue = READ_ONCE(ring->queues[qid]);
if (!queue)
@@ -470,7 +470,7 @@ void fuse_uring_stop_queues(struct fuse_ring *ring)
{
int qid;
- for (qid = 0; qid < ring->nr_queues; qid++) {
+ for (qid = 0; qid < ring->max_nr_queues; qid++) {
struct fuse_ring_queue *queue = READ_ONCE(ring->queues[qid]);
if (!queue)
@@ -889,7 +889,7 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
if (!ring)
return err;
- if (qid >= ring->nr_queues)
+ if (qid >= ring->max_nr_queues)
return -EINVAL;
queue = ring->queues[qid];
@@ -952,7 +952,7 @@ static bool is_ring_ready(struct fuse_ring *ring, int current_qid)
struct fuse_ring_queue *queue;
bool ready = true;
- for (qid = 0; qid < ring->nr_queues && ready; qid++) {
+ for (qid = 0; qid < ring->max_nr_queues && ready; qid++) {
if (current_qid == qid)
continue;
@@ -1093,7 +1093,7 @@ static int fuse_uring_register(struct io_uring_cmd *cmd,
return err;
}
- if (qid >= ring->nr_queues) {
+ if (qid >= ring->max_nr_queues) {
pr_info_ratelimited("fuse: Invalid ring qid %u\n", qid);
return -EINVAL;
}
@@ -1236,9 +1236,9 @@ static struct fuse_ring_queue *fuse_uring_task_to_queue(struct fuse_ring *ring)
qid = task_cpu(current);
- if (WARN_ONCE(qid >= ring->nr_queues,
+ if (WARN_ONCE(qid >= ring->max_nr_queues,
"Core number (%u) exceeds nr queues (%zu)\n", qid,
- ring->nr_queues))
+ ring->max_nr_queues))
qid = 0;
queue = ring->queues[qid];
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index c63bed9f863d53d4ac2bed7bfbda61941cd99083..708412294982566919122a1a0d7f741217c763ce 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -113,7 +113,7 @@ struct fuse_ring {
struct fuse_conn *fc;
/* number of ring queues */
- size_t nr_queues;
+ size_t max_nr_queues;
/* maximum payload/arg size */
size_t max_payload_sz;
--
2.43.0
* [PATCH 3/5] fuse: {io-uring} Use bitmaps to track queue availability
2025-07-22 21:57 [PATCH 0/5] fuse: {io-uring} Allow to reduce the number of queues and request distribution Bernd Schubert
2025-07-22 21:57 ` [PATCH 1/5] fuse: {io-uring} Add queue length counters Bernd Schubert
2025-07-22 21:57 ` [PATCH 2/5] fuse: {io-uring} Rename ring->nr_queues to max_nr_queues Bernd Schubert
@ 2025-07-22 21:58 ` Bernd Schubert
2025-07-24 23:56 ` Joanne Koong
2025-07-22 21:58 ` [PATCH 4/5] fuse: {io-uring} Distribute load among queues Bernd Schubert
2025-07-22 21:58 ` [PATCH 5/5] fuse: {io-uring} Allow reduced number of ring queues Bernd Schubert
4 siblings, 1 reply; 10+ messages in thread
From: Bernd Schubert @ 2025-07-22 21:58 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Joanne Koong, linux-fsdevel, Bernd Schubert
Add per-CPU and per-NUMA node bitmasks to track which
io-uring queues are available for new requests.
- Global queue availability (avail_q_mask)
- Per-NUMA node queue availability (per_numa_avail_q_mask)
- Global queue registration (registered_q_mask)
- Per-NUMA node queue registration (numa_registered_q_mask)
Note that these bitmasks are not lock protected, so accesses to them
will not be absolutely accurate. The goal is to determine which
queues are approximately idle and might be better suited for
a request.
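For readability, the update rules are summarized below as a hypothetical
helper (illustration only, names assumed from this patch; the patch itself
open-codes these updates in the individual hunks). The registered_q_mask and
numa_registered_q_mask bits are simply set on queue registration and cleared
when the queues are stopped.

    /* Sketch: toggle a queue's availability bits in one place */
    static void fuse_uring_set_queue_avail(struct fuse_ring_queue *queue,
                                           bool avail)
    {
        struct fuse_ring *ring = queue->ring;
        int node = queue->numa_node;

        if (avail) {
            /* a ring entry is free and not too many requests are queued */
            cpumask_set_cpu(queue->qid, ring->avail_q_mask);
            cpumask_set_cpu(queue->qid, ring->per_numa_avail_q_mask[node]);
        } else {
            /* no free ring entries left, or the queue is being stopped */
            cpumask_clear_cpu(queue->qid, ring->avail_q_mask);
            cpumask_clear_cpu(queue->qid, ring->per_numa_avail_q_mask[node]);
        }
    }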
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/dev_uring.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 18 ++++++++++
2 files changed, 117 insertions(+)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 0f5ab27dacb66c9f5f10eac2713d9bd3eb4c26da..c2bc20848bc54541ede9286562177994e7ca5879 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -18,6 +18,8 @@ MODULE_PARM_DESC(enable_uring,
#define FUSE_URING_IOV_SEGS 2 /* header and payload */
+/* Number of queued fuse requests until a queue is considered full */
+#define FUSE_URING_QUEUE_THRESHOLD 5
bool fuse_uring_enabled(void)
{
@@ -184,6 +186,25 @@ bool fuse_uring_request_expired(struct fuse_conn *fc)
return false;
}
+static void fuse_ring_destruct_q_masks(struct fuse_ring *ring)
+{
+ int node;
+
+ free_cpumask_var(ring->avail_q_mask);
+ if (ring->per_numa_avail_q_mask) {
+ for (node = 0; node < ring->nr_numa_nodes; node++)
+ free_cpumask_var(ring->per_numa_avail_q_mask[node]);
+ kfree(ring->per_numa_avail_q_mask);
+ }
+
+ free_cpumask_var(ring->registered_q_mask);
+ if (ring->numa_registered_q_mask) {
+ for (node = 0; node < ring->nr_numa_nodes; node++)
+ free_cpumask_var(ring->numa_registered_q_mask[node]);
+ kfree(ring->numa_registered_q_mask);
+ }
+}
+
void fuse_uring_destruct(struct fuse_conn *fc)
{
struct fuse_ring *ring = fc->ring;
@@ -215,11 +236,44 @@ void fuse_uring_destruct(struct fuse_conn *fc)
ring->queues[qid] = NULL;
}
+ fuse_ring_destruct_q_masks(ring);
kfree(ring->queues);
kfree(ring);
fc->ring = NULL;
}
+static int fuse_ring_create_q_masks(struct fuse_ring *ring)
+{
+ if (!zalloc_cpumask_var(&ring->avail_q_mask, GFP_KERNEL_ACCOUNT))
+ return -ENOMEM;
+
+ if (!zalloc_cpumask_var(&ring->registered_q_mask, GFP_KERNEL_ACCOUNT))
+ return -ENOMEM;
+
+ ring->per_numa_avail_q_mask = kcalloc(ring->nr_numa_nodes,
+ sizeof(struct cpumask *),
+ GFP_KERNEL_ACCOUNT);
+ if (!ring->per_numa_avail_q_mask)
+ return -ENOMEM;
+ for (int node = 0; node < ring->nr_numa_nodes; node++)
+ if (!zalloc_cpumask_var(&ring->per_numa_avail_q_mask[node],
+ GFP_KERNEL_ACCOUNT))
+ return -ENOMEM;
+
+ ring->numa_registered_q_mask = kcalloc(ring->nr_numa_nodes,
+ sizeof(struct cpumask *),
+ GFP_KERNEL_ACCOUNT);
+ if (!ring->numa_registered_q_mask)
+ return -ENOMEM;
+ for (int node = 0; node < ring->nr_numa_nodes; node++) {
+ if (!zalloc_cpumask_var(&ring->numa_registered_q_mask[node],
+ GFP_KERNEL_ACCOUNT))
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
/*
* Basic ring setup for this connection based on the provided configuration
*/
@@ -229,11 +283,14 @@ static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
size_t nr_queues = num_possible_cpus();
struct fuse_ring *res = NULL;
size_t max_payload_size;
+ int err;
ring = kzalloc(sizeof(*fc->ring), GFP_KERNEL_ACCOUNT);
if (!ring)
return NULL;
+ ring->nr_numa_nodes = num_online_nodes();
+
ring->queues = kcalloc(nr_queues, sizeof(struct fuse_ring_queue *),
GFP_KERNEL_ACCOUNT);
if (!ring->queues)
@@ -242,6 +299,10 @@ static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
max_payload_size = max(FUSE_MIN_READ_BUFFER, fc->max_write);
max_payload_size = max(max_payload_size, fc->max_pages * PAGE_SIZE);
+ err = fuse_ring_create_q_masks(ring);
+ if (err)
+ goto out_err;
+
spin_lock(&fc->lock);
if (fc->ring) {
/* race, another thread created the ring in the meantime */
@@ -261,6 +322,7 @@ static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
return ring;
out_err:
+ fuse_ring_destruct_q_masks(ring);
kfree(ring->queues);
kfree(ring);
return res;
@@ -284,6 +346,10 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
queue->qid = qid;
queue->ring = ring;
+ queue->numa_node = cpu_to_node(qid);
+ if (unlikely(queue->numa_node < 0 ||
+ queue->numa_node >= ring->nr_numa_nodes))
+ queue->numa_node = 0;
spin_lock_init(&queue->lock);
INIT_LIST_HEAD(&queue->ent_avail_queue);
@@ -423,6 +489,7 @@ static void fuse_uring_log_ent_state(struct fuse_ring *ring)
pr_info(" ent-commit-queue ring=%p qid=%d ent=%p state=%d\n",
ring, qid, ent, ent->state);
}
+
spin_unlock(&queue->lock);
}
ring->stop_debug_log = 1;
@@ -472,11 +539,18 @@ void fuse_uring_stop_queues(struct fuse_ring *ring)
for (qid = 0; qid < ring->max_nr_queues; qid++) {
struct fuse_ring_queue *queue = READ_ONCE(ring->queues[qid]);
+ int node;
if (!queue)
continue;
fuse_uring_teardown_entries(queue);
+
+ node = queue->numa_node;
+ cpumask_clear_cpu(qid, ring->registered_q_mask);
+ cpumask_clear_cpu(qid, ring->avail_q_mask);
+ cpumask_clear_cpu(qid, ring->numa_registered_q_mask[node]);
+ cpumask_clear_cpu(qid, ring->per_numa_avail_q_mask[node]);
}
if (atomic_read(&ring->queue_refs) > 0) {
@@ -744,9 +818,18 @@ static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ent,
static void fuse_uring_ent_avail(struct fuse_ring_ent *ent,
struct fuse_ring_queue *queue)
{
+ struct fuse_ring *ring = queue->ring;
+ int node = queue->numa_node;
+
WARN_ON_ONCE(!ent->cmd);
list_move(&ent->list, &queue->ent_avail_queue);
ent->state = FRRS_AVAILABLE;
+
+ if (list_is_singular(&queue->ent_avail_queue) &&
+ queue->nr_reqs <= FUSE_URING_QUEUE_THRESHOLD) {
+ cpumask_set_cpu(queue->qid, ring->avail_q_mask);
+ cpumask_set_cpu(queue->qid, ring->per_numa_avail_q_mask[node]);
+ }
}
/* Used to find the request on SQE commit */
@@ -769,6 +852,8 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
struct fuse_req *req)
{
struct fuse_ring_queue *queue = ent->queue;
+ struct fuse_ring *ring = queue->ring;
+ int node = queue->numa_node;
lockdep_assert_held(&queue->lock);
@@ -783,6 +868,16 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
ent->state = FRRS_FUSE_REQ;
list_move_tail(&ent->list, &queue->ent_w_req_queue);
fuse_uring_add_to_pq(ent, req);
+
+ /*
+ * If there are no more available entries, mark the queue as unavailable
+ * in both global and per-NUMA node masks
+ */
+ if (list_empty(&queue->ent_avail_queue)) {
+ cpumask_clear_cpu(queue->qid, ring->avail_q_mask);
+ cpumask_clear_cpu(queue->qid,
+ ring->per_numa_avail_q_mask[node]);
+ }
}
/* Fetch the next fuse request if available */
@@ -982,6 +1077,7 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ent,
struct fuse_ring *ring = queue->ring;
struct fuse_conn *fc = ring->fc;
struct fuse_iqueue *fiq = &fc->iq;
+ int node = queue->numa_node;
fuse_uring_prepare_cancel(cmd, issue_flags, ent);
@@ -990,6 +1086,9 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ent,
fuse_uring_ent_avail(ent, queue);
spin_unlock(&queue->lock);
+ cpumask_set_cpu(queue->qid, ring->registered_q_mask);
+ cpumask_set_cpu(queue->qid, ring->numa_registered_q_mask[node]);
+
if (!ring->ready) {
bool ready = is_ring_ready(ring, queue->qid);
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 708412294982566919122a1a0d7f741217c763ce..0457dbc6737c8876dd7a7d4c9c724da05e553e6a 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -66,6 +66,9 @@ struct fuse_ring_queue {
/* queue id, corresponds to the cpu core */
unsigned int qid;
+ /* NUMA node this queue belongs to */
+ int numa_node;
+
/*
* queue lock, taken when any value in the queue changes _and_ also
* a ring entry state changes.
@@ -115,6 +118,9 @@ struct fuse_ring {
/* number of ring queues */
size_t max_nr_queues;
+ /* number of numa nodes */
+ int nr_numa_nodes;
+
/* maximum payload/arg size */
size_t max_payload_sz;
@@ -125,6 +131,18 @@ struct fuse_ring {
*/
unsigned int stop_debug_log : 1;
+ /* Tracks which queues are available (empty) globally */
+ cpumask_var_t avail_q_mask;
+
+ /* Tracks which queues are available per NUMA node */
+ cpumask_var_t *per_numa_avail_q_mask;
+
+ /* Tracks which queues are registered */
+ cpumask_var_t registered_q_mask;
+
+ /* Tracks which queues are registered per NUMA node */
+ cpumask_var_t *numa_registered_q_mask;
+
wait_queue_head_t stop_waitq;
/* async tear down */
--
2.43.0
* [PATCH 4/5] fuse: {io-uring} Distribute load among queues
2025-07-22 21:57 [PATCH 0/5] fuse: {io-uring} Allow to reduce the number of queues and request distribution Bernd Schubert
` (2 preceding siblings ...)
2025-07-22 21:58 ` [PATCH 3/5] fuse: {io-uring} Use bitmaps to track queue availability Bernd Schubert
@ 2025-07-22 21:58 ` Bernd Schubert
2025-07-22 21:58 ` [PATCH 5/5] fuse: {io-uring} Allow reduced number of ring queues Bernd Schubert
4 siblings, 0 replies; 10+ messages in thread
From: Bernd Schubert @ 2025-07-22 21:58 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Joanne Koong, linux-fsdevel, Bernd Schubert
So far, queue selection always picked the queue corresponding
to the current core.
A previous commit introduced bitmaps that track which queues
are available; queue selection can now make use of these bitmaps
and try to find another queue if the current one is loaded.
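Condensed from the diff below (a simplified sketch without the WARN and
bounds checks, not the literal patch code), the selection order implemented
by fuse_uring_get_queue() is:

    static struct fuse_ring_queue *select_queue_sketch(struct fuse_ring *ring)
    {
        unsigned int qid = task_cpu(current);
        struct fuse_ring_queue *local_queue = ring->queues[qid];
        struct fuse_ring_queue *queue;

        /* 1) the current core's queue, unless it is already loaded */
        if (local_queue->nr_reqs <= FUSE_URING_QUEUE_THRESHOLD)
            return local_queue;

        /* 2) an approximately idle queue on the local NUMA node */
        queue = fuse_uring_get_first_queue(ring,
                    ring->per_numa_avail_q_mask[cpu_to_node(qid)]);
        if (queue)
            return queue;

        /* 3) an approximately idle queue on any node */
        queue = fuse_uring_get_first_queue(ring, ring->avail_q_mask);
        if (queue)
            return queue;

        /* 4) no better choice, fall back to the loaded local queue */
        return local_queue;
    }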
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/dev_uring.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 88 insertions(+), 10 deletions(-)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index c2bc20848bc54541ede9286562177994e7ca5879..624f856388e0867f3c3caed6771e61babd076645 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -825,8 +825,7 @@ static void fuse_uring_ent_avail(struct fuse_ring_ent *ent,
list_move(&ent->list, &queue->ent_avail_queue);
ent->state = FRRS_AVAILABLE;
- if (list_is_singular(&queue->ent_avail_queue) &&
- queue->nr_reqs <= FUSE_URING_QUEUE_THRESHOLD) {
+ if (queue->nr_reqs <= FUSE_URING_QUEUE_THRESHOLD) {
cpumask_set_cpu(queue->qid, ring->avail_q_mask);
cpumask_set_cpu(queue->qid, ring->per_numa_avail_q_mask[node]);
}
@@ -1066,6 +1065,23 @@ static bool is_ring_ready(struct fuse_ring *ring, int current_qid)
return ready;
}
+static int fuse_uring_map_qid(int qid, const struct cpumask *mask)
+{
+ int nr_queues = cpumask_weight(mask);
+ int nth, cpu;
+
+ if (nr_queues == 0)
+ return -1;
+
+ nth = qid % nr_queues;
+ for_each_cpu(cpu, mask) {
+ if (nth-- == 0)
+ return cpu;
+ }
+
+ return -1;
+}
+
/*
* fuse_uring_req_fetch command handling
*/
@@ -1328,22 +1344,57 @@ static void fuse_uring_send_in_task(struct io_uring_cmd *cmd,
fuse_uring_send(ent, cmd, err, issue_flags);
}
-static struct fuse_ring_queue *fuse_uring_task_to_queue(struct fuse_ring *ring)
+static struct fuse_ring_queue *
+fuse_uring_get_first_queue(struct fuse_ring *ring, const struct cpumask *mask)
+{
+ int qid;
+
+ /* Find the first available CPU in this mask */
+ qid = cpumask_first(mask);
+
+ /* Check if we found a valid CPU */
+ if (qid >= ring->max_nr_queues)
+ return NULL; /* No available queues */
+
+ /* This is the global mask, cpu is already the global qid */
+ return ring->queues[qid];
+}
+
+/*
+ * Get the best queue for the current CPU
+ */
+static struct fuse_ring_queue *fuse_uring_get_queue(struct fuse_ring *ring)
{
unsigned int qid;
- struct fuse_ring_queue *queue;
+ struct fuse_ring_queue *queue, *local_queue;
+ int local_node;
+ struct cpumask *mask;
qid = task_cpu(current);
-
if (WARN_ONCE(qid >= ring->max_nr_queues,
"Core number (%u) exceeds nr queues (%zu)\n", qid,
ring->max_nr_queues))
qid = 0;
+ local_node = cpu_to_node(qid);
- queue = ring->queues[qid];
- WARN_ONCE(!queue, "Missing queue for qid %d\n", qid);
+ local_queue = queue = ring->queues[qid];
+ if (WARN_ONCE(!queue, "Missing queue for qid %d\n", qid))
+ return NULL;
- return queue;
+ if (queue->nr_reqs <= FUSE_URING_QUEUE_THRESHOLD)
+ return queue;
+
+ mask = ring->per_numa_avail_q_mask[local_node];
+ queue = fuse_uring_get_first_queue(ring, mask);
+ if (queue)
+ return queue;
+
+ /* Third check if there are any available queues on any node */
+ queue = fuse_uring_get_first_queue(ring, ring->avail_q_mask);
+ if (queue)
+ return queue;
+
+ return local_queue;
}
static void fuse_uring_dispatch_ent(struct fuse_ring_ent *ent)
@@ -1364,7 +1415,7 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
int err;
err = -EINVAL;
- queue = fuse_uring_task_to_queue(ring);
+ queue = fuse_uring_get_queue(ring);
if (!queue)
goto err;
@@ -1382,6 +1433,19 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
struct fuse_ring_ent, list);
queue->nr_reqs++;
+ /*
+ * Update queue availability based on number of requests
+ * A queue is considered busy if it has more than
+ * FUSE_URING_QUEUE_THRESHOLD requests
+ */
+ if (queue->nr_reqs == FUSE_URING_QUEUE_THRESHOLD + 1) {
+ /* Queue just became busy */
+ cpumask_clear_cpu(queue->qid, ring->avail_q_mask);
+ cpumask_clear_cpu(
+ queue->qid,
+ ring->per_numa_avail_q_mask[queue->numa_node]);
+ }
+
if (ent)
fuse_uring_add_req_to_ring_ent(ent, req);
else
@@ -1409,7 +1473,7 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req)
struct fuse_ring_queue *queue;
struct fuse_ring_ent *ent = NULL;
- queue = fuse_uring_task_to_queue(ring);
+ queue = fuse_uring_get_queue(ring);
if (!queue)
return false;
@@ -1455,12 +1519,26 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req)
bool fuse_uring_remove_pending_req(struct fuse_req *req)
{
struct fuse_ring_queue *queue = req->ring_queue;
+ struct fuse_ring *ring = queue->ring;
+ int node = queue->numa_node;
bool removed = fuse_remove_pending_req(req, &queue->lock);
if (removed) {
/* Update counters after successful removal */
spin_lock(&queue->lock);
queue->nr_reqs--;
+
+ /*
+ * Update queue availability based on number of requests
+ * A queue is considered available if it has
+ * FUSE_URING_QUEUE_THRESHOLD or fewer requests
+ */
+ if (queue->nr_reqs == FUSE_URING_QUEUE_THRESHOLD) {
+ /* Queue just became available */
+ cpumask_set_cpu(queue->qid, ring->avail_q_mask);
+ cpumask_set_cpu(queue->qid,
+ ring->per_numa_avail_q_mask[node]);
+ }
spin_unlock(&queue->lock);
}
--
2.43.0
* [PATCH 5/5] fuse: {io-uring} Allow reduced number of ring queues
2025-07-22 21:57 [PATCH 0/5] fuse: {io-uring} Allow to reduce the number of queues and request distribution Bernd Schubert
` (3 preceding siblings ...)
2025-07-22 21:58 ` [PATCH 4/5] fuse: {io-uring} Distribute load among queues Bernd Schubert
@ 2025-07-22 21:58 ` Bernd Schubert
2025-07-25 0:43 ` Joanne Koong
4 siblings, 1 reply; 10+ messages in thread
From: Bernd Schubert @ 2025-07-22 21:58 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Joanne Koong, linux-fsdevel, Bernd Schubert
Currently, FUSE io-uring requires all queues to be registered before
becoming ready, which can result in too much memory usage.
This patch introduces a static queue mapping system that allows FUSE
io-uring to operate with a reduced number of registered queues by:
1. Adding a queue_mapping array to track which registered queue each
CPU should use
2. Replacing the is_ring_ready() check with immediate queue mapping
once any queues are registered
3. Implementing fuse_uring_map_queues() to create CPU-to-queue mappings
that prefer NUMA-local queues when available
4. Updating fuse_uring_get_queue() to use the static mapping instead
of direct CPU-to-queue correspondence
The mapping prioritizes NUMA locality by first attempting to map CPUs
to queues on the same NUMA node, falling back to any available
registered queue if no local queue exists.
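As a hypothetical example (topology invented for illustration): with eight
possible CPUs on two NUMA nodes (CPUs 0-3 on node 0, CPUs 4-7 on node 1) and
only queues 1 and 3 (node 0) and queue 5 (node 1) registered,
fuse_uring_map_qid() selects the (qid % nr_registered)-th set bit of the
node-local registered mask, so the resulting static mapping is: CPUs 0 and 2
-> queue 1, CPUs 1 and 3 -> queue 3, CPUs 4-7 -> queue 5.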
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/dev_uring.c | 112 ++++++++++++++++++++++++++++++--------------------
fs/fuse/dev_uring_i.h | 3 ++
2 files changed, 71 insertions(+), 44 deletions(-)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 624f856388e0867f3c3caed6771e61babd076645..8d16880cb0eb9b252dd6b6cf565011c3787ad1d0 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -238,6 +238,7 @@ void fuse_uring_destruct(struct fuse_conn *fc)
fuse_ring_destruct_q_masks(ring);
kfree(ring->queues);
+ kfree(ring->queue_mapping);
kfree(ring);
fc->ring = NULL;
}
@@ -303,6 +304,12 @@ static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
if (err)
goto out_err;
+ err = -ENOMEM;
+ ring->queue_mapping =
+ kcalloc(nr_queues, sizeof(int), GFP_KERNEL_ACCOUNT);
+ if (!ring->queue_mapping)
+ goto out_err;
+
spin_lock(&fc->lock);
if (fc->ring) {
/* race, another thread created the ring in the meantime */
@@ -324,6 +331,7 @@ static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
out_err:
fuse_ring_destruct_q_masks(ring);
kfree(ring->queues);
+ kfree(ring->queue_mapping);
kfree(ring);
return res;
}
@@ -1040,31 +1048,6 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
return 0;
}
-static bool is_ring_ready(struct fuse_ring *ring, int current_qid)
-{
- int qid;
- struct fuse_ring_queue *queue;
- bool ready = true;
-
- for (qid = 0; qid < ring->max_nr_queues && ready; qid++) {
- if (current_qid == qid)
- continue;
-
- queue = ring->queues[qid];
- if (!queue) {
- ready = false;
- break;
- }
-
- spin_lock(&queue->lock);
- if (list_empty(&queue->ent_avail_queue))
- ready = false;
- spin_unlock(&queue->lock);
- }
-
- return ready;
-}
-
static int fuse_uring_map_qid(int qid, const struct cpumask *mask)
{
int nr_queues = cpumask_weight(mask);
@@ -1082,6 +1065,41 @@ static int fuse_uring_map_qid(int qid, const struct cpumask *mask)
return -1;
}
+static int fuse_uring_map_queues(struct fuse_ring *ring)
+{
+ int qid, mapped_qid, node;
+
+ for (qid = 0; qid < ring->max_nr_queues; qid++) {
+ node = cpu_to_node(qid);
+ if (WARN_ON_ONCE(node >= ring->nr_numa_nodes) || node < 0)
+ return -EINVAL;
+
+ /* First try to find a registered queue on the same NUMA node */
+ mapped_qid = fuse_uring_map_qid(
+ qid, ring->numa_registered_q_mask[node]);
+ if (mapped_qid < 0) {
+ /*
+ * No registered queue on this NUMA node,
+ * use any registered queue
+ */
+ mapped_qid = fuse_uring_map_qid(
+ qid, ring->registered_q_mask);
+ if (WARN_ON_ONCE(mapped_qid < 0))
+ return -EINVAL;
+ }
+
+ if (WARN_ON_ONCE(!ring->queues[mapped_qid])) {
+ pr_err("qid=%d mapped_qid=%d not created\n", qid,
+ mapped_qid);
+ return -EINVAL;
+ }
+
+ WRITE_ONCE(ring->queue_mapping[qid], mapped_qid);
+ }
+
+ return 0;
+}
+
/*
* fuse_uring_req_fetch command handling
*/
@@ -1094,6 +1112,7 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ent,
struct fuse_conn *fc = ring->fc;
struct fuse_iqueue *fiq = &fc->iq;
int node = queue->numa_node;
+ int err;
fuse_uring_prepare_cancel(cmd, issue_flags, ent);
@@ -1105,14 +1124,14 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ent,
cpumask_set_cpu(queue->qid, ring->registered_q_mask);
cpumask_set_cpu(queue->qid, ring->numa_registered_q_mask[node]);
- if (!ring->ready) {
- bool ready = is_ring_ready(ring, queue->qid);
+ err = fuse_uring_map_queues(ring);
+ if (err)
+ return;
- if (ready) {
- WRITE_ONCE(fiq->ops, &fuse_io_uring_ops);
- WRITE_ONCE(ring->ready, true);
- wake_up_all(&fc->blocked_waitq);
- }
+ if (!ring->ready) {
+ WRITE_ONCE(fiq->ops, &fuse_io_uring_ops);
+ WRITE_ONCE(ring->ready, true);
+ wake_up_all(&fc->blocked_waitq);
}
}
@@ -1365,25 +1384,27 @@ fuse_uring_get_first_queue(struct fuse_ring *ring, const struct cpumask *mask)
*/
static struct fuse_ring_queue *fuse_uring_get_queue(struct fuse_ring *ring)
{
- unsigned int qid;
- struct fuse_ring_queue *queue, *local_queue;
+ unsigned int mapped_qid;
+ struct fuse_ring_queue *queue;
int local_node;
struct cpumask *mask;
+ unsigned int core = task_cpu(current);
- qid = task_cpu(current);
- if (WARN_ONCE(qid >= ring->max_nr_queues,
- "Core number (%u) exceeds nr queues (%zu)\n", qid,
- ring->max_nr_queues))
- qid = 0;
- local_node = cpu_to_node(qid);
+ local_node = cpu_to_node(core);
+ if (WARN_ON_ONCE(local_node >= ring->nr_numa_nodes) || local_node < 0)
+ local_node = 0;
- local_queue = queue = ring->queues[qid];
- if (WARN_ONCE(!queue, "Missing queue for qid %d\n", qid))
- return NULL;
+ if (WARN_ON_ONCE(core >= ring->max_nr_queues))
+ core = 0;
+ mapped_qid = READ_ONCE(ring->queue_mapping[core]);
+ queue = ring->queues[mapped_qid];
+
+ /* First check if current CPU's queue is available */
if (queue->nr_reqs <= FUSE_URING_QUEUE_THRESHOLD)
return queue;
+ /* Second check if there are any available queues on the local node */
mask = ring->per_numa_avail_q_mask[local_node];
queue = fuse_uring_get_first_queue(ring, mask);
if (queue)
@@ -1394,7 +1415,10 @@ static struct fuse_ring_queue *fuse_uring_get_queue(struct fuse_ring *ring)
if (queue)
return queue;
- return local_queue;
+ /* no better queue available, use the mapped queue */
+ queue = ring->queues[mapped_qid];
+
+ return queue;
}
static void fuse_uring_dispatch_ent(struct fuse_ring_ent *ent)
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 0457dbc6737c8876dd7a7d4c9c724da05e553e6a..e72b83471cbfc2e911273966f3715305ca10e9ef 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -153,6 +153,9 @@ struct fuse_ring {
atomic_t queue_refs;
+ /* static queue mapping */
+ int *queue_mapping;
+
bool ready;
};
--
2.43.0
* Re: [PATCH 1/5] fuse: {io-uring} Add queue length counters
2025-07-22 21:57 ` [PATCH 1/5] fuse: {io-uring} Add queue length counters Bernd Schubert
@ 2025-07-24 21:07 ` Joanne Koong
0 siblings, 0 replies; 10+ messages in thread
From: Joanne Koong @ 2025-07-24 21:07 UTC (permalink / raw)
To: Bernd Schubert; +Cc: Miklos Szeredi, linux-fsdevel
On Tue, Jul 22, 2025 at 2:58 PM Bernd Schubert <bschubert@ddn.com> wrote:
>
> This is another preparation patch; the queue length counters will be
> used to decide which queue to add a request to.
> ---
> fs/fuse/dev_uring.c | 17 +++++++++++++++--
> fs/fuse/dev_uring_i.h | 3 +++
> 2 files changed, 18 insertions(+), 2 deletions(-)
LGTM
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
>
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index 249b210becb1cc2b40ae7b2fdf3a57dc57eaac42..2f2f7ff5e95a63a4df76f484d30cce1077b29123 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
* Re: [PATCH 3/5] fuse: {io-uring} Use bitmaps to track queue availability
2025-07-22 21:58 ` [PATCH 3/5] fuse: {io-uring} Use bitmaps to track queue availability Bernd Schubert
@ 2025-07-24 23:56 ` Joanne Koong
0 siblings, 0 replies; 10+ messages in thread
From: Joanne Koong @ 2025-07-24 23:56 UTC (permalink / raw)
To: Bernd Schubert; +Cc: Miklos Szeredi, linux-fsdevel
On Tue, Jul 22, 2025 at 2:58 PM Bernd Schubert <bschubert@ddn.com> wrote:
>
> Add per-CPU and per-NUMA node bitmasks to track which
> io-uring queues are available for new requests.
>
> - Global queue availability (avail_q_mask)
> - Per-NUMA node queue availability (per_numa_avail_q_mask)
> - Global queue registration (registered_q_mask)
> - Per-NUMA node queue registration (numa_registered_q_mask)
>
> Note that these bitmasks are not lock protected, so accesses to them
> will not be absolutely accurate. The goal is to determine which
> queues are approximately idle and might be better suited for
> a request.
>
> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> ---
> fs/fuse/dev_uring.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++
> fs/fuse/dev_uring_i.h | 18 ++++++++++
> 2 files changed, 117 insertions(+)
>
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index 0f5ab27dacb66c9f5f10eac2713d9bd3eb4c26da..c2bc20848bc54541ede9286562177994e7ca5879 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> +static int fuse_ring_create_q_masks(struct fuse_ring *ring)
> +{
> + if (!zalloc_cpumask_var(&ring->avail_q_mask, GFP_KERNEL_ACCOUNT))
> + return -ENOMEM;
> +
> + if (!zalloc_cpumask_var(&ring->registered_q_mask, GFP_KERNEL_ACCOUNT))
> + return -ENOMEM;
> +
> + ring->per_numa_avail_q_mask = kcalloc(ring->nr_numa_nodes,
> + sizeof(struct cpumask *),
nit: sizeof(cpumask_var_t) since per_numa_avail_q_mask gets defined in
the struct as a cpumask_var_t?
> + GFP_KERNEL_ACCOUNT);
> + if (!ring->per_numa_avail_q_mask)
> + return -ENOMEM;
> + for (int node = 0; node < ring->nr_numa_nodes; node++)
nit: afaik, the general convention has the int declared at top of
function instead of inside loop scope
> + if (!zalloc_cpumask_var(&ring->per_numa_avail_q_mask[node],
> + GFP_KERNEL_ACCOUNT))
> @@ -472,11 +539,18 @@ void fuse_uring_stop_queues(struct fuse_ring *ring)
>
> for (qid = 0; qid < ring->max_nr_queues; qid++) {
> struct fuse_ring_queue *queue = READ_ONCE(ring->queues[qid]);
> + int node;
>
> if (!queue)
> continue;
>
> fuse_uring_teardown_entries(queue);
> +
> + node = queue->numa_node;
> + cpumask_clear_cpu(qid, ring->registered_q_mask);
> + cpumask_clear_cpu(qid, ring->avail_q_mask);
> + cpumask_clear_cpu(qid, ring->numa_registered_q_mask[node]);
> + cpumask_clear_cpu(qid, ring->per_numa_avail_q_mask[node]);
Would it be more efficient to clear it all at once (eg
cpumask_clear()) outside the loop instead of clearing it bit by bit
here?
> }
>
> if (atomic_read(&ring->queue_refs) > 0) {
> @@ -744,9 +818,18 @@ static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ent,
> static void fuse_uring_ent_avail(struct fuse_ring_ent *ent,
> struct fuse_ring_queue *queue)
> {
> + struct fuse_ring *ring = queue->ring;
> + int node = queue->numa_node;
> +
> WARN_ON_ONCE(!ent->cmd);
> list_move(&ent->list, &queue->ent_avail_queue);
> ent->state = FRRS_AVAILABLE;
> +
> + if (list_is_singular(&queue->ent_avail_queue) &&
Did you mean to include "list_is_singular()" here? I think even if
queue->ent_avail_queue has more than one entry on it, we still need to
update the masks here, in case it was previously the case that
queue->nr_reqs >= FUSE_URING_QUEUE_THRESHOLD?
> + queue->nr_reqs <= FUSE_URING_QUEUE_THRESHOLD) {
Should this be <? Afaict, if queue->nr_reqs ==
FUSE_URING_QUEUE_THRESHOLD then it's considered full (and no longer
available)?
> + cpumask_set_cpu(queue->qid, ring->avail_q_mask);
> + cpumask_set_cpu(queue->qid, ring->per_numa_avail_q_mask[node]);
> + }
> }
>
> /* Used to find the request on SQE commit */
> @@ -769,6 +852,8 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
> struct fuse_req *req)
> {
> struct fuse_ring_queue *queue = ent->queue;
> + struct fuse_ring *ring = queue->ring;
> + int node = queue->numa_node;
>
> lockdep_assert_held(&queue->lock);
>
> @@ -783,6 +868,16 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
> ent->state = FRRS_FUSE_REQ;
> list_move_tail(&ent->list, &queue->ent_w_req_queue);
> fuse_uring_add_to_pq(ent, req);
> +
> + /*
> + * If there are no more available entries, mark the queue as unavailable
> + * in both global and per-NUMA node masks
> + */
> + if (list_empty(&queue->ent_avail_queue)) {
> + cpumask_clear_cpu(queue->qid, ring->avail_q_mask);
> + cpumask_clear_cpu(queue->qid,
> + ring->per_numa_avail_q_mask[node]);
> + }
> }
>
> /* Fetch the next fuse request if available */
> @@ -982,6 +1077,7 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ent,
> struct fuse_ring *ring = queue->ring;
> struct fuse_conn *fc = ring->fc;
> struct fuse_iqueue *fiq = &fc->iq;
> + int node = queue->numa_node;
>
> fuse_uring_prepare_cancel(cmd, issue_flags, ent);
>
> @@ -990,6 +1086,9 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ent,
> fuse_uring_ent_avail(ent, queue);
> spin_unlock(&queue->lock);
>
> + cpumask_set_cpu(queue->qid, ring->registered_q_mask);
> + cpumask_set_cpu(queue->qid, ring->numa_registered_q_mask[node]);
> +
> if (!ring->ready) {
> bool ready = is_ring_ready(ring, queue->qid);
>
> diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
> index 708412294982566919122a1a0d7f741217c763ce..0457dbc6737c8876dd7a7d4c9c724da05e553e6a 100644
> --- a/fs/fuse/dev_uring_i.h
> +++ b/fs/fuse/dev_uring_i.h
> @@ -125,6 +131,18 @@ struct fuse_ring {
> */
> unsigned int stop_debug_log : 1;
>
> + /* Tracks which queues are available (empty) globally */
nit: is "(empty)" accurate here? afaict, the queue is available as
long as it has <= FUSE_URING_QUEUE_THRESHOLD requests (eg even if it's
not empty)?
> + cpumask_var_t avail_q_mask;
Should avail_q_mask also get set accordingly when requests are queued
(eg fuse_uring_queue_fuse_req(), fuse_uring_queue_bq_req()) or
completed by userspace (eg fuse_uring_req_end()), if they meet
FUSE_URING_QUEUE_THRESHOLD?
Thanks,
Joanne
> +
* Re: [PATCH 5/5] fuse: {io-uring} Allow reduced number of ring queues
2025-07-22 21:58 ` [PATCH 5/5] fuse: {io-uring} Allow reduced number of ring queues Bernd Schubert
@ 2025-07-25 0:43 ` Joanne Koong
2025-08-04 10:17 ` Bernd Schubert
0 siblings, 1 reply; 10+ messages in thread
From: Joanne Koong @ 2025-07-25 0:43 UTC (permalink / raw)
To: Bernd Schubert; +Cc: Miklos Szeredi, linux-fsdevel
On Tue, Jul 22, 2025 at 2:58 PM Bernd Schubert <bschubert@ddn.com> wrote:
>
> Currently, FUSE io-uring requires all queues to be registered before
> becoming ready, which can result in too much memory usage.
>
> This patch introduces a static queue mapping system that allows FUSE
> io-uring to operate with a reduced number of registered queues by:
>
> 1. Adding a queue_mapping array to track which registered queue each
> CPU should use
> 2. Replacing the is_ring_ready() check with immediate queue mapping
> once any queues are registered
> 3. Implementing fuse_uring_map_queues() to create CPU-to-queue mappings
> that prefer NUMA-local queues when available
> 4. Updating fuse_uring_get_queue() to use the static mapping instead
> of direct CPU-to-queue correspondence
>
> The mapping prioritizes NUMA locality by first attempting to map CPUs
> to queues on the same NUMA node, falling back to any available
> registered queue if no local queue exists.
Do we need a static queue map or does it suffice to just overload a
queue on the local node if we're not able to find an "ideal" queue for
the request? It seems to me like if we default to that behavior, then
we get the advantages the static queue map is trying to provide (eg
marking the ring as ready as soon as the first queue is registered and
finding a last-resort queue for the request) without the overhead.
Thanks,
Joanne
* Re: [PATCH 5/5] fuse: {io-uring} Allow reduced number of ring queues
2025-07-25 0:43 ` Joanne Koong
@ 2025-08-04 10:17 ` Bernd Schubert
0 siblings, 0 replies; 10+ messages in thread
From: Bernd Schubert @ 2025-08-04 10:17 UTC (permalink / raw)
To: Joanne Koong; +Cc: Miklos Szeredi, linux-fsdevel@vger.kernel.org
Hi Joanne,
thanks for your review and sorry for my late reply. I had sent out the
series the night before going on vacation.
On 7/25/25 02:43, Joanne Koong wrote:
> On Tue, Jul 22, 2025 at 2:58 PM Bernd Schubert <bschubert@ddn.com> wrote:
>>
>> Currently, FUSE io-uring requires all queues to be registered before
>> becoming ready, which can result in too much memory usage.
>>
>> This patch introduces a static queue mapping system that allows FUSE
>> io-uring to operate with a reduced number of registered queues by:
>>
>> 1. Adding a queue_mapping array to track which registered queue each
>> CPU should use
>> 2. Replacing the is_ring_ready() check with immediate queue mapping
>> once any queues are registered
>> 3. Implementing fuse_uring_map_queues() to create CPU-to-queue mappings
>> that prefer NUMA-local queues when available
>> 4. Updating fuse_uring_get_queue() to use the static mapping instead
>> of direct CPU-to-queue correspondence
>>
>> The mapping prioritizes NUMA locality by first attempting to map CPUs
>> to queues on the same NUMA node, falling back to any available
>> registered queue if no local queue exists.
>
> Do we need a static queue map or does it suffice to just overload a
> queue on the local node if we're not able to find an "ideal" queue for
> the request? It seems to me like if we default to that behavior, then
> we get the advantages the static queue map is trying to provide (eg
> marking the ring as ready as soon as the first queue is registered and
> finding a last-resort queue for the request) without the overhead.
>
I have a branch for that, which uses the first available queue from
the registered queue bitmask. In testing with our DDN file system
it resulted in too imbalanced queue usage, so I gave up on that
approach. Assuming the scheduler balances processes between cores,
the static mapping guarantees balanced queues.
Thanks,
Bernd
Thread overview: 10 messages
2025-07-22 21:57 [PATCH 0/5] fuse: {io-uring} Allow to reduce the number of queues and request distribution Bernd Schubert
2025-07-22 21:57 ` [PATCH 1/5] fuse: {io-uring} Add queue length counters Bernd Schubert
2025-07-24 21:07 ` Joanne Koong
2025-07-22 21:57 ` [PATCH 2/5] fuse: {io-uring} Rename ring->nr_queues to max_nr_queues Bernd Schubert
2025-07-22 21:58 ` [PATCH 3/5] fuse: {io-uring} Use bitmaps to track queue availability Bernd Schubert
2025-07-24 23:56 ` Joanne Koong
2025-07-22 21:58 ` [PATCH 4/5] fuse: {io-uring} Distribute load among queues Bernd Schubert
2025-07-22 21:58 ` [PATCH 5/5] fuse: {io-uring} Allow reduced number of ring queues Bernd Schubert
2025-07-25 0:43 ` Joanne Koong
2025-08-04 10:17 ` Bernd Schubert