From: Bernd Schubert <bernd@bsbernd.com>
To: Joanne Koong <joannelkoong@gmail.com>, miklos@szeredi.hu
Cc: axboe@kernel.dk, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v2 10/14] fuse: add io-uring buffer rings
Date: Wed, 6 May 2026 00:47:30 +0200	[thread overview]
Message-ID: <456f05a2-dec3-487d-89ea-06fe0acd084a@bsbernd.com> (raw)
In-Reply-To: <20260402162840.2989717-11-joannelkoong@gmail.com>



On 4/2/26 18:28, Joanne Koong wrote:
> Add fuse buffer rings for servers communicating through the io-uring
> interface. To use this, the server must set the FUSE_URING_BUFRING
> flag and provide header and payload buffers via an iovec array in the
> sqe during registration. The payload buffers are used to back the buffer
> ring. The kernel manages buffer selection and recycling through a simple
> internal ring.
> 
> This has the following advantages over the non-bufring (iovec) path:
> - Reduced memory usage: in the iovec path, each entry has its own
>   dedicated payload buffer, requiring N buffers for N entries where each
>   buffer must be large enough to accommodate the maximum possible
>   payload size. With buffer rings, payload buffers are pooled and
>   selected on demand. Entries only hold a buffer while actively
>   processing a request with payload data. When incremental buffer
>   consumption is added, this will allow non-overlapping regions of a
>   single buffer to be used simultaneously across multiple requests,
>   further reducing memory requirements.
> - Foundation for pinned buffers: the buffer ring headers and payloads
>   are now each passed in as a contiguous memory allocation, which allows
>   fuse to easily pin and vmap the entire region in one operation during
>   queue setup. This will eliminate the per-request overhead of having to
>   pin/unpin user pages and translate virtual addresses and is a
>   prerequisite for future optimizations like performing data copies
>   outside of the server's task context.
> 
> Each ring entry gets a fixed ID (sqe->buf_index) that maps to a specific
> header slot in the headers buffer. Payload buffers are selected from
> the ring on demand and recycled after each request. Buffer ring usage is
> set on a per-queue basis. All subsequent registration SQEs for the same
> queue must use consistent flags.
> 
> The headers are laid out contiguously and provided via iov[0]. Each slot
> maps to ent->id:
> 
> |<- headers_size (>= queue_depth * sizeof(fuse_uring_req_header)) ->|
> +------------------------------+------------------------------+-----+
> | struct fuse_uring_req_header | struct fuse_uring_req_header | ... |
> |        [ent id=0]            |        [ent id=1]            |     |
> +------------------------------+------------------------------+-----+
> 
> On the server side, the ent id determines where in the headers buffer the
> header data for that ent resides: the offset into the headers buffer is
> ent_id * sizeof(struct fuse_uring_req_header).
> 
> The buffer ring is backed by the payload buffer, which is contiguous but
> partitioned into individual bufs according to the buf_size passed in at
> registration.
> 
>   PAYLOAD BUFFER POOL (contiguous, provided via iov[1]):
>     |<-------------- payload_size ------------>|
>     +------------+-----------+-----------+-----+
>     |  buf [0]   |  buf [1]  |  buf [2]  | ... |
>     |  buf_size  |  buf_size |  buf_size | ... |
>     +------------+-----------+-----------+-----+
> 
> buffer ring state (struct fuse_bufring, kernel-internal):
> bufs[]: [ used | used | FREE | FREE | FREE ]
>                         ^^^^^^^^^^^^^^^^^^
>                         available for selection
> 
> The buffer ring logic is as follows:
> select:  buf = bufs[head % nbufs]; head++
> recycle: bufs[tail % nbufs] = buf; tail++
> empty:   tail == head (no buffers available)
> full:    tail - head >= nbufs
> 
> Buffer ring request flow
> ------------------------
> |  Kernel                                  |  FUSE daemon
> |                                          |
> |  [client request arrives]                |
> |  >fuse_uring_send()                      |
> |    [select payload buf from ring]        |
> |    >fuse_uring_select_buffer()           |
> |    [copy headers to ent's header slot]   |
> |    >copy_header_to_ring()                |
> |    [copy payload to selected buf]        |
> |    >fuse_uring_copy_to_ring()            |
> |    [set buf_id in ent_in_out header]     |
> |    >io_uring_cmd_done()                  |
> |                                          |  [CQE received]
> |                                          |  [read headers from header
> |                                          |    slot]
> |                                          |  [read payload from buf_id]
> |                                          |  [process request]
> |                                          |  [write reply to header
> |                                          |    slot]
> |                                          |  [write reply payload to
> |                                          |    buf]
> |                                          |  >io_uring_submit()
> |                                          |   COMMIT_AND_FETCH
> |  >fuse_uring_commit_fetch()              |
> |    >fuse_uring_commit()                  |
> |     [copy reply from ring]               |
> |     >fuse_uring_recycle_buffer()         |
> |    >fuse_uring_get_next_fuse_req()       |
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  fs/fuse/dev_uring.c       | 363 +++++++++++++++++++++++++++++++++-----
>  fs/fuse/dev_uring_i.h     |  45 ++++-
>  include/uapi/linux/fuse.h |  27 ++-
>  3 files changed, 381 insertions(+), 54 deletions(-)
> 
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index a061f175b3fd..9f14a2bcde3f 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -41,6 +41,11 @@ enum fuse_uring_header_type {
>  	FUSE_URING_HEADER_RING_ENT,
>  };
>  
> +static inline bool bufring_enabled(struct fuse_ring_queue *queue)
> +{
> +	return queue->bufring != NULL;
> +}
> +
>  static void uring_cmd_set_ring_ent(struct io_uring_cmd *cmd,
>  				   struct fuse_ring_ent *ring_ent)
>  {
> @@ -222,6 +227,7 @@ void fuse_uring_destruct(struct fuse_conn *fc)
>  		}
>  
>  		kfree(queue->fpq.processing);
> +		kfree(queue->bufring);
>  		kfree(queue);
>  		ring->queues[qid] = NULL;
>  	}
> @@ -303,20 +309,102 @@ static int fuse_uring_get_iovec_from_sqe(const struct io_uring_sqe *sqe,
>  	return 0;
>  }
>  
> -static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
> -						       int qid)
> +static int fuse_uring_bufring_setup(struct io_uring_cmd *cmd,
> +				     struct fuse_ring_queue *queue)
> +{
> +	const struct fuse_uring_cmd_req *cmd_req =
> +		io_uring_sqe128_cmd(cmd->sqe, struct fuse_uring_cmd_req);
> +	u16 queue_depth = READ_ONCE(cmd_req->init.queue_depth);
> +	unsigned int buf_size = READ_ONCE(cmd_req->init.buf_size);
> +	struct iovec iov[FUSE_URING_IOV_SEGS];
> +	void __user *payload, *headers;
> +	size_t headers_size, payload_size, ring_size;
> +	struct fuse_bufring *br;
> +	unsigned int nr_bufs, i;
> +	uintptr_t payload_addr;
> +	int err;
> +
> +	if (!queue_depth || !buf_size)
> +		return -EINVAL;
> +
> +	err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
> +	if (err)
> +		return err;
> +
> +	headers = iov[FUSE_URING_IOV_HEADERS].iov_base;
> +	headers_size = iov[FUSE_URING_IOV_HEADERS].iov_len;
> +	payload = iov[FUSE_URING_IOV_PAYLOAD].iov_base;
> +	payload_size = iov[FUSE_URING_IOV_PAYLOAD].iov_len;
> +
> +	/* check if there's enough space for all the headers */
> +	if (headers_size < queue_depth * sizeof(struct fuse_uring_req_header))
> +		return -EINVAL;
> +
> +	if (buf_size < queue->ring->max_payload_sz)
> +		return -EINVAL;
> +
> +	nr_bufs = payload_size / buf_size;
> +	if (!nr_bufs || nr_bufs > U16_MAX)
> +		return -EINVAL;
> +
> +	/* create the ring buffer */
> +	ring_size = struct_size(br, bufs, nr_bufs);
> +	br = kzalloc(ring_size, GFP_KERNEL_ACCOUNT);
> +	if (!br)
> +		return -ENOMEM;
> +
> +	br->queue_depth = queue_depth;
> +	br->headers = headers;
> +
> +	payload_addr = (uintptr_t)payload;
> +
> +	/* populate the ring buffer */
> +	for (i = 0; i < nr_bufs; i++, payload_addr += buf_size) {
> +		struct fuse_bufring_buf *buf = &br->bufs[i];
> +
> +		buf->addr = payload_addr;
> +		buf->len = buf_size;
> +		buf->id = i;
> +	}
> +
> +	br->nbufs = nr_bufs;
> +	br->tail = nr_bufs;
> +
> +	queue->bufring = br;
> +
> +	return 0;
> +}
> +
> +/*
> + * if the queue is already registered, check that the queue was initialized with
> + * the same init flags set for this FUSE_IO_URING_CMD_REGISTER cmd. all
> + * FUSE_IO_URING_CMD_REGISTER cmds should have the same init fields set on a
> + * per-queue basis.
> + */
> +static bool queue_init_flags_consistent(struct fuse_ring_queue *queue,
> +					u64 init_flags)
>  {
> +	bool bufring = init_flags & FUSE_URING_BUFRING;
> +
> +	return bufring_enabled(queue) == bufring;
> +}
> +
> +static struct fuse_ring_queue *
> +fuse_uring_create_queue(struct io_uring_cmd *cmd, struct fuse_ring *ring,
> +			int qid, u64 init_flags)
> +{
> +	bool use_bufring = init_flags & FUSE_URING_BUFRING;
>  	struct fuse_conn *fc = ring->fc;
>  	struct fuse_ring_queue *queue;
>  	struct list_head *pq;
>  
>  	queue = kzalloc_obj(*queue, GFP_KERNEL_ACCOUNT);
>  	if (!queue)
> -		return NULL;
> +		return ERR_PTR(-ENOMEM);
>  	pq = kzalloc_objs(struct list_head, FUSE_PQ_HASH_SIZE);
>  	if (!pq) {
>  		kfree(queue);
> -		return NULL;
> +		return ERR_PTR(-ENOMEM);
>  	}
>  
>  	queue->qid = qid;
> @@ -334,12 +422,29 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
>  	queue->fpq.processing = pq;
>  	fuse_pqueue_init(&queue->fpq);
>  
> +	if (use_bufring) {
> +		int err = fuse_uring_bufring_setup(cmd, queue);
> +
> +		if (err) {
> +			kfree(pq);
> +			kfree(queue);
> +			return ERR_PTR(err);
> +		}
> +	}
> +
>  	spin_lock(&fc->lock);
> +	/* check if the queue creation raced with another thread */
>  	if (ring->queues[qid]) {
>  		spin_unlock(&fc->lock);
>  		kfree(queue->fpq.processing);
> +		if (use_bufring)
> +			kfree(queue->bufring);
>  		kfree(queue);
> -		return ring->queues[qid];
> +
> +		queue = ring->queues[qid];
> +		if (!queue_init_flags_consistent(queue, init_flags))
> +			return ERR_PTR(-EINVAL);
> +		return queue;
>  	}
>  
>  	/*
> @@ -649,7 +754,14 @@ static int copy_header_to_ring(struct fuse_ring_ent *ent,
>  	if (offset < 0)
>  		return offset;
>  
> -	ring = (void __user *)ent->headers + offset;
> +	if (bufring_enabled(ent->queue)) {
> +		int buf_offset = offset +
> +			sizeof(struct fuse_uring_req_header) * ent->id;
> +
> +		ring = ent->queue->bufring->headers + buf_offset;
> +	} else {
> +		ring = (void __user *)ent->headers + offset;
> +	}
>  
>  	if (copy_to_user(ring, header, header_size)) {
>  		pr_info_ratelimited("Copying header to ring failed.\n");
> @@ -669,7 +781,14 @@ static int copy_header_from_ring(struct fuse_ring_ent *ent,
>  	if (offset < 0)
>  		return offset;
>  
> -	ring = (void __user *)ent->headers + offset;
> +	if (bufring_enabled(ent->queue)) {
> +		int buf_offset = offset +
> +			sizeof(struct fuse_uring_req_header) * ent->id;
> +
> +		ring = ent->queue->bufring->headers + buf_offset;
> +	} else {
> +		ring = (void __user *)ent->headers + offset;
> +	}
>  
>  	if (copy_from_user(header, ring, header_size)) {
>  		pr_info_ratelimited("Copying header from ring failed.\n");
> @@ -684,12 +803,20 @@ static int setup_fuse_copy_state(struct fuse_copy_state *cs,
>  				 struct fuse_ring_ent *ent, int dir,
>  				 struct iov_iter *iter)
>  {
> +	void __user *payload;
>  	int err;
>  
> -	err = import_ubuf(dir, ent->payload, ring->max_payload_sz, iter);
> -	if (err) {
> -		pr_info_ratelimited("fuse: Import of user buffer failed\n");
> -		return err;
> +	if (bufring_enabled(ent->queue))
> +		payload = (void __user *)ent->payload_buf.addr;
> +	else
> +		payload = ent->payload;
> +
> +	if (payload) {
> +		err = import_ubuf(dir, payload, ring->max_payload_sz, iter);
> +		if (err) {
> +			pr_info_ratelimited("fuse: Import of user buffer failed\n");
> +			return err;
> +		}
>  	}
>  
>  	fuse_copy_init(cs, dir == ITER_DEST, iter);
> @@ -741,6 +868,9 @@ static int fuse_uring_args_to_ring(struct fuse_ring *ring, struct fuse_req *req,
>  		.commit_id = req->in.h.unique,
>  	};
>  
> +	if (bufring_enabled(ent->queue))
> +		ent_in_out.buf_id = ent->payload_buf.id;
> +
>  	err = setup_fuse_copy_state(&cs, ring, req, ent, ITER_DEST, &iter);
>  	if (err)
>  		return err;
> @@ -805,6 +935,96 @@ static int fuse_uring_copy_to_ring(struct fuse_ring_ent *ent,
>  				   sizeof(req->in.h));
>  }
>  
> +static bool fuse_uring_req_has_payload(struct fuse_req *req)
> +{
> +	struct fuse_args *args = req->args;
> +
> +	return args->in_numargs > 1 || args->out_numargs;
> +}
> +
> +static int fuse_uring_select_buffer(struct fuse_ring_ent *ent)
> +	__must_hold(&ent->queue->lock)
> +{
> +	struct fuse_ring_queue *queue = ent->queue;
> +	struct fuse_bufring *br = queue->bufring;
> +	struct fuse_bufring_buf *buf;
> +	unsigned int tail = br->tail, head = br->head;
> +
> +	lockdep_assert_held(&queue->lock);
> +
> +	/* Get a buffer to use for the payload */
> +	if (tail == head)
> +		return -ENOBUFS;
> +
> +	buf = &br->bufs[head % br->nbufs];
> +	br->head++;

Just a minor annotation, and we can do this any time later: for cache
effects (mostly the large L3) it might be worth updating buffer selection
and buffer recycling to LIFO.


Thanks,
Bernd

