From: Joanne Koong <joannelkoong@gmail.com>
To: miklos@szeredi.hu
Cc: bernd@bsbernd.com, axboe@kernel.dk, linux-fsdevel@vger.kernel.org
Subject: [PATCH v2 10/14] fuse: add io-uring buffer rings
Date: Thu, 2 Apr 2026 09:28:36 -0700
Message-ID: <20260402162840.2989717-11-joannelkoong@gmail.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260402162840.2989717-1-joannelkoong@gmail.com>
References: <20260402162840.2989717-1-joannelkoong@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Add fuse buffer rings for servers communicating through the io-uring
interface. To use this, the server must set the FUSE_URING_BUFRING flag
and provide header and payload buffers via an iovec array in the sqe
during registration. The payload buffers are used to back the buffer
ring. The kernel manages buffer selection and recycling through a
simple internal ring.

This has the following advantages over the non-bufring (iovec) path:

- Reduced memory usage: in the iovec path, each entry has its own
  dedicated payload buffer, requiring N buffers for N entries, where
  each buffer must be large enough to accommodate the maximum possible
  payload size. With buffer rings, payload buffers are pooled and
  selected on demand. Entries only hold a buffer while actively
  processing a request with payload data. When incremental buffer
  consumption is added, this will allow non-overlapping regions of a
  single buffer to be used simultaneously across multiple requests,
  further reducing memory requirements.

- Foundation for pinned buffers: the buffer ring headers and payloads
  are now each passed in as a contiguous memory allocation, which
  allows fuse to easily pin and vmap the entire region in one
  operation during queue setup. This will eliminate the per-request
  overhead of having to pin/unpin user pages and translate virtual
  addresses, and is a prerequisite for future optimizations such as
  performing data copies outside of the server's task context.

Each ring entry gets a fixed ID (sqe->buf_index) that maps to a
specific header slot in the headers buffer. Payload buffers are
selected from the ring on demand and recycled after each request.

Buffer ring usage is set on a per-queue basis. All subsequent
registration SQEs for the same queue must use consistent flags, as
illustrated in the registration sketch below.
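Not part of this patch, but for illustration, a liburing-based daemon
might register one entry roughly as follows. register_ent and all of
its parameters are hypothetical; this assumes a ring created with
IORING_SETUP_SQE128, an already-opened /dev/fuse fd, and a <linux/fuse.h>
that carries the uapi additions from this patch:

    #include <errno.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/uio.h>
    #include <liburing.h>
    #include <linux/fuse.h>

    /* one REGISTER sqe per ring entry; ent_id selects the header slot */
    static int register_ent(struct io_uring *ring, int fuse_fd,
                            uint16_t qid, uint16_t ent_id,
                            uint16_t queue_depth, uint32_t buf_size,
                            struct iovec iov[2])
    {
            struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
            struct fuse_uring_cmd_req *req;

            if (!sqe)
                    return -EAGAIN;

            /* iov[0] = headers buffer, iov[1] = payload buffer pool */
            io_uring_prep_rw(IORING_OP_URING_CMD, sqe, fuse_fd, iov, 2, 0);
            sqe->cmd_op = FUSE_IO_URING_CMD_REGISTER;
            sqe->buf_index = ent_id;        /* fixed id for this ent */

            req = (struct fuse_uring_cmd_req *)sqe->cmd;
            memset(req, 0, sizeof(*req));
            req->qid = qid;
            req->flags = FUSE_URING_BUFRING;
            req->init.buf_size = buf_size;
            req->init.queue_depth = queue_depth;

            return 0;       /* caller submits with io_uring_submit() */
    }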
The headers are laid out contiguously and provided via iov[0]. Each
slot maps to ent->id:

|<- headers_size (>= queue_depth * sizeof(fuse_uring_req_header)) ->|
+------------------------------+------------------------------+-----+
| struct fuse_uring_req_header | struct fuse_uring_req_header |     |
|          [ent id=0]          |          [ent id=1]          | ... |
+------------------------------+------------------------------+-----+

On the server side, the ent id is used to determine where in the
headers buffer the headers data for the ent resides. This is done by
calculating ent_id * sizeof(struct fuse_uring_req_header) as the
offset into the headers buffer.
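For illustration only (not part of this patch), that server-side
lookup is plain pointer arithmetic into the registered iov[0] region;
ent_header and headers_base are made-up names:

    #include <linux/fuse.h>

    /* headers_base is the iov[0].iov_base the daemon registered */
    static struct fuse_uring_req_header *
    ent_header(void *headers_base, unsigned int ent_id)
    {
            /* offset = ent_id * sizeof(struct fuse_uring_req_header) */
            return (struct fuse_uring_req_header *)headers_base + ent_id;
    }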
The buffer ring is backed by the payload buffer, which is contiguous
but partitioned into individual bufs according to the buf_size passed
in at registration.

PAYLOAD BUFFER POOL (contiguous, provided via iov[1]):

|<------------- payload_size ------------->|
+-----------+-----------+-----------+------+
|  buf [0]  |  buf [1]  |  buf [2]  |  ... |
| buf_size  | buf_size  | buf_size  |  ... |
+-----------+-----------+-----------+------+

buffer ring state (struct fuse_bufring, kernel-internal):

    bufs[]: [ used | used | FREE | FREE | FREE ]
                           ^^^^^^^^^^^^^^^^^^^^
                           available for selection

The buffer ring logic is as follows (see the standalone sketch after
this list):

    select:   buf = bufs[head % nbufs]; head++
    recycle:  bufs[tail % nbufs] = buf; tail++
    empty:    tail == head  (no buffers available)
    full:     tail - head >= nbufs
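For illustration, a standalone userspace rendering of those four
operations; the kernel's actual versions are fuse_uring_select_buffer()
and fuse_uring_recycle_buffer() below, which run under the queue lock.
All names here are made up:

    #include <errno.h>

    struct buf { unsigned long addr; unsigned int len; unsigned int id; };

    struct bufring {
            unsigned int nbufs;
            unsigned int head;      /* free-running counters; "tail - head" */
            unsigned int tail;      /* counts free bufs even across wrap   */
            struct buf bufs[];
    };

    static int ring_select(struct bufring *br, struct buf *out)
    {
            if (br->tail == br->head)       /* empty: nothing to hand out */
                    return -ENOBUFS;
            *out = br->bufs[br->head++ % br->nbufs];
            return 0;
    }

    static void ring_recycle(struct bufring *br, const struct buf *b)
    {
            /* tail - head >= nbufs would mean full, which cannot happen
             * if every recycled buf was previously selected */
            br->bufs[br->tail++ % br->nbufs] = *b;
    }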
Buffer ring request flow
------------------------

 Kernel                                  | FUSE daemon
                                         |
 [client request arrives]                |
 >fuse_uring_send()                      |
   [select payload buf from ring]        |
   >fuse_uring_select_buffer()           |
   [copy headers to ent's header slot]   |
   >copy_header_to_ring()                |
   [copy payload to selected buf]        |
   >fuse_uring_copy_to_ring()            |
   [set buf_id in ent_in_out header]     |
 >io_uring_cmd_done()                    |
                                         | [CQE received]
                                         | [read headers from header slot]
                                         | [read payload from buf_id]
                                         | [process request]
                                         | [write reply to header slot]
                                         | [write reply payload to buf]
                                         | >io_uring_submit()
                                         |     COMMIT_AND_FETCH
 >fuse_uring_commit_fetch()              |
   >fuse_uring_commit()                  |
     [copy reply from ring]              |
   >fuse_uring_recycle_buffer()          |
   >fuse_uring_get_next_fuse_req()       |
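The daemon's [read payload from buf_id] step above reduces to indexing
into the registered payload pool; a minimal sketch, where payload_for
is hypothetical and buf_size is whatever the daemon passed in
init.buf_size:

    #include <stddef.h>
    #include <stdint.h>
    #include <linux/fuse.h>

    /* payload_base is the iov[1].iov_base the daemon registered */
    static void *payload_for(void *payload_base, uint32_t buf_size,
                             const struct fuse_uring_ent_in_out *ent_in_out)
    {
            return (char *)payload_base +
                   (size_t)ent_in_out->buf_id * buf_size;
    }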
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/fuse/dev_uring.c       | 363 +++++++++++++++++++++++++++++++++-----
 fs/fuse/dev_uring_i.h     |  45 ++++-
 include/uapi/linux/fuse.h |  27 ++-
 3 files changed, 381 insertions(+), 54 deletions(-)

diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index a061f175b3fd..9f14a2bcde3f 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -41,6 +41,11 @@ enum fuse_uring_header_type {
 	FUSE_URING_HEADER_RING_ENT,
 };
 
+static inline bool bufring_enabled(struct fuse_ring_queue *queue)
+{
+	return queue->bufring != NULL;
+}
+
 static void uring_cmd_set_ring_ent(struct io_uring_cmd *cmd,
 				   struct fuse_ring_ent *ring_ent)
 {
@@ -222,6 +227,7 @@ void fuse_uring_destruct(struct fuse_conn *fc)
 		}
 
 		kfree(queue->fpq.processing);
+		kfree(queue->bufring);
 		kfree(queue);
 		ring->queues[qid] = NULL;
 	}
@@ -303,20 +309,102 @@ static int fuse_uring_get_iovec_from_sqe(const struct io_uring_sqe *sqe,
 	return 0;
 }
 
-static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
-						       int qid)
+static int fuse_uring_bufring_setup(struct io_uring_cmd *cmd,
+				    struct fuse_ring_queue *queue)
+{
+	const struct fuse_uring_cmd_req *cmd_req =
+		io_uring_sqe128_cmd(cmd->sqe, struct fuse_uring_cmd_req);
+	u16 queue_depth = READ_ONCE(cmd_req->init.queue_depth);
+	unsigned int buf_size = READ_ONCE(cmd_req->init.buf_size);
+	struct iovec iov[FUSE_URING_IOV_SEGS];
+	void __user *payload, *headers;
+	size_t headers_size, payload_size, ring_size;
+	struct fuse_bufring *br;
+	unsigned int nr_bufs, i;
+	uintptr_t payload_addr;
+	int err;
+
+	if (!queue_depth || !buf_size)
+		return -EINVAL;
+
+	err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
+	if (err)
+		return err;
+
+	headers = iov[FUSE_URING_IOV_HEADERS].iov_base;
+	headers_size = iov[FUSE_URING_IOV_HEADERS].iov_len;
+	payload = iov[FUSE_URING_IOV_PAYLOAD].iov_base;
+	payload_size = iov[FUSE_URING_IOV_PAYLOAD].iov_len;
+
+	/* check if there's enough space for all the headers */
+	if (headers_size < queue_depth * sizeof(struct fuse_uring_req_header))
+		return -EINVAL;
+
+	if (buf_size < queue->ring->max_payload_sz)
+		return -EINVAL;
+
+	nr_bufs = payload_size / buf_size;
+	if (!nr_bufs || nr_bufs > U16_MAX)
+		return -EINVAL;
+
+	/* create the ring buffer */
+	ring_size = struct_size(br, bufs, nr_bufs);
+	br = kzalloc(ring_size, GFP_KERNEL_ACCOUNT);
+	if (!br)
+		return -ENOMEM;
+
+	br->queue_depth = queue_depth;
+	br->headers = headers;
+
+	payload_addr = (uintptr_t)payload;
+
+	/* populate the ring buffer */
+	for (i = 0; i < nr_bufs; i++, payload_addr += buf_size) {
+		struct fuse_bufring_buf *buf = &br->bufs[i];
+
+		buf->addr = payload_addr;
+		buf->len = buf_size;
+		buf->id = i;
+	}
+
+	br->nbufs = nr_bufs;
+	br->tail = nr_bufs;
+
+	queue->bufring = br;
+
+	return 0;
+}
+
+/*
+ * if the queue is already registered, check that the queue was initialized with
+ * the same init flags set for this FUSE_IO_URING_CMD_REGISTER cmd. all
+ * FUSE_IO_URING_CMD_REGISTER cmds should have the same init fields set on a
+ * per-queue basis.
+ */
+static bool queue_init_flags_consistent(struct fuse_ring_queue *queue,
+					u64 init_flags)
 {
+	bool bufring = init_flags & FUSE_URING_BUFRING;
+
+	return bufring_enabled(queue) == bufring;
+}
+
+static struct fuse_ring_queue *
+fuse_uring_create_queue(struct io_uring_cmd *cmd, struct fuse_ring *ring,
+			int qid, u64 init_flags)
+{
+	bool use_bufring = init_flags & FUSE_URING_BUFRING;
 	struct fuse_conn *fc = ring->fc;
 	struct fuse_ring_queue *queue;
 	struct list_head *pq;
 
 	queue = kzalloc_obj(*queue, GFP_KERNEL_ACCOUNT);
 	if (!queue)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 	pq = kzalloc_objs(struct list_head, FUSE_PQ_HASH_SIZE);
 	if (!pq) {
 		kfree(queue);
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 	}
 
 	queue->qid = qid;
@@ -334,12 +422,29 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
 	queue->fpq.processing = pq;
 	fuse_pqueue_init(&queue->fpq);
 
+	if (use_bufring) {
+		int err = fuse_uring_bufring_setup(cmd, queue);
+
+		if (err) {
+			kfree(pq);
+			kfree(queue);
+			return ERR_PTR(err);
+		}
+	}
+
 	spin_lock(&fc->lock);
+	/* check if the queue creation raced with another thread */
 	if (ring->queues[qid]) {
 		spin_unlock(&fc->lock);
 		kfree(queue->fpq.processing);
+		if (use_bufring)
+			kfree(queue->bufring);
 		kfree(queue);
-		return ring->queues[qid];
+
+		queue = ring->queues[qid];
+		if (!queue_init_flags_consistent(queue, init_flags))
+			return ERR_PTR(-EINVAL);
+		return queue;
 	}
 
 	/*
@@ -649,7 +754,14 @@ static int copy_header_to_ring(struct fuse_ring_ent *ent,
 	if (offset < 0)
 		return offset;
 
-	ring = (void __user *)ent->headers + offset;
+	if (bufring_enabled(ent->queue)) {
+		int buf_offset = offset +
+			sizeof(struct fuse_uring_req_header) * ent->id;
+
+		ring = ent->queue->bufring->headers + buf_offset;
+	} else {
+		ring = (void __user *)ent->headers + offset;
+	}
 
 	if (copy_to_user(ring, header, header_size)) {
 		pr_info_ratelimited("Copying header to ring failed.\n");
@@ -669,7 +781,14 @@ static int copy_header_from_ring(struct fuse_ring_ent *ent,
 	if (offset < 0)
 		return offset;
 
-	ring = (void __user *)ent->headers + offset;
+	if (bufring_enabled(ent->queue)) {
+		int buf_offset = offset +
+			sizeof(struct fuse_uring_req_header) * ent->id;
+
+		ring = ent->queue->bufring->headers + buf_offset;
+	} else {
+		ring = (void __user *)ent->headers + offset;
+	}
 
 	if (copy_from_user(header, ring, header_size)) {
 		pr_info_ratelimited("Copying header from ring failed.\n");
@@ -684,12 +803,20 @@ static int setup_fuse_copy_state(struct fuse_copy_state *cs,
 				 struct fuse_ring_ent *ent, int dir,
 				 struct iov_iter *iter)
 {
+	void __user *payload;
 	int err;
 
-	err = import_ubuf(dir, ent->payload, ring->max_payload_sz, iter);
-	if (err) {
-		pr_info_ratelimited("fuse: Import of user buffer failed\n");
-		return err;
+	if (bufring_enabled(ent->queue))
+		payload = (void __user *)ent->payload_buf.addr;
+	else
+		payload = ent->payload;
+
+	if (payload) {
+		err = import_ubuf(dir, payload, ring->max_payload_sz, iter);
+		if (err) {
+			pr_info_ratelimited("fuse: Import of user buffer failed\n");
+			return err;
+		}
 	}
 
 	fuse_copy_init(cs, dir == ITER_DEST, iter);
@@ -741,6 +868,9 @@ static int fuse_uring_args_to_ring(struct fuse_ring *ring, struct fuse_req *req,
 		.commit_id = req->in.h.unique,
 	};
 
+	if (bufring_enabled(ent->queue))
+		ent_in_out.buf_id = ent->payload_buf.id;
+
 	err = setup_fuse_copy_state(&cs, ring, req, ent, ITER_DEST, &iter);
 	if (err)
 		return err;
@@ -805,6 +935,96 @@ static int fuse_uring_copy_to_ring(struct fuse_ring_ent *ent,
 			    sizeof(req->in.h));
 }
 
+static bool fuse_uring_req_has_payload(struct fuse_req *req)
+{
+	struct fuse_args *args = req->args;
+
+	return args->in_numargs > 1 || args->out_numargs;
+}
+
+static int fuse_uring_select_buffer(struct fuse_ring_ent *ent)
+	__must_hold(&ent->queue->lock)
+{
+	struct fuse_ring_queue *queue = ent->queue;
+	struct fuse_bufring *br = queue->bufring;
+	struct fuse_bufring_buf *buf;
+	unsigned int tail = br->tail, head = br->head;
+
+	lockdep_assert_held(&queue->lock);
+
+	/* Get a buffer to use for the payload */
+	if (tail == head)
+		return -ENOBUFS;
+
+	buf = &br->bufs[head % br->nbufs];
+	br->head++;
+
+	ent->payload_buf = *buf;
+
+	return 0;
+}
+
+static void fuse_uring_recycle_buffer(struct fuse_ring_ent *ent)
+	__must_hold(&ent->queue->lock)
+{
+	struct fuse_bufring_buf *ent_payload = &ent->payload_buf;
+	struct fuse_ring_queue *queue = ent->queue;
+	struct fuse_bufring_buf *buf;
+	struct fuse_bufring *br;
+
+	lockdep_assert_held(&queue->lock);
+
+	if (!bufring_enabled(queue) || !ent_payload->addr)
+		return;
+
+	br = queue->bufring;
+
+	/* ring should never be full */
+	WARN_ON_ONCE(br->tail - br->head >= br->nbufs);
+
+	buf = &br->bufs[(br->tail) % br->nbufs];
+
+	*buf = *ent_payload;
+
+	br->tail++;
+
+	memset(ent_payload, 0, sizeof(*ent_payload));
+}
+
+static int fuse_uring_next_req_update_buffer(struct fuse_ring_ent *ent,
+					     struct fuse_req *req)
+{
+	bool buffer_selected;
+	bool has_payload;
+
+	if (!bufring_enabled(ent->queue))
+		return 0;
+
+	buffer_selected = !!ent->payload_buf.addr;
+	has_payload = fuse_uring_req_has_payload(req);
+
+	if (has_payload && !buffer_selected)
+		return fuse_uring_select_buffer(ent);
+
+	if (!has_payload && buffer_selected)
+		fuse_uring_recycle_buffer(ent);
+
+	return 0;
+}
+
+static int fuse_uring_prep_buffer(struct fuse_ring_ent *ent,
+				  struct fuse_req *req)
+{
+	if (!bufring_enabled(ent->queue))
+		return 0;
+
+	/* no payload to copy, can skip selecting a buffer */
+	if (!fuse_uring_req_has_payload(req))
+		return 0;
+
+	return fuse_uring_select_buffer(ent);
+}
+
 static int fuse_uring_prepare_send(struct fuse_ring_ent *ent,
 				   struct fuse_req *req)
 {
@@ -878,10 +1098,21 @@ static struct fuse_req *fuse_uring_ent_assign_req(struct fuse_ring_ent *ent)
 
 	/* get and assign the next entry while it is still holding the lock */
 	req = list_first_entry_or_null(req_queue, struct fuse_req, list);
-	if (req)
-		fuse_uring_add_req_to_ring_ent(ent, req);
+	if (req) {
+		int err = fuse_uring_next_req_update_buffer(ent, req);
 
-	return req;
+		if (!err) {
+			fuse_uring_add_req_to_ring_ent(ent, req);
+			return req;
+		}
+	}
+
+	/*
+	 * Buffer selection may fail if all the buffers are currently saturated.
+	 * The request will be serviced when a buffer is freed up.
+	 */
+	fuse_uring_recycle_buffer(ent);
+	return NULL;
 }
 
 /*
@@ -1041,6 +1272,12 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
 	 * fuse requests would otherwise not get processed - committing
 	 * and fetching is done in one step vs legacy fuse, which has separated
 	 * read (fetch request) and write (commit result).
+	 *
+	 * If the server is using bufrings and has populated the ring with
+	 * fewer payload buffers than ents, it is possible that there may not
+	 * be an available buffer for the next request. If so, then the fetch
+	 * is a no-op and the next request will be serviced when a buffer
+	 * becomes available.
 	 */
 	if (fuse_uring_get_next_fuse_req(ent, queue))
 		fuse_uring_send(ent, cmd, 0, issue_flags);
@@ -1120,30 +1357,38 @@ fuse_uring_create_ring_ent(struct io_uring_cmd *cmd,
 
 	ent->queue = queue;
 
-	err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
-	if (err) {
-		pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
-				    err);
-		goto error;
-	}
+	if (bufring_enabled(queue)) {
+		ent->id = READ_ONCE(cmd->sqe->buf_index);
+		if (ent->id >= queue->bufring->queue_depth) {
+			err = -EINVAL;
+			goto error;
+		}
+	} else {
+		err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
+		if (err) {
+			pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
+					    err);
+			goto error;
+		}
 
-	err = -EINVAL;
-	headers = &iov[FUSE_URING_IOV_HEADERS];
-	if (headers->iov_len < sizeof(struct fuse_uring_req_header)) {
-		pr_info_ratelimited("Invalid header len %zu\n", headers->iov_len);
-		goto error;
-	}
+		err = -EINVAL;
+		headers = &iov[FUSE_URING_IOV_HEADERS];
+		if (headers->iov_len < sizeof(struct fuse_uring_req_header)) {
+			pr_info_ratelimited("Invalid header len %zu\n",
					    headers->iov_len);
+			goto error;
+		}
 
-	payload = &iov[FUSE_URING_IOV_PAYLOAD];
-	if (payload->iov_len < ring->max_payload_sz) {
-		pr_info_ratelimited("Invalid req payload len %zu\n",
-				    payload->iov_len);
-		goto error;
+		payload = &iov[FUSE_URING_IOV_PAYLOAD];
+		if (payload->iov_len < ring->max_payload_sz) {
+			pr_info_ratelimited("Invalid req payload len %zu\n",
+					    payload->iov_len);
+			goto error;
+		}
+		ent->headers = headers->iov_base;
+		ent->payload = payload->iov_base;
 	}
 
-	ent->headers = headers->iov_base;
-	ent->payload = payload->iov_base;
-
 	atomic_inc(&ring->queue_refs);
 	return ent;
 
@@ -1152,6 +1397,13 @@ fuse_uring_create_ring_ent(struct io_uring_cmd *cmd,
 	return ERR_PTR(err);
 }
 
+static bool init_flags_valid(u64 init_flags)
+{
+	u64 valid_flags = FUSE_URING_BUFRING;
+
+	return !(init_flags & ~valid_flags);
+}
+
 /*
  * Register header and payload buffer with the kernel and puts the
  * entry as "ready to get fuse requests" on the queue
@@ -1161,6 +1413,7 @@ static int fuse_uring_register(struct io_uring_cmd *cmd,
 {
 	const struct fuse_uring_cmd_req *cmd_req =
 		io_uring_sqe128_cmd(cmd->sqe, struct fuse_uring_cmd_req);
+	u64 init_flags = READ_ONCE(cmd_req->flags);
 	struct fuse_ring *ring = smp_load_acquire(&fc->ring);
 	struct fuse_ring_queue *queue;
 	struct fuse_ring_ent *ent;
@@ -1179,11 +1432,16 @@ static int fuse_uring_register(struct io_uring_cmd *cmd,
 		return -EINVAL;
 	}
 
+	if (!init_flags_valid(init_flags))
+		return -EINVAL;
+
 	queue = ring->queues[qid];
 	if (!queue) {
-		queue = fuse_uring_create_queue(ring, qid);
-		if (!queue)
-			return err;
+		queue = fuse_uring_create_queue(cmd, ring, qid, init_flags);
+		if (IS_ERR(queue))
+			return PTR_ERR(queue);
+	} else if (!queue_init_flags_consistent(queue, init_flags)) {
+		return -EINVAL;
 	}
 
 	/*
@@ -1349,14 +1607,18 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
 	req->ring_queue = queue;
 	ent = list_first_entry_or_null(&queue->ent_avail_queue,
 				       struct fuse_ring_ent, list);
-	if (ent)
-		fuse_uring_add_req_to_ring_ent(ent, req);
-	else
-		list_add_tail(&req->list, &queue->fuse_req_queue);
-	spin_unlock(&queue->lock);
+	if (ent) {
+		err = fuse_uring_prep_buffer(ent, req);
+		if (!err) {
+			fuse_uring_add_req_to_ring_ent(ent, req);
+			spin_unlock(&queue->lock);
+			fuse_uring_dispatch_ent(ent);
+			return;
+		}
+	}
 
-	if (ent)
-		fuse_uring_dispatch_ent(ent);
+	list_add_tail(&req->list, &queue->fuse_req_queue);
+	spin_unlock(&queue->lock);
 
 	return;
 
@@ -1406,14 +1668,17 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req)
 	req = list_first_entry_or_null(&queue->fuse_req_queue, struct fuse_req,
 				       list);
 	if (ent && req) {
-		fuse_uring_add_req_to_ring_ent(ent, req);
-		spin_unlock(&queue->lock);
+		int err = fuse_uring_prep_buffer(ent, req);
 
-		fuse_uring_dispatch_ent(ent);
-	} else {
-		spin_unlock(&queue->lock);
+		if (!err) {
+			fuse_uring_add_req_to_ring_ent(ent, req);
+			spin_unlock(&queue->lock);
+			fuse_uring_dispatch_ent(ent);
+			return true;
+		}
 	}
 
+	spin_unlock(&queue->lock);
 	return true;
 }
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 349418db3374..66d5d5f8dc3f 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -36,11 +36,47 @@ enum fuse_ring_req_state {
 	FRRS_RELEASED,
 };
 
+struct fuse_bufring_buf {
+	uintptr_t addr;
+	unsigned int len;
+	unsigned int id;
+};
+
+struct fuse_bufring {
+	/* pointer to the headers buffer */
+	void __user *headers;
+
+	unsigned int queue_depth;
+
+	/* metadata tracking state of the bufring */
+	unsigned int nbufs;
+	unsigned int head;
+	unsigned int tail;
+
+	/* the buffers backing the ring */
+	__DECLARE_FLEX_ARRAY(struct fuse_bufring_buf, bufs);
+};
+
 /** A fuse ring entry, part of the ring queue */
 struct fuse_ring_ent {
-	/* userspace buffer */
-	struct fuse_uring_req_header __user *headers;
-	void __user *payload;
+	union {
+		/* if bufrings are not used */
+		struct {
+			/* userspace buffers */
+			struct fuse_uring_req_header __user *headers;
+			void __user *payload;
+		};
+		/* if bufrings are used */
+		struct {
+			/*
+			 * unique fixed id for the ent. used by kernel/server
+			 * to locate where in the headers buffer the data for
+			 * this ent resides
+			 */
+			unsigned int id;
+			struct fuse_bufring_buf payload_buf;
+		};
+	};
 
 	/* the ring queue that owns the request */
 	struct fuse_ring_queue *queue;
@@ -99,6 +135,9 @@ struct fuse_ring_queue {
 	unsigned int active_background;
 
 	bool stopped;
+
+	/* only allocated if the server uses bufrings */
+	struct fuse_bufring *bufring;
 };
 
 /**
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index c13e1f9a2f12..8753de7eb189 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -240,6 +240,10 @@
  *  - add FUSE_COPY_FILE_RANGE_64
  *  - add struct fuse_copy_file_range_out
  *  - add FUSE_NOTIFY_PRUNE
+ *
+ *  7.46
+ *  - add FUSE_URING_BUFRING flag
+ *  - add fuse_uring_cmd_req init struct
  */
 
 #ifndef _LINUX_FUSE_H
@@ -1263,7 +1267,13 @@ struct fuse_uring_ent_in_out {
 
 	/* size of user payload buffer */
 	uint32_t payload_sz;
-	uint32_t padding;
+
+	/*
+	 * if using bufrings, this is the id of the selected buffer.
+	 * the selected buffer holds the request payload
+	 */
+	uint16_t buf_id;
+	uint16_t padding;
 
 	uint64_t reserved;
 };
@@ -1294,6 +1304,9 @@ enum fuse_uring_cmd {
 	FUSE_IO_URING_CMD_COMMIT_AND_FETCH = 2,
 };
 
+/* fuse_uring_cmd_req flags */
+#define FUSE_URING_BUFRING (1 << 0)
+
 /**
  * In the 80B command area of the SQE.
  */
@@ -1305,7 +1318,17 @@ struct fuse_uring_cmd_req {
 
 	/* queue the command is for (queue index) */
 	uint16_t qid;
-	uint8_t padding[6];
+	uint16_t padding;
+
+	union {
+		struct {
+			/* size of the bufring's backing buffers */
+			uint32_t buf_size;
+			/* number of entries in the queue */
+			uint16_t queue_depth;
+			uint16_t padding;
+		} init;
+	};
 };
 
 #endif /* _LINUX_FUSE_H */
-- 
2.52.0