linux-fsdevel.vger.kernel.org archive mirror
From: Josef Bacik <josef@toxicpanda.com>
To: Bernd Schubert <bernd.schubert@fastmail.fm>
Cc: Bernd Schubert <bschubert@ddn.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	Amir Goldstein <amir73il@gmail.com>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH RFC v2 04/19] fuse: Add fuse-io-uring design documentation
Date: Thu, 30 May 2024 10:59:53 -0400
Message-ID: <20240530145953.GB2205585@perftesting>
In-Reply-To: <8e756ed6-3b12-4afa-ad6a-94e9a56fd4be@fastmail.fm>

On Thu, May 30, 2024 at 02:50:30PM +0200, Bernd Schubert wrote:
> 
> 
> On 5/29/24 23:17, Josef Bacik wrote:
> > On Wed, May 29, 2024 at 08:00:39PM +0200, Bernd Schubert wrote:
> >> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> >> ---
> >>  Documentation/filesystems/fuse-io-uring.rst | 167 ++++++++++++++++++++++++++++
> >>  1 file changed, 167 insertions(+)
> >>
> >> diff --git a/Documentation/filesystems/fuse-io-uring.rst b/Documentation/filesystems/fuse-io-uring.rst
> >> new file mode 100644
> >> index 000000000000..4aa168e3b229
> >> --- /dev/null
> >> +++ b/Documentation/filesystems/fuse-io-uring.rst
> >> @@ -0,0 +1,167 @@
> >> +.. SPDX-License-Identifier: GPL-2.0
> >> +
> >> +===============================
> >> +FUSE Uring design documentation
> >> +==============================
> >> +
> >> +This documentation covers basic details how the fuse
> >> +kernel/userspace communication through uring is configured
> >> +and works. For generic details about FUSE see fuse.rst.
> >> +
> >> +This document also covers the current interface, which is
> >> +still in development and might change.
> >> +
> >> +Limitations
> >> +===========
> >> +As of now not all requests types are supported through uring, userspace
> > 
> > s/userspace side/userspace/
> > 
> >> +side is required to also handle requests through /dev/fuse after
> >> +uring setup is complete. These are especially notifications (initiated
> > 
> > > "especially" is an awkward word choice here; I'm not quite sure what you're
> > > trying to say. Perhaps
> > 
> > "Specifically notifications (initiated from the daemon side), interrupts and
> > forgets"
> 
> Yep, thanks a lot! I removed "forgets"; these should be working over the ring
> in the meantime.
> 
> > 
> > ?
> > 
> >> +from daemon side), interrupts and forgets.
> >> +Interrupts are probably not working at all when uring is used. At least
> >> +current state of libfuse will not be able to handle those for requests
> >> +on ring queues.
> >> +All these limitation will be addressed later.
> >> +
> >> +Fuse uring configuration
> >> +========================
> >> +
> >> +Fuse kernel requests are queued through the classical /dev/fuse
> >> +read/write interface - until uring setup is complete.
> >> +
> >> +In order to set up fuse-over-io-uring userspace has to send ioctls,
> >> +mmap requests in the right order
> >> +
> >> +1) FUSE_DEV_IOC_URING ioctl with FUSE_URING_IOCTL_CMD_RING_CFG
> >> +
> >> +First the basic kernel data structure has to be set up, using
> >> +FUSE_DEV_IOC_URING with subcommand FUSE_URING_IOCTL_CMD_RING_CFG.
> >> +
> >> +Example (from libfuse)
> >> +
> >> +static int fuse_uring_setup_kernel_ring(int session_fd,
> >> +					int nr_queues, int sync_qdepth,
> >> +					int async_qdepth, int req_arg_len,
> >> +					int req_alloc_sz)
> >> +{
> >> +	int rc;
> >> +
> >> +	struct fuse_ring_config rconf = {
> >> +		.nr_queues		    = nr_queues,
> >> +		.sync_queue_depth	= sync_qdepth,
> >> +		.async_queue_depth	= async_qdepth,
> >> +		.req_arg_len		= req_arg_len,
> >> +		.user_req_buf_sz	= req_alloc_sz,
> >> +		.numa_aware		    = nr_queues > 1,
> >> +	};
> >> +
> >> +	struct fuse_uring_cfg ioc_cfg = {
> >> +		.flags = 0,
> >> +		.cmd = FUSE_URING_IOCTL_CMD_RING_CFG,
> >> +		.rconf = rconf,
> >> +	};
> >> +
> >> +	rc = ioctl(session_fd, FUSE_DEV_IOC_URING, &ioc_cfg);
> >> +	if (rc)
> >> +		rc = -errno;
> >> +
> >> +	return rc;
> >> +}
> >> +
> >> +2) MMAP
> >> +
> >> +For shared memory communication between kernel and userspace
> >> +each queue has to allocate and map memory buffer.
> >> +For numa awares kernel side verifies if the allocating thread
> > 
> > > This bit is awkwardly worded and there are some spelling mistakes.  Perhaps
> > something like this?
> > 
> > "For numa aware kernels, the kernel verifies that the allocating thread is bound
> > to a single core, as the kernel has the expectation that only a single thread
> > accesses a queue, and for numa aware memory allocation the core of the thread
> > sending the mmap request is used to identify the numa node"
> 
> Thank you, updated. I am actually considering reducing this to a warning (will
> try to add an async FUSE_WARN request type for this and others). The issue is
> that systems cannot set up fuse-uring when a core is disabled.
> 
> > 
> >> +is bound to a single core - in general kernel side has expectations
> >> +that only a single thread accesses a queue and for numa aware
> >> +memory alloation the core of the thread sending the mmap request
> >> +is used to identify the numa node.
> >> +
> >> +The offsset parameter has to be FUSE_URING_MMAP_OFF to identify
> >        ^^^^ "offset"
> 
> 
> Fixed.
> 
> > 
> >> +it is a request concerning fuse-over-io-uring.
> >> +
> >> +3) FUSE_DEV_IOC_URING ioctl with FUSE_URING_IOCTL_CMD_QUEUE_CFG
> >> +
> >> +This ioctl has to be send for every queue and takes the queue-id (qid)
> >                         ^^^^ "sent"
> > 
> >> +and memory address obtained by mmap to set up queue data structures.
> >> +
> >> +Kernel - userspace interface using uring
> >> +========================================
> >> +
> >> +After queue ioctl setup and memory mapping userspace submits
> > 
> > This needs a comma, so
> > 
> > "After queue ioctl setup and memory mapping, userspace submites"
> > 
> >> +SQEs (opcode = IORING_OP_URING_CMD) in order to fetch
> >> +fuse requests. Initial submit is with the sub command
> >> +FUSE_URING_REQ_FETCH, which will just register entries
> >> +to be available on the kernel side - it sets the according
> > 
> > s/according/associated/ maybe?
> > 
> >> +entry state and marks the entry as available in the queue bitmap.
> 
> Or maybe like this?
> 
> Initial submit is with the sub command FUSE_URING_REQ_FETCH, which 
> will just register entries to be available in the kernel.
> 
> 
> >> +
> >> +Once all entries for all queues are submitted kernel side starts
> >> +to enqueue to ring queue(s). The request is copied into the shared
> >> +memory queue entry buffer and submitted as CQE to the userspace
> >> +side.
> >> +Userspace side handles the CQE and submits the result as subcommand
> >> +FUSE_URING_REQ_COMMIT_AND_FETCH - kernel side does completes the requests
> > 
> > "the kernel completes the request"
> 
> Yeah, now I see the bad grammar myself. Updated to
> 
> 
> Once all entries for all queues are submitted, kernel starts
> to enqueue to ring queues. The request is copied into the shared
> memory buffer and submitted as CQE to the daemon.
> Userspace handles the CQE/fuse-request and submits the result as
> subcommand FUSE_URING_REQ_COMMIT_AND_FETCH - the kernel completes
> the request and also marks the entry available again. If there are
> pending requests waiting, the next request is immediately submitted
> to the daemon.
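
The entry lifecycle described in the paragraph above can be modelled in a few
lines of plain C. This is only a minimal userspace sketch of the state
transitions, not the kernel implementation: every identifier below
(entry_state, ring_queue, queue_fetch, queue_send_request,
queue_commit_and_fetch, QUEUE_DEPTH) is hypothetical, and a plain array of
states stands in for whatever bookkeeping the kernel actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names here are hypothetical; this models the described protocol only. */
enum entry_state {
	ENTRY_UNREGISTERED,	/* no FETCH submitted for this entry yet */
	ENTRY_AVAILABLE,	/* registered, waiting for a fuse request */
	ENTRY_IN_USERSPACE,	/* request handed to the daemon as a CQE */
};

#define QUEUE_DEPTH 4

struct ring_queue {
	enum entry_state state[QUEUE_DEPTH];
	unsigned int pending;	/* requests waiting for a free entry */
};

/* Models FUSE_URING_REQ_FETCH: register an entry as available. */
bool queue_fetch(struct ring_queue *q, size_t idx)
{
	assert(q != NULL);
	if (idx >= QUEUE_DEPTH || q->state[idx] != ENTRY_UNREGISTERED)
		return false;
	q->state[idx] = ENTRY_AVAILABLE;
	return true;
}

/*
 * Models the kernel enqueueing a request: use the first available entry,
 * otherwise count the request as pending. Returns the entry index or -1.
 */
int queue_send_request(struct ring_queue *q)
{
	assert(q != NULL);
	for (size_t i = 0; i < QUEUE_DEPTH; i++) {
		if (q->state[i] == ENTRY_AVAILABLE) {
			q->state[i] = ENTRY_IN_USERSPACE;
			return (int)i;
		}
	}
	q->pending++;
	return -1;
}

/*
 * Models FUSE_URING_REQ_COMMIT_AND_FETCH: complete the request and mark the
 * entry available again; if a request is pending, hand it out immediately.
 * Returns 1 if the entry was reused at once, 0 if it is now idle, and -1 if
 * the entry was not in use by the daemon.
 */
int queue_commit_and_fetch(struct ring_queue *q, size_t idx)
{
	assert(q != NULL);
	if (idx >= QUEUE_DEPTH || q->state[idx] != ENTRY_IN_USERSPACE)
		return -1;
	q->state[idx] = ENTRY_AVAILABLE;
	if (q->pending > 0) {
		q->pending--;
		q->state[idx] = ENTRY_IN_USERSPACE;
		return 1;
	}
	return 0;
}
```

In this model a zero-initialised struct ring_queue starts with every entry
unregistered, so a caller would invoke queue_fetch() once per entry before the
first queue_send_request(), mirroring the "once all entries for all queues are
submitted" condition in the text.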
> 
> 
> 
> Thank you very much for your help to phrase this better!
> 

This all looks great, thanks!

Josef

