public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Bernd Schubert <bernd@bsbernd.com>, \@magnolia.djwong.org
Cc: linux-fsdevel@vger.kernel.org, Miklos Szeredi <miklos@szeredi.hu>,
	Joanne Koong <joannelkoong@gmail.com>, Kevin Chen <kchen@ddn.com>
Subject: Re: [PATCH v2 04/25] Add a new daemonize API
Date: Mon, 30 Mar 2026 14:25:35 -0700	[thread overview]
Message-ID: <20260330212535.GF6202@frogsfrogsfrogs> (raw)
In-Reply-To: <27b4ec23-ea81-4945-a25d-b2a73bf51074@bsbernd.com>

On Mon, Mar 30, 2026 at 08:26:58PM +0200, Bernd Schubert wrote:
> 
> 
> On 3/30/26 19:45, Darrick J. Wong wrote:
> > On Sat, Mar 28, 2026 at 12:07:35AM +0100, Bernd Schubert wrote:
> >> Hi Darrick,
> >>
> >> On 3/27/26 23:06, Darrick J. Wong wrote:
> >>> On Thu, Mar 26, 2026 at 10:34:37PM +0100, Bernd Schubert wrote:
> >>>> Existing example/ file systems do the fuse_daemonize() after
> >>>> fuse_session_mount() - i.e. after the mount point is already
> >>>> established. Though, these example/ daemons do not start
> >>>> extra threads and do not need network initialization either.
> >>>
> >>> AFAICT the situation here with the extra threads and async FUSE_INIT is:
> >>>
> >>> a) You can't start them until after fuse_daemonize() because that might
> >>>    fork the whole process
> >>
> >> The extra threads are not cloned - they stay attached to the parent.
> > 
> > <nod>  I guess I should have said that explicitly -- "...because that
> > might fork the whole process, and all threads remain attached to the
> > parent."
> > 
> >>>
> >>> b) You don't want to start them until you know that the mount()
> >>>    succeeds (maybe?)
> >>
> >> For me the other way around, I want to start them before the mount and
> >> let them do things like network connection/authorization. Not nice if
> >> the mount has already suceeded and then part of the initialization fails
> >> and a stale mount come up.
> > 
> > <nod> That makes sense, you want to acquire resources and start the
> > threads (or fail) before calling mount().
> > 
> >>> c) You need those threads to be active to start serving the fuse
> >>>    requests that come after FUSE_INIT
> >>>
> >>> d) libfuse apparently starts even more threads to wait on the iouring
> >>>    queues after the fuse server returns from FUSE_INIT.
> >>
> >> Actually it starts these threads from FUSE_INIT and before replying
> >> FUSE_INIT success. In order to tell the fuse-client/kernel if
> >> fuse-server wants io-uring.
> > 
> > Got it.  Now I see the "XXX: Add an option to make non-available
> > io-uring fatal" code 20 lines up.
> > 
> >>> e) fuse_loop_mt() starts up the request handler threads and waits for
> >>>    the session to exit and/or for mt_finish to be sem_post()ed.
> >>>
> >>> Does that sound right?
> >>
> >>
> >>>
> >>> Looking at fuse4fs, I realize that it need /not/ start its background
> >>> threads from the FUSE_INIT handler; all that should be done after
> >>> daemonize before calling fuse_session_loop_mt.  The only reason I wrote
> >>
> >> So after the mount? What is if starting the threads would fail for some
> >> reason?
> > 
> > Right now they're optional features (monitor memory PSI file and flush
> > cache) so it's no big deal if they don't initialize.  But that *does*
> > make the case for adding the parent/child pipelines so that you can
> > daemonize earlier and report failures to the parent process.
> > 
> > Hrm.  Thinking about this more, the new fuse_daemonize_XXX calls make it
> > possible for any fuse server to do pre-mount() initialization in the
> > child and report the outcome to the parent.  Even if that fuse server
> > doesn't know, care, or try to enable sync FUSE_INIT.
> > 
> > So this new API really is separate from FUSE_SYNC_INIT.  The latter
> > depends on the former, but the reverse is not true.
> 
> Absolutely, as I wrote somewhere, this is basically my 3rd implementation of the same thing.

<nod>  

> > 
> >>> it that way was blind patterning after fuse2fs, which doesn't call
> >>> daemonize() directly, so FUSE_INIT is the first time any fuse2fs code
> >>> gets called after daemonizing.
> >>>
> >>>> fuse_daemonize() also does not allow to return notification
> >>>> from the forked child to the parent.
> >>>>
> >>>> Complex fuse file system daemons often want the order of
> >>>> 1) fork - parent watches, child does the work
> >>>>
> >>>> Child:
> >>>>     2) start extra threads and system initialization (like network
> >>>>        connection and RDMA memory registration) from the fork child.
> >>>>     3) Start the fuse session after everything else succeeded
> >>>>
> >>>> Parent:
> >>>>     Report child initialization success or failure
> >>>
> >>> Under classic async FUSE_INIT, the sequence in most fuse servers is:
> >>>
> >>> 1) The parent opens /dev/fuse and mounts the fuse filesystem before even
> >>>    daemonizing
> >>>
> >>> 2) Mounting the fuse fs causes an async FUSE_INIT to be sent to the
> >>>    queues, which sits there because nobody's looking for event yet
> >>>
> >>> 3) The parent daemonize()s, and the child proceeds with setting signal
> >>>    handlers and starting up the fuse-request processing threads
> >>>
> >>> 4) The parent exits, the child continues on to set up the fuse worker
> >>>    threads
> >>>
> >>> 5) One of the request handler threads finally reads /dev/fuse to find
> >>>    the FUSE_INIT request and processes it
> >>>
> >>> 6) do_init (in the lowlevel fuse library) starts up the uring workers
> >>>    if the kernel acknowledges the uring feature.  The fuse server has
> >>>    no means to discover if the fusedev would permit uring before
> >>>    calling mount().
> >>
> >> Yeah, initially I wanted to handle that through an ioctl and independent
> >> of FUSE_INIT, Miklos had asked to change that.
> > 
> > <nod>
> > 
> >>> Does my understanding make sense?
> >>>
> >>>> A new API is introduced to overcome the limitations of
> >>>> fuse_daemonize()
> >>>>
> >>>> fuse_daemonize_start() - fork, but foreground process does not
> >>>> terminate yet and watches the background.
> >>>
> >>> But with sync FUSE_INIT this is not workable because the child has to
> >>> have done (4) before (1) can complete, or it has to set up a temporary
> >>> request handler thread to process the FUSE_INIT.  That's partly why
> >>> fuse_daemonize_start/success/fail() is created here, right?
> >>
> >> I have written two similar APIs for two different daemons at DDN (one in
> >> C and the other in C++), independent of sync FUSE_INIT. For us it is
> >> important to only create the mount point when the network connection
> >> works - that is done by threads not under control from libfuse. With
> >> network you want to see something like "authorization failure" or "host
> >> not reachable" in foreground.
> >>
> >> With sync FUSE_INIT the additional issue of
> >> FUSE_INIT-creates-the-io-uring ring thread came up - even the
> >> <libfuse>/example/* file systems all don't work, because now
> >> fuse_session_mount() creates the ring threads, then comes
> >> fuse_daemonize() - parent exits - ring threads gone.
> > 
> > <nod>
> > 
> >>> And the rest of the reason for the new functions is to enable
> >>> communication between the parent and child processes -- if one dies
> >>> the other can find out about it; and the child can tell the parent the
> >>> outcome of mount()ing the filesystem.
> >>>
> >>> I wonder -- if you know that the kernel supports synchronous FUSE_INIT,
> >>> can you start the main event handling threadpool (i.e. the one created
> >>> in fuse_loop_mt.c) after opening /dev/fuse (obviously) but before
> >>> calling mount()?  That would make a hard requirement of having at least
> >>> one event handling thread, but you wouldn't have to create this temp
> >>> thread just to handle the FUSE_INIT.
> >>
> >> That would work and I believe it would be feasible for libfuse examples,
> >> but what about all the other real file systems out there? They would
> >> need to be rewritten to get sync INIT? And how many people understand
> >> the difference between sync and async init? Is this temp thread causing
> >> issues?
> > 
> > Don't fuse server authors already need to adapt their codebases to the
> > new fuse_daemonize_* calls if either they want to start threads or want
> > sync INIT?
> 
> If run with io-uring yes, otherwise I guess not, the current condition is
> 
> 	if (!se->want_sync_init &&
> 		(se->uring.enable && !fuse_daemonize_is_used())) {
> 		if (se->debug)
> 			fuse_log(FUSE_LOG_DEBUG,
> 					"fuse: sync init not enabled\n");
> 		return 0;
> 	}
> 
> 
> want_sync_init needs to be set if you had your own daemonization process
> (as we have at DDN). 
> 
> > 
> > The problem I have with the temp thread is that it adds a fourth(?)
> > piece of code that handles fuse requests (the existing ones being
> > single-thread read, multi-thread read, iouring) and I think "err, more
> > code that everyone has to understand".
> 
> We can actually switch single threaded to fuse_loop_mt.c and remove 
> fuse_loop.c altogether. I.e. just run a single worker thread with 
> fuse_loop_mt.c

<nod> At this point fuse3.pc always links in -lpthread so I concede that
there aren't really any "single threaded" (aka pthreadless) fuse servers
anymore.

> > Granted, any server that wants sync FUSE_INIT will basically have to be
> > a multithreaded server.
> 
> Does it?

Well libfuse starts one background thread to handle the FUSE_INIT, which
(in my mind) means the fuse server must be careful about thread
synchronization.  But these days there aren't many complex storage
programs that don't pull in pthreads.

> > 
> > Shifting to fuse servers, adding a background thread does have some side
> > effects -- if you want to use pthread_[gs]etspecific from fuse operation
> > handlers, you have to know that FUSE_INIT will run in a different thread
> > from the ones set up by fuse_session_loop_mt.  If the fuse server needs
> > to preallocate the thread-specific variables, it has to know to add an
> > extra preallocation for the FUSE_INIT thread.
> > 
> > (Or you can't use them from FUSE_INIT)
> 
> Hmm, I was hoping these would libfuse internal threads and you never would
> need to care about such things.

Libraries love to leak things, I'm afraid. :(

Granted if you /did/ want to use pthread_setspecific you'd also have to
be aware of (or at least disable) libfuse's ability to scale the fuse
worker pool size up and down.

> >>> Even better the daemonize() changes reduce to just the pipe between
> >>> parent and child and watching either for a return value or the POLLERR
> >>> when either program fails unexpectedly.
> >>
> >> We also need to send a signal to the parent that child has success. How
> >> many pipes we use is an internal detail? If we find a better way later
> >> to reduce the number of pipes, even better.
> > 
> > <nod> Between the parent and child of a fork() operation that's fine, we
> > can update the interface at any time.
> > 
> >>>> fuse_daemonize_success() / fuse_daemonize_fail() - background
> >>>> daemon signals to the foreground process success or failure.
> >>>>
> >>>> fuse_daemonize_active() - helper function for the high level
> >>>> interface, which needs to handle both APIs. fuse_daemonize()
> >>>> is called within fuse_main(), which now needs to know if the caller
> >>>> actually already used the new API itself.
> >>>>
> >>>> The object 'struct fuse_daemonize *' is allocated dynamically
> >>>> and stored a global variable in fuse_daemonize.c, because
> >>>> - high level fuse_main_real_versioned() needs to know
> >>>> if already daemonized
> >>>> - high level daemons do not have access to struct fuse_session
> >>>> - FUSE_SYNC_INIT in later commits can only be done if the new
> >>>> API (or a file system internal) daemonization is used.
> >>>
> >>> I don't know exactly what's required to switch libfuse into uring mode.
> >>> It look as though you inject -oio_uring as a mount option, then libfuse
> >>> sets up the uring, starts some extra workers to handle the ring(?) and
> >>> puts them to sleep.  If the kernel says it supports uring in FUSE_INIT
> >>> then do_init wakes them up.  Each uring thread submits SQEs and waits
> >>> for fuse requests to appear as CQEs, right?
> >>
> >> Yeah, we can also make that later on a default, without the -oio_uring
> >> option (there is also env for ci test purposes to enforce io-uring
> >> mode). Right now it starts too many threads (got distracted today and
> >> didn't send the new version of ring reduction patches - will try
> >> tomorrow). And I also expected bugs in all that new code, so I didn't
> >> want to make it a default.
> > 
> > <nod>
> > 
> >>>
> >>> So after a fuse server negotiates with the kernel about iouring, the
> >>> background threads started by fuse_loop_mt just sit there in read()
> >>> doing nothing else, while new fuse requests get sent to userspace as a
> >>> CQE, right?
> >>
> >> Exactly.
> > 
> > Any reason not to cancel those threads?
> 
> When and why do you want to cancel them? After at fork time and recreate
> them in the fork child? I'm not sure how much liburing likes that. At
> lease with IORING_SETUP_SINGLE_ISSUER and/or IORING_SETUP_COOP_TASKRUN.

Eh, let's leave it alone then :)

> > 
> >> My question is now, are you ok with the new fuse_daemonize API? It
> >> sounds a bit like you don't like it too much.
> > 
> > Now that I've come back with a fresher mind, I think I'm ok with the
> > daemonize API itself.  I've a few minor nits to pick, so I'll reply to
> > the patch itself.
> 
> 
> So now it is about the temporary thread ;)

Yep.  FWIW even now I don't hate it so much as to nak it :P

> Thanks for all your reviews!

Likewise!

--D

> 
> Bernd
> 

  reply	other threads:[~2026-03-30 21:25 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-26 21:34 [PATCH v2 00/25] libfuse: Add support for synchronous init Bernd Schubert
2026-03-26 21:34 ` [PATCH v2 01/25] ci-build: Add environment logging Bernd Schubert
2026-03-27  3:20   ` Darrick J. Wong
2026-03-26 21:34 ` [PATCH v2 02/25] Add 'STRCPY' to the checkpatch ignore option Bernd Schubert
2026-03-26 21:34 ` [PATCH v2 03/25] checkpatch.pl: Add _Atomic to $Attribute patttern Bernd Schubert
2026-03-26 21:34 ` [PATCH v2 04/25] Add a new daemonize API Bernd Schubert
2026-03-27 22:06   ` Darrick J. Wong
2026-03-27 23:07     ` Bernd Schubert
2026-03-28  4:01       ` Darrick J. Wong
2026-03-30 17:45       ` Darrick J. Wong
2026-03-30 18:26         ` Bernd Schubert
2026-03-30 21:25           ` Darrick J. Wong [this message]
2026-03-30 17:55   ` Darrick J. Wong
2026-03-26 21:34 ` [PATCH v2 05/25] Sync fuse_kernel.h with linux-6.18 Bernd Schubert
2026-03-26 21:34 ` [PATCH v2 06/25] mount.c: Split fuse_mount_sys to prepare privileged sync FUSE_INIT Bernd Schubert
2026-03-26 21:34 ` [PATCH v2 07/25] Add FUSE_MOUNT_FALLBACK_NEEDED define for -2 mount errors Bernd Schubert
2026-03-27  3:20   ` Darrick J. Wong
2026-03-26 21:34 ` [PATCH v2 08/25] Refactor mount code / move common functions to mount_util.c Bernd Schubert
2026-03-26 21:34 ` [PATCH v2 09/25] Use asprintf() for fuse_mnt_build_{source,type} Bernd Schubert
2026-03-27  3:24   ` Darrick J. Wong
2026-03-30 15:34     ` Bernd Schubert
2026-03-26 21:34 ` [PATCH v2 10/25] lib/mount.c: Remove some BSD ifdefs Bernd Schubert
2026-03-27  3:28   ` Darrick J. Wong
2026-03-26 21:34 ` [PATCH v2 11/25] Move 'struct mount_flags' to util.h Bernd Schubert
2026-03-30 18:11   ` Darrick J. Wong
2026-03-26 21:34 ` [PATCH v2 12/25] conftest.py: Add more valgrind filter patterns Bernd Schubert
2026-03-30 18:16   ` Darrick J. Wong
2026-03-26 21:34 ` [PATCH v2 13/25] Add support for the new linux mount API Bernd Schubert
2026-03-30 18:27   ` Darrick J. Wong
2026-03-26 21:34 ` [PATCH v2 14/25] fuse mount: Support synchronous FUSE_INIT (privileged daemon) Bernd Schubert
2026-03-30 18:44   ` Darrick J. Wong
2026-03-26 21:34 ` [PATCH v2 15/25] Add fuse_session_set_debug() to enable debug output without foreground Bernd Schubert
2026-03-26 21:34 ` [PATCH v2 16/25] Move more generic mount code to mount_util.{c,h} Bernd Schubert
2026-03-30 18:47   ` Darrick J. Wong
2026-03-26 21:34 ` [PATCH v2 17/25] Split the fusermount do_mount function Bernd Schubert
2026-03-30 18:48   ` Darrick J. Wong
2026-03-26 21:34 ` [PATCH v2 18/25] fusermout: Remove the large read check Bernd Schubert
2026-03-27  3:32   ` Darrick J. Wong
2026-03-30 15:26     ` Bernd Schubert
2026-03-30 17:57       ` Darrick J. Wong
2026-03-26 21:34 ` [PATCH v2 19/25] fusermount: Refactor extract_x_options Bernd Schubert
2026-03-26 21:34 ` [PATCH v2 20/25] Make fusermount work bidirectional for sync init Bernd Schubert
2026-03-30 19:03   ` Darrick J. Wong
2026-03-26 21:34 ` [PATCH v2 21/25] New mount API: Filter out "user=" Bernd Schubert
2026-03-27  3:32   ` Darrick J. Wong
2026-03-26 21:34 ` [PATCH v2 22/25] Add support for sync-init of unprivileged daemons Bernd Schubert
2026-03-31  0:54   ` Darrick J. Wong
2026-03-26 21:34 ` [PATCH v2 23/25] Move fuse_mnt_build_{source,type} to mount_util.c Bernd Schubert
2026-03-30 19:04   ` Darrick J. Wong
2026-03-26 21:34 ` [PATCH v2 24/25] Add mount and daemonization README documents Bernd Schubert
2026-03-31  1:17   ` Darrick J. Wong
2026-03-26 21:34 ` [PATCH v2 25/25] Add a background debug option to passthrough hp Bernd Schubert
2026-03-30 19:04   ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260330212535.GF6202@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=bernd@bsbernd.com \
    --cc=joannelkoong@gmail.com \
    --cc=kchen@ddn.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox