linux-fsdevel.vger.kernel.org archive mirror
* [RFC] Another take at restarting FUSE servers
@ 2025-07-29 13:56 Luis Henriques
  2025-07-29 23:38 ` Darrick J. Wong
  0 siblings, 1 reply; 13+ messages in thread
From: Luis Henriques @ 2025-07-29 13:56 UTC (permalink / raw)
  To: Miklos Szeredi, Bernd Schubert; +Cc: linux-fsdevel, linux-kernel

Hi!

I know this has been discussed several times in several places, and the
recent(ish) addition of NOTIFY_RESEND is an important step towards being
able to restart a user-space FUSE server.

While looking at how to restart a server that uses the libfuse lowlevel
API, I've created an RFC pull request [1] to understand whether adding
support for this operation would be something acceptable in the project.
The PR doesn't do anything sophisticated, it simply hacks into the opaque
libfuse data structures so that a server could set some of the sessions'
fields.

So, a FUSE server simply has to save the /dev/fuse file descriptor and
pass it to libfuse while recovering, after a restart or a crash.  The
mentioned NOTIFY_RESEND should be used so that no requests are lost, of
course.  And there are probably other data structures that user-space file
systems will have to keep track of as well, so that everything can be
restored.  (The parameters set in the INIT phase, for example.)

But, from the discussion with Bernd in the PR, one of the things that
would be good to have is for the kernel to send back to user-space the
information about the inodes it already knows about.

I have been playing with this idea with a patch that simply sends out
LOOKUPs for each of these inodes.  This could be done through a new
NOTIFY_RESEND_INODES, or maybe it could be an extra operation added to the
already existing NOTIFY_RESEND.

Anyway, before spending any more time with this, I wanted to ask whether
this is something that could be acceptable in the kernel, if people think
a different approach should be followed, or if I'm simply trying to solve
the wrong problem.

Thanks in advance for any feedback on this.

[1] https://github.com/libfuse/libfuse/pull/1219

Cheers,
-- 
Luís


* Re: [RFC] Another take at restarting FUSE servers
  2025-07-29 13:56 [RFC] Another take at restarting FUSE servers Luis Henriques
@ 2025-07-29 23:38 ` Darrick J. Wong
  2025-07-30 14:04   ` Luis Henriques
  2025-07-31 13:04   ` Theodore Ts'o
  0 siblings, 2 replies; 13+ messages in thread
From: Darrick J. Wong @ 2025-07-29 23:38 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Miklos Szeredi, Bernd Schubert, linux-fsdevel, linux-kernel

On Tue, Jul 29, 2025 at 02:56:02PM +0100, Luis Henriques wrote:
> Hi!
> 
> I know this has been discussed several times in several places, and the
> recent(ish) addition of NOTIFY_RESEND is an important step towards being
> able to restart a user-space FUSE server.
> 
> While looking at how to restart a server that uses the libfuse lowlevel
> API, I've created an RFC pull request [1] to understand whether adding
> support for this operation would be something acceptable in the project.

Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
could restart itself.  It's unclear if doing so will actually enable us
to clear the condition that caused the failure in the first place, but I
suppose fuse2fs /does/ have e2fsck -fy at hand.  So maybe restarts
aren't totally crazy.

> The PR doesn't do anything sophisticated, it simply hacks into the opaque
> libfuse data structures so that a server could set some of the sessions'
> fields.
> 
> So, a FUSE server simply has to save the /dev/fuse file descriptor and
> pass it to libfuse while recovering, after a restart or a crash.  The
> mentioned NOTIFY_RESEND should be used so that no requests are lost, of
> course.  And there are probably other data structures that user-space file
> systems will have to keep track of as well, so that everything can be
> restored.  (The parameters set in the INIT phase, for example.)

Yeah, I don't know how that would work in practice.  Would the kernel
send back the old connection flags and whatnot via some sort of
FUSE_REINIT request, and the fuse server can either decide that it will
try to recover, or just bail out?

> But, from the discussion with Bernd in the PR, one of the things that
> would be good to have is for the kernel to send back to user-space the
> information about the inodes it already knows about.
> 
> I have been playing with this idea with a patch that simply sends out
> LOOKUPs for each of these inodes.  This could be done through a new
> NOTIFY_RESEND_INODES, or maybe it could be an extra operation added to the
> already existing NOTIFY_RESEND.

I have no idea if NOTIFY_RESEND already does this, but you'd probably
want to purge all the unreferenced dentries/inodes to reduce the amount
of re-querying.

I gather that any fuse server that wants to reboot itself would either
have to persist what the nodeids map to, or otherwise stabilize them?
For example, fuse2fs could set the nodeid to match the ext2 inode
numbers.  Then reconnecting them wouldn't be too hard.

> Anyway, before spending any more time with this, I wanted to ask whether
> this is something that could be acceptable in the kernel, if people think
> a different approach should be followed, or if I'm simply trying to solve
> the wrong problem.
> 
> Thanks in advance for any feedback on this.
> 
> [1] https://github.com/libfuse/libfuse/pull/1219

Who calls fuse_session_reinitialize() ?

--D

> Cheers,
> -- 
> Luís
> 


* Re: [RFC] Another take at restarting FUSE servers
  2025-07-29 23:38 ` Darrick J. Wong
@ 2025-07-30 14:04   ` Luis Henriques
  2025-07-31 11:33     ` Christian Brauner
  2025-07-31 13:04   ` Theodore Ts'o
  1 sibling, 1 reply; 13+ messages in thread
From: Luis Henriques @ 2025-07-30 14:04 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Miklos Szeredi, Bernd Schubert, linux-fsdevel, linux-kernel

Hi Darrick,

On Tue, Jul 29 2025, Darrick J. Wong wrote:

> On Tue, Jul 29, 2025 at 02:56:02PM +0100, Luis Henriques wrote:
>> Hi!
>> 
>> I know this has been discussed several times in several places, and the
>> recent(ish) addition of NOTIFY_RESEND is an important step towards being
>> able to restart a user-space FUSE server.
>> 
>> While looking at how to restart a server that uses the libfuse lowlevel
>> API, I've created an RFC pull request [1] to understand whether adding
>> support for this operation would be something acceptable in the project.
>
> Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
> could restart itself.  It's unclear if doing so will actually enable us
> to clear the condition that caused the failure in the first place, but I
> suppose fuse2fs /does/ have e2fsck -fy at hand.  So maybe restarts
> aren't totally crazy.

Maybe my PR lacks a bit of ambition -- its goal wasn't to have libfuse do
the restart itself.  Instead, it simply adds some visibility into the
opaque data structures so that a FUSE server could re-initialise a session
without having to go through a full remount.

But sure, there are other things that could be added to the library as
well.  For example, in my current experiments, the FUSE server needs to start
some sort of "file descriptor server" to keep the fd alive for the
restart.  This daemon could be optionally provided in libfuse itself,
which could also be used to store all sorts of blobs needed by the file
system after recovery is done.
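
As a rough illustration of the "file descriptor server" idea, the sketch
below hands an open descriptor from one process to another over an AF_UNIX
socket using SCM_RIGHTS.  The names send_fd() and recv_fd() are made up for
this example and are not part of libfuse; a real helper daemon would also
need authentication and proper error reporting.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one fd over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
	char dummy = 'F';	/* at least one byte of ordinary data must be sent */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment of buf */
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	assert(fd >= 0);
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	/* The fd travels as ancillary data of type SCM_RIGHTS. */
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd.  Returns the new fd, or -1. */
int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	/* The kernel installed a duplicate of the sender's fd in this process. */
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The received descriptor refers to the same open file as the sender's, so
the /dev/fuse connection would stay open for as long as the helper holds
its copy, even if the original server dies.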

>> The PR doesn't do anything sophisticated, it simply hacks into the opaque
>> libfuse data structures so that a server could set some of the sessions'
>> fields.
>> 
>> So, a FUSE server simply has to save the /dev/fuse file descriptor and
>> pass it to libfuse while recovering, after a restart or a crash.  The
>> mentioned NOTIFY_RESEND should be used so that no requests are lost, of
>> course.  And there are probably other data structures that user-space file
>> systems will have to keep track of as well, so that everything can be
>> restored.  (The parameters set in the INIT phase, for example.)
>
> Yeah, I don't know how that would work in practice.  Would the kernel
> send back the old connection flags and whatnot via some sort of
> FUSE_REINIT request, and the fuse server can either decide that it will
> try to recover, or just bail out?

That would be an option.  But my current idea would be that the server
would need to store those somewhere and simply assume they are still OK
after reconnecting.  The kernel wouldn't need to know the user-space was
replaced by another server, potentially different, after an upgrade for
example.

Right now, AFAIU, restarting a FUSE server *can* be done without any help
from the kernel side, as long as the fd is kept alive.  NOTIFY_RESEND
is used only for resending the FUSE requests for which the kernel is
still awaiting replies.  So, for example if the kernel sends a FUSE_READ to
user-space and the server crashes while trying to serve it, the kernel
will still be waiting for that reply.  However, a new server trying to
recover from the crash will have no way to know that.  And this is where
the NOTIFY_RESEND is useful.
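
For reference, here is a sketch of what I understand ends up on the wire
for this notification: a bare reply header with unique set to 0 and the
notification code in the error field.  The struct is a local mirror of
struct fuse_out_header, and the value 7 for FUSE_NOTIFY_RESEND is copied by
hand so that the example builds with older headers; both should be checked
against <linux/fuse.h>.  The name send_notify_resend() is made up; a
libfuse server would call fuse_lowlevel_notify_resend() instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Local mirror of struct fuse_out_header (assumption: matches the uapi). */
struct out_header {
	uint32_t len;
	int32_t  error;
	uint64_t unique;
};

/* Assumed value of FUSE_NOTIFY_RESEND in enum fuse_notify_code. */
#define NOTIFY_RESEND_CODE 7

/* Hypothetical helper: ask the kernel to resend all pending requests.
 * Returns 0 on success or a negative errno value. */
int send_notify_resend(int fuse_fd)
{
	struct out_header out;
	ssize_t n;

	assert(sizeof(out) == 16);	/* no padding expected in the header */

	memset(&out, 0, sizeof(out));
	out.len = sizeof(out);		/* header only, no payload */
	out.error = NOTIFY_RESEND_CODE;	/* notifications carry the code here */
	out.unique = 0;			/* unique == 0 marks a notification */

	n = write(fuse_fd, &out, sizeof(out));
	if (n < 0)
		return -errno;
	return n == (ssize_t)sizeof(out) ? 0 : -EIO;
}
```

A recovering server would issue this once, after it is ready to process
requests again, and then enter its normal request loop.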

>> But, from the discussion with Bernd in the PR, one of the things that
>> would be good to have is for the kernel to send back to user-space the
>> information about the inodes it already knows about.
>> 
>> I have been playing with this idea with a patch that simply sends out
>> LOOKUPs for each of these inodes.  This could be done through a new
>> NOTIFY_RESEND_INODES, or maybe it could be an extra operation added to the
>> already existing NOTIFY_RESEND.
>
> I have no idea if NOTIFY_RESEND already does this, but you'd probably
> want to purge all the unreferenced dentries/inodes to reduce the amount
> of re-querying.

No, NOTIFY_RESEND doesn't purge any of those; currently it simply resends
all the requests.

> I gather that any fuse server that wants to reboot itself would either
> have to persist what the nodeids map to, or otherwise stabilize them?
> For example, fuse2fs could set the nodeid to match the ext2 inode
> numbers.  Then reconnecting them wouldn't be too hard.

Right, that's my understanding as well -- restarting a server requires
stable nodeids.  IIRC most (all?) examples shipped with libfuse can't be
restarted because they cast a pointer (the memory address of some sort of
inode data struct) and use that as the nodeid.
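
To make the difference concrete, here is a minimal sketch of a nodeid that
is derived from on-disk data instead of a heap address, along the lines of
what Darrick suggests for fuse2fs.  It assumes the FUSE root nodeid is 1
and the ext2 root inode is 2, so only the root needs translating; the
constants are redefined locally and the function names are made up for the
example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FUSE_ROOT_ID	1u	/* assumed: FUSE_ROOT_ID */
#define SKETCH_EXT2_ROOT_INO	2u	/* assumed: EXT2_ROOT_INO */

/* Map an on-disk inode number to a nodeid.  The result depends only on
 * the filesystem contents, so it survives a server restart. */
uint64_t ino_to_nodeid(uint32_t ino)
{
	/* inode 0 is invalid and inode 1 is reserved, never exposed */
	assert(ino >= SKETCH_EXT2_ROOT_INO);
	return ino == SKETCH_EXT2_ROOT_INO ? SKETCH_FUSE_ROOT_ID : ino;
}

/* Inverse mapping, used when the kernel hands a nodeid back to us. */
uint32_t nodeid_to_ino(uint64_t nodeid)
{
	/* nodeid 2 is never handed out by ino_to_nodeid() */
	assert(nodeid != 0 && nodeid != 2 && nodeid <= UINT32_MAX);
	return nodeid == SKETCH_FUSE_ROOT_ID ?
		SKETCH_EXT2_ROOT_INO : (uint32_t)nodeid;
}
```

A pointer-based nodeid becomes meaningless as soon as the process that
allocated the memory is gone, whereas this mapping can be recomputed by
any new server instance that opens the same filesystem.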

>> Anyway, before spending any more time with this, I wanted to ask whether
>> this is something that could be acceptable in the kernel, if people think
>> a different approach should be followed, or if I'm simply trying to solve
>> the wrong problem.
>> 
>> Thanks in advance for any feedback on this.
>> 
>> [1] https://github.com/libfuse/libfuse/pull/1219
>
> Who calls fuse_session_reinitialize() ?

Ah! Good question!  So, my idea was that a FUSE server would do something
like this:

	fuse_session_new()

	if (do_recovery) {
		get_old_fd()
		fuse_session_reinitialize()
		fuse_lowlevel_notify_resend()
	} else
		fuse_session_mount()

	fuse_daemonize()
	fuse_session_loop_mt()

Anyway, my initial concerns with restartability started because it is
currently not possible to restart a server that uses libfuse without
hacking into its internal data structures.  The idea of resending all
LOOKUPs just came from the discussion in the PR.

Cheers,
-- 
Luís


* Re: [RFC] Another take at restarting FUSE servers
  2025-07-30 14:04   ` Luis Henriques
@ 2025-07-31 11:33     ` Christian Brauner
  2025-07-31 12:23       ` Luis Henriques
  2025-07-31 17:29       ` Darrick J. Wong
  0 siblings, 2 replies; 13+ messages in thread
From: Christian Brauner @ 2025-07-31 11:33 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Darrick J. Wong, Miklos Szeredi, Bernd Schubert, linux-fsdevel,
	linux-kernel

On Wed, Jul 30, 2025 at 03:04:00PM +0100, Luis Henriques wrote:
> Hi Darrick,
> 
> On Tue, Jul 29 2025, Darrick J. Wong wrote:
> 
> > On Tue, Jul 29, 2025 at 02:56:02PM +0100, Luis Henriques wrote:
> >> Hi!
> >> 
> >> I know this has been discussed several times in several places, and the
> >> recent(ish) addition of NOTIFY_RESEND is an important step towards being
> >> able to restart a user-space FUSE server.
> >> 
> >> While looking at how to restart a server that uses the libfuse lowlevel
> >> API, I've created an RFC pull request [1] to understand whether adding
> >> support for this operation would be something acceptable in the project.
> >
> > Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
> > could restart itself.  It's unclear if doing so will actually enable us
> > to clear the condition that caused the failure in the first place, but I
> > suppose fuse2fs /does/ have e2fsck -fy at hand.  So maybe restarts
> > aren't totally crazy.
> 
> Maybe my PR lacks a bit of ambition -- its goal wasn't to have libfuse do
> the restart itself.  Instead, it simply adds some visibility into the
> opaque data structures so that a FUSE server could re-initialise a session
> without having to go through a full remount.
> 
> But sure, there are other things that could be added to the library as
> well.  For example, in my current experiments, the FUSE server needs to start
> some sort of "file descriptor server" to keep the fd alive for the
> restart.  This daemon could be optionally provided in libfuse itself,
> which could also be used to store all sorts of blobs needed by the file
> system after recovery is done.

Fwiw, for most use-cases you really just want to use systemd's file
descriptor store to persist the /dev/fuse connection:
https://systemd.io/FILE_DESCRIPTOR_STORE/

> 
> >> The PR doesn't do anything sophisticated, it simply hacks into the opaque
> >> libfuse data structures so that a server could set some of the sessions'
> >> fields.
> >> 
> >> So, a FUSE server simply has to save the /dev/fuse file descriptor and
> >> pass it to libfuse while recovering, after a restart or a crash.  The
> >> mentioned NOTIFY_RESEND should be used so that no requests are lost, of
> >> course.  And there are probably other data structures that user-space file
> >> systems will have to keep track of as well, so that everything can be
> >> restored.  (The parameters set in the INIT phase, for example.)
> >
> > Yeah, I don't know how that would work in practice.  Would the kernel
> > send back the old connection flags and whatnot via some sort of
> > FUSE_REINIT request, and the fuse server can either decide that it will
> > try to recover, or just bail out?
> 
> That would be an option.  But my current idea would be that the server
> would need to store those somewhere and simply assume they are still OK

The fdstore currently allows you to associate a name with a file descriptor
in the fdstore. That name would allow you to associate the options with
the fuse connection. However, I would not rule it out that additional
metadata could be attached to file descriptors in the fdstore if that's
something that's needed.


* Re: [RFC] Another take at restarting FUSE servers
  2025-07-31 11:33     ` Christian Brauner
@ 2025-07-31 12:23       ` Luis Henriques
  2025-07-31 17:29       ` Darrick J. Wong
  1 sibling, 0 replies; 13+ messages in thread
From: Luis Henriques @ 2025-07-31 12:23 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Darrick J. Wong, Miklos Szeredi, Bernd Schubert, linux-fsdevel,
	linux-kernel

On Thu, Jul 31 2025, Christian Brauner wrote:

> On Wed, Jul 30, 2025 at 03:04:00PM +0100, Luis Henriques wrote:
>> Hi Darrick,
>> 
>> On Tue, Jul 29 2025, Darrick J. Wong wrote:
>> 
>> > On Tue, Jul 29, 2025 at 02:56:02PM +0100, Luis Henriques wrote:
>> >> Hi!
>> >> 
>> >> I know this has been discussed several times in several places, and the
>> >> recent(ish) addition of NOTIFY_RESEND is an important step towards being
>> >> able to restart a user-space FUSE server.
>> >> 
>> >> While looking at how to restart a server that uses the libfuse lowlevel
>> >> API, I've created an RFC pull request [1] to understand whether adding
>> >> support for this operation would be something acceptable in the project.
>> >
>> > Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
>> > could restart itself.  It's unclear if doing so will actually enable us
>> > to clear the condition that caused the failure in the first place, but I
>> > suppose fuse2fs /does/ have e2fsck -fy at hand.  So maybe restarts
>> > aren't totally crazy.
>> 
>> Maybe my PR lacks a bit of ambition -- its goal wasn't to have libfuse do
>> the restart itself.  Instead, it simply adds some visibility into the
>> opaque data structures so that a FUSE server could re-initialise a session
>> without having to go through a full remount.
>> 
>> But sure, there are other things that could be added to the library as
>> well.  For example, in my current experiments, the FUSE server needs to start
>> some sort of "file descriptor server" to keep the fd alive for the
>> restart.  This daemon could be optionally provided in libfuse itself,
>> which could also be used to store all sorts of blobs needed by the file
>> system after recovery is done.
>
> Fwiw, for most use-cases you really just want to use systemd's file
> descriptor store to persist the /dev/fuse connection:
> https://systemd.io/FILE_DESCRIPTOR_STORE/

Thank you, Christian.  I guess I should have mentioned systemd's fdstore
here.  In fact, I knew about it, but in my experiments I decided not to
use it because it's trivial to keep the fd alive[1] (and also because my
test environment doesn't run systemd).

But still, any eventual libfuse support could still include the interface
with fdstore for that.

[1] Obviously "it's trivial" for my experiments.  Doing it in a secure way
    is probably a bit more challenging.

Cheers,
-- 
Luís

>
>> 
>> >> The PR doesn't do anything sophisticated, it simply hacks into the opaque
>> >> libfuse data structures so that a server could set some of the sessions'
>> >> fields.
>> >> 
>> >> So, a FUSE server simply has to save the /dev/fuse file descriptor and
>> >> pass it to libfuse while recovering, after a restart or a crash.  The
>> >> mentioned NOTIFY_RESEND should be used so that no requests are lost, of
>> >> course.  And there are probably other data structures that user-space file
>> >> systems will have to keep track of as well, so that everything can be
>> >> restored.  (The parameters set in the INIT phase, for example.)
>> >
>> > Yeah, I don't know how that would work in practice.  Would the kernel
>> > send back the old connection flags and whatnot via some sort of
>> > FUSE_REINIT request, and the fuse server can either decide that it will
>> > try to recover, or just bail out?
>> 
>> That would be an option.  But my current idea would be that the server
>> would need to store those somewhere and simply assume they are still OK
>
> The fdstore currently allows you to associate a name with a file descriptor
> in the fdstore. That name would allow you to associate the options with
> the fuse connection. However, I would not rule it out that additional
> metadata could be attached to file descriptors in the fdstore if that's
> something that's needed.


* Re: [RFC] Another take at restarting FUSE servers
  2025-07-29 23:38 ` Darrick J. Wong
  2025-07-30 14:04   ` Luis Henriques
@ 2025-07-31 13:04   ` Theodore Ts'o
  2025-07-31 17:38     ` Darrick J. Wong
  1 sibling, 1 reply; 13+ messages in thread
From: Theodore Ts'o @ 2025-07-31 13:04 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Luis Henriques, Miklos Szeredi, Bernd Schubert, linux-fsdevel,
	linux-kernel

On Tue, Jul 29, 2025 at 04:38:54PM -0700, Darrick J. Wong wrote:
> 
> Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
> could restart itself.  It's unclear if doing so will actually enable us
> to clear the condition that caused the failure in the first place, but I
> suppose fuse2fs /does/ have e2fsck -fy at hand.  So maybe restarts
> aren't totally crazy.

I'm trying to understand what the failure scenario is here.  Is this
if the userspace fuse server (i.e., fuse2fs) has crashed?  If so, what
is supposed to happen with respect to open files, metadata and data
modifications which were in transit, etc.?  Sure, fuse2fs could run
e2fsck -fy, but if there are dirty inodes on the system, that's
potentially going to be out of sync, right?

What are the recovery semantics that we hope to be able to provide?

     	     	      		     	     - Ted


* Re: [RFC] Another take at restarting FUSE servers
  2025-07-31 11:33     ` Christian Brauner
  2025-07-31 12:23       ` Luis Henriques
@ 2025-07-31 17:29       ` Darrick J. Wong
  2025-08-04  8:45         ` Christian Brauner
  1 sibling, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2025-07-31 17:29 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Luis Henriques, Miklos Szeredi, Bernd Schubert, linux-fsdevel,
	linux-kernel

On Thu, Jul 31, 2025 at 01:33:09PM +0200, Christian Brauner wrote:
> On Wed, Jul 30, 2025 at 03:04:00PM +0100, Luis Henriques wrote:
> > Hi Darrick,
> > 
> > On Tue, Jul 29 2025, Darrick J. Wong wrote:
> > 
> > > On Tue, Jul 29, 2025 at 02:56:02PM +0100, Luis Henriques wrote:
> > >> Hi!
> > >> 
> > >> I know this has been discussed several times in several places, and the
> > >> recent(ish) addition of NOTIFY_RESEND is an important step towards being
> > >> able to restart a user-space FUSE server.
> > >> 
> > >> While looking at how to restart a server that uses the libfuse lowlevel
> > >> API, I've created an RFC pull request [1] to understand whether adding
> > >> support for this operation would be something acceptable in the project.
> > >
> > > Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
> > > could restart itself.  It's unclear if doing so will actually enable us
> > > to clear the condition that caused the failure in the first place, but I
> > > suppose fuse2fs /does/ have e2fsck -fy at hand.  So maybe restarts
> > > aren't totally crazy.
> > 
> > Maybe my PR lacks a bit of ambition -- its goal wasn't to have libfuse do
> > the restart itself.  Instead, it simply adds some visibility into the
> > opaque data structures so that a FUSE server could re-initialise a session
> > without having to go through a full remount.
> > 
> > But sure, there are other things that could be added to the library as
> > well.  For example, in my current experiments, the FUSE server needs to start
> > some sort of "file descriptor server" to keep the fd alive for the
> > restart.  This daemon could be optionally provided in libfuse itself,
> > which could also be used to store all sorts of blobs needed by the file
> > system after recovery is done.
> 
> Fwiw, for most use-cases you really just want to use systemd's file
> descriptor store to persist the /dev/fuse connection:
> https://systemd.io/FILE_DESCRIPTOR_STORE/

Very nice!  This is exactly what I was looking for to handle the initial
setup, so I'm glad I don't have to go design a protocol around that.

> > 
> > >> The PR doesn't do anything sophisticated, it simply hacks into the opaque
> > >> libfuse data structures so that a server could set some of the sessions'
> > >> fields.
> > >> 
> > >> So, a FUSE server simply has to save the /dev/fuse file descriptor and
> > >> pass it to libfuse while recovering, after a restart or a crash.  The
> > >> mentioned NOTIFY_RESEND should be used so that no requests are lost, of
> > >> course.  And there are probably other data structures that user-space file
> > >> systems will have to keep track of as well, so that everything can be
> > >> restored.  (The parameters set in the INIT phase, for example.)
> > >
> > > Yeah, I don't know how that would work in practice.  Would the kernel
> > > send back the old connection flags and whatnot via some sort of
> > > FUSE_REINIT request, and the fuse server can either decide that it will
> > > try to recover, or just bail out?
> > 
> > That would be an option.  But my current idea would be that the server
> > would need to store those somewhere and simply assume they are still OK
> 
> The fdstore currently allows you to associate a name with a file descriptor
> in the fdstore. That name would allow you to associate the options with
> the fuse connection. However, I would not rule it out that additional
> metadata could be attached to file descriptors in the fdstore if that's
> something that's needed.

Names are useful, I'd at least want "fusedev", "fsopen", and "device".

If someone passed "journal_dev=/dev/sdaX" to fuse2fs then I'd want it to
be able to tell mountfsd "Hey, can you also open /dev/sdaX and put it in
the store as 'journal_dev'?" Then it just has to wait until the fd shows
up, and it can continue with the mount process.

Though the "device" argument needn't be a path, so to be fully general
mountfsd and the fuse server would have to handshake that as well.

--D


* Re: [RFC] Another take at restarting FUSE servers
  2025-07-31 13:04   ` Theodore Ts'o
@ 2025-07-31 17:38     ` Darrick J. Wong
  2025-08-01 10:15       ` Luis Henriques
  0 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2025-07-31 17:38 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Luis Henriques, Miklos Szeredi, Bernd Schubert, linux-fsdevel,
	linux-kernel

On Thu, Jul 31, 2025 at 09:04:58AM -0400, Theodore Ts'o wrote:
> On Tue, Jul 29, 2025 at 04:38:54PM -0700, Darrick J. Wong wrote:
> > 
> > Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
> > could restart itself.  It's unclear if doing so will actually enable us
> > to clear the condition that caused the failure in the first place, but I
> > suppose fuse2fs /does/ have e2fsck -fy at hand.  So maybe restarts
> > aren't totally crazy.
> 
> I'm trying to understand what the failure scenario is here.  Is this
> if the userspace fuse server (i.e., fuse2fs) has crashed?  If so, what
> is supposed to happen with respect to open files, metadata and data
> modifications which were in transit, etc.?  Sure, fuse2fs could run
> e2fsck -fy, but if there are dirty inodes on the system, that's
> potentially going to be out of sync, right?
> 
> What are the recovery semantics that we hope to be able to provide?

<echoing what we said on the ext4 call this morning>

With iomap, most of the dirty state is in the kernel, so I think the new
fuse2fs instance would poke the kernel with FUSE_NOTIFY_RESTARTED, which
would initiate GETATTR requests on all the cached inodes to validate
that they still exist; and then resend all the unacknowledged requests
that were pending at the time.  It might be the case that you have to
do that in the reverse order; I only know enough about the design of fuse
to suspect that to be true.

Anyhow once those are complete, I think we can resume operations with
the surviving inodes.  The ones that fail the GETATTR revalidation are
fuse_make_bad'd, which effectively revokes them.
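
Purely as a toy model of the ordering described above (nothing here is
kernel code, and FUSE_NOTIFY_RESTARTED does not exist yet), the
revalidation pass could be pictured like this.  The struct, the callback
type, and the stand-in server that only knows nodeids below a limit are
all invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an inode cached on the kernel side. */
struct cached_inode {
	uint64_t nodeid;
	bool bad;		/* set when revalidation fails */
};

/* Stand-in for sending GETATTR to the restarted server:
 * returns true if the server still knows the nodeid. */
typedef bool (*getattr_fn)(uint64_t nodeid, void *ctx);

/* Revalidate every cached inode; the ones that fail are marked bad,
 * mimicking fuse_make_bad().  Returns the number of survivors.
 * Pending requests would only be resent after this pass completes. */
size_t revalidate_cached_inodes(struct cached_inode *inodes, size_t n,
				getattr_fn getattr, void *ctx)
{
	size_t survivors = 0;

	assert(getattr != NULL);
	for (size_t i = 0; i < n; i++) {
		if (getattr(inodes[i].nodeid, ctx))
			survivors++;
		else
			inodes[i].bad = true;
	}
	return survivors;
}

/* Example server: only nodeids below the limit in *ctx still exist. */
bool nodeid_below_limit(uint64_t nodeid, void *ctx)
{
	return nodeid < *(const uint64_t *)ctx;
}
```

Whether revalidation should really run before the resend, or after it, is
exactly the open question raised above; the sketch just picks one order.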

All of this of course relies on fuse2fs maintaining as little volatile
state of its own as possible.  I think that means disabling the block
cache in the unix io manager, and if we ever implemented delalloc then
either we'd have to save the reservations somewhere or I guess you could
immediately syncfs the whole filesystem to try to push all the dirty
data to disk before we start allowing new free space allocations for new
changes.

--D

>      	     	      		     	     - Ted
> 


* Re: [RFC] Another take at restarting FUSE servers
  2025-07-31 17:38     ` Darrick J. Wong
@ 2025-08-01 10:15       ` Luis Henriques
  2025-08-11 15:43         ` Darrick J. Wong
  0 siblings, 1 reply; 13+ messages in thread
From: Luis Henriques @ 2025-08-01 10:15 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Theodore Ts'o, Miklos Szeredi, Bernd Schubert, linux-fsdevel,
	linux-kernel

On Thu, Jul 31 2025, Darrick J. Wong wrote:

> On Thu, Jul 31, 2025 at 09:04:58AM -0400, Theodore Ts'o wrote:
>> On Tue, Jul 29, 2025 at 04:38:54PM -0700, Darrick J. Wong wrote:
>> > 
>> > Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
>> > could restart itself.  It's unclear if doing so will actually enable us
>> > to clear the condition that caused the failure in the first place, but I
>> > suppose fuse2fs /does/ have e2fsck -fy at hand.  So maybe restarts
>> > aren't totally crazy.
>> 
>> I'm trying to understand what the failure scenario is here.  Is this
>> if the userspace fuse server (i.e., fuse2fs) has crashed?  If so, what
>> is supposed to happen with respect to open files, metadata and data
>> modifications which were in transit, etc.?  Sure, fuse2fs could run
>> e2fsck -fy, but if there are dirty inodes on the system, that's
>> potentially going to be out of sync, right?
>> 
>> What are the recovery semantics that we hope to be able to provide?
>
> <echoing what we said on the ext4 call this morning>
>
> With iomap, most of the dirty state is in the kernel, so I think the new
> fuse2fs instance would poke the kernel with FUSE_NOTIFY_RESTARTED, which
> would initiate GETATTR requests on all the cached inodes to validate
> that they still exist; and then resend all the unacknowledged requests
> that were pending at the time.  It might be the case that you have to
> do that in the reverse order; I only know enough about the design of fuse
> to suspect that to be true.
>
> Anyhow once those are complete, I think we can resume operations with
> the surviving inodes.  The ones that fail the GETATTR revalidation are
> fuse_make_bad'd, which effectively revokes them.

Ah! Interesting, I have been playing a bit with sending LOOKUP requests,
but probably GETATTR is a better option.

So, are you currently working on any of this?  Are you implementing this
new NOTIFY_RESTARTED request?  I guess it's time for me to have a closer
look at fuse2fs too.

Cheers,
-- 
Luís

> All of this of course relies on fuse2fs maintaining as little volatile
> state of its own as possible.  I think that means disabling the block
> cache in the unix io manager, and if we ever implemented delalloc then
> either we'd have to save the reservations somewhere or I guess you could
> immediately syncfs the whole filesystem to try to push all the dirty
> data to disk before we start allowing new free space allocations for new
> changes.
>
> --D
>
>>      	     	      		     	     - Ted
>> 


* Re: [RFC] Another take at restarting FUSE servers
  2025-07-31 17:29       ` Darrick J. Wong
@ 2025-08-04  8:45         ` Christian Brauner
  2025-08-12 19:28           ` Darrick J. Wong
  0 siblings, 1 reply; 13+ messages in thread
From: Christian Brauner @ 2025-08-04  8:45 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Luis Henriques, Miklos Szeredi, Bernd Schubert, linux-fsdevel,
	linux-kernel

On Thu, Jul 31, 2025 at 10:29:46AM -0700, Darrick J. Wong wrote:
> On Thu, Jul 31, 2025 at 01:33:09PM +0200, Christian Brauner wrote:
> > On Wed, Jul 30, 2025 at 03:04:00PM +0100, Luis Henriques wrote:
> > > Hi Darrick,
> > > 
> > > On Tue, Jul 29 2025, Darrick J. Wong wrote:
> > > 
> > > > On Tue, Jul 29, 2025 at 02:56:02PM +0100, Luis Henriques wrote:
> > > >> Hi!
> > > >> 
> > > >> I know this has been discussed several times in several places, and the
> > > >> recent(ish) addition of NOTIFY_RESEND is an important step towards being
> > > >> able to restart a user-space FUSE server.
> > > >> 
> > > >> While looking at how to restart a server that uses the libfuse lowlevel
> > > >> API, I've created an RFC pull request [1] to understand whether adding
> > > >> support for this operation would be something acceptable in the project.
> > > >
> > > > Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
> > > > could restart itself.  It's unclear if doing so will actually enable us
> > > > to clear the condition that caused the failure in the first place, but I
> > > > suppose fuse2fs /does/ have e2fsck -fy at hand.  So maybe restarts
> > > > aren't totally crazy.
> > > 
> > > Maybe my PR lacks a bit of ambition -- its goal wasn't to have libfuse do
> > > the restart itself.  Instead, it simply adds some visibility into the
> > > opaque data structures so that a FUSE server could re-initialise a session
> > > without having to go through a full remount.
> > > 
> > > But sure, there are other things that could be added to the library as
> > > well.  For example, in my current experiments, the FUSE server needs to start
> > > some sort of "file descriptor server" to keep the fd alive for the
> > > restart.  This daemon could be optionally provided in libfuse itself,
> > > which could also be used to store all sorts of blobs needed by the file
> > > system after recovery is done.
> > 
> > Fwiw, for most use-cases you really just want to use systemd's file
> > descriptor store to persist the /dev/fuse connection:
> > https://systemd.io/FILE_DESCRIPTOR_STORE/
> 
> Very nice!  This is exactly what I was looking for to handle the initial
> setup, so I'm glad I don't have to go design a protocol around that.
> 
> > > 
> > > >> The PR doesn't do anything sophisticated, it simply hacks into the opaque
> > > >> libfuse data structures so that a server could set some of the sessions'
> > > >> fields.
> > > >> 
> > > >> So, a FUSE server simply has to save the /dev/fuse file descriptor and
> > > >> pass it to libfuse while recovering, after a restart or a crash.  The
> > > >> mentioned NOTIFY_RESEND should be used so that no requests are lost, of
> > > >> course.  And there are probably other data structures that user-space file
> > > >> systems will have to keep track of as well, so that everything can be
> > > >> restored.  (The parameters set in the INIT phase, for example.)
> > > >
> > > > Yeah, I don't know how that would work in practice.  Would the kernel
> > > > send back the old connection flags and whatnot via some sort of
> > > > FUSE_REINIT request, and the fuse server can either decide that it will
> > > > try to recover, or just bail out?
> > > 
> > > That would be an option.  But my current idea would be that the server
> > > would need to store those somewhere and simply assume they are still OK
> > 
> > The fdstore currently allows you to associate a name with a file descriptor
> > in the fdstore. That name would allow you to associate the options with
> > the fuse connection. However, I would not rule it out that additional
> > metadata could be attached to file descriptors in the fdstore if that's
> > something that's needed.
> 
> Names are useful, I'd at least want "fusedev", "fsopen", and "device".
> 
> If someone passed "journal_dev=/dev/sdaX" to fuse2fs then I'd want it to
> be able to tell mountfsd "Hey, can you also open /dev/sdaX and put it in
> the store as 'journal_dev'?" Then it just has to wait until the fd shows
> up, and it can continue with the mount process.
> 
> Though the "device" argument needn't be a path, so to be fully general
> mountfsd and the fuse server would have to handshake that as well.

Fwiw, to attach arbitrary metadata to a file descriptor the easiest
thing to do would be to stash both a (fuse server) file descriptor and
then also a memfd via memfd_create() that e.g., can contain all the
server options that you want to store.

* Re: [RFC] Another take at restarting FUSE servers
  2025-08-01 10:15       ` Luis Henriques
@ 2025-08-11 15:43         ` Darrick J. Wong
  2025-08-13 13:14           ` Luis Henriques
  0 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2025-08-11 15:43 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Theodore Ts'o, Miklos Szeredi, Bernd Schubert, linux-fsdevel,
	linux-kernel

On Fri, Aug 01, 2025 at 11:15:26AM +0100, Luis Henriques wrote:
> On Thu, Jul 31 2025, Darrick J. Wong wrote:
> 
> > On Thu, Jul 31, 2025 at 09:04:58AM -0400, Theodore Ts'o wrote:
> >> On Tue, Jul 29, 2025 at 04:38:54PM -0700, Darrick J. Wong wrote:
> >> > 
> >> > Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
> >> > could restart itself.  It's unclear if doing so will actually enable us
> >> > to clear the condition that caused the failure in the first place, but I
> >> > suppose fuse2fs /does/ have e2fsck -fy at hand.  So maybe restarts
> >> > aren't totally crazy.
> >> 
> >> I'm trying to understand what the failure scenario is here.  Is this
> >> if the userspace fuse server (i.e., fuse2fs) has crashed?  If so, what
> >> is supposed to happen with respect to open files, metadata and data
> >> modifications which were in transit, etc.?  Sure, fuse2fs could run
> >> e2fsck -fy, but if there are dirty inodes on the system, they're
> >> potentially going to be out of sync, right?
> >> 
> >> What are the recovery semantics that we hope to be able to provide?
> >
> > <echoing what we said on the ext4 call this morning>
> >
> > With iomap, most of the dirty state is in the kernel, so I think the new
> > fuse2fs instance would poke the kernel with FUSE_NOTIFY_RESTARTED, which
> > would initiate GETATTR requests on all the cached inodes to validate
> > that they still exist; and then resend all the unacknowledged requests
> > that were pending at the time.  It might be the case that you have to
> > do that in the reverse order; I only know enough about the design of fuse
> > to suspect that to be true.
> >
> > Anyhow once those are complete, I think we can resume operations with
> > the surviving inodes.  The ones that fail the GETATTR revalidation are
> > fuse_make_bad'd, which effectively revokes them.
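
The sequence described above can be modelled in a few lines.  This is a
toy user-space model only, not kernel code: FUSE_NOTIFY_RESTARTED is
still just a proposal, and recover(), still_exists() and the tuple
layout are names invented for this sketch.  It follows the order as
first stated (revalidate, then resend), which may need to be swapped as
noted above.  Requests that target a revoked inode fail with EIO, which
is how the kernel treats inodes marked by fuse_make_bad().

```python
import errno


def recover(cached_inodes, pending, still_exists):
    """Toy model of the proposed restart sequence.

    cached_inodes: inode numbers the kernel still has cached
    pending:       list of (unique, ino) requests that were never answered
    still_exists:  callable standing in for a GETATTR sent to the new server
    """
    # Step 1: revalidate every cached inode; failures are marked bad.
    bad = {ino for ino in cached_inodes if not still_exists(ino)}

    # Step 2: resend the unacknowledged requests.  Anything aimed at a
    # bad inode is failed with -EIO instead of reaching the server.
    replies = []
    for unique, ino in pending:
        status = -errno.EIO if ino in bad else 0
        replies.append((unique, status))
    return bad, replies
```

For example, with inodes {1, 2, 3} cached and inode 3 gone after the
restart, a pending request on inode 3 is failed while one on inode 2 is
resent normally.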
> 
> Ah! Interesting, I have been playing a bit with sending LOOKUP requests,
> but probably GETATTR is a better option.
> 
> So, are you currently working on any of this?  Are you implementing this
> new NOTIFY_RESTARTED request?  I guess it's time for me to have a closer
> look at fuse2fs too.

Nope, right now I'm concentrating on making sure the fuse/iomap IO path
works reliably; and converting fuse2fs to be a lowlevel fuse server.
Eliminating all the path walking stuff that the highlevel fuse library
does reduces the fstests runtime from 7.9h to 3.5h, and turning on iomap
cuts that to 2.2h.

--D

> Cheers,
> -- 
> Luís
> 
> > All of this of course relies on fuse2fs maintaining as little volatile
> > state of its own as possible.  I think that means disabling the block
> > cache in the unix io manager, and if we ever implemented delalloc then
> > either we'd have to save the reservations somewhere or I guess you could
> > immediately syncfs the whole filesystem to try to push all the dirty
> > data to disk before we start allowing new free space allocations for new
> > changes.
> >
> > --D
> >
> >>      	     	      		     	     - Ted
> >> 
> 

* Re: [RFC] Another take at restarting FUSE servers
  2025-08-04  8:45         ` Christian Brauner
@ 2025-08-12 19:28           ` Darrick J. Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2025-08-12 19:28 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Luis Henriques, Miklos Szeredi, Bernd Schubert, linux-fsdevel,
	linux-kernel

On Mon, Aug 04, 2025 at 10:45:44AM +0200, Christian Brauner wrote:
> On Thu, Jul 31, 2025 at 10:29:46AM -0700, Darrick J. Wong wrote:
> > On Thu, Jul 31, 2025 at 01:33:09PM +0200, Christian Brauner wrote:
> > > On Wed, Jul 30, 2025 at 03:04:00PM +0100, Luis Henriques wrote:
> > > > Hi Darrick,
> > > > 
> > > > On Tue, Jul 29 2025, Darrick J. Wong wrote:
> > > > 
> > > > > On Tue, Jul 29, 2025 at 02:56:02PM +0100, Luis Henriques wrote:
> > > > >> Hi!
> > > > >> 
> > > > >> I know this has been discussed several times in several places, and the
> > > > >> recent(ish) addition of NOTIFY_RESEND is an important step towards being
> > > > >> able to restart a user-space FUSE server.
> > > > >> 
> > > > >> While looking at how to restart a server that uses the libfuse lowlevel
> > > > >> API, I've created an RFC pull request [1] to understand whether adding
> > > > >> support for this operation would be something acceptable in the project.
> > > > >
> > > > > Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
> > > > > could restart itself.  It's unclear if doing so will actually enable us
> > > > > to clear the condition that caused the failure in the first place, but I
> > > > > suppose fuse2fs /does/ have e2fsck -fy at hand.  So maybe restarts
> > > > > aren't totally crazy.
> > > > 
> > > > Maybe my PR lacks a bit of ambition -- its goal wasn't to have libfuse do
> > > > the restart itself.  Instead, it simply adds some visibility into the
> > > > opaque data structures so that a FUSE server could re-initialise a session
> > > > without having to go through a full remount.
> > > > 
> > > > But sure, there are other things that could be added to the library as
> > > > well.  For example, in my current experiments, the FUSE server needs to start
> > > > some sort of "file descriptor server" to keep the fd alive for the
> > > > restart.  This daemon could be optionally provided in libfuse itself,
> > > > which could also be used to store all sorts of blobs needed by the file
> > > > system after recovery is done.
> > > 
> > > Fwiw, for most use-cases you really just want to use systemd's file
> > > descriptor store to persist the /dev/fuse connection:
> > > https://systemd.io/FILE_DESCRIPTOR_STORE/
> > 
> > Very nice!  This is exactly what I was looking for to handle the initial
> > setup, so I'm glad I don't have to go design a protocol around that.
> > 
> > > > 
> > > > >> The PR doesn't do anything sophisticated, it simply hacks into the opaque
> > > > >> libfuse data structures so that a server could set some of the sessions'
> > > > >> fields.
> > > > >> 
> > > > >> So, a FUSE server simply has to save the /dev/fuse file descriptor and
> > > > >> pass it to libfuse while recovering, after a restart or a crash.  The
> > > > >> mentioned NOTIFY_RESEND should be used so that no requests are lost, of
> > > > >> course.  And there are probably other data structures that user-space file
> > > > >> systems will have to keep track of as well, so that everything can be
> > > > >> restored.  (The parameters set in the INIT phase, for example.)
> > > > >
> > > > > Yeah, I don't know how that would work in practice.  Would the kernel
> > > > > send back the old connection flags and whatnot via some sort of
> > > > > FUSE_REINIT request, and the fuse server can either decide that it will
> > > > > try to recover, or just bail out?
> > > > 
> > > > That would be an option.  But my current idea would be that the server
> > > > would need to store those somewhere and simply assume they are still OK
> > > 
> > > The fdstore currently allows you to associate a name with a file descriptor
> > > in the fdstore. That name would allow you to associate the options with
> > > the fuse connection. However, I would not rule it out that additional
> > > metadata could be attached to file descriptors in the fdstore if that's
> > > something that's needed.
> > 
> > Names are useful, I'd at least want "fusedev", "fsopen", and "device".
> > 
> > If someone passed "journal_dev=/dev/sdaX" to fuse2fs then I'd want it to
> > be able to tell mountfsd "Hey, can you also open /dev/sdaX and put it in
> > the store as 'journal_dev'?" Then it just has to wait until the fd shows
> > up, and it can continue with the mount process.
> > 
> > Though the "device" argument needn't be a path, so to be fully general
> > mountfsd and the fuse server would have to handshake that as well.
> 
> Fwiw, to attach arbitrary metadata to a file descriptor the easiest
> thing to do would be to stash both a (fuse server) file descriptor and
> then also a memfd via memfd_create() that e.g., can contain all the
> server options that you want to store.

<nod> I'll keep that in mind when I get to designing those components.
Thanks for the input!
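
For reference, a minimal sketch of how a restarted server could find its
named descriptors again.  systemd hands stored descriptors back starting
at fd 3, with the count in $LISTEN_FDS and the colon-separated names in
$LISTEN_FDNAMES.  The helper name named_fds() and the "options" fd name
in the usage note are assumptions for this example, and the sketch skips
the $LISTEN_PID check that real code should do.

```python
import os

SD_LISTEN_FDS_START = 3  # first fd number systemd passes to a service


def named_fds(environ=os.environ):
    """Map fdstore names to fd numbers using the systemd environment.

    If several descriptors share a name, only the last one is kept.
    """
    count = int(environ.get("LISTEN_FDS", "0"))
    if count == 0:
        return {}
    names = environ.get("LISTEN_FDNAMES", "").split(":")
    return {
        name: SD_LISTEN_FDS_START + index
        for index, name in enumerate(names[:count])
    }
```

A server started with LISTEN_FDS=2 and LISTEN_FDNAMES=fusedev:options
would find "fusedev" at fd 3 and "options" at fd 4.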

(I'm still working on stabilizing the new fuse4fs server, it's probably
going to be a while yet...)

--D

* Re: [RFC] Another take at restarting FUSE servers
  2025-08-11 15:43         ` Darrick J. Wong
@ 2025-08-13 13:14           ` Luis Henriques
  0 siblings, 0 replies; 13+ messages in thread
From: Luis Henriques @ 2025-08-13 13:14 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Theodore Ts'o, Miklos Szeredi, Bernd Schubert, linux-fsdevel,
	linux-kernel

On Mon, Aug 11 2025, Darrick J. Wong wrote:

> On Fri, Aug 01, 2025 at 11:15:26AM +0100, Luis Henriques wrote:
>> On Thu, Jul 31 2025, Darrick J. Wong wrote:
>> 
>> > On Thu, Jul 31, 2025 at 09:04:58AM -0400, Theodore Ts'o wrote:
>> >> On Tue, Jul 29, 2025 at 04:38:54PM -0700, Darrick J. Wong wrote:
>> >> > 
>> >> > Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
>> >> > could restart itself.  It's unclear if doing so will actually enable us
>> >> > to clear the condition that caused the failure in the first place, but I
>> >> > suppose fuse2fs /does/ have e2fsck -fy at hand.  So maybe restarts
>> >> > aren't totally crazy.
>> >> 
>> >> I'm trying to understand what the failure scenario is here.  Is this
>> >> if the userspace fuse server (i.e., fuse2fs) has crashed?  If so, what
>> >> is supposed to happen with respect to open files, metadata and data
>> >> modifications which were in transit, etc.?  Sure, fuse2fs could run
>> >> e2fsck -fy, but if there are dirty inodes on the system, they're
>> >> potentially going to be out of sync, right?
>> >> 
>> >> What are the recovery semantics that we hope to be able to provide?
>> >
>> > <echoing what we said on the ext4 call this morning>
>> >
>> > With iomap, most of the dirty state is in the kernel, so I think the new
>> > fuse2fs instance would poke the kernel with FUSE_NOTIFY_RESTARTED, which
>> > would initiate GETATTR requests on all the cached inodes to validate
>> > that they still exist; and then resend all the unacknowledged requests
>> > that were pending at the time.  It might be the case that you have to
>> > do that in the reverse order; I only know enough about the design of fuse
>> > to suspect that to be true.
>> >
>> > Anyhow once those are complete, I think we can resume operations with
>> > the surviving inodes.  The ones that fail the GETATTR revalidation are
>> > fuse_make_bad'd, which effectively revokes them.
>> 
>> Ah! Interesting, I have been playing a bit with sending LOOKUP requests,
>> but probably GETATTR is a better option.
>> 
>> So, are you currently working on any of this?  Are you implementing this
>> new NOTIFY_RESTARTED request?  I guess it's time for me to have a closer
>> look at fuse2fs too.
>
> Nope, right now I'm concentrating on making sure the fuse/iomap IO path
> works reliably; and converting fuse2fs to be a lowlevel fuse server.

Great, thanks for clarifying.

> Eliminating all the path walking stuff that the highlevel fuse library
> does reduces the fstests runtime from 7.9h to 3.5h, and turning on iomap
> cuts that to 2.2h.

Wow! Those are quite impressive numbers.  Looking forward to looking into
those fuse2fs improvements!

Cheers,
-- 
Luís

end of thread, other threads:[~2025-08-13 13:14 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz, follow: Atom feed)
-- links below jump to the message on this page --
2025-07-29 13:56 [RFC] Another take at restarting FUSE servers Luis Henriques
2025-07-29 23:38 ` Darrick J. Wong
2025-07-30 14:04   ` Luis Henriques
2025-07-31 11:33     ` Christian Brauner
2025-07-31 12:23       ` Luis Henriques
2025-07-31 17:29       ` Darrick J. Wong
2025-08-04  8:45         ` Christian Brauner
2025-08-12 19:28           ` Darrick J. Wong
2025-07-31 13:04   ` Theodore Ts'o
2025-07-31 17:38     ` Darrick J. Wong
2025-08-01 10:15       ` Luis Henriques
2025-08-11 15:43         ` Darrick J. Wong
2025-08-13 13:14           ` Luis Henriques
