public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET 4/6] fuse2fs: use fuseblk mode
@ 2025-09-15 23:59 Darrick J. Wong
  2025-09-16  0:03 ` [PATCH 1/2] fuse2fs: mount norecovery if main block device is readonly Darrick J. Wong
  2025-09-16  0:03 ` [PATCH 2/2] fuse2fs: use fuseblk mode for mounting filesystems Darrick J. Wong
  0 siblings, 2 replies; 7+ messages in thread
From: Darrick J. Wong @ 2025-09-15 23:59 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

Hi all,

While I was testing pre-iomap fuse2fs, I noticed a strange behavior of
fuse2fs.  When the filesystem is unmounted, the VFS mount goes away and
umount(3) returns before op_destroy is even called in fuse2fs.  As a
result, a subsequent fstest can try to format/mount the block device
even though fuse2fs hasn't even finished flushing dirty data to disk
or closed the block device.

This causes various weird test failures.  More alarmingly, this also
means that the age old advice that it's safe to yank a USB stick after
unmount returns is not actually true for fuse2fs.  This can lead to user
data loss.

There is a solution to this -- fuseblk mode.  In this scheme, fuse2fs
tells the kernel which block device it wants, the kernel opens the block
device, and it upcalls FUSE_DESTROY before releasing the block device
or the in-kernel super_block.  This gives us the desired property that
when unmount completes, it's safe to remove the device.

Unfortunately, this comes at a price.  Because the kernel insists upon
opening the fuseblk device in O_EXCL mode, we have to close the
filesystem before starting up fuse, and reopen it in op_init.  This
creates a largeish TOCTOU race window and increases mount times.  Worse
yet, if CONFIG_BLK_DEV_WRITE_MOUNTED=n, then this won't even work.

The last patch also registers fuse2fs as a process involved in memory
reclamation to prevent memory allocation deadlocks.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

Comments and questions are, as always, welcome.

e2fsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/e2fsprogs.git/log/?h=fuse2fs-use-fuseblk
---
Commits in this patchset:
 * fuse2fs: mount norecovery if main block device is readonly
 * fuse2fs: use fuseblk mode for mounting filesystems
---
 misc/fuse2fs.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] fuse2fs: mount norecovery if main block device is readonly
  2025-09-15 23:59 [PATCHSET 4/6] fuse2fs: use fuseblk mode Darrick J. Wong
@ 2025-09-16  0:03 ` Darrick J. Wong
  2025-10-16 19:34   ` Dave Dykstra
  2025-09-16  0:03 ` [PATCH 2/2] fuse2fs: use fuseblk mode for mounting filesystems Darrick J. Wong
  1 sibling, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:03 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

From: Darrick J. Wong <djwong@kernel.org>

If the main block device is read-only, downgrade the mount to an ro
norecovery mount.  This will make generic/050 somewhat less grouchy.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 misc/fuse2fs.c |    9 +++++++++
 1 file changed, 9 insertions(+)


diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 48473321f469dc..fb44b0a79b53e6 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -946,6 +946,15 @@ static errcode_t fuse2fs_open(struct fuse2fs *ff, int libext2_flags)
 
 	err = ext2fs_open2(ff->device, options, flags, 0, 0, unix_io_manager,
 			   &ff->fs);
+	if (err == EPERM) {
+		err_printf(ff, "%s.\n",
+			   _("read-only device, trying to mount norecovery"));
+		flags &= ~EXT2_FLAG_RW;
+		ff->ro = 1;
+		ff->norecovery = 1;
+		err = ext2fs_open2(ff->device, options, flags, 0, 0,
+				   unix_io_manager, &ff->fs);
+	}
 	if (err) {
 		err_printf(ff, "%s.\n", error_message(err));
 		err_printf(ff, "%s\n", _("Please run e2fsck -fy."));


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] fuse2fs: use fuseblk mode for mounting filesystems
  2025-09-15 23:59 [PATCHSET 4/6] fuse2fs: use fuseblk mode Darrick J. Wong
  2025-09-16  0:03 ` [PATCH 1/2] fuse2fs: mount norecovery if main block device is readonly Darrick J. Wong
@ 2025-09-16  0:03 ` Darrick J. Wong
  1 sibling, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:03 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

From: Darrick J. Wong <djwong@kernel.org>

So this is awkward.  Up until now, fuse2fs has opened the block device
in exclusive mode so that it can do all the superblock feature parsing
in main() prior to initiating the fuse connection.

However, in running fstests on fuse2fs, I noticed a weird unmount race
where the umount() syscall can return before the op_destroy function
gets called by libfuse.  This is problematic because fstests (and
probably users too) make a lot of assumptions about the block device
being openable after umount() completes.  The op_destroy function can
take some time to flush dirty blocks out of its pagecache, call fsync,
etc.

I poked around the kernel and libfuse and discovered that the kernel
fuse driver has two modes: anonymous and block device mode.  In block
device mode the kernel will send a FUSE_DESTROY command to userspace and
wait for libfuse to call our op_destroy function.  In anonymous mode,
the kernel closes the fuse device and completes the unmount, which means
that libfuse calls op_destroy after the unmount has gone away.

This is the root cause of _scratch_cycle_mount sporadically complaining
about the block device being in use.  The solution is to use block
device mode, but this means we have to move the libext2fs initialization
to op_init and we can no longer be the exclusive owner of the block
device.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 misc/fuse2fs.c |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)


diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index fb44b0a79b53e6..8cd4dc600888b9 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -24,6 +24,7 @@
 #include <sys/ioctl.h>
 #include <unistd.h>
 #include <ctype.h>
+#include <stdbool.h>
 #define FUSE_DARWIN_ENABLE_EXTENSIONS 0
 #ifdef __SET_FOB_FOR_FUSE
 # error Do not set magic value __SET_FOB_FOR_FUSE!!!!
@@ -236,6 +237,8 @@ struct fuse2fs {
 	int directio;
 	int acl;
 	int dirsync;
+	int unmount_in_destroy;
+	int noblkdev;
 
 	int logfd;
 	int blocklog;
@@ -967,6 +970,11 @@ static errcode_t fuse2fs_open(struct fuse2fs *ff, int libext2_flags)
 	return 0;
 }
 
+static inline bool fuse2fs_on_bdev(const struct fuse2fs *ff)
+{
+	return ff->fs->io->flags & CHANNEL_FLAGS_BLOCK_DEVICE;
+}
+
 static errcode_t fuse2fs_config_cache(struct fuse2fs *ff)
 {
 	char buf[128];
@@ -1152,6 +1160,9 @@ static void op_destroy(void *p EXT2FS_ATTR((unused)))
 		log_printf(ff, "%s %s.\n", _("unmounting filesystem"), uuid);
 	}
 
+	if (ff->unmount_in_destroy)
+		fuse2fs_unmount(ff);
+
 	fuse2fs_finish(ff, 0);
 }
 
@@ -4998,6 +5009,7 @@ static struct fuse_opt fuse2fs_opts[] = {
 #ifdef HAVE_CLOCK_MONOTONIC
 	FUSE2FS_OPT("timing",		timing,			1),
 #endif
+	FUSE2FS_OPT("noblkdev",		noblkdev,		1),
 
 	FUSE_OPT_KEY("user_xattr",	FUSE2FS_IGNORED),
 	FUSE_OPT_KEY("noblock_validity", FUSE2FS_IGNORED),
@@ -5143,6 +5155,18 @@ static unsigned long long default_cache_size(void)
 	return ret;
 }
 
+static inline bool fuse2fs_want_fuseblk(const struct fuse2fs *ff)
+{
+	if (ff->noblkdev)
+		return false;
+
+	/* libfuse won't let non-root do fuseblk mounts */
+	if (getuid() != 0)
+		return false;
+
+	return fuse2fs_on_bdev(ff);
+}
+
 int main(int argc, char *argv[])
 {
 	struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
@@ -5201,6 +5225,28 @@ int main(int argc, char *argv[])
 		goto out;
 	}
 
+	if (fuse2fs_want_fuseblk(&fctx)) {
+		/*
+		 * If this is a block device, we want to close the fs, reopen
+		 * the block device in non-exclusive mode, and start the fuse
+		 * driver in fuseblk mode (which will reopen the block device
+		 * in exclusive mode) so that unmount will wait until
+		 * op_destroy completes.
+		 */
+		fuse2fs_unmount(&fctx);
+		err = fuse2fs_open(&fctx, 0);
+		if (err) {
+			ret = 32;
+			goto out;
+		}
+
+		/* "blkdev" is the magic mount option for fuseblk mode */
+		snprintf(extra_args, BUFSIZ, "-oblkdev,blksize=%u",
+			 fctx.fs->blocksize);
+		fuse_opt_add_arg(&args, extra_args);
+		fctx.unmount_in_destroy = 1;
+	}
+
 	if (!fctx.cache_size)
 		fctx.cache_size = default_cache_size();
 	if (fctx.cache_size) {


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] fuse2fs: mount norecovery if main block device is readonly
  2025-09-16  0:03 ` [PATCH 1/2] fuse2fs: mount norecovery if main block device is readonly Darrick J. Wong
@ 2025-10-16 19:34   ` Dave Dykstra
  2025-10-17 19:38     ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Dykstra @ 2025-10-16 19:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

I have a few problems with this patch, details below.

I have proposed an alternative at
    https://github.com/tytso/e2fsprogs/pull/250
and I'll email that here next.

On Mon, Sep 15, 2025 at 05:03:14PM -0700, Darrick J. Wong wrote:
...
> diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
> index 48473321f469dc..fb44b0a79b53e6 100644
> --- a/misc/fuse2fs.c
> +++ b/misc/fuse2fs.c
> @@ -946,6 +946,15 @@ static errcode_t fuse2fs_open(struct fuse2fs *ff, int libext2_flags)
>  
>  	err = ext2fs_open2(ff->device, options, flags, 0, 0, unix_io_manager,
>  			   &ff->fs);
> +	if (err == EPERM) {

In my case the error here is EACCES (Permission denied) rather than EPERM
so I in my patch I included both.

> +		err_printf(ff, "%s.\n",
> +			   _("read-only device, trying to mount norecovery"));
> +		flags &= ~EXT2_FLAG_RW;
> +		ff->ro = 1;
> +		ff->norecovery = 1;

I don't think it's good to switch to read-only+norecovery even when a
read-write mode was requested.  That goes too far.  It also doesn't
catch when recovery is needed.  My proposed patch only reopens read-only
when ro was requested and then later checks to see if recovery is needed
and if so, errors out.

Dave

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] fuse2fs: mount norecovery if main block device is readonly
  2025-10-16 19:34   ` Dave Dykstra
@ 2025-10-17 19:38     ` Darrick J. Wong
  2025-10-17 20:20       ` Dave Dykstra
  0 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2025-10-17 19:38 UTC (permalink / raw)
  To: Dave Dykstra; +Cc: tytso, linux-ext4

On Thu, Oct 16, 2025 at 02:34:18PM -0500, Dave Dykstra wrote:
> I have a few problems with this patch, details below.
> 
> I have proposed an alternative at
>     https://github.com/tytso/e2fsprogs/pull/250
> and I'll email that here next.
> 
> On Mon, Sep 15, 2025 at 05:03:14PM -0700, Darrick J. Wong wrote:
> ...
> > diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
> > index 48473321f469dc..fb44b0a79b53e6 100644
> > --- a/misc/fuse2fs.c
> > +++ b/misc/fuse2fs.c
> > @@ -946,6 +946,15 @@ static errcode_t fuse2fs_open(struct fuse2fs *ff, int libext2_flags)
> >  
> >  	err = ext2fs_open2(ff->device, options, flags, 0, 0, unix_io_manager,
> >  			   &ff->fs);
> > +	if (err == EPERM) {
> 
> In my case the error here is EACCES (Permission denied) rather than EPERM
> so I in my patch I included both.

Ok, I'll go update my own patch.

> > +		err_printf(ff, "%s.\n",
> > +			   _("read-only device, trying to mount norecovery"));
> > +		flags &= ~EXT2_FLAG_RW;
> > +		ff->ro = 1;
> > +		ff->norecovery = 1;
> 
> I don't think it's good to switch to read-only+norecovery even when a
> read-write mode was requested.  That goes too far.

The block device cannot be opened for write, so the mount cannot allow
user programs to write to files, and the fs driver cannot recover the
journal and it cannot write to the disk.  The only other choice would
be to fail the mount.

norecovery is wrong though.  The kernel fails the mount if the journal
needs recovery, the block device is ro, and the user didn't specify
norecovery.

> It also doesn't catch when recovery is needed.

What specifically do you mean "catch when recovery is needed"?  68 lines
down from the ext2fs_open2 call is a check for the needsrecovery state,
followed by recovering the journal.

> My proposed patch only reopens read-only
> when ro was requested and then later checks to see if recovery is needed
> and if so, errors out.

Your patch also didn't re-check the feature support after reopening the
block device, which you dismissed even though that can lead to
catastrophic behavior.

--D

> 
> Dave
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] fuse2fs: mount norecovery if main block device is readonly
  2025-10-17 19:38     ` Darrick J. Wong
@ 2025-10-17 20:20       ` Dave Dykstra
  2025-10-17 23:30         ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Dykstra @ 2025-10-17 20:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Fri, Oct 17, 2025 at 12:38:41PM -0700, Darrick J. Wong wrote:
> On Thu, Oct 16, 2025 at 02:34:18PM -0500, Dave Dykstra wrote:
...
> > > +		err_printf(ff, "%s.\n",
> > > +			   _("read-only device, trying to mount norecovery"));
> > > +		flags &= ~EXT2_FLAG_RW;
> > > +		ff->ro = 1;
> > > +		ff->norecovery = 1;
> > 
> > I don't think it's good to switch to read-only+norecovery even when a
> > read-write mode was requested.  That goes too far.
> 
> The block device cannot be opened for write, so the mount cannot allow
> user programs to write to files, and the fs driver cannot recover the
> journal and it cannot write to the disk.  The only other choice would
> be to fail the mount.

Yes, I think it's better to fail the mount if recovery is needed and it
can't be done.

> norecovery is wrong though.  The kernel fails the mount if the journal
> needs recovery, the block device is ro, and the user didn't specify
> norecovery.

That makes more sense.

> > It also doesn't catch when recovery is needed.
> 
> What specifically do you mean "catch when recovery is needed"?  68 lines
> down from the ext2fs_open2 call is a check for the needsrecovery state,
> followed by recovering the journal.

I meant that it should fail in that case because it can't recover.

> > My proposed patch only reopens read-only
> > when ro was requested and then later checks to see if recovery is needed
> > and if so, errors out.
> 
> Your patch also didn't re-check the feature support after reopening the
> block device, which you dismissed even though that can lead to
> catastrophic behavior.

In this version of the patch there is no reopening, there is only a
switch to open without RW if the RW open fails.  So all feature checks
happen after it.

Dave

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] fuse2fs: mount norecovery if main block device is readonly
  2025-10-17 20:20       ` Dave Dykstra
@ 2025-10-17 23:30         ` Darrick J. Wong
  0 siblings, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2025-10-17 23:30 UTC (permalink / raw)
  To: Dave Dykstra; +Cc: tytso, linux-ext4

On Fri, Oct 17, 2025 at 03:20:00PM -0500, Dave Dykstra wrote:
> On Fri, Oct 17, 2025 at 12:38:41PM -0700, Darrick J. Wong wrote:
> > On Thu, Oct 16, 2025 at 02:34:18PM -0500, Dave Dykstra wrote:
> ...
> > > > +		err_printf(ff, "%s.\n",
> > > > +			   _("read-only device, trying to mount norecovery"));
> > > > +		flags &= ~EXT2_FLAG_RW;
> > > > +		ff->ro = 1;
> > > > +		ff->norecovery = 1;
> > > 
> > > I don't think it's good to switch to read-only+norecovery even when a
> > > read-write mode was requested.  That goes too far.
> > 
> > The block device cannot be opened for write, so the mount cannot allow
> > user programs to write to files, and the fs driver cannot recover the
> > journal and it cannot write to the disk.  The only other choice would
> > be to fail the mount.
> 
> Yes, I think it's better to fail the mount if recovery is needed and it
> can't be done.
> 
> > norecovery is wrong though.  The kernel fails the mount if the journal
> > needs recovery, the block device is ro, and the user didn't specify
> > norecovery.
> 
> That makes more sense.
> 
> > > It also doesn't catch when recovery is needed.
> > 
> > What specifically do you mean "catch when recovery is needed"?  68 lines
> > down from the ext2fs_open2 call is a check for the needsrecovery state,
> > followed by recovering the journal.
> 
> I meant that it should fail in that case because it can't recover.
> 
> > > My proposed patch only reopens read-only
> > > when ro was requested and then later checks to see if recovery is needed
> > > and if so, errors out.
> > 
> > Your patch also didn't re-check the feature support after reopening the
> > block device, which you dismissed even though that can lead to
> > catastrophic behavior.
> 
> In this version of the patch there is no reopening, there is only a
> switch to open without RW if the RW open fails.  So all feature checks
> happen after it.

Well I'm struggling to reshuffle my patch deck to keep up with you.

I already _had_ patches to fix every thing you've mentioned, but since
you pre-declared you wouldn't make time even to go look at my patch
stack why the hell am I even bothering?

I no longer have patience for these interactions.

--D

> Dave
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-10-17 23:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-15 23:59 [PATCHSET 4/6] fuse2fs: use fuseblk mode Darrick J. Wong
2025-09-16  0:03 ` [PATCH 1/2] fuse2fs: mount norecovery if main block device is readonly Darrick J. Wong
2025-10-16 19:34   ` Dave Dykstra
2025-10-17 19:38     ` Darrick J. Wong
2025-10-17 20:20       ` Dave Dykstra
2025-10-17 23:30         ` Darrick J. Wong
2025-09-16  0:03 ` [PATCH 2/2] fuse2fs: use fuseblk mode for mounting filesystems Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox