* [PATCHSET 4/6] fuse2fs: use fuseblk mode @ 2025-09-15 23:59 Darrick J. Wong 2025-09-16 0:03 ` [PATCH 1/2] fuse2fs: mount norecovery if main block device is readonly Darrick J. Wong 2025-09-16 0:03 ` [PATCH 2/2] fuse2fs: use fuseblk mode for mounting filesystems Darrick J. Wong 0 siblings, 2 replies; 7+ messages in thread From: Darrick J. Wong @ 2025-09-15 23:59 UTC (permalink / raw) To: tytso; +Cc: linux-ext4 Hi all, While I was testing pre-iomap fuse2fs, I noticed a strange behavior of fuse2fs. When the filesystem is unmounted, the VFS mount goes away and umount(3) returns before op_destroy is even called in fuse2fs. As a result, a subsequent fstest can try to format/mount the block device even though fuse2fs hasn't even finished flushing dirty data to disk or closed the block device. This causes various weird test failures. More alarmingly, this also means that the age old advice that it's safe to yank a USB stick after unmount returns is not actually true for fuse2fs. This can lead to user data loss. There is a solution to this -- fuseblk mode. In this scheme, fuse2fs tells the kernel which block device it wants, the kernel opens the block device, and it upcalls FUSE_DESTROY before releasing the block device or the in-kernel super_block. This gives us the desired property that when unmount completes, it's safe to remove the device. Unfortunately, this comes at a price. Because the kernel insists upon opening the fuseblk device in O_EXCL mode, we have to close the filesystem before starting up fuse, and reopen it in op_init. This creates a largeish TOCTOU race window and increases mount times. Worse yet, if CONFIG_BLK_DEV_WRITE_MOUNTED=n, then this won't even work. The last patch also registers fuse2fs as a process involved in memory reclamation to prevent memory allocation deadlocks. If you're going to start using this code, I strongly recommend pulling from my git trees, which are linked below. Comments and questions are, as always, welcome. e2fsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/e2fsprogs.git/log/?h=fuse2fs-use-fuseblk --- Commits in this patchset: * fuse2fs: mount norecovery if main block device is readonly * fuse2fs: use fuseblk mode for mounting filesystems --- misc/fuse2fs.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] fuse2fs: mount norecovery if main block device is readonly 2025-09-15 23:59 [PATCHSET 4/6] fuse2fs: use fuseblk mode Darrick J. Wong @ 2025-09-16 0:03 ` Darrick J. Wong 2025-10-16 19:34 ` Dave Dykstra 2025-09-16 0:03 ` [PATCH 2/2] fuse2fs: use fuseblk mode for mounting filesystems Darrick J. Wong 1 sibling, 1 reply; 7+ messages in thread From: Darrick J. Wong @ 2025-09-16 0:03 UTC (permalink / raw) To: tytso; +Cc: linux-ext4 From: Darrick J. Wong <djwong@kernel.org> If the main block device is read-only, downgrade the mount to an ro norecovery mount. This will make generic/050 somewhat less grouchy. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- misc/fuse2fs.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c index 48473321f469dc..fb44b0a79b53e6 100644 --- a/misc/fuse2fs.c +++ b/misc/fuse2fs.c @@ -946,6 +946,15 @@ static errcode_t fuse2fs_open(struct fuse2fs *ff, int libext2_flags) err = ext2fs_open2(ff->device, options, flags, 0, 0, unix_io_manager, &ff->fs); + if (err == EPERM) { + err_printf(ff, "%s.\n", + _("read-only device, trying to mount norecovery")); + flags &= ~EXT2_FLAG_RW; + ff->ro = 1; + ff->norecovery = 1; + err = ext2fs_open2(ff->device, options, flags, 0, 0, + unix_io_manager, &ff->fs); + } if (err) { err_printf(ff, "%s.\n", error_message(err)); err_printf(ff, "%s\n", _("Please run e2fsck -fy.")); ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] fuse2fs: mount norecovery if main block device is readonly 2025-09-16 0:03 ` [PATCH 1/2] fuse2fs: mount norecovery if main block device is readonly Darrick J. Wong @ 2025-10-16 19:34 ` Dave Dykstra 2025-10-17 19:38 ` Darrick J. Wong 0 siblings, 1 reply; 7+ messages in thread From: Dave Dykstra @ 2025-10-16 19:34 UTC (permalink / raw) To: Darrick J. Wong; +Cc: tytso, linux-ext4 I have a few problems with this patch, details below. I have proposed an alternative at https://github.com/tytso/e2fsprogs/pull/250 and I'll email that here next. On Mon, Sep 15, 2025 at 05:03:14PM -0700, Darrick J. Wong wrote: ... > diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c > index 48473321f469dc..fb44b0a79b53e6 100644 > --- a/misc/fuse2fs.c > +++ b/misc/fuse2fs.c > @@ -946,6 +946,15 @@ static errcode_t fuse2fs_open(struct fuse2fs *ff, int libext2_flags) > > err = ext2fs_open2(ff->device, options, flags, 0, 0, unix_io_manager, > &ff->fs); > + if (err == EPERM) { In my case the error here is EACCES (Permission denied) rather than EPERM so I in my patch I included both. > + err_printf(ff, "%s.\n", > + _("read-only device, trying to mount norecovery")); > + flags &= ~EXT2_FLAG_RW; > + ff->ro = 1; > + ff->norecovery = 1; I don't think it's good to switch to read-only+norecovery even when a read-write mode was requested. That goes too far. It also doesn't catch when recovery is needed. My proposed patch only reopens read-only when ro was requested and then later checks to see if recovery is needed and if so, errors out. Dave ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] fuse2fs: mount norecovery if main block device is readonly 2025-10-16 19:34 ` Dave Dykstra @ 2025-10-17 19:38 ` Darrick J. Wong 2025-10-17 20:20 ` Dave Dykstra 0 siblings, 1 reply; 7+ messages in thread From: Darrick J. Wong @ 2025-10-17 19:38 UTC (permalink / raw) To: Dave Dykstra; +Cc: tytso, linux-ext4 On Thu, Oct 16, 2025 at 02:34:18PM -0500, Dave Dykstra wrote: > I have a few problems with this patch, details below. > > I have proposed an alternative at > https://github.com/tytso/e2fsprogs/pull/250 > and I'll email that here next. > > On Mon, Sep 15, 2025 at 05:03:14PM -0700, Darrick J. Wong wrote: > ... > > diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c > > index 48473321f469dc..fb44b0a79b53e6 100644 > > --- a/misc/fuse2fs.c > > +++ b/misc/fuse2fs.c > > @@ -946,6 +946,15 @@ static errcode_t fuse2fs_open(struct fuse2fs *ff, int libext2_flags) > > > > err = ext2fs_open2(ff->device, options, flags, 0, 0, unix_io_manager, > > &ff->fs); > > + if (err == EPERM) { > > In my case the error here is EACCES (Permission denied) rather than EPERM > so I in my patch I included both. Ok, I'll go update my own patch. > > + err_printf(ff, "%s.\n", > > + _("read-only device, trying to mount norecovery")); > > + flags &= ~EXT2_FLAG_RW; > > + ff->ro = 1; > > + ff->norecovery = 1; > > I don't think it's good to switch to read-only+norecovery even when a > read-write mode was requested. That goes too far. The block device cannot be opened for write, so the mount cannot allow user programs to write to files, and the fs driver cannot recover the journal and it cannot write to the disk. The only other choice would be to fail the mount. norecovery is wrong though. The kernel fails the mount if the journal needs recovery, the block device is ro, and the user didn't specify norecovery. > It also doesn't catch when recovery is needed. What specifically do you mean "catch when recovery is needed"? 68 lines down from the ext2fs_open2 call is a check for the needsrecovery state, followed by recovering the journal. > My proposed patch only reopens read-only > when ro was requested and then later checks to see if recovery is needed > and if so, errors out. Your patch also didn't re-check the feature support after reopening the block device, which you dismissed even though that can lead to catastrophic behavior. --D > > Dave > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] fuse2fs: mount norecovery if main block device is readonly 2025-10-17 19:38 ` Darrick J. Wong @ 2025-10-17 20:20 ` Dave Dykstra 2025-10-17 23:30 ` Darrick J. Wong 0 siblings, 1 reply; 7+ messages in thread From: Dave Dykstra @ 2025-10-17 20:20 UTC (permalink / raw) To: Darrick J. Wong; +Cc: tytso, linux-ext4 On Fri, Oct 17, 2025 at 12:38:41PM -0700, Darrick J. Wong wrote: > On Thu, Oct 16, 2025 at 02:34:18PM -0500, Dave Dykstra wrote: ... > > > + err_printf(ff, "%s.\n", > > > + _("read-only device, trying to mount norecovery")); > > > + flags &= ~EXT2_FLAG_RW; > > > + ff->ro = 1; > > > + ff->norecovery = 1; > > > > I don't think it's good to switch to read-only+norecovery even when a > > read-write mode was requested. That goes too far. > > The block device cannot be opened for write, so the mount cannot allow > user programs to write to files, and the fs driver cannot recover the > journal and it cannot write to the disk. The only other choice would > be to fail the mount. Yes, I think it's better to fail the mount if recovery is needed and it can't be done. > norecovery is wrong though. The kernel fails the mount if the journal > needs recovery, the block device is ro, and the user didn't specify > norecovery. That makes more sense. > > It also doesn't catch when recovery is needed. > > What specifically do you mean "catch when recovery is needed"? 68 lines > down from the ext2fs_open2 call is a check for the needsrecovery state, > followed by recovering the journal. I meant that it should fail in that case because it can't recover. > > My proposed patch only reopens read-only > > when ro was requested and then later checks to see if recovery is needed > > and if so, errors out. > > Your patch also didn't re-check the feature support after reopening the > block device, which you dismissed even though that can lead to > catastrophic behavior. In this version of the patch there is no reopening, there is only a switch to open without RW if the RW open fails. So all feature checks happen after it. Dave ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] fuse2fs: mount norecovery if main block device is readonly 2025-10-17 20:20 ` Dave Dykstra @ 2025-10-17 23:30 ` Darrick J. Wong 0 siblings, 0 replies; 7+ messages in thread From: Darrick J. Wong @ 2025-10-17 23:30 UTC (permalink / raw) To: Dave Dykstra; +Cc: tytso, linux-ext4 On Fri, Oct 17, 2025 at 03:20:00PM -0500, Dave Dykstra wrote: > On Fri, Oct 17, 2025 at 12:38:41PM -0700, Darrick J. Wong wrote: > > On Thu, Oct 16, 2025 at 02:34:18PM -0500, Dave Dykstra wrote: > ... > > > > + err_printf(ff, "%s.\n", > > > > + _("read-only device, trying to mount norecovery")); > > > > + flags &= ~EXT2_FLAG_RW; > > > > + ff->ro = 1; > > > > + ff->norecovery = 1; > > > > > > I don't think it's good to switch to read-only+norecovery even when a > > > read-write mode was requested. That goes too far. > > > > The block device cannot be opened for write, so the mount cannot allow > > user programs to write to files, and the fs driver cannot recover the > > journal and it cannot write to the disk. The only other choice would > > be to fail the mount. > > Yes, I think it's better to fail the mount if recovery is needed and it > can't be done. > > > norecovery is wrong though. The kernel fails the mount if the journal > > needs recovery, the block device is ro, and the user didn't specify > > norecovery. > > That makes more sense. > > > > It also doesn't catch when recovery is needed. > > > > What specifically do you mean "catch when recovery is needed"? 68 lines > > down from the ext2fs_open2 call is a check for the needsrecovery state, > > followed by recovering the journal. > > I meant that it should fail in that case because it can't recover. > > > > My proposed patch only reopens read-only > > > when ro was requested and then later checks to see if recovery is needed > > > and if so, errors out. > > > > Your patch also didn't re-check the feature support after reopening the > > block device, which you dismissed even though that can lead to > > catastrophic behavior. > > In this version of the patch there is no reopening, there is only a > switch to open without RW if the RW open fails. So all feature checks > happen after it. Well I'm struggling to reshuffle my patch deck to keep up with you. I already _had_ patches to fix every thing you've mentioned, but since you pre-declared you wouldn't make time even to go look at my patch stack why the hell am I even bothering? I no longer have patience for these interactions. --D > Dave > ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/2] fuse2fs: use fuseblk mode for mounting filesystems 2025-09-15 23:59 [PATCHSET 4/6] fuse2fs: use fuseblk mode Darrick J. Wong 2025-09-16 0:03 ` [PATCH 1/2] fuse2fs: mount norecovery if main block device is readonly Darrick J. Wong @ 2025-09-16 0:03 ` Darrick J. Wong 1 sibling, 0 replies; 7+ messages in thread From: Darrick J. Wong @ 2025-09-16 0:03 UTC (permalink / raw) To: tytso; +Cc: linux-ext4 From: Darrick J. Wong <djwong@kernel.org> So this is awkward. Up until now, fuse2fs has opened the block device in exclusive mode so that it can do all the superblock feature parsing in main() prior to initiating the fuse connection. However, in running fstests on fuse2fs, I noticed a weird unmount race where the umount() syscall can return before the op_destroy function gets called by libfuse. This is problematic because fstests (and probably users too) make a lot of assumptions about the block device being openable after umount() completes. The op_destroy function can take some time to flush dirty blocks out of its pagecache, call fsync, etc. I poked around the kernel and libfuse and discovered that the kernel fuse driver has two modes: anonymous and block device mode. In block device mode the kernel will send a FUSE_DESTROY command to userspace and wait for libfuse to call our op_destroy function. In anonymous mode, the kernel closes the fuse device and completes the unmount, which means that libfuse calls op_destroy after the unmount has gone away. This is the root cause of _scratch_cycle_mount sporadically complaining about the block device being in use. The solution is to use block device mode, but this means we have to move the libext2fs initialization to op_init and we can no longer be the exclusive owner of the block device. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- misc/fuse2fs.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c index fb44b0a79b53e6..8cd4dc600888b9 100644 --- a/misc/fuse2fs.c +++ b/misc/fuse2fs.c @@ -24,6 +24,7 @@ #include <sys/ioctl.h> #include <unistd.h> #include <ctype.h> +#include <stdbool.h> #define FUSE_DARWIN_ENABLE_EXTENSIONS 0 #ifdef __SET_FOB_FOR_FUSE # error Do not set magic value __SET_FOB_FOR_FUSE!!!! @@ -236,6 +237,8 @@ struct fuse2fs { int directio; int acl; int dirsync; + int unmount_in_destroy; + int noblkdev; int logfd; int blocklog; @@ -967,6 +970,11 @@ static errcode_t fuse2fs_open(struct fuse2fs *ff, int libext2_flags) return 0; } +static inline bool fuse2fs_on_bdev(const struct fuse2fs *ff) +{ + return ff->fs->io->flags & CHANNEL_FLAGS_BLOCK_DEVICE; +} + static errcode_t fuse2fs_config_cache(struct fuse2fs *ff) { char buf[128]; @@ -1152,6 +1160,9 @@ static void op_destroy(void *p EXT2FS_ATTR((unused))) log_printf(ff, "%s %s.\n", _("unmounting filesystem"), uuid); } + if (ff->unmount_in_destroy) + fuse2fs_unmount(ff); + fuse2fs_finish(ff, 0); } @@ -4998,6 +5009,7 @@ static struct fuse_opt fuse2fs_opts[] = { #ifdef HAVE_CLOCK_MONOTONIC FUSE2FS_OPT("timing", timing, 1), #endif + FUSE2FS_OPT("noblkdev", noblkdev, 1), FUSE_OPT_KEY("user_xattr", FUSE2FS_IGNORED), FUSE_OPT_KEY("noblock_validity", FUSE2FS_IGNORED), @@ -5143,6 +5155,18 @@ static unsigned long long default_cache_size(void) return ret; } +static inline bool fuse2fs_want_fuseblk(const struct fuse2fs *ff) +{ + if (ff->noblkdev) + return false; + + /* libfuse won't let non-root do fuseblk mounts */ + if (getuid() != 0) + return false; + + return fuse2fs_on_bdev(ff); +} + int main(int argc, char *argv[]) { struct fuse_args args = FUSE_ARGS_INIT(argc, argv); @@ -5201,6 +5225,28 @@ int main(int argc, char *argv[]) goto out; } + if (fuse2fs_want_fuseblk(&fctx)) { + /* + * If this is a block device, we want to close the fs, reopen + * the block device in non-exclusive mode, and start the fuse + * driver in fuseblk mode (which will reopen the block device + * in exclusive mode) so that unmount will wait until + * op_destroy completes. + */ + fuse2fs_unmount(&fctx); + err = fuse2fs_open(&fctx, 0); + if (err) { + ret = 32; + goto out; + } + + /* "blkdev" is the magic mount option for fuseblk mode */ + snprintf(extra_args, BUFSIZ, "-oblkdev,blksize=%u", + fctx.fs->blocksize); + fuse_opt_add_arg(&args, extra_args); + fctx.unmount_in_destroy = 1; + } + if (!fctx.cache_size) fctx.cache_size = default_cache_size(); if (fctx.cache_size) { ^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-10-17 23:30 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-09-15 23:59 [PATCHSET 4/6] fuse2fs: use fuseblk mode Darrick J. Wong 2025-09-16 0:03 ` [PATCH 1/2] fuse2fs: mount norecovery if main block device is readonly Darrick J. Wong 2025-10-16 19:34 ` Dave Dykstra 2025-10-17 19:38 ` Darrick J. Wong 2025-10-17 20:20 ` Dave Dykstra 2025-10-17 23:30 ` Darrick J. Wong 2025-09-16 0:03 ` [PATCH 2/2] fuse2fs: use fuseblk mode for mounting filesystems Darrick J. Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox