From: "Darrick J. Wong" <djwong@kernel.org>
To: tytso@mit.edu
Cc: John@groves.net, bernd@bsbernd.com,
linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
miklos@szeredi.hu, joannelkoong@gmail.com, neal@gompa.dev
Subject: [PATCH 10/19] fuse2fs: don't do file data block IO when iomap is enabled
Date: Wed, 20 Aug 2025 18:18:18 -0700
Message-ID: <175573713910.21970.597991894155936504.stgit@frogsfrogsfrogs>
In-Reply-To: <175573713645.21970.9783397720493472605.stgit@frogsfrogsfrogs>
When iomap is in use for the page cache, the kernel will take care of
all the file data block IO for us, including zeroing of punched ranges
and post-EOF bytes. fuse2fs only needs to do IO for inline data.
Therefore, set the NOBLOCKIO ext2_file flag so that libext2fs will not
do any regular file IO to or from disk blocks at all.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
misc/fuse2fs.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
misc/fuse4fs.c | 11 ++++++++-
2 files changed, 81 insertions(+), 2 deletions(-)
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index dcf002f380b843..588b0053f43c95 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -3158,15 +3158,72 @@ static int fuse2fs_punch_posteof(struct fuse2fs *ff, ext2_ino_t ino,
return 0;
}
+/*
+ * Decide if file IO for this inode can use iomap.
+ *
+ * It turns out that libfuse creates internal node ids that have nothing to do
+ * with the ext2_ino_t that we give it. These internal node ids are what
+ * actually gets igetted in the kernel, which means that there can be multiple
+ * fuse_inode objects in the kernel for a single hardlinked ondisk ext2 inode.
+ *
+ * What this means, horrifyingly, is that on a fuse filesystem that supports
+ * hard links, the in-kernel i_rwsem does not protect against concurrent writes
+ * between files that point to the same inode. That in turn means that the
+ * file mode and size can get desynchronized between the multiple fuse_inode
+ * objects. This also means that we cannot cache iomaps in the kernel AT ALL
+ * because the caches will get out of sync, leading to WARN_ONs from the iomap
+ * zeroing code and probably data corruption after that.
+ *
+ * Therefore, libfuse won't let us create hardlinks of iomap files, and we must
+ * never turn on iomap for existing hardlinked files. Long term it means we
+ * have to find a way around this loss of functionality. fuse4fs gets around
+ * this by being a low level fuse driver and controlling the nodeids itself.
+ *
+ * Returns 0 for no, 1 for yes, or a negative errno.
+ */
+#ifdef HAVE_FUSE_IOMAP
+static int fuse2fs_file_uses_iomap(struct fuse2fs *ff, ext2_ino_t ino)
+{
+ struct stat statbuf;
+ int ret;
+
+ if (!fuse2fs_iomap_enabled(ff))
+ return 0;
+
+ ret = stat_inode(ff->fs, ino, &statbuf);
+ if (ret)
+ return ret;
+
+ /* the kernel handles all block IO for us in iomap mode */
+ return fuse_fs_can_enable_iomap(&statbuf);
+}
+#else
+# define fuse2fs_file_uses_iomap(...) (0)
+#endif
+
static int fuse2fs_truncate(struct fuse2fs *ff, ext2_ino_t ino, off_t new_size)
{
ext2_filsys fs = ff->fs;
ext2_file_t file;
__u64 old_isize;
errcode_t err;
+ int flags = EXT2_FILE_WRITE;
int ret = 0;
- err = ext2fs_file_open(fs, ino, EXT2_FILE_WRITE, &file);
+ /* the kernel handles all eof zeroing for us in iomap mode */
+ ret = fuse2fs_file_uses_iomap(ff, ino);
+ switch (ret) {
+ case 0:
+ break;
+ case 1:
+ flags |= EXT2_FILE_NOBLOCKIO;
+ ret = 0;
+ break;
+ default:
+ return ret;
+ }
+
+ err = ext2fs_file_open(fs, ino, flags, &file);
if (err)
return translate_error(fs, ino, err);
@@ -3324,6 +3381,19 @@ static int __op_open(struct fuse2fs *ff, const char *path,
goto out;
}
+ /* the kernel handles all block IO for us in iomap mode */
+ ret = fuse2fs_file_uses_iomap(ff, file->ino);
+ switch (ret) {
+ case 0:
+ break;
+ case 1:
+ file->open_flags |= EXT2_FILE_NOBLOCKIO;
+ ret = 0;
+ break;
+ default:
+ goto out;
+ }
+
if (fp->flags & O_TRUNC) {
ret = fuse2fs_truncate(ff, file->ino, 0);
if (ret)
diff --git a/misc/fuse4fs.c b/misc/fuse4fs.c
index 3082c23e398adf..e08c5af5abfd27 100644
--- a/misc/fuse4fs.c
+++ b/misc/fuse4fs.c
@@ -3375,9 +3375,14 @@ static int fuse4fs_truncate(struct fuse4fs *ff, ext2_ino_t ino, off_t new_size)
ext2_file_t file;
__u64 old_isize;
errcode_t err;
+ int flags = EXT2_FILE_WRITE;
int ret = 0;
- err = ext2fs_file_open(fs, ino, EXT2_FILE_WRITE, &file);
+ /* the kernel handles all eof zeroing for us in iomap mode */
+ if (fuse4fs_iomap_enabled(ff))
+ flags |= EXT2_FILE_NOBLOCKIO;
+
+ err = ext2fs_file_open(fs, ino, flags, &file);
if (err)
return translate_error(fs, ino, err);
@@ -3472,6 +3477,10 @@ static int fuse4fs_open_file(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
if (linked)
check |= L_OK;
+ /* the kernel handles all block IO for us in iomap mode */
+ if (fuse4fs_iomap_enabled(ff))
+ file->open_flags |= EXT2_FILE_NOBLOCKIO;
+
/*
* If the caller wants to truncate the file, we need to ask for full
* write access even if the caller claims to be appending.