public inbox for linux-ext4@vger.kernel.org
* [RFC v5] fuse: containerize ext4 for safer operation
@ 2025-09-16  0:07 Darrick J. Wong
From: Darrick J. Wong @ 2025-09-16  0:07 UTC
  To: linux-fsdevel
  Cc: Miklos Szeredi, Bernd Schubert, Joanne Koong, John Groves,
	Josef Bacik, linux-ext4, Theodore Ts'o, Neal Gompa,
	Amir Goldstein, Christian Brauner, Jeff Layton

Hi everyone,

[Ok maybe it's time to merge some of this stuff.  I'm removing the RFC
tag, but most likely the only patches that should get merged at this
point are the bugfixes at the start.  Don't merge the rest until after
the 2025 LTS kernel merge window closes, please.]

This is the fifth public draft of a prototype to connect the Linux fuse
driver to fs-iomap for regular file IO operations to and from files
whose contents persist to locally attached storage devices.  With this
release, I show that it's possible to build a fuse server for a real
filesystem (ext4) that runs entirely in userspace yet maintains most of
its performance.  Furthermore, I also show that the userspace program
runs with minimal privilege, which means that we no longer need to have
filesystem metadata parsing be a privileged (== risky) operation.

Why would you want to do that?  Most filesystem drivers are seriously
vulnerable to metadata parsing attacks, as syzbot has shown repeatedly
over almost a decade of its existence.  Faulty code can lead to total
kernel compromise, and I think there's a very strong incentive to move
all that parsing out to userspace where we can containerize the fuse
server process.

willy's folios conversion project (and to a certain degree RH's new
mount API) have also demonstrated that treewide changes to the core
mm/pagecache/fs code are very very difficult to pull off and take years
because you have to understand every filesystem's bespoke use of that
core code.  Eeeugh.

The fuse command plumbing is very simple -- the ->iomap_begin,
->iomap_end, and iomap ->ioend calls within iomap are turned into
upcalls to the fuse server via a trio of new fuse commands.  Pagecache
writeback is now a directio write.  The fuse server is now able to
upsert mappings into the kernel for cached access (== zero upcalls for
rereads and pure overwrites!) and the iomap cache revalidation code
works.
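
As an illustration only (this is not code from the series), the "zero
upcalls for rereads" behaviour can be sketched in plain userspace C.
Every name below (toy_mapping, toy_cache, toy_server_begin,
toy_iomap_begin) is hypothetical, and the real fuse commands and their
wire format are not shown in this letter.  The sketch assumes a lookup
that hits a cached mapping never reaches the server, while a miss costs
one upcall whose result is upserted into the cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping: file range [offset, offset + length) -> device addr. */
struct toy_mapping {
	uint64_t offset;
	uint64_t length;
	uint64_t addr;
};

#define TOY_CACHE_SLOTS 8

struct toy_cache {
	struct toy_mapping slots[TOY_CACHE_SLOTS];
	size_t nr;
	unsigned int upcalls;	/* number of times the server had to be asked */
};

/* Stand-in for the fuse server: 1MiB extents, placed 1GiB into the device. */
static struct toy_mapping toy_server_begin(uint64_t pos)
{
	const uint64_t extent = 1ULL << 20;
	struct toy_mapping m = {
		.offset = pos & ~(extent - 1),
		.length = extent,
		.addr = (1ULL << 30) + (pos & ~(extent - 1)),
	};

	return m;
}

/* Insert a mapping, or overwrite a cached one that starts at the same offset. */
static void toy_cache_upsert(struct toy_cache *c, const struct toy_mapping *m)
{
	for (size_t i = 0; i < c->nr; i++) {
		if (c->slots[i].offset == m->offset) {
			c->slots[i] = *m;
			return;
		}
	}
	/* A full cache drops the mapping; the next lookup simply upcalls again. */
	if (c->nr < TOY_CACHE_SLOTS)
		c->slots[c->nr++] = *m;
}

/* Analogue of ->iomap_begin: try the cache first, upcall only on a miss. */
static struct toy_mapping toy_iomap_begin(struct toy_cache *c, uint64_t pos)
{
	struct toy_mapping m;

	for (size_t i = 0; i < c->nr; i++) {
		const struct toy_mapping *hit = &c->slots[i];

		if (pos >= hit->offset && pos - hit->offset < hit->length)
			return *hit;
	}

	c->upcalls++;
	m = toy_server_begin(pos);
	toy_cache_upsert(c, &m);
	return m;
}

/* Touch one extent three times, then a second extent; return upcall count. */
static unsigned int toy_demo(void)
{
	struct toy_cache c = { .nr = 0, .upcalls = 0 };

	toy_iomap_begin(&c, 4096);
	toy_iomap_begin(&c, 8192);
	toy_iomap_begin(&c, 4096);
	toy_iomap_begin(&c, (1ULL << 20) + 512);
	return c.upcalls;
}
```

Calling toy_demo() performs four lookups but only two upcalls, one per
distinct extent.  The sketch ignores revalidation entirely; a real cache
also has to notice when a cached mapping has gone stale.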

At this stage I still get about 95% of the kernel ext4 driver's
streaming directio performance, and 110% of its
streaming buffered IO performance.  Random buffered IO is about 85% as
fast as the kernel.  Random direct IO is about 80% as fast as the
kernel; see the cover letter for the fuse2fs iomap changes for more
details.  Unwritten extent conversions on random direct writes are
especially painful for fuse+iomap (~90% more overhead) due to upcall
overhead.  And that's with (now dynamic) debugging turned on!

These items have been addressed since the fourth RFC:

1. After six months, I have achieved my primary goal: a containerized
   filesystem server!  We can now run fuse4fs as a completely
   unprivileged and namespace-restricted systemd service on behalf of
   anyone who can open a file and mount it.  Many thanks again to
   Christian (and Miklos and Bernd and Amir) for their help!

   Someone who knows how to design socket-based protocols ought to have
   a look at the libfuse changes.  The mount helper and the fuse server
   communicate via an AF_UNIX socket, which enables the mount helper to
   pass resources into the service container.

2. I took a stab at implementing fsdax.  I then encountered the horror
   that is dax_writeback_mapping_range and abandoned that work.
   Writeback needs to iterate the file mappings and not make assumptions
   about the backing device ... but that's not a problem that anyone
   here needs to solve.

3. struct fuse_inode shrank after I verified that the iomap fileio paths
   never have to venture into the regular or wb cache paths.

4. fstests passes 99% of the tests that run, when iomap is enabled!
   96% pass when iomap is disabled, and I think that's due to some
   bugs in fstests.

5. Some VFS iflags (sync/immutable/append) now work.

6. iomap and passthrough share the backing file management code.  They
   are not expected to share backing files.
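
For readers unfamiliar with the mechanism behind item 1, passing
resources over an AF_UNIX socket is usually done with SCM_RIGHTS
ancillary data.  The letter does not say which resources are passed or
what the libfuse protocol looks like, so the following is only a generic
sketch of descriptor passing with hypothetical helper names
(toy_send_fd, toy_recv_fd, toy_fd_passing_demo), not the protocol used
by the patches.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union toy_ctl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send one open file descriptor across an AF_UNIX socket.  0 or -1. */
static int toy_send_fd(int sock, int fd)
{
	char byte = 'F';	/* one byte of real data must carry the fd */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union toy_ctl u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor.  Returns the new fd, or -1 on failure. */
static int toy_recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union toy_ctl u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Pass the read end of a pipe over a socketpair; read through the copy. */
static int toy_fd_passing_demo(void)
{
	int sv[2], pfd[2], got = -1, ok = 0;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return 0;
	if (pipe(pfd) < 0) {
		close(sv[0]);
		close(sv[1]);
		return 0;
	}

	if (write(pfd[1], "x", 1) == 1 && toy_send_fd(sv[0], pfd[0]) == 0)
		got = toy_recv_fd(sv[1]);
	if (got >= 0 && read(got, &c, 1) == 1 && c == 'x')
		ok = 1;

	if (got >= 0)
		close(got);
	close(pfd[0]);
	close(pfd[1]);
	close(sv[0]);
	close(sv[1]);
	return ok;
}
```

In a real deployment the sender and receiver would be separate
processes (for example a mount helper outside a container and a server
inside it); the single-process demo only keeps the example short.  The
receiver gets its own descriptor referring to the same open file, which
is what lets an unprivileged process use a resource it could not have
opened itself.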

There are some major warts remaining:

a. I would like to start a discussion about how the design review of
   this code should be structured, and about how I might go about
   creating new userspace filesystem servers.  Should they be
   lightweight new ones based off the existing userspace tools?  Or
   should they come from merging lklfuse?

b. No design review document yet.

c. Why aren't we at 100% fstests passing?  Even with the kernel ext4?

d. I'm not 100% certain that the code that handles EOF zeroing actually
   works correctly.  Does fuse+iomap need to track both the server's
   and the VFS' notion of EOF the same way that XFS does?

e. ext4 doesn't support out of place writes so I don't know if that
   actually works correctly.

f. fuse2fs doesn't support the ext4 journal.  Urk.

g. There's a VERY large quantity of fuse2fs improvements that need to be
   applied before we get to the fuse-iomap parts.  I'm not sending these
   (or the fstests changes) to keep the size of the patchbomb at
   "unreasonably large". :P

I'll work on these in October, but now you all have an alpha-complete
demonstration to take a look at.

--Darrick

end of thread, newest message: ~2025-10-08 22:09 UTC

Thread overview: 87+ messages
2025-09-16  0:07 [RFC v5] fuse: containerize ext4 for safer operation Darrick J. Wong
2025-09-16  0:21 ` [PATCHSET 1/9] fuse2fs: upgrade to libfuse 3.17 Darrick J. Wong
2025-09-16  0:49   ` [PATCH 1/4] fuse2fs: bump library version Darrick J. Wong
2025-09-16  0:50   ` [PATCH 2/4] fuse2fs: wrap the fuse_set_feature_flag helper for older libfuse Darrick J. Wong
2025-09-16  0:50   ` [PATCH 3/4] fuse2fs: disable nfs exports Darrick J. Wong
2025-09-16  0:50   ` [PATCH 4/4] fuse2fs: drop fuse 2.x support code Darrick J. Wong
2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
2025-09-16  0:50   ` [PATCH 01/21] fuse2fs: separate libfuse3 and fuse2fs detection in configure Darrick J. Wong
2025-09-16  0:51   ` [PATCH 02/21] fuse2fs: start porting fuse2fs to lowlevel libfuse API Darrick J. Wong
2025-09-16  0:51   ` [PATCH 03/21] debian: create new package for fuse4fs Darrick J. Wong
2025-09-16  0:51   ` [PATCH 04/21] fuse4fs: namespace some helpers Darrick J. Wong
2025-09-16  0:51   ` [PATCH 05/21] fuse4fs: convert to low level API Darrick J. Wong
2025-09-16  0:52   ` [PATCH 06/21] libsupport: port the kernel list.h to libsupport Darrick J. Wong
2025-09-16  0:52   ` [PATCH 07/21] libsupport: add a cache Darrick J. Wong
2025-09-16  0:52   ` [PATCH 08/21] cache: disable debugging Darrick J. Wong
2025-09-16  0:53   ` [PATCH 09/21] cache: use modern list iterator macros Darrick J. Wong
2025-09-16  0:53   ` [PATCH 10/21] cache: embed struct cache in the owner Darrick J. Wong
2025-09-16  0:53   ` [PATCH 11/21] cache: pass cache pointer to callbacks Darrick J. Wong
2025-09-16  0:53   ` [PATCH 12/21] cache: pass a private data pointer through cache_walk Darrick J. Wong
2025-09-16  0:54   ` [PATCH 13/21] cache: add a helper to grab a new refcount for a cache_node Darrick J. Wong
2025-09-16  0:54   ` [PATCH 14/21] cache: return results of a cache flush Darrick J. Wong
2025-09-16  0:54   ` [PATCH 15/21] cache: add a "get only if incore" flag to cache_node_get Darrick J. Wong
2025-09-16  0:54   ` [PATCH 16/21] cache: support gradual expansion Darrick J. Wong
2025-09-16  0:55   ` [PATCH 17/21] cache: implement automatic shrinking Darrick J. Wong
2025-09-16  0:55   ` [PATCH 18/21] fuse4fs: add cache to track open files Darrick J. Wong
2025-09-16  0:55   ` [PATCH 19/21] fuse4fs: use the orphaned inode list Darrick J. Wong
2025-09-16  0:55   ` [PATCH 20/21] fuse4fs: implement FUSE_TMPFILE Darrick J. Wong
2025-09-16  0:56   ` [PATCH 21/21] fuse4fs: create incore reverse orphan list Darrick J. Wong
2025-09-16  0:22 ` [PATCHSET RFC v5 3/9] libext2fs: refactoring for fuse2fs iomap support Darrick J. Wong
2025-09-16  0:56   ` [PATCH 01/10] libext2fs: make it possible to extract the fd from an IO manager Darrick J. Wong
2025-09-16  0:56   ` [PATCH 02/10] libext2fs: always fsync the device when flushing the cache Darrick J. Wong
2025-09-16  0:56   ` [PATCH 03/10] libext2fs: always fsync the device when closing the unix IO manager Darrick J. Wong
2025-09-16  0:57   ` [PATCH 04/10] libext2fs: only fsync the unix fd if we wrote to the device Darrick J. Wong
2025-09-16  0:57   ` [PATCH 05/10] libext2fs: invalidate cached blocks when freeing them Darrick J. Wong
2025-09-16  0:57   ` [PATCH 06/10] libext2fs: only flush affected blocks in unix_write_byte Darrick J. Wong
2025-09-16  0:57   ` [PATCH 07/10] libext2fs: allow unix_write_byte when the write would be aligned Darrick J. Wong
2025-09-16  0:58   ` [PATCH 08/10] libext2fs: allow clients to ask to write full superblocks Darrick J. Wong
2025-09-16  0:58   ` [PATCH 09/10] libext2fs: allow callers to disallow I/O to file data blocks Darrick J. Wong
2025-09-16  0:58   ` [PATCH 10/10] libext2fs: add posix advisory locking to the unix IO manager Darrick J. Wong
2025-10-08 22:09     ` Darrick J. Wong
2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
2025-09-16  0:58   ` [PATCH 01/17] fuse2fs: implement bare minimum iomap for file mapping reporting Darrick J. Wong
2025-09-16  0:59   ` [PATCH 02/17] fuse2fs: add iomap= mount option Darrick J. Wong
2025-09-16  0:59   ` [PATCH 03/17] fuse2fs: implement iomap configuration Darrick J. Wong
2025-09-16  0:59   ` [PATCH 04/17] fuse2fs: register block devices for use with iomap Darrick J. Wong
2025-09-16  1:00   ` [PATCH 05/17] fuse2fs: implement directio file reads Darrick J. Wong
2025-09-16  1:00   ` [PATCH 06/17] fuse2fs: add extent dump function for debugging Darrick J. Wong
2025-09-16  1:00   ` [PATCH 07/17] fuse2fs: implement direct write support Darrick J. Wong
2025-09-16  1:00   ` [PATCH 08/17] fuse2fs: turn on iomap for pagecache IO Darrick J. Wong
2025-09-16  1:01   ` [PATCH 09/17] fuse2fs: don't zero bytes in punch hole Darrick J. Wong
2025-09-16  1:01   ` [PATCH 10/17] fuse2fs: don't do file data block IO when iomap is enabled Darrick J. Wong
2025-09-16  1:01   ` [PATCH 11/17] fuse2fs: avoid fuseblk mode if fuse-iomap support is likely Darrick J. Wong
2025-09-16  1:01   ` [PATCH 12/17] fuse2fs: enable file IO to inline data files Darrick J. Wong
2025-09-16  1:02   ` [PATCH 13/17] fuse2fs: set iomap-related inode flags Darrick J. Wong
2025-09-16  1:02   ` [PATCH 14/17] fuse2fs: configure block device block size Darrick J. Wong
2025-09-16  1:02   ` [PATCH 15/17] fuse4fs: separate invalidation Darrick J. Wong
2025-09-16  1:02   ` [PATCH 16/17] fuse2fs: implement statx Darrick J. Wong
2025-09-16  1:03   ` [PATCH 17/17] fuse2fs: enable atomic writes Darrick J. Wong
2025-09-16  0:22 ` [PATCHSET RFC v5 5/9] fuse4fs: specify the root node id Darrick J. Wong
2025-09-16  1:03   ` [PATCH 1/1] fuse4fs: don't use inode number translation when possible Darrick J. Wong
2025-09-16  0:23 ` [PATCHSET RFC v5 6/9] fuse2fs: handle timestamps and ACLs correctly when iomap is enabled Darrick J. Wong
2025-09-16  1:03   ` [PATCH 01/10] fuse2fs: add strictatime/lazytime mount options Darrick J. Wong
2025-09-16  1:03   ` [PATCH 02/10] fuse2fs: skip permission checking on utimens when iomap is enabled Darrick J. Wong
2025-09-16  1:04   ` [PATCH 03/10] fuse2fs: let the kernel tell us about acl/mode updates Darrick J. Wong
2025-09-16  1:04   ` [PATCH 04/10] fuse2fs: better debugging for file mode updates Darrick J. Wong
2025-09-16  1:04   ` [PATCH 05/10] fuse2fs: debug timestamp updates Darrick J. Wong
2025-09-16  1:05   ` [PATCH 06/10] fuse2fs: use coarse timestamps for iomap mode Darrick J. Wong
2025-09-16  1:05   ` [PATCH 07/10] fuse2fs: add tracing for retrieving timestamps Darrick J. Wong
2025-09-16  1:05   ` [PATCH 08/10] fuse2fs: enable syncfs Darrick J. Wong
2025-09-16  1:05   ` [PATCH 09/10] fuse2fs: skip the gdt write in op_destroy if syncfs is working Darrick J. Wong
2025-09-16  1:06   ` [PATCH 10/10] fuse2fs: set sync, immutable, and append at file load time Darrick J. Wong
2025-09-16  0:23 ` [PATCHSET RFC v5 7/9] fuse2fs: cache iomap mappings for even better file IO performance Darrick J. Wong
2025-09-16  1:06   ` [PATCH 1/3] fuse2fs: enable caching of iomaps Darrick J. Wong
2025-09-16  1:06   ` [PATCH 2/3] fuse2fs: be smarter about caching iomaps Darrick J. Wong
2025-09-16  1:06   ` [PATCH 3/3] fuse2fs: enable iomap Darrick J. Wong
2025-09-16  0:23 ` [PATCHSET RFC v5 8/9] fuse2fs: improve block and inode caching Darrick J. Wong
2025-09-16  1:07   ` [PATCH 1/6] libsupport: add caching IO manager Darrick J. Wong
2025-09-16  1:07   ` [PATCH 2/6] iocache: add the actual buffer cache Darrick J. Wong
2025-09-16  1:07   ` [PATCH 3/6] iocache: bump buffer mru priority every 50 accesses Darrick J. Wong
2025-09-16  1:07   ` [PATCH 4/6] fuse2fs: enable caching IO manager Darrick J. Wong
2025-09-16  1:08   ` [PATCH 5/6] fuse2fs: increase inode cache size Darrick J. Wong
2025-09-16  1:08   ` [PATCH 6/6] libext2fs: improve caching for inodes Darrick J. Wong
2025-09-16  0:24 ` [PATCHSET RFC v5 9/9] fuse4fs: run servers as a contained service Darrick J. Wong
2025-09-16  1:08   ` [PATCH 1/4] libext2fs: fix MMP code to work with unixfd IO manager Darrick J. Wong
2025-09-16  1:08   ` [PATCH 2/4] fuse4fs: enable safe service mode Darrick J. Wong
2025-09-16  1:09   ` [PATCH 3/4] fuse4fs: set proc title when in fuse " Darrick J. Wong
2025-09-16  1:09   ` [PATCH 4/4] fuse4fs: set iomap backing device blocksize Darrick J. Wong
