public inbox for linux-fsdevel@vger.kernel.org
From: "Darrick J. Wong" <djwong@kernel.org>
To: Amir Goldstein <amir73il@gmail.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-ext4 <linux-ext4@vger.kernel.org>,
	fuse-devel <fuse-devel@lists.linux.dev>,
	Miklos Szeredi <miklos@szeredi.hu>,
	Bernd Schubert <bernd@bsbernd.com>,
	Joanne Koong <joannelkoong@gmail.com>,
	Theodore Ts'o <tytso@mit.edu>, Neal Gompa <neal@gompa.dev>,
	Christian Brauner <brauner@kernel.org>,
	demiobenour@gmail.com, Naoki MATSUMOTO <m.naoki9911@gmail.com>
Subject: Re: [PATCHBOMB v5] fuse/libfuse/e2fsprogs/etc: containerize ext4 for safer operation
Date: Thu, 23 Apr 2026 07:50:26 -0700
Message-ID: <20260423145026.GC3778109@frogsfrogsfrogs>
In-Reply-To: <CAOQ4uxhEmbFWuM80F+Tq7TbDWjrL-znkwu=O=P7Ng2P5dksXsg@mail.gmail.com>

On Thu, Apr 23, 2026 at 10:44:31AM +0200, Amir Goldstein wrote:
> On Thu, Apr 23, 2026 at 1:15 AM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > Hi everyone,
> >
> > This *would have been* the eighth public draft of the gigantic patchset
> > to connect the Linux fuse driver to fs-iomap for regular file IO
> > operations to and from files whose contents persist to locally attached
> > storage devices.
> >
> > However, the previous submission was too large, and I didn't even send
> > half the patches!  I have therefore split the work into two sections.
> > This first section covers setting up fuse servers to run as contained
> > systemd services; I previously sent only the libfuse changes, without
> > any of the surrounding pieces.  Now I'm ready to send them all.
> >
> > To summarize this patchbomb: fuse servers can now run as non-root users,
> > with no privilege, no access to the network or hardware, etc.  The only
> > connection to the outside is an ephemeral AF_UNIX socket.  The process
> > on the other end is a helper program that acquires resources and calls
> > fsmount().
> >
> > Why would you want to do that?  Most filesystem drivers are seriously
> > vulnerable to metadata parsing attacks, as syzbot has shown repeatedly
> > over almost a decade of its existence.  Faulty code can lead to total
> > kernel compromise, and I think there's a very strong incentive to move
> > all that parsing out to userspace where we can containerize the fuse
> > server process.  Runtime filesystem metadata parsing is no longer a
> > privileged (== risky) operation.
> >
> > The consequence of a crashed driver is a dead mount, instead of a
> > crashed or corrupt OS kernel.
> >
> > Note that contained fuse filesystem servers are no faster than regular
> > fuse.  The redesign of the fuse IO path via iomap will be the subject of
> > the second patchbomb.  The containerization code only requires changes
> > to libfuse and is ready to go today.
> >
> > Since the seventh submission, I have made the following changes:
> >
> > 1) Added a couple of simple fuse service drivers to the example code
> >
> > 2) Adapted fuservicemount to be runnable as a setuid program so that
> > unprivileged users can start up a containerized filesystem driver
> >
> > 3) Fixed some endianness handling errors in the socket protocol between
> > the new mount helper and the fuse server
> >
> > 4) Added a high level fuse_main function so that fuse servers that use
> > the high level api can containerize without a total rewrite
> >
> > 5) Adapted mount.fuse to call the new mount helper code so that mount -t
> > fuse.XXX can try to start up a contained server
> >
> > 6) Cleaned up a lot of cppcheck complaints and refactored a bunch of
> > repetitious code
> >
> > 7) Started using codex to try to find bugs and security problems with
> > the new mount helper
> >
> > There are a few unanswered questions:
> >
> > a. How to integrate with the SYNC_INIT patches that Bernd is working on
> > merging into libfuse
> >
> > b. If /any/ of the new fsopen/fsconfig/fsmount/move_mount calls fail,
> > do we fall back to the old mount syscall?  Even after printing errors?
> >
> > c. Are there any Linux systems where some inetd implementation can
> > actually handle AF_UNIX sockets?  Does it make sense to try to do the
> > service isolation without the convenience of systemd directives?
> 
> A large part of the world is running container workloads on kubernetes
> and my understanding is that k8s does not mix well with systemd.
> 
> We have successfully used the fusermount3-proxy [1] approach by Naoki
> MATSUMOTO as a way for unprivileged containers to delegate fuse mount
> by a (non-systemd) service, running in another container.
> 
> [1] https://github.com/pfnet-research/meta-fuse-csi-plugin#fusermount3-proxy-modified-fusermount3-approach
> 
> The README says that sshfs, s3fs and other high profile fuse fs have been
> tested with this approach and they do not require any rebuild.
> 
> So it raises the question...
> 
> >
> > d. meson/autoconf/cmake are a pain to deal with, hopefully the changes I
> > made are correct
> >
> > I have also converted a handful more fuse servers (fat, exfat, iso,
> > http) to the new service architecture so that I can run a (virtual)
> > Debian system with EFI completely off of containerized fuse servers.
> > These will be sent at the end.
> >
> 
> ... what is the added value of rebuilding those packages with systemd
> service support?
> 
> I am not implying that there is no added value; I am just not well versed
> in the world of containers and system services.

From the discussion of fusermount3-proxy:
https://github.com/pfnet-research/meta-fuse-csi-plugin/raw/main/assets/inside-fusermount3-proxy.png
or
https://tech.preferred.jp/wp-content/uploads/2023/11/figures-6.png

Their approach spins up a second "CSI driver pod" (aka another contained
environment) to run fusermount3 with CAP_SYS_ADMIN.  This means that the
"user pod" has to be able to access all resources necessary to mount the
filesystem, e.g. virtual disk images, actual block devices, networking,
etc.  Once the fuse server is running, it's obviously still running in
the same environment as the user pod.

With my approach, resource acquisition can be done up front, and the
fuse server can run in a very sealed environment.  No block devices, no
networking, no /home, and a minimal root filesystem.  It's systemd, so
you can be more permissive with the environment if you'd like.

The downside is that it requires code changes in the fuse server because
open() won't work if you've trimmed the directory tree.  Hmm, there's
not much provision for sockets, maybe I need to extend the protocol.
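
To illustrate the general idea of acquiring resources up front (this is
only a sketch of the underlying technique, not the wire protocol from the
patchset): a helper can open a file and hand the open descriptor across
an AF_UNIX socket with SCM_RIGHTS, so the receiving side never needs to
call open() on a path it cannot see.  The function names, the b"fd" tag,
and the temporary "image" file below are all hypothetical; both ends run
in one process purely for demonstration.

```python
import os
import socket
import tempfile


def send_resource(sock, path):
    """Helper side (hypothetical): open the resource, pass the descriptor."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # socket.send_fds() wraps sendmsg() with an SCM_RIGHTS control message.
        socket.send_fds(sock, [b"fd"], [fd])
    finally:
        # The receiver gets its own duplicate, so the sender can close its copy.
        os.close(fd)


def recv_resource(sock):
    """Sealed side (hypothetical): receive a descriptor instead of open()."""
    msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    if msg != b"fd" or len(fds) != 1:
        raise RuntimeError("unexpected message from helper")
    return fds[0]


def demo():
    """Pass a file descriptor over a socketpair and read through it."""
    with tempfile.NamedTemporaryFile() as image:
        image.write(b"fake filesystem image")
        image.flush()
        helper, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
        with helper, server:
            send_resource(helper, image.name)
            fd = recv_resource(server)
        try:
            # Read via the received descriptor; no path lookup on this side.
            return os.pread(fd, 64, 0)
        finally:
            os.close(fd)
```

In a real deployment the two ends would be separate processes, with the
receiving one confined; the same mechanism can carry any kind of file
descriptor, which is presumably how socket support could be added too.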

A big roadblock: none of that code can be merged into libfuse because
it's all Apache 2.0 licensed, whereas libfuse is GPL2/LGPL2.  LLMwash
notwithstanding.

--D
