public inbox for linux-ext4@vger.kernel.org
* [RFC v5] fuse: containerize ext4 for safer operation
@ 2025-09-16  0:07 Darrick J. Wong
  2025-09-16  0:21 ` [PATCHSET 1/9] fuse2fs: upgrade to libfuse 3.17 Darrick J. Wong
                   ` (8 more replies)
  0 siblings, 9 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:07 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Miklos Szeredi, Bernd Schubert, Joanne Koong, John Groves,
	Josef Bacik, linux-ext4, Theodore Ts'o, Neal Gompa,
	Amir Goldstein, Christian Brauner, Jeff Layton

Hi everyone,

[Ok maybe it's time to merge some of this stuff.  I'm removing the RFC
tag, but most likely the only patches that should get merged at this
point are the bugfixes at the start.  Don't merge the rest until after
the 2025 LTS kernel merge window closes, please.]

This is the fifth public draft of a prototype to connect the Linux fuse
driver to fs-iomap for regular file IO operations to and from files
whose contents persist to locally attached storage devices.  With this
release, I show that it's possible to build a fuse server for a real
filesystem (ext4) that runs entirely in userspace yet maintains most of
its performance.  Furthermore, I also show that the userspace program
runs with minimal privilege, which means that we no longer need to have
filesystem metadata parsing be a privileged (== risky) operation.

Why would you want to do that?  Most filesystem drivers are seriously
vulnerable to metadata parsing attacks, as syzbot has shown repeatedly
over almost a decade of its existence.  Faulty code can lead to total
kernel compromise, and I think there's a very strong incentive to move
all that parsing out to userspace where we can containerize the fuse
server process.

willy's folios conversion project (and to a certain degree RH's new
mount API) have also demonstrated that treewide changes to the core
mm/pagecache/fs code are very very difficult to pull off and take years
because you have to understand every filesystem's bespoke use of that
core code.  Eeeugh.

The fuse command plumbing is very simple -- the ->iomap_begin,
->iomap_end, and iomap ->ioend calls within iomap are turned into
upcalls to the fuse server via a trio of new fuse commands.  Pagecache
writeback is now a directio write.  The fuse server is now able to
upsert mappings into the kernel for cached access (== zero upcalls for
rereads and pure overwrites!) and the iomap cache revalidation code
works.

At this stage I still get about 95% of the kernel ext4 driver's
streaming directio performance and 110% of its streaming buffered IO
performance.  Random buffered IO is about 85% as fast as the kernel.
Random direct IO is about 80% as fast as the kernel; see the cover
letter for the fuse2fs iomap changes for more details.  Unwritten extent
conversions on random direct writes are especially painful for
fuse+iomap (~90% more overhead) because of the extra upcalls.  And
that's with (now dynamic) debugging turned on!

These items have been addressed since the fourth RFC:

1. After six months, I have achieved my primary goal: a containerized
   filesystem server!  We can now run fuse4fs as a completely
   unprivileged and namespace-restricted systemd service on behalf of
   anyone who can open a file and mount it.  Many thanks again to
   Christian (and Miklos and Bernd and Amir) for their help!

   Someone who knows how to design socket-based protocols ought to have
   a look at the libfuse changes.  The mount helper and the fuse server
   communicate via an AF_UNIX socket, which enables the mount helper to
   pass resources into the service container.

2. I took a stab at implementing fsdax.  I then encountered the horror
   that is dax_writeback_mapping_range and abandoned that work.
   Writeback needs to iterate the file mappings and not make assumptions
   about the backing device ... but that's not a problem that anyone
   here needs to solve.

3. struct fuse_inode shrank after I verified that the iomap fileio paths
   never have to venture into the regular or wb cache paths.

4. fstests passes 99% of the tests that run, when iomap is enabled!
   96% pass when iomap is disabled, and I think that's due to some
   bugs in fstests.

5. Some VFS iflags (sync/immutable/append) now work.

6. iomap and passthrough share the backing file management code.  They
   are not expected to share backing files.
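Item 1 above says that the mount helper passes resources into the
service container over an AF_UNIX socket.  The standard POSIX way to
hand an open file descriptor (say, the block device) to another process
is an SCM_RIGHTS control message.  The sketch below shows only that
mechanism; it is an assumption that the libfuse mount helper uses
exactly this, and the helper names send_fd/recv_fd are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one file descriptor across a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	char byte = 'F';	/* at least one byte of real data must go along */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;	/* forces correct alignment of buf */
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;

	memset(u.buf, 0, sizeof(u.buf));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd, or -1 on failure. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;
	int fd;

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver gets a new descriptor number that refers to the same open
file, so the unprivileged server never has to open the device itself.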

There are some major warts remaining:

a. I would like to start a discussion about how the design review of
   this code should be structured, and how I might go about creating new
   userspace filesystem servers -- lightweight new ones based off the
   existing userspace tools?  Or by merging lklfuse?

b. No design review document yet.

c. Why aren't we at 100% fstests passing?  Even with the kernel ext4?

d. I'm not 100% certain that the code that handles EOF zeroing actually
   works correctly.  Does fuse+iomap need to track both the server's
   and the VFS's notion of EOF the same way that XFS does?

e. ext4 doesn't support out-of-place writes, so I don't know if that
   actually works correctly.

f. fuse2fs doesn't support the ext4 journal.  Urk.

g. There's a VERY large quantity of fuse2fs improvements that need to be
   applied before we get to the fuse-iomap parts.  I'm not sending these
   (or the fstests changes) to keep the size of the patchbomb at
   "unreasonably large". :P

I'll work on these in October, but now you all have an alpha-complete
demonstration to take a look at.

--Darrick




^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCHSET 1/9] fuse2fs: upgrade to libfuse 3.17
  2025-09-16  0:07 [RFC v5] fuse: containerize ext4 for safer operation Darrick J. Wong
@ 2025-09-16  0:21 ` Darrick J. Wong
  2025-09-16  0:49   ` [PATCH 1/4] fuse2fs: bump library version Darrick J. Wong
                     ` (3 more replies)
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                   ` (7 subsequent siblings)
  8 siblings, 4 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:21 UTC (permalink / raw)
  To: tytso; +Cc: amir73il, linux-ext4

Hi all,

In preparation for hacking on fuse2fs and iomap, upgrade fuse2fs's
libfuse support to 3.17, which is the latest upstream release as of this
writing.  Drop support for libfuse2, which is now very obsolete.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

Comments and questions are, as always, welcome.

e2fsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/e2fsprogs.git/log/?h=fuse2fs-library-upgrade
---
Commits in this patchset:
 * fuse2fs: bump library version
 * fuse2fs: wrap the fuse_set_feature_flag helper for older libfuse
 * fuse2fs: disable nfs exports
 * fuse2fs: drop fuse 2.x support code
---
 configure      |  318 +++++---------------------------------------------------
 configure.ac   |   85 +++++----------
 misc/fuse2fs.c |  252 ++++++++++----------------------------------
 3 files changed, 115 insertions(+), 540 deletions(-)



* [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server
  2025-09-16  0:07 [RFC v5] fuse: containerize ext4 for safer operation Darrick J. Wong
  2025-09-16  0:21 ` [PATCHSET 1/9] fuse2fs: upgrade to libfuse 3.17 Darrick J. Wong
@ 2025-09-16  0:22 ` Darrick J. Wong
  2025-09-16  0:50   ` [PATCH 01/21] fuse2fs: separate libfuse3 and fuse2fs detection in configure Darrick J. Wong
                     ` (20 more replies)
  2025-09-16  0:22 ` [PATCHSET RFC v5 3/9] libext2fs: refactoring for fuse2fs iomap support Darrick J. Wong
                   ` (6 subsequent siblings)
  8 siblings, 21 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:22 UTC (permalink / raw)
  To: tytso
  Cc: amir73il, miklos, neal, amir73il, linux-fsdevel, linux-ext4, John,
	bernd, joannelkoong

Hi all,

Whilst developing the fuse2fs+iomap prototype, I discovered a
fundamental design limitation of the upper-level libfuse API: hardlinks.
The upper level fuse library really wants to communicate with the fuse
server with file paths, instead of using inode numbers.  This works
great for filesystems that don't have inodes, create files dynamically
at runtime, or lack stable inode numbers.

Unfortunately, the libfuse path abstraction assigns a unique nodeid to
every child file in the entire filesystem, without regard to hard links.
In other words, a hardlinked regular file may have one ondisk inode
number but multiple kernel inodes.  For classic fuse2fs this isn't a
problem because all file access goes through the fuse server and the big
library lock protects us from corruption.

For fuse2fs + iomap this is a disaster because we rely on the kernel to
coordinate access to inodes.  For hardlinked files, we *require* that
there only be one in-kernel inode for each ondisk inode.
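To make the hard link problem concrete, here is a toy nodeid allocator
that can key either on the path (which is effectively what the
upper-level libfuse does) or on the ondisk inode number (which a
lowlevel server such as fuse4fs can do).  Everything here (struct
node_table, get_nodeid, the fixed-size table) is hypothetical and
unrelated to libfuse's actual data structures.

```c
#include <stdint.h>
#include <string.h>

#define MAX_NODES 64

/* Hypothetical nodeid table; slot i corresponds to nodeid i + 2. */
struct node_table {
	int by_path;			/* 1: key on path, 0: key on inode number */
	size_t count;
	char paths[MAX_NODES][256];
	uint64_t inos[MAX_NODES];
};

/*
 * Return the nodeid for (path, ino), allocating a new one if no existing
 * entry matches the chosen key.  Nodeid 1 is reserved for the root in
 * FUSE, so this toy starts handing out nodeids at 2.  Returns 0 if full.
 */
static uint64_t get_nodeid(struct node_table *t, const char *path,
			   uint64_t ino)
{
	size_t i;

	for (i = 0; i < t->count; i++) {
		if (t->by_path ? strcmp(t->paths[i], path) == 0
			       : t->inos[i] == ino)
			return i + 2;
	}
	if (t->count == MAX_NODES)
		return 0;

	i = t->count++;
	strncpy(t->paths[i], path, sizeof(t->paths[i]) - 1);
	t->paths[i][sizeof(t->paths[i]) - 1] = '\0';
	t->inos[i] = ino;
	return i + 2;
}
```

With path keys, two hard links to the same ondisk inode get two
different nodeids (and therefore two kernel inodes); with inode keys,
both names resolve to one nodeid, which is what iomap needs.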

The path based mechanism is also very inefficient for fuse2fs.  Every
time a file is accessed, the upper level libfuse passes a new nodeid to
the kernel, and on every file access the kernel passes that same nodeid
back to libfuse.  libfuse then walks its internal directory entry cache
to construct a path string for that nodeid and hands it to fuse2fs.
fuse2fs then walks the ondisk directory structure to find the ext2 inode
number.  Every time.

Create a new fuse4fs server from fuse2fs that uses the lowlevel fuse
API.  This affords us direct control over nodeids and eliminates the
path wrangling.  Hardlinks can be supported when iomap is turned on,
and metadata-heavy workloads run twice as fast.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

Comments and questions are, as always, welcome.

e2fsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/e2fsprogs.git/log/?h=fuse4fs-fork
---
Commits in this patchset:
 * fuse2fs: separate libfuse3 and fuse2fs detection in configure
 * fuse2fs: start porting fuse2fs to lowlevel libfuse API
 * debian: create new package for fuse4fs
 * fuse4fs: namespace some helpers
 * fuse4fs: convert to low level API
 * libsupport: port the kernel list.h to libsupport
 * libsupport: add a cache
 * cache: disable debugging
 * cache: use modern list iterator macros
 * cache: embed struct cache in the owner
 * cache: pass cache pointer to callbacks
 * cache: pass a private data pointer through cache_walk
 * cache: add a helper to grab a new refcount for a cache_node
 * cache: return results of a cache flush
 * cache: add a "get only if incore" flag to cache_node_get
 * cache: support gradual expansion
 * cache: implement automatic shrinking
 * fuse4fs: add cache to track open files
 * fuse4fs: use the orphaned inode list
 * fuse4fs: implement FUSE_TMPFILE
 * fuse4fs: create incore reverse orphan list
---
 lib/ext2fs/jfs_compat.h  |    2 
 lib/ext2fs/kernel-list.h |  111 -
 lib/support/cache.h      |  177 +
 lib/support/list.h       |  901 +++++++
 lib/support/xbitops.h    |  128 +
 Makefile.in              |    3 
 configure                |  414 +--
 configure.ac             |  156 +
 debian/control           |   12 
 debian/fuse4fs.install   |    2 
 debian/rules             |   11 
 debugfs/Makefile.in      |   12 
 e2fsck/Makefile.in       |   56 
 fuse4fs/Makefile.in      |  193 +
 fuse4fs/fuse4fs.1.in     |  118 +
 fuse4fs/fuse4fs.c        | 6169 ++++++++++++++++++++++++++++++++++++++++++++++
 lib/config.h.in          |    3 
 lib/e2p/Makefile.in      |    4 
 lib/ext2fs/Makefile.in   |   14 
 lib/support/Makefile.in  |    8 
 lib/support/cache.c      |  853 ++++++
 misc/Makefile.in         |   18 
 misc/tune2fs.c           |    4 
 23 files changed, 8922 insertions(+), 447 deletions(-)
 delete mode 100644 lib/ext2fs/kernel-list.h
 create mode 100644 lib/support/cache.h
 create mode 100644 lib/support/list.h
 create mode 100644 lib/support/xbitops.h
 create mode 100644 debian/fuse4fs.install
 create mode 100644 fuse4fs/Makefile.in
 create mode 100644 fuse4fs/fuse4fs.1.in
 create mode 100644 fuse4fs/fuse4fs.c
 create mode 100644 lib/support/cache.c



* [PATCHSET RFC v5 3/9] libext2fs: refactoring for fuse2fs iomap support
  2025-09-16  0:07 [RFC v5] fuse: containerize ext4 for safer operation Darrick J. Wong
  2025-09-16  0:21 ` [PATCHSET 1/9] fuse2fs: upgrade to libfuse 3.17 Darrick J. Wong
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
@ 2025-09-16  0:22 ` Darrick J. Wong
  2025-09-16  0:56   ` [PATCH 01/10] libext2fs: make it possible to extract the fd from an IO manager Darrick J. Wong
                     ` (9 more replies)
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
                   ` (5 subsequent siblings)
  8 siblings, 10 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:22 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

Hi all,

In preparation for connecting fuse, iomap, and fuse2fs for a much more
performant file IO path, make some changes to the Unix IO manager in
libext2fs.  First, make filesystem flushes a lot more efficient by
eliding fsyncs when they're not necessary, and allow library clients to
turn off the racy code that writes the superblock byte by byte and, in
doing so, exposes stale checksums.
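The "only fsync the unix fd if we wrote to the device" idea can be
sketched as a dirty flag on the device handle: writes set it, and a
flush calls fsync() only when it is set.  This is a hypothetical
illustration (struct dev_handle, dev_open, dev_pwrite and dev_flush are
invented names), not the unix_io.c code.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper around the device fd, tracking unflushed writes. */
struct dev_handle {
	int fd;
	int dirty;		/* set by writes, cleared by a successful fsync */
	unsigned int fsyncs;	/* number of fsync calls issued, for illustration */
};

/* O_CREAT is only there so the sketch can be tried on an ordinary file. */
static int dev_open(struct dev_handle *d, const char *path)
{
	d->fd = open(path, O_RDWR | O_CREAT, 0600);
	d->dirty = 0;
	d->fsyncs = 0;
	return d->fd < 0 ? -errno : 0;
}

static ssize_t dev_pwrite(struct dev_handle *d, const void *buf, size_t len,
			  off_t off)
{
	ssize_t ret = pwrite(d->fd, buf, len, off);

	if (ret > 0)
		d->dirty = 1;
	return ret;
}

/* Flush: skip the expensive fsync if nothing was written since the last one. */
static int dev_flush(struct dev_handle *d)
{
	if (!d->dirty)
		return 0;
	d->fsyncs++;
	if (fsync(d->fd) < 0)
		return -errno;
	d->dirty = 0;
	return 0;
}
```

A read-only session, or a second flush with no intervening writes,
then costs nothing at the device level.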

XXX: The second part of this series adds IO tagging so that we could tag
IOs by inode number to distinguish file data blocks in cache from
everything else.  This is temporary scaffolding whilst we're in the
middle of adding directio and later buffered writes.  Once we can use the
pagecache for all file IO activity I think we could drop the back half
of this series.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

Comments and questions are, as always, welcome.

e2fsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/e2fsprogs.git/log/?h=libext2fs-iomap-prep
---
Commits in this patchset:
 * libext2fs: make it possible to extract the fd from an IO manager
 * libext2fs: always fsync the device when flushing the cache
 * libext2fs: always fsync the device when closing the unix IO manager
 * libext2fs: only fsync the unix fd if we wrote to the device
 * libext2fs: invalidate cached blocks when freeing them
 * libext2fs: only flush affected blocks in unix_write_byte
 * libext2fs: allow unix_write_byte when the write would be aligned
 * libext2fs: allow clients to ask to write full superblocks
 * libext2fs: allow callers to disallow I/O to file data blocks
 * libext2fs: add posix advisory locking to the unix IO manager
---
 lib/ext2fs/ext2_io.h         |   10 ++
 lib/ext2fs/ext2fs.h          |    4 +
 debian/libext2fs2t64.symbols |    2 
 lib/ext2fs/alloc_stats.c     |    6 +
 lib/ext2fs/closefs.c         |    7 ++
 lib/ext2fs/fileio.c          |   12 +++
 lib/ext2fs/io_manager.c      |   17 ++++
 lib/ext2fs/unix_io.c         |  180 ++++++++++++++++++++++++++++++++++++++++--
 8 files changed, 228 insertions(+), 10 deletions(-)



* [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance
  2025-09-16  0:07 [RFC v5] fuse: containerize ext4 for safer operation Darrick J. Wong
                   ` (2 preceding siblings ...)
  2025-09-16  0:22 ` [PATCHSET RFC v5 3/9] libext2fs: refactoring for fuse2fs iomap support Darrick J. Wong
@ 2025-09-16  0:22 ` Darrick J. Wong
  2025-09-16  0:58   ` [PATCH 01/17] fuse2fs: implement bare minimum iomap for file mapping reporting Darrick J. Wong
                     ` (16 more replies)
  2025-09-16  0:22 ` [PATCHSET RFC v5 5/9] fuse4fs: specify the root node id Darrick J. Wong
                   ` (4 subsequent siblings)
  8 siblings, 17 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:22 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

Hi all,

Switch fuse2fs to use the new iomap file data IO paths instead of
pushing it very slowly through the /dev/fuse connection.  For local
filesystems, all we have to do is respond to requests for file to device
mappings; the rest of the IO hot path stays within the kernel.  This
means that we can get rid of all file data block processing within
fuse2fs.
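Answering a mapping request is essentially an extent lookup.  The
sketch below shows the shape of that lookup over a sorted,
non-overlapping extent list, using byte offsets.  All names (struct
extent, struct mapping_reply, map_range, the MAP_* types) are
hypothetical; this does not reflect the actual FUSE_IOMAP_* message
layout or the libext2fs extent APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping types, loosely modeled on iomap's notions. */
enum map_type { MAP_HOLE, MAP_MAPPED, MAP_UNWRITTEN };

/* One extent: file bytes [file_off, file_off + len) live at dev_off. */
struct extent {
	uint64_t file_off;
	uint64_t dev_off;
	uint64_t len;
	int unwritten;
};

/* Hypothetical reply to an iomap_begin-style request. */
struct mapping_reply {
	enum map_type type;
	uint64_t file_off;	/* start of the returned mapping */
	uint64_t dev_off;	/* device byte address; unused for holes */
	uint64_t len;		/* bytes covered by this mapping */
};

/*
 * Map the range [pos, pos + count).  The reply covers only a prefix of
 * the request; the caller comes back for the rest.
 */
static struct mapping_reply map_range(const struct extent *ext, size_t n,
				      uint64_t pos, uint64_t count)
{
	struct mapping_reply r = { MAP_HOLE, pos, 0, count };

	for (size_t i = 0; i < n; i++) {
		uint64_t end = ext[i].file_off + ext[i].len;

		if (pos >= end)
			continue;	/* extent lies entirely before pos */
		if (pos < ext[i].file_off) {
			/* hole that stops at the start of this extent */
			uint64_t gap = ext[i].file_off - pos;

			r.len = gap < count ? gap : count;
			return r;
		}
		/* pos falls inside this extent */
		uint64_t avail = end - pos;

		r.type = ext[i].unwritten ? MAP_UNWRITTEN : MAP_MAPPED;
		r.dev_off = ext[i].dev_off + (pos - ext[i].file_off);
		r.len = avail < count ? avail : count;
		return r;
	}
	return r;	/* beyond the last extent: hole for the whole range */
}
```

For example, with a written extent covering file bytes 0-8191 at device
offset 1MiB, map_range(ext, n, 4096, 65536) returns a MAP_MAPPED reply
of 4096 bytes at device offset 1052672.  As with iomap's own iteration,
the kernel would then ask again starting at byte 8192.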

Because we're not pinning dirty pages through a potentially slow network
connection, we don't need the heavy BDI throttling for which most fuse
servers have become infamous.  Yes, mapping lookups for writeback can
stall, but mappings are small compared to data and this situation
exists for all kernel filesystems as well.

The performance of this new data path is quite stunning: on a warm
system, streaming reads and writes through the pagecache go from
60-90MB/s to 2-2.5GB/s.  Direct IO reads and writes improve from the
same baseline to 2.5-8GB/s.  FIEMAP and SEEK_DATA/SEEK_HOLE now work
too.  The kernel ext4 driver can manage about 1.6GB/s for pagecache IO
and about 2.6-8.5GB/s for direct IO, which means that fuse2fs is about
as fast as the kernel for streaming file IO.

Random 4k buffered IO is not so good: plain fuse2fs pokes along at
25-50MB/s, whereas fuse2fs with iomap manages 90-1300MB/s.  The kernel
can do 900-1300MB/s.  Random directio is worse: plain fuse2fs does
20-30MB/s, fuse-iomap does about 30-35MB/s, and the kernel does
40-55MB/s.  I suspect that metadata-heavy workloads do not perform well
on fuse2fs because libext2fs wasn't designed for that and it doesn't
even have a journal to absorb all the fsync writes.  We also probably
need iomap caching really badly.

These performance numbers are slanted: my machine is 12 years old, and
fuse2fs is VERY poorly optimized for performance.  It contains a single
Big Filesystem Lock which nukes multi-threaded scalability.  There's no
inode cache, nor is there a proper buffer cache, which means that fuse2fs
reads metadata in from disk and checksums it on EVERY ACCESS.  Sad!

Despite these gaps, this RFC demonstrates that it's feasible to run the
metadata parsing parts of a filesystem in userspace while not
sacrificing much performance.  We now have a vehicle to move the
filesystems out of the kernel, where they can be containerized so that
malicious filesystems can be contained, somewhat.

iomap mode also calls FUSE_DESTROY before unmounting the filesystem, so
for capable systems, fuse2fs doesn't need to run in fuseblk mode
anymore.

However, there are some major warts remaining:

1. The iomap cookie validation is not present, which can lead to subtle
races between pagecache zeroing and writeback on filesystems that
support unwritten and delalloc mappings.

2. Mappings ought to be cached in the kernel for more speed.

3. iomap doesn't support things like fscrypt or fsverity, and I haven't
yet figured out how inline data is supposed to work.

4. I would like to be able to turn on fuse+iomap on a per-inode basis,
which currently isn't possible because the kernel fuse driver will iget
inodes prior to calling FUSE_GETATTR to discover the properties of the
inode it just read.

5. ext4 doesn't support out-of-place writes, so I don't know if that
actually works correctly.

6. iomap is an inode-based service, not a file-based service.  This
means that we /must/ push ext2's inode numbers into the kernel via
FUSE_GETATTR so that it can report those same numbers back out through
the FUSE_IOMAP_* calls.  However, the kernel fuse driver uses a separate
nodeid to index its incore inode, so we have to pass the nodeids as well
for notifications to work properly.
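Item 1 in the list above concerns iomap cookie validation.  The general
idea, sketched here with hypothetical names, is a per-inode sequence
counter that is bumped whenever the file's mappings change; each
mapping records the counter value seen at lookup time, and a user that
later finds the counter has moved must discard the mapping and look it
up again.  This is a simplified single-threaded model, not the kernel's
iomap validation code, which also has to deal with locking.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_inode {
	uint64_t mapping_seq;	/* bumped on every mapping change */
};

struct toy_mapping {
	uint64_t dev_off;
	uint64_t len;
	uint64_t cookie;	/* mapping_seq observed at lookup time */
};

static struct toy_mapping lookup_mapping(const struct toy_inode *ip,
					 uint64_t dev_off, uint64_t len)
{
	struct toy_mapping m = { dev_off, len, ip->mapping_seq };

	return m;
}

/* Called for anything that changes mappings, e.g. unwritten conversion. */
static void change_mappings(struct toy_inode *ip)
{
	ip->mapping_seq++;
}

/* A stale cookie means the mapping may no longer describe the file. */
static bool mapping_still_valid(const struct toy_inode *ip,
				const struct toy_mapping *m)
{
	return m->cookie == ip->mapping_seq;
}
```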

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

Comments and questions are, as always, welcome.

e2fsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/e2fsprogs.git/log/?h=fuse2fs-iomap-fileio
---
Commits in this patchset:
 * fuse2fs: implement bare minimum iomap for file mapping reporting
 * fuse2fs: add iomap= mount option
 * fuse2fs: implement iomap configuration
 * fuse2fs: register block devices for use with iomap
 * fuse2fs: implement directio file reads
 * fuse2fs: add extent dump function for debugging
 * fuse2fs: implement direct write support
 * fuse2fs: turn on iomap for pagecache IO
 * fuse2fs: don't zero bytes in punch hole
 * fuse2fs: don't do file data block IO when iomap is enabled
 * fuse2fs: avoid fuseblk mode if fuse-iomap support is likely
 * fuse2fs: enable file IO to inline data files
 * fuse2fs: set iomap-related inode flags
 * fuse2fs: configure block device block size
 * fuse4fs: separate invalidation
 * fuse2fs: implement statx
 * fuse2fs: enable atomic writes
---
 configure            |   48 +
 configure.ac         |   31 +
 fuse4fs/fuse4fs.1.in |    6 
 fuse4fs/fuse4fs.c    | 1690 +++++++++++++++++++++++++++++++++++++++++++++++-
 lib/config.h.in      |    3 
 misc/fuse2fs.1.in    |    6 
 misc/fuse2fs.c       | 1755 ++++++++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 3503 insertions(+), 36 deletions(-)



* [PATCHSET RFC v5 5/9] fuse4fs: specify the root node id
  2025-09-16  0:07 [RFC v5] fuse: containerize ext4 for safer operation Darrick J. Wong
                   ` (3 preceding siblings ...)
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
@ 2025-09-16  0:22 ` Darrick J. Wong
  2025-09-16  1:03   ` [PATCH 1/1] fuse4fs: don't use inode number translation when possible Darrick J. Wong
  2025-09-16  0:23 ` [PATCHSET RFC v5 6/9] fuse2fs: handle timestamps and ACLs correctly when iomap is enabled Darrick J. Wong
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:22 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

Hi all,


If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

Comments and questions are, as always, welcome.

e2fsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/e2fsprogs.git/log/?h=fuse2fs-root-nodeid
---
Commits in this patchset:
 * fuse4fs: don't use inode number translation when possible
---
 fuse4fs/fuse4fs.c |   29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)



* [PATCHSET RFC v5 6/9] fuse2fs: handle timestamps and ACLs correctly when iomap is enabled
  2025-09-16  0:07 [RFC v5] fuse: containerize ext4 for safer operation Darrick J. Wong
                   ` (4 preceding siblings ...)
  2025-09-16  0:22 ` [PATCHSET RFC v5 5/9] fuse4fs: specify the root node id Darrick J. Wong
@ 2025-09-16  0:23 ` Darrick J. Wong
  2025-09-16  1:03   ` [PATCH 01/10] fuse2fs: add strictatime/lazytime mount options Darrick J. Wong
                     ` (9 more replies)
  2025-09-16  0:23 ` [PATCHSET RFC v5 7/9] fuse2fs: cache iomap mappings for even better file IO performance Darrick J. Wong
                   ` (2 subsequent siblings)
  8 siblings, 10 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:23 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

Hi all,

When iomap is enabled for a fuse file, we try to keep as much of the
file IO path in the kernel as we possibly can.  That means no calling
out to the fuse server in the IO path when we can avoid it.  However,
the existing FUSE architecture defers all file attributes to the fuse
server -- [cm]time updates, ACL metadata management, set[ug]id removal,
and permissions checking thereof, etc.

We'd really rather do all these attribute updates in the kernel, and
only push them to the fuse server when it's actually necessary (e.g.
fsync).  Furthermore, the POSIX ACL code has the weird behavior that if
the access ACL can be represented entirely by i_mode bits, it will
change the mode and delete the ACL, which fuse servers generally don't
seem to implement.
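The ACL behavior described above can be illustrated with a check in the
spirit of the kernel's posix_acl_equiv_mode(): an access ACL with no
named-user or named-group entries carries no more information than the
mode bits, so it can be folded into i_mode and dropped.  The types and
function below are hypothetical, simplified stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical ACL entry tags, mirroring the POSIX.1e entry types. */
enum acl_tag { TAG_USER_OBJ, TAG_USER, TAG_GROUP_OBJ, TAG_GROUP,
	       TAG_MASK, TAG_OTHER };

struct acl_entry {
	enum acl_tag tag;
	unsigned int perm;	/* rwx bits: 4 = read, 2 = write, 1 = execute */
};

/*
 * If the ACL can be expressed purely with mode bits, store those bits in
 * *modep and return true; the caller may then delete the ACL.
 */
static bool acl_equiv_mode(const struct acl_entry *acl, size_t n,
			   mode_t *modep)
{
	unsigned int user = 0, group = 0, other = 0;
	int mask = -1;		/* -1: no mask entry present */

	for (size_t i = 0; i < n; i++) {
		unsigned int p = acl[i].perm & 7;

		switch (acl[i].tag) {
		case TAG_USER_OBJ:
			user = p;
			break;
		case TAG_GROUP_OBJ:
			group = p;
			break;
		case TAG_OTHER:
			other = p;
			break;
		case TAG_MASK:
			mask = (int)p;
			break;
		case TAG_USER:
		case TAG_GROUP:
			return false;	/* named entries need a real ACL */
		}
	}
	/* When a mask entry exists, the mode's group bits reflect the mask. */
	if (mask >= 0)
		group = (unsigned int)mask;
	*modep = (mode_t)((user << 6) | (group << 3) | other);
	return true;
}
```

A minimal ACL of user::rw-, group::r--, other::r-- collapses to mode
0644, whereas an ACL with a named user entry must be kept.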

IOWs, we want consistent and correct (as defined by fstests) behavior
of file attributes in iomap mode.  Let's make the kernel manage all that
and push the results to userspace as needed.  This improves performance
even further, since it's sort of like writeback_cache mode but more
aggressive.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

Comments and questions are, as always, welcome.

e2fsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/e2fsprogs.git/log/?h=fuse2fs-iomap-attrs
---
Commits in this patchset:
 * fuse2fs: add strictatime/lazytime mount options
 * fuse2fs: skip permission checking on utimens when iomap is enabled
 * fuse2fs: let the kernel tell us about acl/mode updates
 * fuse2fs: better debugging for file mode updates
 * fuse2fs: debug timestamp updates
 * fuse2fs: use coarse timestamps for iomap mode
 * fuse2fs: add tracing for retrieving timestamps
 * fuse2fs: enable syncfs
 * fuse2fs: skip the gdt write in op_destroy if syncfs is working
 * fuse2fs: set sync, immutable, and append at file load time
---
 fuse4fs/fuse4fs.1.in |    6 +
 fuse4fs/fuse4fs.c    |  231 ++++++++++++++++++++++++++++----------
 misc/fuse2fs.1.in    |    6 +
 misc/fuse2fs.c       |  304 ++++++++++++++++++++++++++++++++++++++------------
 4 files changed, 413 insertions(+), 134 deletions(-)



* [PATCHSET RFC v5 7/9] fuse2fs: cache iomap mappings for even better file IO performance
  2025-09-16  0:07 [RFC v5] fuse: containerize ext4 for safer operation Darrick J. Wong
                   ` (5 preceding siblings ...)
  2025-09-16  0:23 ` [PATCHSET RFC v5 6/9] fuse2fs: handle timestamps and ACLs correctly when iomap is enabled Darrick J. Wong
@ 2025-09-16  0:23 ` Darrick J. Wong
  2025-09-16  1:06   ` [PATCH 1/3] fuse2fs: enable caching of iomaps Darrick J. Wong
                     ` (2 more replies)
  2025-09-16  0:23 ` [PATCHSET RFC v5 8/9] fuse2fs: improve block and inode caching Darrick J. Wong
  2025-09-16  0:24 ` [PATCHSET RFC v5 9/9] fuse4fs: run servers as a contained service Darrick J. Wong
  8 siblings, 3 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:23 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

Hi all,

This series improves the performance (and correctness for some
filesystems) by adding the ability to cache iomap mappings in the
kernel.  For filesystems that can change mapping states during pagecache
writeback (e.g. unwritten extent conversion) this is absolutely
necessary to deal with races with writes to the pagecache because
writeback does not take i_rwsem.  For everyone else, it simply
eliminates roundtrips to userspace.
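Conceptually, a mapping cache behaves like a small interval map: the
server upserts mappings, reads and overwrites that hit the cache need no
upcall, and ranges whose mappings change (for example, after unwritten
extent conversion) are invalidated.  The toy model below is
single-threaded, ignores locking and eviction policy, and all of its
names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 16

struct cached_map {
	bool valid;
	uint64_t file_off, len, dev_off;
};

struct map_cache {
	struct cached_map slot[CACHE_SLOTS];
	unsigned long upcalls;	/* misses that would go to the fuse server */
};

static bool overlaps(const struct cached_map *m, uint64_t off, uint64_t len)
{
	return m->valid && off < m->file_off + m->len &&
	       m->file_off < off + len;
}

/* Insert a mapping, first dropping any cached mapping it overlaps. */
static bool cache_upsert(struct map_cache *c, uint64_t file_off,
			 uint64_t len, uint64_t dev_off)
{
	struct cached_map *free_slot = NULL;

	for (size_t i = 0; i < CACHE_SLOTS; i++) {
		if (overlaps(&c->slot[i], file_off, len))
			c->slot[i].valid = false;
		if (!c->slot[i].valid && !free_slot)
			free_slot = &c->slot[i];
	}
	if (!free_slot)
		return false;	/* full; a real cache would evict something */
	*free_slot = (struct cached_map){ true, file_off, len, dev_off };
	return true;
}

/* Translate a file offset; on a miss, count an upcall and return false. */
static bool cache_lookup(struct map_cache *c, uint64_t pos, uint64_t *dev_off)
{
	for (size_t i = 0; i < CACHE_SLOTS; i++) {
		const struct cached_map *m = &c->slot[i];

		if (overlaps(m, pos, 1)) {
			*dev_off = m->dev_off + (pos - m->file_off);
			return true;
		}
	}
	c->upcalls++;
	return false;
}

/* Drop cached mappings in a range after the server changes them. */
static void cache_invalidate(struct map_cache *c, uint64_t off, uint64_t len)
{
	for (size_t i = 0; i < CACHE_SLOTS; i++)
		if (overlaps(&c->slot[i], off, len))
			c->slot[i].valid = false;
}
```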

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

Comments and questions are, as always, welcome.

e2fsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/e2fsprogs.git/log/?h=fuse2fs-iomap-cache
---
Commits in this patchset:
 * fuse2fs: enable caching of iomaps
 * fuse2fs: be smarter about caching iomaps
 * fuse2fs: enable iomap
---
 fuse4fs/fuse4fs.c |   54 +++++++++++++++++++++++++++++++++++++++++++++++++----
 misc/fuse2fs.c    |   50 +++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 96 insertions(+), 8 deletions(-)



* [PATCHSET RFC v5 8/9] fuse2fs: improve block and inode caching
  2025-09-16  0:07 [RFC v5] fuse: containerize ext4 for safer operation Darrick J. Wong
                   ` (6 preceding siblings ...)
  2025-09-16  0:23 ` [PATCHSET RFC v5 7/9] fuse2fs: cache iomap mappings for even better file IO performance Darrick J. Wong
@ 2025-09-16  0:23 ` Darrick J. Wong
  2025-09-16  1:07   ` [PATCH 1/6] libsupport: add caching IO manager Darrick J. Wong
                     ` (5 more replies)
  2025-09-16  0:24 ` [PATCHSET RFC v5 9/9] fuse4fs: run servers as a contained service Darrick J. Wong
  8 siblings, 6 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:23 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

Hi all,

This series ports the libext2fs inode cache to the new cache.c hashtable
code that was added for fuse4fs unlinked file support and improves on
the UNIX I/O manager's block cache by adding a new I/O manager that does
its own caching.  With this, the two fuse servers no longer have
statically sized buffer caches.
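The point about dropping statically sized caches can be illustrated
with a hash table that doubles its bucket array as it fills, so no fixed
size has to be chosen up front.  This is a generic sketch with
hypothetical names (struct bcache, bcache_insert, and so on), not the
cache.c or iocache.c code; it has no eviction, locking, or MRU handling,
it uses the block number directly as the hash, and callers are expected
to check bcache_find before inserting.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct bnode {
	uint64_t blk;
	void *buf;
	struct bnode *next;
};

struct bcache {
	struct bnode **buckets;
	size_t nbuckets;	/* always a power of two */
	size_t count;
};

static int bcache_init(struct bcache *c, size_t nbuckets)
{
	c->buckets = calloc(nbuckets, sizeof(*c->buckets));
	c->nbuckets = nbuckets;
	c->count = 0;
	return c->buckets ? 0 : -1;
}

/* Double the bucket array and rehash every node into it. */
static int bcache_grow(struct bcache *c)
{
	size_t newn = c->nbuckets * 2;
	struct bnode **nb = calloc(newn, sizeof(*nb));

	if (!nb)
		return -1;
	for (size_t i = 0; i < c->nbuckets; i++) {
		struct bnode *n, *next;

		for (n = c->buckets[i]; n; n = next) {
			size_t h = n->blk & (newn - 1);

			next = n->next;
			n->next = nb[h];
			nb[h] = n;
		}
	}
	free(c->buckets);
	c->buckets = nb;
	c->nbuckets = newn;
	return 0;
}

static struct bnode *bcache_find(const struct bcache *c, uint64_t blk)
{
	struct bnode *n = c->buckets[blk & (c->nbuckets - 1)];

	for (; n; n = n->next)
		if (n->blk == blk)
			return n;
	return NULL;
}

/* Grow when the load factor reaches 1, then insert at the bucket head. */
static struct bnode *bcache_insert(struct bcache *c, uint64_t blk, void *buf)
{
	struct bnode *n;
	size_t h;

	if (c->count >= c->nbuckets && bcache_grow(c) < 0)
		return NULL;
	n = malloc(sizeof(*n));
	if (!n)
		return NULL;
	n->blk = blk;
	n->buf = buf;
	h = blk & (c->nbuckets - 1);
	n->next = c->buckets[h];
	c->buckets[h] = n;
	c->count++;
	return n;
}
```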

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

Comments and questions are, as always, welcome.

e2fsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/e2fsprogs.git/log/?h=fuse2fs-caching
---
Commits in this patchset:
 * libsupport: add caching IO manager
 * iocache: add the actual buffer cache
 * iocache: bump buffer mru priority every 50 accesses
 * fuse2fs: enable caching IO manager
 * fuse2fs: increase inode cache size
 * libext2fs: improve caching for inodes
---
 lib/ext2fs/ext2fsP.h    |   13 +
 lib/support/cache.h     |    1 
 lib/support/iocache.h   |   17 +
 debugfs/Makefile.in     |    4 
 e2fsck/Makefile.in      |    4 
 fuse4fs/Makefile.in     |    3 
 fuse4fs/fuse4fs.1.in    |    6 
 fuse4fs/fuse4fs.c       |   75 +----
 lib/ext2fs/Makefile.in  |    4 
 lib/ext2fs/inode.c      |  215 +++++++++++---
 lib/ext2fs/io_manager.c |    3 
 lib/support/Makefile.in |    6 
 lib/support/cache.c     |   16 +
 lib/support/iocache.c   |  740 +++++++++++++++++++++++++++++++++++++++++++++++
 misc/Makefile.in        |    4 
 misc/fuse2fs.1.in       |    6 
 misc/fuse2fs.c          |   77 +----
 resize/Makefile.in      |    4 
 tests/progs/Makefile.in |    4 
 19 files changed, 992 insertions(+), 210 deletions(-)
 create mode 100644 lib/support/iocache.h
 create mode 100644 lib/support/iocache.c



* [PATCHSET RFC v5 9/9] fuse4fs: run servers as a contained service
  2025-09-16  0:07 [RFC v5] fuse: containerize ext4 for safer operation Darrick J. Wong
                   ` (7 preceding siblings ...)
  2025-09-16  0:23 ` [PATCHSET RFC v5 8/9] fuse2fs: improve block and inode caching Darrick J. Wong
@ 2025-09-16  0:24 ` Darrick J. Wong
  2025-09-16  1:08   ` [PATCH 1/4] libext2fs: fix MMP code to work with unixfd IO manager Darrick J. Wong
                     ` (3 more replies)
  8 siblings, 4 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:24 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

Hi all,

In this final series of the fuse-iomap prototype, we package the newly
created fuse4fs server into a systemd socket service.  This service can
be used by the "mount.service" helper in libfuse to implement untrusted
unprivileged mounts.
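For readers unfamiliar with systemd socket activation: with a templated
unit such as fuse4fs@.service, systemd typically accepts each connection
itself and starts one service instance per connection, handing the
connected socket over as file descriptor 3 and describing it in the
LISTEN_PID and LISTEN_FDS environment variables.  That fuse4fs is set up
this way is an assumption based on the template unit name.  The sketch
below parses those variables without libsystemd, which provides
sd_listen_fds() for the same purpose; count_listen_fds is a hypothetical
name.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

#define LISTEN_FDS_START 3	/* same value as systemd's SD_LISTEN_FDS_START */

/*
 * Return how many fds systemd passed via socket activation (starting at
 * LISTEN_FDS_START), 0 if we were not socket-activated, or -EINVAL if
 * the environment is malformed.
 */
static int count_listen_fds(void)
{
	const char *pid_s = getenv("LISTEN_PID");
	const char *fds_s = getenv("LISTEN_FDS");
	char *end;
	long pid, n;

	if (!pid_s || !fds_s)
		return 0;
	errno = 0;
	pid = strtol(pid_s, &end, 10);
	if (errno || *end || pid <= 0)
		return -EINVAL;
	if (pid != (long)getpid())
		return 0;	/* the variables were meant for another process */
	errno = 0;
	n = strtol(fds_s, &end, 10);
	if (errno || *end || n < 0)
		return -EINVAL;
	return (int)n;
}
```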

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

Comments and questions are, as always, welcome.

e2fsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/e2fsprogs.git/log/?h=fuse4fs-service-container
---
Commits in this patchset:
 * libext2fs: fix MMP code to work with unixfd IO manager
 * fuse4fs: enable safe service mode
 * fuse4fs: set proc title when in fuse service mode
 * fuse4fs: set iomap backing device blocksize
---
 MCONFIG.in                  |    1 
 configure                   |  181 +++++++++++++++++++++++++
 configure.ac                |   69 +++++++++
 debian/fuse4fs.install      |    2 
 fuse4fs/Makefile.in         |   42 +++++-
 fuse4fs/fuse4fs.c           |  315 +++++++++++++++++++++++++++++++++++++++++--
 fuse4fs/fuse4fs.socket.in   |   14 ++
 fuse4fs/fuse4fs@.service.in |   95 +++++++++++++
 lib/config.h.in             |    6 +
 lib/ext2fs/mmp.c            |   46 ++++++
 util/subst.conf.in          |    2 
 11 files changed, 750 insertions(+), 23 deletions(-)
 create mode 100644 fuse4fs/fuse4fs.socket.in
 create mode 100644 fuse4fs/fuse4fs@.service.in



* [PATCH 1/4] fuse2fs: bump library version
  2025-09-16  0:21 ` [PATCHSET 1/9] fuse2fs: upgrade to libfuse 3.17 Darrick J. Wong
@ 2025-09-16  0:49   ` Darrick J. Wong
  2025-09-16  0:50   ` [PATCH 2/4] fuse2fs: wrap the fuse_set_feature_flag helper for older libfuse Darrick J. Wong
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:49 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

From: Darrick J. Wong <djwong@kernel.org>

Bump the requested libfuse API version (FUSE_USE_VERSION) from 35 to 314
so that we can take advantage of functionality added since libfuse 3.5.
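The jump from 35 to 314 is not a typo. Older FUSE_USE_VERSION values are spelled major*10+minor, so 35 means 3.5. Newer ones use major*100+minor, so 314 means 3.14. This assumes that libfuse 3.x's FUSE_MAKE_VERSION(maj, min) macro expands to ((maj) * 100 + (min)). The sketch below reproduces that arithmetic with a hypothetical local macro, DEMO_MAKE_VERSION, rather than including fuse.h.

```c
/*
 * Hypothetical local copy of the version arithmetic, assuming libfuse
 * 3.x defines FUSE_MAKE_VERSION(maj, min) as ((maj) * 100 + (min)).
 */
#define DEMO_MAKE_VERSION(maj, min) ((maj) * 100 + (min))

/*
 * Split a FUSE_USE_VERSION value into major/minor.  Two-digit values
 * use the older major*10+minor spelling (35 == 3.5); three-digit values
 * use major*100+minor (314 == 3.14).
 */
void demo_split_use_version(int v, int *maj, int *min)
{
	if (v < 100) {
		*maj = v / 10;
		*min = v % 10;
	} else {
		*maj = v / 100;
		*min = v % 100;
	}
}
```

Only the arithmetic is modelled here. Real code should use the macros from libfuse's own headers.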

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 configure    |    4 ++--
 configure.ac |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)


diff --git a/configure b/configure
index 356644beed651e..71750b1a8ee972 100755
--- a/configure
+++ b/configure
@@ -14687,14 +14687,14 @@ fi
 
 if test "$FUSE_LIB" = "-lfuse3"
 then
-	FUSE_USE_VERSION=35
+	FUSE_USE_VERSION=314
 	CFLAGS="$fuse3_CFLAGS $CFLAGS"
 	FUSE_LIB="$fuse3_LIBS"
 	       for ac_header in pthread.h fuse.h
 do :
   as_ac_Header=`printf "%s\n" "ac_cv_header_$ac_header" | $as_tr_sh`
 ac_fn_c_check_header_compile "$LINENO" "$ac_header" "$as_ac_Header" "#define _FILE_OFFSET_BITS	64
-#define FUSE_USE_VERSION 35
+#define FUSE_USE_VERSION 314
 #ifdef __linux__
 #include <linux/fs.h>
 #include <linux/falloc.h>
diff --git a/configure.ac b/configure.ac
index f065cd395cf33c..0591999b52b019 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1431,13 +1431,13 @@ AC_SUBST(FUSE_LIB)
 AC_SUBST(FUSE_CMT)
 if test "$FUSE_LIB" = "-lfuse3"
 then
-	FUSE_USE_VERSION=35
+	FUSE_USE_VERSION=314
 	CFLAGS="$fuse3_CFLAGS $CFLAGS"
 	FUSE_LIB="$fuse3_LIBS"
 	AC_CHECK_HEADERS([pthread.h fuse.h], [],
 		[AC_MSG_FAILURE([Cannot find fuse3 fuse2fs headers.])],
 [#define _FILE_OFFSET_BITS	64
-#define FUSE_USE_VERSION 35
+#define FUSE_USE_VERSION 314
 #ifdef __linux__
 #include <linux/fs.h>
 #include <linux/falloc.h>



* [PATCH 2/4] fuse2fs: wrap the fuse_set_feature_flag helper for older libfuse
  2025-09-16  0:21 ` [PATCHSET 1/9] fuse2fs: upgrade to libfuse 3.17 Darrick J. Wong
  2025-09-16  0:49   ` [PATCH 1/4] fuse2fs: bump library version Darrick J. Wong
@ 2025-09-16  0:50   ` Darrick J. Wong
  2025-09-16  0:50   ` [PATCH 3/4] fuse2fs: disable nfs exports Darrick J. Wong
  2025-09-16  0:50   ` [PATCH 4/4] fuse2fs: drop fuse 2.x support code Darrick J. Wong
  3 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:50 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

From: Darrick J. Wong <djwong@kernel.org>

Create a compatibility wrapper for fuse_set_feature_flag if the libfuse
version is older than the one where that function was introduced (3.17).
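The compat wrapper only turns on a flag that the kernel advertised as capable. The op_init() hunk in this patch also copies the low 32 bits of conn->want_ext back into conn->want. According to the comment there, libfuse 3.17.0 (the bug is gone by 3.17.3) updates only want_ext and then refuses to mount when the two disagree. The toy model below uses a hypothetical struct demo_conn_info in place of the real struct fuse_conn_info and models only the fields discussed. It mimics the buggy behaviour described in the patch comment, not libfuse's actual code.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct fuse_conn_info (only these fields). */
struct demo_conn_info {
	uint64_t capable_ext;
	uint64_t want_ext;
	uint32_t want;
};

/*
 * Models the 3.17.0 behaviour described in the patch comment: the flag
 * is recorded in want_ext only, and only if the kernel is capable of it.
 */
int demo_set_feature_flag(struct demo_conn_info *conn, uint64_t flag)
{
	if (conn->capable_ext & flag) {
		conn->want_ext |= flag;
		return 1;
	}
	return 0;
}

/* The workaround from op_init(): mirror the low 32 bits into want. */
void demo_sync_want(struct demo_conn_info *conn)
{
	conn->want = (uint32_t)(conn->want_ext & 0xFFFFFFFF);
}

/* Hypothetical stand-in for the consistency check libfuse applies after init. */
int demo_want_in_sync(const struct demo_conn_info *conn)
{
	return conn->want == (uint32_t)(conn->want_ext & 0xFFFFFFFF);
}
```

Setting another flag after demo_sync_want() puts the two fields out of step again. That is why the op_init() hunk marks its resync "THIS MUST GO LAST".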

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 misc/fuse2fs.c |   32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)


diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 101f0fa03c397d..aa51b8f55b0f50 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -1298,6 +1298,19 @@ static int fuse2fs_read_bitmaps(struct fuse2fs *ff)
 	return 0;
 }
 
+#if FUSE_VERSION < FUSE_MAKE_VERSION(3, 17)
+static inline int fuse_set_feature_flag(struct fuse_conn_info *conn,
+					 uint64_t flag)
+{
+	if (conn->capable & flag) {
+		conn->want |= flag;
+		return 1;
+	}
+
+	return 0;
+}
+#endif
+
 static void *op_init(struct fuse_conn_info *conn
 #if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
 			, struct fuse_config *cfg EXT2FS_ATTR((unused))
@@ -1322,14 +1335,14 @@ static void *op_init(struct fuse_conn_info *conn
 	fs = ff->fs;
 	dbg_printf(ff, "%s: dev=%s\n", __func__, fs->device_name);
 #ifdef FUSE_CAP_IOCTL_DIR
-	conn->want |= FUSE_CAP_IOCTL_DIR;
+	fuse_set_feature_flag(conn, FUSE_CAP_IOCTL_DIR);
 #endif
 #ifdef FUSE_CAP_POSIX_ACL
 	if (ff->acl)
-		conn->want |= FUSE_CAP_POSIX_ACL;
+		fuse_set_feature_flag(conn, FUSE_CAP_POSIX_ACL);
 #endif
 #ifdef FUSE_CAP_CACHE_SYMLINKS
-	conn->want |= FUSE_CAP_CACHE_SYMLINKS;
+	fuse_set_feature_flag(conn, FUSE_CAP_CACHE_SYMLINKS);
 #endif
 #if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
 	conn->time_gran = 1;
@@ -1349,6 +1362,19 @@ static void *op_init(struct fuse_conn_info *conn
 	if (ff->opstate == F2OP_WRITABLE)
 		fuse2fs_read_bitmaps(ff);
 
+#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 17)
+	/*
+	 * THIS MUST GO LAST!
+	 *
+	 * fuse_set_feature_flag in 3.17.0 has a strange bug: it sets feature
+	 * flags in conn->want_ext, but not conn->want.  Upon return to
+	 * libfuse, the lower level library observes that want and want_ext
+	 * have gotten out of sync, and refuses to mount.  Therefore,
+	 * synchronize the two.  This bug went away in 3.17.3, but we're stuck
+	 * with this forever because Debian trixie released with 3.17.2.
+	 */
+	conn->want = conn->want_ext & 0xFFFFFFFF;
+#endif
 	return ff;
 }
 



* [PATCH 3/4] fuse2fs: disable nfs exports
  2025-09-16  0:21 ` [PATCHSET 1/9] fuse2fs: upgrade to libfuse 3.17 Darrick J. Wong
  2025-09-16  0:49   ` [PATCH 1/4] fuse2fs: bump library version Darrick J. Wong
  2025-09-16  0:50   ` [PATCH 2/4] fuse2fs: wrap the fuse_set_feature_flag helper for older libfuse Darrick J. Wong
@ 2025-09-16  0:50   ` Darrick J. Wong
  2025-09-16  0:50   ` [PATCH 4/4] fuse2fs: drop fuse 2.x support code Darrick J. Wong
  3 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:50 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

From: Darrick J. Wong <djwong@kernel.org>

The kernel fuse driver can export its own handles, but it doesn't
actually consult the fuse server about those handles.  Hence, unlike
regular ext4 file handles, they don't survive an unmount/mount cycle.
Disable them, because they cause fstests regressions and it's not clear
that they're suitable for NFS export, at least not as most people
understand ext4 NFS exports.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 misc/fuse2fs.c |    3 +++
 1 file changed, 3 insertions(+)


diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index aa51b8f55b0f50..e3a350462f25f3 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -1344,6 +1344,9 @@ static void *op_init(struct fuse_conn_info *conn
 #ifdef FUSE_CAP_CACHE_SYMLINKS
 	fuse_set_feature_flag(conn, FUSE_CAP_CACHE_SYMLINKS);
 #endif
+#ifdef FUSE_CAP_NO_EXPORT_SUPPORT
+	fuse_set_feature_flag(conn, FUSE_CAP_NO_EXPORT_SUPPORT);
+#endif
 #if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
 	conn->time_gran = 1;
 	cfg->use_ino = 1;



* [PATCH 4/4] fuse2fs: drop fuse 2.x support code
  2025-09-16  0:21 ` [PATCHSET 1/9] fuse2fs: upgrade to libfuse 3.17 Darrick J. Wong
                     ` (2 preceding siblings ...)
  2025-09-16  0:50   ` [PATCH 3/4] fuse2fs: disable nfs exports Darrick J. Wong
@ 2025-09-16  0:50   ` Darrick J. Wong
  3 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:50 UTC (permalink / raw)
  To: tytso; +Cc: amir73il, linux-ext4

From: Darrick J. Wong <djwong@kernel.org>

We only enable fuse2fs if libfuse is from the 3.x series and the low-level
libfuse API is present, so drop support for 2.x.  This part is cribbed from
Amir, who used an LLM-aided conversion for fuse4fs; the maintainer requested
that I apply it to fuse2fs as well.

Co-developed-by: Claude claude-4-sonnet
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 configure      |  314 +++++---------------------------------------------------
 configure.ac   |   81 +++++---------
 misc/fuse2fs.c |  219 ++++-----------------------------------
 3 files changed, 80 insertions(+), 534 deletions(-)


diff --git a/configure b/configure
index 71750b1a8ee972..86c9bc77321eee 100755
--- a/configure
+++ b/configure
@@ -1676,6 +1676,9 @@ Some influential environment variables:
               C compiler flags for ARCHIVE, overriding pkg-config
   ARCHIVE_LIBS
               linker flags for ARCHIVE, overriding pkg-config
+  fuse3_CFLAGS
+              C compiler flags for fuse3, overriding pkg-config
+  fuse3_LIBS  linker flags for fuse3, overriding pkg-config
   CXX         C++ compiler command
   CXXFLAGS    C++ compiler flags
   udev_CFLAGS C compiler flags for udev, overriding pkg-config
@@ -14054,19 +14057,20 @@ FUSE_LIB=
 # Check whether --enable-fuse2fs was given.
 if test ${enable_fuse2fs+y}
 then :
-  enableval=$enable_fuse2fs; if test "$enableval" = "no"
-then
-	FUSE_CMT="#"
-	{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: Disabling fuse2fs" >&5
+  enableval=$enable_fuse2fs;
+	if test "$enableval" = "no"
+	then
+		FUSE_CMT="#"
+		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: Disabling fuse2fs" >&5
 printf "%s\n" "Disabling fuse2fs" >&6; }
-else
-	cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+	else
+		cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 #ifdef __linux__
-#include <linux/fs.h>
-#include <linux/falloc.h>
-#include <linux/xattr.h>
-#endif
+	#include <linux/fs.h>
+	#include <linux/falloc.h>
+	#include <linux/xattr.h>
+	#endif
 
 int
 main (void)
@@ -14087,9 +14091,6 @@ See \`config.log' for more details" "$LINENO" 5; }
 fi
 rm -f conftest.err conftest.i conftest.$ac_ext
 
-	  fuse3_CFLAGS
-              C compiler flags for fuse3, overriding pkg-config
-  fuse3_LIBS  linker flags for fuse3, overriding pkg-config
 
 pkg_failed=no
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse3" >&5
@@ -14150,28 +14151,7 @@ fi
         echo "$fuse3_PKG_ERRORS" >&5
 
 
-		       for ac_header in pthread.h fuse.h
-do :
-  as_ac_Header=`printf "%s\n" "ac_cv_header_$ac_header" | $as_tr_sh`
-ac_fn_c_check_header_compile "$LINENO" "$ac_header" "$as_ac_Header" "#define _FILE_OFFSET_BITS	64
-#define FUSE_USE_VERSION 29
-"
-if eval test \"x\$"$as_ac_Header"\" = x"yes"
-then :
-  cat >>confdefs.h <<_ACEOF
-#define `printf "%s\n" "HAVE_$ac_header" | $as_tr_cpp` 1
-_ACEOF
-
-else $as_nop
-  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
-as_fn_error $? "Cannot find fuse2fs headers.
-See \`config.log' for more details" "$LINENO" 5; }
-fi
-
-done
-
-		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -losxfuse" >&5
+			{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -losxfuse" >&5
 printf %s "checking for fuse_main in -losxfuse... " >&6; }
 if test ${ac_cv_lib_osxfuse_fuse_main+y}
 then :
@@ -14209,45 +14189,6 @@ printf "%s\n" "$ac_cv_lib_osxfuse_fuse_main" >&6; }
 if test "x$ac_cv_lib_osxfuse_fuse_main" = xyes
 then :
   FUSE_LIB=-losxfuse
-else $as_nop
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -lfuse" >&5
-printf %s "checking for fuse_main in -lfuse... " >&6; }
-if test ${ac_cv_lib_fuse_fuse_main+y}
-then :
-  printf %s "(cached) " >&6
-else $as_nop
-  ac_check_lib_save_LIBS=$LIBS
-LIBS="-lfuse  $LIBS"
-cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-
-/* Override any GCC internal prototype to avoid an error.
-   Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.  */
-char fuse_main ();
-int
-main (void)
-{
-return fuse_main ();
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_link "$LINENO"
-then :
-  ac_cv_lib_fuse_fuse_main=yes
-else $as_nop
-  ac_cv_lib_fuse_fuse_main=no
-fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam \
-    conftest$ac_exeext conftest.$ac_ext
-LIBS=$ac_check_lib_save_LIBS
-fi
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_fuse_fuse_main" >&5
-printf "%s\n" "$ac_cv_lib_fuse_fuse_main" >&6; }
-if test "x$ac_cv_lib_fuse_fuse_main" = xyes
-then :
-  FUSE_LIB=-lfuse
 else $as_nop
   { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
 printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
@@ -14255,35 +14196,12 @@ as_fn_error $? "Cannot find fuse library.
 See \`config.log' for more details" "$LINENO" 5; }
 fi
 
-fi
-
 
 elif test $pkg_failed = untried; then
         { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
 printf "%s\n" "no" >&6; }
 
-		       for ac_header in pthread.h fuse.h
-do :
-  as_ac_Header=`printf "%s\n" "ac_cv_header_$ac_header" | $as_tr_sh`
-ac_fn_c_check_header_compile "$LINENO" "$ac_header" "$as_ac_Header" "#define _FILE_OFFSET_BITS	64
-#define FUSE_USE_VERSION 29
-"
-if eval test \"x\$"$as_ac_Header"\" = x"yes"
-then :
-  cat >>confdefs.h <<_ACEOF
-#define `printf "%s\n" "HAVE_$ac_header" | $as_tr_cpp` 1
-_ACEOF
-
-else $as_nop
-  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
-as_fn_error $? "Cannot find fuse2fs headers.
-See \`config.log' for more details" "$LINENO" 5; }
-fi
-
-done
-
-		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -losxfuse" >&5
+			{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -losxfuse" >&5
 printf %s "checking for fuse_main in -losxfuse... " >&6; }
 if test ${ac_cv_lib_osxfuse_fuse_main+y}
 then :
@@ -14321,45 +14239,6 @@ printf "%s\n" "$ac_cv_lib_osxfuse_fuse_main" >&6; }
 if test "x$ac_cv_lib_osxfuse_fuse_main" = xyes
 then :
   FUSE_LIB=-losxfuse
-else $as_nop
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -lfuse" >&5
-printf %s "checking for fuse_main in -lfuse... " >&6; }
-if test ${ac_cv_lib_fuse_fuse_main+y}
-then :
-  printf %s "(cached) " >&6
-else $as_nop
-  ac_check_lib_save_LIBS=$LIBS
-LIBS="-lfuse  $LIBS"
-cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-
-/* Override any GCC internal prototype to avoid an error.
-   Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.  */
-char fuse_main ();
-int
-main (void)
-{
-return fuse_main ();
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_link "$LINENO"
-then :
-  ac_cv_lib_fuse_fuse_main=yes
-else $as_nop
-  ac_cv_lib_fuse_fuse_main=no
-fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam \
-    conftest$ac_exeext conftest.$ac_ext
-LIBS=$ac_check_lib_save_LIBS
-fi
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_fuse_fuse_main" >&5
-printf "%s\n" "$ac_cv_lib_fuse_fuse_main" >&6; }
-if test "x$ac_cv_lib_fuse_fuse_main" = xyes
-then :
-  FUSE_LIB=-lfuse
 else $as_nop
   { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
 printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
@@ -14367,24 +14246,21 @@ as_fn_error $? "Cannot find fuse library.
 See \`config.log' for more details" "$LINENO" 5; }
 fi
 
-fi
-
 
 else
         fuse3_CFLAGS=$pkg_cv_fuse3_CFLAGS
         fuse3_LIBS=$pkg_cv_fuse3_LIBS
         { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5
 printf "%s\n" "yes" >&6; }
-
-		FUSE_LIB=-lfuse3
-
+        FUSE_LIB=-lfuse3
 fi
-	{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: Enabling fuse2fs" >&5
+		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: Enabling fuse2fs" >&5
 printf "%s\n" "Enabling fuse2fs" >&6; }
-fi
+	fi
 
 else $as_nop
 
+
 pkg_failed=no
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse3" >&5
 printf %s "checking for fuse3... " >&6; }
@@ -14444,30 +14320,6 @@ fi
         echo "$fuse3_PKG_ERRORS" >&5
 
 
-	       for ac_header in pthread.h fuse.h
-do :
-  as_ac_Header=`printf "%s\n" "ac_cv_header_$ac_header" | $as_tr_sh`
-ac_fn_c_check_header_compile "$LINENO" "$ac_header" "$as_ac_Header" "#define _FILE_OFFSET_BITS	64
-#define FUSE_USE_VERSION 29
-#ifdef __linux__
-# include <linux/fs.h>
-# include <linux/falloc.h>
-# include <linux/xattr.h>
-#endif
-"
-if eval test \"x\$"$as_ac_Header"\" = x"yes"
-then :
-  cat >>confdefs.h <<_ACEOF
-#define `printf "%s\n" "HAVE_$ac_header" | $as_tr_cpp` 1
-_ACEOF
-
-else $as_nop
-  FUSE_CMT="#"
-fi
-
-done
-	if test -z "$FUSE_CMT"
-	then
 		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -losxfuse" >&5
 printf %s "checking for fuse_main in -losxfuse... " >&6; }
 if test ${ac_cv_lib_osxfuse_fuse_main+y}
@@ -14506,81 +14358,15 @@ printf "%s\n" "$ac_cv_lib_osxfuse_fuse_main" >&6; }
 if test "x$ac_cv_lib_osxfuse_fuse_main" = xyes
 then :
   FUSE_LIB=-losxfuse
-else $as_nop
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -lfuse" >&5
-printf %s "checking for fuse_main in -lfuse... " >&6; }
-if test ${ac_cv_lib_fuse_fuse_main+y}
-then :
-  printf %s "(cached) " >&6
-else $as_nop
-  ac_check_lib_save_LIBS=$LIBS
-LIBS="-lfuse  $LIBS"
-cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-
-/* Override any GCC internal prototype to avoid an error.
-   Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.  */
-char fuse_main ();
-int
-main (void)
-{
-return fuse_main ();
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_link "$LINENO"
-then :
-  ac_cv_lib_fuse_fuse_main=yes
-else $as_nop
-  ac_cv_lib_fuse_fuse_main=no
-fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam \
-    conftest$ac_exeext conftest.$ac_ext
-LIBS=$ac_check_lib_save_LIBS
-fi
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_fuse_fuse_main" >&5
-printf "%s\n" "$ac_cv_lib_fuse_fuse_main" >&6; }
-if test "x$ac_cv_lib_fuse_fuse_main" = xyes
-then :
-  FUSE_LIB=-lfuse
 else $as_nop
   FUSE_CMT="#"
 fi
 
-fi
-
-	fi
 
 elif test $pkg_failed = untried; then
         { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
 printf "%s\n" "no" >&6; }
 
-	       for ac_header in pthread.h fuse.h
-do :
-  as_ac_Header=`printf "%s\n" "ac_cv_header_$ac_header" | $as_tr_sh`
-ac_fn_c_check_header_compile "$LINENO" "$ac_header" "$as_ac_Header" "#define _FILE_OFFSET_BITS	64
-#define FUSE_USE_VERSION 29
-#ifdef __linux__
-# include <linux/fs.h>
-# include <linux/falloc.h>
-# include <linux/xattr.h>
-#endif
-"
-if eval test \"x\$"$as_ac_Header"\" = x"yes"
-then :
-  cat >>confdefs.h <<_ACEOF
-#define `printf "%s\n" "HAVE_$ac_header" | $as_tr_cpp` 1
-_ACEOF
-
-else $as_nop
-  FUSE_CMT="#"
-fi
-
-done
-	if test -z "$FUSE_CMT"
-	then
 		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -losxfuse" >&5
 printf %s "checking for fuse_main in -losxfuse... " >&6; }
 if test ${ac_cv_lib_osxfuse_fuse_main+y}
@@ -14619,73 +14405,30 @@ printf "%s\n" "$ac_cv_lib_osxfuse_fuse_main" >&6; }
 if test "x$ac_cv_lib_osxfuse_fuse_main" = xyes
 then :
   FUSE_LIB=-losxfuse
-else $as_nop
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -lfuse" >&5
-printf %s "checking for fuse_main in -lfuse... " >&6; }
-if test ${ac_cv_lib_fuse_fuse_main+y}
-then :
-  printf %s "(cached) " >&6
-else $as_nop
-  ac_check_lib_save_LIBS=$LIBS
-LIBS="-lfuse  $LIBS"
-cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-
-/* Override any GCC internal prototype to avoid an error.
-   Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.  */
-char fuse_main ();
-int
-main (void)
-{
-return fuse_main ();
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_link "$LINENO"
-then :
-  ac_cv_lib_fuse_fuse_main=yes
-else $as_nop
-  ac_cv_lib_fuse_fuse_main=no
-fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam \
-    conftest$ac_exeext conftest.$ac_ext
-LIBS=$ac_check_lib_save_LIBS
-fi
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_fuse_fuse_main" >&5
-printf "%s\n" "$ac_cv_lib_fuse_fuse_main" >&6; }
-if test "x$ac_cv_lib_fuse_fuse_main" = xyes
-then :
-  FUSE_LIB=-lfuse
 else $as_nop
   FUSE_CMT="#"
 fi
 
-fi
-
-	fi
 
 else
         fuse3_CFLAGS=$pkg_cv_fuse3_CFLAGS
         fuse3_LIBS=$pkg_cv_fuse3_LIBS
         { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5
 printf "%s\n" "yes" >&6; }
-
-	FUSE_LIB=-lfuse3
-
+        FUSE_LIB=-lfuse3
 fi
-if test -z "$FUSE_CMT"
-then
-	{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: Enabling fuse2fs by default." >&5
+	if test -z "$FUSE_CMT"
+	then
+		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: Enabling fuse2fs by default." >&5
 printf "%s\n" "Enabling fuse2fs by default." >&6; }
-fi
+	fi
+
 
 fi
 
 
 
-if test "$FUSE_LIB" = "-lfuse3"
+if test -n "$FUSE_LIB"
 then
 	FUSE_USE_VERSION=314
 	CFLAGS="$fuse3_CFLAGS $CFLAGS"
@@ -14715,9 +14458,6 @@ See \`config.log' for more details" "$LINENO" 5; }
 fi
 
 done
-elif test -n "$FUSE_LIB"
-then
-	FUSE_USE_VERSION=29
 fi
 if test -n "$FUSE_USE_VERSION"
 then
diff --git a/configure.ac b/configure.ac
index 0591999b52b019..bf1b57377cd848 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1367,69 +1367,49 @@ dnl
 AC_CHECK_LIB(rt, clock_gettime, [CLOCK_GETTIME_LIB=-lrt])
 AC_SUBST(CLOCK_GETTIME_LIB)
 dnl
-dnl Check to see if the FUSE library is -lfuse3, -losxfuse, or -lfuse
+dnl Check to see if the FUSE library is -lfuse3 or -losxfuse
 dnl
 FUSE_CMT=
 FUSE_LIB=
 dnl osxfuse.dylib supersedes fuselib.dylib
 AC_ARG_ENABLE([fuse2fs],
 AS_HELP_STRING([--disable-fuse2fs],[do not build fuse2fs]),
-if test "$enableval" = "no"
-then
-	FUSE_CMT="#"
-	AC_MSG_RESULT([Disabling fuse2fs])
-else
-	AC_PREPROC_IFELSE(
-[AC_LANG_PROGRAM([[#ifdef __linux__
-#include <linux/fs.h>
-#include <linux/falloc.h>
-#include <linux/xattr.h>
-#endif
-]], [])], [], [AC_MSG_FAILURE([Cannot find fuse2fs Linux headers.])])
-
-	PKG_CHECK_MODULES([fuse3], [fuse3],
-	  [
-		FUSE_LIB=-lfuse3
-	  ], [
-		AC_CHECK_HEADERS([pthread.h fuse.h], [],
-			[AC_MSG_FAILURE([Cannot find fuse2fs headers.])],
-[#define _FILE_OFFSET_BITS	64
-#define FUSE_USE_VERSION 29])
+[
+	if test "$enableval" = "no"
+	then
+		FUSE_CMT="#"
+		AC_MSG_RESULT([Disabling fuse2fs])
+	else
+		AC_PREPROC_IFELSE(
+	[AC_LANG_PROGRAM([[#ifdef __linux__
+	#include <linux/fs.h>
+	#include <linux/falloc.h>
+	#include <linux/xattr.h>
+	#endif
+	]], [])], [], [AC_MSG_FAILURE([Cannot find fuse2fs Linux headers.])])
 
+		PKG_CHECK_MODULES([fuse3], [fuse3], [FUSE_LIB=-lfuse3],
+		[
+			AC_CHECK_LIB(osxfuse, fuse_main, [FUSE_LIB=-losxfuse],
+				[AC_MSG_FAILURE([Cannot find fuse library.])])
+		])
+		AC_MSG_RESULT([Enabling fuse2fs])
+	fi
+], [
+	PKG_CHECK_MODULES([fuse3], [fuse3], [FUSE_LIB=-lfuse3],
+	[
 		AC_CHECK_LIB(osxfuse, fuse_main, [FUSE_LIB=-losxfuse],
-			[AC_CHECK_LIB(fuse, fuse_main, [FUSE_LIB=-lfuse],
-				[AC_MSG_FAILURE([Cannot find fuse library.])])])
-	  ])
-	AC_MSG_RESULT([Enabling fuse2fs])
-fi
-,
-PKG_CHECK_MODULES([fuse3], [fuse3],
-  [
-	FUSE_LIB=-lfuse3
-  ], [
-	AC_CHECK_HEADERS([pthread.h fuse.h], [], [FUSE_CMT="#"], 
-[#define _FILE_OFFSET_BITS	64
-#define FUSE_USE_VERSION 29
-#ifdef __linux__
-# include <linux/fs.h>
-# include <linux/falloc.h>
-# include <linux/xattr.h>
-#endif])
+			[FUSE_CMT="#"])
+	])
 	if test -z "$FUSE_CMT"
 	then
-		AC_CHECK_LIB(osxfuse, fuse_main, [FUSE_LIB=-losxfuse],
-			[AC_CHECK_LIB(fuse, fuse_main, [FUSE_LIB=-lfuse],
-				[FUSE_CMT="#"])])
+		AC_MSG_RESULT([Enabling fuse2fs by default.])
 	fi
-  ])
-if test -z "$FUSE_CMT"
-then
-	AC_MSG_RESULT([Enabling fuse2fs by default.])
-fi
+]
 )
 AC_SUBST(FUSE_LIB)
 AC_SUBST(FUSE_CMT)
-if test "$FUSE_LIB" = "-lfuse3"
+if test -n "$FUSE_LIB"
 then
 	FUSE_USE_VERSION=314
 	CFLAGS="$fuse3_CFLAGS $CFLAGS"
@@ -1443,9 +1423,6 @@ then
 #include <linux/falloc.h>
 #include <linux/xattr.h>
 #endif])
-elif test -n "$FUSE_LIB"
-then
-	FUSE_USE_VERSION=29
 fi
 if test -n "$FUSE_USE_VERSION"
 then
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index e3a350462f25f3..6290d22f2b9658 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -48,15 +48,6 @@
 #include "ext2fs/ext2fs.h"
 #include "ext2fs/ext2_fs.h"
 #include "ext2fs/ext2fsP.h"
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
-# define FUSE_PLATFORM_OPTS	""
-#else
-# ifdef __linux__
-#  define FUSE_PLATFORM_OPTS	",use_ino,big_writes"
-# else
-#  define FUSE_PLATFORM_OPTS	",use_ino"
-# endif
-#endif
 
 #include "../version.h"
 #include "uuid/uuid.h"
@@ -171,11 +162,9 @@ static inline uint64_t round_down(uint64_t b, unsigned int align)
 		break; \
 	}
 
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(2, 8)
-# ifdef _IOR
-#  ifdef _IOW
-#   define SUPPORT_I_FLAGS
-#  endif
+#ifdef _IOR
+# ifdef _IOW
+#  define SUPPORT_I_FLAGS
 # endif
 #endif
 
@@ -1311,11 +1300,8 @@ static inline int fuse_set_feature_flag(struct fuse_conn_info *conn,
 }
 #endif
 
-static void *op_init(struct fuse_conn_info *conn
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
-			, struct fuse_config *cfg EXT2FS_ATTR((unused))
-#endif
-			)
+static void *op_init(struct fuse_conn_info *conn,
+		     struct fuse_config *cfg EXT2FS_ATTR((unused)))
 {
 	struct fuse2fs *ff = fuse2fs_get();
 	ext2_filsys fs;
@@ -1347,13 +1333,11 @@ static void *op_init(struct fuse_conn_info *conn
 #ifdef FUSE_CAP_NO_EXPORT_SUPPORT
 	fuse_set_feature_flag(conn, FUSE_CAP_NO_EXPORT_SUPPORT);
 #endif
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
 	conn->time_gran = 1;
 	cfg->use_ino = 1;
 	if (ff->debug)
 		cfg->debug = 1;
 	cfg->nullpath_ok = 1;
-#endif
 
 	if (ff->kernel) {
 		char uuid[UUID_STR_SIZE];
@@ -1434,9 +1418,7 @@ static int stat_inode(ext2_filsys fs, ext2_ino_t ino, struct stat *statbuf)
 }
 
 static int __fuse2fs_file_ino(struct fuse2fs *ff, const char *path,
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
 			      struct fuse_file_info *fp EXT2FS_ATTR((unused)),
-#endif
 			      ext2_ino_t *inop,
 			      const char *func,
 			      int line)
@@ -1444,7 +1426,6 @@ static int __fuse2fs_file_ino(struct fuse2fs *ff, const char *path,
 	ext2_filsys fs = ff->fs;
 	errcode_t err;
 
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
 	if (fp) {
 		struct fuse2fs_file_handle *fh = fuse2fs_get_handle(fp);
 
@@ -1455,7 +1436,7 @@ static int __fuse2fs_file_ino(struct fuse2fs *ff, const char *path,
 		dbg_printf(ff, "%s: get ino=%d\n", func, fh->ino);
 		return 0;
 	}
-#endif
+
 	dbg_printf(ff, "%s: get path=%s\n", func, path);
 	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, inop);
 	if (err)
@@ -1464,19 +1445,11 @@ static int __fuse2fs_file_ino(struct fuse2fs *ff, const char *path,
 	return 0;
 }
 
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
 # define fuse2fs_file_ino(ff, path, fp, inop) \
 	__fuse2fs_file_ino((ff), (path), (fp), (inop), __func__, __LINE__)
-#else
-# define fuse2fs_file_ino(ff, path, fp, inop) \
-	__fuse2fs_file_ino((ff), (path), NULL, (inop), __func__, __LINE__)
-#endif
 
-static int op_getattr(const char *path, struct stat *statbuf
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
-			, struct fuse_file_info *fi
-#endif
-			)
+static int op_getattr(const char *path, struct stat *statbuf,
+		      struct fuse_file_info *fi)
 {
 	struct fuse2fs *ff = fuse2fs_get();
 	ext2_filsys fs;
@@ -2465,11 +2438,8 @@ static int update_dotdot_helper(ext2_ino_t dir EXT2FS_ATTR((unused)),
 	return 0;
 }
 
-static int op_rename(const char *from, const char *to
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
-			, unsigned int flags EXT2FS_ATTR((unused))
-#endif
-			)
+static int op_rename(const char *from, const char *to,
+		     unsigned int flags EXT2FS_ATTR((unused)))
 {
 	struct fuse2fs *ff = fuse2fs_get();
 	ext2_filsys fs;
@@ -2482,11 +2452,9 @@ static int op_rename(const char *from, const char *to
 	int flushed = 0;
 	int ret = 0;
 
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
 	/* renameat2 is not supported */
 	if (flags)
 		return -ENOSYS;
-#endif
 
 	FUSE2FS_CHECK_CONTEXT(ff);
 	dbg_printf(ff, "%s: renaming %s to %s\n", __func__, from, to);
@@ -2800,7 +2768,6 @@ static int op_link(const char *src, const char *dest)
 	return ret;
 }
 
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
 /* Obtain group ids of the process that sent us a command(?) */
 static int get_req_groups(struct fuse2fs *ff, gid_t **gids, size_t *nr_gids)
 {
@@ -2879,19 +2846,8 @@ static int in_file_group(struct fuse_context *ctxt,
 	ext2fs_free_mem(&gids);
 	return ret;
 }
-#else
-static int in_file_group(struct fuse_context *ctxt,
-			 const struct ext2_inode_large *inode)
-{
-	return ctxt->gid == inode_gid(*inode);
-}
-#endif
 
-static int op_chmod(const char *path, mode_t mode
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
-			, struct fuse_file_info *fi
-#endif
-			)
+static int op_chmod(const char *path, mode_t mode, struct fuse_file_info *fi)
 {
 	struct fuse_context *ctxt = fuse_get_context();
 	struct fuse2fs *ff = fuse2fs_get();
@@ -2958,11 +2914,8 @@ static int op_chmod(const char *path, mode_t mode
 	return ret;
 }
 
-static int op_chown(const char *path, uid_t owner, gid_t group
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
-			, struct fuse_file_info *fi
-#endif
-			)
+static int op_chown(const char *path, uid_t owner, gid_t group,
+		    struct fuse_file_info *fi)
 {
 	struct fuse_context *ctxt = fuse_get_context();
 	struct fuse2fs *ff = fuse2fs_get();
@@ -3100,11 +3053,7 @@ static int fuse2fs_truncate(struct fuse2fs *ff, ext2_ino_t ino, off_t new_size)
 	return 0;
 }
 
-static int op_truncate(const char *path, off_t len
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
-			, struct fuse_file_info *fi
-#endif
-			)
+static int op_truncate(const char *path, off_t len, struct fuse_file_info *fi)
 {
 	struct fuse2fs *ff = fuse2fs_get();
 	ext2_ino_t ino;
@@ -3834,9 +3783,7 @@ struct readdir_iter {
 	fuse_fill_dir_t func;
 
 	struct fuse2fs *ff;
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
 	enum fuse_readdir_flags flags;
-#endif
 	unsigned int nr;
 	off_t startpos;
 	off_t dirpos;
@@ -3888,44 +3835,29 @@ static int op_readdir_iter(ext2_ino_t dir EXT2FS_ATTR((unused)),
 		return 0;
 
 	dbg_printf(i->ff, "READDIR%s ino=%d %u offset=0x%llx\n",
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
 			i->flags == FUSE_READDIR_PLUS ? "PLUS" : "",
-#else
-			"",
-#endif
 			dir,
 			i->nr++,
 			(unsigned long long)i->dirpos);
 
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
 	if (i->flags == FUSE_READDIR_PLUS) {
 		ret = stat_inode(i->fs, dirent->inode, &stat);
 		if (ret)
 			return DIRENT_ABORT;
 	}
-#endif
 
 	memcpy(namebuf, dirent->name, dirent->name_len & 0xFF);
 	namebuf[dirent->name_len & 0xFF] = 0;
-	ret = i->func(i->buf, namebuf, &stat, i->dirpos
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
-			, 0
-#endif
-			);
+	ret = i->func(i->buf, namebuf, &stat, i->dirpos, 0);
 	if (ret)
 		return DIRENT_ABORT;
 
 	return 0;
 }
 
-static int op_readdir(const char *path EXT2FS_ATTR((unused)),
-		      void *buf, fuse_fill_dir_t fill_func,
-		      off_t offset,
-		      struct fuse_file_info *fp
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
-			, enum fuse_readdir_flags flags
-#endif
-			)
+static int op_readdir(const char *path EXT2FS_ATTR((unused)), void *buf,
+		      fuse_fill_dir_t fill_func, off_t offset,
+		      struct fuse_file_info *fp, enum fuse_readdir_flags flags)
 {
 	struct fuse2fs *ff = fuse2fs_get();
 	struct fuse2fs_file_handle *fh = fuse2fs_get_handle(fp);
@@ -3934,9 +3866,7 @@ static int op_readdir(const char *path EXT2FS_ATTR((unused)),
 		.ff = ff,
 		.dirpos = 0,
 		.startpos = offset,
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
 		.flags = flags,
-#endif
 	};
 	int ret = 0;
 
@@ -4119,82 +4049,8 @@ static int op_create(const char *path, mode_t mode, struct fuse_file_info *fp)
 	return ret;
 }
 
-#if FUSE_VERSION < FUSE_MAKE_VERSION(3, 0)
-static int op_ftruncate(const char *path EXT2FS_ATTR((unused)),
-			off_t len, struct fuse_file_info *fp)
-{
-	struct fuse2fs *ff = fuse2fs_get();
-	struct fuse2fs_file_handle *fh = fuse2fs_get_handle(fp);
-	ext2_filsys fs;
-	ext2_file_t efp;
-	errcode_t err;
-	int ret = 0;
-
-	FUSE2FS_CHECK_CONTEXT(ff);
-	FUSE2FS_CHECK_HANDLE(ff, fh);
-	dbg_printf(ff, "%s: ino=%d len=%jd\n", __func__, fh->ino,
-		   (intmax_t) len);
-	fs = fuse2fs_start(ff);
-	if (!fuse2fs_is_writeable(ff)) {
-		ret = -EROFS;
-		goto out;
-	}
-
-	err = ext2fs_file_open(fs, fh->ino, fh->open_flags, &efp);
-	if (err) {
-		ret = translate_error(fs, fh->ino, err);
-		goto out;
-	}
-
-	err = ext2fs_file_set_size2(efp, len);
-	if (err) {
-		ret = translate_error(fs, fh->ino, err);
-		goto out2;
-	}
-
-out2:
-	err = ext2fs_file_close(efp);
-	if (ret)
-		goto out;
-	if (err) {
-		ret = translate_error(fs, fh->ino, err);
-		goto out;
-	}
-
-	ret = update_mtime(fs, fh->ino, NULL);
-	if (ret)
-		goto out;
-
-out:
-	fuse2fs_finish(ff, ret);
-	return ret;
-}
-
-static int op_fgetattr(const char *path EXT2FS_ATTR((unused)),
-		       struct stat *statbuf,
-		       struct fuse_file_info *fp)
-{
-	struct fuse2fs *ff = fuse2fs_get();
-	ext2_filsys fs;
-	struct fuse2fs_file_handle *fh = fuse2fs_get_handle(fp);
-	int ret = 0;
-
-	FUSE2FS_CHECK_CONTEXT(ff);
-	FUSE2FS_CHECK_HANDLE(ff, fh);
-	dbg_printf(ff, "%s: ino=%d\n", __func__, fh->ino);
-	fs = fuse2fs_start(ff);
-	ret = stat_inode(fs, fh->ino, statbuf);
-	fuse2fs_finish(ff, ret);
-
-	return ret;
-}
-#endif /* FUSE_VERSION < FUSE_MAKE_VERSION(3, 0) */
-
-static int op_utimens(const char *path, const struct timespec ctv[2]
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
-			, struct fuse_file_info *fi
-#endif
-			)
+static int op_utimens(const char *path, const struct timespec ctv[2],
+		      struct fuse_file_info *fi)
 {
 	struct fuse2fs *ff = fuse2fs_get();
 	struct timespec tv[2];
@@ -4626,13 +4482,8 @@ static int ioctl_shutdown(struct fuse2fs *ff, struct fuse2fs_file_handle *fh,
 	return 0;
 }
 
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(2, 8)
 static int op_ioctl(const char *path EXT2FS_ATTR((unused)),
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 0)
 		    unsigned int cmd,
-#else
-		    int cmd,
-#endif
 		    void *arg EXT2FS_ATTR((unused)),
 		    struct fuse_file_info *fp,
 		    unsigned int flags EXT2FS_ATTR((unused)), void *data)
@@ -4683,7 +4534,6 @@ static int op_ioctl(const char *path EXT2FS_ATTR((unused)),
 
 	return ret;
 }
-#endif /* FUSE 28 */
 
 static int op_bmap(const char *path, size_t blocksize EXT2FS_ATTR((unused)),
 		   uint64_t *idx)
@@ -4714,8 +4564,7 @@ static int op_bmap(const char *path, size_t blocksize EXT2FS_ATTR((unused)),
 	return ret;
 }
 
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(2, 9)
-# ifdef SUPPORT_FALLOCATE
+#ifdef SUPPORT_FALLOCATE
 static int fuse2fs_allocate_range(struct fuse2fs *ff,
 				  struct fuse2fs_file_handle *fh, int mode,
 				  off_t offset, off_t len)
@@ -4991,8 +4840,7 @@ static int op_fallocate(const char *path EXT2FS_ATTR((unused)), int mode,
 
 	return ret;
 }
-# endif /* SUPPORT_FALLOCATE */
-#endif /* FUSE 29 */
+#endif /* SUPPORT_FALLOCATE */
 
 static struct fuse_operations fs_ops = {
 	.init = op_init,
@@ -5025,34 +4873,15 @@ static struct fuse_operations fs_ops = {
 	.fsyncdir = op_fsync,
 	.access = op_access,
 	.create = op_create,
-#if FUSE_VERSION < FUSE_MAKE_VERSION(3, 0)
-	.ftruncate = op_ftruncate,
-	.fgetattr = op_fgetattr,
-#endif
 	.utimens = op_utimens,
-#if (FUSE_VERSION >= FUSE_MAKE_VERSION(2, 9)) && (FUSE_VERSION < FUSE_MAKE_VERSION(3, 0))
-# if defined(UTIME_NOW) || defined(UTIME_OMIT)
-	.flag_utime_omit_ok = 1,
-# endif
-#endif
 	.bmap = op_bmap,
 #ifdef SUPERFLUOUS
 	.lock = op_lock,
 	.poll = op_poll,
 #endif
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(2, 8)
 	.ioctl = op_ioctl,
-#if FUSE_VERSION < FUSE_MAKE_VERSION(3, 0)
-	.flag_nullpath_ok = 1,
-#endif
-#endif
-#if FUSE_VERSION >= FUSE_MAKE_VERSION(2, 9)
-#if FUSE_VERSION < FUSE_MAKE_VERSION(3, 0)
-	.flag_nopath = 1,
-#endif
-# ifdef SUPPORT_FALLOCATE
+#ifdef SUPPORT_FALLOCATE
 	.fallocate = op_fallocate,
-# endif
 #endif
 };
 
@@ -5416,7 +5245,7 @@ int main(int argc, char *argv[])
 
 	/* Set up default fuse parameters */
 	snprintf(extra_args, BUFSIZ, "-okernel_cache,subtype=%s,"
-		 "fsname=%s,attr_timeout=0" FUSE_PLATFORM_OPTS,
+		 "fsname=%s,attr_timeout=0",
 		 get_subtype(argv[0]),
 		 fctx.device);
 	if (fctx.no_default_opts == 0)


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 01/21] fuse2fs: separate libfuse3 and fuse2fs detection in configure
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
@ 2025-09-16  0:50   ` Darrick J. Wong
  2025-09-16  0:51   ` [PATCH 02/21] fuse2fs: start porting fuse2fs to lowlevel libfuse API Darrick J. Wong
                     ` (19 subsequent siblings)
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:50 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Separate the detection of libfuse and fuse2fs so that we can add another
fuse server (fuse4fs) without tangling it up in --disable-fuse2fs.
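
To illustrate the resulting logic: libfuse is now probed once, unconditionally,
and each fuse server's --enable/--disable switch is then evaluated against that
shared result.  Below is a rough C model of the outcome that configure reports.
decide_build() and enum build_decision are hypothetical names invented for this
sketch, and the Linux header preprocessor check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of what configure reports for one fuse server. */
enum build_decision {
	BUILD_ENABLED,	/* "Enabling fuse2fs" / "... by default" */
	BUILD_DISABLED,	/* "Disabling fuse2fs" / "... by default" */
	BUILD_ERROR	/* AC_MSG_FAILURE: "Cannot find fuse library." */
};

/*
 * enable_arg: NULL when neither --enable-X nor --disable-X was given,
 *             otherwise the value autoconf stores in $enableval.
 * have_lib:   result of the shared libfuse probe that now runs first
 *             (a non-empty FUSE_USE_VERSION in the fuse2fs case).
 */
enum build_decision decide_build(const char *enable_arg, bool have_lib)
{
	if (enable_arg == NULL)
		/* No switch given: follow whatever the library probe found. */
		return have_lib ? BUILD_ENABLED : BUILD_DISABLED;
	if (strcmp(enable_arg, "no") == 0)
		/* --disable-X always wins, even if libfuse is present. */
		return BUILD_DISABLED;
	/* Explicit --enable-X: a missing library is a hard failure. */
	return have_lib ? BUILD_ENABLED : BUILD_ERROR;
}
```

For example, decide_build(NULL, false) yields BUILD_DISABLED, while
decide_build("yes", false) yields BUILD_ERROR.  Because the library probe no
longer lives inside the fuse2fs switch, a second server can reuse its result.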

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 configure        |  301 +++++++++++++-----------------------------------------
 configure.ac     |   79 ++++++++------
 misc/Makefile.in |    6 +
 3 files changed, 116 insertions(+), 270 deletions(-)


diff --git a/configure b/configure
index 86c9bc77321eee..22031343f078ab 100755
--- a/configure
+++ b/configure
@@ -701,7 +701,7 @@ gcc_ranlib
 gcc_ar
 UNI_DIFF_OPTS
 SEM_INIT_LIB
-FUSE_CMT
+FUSE2FS_CMT
 FUSE_LIB
 fuse3_LIBS
 fuse3_CFLAGS
@@ -14052,214 +14052,8 @@ then :
 fi
 
 
-FUSE_CMT=
+
 FUSE_LIB=
-# Check whether --enable-fuse2fs was given.
-if test ${enable_fuse2fs+y}
-then :
-  enableval=$enable_fuse2fs;
-	if test "$enableval" = "no"
-	then
-		FUSE_CMT="#"
-		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: Disabling fuse2fs" >&5
-printf "%s\n" "Disabling fuse2fs" >&6; }
-	else
-		cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-#ifdef __linux__
-	#include <linux/fs.h>
-	#include <linux/falloc.h>
-	#include <linux/xattr.h>
-	#endif
-
-int
-main (void)
-{
-
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_cpp "$LINENO"
-then :
-
-else $as_nop
-  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
-as_fn_error $? "Cannot find fuse2fs Linux headers.
-See \`config.log' for more details" "$LINENO" 5; }
-fi
-rm -f conftest.err conftest.i conftest.$ac_ext
-
-
-pkg_failed=no
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse3" >&5
-printf %s "checking for fuse3... " >&6; }
-
-if test -n "$fuse3_CFLAGS"; then
-    pkg_cv_fuse3_CFLAGS="$fuse3_CFLAGS"
- elif test -n "$PKG_CONFIG"; then
-    if test -n "$PKG_CONFIG" && \
-    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"fuse3\""; } >&5
-  ($PKG_CONFIG --exists --print-errors "fuse3") 2>&5
-  ac_status=$?
-  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-  test $ac_status = 0; }; then
-  pkg_cv_fuse3_CFLAGS=`$PKG_CONFIG --cflags "fuse3" 2>/dev/null`
-		      test "x$?" != "x0" && pkg_failed=yes
-else
-  pkg_failed=yes
-fi
- else
-    pkg_failed=untried
-fi
-if test -n "$fuse3_LIBS"; then
-    pkg_cv_fuse3_LIBS="$fuse3_LIBS"
- elif test -n "$PKG_CONFIG"; then
-    if test -n "$PKG_CONFIG" && \
-    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"fuse3\""; } >&5
-  ($PKG_CONFIG --exists --print-errors "fuse3") 2>&5
-  ac_status=$?
-  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-  test $ac_status = 0; }; then
-  pkg_cv_fuse3_LIBS=`$PKG_CONFIG --libs "fuse3" 2>/dev/null`
-		      test "x$?" != "x0" && pkg_failed=yes
-else
-  pkg_failed=yes
-fi
- else
-    pkg_failed=untried
-fi
-
-
-
-if test $pkg_failed = yes; then
-        { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
-printf "%s\n" "no" >&6; }
-
-if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
-        _pkg_short_errors_supported=yes
-else
-        _pkg_short_errors_supported=no
-fi
-        if test $_pkg_short_errors_supported = yes; then
-                fuse3_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "fuse3" 2>&1`
-        else
-                fuse3_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "fuse3" 2>&1`
-        fi
-        # Put the nasty error message in config.log where it belongs
-        echo "$fuse3_PKG_ERRORS" >&5
-
-
-			{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -losxfuse" >&5
-printf %s "checking for fuse_main in -losxfuse... " >&6; }
-if test ${ac_cv_lib_osxfuse_fuse_main+y}
-then :
-  printf %s "(cached) " >&6
-else $as_nop
-  ac_check_lib_save_LIBS=$LIBS
-LIBS="-losxfuse  $LIBS"
-cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-
-/* Override any GCC internal prototype to avoid an error.
-   Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.  */
-char fuse_main ();
-int
-main (void)
-{
-return fuse_main ();
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_link "$LINENO"
-then :
-  ac_cv_lib_osxfuse_fuse_main=yes
-else $as_nop
-  ac_cv_lib_osxfuse_fuse_main=no
-fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam \
-    conftest$ac_exeext conftest.$ac_ext
-LIBS=$ac_check_lib_save_LIBS
-fi
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_osxfuse_fuse_main" >&5
-printf "%s\n" "$ac_cv_lib_osxfuse_fuse_main" >&6; }
-if test "x$ac_cv_lib_osxfuse_fuse_main" = xyes
-then :
-  FUSE_LIB=-losxfuse
-else $as_nop
-  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
-as_fn_error $? "Cannot find fuse library.
-See \`config.log' for more details" "$LINENO" 5; }
-fi
-
-
-elif test $pkg_failed = untried; then
-        { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
-printf "%s\n" "no" >&6; }
-
-			{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -losxfuse" >&5
-printf %s "checking for fuse_main in -losxfuse... " >&6; }
-if test ${ac_cv_lib_osxfuse_fuse_main+y}
-then :
-  printf %s "(cached) " >&6
-else $as_nop
-  ac_check_lib_save_LIBS=$LIBS
-LIBS="-losxfuse  $LIBS"
-cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-
-/* Override any GCC internal prototype to avoid an error.
-   Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.  */
-char fuse_main ();
-int
-main (void)
-{
-return fuse_main ();
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_link "$LINENO"
-then :
-  ac_cv_lib_osxfuse_fuse_main=yes
-else $as_nop
-  ac_cv_lib_osxfuse_fuse_main=no
-fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam \
-    conftest$ac_exeext conftest.$ac_ext
-LIBS=$ac_check_lib_save_LIBS
-fi
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_osxfuse_fuse_main" >&5
-printf "%s\n" "$ac_cv_lib_osxfuse_fuse_main" >&6; }
-if test "x$ac_cv_lib_osxfuse_fuse_main" = xyes
-then :
-  FUSE_LIB=-losxfuse
-else $as_nop
-  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
-as_fn_error $? "Cannot find fuse library.
-See \`config.log' for more details" "$LINENO" 5; }
-fi
-
-
-else
-        fuse3_CFLAGS=$pkg_cv_fuse3_CFLAGS
-        fuse3_LIBS=$pkg_cv_fuse3_LIBS
-        { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5
-printf "%s\n" "yes" >&6; }
-        FUSE_LIB=-lfuse3
-fi
-		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: Enabling fuse2fs" >&5
-printf "%s\n" "Enabling fuse2fs" >&6; }
-	fi
-
-else $as_nop
-
 
 pkg_failed=no
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse3" >&5
@@ -14320,7 +14114,7 @@ fi
         echo "$fuse3_PKG_ERRORS" >&5
 
 
-		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -losxfuse" >&5
+	{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -losxfuse" >&5
 printf %s "checking for fuse_main in -losxfuse... " >&6; }
 if test ${ac_cv_lib_osxfuse_fuse_main+y}
 then :
@@ -14358,8 +14152,6 @@ printf "%s\n" "$ac_cv_lib_osxfuse_fuse_main" >&6; }
 if test "x$ac_cv_lib_osxfuse_fuse_main" = xyes
 then :
   FUSE_LIB=-losxfuse
-else $as_nop
-  FUSE_CMT="#"
 fi
 
 
@@ -14367,7 +14159,7 @@ elif test $pkg_failed = untried; then
         { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
 printf "%s\n" "no" >&6; }
 
-		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -losxfuse" >&5
+	{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -losxfuse" >&5
 printf %s "checking for fuse_main in -losxfuse... " >&6; }
 if test ${ac_cv_lib_osxfuse_fuse_main+y}
 then :
@@ -14405,8 +14197,6 @@ printf "%s\n" "$ac_cv_lib_osxfuse_fuse_main" >&6; }
 if test "x$ac_cv_lib_osxfuse_fuse_main" = xyes
 then :
   FUSE_LIB=-losxfuse
-else $as_nop
-  FUSE_CMT="#"
 fi
 
 
@@ -14417,15 +14207,6 @@ else
 printf "%s\n" "yes" >&6; }
         FUSE_LIB=-lfuse3
 fi
-	if test -z "$FUSE_CMT"
-	then
-		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: Enabling fuse2fs by default." >&5
-printf "%s\n" "Enabling fuse2fs by default." >&6; }
-	fi
-
-
-fi
-
 
 
 if test -n "$FUSE_LIB"
@@ -14437,12 +14218,7 @@ then
 do :
   as_ac_Header=`printf "%s\n" "ac_cv_header_$ac_header" | $as_tr_sh`
 ac_fn_c_check_header_compile "$LINENO" "$ac_header" "$as_ac_Header" "#define _FILE_OFFSET_BITS	64
-#define FUSE_USE_VERSION 314
-#ifdef __linux__
-#include <linux/fs.h>
-#include <linux/falloc.h>
-#include <linux/xattr.h>
-#endif
+#define FUSE_USE_VERSION	314
 "
 if eval test \"x\$"$as_ac_Header"\" = x"yes"
 then :
@@ -14453,7 +14229,7 @@ _ACEOF
 else $as_nop
   { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
 printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
-as_fn_error $? "Cannot find fuse3 fuse2fs headers.
+as_fn_error $? "Cannot build against fuse3 headers
 See \`config.log' for more details" "$LINENO" 5; }
 fi
 
@@ -14466,6 +14242,71 @@ printf "%s\n" "#define FUSE_USE_VERSION $FUSE_USE_VERSION" >>confdefs.h
 
 fi
 
+FUSE2FS_CMT=
+# Check whether --enable-fuse2fs was given.
+if test ${enable_fuse2fs+y}
+then :
+  enableval=$enable_fuse2fs;
+	if test "$enableval" = "no"
+	then
+		FUSE2FS_CMT="#"
+		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: Disabling fuse2fs" >&5
+printf "%s\n" "Disabling fuse2fs" >&6; }
+	else
+		cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#ifdef __linux__
+	#include <linux/fs.h>
+	#include <linux/falloc.h>
+	#include <linux/xattr.h>
+	#endif
+
+int
+main (void)
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_cpp "$LINENO"
+then :
+
+else $as_nop
+  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
+as_fn_error $? "Cannot find fuse2fs Linux headers
+See \`config.log' for more details" "$LINENO" 5; }
+fi
+rm -f conftest.err conftest.i conftest.$ac_ext
+
+		if test -z "$FUSE_USE_VERSION"
+		then
+			{ { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
+as_fn_error $? "Cannot find fuse library.
+See \`config.log' for more details" "$LINENO" 5; }
+		fi
+		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: Enabling fuse2fs" >&5
+printf "%s\n" "Enabling fuse2fs" >&6; }
+	fi
+
+else $as_nop
+
+	if test -n "$FUSE_USE_VERSION"
+	then
+		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: Enabling fuse2fs by default" >&5
+printf "%s\n" "Enabling fuse2fs by default" >&6; }
+	else
+		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: Disabling fuse2fs by default" >&5
+printf "%s\n" "Disabling fuse2fs by default" >&6; }
+	fi
+
+
+fi
+
+
+
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for PR_SET_IO_FLUSHER" >&5
 printf %s "checking for PR_SET_IO_FLUSHER... " >&6; }
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
diff --git a/configure.ac b/configure.ac
index bf1b57377cd848..b40ed1456d1515 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1366,18 +1366,48 @@ dnl Check to see if librt is required for clock_gettime() (glibc < 2.17)
 dnl
 AC_CHECK_LIB(rt, clock_gettime, [CLOCK_GETTIME_LIB=-lrt])
 AC_SUBST(CLOCK_GETTIME_LIB)
+
 dnl
 dnl Check to see if the FUSE library is -lfuse3 or -losxfuse
 dnl
-FUSE_CMT=
 FUSE_LIB=
 dnl osxfuse.dylib supersedes fuselib.dylib
+PKG_CHECK_MODULES([fuse3], [fuse3], [FUSE_LIB=-lfuse3],
+[
+	AC_CHECK_LIB(osxfuse, fuse_main, [FUSE_LIB=-losxfuse])
+])
+AC_SUBST(FUSE_LIB)
+
+dnl
+dnl Set FUSE_USE_VERSION, which is how fuse servers build against a particular
+dnl libfuse ABI.  Currently we link against the libfuse 3.14 ABI (hence 314)
+dnl
+if test -n "$FUSE_LIB"
+then
+	FUSE_USE_VERSION=314
+	CFLAGS="$fuse3_CFLAGS $CFLAGS"
+	FUSE_LIB="$fuse3_LIBS"
+	AC_CHECK_HEADERS([pthread.h fuse.h], [],
+		[AC_MSG_FAILURE([Cannot build against fuse3 headers])],
+[#define _FILE_OFFSET_BITS	64
+#define FUSE_USE_VERSION	314])
+fi
+if test -n "$FUSE_USE_VERSION"
+then
+	AC_DEFINE_UNQUOTED(FUSE_USE_VERSION, $FUSE_USE_VERSION,
+		[Define to the version of FUSE to use])
+fi
+
+dnl
+dnl Check if fuse2fs is actually built.
+dnl
+FUSE2FS_CMT=
 AC_ARG_ENABLE([fuse2fs],
 AS_HELP_STRING([--disable-fuse2fs],[do not build fuse2fs]),
 [
 	if test "$enableval" = "no"
 	then
-		FUSE_CMT="#"
+		FUSE2FS_CMT="#"
 		AC_MSG_RESULT([Disabling fuse2fs])
 	else
 		AC_PREPROC_IFELSE(
@@ -1386,49 +1416,24 @@ AS_HELP_STRING([--disable-fuse2fs],[do not build fuse2fs]),
 	#include <linux/falloc.h>
 	#include <linux/xattr.h>
 	#endif
-	]], [])], [], [AC_MSG_FAILURE([Cannot find fuse2fs Linux headers.])])
+	]], [])], [], [AC_MSG_FAILURE([Cannot find fuse2fs Linux headers])])
 
-		PKG_CHECK_MODULES([fuse3], [fuse3], [FUSE_LIB=-lfuse3],
-		[
-			AC_CHECK_LIB(osxfuse, fuse_main, [FUSE_LIB=-losxfuse],
-				[AC_MSG_FAILURE([Cannot find fuse library.])])
-		])
+		if test -z "$FUSE_USE_VERSION"
+		then
+			AC_MSG_FAILURE([Cannot find fuse library.])
+		fi
 		AC_MSG_RESULT([Enabling fuse2fs])
 	fi
 ], [
-	PKG_CHECK_MODULES([fuse3], [fuse3], [FUSE_LIB=-lfuse3],
-	[
-		AC_CHECK_LIB(osxfuse, fuse_main, [FUSE_LIB=-losxfuse],
-			[FUSE_CMT="#"])
-	])
-	if test -z "$FUSE_CMT"
+	if test -n "$FUSE_USE_VERSION"
 	then
-		AC_MSG_RESULT([Enabling fuse2fs by default.])
+		AC_MSG_RESULT([Enabling fuse2fs by default])
+	else
+		AC_MSG_RESULT([Disabling fuse2fs by default])
 	fi
 ]
 )
-AC_SUBST(FUSE_LIB)
-AC_SUBST(FUSE_CMT)
-if test -n "$FUSE_LIB"
-then
-	FUSE_USE_VERSION=314
-	CFLAGS="$fuse3_CFLAGS $CFLAGS"
-	FUSE_LIB="$fuse3_LIBS"
-	AC_CHECK_HEADERS([pthread.h fuse.h], [],
-		[AC_MSG_FAILURE([Cannot find fuse3 fuse2fs headers.])],
-[#define _FILE_OFFSET_BITS	64
-#define FUSE_USE_VERSION 314
-#ifdef __linux__
-#include <linux/fs.h>
-#include <linux/falloc.h>
-#include <linux/xattr.h>
-#endif])
-fi
-if test -n "$FUSE_USE_VERSION"
-then
-	AC_DEFINE_UNQUOTED(FUSE_USE_VERSION, $FUSE_USE_VERSION,
-		[Define to the version of FUSE to use])
-fi
+AC_SUBST(FUSE2FS_CMT)
 
 dnl
 dnl see if PR_SET_IO_FLUSHER exists
diff --git a/misc/Makefile.in b/misc/Makefile.in
index 0e3bed66dcb63d..b63a0424b19fec 100644
--- a/misc/Makefile.in
+++ b/misc/Makefile.in
@@ -34,7 +34,7 @@ MKDIR_P = @MKDIR_P@
 @BLKID_CMT@FINDFS_LINK= findfs
 @BLKID_CMT@FINDFS_MAN= findfs.8
 
-@FUSE_CMT@FUSE_PROG= fuse2fs
+@FUSE2FS_CMT@FUSE2FS_PROG= fuse2fs
 
 SPROGS=		mke2fs badblocks tune2fs dumpe2fs $(BLKID_PROG) logsave \
 			$(E2IMAGE_PROG) @FSCK_PROG@ e2undo
@@ -47,9 +47,9 @@ SMANPAGES=	tune2fs.8 mklost+found.8 mke2fs.8 dumpe2fs.8 badblocks.8 \
 			e2mmpstatus.8
 FMANPAGES=	mke2fs.conf.5 ext4.5
 
-UPROGS=		chattr lsattr $(FUSE_PROG) @UUID_CMT@ uuidgen
+UPROGS=		chattr lsattr $(FUSE2FS_PROG) @UUID_CMT@ uuidgen
 UMANPAGES=	chattr.1 lsattr.1 @UUID_CMT@ uuidgen.1
-UMANPAGES+=	@FUSE_CMT@ fuse2fs.1
+UMANPAGES+=	@FUSE2FS_CMT@ fuse2fs.1
 
 LPROGS=		@E2INITRD_PROG@
 



* [PATCH 02/21] fuse2fs: start porting fuse2fs to lowlevel libfuse API
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
  2025-09-16  0:50   ` [PATCH 01/21] fuse2fs: separate libfuse3 and fuse2fs detection in configure Darrick J. Wong
@ 2025-09-16  0:51   ` Darrick J. Wong
  2025-09-16  0:51   ` [PATCH 03/21] debian: create new package for fuse4fs Darrick J. Wong
                     ` (18 subsequent siblings)
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:51 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Copy fuse2fs.c to fuse4fs.c.  This will become our testbed for trying
out lowlevel fuse server support in the next few patches.

Namespacing conversions performed via:
sed -e 's/fuse2fs/fuse4fs/g' -e 's/FUSE2FS/FUSE4FS/g' -e 's/F2OP_/F4OP_/g' -e 's/FUSE server/FUSE low-level server/g'
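
The sed invocation chains four global substitutions, applied in order to every
line.  None of the four patterns contains a regex metacharacter, so each is
matched literally.  Here is a minimal C sketch of the same transformation;
replace_all() and apply_rename_rules() are hypothetical helpers written only
for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated copy of src with every non-overlapping
 * occurrence of the literal string from (scanning left to right)
 * replaced by to, like sed's s/from/to/g.  Caller frees the result.
 */
char *replace_all(const char *src, const char *from, const char *to)
{
	size_t from_len = strlen(from), to_len = strlen(to);
	size_t count = 0;
	const char *p;
	char *out, *dst;

	assert(from_len > 0);	/* an empty pattern would never advance */

	/* First pass: count matches to size the output buffer. */
	for (p = strstr(src, from); p != NULL; p = strstr(p + from_len, from))
		count++;

	out = malloc(strlen(src) - count * from_len + count * to_len + 1);
	if (out == NULL)
		return NULL;

	/* Second pass: copy the text between matches, substituting each match. */
	dst = out;
	while ((p = strstr(src, from)) != NULL) {
		memcpy(dst, src, (size_t)(p - src));
		dst += p - src;
		memcpy(dst, to, to_len);
		dst += to_len;
		src = p + from_len;
	}
	strcpy(dst, src);
	return out;
}

/* The four substitutions from the sed command, in the same order. */
static const char *const rules[][2] = {
	{ "fuse2fs", "fuse4fs" },
	{ "FUSE2FS", "FUSE4FS" },
	{ "F2OP_", "F4OP_" },
	{ "FUSE server", "FUSE low-level server" },
};

/* Apply every rule to one line; returns a malloc'd string or NULL. */
char *apply_rename_rules(const char *line)
{
	size_t n = sizeof(rules) / sizeof(rules[0]);
	char *cur = replace_all(line, rules[0][0], rules[0][1]);
	size_t i;

	for (i = 1; cur != NULL && i < n; i++) {
		char *next = replace_all(cur, rules[i][0], rules[i][1]);

		free(cur);
		cur = next;
	}
	return cur;
}
```

The substitutions are case sensitive, so the lowercase rule leaves FUSE2FS
identifiers untouched; that is why the sed command needs a separate uppercase
expression.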

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 Makefile.in          |    3 
 configure            |  113 +
 configure.ac         |   65 +
 fuse4fs/Makefile.in  |  192 ++
 fuse4fs/fuse4fs.1.in |  118 +
 fuse4fs/fuse4fs.c    | 5516 ++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/config.h.in      |    3 
 7 files changed, 6007 insertions(+), 3 deletions(-)
 create mode 100644 fuse4fs/Makefile.in
 create mode 100644 fuse4fs/fuse4fs.1.in
 create mode 100644 fuse4fs/fuse4fs.c


diff --git a/Makefile.in b/Makefile.in
index 277b500bbc9acc..d000f94bc88f0f 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -18,12 +18,13 @@ MKDIR_P = @MKDIR_P@
 @ALL_CMT@SUPPORT_LIB_SUBDIR= lib/support
 @ALL_CMT@E2P_LIB_SUBDIR= lib/e2p
 @ALL_CMT@EXT2FS_LIB_SUBDIR= lib/ext2fs
+@FUSE4FS_CMT@FUSE4FS_DIR=fuse4fs
 
 LIB_SUBDIRS=lib/et lib/ss $(E2P_LIB_SUBDIR) $(UUID_LIB_SUBDIR) \
 	$(BLKID_LIB_SUBDIR) $(SUPPORT_LIB_SUBDIR) $(EXT2FS_LIB_SUBDIR)
 
 PROG_SUBDIRS=e2fsck $(DEBUGFS_DIR) misc $(RESIZE_DIR) tests/progs \
-	tests/fuzz po $(E2SCRUB_DIR)
+	tests/fuzz po $(E2SCRUB_DIR) $(FUSE4FS_DIR)
 
 SUBDIRS=util $(LIB_SUBDIRS) $(PROG_SUBDIRS) tests
 
diff --git a/configure b/configure
index 22031343f078ab..7f5fb7c1a62084 100755
--- a/configure
+++ b/configure
@@ -701,6 +701,7 @@ gcc_ranlib
 gcc_ar
 UNI_DIFF_OPTS
 SEM_INIT_LIB
+FUSE4FS_CMT
 FUSE2FS_CMT
 FUSE_LIB
 fuse3_LIBS
@@ -933,6 +934,7 @@ with_libintl_prefix
 enable_largefile
 with_libarchive
 enable_fuse2fs
+enable_fuse4fs
 enable_lto
 enable_ubsan
 enable_addrsan
@@ -1628,6 +1630,7 @@ Optional Features:
   --disable-rpath         do not hardcode runtime library paths
   --disable-largefile     omit support for large files
   --disable-fuse2fs       do not build fuse2fs
+  --disable-fuse4fs       do not build fuse4fs
   --enable-lto            enable link time optimization
   --enable-ubsan          enable undefined behavior sanitizer
   --enable-addrsan        enable address sanitizer
@@ -14242,6 +14245,49 @@ printf "%s\n" "#define FUSE_USE_VERSION $FUSE_USE_VERSION" >>confdefs.h
 
 fi
 
+have_fuse_lowlevel=
+if test -n "$FUSE_USE_VERSION"
+then
+	{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for lowlevel interface in libfuse" >&5
+printf %s "checking for lowlevel interface in libfuse... " >&6; }
+	cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+	#define _GNU_SOURCE
+	#define _FILE_OFFSET_BITS	64
+	#define FUSE_USE_VERSION	314
+	#include <fuse_lowlevel.h>
+
+int
+main (void)
+{
+
+	struct fuse_lowlevel_ops fs_ops = { };
+
+  ;
+  return 0;
+}
+
+_ACEOF
+if ac_fn_c_try_link "$LINENO"
+then :
+  have_fuse_lowlevel=yes
+	   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+printf "%s\n" "yes" >&6; }
+else $as_nop
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
+printf "%s\n" "no" >&6; }
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.beam \
+    conftest$ac_exeext conftest.$ac_ext
+fi
+if test -n "$have_fuse_lowlevel"
+then
+
+printf "%s\n" "#define HAVE_FUSE_LOWLEVEL 1" >>confdefs.h
+
+fi
+
 FUSE2FS_CMT=
 # Check whether --enable-fuse2fs was given.
 if test ${enable_fuse2fs+y}
@@ -14307,6 +14353,71 @@ fi
 
 
 
+FUSE4FS_CMT=
+# Check whether --enable-fuse4fs was given.
+if test ${enable_fuse4fs+y}
+then :
+  enableval=$enable_fuse4fs;
+	if test "$enableval" = "no"
+	then
+		FUSE4FS_CMT="#"
+		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: Disabling fuse4fs" >&5
+printf "%s\n" "Disabling fuse4fs" >&6; }
+	else
+		cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#ifdef __linux__
+	#include <linux/fs.h>
+	#include <linux/falloc.h>
+	#include <linux/xattr.h>
+	#endif
+
+int
+main (void)
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_cpp "$LINENO"
+then :
+
+else $as_nop
+  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
+as_fn_error $? "Cannot find fuse4fs Linux headers
+See \`config.log' for more details" "$LINENO" 5; }
+fi
+rm -f conftest.err conftest.i conftest.$ac_ext
+
+		if test -z "$have_fuse_lowlevel"
+		then
+			{ { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
+as_fn_error $? "Cannot find fuse lowlevel library.
+See \`config.log' for more details" "$LINENO" 5; }
+		fi
+		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: Enabling fuse4fs" >&5
+printf "%s\n" "Enabling fuse4fs" >&6; }
+	fi
+
+else $as_nop
+
+	if test -n "$have_fuse_lowlevel"
+	then
+		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: Enabling fuse4fs by default" >&5
+printf "%s\n" "Enabling fuse4fs by default" >&6; }
+	else
+		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: Disabling fuse4fs by default" >&5
+printf "%s\n" "Disabling fuse4fs by default" >&6; }
+	fi
+
+
+fi
+
+
+
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for PR_SET_IO_FLUSHER" >&5
 printf %s "checking for PR_SET_IO_FLUSHER... " >&6; }
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -15984,7 +16095,7 @@ for i in MCONFIG Makefile \
 	misc/Makefile ext2ed/Makefile e2fsck/Makefile \
 	debugfs/Makefile tests/Makefile tests/progs/Makefile \
 	tests/fuzz/Makefile resize/Makefile doc/Makefile \
-	po/Makefile.in scrub/Makefile; do
+	po/Makefile.in scrub/Makefile fuse4fs/Makefile; do
 	if test -d `dirname ${srcdir}/$i` ; then
 		outlist="$outlist $i"
 	fi
diff --git a/configure.ac b/configure.ac
index b40ed1456d1515..2eb11873ea0e50 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1398,6 +1398,32 @@ then
 		[Define to the version of FUSE to use])
 fi
 
+dnl
+dnl Check if the FUSE lowlevel library is supported
+dnl
+have_fuse_lowlevel=
+if test -n "$FUSE_USE_VERSION"
+then
+	AC_MSG_CHECKING(for lowlevel interface in libfuse)
+	AC_LINK_IFELSE(
+	[	AC_LANG_PROGRAM([[
+	#define _GNU_SOURCE
+	#define _FILE_OFFSET_BITS	64
+	#define FUSE_USE_VERSION	314
+	#include <fuse_lowlevel.h>
+		]], [[
+	struct fuse_lowlevel_ops fs_ops = { };
+		]])
+	], have_fuse_lowlevel=yes
+	   AC_MSG_RESULT(yes),
+	   AC_MSG_RESULT(no))
+fi
+if test -n "$have_fuse_lowlevel"
+then
+	AC_DEFINE(HAVE_FUSE_LOWLEVEL, 1,
+		  [Define to 1 if fuse supports lowlevel API])
+fi
+
 dnl
 dnl Check if fuse2fs is actually built.
 dnl
@@ -1435,6 +1461,43 @@ AS_HELP_STRING([--disable-fuse2fs],[do not build fuse2fs]),
 )
 AC_SUBST(FUSE2FS_CMT)
 
+dnl
+dnl Check if fuse4fs is actually built.
+dnl
+FUSE4FS_CMT=
+AC_ARG_ENABLE([fuse4fs],
+AS_HELP_STRING([--disable-fuse4fs],[do not build fuse4fs]),
+[
+	if test "$enableval" = "no"
+	then
+		FUSE4FS_CMT="#"
+		AC_MSG_RESULT([Disabling fuse4fs])
+	else
+		AC_PREPROC_IFELSE(
+	[AC_LANG_PROGRAM([[#ifdef __linux__
+	#include <linux/fs.h>
+	#include <linux/falloc.h>
+	#include <linux/xattr.h>
+	#endif
+	]], [])], [], [AC_MSG_FAILURE([Cannot find fuse4fs Linux headers])])
+
+		if test -z "$have_fuse_lowlevel"
+		then
+			AC_MSG_FAILURE([Cannot find fuse lowlevel library.])
+		fi
+		AC_MSG_RESULT([Enabling fuse4fs])
+	fi
+], [
+	if test -n "$have_fuse_lowlevel"
+	then
+		AC_MSG_RESULT([Enabling fuse4fs by default])
+	else
+		AC_MSG_RESULT([Disabling fuse4fs by default])
+	fi
+]
+)
+AC_SUBST(FUSE4FS_CMT)
+
 dnl
 dnl see if PR_SET_IO_FLUSHER exists
 dnl
@@ -2042,7 +2105,7 @@ for i in MCONFIG Makefile \
 	misc/Makefile ext2ed/Makefile e2fsck/Makefile \
 	debugfs/Makefile tests/Makefile tests/progs/Makefile \
 	tests/fuzz/Makefile resize/Makefile doc/Makefile \
-	po/Makefile.in scrub/Makefile; do
+	po/Makefile.in scrub/Makefile fuse4fs/Makefile; do
 	if test -d `dirname ${srcdir}/$i` ; then
 		outlist="$outlist $i"
 	fi
diff --git a/fuse4fs/Makefile.in b/fuse4fs/Makefile.in
new file mode 100644
index 00000000000000..bc137a765ee2b7
--- /dev/null
+++ b/fuse4fs/Makefile.in
@@ -0,0 +1,192 @@
+#
+# Standard e2fsprogs prologue....
+#
+
+srcdir = @srcdir@
+top_srcdir = @top_srcdir@
+VPATH = @srcdir@
+top_builddir = ..
+my_dir = fuse4fs
+INSTALL = @INSTALL@
+MKDIR_P = @MKDIR_P@
+
+@MCONFIG@
+
+UPROGS=
+UMANPAGES=
+@FUSE4FS_CMT@UPROGS+=fuse4fs
+@FUSE4FS_CMT@UMANPAGES+=fuse4fs.1
+
+FUSE4FS_OBJS=	fuse4fs.o journal.o recovery.o revoke.o
+
+PROFILED_FUSE4FS_OBJS=	profiled/fuse4fs.o profiled/journal.o \
+			profiled/recovery.o profiled/revoke.o
+
+SRCS=\
+	$(srcdir)/fuse4fs.c \
+	$(srcdir)/../debugfs/journal.c \
+	$(srcdir)/../e2fsck/revoke.c \
+	$(srcdir)/../e2fsck/recovery.c
+
+LIBS= $(LIBEXT2FS) $(LIBCOM_ERR) $(LIBSUPPORT)
+DEPLIBS= $(LIBEXT2FS) $(DEPLIBCOM_ERR) $(DEPLIBSUPPORT)
+PROFILED_LIBS= $(LIBSUPPORT) $(PROFILED_LIBEXT2FS) $(PROFILED_LIBCOM_ERR)
+PROFILED_DEPLIBS= $(DEPLIBSUPPORT) $(PROFILED_LIBEXT2FS) $(DEPPROFILED_LIBCOM_ERR)
+
+STATIC_LIBS= $(LIBSUPPORT) $(STATIC_LIBEXT2FS) $(STATIC_LIBCOM_ERR)
+STATIC_DEPLIBS= $(DEPLIBSUPPORT) $(STATIC_LIBEXT2FS) $(DEPSTATIC_LIBCOM_ERR)
+
+LIBS_E2P= $(LIBE2P) $(LIBCOM_ERR)
+DEPLIBS_E2P= $(LIBE2P) $(DEPLIBCOM_ERR)
+
+COMPILE_ET=	_ET_DIR_OVERRIDE=$(srcdir)/../lib/et/et ../lib/et/compile_et
+
+# This nastiness is needed because of jfs_user.h hackery; when we finally
+# clean up this mess, we should be able to drop it
+JOURNAL_CFLAGS = -I$(srcdir)/../e2fsck $(ALL_CFLAGS) -DDEBUGFS
+DEPEND_CFLAGS = -I$(top_srcdir)/e2fsck
+
+.c.o:
+	$(E) "	CC $<"
+	$(Q) $(CC) -c $(ALL_CFLAGS) $< -o $@
+	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
+	$(Q) $(CPPCHECK_CMD) $(CPPFLAGS) $<
+@PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
+
+all:: profiled $(SPROGS) $(UPROGS) $(USPROGS) $(SMANPAGES) $(UMANPAGES) \
+	$(FMANPAGES) $(LPROGS)
+
+all-static::
+
+@PROFILE_CMT@all:: fuse4fs.profiled
+
+profiled:
+@PROFILE_CMT@	$(E) "	MKDIR $@"
+@PROFILE_CMT@	$(Q) mkdir profiled
+
+fuse4fs: $(FUSE4FS_OBJS) $(DEPLIBS) $(DEPLIBBLKID) $(DEPLIBUUID) \
+		$(LIBEXT2FS) $(DEPLIBS_E2P)
+	$(E) "	LD $@"
+	$(Q) $(CC) $(ALL_LDFLAGS) -o fuse4fs $(FUSE4FS_OBJS) $(LIBS) \
+		$(LIBFUSE) $(LIBBLKID) $(LIBUUID) $(LIBEXT2FS) $(LIBINTL) \
+		$(CLOCK_GETTIME_LIB) $(SYSLIBS) $(LIBS_E2P)
+
+journal.o: $(srcdir)/../debugfs/journal.c
+	$(E) "	CC $<"
+	$(Q) $(CC) -c $(JOURNAL_CFLAGS) -I$(srcdir) \
+		$(srcdir)/../debugfs/journal.c -o $@
+@PROFILE_CMT@	$(Q) $(CC) $(JOURNAL_CFLAGS) -g -pg -o profiled/$*.o -c $<
+
+recovery.o: $(srcdir)/../e2fsck/recovery.c
+	$(E) "	CC $<"
+	$(Q) $(CC) -c $(JOURNAL_CFLAGS) -I$(srcdir) \
+		$(srcdir)/../e2fsck/recovery.c -o $@
+@PROFILE_CMT@	$(Q) $(CC) $(JOURNAL_CFLAGS) -g -pg -o profiled/$*.o -c $<
+
+revoke.o: $(srcdir)/../e2fsck/revoke.c
+	$(E) "	CC $<"
+	$(Q) $(CC) -c $(JOURNAL_CFLAGS) -I$(srcdir) \
+		$(srcdir)/../e2fsck/revoke.c -o $@
+@PROFILE_CMT@	$(Q) $(CC) $(JOURNAL_CFLAGS) -g -pg -o profiled/$*.o -c $<
+
+fuse4fs.1: $(DEP_SUBSTITUTE) $(srcdir)/fuse4fs.1.in
+	$(E) "	SUBST $@"
+	$(Q) $(SUBSTITUTE_UPTIME) $(srcdir)/fuse4fs.1.in fuse4fs.1
+
+installdirs:
+	$(E) "	MKDIR_P $(bindir) $(man1dir)"
+	$(Q) $(MKDIR_P) $(DESTDIR)$(bindir) $(DESTDIR)$(man1dir)
+
+install: all $(UMANPAGES) installdirs
+	$(Q) for i in $(UPROGS); do \
+		$(ES) "	INSTALL $(bindir)/$$i"; \
+		$(INSTALL_PROGRAM) $$i $(DESTDIR)$(bindir)/$$i; \
+	done
+	$(Q) for i in $(UMANPAGES); do \
+		for j in $(COMPRESS_EXT); do \
+			$(RM) -f $(DESTDIR)$(man1dir)/$$i.$$j; \
+		done; \
+		$(ES) "	INSTALL_DATA $(man1dir)/$$i"; \
+		$(INSTALL_DATA) $$i $(DESTDIR)$(man1dir)/$$i; \
+	done
+
+install-strip: install
+	$(Q) for i in $(UPROGS); do \
+		$(ES) "	STRIP $(bindir)/$$i"; \
+		$(STRIP) $(DESTDIR)$(bindir)/$$i; \
+	done
+
+uninstall:
+	for i in $(UPROGS); do \
+		$(RM) -f $(DESTDIR)$(bindir)/$$i; \
+	done
+	for i in $(UMANPAGES); do \
+		$(RM) -f $(DESTDIR)$(man1dir)/$$i; \
+	done
+
+clean::
+	$(RM) -f $(UPROGS) $(UMANPAGES) profile.h \
+		fuse4fs.profiled \
+		profiled/*.o \#* *.s *.o *.a *~ core gmon.out
+
+mostlyclean: clean
+distclean: clean
+	$(RM) -f .depend Makefile $(srcdir)/TAGS $(srcdir)/Makefile.in.old
+
+# +++ Dependency line eater +++
+#
+# Makefile dependencies follow.  This must be the last section in
+# the Makefile.in file
+#
+fuse4fs.o: $(srcdir)/fuse4fs.c $(top_builddir)/lib/config.h \
+ $(top_builddir)/lib/dirpaths.h $(top_srcdir)/lib/ext2fs/ext2fs.h \
+ $(top_builddir)/lib/ext2fs/ext2_types.h $(top_srcdir)/lib/ext2fs/ext2_fs.h \
+ $(top_srcdir)/lib/ext2fs/ext3_extents.h $(top_srcdir)/lib/et/com_err.h \
+ $(top_srcdir)/lib/ext2fs/ext2_io.h $(top_builddir)/lib/ext2fs/ext2_err.h \
+ $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/hashmap.h \
+ $(top_srcdir)/lib/ext2fs/bitops.h $(top_srcdir)/lib/ext2fs/ext2fsP.h \
+ $(top_srcdir)/lib/ext2fs/ext2fs.h $(top_srcdir)/version.h \
+ $(top_srcdir)/lib/e2p/e2p.h
+journal.o: $(srcdir)/../debugfs/journal.c $(top_builddir)/lib/config.h \
+ $(top_builddir)/lib/dirpaths.h $(srcdir)/../debugfs/journal.h \
+ $(top_srcdir)/e2fsck/jfs_user.h $(top_srcdir)/e2fsck/e2fsck.h \
+ $(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h \
+ $(top_srcdir)/lib/ext2fs/ext2fs.h $(top_srcdir)/lib/ext2fs/ext3_extents.h \
+ $(top_srcdir)/lib/et/com_err.h $(top_srcdir)/lib/ext2fs/ext2_io.h \
+ $(top_builddir)/lib/ext2fs/ext2_err.h \
+ $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/hashmap.h \
+ $(top_srcdir)/lib/ext2fs/bitops.h $(top_srcdir)/lib/support/profile.h \
+ $(top_builddir)/lib/support/prof_err.h $(top_srcdir)/lib/support/quotaio.h \
+ $(top_srcdir)/lib/support/dqblk_v2.h \
+ $(top_srcdir)/lib/support/quotaio_tree.h \
+ $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
+ $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/ext2fs/kernel-jbd.h
+revoke.o: $(srcdir)/../e2fsck/revoke.c $(srcdir)/../e2fsck/jfs_user.h \
+ $(top_builddir)/lib/config.h $(top_builddir)/lib/dirpaths.h \
+ $(srcdir)/../e2fsck/e2fsck.h $(top_srcdir)/lib/ext2fs/ext2_fs.h \
+ $(top_builddir)/lib/ext2fs/ext2_types.h $(top_srcdir)/lib/ext2fs/ext2fs.h \
+ $(top_srcdir)/lib/ext2fs/ext3_extents.h $(top_srcdir)/lib/et/com_err.h \
+ $(top_srcdir)/lib/ext2fs/ext2_io.h $(top_builddir)/lib/ext2fs/ext2_err.h \
+ $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/hashmap.h \
+ $(top_srcdir)/lib/ext2fs/bitops.h $(top_srcdir)/lib/support/profile.h \
+ $(top_builddir)/lib/support/prof_err.h $(top_srcdir)/lib/support/quotaio.h \
+ $(top_srcdir)/lib/support/dqblk_v2.h \
+ $(top_srcdir)/lib/support/quotaio_tree.h \
+ $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
+ $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/ext2fs/kernel-jbd.h
+recovery.o: $(srcdir)/../e2fsck/recovery.c $(srcdir)/../e2fsck/jfs_user.h \
+ $(top_builddir)/lib/config.h $(top_builddir)/lib/dirpaths.h \
+ $(srcdir)/../e2fsck/e2fsck.h $(top_srcdir)/lib/ext2fs/ext2_fs.h \
+ $(top_builddir)/lib/ext2fs/ext2_types.h $(top_srcdir)/lib/ext2fs/ext2fs.h \
+ $(top_srcdir)/lib/ext2fs/ext3_extents.h $(top_srcdir)/lib/et/com_err.h \
+ $(top_srcdir)/lib/ext2fs/ext2_io.h $(top_builddir)/lib/ext2fs/ext2_err.h \
+ $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/hashmap.h \
+ $(top_srcdir)/lib/ext2fs/bitops.h $(top_srcdir)/lib/support/profile.h \
+ $(top_builddir)/lib/support/prof_err.h $(top_srcdir)/lib/support/quotaio.h \
+ $(top_srcdir)/lib/support/dqblk_v2.h \
+ $(top_srcdir)/lib/support/quotaio_tree.h \
+ $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
+ $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/ext2fs/kernel-jbd.h
diff --git a/fuse4fs/fuse4fs.1.in b/fuse4fs/fuse4fs.1.in
new file mode 100644
index 00000000000000..8bef5f48802385
--- /dev/null
+++ b/fuse4fs/fuse4fs.1.in
@@ -0,0 +1,118 @@
+.\" -*- nroff -*-
+.\" Copyright 2025 Oracle.  All Rights Reserved.
+.\" This file may be copied under the terms of the GNU Public License.
+.\"
+.TH FUSE4FS 1 "@E2FSPROGS_MONTH@ @E2FSPROGS_YEAR@" "E2fsprogs version @E2FSPROGS_VERSION@"
+.SH NAME
+fuse4fs \- FUSE file system client for ext2/ext3/ext4 file systems
+.SH SYNOPSIS
+.B fuse4fs
+[
+.B device/image
+]
+[
+.B mountpoint
+]
+[
+.I options
+]
+.SH DESCRIPTION
+.B fuse4fs
+is a FUSE file system client that supports reading from and writing to
+devices or image files containing ext2, ext3, and ext4 file systems.
+.SH OPTIONS
+.SS "general options:"
+.TP
+\fB\-o\fR opt,[opt...]
+mount options
+.TP
+\fB\-h\fR   \fB\-\-help\fR
+print help
+.TP
+\fB\-V\fR   \fB\-\-version\fR
+print version
+.SS "fuse4fs options:"
+.TP
+\fB-o\fR ro
+read-only mount
+.TP
+\fB-o\fR rw
+read-write mount (default)
+.TP
+\fB-o\fR bsddf
+BSD-style df (default)
+.TP
+\fB-o\fR minixdf
+Minix-style df
+.TP
+\fB-o\fR acl
+enable file access control lists
+.TP
+\fB-o\fR cache_size=\fIsize\fR
+Set the disk cache size to this quantity.
+The quantity may contain the suffixes k, m, or g.
+By default, the size is 32MB.
+The size may not be larger than 2GB.
+.TP
+\fB-o\fR direct
+Use O_DIRECT to access the block device.
+.TP
+\fB-o\fR dirsync
+Flush dirty metadata to disk after every directory update.
+.TP
+\fB-o\fR errors=continue
+ignore errors
+.TP
+\fB-o\fR errors=remount-ro
+stop allowing writes after errors
+.TP
+\fB-o\fR errors=panic
+dump core on error
+.TP
+\fB-o\fR fakeroot
+pretend to be root for permission checks
+.TP
+\fB-o\fR fuse4fs_debug
+enable fuse4fs debugging
+.TP
+\fB-o\fR kernel
+Behave more like the kernel ext4 driver in the following ways:
+Allows processes owned by other users to access the filesystem.
+Uses the kernel's permissions checking logic instead of fuse4fs's.
+Enables setuid and device files.
+Note that these options can still be overridden later,
+for example by
+.IR nosuid .
+.TP
+\fB-o\fR lockfile=path
+use this file to control access to the filesystem
+.TP
+\fB-o\fR no_default_opts
+do not include default fuse options
+.TP
+\fB-o\fR norecovery
+do not replay the journal and mount the file system read-only
+.SS "FUSE options:"
+.TP
+\fB-d -o\fR debug
+enable debug output (implies -f)
+.TP
+\fB-f\fR
+foreground operation
+.TP
+\fB-s\fR
+disable multi-threaded operation
+.P
+For other FUSE options please see
+.BR mount.fuse (8)
+or see the output of
+.I fuse4fs \-\-helpfull
+.SH AVAILABILITY
+.B fuse4fs
+is part of the e2fsprogs package and is available from
+http://e2fsprogs.sourceforge.net.
+.SH SEE ALSO
+.BR ext4 (5),
+.BR e2fsck (8),
+.BR mount.fuse (8)
+
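The cache_size option above takes a quantity with an optional k, m, or g suffix, defaults to 32MB, and may not be larger than 2GB. Below is a minimal C sketch of that validation under stated assumptions. The helper name parse_cache_size is hypothetical. Accepting upper-case suffixes and rejecting a zero size are also assumptions. This is not the parser fuse4fs actually uses.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Upper bound from the man page: the cache may not be larger than 2GB. */
#define CACHE_SIZE_MAX (2ULL << 30)

/*
 * Hypothetical helper (not fuse4fs code): parse a byte count with an
 * optional k/m/g suffix (upper case accepted as an assumption).
 * Returns 0 and stores the size in bytes in *out, or -EINVAL.
 */
static int parse_cache_size(const char *arg, unsigned long long *out)
{
	char *end;
	unsigned long long val;
	unsigned int shift = 0;

	errno = 0;
	val = strtoull(arg, &end, 10);
	if (errno != 0 || end == arg)
		return -EINVAL;

	switch (tolower((unsigned char)*end)) {
	case 'k':
		shift = 10;
		end++;
		break;
	case 'm':
		shift = 20;
		end++;
		break;
	case 'g':
		shift = 30;
		end++;
		break;
	}

	/* Reject trailing junk such as "12x" or "4kb". */
	if (*end != '\0')
		return -EINVAL;

	/* Compare before shifting so huge inputs cannot wrap around. */
	if (val == 0 || val > (CACHE_SIZE_MAX >> shift))
		return -EINVAL;

	*out = val << shift;
	return 0;
}
```

The sketch compares the value against CACHE_SIZE_MAX >> shift before shifting. As a result, an oversized input such as "99999999999g" is rejected outright and cannot wrap around to a small value that would pass the 2GB check.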
diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
new file mode 100644
index 00000000000000..99b9b902b37a57
--- /dev/null
+++ b/fuse4fs/fuse4fs.c
@@ -0,0 +1,5516 @@
+/*
+ * fuse4fs.c - FUSE low-level server for e2fsprogs.
+ *
+ * Copyright (C) 2014-2025 Oracle.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Public
+ * License.
+ * %End-Header%
+ */
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include "config.h"
+#include <pthread.h>
+#ifdef __linux__
+# include <linux/fs.h>
+# include <linux/falloc.h>
+# include <linux/xattr.h>
+# include <sys/prctl.h>
+#endif
+#ifdef HAVE_SYS_XATTR_H
+#include <sys/xattr.h>
+#endif
+#include <sys/ioctl.h>
+#include <unistd.h>
+#include <ctype.h>
+#include <stdbool.h>
+#define FUSE_DARWIN_ENABLE_EXTENSIONS 0
+#ifdef __SET_FOB_FOR_FUSE
+# error Do not set magic value __SET_FOB_FOR_FUSE!!!!
+#endif
+#ifndef _FILE_OFFSET_BITS
+/*
+ * Old versions of libfuse (e.g. Debian 2.9.9 package) required that the build
+ * system set _FILE_OFFSET_BITS explicitly, even if doing so isn't required to
+ * get a 64-bit off_t.  AC_SYS_LARGEFILE doesn't set any _FILE_OFFSET_BITS if
+ * it's not required (such as on aarch64), so we must inject it here.
+ */
+# define __SET_FOB_FOR_FUSE
+# define _FILE_OFFSET_BITS 64
+#endif /* _FILE_OFFSET_BITS */
+#include <fuse.h>
+#ifdef __SET_FOB_FOR_FUSE
+# undef _FILE_OFFSET_BITS
+#endif /* __SET_FOB_FOR_FUSE */
+#include <inttypes.h>
+#include "ext2fs/ext2fs.h"
+#include "ext2fs/ext2_fs.h"
+#include "ext2fs/ext2fsP.h"
+
+#include "../version.h"
+#include "uuid/uuid.h"
+#include "e2p/e2p.h"
+
+#ifdef ENABLE_NLS
+#include <libintl.h>
+#include <locale.h>
+#define _(a) (gettext(a))
+#ifdef gettext_noop
+#define N_(a) gettext_noop(a)
+#else
+#define N_(a) (a)
+#endif
+#define P_(singular, plural, n) (ngettext(singular, plural, n))
+#ifndef NLS_CAT_NAME
+#define NLS_CAT_NAME "e2fsprogs"
+#endif
+#ifndef LOCALEDIR
+#define LOCALEDIR "/usr/share/locale"
+#endif
+#else
+#define _(a) (a)
+#define N_(a) a
+#define P_(singular, plural, n) ((n) == 1 ? (singular) : (plural))
+#endif
+
+#ifndef XATTR_NAME_POSIX_ACL_DEFAULT
+#define XATTR_NAME_POSIX_ACL_DEFAULT "posix_acl_default"
+#endif
+#ifndef XATTR_SECURITY_PREFIX
+#define XATTR_SECURITY_PREFIX "security."
+#define XATTR_SECURITY_PREFIX_LEN (sizeof (XATTR_SECURITY_PREFIX) - 1)
+#endif
+
+/*
+ * Linux and MacOS implement the setxattr(2) interface, which defines
+ * XATTR_CREATE and XATTR_REPLACE.  However, FreeBSD uses
+ * extattr_set_file(2), which does not have a flags or options
+ * parameter, and does not define XATTR_CREATE and XATTR_REPLACE.
+ */
+#ifndef XATTR_CREATE
+#define XATTR_CREATE 0
+#endif
+
+#ifndef XATTR_REPLACE
+#define XATTR_REPLACE 0
+#endif
+
+#if !defined(EUCLEAN)
+#if defined(EBADMSG)
+#define EUCLEAN EBADMSG
+#elif defined(EPROTO)
+#define EUCLEAN EPROTO
+#else
+#define EUCLEAN EIO
+#endif
+#endif /* !defined(EUCLEAN) */
+
+#if !defined(ENODATA)
+#ifdef ENOATTR
+#define ENODATA ENOATTR
+#else
+#define ENODATA ENOENT
+#endif
+#endif /* !defined(ENODATA) */
+
+static inline uint64_t round_up(uint64_t b, unsigned int align)
+{
+	unsigned int m;
+
+	if (align == 0)
+		return b;
+	m = b % align;
+	if (m)
+		b += align - m;
+	return b;
+}
+
+static inline uint64_t round_down(uint64_t b, unsigned int align)
+{
+	unsigned int m;
+
+	if (align == 0)
+		return b;
+	m = b % align;
+	return b - m;
+}
+
+#define dbg_printf(fuse4fs, format, ...) \
+	while ((fuse4fs)->debug) { \
+		printf("FUSE4FS (%s): tid=%d " format, (fuse4fs)->shortdev, gettid(), ##__VA_ARGS__); \
+		fflush(stdout); \
+		break; \
+	}
+
+#define log_printf(fuse4fs, format, ...) \
+	do { \
+		printf("FUSE4FS (%s): " format, (fuse4fs)->shortdev, ##__VA_ARGS__); \
+		fflush(stdout); \
+	} while (0)
+
+#define err_printf(fuse4fs, format, ...) \
+	do { \
+		fprintf(stderr, "FUSE4FS (%s): " format, (fuse4fs)->shortdev, ##__VA_ARGS__); \
+		fflush(stderr); \
+	} while (0)
+
+#define timing_printf(fuse4fs, format, ...) \
+	while ((fuse4fs)->timing) { \
+		printf("FUSE4FS (%s): " format, (fuse4fs)->shortdev, ##__VA_ARGS__); \
+		break; \
+	}
+
+#ifdef _IOR
+# ifdef _IOW
+#  define SUPPORT_I_FLAGS
+# endif
+#endif
+
+#ifdef FALLOC_FL_KEEP_SIZE
+# define FL_KEEP_SIZE_FLAG FALLOC_FL_KEEP_SIZE
+# define SUPPORT_FALLOCATE
+#else
+# define FL_KEEP_SIZE_FLAG (0)
+#endif
+
+#ifdef FALLOC_FL_PUNCH_HOLE
+# define FL_PUNCH_HOLE_FLAG FALLOC_FL_PUNCH_HOLE
+#else
+# define FL_PUNCH_HOLE_FLAG (0)
+#endif
+
+#ifdef FALLOC_FL_ZERO_RANGE
+# define FL_ZERO_RANGE_FLAG FALLOC_FL_ZERO_RANGE
+#else
+# define FL_ZERO_RANGE_FLAG (0)
+#endif
+
+errcode_t ext2fs_run_ext3_journal(ext2_filsys *fs);
+
+const char *err_shortdev;
+
+#ifdef CONFIG_JBD_DEBUG		/* Enabled by configure --enable-jbd-debug */
+int journal_enable_debug = -1;
+#endif
+
+/*
+ * ext2_file_t contains a struct inode, so we can't leave files open.
+ * Use this as a proxy instead.
+ */
+#define FUSE4FS_FILE_MAGIC	(0xEF53DEAFUL)
+struct fuse4fs_file_handle {
+	unsigned long magic;
+	ext2_ino_t ino;
+	int open_flags;
+	int check_flags;
+};
+
+enum fuse4fs_opstate {
+	F4OP_READONLY,
+	F4OP_WRITABLE,
+	F4OP_SHUTDOWN,
+};
+
+/* Main program context */
+#define FUSE4FS_MAGIC		(0xEF53DEADUL)
+struct fuse4fs {
+	unsigned long magic;
+	ext2_filsys fs;
+	pthread_mutex_t bfl;
+	char *device;
+	char *shortdev;
+
+	/* options set by fuse_opt_parse must be of type int */
+	int ro;
+	int debug;
+	int no_default_opts;
+	int errors_behavior; /* actually an enum */
+	int minixdf;
+	int fakeroot;
+	int alloc_all_blocks;
+	int norecovery;
+	int kernel;
+	int directio;
+	int acl;
+	int dirsync;
+	int unmount_in_destroy;
+	int noblkdev;
+
+	enum fuse4fs_opstate opstate;
+	int logfd;
+	int blocklog;
+	unsigned int blockmask;
+	unsigned long offset;
+	unsigned int next_generation;
+	unsigned long long cache_size;
+	char *lockfile;
+#ifdef HAVE_CLOCK_MONOTONIC
+	struct timespec lock_start_time;
+	struct timespec op_start_time;
+
+	/* options set by fuse_opt_parse must be of type int */
+	int timing;
+#endif
+};
+
+#define FUSE4FS_CHECK_HANDLE(ff, fh) \
+	do { \
+		if ((fh) == NULL || (fh)->magic != FUSE4FS_FILE_MAGIC) { \
+			fprintf(stderr, \
+				"FUSE4FS: Corrupt in-memory file handle at %s:%d!\n", \
+				__func__, __LINE__); \
+			fflush(stderr); \
+			return -EUCLEAN; \
+		} \
+	} while (0)
+
+#define __FUSE4FS_CHECK_CONTEXT(ff, retcode, shutcode) \
+	do { \
+		if ((ff) == NULL || (ff)->magic != FUSE4FS_MAGIC) { \
+			fprintf(stderr, \
+				"FUSE4FS: Corrupt in-memory data at %s:%d!\n", \
+				__func__, __LINE__); \
+			fflush(stderr); \
+			retcode; \
+		} \
+		if ((ff)->opstate == F4OP_SHUTDOWN) { \
+			shutcode; \
+		} \
+	} while (0)
+
+#define FUSE4FS_CHECK_CONTEXT(ff) \
+	__FUSE4FS_CHECK_CONTEXT((ff), return -EUCLEAN, return -EIO)
+#define FUSE4FS_CHECK_CONTEXT_RETURN(ff) \
+	__FUSE4FS_CHECK_CONTEXT((ff), return, return)
+#define FUSE4FS_CHECK_CONTEXT_ABORT(ff) \
+	__FUSE4FS_CHECK_CONTEXT((ff), abort(), abort())
+
+static int __translate_error(ext2_filsys fs, ext2_ino_t ino, errcode_t err,
+			     const char *func, int line);
+#define translate_error(fs, ino, err) __translate_error((fs), (ino), (err), \
+			__func__, __LINE__)
+
+/* for macosx */
+#ifndef W_OK
+#  define W_OK 2
+#endif
+
+#ifndef R_OK
+#  define R_OK 4
+#endif
+
+static inline int u_log2(unsigned int arg)
+{
+	int	l = 0;
+
+	arg >>= 1;
+	while (arg) {
+		l++;
+		arg >>= 1;
+	}
+	return l;
+}
+
+static inline blk64_t FUSE4FS_B_TO_FSBT(const struct fuse4fs *ff, off_t pos)
+{
+	return pos >> ff->blocklog;
+}
+
+static inline blk64_t FUSE4FS_B_TO_FSB(const struct fuse4fs *ff, off_t pos)
+{
+	return (pos + ff->blockmask) >> ff->blocklog;
+}
+
+static inline unsigned int FUSE4FS_OFF_IN_FSB(const struct fuse4fs *ff,
+					      off_t pos)
+{
+	return pos & ff->blockmask;
+}
+
+static inline off_t FUSE4FS_FSB_TO_B(const struct fuse4fs *ff, blk64_t bno)
+{
+	return bno << ff->blocklog;
+}
+
+#define EXT4_EPOCH_BITS 2
+#define EXT4_EPOCH_MASK ((1 << EXT4_EPOCH_BITS) - 1)
+#define EXT4_NSEC_MASK  (~0UL << EXT4_EPOCH_BITS)
+
+/*
+ * Extended fields will fit into an inode if the filesystem was formatted
+ * with large inodes (-I 256 or larger) and there are not currently any EAs
+ * consuming all of the available space. For new inodes we always reserve
+ * enough space for the kernel's known extended fields, but for inodes
+ * created with an old kernel this might not have been the case. None of
+ * the extended inode fields is critical for correct filesystem operation.
+ * This macro checks if a certain field fits in the inode. Note that
+ * inode-size = GOOD_OLD_INODE_SIZE + i_extra_isize
+ */
+#define EXT4_FITS_IN_INODE(ext4_inode, field)		\
+	((offsetof(typeof(*ext4_inode), field) +	\
+	  sizeof((ext4_inode)->field))			\
+	 <= ((size_t) EXT2_GOOD_OLD_INODE_SIZE +		\
+	    (ext4_inode)->i_extra_isize))		\
+
+static inline __u32 ext4_encode_extra_time(const struct timespec *time)
+{
+	__u32 extra = sizeof(time->tv_sec) > 4 ?
+			((time->tv_sec - (__s32)time->tv_sec) >> 32) &
+			EXT4_EPOCH_MASK : 0;
+	return extra | (time->tv_nsec << EXT4_EPOCH_BITS);
+}
+
+static inline void ext4_decode_extra_time(struct timespec *time, __u32 extra)
+{
+	if (sizeof(time->tv_sec) > 4 && (extra & EXT4_EPOCH_MASK)) {
+		__u64 extra_bits = extra & EXT4_EPOCH_MASK;
+		/*
+		 * Prior to kernel 3.14?, we had a broken decode function,
+		 * wherein we effectively did this:
+		 * if (extra_bits == 3)
+		 *     extra_bits = 0;
+		 */
+		time->tv_sec += extra_bits << 32;
+	}
+	time->tv_nsec = ((extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
+}
+
+#define EXT4_CLAMP_TIMESTAMP(xtime, timespec, raw_inode)		       \
+do {									       \
+	if ((timespec)->tv_sec < EXT4_TIMESTAMP_MIN)			       \
+		(timespec)->tv_sec = EXT4_TIMESTAMP_MIN;		       \
+	if ((timespec)->tv_sec < EXT4_TIMESTAMP_MIN)			       \
+		(timespec)->tv_sec = EXT4_TIMESTAMP_MIN;		       \
+									       \
+	if (EXT4_FITS_IN_INODE(raw_inode, xtime ## _extra)) {		       \
+		if ((timespec)->tv_sec > EXT4_EXTRA_TIMESTAMP_MAX)	       \
+			(timespec)->tv_sec = EXT4_EXTRA_TIMESTAMP_MAX;	       \
+	} else {							       \
+		if ((timespec)->tv_sec > EXT4_NON_EXTRA_TIMESTAMP_MAX)	       \
+			(timespec)->tv_sec = EXT4_NON_EXTRA_TIMESTAMP_MAX;     \
+	}								       \
+} while (0)
+
+#define EXT4_INODE_SET_XTIME(xtime, timespec, raw_inode)		       \
+do {									       \
+	typeof(*(timespec)) _ts = *(timespec);				       \
+									       \
+	EXT4_CLAMP_TIMESTAMP(xtime, &_ts, raw_inode);			       \
+	(raw_inode)->xtime = _ts.tv_sec;				       \
+	if (EXT4_FITS_IN_INODE(raw_inode, xtime ## _extra))		       \
+		(raw_inode)->xtime ## _extra =				       \
+				ext4_encode_extra_time(&_ts);		       \
+} while (0)
+
+#define EXT4_EINODE_SET_XTIME(xtime, timespec, raw_inode)		       \
+do {									       \
+	typeof(*(timespec)) _ts = *(timespec);				       \
+									       \
+	EXT4_CLAMP_TIMESTAMP(xtime, &_ts, raw_inode);			       \
+	if (EXT4_FITS_IN_INODE(raw_inode, xtime))			       \
+		(raw_inode)->xtime = _ts.tv_sec;			       \
+	if (EXT4_FITS_IN_INODE(raw_inode, xtime ## _extra))		       \
+		(raw_inode)->xtime ## _extra =				       \
+				ext4_encode_extra_time(&_ts);		       \
+} while (0)
+
+#define EXT4_INODE_GET_XTIME(xtime, timespec, raw_inode)		       \
+do {									       \
+	(timespec)->tv_sec = (signed)((raw_inode)->xtime);		       \
+	if (EXT4_FITS_IN_INODE(raw_inode, xtime ## _extra))		       \
+		ext4_decode_extra_time((timespec),			       \
+				       (raw_inode)->xtime ## _extra);	       \
+	else								       \
+		(timespec)->tv_nsec = 0;				       \
+} while (0)
+
+#define EXT4_EINODE_GET_XTIME(xtime, timespec, raw_inode)		       \
+do {									       \
+	if (EXT4_FITS_IN_INODE(raw_inode, xtime))			       \
+		(timespec)->tv_sec =					       \
+			(signed)((raw_inode)->xtime);			       \
+	if (EXT4_FITS_IN_INODE(raw_inode, xtime ## _extra))		       \
+		ext4_decode_extra_time((timespec),			       \
+				       (raw_inode)->xtime ## _extra);     \
+	else								       \
+		(timespec)->tv_nsec = 0;				       \
+} while (0)
+
+static inline errcode_t fuse4fs_read_inode(ext2_filsys fs, ext2_ino_t ino,
+					   struct ext2_inode_large *inode)
+{
+	memset(inode, 0, sizeof(*inode));
+	return ext2fs_read_inode_full(fs, ino, EXT2_INODE(inode),
+				      sizeof(*inode));
+}
+
+static inline errcode_t fuse4fs_write_inode(ext2_filsys fs, ext2_ino_t ino,
+					    struct ext2_inode_large *inode)
+{
+	return ext2fs_write_inode_full(fs, ino, EXT2_INODE(inode),
+				       sizeof(*inode));
+}
+
+static inline struct fuse4fs *fuse4fs_get(void)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+
+	return ctxt->private_data;
+}
+
+static inline struct fuse4fs_file_handle *
+fuse4fs_get_handle(const struct fuse_file_info *fp)
+{
+	return (struct fuse4fs_file_handle *)(uintptr_t)fp->fh;
+}
+
+static inline void
+fuse4fs_set_handle(struct fuse_file_info *fp, struct fuse4fs_file_handle *fh)
+{
+	fp->fh = (uintptr_t)fh;
+}
+
+#ifdef HAVE_CLOCK_MONOTONIC
+static inline ext2_filsys fuse4fs_start(struct fuse4fs *ff)
+{
+	struct timespec lock_time;
+	int ret;
+
+	if (ff->timing)
+		clock_gettime(CLOCK_MONOTONIC, &lock_time);
+
+	pthread_mutex_lock(&ff->bfl);
+	if (ff->timing) {
+		ret = clock_gettime(CLOCK_MONOTONIC, &ff->op_start_time);
+		if (ret)
+			ff->timing = 0;
+		ff->lock_start_time = lock_time;
+	}
+	return ff->fs;
+}
+
+static inline double ms_from_timespec(const struct timespec *ts)
+{
+	return ((double)ts->tv_sec * 1000) + ((double)ts->tv_nsec / 1000000);
+}
+
+static inline void fuse4fs_finish_timing(struct fuse4fs *ff, const char *func)
+{
+	struct timespec now;
+	double lockf, startf, nowf;
+	int ret;
+
+	if (!ff->timing)
+		return;
+
+	ret = clock_gettime(CLOCK_MONOTONIC, &now);
+	if (ret) {
+		ff->timing = 0;
+		return;
+	}
+
+	lockf = ms_from_timespec(&ff->lock_start_time);
+	startf = ms_from_timespec(&ff->op_start_time);
+	nowf = ms_from_timespec(&now);
+	timing_printf(ff, "%s: lock=%.2fms elapsed=%.2fms\n", func,
+		      startf - lockf, nowf - startf);
+}
+#else
+static inline ext2_filsys fuse4fs_start(struct fuse4fs *ff)
+{
+	pthread_mutex_lock(&ff->bfl);
+	return ff->fs;
+}
+# define fuse4fs_finish_timing(...)	((void)0)
+#endif
+
+static inline void __fuse4fs_finish(struct fuse4fs *ff, int ret,
+				    const char *func)
+{
+	fuse4fs_finish_timing(ff, func);
+	if (ret)
+		dbg_printf(ff, "%s: libfuse ret=%d\n", func, ret);
+	pthread_mutex_unlock(&ff->bfl);
+}
+#define fuse4fs_finish(ff, ret) __fuse4fs_finish((ff), (ret), __func__)
+
+static void get_now(struct timespec *now)
+{
+#ifdef CLOCK_REALTIME
+	if (!clock_gettime(CLOCK_REALTIME, now))
+		return;
+#endif
+
+	now->tv_sec = time(NULL);
+	now->tv_nsec = 0;
+}
+
+static void increment_version(struct ext2_inode_large *inode)
+{
+	__u64 ver;
+
+	ver = inode->osd1.linux1.l_i_version;
+	if (EXT4_FITS_IN_INODE(inode, i_version_hi))
+		ver |= (__u64)inode->i_version_hi << 32;
+	ver++;
+	inode->osd1.linux1.l_i_version = ver;
+	if (EXT4_FITS_IN_INODE(inode, i_version_hi))
+		inode->i_version_hi = ver >> 32;
+}
+
+static void init_times(struct ext2_inode_large *inode)
+{
+	struct timespec now;
+
+	get_now(&now);
+	EXT4_INODE_SET_XTIME(i_atime, &now, inode);
+	EXT4_INODE_SET_XTIME(i_ctime, &now, inode);
+	EXT4_INODE_SET_XTIME(i_mtime, &now, inode);
+	EXT4_EINODE_SET_XTIME(i_crtime, &now, inode);
+	increment_version(inode);
+}
+
+static int update_ctime(ext2_filsys fs, ext2_ino_t ino,
+			struct ext2_inode_large *pinode)
+{
+	errcode_t err;
+	struct timespec now;
+	struct ext2_inode_large inode;
+
+	get_now(&now);
+
+	/* If user already has an inode buffer, just update that */
+	if (pinode) {
+		increment_version(pinode);
+		EXT4_INODE_SET_XTIME(i_ctime, &now, pinode);
+		return 0;
+	}
+
+	/* Otherwise we have to read-modify-write the inode */
+	err = fuse4fs_read_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	increment_version(&inode);
+	EXT4_INODE_SET_XTIME(i_ctime, &now, &inode);
+
+	err = fuse4fs_write_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	return 0;
+}
+
+static int update_atime(ext2_filsys fs, ext2_ino_t ino)
+{
+	errcode_t err;
+	struct ext2_inode_large inode, *pinode;
+	struct timespec atime, mtime, now;
+	double datime, dmtime, dnow;
+
+	err = fuse4fs_read_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	pinode = &inode;
+	EXT4_INODE_GET_XTIME(i_atime, &atime, pinode);
+	EXT4_INODE_GET_XTIME(i_mtime, &mtime, pinode);
+	get_now(&now);
+
+	datime = atime.tv_sec + ((double)atime.tv_nsec / 1000000000);
+	dmtime = mtime.tv_sec + ((double)mtime.tv_nsec / 1000000000);
+	dnow = now.tv_sec + ((double)now.tv_nsec / 1000000000);
+
+	/*
+	 * If atime is at least as new as mtime and was updated within the last
+	 * thirty seconds, skip the atime update.  Same idea as Linux "relatime".
+	 * Use doubles to account for nanosecond resolution.
+	 */
+	if (datime >= dmtime && datime >= dnow - 30)
+		return 0;
+	EXT4_INODE_SET_XTIME(i_atime, &now, &inode);
+
+	err = fuse4fs_write_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	return 0;
+}
+
+static int update_mtime(ext2_filsys fs, ext2_ino_t ino,
+			struct ext2_inode_large *pinode)
+{
+	errcode_t err;
+	struct ext2_inode_large inode;
+	struct timespec now;
+
+	if (pinode) {
+		get_now(&now);
+		EXT4_INODE_SET_XTIME(i_mtime, &now, pinode);
+		EXT4_INODE_SET_XTIME(i_ctime, &now, pinode);
+		increment_version(pinode);
+		return 0;
+	}
+
+	err = fuse4fs_read_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	get_now(&now);
+	EXT4_INODE_SET_XTIME(i_mtime, &now, &inode);
+	EXT4_INODE_SET_XTIME(i_ctime, &now, &inode);
+	increment_version(&inode);
+
+	err = fuse4fs_write_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	return 0;
+}
+
+static int ext2_file_type(unsigned int mode)
+{
+	if (LINUX_S_ISREG(mode))
+		return EXT2_FT_REG_FILE;
+
+	if (LINUX_S_ISDIR(mode))
+		return EXT2_FT_DIR;
+
+	if (LINUX_S_ISCHR(mode))
+		return EXT2_FT_CHRDEV;
+
+	if (LINUX_S_ISBLK(mode))
+		return EXT2_FT_BLKDEV;
+
+	if (LINUX_S_ISLNK(mode))
+		return EXT2_FT_SYMLINK;
+
+	if (LINUX_S_ISFIFO(mode))
+		return EXT2_FT_FIFO;
+
+	if (LINUX_S_ISSOCK(mode))
+		return EXT2_FT_SOCK;
+
+	return 0;
+}
+
+static int fs_can_allocate(struct fuse4fs *ff, blk64_t num)
+{
+	ext2_filsys fs = ff->fs;
+	blk64_t reserved;
+
+	dbg_printf(ff, "%s: Asking for %llu; alloc_all=%d total=%llu free=%llu "
+		   "rsvd=%llu\n", __func__, num, ff->alloc_all_blocks,
+		   ext2fs_blocks_count(fs->super),
+		   ext2fs_free_blocks_count(fs->super),
+		   ext2fs_r_blocks_count(fs->super));
+	if (num > ext2fs_blocks_count(fs->super))
+		return 0;
+
+	if (ff->alloc_all_blocks)
+		return 1;
+
+	/*
+	 * Different meaning for r_blocks -- libext2fs has bugs where the FS
+	 * can get corrupted if it totally runs out of blocks.  Avoid this
+	 * by refusing to allocate any of the reserve blocks to anybody.
+	 */
+	reserved = ext2fs_r_blocks_count(fs->super);
+	if (reserved == 0)
+		reserved = ext2fs_blocks_count(fs->super) / 10;
+	return ext2fs_free_blocks_count(fs->super) > reserved + num;
+}
+
+static int fuse4fs_is_writeable(struct fuse4fs *ff)
+{
+	return ff->opstate == F4OP_WRITABLE &&
+		(ff->fs->super->s_error_count == 0);
+}
+
+static inline int is_superuser(struct fuse4fs *ff, struct fuse_context *ctxt)
+{
+	if (ff->fakeroot)
+		return 1;
+	return ctxt->uid == 0;
+}
+
+static inline int want_check_owner(struct fuse4fs *ff,
+				   struct fuse_context *ctxt)
+{
+	/*
+	 * The kernel is responsible for access control, so we allow anything
+	 * that the superuser can do.
+	 */
+	if (ff->kernel)
+		return 0;
+	return !is_superuser(ff, ctxt);
+}
+
+/* Test for append permission */
+#define A_OK	16
+
+static int check_iflags_access(struct fuse4fs *ff, ext2_ino_t ino,
+			       const struct ext2_inode *inode, int mask)
+{
+	EXT2FS_BUILD_BUG_ON((A_OK & (R_OK | W_OK | X_OK | F_OK)) != 0);
+
+	/* no writing or metadata changes to read-only or broken fs */
+	if ((mask & (W_OK | A_OK)) && !fuse4fs_is_writeable(ff))
+		return -EROFS;
+
+	dbg_printf(ff, "access ino=%d mask=e%s%s%s%s iflags=0x%x\n",
+		   ino,
+		   (mask & R_OK ? "r" : ""),
+		   (mask & W_OK ? "w" : ""),
+		   (mask & X_OK ? "x" : ""),
+		   (mask & A_OK ? "a" : ""),
+		   inode->i_flags);
+
+	/* is immutable? */
+	if ((mask & W_OK) &&
+	    (inode->i_flags & EXT2_IMMUTABLE_FL))
+		return -EPERM;
+
+	/* is append-only? */
+	if ((inode->i_flags & EXT2_APPEND_FL) && (mask & W_OK) && !(mask & A_OK))
+		return -EPERM;
+
+	return 0;
+}
+
+static int check_inum_access(struct fuse4fs *ff, ext2_ino_t ino, int mask)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	ext2_filsys fs = ff->fs;
+	struct ext2_inode inode;
+	mode_t perms;
+	errcode_t err;
+	int ret;
+
+	/* no writing to read-only or broken fs */
+	if ((mask & (W_OK | A_OK)) && !fuse4fs_is_writeable(ff))
+		return -EROFS;
+
+	err = ext2fs_read_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+	perms = inode.i_mode & 0777;
+
+	dbg_printf(ff, "access ino=%d mask=e%s%s%s%s perms=0%o iflags=0x%x "
+		   "fuid=%d fgid=%d uid=%d gid=%d\n", ino,
+		   (mask & R_OK ? "r" : ""),
+		   (mask & W_OK ? "w" : ""),
+		   (mask & X_OK ? "x" : ""),
+		   (mask & A_OK ? "a" : ""),
+		   perms, inode.i_flags,
+		   inode_uid(inode), inode_gid(inode),
+		   ctxt->uid, ctxt->gid);
+
+	/* existence check */
+	if (mask == 0)
+		return 0;
+
+	ret = check_iflags_access(ff, ino, &inode, mask);
+	if (ret)
+		return ret;
+
+	/* If kernel is responsible for mode and acl checks, we're done. */
+	if (ff->kernel)
+		return 0;
+
+	/* Figure out what root's allowed to do */
+	if (is_superuser(ff, ctxt)) {
+		/* Non-file access always ok */
+		if (!LINUX_S_ISREG(inode.i_mode))
+			return 0;
+
+		/* R/W access to a file always ok */
+		if (!(mask & X_OK))
+			return 0;
+
+		/* X access to a file ok if a user/group/other can X */
+		if (perms & 0111)
+			return 0;
+
+		/* Trying to execute a file that's not executable. BZZT! */
+		return -EACCES;
+	}
+
+	/* Remove the O_APPEND flag before testing permissions */
+	mask &= ~A_OK;
+
+	/* allow owner, if perms match */
+	if (inode_uid(inode) == ctxt->uid) {
+		if ((mask & (perms >> 6)) == mask)
+			return 0;
+		return -EACCES;
+	}
+
+	/* allow group, if perms match */
+	if (inode_gid(inode) == ctxt->gid) {
+		if ((mask & (perms >> 3)) == mask)
+			return 0;
+		return -EACCES;
+	}
+
+	/* otherwise check other */
+	if ((mask & perms) == mask)
+		return 0;
+	return -EACCES;
+}
+
+static errcode_t fuse4fs_acquire_lockfile(struct fuse4fs *ff)
+{
+	char *resolved;
+	int lockfd;
+	errcode_t err;
+
+	lockfd = open(ff->lockfile, O_RDWR | O_CREAT | O_EXCL, 0400);
+	if (lockfd < 0) {
+		if (errno == EEXIST)
+			err = EWOULDBLOCK;
+		else
+			err = errno;
+		err_printf(ff, "%s: %s: %s\n", ff->lockfile,
+			   _("opening lockfile failed"),
+			   strerror(err));
+		ff->lockfile = NULL;
+		return err;
+	}
+	close(lockfd);
+
+	resolved = realpath(ff->lockfile, NULL);
+	if (!resolved) {
+		err = errno;
+		err_printf(ff, "%s: %s: %s\n", ff->lockfile,
+			   _("resolving lockfile failed"),
+			   strerror(err));
+		unlink(ff->lockfile);
+		ff->lockfile = NULL;
+		return err;
+	}
+	free(ff->lockfile);
+	ff->lockfile = resolved;
+
+	return 0;
+}
+
+static void fuse4fs_release_lockfile(struct fuse4fs *ff)
+{
+	if (unlink(ff->lockfile)) {
+		errcode_t err = errno;
+
+		err_printf(ff, "%s: %s: %s\n", ff->lockfile,
+			   _("removing lockfile failed"),
+			   strerror(err));
+	}
+	free(ff->lockfile);
+}
+
+static void fuse4fs_unmount(struct fuse4fs *ff)
+{
+	errcode_t err;
+
+	if (!ff->fs)
+		return;
+
+	err = ext2fs_close(ff->fs);
+	if (err) {
+		err_printf(ff, "%s: %s\n", _("while closing fs"),
+			   error_message(err));
+		ext2fs_free(ff->fs);
+	}
+	ff->fs = NULL;
+
+	if (ff->lockfile)
+		fuse4fs_release_lockfile(ff);
+}
+
+static errcode_t fuse4fs_open(struct fuse4fs *ff, int libext2_flags)
+{
+	char options[128];
+	int flags = EXT2_FLAG_64BITS | EXT2_FLAG_THREADS | EXT2_FLAG_RW |
+		    libext2_flags;
+	errcode_t err;
+
+	if (ff->lockfile) {
+		err = fuse4fs_acquire_lockfile(ff);
+		if (err)
+			return err;
+	}
+
+	snprintf(options, sizeof(options) - 1, "offset=%lu", ff->offset);
+	ff->opstate = F4OP_READONLY;
+
+	if (ff->directio)
+		flags |= EXT2_FLAG_DIRECT_IO;
+
+	err = ext2fs_open2(ff->device, options, flags, 0, 0, unix_io_manager,
+			   &ff->fs);
+	if (err == EPERM) {
+		err_printf(ff, "%s.\n",
+			   _("read-only device, trying to mount norecovery"));
+		flags &= ~EXT2_FLAG_RW;
+		ff->ro = 1;
+		ff->norecovery = 1;
+		err = ext2fs_open2(ff->device, options, flags, 0, 0,
+				   unix_io_manager, &ff->fs);
+	}
+	if (err) {
+		err_printf(ff, "%s.\n", error_message(err));
+		err_printf(ff, "%s\n", _("Please run e2fsck -fy."));
+		return err;
+	}
+
+	ff->fs->priv_data = ff;
+	ff->blocklog = u_log2(ff->fs->blocksize);
+	ff->blockmask = ff->fs->blocksize - 1;
+	return 0;
+}
+
+static inline bool fuse4fs_on_bdev(const struct fuse4fs *ff)
+{
+	return ff->fs->io->flags & CHANNEL_FLAGS_BLOCK_DEVICE;
+}
+
+static errcode_t fuse4fs_config_cache(struct fuse4fs *ff)
+{
+	char buf[128];
+	errcode_t err;
+
+	snprintf(buf, sizeof(buf), "cache_blocks=%llu",
+		 FUSE4FS_B_TO_FSBT(ff, ff->cache_size));
+	err = io_channel_set_options(ff->fs->io, buf);
+	if (err) {
+		err_printf(ff, "%s %lluk: %s\n",
+			   _("cannot set disk cache size to"),
+			   ff->cache_size >> 10,
+			   error_message(err));
+		return err;
+	}
+
+	return 0;
+}
+
+static errcode_t fuse4fs_check_support(struct fuse4fs *ff)
+{
+	ext2_filsys fs = ff->fs;
+
+	if (ext2fs_has_feature_quota(fs->super)) {
+		err_printf(ff, "%s\n", _("quotas not supported."));
+		return EXT2_ET_UNSUPP_FEATURE;
+	}
+	if (ext2fs_has_feature_verity(fs->super)) {
+		err_printf(ff, "%s\n", _("verity not supported."));
+		return EXT2_ET_UNSUPP_FEATURE;
+	}
+	if (ext2fs_has_feature_encrypt(fs->super)) {
+		err_printf(ff, "%s\n", _("encryption not supported."));
+		return EXT2_ET_UNSUPP_FEATURE;
+	}
+	if (ext2fs_has_feature_casefold(fs->super)) {
+		err_printf(ff, "%s\n", _("casefolding not supported."));
+		return EXT2_ET_UNSUPP_FEATURE;
+	}
+
+	if (fs->super->s_state & EXT2_ERROR_FS) {
+		err_printf(ff, "%s\n",
+ _("Errors detected; running e2fsck is required."));
+		return EXT2_ET_FILESYSTEM_CORRUPTED;
+	}
+
+	return 0;
+}
+
+static int fuse4fs_check_norecovery(struct fuse4fs *ff)
+{
+	if (ext2fs_has_feature_journal_needs_recovery(ff->fs->super) &&
+	    !ff->ro) {
+		log_printf(ff, "%s\n",
+ _("Required journal recovery suppressed and not mounted read-only."));
+		return 32;
+	}
+
+	/*
+	 * Amazingly, norecovery allows a rw mount when there's a clean journal
+	 * present.
+	 */
+	return 0;
+}
+
+static int fuse4fs_mount(struct fuse4fs *ff)
+{
+	struct ext2_inode_large inode;
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+
+	if (ext2fs_has_feature_journal_needs_recovery(fs->super)) {
+		if (ff->norecovery) {
+			log_printf(ff, "%s\n",
+ _("Mounting read-only without recovering journal."));
+		} else {
+			log_printf(ff, "%s\n", _("Recovering journal."));
+			err = ext2fs_run_ext3_journal(&ff->fs);
+			if (err) {
+				err_printf(ff, "%s.\n", error_message(err));
+				err_printf(ff, "%s\n",
+						_("Please run e2fsck -fy."));
+				return translate_error(fs, 0, err);
+			}
+			fs = ff->fs;
+			ext2fs_clear_feature_journal_needs_recovery(fs->super);
+			ext2fs_mark_super_dirty(fs);
+
+			err = fuse4fs_check_support(ff);
+			if (err)
+				return err;
+		}
+	}
+
+	/* Make sure the root directory is readable. */
+	err = fuse4fs_read_inode(fs, EXT2_ROOT_INO, &inode);
+	if (err)
+		return translate_error(fs, EXT2_ROOT_INO, err);
+
+	if (fs->flags & EXT2_FLAG_RW) {
+		if (ext2fs_has_feature_journal(fs->super))
+			log_printf(ff, "%s",
+ _("Warning: fuse4fs does not support using the journal.\n"
+   "There may be file system corruption or data loss if\n"
+   "the file system is not gracefully unmounted.\n"));
+		ff->opstate = F4OP_WRITABLE;
+	}
+
+	if (!(fs->super->s_state & EXT2_VALID_FS))
+		err_printf(ff, "%s\n",
+ _("Warning: Mounting unchecked fs, running e2fsck is recommended."));
+	if (fs->super->s_max_mnt_count > 0 &&
+	    fs->super->s_mnt_count >= fs->super->s_max_mnt_count)
+		err_printf(ff, "%s\n",
+ _("Warning: Maximal mount count reached, running e2fsck is recommended."));
+	if (fs->super->s_checkinterval > 0 &&
+	    (time_t) (fs->super->s_lastcheck +
+		      fs->super->s_checkinterval) <= time(0))
+		err_printf(ff, "%s\n",
+ _("Warning: Check time reached; running e2fsck is recommended."));
+	if (fs->super->s_last_orphan)
+		err_printf(ff, "%s\n",
+ _("Orphans detected; running e2fsck is recommended."));
+
+	if (!ff->errors_behavior)
+		ff->errors_behavior = fs->super->s_errors;
+
+	/* Clear the valid flag so that an unclean shutdown forces a fsck */
+	if (ff->opstate == F4OP_WRITABLE) {
+		fs->super->s_mnt_count++;
+		ext2fs_set_tstamp(fs->super, s_mtime, time(NULL));
+		fs->super->s_state &= ~EXT2_VALID_FS;
+		ext2fs_mark_super_dirty(fs);
+		err = ext2fs_flush2(fs, 0);
+		if (err)
+			return translate_error(fs, 0, err);
+	}
+
+	return 0;
+}
+
+static void op_destroy(void *p EXT2FS_ATTR((unused)))
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+	errcode_t err;
+
+	FUSE4FS_CHECK_CONTEXT_RETURN(ff);
+
+	fs = fuse4fs_start(ff);
+
+	dbg_printf(ff, "%s: dev=%s\n", __func__, fs->device_name);
+	if (ff->opstate == F4OP_WRITABLE) {
+		fs->super->s_state |= EXT2_VALID_FS;
+		if (fs->super->s_error_count)
+			fs->super->s_state |= EXT2_ERROR_FS;
+		ext2fs_mark_super_dirty(fs);
+		err = ext2fs_set_gdt_csum(fs);
+		if (err)
+			translate_error(fs, 0, err);
+
+		err = ext2fs_flush2(fs, 0);
+		if (err)
+			translate_error(fs, 0, err);
+	}
+
+	if (ff->debug && fs->io->manager->get_stats) {
+		io_stats stats = NULL;
+
+		fs->io->manager->get_stats(fs->io, &stats);
+		dbg_printf(ff, "read: %lluk\n",  stats->bytes_read >> 10);
+		dbg_printf(ff, "write: %lluk\n", stats->bytes_written >> 10);
+		dbg_printf(ff, "hits: %llu\n",   stats->cache_hits);
+		dbg_printf(ff, "misses: %llu\n", stats->cache_misses);
+		dbg_printf(ff, "hit_ratio: %.1f%%\n",
+				(100.0 * stats->cache_hits) /
+				(stats->cache_hits + stats->cache_misses));
+	}
+
+	if (ff->kernel) {
+		char uuid[UUID_STR_SIZE];
+
+		uuid_unparse(fs->super->s_uuid, uuid);
+		log_printf(ff, "%s %s.\n", _("unmounting filesystem"), uuid);
+	}
+
+	if (ff->unmount_in_destroy)
+		fuse4fs_unmount(ff);
+
+	fuse4fs_finish(ff, 0);
+}
+
+/* Reopen @stream with @fileno */
+static int fuse4fs_freopen_stream(const char *path, int fileno, FILE *stream)
+{
+	char _fdpath[256];
+	const char *fdpath;
+	FILE *fp;
+	int ret;
+
+	ret = snprintf(_fdpath, sizeof(_fdpath), "/dev/fd/%d", fileno);
+	if (ret >= sizeof(_fdpath))
+		fdpath = path;
+	else
+		fdpath = _fdpath;
+
+	/*
+	 * The C standard only promises that std{out,err} are expressions of
+	 * type FILE*; they need not be modifiable lvalues.  This means that we
+	 * can't just assign to stdout: we have to use freopen, which takes a
+	 * path.
+	 *
+	 * There's no guarantee that the OS provides a /dev/fd/X alias for open
+	 * file descriptors, so if that fails, fall back to the original log
+	 * file path.  We'd rather not do a path-based reopen because that
+	 * exposes us to rename race attacks.
+	 */
+	fp = freopen(fdpath, "a", stream);
+	if (!fp && errno == ENOENT && fdpath == _fdpath)
+		fp = freopen(path, "a", stream);
+	if (!fp) {
+		perror(fdpath);
+		return -1;
+	}
+
+	return 0;
+}
+
+/* Redirect stdout/stderr to a file, or return a mount-compatible error. */
+static int fuse4fs_capture_output(struct fuse4fs *ff, const char *path)
+{
+	int ret;
+	int fd;
+
+	/*
+	 * First, open the log file path with system calls so that we can
+	 * redirect the stdout/stderr file numbers (typically 1 and 2) to our
+	 * logfile descriptor.  Sharing one open file description, rather than
+	 * allocating a separate file object in the kernel for each stream,
+	 * means that stdout and stderr share the same file position.
+	 */
+	if (ff->logfd < 0) {
+		fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
+		if (fd < 0) {
+			perror(path);
+			return -1;
+		}
+
+		/*
+		 * Save the newly opened fd in case we have to do this again in
+		 * op_init.
+		 */
+		ff->logfd = fd;
+	}
+
+	ret = dup2(ff->logfd, STDOUT_FILENO);
+	if (ret < 0) {
+		perror(path);
+		return -1;
+	}
+
+	ret = dup2(ff->logfd, STDERR_FILENO);
+	if (ret < 0) {
+		perror(path);
+		return -1;
+	}
+
+	/*
+	 * Now that we've changed STD{OUT,ERR}_FILENO to be the log file, use
+	 * freopen to make sure that std{out,err} (the C library abstractions)
+	 * point to the STDXXX_FILENO because any of our library dependencies
+	 * might decide to printf to one of those streams and we want to
+	 * capture all output in the log.
+	 */
+	ret = fuse4fs_freopen_stream(path, STDOUT_FILENO, stdout);
+	if (ret)
+		return ret;
+	ret = fuse4fs_freopen_stream(path, STDERR_FILENO, stderr);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+/* Set up debug and error logging files */
+static int fuse4fs_setup_logging(struct fuse4fs *ff)
+{
+	char *logfile = getenv("FUSE4FS_LOGFILE");
+
+	if (logfile)
+		return fuse4fs_capture_output(ff, logfile);
+
+	/* in kernel mode, try to log errors to the kernel log */
+	if (ff->kernel)
+		fuse4fs_capture_output(ff, "/dev/ttyprintk");
+
+	return 0;
+}
+
+static int fuse4fs_read_bitmaps(struct fuse4fs *ff)
+{
+	errcode_t err;
+
+	err = ext2fs_read_inode_bitmap(ff->fs);
+	if (err)
+		return translate_error(ff->fs, 0, err);
+
+	err = ext2fs_read_block_bitmap(ff->fs);
+	if (err)
+		return translate_error(ff->fs, 0, err);
+
+	return 0;
+}
+
+#if FUSE_VERSION < FUSE_MAKE_VERSION(3, 17)
+static inline int fuse_set_feature_flag(struct fuse_conn_info *conn,
+					 uint64_t flag)
+{
+	if (conn->capable & flag) {
+		conn->want |= flag;
+		return 1;
+	}
+
+	return 0;
+}
+#endif
+
+static void *op_init(struct fuse_conn_info *conn,
+		     struct fuse_config *cfg)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+
+	FUSE4FS_CHECK_CONTEXT_ABORT(ff);
+
+	/*
+	 * Configure logging a second time, because libfuse might have
+	 * redirected std{out,err} as part of daemonization.  If this fails,
+	 * give up and move on.
+	 */
+	fuse4fs_setup_logging(ff);
+	if (ff->logfd >= 0)
+		close(ff->logfd);
+	ff->logfd = -1;
+
+	fs = ff->fs;
+	dbg_printf(ff, "%s: dev=%s\n", __func__, fs->device_name);
+#ifdef FUSE_CAP_IOCTL_DIR
+	fuse_set_feature_flag(conn, FUSE_CAP_IOCTL_DIR);
+#endif
+#ifdef FUSE_CAP_POSIX_ACL
+	if (ff->acl)
+		fuse_set_feature_flag(conn, FUSE_CAP_POSIX_ACL);
+#endif
+#ifdef FUSE_CAP_CACHE_SYMLINKS
+	fuse_set_feature_flag(conn, FUSE_CAP_CACHE_SYMLINKS);
+#endif
+#ifdef FUSE_CAP_NO_EXPORT_SUPPORT
+	fuse_set_feature_flag(conn, FUSE_CAP_NO_EXPORT_SUPPORT);
+#endif
+	conn->time_gran = 1;
+	cfg->use_ino = 1;
+	if (ff->debug)
+		cfg->debug = 1;
+	cfg->nullpath_ok = 1;
+
+	if (ff->kernel) {
+		char uuid[UUID_STR_SIZE];
+
+		uuid_unparse(fs->super->s_uuid, uuid);
+		log_printf(ff, "%s %s.\n", _("mounted filesystem"), uuid);
+	}
+
+	if (ff->opstate == F4OP_WRITABLE)
+		fuse4fs_read_bitmaps(ff);
+
+#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 17)
+	/*
+	 * THIS MUST GO LAST!
+	 *
+	 * fuse_set_feature_flag in 3.17.0 has a strange bug: it sets feature
+	 * flags in conn->want_ext, but not conn->want.  Upon return to
+	 * libfuse, the lower level library observes that want and want_ext
+	 * have gotten out of sync, and refuses to mount.  Therefore,
+	 * synchronize the two.  This bug went away in 3.17.3, but we're stuck
+	 * with this forever because Debian trixie released with 3.17.2.
+	 */
+	conn->want = conn->want_ext & 0xFFFFFFFF;
+#endif
+	return ff;
+}
+
+static int stat_inode(ext2_filsys fs, ext2_ino_t ino, struct stat *statbuf)
+{
+	struct ext2_inode_large inode;
+	dev_t fakedev = 0;
+	errcode_t err;
+	int ret = 0;
+	struct timespec tv;
+
+	err = fuse4fs_read_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	memcpy(&fakedev, fs->super->s_uuid, sizeof(fakedev));
+	statbuf->st_dev = fakedev;
+	statbuf->st_ino = ino;
+	statbuf->st_mode = inode.i_mode;
+	statbuf->st_nlink = inode.i_links_count;
+	statbuf->st_uid = inode_uid(inode);
+	statbuf->st_gid = inode_gid(inode);
+	statbuf->st_size = EXT2_I_SIZE(&inode);
+	statbuf->st_blksize = fs->blocksize;
+	statbuf->st_blocks = ext2fs_get_stat_i_blocks(fs,
+						EXT2_INODE(&inode));
+	EXT4_INODE_GET_XTIME(i_atime, &tv, &inode);
+#if HAVE_STRUCT_STAT_ST_ATIM
+	statbuf->st_atim = tv;
+#else
+	statbuf->st_atime = tv.tv_sec;
+#endif
+	EXT4_INODE_GET_XTIME(i_mtime, &tv, &inode);
+#if HAVE_STRUCT_STAT_ST_ATIM
+	statbuf->st_mtim = tv;
+#else
+	statbuf->st_mtime = tv.tv_sec;
+#endif
+	EXT4_INODE_GET_XTIME(i_ctime, &tv, &inode);
+#if HAVE_STRUCT_STAT_ST_ATIM
+	statbuf->st_ctim = tv;
+#else
+	statbuf->st_ctime = tv.tv_sec;
+#endif
+	if (LINUX_S_ISCHR(inode.i_mode) ||
+	    LINUX_S_ISBLK(inode.i_mode)) {
+		if (inode.i_block[0])
+			statbuf->st_rdev = inode.i_block[0];
+		else
+			statbuf->st_rdev = inode.i_block[1];
+	}
+
+	return ret;
+}
+
+static int __fuse4fs_file_ino(struct fuse4fs *ff, const char *path,
+			      struct fuse_file_info *fp,
+			      ext2_ino_t *inop,
+			      const char *func,
+			      int line)
+{
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+
+	if (fp) {
+		struct fuse4fs_file_handle *fh = fuse4fs_get_handle(fp);
+
+		if (fh->ino == 0)
+			return -ESTALE;
+
+		*inop = fh->ino;
+		dbg_printf(ff, "%s: get ino=%d\n", func, fh->ino);
+		return 0;
+	}
+
+	dbg_printf(ff, "%s: get path=%s\n", func, path);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, inop);
+	if (err)
+		return __translate_error(fs, 0, err, func, line);
+
+	return 0;
+}
+
+# define fuse4fs_file_ino(ff, path, fp, inop) \
+	__fuse4fs_file_ino((ff), (path), (fp), (inop), __func__, __LINE__)
+
+static int op_getattr(const char *path, struct stat *statbuf,
+		      struct fuse_file_info *fi)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+	ext2_ino_t ino;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	fs = fuse4fs_start(ff);
+	ret = fuse4fs_file_ino(ff, path, fi, &ino);
+	if (ret)
+		goto out;
+	ret = stat_inode(fs, ino, statbuf);
+out:
+	fuse4fs_finish(ff, ret);
+	return ret;
+}
+
+static int op_readlink(const char *path, char *buf, size_t len)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+	errcode_t err;
+	ext2_ino_t ino;
+	struct ext2_inode inode;
+	unsigned int got;
+	ext2_file_t file;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	dbg_printf(ff, "%s: path=%s\n", __func__, path);
+	fs = fuse4fs_start(ff);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err || ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	err = ext2fs_read_inode(fs, ino, &inode);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	if (!LINUX_S_ISLNK(inode.i_mode)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	len--;
+	if (inode.i_size < len)
+		len = inode.i_size;
+	if (ext2fs_is_fast_symlink(&inode))
+		memcpy(buf, (char *)inode.i_block, len);
+	else {
+		/* big/inline symlink */
+
+		err = ext2fs_file_open(fs, ino, 0, &file);
+		if (err) {
+			ret = translate_error(fs, ino, err);
+			goto out;
+		}
+
+		err = ext2fs_file_read(file, buf, len, &got);
+		if (err)
+			ret = translate_error(fs, ino, err);
+		else if (got != len)
+			ret = translate_error(fs, ino, EXT2_ET_INODE_CORRUPTED);
+
+		err = ext2fs_file_close(file);
+		if (ret)
+			goto out;
+		if (err) {
+			ret = translate_error(fs, ino, err);
+			goto out;
+		}
+	}
+	buf[len] = 0;
+
+	if (fuse4fs_is_writeable(ff)) {
+		ret = update_atime(fs, ino);
+		if (ret)
+			goto out;
+	}
+
+out:
+	fuse4fs_finish(ff, ret);
+	return ret;
+}
+
+static int __getxattr(struct fuse4fs *ff, ext2_ino_t ino, const char *name,
+		      void **value, size_t *value_len)
+{
+	ext2_filsys fs = ff->fs;
+	struct ext2_xattr_handle *h;
+	errcode_t err;
+	int ret = 0;
+
+	err = ext2fs_xattrs_open(fs, ino, &h);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	err = ext2fs_xattrs_read(h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out_close;
+	}
+
+	err = ext2fs_xattr_get(h, name, value, value_len);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out_close;
+	}
+
+out_close:
+	err = ext2fs_xattrs_close(&h);
+	if (err && !ret)
+		ret = translate_error(fs, ino, err);
+	return ret;
+}
+
+static int __setxattr(struct fuse4fs *ff, ext2_ino_t ino, const char *name,
+		      void *value, size_t valuelen)
+{
+	ext2_filsys fs = ff->fs;
+	struct ext2_xattr_handle *h;
+	errcode_t err;
+	int ret = 0;
+
+	err = ext2fs_xattrs_open(fs, ino, &h);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	err = ext2fs_xattrs_read(h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out_close;
+	}
+
+	err = ext2fs_xattr_set(h, name, value, valuelen);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out_close;
+	}
+
+out_close:
+	err = ext2fs_xattrs_close(&h);
+	if (err && !ret)
+		ret = translate_error(fs, ino, err);
+	return ret;
+}
+
+static int propagate_default_acls(struct fuse4fs *ff, ext2_ino_t parent,
+				  ext2_ino_t child, mode_t mode)
+{
+	void *def;
+	size_t deflen;
+	int ret;
+
+	if (!ff->acl || S_ISDIR(mode))
+		return 0;
+
+	ret = __getxattr(ff, parent, XATTR_NAME_POSIX_ACL_DEFAULT, &def,
+			 &deflen);
+	switch (ret) {
+	case -ENODATA:
+	case -ENOENT:
+		/* no default acl */
+		return 0;
+	case 0:
+		break;
+	default:
+		return ret;
+	}
+
+	ret = __setxattr(ff, child, XATTR_NAME_POSIX_ACL_DEFAULT, def, deflen);
+	ext2fs_free_mem(&def);
+	return ret;
+}
+
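+/*
+ * ext4 keeps 32-bit owner IDs as a 16-bit low half in i_uid/i_gid plus a
+ * 16-bit high half in the Linux-specific osd2 area of the inode.  Worked
+ * example: uid 100000 (0x186A0) is stored as i_uid = 0x86A0 and
+ * i_uid_high = 0x1, so readers must recombine both halves, as inode_uid()
+ * and inode_gid() do in stat_inode().
+ */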
+static inline void fuse4fs_set_uid(struct ext2_inode_large *inode, uid_t uid)
+{
+	inode->i_uid = uid;
+	ext2fs_set_i_uid_high(*inode, uid >> 16);
+}
+
+static inline void fuse4fs_set_gid(struct ext2_inode_large *inode, gid_t gid)
+{
+	inode->i_gid = gid;
+	ext2fs_set_i_gid_high(*inode, gid >> 16);
+}
+
+static int fuse4fs_new_child_gid(struct fuse4fs *ff, ext2_ino_t parent,
+				 gid_t *gid, int *parent_sgid)
+{
+	struct ext2_inode_large inode;
+	struct fuse_context *ctxt = fuse_get_context();
+	errcode_t err;
+
+	err = fuse4fs_read_inode(ff->fs, parent, &inode);
+	if (err)
+		return translate_error(ff->fs, parent, err);
+
+	if (inode.i_mode & S_ISGID) {
+		if (parent_sgid)
+			*parent_sgid = 1;
+		*gid = inode_gid(inode);
+	} else {
+		if (parent_sgid)
+			*parent_sgid = 0;
+		*gid = ctxt->gid;
+	}
+
+	return 0;
+}
+
+/*
+ * Flush dirty filesystem state to disk if we're running in dirsync mode or if
+ * directory @ino has the DIRSYNC inode flag set.  If @flushed is a non-null
+ * pointer, this function sets @flushed to 1 if we decided to flush, or 0 if
+ * not.
+ */
+static inline int fuse4fs_dirsync_flush(struct fuse4fs *ff, ext2_ino_t ino,
+					int *flushed)
+{
+	struct ext2_inode_large inode;
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+
+	if (ff->dirsync)
+		goto flush;
+
+	err = fuse4fs_read_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	if (inode.i_flags & EXT2_DIRSYNC_FL)
+		goto flush;
+
+	if (flushed)
+		*flushed = 0;
+	return 0;
+flush:
+	err = ext2fs_flush2(fs, 0);
+	if (err)
+		return translate_error(fs, 0, err);
+
+	if (flushed)
+		*flushed = 1;
+	return 0;
+}
+
+static void fuse4fs_set_extra_isize(struct fuse4fs *ff, ext2_ino_t ino,
+				    struct ext2_inode_large *inode)
+{
+	ext2_filsys fs = ff->fs;
+	size_t extra = sizeof(struct ext2_inode_large) -
+		EXT2_GOOD_OLD_INODE_SIZE;
+
+	if (ext2fs_has_feature_extra_isize(fs->super)) {
+		dbg_printf(ff, "%s: ino=%u extra=%zu want=%u min=%u\n",
+			   __func__, ino, extra, fs->super->s_want_extra_isize,
+			   fs->super->s_min_extra_isize);
+
+		if (fs->super->s_want_extra_isize > extra)
+			extra = fs->super->s_want_extra_isize;
+		if (fs->super->s_min_extra_isize > extra)
+			extra = fs->super->s_min_extra_isize;
+	}
+
+	inode->i_extra_isize = extra;
+}
+
+static int op_mknod(const char *path, mode_t mode, dev_t dev)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+	ext2_ino_t parent, child;
+	char *temp_path;
+	errcode_t err;
+	char *node_name, a;
+	int filetype;
+	struct ext2_inode_large inode;
+	gid_t gid;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	dbg_printf(ff, "%s: path=%s mode=0%o dev=0x%x\n", __func__, path, mode,
+		   (unsigned int)dev);
+	temp_path = strdup(path);
+	if (!temp_path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	node_name = strrchr(temp_path, '/');
+	if (!node_name) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	node_name++;
+	a = *node_name;
+	*node_name = 0;
+
+	fs = fuse4fs_start(ff);
+	if (!fs_can_allocate(ff, 2)) {
+		ret = -ENOSPC;
+		goto out2;
+	}
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
+			   &parent);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+
+	ret = check_inum_access(ff, parent, A_OK | W_OK);
+	if (ret)
+		goto out2;
+
+	*node_name = a;
+
+	if (LINUX_S_ISCHR(mode))
+		filetype = EXT2_FT_CHRDEV;
+	else if (LINUX_S_ISBLK(mode))
+		filetype = EXT2_FT_BLKDEV;
+	else if (LINUX_S_ISFIFO(mode))
+		filetype = EXT2_FT_FIFO;
+	else if (LINUX_S_ISSOCK(mode))
+		filetype = EXT2_FT_SOCK;
+	else {
+		ret = -EINVAL;
+		goto out2;
+	}
+
+	ret = fuse4fs_new_child_gid(ff, parent, &gid, NULL);
+	if (ret)
+		goto out2;
+
+	err = ext2fs_new_inode(fs, parent, mode, 0, &child);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+
+	dbg_printf(ff, "%s: create ino=%d/name=%s in dir=%d\n", __func__, child,
+		   node_name, parent);
+	err = ext2fs_link(fs, parent, node_name, child,
+			  filetype | EXT2FS_LINK_EXPAND);
+	if (err) {
+		ret = translate_error(fs, parent, err);
+		goto out2;
+	}
+
+	ret = update_mtime(fs, parent, NULL);
+	if (ret)
+		goto out2;
+
+	memset(&inode, 0, sizeof(inode));
+	inode.i_mode = mode;
+
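+	/*
+	 * Like the kernel, store device numbers that fit the old 16-bit
+	 * encoding (major and minor both below 256) in i_block[0] and all
+	 * others in i_block[1].  Worked example, assuming glibc's makedev()
+	 * encoding: makedev(8, 1) == 0x801 goes in i_block[0], whereas
+	 * makedev(259, 0) == 0x10300 does not fit in 16 bits and goes in
+	 * i_block[1].  stat_inode() decodes this by preferring a nonzero
+	 * i_block[0].
+	 */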
+	if (dev & ~0xFFFF)
+		inode.i_block[1] = dev;
+	else
+		inode.i_block[0] = dev;
+	inode.i_links_count = 1;
+	fuse4fs_set_extra_isize(ff, child, &inode);
+	fuse4fs_set_uid(&inode, ctxt->uid);
+	fuse4fs_set_gid(&inode, gid);
+
+	err = ext2fs_write_new_inode(fs, child, EXT2_INODE(&inode));
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out2;
+	}
+
+	inode.i_generation = ff->next_generation++;
+	init_times(&inode);
+	err = fuse4fs_write_inode(fs, child, &inode);
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out2;
+	}
+
+	ext2fs_inode_alloc_stats2(fs, child, 1, 0);
+
+	ret = propagate_default_acls(ff, parent, child, inode.i_mode);
+	if (ret)
+		goto out2;
+
+	ret = fuse4fs_dirsync_flush(ff, parent, NULL);
+	if (ret)
+		goto out2;
+
+out2:
+	fuse4fs_finish(ff, ret);
+out:
+	free(temp_path);
+	return ret;
+}
+
+static int op_mkdir(const char *path, mode_t mode)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+	ext2_ino_t parent, child;
+	char *temp_path;
+	errcode_t err;
+	char *node_name, a;
+	struct ext2_inode_large inode;
+	char *block;
+	blk64_t blk;
+	int ret = 0;
+	gid_t gid;
+	int parent_sgid;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	dbg_printf(ff, "%s: path=%s mode=0%o\n", __func__, path, mode);
+	temp_path = strdup(path);
+	if (!temp_path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	node_name = strrchr(temp_path, '/');
+	if (!node_name) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	node_name++;
+	a = *node_name;
+	*node_name = 0;
+
+	fs = fuse4fs_start(ff);
+	if (!fs_can_allocate(ff, 1)) {
+		ret = -ENOSPC;
+		goto out2;
+	}
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
+			   &parent);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+
+	ret = check_inum_access(ff, parent, A_OK | W_OK);
+	if (ret)
+		goto out2;
+
+	ret = fuse4fs_new_child_gid(ff, parent, &gid, &parent_sgid);
+	if (ret)
+		goto out2;
+
+	*node_name = a;
+
+	err = ext2fs_mkdir2(fs, parent, 0, 0, EXT2FS_LINK_EXPAND,
+			    node_name, NULL);
+	if (err) {
+		ret = translate_error(fs, parent, err);
+		goto out2;
+	}
+
+	ret = update_mtime(fs, parent, NULL);
+	if (ret)
+		goto out2;
+
+	/* Still have to update the uid/gid of the dir */
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
+			   &child);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+	dbg_printf(ff, "%s: created ino=%d/path=%s in dir=%d\n", __func__, child,
+		   node_name, parent);
+
+	err = fuse4fs_read_inode(fs, child, &inode);
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out2;
+	}
+
+	fuse4fs_set_extra_isize(ff, child, &inode);
+	fuse4fs_set_uid(&inode, ctxt->uid);
+	fuse4fs_set_gid(&inode, gid);
+	inode.i_mode = LINUX_S_IFDIR | (mode & ~S_ISUID);
+	if (parent_sgid)
+		inode.i_mode |= S_ISGID;
+	inode.i_generation = ff->next_generation++;
+	init_times(&inode);
+
+	err = fuse4fs_write_inode(fs, child, &inode);
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out2;
+	}
+
+	/*
+	 * Rewrite the directory block checksum, having set i_generation.  Only
+	 * metadata_csum filesystems with a block-based directory need this,
+	 * but every new directory still needs the ACL and dirsync handling
+	 * below.
+	 */
+	if (!(inode.i_flags & EXT4_INLINE_DATA_FL) &&
+	    ext2fs_has_feature_metadata_csum(fs->super)) {
+		err = ext2fs_new_dir_block(fs, child, parent, &block);
+		if (err) {
+			ret = translate_error(fs, child, err);
+			goto out2;
+		}
+		err = ext2fs_bmap2(fs, child, EXT2_INODE(&inode), NULL, 0, 0,
+				   NULL, &blk);
+		if (!err)
+			err = ext2fs_write_dir_block4(fs, blk, block, 0,
+						      child);
+		ext2fs_free_mem(&block);
+		if (err) {
+			ret = translate_error(fs, child, err);
+			goto out2;
+		}
+	}
+
+	ret = propagate_default_acls(ff, parent, child, inode.i_mode);
+	if (ret)
+		goto out2;
+
+	ret = fuse4fs_dirsync_flush(ff, parent, NULL);
+	if (ret)
+		goto out2;
+
+out2:
+	fuse4fs_finish(ff, ret);
+out:
+	free(temp_path);
+	return ret;
+}
+
+static int fuse4fs_unlink(struct fuse4fs *ff, const char *path,
+			  ext2_ino_t *parent)
+{
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	ext2_ino_t dir;
+	char *filename = strdup(path);
+	char *base_name;
+	int ret;
+
+	if (!filename)
+		return -ENOMEM;
+
+	base_name = strrchr(filename, '/');
+	if (base_name) {
+		*base_name++ = '\0';
+		err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, filename,
+				   &dir);
+		if (err) {
+			free(filename);
+			return translate_error(fs, 0, err);
+		}
+	} else {
+		dir = EXT2_ROOT_INO;
+		base_name = filename;
+	}
+
+	ret = check_inum_access(ff, dir, W_OK);
+	if (ret) {
+		free(filename);
+		return ret;
+	}
+
+	dbg_printf(ff, "%s: unlinking name=%s from dir=%d\n", __func__,
+		   base_name, dir);
+	err = ext2fs_unlink(fs, dir, base_name, 0, 0);
+	free(filename);
+	if (err)
+		return translate_error(fs, dir, err);
+
+	ret = update_mtime(fs, dir, NULL);
+	if (ret)
+		return ret;
+
+	if (parent)
+		*parent = dir;
+	return 0;
+}
+
+static int remove_ea_inodes(struct fuse4fs *ff, ext2_ino_t ino,
+			    struct ext2_inode_large *inode)
+{
+	ext2_filsys fs = ff->fs;
+	struct ext2_xattr_handle *h;
+	errcode_t err;
+	int ret = 0;
+
+	/*
+	 * The xattr handle reads its own private copy of the inode from disk,
+	 * so write ours out first so that the handle sees our changes.
+	 */
+	err = fuse4fs_write_inode(fs, ino, inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	err = ext2fs_xattrs_open(fs, ino, &h);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	err = ext2fs_xattrs_read(h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out_close;
+	}
+
+	err = ext2fs_xattr_remove_all(h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out_close;
+	}
+
+out_close:
+	ext2fs_xattrs_close(&h);
+	if (ret)
+		return ret;
+
+	/* Now read the inode back in. */
+	err = fuse4fs_read_inode(fs, ino, inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	return 0;
+}
+
+static int remove_inode(struct fuse4fs *ff, ext2_ino_t ino)
+{
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	struct ext2_inode_large inode;
+	int ret = 0;
+
+	err = fuse4fs_read_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	dbg_printf(ff, "%s: put ino=%d links=%d\n", __func__, ino,
+		   inode.i_links_count);
+
+	switch (inode.i_links_count) {
+	case 0:
+		return 0; /* XXX: already done? */
+	case 1:
+		inode.i_links_count--;
+		ext2fs_set_dtime(fs, EXT2_INODE(&inode));
+		break;
+	default:
+		inode.i_links_count--;
+	}
+
+	ret = update_ctime(fs, ino, &inode);
+	if (ret)
+		return ret;
+
+	if (inode.i_links_count)
+		goto write_out;
+
+	if (ext2fs_has_feature_ea_inode(fs->super)) {
+		ret = remove_ea_inodes(ff, ino, &inode);
+		if (ret)
+			return ret;
+	}
+
+	/* Nobody holds this file; free its blocks! */
+	err = ext2fs_free_ext_attr(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	if (ext2fs_inode_has_valid_blocks2(fs, EXT2_INODE(&inode))) {
+		err = ext2fs_punch(fs, ino, EXT2_INODE(&inode), NULL,
+				   0, ~0ULL);
+		if (err)
+			return translate_error(fs, ino, err);
+	}
+
+	ext2fs_inode_alloc_stats2(fs, ino, -1,
+				  LINUX_S_ISDIR(inode.i_mode));
+
+write_out:
+	err = fuse4fs_write_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	return 0;
+}
+
+static int __op_unlink(struct fuse4fs *ff, const char *path)
+{
+	ext2_filsys fs = ff->fs;
+	ext2_ino_t parent, ino;
+	errcode_t err;
+	int ret = 0;
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	ret = check_inum_access(ff, ino, W_OK);
+	if (ret)
+		goto out;
+
+	ret = fuse4fs_unlink(ff, path, &parent);
+	if (ret)
+		goto out;
+
+	ret = remove_inode(ff, ino);
+	if (ret)
+		goto out;
+
+	ret = fuse4fs_dirsync_flush(ff, parent, NULL);
+	if (ret)
+		goto out;
+
+out:
+	return ret;
+}
+
+static int op_unlink(const char *path)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	int ret;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	fuse4fs_start(ff);
+	ret = __op_unlink(ff, path);
+	fuse4fs_finish(ff, ret);
+	return ret;
+}
+
+struct rd_struct {
+	ext2_ino_t	parent;
+	int		empty;
+};
+
+static int rmdir_proc(ext2_ino_t dir EXT2FS_ATTR((unused)),
+		      int	entry EXT2FS_ATTR((unused)),
+		      struct ext2_dir_entry *dirent,
+		      int	offset EXT2FS_ATTR((unused)),
+		      int	blocksize EXT2FS_ATTR((unused)),
+		      char	*buf EXT2FS_ATTR((unused)),
+		      void	*private)
+{
+	struct rd_struct *rds = (struct rd_struct *) private;
+
+	if (dirent->inode == 0)
+		return 0;
+	if (((dirent->name_len & 0xFF) == 1) && (dirent->name[0] == '.'))
+		return 0;
+	if (((dirent->name_len & 0xFF) == 2) && (dirent->name[0] == '.') &&
+	    (dirent->name[1] == '.')) {
+		rds->parent = dirent->inode;
+		return 0;
+	}
+	rds->empty = 0;
+	return 0;
+}
+
+static int __op_rmdir(struct fuse4fs *ff, const char *path)
+{
+	ext2_filsys fs = ff->fs;
+	ext2_ino_t parent, child;
+	errcode_t err;
+	struct ext2_inode_large inode;
+	struct rd_struct rds;
+	int ret = 0;
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &child);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+	dbg_printf(ff, "%s: rmdir path=%s ino=%d\n", __func__, path, child);
+
+	ret = check_inum_access(ff, child, W_OK);
+	if (ret)
+		goto out;
+
+	rds.parent = 0;
+	rds.empty = 1;
+
+	err = ext2fs_dir_iterate2(fs, child, 0, 0, rmdir_proc, &rds);
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out;
+	}
+
+	/* the kernel checks parent permissions before emptiness */
+	if (rds.parent == 0) {
+		ret = translate_error(fs, child, EXT2_ET_FILESYSTEM_CORRUPTED);
+		goto out;
+	}
+
+	ret = check_inum_access(ff, rds.parent, W_OK);
+	if (ret)
+		goto out;
+
+	if (rds.empty == 0) {
+		ret = -ENOTEMPTY;
+		goto out;
+	}
+
+	ret = fuse4fs_unlink(ff, path, &parent);
+	if (ret)
+		goto out;
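+	/*
+	 * Directories have to be "removed" twice: an empty directory's link
+	 * count covers both its entry in the parent and its own "." entry.
+	 */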
+	ret = remove_inode(ff, child);
+	if (ret)
+		goto out;
+	ret = remove_inode(ff, child);
+	if (ret)
+		goto out;
+
+	if (rds.parent) {
+		dbg_printf(ff, "%s: decr dir=%d link count\n", __func__,
+			   rds.parent);
+		err = fuse4fs_read_inode(fs, rds.parent, &inode);
+		if (err) {
+			ret = translate_error(fs, rds.parent, err);
+			goto out;
+		}
+		if (inode.i_links_count > 1)
+			inode.i_links_count--;
+		ret = update_mtime(fs, rds.parent, &inode);
+		if (ret)
+			goto out;
+		err = fuse4fs_write_inode(fs, rds.parent, &inode);
+		if (err) {
+			ret = translate_error(fs, rds.parent, err);
+			goto out;
+		}
+	}
+
+	ret = fuse4fs_dirsync_flush(ff, parent, NULL);
+	if (ret)
+		goto out;
+
+out:
+	return ret;
+}
+
+static int op_rmdir(const char *path)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	int ret;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	fuse4fs_start(ff);
+	ret = __op_rmdir(ff, path);
+	fuse4fs_finish(ff, ret);
+	return ret;
+}
+
+static int op_symlink(const char *src, const char *dest)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+	ext2_ino_t parent, child;
+	char *temp_path;
+	errcode_t err;
+	char *node_name, a;
+	struct ext2_inode_large inode;
+	gid_t gid;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	dbg_printf(ff, "%s: symlink %s to %s\n", __func__, src, dest);
+	temp_path = strdup(dest);
+	if (!temp_path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	node_name = strrchr(temp_path, '/');
+	if (!node_name) {
+		ret = -EINVAL;
+		goto out;
+	}
+	node_name++;
+	a = *node_name;
+	*node_name = 0;
+
+	fs = fuse4fs_start(ff);
+	if (!fs_can_allocate(ff, 1)) {
+		ret = -ENOSPC;
+		goto out2;
+	}
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
+			   &parent);
+	*node_name = a;
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+
+	ret = check_inum_access(ff, parent, A_OK | W_OK);
+	if (ret)
+		goto out2;
+
+	ret = fuse4fs_new_child_gid(ff, parent, &gid, NULL);
+	if (ret)
+		goto out2;
+
+	/* Create symlink */
+	err = ext2fs_symlink(fs, parent, 0, node_name, src);
+	if (err == EXT2_ET_DIR_NO_SPACE) {
+		err = ext2fs_expand_dir(fs, parent);
+		if (err) {
+			ret = translate_error(fs, parent, err);
+			goto out2;
+		}
+
+		err = ext2fs_symlink(fs, parent, 0, node_name, src);
+	}
+	if (err) {
+		ret = translate_error(fs, parent, err);
+		goto out2;
+	}
+
+	/* Update parent dir's mtime */
+	ret = update_mtime(fs, parent, NULL);
+	if (ret)
+		goto out2;
+
+	/* Still have to update the uid/gid of the symlink */
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
+			   &child);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+	dbg_printf(ff, "%s: symlinking ino=%d/name=%s to dir=%d\n", __func__,
+		   child, node_name, parent);
+
+	err = fuse4fs_read_inode(fs, child, &inode);
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out2;
+	}
+
+	fuse4fs_set_extra_isize(ff, child, &inode);
+	fuse4fs_set_uid(&inode, ctxt->uid);
+	fuse4fs_set_gid(&inode, gid);
+	inode.i_generation = ff->next_generation++;
+	init_times(&inode);
+
+	err = fuse4fs_write_inode(fs, child, &inode);
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out2;
+	}
+
+	ret = fuse4fs_dirsync_flush(ff, parent, NULL);
+	if (ret)
+		goto out2;
+
+out2:
+	fuse4fs_finish(ff, ret);
+out:
+	free(temp_path);
+	return ret;
+}
+
+struct update_dotdot {
+	ext2_ino_t new_dotdot;
+};
+
+static int update_dotdot_helper(ext2_ino_t dir EXT2FS_ATTR((unused)),
+				int entry EXT2FS_ATTR((unused)),
+				struct ext2_dir_entry *dirent,
+				int offset EXT2FS_ATTR((unused)),
+				int blocksize EXT2FS_ATTR((unused)),
+				char *buf EXT2FS_ATTR((unused)),
+				void *priv_data)
+{
+	struct update_dotdot *ud = priv_data;
+
+	if (ext2fs_dirent_name_len(dirent) == 2 &&
+	    dirent->name[0] == '.' && dirent->name[1] == '.') {
+		dirent->inode = ud->new_dotdot;
+		return DIRENT_CHANGED | DIRENT_ABORT;
+	}
+
+	return 0;
+}
+
+static int op_rename(const char *from, const char *to,
+		     unsigned int flags)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+	errcode_t err;
+	ext2_ino_t from_ino, to_ino, to_dir_ino, from_dir_ino;
+	char *temp_to = NULL, *temp_from = NULL;
+	char *cp, a;
+	struct ext2_inode inode;
+	struct update_dotdot ud;
+	int flushed = 0;
+	int ret = 0;
+
+	/* renameat2 is not supported */
+	if (flags)
+		return -ENOSYS;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	dbg_printf(ff, "%s: renaming %s to %s\n", __func__, from, to);
+	fs = fuse4fs_start(ff);
+	if (!fs_can_allocate(ff, 5)) {
+		ret = -ENOSPC;
+		goto out;
+	}
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, from, &from_ino);
+	if (err || from_ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, to, &to_ino);
+	if (err && err != EXT2_ET_FILE_NOT_FOUND) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	if (err == EXT2_ET_FILE_NOT_FOUND)
+		to_ino = 0;
+
+	/* Already the same file? */
+	if (to_ino != 0 && to_ino == from_ino) {
+		ret = 0;
+		goto out;
+	}
+
+	ret = check_inum_access(ff, from_ino, W_OK);
+	if (ret)
+		goto out;
+
+	if (to_ino) {
+		ret = check_inum_access(ff, to_ino, W_OK);
+		if (ret)
+			goto out;
+	}
+
+	temp_to = strdup(to);
+	if (!temp_to) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	temp_from = strdup(from);
+	if (!temp_from) {
+		ret = -ENOMEM;
+		goto out2;
+	}
+
+	/* Find parent dir of the source and check write access */
+	cp = strrchr(temp_from, '/');
+	if (!cp) {
+		ret = -EINVAL;
+		goto out2;
+	}
+
+	a = *(cp + 1);
+	*(cp + 1) = 0;
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_from,
+			   &from_dir_ino);
+	*(cp + 1) = a;
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+	if (from_dir_ino == 0) {
+		ret = -ENOENT;
+		goto out2;
+	}
+
+	ret = check_inum_access(ff, from_dir_ino, W_OK);
+	if (ret)
+		goto out2;
+
+	/* Find parent dir of the destination and check write access */
+	cp = strrchr(temp_to, '/');
+	if (!cp) {
+		ret = -EINVAL;
+		goto out2;
+	}
+
+	a = *(cp + 1);
+	*(cp + 1) = 0;
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_to,
+			   &to_dir_ino);
+	*(cp + 1) = a;
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+	if (to_dir_ino == 0) {
+		ret = -ENOENT;
+		goto out2;
+	}
+
+	ret = check_inum_access(ff, to_dir_ino, W_OK);
+	if (ret)
+		goto out2;
+
+	/* If the target exists, unlink it first */
+	if (to_ino != 0) {
+		err = ext2fs_read_inode(fs, to_ino, &inode);
+		if (err) {
+			ret = translate_error(fs, to_ino, err);
+			goto out2;
+		}
+
+		dbg_printf(ff, "%s: unlinking %s ino=%d\n", __func__,
+			   LINUX_S_ISDIR(inode.i_mode) ? "dir" : "file",
+			   to_ino);
+		if (LINUX_S_ISDIR(inode.i_mode))
+			ret = __op_rmdir(ff, to);
+		else
+			ret = __op_unlink(ff, to);
+		if (ret)
+			goto out2;
+	}
+
+	/* Get ready to do the move */
+	err = ext2fs_read_inode(fs, from_ino, &inode);
+	if (err) {
+		ret = translate_error(fs, from_ino, err);
+		goto out2;
+	}
+
+	/* Link in the new file */
+	dbg_printf(ff, "%s: linking ino=%d/path=%s to dir=%d\n", __func__,
+		   from_ino, cp + 1, to_dir_ino);
+	err = ext2fs_link(fs, to_dir_ino, cp + 1, from_ino,
+			  ext2_file_type(inode.i_mode) | EXT2FS_LINK_EXPAND);
+	if (err) {
+		ret = translate_error(fs, to_dir_ino, err);
+		goto out2;
+	}
+
+	/* Update '..' pointer if dir */
+	err = ext2fs_read_inode(fs, from_ino, &inode);
+	if (err) {
+		ret = translate_error(fs, from_ino, err);
+		goto out2;
+	}
+
+	if (LINUX_S_ISDIR(inode.i_mode)) {
+		ud.new_dotdot = to_dir_ino;
+		dbg_printf(ff, "%s: updating .. entry for dir=%d\n", __func__,
+			   to_dir_ino);
+		err = ext2fs_dir_iterate2(fs, from_ino, 0, NULL,
+					  update_dotdot_helper, &ud);
+		if (err) {
+			ret = translate_error(fs, from_ino, err);
+			goto out2;
+		}
+
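+		/*
+		 * For example (hypothetical paths): moving directory /a/d to
+		 * /b/d repoints d's ".." entry from a to b, so a loses the
+		 * link that d's ".." contributed and b gains one.
+		 */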
+		/* Decrease from_dir_ino's links_count */
+		dbg_printf(ff, "%s: moving linkcount from dir=%d to dir=%d\n",
+			   __func__, from_dir_ino, to_dir_ino);
+		err = ext2fs_read_inode(fs, from_dir_ino, &inode);
+		if (err) {
+			ret = translate_error(fs, from_dir_ino, err);
+			goto out2;
+		}
+		inode.i_links_count--;
+		err = ext2fs_write_inode(fs, from_dir_ino, &inode);
+		if (err) {
+			ret = translate_error(fs, from_dir_ino, err);
+			goto out2;
+		}
+
+		/* Increase to_dir_ino's links_count */
+		err = ext2fs_read_inode(fs, to_dir_ino, &inode);
+		if (err) {
+			ret = translate_error(fs, to_dir_ino, err);
+			goto out2;
+		}
+		inode.i_links_count++;
+		err = ext2fs_write_inode(fs, to_dir_ino, &inode);
+		if (err) {
+			ret = translate_error(fs, to_dir_ino, err);
+			goto out2;
+		}
+	}
+
+	/* Update timestamps */
+	ret = update_ctime(fs, from_ino, NULL);
+	if (ret)
+		goto out2;
+
+	ret = update_mtime(fs, to_dir_ino, NULL);
+	if (ret)
+		goto out2;
+
+	/* Remove the old file */
+	ret = fuse4fs_unlink(ff, from, NULL);
+	if (ret)
+		goto out2;
+
+	ret = fuse4fs_dirsync_flush(ff, from_dir_ino, &flushed);
+	if (ret)
+		goto out2;
+
+	if (from_dir_ino != to_dir_ino && !flushed) {
+		ret = fuse4fs_dirsync_flush(ff, to_dir_ino, NULL);
+		if (ret)
+			goto out2;
+	}
+
+out2:
+	free(temp_from);
+	free(temp_to);
+out:
+	fuse4fs_finish(ff, ret);
+	return ret;
+}
+
+static int op_link(const char *src, const char *dest)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+	char *temp_path;
+	errcode_t err;
+	char *node_name, a;
+	ext2_ino_t parent, ino;
+	struct ext2_inode_large inode;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	dbg_printf(ff, "%s: src=%s dest=%s\n", __func__, src, dest);
+	temp_path = strdup(dest);
+	if (!temp_path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	node_name = strrchr(temp_path, '/');
+	if (!node_name) {
+		ret = -EINVAL;
+		goto out;
+	}
+	node_name++;
+	a = *node_name;
+	*node_name = 0;
+
+	fs = fuse4fs_start(ff);
+	if (!fs_can_allocate(ff, 2)) {
+		ret = -ENOSPC;
+		goto out2;
+	}
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
+			   &parent);
+	*node_name = a;
+	if (err) {
+		ret = -ENOENT;
+		goto out2;
+	}
+
+	ret = check_inum_access(ff, parent, A_OK | W_OK);
+	if (ret)
+		goto out2;
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, src, &ino);
+	if (err || ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+
+	err = fuse4fs_read_inode(fs, ino, &inode);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	ret = check_iflags_access(ff, ino, EXT2_INODE(&inode), W_OK);
+	if (ret)
+		goto out2;
+
+	inode.i_links_count++;
+	ret = update_ctime(fs, ino, &inode);
+	if (ret)
+		goto out2;
+
+	err = fuse4fs_write_inode(fs, ino, &inode);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	dbg_printf(ff, "%s: linking ino=%d/name=%s to dir=%d\n", __func__, ino,
+		   node_name, parent);
+	err = ext2fs_link(fs, parent, node_name, ino,
+			  ext2_file_type(inode.i_mode) | EXT2FS_LINK_EXPAND);
+	if (err) {
+		ret = translate_error(fs, parent, err);
+		goto out2;
+	}
+
+	ret = update_mtime(fs, parent, NULL);
+	if (ret)
+		goto out2;
+
+	ret = fuse4fs_dirsync_flush(ff, parent, NULL);
+	if (ret)
+		goto out2;
+
+out2:
+	fuse4fs_finish(ff, ret);
+out:
+	free(temp_path);
+	return ret;
+}
+
+/*
+ * Obtain the supplementary group IDs of the process that initiated the
+ * current fuse request.
+ */
+static int get_req_groups(struct fuse4fs *ff, gid_t **gids, size_t *nr_gids)
+{
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	gid_t *array;
+	int nr = 32;	/* nobody has more than 32 groups right? */
+	int ret;
+
+	do {
+		err = ext2fs_get_array(nr, sizeof(gid_t), &array);
+		if (err)
+			return translate_error(fs, 0, err);
+
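+		/*
+		 * fuse_getgroups() works like getgroups(2) except that it
+		 * always returns the total number of supplementary groups,
+		 * even when that exceeds the buffer size.  For example, for a
+		 * caller in 40 groups, fuse_getgroups(32, array) returns 40,
+		 * so the buffer is too small and must be reallocated with
+		 * room for 40 entries before trying again.
+		 */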
+		ret = fuse_getgroups(nr, array);
+		if (ret < 0) {
+			/*
+			 * If there's an error, we failed to find the group
+			 * membership of the process that initiated the file
+			 * change, either because the process went away or
+			 * because there's no Linux procfs.  Regardless of the
+			 * cause, we return -ENOENT.
+			 */
+			ext2fs_free_mem(&array);
+			return -ENOENT;
+		}
+
+		if (ret <= nr) {
+			*gids = array;
+			*nr_gids = ret;
+			return 0;
+		}
+
+		ext2fs_free_mem(&array);
+		nr = ret;
+	} while (1);
+
+	/* shut up gcc */
+	return -ENOMEM;
+}
+
+/*
+ * Is this file's group id in the set of groups associated with the process
+ * that initiated the fuse request?  Returns 1 for yes, 0 for no, or a negative
+ * errno.
+ */
+static int in_file_group(struct fuse_context *ctxt,
+			 const struct ext2_inode_large *inode)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	gid_t *gids = NULL;
+	size_t i, nr_gids = 0;
+	gid_t gid = inode_gid(*inode);
+	int ret;
+
+	/* If the inode gid matches the process' primary group, we're done. */
+	if (ctxt->gid == gid)
+		return 1;
+
+	ret = get_req_groups(ff, &gids, &nr_gids);
+	if (ret == -ENOENT) {
+		/* magic return code for "could not get caller group info" */
+		return 0;
+	}
+	if (ret < 0)
+		return ret;
+
+	ret = 0;
+	for (i = 0; i < nr_gids; i++) {
+		if (gids[i] == gid) {
+			ret = 1;
+			break;
+		}
+	}
+
+	ext2fs_free_mem(&gids);
+	return ret;
+}
+
+static int op_chmod(const char *path, mode_t mode, struct fuse_file_info *fi)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+	errcode_t err;
+	ext2_ino_t ino;
+	struct ext2_inode_large inode;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	fs = fuse4fs_start(ff);
+	ret = fuse4fs_file_ino(ff, path, fi, &ino);
+	if (ret)
+		goto out;
+	dbg_printf(ff, "%s: path=%s mode=0%o ino=%d\n", __func__, path, mode, ino);
+
+	err = fuse4fs_read_inode(fs, ino, &inode);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	ret = check_iflags_access(ff, ino, EXT2_INODE(&inode), W_OK);
+	if (ret)
+		goto out;
+
+	if (want_check_owner(ff, ctxt) && ctxt->uid != inode_uid(inode)) {
+		ret = -EPERM;
+		goto out;
+	}
+
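+	/*
+	 * Clear the setgid bit unless the caller is the superuser or the
+	 * inode's gid is among the caller's groups.  FUSE passes only the
+	 * primary group with each request, so in_file_group() also asks
+	 * fuse_getgroups() for the supplementary groups where it can.
+	 */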
+	if (!is_superuser(ff, ctxt)) {
+		ret = in_file_group(ctxt, &inode);
+		if (ret < 0)
+			goto out;
+
+		if (!ret)
+			mode &= ~S_ISGID;
+	}
+
+	inode.i_mode &= ~0xFFF;
+	inode.i_mode |= mode & 0xFFF;
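+	/*
+	 * Worked example (illustrative values): for a regular file with
+	 * i_mode 0100644, chmod 02755 by a caller who is neither the
+	 * superuser nor in the file's group has S_ISGID (02000) stripped
+	 * above, so mode & 0xFFF is 0755 and i_mode becomes 0100755.  The
+	 * ~0xFFF mask keeps the file type bits (S_IFREG, 0100000) intact.
+	 */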
+
+	dbg_printf(ff, "%s: path=%s new_mode=0%o ino=%d\n", __func__,
+		   path, inode.i_mode, ino);
+
+	ret = update_ctime(fs, ino, &inode);
+	if (ret)
+		goto out;
+
+	err = fuse4fs_write_inode(fs, ino, &inode);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+out:
+	fuse4fs_finish(ff, ret);
+	return ret;
+}
+
+static int op_chown(const char *path, uid_t owner, gid_t group,
+		    struct fuse_file_info *fi)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+	errcode_t err;
+	ext2_ino_t ino;
+	struct ext2_inode_large inode;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	fs = fuse4fs_start(ff);
+	ret = fuse4fs_file_ino(ff, path, fi, &ino);
+	if (ret)
+		goto out;
+	dbg_printf(ff, "%s: path=%s owner=%d group=%d ino=%d\n", __func__,
+		   path, owner, group, ino);
+
+	err = fuse4fs_read_inode(fs, ino, &inode);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	ret = check_iflags_access(ff, ino, EXT2_INODE(&inode), W_OK);
+	if (ret)
+		goto out;
+
+	/* FUSE seems to feed us ~0 to mean "don't change" */
+	if (owner != (uid_t) ~0) {
+		/* Only root gets to change UID. */
+		if (want_check_owner(ff, ctxt) &&
+		    !(inode_uid(inode) == ctxt->uid && owner == ctxt->uid)) {
+			ret = -EPERM;
+			goto out;
+		}
+		fuse4fs_set_uid(&inode, owner);
+	}
+
+	if (group != (gid_t) ~0) {
+		/* Only root or the owner gets to change GID. */
+		if (want_check_owner(ff, ctxt) &&
+		    inode_uid(inode) != ctxt->uid) {
+			ret = -EPERM;
+			goto out;
+		}
+
+		/*
+		 * XXX: We /should/ check group membership, but FUSE only
+		 * tells us about the caller's primary group.
+		 */
+		fuse4fs_set_gid(&inode, group);
+	}
+
+	ret = update_ctime(fs, ino, &inode);
+	if (ret)
+		goto out;
+
+	err = fuse4fs_write_inode(fs, ino, &inode);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+out:
+	fuse4fs_finish(ff, ret);
+	return ret;
+}
+
+static int fuse4fs_punch_posteof(struct fuse4fs *ff, ext2_ino_t ino,
+				 off_t new_size)
+{
+	ext2_filsys fs = ff->fs;
+	struct ext2_inode_large inode;
+	blk64_t truncate_block = FUSE4FS_B_TO_FSB(ff, new_size);
+	errcode_t err;
+
+	err = fuse4fs_read_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	err = ext2fs_punch(fs, ino, EXT2_INODE(&inode), 0, truncate_block,
+			   ~0ULL);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	err = fuse4fs_write_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	return 0;
+}
+
+static int fuse4fs_truncate(struct fuse4fs *ff, ext2_ino_t ino, off_t new_size)
+{
+	ext2_filsys fs = ff->fs;
+	ext2_file_t file;
+	__u64 old_isize;
+	errcode_t err;
+	int ret = 0;
+
+	err = ext2fs_file_open(fs, ino, EXT2_FILE_WRITE, &file);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	err = ext2fs_file_get_lsize(file, &old_isize);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out_close;
+	}
+
+	dbg_printf(ff, "%s: ino=%u isize=0x%llx new_size=0x%llx\n", __func__,
+		   ino,
+		   (unsigned long long)old_isize,
+		   (unsigned long long)new_size);
+
+	err = ext2fs_file_set_size2(file, new_size);
+	if (err)
+		ret = translate_error(fs, ino, err);
+
+out_close:
+	err = ext2fs_file_close(file);
+	if (ret)
+		return ret;
+	if (err)
+		return translate_error(fs, ino, err);
+
+	ret = update_mtime(fs, ino, NULL);
+	if (ret)
+		return ret;
+
+	/*
+	 * Truncating to the current size is usually understood to mean that
+	 * we should clear out post-EOF preallocations.
+	 */
+	if (new_size == old_isize)
+		return fuse4fs_punch_posteof(ff, ino, new_size);
+
+	return 0;
+}
+
+static int op_truncate(const char *path, off_t len, struct fuse_file_info *fi)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_ino_t ino;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	fuse4fs_start(ff);
+	ret = fuse4fs_file_ino(ff, path, fi, &ino);
+	if (ret)
+		goto out;
+	dbg_printf(ff, "%s: ino=%d len=%jd\n", __func__, ino, (intmax_t) len);
+
+	ret = check_inum_access(ff, ino, W_OK);
+	if (ret)
+		goto out;
+
+	ret = fuse4fs_truncate(ff, ino, len);
+	if (ret)
+		goto out;
+
+out:
+	fuse4fs_finish(ff, ret);
+	return ret;
+}
+
+#ifdef __linux__
+static void detect_linux_executable_open(int kernel_flags, int *access_check,
+				  int *e2fs_open_flags)
+{
+	/*
+	 * On Linux, execve will bleed __FMODE_EXEC into the file mode flags,
+	 * and FUSE is more than happy to let that slip through.
+	 */
+	if (kernel_flags & 0x20) {
+		*access_check = X_OK;
+		*e2fs_open_flags &= ~EXT2_FILE_WRITE;
+	}
+}
+#else
+static void detect_linux_executable_open(int kernel_flags, int *access_check,
+				  int *e2fs_open_flags)
+{
+	/* empty */
+}
+#endif /* __linux__ */
+
+static int __op_open(struct fuse4fs *ff, const char *path,
+		     struct fuse_file_info *fp)
+{
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	struct fuse4fs_file_handle *file;
+	int check = 0, ret = 0;
+
+	dbg_printf(ff, "%s: path=%s oflags=0o%o\n", __func__, path, fp->flags);
+	err = ext2fs_get_mem(sizeof(*file), &file);
+	if (err)
+		return translate_error(fs, 0, err);
+	file->magic = FUSE4FS_FILE_MAGIC;
+
+	file->open_flags = 0;
+	switch (fp->flags & O_ACCMODE) {
+	case O_RDONLY:
+		check = R_OK;
+		break;
+	case O_WRONLY:
+		check = W_OK;
+		file->open_flags |= EXT2_FILE_WRITE;
+		break;
+	case O_RDWR:
+		check = R_OK | W_OK;
+		file->open_flags |= EXT2_FILE_WRITE;
+		break;
+	}
+
+	/*
+	 * If the caller wants to truncate the file, we need to ask for full
+	 * write access even if the caller claims to be appending.
+	 */
+	if ((fp->flags & O_APPEND) && !(fp->flags & O_TRUNC))
+		check |= A_OK;
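+	/*
+	 * For example: O_WRONLY | O_APPEND yields check = W_OK | A_OK,
+	 * whereas O_WRONLY | O_APPEND | O_TRUNC yields only W_OK, because
+	 * truncation requires full write access.
+	 */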
+
+	detect_linux_executable_open(fp->flags, &check, &file->open_flags);
+
+	if (fp->flags & O_CREAT)
+		file->open_flags |= EXT2_FILE_CREATE;
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &file->ino);
+	if (err || file->ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+	dbg_printf(ff, "%s: ino=%d\n", __func__, file->ino);
+
+	ret = check_inum_access(ff, file->ino, check);
+	if (ret) {
+		/*
+		 * In a regular (Linux) fs driver, the kernel will open
+		 * binaries for reading if the user has --x privileges (i.e.
+		 * execute without read).  Since the kernel doesn't have any
+		 * way to tell us if it's opening a file via execve, we'll
+		 * just assume that allowing access is ok if asking for ro mode
+		 * fails but asking for x mode succeeds.  Of course we can
+		 * also employ undocumented hacks (see above).
+		 */
+		if (check == R_OK) {
+			ret = check_inum_access(ff, file->ino, X_OK);
+			if (ret)
+				goto out;
+			check = X_OK;
+		} else
+			goto out;
+	}
+
+	if (fp->flags & O_TRUNC) {
+		ret = fuse4fs_truncate(ff, file->ino, 0);
+		if (ret)
+			goto out;
+	}
+
+	file->check_flags = check;
+	fuse4fs_set_handle(fp, file);
+
+out:
+	if (ret)
+		ext2fs_free_mem(&file);
+	return ret;
+}
+
+static int op_open(const char *path, struct fuse_file_info *fp)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	int ret;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	fuse4fs_start(ff);
+	ret = __op_open(ff, path, fp);
+	fuse4fs_finish(ff, ret);
+	return ret;
+}
+
+static int op_read(const char *path EXT2FS_ATTR((unused)), char *buf,
+		   size_t len, off_t offset,
+		   struct fuse_file_info *fp)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	struct fuse4fs_file_handle *fh = fuse4fs_get_handle(fp);
+	ext2_filsys fs;
+	ext2_file_t efp;
+	errcode_t err;
+	unsigned int got = 0;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	FUSE4FS_CHECK_HANDLE(ff, fh);
+	dbg_printf(ff, "%s: ino=%d off=0x%llx len=0x%zx\n", __func__, fh->ino,
+		   (unsigned long long)offset, len);
+	fs = fuse4fs_start(ff);
+	err = ext2fs_file_open(fs, fh->ino, fh->open_flags, &efp);
+	if (err) {
+		ret = translate_error(fs, fh->ino, err);
+		goto out;
+	}
+
+	err = ext2fs_file_llseek(efp, offset, SEEK_SET, NULL);
+	if (err) {
+		ret = translate_error(fs, fh->ino, err);
+		goto out2;
+	}
+
+	err = ext2fs_file_read(efp, buf, len, &got);
+	if (err) {
+		ret = translate_error(fs, fh->ino, err);
+		goto out2;
+	}
+
+out2:
+	err = ext2fs_file_close(efp);
+	if (ret)
+		goto out;
+	if (err) {
+		ret = translate_error(fs, fh->ino, err);
+		goto out;
+	}
+
+	if (fh->check_flags != X_OK && fuse4fs_is_writeable(ff)) {
+		ret = update_atime(fs, fh->ino);
+		if (ret)
+			goto out;
+	}
+out:
+	fuse4fs_finish(ff, ret);
+	return got ? (int) got : ret;
+}
+
+static int op_write(const char *path EXT2FS_ATTR((unused)),
+		    const char *buf, size_t len, off_t offset,
+		    struct fuse_file_info *fp)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	struct fuse4fs_file_handle *fh = fuse4fs_get_handle(fp);
+	ext2_filsys fs;
+	ext2_file_t efp;
+	errcode_t err;
+	unsigned int got = 0;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	FUSE4FS_CHECK_HANDLE(ff, fh);
+	dbg_printf(ff, "%s: ino=%d off=0x%llx len=0x%zx\n", __func__, fh->ino,
+		   (unsigned long long) offset, len);
+	fs = fuse4fs_start(ff);
+	if (!fuse4fs_is_writeable(ff)) {
+		ret = -EROFS;
+		goto out;
+	}
+
+	if (!fs_can_allocate(ff, FUSE4FS_B_TO_FSB(ff, len))) {
+		ret = -ENOSPC;
+		goto out;
+	}
+
+	err = ext2fs_file_open(fs, fh->ino, fh->open_flags, &efp);
+	if (err) {
+		ret = translate_error(fs, fh->ino, err);
+		goto out;
+	}
+
+	err = ext2fs_file_llseek(efp, offset, SEEK_SET, NULL);
+	if (err) {
+		ret = translate_error(fs, fh->ino, err);
+		goto out2;
+	}
+
+	err = ext2fs_file_write(efp, buf, len, &got);
+	if (err) {
+		ret = translate_error(fs, fh->ino, err);
+		goto out2;
+	}
+
+	err = ext2fs_file_flush(efp);
+	if (err) {
+		got = 0;
+		ret = translate_error(fs, fh->ino, err);
+		goto out2;
+	}
+
+out2:
+	err = ext2fs_file_close(efp);
+	if (ret)
+		goto out;
+	if (err) {
+		ret = translate_error(fs, fh->ino, err);
+		goto out;
+	}
+
+	ret = update_mtime(fs, fh->ino, NULL);
+	if (ret)
+		goto out;
+
+out:
+	fuse4fs_finish(ff, ret);
+	return got ? (int) got : ret;
+}
+
+static int op_release(const char *path EXT2FS_ATTR((unused)),
+		      struct fuse_file_info *fp)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	struct fuse4fs_file_handle *fh = fuse4fs_get_handle(fp);
+	ext2_filsys fs;
+	errcode_t err;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	FUSE4FS_CHECK_HANDLE(ff, fh);
+	dbg_printf(ff, "%s: ino=%d\n", __func__, fh->ino);
+	fs = fuse4fs_start(ff);
+
+	if ((fp->flags & O_SYNC) &&
+	    fuse4fs_is_writeable(ff) &&
+	    (fh->open_flags & EXT2_FILE_WRITE)) {
+		err = ext2fs_flush2(fs, EXT2_FLAG_FLUSH_NO_SYNC);
+		if (err)
+			ret = translate_error(fs, fh->ino, err);
+	}
+
+	fp->fh = 0;
+	fuse4fs_finish(ff, ret);
+
+	ext2fs_free_mem(&fh);
+
+	return ret;
+}
+
+static int op_fsync(const char *path EXT2FS_ATTR((unused)),
+		    int datasync EXT2FS_ATTR((unused)),
+		    struct fuse_file_info *fp)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	struct fuse4fs_file_handle *fh = fuse4fs_get_handle(fp);
+	ext2_filsys fs;
+	errcode_t err;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	FUSE4FS_CHECK_HANDLE(ff, fh);
+	dbg_printf(ff, "%s: ino=%d\n", __func__, fh->ino);
+	fs = fuse4fs_start(ff);
+	/* For now, flush everything, even if it's slow */
+	if (fuse4fs_is_writeable(ff) && fh->open_flags & EXT2_FILE_WRITE) {
+		err = ext2fs_flush2(fs, 0);
+		if (err)
+			ret = translate_error(fs, fh->ino, err);
+	}
+	fuse4fs_finish(ff, ret);
+
+	return ret;
+}
+
+static int op_statfs(const char *path EXT2FS_ATTR((unused)),
+		     struct statvfs *buf)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+	uint64_t fsid, *f;
+	blk64_t overhead, reserved, free;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	dbg_printf(ff, "%s: path=%s\n", __func__, path);
+	fs = fuse4fs_start(ff);
+	buf->f_bsize = fs->blocksize;
+	buf->f_frsize = 0;
+
+	if (ff->minixdf)
+		overhead = 0;
+	else
+		overhead = fs->desc_blocks +
+			   (blk64_t)fs->group_desc_count *
+			   (fs->inode_blocks_per_group + 2);
+	reserved = ext2fs_r_blocks_count(fs->super);
+	if (!reserved)
+		reserved = ext2fs_blocks_count(fs->super) / 10;
+	free = ext2fs_free_blocks_count(fs->super);
+
+	buf->f_blocks = ext2fs_blocks_count(fs->super) - overhead;
+	buf->f_bfree = free;
+	if (free < reserved)
+		buf->f_bavail = 0;
+	else
+		buf->f_bavail = free - reserved;
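+	/*
+	 * Worked example (illustrative numbers): with 1000 blocks, no
+	 * reserved-block count in the superblock, and 50 free blocks,
+	 * reserved defaults to 1000 / 10 = 100.  Because free < reserved,
+	 * f_bavail is 0 even though f_bfree reports 50.
+	 */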
+	buf->f_files = fs->super->s_inodes_count;
+	buf->f_ffree = fs->super->s_free_inodes_count;
+	buf->f_favail = fs->super->s_free_inodes_count;
+	f = (uint64_t *)fs->super->s_uuid;
+	fsid = *f;
+	f++;
+	fsid ^= *f;
+	buf->f_fsid = fsid;
+	buf->f_flag = 0;
+	if (ff->opstate != F4OP_WRITABLE)
+		buf->f_flag |= ST_RDONLY;
+	buf->f_namemax = EXT2_NAME_LEN;
+	fuse4fs_finish(ff, 0);
+
+	return 0;
+}
+
+static const char *valid_xattr_prefixes[] = {
+	"user.",
+	"trusted.",
+	"security.",
+	"gnu.",
+	"system.",
+};
+
+static int validate_xattr_name(const char *name)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(valid_xattr_prefixes); i++) {
+		if (!strncmp(name, valid_xattr_prefixes[i],
+					strlen(valid_xattr_prefixes[i])))
+			return 1;
+	}
+
+	return 0;
+}
+
+static int op_getxattr(const char *path, const char *key, char *value,
+		       size_t len)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+	void *ptr;
+	size_t plen;
+	ext2_ino_t ino;
+	errcode_t err;
+	int ret = 0;
+
+	if (!validate_xattr_name(key))
+		return -ENODATA;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	fs = fuse4fs_start(ff);
+	if (!ext2fs_has_feature_xattr(fs->super)) {
+		ret = -ENOTSUP;
+		goto out;
+	}
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err || ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+	dbg_printf(ff, "%s: ino=%d name=%s\n", __func__, ino, key);
+
+	ret = check_inum_access(ff, ino, R_OK);
+	if (ret)
+		goto out;
+
+	ret = __getxattr(ff, ino, key, &ptr, &plen);
+	if (ret)
+		goto out;
+
+	if (!len) {
+		ret = plen;
+	} else if (len < plen) {
+		ret = -ERANGE;
+	} else {
+		memcpy(value, ptr, plen);
+		ret = plen;
+	}
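+	/*
+	 * This follows the getxattr(2) size-probe convention.  For a
+	 * 5-byte value, len == 0 returns 5 without copying anything,
+	 * len == 4 returns -ERANGE, and len >= 5 copies the value and
+	 * returns 5.
+	 */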
+
+	ext2fs_free_mem(&ptr);
+out:
+	fuse4fs_finish(ff, ret);
+
+	return ret;
+}
+
+static int count_buffer_space(char *name, char *value EXT2FS_ATTR((unused)),
+			      size_t value_len EXT2FS_ATTR((unused)),
+			      void *data)
+{
+	unsigned int *x = data;
+
+	*x = *x + strlen(name) + 1;
+	return 0;
+}
+
+static int copy_names(char *name, char *value EXT2FS_ATTR((unused)),
+		      size_t value_len EXT2FS_ATTR((unused)), void *data)
+{
+	char **b = data;
+	size_t name_len = strlen(name);
+
+	memcpy(*b, name, name_len + 1);
+	*b = *b + name_len + 1;
+
+	return 0;
+}
+
+static int op_listxattr(const char *path, char *names, size_t len)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+	struct ext2_xattr_handle *h;
+	unsigned int bufsz;
+	ext2_ino_t ino;
+	errcode_t err;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	fs = fuse4fs_start(ff);
+	if (!ext2fs_has_feature_xattr(fs->super)) {
+		ret = -ENOTSUP;
+		goto out;
+	}
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err || ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+	dbg_printf(ff, "%s: ino=%d\n", __func__, ino);
+
+	ret = check_inum_access(ff, ino, R_OK);
+	if (ret)
+		goto out;
+
+	err = ext2fs_xattrs_open(fs, ino, &h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	err = ext2fs_xattrs_read(h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	/* Count buffer space needed for names */
+	bufsz = 0;
+	err = ext2fs_xattrs_iterate(h, count_buffer_space, &bufsz);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	if (len == 0) {
+		ret = bufsz;
+		goto out2;
+	} else if (len < bufsz) {
+		ret = -ERANGE;
+		goto out2;
+	}
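+	/*
+	 * listxattr(2) expects the names packed back to back, each with
+	 * its trailing NUL.  For example (illustrative), an inode with
+	 * "user.a" and "security.selinux" needs bufsz = 7 + 17 = 24 bytes
+	 * laid out as "user.a\0security.selinux\0".
+	 */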
+
+	/* Copy names out */
+	memset(names, 0, len);
+	err = ext2fs_xattrs_iterate(h, copy_names, &names);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+	ret = bufsz;
+out2:
+	err = ext2fs_xattrs_close(&h);
+	if (err && ret >= 0)
+		ret = translate_error(fs, ino, err);
+out:
+	fuse4fs_finish(ff, ret);
+
+	return ret;
+}
+
+static int op_setxattr(const char *path,
+		       const char *key, const char *value,
+		       size_t len, int flags)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+	struct ext2_xattr_handle *h;
+	ext2_ino_t ino;
+	errcode_t err;
+	int ret = 0;
+
+	if (flags & ~(XATTR_CREATE | XATTR_REPLACE))
+		return -EOPNOTSUPP;
+
+	if (!validate_xattr_name(key))
+		return -EINVAL;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	fs = fuse4fs_start(ff);
+	if (!ext2fs_has_feature_xattr(fs->super)) {
+		ret = -ENOTSUP;
+		goto out;
+	}
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err || ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+	dbg_printf(ff, "%s: ino=%d name=%s\n", __func__, ino, key);
+
+	ret = check_inum_access(ff, ino, W_OK);
+	if (ret == -EACCES) {
+		ret = -EPERM;
+		goto out;
+	} else if (ret)
+		goto out;
+
+	err = ext2fs_xattrs_open(fs, ino, &h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	err = ext2fs_xattrs_read(h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	if (flags & (XATTR_CREATE | XATTR_REPLACE)) {
+		void *buf;
+		size_t buflen;
+
+		err = ext2fs_xattr_get(h, key, &buf, &buflen);
+		switch (err) {
+		case EXT2_ET_EA_KEY_NOT_FOUND:
+			if (flags & XATTR_REPLACE) {
+				ret = -ENODATA;
+				goto out2;
+			}
+			break;
+		case 0:
+			ext2fs_free_mem(&buf);
+			if (flags & XATTR_CREATE) {
+				ret = -EEXIST;
+				goto out2;
+			}
+			break;
+		default:
+			ret = translate_error(fs, ino, err);
+			goto out2;
+		}
+	}
+
+	err = ext2fs_xattr_set(h, key, value, len);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	ret = update_ctime(fs, ino, NULL);
+out2:
+	err = ext2fs_xattrs_close(&h);
+	if (!ret && err)
+		ret = translate_error(fs, ino, err);
+out:
+	fuse4fs_finish(ff, ret);
+
+	return ret;
+}
+
+static int op_removexattr(const char *path, const char *key)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+	struct ext2_xattr_handle *h;
+	void *buf;
+	size_t buflen;
+	ext2_ino_t ino;
+	errcode_t err;
+	int ret = 0;
+
+	/*
+	 * Once in a while libfuse gives us a no-name xattr to delete as part
+	 * of clearing ACLs.  Just pretend we cleared them.
+	 */
+	if (key[0] == 0)
+		return 0;
+
+	if (!validate_xattr_name(key))
+		return -ENODATA;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	fs = fuse4fs_start(ff);
+	if (!ext2fs_has_feature_xattr(fs->super)) {
+		ret = -ENOTSUP;
+		goto out;
+	}
+
+	if (!fs_can_allocate(ff, 1)) {
+		ret = -ENOSPC;
+		goto out;
+	}
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err || ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+	dbg_printf(ff, "%s: ino=%d name=%s\n", __func__, ino, key);
+
+	ret = check_inum_access(ff, ino, W_OK);
+	if (ret)
+		goto out;
+
+	err = ext2fs_xattrs_open(fs, ino, &h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	err = ext2fs_xattrs_read(h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	err = ext2fs_xattr_get(h, key, &buf, &buflen);
+	switch (err) {
+	case EXT2_ET_EA_KEY_NOT_FOUND:
+		/*
+		 * ACLs are special snowflakes that require a 0 return when
+		 * the ACL never existed in the first place.
+		 */
+		if (!strncmp(XATTR_SECURITY_PREFIX, key,
+			     XATTR_SECURITY_PREFIX_LEN))
+			ret = 0;
+		else
+			ret = -ENODATA;
+		goto out2;
+	case 0:
+		ext2fs_free_mem(&buf);
+		break;
+	default:
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	err = ext2fs_xattr_remove(h, key);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	ret = update_ctime(fs, ino, NULL);
+out2:
+	err = ext2fs_xattrs_close(&h);
+	if (err && !ret)
+		ret = translate_error(fs, ino, err);
+out:
+	fuse4fs_finish(ff, ret);
+
+	return ret;
+}
+
+struct readdir_iter {
+	void *buf;
+	ext2_filsys fs;
+	fuse_fill_dir_t func;
+
+	struct fuse4fs *ff;
+	enum fuse_readdir_flags flags;
+	unsigned int nr;
+	off_t startpos;
+	off_t dirpos;
+};
+
+static inline mode_t dirent_fmode(ext2_filsys fs,
+				   const struct ext2_dir_entry *dirent)
+{
+	if (!ext2fs_has_feature_filetype(fs->super))
+		return 0;
+
+	switch (ext2fs_dirent_file_type(dirent)) {
+	case EXT2_FT_REG_FILE:
+		return S_IFREG;
+	case EXT2_FT_DIR:
+		return S_IFDIR;
+	case EXT2_FT_CHRDEV:
+		return S_IFCHR;
+	case EXT2_FT_BLKDEV:
+		return S_IFBLK;
+	case EXT2_FT_FIFO:
+		return S_IFIFO;
+	case EXT2_FT_SOCK:
+		return S_IFSOCK;
+	case EXT2_FT_SYMLINK:
+		return S_IFLNK;
+	}
+
+	return 0;
+}
+
+static int op_readdir_iter(ext2_ino_t dir EXT2FS_ATTR((unused)),
+			   int entry EXT2FS_ATTR((unused)),
+			   struct ext2_dir_entry *dirent,
+			   int offset EXT2FS_ATTR((unused)),
+			   int blocksize EXT2FS_ATTR((unused)),
+			   char *buf EXT2FS_ATTR((unused)), void *data)
+{
+	struct readdir_iter *i = data;
+	char namebuf[EXT2_NAME_LEN + 1];
+	struct stat stat = {
+		.st_ino = dirent->inode,
+		.st_mode = dirent_fmode(i->fs, dirent),
+	};
+	int ret;
+
+	i->dirpos++;
+	if (i->startpos >= i->dirpos)
+		return 0;
+
+	dbg_printf(i->ff, "READDIR%s ino=%d %u offset=0x%llx\n",
+			i->flags == FUSE_READDIR_PLUS ? "PLUS" : "",
+			dir,
+			i->nr++,
+			(unsigned long long)i->dirpos);
+
+	if (i->flags == FUSE_READDIR_PLUS) {
+		ret = stat_inode(i->fs, dirent->inode, &stat);
+		if (ret)
+			return DIRENT_ABORT;
+	}
+
+	memcpy(namebuf, dirent->name, dirent->name_len & 0xFF);
+	namebuf[dirent->name_len & 0xFF] = 0;
+	ret = i->func(i->buf, namebuf, &stat, i->dirpos, 0);
+	if (ret)
+		return DIRENT_ABORT;
+
+	return 0;
+}
+
+static int op_readdir(const char *path EXT2FS_ATTR((unused)), void *buf,
+		      fuse_fill_dir_t fill_func, off_t offset,
+		      struct fuse_file_info *fp, enum fuse_readdir_flags flags)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	struct fuse4fs_file_handle *fh = fuse4fs_get_handle(fp);
+	errcode_t err;
+	struct readdir_iter i = {
+		.ff = ff,
+		.dirpos = 0,
+		.startpos = offset,
+		.flags = flags,
+	};
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	FUSE4FS_CHECK_HANDLE(ff, fh);
+	dbg_printf(ff, "%s: ino=%d offset=0x%llx\n", __func__, fh->ino,
+			(unsigned long long)offset);
+	i.fs = fuse4fs_start(ff);
+	i.buf = buf;
+	i.func = fill_func;
+	err = ext2fs_dir_iterate2(i.fs, fh->ino, 0, NULL, op_readdir_iter, &i);
+	if (err) {
+		ret = translate_error(i.fs, fh->ino, err);
+		goto out;
+	}
+
+	if (fuse4fs_is_writeable(ff)) {
+		ret = update_atime(i.fs, fh->ino);
+		if (ret)
+			goto out;
+	}
+out:
+	fuse4fs_finish(ff, ret);
+	return ret;
+}
+
+static int op_access(const char *path, int mask)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+	errcode_t err;
+	ext2_ino_t ino;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	dbg_printf(ff, "%s: path=%s mask=0x%x\n", __func__, path, mask);
+	fs = fuse4fs_start(ff);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err || ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	ret = check_inum_access(ff, ino, mask);
+	if (ret)
+		goto out;
+
+out:
+	fuse4fs_finish(ff, ret);
+	return ret;
+}
+
+static int op_create(const char *path, mode_t mode, struct fuse_file_info *fp)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+	ext2_ino_t parent, child;
+	char *temp_path;
+	errcode_t err;
+	char *node_name, a;
+	int filetype;
+	struct ext2_inode_large inode;
+	gid_t gid;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	dbg_printf(ff, "%s: path=%s mode=0%o\n", __func__, path, mode);
+	temp_path = strdup(path);
+	if (!temp_path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	node_name = strrchr(temp_path, '/');
+	if (!node_name) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	node_name++;
+	a = *node_name;
+	*node_name = 0;
+
+	fs = fuse4fs_start(ff);
+	if (!fs_can_allocate(ff, 1)) {
+		ret = -ENOSPC;
+		goto out2;
+	}
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
+			   &parent);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+
+	ret = check_inum_access(ff, parent, A_OK | W_OK);
+	if (ret)
+		goto out2;
+
+	ret = fuse4fs_new_child_gid(ff, parent, &gid, NULL);
+	if (ret)
+		goto out2;
+
+	*node_name = a;
+
+	filetype = ext2_file_type(mode);
+
+	err = ext2fs_new_inode(fs, parent, mode, 0, &child);
+	if (err) {
+		ret = translate_error(fs, parent, err);
+		goto out2;
+	}
+
+	dbg_printf(ff, "%s: creating ino=%d/name=%s in dir=%d\n", __func__, child,
+		   node_name, parent);
+	err = ext2fs_link(fs, parent, node_name, child,
+			  filetype | EXT2FS_LINK_EXPAND);
+	if (err) {
+		ret = translate_error(fs, parent, err);
+		goto out2;
+	}
+
+	ret = update_mtime(fs, parent, NULL);
+	if (ret)
+		goto out2;
+
+	memset(&inode, 0, sizeof(inode));
+	inode.i_mode = mode;
+	inode.i_links_count = 1;
+	fuse4fs_set_extra_isize(ff, child, &inode);
+	fuse4fs_set_uid(&inode, ctxt->uid);
+	fuse4fs_set_gid(&inode, gid);
+	if (ext2fs_has_feature_extents(fs->super)) {
+		ext2_extent_handle_t handle;
+
+		inode.i_flags &= ~EXT4_EXTENTS_FL;
+		err = ext2fs_extent_open2(fs, child,
+					  EXT2_INODE(&inode), &handle);
+		if (err) {
+			ret = translate_error(fs, child, err);
+			goto out2;
+		}
+
+		ext2fs_extent_free(handle);
+	}
+
+	err = ext2fs_write_new_inode(fs, child, EXT2_INODE(&inode));
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out2;
+	}
+
+	inode.i_generation = ff->next_generation++;
+	init_times(&inode);
+	err = fuse4fs_write_inode(fs, child, &inode);
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out2;
+	}
+
+	ext2fs_inode_alloc_stats2(fs, child, 1, 0);
+
+	ret = propagate_default_acls(ff, parent, child, inode.i_mode);
+	if (ret)
+		goto out2;
+
+	fp->flags &= ~O_TRUNC;
+	ret = __op_open(ff, path, fp);
+	if (ret)
+		goto out2;
+
+	ret = fuse4fs_dirsync_flush(ff, parent, NULL);
+	if (ret)
+		goto out2;
+
+out2:
+	fuse4fs_finish(ff, ret);
+out:
+	free(temp_path);
+	return ret;
+}
+
+static int op_utimens(const char *path, const struct timespec ctv[2],
+		      struct fuse_file_info *fi)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	struct timespec tv[2];
+	ext2_filsys fs;
+	errcode_t err;
+	ext2_ino_t ino;
+	struct ext2_inode_large inode;
+	int access = W_OK;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	fs = fuse4fs_start(ff);
+	ret = fuse4fs_file_ino(ff, path, fi, &ino);
+	if (ret)
+		goto out;
+	dbg_printf(ff, "%s: ino=%d atime=%lld.%ld mtime=%lld.%ld\n", __func__,
+			ino,
+			(long long int)ctv[0].tv_sec, ctv[0].tv_nsec,
+			(long long int)ctv[1].tv_sec, ctv[1].tv_nsec);
+
+	/*
+	 * ext4 allows timestamp updates of append-only files, but only if
+	 * both timestamps are being set to the current time.
+	 */
+	if (ctv[0].tv_nsec == UTIME_NOW && ctv[1].tv_nsec == UTIME_NOW)
+		access |= A_OK;
+	ret = check_inum_access(ff, ino, access);
+	if (ret)
+		goto out;
+
+	err = fuse4fs_read_inode(fs, ino, &inode);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	tv[0] = ctv[0];
+	tv[1] = ctv[1];
+#ifdef UTIME_NOW
+	if (tv[0].tv_nsec == UTIME_NOW)
+		get_now(tv);
+	if (tv[1].tv_nsec == UTIME_NOW)
+		get_now(tv + 1);
+#endif /* UTIME_NOW */
+#ifdef UTIME_OMIT
+	if (tv[0].tv_nsec != UTIME_OMIT)
+		EXT4_INODE_SET_XTIME(i_atime, &tv[0], &inode);
+	if (tv[1].tv_nsec != UTIME_OMIT)
+		EXT4_INODE_SET_XTIME(i_mtime, &tv[1], &inode);
+#endif /* UTIME_OMIT */
+	ret = update_ctime(fs, ino, &inode);
+	if (ret)
+		goto out;
+
+	err = fuse4fs_write_inode(fs, ino, &inode);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+out:
+	fuse4fs_finish(ff, ret);
+	return ret;
+}
+
+#define FUSE4FS_MODIFIABLE_IFLAGS \
+	(EXT2_FL_USER_MODIFIABLE & ~(EXT4_EXTENTS_FL | EXT4_CASEFOLD_FL | \
+				     EXT3_JOURNAL_DATA_FL))
+
+static inline int set_iflags(struct ext2_inode_large *inode, __u32 iflags)
+{
+	if ((inode->i_flags ^ iflags) & ~FUSE4FS_MODIFIABLE_IFLAGS)
+		return -EINVAL;
+
+	inode->i_flags = (inode->i_flags & ~FUSE4FS_MODIFIABLE_IFLAGS) |
+			 (iflags & FUSE4FS_MODIFIABLE_IFLAGS);
+	return 0;
+}
+
+#ifdef SUPPORT_I_FLAGS
+static int ioctl_getflags(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
+			  void *data)
+{
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	struct ext2_inode_large inode;
+
+	dbg_printf(ff, "%s: ino=%d\n", __func__, fh->ino);
+	err = fuse4fs_read_inode(fs, fh->ino, &inode);
+	if (err)
+		return translate_error(fs, fh->ino, err);
+
+	*(__u32 *)data = inode.i_flags & EXT2_FL_USER_VISIBLE;
+	return 0;
+}
+
+static int ioctl_setflags(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
+			  void *data)
+{
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	struct ext2_inode_large inode;
+	int ret;
+	__u32 flags = *(__u32 *)data;
+	struct fuse_context *ctxt = fuse_get_context();
+
+	dbg_printf(ff, "%s: ino=%d\n", __func__, fh->ino);
+	err = fuse4fs_read_inode(fs, fh->ino, &inode);
+	if (err)
+		return translate_error(fs, fh->ino, err);
+
+	if (want_check_owner(ff, ctxt) && inode_uid(inode) != ctxt->uid)
+		return -EPERM;
+
+	ret = set_iflags(&inode, flags);
+	if (ret)
+		return ret;
+
+	ret = update_ctime(fs, fh->ino, &inode);
+	if (ret)
+		return ret;
+
+	err = fuse4fs_write_inode(fs, fh->ino, &inode);
+	if (err)
+		return translate_error(fs, fh->ino, err);
+
+	return 0;
+}
+
+static int ioctl_getversion(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
+			    void *data)
+{
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	struct ext2_inode_large inode;
+
+	dbg_printf(ff, "%s: ino=%d\n", __func__, fh->ino);
+	err = fuse4fs_read_inode(fs, fh->ino, &inode);
+	if (err)
+		return translate_error(fs, fh->ino, err);
+
+	*(__u32 *)data = inode.i_generation;
+	return 0;
+}
+
+static int ioctl_setversion(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
+			    void *data)
+{
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	struct ext2_inode_large inode;
+	int ret;
+	__u32 generation = *(__u32 *)data;
+	struct fuse_context *ctxt = fuse_get_context();
+
+	dbg_printf(ff, "%s: ino=%d\n", __func__, fh->ino);
+	err = fuse4fs_read_inode(fs, fh->ino, &inode);
+	if (err)
+		return translate_error(fs, fh->ino, err);
+
+	if (want_check_owner(ff, ctxt) && inode_uid(inode) != ctxt->uid)
+		return -EPERM;
+
+	inode.i_generation = generation;
+
+	ret = update_ctime(fs, fh->ino, &inode);
+	if (ret)
+		return ret;
+
+	err = fuse4fs_write_inode(fs, fh->ino, &inode);
+	if (err)
+		return translate_error(fs, fh->ino, err);
+
+	return 0;
+}
+#endif /* SUPPORT_I_FLAGS */
+
+#ifdef FS_IOC_FSGETXATTR
+static __u32 iflags_to_fsxflags(__u32 iflags)
+{
+	__u32 xflags = 0;
+
+	if (iflags & FS_SYNC_FL)
+		xflags |= FS_XFLAG_SYNC;
+	if (iflags & FS_IMMUTABLE_FL)
+		xflags |= FS_XFLAG_IMMUTABLE;
+	if (iflags & FS_APPEND_FL)
+		xflags |= FS_XFLAG_APPEND;
+	if (iflags & FS_NODUMP_FL)
+		xflags |= FS_XFLAG_NODUMP;
+	if (iflags & FS_NOATIME_FL)
+		xflags |= FS_XFLAG_NOATIME;
+	if (iflags & FS_DAX_FL)
+		xflags |= FS_XFLAG_DAX;
+	if (iflags & FS_PROJINHERIT_FL)
+		xflags |= FS_XFLAG_PROJINHERIT;
+	return xflags;
+}
+
+static int ioctl_fsgetxattr(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
+			    void *data)
+{
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	struct ext2_inode_large inode;
+	struct fsxattr *fsx = data;
+	unsigned int inode_size;
+
+	dbg_printf(ff, "%s: ino=%d\n", __func__, fh->ino);
+	err = fuse4fs_read_inode(fs, fh->ino, &inode);
+	if (err)
+		return translate_error(fs, fh->ino, err);
+
+	memset(fsx, 0, sizeof(*fsx));
+	inode_size = EXT2_GOOD_OLD_INODE_SIZE + inode.i_extra_isize;
+	if (ext2fs_inode_includes(inode_size, i_projid))
+		fsx->fsx_projid = inode_projid(inode);
+	fsx->fsx_xflags = iflags_to_fsxflags(inode.i_flags);
+	return 0;
+}
+
+static __u32 fsxflags_to_iflags(__u32 xflags)
+{
+	__u32 iflags = 0;
+
+	if (xflags & FS_XFLAG_IMMUTABLE)
+		iflags |= FS_IMMUTABLE_FL;
+	if (xflags & FS_XFLAG_APPEND)
+		iflags |= FS_APPEND_FL;
+	if (xflags & FS_XFLAG_SYNC)
+		iflags |= FS_SYNC_FL;
+	if (xflags & FS_XFLAG_NOATIME)
+		iflags |= FS_NOATIME_FL;
+	if (xflags & FS_XFLAG_NODUMP)
+		iflags |= FS_NODUMP_FL;
+	if (xflags & FS_XFLAG_DAX)
+		iflags |= FS_DAX_FL;
+	if (xflags & FS_XFLAG_PROJINHERIT)
+		iflags |= FS_PROJINHERIT_FL;
+	return iflags;
+}
+
+#define FUSE4FS_MODIFIABLE_XFLAGS (FS_XFLAG_IMMUTABLE | \
+				   FS_XFLAG_APPEND | \
+				   FS_XFLAG_SYNC | \
+				   FS_XFLAG_NOATIME | \
+				   FS_XFLAG_NODUMP | \
+				   FS_XFLAG_PROJINHERIT)
+
+#define FUSE4FS_MODIFIABLE_IXFLAGS (FS_IMMUTABLE_FL | \
+				    FS_APPEND_FL | \
+				    FS_SYNC_FL | \
+				    FS_NOATIME_FL | \
+				    FS_NODUMP_FL | \
+				    FS_PROJINHERIT_FL)
+
+static inline int set_xflags(struct ext2_inode_large *inode, __u32 xflags)
+{
+	__u32 iflags;
+
+	if (xflags & ~FUSE4FS_MODIFIABLE_XFLAGS)
+		return -EINVAL;
+
+	iflags = fsxflags_to_iflags(xflags);
+	inode->i_flags = (inode->i_flags & ~FUSE4FS_MODIFIABLE_IXFLAGS) |
+			 (iflags & FUSE4FS_MODIFIABLE_IXFLAGS);
+	return 0;
+}
+
+static int ioctl_fssetxattr(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
+			    void *data)
+{
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	struct ext2_inode_large inode;
+	int ret;
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fsxattr *fsx = data;
+	unsigned int inode_size;
+
+	dbg_printf(ff, "%s: ino=%d\n", __func__, fh->ino);
+	err = fuse4fs_read_inode(fs, fh->ino, &inode);
+	if (err)
+		return translate_error(fs, fh->ino, err);
+
+	if (want_check_owner(ff, ctxt) && inode_uid(inode) != ctxt->uid)
+		return -EPERM;
+
+	ret = set_xflags(&inode, fsx->fsx_xflags);
+	if (ret)
+		return ret;
+
+	inode_size = EXT2_GOOD_OLD_INODE_SIZE + inode.i_extra_isize;
+	if (ext2fs_inode_includes(inode_size, i_projid))
+		inode.i_projid = fsx->fsx_projid;
+
+	ret = update_ctime(fs, fh->ino, &inode);
+	if (ret)
+		return ret;
+
+	err = fuse4fs_write_inode(fs, fh->ino, &inode);
+	if (err)
+		return translate_error(fs, fh->ino, err);
+
+	return 0;
+}
+#endif /* FS_IOC_FSGETXATTR */
+
+#ifdef FITRIM
+static int ioctl_fitrim(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
+			void *data)
+{
+	ext2_filsys fs = ff->fs;
+	struct fstrim_range *fr = data;
+	blk64_t start, end, max_blocks, b, cleared, minlen;
+	blk64_t max_blks = ext2fs_blocks_count(fs->super);
+	errcode_t err = 0;
+
+	if (!fuse4fs_is_writeable(ff))
+		return -EROFS;
+
+	start = FUSE4FS_B_TO_FSBT(ff, fr->start);
+	if (fr->len == -1ULL)
+		end = -1ULL;
+	else
+		end = FUSE4FS_B_TO_FSBT(ff, fr->start + fr->len - 1);
+	minlen = FUSE4FS_B_TO_FSBT(ff, fr->minlen);
+
+	if (EXT2FS_NUM_B2C(fs, minlen) > EXT2_CLUSTERS_PER_GROUP(fs->super) ||
+	    start >= max_blks ||
+	    fr->len < fs->blocksize)
+		return -EINVAL;
+
+	dbg_printf(ff, "%s: start=0x%llx end=0x%llx minlen=0x%llx\n", __func__,
+		   start, end, minlen);
+
+	if (start < fs->super->s_first_data_block)
+		start = fs->super->s_first_data_block;
+
+	if (end < fs->super->s_first_data_block)
+		end = fs->super->s_first_data_block;
+	if (end >= ext2fs_blocks_count(fs->super))
+		end = ext2fs_blocks_count(fs->super) - 1;
+
+	cleared = 0;
+	max_blocks = FUSE4FS_B_TO_FSBT(ff, 2048ULL * 1024 * 1024);
+
+	fr->len = 0;
+	while (start <= end) {
+		err = ext2fs_find_first_zero_block_bitmap2(fs->block_map,
+							   start, end, &start);
+		switch (err) {
+		case 0:
+			break;
+		case ENOENT:
+			/* no free blocks found, so we're done */
+			err = 0;
+			goto out;
+		default:
+			return translate_error(fs, fh->ino, err);
+		}
+
+		b = start + max_blocks < end ? start + max_blocks : end;
+		err = ext2fs_find_first_set_block_bitmap2(fs->block_map,
+							  start, b, &b);
+		switch (err) {
+		case 0:
+			break;
+		case ENOENT:
+			/*
+			 * No in-use blocks found between start and b; discard
+			 * the entire range.
+			 */
+			err = 0;
+			break;
+		default:
+			return translate_error(fs, fh->ino, err);
+		}
+
+		if (b - start >= minlen) {
+			err = io_channel_discard(fs->io, start, b - start);
+			if (err == EBUSY) {
+				/*
+				 * Apparently dm-thinp can return EBUSY when
+				 * it's too busy deallocating thinp units to
+				 * deallocate more.  Swallow these errors.
+				 */
+				err = 0;
+			}
+			if (err)
+				return translate_error(fs, fh->ino, err);
+			cleared += b - start;
+			fr->len = FUSE4FS_FSB_TO_B(ff, cleared);
+		}
+		start = b + 1;
+	}
+
+out:
+	fr->len = FUSE4FS_FSB_TO_B(ff, cleared);
+	dbg_printf(ff, "%s: len=%llu err=%ld\n", __func__, fr->len, err);
+	return err;
+}
+#endif /* FITRIM */
+
+#ifndef EXT4_IOC_SHUTDOWN
+# define EXT4_IOC_SHUTDOWN	_IOR('X', 125, __u32)
+#endif
+
+static int ioctl_shutdown(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
+			  void *data)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	ext2_filsys fs = ff->fs;
+
+	if (!is_superuser(ff, ctxt))
+		return -EPERM;
+
+	err_printf(ff, "%s.\n", _("shut down requested"));
+
+	/*
+	 * EXT4_IOC_SHUTDOWN inherited the inverted polarity on the ioctl
+	 * direction from XFS.  Unfortunately, that means we can't implement
+	 * any of the flags.  Flush whatever is dirty and shut down.
+	 */
+	if (ff->opstate == F4OP_WRITABLE)
+		ext2fs_flush2(fs, 0);
+	ff->opstate = F4OP_SHUTDOWN;
+
+	return 0;
+}
+
+static int op_ioctl(const char *path EXT2FS_ATTR((unused)),
+		    unsigned int cmd,
+		    void *arg EXT2FS_ATTR((unused)),
+		    struct fuse_file_info *fp,
+		    unsigned int flags EXT2FS_ATTR((unused)), void *data)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	struct fuse4fs_file_handle *fh = fuse4fs_get_handle(fp);
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	FUSE4FS_CHECK_HANDLE(ff, fh);
+	fuse4fs_start(ff);
+	switch ((unsigned long) cmd) {
+#ifdef SUPPORT_I_FLAGS
+	case EXT2_IOC_GETFLAGS:
+		ret = ioctl_getflags(ff, fh, data);
+		break;
+	case EXT2_IOC_SETFLAGS:
+		ret = ioctl_setflags(ff, fh, data);
+		break;
+	case EXT2_IOC_GETVERSION:
+		ret = ioctl_getversion(ff, fh, data);
+		break;
+	case EXT2_IOC_SETVERSION:
+		ret = ioctl_setversion(ff, fh, data);
+		break;
+#endif
+#ifdef FS_IOC_FSGETXATTR
+	case FS_IOC_FSGETXATTR:
+		ret = ioctl_fsgetxattr(ff, fh, data);
+		break;
+	case FS_IOC_FSSETXATTR:
+		ret = ioctl_fssetxattr(ff, fh, data);
+		break;
+#endif
+#ifdef FITRIM
+	case FITRIM:
+		ret = ioctl_fitrim(ff, fh, data);
+		break;
+#endif
+	case EXT4_IOC_SHUTDOWN:
+		ret = ioctl_shutdown(ff, fh, data);
+		break;
+	default:
+		dbg_printf(ff, "%s: Unknown ioctl 0x%x\n", __func__, cmd);
+		ret = -ENOTTY;
+	}
+	fuse4fs_finish(ff, ret);
+
+	return ret;
+}
+
+static int op_bmap(const char *path, size_t blocksize EXT2FS_ATTR((unused)),
+		   uint64_t *idx)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	ext2_filsys fs;
+	ext2_ino_t ino;
+	errcode_t err;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	fs = fuse4fs_start(ff);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+	dbg_printf(ff, "%s: ino=%d blk=%"PRIu64"\n", __func__, ino, *idx);
+
+	err = ext2fs_bmap2(fs, ino, NULL, NULL, 0, *idx, 0, (blk64_t *)idx);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+out:
+	fuse4fs_finish(ff, ret);
+	return ret;
+}
+
+#ifdef SUPPORT_FALLOCATE
+static int fuse4fs_allocate_range(struct fuse4fs *ff,
+				  struct fuse4fs_file_handle *fh, int mode,
+				  off_t offset, off_t len)
+{
+	ext2_filsys fs = ff->fs;
+	struct ext2_inode_large inode;
+	blk64_t start, end;
+	__u64 fsize;
+	errcode_t err;
+	int flags;
+
+	start = FUSE4FS_B_TO_FSBT(ff, offset);
+	end = FUSE4FS_B_TO_FSBT(ff, offset + len - 1);
+	dbg_printf(ff, "%s: ino=%d mode=0x%x offset=0x%llx len=0x%llx start=0x%llx end=0x%llx\n",
+		   __func__, fh->ino, mode,
+		   (unsigned long long)offset,
+		   (unsigned long long)len,
+		   (unsigned long long)start,
+		   (unsigned long long)end);
+	if (!fs_can_allocate(ff, FUSE4FS_B_TO_FSB(ff, len)))
+		return -ENOSPC;
+
+	err = fuse4fs_read_inode(fs, fh->ino, &inode);
+	if (err)
+		return translate_error(fs, fh->ino, err);
+	fsize = EXT2_I_SIZE(&inode);
+
+	/* Indirect files do not support unwritten extents */
+	if (!(inode.i_flags & EXT4_EXTENTS_FL))
+		return -EOPNOTSUPP;
+
+	/* Allocate a bunch of blocks */
+	flags = (mode & FL_KEEP_SIZE_FLAG ? 0 :
+			EXT2_FALLOCATE_INIT_BEYOND_EOF);
+	err = ext2fs_fallocate(fs, flags, fh->ino,
+			       EXT2_INODE(&inode),
+			       ~0ULL, start, end - start + 1);
+	if (err && err != EXT2_ET_BLOCK_ALLOC_FAIL)
+		return translate_error(fs, fh->ino, err);
+
+	/* Update i_size */
+	if (!(mode & FL_KEEP_SIZE_FLAG)) {
+		if ((__u64) offset + len > fsize) {
+			err = ext2fs_inode_size_set(fs,
+						EXT2_INODE(&inode),
+						offset + len);
+			if (err)
+				return translate_error(fs, fh->ino, err);
+		}
+	}
+
+	err = update_mtime(fs, fh->ino, &inode);
+	if (err)
+		return err;
+
+	err = fuse4fs_write_inode(fs, fh->ino, &inode);
+	if (err)
+		return translate_error(fs, fh->ino, err);
+
+	return err;
+}
+
+static errcode_t clean_block_middle(struct fuse4fs *ff, ext2_ino_t ino,
+				    struct ext2_inode_large *inode,
+				    off_t offset, off_t len, char **buf)
+{
+	ext2_filsys fs = ff->fs;
+	blk64_t blk;
+	off_t residue = FUSE4FS_OFF_IN_FSB(ff, offset);
+	int retflags;
+	errcode_t err;
+
+	if (!*buf) {
+		err = ext2fs_get_mem(fs->blocksize, buf);
+		if (err)
+			return err;
+	}
+
+	err = ext2fs_bmap2(fs, ino, EXT2_INODE(inode), *buf, 0,
+			   FUSE4FS_B_TO_FSBT(ff, offset), &retflags, &blk);
+	if (err)
+		return err;
+	if (!blk || (retflags & BMAP_RET_UNINIT))
+		return 0;
+
+	err = io_channel_read_blk64(fs->io, blk, 1, *buf);
+	if (err)
+		return err;
+
+	dbg_printf(ff, "%s: ino=%d offset=0x%llx len=0x%llx\n",
+		   __func__, ino,
+		   (unsigned long long)offset,
+		   (unsigned long long)len);
+	memset(*buf + residue, 0, len);
+
+	return io_channel_write_blk64(fs->io, blk, 1, *buf);
+}
+
+static errcode_t clean_block_edge(struct fuse4fs *ff, ext2_ino_t ino,
+				  struct ext2_inode_large *inode, off_t offset,
+				  int clean_before, char **buf)
+{
+	ext2_filsys fs = ff->fs;
+	blk64_t blk;
+	int retflags;
+	off_t residue;
+	errcode_t err;
+
+	residue = FUSE4FS_OFF_IN_FSB(ff, offset);
+	if (residue == 0)
+		return 0;
+
+	if (!*buf) {
+		err = ext2fs_get_mem(fs->blocksize, buf);
+		if (err)
+			return err;
+	}
+
+	err = ext2fs_bmap2(fs, ino, EXT2_INODE(inode), *buf, 0,
+			   FUSE4FS_B_TO_FSBT(ff, offset), &retflags, &blk);
+	if (err)
+		return err;
+
+	if (!blk || (retflags & BMAP_RET_UNINIT))
+		return 0;
+	err = io_channel_read_blk64(fs->io, blk, 1, *buf);
+	if (err)
+		return err;
+
+	if (clean_before) {
+		dbg_printf(ff, "%s: ino=%d before offset=0x%llx len=0x%llx\n",
+			   __func__, ino,
+			   (unsigned long long)offset,
+			   (unsigned long long)residue);
+		memset(*buf, 0, residue);
+	} else {
+		dbg_printf(ff, "%s: ino=%d after offset=0x%llx len=0x%llx\n",
+			   __func__, ino,
+			   (unsigned long long)offset,
+			   (unsigned long long)fs->blocksize - residue);
+		memset(*buf + residue, 0, fs->blocksize - residue);
+	}
+
+	return io_channel_write_blk64(fs->io, blk, 1, *buf);
+}
+
+static int fuse4fs_punch_range(struct fuse4fs *ff,
+			       struct fuse4fs_file_handle *fh, int mode,
+			       off_t offset, off_t len)
+{
+	ext2_filsys fs = ff->fs;
+	struct ext2_inode_large inode;
+	blk64_t start, end;
+	errcode_t err;
+	char *buf = NULL;
+
+	/* kernel ext4 punch requires this flag to be set */
+	if (!(mode & FL_KEEP_SIZE_FLAG))
+		return -EINVAL;
+
+	/*
+	 * Unmap all full blocks in the middle of the range being punched.
+	 * The start of the unmap range should be the first byte of the first
+	 * fsblock that starts within the range.  The end of the range should
+	 * be the next byte after the last fsblock to end in the range.
+	 */
+	start = FUSE4FS_B_TO_FSBT(ff, round_up(offset, fs->blocksize));
+	end = FUSE4FS_B_TO_FSBT(ff, round_down(offset + len, fs->blocksize));
+
+	dbg_printf(ff,
+ "%s: ino=%d mode=0x%x offset=0x%llx len=0x%llx start=0x%llx end=0x%llx\n",
+		   __func__, fh->ino, mode,
+		   (unsigned long long)offset,
+		   (unsigned long long)len,
+		   (unsigned long long)start,
+		   (unsigned long long)end);
+
+	err = fuse4fs_read_inode(fs, fh->ino, &inode);
+	if (err)
+		return translate_error(fs, fh->ino, err);
+
+	/*
+	 * Indirect files do not support unwritten extents, which means we
+	 * can't support zero range.  Punch goes first in zero-range, which
+	 * is why the check is here.
+	 */
+	if ((mode & FL_ZERO_RANGE_FLAG) && !(inode.i_flags & EXT4_EXTENTS_FL))
+		return -EOPNOTSUPP;
+
+	/* Zero everything before the first block and after the last block */
+	if (FUSE4FS_B_TO_FSBT(ff, offset) == FUSE4FS_B_TO_FSBT(ff, offset + len))
+		err = clean_block_middle(ff, fh->ino, &inode, offset,
+					 len, &buf);
+	else {
+		err = clean_block_edge(ff, fh->ino, &inode, offset, 0, &buf);
+		if (!err)
+			err = clean_block_edge(ff, fh->ino, &inode,
+					       offset + len, 1, &buf);
+	}
+	if (buf)
+		ext2fs_free_mem(&buf);
+	if (err)
+		return translate_error(fs, fh->ino, err);
+
+	/*
+	 * Unmap full blocks in the middle, which is to say that end - start
+	 * must be at least one fsblock.  ext2fs_punch takes a closed interval
+	 * as its argument, so we pass [start, end - 1].
+	 */
+	if (start < end) {
+		err = ext2fs_punch(fs, fh->ino, EXT2_INODE(&inode),
+				   NULL, start, end - 1);
+		if (err)
+			return translate_error(fs, fh->ino, err);
+	}
+
+	err = update_mtime(fs, fh->ino, &inode);
+	if (err)
+		return err;
+
+	err = fuse4fs_write_inode(fs, fh->ino, &inode);
+	if (err)
+		return translate_error(fs, fh->ino, err);
+
+	return 0;
+}
+
+static int fuse4fs_zero_range(struct fuse4fs *ff,
+			      struct fuse4fs_file_handle *fh, int mode,
+			      off_t offset, off_t len)
+{
+	int ret = fuse4fs_punch_range(ff, fh, mode | FL_KEEP_SIZE_FLAG, offset,
+				      len);
+
+	if (!ret)
+		ret = fuse4fs_allocate_range(ff, fh, mode, offset, len);
+	return ret;
+}
+
+static int op_fallocate(const char *path EXT2FS_ATTR((unused)), int mode,
+			off_t offset, off_t len,
+			struct fuse_file_info *fp)
+{
+	struct fuse4fs *ff = fuse4fs_get();
+	struct fuse4fs_file_handle *fh = fuse4fs_get_handle(fp);
+	int ret;
+
+	/* Catch unknown flags */
+	if (mode & ~(FL_ZERO_RANGE_FLAG | FL_PUNCH_HOLE_FLAG | FL_KEEP_SIZE_FLAG))
+		return -EOPNOTSUPP;
+
+	FUSE4FS_CHECK_CONTEXT(ff);
+	FUSE4FS_CHECK_HANDLE(ff, fh);
+	fuse4fs_start(ff);
+	if (!fuse4fs_is_writeable(ff)) {
+		ret = -EROFS;
+		goto out;
+	}
+
+	dbg_printf(ff, "%s: ino=%d mode=0x%x start=0x%llx end=0x%llx\n", __func__,
+		   fh->ino, mode,
+		   (unsigned long long)offset,
+		   (unsigned long long)offset + len);
+
+	if (mode & FL_ZERO_RANGE_FLAG)
+		ret = fuse4fs_zero_range(ff, fh, mode, offset, len);
+	else if (mode & FL_PUNCH_HOLE_FLAG)
+		ret = fuse4fs_punch_range(ff, fh, mode, offset, len);
+	else
+		ret = fuse4fs_allocate_range(ff, fh, mode, offset, len);
+out:
+	fuse4fs_finish(ff, ret);
+
+	return ret;
+}
+#endif /* SUPPORT_FALLOCATE */
+
+static struct fuse_operations fs_ops = {
+	.init = op_init,
+	.destroy = op_destroy,
+	.getattr = op_getattr,
+	.readlink = op_readlink,
+	.mknod = op_mknod,
+	.mkdir = op_mkdir,
+	.unlink = op_unlink,
+	.rmdir = op_rmdir,
+	.symlink = op_symlink,
+	.rename = op_rename,
+	.link = op_link,
+	.chmod = op_chmod,
+	.chown = op_chown,
+	.truncate = op_truncate,
+	.open = op_open,
+	.read = op_read,
+	.write = op_write,
+	.statfs = op_statfs,
+	.release = op_release,
+	.fsync = op_fsync,
+	.setxattr = op_setxattr,
+	.getxattr = op_getxattr,
+	.listxattr = op_listxattr,
+	.removexattr = op_removexattr,
+	.opendir = op_open,
+	.readdir = op_readdir,
+	.releasedir = op_release,
+	.fsyncdir = op_fsync,
+	.access = op_access,
+	.create = op_create,
+	.utimens = op_utimens,
+	.bmap = op_bmap,
+#ifdef SUPERFLUOUS
+	.lock = op_lock,
+	.poll = op_poll,
+#endif
+	.ioctl = op_ioctl,
+#ifdef SUPPORT_FALLOCATE
+	.fallocate = op_fallocate,
+#endif
+};
+
+static int get_random_bytes(void *p, size_t sz)
+{
+	int fd;
+	ssize_t r;
+
+	fd = open("/dev/urandom", O_RDONLY);
+	if (fd < 0) {
+		perror("/dev/urandom");
+		return 0;
+	}
+
+	r = read(fd, p, sz);
+
+	close(fd);
+	return (size_t) r == sz;
+}
+
+enum {
+	FUSE4FS_IGNORED,
+	FUSE4FS_VERSION,
+	FUSE4FS_HELP,
+	FUSE4FS_HELPFULL,
+	FUSE4FS_CACHE_SIZE,
+	FUSE4FS_DIRSYNC,
+	FUSE4FS_ERRORS_BEHAVIOR,
+};
+
+#define FUSE4FS_OPT(t, p, v) { t, offsetof(struct fuse4fs, p), v }
+
+static struct fuse_opt fuse4fs_opts[] = {
+	FUSE4FS_OPT("ro",		ro,			1),
+	FUSE4FS_OPT("rw",		ro,			0),
+	FUSE4FS_OPT("minixdf",		minixdf,		1),
+	FUSE4FS_OPT("bsddf",		minixdf,		0),
+	FUSE4FS_OPT("fakeroot",		fakeroot,		1),
+	FUSE4FS_OPT("fuse4fs_debug",	debug,			1),
+	FUSE4FS_OPT("no_default_opts",	no_default_opts,	1),
+	FUSE4FS_OPT("norecovery",	norecovery,		1),
+	FUSE4FS_OPT("noload",		norecovery,		1),
+	FUSE4FS_OPT("offset=%lu",	offset,			0),
+	FUSE4FS_OPT("kernel",		kernel,			1),
+	FUSE4FS_OPT("directio",		directio,		1),
+	FUSE4FS_OPT("acl",		acl,			1),
+	FUSE4FS_OPT("noacl",		acl,			0),
+	FUSE4FS_OPT("lockfile=%s",	lockfile,		0),
+#ifdef HAVE_CLOCK_MONOTONIC
+	FUSE4FS_OPT("timing",		timing,			1),
+#endif
+	FUSE4FS_OPT("noblkdev",		noblkdev,		1),
+
+	FUSE_OPT_KEY("user_xattr",	FUSE4FS_IGNORED),
+	FUSE_OPT_KEY("noblock_validity", FUSE4FS_IGNORED),
+	FUSE_OPT_KEY("nodelalloc",	FUSE4FS_IGNORED),
+	FUSE_OPT_KEY("cache_size=%s",	FUSE4FS_CACHE_SIZE),
+	FUSE_OPT_KEY("dirsync",		FUSE4FS_DIRSYNC),
+	FUSE_OPT_KEY("errors=%s",	FUSE4FS_ERRORS_BEHAVIOR),
+
+	FUSE_OPT_KEY("-V",             FUSE4FS_VERSION),
+	FUSE_OPT_KEY("--version",      FUSE4FS_VERSION),
+	FUSE_OPT_KEY("-h",             FUSE4FS_HELP),
+	FUSE_OPT_KEY("--help",         FUSE4FS_HELP),
+	FUSE_OPT_KEY("--helpfull",     FUSE4FS_HELPFULL),
+	FUSE_OPT_END
+};
+
+
+static int fuse4fs_opt_proc(void *data, const char *arg,
+			    int key, struct fuse_args *outargs)
+{
+	struct fuse4fs *ff = data;
+
+	switch (key) {
+	case FUSE4FS_DIRSYNC:
+		ff->dirsync = 1;
+		/* pass through to libfuse */
+		return 1;
+	case FUSE_OPT_KEY_NONOPT:
+		if (!ff->device) {
+			ff->device = strdup(arg);
+			return 0;
+		}
+		return 1;
+	case FUSE4FS_CACHE_SIZE:
+		ff->cache_size = parse_num_blocks2(arg + 11, -1);
+		if (ff->cache_size < 1 || ff->cache_size > INT32_MAX) {
+			fprintf(stderr, "%s: %s\n", arg,
+ _("cache size must be between 1 block and 2GB."));
+			return -1;
+		}
+
+		/* do not pass through to libfuse */
+		return 0;
+	case FUSE4FS_ERRORS_BEHAVIOR:
+		if (strcmp(arg + 7, "continue") == 0)
+			ff->errors_behavior = EXT2_ERRORS_CONTINUE;
+		else if (strcmp(arg + 7, "remount-ro") == 0)
+			ff->errors_behavior = EXT2_ERRORS_RO;
+		else if (strcmp(arg + 7, "panic") == 0)
+			ff->errors_behavior = EXT2_ERRORS_PANIC;
+		else {
+			fprintf(stderr, "%s: %s\n", arg,
+ _("unknown errors behavior."));
+			return -1;
+		}
+
+		/* do not pass through to libfuse */
+		return 0;
+	case FUSE4FS_IGNORED:
+		return 0;
+	case FUSE4FS_HELP:
+	case FUSE4FS_HELPFULL:
+		fprintf(stderr,
+	"usage: %s device/image mountpoint [options]\n"
+	"\n"
+	"general options:\n"
+	"    -o opt,[opt...]  mount options\n"
+	"    -h   --help      print help\n"
+	"    -V   --version   print version\n"
+	"\n"
+	"fuse4fs options:\n"
+	"    -o errors=panic        dump core on error\n"
+	"    -o minixdf             minix-style df\n"
+	"    -o fakeroot            pretend to be root for permission checks\n"
+	"    -o no_default_opts     do not include default fuse options\n"
+	"    -o offset=<bytes>      similar to mount -o offset=<bytes>, mount the partition starting at <bytes>\n"
+	"    -o norecovery          don't replay the journal\n"
+	"    -o fuse4fs_debug       enable fuse4fs debugging\n"
+	"    -o lockfile=<file>     file to show that fuse is still using the file system image\n"
+	"    -o kernel              run this as if it were the kernel, which sets:\n"
+	"                           allow_other,default_permissions,suid,dev\n"
+	"    -o directio            use O_DIRECT to read and write the disk\n"
+	"    -o cache_size=N[KMG]   use a disk cache of this size\n"
+	"    -o errors=             behavior when an error is encountered:\n"
+	"                           continue|remount-ro|panic\n"
+	"\n",
+			outargs->argv[0]);
+		if (key == FUSE4FS_HELPFULL) {
+			fuse_opt_add_arg(outargs, "-h");
+			fuse_main(outargs->argc, outargs->argv, &fs_ops, NULL);
+		} else {
+			fprintf(stderr, "Try --helpfull to get a list of "
+				"all flags, including the FUSE options.\n");
+		}
+		exit(1);
+
+	case FUSE4FS_VERSION:
+		fprintf(stderr, "fuse4fs %s (%s)\n", E2FSPROGS_VERSION,
+			E2FSPROGS_DATE);
+		fuse_opt_add_arg(outargs, "--version");
+		fuse_main(outargs->argc, outargs->argv, &fs_ops, NULL);
+		exit(0);
+	}
+	return 1;
+}
+
+static const char *get_subtype(const char *argv0)
+{
+	size_t argvlen = strlen(argv0);
+
+	if (argvlen < 4)
+		goto out_default;
+
+	if (argv0[argvlen - 4] == 'e' &&
+	    argv0[argvlen - 3] == 'x' &&
+	    argv0[argvlen - 2] == 't' &&
+	    isdigit(argv0[argvlen - 1]))
+		return &argv0[argvlen - 4];
+
+out_default:
+	return "ext4";
+}
+
+/* Figure out a reasonable default size for the disk cache */
+static unsigned long long default_cache_size(void)
+{
+	long pages = 0, pagesize = 0;
+	unsigned long long max_cache;
+	unsigned long long ret = 32ULL << 20; /* 32 MB */
+
+#ifdef _SC_PHYS_PAGES
+	pages = sysconf(_SC_PHYS_PAGES);
+#endif
+#ifdef _SC_PAGESIZE
+	pagesize = sysconf(_SC_PAGESIZE);
+#endif
+	if (pages > 0 && pagesize > 0) {
+		max_cache = (unsigned long long)pagesize * pages / 20;
+
+		if (max_cache > 0 && ret > max_cache)
+			ret = max_cache;
+	}
+	return ret;
+}
+
+static inline bool fuse4fs_want_fuseblk(const struct fuse4fs *ff)
+{
+	if (ff->noblkdev)
+		return false;
+
+	/* libfuse won't let non-root do fuseblk mounts */
+	if (getuid() != 0)
+		return false;
+
+	return fuse4fs_on_bdev(ff);
+}
+
+static void fuse4fs_com_err_proc(const char *whoami, errcode_t code,
+				 const char *fmt, va_list args)
+{
+	fprintf(stderr, "FUSE4FS (%s): ", err_shortdev ? err_shortdev : "?");
+	if (whoami)
+		fprintf(stderr, "%s: ", whoami);
+	fprintf(stderr, "%s ", error_message(code));
+	vfprintf(stderr, fmt, args);
+	fprintf(stderr, "\n");
+	fflush(stderr);
+}
+
+int main(int argc, char *argv[])
+{
+	struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
+	struct fuse4fs fctx;
+	errcode_t err;
+	FILE *orig_stderr = stderr;
+	char extra_args[BUFSIZ];
+	int ret;
+
+	memset(&fctx, 0, sizeof(fctx));
+	fctx.magic = FUSE4FS_MAGIC;
+	fctx.logfd = -1;
+	fctx.opstate = F4OP_WRITABLE;
+
+	ret = fuse_opt_parse(&args, &fctx, fuse4fs_opts, fuse4fs_opt_proc);
+	if (ret)
+		exit(1);
+	if (fctx.device == NULL) {
+		fprintf(stderr, "Missing ext4 device/image\n");
+		fprintf(stderr, "See '%s -h' for usage\n", argv[0]);
+		exit(1);
+	}
+
+	/* /dev/sda -> sda for reporting */
+	fctx.shortdev = strrchr(fctx.device, '/');
+	if (fctx.shortdev)
+		fctx.shortdev++;
+	else
+		fctx.shortdev = fctx.device;
+
+	/* capture library error messages */
+	err_shortdev = fctx.shortdev;
+	set_com_err_hook(fuse4fs_com_err_proc);
+
+#ifdef ENABLE_NLS
+	setlocale(LC_MESSAGES, "");
+	setlocale(LC_CTYPE, "");
+	bindtextdomain(NLS_CAT_NAME, LOCALEDIR);
+	textdomain(NLS_CAT_NAME);
+	set_com_err_gettext(gettext);
+#endif
+	add_error_table(&et_ext2_error_table);
+
+	ret = fuse4fs_setup_logging(&fctx);
+	if (ret) {
+		/* operational error */
+		ret = 2;
+		goto out;
+	}
+
+#ifdef HAVE_PR_SET_IO_FLUSHER
+	/*
+	 * Register as a filesystem I/O server process so that our memory
+	 * allocations don't cause fs reclaim.
+	 */
+	ret = prctl(PR_GET_IO_FLUSHER, 0, 0, 0, 0);
+	if (ret == 0) {
+		ret = prctl(PR_SET_IO_FLUSHER, 1, 0, 0, 0);
+		if (ret < 0) {
+			err_printf(&fctx, "%s: %s.\n",
+ _("Could not register as IO flusher thread"),
+					strerror(errno));
+			ret = 0;
+		}
+	}
+#endif
+
+	/* Will we allow users to allocate every last block? */
+	if (getenv("FUSE4FS_ALLOC_ALL_BLOCKS")) {
+		log_printf(&fctx, "%s\n",
+ _("Allowing users to allocate all blocks. This is dangerous!"));
+		fctx.alloc_all_blocks = 1;
+	}
+
+	err = fuse4fs_open(&fctx, EXT2_FLAG_EXCLUSIVE);
+	if (err) {
+		ret = 32;
+		goto out;
+	}
+
+	if (fuse4fs_want_fuseblk(&fctx)) {
+		/*
+		 * If this is a block device, we want to close the fs, reopen
+		 * the block device in non-exclusive mode, and start the fuse
+		 * driver in fuseblk mode (which will reopen the block device
+		 * in exclusive mode) so that unmount will wait until
+		 * op_destroy completes.
+		 */
+		fuse4fs_unmount(&fctx);
+		err = fuse4fs_open(&fctx, 0);
+		if (err) {
+			ret = 32;
+			goto out;
+		}
+
+		/* "blkdev" is the magic mount option for fuseblk mode */
+		snprintf(extra_args, BUFSIZ, "-oblkdev,blksize=%u",
+			 fctx.fs->blocksize);
+		fuse_opt_add_arg(&args, extra_args);
+		fctx.unmount_in_destroy = 1;
+	}
+
+	if (!fctx.cache_size)
+		fctx.cache_size = default_cache_size();
+	if (fctx.cache_size) {
+		err = fuse4fs_config_cache(&fctx);
+		if (err) {
+			ret = 32;
+			goto out;
+		}
+	}
+
+	err = fuse4fs_check_support(&fctx);
+	if (err) {
+		ret = 32;
+		goto out;
+	}
+
+	/*
+	 * ext4 can't do COW of shared blocks, so if the feature is enabled,
+	 * we must force ro mode.
+	 */
+	if (ext2fs_has_feature_shared_blocks(fctx.fs->super))
+		fctx.ro = 1;
+
+	if (fctx.norecovery) {
+		ret = fuse4fs_check_norecovery(&fctx);
+		if (ret)
+			goto out;
+	}
+
+	err = fuse4fs_mount(&fctx);
+	if (err) {
+		ret = 32;
+		goto out;
+	}
+
+	/* Initialize generation counter */
+	get_random_bytes(&fctx.next_generation, sizeof(unsigned int));
+
+	/* Set up default fuse parameters */
+	snprintf(extra_args, BUFSIZ, "-okernel_cache,subtype=%s,"
+		 "fsname=%s,attr_timeout=0",
+		 get_subtype(argv[0]),
+		 fctx.device);
+	if (fctx.no_default_opts == 0)
+		fuse_opt_add_arg(&args, extra_args);
+
+	if (fctx.ro)
+		fuse_opt_add_arg(&args, "-oro");
+
+	if (fctx.fakeroot) {
+#ifdef HAVE_MOUNT_NODEV
+		fuse_opt_add_arg(&args, "-onodev");
+#endif
+#ifdef HAVE_MOUNT_NOSUID
+		fuse_opt_add_arg(&args, "-onosuid");
+#endif
+	}
+
+	if (fctx.kernel) {
+		/*
+		 * ACLs are always enforced when kernel mode is enabled, to
+		 * match the kernel ext4 driver which always enables ACLs.
+		 */
+		fctx.acl = 1;
+		fuse_opt_insert_arg(&args, 1,
+ "-oallow_other,default_permissions,suid,dev");
+	}
+
+	/*
+	 * Since there's a Big Kernel Lock around all the libext2fs code, we
+	 * only need to start four threads -- one to decode a request, another
+	 * to do the filesystem work, a third to transmit the reply, and a
+	 * fourth to handle fuse notifications.
+	 */
+	fuse_opt_insert_arg(&args, 1, "-omax_threads=4");
+
+	if (fctx.debug) {
+		int	i;
+
+		printf("FUSE4FS (%s): fuse arguments:", fctx.shortdev);
+		for (i = 0; i < args.argc; i++)
+			printf(" '%s'", args.argv[i]);
+		printf("\n");
+		fflush(stdout);
+	}
+
+	pthread_mutex_init(&fctx.bfl, NULL);
+	ret = fuse_main(args.argc, args.argv, &fs_ops, &fctx);
+	pthread_mutex_destroy(&fctx.bfl);
+
+	switch(ret) {
+	case 0:
+		/* success */
+		ret = 0;
+		break;
+	case 1:
+	case 2:
+		/* invalid option or no mountpoint */
+		ret = 1;
+		break;
+	case 3:
+	case 4:
+	case 5:
+	case 6:
+	case 7:
+		/* setup or mounting failed */
+		ret = 32;
+		break;
+	default:
+		/* fuse started up enough to call op_init */
+		ret = 0;
+		break;
+	}
+out:
+	if (ret & 1) {
+		fprintf(orig_stderr, "%s\n",
+ _("Mount failed due to unrecognized options.  Check dmesg(1) for details."));
+		fflush(orig_stderr);
+	}
+	if (ret & 32) {
+		fprintf(orig_stderr, "%s\n",
+ _("Mount failed while opening filesystem.  Check dmesg(1) for details."));
+		fflush(orig_stderr);
+	}
+	fuse4fs_unmount(&fctx);
+	reset_com_err_hook();
+	err_shortdev = NULL;
+	if (fctx.device)
+		free(fctx.device);
+	fuse_opt_free_args(&args);
+	return ret;
+}
+
+static int __translate_error(ext2_filsys fs, ext2_ino_t ino, errcode_t err,
+			     const char *func, int line)
+{
+	struct timespec now;
+	int ret = err;
+	struct fuse4fs *ff = fs->priv_data;
+	int is_err = 0;
+
+	/* Translate ext2 error to unix error code */
+	switch (err) {
+	case 0:
+		break;
+	case EXT2_ET_NO_MEMORY:
+	case EXT2_ET_TDB_ERR_OOM:
+		ret = -ENOMEM;
+		break;
+	case EXT2_ET_INVALID_ARGUMENT:
+	case EXT2_ET_LLSEEK_FAILED:
+		ret = -EINVAL;
+		break;
+	case EXT2_ET_NO_DIRECTORY:
+		ret = -ENOTDIR;
+		break;
+	case EXT2_ET_FILE_NOT_FOUND:
+		ret = -ENOENT;
+		break;
+	case EXT2_ET_DIR_NO_SPACE:
+		is_err = 1;
+		/* fallthrough */
+	case EXT2_ET_TOOSMALL:
+	case EXT2_ET_BLOCK_ALLOC_FAIL:
+	case EXT2_ET_INODE_ALLOC_FAIL:
+	case EXT2_ET_EA_NO_SPACE:
+		ret = -ENOSPC;
+		break;
+	case EXT2_ET_SYMLINK_LOOP:
+		ret = -EMLINK;
+		break;
+	case EXT2_ET_FILE_TOO_BIG:
+		ret = -EFBIG;
+		break;
+	case EXT2_ET_TDB_ERR_EXISTS:
+	case EXT2_ET_FILE_EXISTS:
+		ret = -EEXIST;
+		break;
+	case EXT2_ET_MMP_FAILED:
+	case EXT2_ET_MMP_FSCK_ON:
+		ret = -EBUSY;
+		break;
+	case EXT2_ET_EA_KEY_NOT_FOUND:
+		ret = -ENODATA;
+		break;
+	case EXT2_ET_UNIMPLEMENTED:
+		ret = -EOPNOTSUPP;
+		break;
+	case EXT2_ET_RO_FILSYS:
+		ret = -EROFS;
+		break;
+	case EXT2_ET_MAGIC_EXT2_FILE:
+	case EXT2_ET_MAGIC_EXT2FS_FILSYS:
+	case EXT2_ET_MAGIC_BADBLOCKS_LIST:
+	case EXT2_ET_MAGIC_BADBLOCKS_ITERATE:
+	case EXT2_ET_MAGIC_INODE_SCAN:
+	case EXT2_ET_MAGIC_IO_CHANNEL:
+	case EXT2_ET_MAGIC_UNIX_IO_CHANNEL:
+	case EXT2_ET_MAGIC_IO_MANAGER:
+	case EXT2_ET_MAGIC_BLOCK_BITMAP:
+	case EXT2_ET_MAGIC_INODE_BITMAP:
+	case EXT2_ET_MAGIC_GENERIC_BITMAP:
+	case EXT2_ET_MAGIC_TEST_IO_CHANNEL:
+	case EXT2_ET_MAGIC_DBLIST:
+	case EXT2_ET_MAGIC_ICOUNT:
+	case EXT2_ET_MAGIC_PQ_IO_CHANNEL:
+	case EXT2_ET_MAGIC_E2IMAGE:
+	case EXT2_ET_MAGIC_INODE_IO_CHANNEL:
+	case EXT2_ET_MAGIC_EXTENT_HANDLE:
+	case EXT2_ET_BAD_MAGIC:
+	case EXT2_ET_MAGIC_EXTENT_PATH:
+	case EXT2_ET_MAGIC_GENERIC_BITMAP64:
+	case EXT2_ET_MAGIC_BLOCK_BITMAP64:
+	case EXT2_ET_MAGIC_INODE_BITMAP64:
+	case EXT2_ET_MAGIC_RESERVED_13:
+	case EXT2_ET_MAGIC_RESERVED_14:
+	case EXT2_ET_MAGIC_RESERVED_15:
+	case EXT2_ET_MAGIC_RESERVED_16:
+	case EXT2_ET_MAGIC_RESERVED_17:
+	case EXT2_ET_MAGIC_RESERVED_18:
+	case EXT2_ET_MAGIC_RESERVED_19:
+	case EXT2_ET_MMP_MAGIC_INVALID:
+	case EXT2_ET_MAGIC_EA_HANDLE:
+	case EXT2_ET_DIR_CORRUPTED:
+	case EXT2_ET_CORRUPT_SUPERBLOCK:
+	case EXT2_ET_RESIZE_INODE_CORRUPT:
+	case EXT2_ET_TDB_ERR_CORRUPT:
+	case EXT2_ET_UNDO_FILE_CORRUPT:
+	case EXT2_ET_FILESYSTEM_CORRUPTED:
+	case EXT2_ET_CORRUPT_JOURNAL_SB:
+	case EXT2_ET_INODE_CORRUPTED:
+	case EXT2_ET_EA_INODE_CORRUPTED:
+		/* same errno that linux uses */
+		is_err = 1;
+		ret = -EUCLEAN;
+		break;
+	case EIO:
+#ifdef EILSEQ
+	case EILSEQ:
+#endif
+	case EUCLEAN:
+		/* these errnos usually denote corruption or persistence fail */
+		is_err = 1;
+		ret = -err;
+		break;
+	default:
+		if (err < 256) {
+			/* other errno are usually operational errors */
+			ret = -err;
+		} else {
+			is_err = 1;
+			ret = -EIO;
+		}
+		break;
+	}
+
+	if (!is_err)
+		return ret;
+
+	if (ino)
+		err_printf(ff, "%s (inode #%d) at %s:%d.\n",
+			error_message(err), ino, func, line);
+	else
+		err_printf(ff, "%s at %s:%d.\n",
+			error_message(err), func, line);
+
+	/* Make a note in the error log */
+	get_now(&now);
+	ext2fs_set_tstamp(fs->super, s_last_error_time, now.tv_sec);
+	fs->super->s_last_error_ino = ino;
+	fs->super->s_last_error_line = line;
+	fs->super->s_last_error_block = err; /* Yeah... */
+	strncpy((char *)fs->super->s_last_error_func, func,
+		sizeof(fs->super->s_last_error_func));
+	if (ext2fs_get_tstamp(fs->super, s_first_error_time) == 0) {
+		ext2fs_set_tstamp(fs->super, s_first_error_time, now.tv_sec);
+		fs->super->s_first_error_ino = ino;
+		fs->super->s_first_error_line = line;
+		fs->super->s_first_error_block = err;
+		strncpy((char *)fs->super->s_first_error_func, func,
+			sizeof(fs->super->s_first_error_func));
+	}
+
+	fs->super->s_state |= EXT2_ERROR_FS;
+	fs->super->s_error_count++;
+	ext2fs_mark_super_dirty(fs);
+	ext2fs_flush(fs);
+	switch (ff->errors_behavior) {
+	case EXT2_ERRORS_CONTINUE:
+		err_printf(ff, "%s\n",
+ _("Continuing after errors; is this a good idea?"));
+		break;
+	case EXT2_ERRORS_RO:
+		if (ff->opstate == F4OP_WRITABLE) {
+			err_printf(ff, "%s\n",
+ _("Remounting read-only due to errors."));
+			ff->opstate = F4OP_READONLY;
+		}
+		fs->flags &= ~EXT2_FLAG_RW;
+		break;
+	case EXT2_ERRORS_PANIC:
+		err_printf(ff, "%s\n",
+ _("Aborting filesystem mount due to errors."));
+		abort();
+		break;
+	}
+
+	return ret;
+}
diff --git a/lib/config.h.in b/lib/config.h.in
index a4d8ce1c3765ed..c3379758c3c9bc 100644
--- a/lib/config.h.in
+++ b/lib/config.h.in
@@ -73,6 +73,9 @@
 /* Define to 1 if PR_SET_IO_FLUSHER is present */
 #undef HAVE_PR_SET_IO_FLUSHER
 
+/* Define to 1 if fuse supports lowlevel API */
+#undef HAVE_FUSE_LOWLEVEL
+
 /* Define to 1 if you have the Mac OS X function
    CFLocaleCopyPreferredLanguages in the CoreFoundation framework. */
 #undef HAVE_CFLOCALECOPYPREFERREDLANGUAGES



* [PATCH 03/21] debian: create new package for fuse4fs
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
  2025-09-16  0:50   ` [PATCH 01/21] fuse2fs: separate libfuse3 and fuse2fs detection in configure Darrick J. Wong
  2025-09-16  0:51   ` [PATCH 02/21] fuse2fs: start porting fuse2fs to lowlevel libfuse API Darrick J. Wong
@ 2025-09-16  0:51   ` Darrick J. Wong
  2025-09-16  0:51   ` [PATCH 04/21] fuse4fs: namespace some helpers Darrick J. Wong
                     ` (17 subsequent siblings)
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:51 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Create a new package for fuse4fs.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 debian/control         |   12 +++++++++++-
 debian/fuse4fs.install |    2 ++
 debian/rules           |   11 +++++++++++
 3 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 debian/fuse4fs.install


diff --git a/debian/control b/debian/control
index fb3487cd32b99a..94c6b82e25a97e 100644
--- a/debian/control
+++ b/debian/control
@@ -2,7 +2,7 @@ Source: e2fsprogs
 Section: admin
 Priority: important
 Maintainer: Theodore Y. Ts'o <tytso@mit.edu>
-Build-Depends: dpkg-dev (>= 1.22.5), gettext, texinfo, pkgconf, libarchive-dev <!nocheck>, libfuse3-dev [linux-any kfreebsd-any] <!pkg.e2fsprogs.no-fuse2fs>, debhelper-compat (= 12), dh-exec, libblkid-dev, uuid-dev, m4, udev [linux-any], systemd [linux-any], systemd-dev [linux-any], cron [linux-any], dh-sequence-movetousr
+Build-Depends: dpkg-dev (>= 1.22.5), gettext, texinfo, pkgconf, libarchive-dev <!nocheck>, libfuse3-dev [linux-any kfreebsd-any] <!pkg.e2fsprogs.no-fuse2fs> <!pkg.e2fsprogs.no-fuse4fs>, debhelper-compat (= 12), dh-exec, libblkid-dev, uuid-dev, m4, udev [linux-any], systemd [linux-any], systemd-dev [linux-any], cron [linux-any], dh-sequence-movetousr
 Rules-Requires-Root: no
 Standards-Version: 4.7.2
 Homepage: http://e2fsprogs.sourceforge.net
@@ -21,6 +21,16 @@ Description: ext2 / ext3 / ext4 file system driver for FUSE
  writing from devices or image files containing ext2, ext3, and ext4
  file systems.
 
+Package: fuse4fs
+Build-Profiles: <!pkg.e2fsprogs.no-fuse4fs>
+Priority: optional
+Depends: ${shlibs:Depends}, ${misc:Depends}
+Architecture: linux-any kfreebsd-any
+Description: ext2 / ext3 / ext4 file system driver for FUSE
+ fuse4fs is a faster FUSE file system client that supports reading and
+ writing from devices or image files containing ext2, ext3, and ext4
+ file systems.
+
 Package: fuseext2
 Build-Profiles: <!pkg.e2fsprogs.no-fuse2fs>
 Depends: fuse2fs (>= 1.47.1-2), ${misc:Depends}
diff --git a/debian/fuse4fs.install b/debian/fuse4fs.install
new file mode 100644
index 00000000000000..fb8c8ab671c73c
--- /dev/null
+++ b/debian/fuse4fs.install
@@ -0,0 +1,2 @@
+/usr/bin/fuse4fs
+/usr/share/man/man1/fuse4fs.1
diff --git a/debian/rules b/debian/rules
index 4cb80652115317..3240d6bc2640c9 100755
--- a/debian/rules
+++ b/debian/rules
@@ -12,6 +12,7 @@ export LC_ALL ?= C
 
 ifeq ($(DEB_HOST_ARCH_OS), hurd)
 SKIP_FUSE2FS=yes
+SKIP_FUSE4FS=yes
 endif
 
 ifeq ($(DEB_HOST_ARCH_OS), linux)
@@ -22,6 +23,9 @@ endif
 ifneq ($(filter pkg.e2fsprogs.no-fuse2fs,$(DEB_BUILD_PROFILES)),)
 SKIP_FUSE2FS=yes
 endif
+ifneq ($(filter pkg.e2fsprogs.no-fuse4fs,$(DEB_BUILD_PROFILES)),)
+SKIP_FUSE4FS=yes
+endif
 
 ifneq (,$(filter-out parallel=1,$(filter parallel=%,$(DEB_BUILD_OPTIONS))))
     NUMJOBS = $(patsubst parallel=%,%,$(filter parallel=%,$(DEB_BUILD_OPTIONS)))
@@ -60,6 +64,9 @@ COMMON_CONF_FLAGS = --enable-elf-shlibs --disable-ubsan \
 ifneq ($(SKIP_FUSE2FS),)
 COMMON_CONF_FLAGS +=  --disable-fuse2fs
 endif
+ifneq ($(SKIP_FUSE4FS),)
+COMMON_CONF_FLAGS +=  --disable-fuse4fs
+endif
 
 ifneq ($(DEB_BUILD_GNU_TYPE),$(DEB_HOST_GNU_TYPE))
 CC ?= $(DEB_HOST_GNU_TYPE)-gcc
@@ -189,6 +196,10 @@ endif
 ifeq ($(SKIP_FUSE2FS),)
 	dh_shlibdeps -pfuse2fs -l${stdbuilddir}/lib \
 		-- -Ldebian/e2fsprogs.shlibs.local
+endif
+ifeq ($(SKIP_FUSE4FS),)
+	dh_shlibdeps -pfuse4fs -l${stdbuilddir}/lib \
+		-- -Ldebian/e2fsprogs.shlibs.local
 endif
 	dh_shlibdeps --remaining-packages -l${stdbuilddir}/lib
 



* [PATCH 04/21] fuse4fs: namespace some helpers
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                     ` (2 preceding siblings ...)
  2025-09-16  0:51   ` [PATCH 03/21] debian: create new package for fuse4fs Darrick J. Wong
@ 2025-09-16  0:51   ` Darrick J. Wong
  2025-09-16  0:51   ` [PATCH 05/21] fuse4fs: convert to low level API Darrick J. Wong
                     ` (16 subsequent siblings)
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:51 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Prepend "fuse4fs_" to all helper functions that take a struct fuse4fs
object pointer.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |  177 +++++++++++++++++++++++++++--------------------------
 1 file changed, 90 insertions(+), 87 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 99b9b902b37a57..a4eeb86201db0c 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -2,6 +2,7 @@
  * fuse4fs.c - FUSE low-level server for e2fsprogs.
  *
  * Copyright (C) 2014-2025 Oracle.
+ * Copyright (C) 2025 CTERA Networks.
  *
  * %Begin-Header%
  * This file may be redistributed under the terms of the GNU Public
@@ -696,7 +697,7 @@ static int ext2_file_type(unsigned int mode)
 	return 0;
 }
 
-static int fs_can_allocate(struct fuse4fs *ff, blk64_t num)
+static int fuse4fs_can_allocate(struct fuse4fs *ff, blk64_t num)
 {
 	ext2_filsys fs = ff->fs;
 	blk64_t reserved;
@@ -723,21 +724,22 @@ static int fs_can_allocate(struct fuse4fs *ff, blk64_t num)
 	return ext2fs_free_blocks_count(fs->super) > reserved + num;
 }
 
-static int fuse4fs_is_writeable(struct fuse4fs *ff)
+static int fuse4fs_is_writeable(const struct fuse4fs *ff)
 {
 	return ff->opstate == F4OP_WRITABLE &&
 		(ff->fs->super->s_error_count == 0);
 }
 
-static inline int is_superuser(struct fuse4fs *ff, struct fuse_context *ctxt)
+static inline int fuse4fs_is_superuser(struct fuse4fs *ff,
+				       const struct fuse_context *ctxt)
 {
 	if (ff->fakeroot)
 		return 1;
 	return ctxt->uid == 0;
 }
 
-static inline int want_check_owner(struct fuse4fs *ff,
-				   struct fuse_context *ctxt)
+static inline int fuse4fs_want_check_owner(struct fuse4fs *ff,
+					   const struct fuse_context *ctxt)
 {
 	/*
 	 * The kernel is responsible for access control, so we allow anything
@@ -745,14 +747,14 @@ static inline int want_check_owner(struct fuse4fs *ff,
 	 */
 	if (ff->kernel)
 		return 0;
-	return !is_superuser(ff, ctxt);
+	return !fuse4fs_is_superuser(ff, ctxt);
 }
 
 /* Test for append permission */
 #define A_OK	16
 
-static int check_iflags_access(struct fuse4fs *ff, ext2_ino_t ino,
-			       const struct ext2_inode *inode, int mask)
+static int fuse4fs_iflags_access(struct fuse4fs *ff, ext2_ino_t ino,
+				 const struct ext2_inode *inode, int mask)
 {
 	EXT2FS_BUILD_BUG_ON((A_OK & (R_OK | W_OK | X_OK | F_OK)) != 0);
 
@@ -780,7 +782,7 @@ static int check_iflags_access(struct fuse4fs *ff, ext2_ino_t ino,
 	return 0;
 }
 
-static int check_inum_access(struct fuse4fs *ff, ext2_ino_t ino, int mask)
+static int fuse4fs_inum_access(struct fuse4fs *ff, ext2_ino_t ino, int mask)
 {
 	struct fuse_context *ctxt = fuse_get_context();
 	ext2_filsys fs = ff->fs;
@@ -812,7 +814,7 @@ static int check_inum_access(struct fuse4fs *ff, ext2_ino_t ino, int mask)
 	if (mask == 0)
 		return 0;
 
-	ret = check_iflags_access(ff, ino, &inode, mask);
+	ret = fuse4fs_iflags_access(ff, ino, &inode, mask);
 	if (ret)
 		return ret;
 
@@ -821,7 +823,7 @@ static int check_inum_access(struct fuse4fs *ff, ext2_ino_t ino, int mask)
 		return 0;
 
 	/* Figure out what root's allowed to do */
-	if (is_superuser(ff, ctxt)) {
+	if (fuse4fs_is_superuser(ff, ctxt)) {
 		/* Non-file access always ok */
 		if (!LINUX_S_ISREG(inode.i_mode))
 			return 0;
@@ -1539,8 +1541,8 @@ static int op_readlink(const char *path, char *buf, size_t len)
 	return ret;
 }
 
-static int __getxattr(struct fuse4fs *ff, ext2_ino_t ino, const char *name,
-		      void **value, size_t *value_len)
+static int fuse4fs_getxattr(struct fuse4fs *ff, ext2_ino_t ino,
+			    const char *name, void **value, size_t *value_len)
 {
 	ext2_filsys fs = ff->fs;
 	struct ext2_xattr_handle *h;
@@ -1570,8 +1572,8 @@ static int __getxattr(struct fuse4fs *ff, ext2_ino_t ino, const char *name,
 	return ret;
 }
 
-static int __setxattr(struct fuse4fs *ff, ext2_ino_t ino, const char *name,
-		      void *value, size_t valuelen)
+static int fuse4fs_setxattr(struct fuse4fs *ff, ext2_ino_t ino,
+			    const char *name, void *value, size_t valuelen)
 {
 	ext2_filsys fs = ff->fs;
 	struct ext2_xattr_handle *h;
@@ -1601,8 +1603,8 @@ static int __setxattr(struct fuse4fs *ff, ext2_ino_t ino, const char *name,
 	return ret;
 }
 
-static int propagate_default_acls(struct fuse4fs *ff, ext2_ino_t parent,
-				  ext2_ino_t child, mode_t mode)
+static int fuse4fs_propagate_default_acls(struct fuse4fs *ff, ext2_ino_t parent,
+					  ext2_ino_t child, mode_t mode)
 {
 	void *def;
 	size_t deflen;
@@ -1611,8 +1613,8 @@ static int propagate_default_acls(struct fuse4fs *ff, ext2_ino_t parent,
 	if (!ff->acl || S_ISDIR(mode))
 		return 0;
 
-	ret = __getxattr(ff, parent, XATTR_NAME_POSIX_ACL_DEFAULT, &def,
-			 &deflen);
+	ret = fuse4fs_getxattr(ff, parent, XATTR_NAME_POSIX_ACL_DEFAULT, &def,
+			       &deflen);
 	switch (ret) {
 	case -ENODATA:
 	case -ENOENT:
@@ -1624,7 +1626,8 @@ static int propagate_default_acls(struct fuse4fs *ff, ext2_ino_t parent,
 		return ret;
 	}
 
-	ret = __setxattr(ff, child, XATTR_NAME_POSIX_ACL_DEFAULT, def, deflen);
+	ret = fuse4fs_setxattr(ff, child, XATTR_NAME_POSIX_ACL_DEFAULT, def,
+			       deflen);
 	ext2fs_free_mem(&def);
 	return ret;
 }
@@ -1753,7 +1756,7 @@ static int op_mknod(const char *path, mode_t mode, dev_t dev)
 	*node_name = 0;
 
 	fs = fuse4fs_start(ff);
-	if (!fs_can_allocate(ff, 2)) {
+	if (!fuse4fs_can_allocate(ff, 2)) {
 		ret = -ENOSPC;
 		goto out2;
 	}
@@ -1765,7 +1768,7 @@ static int op_mknod(const char *path, mode_t mode, dev_t dev)
 		goto out2;
 	}
 
-	ret = check_inum_access(ff, parent, A_OK | W_OK);
+	ret = fuse4fs_inum_access(ff, parent, A_OK | W_OK);
 	if (ret)
 		goto out2;
 
@@ -1835,7 +1838,7 @@ static int op_mknod(const char *path, mode_t mode, dev_t dev)
 
 	ext2fs_inode_alloc_stats2(fs, child, 1, 0);
 
-	ret = propagate_default_acls(ff, parent, child, inode.i_mode);
+	ret = fuse4fs_propagate_default_acls(ff, parent, child, inode.i_mode);
 	if (ret)
 		goto out2;
 
@@ -1883,7 +1886,7 @@ static int op_mkdir(const char *path, mode_t mode)
 	*node_name = 0;
 
 	fs = fuse4fs_start(ff);
-	if (!fs_can_allocate(ff, 1)) {
+	if (!fuse4fs_can_allocate(ff, 1)) {
 		ret = -ENOSPC;
 		goto out2;
 	}
@@ -1895,7 +1898,7 @@ static int op_mkdir(const char *path, mode_t mode)
 		goto out2;
 	}
 
-	ret = check_inum_access(ff, parent, A_OK | W_OK);
+	ret = fuse4fs_inum_access(ff, parent, A_OK | W_OK);
 	if (ret)
 		goto out2;
 
@@ -1968,7 +1971,7 @@ static int op_mkdir(const char *path, mode_t mode)
 		goto out3;
 	}
 
-	ret = propagate_default_acls(ff, parent, child, inode.i_mode);
+	ret = fuse4fs_propagate_default_acls(ff, parent, child, inode.i_mode);
 	if (ret)
 		goto out3;
 
@@ -2009,7 +2012,7 @@ static int fuse4fs_unlink(struct fuse4fs *ff, const char *path,
 		base_name = filename;
 	}
 
-	ret = check_inum_access(ff, dir, W_OK);
+	ret = fuse4fs_inum_access(ff, dir, W_OK);
 	if (ret) {
 		free(filename);
 		return ret;
@@ -2031,8 +2034,8 @@ static int fuse4fs_unlink(struct fuse4fs *ff, const char *path,
 	return 0;
 }
 
-static int remove_ea_inodes(struct fuse4fs *ff, ext2_ino_t ino,
-			    struct ext2_inode_large *inode)
+static int fuse4fs_remove_ea_inodes(struct fuse4fs *ff, ext2_ino_t ino,
+				    struct ext2_inode_large *inode)
 {
 	ext2_filsys fs = ff->fs;
 	struct ext2_xattr_handle *h;
@@ -2076,7 +2079,7 @@ static int remove_ea_inodes(struct fuse4fs *ff, ext2_ino_t ino,
 	return 0;
 }
 
-static int remove_inode(struct fuse4fs *ff, ext2_ino_t ino)
+static int fuse4fs_remove_inode(struct fuse4fs *ff, ext2_ino_t ino)
 {
 	ext2_filsys fs = ff->fs;
 	errcode_t err;
@@ -2109,7 +2112,7 @@ static int remove_inode(struct fuse4fs *ff, ext2_ino_t ino)
 		goto write_out;
 
 	if (ext2fs_has_feature_ea_inode(fs->super)) {
-		ret = remove_ea_inodes(ff, ino, &inode);
+		ret = fuse4fs_remove_ea_inodes(ff, ino, &inode);
 		if (ret)
 			return ret;
 	}
@@ -2150,7 +2153,7 @@ static int __op_unlink(struct fuse4fs *ff, const char *path)
 		goto out;
 	}
 
-	ret = check_inum_access(ff, ino, W_OK);
+	ret = fuse4fs_inum_access(ff, ino, W_OK);
 	if (ret)
 		goto out;
 
@@ -2158,7 +2161,7 @@ static int __op_unlink(struct fuse4fs *ff, const char *path)
 	if (ret)
 		goto out;
 
-	ret = remove_inode(ff, ino);
+	ret = fuse4fs_remove_inode(ff, ino);
 	if (ret)
 		goto out;
 
@@ -2226,7 +2229,7 @@ static int __op_rmdir(struct fuse4fs *ff, const char *path)
 	}
 	dbg_printf(ff, "%s: rmdir path=%s ino=%d\n", __func__, path, child);
 
-	ret = check_inum_access(ff, child, W_OK);
+	ret = fuse4fs_inum_access(ff, child, W_OK);
 	if (ret)
 		goto out;
 
@@ -2245,7 +2248,7 @@ static int __op_rmdir(struct fuse4fs *ff, const char *path)
 		goto out;
 	}
 
-	ret = check_inum_access(ff, rds.parent, W_OK);
+	ret = fuse4fs_inum_access(ff, rds.parent, W_OK);
 	if (ret)
 		goto out;
 
@@ -2258,10 +2261,10 @@ static int __op_rmdir(struct fuse4fs *ff, const char *path)
 	if (ret)
 		goto out;
 	/* Directories have to be "removed" twice. */
-	ret = remove_inode(ff, child);
+	ret = fuse4fs_remove_inode(ff, child);
 	if (ret)
 		goto out;
-	ret = remove_inode(ff, child);
+	ret = fuse4fs_remove_inode(ff, child);
 	if (ret)
 		goto out;
 
@@ -2347,7 +2350,7 @@ static int op_symlink(const char *src, const char *dest)
 		goto out2;
 	}
 
-	ret = check_inum_access(ff, parent, A_OK | W_OK);
+	ret = fuse4fs_inum_access(ff, parent, A_OK | W_OK);
 	if (ret)
 		goto out2;
 
@@ -2459,7 +2462,7 @@ static int op_rename(const char *from, const char *to,
 	FUSE4FS_CHECK_CONTEXT(ff);
 	dbg_printf(ff, "%s: renaming %s to %s\n", __func__, from, to);
 	fs = fuse4fs_start(ff);
-	if (!fs_can_allocate(ff, 5)) {
+	if (!fuse4fs_can_allocate(ff, 5)) {
 		ret = -ENOSPC;
 		goto out;
 	}
@@ -2485,12 +2488,12 @@ static int op_rename(const char *from, const char *to,
 		goto out;
 	}
 
-	ret = check_inum_access(ff, from_ino, W_OK);
+	ret = fuse4fs_inum_access(ff, from_ino, W_OK);
 	if (ret)
 		goto out;
 
 	if (to_ino) {
-		ret = check_inum_access(ff, to_ino, W_OK);
+		ret = fuse4fs_inum_access(ff, to_ino, W_OK);
 		if (ret)
 			goto out;
 	}
@@ -2528,7 +2531,7 @@ static int op_rename(const char *from, const char *to,
 		goto out2;
 	}
 
-	ret = check_inum_access(ff, from_dir_ino, W_OK);
+	ret = fuse4fs_inum_access(ff, from_dir_ino, W_OK);
 	if (ret)
 		goto out2;
 
@@ -2553,7 +2556,7 @@ static int op_rename(const char *from, const char *to,
 		goto out2;
 	}
 
-	ret = check_inum_access(ff, to_dir_ino, W_OK);
+	ret = fuse4fs_inum_access(ff, to_dir_ino, W_OK);
 	if (ret)
 		goto out2;
 
@@ -2700,7 +2703,7 @@ static int op_link(const char *src, const char *dest)
 	*node_name = 0;
 
 	fs = fuse4fs_start(ff);
-	if (!fs_can_allocate(ff, 2)) {
+	if (!fuse4fs_can_allocate(ff, 2)) {
 		ret = -ENOSPC;
 		goto out2;
 	}
@@ -2713,7 +2716,7 @@ static int op_link(const char *src, const char *dest)
 		goto out2;
 	}
 
-	ret = check_inum_access(ff, parent, A_OK | W_OK);
+	ret = fuse4fs_inum_access(ff, parent, A_OK | W_OK);
 	if (ret)
 		goto out2;
 
@@ -2729,7 +2732,7 @@ static int op_link(const char *src, const char *dest)
 		goto out2;
 	}
 
-	ret = check_iflags_access(ff, ino, EXT2_INODE(&inode), W_OK);
+	ret = fuse4fs_iflags_access(ff, ino, EXT2_INODE(&inode), W_OK);
 	if (ret)
 		goto out2;
 
@@ -2769,7 +2772,7 @@ static int op_link(const char *src, const char *dest)
 }
 
 /* Obtain group ids of the process that sent us a command(?) */
-static int get_req_groups(struct fuse4fs *ff, gid_t **gids, size_t *nr_gids)
+static int fuse4fs_get_groups(struct fuse4fs *ff, gid_t **gids, size_t *nr_gids)
 {
 	ext2_filsys fs = ff->fs;
 	errcode_t err;
@@ -2814,8 +2817,8 @@ static int get_req_groups(struct fuse4fs *ff, gid_t **gids, size_t *nr_gids)
  * that initiated the fuse request?  Returns 1 for yes, 0 for no, or a negative
  * errno.
  */
-static int in_file_group(struct fuse_context *ctxt,
-			 const struct ext2_inode_large *inode)
+static int fuse4fs_in_file_group(struct fuse_context *ctxt,
+				 const struct ext2_inode_large *inode)
 {
 	struct fuse4fs *ff = fuse4fs_get();
 	gid_t *gids = NULL;
@@ -2827,7 +2830,7 @@ static int in_file_group(struct fuse_context *ctxt,
 	if (ctxt->gid == gid)
 		return 1;
 
-	ret = get_req_groups(ff, &gids, &nr_gids);
+	ret = fuse4fs_get_groups(ff, &gids, &nr_gids);
 	if (ret == -ENOENT) {
 		/* magic return code for "could not get caller group info" */
 		return 0;
@@ -2870,11 +2873,11 @@ static int op_chmod(const char *path, mode_t mode, struct fuse_file_info *fi)
 		goto out;
 	}
 
-	ret = check_iflags_access(ff, ino, EXT2_INODE(&inode), W_OK);
+	ret = fuse4fs_iflags_access(ff, ino, EXT2_INODE(&inode), W_OK);
 	if (ret)
 		goto out;
 
-	if (want_check_owner(ff, ctxt) && ctxt->uid != inode_uid(inode)) {
+	if (fuse4fs_want_check_owner(ff, ctxt) && ctxt->uid != inode_uid(inode)) {
 		ret = -EPERM;
 		goto out;
 	}
@@ -2884,8 +2887,8 @@ static int op_chmod(const char *path, mode_t mode, struct fuse_file_info *fi)
 	 * of the user's groups, but FUSE only tells us about the primary
 	 * group.
 	 */
-	if (!is_superuser(ff, ctxt)) {
-		ret = in_file_group(ctxt, &inode);
+	if (!fuse4fs_is_superuser(ff, ctxt)) {
+		ret = fuse4fs_in_file_group(ctxt, &inode);
 		if (ret < 0)
 			goto out;
 
@@ -2939,14 +2942,14 @@ static int op_chown(const char *path, uid_t owner, gid_t group,
 		goto out;
 	}
 
-	ret = check_iflags_access(ff, ino, EXT2_INODE(&inode), W_OK);
+	ret = fuse4fs_iflags_access(ff, ino, EXT2_INODE(&inode), W_OK);
 	if (ret)
 		goto out;
 
 	/* FUSE seems to feed us ~0 to mean "don't change" */
 	if (owner != (uid_t) ~0) {
 		/* Only root gets to change UID. */
-		if (want_check_owner(ff, ctxt) &&
+		if (fuse4fs_want_check_owner(ff, ctxt) &&
 		    !(inode_uid(inode) == ctxt->uid && owner == ctxt->uid)) {
 			ret = -EPERM;
 			goto out;
@@ -2956,7 +2959,7 @@ static int op_chown(const char *path, uid_t owner, gid_t group,
 
 	if (group != (gid_t) ~0) {
 		/* Only root or the owner get to change GID. */
-		if (want_check_owner(ff, ctxt) &&
+		if (fuse4fs_want_check_owner(ff, ctxt) &&
 		    inode_uid(inode) != ctxt->uid) {
 			ret = -EPERM;
 			goto out;
@@ -3066,7 +3069,7 @@ static int op_truncate(const char *path, off_t len, struct fuse_file_info *fi)
 		goto out;
 	dbg_printf(ff, "%s: ino=%d len=%jd\n", __func__, ino, (intmax_t) len);
 
-	ret = check_inum_access(ff, ino, W_OK);
+	ret = fuse4fs_inum_access(ff, ino, W_OK);
 	if (ret)
 		goto out;
 
@@ -3148,7 +3151,7 @@ static int __op_open(struct fuse4fs *ff, const char *path,
 	}
 	dbg_printf(ff, "%s: ino=%d\n", __func__, file->ino);
 
-	ret = check_inum_access(ff, file->ino, check);
+	ret = fuse4fs_inum_access(ff, file->ino, check);
 	if (ret) {
 		/*
 		 * In a regular (Linux) fs driver, the kernel will open
@@ -3160,7 +3163,7 @@ static int __op_open(struct fuse4fs *ff, const char *path,
 		 * also employ undocumented hacks (see above).
 		 */
 		if (check == R_OK) {
-			ret = check_inum_access(ff, file->ino, X_OK);
+			ret = fuse4fs_inum_access(ff, file->ino, X_OK);
 			if (ret)
 				goto out;
 			check = X_OK;
@@ -3271,7 +3274,7 @@ static int op_write(const char *path EXT2FS_ATTR((unused)),
 		goto out;
 	}
 
-	if (!fs_can_allocate(ff, FUSE4FS_B_TO_FSB(ff, len))) {
+	if (!fuse4fs_can_allocate(ff, FUSE4FS_B_TO_FSB(ff, len))) {
 		ret = -ENOSPC;
 		goto out;
 	}
@@ -3471,11 +3474,11 @@ static int op_getxattr(const char *path, const char *key, char *value,
 	}
 	dbg_printf(ff, "%s: ino=%d name=%s\n", __func__, ino, key);
 
-	ret = check_inum_access(ff, ino, R_OK);
+	ret = fuse4fs_inum_access(ff, ino, R_OK);
 	if (ret)
 		goto out;
 
-	ret = __getxattr(ff, ino, key, &ptr, &plen);
+	ret = fuse4fs_getxattr(ff, ino, key, &ptr, &plen);
 	if (ret)
 		goto out;
 
@@ -3541,7 +3544,7 @@ static int op_listxattr(const char *path, char *names, size_t len)
 	}
 	dbg_printf(ff, "%s: ino=%d\n", __func__, ino);
 
-	ret = check_inum_access(ff, ino, R_OK);
+	ret = fuse4fs_inum_access(ff, ino, R_OK);
 	if (ret)
 		goto out;
 
@@ -3622,7 +3625,7 @@ static int op_setxattr(const char *path EXT2FS_ATTR((unused)),
 	}
 	dbg_printf(ff, "%s: ino=%d name=%s\n", __func__, ino, key);
 
-	ret = check_inum_access(ff, ino, W_OK);
+	ret = fuse4fs_inum_access(ff, ino, W_OK);
 	if (ret == -EACCES) {
 		ret = -EPERM;
 		goto out;
@@ -3711,7 +3714,7 @@ static int op_removexattr(const char *path, const char *key)
 		goto out;
 	}
 
-	if (!fs_can_allocate(ff, 1)) {
+	if (!fuse4fs_can_allocate(ff, 1)) {
 		ret = -ENOSPC;
 		goto out;
 	}
@@ -3723,7 +3726,7 @@ static int op_removexattr(const char *path, const char *key)
 	}
 	dbg_printf(ff, "%s: ino=%d name=%s\n", __func__, ino, key);
 
-	ret = check_inum_access(ff, ino, W_OK);
+	ret = fuse4fs_inum_access(ff, ino, W_OK);
 	if (ret)
 		goto out;
 
@@ -3910,7 +3913,7 @@ static int op_access(const char *path, int mask)
 		goto out;
 	}
 
-	ret = check_inum_access(ff, ino, mask);
+	ret = fuse4fs_inum_access(ff, ino, mask);
 	if (ret)
 		goto out;
 
@@ -3950,7 +3953,7 @@ static int op_create(const char *path, mode_t mode, struct fuse_file_info *fp)
 	*node_name = 0;
 
 	fs = fuse4fs_start(ff);
-	if (!fs_can_allocate(ff, 1)) {
+	if (!fuse4fs_can_allocate(ff, 1)) {
 		ret = -ENOSPC;
 		goto out2;
 	}
@@ -3962,7 +3965,7 @@ static int op_create(const char *path, mode_t mode, struct fuse_file_info *fp)
 		goto out2;
 	}
 
-	ret = check_inum_access(ff, parent, A_OK | W_OK);
+	ret = fuse4fs_inum_access(ff, parent, A_OK | W_OK);
 	if (ret)
 		goto out2;
 
@@ -4029,7 +4032,7 @@ static int op_create(const char *path, mode_t mode, struct fuse_file_info *fp)
 
 	ext2fs_inode_alloc_stats2(fs, child, 1, 0);
 
-	ret = propagate_default_acls(ff, parent, child, inode.i_mode);
+	ret = fuse4fs_propagate_default_acls(ff, parent, child, inode.i_mode);
 	if (ret)
 		goto out2;
 
@@ -4077,7 +4080,7 @@ static int op_utimens(const char *path, const struct timespec ctv[2],
 	 */
 	if (ctv[0].tv_nsec == UTIME_NOW && ctv[1].tv_nsec == UTIME_NOW)
 		access |= A_OK;
-	ret = check_inum_access(ff, ino, access);
+	ret = fuse4fs_inum_access(ff, ino, access);
 	if (ret)
 		goto out;
 
@@ -4162,7 +4165,7 @@ static int ioctl_setflags(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
 	if (err)
 		return translate_error(fs, fh->ino, err);
 
-	if (want_check_owner(ff, ctxt) && inode_uid(inode) != ctxt->uid)
+	if (fuse4fs_want_check_owner(ff, ctxt) && inode_uid(inode) != ctxt->uid)
 		return -EPERM;
 
 	ret = set_iflags(&inode, flags);
@@ -4211,7 +4214,7 @@ static int ioctl_setversion(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
 	if (err)
 		return translate_error(fs, fh->ino, err);
 
-	if (want_check_owner(ff, ctxt) && inode_uid(inode) != ctxt->uid)
+	if (fuse4fs_want_check_owner(ff, ctxt) && inode_uid(inode) != ctxt->uid)
 		return -EPERM;
 
 	inode.i_generation = generation;
@@ -4336,7 +4339,7 @@ static int ioctl_fssetxattr(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
 	if (err)
 		return translate_error(fs, fh->ino, err);
 
-	if (want_check_owner(ff, ctxt) && inode_uid(inode) != ctxt->uid)
+	if (fuse4fs_want_check_owner(ff, ctxt) && inode_uid(inode) != ctxt->uid)
 		return -EPERM;
 
 	ret = set_xflags(&inode, fsx->fsx_xflags);
@@ -4465,7 +4468,7 @@ static int ioctl_shutdown(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
 	struct fuse_context *ctxt = fuse_get_context();
 	ext2_filsys fs = ff->fs;
 
-	if (!is_superuser(ff, ctxt))
+	if (!fuse4fs_is_superuser(ff, ctxt))
 		return -EPERM;
 
 	err_printf(ff, "%s.\n", _("shut down requested"));
@@ -4584,7 +4587,7 @@ static int fuse4fs_allocate_range(struct fuse4fs *ff,
 		   (unsigned long long)len,
 		   (unsigned long long)start,
 		   (unsigned long long)end);
-	if (!fs_can_allocate(ff, FUSE4FS_B_TO_FSB(ff, len)))
+	if (!fuse4fs_can_allocate(ff, FUSE4FS_B_TO_FSB(ff, len)))
 		return -ENOSPC;
 
 	err = fuse4fs_read_inode(fs, fh->ino, &inode);
@@ -4627,9 +4630,9 @@ static int fuse4fs_allocate_range(struct fuse4fs *ff,
 	return err;
 }
 
-static errcode_t clean_block_middle(struct fuse4fs *ff, ext2_ino_t ino,
-				    struct ext2_inode_large *inode,
-				    off_t offset, off_t len, char **buf)
+static errcode_t fuse4fs_zero_middle(struct fuse4fs *ff, ext2_ino_t ino,
+				     struct ext2_inode_large *inode,
+				     off_t offset, off_t len, char **buf)
 {
 	ext2_filsys fs = ff->fs;
 	blk64_t blk;
@@ -4663,9 +4666,9 @@ static errcode_t clean_block_middle(struct fuse4fs *ff, ext2_ino_t ino,
 	return io_channel_write_blk64(fs->io, blk, 1, *buf);
 }
 
-static errcode_t clean_block_edge(struct fuse4fs *ff, ext2_ino_t ino,
-				  struct ext2_inode_large *inode, off_t offset,
-				  int clean_before, char **buf)
+static errcode_t fuse4fs_zero_edge(struct fuse4fs *ff, ext2_ino_t ino,
+				   struct ext2_inode_large *inode, off_t offset,
+				   int clean_before, char **buf)
 {
 	ext2_filsys fs = ff->fs;
 	blk64_t blk;
@@ -4756,13 +4759,13 @@ static int fuse4fs_punch_range(struct fuse4fs *ff,
 
 	/* Zero everything before the first block and after the last block */
 	if (FUSE4FS_B_TO_FSBT(ff, offset) == FUSE4FS_B_TO_FSBT(ff, offset + len))
-		err = clean_block_middle(ff, fh->ino, &inode, offset,
+		err = fuse4fs_zero_middle(ff, fh->ino, &inode, offset,
 					 len, &buf);
 	else {
-		err = clean_block_edge(ff, fh->ino, &inode, offset, 0, &buf);
+		err = fuse4fs_zero_edge(ff, fh->ino, &inode, offset, 0, &buf);
 		if (!err)
-			err = clean_block_edge(ff, fh->ino, &inode,
-					       offset + len, 1, &buf);
+			err = fuse4fs_zero_edge(ff, fh->ino, &inode,
+						offset + len, 1, &buf);
 	}
 	if (buf)
 		ext2fs_free_mem(&buf);



* [PATCH 05/21] fuse4fs: convert to low level API
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                     ` (3 preceding siblings ...)
  2025-09-16  0:51   ` [PATCH 04/21] fuse4fs: namespace some helpers Darrick J. Wong
@ 2025-09-16  0:51   ` Darrick J. Wong
  2025-09-16  0:52   ` [PATCH 06/21] libsupport: port the kernel list.h to libsupport Darrick J. Wong
                     ` (15 subsequent siblings)
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:51 UTC (permalink / raw)
  To: tytso
  Cc: amir73il, miklos, neal, amir73il, linux-fsdevel, linux-ext4, John,
	bernd, joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Convert fuse4fs to the low-level FUSE API.  Amir supplied the automated
translation; I ported it, cleaned it up by hand, and did the QA work to
make sure that it still runs correctly.

Co-developed-by: Claude claude-4-sonnet
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c | 2012 ++++++++++++++++++++++++++++-------------------------
 1 file changed, 1072 insertions(+), 940 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index a4eeb86201db0c..8b65dd1b419eaa 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -41,7 +41,7 @@
 # define __SET_FOB_FOR_FUSE
 # define _FILE_OFFSET_BITS 64
 #endif /* _FILE_OFFSET_BITS */
-#include <fuse.h>
+#include <fuse_lowlevel.h>
 #ifdef __SET_FOB_FOR_FUSE
 # undef _FILE_OFFSET_BITS
 #endif /* __SET_FOB_FOR_FUSE */
@@ -116,6 +116,8 @@
 #endif
 #endif /* !defined(ENODATA) */
 
+#define FUSE4FS_ATTR_TIMEOUT	(0.0)
+
 static inline uint64_t round_up(uint64_t b, unsigned int align)
 {
 	unsigned int m;
@@ -254,16 +256,18 @@ struct fuse4fs {
 	/* options set by fuse_opt_parse must be of type int */
 	int timing;
 #endif
+	struct fuse_session *fuse;
 };
 
-#define FUSE4FS_CHECK_HANDLE(ff, fh) \
+#define FUSE4FS_CHECK_HANDLE(req, fh) \
 	do { \
 		if ((fh) == NULL || (fh)->magic != FUSE4FS_FILE_MAGIC) { \
 			fprintf(stderr, \
 				"FUSE4FS: Corrupt in-memory file handle at %s:%d!\n", \
 				__func__, __LINE__); \
 			fflush(stderr); \
-			return -EUCLEAN; \
+			fuse_reply_err(req, EUCLEAN); \
+			return; \
 		} \
 	} while (0)
 
@@ -275,19 +279,52 @@ struct fuse4fs {
 				__func__, __LINE__); \
 			fflush(stderr); \
 			retcode; \
+			return; \
 		} \
 		if ((ff)->opstate == F4OP_SHUTDOWN) { \
 			shutcode; \
+			return; \
 		} \
 	} while (0)
 
-#define FUSE4FS_CHECK_CONTEXT(ff) \
-	__FUSE4FS_CHECK_CONTEXT((ff), return -EUCLEAN, return -EIO)
+#define FUSE4FS_CHECK_CONTEXT(req) \
+	__FUSE4FS_CHECK_CONTEXT(fuse4fs_get(req), \
+				fuse_reply_err((req), EUCLEAN), \
+				fuse_reply_err((req), EIO))
 #define FUSE4FS_CHECK_CONTEXT_RETURN(ff) \
 	__FUSE4FS_CHECK_CONTEXT((ff), return, return)
 #define FUSE4FS_CHECK_CONTEXT_ABORT(ff) \
 	__FUSE4FS_CHECK_CONTEXT((ff), abort(), abort())
 
+static inline void fuse4fs_ino_from_fuse(ext2_ino_t *inop, fuse_ino_t fino)
+{
+	if (fino == FUSE_ROOT_ID)
+		*inop = EXT2_ROOT_INO;
+	else
+		*inop = fino;
+}
+
+static inline void fuse4fs_ino_to_fuse(fuse_ino_t *finop, ext2_ino_t ino)
+{
+	if (ino == EXT2_ROOT_INO)
+		*finop = FUSE_ROOT_ID;
+	else
+		*finop = ino;
+}
+
+#define FUSE4FS_CONVERT_FINO(req, ext2_inop, fuse_ino) \
+	do { \
+		if ((fuse_ino) > UINT32_MAX) { \
+			fprintf(stderr, \
+				"FUSE4FS: Bogus inode number 0x%llx at %s:%d!\n", \
+				(unsigned long long)(fuse_ino), __func__, __LINE__); \
+			fflush(stderr); \
+			fuse_reply_err((req), EIO); \
+			return; \
+		} \
+		fuse4fs_ino_from_fuse(ext2_inop, fuse_ino); \
+	} while (0)
+
 static int __translate_error(ext2_filsys fs, ext2_ino_t ino, errcode_t err,
 			     const char *func, int line);
 #define translate_error(fs, ino, err) __translate_error((fs), (ino), (err), \
@@ -454,11 +491,9 @@ static inline errcode_t fuse4fs_write_inode(ext2_filsys fs, ext2_ino_t ino,
 				       sizeof(*inode));
 }
 
-static inline struct fuse4fs *fuse4fs_get(void)
+static inline struct fuse4fs *fuse4fs_get(fuse_req_t req)
 {
-	struct fuse_context *ctxt = fuse_get_context();
-
-	return ctxt->private_data;
+	return (struct fuse4fs *)fuse_req_userdata(req);
 }
 
 static inline struct fuse4fs_file_handle *
@@ -471,6 +506,7 @@ static inline void
 fuse4fs_set_handle(struct fuse_file_info *fp, struct fuse4fs_file_handle *fh)
 {
 	fp->fh = (uintptr_t)fh;
+	fp->keep_cache = 1;
 }
 
 #ifdef HAVE_CLOCK_MONOTONIC
@@ -731,7 +767,7 @@ static int fuse4fs_is_writeable(const struct fuse4fs *ff)
 }
 
 static inline int fuse4fs_is_superuser(struct fuse4fs *ff,
-				       const struct fuse_context *ctxt)
+				       const struct fuse_ctx *ctxt)
 {
 	if (ff->fakeroot)
 		return 1;
@@ -739,7 +775,7 @@ static inline int fuse4fs_is_superuser(struct fuse4fs *ff,
 }
 
 static inline int fuse4fs_want_check_owner(struct fuse4fs *ff,
-					   const struct fuse_context *ctxt)
+					   const struct fuse_ctx *ctxt)
 {
 	/*
 	 * The kernel is responsible for access control, so we allow anything
@@ -782,9 +818,9 @@ static int fuse4fs_iflags_access(struct fuse4fs *ff, ext2_ino_t ino,
 	return 0;
 }
 
-static int fuse4fs_inum_access(struct fuse4fs *ff, ext2_ino_t ino, int mask)
+static int fuse4fs_inum_access(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
+			       ext2_ino_t ino, int mask)
 {
-	struct fuse_context *ctxt = fuse_get_context();
 	ext2_filsys fs = ff->fs;
 	struct ext2_inode inode;
 	mode_t perms;
@@ -1118,9 +1154,9 @@ static int fuse4fs_mount(struct fuse4fs *ff)
 	return 0;
 }
 
-static void op_destroy(void *p EXT2FS_ATTR((unused)))
+static void op_destroy(void *userdata)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	struct fuse4fs *ff = userdata;
 	ext2_filsys fs;
 	errcode_t err;
 
@@ -1302,24 +1338,13 @@ static inline int fuse_set_feature_flag(struct fuse_conn_info *conn,
 }
 #endif
 
-static void *op_init(struct fuse_conn_info *conn,
-		     struct fuse_config *cfg EXT2FS_ATTR((unused)))
+static void op_init(void *userdata, struct fuse_conn_info *conn)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	struct fuse4fs *ff = userdata;
 	ext2_filsys fs;
 
 	FUSE4FS_CHECK_CONTEXT_ABORT(ff);
 
-	/*
-	 * Configure logging a second time, because libfuse might have
-	 * redirected std{out,err} as part of daemonization.  If this fails,
-	 * give up and move on.
-	 */
-	fuse4fs_setup_logging(ff);
-	if (ff->logfd >= 0)
-		close(ff->logfd);
-	ff->logfd = -1;
-
 	fs = ff->fs;
 	dbg_printf(ff, "%s: dev=%s\n", __func__, fs->device_name);
 #ifdef FUSE_CAP_IOCTL_DIR
@@ -1336,10 +1361,6 @@ static void *op_init(struct fuse_conn_info *conn,
 	fuse_set_feature_flag(conn, FUSE_CAP_NO_EXPORT_SUPPORT);
 #endif
 	conn->time_gran = 1;
-	cfg->use_ino = 1;
-	if (ff->debug)
-		cfg->debug = 1;
-	cfg->nullpath_ok = 1;
 
 	if (ff->kernel) {
 		char uuid[UUID_STR_SIZE];
@@ -1364,132 +1385,151 @@ static void *op_init(struct fuse_conn_info *conn,
 	 */
 	conn->want = conn->want_ext & 0xFFFFFFFF;
 #endif
-	return ff;
 }
 
-static int stat_inode(ext2_filsys fs, ext2_ino_t ino, struct stat *statbuf)
+struct fuse4fs_stat {
+	struct fuse_entry_param	entry;
+};
+
+static int fuse4fs_stat_inode(struct fuse4fs *ff, ext2_ino_t ino,
+			      struct ext2_inode_large *inodep,
+			      struct fuse4fs_stat *fstat)
 {
 	struct ext2_inode_large inode;
+	ext2_filsys fs = ff->fs;
+	struct fuse_entry_param *entry = &fstat->entry;
+	struct stat *statbuf = &entry->attr;
 	dev_t fakedev = 0;
 	errcode_t err;
-	int ret = 0;
 	struct timespec tv;
 
-	err = fuse4fs_read_inode(fs, ino, &inode);
-	if (err)
-		return translate_error(fs, ino, err);
+	memset(fstat, 0, sizeof(*fstat));
+
+	if (!inodep) {
+		err = fuse4fs_read_inode(fs, ino, &inode);
+		if (err)
+			return translate_error(fs, ino, err);
+		inodep = &inode;
+	}
 
 	memcpy(&fakedev, fs->super->s_uuid, sizeof(fakedev));
 	statbuf->st_dev = fakedev;
 	statbuf->st_ino = ino;
-	statbuf->st_mode = inode.i_mode;
-	statbuf->st_nlink = inode.i_links_count;
-	statbuf->st_uid = inode_uid(inode);
-	statbuf->st_gid = inode_gid(inode);
-	statbuf->st_size = EXT2_I_SIZE(&inode);
+	statbuf->st_mode = inodep->i_mode;
+	statbuf->st_nlink = inodep->i_links_count;
+	statbuf->st_uid = inode_uid(*inodep);
+	statbuf->st_gid = inode_gid(*inodep);
+	statbuf->st_size = EXT2_I_SIZE(inodep);
 	statbuf->st_blksize = fs->blocksize;
 	statbuf->st_blocks = ext2fs_get_stat_i_blocks(fs,
-						EXT2_INODE(&inode));
-	EXT4_INODE_GET_XTIME(i_atime, &tv, &inode);
+						EXT2_INODE(inodep));
+	EXT4_INODE_GET_XTIME(i_atime, &tv, inodep);
 #if HAVE_STRUCT_STAT_ST_ATIM
 	statbuf->st_atim = tv;
 #else
 	statbuf->st_atime = tv.tv_sec;
 #endif
-	EXT4_INODE_GET_XTIME(i_mtime, &tv, &inode);
+	EXT4_INODE_GET_XTIME(i_mtime, &tv, inodep);
 #if HAVE_STRUCT_STAT_ST_ATIM
 	statbuf->st_mtim = tv;
 #else
 	statbuf->st_mtime = tv.tv_sec;
 #endif
-	EXT4_INODE_GET_XTIME(i_ctime, &tv, &inode);
+	EXT4_INODE_GET_XTIME(i_ctime, &tv, inodep);
 #if HAVE_STRUCT_STAT_ST_ATIM
 	statbuf->st_ctim = tv;
 #else
 	statbuf->st_ctime = tv.tv_sec;
 #endif
-	if (LINUX_S_ISCHR(inode.i_mode) ||
-	    LINUX_S_ISBLK(inode.i_mode)) {
-		if (inode.i_block[0])
-			statbuf->st_rdev = inode.i_block[0];
+	if (LINUX_S_ISCHR(inodep->i_mode) ||
+	    LINUX_S_ISBLK(inodep->i_mode)) {
+		if (inodep->i_block[0])
+			statbuf->st_rdev = inodep->i_block[0];
 		else
-			statbuf->st_rdev = inode.i_block[1];
+			statbuf->st_rdev = inodep->i_block[1];
 	}
 
-	return ret;
-}
-
-static int __fuse4fs_file_ino(struct fuse4fs *ff, const char *path,
-			      struct fuse_file_info *fp EXT2FS_ATTR((unused)),
-			      ext2_ino_t *inop,
-			      const char *func,
-			      int line)
-{
-	ext2_filsys fs = ff->fs;
-	errcode_t err;
-
-	if (fp) {
-		struct fuse4fs_file_handle *fh = fuse4fs_get_handle(fp);
-
-		if (fh->ino == 0)
-			return -ESTALE;
-
-		*inop = fh->ino;
-		dbg_printf(ff, "%s: get ino=%d\n", func, fh->ino);
-		return 0;
-	}
-
-	dbg_printf(ff, "%s: get path=%s\n", func, path);
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, inop);
-	if (err)
-		return __translate_error(fs, 0, err, func, line);
+	fuse4fs_ino_to_fuse(&entry->ino, ino);
+	entry->generation = inodep->i_generation;
+	entry->attr_timeout = FUSE4FS_ATTR_TIMEOUT;
+	entry->entry_timeout = FUSE4FS_ATTR_TIMEOUT;
 
 	return 0;
 }
 
-# define fuse4fs_file_ino(ff, path, fp, inop) \
-	__fuse4fs_file_ino((ff), (path), (fp), (inop), __func__, __LINE__)
-
-static int op_getattr(const char *path, struct stat *statbuf,
-		      struct fuse_file_info *fi)
+static void op_lookup(fuse_req_t req, fuse_ino_t fino, const char *name)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	struct fuse4fs_stat fstat;
+	struct fuse4fs *ff = fuse4fs_get(req);
 	ext2_filsys fs;
-	ext2_ino_t ino;
+	ext2_ino_t parent, child;
+	errcode_t err;
 	int ret = 0;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &parent, fino);
+	dbg_printf(ff, "%s: parent=%d name='%s'\n", __func__, parent, name);
 	fs = fuse4fs_start(ff);
-	ret = fuse4fs_file_ino(ff, path, fi, &ino);
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, parent, name, &child);
+	if (err || child == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	ret = fuse4fs_stat_inode(ff, child, NULL, &fstat);
 	if (ret)
 		goto out;
-	ret = stat_inode(fs, ino, statbuf);
+
 out:
 	fuse4fs_finish(ff, ret);
-	return ret;
+
+	if (ret)
+		fuse_reply_err(req, -ret);
+	else
+		fuse_reply_entry(req, &fstat.entry);
 }
 
-static int op_readlink(const char *path, char *buf, size_t len)
+static void op_getattr(fuse_req_t req, fuse_ino_t fino,
+		       struct fuse_file_info *fi EXT2FS_ATTR((unused)))
 {
-	struct fuse4fs *ff = fuse4fs_get();
-	ext2_filsys fs;
-	errcode_t err;
+	struct fuse4fs_stat fstat;
+	struct fuse4fs *ff = fuse4fs_get(req);
 	ext2_ino_t ino;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &ino, fino);
+	fuse4fs_start(ff);
+	ret = fuse4fs_stat_inode(ff, ino, NULL, &fstat);
+	fuse4fs_finish(ff, ret);
+
+	if (ret)
+		fuse_reply_err(req, -ret);
+	else
+		fuse_reply_attr(req, &fstat.entry.attr,
+				fstat.entry.attr_timeout);
+}
+
+static void op_readlink(fuse_req_t req, fuse_ino_t fino)
+{
 	struct ext2_inode inode;
+	char buf[PATH_MAX + 1];
+	struct fuse4fs *ff = fuse4fs_get(req);
+	ext2_filsys fs;
+	ext2_file_t file;
+	errcode_t err;
+	ext2_ino_t ino;
+	size_t len = PATH_MAX;
 	unsigned int got;
-	ext2_file_t file;
 	int ret = 0;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	dbg_printf(ff, "%s: path=%s\n", __func__, path);
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &ino, fino);
+	dbg_printf(ff, "%s: ino=%d\n", __func__, ino);
 	fs = fuse4fs_start(ff);
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
-	if (err || ino == 0) {
-		ret = translate_error(fs, 0, err);
-		goto out;
-	}
 
-	err = ext2fs_read_inode(fs, ino, &inode);
+	err = ext2fs_read_inode(fs, ino, &inode);
 	if (err) {
 		ret = translate_error(fs, ino, err);
 		goto out;
@@ -1500,7 +1540,6 @@ static int op_readlink(const char *path, char *buf, size_t len)
 		goto out;
 	}
 
-	len--;
 	if (inode.i_size < len)
 		len = inode.i_size;
 	if (ext2fs_is_fast_symlink(&inode))
@@ -1538,7 +1577,11 @@ static int op_readlink(const char *path, char *buf, size_t len)
 
 out:
 	fuse4fs_finish(ff, ret);
-	return ret;
+
+	if (ret)
+		fuse_reply_err(req, -ret);
+	else
+		fuse_reply_readlink(req, buf);
 }
 
 static int fuse4fs_getxattr(struct fuse4fs *ff, ext2_ino_t ino,
@@ -1644,11 +1687,12 @@ static inline void fuse4fs_set_gid(struct ext2_inode_large *inode, gid_t gid)
 	ext2fs_set_i_gid_high(*inode, gid >> 16);
 }
 
-static int fuse4fs_new_child_gid(struct fuse4fs *ff, ext2_ino_t parent,
-				 gid_t *gid, int *parent_sgid)
+static int fuse4fs_new_child_gid(struct fuse4fs *ff,
+				 const struct fuse_ctx *ctxt,
+				 ext2_ino_t parent, gid_t *gid,
+				 int *parent_sgid)
 {
 	struct ext2_inode_large inode;
-	struct fuse_context *ctxt = fuse_get_context();
 	errcode_t err;
 
 	err = fuse4fs_read_inode(ff->fs, parent, &inode);
@@ -1724,36 +1768,44 @@ static void fuse4fs_set_extra_isize(struct fuse4fs *ff, ext2_ino_t ino,
 	inode->i_extra_isize = extra;
 }
 
-static int op_mknod(const char *path, mode_t mode, dev_t dev)
+static void fuse4fs_reply_entry(fuse_req_t req, ext2_ino_t ino,
+				struct ext2_inode_large *inode, int ret)
 {
-	struct fuse_context *ctxt = fuse_get_context();
-	struct fuse4fs *ff = fuse4fs_get();
+	struct fuse4fs_stat fstat;
+	struct fuse4fs *ff = fuse4fs_get(req);
+
+	if (ret) {
+		fuse_reply_err(req, -ret);
+		return;
+	}
+
+	/* Get stat info for the new entry */
+	ret = fuse4fs_stat_inode(ff, ino, inode, &fstat);
+	if (ret) {
+		fuse_reply_err(req, -ret);
+		return;
+	}
+
+	fuse_reply_entry(req, &fstat.entry);
+}
+
+static void op_mknod(fuse_req_t req, fuse_ino_t fino, const char *name,
+		     mode_t mode, dev_t dev)
+{
+	struct ext2_inode_large inode;
+	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
+	struct fuse4fs *ff = fuse4fs_get(req);
 	ext2_filsys fs;
 	ext2_ino_t parent, child;
-	char *temp_path;
 	errcode_t err;
-	char *node_name, a;
 	int filetype;
-	struct ext2_inode_large inode;
 	gid_t gid;
 	int ret = 0;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	dbg_printf(ff, "%s: path=%s mode=0%o dev=0x%x\n", __func__, path, mode,
-		   (unsigned int)dev);
-	temp_path = strdup(path);
-	if (!temp_path) {
-		ret = -ENOMEM;
-		goto out;
-	}
-	node_name = strrchr(temp_path, '/');
-	if (!node_name) {
-		ret = -ENOMEM;
-		goto out;
-	}
-	node_name++;
-	a = *node_name;
-	*node_name = 0;
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &parent, fino);
+	dbg_printf(ff, "%s: parent=%d name='%s' mode=0%o dev=0x%x\n",
+		   __func__, parent, name, mode, (unsigned int)dev);
 
 	fs = fuse4fs_start(ff);
 	if (!fuse4fs_can_allocate(ff, 2)) {
@@ -1761,33 +1813,14 @@ static int op_mknod(const char *path, mode_t mode, dev_t dev)
 		goto out2;
 	}
 
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
-			   &parent);
-	if (err) {
-		ret = translate_error(fs, 0, err);
-		goto out2;
-	}
-
-	ret = fuse4fs_inum_access(ff, parent, A_OK | W_OK);
+	ret = fuse4fs_inum_access(ff, ctxt, parent, A_OK | W_OK);
 	if (ret)
 		goto out2;
 
-	*node_name = a;
+	/* On a low level server, mknod handles all non-directory types */
+	filetype = ext2_file_type(mode);
 
-	if (LINUX_S_ISCHR(mode))
-		filetype = EXT2_FT_CHRDEV;
-	else if (LINUX_S_ISBLK(mode))
-		filetype = EXT2_FT_BLKDEV;
-	else if (LINUX_S_ISFIFO(mode))
-		filetype = EXT2_FT_FIFO;
-	else if (LINUX_S_ISSOCK(mode))
-		filetype = EXT2_FT_SOCK;
-	else {
-		ret = -EINVAL;
-		goto out2;
-	}
-
-	err = fuse4fs_new_child_gid(ff, parent, &gid, NULL);
+	err = fuse4fs_new_child_gid(ff, ctxt, parent, &gid, NULL);
 	if (err)
 		goto out2;
 
@@ -1797,9 +1830,9 @@ static int op_mknod(const char *path, mode_t mode, dev_t dev)
 		goto out2;
 	}
 
-	dbg_printf(ff, "%s: create ino=%d/name=%s in dir=%d\n", __func__, child,
-		   node_name, parent);
-	err = ext2fs_link(fs, parent, node_name, child,
+	dbg_printf(ff, "%s: create ino=%d name='%s' in dir=%d\n", __func__,
+		   child, name, parent);
+	err = ext2fs_link(fs, parent, name, child,
 			  filetype | EXT2FS_LINK_EXPAND);
 	if (err) {
 		ret = translate_error(fs, parent, err);
@@ -1848,42 +1881,28 @@ static int op_mknod(const char *path, mode_t mode, dev_t dev)
 
 out2:
 	fuse4fs_finish(ff, ret);
-out:
-	free(temp_path);
-	return ret;
+	fuse4fs_reply_entry(req, child, &inode, ret);
 }
 
-static int op_mkdir(const char *path, mode_t mode)
+static void op_mkdir(fuse_req_t req, fuse_ino_t fino, const char *name,
+		     mode_t mode)
 {
-	struct fuse_context *ctxt = fuse_get_context();
-	struct fuse4fs *ff = fuse4fs_get();
+	struct ext2_inode_large inode;
+	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
+	struct fuse4fs *ff = fuse4fs_get(req);
 	ext2_filsys fs;
 	ext2_ino_t parent, child;
-	char *temp_path;
 	errcode_t err;
-	char *node_name, a;
-	struct ext2_inode_large inode;
 	char *block;
 	blk64_t blk;
 	int ret = 0;
 	gid_t gid;
 	int parent_sgid;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	dbg_printf(ff, "%s: path=%s mode=0%o\n", __func__, path, mode);
-	temp_path = strdup(path);
-	if (!temp_path) {
-		ret = -ENOMEM;
-		goto out;
-	}
-	node_name = strrchr(temp_path, '/');
-	if (!node_name) {
-		ret = -ENOMEM;
-		goto out;
-	}
-	node_name++;
-	a = *node_name;
-	*node_name = 0;
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &parent, fino);
+	dbg_printf(ff, "%s: parent=%d name='%s' mode=0%o\n",
+		   __func__, parent, name, mode);
 
 	fs = fuse4fs_start(ff);
 	if (!fuse4fs_can_allocate(ff, 1)) {
@@ -1891,25 +1910,15 @@ static int op_mkdir(const char *path, mode_t mode)
 		goto out2;
 	}
 
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
-			   &parent);
-	if (err) {
-		ret = translate_error(fs, 0, err);
-		goto out2;
-	}
-
-	ret = fuse4fs_inum_access(ff, parent, A_OK | W_OK);
+	ret = fuse4fs_inum_access(ff, ctxt, parent, A_OK | W_OK);
 	if (ret)
 		goto out2;
 
-	err = fuse4fs_new_child_gid(ff, parent, &gid, &parent_sgid);
+	err = fuse4fs_new_child_gid(ff, ctxt, parent, &gid, &parent_sgid);
 	if (err)
 		goto out2;
 
-	*node_name = a;
-
-	err = ext2fs_mkdir2(fs, parent, 0, 0, EXT2FS_LINK_EXPAND,
-			    node_name, NULL);
+	err = ext2fs_mkdir2(fs, parent, 0, 0, EXT2FS_LINK_EXPAND, name, NULL);
 	if (err) {
 		ret = translate_error(fs, parent, err);
 		goto out2;
@@ -1920,14 +1929,13 @@ static int op_mkdir(const char *path, mode_t mode)
 		goto out2;
 
 	/* Still have to update the uid/gid of the dir */
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
-			   &child);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, parent, name, &child);
 	if (err) {
 		ret = translate_error(fs, 0, err);
 		goto out2;
 	}
-	dbg_printf(ff, "%s: created ino=%d/path=%s in dir=%d\n", __func__, child,
-		   node_name, parent);
+	dbg_printf(ff, "%s: created ino=%d name='%s' in dir=%d\n",
+		   __func__, child, name, parent);
 
 	err = fuse4fs_read_inode(fs, child, &inode);
 	if (err) {
@@ -1983,55 +1991,7 @@ static int op_mkdir(const char *path, mode_t mode)
 	ext2fs_free_mem(&block);
 out2:
 	fuse4fs_finish(ff, ret);
-out:
-	free(temp_path);
-	return ret;
-}
-
-static int fuse4fs_unlink(struct fuse4fs *ff, const char *path,
-			  ext2_ino_t *parent)
-{
-	ext2_filsys fs = ff->fs;
-	errcode_t err;
-	ext2_ino_t dir;
-	char *filename = strdup(path);
-	char *base_name;
-	int ret;
-
-	base_name = strrchr(filename, '/');
-	if (base_name) {
-		*base_name++ = '\0';
-		err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, filename,
-				   &dir);
-		if (err) {
-			free(filename);
-			return translate_error(fs, 0, err);
-		}
-	} else {
-		dir = EXT2_ROOT_INO;
-		base_name = filename;
-	}
-
-	ret = fuse4fs_inum_access(ff, dir, W_OK);
-	if (ret) {
-		free(filename);
-		return ret;
-	}
-
-	dbg_printf(ff, "%s: unlinking name=%s from dir=%d\n", __func__,
-		   base_name, dir);
-	err = ext2fs_unlink(fs, dir, base_name, 0, 0);
-	free(filename);
-	if (err)
-		return translate_error(fs, dir, err);
-
-	ret = update_mtime(fs, dir, NULL);
-	if (ret)
-		return ret;
-
-	if (parent)
-		*parent = dir;
-	return 0;
+	fuse4fs_reply_entry(req, child, &inode, ret);
 }
 
 static int fuse4fs_remove_ea_inodes(struct fuse4fs *ff, ext2_ino_t ino,
@@ -2140,49 +2100,78 @@ static int fuse4fs_remove_inode(struct fuse4fs *ff, ext2_ino_t ino)
 	return 0;
 }
 
-static int __op_unlink(struct fuse4fs *ff, const char *path)
+static int fuse4fs_unlink(struct fuse4fs *ff, ext2_ino_t parent,
+			  const char *name, ext2_ino_t child)
 {
 	ext2_filsys fs = ff->fs;
-	ext2_ino_t parent, ino;
 	errcode_t err;
 	int ret = 0;
 
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	err = ext2fs_unlink(fs, parent, name, child, 0);
+	if (err) {
+		ret = translate_error(fs, parent, err);
+		goto out;
+	}
+
+	ret = update_mtime(fs, parent, NULL);
+	if (ret)
+		goto out;
+out:
+	return ret;
+}
+
+static int fuse4fs_rmfile(struct fuse4fs *ff, ext2_ino_t parent,
+			  const char *name, ext2_ino_t child)
+{
+	int ret;
+
+	ret = fuse4fs_unlink(ff, parent, name, child);
+	if (ret)
+		return ret;
+
+	return fuse4fs_remove_inode(ff, child);
+}
+
+static void op_unlink(fuse_req_t req, fuse_ino_t fino, const char *name)
+{
+	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
+	struct fuse4fs *ff = fuse4fs_get(req);
+	ext2_filsys fs;
+	ext2_ino_t parent, child;
+	errcode_t err;
+	int ret;
+
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &parent, fino);
+	fs = fuse4fs_start(ff);
+
+	/* Get the inode number for the file */
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, parent, name, &child);
 	if (err) {
 		ret = translate_error(fs, 0, err);
 		goto out;
 	}
 
-	ret = fuse4fs_inum_access(ff, ino, W_OK);
+	ret = fuse4fs_inum_access(ff, ctxt, child, W_OK);
 	if (ret)
 		goto out;
 
-	ret = fuse4fs_unlink(ff, path, &parent);
+	ret = fuse4fs_inum_access(ff, ctxt, parent, W_OK);
 	if (ret)
 		goto out;
 
-	ret = fuse4fs_remove_inode(ff, ino);
+	dbg_printf(ff, "%s: unlink parent=%d name='%s' child=%d\n",
+		   __func__, parent, name, child);
+	ret = fuse4fs_rmfile(ff, parent, name, child);
 	if (ret)
 		goto out;
 
 	ret = fuse4fs_dirsync_flush(ff, parent, NULL);
 	if (ret)
 		goto out;
-
 out:
-	return ret;
-}
-
-static int op_unlink(const char *path)
-{
-	struct fuse4fs *ff = fuse4fs_get();
-	int ret;
-
-	FUSE4FS_CHECK_CONTEXT(ff);
-	fuse4fs_start(ff);
-	ret = __op_unlink(ff, path);
 	fuse4fs_finish(ff, ret);
-	return ret;
+	fuse_reply_err(req, -ret);
 }
 
 struct rd_struct {
@@ -2213,51 +2202,36 @@ static int rmdir_proc(ext2_ino_t dir EXT2FS_ATTR((unused)),
 	return 0;
 }
 
-static int __op_rmdir(struct fuse4fs *ff, const char *path)
+static int fuse4fs_rmdir(struct fuse4fs *ff, ext2_ino_t parent,
+			 const char *name, ext2_ino_t child)
 {
 	ext2_filsys fs = ff->fs;
-	ext2_ino_t parent, child;
 	errcode_t err;
 	struct ext2_inode_large inode;
-	struct rd_struct rds;
+	struct rd_struct rds = {
+		.parent = 0,
+		.empty = 1,
+	};
 	int ret = 0;
 
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &child);
-	if (err) {
-		ret = translate_error(fs, 0, err);
-		goto out;
-	}
-	dbg_printf(ff, "%s: rmdir path=%s ino=%d\n", __func__, path, child);
-
-	ret = fuse4fs_inum_access(ff, child, W_OK);
-	if (ret)
-		goto out;
-
-	rds.parent = 0;
-	rds.empty = 1;
-
 	err = ext2fs_dir_iterate2(fs, child, 0, 0, rmdir_proc, &rds);
 	if (err) {
 		ret = translate_error(fs, child, err);
 		goto out;
 	}
 
-	/* the kernel checks parent permissions before emptiness */
+	/* Make sure we found a dotdot entry */
 	if (rds.parent == 0) {
 		ret = translate_error(fs, child, EXT2_ET_FILESYSTEM_CORRUPTED);
 		goto out;
 	}
 
-	ret = fuse4fs_inum_access(ff, rds.parent, W_OK);
-	if (ret)
-		goto out;
-
 	if (rds.empty == 0) {
 		ret = -ENOTEMPTY;
 		goto out;
 	}
 
-	ret = fuse4fs_unlink(ff, path, &parent);
+	ret = fuse4fs_unlink(ff, parent, name, child);
 	if (ret)
 		goto out;
 	/* Directories have to be "removed" twice. */
@@ -2288,78 +2262,85 @@ static int __op_rmdir(struct fuse4fs *ff, const char *path)
 		}
 	}
 
-	ret = fuse4fs_dirsync_flush(ff, parent, NULL);
-	if (ret)
-		goto out;
-
 out:
 	return ret;
 }
 
-static int op_rmdir(const char *path)
+static void op_rmdir(fuse_req_t req, fuse_ino_t fino, const char *name)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
+	struct fuse4fs *ff = fuse4fs_get(req);
+	ext2_filsys fs;
+	ext2_ino_t parent, child;
+	errcode_t err;
 	int ret;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	fuse4fs_start(ff);
-	ret = __op_rmdir(ff, path);
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &parent, fino);
+	fs = fuse4fs_start(ff);
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, parent, name, &child);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	ret = fuse4fs_inum_access(ff, ctxt, parent, W_OK);
+	if (ret)
+		goto out;
+
+	ret = fuse4fs_inum_access(ff, ctxt, child, W_OK);
+	if (ret)
+		goto out;
+
+	dbg_printf(ff, "%s: rmdir parent=%d name='%s' child=%d\n",
+		   __func__, parent, name, child);
+	ret = fuse4fs_rmdir(ff, parent, name, child);
+	if (ret)
+		goto out;
+
+	ret = fuse4fs_dirsync_flush(ff, parent, NULL);
+	if (ret)
+		goto out;
+
+out:
 	fuse4fs_finish(ff, ret);
-	return ret;
+	fuse_reply_err(req, -ret);
 }
 
-static int op_symlink(const char *src, const char *dest)
+static void op_symlink(fuse_req_t req, const char *target, fuse_ino_t fino,
+		       const char *name)
 {
-	struct fuse_context *ctxt = fuse_get_context();
-	struct fuse4fs *ff = fuse4fs_get();
-	ext2_filsys fs;
-	ext2_ino_t parent, child;
-	char *temp_path;
-	errcode_t err;
-	char *node_name, a;
 	struct ext2_inode_large inode;
+	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
+	struct fuse4fs *ff = fuse4fs_get(req);
+	ext2_filsys fs;
+	ext2_ino_t parent, child = 0;
+	errcode_t err;
 	gid_t gid;
 	int ret = 0;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	dbg_printf(ff, "%s: symlink %s to %s\n", __func__, src, dest);
-	temp_path = strdup(dest);
-	if (!temp_path) {
-		ret = -ENOMEM;
-		goto out;
-	}
-	node_name = strrchr(temp_path, '/');
-	if (!node_name) {
-		ret = -ENOMEM;
-		goto out;
-	}
-	node_name++;
-	a = *node_name;
-	*node_name = 0;
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &parent, fino);
+	dbg_printf(ff, "%s: symlink dir=%d name='%s' target='%s'\n",
+		   __func__, parent, name, target);
 
 	fs = fuse4fs_start(ff);
-	if (!fs_can_allocate(ff, 1)) {
+	if (!fuse4fs_can_allocate(ff, 1)) {
 		ret = -ENOSPC;
 		goto out2;
 	}
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
-			   &parent);
-	*node_name = a;
-	if (err) {
-		ret = translate_error(fs, 0, err);
-		goto out2;
-	}
 
-	ret = fuse4fs_inum_access(ff, parent, A_OK | W_OK);
+	ret = fuse4fs_inum_access(ff, ctxt, parent, A_OK | W_OK);
 	if (ret)
 		goto out2;
 
-	err = fuse4fs_new_child_gid(ff, parent, &gid, NULL);
+	err = fuse4fs_new_child_gid(ff, ctxt, parent, &gid, NULL);
 	if (err)
 		goto out2;
 
 	/* Create symlink */
-	err = ext2fs_symlink(fs, parent, 0, node_name, src);
+	err = ext2fs_symlink(fs, parent, 0, name, target);
 	if (err == EXT2_ET_DIR_NO_SPACE) {
 		err = ext2fs_expand_dir(fs, parent);
 		if (err) {
@@ -2367,7 +2348,7 @@ static int op_symlink(const char *src, const char *dest)
 			goto out2;
 		}
 
-		err = ext2fs_symlink(fs, parent, 0, node_name, src);
+		err = ext2fs_symlink(fs, parent, 0, name, target);
 	}
 	if (err) {
 		ret = translate_error(fs, parent, err);
@@ -2380,14 +2361,13 @@ static int op_symlink(const char *src, const char *dest)
 		goto out2;
 
 	/* Still have to update the uid/gid of the symlink */
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
-			   &child);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, parent, name, &child);
 	if (err) {
 		ret = translate_error(fs, 0, err);
 		goto out2;
 	}
-	dbg_printf(ff, "%s: symlinking ino=%d/name=%s to dir=%d\n", __func__,
-		   child, node_name, parent);
+	dbg_printf(ff, "%s: symlinking dir=%d name='%s' child=%d\n",
+		   __func__, parent, name, child);
 
 	err = fuse4fs_read_inode(fs, child, &inode);
 	if (err) {
@@ -2413,9 +2393,7 @@ static int op_symlink(const char *src, const char *dest)
 
 out2:
 	fuse4fs_finish(ff, ret);
-out:
-	free(temp_path);
-	return ret;
+	fuse4fs_reply_entry(req, child, &inode, ret);
 }
 
 struct update_dotdot {
@@ -2441,39 +2419,43 @@ static int update_dotdot_helper(ext2_ino_t dir EXT2FS_ATTR((unused)),
 	return 0;
 }
 
-static int op_rename(const char *from, const char *to,
-		     unsigned int flags EXT2FS_ATTR((unused)))
+static void op_rename(fuse_req_t req, fuse_ino_t from_parent, const char *from,
+		      fuse_ino_t to_parent, const char *to, unsigned int flags)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
+	struct fuse4fs *ff = fuse4fs_get(req);
 	ext2_filsys fs;
 	errcode_t err;
 	ext2_ino_t from_ino, to_ino, to_dir_ino, from_dir_ino;
-	char *temp_to = NULL, *temp_from = NULL;
-	char *cp, a;
 	struct ext2_inode inode;
 	struct update_dotdot ud;
 	int flushed = 0;
 	int ret = 0;
 
 	/* renameat2 is not supported */
-	if (flags)
-		return -ENOSYS;
+	if (flags) {
+		fuse_reply_err(req, ENOSYS);
+		return;
+	}
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	dbg_printf(ff, "%s: renaming %s to %s\n", __func__, from, to);
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &from_dir_ino, from_parent);
+	FUSE4FS_CONVERT_FINO(req, &to_dir_ino, to_parent);
+	dbg_printf(ff, "%s: renaming dir=%d name='%s' to dir=%d name='%s'\n",
+		   __func__, from_dir_ino, from, to_dir_ino, to);
 	fs = fuse4fs_start(ff);
 	if (!fuse4fs_can_allocate(ff, 5)) {
 		ret = -ENOSPC;
 		goto out;
 	}
 
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, from, &from_ino);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, from_dir_ino, from, &from_ino);
 	if (err || from_ino == 0) {
 		ret = translate_error(fs, 0, err);
 		goto out;
 	}
 
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, to, &to_ino);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, to_dir_ino, to, &to_ino);
 	if (err && err != EXT2_ET_FILE_NOT_FOUND) {
 		ret = translate_error(fs, 0, err);
 		goto out;
@@ -2482,136 +2464,80 @@ static int op_rename(const char *from, const char *to,
 	if (err == EXT2_ET_FILE_NOT_FOUND)
 		to_ino = 0;
 
+	dbg_printf(ff,
+ "%s: renaming dir=%d name='%s' child=%d to dir=%d name='%s' child=%d\n",
+		   __func__, from_dir_ino, from, from_ino, to_dir_ino, to,
+		   to_ino);
+
 	/* Already the same file? */
 	if (to_ino != 0 && to_ino == from_ino) {
 		ret = 0;
 		goto out;
 	}
 
-	ret = fuse4fs_inum_access(ff, from_ino, W_OK);
+	ret = fuse4fs_inum_access(ff, ctxt, from_ino, W_OK);
 	if (ret)
 		goto out;
 
 	if (to_ino) {
-		ret = fuse4fs_inum_access(ff, to_ino, W_OK);
+		ret = fuse4fs_inum_access(ff, ctxt, to_ino, W_OK);
 		if (ret)
 			goto out;
 	}
 
-	temp_to = strdup(to);
-	if (!temp_to) {
-		ret = -ENOMEM;
-		goto out;
-	}
-
-	temp_from = strdup(from);
-	if (!temp_from) {
-		ret = -ENOMEM;
-		goto out2;
-	}
-
-	/* Find parent dir of the source and check write access */
-	cp = strrchr(temp_from, '/');
-	if (!cp) {
-		ret = -EINVAL;
-		goto out2;
-	}
-
-	a = *(cp + 1);
-	*(cp + 1) = 0;
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_from,
-			   &from_dir_ino);
-	*(cp + 1) = a;
-	if (err) {
-		ret = translate_error(fs, 0, err);
-		goto out2;
-	}
-	if (from_dir_ino == 0) {
-		ret = -ENOENT;
-		goto out2;
-	}
-
-	ret = fuse4fs_inum_access(ff, from_dir_ino, W_OK);
+	ret = fuse4fs_inum_access(ff, ctxt, from_dir_ino, W_OK);
 	if (ret)
-		goto out2;
-
-	/* Find parent dir of the destination and check write access */
-	cp = strrchr(temp_to, '/');
-	if (!cp) {
-		ret = -EINVAL;
-		goto out2;
-	}
-
-	a = *(cp + 1);
-	*(cp + 1) = 0;
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_to,
-			   &to_dir_ino);
-	*(cp + 1) = a;
-	if (err) {
-		ret = translate_error(fs, 0, err);
-		goto out2;
-	}
-	if (to_dir_ino == 0) {
-		ret = -ENOENT;
-		goto out2;
-	}
+		goto out;
 
-	ret = fuse4fs_inum_access(ff, to_dir_ino, W_OK);
+	ret = fuse4fs_inum_access(ff, ctxt, to_dir_ino, W_OK);
 	if (ret)
-		goto out2;
+		goto out;
 
 	/* If the target exists, unlink it first */
 	if (to_ino != 0) {
 		err = ext2fs_read_inode(fs, to_ino, &inode);
 		if (err) {
 			ret = translate_error(fs, to_ino, err);
-			goto out2;
+			goto out;
 		}
 
-		dbg_printf(ff, "%s: unlinking %s ino=%d\n", __func__,
-			   LINUX_S_ISDIR(inode.i_mode) ? "dir" : "file",
-			   to_ino);
+		dbg_printf(ff, "%s: unlink dir=%d name='%s' child=%d\n",
+			   __func__, to_dir_ino, to, to_ino);
 		if (LINUX_S_ISDIR(inode.i_mode))
-			ret = __op_rmdir(ff, to);
+			ret = fuse4fs_rmdir(ff, to_dir_ino, to, to_ino);
 		else
-			ret = __op_unlink(ff, to);
+			ret = fuse4fs_rmfile(ff, to_dir_ino, to, to_ino);
 		if (ret)
-			goto out2;
+			goto out;
 	}
 
 	/* Get ready to do the move */
 	err = ext2fs_read_inode(fs, from_ino, &inode);
 	if (err) {
 		ret = translate_error(fs, from_ino, err);
-		goto out2;
+		goto out;
 	}
 
 	/* Link in the new file */
-	dbg_printf(ff, "%s: linking ino=%d/path=%s to dir=%d\n", __func__,
-		   from_ino, cp + 1, to_dir_ino);
-	err = ext2fs_link(fs, to_dir_ino, cp + 1, from_ino,
+	dbg_printf(ff, "%s: link dir=%d name='%s' child=%d\n",
+		   __func__, to_dir_ino, to, from_ino);
+	err = ext2fs_link(fs, to_dir_ino, to, from_ino,
 			  ext2_file_type(inode.i_mode) | EXT2FS_LINK_EXPAND);
 	if (err) {
 		ret = translate_error(fs, to_dir_ino, err);
-		goto out2;
+		goto out;
 	}
 
 	/* Update '..' pointer if dir */
-	err = ext2fs_read_inode(fs, from_ino, &inode);
-	if (err) {
-		ret = translate_error(fs, from_ino, err);
-		goto out2;
-	}
-
 	if (LINUX_S_ISDIR(inode.i_mode)) {
 		ud.new_dotdot = to_dir_ino;
-		dbg_printf(ff, "%s: updating .. entry for dir=%d\n", __func__,
-			   to_dir_ino);
+		dbg_printf(ff, "%s: updating .. entry for child=%d parent=%d\n",
+			   __func__, from_ino, to_dir_ino);
 		err = ext2fs_dir_iterate2(fs, from_ino, 0, NULL,
 					  update_dotdot_helper, &ud);
 		if (err) {
 			ret = translate_error(fs, from_ino, err);
-			goto out2;
+			goto out;
 		}
 
 		/* Decrease from_dir_ino's links_count */
@@ -2620,87 +2546,76 @@ static int op_rename(const char *from, const char *to,
 		err = ext2fs_read_inode(fs, from_dir_ino, &inode);
 		if (err) {
 			ret = translate_error(fs, from_dir_ino, err);
-			goto out2;
+			goto out;
 		}
 		inode.i_links_count--;
 		err = ext2fs_write_inode(fs, from_dir_ino, &inode);
 		if (err) {
 			ret = translate_error(fs, from_dir_ino, err);
-			goto out2;
+			goto out;
 		}
 
 		/* Increase to_dir_ino's links_count */
 		err = ext2fs_read_inode(fs, to_dir_ino, &inode);
 		if (err) {
 			ret = translate_error(fs, to_dir_ino, err);
-			goto out2;
+			goto out;
 		}
 		inode.i_links_count++;
 		err = ext2fs_write_inode(fs, to_dir_ino, &inode);
 		if (err) {
 			ret = translate_error(fs, to_dir_ino, err);
-			goto out2;
+			goto out;
 		}
 	}
 
 	/* Update timestamps */
 	ret = update_ctime(fs, from_ino, NULL);
 	if (ret)
-		goto out2;
+		goto out;
 
 	ret = update_mtime(fs, to_dir_ino, NULL);
 	if (ret)
-		goto out2;
+		goto out;
 
 	/* Remove the old file */
-	ret = fuse4fs_unlink(ff, from, NULL);
+	dbg_printf(ff, "%s: unlink dir=%d name='%s' child=%d\n",
+		   __func__, from_dir_ino, from, from_ino);
+	ret = fuse4fs_unlink(ff, from_dir_ino, from, from_ino);
 	if (ret)
-		goto out2;
+		goto out;
 
 	ret = fuse4fs_dirsync_flush(ff, from_dir_ino, &flushed);
 	if (ret)
-		goto out2;
+		goto out;
 
 	if (from_dir_ino != to_dir_ino && !flushed) {
 		ret = fuse4fs_dirsync_flush(ff, to_dir_ino, NULL);
 		if (ret)
-			goto out2;
+			goto out;
 	}
 
-out2:
-	free(temp_from);
-	free(temp_to);
 out:
 	fuse4fs_finish(ff, ret);
-	return ret;
+	fuse_reply_err(req, -ret);
 }
 
-static int op_link(const char *src, const char *dest)
+static void op_link(fuse_req_t req, fuse_ino_t child_fino,
+		    fuse_ino_t parent_fino, const char *name)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	struct ext2_inode_large inode;
+	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
+	struct fuse4fs *ff = fuse4fs_get(req);
 	ext2_filsys fs;
-	char *temp_path;
 	errcode_t err;
-	char *node_name, a;
-	ext2_ino_t parent, ino;
-	struct ext2_inode_large inode;
+	ext2_ino_t parent, child;
 	int ret = 0;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	dbg_printf(ff, "%s: src=%s dest=%s\n", __func__, src, dest);
-	temp_path = strdup(dest);
-	if (!temp_path) {
-		ret = -ENOMEM;
-		goto out;
-	}
-	node_name = strrchr(temp_path, '/');
-	if (!node_name) {
-		ret = -ENOMEM;
-		goto out;
-	}
-	node_name++;
-	a = *node_name;
-	*node_name = 0;
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &parent, parent_fino);
+	FUSE4FS_CONVERT_FINO(req, &child, child_fino);
+	dbg_printf(ff, "%s: link dir=%d name='%s' child=%d\n",
+		   __func__, parent, name, child);
 
 	fs = fuse4fs_start(ff);
 	if (!fuse4fs_can_allocate(ff, 2)) {
@@ -2708,48 +2623,32 @@ static int op_link(const char *src, const char *dest)
 		goto out2;
 	}
 
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
-			   &parent);
-	*node_name = a;
-	if (err) {
-		err = -ENOENT;
-		goto out2;
-	}
-
-	ret = fuse4fs_inum_access(ff, parent, A_OK | W_OK);
+	ret = fuse4fs_inum_access(ff, ctxt, parent, A_OK | W_OK);
 	if (ret)
 		goto out2;
 
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, src, &ino);
-	if (err || ino == 0) {
-		ret = translate_error(fs, 0, err);
-		goto out2;
-	}
-
-	err = fuse4fs_read_inode(fs, ino, &inode);
+	err = fuse4fs_read_inode(fs, child, &inode);
 	if (err) {
-		ret = translate_error(fs, ino, err);
+		ret = translate_error(fs, child, err);
 		goto out2;
 	}
 
-	ret = fuse4fs_iflags_access(ff, ino, EXT2_INODE(&inode), W_OK);
+	ret = fuse4fs_iflags_access(ff, child, EXT2_INODE(&inode), W_OK);
 	if (ret)
 		goto out2;
 
 	inode.i_links_count++;
-	ret = update_ctime(fs, ino, &inode);
+	ret = update_ctime(fs, child, &inode);
 	if (ret)
 		goto out2;
 
-	err = fuse4fs_write_inode(fs, ino, &inode);
+	err = fuse4fs_write_inode(fs, child, &inode);
 	if (err) {
-		ret = translate_error(fs, ino, err);
+		ret = translate_error(fs, child, err);
 		goto out2;
 	}
 
-	dbg_printf(ff, "%s: linking ino=%d/name=%s to dir=%d\n", __func__, ino,
-		   node_name, parent);
-	err = ext2fs_link(fs, parent, node_name, ino,
+	err = ext2fs_link(fs, parent, name, child,
 			  ext2_file_type(inode.i_mode) | EXT2FS_LINK_EXPAND);
 	if (err) {
 		ret = translate_error(fs, parent, err);
@@ -2766,13 +2665,12 @@ static int op_link(const char *src, const char *dest)
 
 out2:
 	fuse4fs_finish(ff, ret);
-out:
-	free(temp_path);
-	return ret;
+	fuse4fs_reply_entry(req, child, &inode, ret);
 }
 
 /* Obtain group ids of the process that sent us a command(?) */
-static int fuse4fs_get_groups(struct fuse4fs *ff, gid_t **gids, size_t *nr_gids)
+static int fuse4fs_get_groups(struct fuse4fs *ff, fuse_req_t req, gid_t **gids,
+			      size_t *nr_gids)
 {
 	ext2_filsys fs = ff->fs;
 	errcode_t err;
@@ -2785,7 +2683,7 @@ static int fuse4fs_get_groups(struct fuse4fs *ff, gid_t **gids, size_t *nr_gids)
 		if (err)
 			return translate_error(fs, 0, err);
 
-		ret = fuse_getgroups(nr, array);
+		ret = fuse_req_getgroups(req, nr, array);
 		if (ret < 0) {
 			/*
 			 * If there's an error, we failed to find the group
@@ -2817,10 +2715,10 @@ static int fuse4fs_get_groups(struct fuse4fs *ff, gid_t **gids, size_t *nr_gids)
  * that initiated the fuse request?  Returns 1 for yes, 0 for no, or a negative
  * errno.
  */
-static int fuse4fs_in_file_group(struct fuse_context *ctxt,
+static int fuse4fs_in_file_group(struct fuse4fs *ff, fuse_req_t req,
 				 const struct ext2_inode_large *inode)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
 	gid_t *gids = NULL;
 	size_t i, nr_gids = 0;
 	gid_t gid = inode_gid(*inode);
@@ -2830,7 +2728,7 @@ static int fuse4fs_in_file_group(struct fuse_context *ctxt,
 	if (ctxt->gid == gid)
 		return 1;
 
-	ret = fuse4fs_get_groups(ff, &gids, &nr_gids);
+	ret = fuse4fs_get_groups(ff, req, &gids, &nr_gids);
 	if (ret == -ENOENT) {
 		/* magic return code for "could not get caller group info" */
 		return 0;
@@ -2850,37 +2748,21 @@ static int fuse4fs_in_file_group(struct fuse_context *ctxt,
 	return ret;
 }
 
-static int op_chmod(const char *path, mode_t mode, struct fuse_file_info *fi)
+static int fuse4fs_chmod(struct fuse4fs *ff, fuse_req_t req, ext2_ino_t ino,
+			 mode_t mode, struct ext2_inode_large *inode)
 {
-	struct fuse_context *ctxt = fuse_get_context();
-	struct fuse4fs *ff = fuse4fs_get();
-	ext2_filsys fs;
-	errcode_t err;
-	ext2_ino_t ino;
-	struct ext2_inode_large inode;
+	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
 	int ret = 0;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	fs = fuse4fs_start(ff);
-	ret = fuse4fs_file_ino(ff, path, fi, &ino);
-	if (ret)
-		goto out;
-	dbg_printf(ff, "%s: path=%s mode=0%o ino=%d\n", __func__, path, mode, ino);
-
-	err = fuse4fs_read_inode(fs, ino, &inode);
-	if (err) {
-		ret = translate_error(fs, ino, err);
-		goto out;
-	}
+	dbg_printf(ff, "%s: ino=%d mode=0%o\n", __func__, ino, mode);
 
-	ret = fuse4fs_iflags_access(ff, ino, EXT2_INODE(&inode), W_OK);
+	ret = fuse4fs_iflags_access(ff, ino, EXT2_INODE(inode), W_OK);
 	if (ret)
-		goto out;
+		return ret;
 
-	if (fuse4fs_want_check_owner(ff, ctxt) && ctxt->uid != inode_uid(inode)) {
-		ret = -EPERM;
-		goto out;
-	}
+	if (fuse4fs_want_check_owner(ff, ctxt) &&
+	    ctxt->uid != inode_uid(*inode))
+		return -EPERM;
 
 	/*
 	 * XXX: We should really check that the inode gid is not in /any/
@@ -2888,100 +2770,60 @@ static int op_chmod(const char *path, mode_t mode, struct fuse_file_info *fi)
 	 * group.
 	 */
 	if (!fuse4fs_is_superuser(ff, ctxt)) {
-		ret = fuse4fs_in_file_group(ctxt, &inode);
+		ret = fuse4fs_in_file_group(ff, req, inode);
 		if (ret < 0)
-			goto out;
+			return ret;
 
 		if (!ret)
 			mode &= ~S_ISGID;
 	}
 
-	inode.i_mode &= ~0xFFF;
-	inode.i_mode |= mode & 0xFFF;
+	inode->i_mode &= ~0xFFF;
+	inode->i_mode |= mode & 0xFFF;
 
-	dbg_printf(ff, "%s: path=%s new_mode=0%o ino=%d\n", __func__,
-		   path, inode.i_mode, ino);
+	dbg_printf(ff, "%s: ino=%d new_mode=0%o\n",
+		   __func__, ino, inode->i_mode);
 
-	ret = update_ctime(fs, ino, &inode);
-	if (ret)
-		goto out;
-
-	err = fuse4fs_write_inode(fs, ino, &inode);
-	if (err) {
-		ret = translate_error(fs, ino, err);
-		goto out;
-	}
-
-out:
-	fuse4fs_finish(ff, ret);
-	return ret;
+	return 0;
 }
 
-static int op_chown(const char *path, uid_t owner, gid_t group,
-		    struct fuse_file_info *fi)
+static int fuse4fs_chown(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
+			 ext2_ino_t ino, const int to_set,
+			 const struct stat *attr,
+			 struct ext2_inode_large *inode)
 {
-	struct fuse_context *ctxt = fuse_get_context();
-	struct fuse4fs *ff = fuse4fs_get();
-	ext2_filsys fs;
-	errcode_t err;
-	ext2_ino_t ino;
-	struct ext2_inode_large inode;
+	uid_t owner = (to_set & FUSE_SET_ATTR_UID) ? attr->st_uid : (uid_t)~0;
+	gid_t group = (to_set & FUSE_SET_ATTR_GID) ? attr->st_gid : (gid_t)~0;
 	int ret = 0;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	fs = fuse4fs_start(ff);
-	ret = fuse4fs_file_ino(ff, path, fi, &ino);
-	if (ret)
-		goto out;
-	dbg_printf(ff, "%s: path=%s owner=%d group=%d ino=%d\n", __func__,
-		   path, owner, group, ino);
-
-	err = fuse4fs_read_inode(fs, ino, &inode);
-	if (err) {
-		ret = translate_error(fs, ino, err);
-		goto out;
-	}
+	dbg_printf(ff, "%s: ino=%d owner=%d group=%d\n",
+		   __func__, ino, owner, group);
 
-	ret = fuse4fs_iflags_access(ff, ino, EXT2_INODE(&inode), W_OK);
+	ret = fuse4fs_iflags_access(ff, ino, EXT2_INODE(inode), W_OK);
 	if (ret)
-		goto out;
+		return ret;
 
 	/* FUSE seems to feed us ~0 to mean "don't change" */
 	if (owner != (uid_t) ~0) {
 		/* Only root gets to change UID. */
 		if (fuse4fs_want_check_owner(ff, ctxt) &&
-		    !(inode_uid(inode) == ctxt->uid && owner == ctxt->uid)) {
-			ret = -EPERM;
-			goto out;
-		}
-		fuse4fs_set_uid(&inode, owner);
+		    !(inode_uid(*inode) == ctxt->uid && owner == ctxt->uid))
+			return -EPERM;
+
+		fuse4fs_set_uid(inode, owner);
 	}
 
 	if (group != (gid_t) ~0) {
 		/* Only root or the owner get to change GID. */
 		if (fuse4fs_want_check_owner(ff, ctxt) &&
-		    inode_uid(inode) != ctxt->uid) {
-			ret = -EPERM;
-			goto out;
-		}
+		    inode_uid(*inode) != ctxt->uid)
+			return -EPERM;
 
 		/* XXX: We /should/ check group membership but FUSE */
-		fuse4fs_set_gid(&inode, group);
+		fuse4fs_set_gid(inode, group);
 	}
 
-	ret = update_ctime(fs, ino, &inode);
-	if (ret)
-		goto out;
-
-	err = fuse4fs_write_inode(fs, ino, &inode);
-	if (err) {
-		ret = translate_error(fs, ino, err);
-		goto out;
-	}
-
-out:
-	fuse4fs_finish(ff, ret);
-	return ret;
+	return 0;
 }
 
 static int fuse4fs_punch_posteof(struct fuse4fs *ff, ext2_ino_t ino,
@@ -3056,32 +2898,6 @@ static int fuse4fs_truncate(struct fuse4fs *ff, ext2_ino_t ino, off_t new_size)
 	return 0;
 }
 
-static int op_truncate(const char *path, off_t len, struct fuse_file_info *fi)
-{
-	struct fuse4fs *ff = fuse4fs_get();
-	ext2_ino_t ino;
-	int ret = 0;
-
-	FUSE4FS_CHECK_CONTEXT(ff);
-	fuse4fs_start(ff);
-	ret = fuse4fs_file_ino(ff, path, fi, &ino);
-	if (ret)
-		goto out;
-	dbg_printf(ff, "%s: ino=%d len=%jd\n", __func__, ino, (intmax_t) len);
-
-	ret = fuse4fs_inum_access(ff, ino, W_OK);
-	if (ret)
-		goto out;
-
-	ret = fuse4fs_truncate(ff, ino, len);
-	if (ret)
-		goto out;
-
-out:
-	fuse4fs_finish(ff, ret);
-	return ret;
-}
-
 #ifdef __linux__
 static void detect_linux_executable_open(int kernel_flags, int *access_check,
 				  int *e2fs_open_flags)
@@ -3103,19 +2919,20 @@ static void detect_linux_executable_open(int kernel_flags, int *access_check,
 }
 #endif /* __linux__ */
 
-static int __op_open(struct fuse4fs *ff, const char *path,
-		     struct fuse_file_info *fp)
+static int fuse4fs_open_file(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
+			     ext2_ino_t ino, struct fuse_file_info *fp)
 {
 	ext2_filsys fs = ff->fs;
 	errcode_t err;
 	struct fuse4fs_file_handle *file;
 	int check = 0, ret = 0;
 
-	dbg_printf(ff, "%s: path=%s oflags=0o%o\n", __func__, path, fp->flags);
+	dbg_printf(ff, "%s: ino=%d oflags=0o%o\n", __func__, ino, fp->flags);
 	err = ext2fs_get_mem(sizeof(*file), &file);
 	if (err)
 		return translate_error(fs, 0, err);
 	file->magic = FUSE4FS_FILE_MAGIC;
+	file->ino = ino;
 
 	file->open_flags = 0;
 	switch (fp->flags & O_ACCMODE) {
@@ -3144,14 +2961,7 @@ static int __op_open(struct fuse4fs *ff, const char *path,
 	if (fp->flags & O_CREAT)
 		file->open_flags |= EXT2_FILE_CREATE;
 
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &file->ino);
-	if (err || file->ino == 0) {
-		ret = translate_error(fs, 0, err);
-		goto out;
-	}
-	dbg_printf(ff, "%s: ino=%d\n", __func__, file->ino);
-
-	ret = fuse4fs_inum_access(ff, file->ino, check);
+	ret = fuse4fs_inum_access(ff, ctxt, file->ino, check);
 	if (ret) {
 		/*
 		 * In a regular (Linux) fs driver, the kernel will open
@@ -3163,7 +2973,7 @@ static int __op_open(struct fuse4fs *ff, const char *path,
 		 * also employ undocumented hacks (see above).
 		 */
 		if (check == R_OK) {
-			ret = fuse4fs_inum_access(ff, file->ino, X_OK);
+			ret = fuse4fs_inum_access(ff, ctxt, file->ino, X_OK);
 			if (ret)
 				goto out;
 			check = X_OK;
@@ -3186,34 +2996,48 @@ static int __op_open(struct fuse4fs *ff, const char *path,
 	return ret;
 }
 
-static int op_open(const char *path, struct fuse_file_info *fp)
+static void op_open(fuse_req_t req, fuse_ino_t fino, struct fuse_file_info *fp)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
+	struct fuse4fs *ff = fuse4fs_get(req);
+	ext2_ino_t ino;
 	int ret;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &ino, fino);
 	fuse4fs_start(ff);
-	ret = __op_open(ff, path, fp);
+	ret = fuse4fs_open_file(ff, ctxt, ino, fp);
 	fuse4fs_finish(ff, ret);
-	return ret;
+
+	if (ret)
+		fuse_reply_err(req, -ret);
+	else
+		fuse_reply_open(req, fp);
 }
 
-static int op_read(const char *path EXT2FS_ATTR((unused)), char *buf,
-		   size_t len, off_t offset,
-		   struct fuse_file_info *fp)
+static void op_read(fuse_req_t req, fuse_ino_t fino EXT2FS_ATTR((unused)),
+		    size_t len, off_t offset, struct fuse_file_info *fp)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	struct fuse4fs *ff = fuse4fs_get(req);
 	struct fuse4fs_file_handle *fh = fuse4fs_get_handle(fp);
+	char *buf;
 	ext2_filsys fs;
 	ext2_file_t efp;
 	errcode_t err;
 	unsigned int got = 0;
 	int ret = 0;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	FUSE4FS_CHECK_HANDLE(ff, fh);
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CHECK_HANDLE(req, fh);
+
+	buf = calloc(len, sizeof(char));
+	if (!buf) {
+		fuse_reply_err(req, errno);
+		return;
+	}
 	dbg_printf(ff, "%s: ino=%d off=0x%llx len=0x%zx\n", __func__, fh->ino,
 		   (unsigned long long)offset, len);
+
 	fs = fuse4fs_start(ff);
 	err = ext2fs_file_open(fs, fh->ino, fh->open_flags, &efp);
 	if (err) {
@@ -3249,14 +3073,18 @@ static int op_read(const char *path EXT2FS_ATTR((unused)), char *buf,
 	}
 out:
 	fuse4fs_finish(ff, ret);
-	return got ? (int) got : ret;
+	if (got)
+		fuse_reply_buf(req, buf, got);
+	else
+		fuse_reply_err(req, -ret);
+	ext2fs_free_mem(&buf);
 }
 
-static int op_write(const char *path EXT2FS_ATTR((unused)),
-		    const char *buf, size_t len, off_t offset,
-		    struct fuse_file_info *fp)
+static void op_write(fuse_req_t req, fuse_ino_t fino EXT2FS_ATTR((unused)),
+		     const char *buf, size_t len, off_t offset,
+		     struct fuse_file_info *fp)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	struct fuse4fs *ff = fuse4fs_get(req);
 	struct fuse4fs_file_handle *fh = fuse4fs_get_handle(fp);
 	ext2_filsys fs;
 	ext2_file_t efp;
@@ -3264,8 +3092,8 @@ static int op_write(const char *path EXT2FS_ATTR((unused)),
 	unsigned int got = 0;
 	int ret = 0;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	FUSE4FS_CHECK_HANDLE(ff, fh);
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CHECK_HANDLE(req, fh);
 	dbg_printf(ff, "%s: ino=%d off=0x%llx len=0x%zx\n", __func__, fh->ino,
 		   (unsigned long long) offset, len);
 	fs = fuse4fs_start(ff);
@@ -3319,20 +3147,23 @@ static int op_write(const char *path EXT2FS_ATTR((unused)),
 
 out:
 	fuse4fs_finish(ff, ret);
-	return got ? (int) got : ret;
+	if (got)
+		fuse_reply_write(req, got);
+	else
+		fuse_reply_err(req, -ret);
 }
 
-static int op_release(const char *path EXT2FS_ATTR((unused)),
-		      struct fuse_file_info *fp)
+static void op_release(fuse_req_t req, fuse_ino_t fino EXT2FS_ATTR((unused)),
+		       struct fuse_file_info *fp)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	struct fuse4fs *ff = fuse4fs_get(req);
 	struct fuse4fs_file_handle *fh = fuse4fs_get_handle(fp);
 	ext2_filsys fs;
 	errcode_t err;
 	int ret = 0;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	FUSE4FS_CHECK_HANDLE(ff, fh);
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CHECK_HANDLE(req, fh);
 	dbg_printf(ff, "%s: ino=%d\n", __func__, fh->ino);
 	fs = fuse4fs_start(ff);
 
@@ -3349,21 +3180,21 @@ static int op_release(const char *path EXT2FS_ATTR((unused)),
 
 	ext2fs_free_mem(&fh);
 
-	return ret;
+	fuse_reply_err(req, -ret);
 }
 
-static int op_fsync(const char *path EXT2FS_ATTR((unused)),
-		    int datasync EXT2FS_ATTR((unused)),
-		    struct fuse_file_info *fp)
+static void op_fsync(fuse_req_t req, fuse_ino_t fino EXT2FS_ATTR((unused)),
+		     int datasync EXT2FS_ATTR((unused)),
+		     struct fuse_file_info *fp)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	struct fuse4fs *ff = fuse4fs_get(req);
 	struct fuse4fs_file_handle *fh = fuse4fs_get_handle(fp);
 	ext2_filsys fs;
 	errcode_t err;
 	int ret = 0;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	FUSE4FS_CHECK_HANDLE(ff, fh);
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CHECK_HANDLE(req, fh);
 	dbg_printf(ff, "%s: ino=%d\n", __func__, fh->ino);
 	fs = fuse4fs_start(ff);
 	/* For now, flush everything, even if it's slow */
@@ -3374,22 +3205,24 @@ static int op_fsync(const char *path EXT2FS_ATTR((unused)),
 	}
 	fuse4fs_finish(ff, ret);
 
-	return ret;
+	fuse_reply_err(req, -ret);
 }
 
-static int op_statfs(const char *path EXT2FS_ATTR((unused)),
-		     struct statvfs *buf)
+static void op_statfs(fuse_req_t req, fuse_ino_t fino)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	struct statvfs buf;
+	struct fuse4fs *ff = fuse4fs_get(req);
 	ext2_filsys fs;
 	uint64_t fsid, *f;
+	ext2_ino_t ino;
 	blk64_t overhead, reserved, free;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	dbg_printf(ff, "%s: path=%s\n", __func__, path);
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &ino, fino);
+	dbg_printf(ff, "%s: ino=%d\n", __func__, ino);
 	fs = fuse4fs_start(ff);
-	buf->f_bsize = fs->blocksize;
-	buf->f_frsize = 0;
+	buf.f_bsize = fs->blocksize;
+	buf.f_frsize = 0;
 
 	if (ff->minixdf)
 		overhead = 0;
@@ -3402,27 +3235,27 @@ static int op_statfs(const char *path EXT2FS_ATTR((unused)),
 		reserved = ext2fs_blocks_count(fs->super) / 10;
 	free = ext2fs_free_blocks_count(fs->super);
 
-	buf->f_blocks = ext2fs_blocks_count(fs->super) - overhead;
-	buf->f_bfree = free;
+	buf.f_blocks = ext2fs_blocks_count(fs->super) - overhead;
+	buf.f_bfree = free;
 	if (free < reserved)
-		buf->f_bavail = 0;
+		buf.f_bavail = 0;
 	else
-		buf->f_bavail = free - reserved;
-	buf->f_files = fs->super->s_inodes_count;
-	buf->f_ffree = fs->super->s_free_inodes_count;
-	buf->f_favail = fs->super->s_free_inodes_count;
+		buf.f_bavail = free - reserved;
+	buf.f_files = fs->super->s_inodes_count;
+	buf.f_ffree = fs->super->s_free_inodes_count;
+	buf.f_favail = fs->super->s_free_inodes_count;
 	f = (uint64_t *)fs->super->s_uuid;
 	fsid = *f;
 	f++;
 	fsid ^= *f;
-	buf->f_fsid = fsid;
-	buf->f_flag = 0;
+	buf.f_fsid = fsid;
+	buf.f_flag = 0;
 	if (ff->opstate != F4OP_WRITABLE)
-		buf->f_flag |= ST_RDONLY;
-	buf->f_namemax = EXT2_NAME_LEN;
+		buf.f_flag |= ST_RDONLY;
+	buf.f_namemax = EXT2_NAME_LEN;
 	fuse4fs_finish(ff, 0);
 
-	return 0;
+	fuse_reply_statfs(req, &buf);
 }
 
 static const char *valid_xattr_prefixes[] = {
@@ -3446,35 +3279,33 @@ static int validate_xattr_name(const char *name)
 	return 0;
 }
 
-static int op_getxattr(const char *path, const char *key, char *value,
-		       size_t len)
+static void op_getxattr(fuse_req_t req, fuse_ino_t fino, const char *key,
+			size_t len)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
+	struct fuse4fs *ff = fuse4fs_get(req);
 	ext2_filsys fs;
-	void *ptr;
+	void *ptr = NULL;
 	size_t plen;
 	ext2_ino_t ino;
-	errcode_t err;
 	int ret = 0;
 
-	if (!validate_xattr_name(key))
-		return -ENODATA;
+	if (!validate_xattr_name(key)) {
+		fuse_reply_err(req, ENODATA);
+		return;
+	}
 
-	FUSE4FS_CHECK_CONTEXT(ff);
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &ino, fino);
 	fs = fuse4fs_start(ff);
 	if (!ext2fs_has_feature_xattr(fs->super)) {
 		ret = -ENOTSUP;
 		goto out;
 	}
 
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
-	if (err || ino == 0) {
-		ret = translate_error(fs, 0, err);
-		goto out;
-	}
-	dbg_printf(ff, "%s: ino=%d name=%s\n", __func__, ino, key);
+	dbg_printf(ff, "%s: ino=%d name='%s'\n", __func__, ino, key);
 
-	ret = fuse4fs_inum_access(ff, ino, R_OK);
+	ret = fuse4fs_inum_access(ff, ctxt, ino, R_OK);
 	if (ret)
 		goto out;
 
@@ -3483,19 +3314,26 @@ static int op_getxattr(const char *path, const char *key, char *value,
 		goto out;
 
 	if (!len) {
+		/* Just tell us the length */
 		ret = plen;
 	} else if (len < plen) {
+		/* Caller's buffer wasn't big enough */
 		ret = -ERANGE;
 	} else {
-		memcpy(value, ptr, plen);
+		/* We have data */
 		ret = plen;
 	}
 
+out:
+	fuse4fs_finish(ff, ret);
+
+	if (ret < 0)
+		fuse_reply_err(req, -ret);
+	else if (!len)
+		fuse_reply_xattr(req, ret);
+	else
+		fuse_reply_buf(req, ptr, ret);
 	ext2fs_free_mem(&ptr);
-out:
-	fuse4fs_finish(ff, ret);
-
-	return ret;
 }
 
 static int count_buffer_space(char *name, char *value EXT2FS_ATTR((unused)),
@@ -3520,31 +3358,30 @@ static int copy_names(char *name, char *value EXT2FS_ATTR((unused)),
 	return 0;
 }
 
-static int op_listxattr(const char *path, char *names, size_t len)
+static void op_listxattr(fuse_req_t req, fuse_ino_t fino, size_t len)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
+	struct fuse4fs *ff = fuse4fs_get(req);
 	ext2_filsys fs;
 	struct ext2_xattr_handle *h;
+	char *names = NULL;
+	char *next_name;
 	unsigned int bufsz;
 	ext2_ino_t ino;
 	errcode_t err;
 	int ret = 0;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &ino, fino);
 	fs = fuse4fs_start(ff);
 	if (!ext2fs_has_feature_xattr(fs->super)) {
 		ret = -ENOTSUP;
 		goto out;
 	}
 
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
-	if (err || ino == 0) {
-		ret = translate_error(fs, ino, err);
-		goto out;
-	}
 	dbg_printf(ff, "%s: ino=%d\n", __func__, ino);
 
-	ret = fuse4fs_inum_access(ff, ino, R_OK);
+	ret = fuse4fs_inum_access(ff, ctxt, ino, R_OK);
 	if (ret)
 		goto out;
 
@@ -3569,21 +3406,28 @@ static int op_listxattr(const char *path, char *names, size_t len)
 	}
 
 	if (len == 0) {
-		ret = bufsz;
+		/* Just tell us the length */
 		goto out2;
 	} else if (len < bufsz) {
+		/* Caller's buffer wasn't big enough */
 		ret = -ERANGE;
 		goto out2;
 	}
 
 	/* Copy names out */
-	memset(names, 0, len);
-	err = ext2fs_xattrs_iterate(h, copy_names, &names);
+	names = calloc(len, sizeof(char));
+	if (!names) {
+		ret = translate_error(fs, ino, errno);
+		goto out2;
+	}
+	next_name = names;
+
+	err = ext2fs_xattrs_iterate(h, copy_names, &next_name);
 	if (err) {
 		ret = translate_error(fs, ino, err);
 		goto out2;
 	}
-	ret = bufsz;
+
 out2:
 	err = ext2fs_xattrs_close(&h);
 	if (err && !ret)
@@ -3591,41 +3435,47 @@ static int op_listxattr(const char *path, char *names, size_t len)
 out:
 	fuse4fs_finish(ff, ret);
 
-	return ret;
+	if (ret < 0)
+		fuse_reply_err(req, -ret);
+	else if (names)
+		fuse_reply_buf(req, names, bufsz);
+	else
+		fuse_reply_xattr(req, bufsz);
+	free(names);
 }
 
-static int op_setxattr(const char *path EXT2FS_ATTR((unused)),
-		       const char *key, const char *value,
-		       size_t len, int flags)
+static void op_setxattr(fuse_req_t req, fuse_ino_t fino, const char *key,
+			const char *value, size_t len, int flags)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
+	struct fuse4fs *ff = fuse4fs_get(req);
 	ext2_filsys fs;
 	struct ext2_xattr_handle *h;
 	ext2_ino_t ino;
 	errcode_t err;
 	int ret = 0;
 
-	if (flags & ~(XATTR_CREATE | XATTR_REPLACE))
-		return -EOPNOTSUPP;
+	if (flags & ~(XATTR_CREATE | XATTR_REPLACE)) {
+		fuse_reply_err(req, EOPNOTSUPP);
+		return;
+	}
 
-	if (!validate_xattr_name(key))
-		return -EINVAL;
+	if (!validate_xattr_name(key)) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
 
-	FUSE4FS_CHECK_CONTEXT(ff);
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &ino, fino);
 	fs = fuse4fs_start(ff);
 	if (!ext2fs_has_feature_xattr(fs->super)) {
 		ret = -ENOTSUP;
 		goto out;
 	}
 
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
-	if (err || ino == 0) {
-		ret = translate_error(fs, 0, err);
-		goto out;
-	}
-	dbg_printf(ff, "%s: ino=%d name=%s\n", __func__, ino, key);
+	dbg_printf(ff, "%s: ino=%d name='%s'\n", __func__, ino, key);
 
-	ret = fuse4fs_inum_access(ff, ino, W_OK);
+	ret = fuse4fs_inum_access(ff, ctxt, ino, W_OK);
 	if (ret == -EACCES) {
 		ret = -EPERM;
 		goto out;
@@ -3682,13 +3532,13 @@ static int op_setxattr(const char *path EXT2FS_ATTR((unused)),
 		ret = translate_error(fs, ino, err);
 out:
 	fuse4fs_finish(ff, ret);
-
-	return ret;
+	fuse_reply_err(req, -ret);
 }
 
-static int op_removexattr(const char *path, const char *key)
+static void op_removexattr(fuse_req_t req, fuse_ino_t fino, const char *key)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
+	struct fuse4fs *ff = fuse4fs_get(req);
 	ext2_filsys fs;
 	struct ext2_xattr_handle *h;
 	void *buf;
@@ -3701,13 +3551,18 @@ static int op_removexattr(const char *path, const char *key)
 	 * Once in a while libfuse gives us a no-name xattr to delete as part
 	 * of clearing ACLs.  Just pretend we cleared them.
 	 */
-	if (key[0] == 0)
-		return 0;
+	if (key[0] == 0) {
+		fuse_reply_err(req, 0);
+		return;
+	}
 
-	if (!validate_xattr_name(key))
-		return -ENODATA;
+	if (!validate_xattr_name(key)) {
+		fuse_reply_err(req, ENODATA);
+		return;
+	}
 
-	FUSE4FS_CHECK_CONTEXT(ff);
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &ino, fino);
 	fs = fuse4fs_start(ff);
 	if (!ext2fs_has_feature_xattr(fs->super)) {
 		ret = -ENOTSUP;
@@ -3719,14 +3574,9 @@ static int op_removexattr(const char *path, const char *key)
 		goto out;
 	}
 
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
-	if (err || ino == 0) {
-		ret = translate_error(fs, 0, err);
-		goto out;
-	}
 	dbg_printf(ff, "%s: ino=%d name=%s\n", __func__, ino, key);
 
-	ret = fuse4fs_inum_access(ff, ino, W_OK);
+	ret = fuse4fs_inum_access(ff, ctxt, ino, W_OK);
 	if (ret)
 		goto out;
 
@@ -3776,24 +3626,26 @@ static int op_removexattr(const char *path, const char *key)
 		ret = translate_error(fs, ino, err);
 out:
 	fuse4fs_finish(ff, ret);
-
-	return ret;
+	fuse_reply_err(req, -ret);
 }
 
 struct readdir_iter {
 	void *buf;
-	ext2_filsys fs;
-	fuse_fill_dir_t func;
+	size_t bufsz;
+	size_t bufused;
 
+	ext2_filsys fs;
 	struct fuse4fs *ff;
-	enum fuse_readdir_flags flags;
+	fuse_req_t req;
+
+	bool readdirplus;
 	unsigned int nr;
 	off_t startpos;
 	off_t dirpos;
 };
 
 static inline mode_t dirent_fmode(ext2_filsys fs,
-				   const struct ext2_dir_entry *dirent)
+				  const struct ext2_dir_entry *dirent)
 {
 	if (!ext2fs_has_feature_filetype(fs->super))
 		return 0;
@@ -3827,10 +3679,15 @@ static int op_readdir_iter(ext2_ino_t dir EXT2FS_ATTR((unused)),
 {
 	struct readdir_iter *i = data;
 	char namebuf[EXT2_NAME_LEN + 1];
-	struct stat stat = {
-		.st_ino = dirent->inode,
-		.st_mode = dirent_fmode(i->fs, dirent),
+	struct fuse4fs_stat fstat = {
+		.entry = {
+			.attr = {
+				.st_ino = dirent->inode,
+				.st_mode = dirent_fmode(i->fs, dirent),
+			},
+		},
 	};
+	size_t entrysize;
 	int ret;
 
 	i->dirpos++;
@@ -3838,48 +3695,67 @@ static int op_readdir_iter(ext2_ino_t dir EXT2FS_ATTR((unused)),
 		return 0;
 
 	dbg_printf(i->ff, "READDIR%s ino=%d %u offset=0x%llx\n",
-			i->flags == FUSE_READDIR_PLUS ? "PLUS" : "",
+			i->readdirplus ? "PLUS" : "",
 			dir,
 			i->nr++,
 			(unsigned long long)i->dirpos);
 
-	if (i->flags == FUSE_READDIR_PLUS) {
-		ret = stat_inode(i->fs, dirent->inode, &stat);
+	if (i->readdirplus) {
+		ret = fuse4fs_stat_inode(i->ff, dirent->inode, NULL, &fstat);
 		if (ret)
 			return DIRENT_ABORT;
 	}
 
 	memcpy(namebuf, dirent->name, dirent->name_len & 0xFF);
 	namebuf[dirent->name_len & 0xFF] = 0;
-	ret = i->func(i->buf, namebuf, &stat, i->dirpos , 0);
-	if (ret)
+
+	if (i->readdirplus) {
+		entrysize = fuse_add_direntry_plus(i->req, i->buf + i->bufused,
+						   i->bufsz - i->bufused,
+						   namebuf, &fstat.entry,
+						   i->dirpos);
+	} else {
+		entrysize = fuse_add_direntry(i->req, i->buf + i->bufused,
+					      i->bufsz - i->bufused, namebuf,
+					      &fstat.entry.attr, i->dirpos);
+	}
+	if (entrysize > i->bufsz - i->bufused) {
+		/* Buffer is full */
 		return DIRENT_ABORT;
+	}
 
+	i->bufused += entrysize;
 	return 0;
 }
 
-static int op_readdir(const char *path EXT2FS_ATTR((unused)), void *buf,
-		      fuse_fill_dir_t fill_func, off_t offset,
-		      struct fuse_file_info *fp, enum fuse_readdir_flags flags)
+static void __op_readdir(fuse_req_t req, fuse_ino_t fino, size_t size,
+			 off_t offset, bool plus, struct fuse_file_info *fp)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	struct fuse4fs *ff = fuse4fs_get(req);
 	struct fuse4fs_file_handle *fh = fuse4fs_get_handle(fp);
 	errcode_t err;
 	struct readdir_iter i = {
 		.ff = ff,
+		.req = req,
 		.dirpos = 0,
 		.startpos = offset,
-		.flags = flags,
+		.readdirplus = plus,
+		.bufsz = size,
 	};
 	int ret = 0;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	FUSE4FS_CHECK_HANDLE(ff, fh);
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CHECK_HANDLE(req, fh);
 	dbg_printf(ff, "%s: ino=%d offset=0x%llx\n", __func__, fh->ino,
 			(unsigned long long)offset);
+
+	err = ext2fs_get_mem(size, &i.buf);
+	if (err) {
+		fuse_reply_err(req, ENOMEM);
+		return;
+	}
+
 	i.fs = fuse4fs_start(ff);
-	i.buf = buf;
-	i.func = fill_func;
 	err = ext2fs_dir_iterate2(i.fs, fh->ino, 0, NULL, op_readdir_iter, &i);
 	if (err) {
 		ret = translate_error(i.fs, fh->ino, err);
@@ -3893,64 +3769,66 @@ static int op_readdir(const char *path EXT2FS_ATTR((unused)), void *buf,
 	}
 out:
 	fuse4fs_finish(ff, ret);
-	return ret;
+	if (ret)
+		fuse_reply_err(req, -ret);
+	else
+		fuse_reply_buf(req, i.buf, i.bufused);
+
+	ext2fs_free_mem(&i.buf);
+}
+
+static void op_readdir(fuse_req_t req, fuse_ino_t fino, size_t size,
+		       off_t offset, struct fuse_file_info *fp)
+{
+	__op_readdir(req, fino, size, offset, false, fp);
+}
+
+static void op_readdirplus(fuse_req_t req, fuse_ino_t fino, size_t size,
+			   off_t offset, struct fuse_file_info *fp)
+{
+	__op_readdir(req, fino, size, offset, true, fp);
 }
 
-static int op_access(const char *path, int mask)
+static void op_access(fuse_req_t req, fuse_ino_t fino, int mask)
 {
-	struct fuse4fs *ff = fuse4fs_get();
-	ext2_filsys fs;
-	errcode_t err;
+	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
+	struct fuse4fs *ff = fuse4fs_get(req);
 	ext2_ino_t ino;
 	int ret = 0;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	dbg_printf(ff, "%s: path=%s mask=0x%x\n", __func__, path, mask);
-	fs = fuse4fs_start(ff);
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
-	if (err || ino == 0) {
-		ret = translate_error(fs, 0, err);
-		goto out;
-	}
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &ino, fino);
+	dbg_printf(ff, "%s: ino=%d mask=0x%x\n",
+		   __func__, ino, mask);
+	fuse4fs_start(ff);
 
-	ret = fuse4fs_inum_access(ff, ino, mask);
+	ret = fuse4fs_inum_access(ff, ctxt, ino, mask);
 	if (ret)
 		goto out;
 
 out:
 	fuse4fs_finish(ff, ret);
-	return ret;
+	fuse_reply_err(req, -ret);
 }
 
-static int op_create(const char *path, mode_t mode, struct fuse_file_info *fp)
+static void op_create(fuse_req_t req, fuse_ino_t fino, const char *name,
+		      mode_t mode, struct fuse_file_info *fp)
 {
-	struct fuse_context *ctxt = fuse_get_context();
-	struct fuse4fs *ff = fuse4fs_get();
+	struct ext2_inode_large inode;
+	struct fuse4fs_stat fstat;
+	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
+	struct fuse4fs *ff = fuse4fs_get(req);
 	ext2_filsys fs;
 	ext2_ino_t parent, child;
-	char *temp_path;
 	errcode_t err;
-	char *node_name, a;
 	int filetype;
-	struct ext2_inode_large inode;
 	gid_t gid;
 	int ret = 0;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	dbg_printf(ff, "%s: path=%s mode=0%o\n", __func__, path, mode);
-	temp_path = strdup(path);
-	if (!temp_path) {
-		ret = -ENOMEM;
-		goto out;
-	}
-	node_name = strrchr(temp_path, '/');
-	if (!node_name) {
-		ret = -ENOMEM;
-		goto out;
-	}
-	node_name++;
-	a = *node_name;
-	*node_name = 0;
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &parent, fino);
+	dbg_printf(ff, "%s: parent=%d name='%s' mode=0%o\n",
+		   __func__, parent, name, mode);
 
 	fs = fuse4fs_start(ff);
 	if (!fuse4fs_can_allocate(ff, 1)) {
@@ -3958,23 +3836,14 @@ static int op_create(const char *path, mode_t mode, struct fuse_file_info *fp)
 		goto out2;
 	}
 
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
-			   &parent);
-	if (err) {
-		ret = translate_error(fs, 0, err);
-		goto out2;
-	}
-
-	ret = fuse4fs_inum_access(ff, parent, A_OK | W_OK);
+	ret = fuse4fs_inum_access(ff, ctxt, parent, A_OK | W_OK);
 	if (ret)
 		goto out2;
 
-	err = fuse4fs_new_child_gid(ff, parent, &gid, NULL);
+	err = fuse4fs_new_child_gid(ff, ctxt, parent, &gid, NULL);
 	if (err)
 		goto out2;
 
-	*node_name = a;
-
 	filetype = ext2_file_type(mode);
 
 	err = ext2fs_new_inode(fs, parent, mode, 0, &child);
@@ -3983,9 +3852,9 @@ static int op_create(const char *path, mode_t mode, struct fuse_file_info *fp)
 		goto out2;
 	}
 
-	dbg_printf(ff, "%s: creating ino=%d/name=%s in dir=%d\n", __func__, child,
-		   node_name, parent);
-	err = ext2fs_link(fs, parent, node_name, child,
+	dbg_printf(ff, "%s: creating dir=%d name='%s' child=%d\n",
+		   __func__, parent, name, child);
+	err = ext2fs_link(fs, parent, name, child,
 			  filetype | EXT2FS_LINK_EXPAND);
 	if (err) {
 		ret = translate_error(fs, parent, err);
@@ -4037,7 +3906,7 @@ static int op_create(const char *path, mode_t mode, struct fuse_file_info *fp)
 		goto out2;
 
 	fp->flags &= ~O_TRUNC;
-	ret = __op_open(ff, path, fp);
+	ret = fuse4fs_open_file(ff, ctxt, child, fp);
 	if (ret)
 		goto out2;
 
@@ -4045,44 +3914,152 @@ static int op_create(const char *path, mode_t mode, struct fuse_file_info *fp)
 	if (ret)
 		goto out2;
 
+	ret = fuse4fs_stat_inode(ff, child, NULL, &fstat);
+	if (ret)
+		goto out2;
+
 out2:
 	fuse4fs_finish(ff, ret);
-out:
-	free(temp_path);
-	return ret;
+
+	if (ret)
+		fuse_reply_err(req, -ret);
+	else
+		fuse_reply_create(req, &fstat.entry, fp);
+}
+
+enum fuse4fs_time_action {
+	TA_NOW,		/* set to current time */
+	TA_OMIT,	/* do not set timestamp */
+	TA_THIS,	/* set to specific timestamp */
+};
+
+static inline const char *
+fuse4fs_time_action_string(enum fuse4fs_time_action act)
+{
+	switch (act) {
+	case TA_NOW:
+		return "now";
+	case TA_OMIT:
+		return "omit";
+	case TA_THIS:
+		return "specific";
+	}
+	return NULL; /* shut up gcc */
 }
 
-static int op_utimens(const char *path, const struct timespec ctv[2],
-		      struct fuse_file_info *fi)
+static int fuse4fs_utimens(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
+			   ext2_ino_t ino, const int to_set,
+			   const struct stat *attr,
+			   struct ext2_inode_large *inode)
 {
-	struct fuse4fs *ff = fuse4fs_get();
-	struct timespec tv[2];
-	ext2_filsys fs;
-	errcode_t err;
-	ext2_ino_t ino;
-	struct ext2_inode_large inode;
+	enum fuse4fs_time_action aact = TA_OMIT;
+	enum fuse4fs_time_action mact = TA_OMIT;
+	struct timespec atime = { };
+	struct timespec mtime = { };
+	struct timespec now = { };
 	int access = W_OK;
 	int ret = 0;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	fs = fuse4fs_start(ff);
-	ret = fuse4fs_file_ino(ff, path, fi, &ino);
-	if (ret)
-		goto out;
-	dbg_printf(ff, "%s: ino=%d atime=%lld.%ld mtime=%lld.%ld\n", __func__,
-			ino,
-			(long long int)ctv[0].tv_sec, ctv[0].tv_nsec,
-			(long long int)ctv[1].tv_sec, ctv[1].tv_nsec);
+	if (to_set & (FUSE_SET_ATTR_ATIME_NOW | FUSE_SET_ATTR_MTIME_NOW))
+		get_now(&now);
+
+	if (to_set & FUSE_SET_ATTR_ATIME_NOW) {
+		atime = now;
+		aact = TA_NOW;
+	} else if (to_set & FUSE_SET_ATTR_ATIME) {
+#if HAVE_STRUCT_STAT_ST_ATIM
+		atime = attr->st_atim;
+#else
+		atime.tv_sec = attr->st_atime;
+#endif
+		aact = TA_THIS;
+	}
+
+	if (to_set & FUSE_SET_ATTR_MTIME_NOW) {
+		mtime = now;
+		mact = TA_NOW;
+	} else if (to_set & FUSE_SET_ATTR_MTIME) {
+#if HAVE_STRUCT_STAT_ST_ATIM
+		mtime = attr->st_mtim;
+#else
+		mtime.tv_sec = attr->st_mtime;
+#endif
+		mact = TA_THIS;
+	}
+
+	dbg_printf(ff, "%s: ino=%d atime=%s:%lld.%ld mtime=%s:%lld.%ld\n",
+		   __func__, ino, fuse4fs_time_action_string(aact),
+		   (long long int)atime.tv_sec, atime.tv_nsec,
+		   fuse4fs_time_action_string(mact),
+		   (long long int)mtime.tv_sec, mtime.tv_nsec);
 
 	/*
 	 * ext4 allows timestamp updates of append-only files but only if we're
 	 * setting to current time
 	 */
-	if (ctv[0].tv_nsec == UTIME_NOW && ctv[1].tv_nsec == UTIME_NOW)
+	if (aact == TA_NOW && mact == TA_NOW)
 		access |= A_OK;
-	ret = fuse4fs_inum_access(ff, ino, access);
+	ret = fuse4fs_inum_access(ff, ctxt, ino, access);
 	if (ret)
+		return ret;
+
+	if (aact != TA_OMIT)
+		EXT4_INODE_SET_XTIME(i_atime, &atime, inode);
+	if (mact != TA_OMIT)
+		EXT4_INODE_SET_XTIME(i_mtime, &mtime, inode);
+
+	return 0;
+}
+
+static int fuse4fs_setsize(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
+			   ext2_ino_t ino, off_t new_size,
+			   struct ext2_inode_large *inode)
+{
+	errcode_t err;
+	int ret;
+
+	/* Write inode because truncate makes its own copy */
+	err = fuse4fs_write_inode(ff->fs, ino, inode);
+	if (err)
+		return translate_error(ff->fs, ino, err);
+
+	ret = fuse4fs_inum_access(ff, ctxt, ino, W_OK);
+	if (ret)
+		return ret;
+
+	ret = fuse4fs_truncate(ff, ino, new_size);
+	if (ret)
+		return ret;
+
+	/* Re-read inode after truncate */
+	err = fuse4fs_read_inode(ff->fs, ino, inode);
+	if (err)
+		return translate_error(ff->fs, ino, err);
+
+	return 0;
+}
+
+static void op_setattr(fuse_req_t req, fuse_ino_t fino, struct stat *attr,
+		       int to_set, struct fuse_file_info *fi EXT2FS_ATTR((unused)))
+{
+	struct ext2_inode_large inode;
+	struct fuse4fs_stat fstat;
+	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
+	struct fuse4fs *ff = fuse4fs_get(req);
+	ext2_filsys fs;
+	ext2_ino_t ino;
+	errcode_t err;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &ino, fino);
+	dbg_printf(ff, "%s: ino=%d to_set=0x%x\n", __func__, ino, to_set);
+	fs = fuse4fs_start(ff);
+
+	if (!fuse4fs_is_writeable(ff)) {
+		ret = -EROFS;
 		goto out;
+	}
 
 	err = fuse4fs_read_inode(fs, ino, &inode);
 	if (err) {
@@ -4090,20 +4067,35 @@ static int op_utimens(const char *path, const struct timespec ctv[2],
 		goto out;
 	}
 
-	tv[0] = ctv[0];
-	tv[1] = ctv[1];
-#ifdef UTIME_NOW
-	if (tv[0].tv_nsec == UTIME_NOW)
-		get_now(tv);
-	if (tv[1].tv_nsec == UTIME_NOW)
-		get_now(tv + 1);
-#endif /* UTIME_NOW */
-#ifdef UTIME_OMIT
-	if (tv[0].tv_nsec != UTIME_OMIT)
-		EXT4_INODE_SET_XTIME(i_atime, &tv[0], &inode);
-	if (tv[1].tv_nsec != UTIME_OMIT)
-		EXT4_INODE_SET_XTIME(i_mtime, &tv[1], &inode);
-#endif /* UTIME_OMIT */
+	/* Handle mode change using helper */
+	if (to_set & FUSE_SET_ATTR_MODE) {
+		ret = fuse4fs_chmod(ff, req, ino, attr->st_mode, &inode);
+		if (ret)
+			goto out;
+	}
+
+	/* Handle owner/group change using helper */
+	if (to_set & (FUSE_SET_ATTR_UID | FUSE_SET_ATTR_GID)) {
+		ret = fuse4fs_chown(ff, ctxt, ino, to_set, attr, &inode);
+		if (ret)
+			goto out;
+	}
+
+	/* Handle size change using helper */
+	if (to_set & FUSE_SET_ATTR_SIZE) {
+		ret = fuse4fs_setsize(ff, ctxt, ino, attr->st_size, &inode);
+		if (ret)
+			goto out;
+	}
+
+	/* Handle time changes using helper */
+	if (to_set & (FUSE_SET_ATTR_ATIME | FUSE_SET_ATTR_MTIME)) {
+		ret = fuse4fs_utimens(ff, ctxt, ino, to_set, attr, &inode);
+		if (ret)
+			goto out;
+	}
+
+	/* Update ctime for any attribute change */
 	ret = update_ctime(fs, ino, &inode);
 	if (ret)
 		goto out;
@@ -4114,9 +4106,17 @@ static int op_utimens(const char *path, const struct timespec ctv[2],
 		goto out;
 	}
 
+	/* Get updated stat info to return */
+	ret = fuse4fs_stat_inode(ff, ino, &inode, &fstat);
+
 out:
 	fuse4fs_finish(ff, ret);
-	return ret;
+
+	if (ret)
+		fuse_reply_err(req, -ret);
+	else
+		fuse_reply_attr(req, &fstat.entry.attr,
+				fstat.entry.attr_timeout);
 }
 
 #define FUSE4FS_MODIFIABLE_IFLAGS \
@@ -4135,32 +4135,38 @@ static inline int set_iflags(struct ext2_inode_large *inode, __u32 iflags)
 
 #ifdef SUPPORT_I_FLAGS
 static int ioctl_getflags(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
-			  void *data)
+			  __u32 *outdata, size_t *outsize)
 {
 	ext2_filsys fs = ff->fs;
 	errcode_t err;
 	struct ext2_inode_large inode;
 
+	if (*outsize < sizeof(__u32))
+		return -EFAULT;
+
 	dbg_printf(ff, "%s: ino=%d\n", __func__, fh->ino);
 	err = fuse4fs_read_inode(fs, fh->ino, &inode);
 	if (err)
 		return translate_error(fs, fh->ino, err);
 
-	*(__u32 *)data = inode.i_flags & EXT2_FL_USER_VISIBLE;
+	*outdata = inode.i_flags & EXT2_FL_USER_VISIBLE;
+	*outsize = sizeof(__u32);
 	return 0;
 }
 
-static int ioctl_setflags(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
-			  void *data)
+static int ioctl_setflags(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
+			  struct fuse4fs_file_handle *fh, const __u32 *indata,
+			  size_t insize)
 {
 	ext2_filsys fs = ff->fs;
 	errcode_t err;
 	struct ext2_inode_large inode;
 	int ret;
-	__u32 flags = *(__u32 *)data;
-	struct fuse_context *ctxt = fuse_get_context();
 
-	dbg_printf(ff, "%s: ino=%d\n", __func__, fh->ino);
+	if (insize < sizeof(__u32))
+		return -EFAULT;
+
+	dbg_printf(ff, "%s: ino=%d iflags=0x%x\n", __func__, fh->ino, *indata);
 	err = fuse4fs_read_inode(fs, fh->ino, &inode);
 	if (err)
 		return translate_error(fs, fh->ino, err);
@@ -4168,7 +4174,7 @@ static int ioctl_setflags(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
 	if (fuse4fs_want_check_owner(ff, ctxt) && inode_uid(inode) != ctxt->uid)
 		return -EPERM;
 
-	ret = set_iflags(&inode, flags);
+	ret = set_iflags(&inode, *indata);
 	if (ret)
 		return ret;
 
@@ -4184,32 +4190,38 @@ static int ioctl_setflags(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
 }
 
 static int ioctl_getversion(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
-			    void *data)
+			    __u32 *outdata, size_t *outsize)
 {
 	ext2_filsys fs = ff->fs;
 	errcode_t err;
 	struct ext2_inode_large inode;
 
+	if (*outsize < sizeof(__u32))
+		return -EFAULT;
+
 	dbg_printf(ff, "%s: ino=%d\n", __func__, fh->ino);
 	err = fuse4fs_read_inode(fs, fh->ino, &inode);
 	if (err)
 		return translate_error(fs, fh->ino, err);
 
-	*(__u32 *)data = inode.i_generation;
+	*outdata = inode.i_generation;
+	*outsize = sizeof(__u32);
 	return 0;
 }
 
-static int ioctl_setversion(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
-			    void *data)
+static int ioctl_setversion(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
+			    struct fuse4fs_file_handle *fh, const __u32 *indata,
+			    size_t insize)
 {
 	ext2_filsys fs = ff->fs;
 	errcode_t err;
 	struct ext2_inode_large inode;
 	int ret;
-	__u32 generation = *(__u32 *)data;
-	struct fuse_context *ctxt = fuse_get_context();
 
-	dbg_printf(ff, "%s: ino=%d\n", __func__, fh->ino);
+	if (insize < sizeof(__u32))
+		return -EFAULT;
+
+	dbg_printf(ff, "%s: ino=%d generation=%u\n", __func__, fh->ino, *indata);
 	err = fuse4fs_read_inode(fs, fh->ino, &inode);
 	if (err)
 		return translate_error(fs, fh->ino, err);
@@ -4217,7 +4229,7 @@ static int ioctl_setversion(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
 	if (fuse4fs_want_check_owner(ff, ctxt) && inode_uid(inode) != ctxt->uid)
 		return -EPERM;
 
-	inode.i_generation = generation;
+	inode.i_generation = *indata;
 
 	ret = update_ctime(fs, fh->ino, &inode);
 	if (ret)
@@ -4254,14 +4266,16 @@ static __u32 iflags_to_fsxflags(__u32 iflags)
 }
 
 static int ioctl_fsgetxattr(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
-			    void *data)
+			    struct fsxattr *fsx, size_t *outsize)
 {
 	ext2_filsys fs = ff->fs;
 	errcode_t err;
 	struct ext2_inode_large inode;
-	struct fsxattr *fsx = data;
 	unsigned int inode_size;
 
+	if (*outsize < sizeof(struct fsxattr))
+		return -EFAULT;
+
 	dbg_printf(ff, "%s: ino=%d\n", __func__, fh->ino);
 	err = fuse4fs_read_inode(fs, fh->ino, &inode);
 	if (err)
@@ -4272,6 +4286,7 @@ static int ioctl_fsgetxattr(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
 	if (ext2fs_inode_includes(inode_size, i_projid))
 		fsx->fsx_projid = inode_projid(inode);
 	fsx->fsx_xflags = iflags_to_fsxflags(inode.i_flags);
+	*outsize = sizeof(struct fsxattr);
 	return 0;
 }
 
@@ -4323,17 +4338,19 @@ static inline int set_xflags(struct ext2_inode_large *inode, __u32 xflags)
 	return 0;
 }
 
-static int ioctl_fssetxattr(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
-			    void *data)
+static int ioctl_fssetxattr(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
+			    struct fuse4fs_file_handle *fh,
+			    const struct fsxattr *fsx, size_t insize)
 {
 	ext2_filsys fs = ff->fs;
 	errcode_t err;
 	struct ext2_inode_large inode;
 	int ret;
-	struct fuse_context *ctxt = fuse_get_context();
-	struct fsxattr *fsx = data;
 	unsigned int inode_size;
 
+	if (insize < sizeof(struct fsxattr))
+		return -EFAULT;
+
 	dbg_printf(ff, "%s: ino=%d\n", __func__, fh->ino);
 	err = fuse4fs_read_inode(fs, fh->ino, &inode);
 	if (err)
@@ -4364,17 +4381,24 @@ static int ioctl_fssetxattr(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
 
 #ifdef FITRIM
 static int ioctl_fitrim(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
-			void *data)
+			const struct fstrim_range *fr_in, size_t insize,
+			struct fstrim_range *fr, size_t *outsize)
 {
 	ext2_filsys fs = ff->fs;
-	struct fstrim_range *fr = data;
 	blk64_t start, end, max_blocks, b, cleared, minlen;
 	blk64_t max_blks = ext2fs_blocks_count(fs->super);
 	errcode_t err = 0;
 
+	if (insize < sizeof(struct fstrim_range))
+		return -EFAULT;
+
+	if (*outsize < sizeof(struct fstrim_range))
+		return -EFAULT;
+
 	if (!fuse4fs_is_writeable(ff))
 		return -EROFS;
 
+	memcpy(fr, fr_in, sizeof(*fr));
 	start = FUSE4FS_B_TO_FSBT(ff, fr->start);
 	if (fr->len == -1ULL)
 		end = -1ULL;
@@ -4453,6 +4477,7 @@ static int ioctl_fitrim(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
 
 out:
 	fr->len = FUSE4FS_FSB_TO_B(ff, cleared);
+	*outsize = sizeof(struct fstrim_range);
 	dbg_printf(ff, "%s: len=%llu err=%ld\n", __func__, fr->len, err);
 	return err;
 }
@@ -4462,10 +4487,10 @@ static int ioctl_fitrim(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
 # define EXT4_IOC_SHUTDOWN	_IOR('X', 125, __u32)
 #endif
 
-static int ioctl_shutdown(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
-			  void *data)
+static int ioctl_shutdown(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
+			  struct fuse4fs_file_handle *fh, const void *indata,
+			  size_t insize)
 {
-	struct fuse_context *ctxt = fuse_get_context();
 	ext2_filsys fs = ff->fs;
 
 	if (!fuse4fs_is_superuser(ff, ctxt))
@@ -4485,49 +4510,61 @@ static int ioctl_shutdown(struct fuse4fs *ff, struct fuse4fs_file_handle *fh,
 	return 0;
 }
 
-static int op_ioctl(const char *path EXT2FS_ATTR((unused)),
-		    unsigned int cmd,
-		    void *arg EXT2FS_ATTR((unused)),
-		    struct fuse_file_info *fp,
-		    unsigned int flags EXT2FS_ATTR((unused)), void *data)
+static void op_ioctl(fuse_req_t req, fuse_ino_t fino EXT2FS_ATTR((unused)),
+		     unsigned int cmd,
+		     void *arg EXT2FS_ATTR((unused)),
+		     struct fuse_file_info *fp,
+		     unsigned int flags EXT2FS_ATTR((unused)),
+		     const void *indata, size_t insize,
+		     size_t outsize)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
+	struct fuse4fs *ff = fuse4fs_get(req);
 	struct fuse4fs_file_handle *fh = fuse4fs_get_handle(fp);
+	void *outdata = NULL;
 	int ret = 0;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	FUSE4FS_CHECK_HANDLE(ff, fh);
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CHECK_HANDLE(req, fh);
+
+	if (outsize > 0) {
+		outdata = calloc(outsize, sizeof(char));
+		if (!outdata) {
+			fuse_reply_err(req, errno);
+			return;
+		}
+	}
 	fuse4fs_start(ff);
 	switch ((unsigned long) cmd) {
 #ifdef SUPPORT_I_FLAGS
 	case EXT2_IOC_GETFLAGS:
-		ret = ioctl_getflags(ff, fh, data);
+		ret = ioctl_getflags(ff, fh, outdata, &outsize);
 		break;
 	case EXT2_IOC_SETFLAGS:
-		ret = ioctl_setflags(ff, fh, data);
+		ret = ioctl_setflags(ff, ctxt, fh, indata, insize);
 		break;
 	case EXT2_IOC_GETVERSION:
-		ret = ioctl_getversion(ff, fh, data);
+		ret = ioctl_getversion(ff, fh, outdata, &outsize);
 		break;
 	case EXT2_IOC_SETVERSION:
-		ret = ioctl_setversion(ff, fh, data);
+		ret = ioctl_setversion(ff, ctxt, fh, indata, insize);
 		break;
 #endif
 #ifdef FS_IOC_FSGETXATTR
 	case FS_IOC_FSGETXATTR:
-		ret = ioctl_fsgetxattr(ff, fh, data);
+		ret = ioctl_fsgetxattr(ff, fh, outdata, &outsize);
 		break;
 	case FS_IOC_FSSETXATTR:
-		ret = ioctl_fssetxattr(ff, fh, data);
+		ret = ioctl_fssetxattr(ff, ctxt, fh, indata, insize);
 		break;
 #endif
 #ifdef FITRIM
 	case FITRIM:
-		ret = ioctl_fitrim(ff, fh, data);
+		ret = ioctl_fitrim(ff, fh, indata, insize, outdata, &outsize);
 		break;
 #endif
 	case EXT4_IOC_SHUTDOWN:
-		ret = ioctl_shutdown(ff, fh, data);
+		ret = ioctl_shutdown(ff, ctxt, fh, indata, insize);
 		break;
 	default:
 		dbg_printf(ff, "%s: Unknown ioctl %d\n", __func__, cmd);
@@ -4535,28 +4572,29 @@ static int op_ioctl(const char *path EXT2FS_ATTR((unused)),
 	}
 	fuse4fs_finish(ff, ret);
 
-	return ret;
+	if (ret)
+		fuse_reply_err(req, -ret);
+	else
+		fuse_reply_ioctl(req, 0, outdata, outsize);
+	free(outdata);
 }
 
-static int op_bmap(const char *path, size_t blocksize EXT2FS_ATTR((unused)),
-		   uint64_t *idx)
+static void op_bmap(fuse_req_t req, fuse_ino_t fino,
+		    size_t blocksize EXT2FS_ATTR((unused)), uint64_t idx)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	struct fuse4fs *ff = fuse4fs_get(req);
 	ext2_filsys fs;
 	ext2_ino_t ino;
+	blk64_t blkno;
 	errcode_t err;
 	int ret = 0;
 
-	FUSE4FS_CHECK_CONTEXT(ff);
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &ino, fino);
 	fs = fuse4fs_start(ff);
-	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
-	if (err) {
-		ret = translate_error(fs, 0, err);
-		goto out;
-	}
-	dbg_printf(ff, "%s: ino=%d blk=%"PRIu64"\n", __func__, ino, *idx);
+	dbg_printf(ff, "%s: ino=%d blk=%"PRIu64"\n", __func__, ino, idx);
 
-	err = ext2fs_bmap2(fs, ino, NULL, NULL, 0, *idx, 0, (blk64_t *)idx);
+	err = ext2fs_bmap2(fs, ino, NULL, NULL, 0, idx, 0, &blkno);
 	if (err) {
 		ret = translate_error(fs, ino, err);
 		goto out;
@@ -4564,7 +4602,10 @@ static int op_bmap(const char *path, size_t blocksize EXT2FS_ATTR((unused)),
 
 out:
 	fuse4fs_finish(ff, ret);
-	return ret;
+	if (ret)
+		fuse_reply_err(req, -ret);
+	else
+		fuse_reply_bmap(req, blkno);
 }
 
 #ifdef SUPPORT_FALLOCATE
@@ -4807,20 +4848,22 @@ static int fuse4fs_zero_range(struct fuse4fs *ff,
 	return ret;
 }
 
-static int op_fallocate(const char *path EXT2FS_ATTR((unused)), int mode,
-			off_t offset, off_t len,
-			struct fuse_file_info *fp)
+static void op_fallocate(fuse_req_t req, fuse_ino_t fino EXT2FS_ATTR((unused)),
+			 int mode, off_t offset, off_t len,
+			 struct fuse_file_info *fp)
 {
-	struct fuse4fs *ff = fuse4fs_get();
+	struct fuse4fs *ff = fuse4fs_get(req);
 	struct fuse4fs_file_handle *fh = fuse4fs_get_handle(fp);
 	int ret;
 
 	/* Catch unknown flags */
-	if (mode & ~(FL_ZERO_RANGE_FLAG | FL_PUNCH_HOLE_FLAG | FL_KEEP_SIZE_FLAG))
-		return -EOPNOTSUPP;
+	if (mode & ~(FL_ZERO_RANGE_FLAG | FL_PUNCH_HOLE_FLAG | FL_KEEP_SIZE_FLAG)) {
+		fuse_reply_err(req, EOPNOTSUPP);
+		return;
+	}
 
-	FUSE4FS_CHECK_CONTEXT(ff);
-	FUSE4FS_CHECK_HANDLE(ff, fh);
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CHECK_HANDLE(req, fh);
 	fuse4fs_start(ff);
 	if (!fuse4fs_is_writeable(ff)) {
 		ret = -EROFS;
@@ -4840,12 +4883,13 @@ static int op_fallocate(const char *path EXT2FS_ATTR((unused)), int mode,
 		ret = fuse4fs_allocate_range(ff, fh, mode, offset, len);
 out:
 	fuse4fs_finish(ff, ret);
-
-	return ret;
+	fuse_reply_err(req, -ret);
 }
 #endif /* SUPPORT_FALLOCATE */
 
-static struct fuse_operations fs_ops = {
+static struct fuse_lowlevel_ops fs_ops = {
+	.lookup = op_lookup,
+	.setattr = op_setattr,
 	.init = op_init,
 	.destroy = op_destroy,
 	.getattr = op_getattr,
@@ -4857,9 +4901,6 @@ static struct fuse_operations fs_ops = {
 	.symlink = op_symlink,
 	.rename = op_rename,
 	.link = op_link,
-	.chmod = op_chmod,
-	.chown = op_chown,
-	.truncate = op_truncate,
 	.open = op_open,
 	.read = op_read,
 	.write = op_write,
@@ -4872,11 +4913,11 @@ static struct fuse_operations fs_ops = {
 	.removexattr = op_removexattr,
 	.opendir = op_open,
 	.readdir = op_readdir,
+	.readdirplus = op_readdirplus,
 	.releasedir = op_release,
 	.fsyncdir = op_fsync,
 	.access = op_access,
 	.create = op_create,
-	.utimens = op_utimens,
 	.bmap = op_bmap,
 #ifdef SUPERFLUOUS
 	.lock = op_lock,
@@ -5025,8 +5066,8 @@ static int fuse4fs_opt_proc(void *data, const char *arg,
 	"\n",
 			outargs->argv[0]);
 		if (key == FUSE4FS_HELPFULL) {
-			fuse_opt_add_arg(outargs, "-h");
-			fuse_main(outargs->argc, outargs->argv, &fs_ops, NULL);
+			printf("FUSE options:\n");
+			fuse_cmdline_help();
 		} else {
 			fprintf(stderr, "Try --helpfull to get a list of "
 				"all flags, including the FUSE options.\n");
@@ -5036,8 +5077,7 @@ static int fuse4fs_opt_proc(void *data, const char *arg,
 	case FUSE4FS_VERSION:
 		fprintf(stderr, "fuse4fs %s (%s)\n", E2FSPROGS_VERSION,
 			E2FSPROGS_DATE);
-		fuse_opt_add_arg(outargs, "--version");
-		fuse_main(outargs->argc, outargs->argv, &fs_ops, NULL);
+		fprintf(stderr, "FUSE library version %s\n", fuse_pkgversion());
 		exit(0);
 	}
 	return 1;
@@ -5106,6 +5146,107 @@ static void fuse4fs_com_err_proc(const char *whoami, errcode_t code,
 	fflush(stderr);
 }
 
+static int fuse4fs_main(struct fuse_args *args, struct fuse4fs *ff)
+{
+	struct fuse_cmdline_opts opts;
+	struct fuse_session *se;
+	struct fuse_loop_config *loop_config = NULL;
+	int ret = 0;
+
+	if (fuse_parse_cmdline(args, &opts) != 0) {
+		ret = 1;
+		goto out;
+	}
+
+	if (ff->debug)
+		opts.debug = true;
+
+	if (opts.show_help) {
+		fuse_cmdline_help();
+		ret = 0;
+		goto out_free_opts;
+	}
+
+	if (opts.show_version) {
+		printf("FUSE library version %s\n", fuse_pkgversion());
+		ret = 0;
+		goto out_free_opts;
+	}
+
+	if (!opts.mountpoint) {
+		fprintf(stderr, "error: no mountpoint specified\n");
+		ret = 2;
+		goto out_free_opts;
+	}
+
+	se = fuse_session_new(args, &fs_ops, sizeof(fs_ops), ff);
+	if (se == NULL) {
+		ret = 3;
+		goto out_free_opts;
+	}
+	ff->fuse = se;
+
+	if (fuse_session_mount(se, opts.mountpoint) != 0) {
+		ret = 4;
+		goto out_destroy_session;
+	}
+
+	if (fuse_daemonize(opts.foreground) != 0) {
+		ret = 5;
+		goto out_unmount;
+	}
+
+	/*
+	 * Configure logging a second time, because libfuse might have
+	 * redirected std{out,err} as part of daemonization.  If this fails,
+	 * give up and move on.
+	 */
+	fuse4fs_setup_logging(ff);
+	if (ff->logfd >= 0)
+		close(ff->logfd);
+	ff->logfd = -1;
+
+	if (fuse_set_signal_handlers(se) != 0) {
+		ret = 6;
+		goto out_unmount;
+	}
+
+	loop_config = fuse_loop_cfg_create();
+	if (loop_config == NULL) {
+		ret = 7;
+		goto out_remove_signal_handlers;
+	}
+
+	/*
+	 * Since there's a Big Kernel Lock around all the libext2fs code, we
+	 * only need to start four threads -- one to decode a request, another
+	 * to do the filesystem work, a third to transmit the reply, and a
+	 * fourth to handle fuse notifications.
+	 */
+	fuse_loop_cfg_set_clone_fd(loop_config, opts.clone_fd);
+	fuse_loop_cfg_set_idle_threads(loop_config, opts.max_idle_threads);
+	fuse_loop_cfg_set_max_threads(loop_config, 4);
+
+	if (fuse_session_loop_mt(se, loop_config) != 0) {
+		ret = 8;
+		goto out_loopcfg;
+	}
+
+out_loopcfg:
+	fuse_loop_cfg_destroy(loop_config);
+out_remove_signal_handlers:
+	fuse_remove_signal_handlers(se);
+out_unmount:
+	fuse_session_unmount(se);
+out_destroy_session:
+	ff->fuse = NULL;
+	fuse_session_destroy(se);
+out_free_opts:
+	free(opts.mountpoint);
+out:
+	return ret;
+}
+
 int main(int argc, char *argv[])
 {
 	struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
@@ -5247,8 +5388,7 @@ int main(int argc, char *argv[])
 	get_random_bytes(&fctx.next_generation, sizeof(unsigned int));
 
 	/* Set up default fuse parameters */
-	snprintf(extra_args, BUFSIZ, "-okernel_cache,subtype=%s,"
-		 "fsname=%s,attr_timeout=0",
+	snprintf(extra_args, BUFSIZ, "-osubtype=%s,fsname=%s",
 		 get_subtype(argv[0]),
 		 fctx.device);
 	if (fctx.no_default_opts == 0)
@@ -5276,14 +5416,6 @@ int main(int argc, char *argv[])
  "-oallow_other,default_permissions,suid,dev");
 	}
 
-	/*
-	 * Since there's a Big Kernel Lock around all the libext2fs code, we
-	 * only need to start four threads -- one to decode a request, another
-	 * to do the filesystem work, a third to transmit the reply, and a
-	 * fourth to handle fuse notifications.
-	 */
-	fuse_opt_insert_arg(&args, 1, "-omax_threads=4");
-
 	if (fctx.debug) {
 		int	i;
 
@@ -5295,7 +5427,7 @@ int main(int argc, char *argv[])
 	}
 
 	pthread_mutex_init(&fctx.bfl, NULL);
-	ret = fuse_main(args.argc, args.argv, &fs_ops, &fctx);
+	ret = fuse4fs_main(&args, &fctx);
 	pthread_mutex_destroy(&fctx.bfl);
 
 	switch(ret) {


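The conversion above changes the handler calling convention. With the low-level API, a handler such as op_fallocate() or op_bmap() returns void and must answer each request with exactly one reply. fuse_reply_err() takes a positive errno, or 0 for success, while the internal fuse4fs helpers keep returning 0 or a negative errno. The sketch below models that contract without libfuse. demo_req, demo_reply_err(), demo_do_fallocate() and DEMO_KNOWN_MODES are hypothetical stand-ins, not libfuse or fuse4fs API.

```c
#include <errno.h>

/* Hypothetical stand-in for libfuse's opaque fuse_req_t. */
struct demo_req {
	int replies;	/* how many replies were sent for this request */
	int err;	/* positive errno from the last reply, 0 = success */
};

/* Hypothetical mask of the mode bits this handler understands. */
#define DEMO_KNOWN_MODES 0x3

/* Stub with the same contract as fuse_reply_err(): positive errno or 0. */
static void demo_reply_err(struct demo_req *req, int err)
{
	req->replies++;
	req->err = err;
}

/* fuse4fs-style worker: returns 0 or a negative errno. */
static int demo_do_fallocate(int writeable)
{
	if (!writeable)
		return -EROFS;
	return 0;
}

/*
 * Low-level-style handler: returns void and answers the request exactly
 * once on every path, flipping the worker's negative errno to positive.
 */
static void demo_op_fallocate(struct demo_req *req, int mode, int writeable)
{
	int ret;

	/* Reject unknown mode bits before doing any work. */
	if (mode & ~DEMO_KNOWN_MODES) {
		demo_reply_err(req, EOPNOTSUPP);
		return;
	}

	ret = demo_do_fallocate(writeable);
	demo_reply_err(req, -ret);
}
```

Keeping negative errnos internally lets the existing libext2fs glue stay unchanged; only the final reply site needs the sign flip.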

* [PATCH 06/21] libsupport: port the kernel list.h to libsupport
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                     ` (4 preceding siblings ...)
  2025-09-16  0:51   ` [PATCH 05/21] fuse4fs: convert to low level API Darrick J. Wong
@ 2025-09-16  0:52   ` Darrick J. Wong
  2025-09-16  0:52   ` [PATCH 07/21] libsupport: add a cache Darrick J. Wong
                     ` (14 subsequent siblings)
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:52 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

In the next patch, we're going to add the xfsprogs cache manager code to
e2fsprogs.  That code is going into libsupport so that it doesn't become
part of the libext2fs ABI, and it depends on a richer set of list_head
helpers than what is in kernel-list.h, so port the Linux 6.17 list.h to
libsupport and drop the one in libext2fs.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/ext2fs/jfs_compat.h  |    2 
 lib/ext2fs/kernel-list.h |  111 ------
 lib/support/list.h       |  894 ++++++++++++++++++++++++++++++++++++++++++++++
 debugfs/Makefile.in      |   12 -
 e2fsck/Makefile.in       |   56 +--
 fuse4fs/Makefile.in      |    6 
 lib/e2p/Makefile.in      |    4 
 lib/ext2fs/Makefile.in   |   14 -
 misc/Makefile.in         |   12 -
 misc/tune2fs.c           |    4 
 10 files changed, 947 insertions(+), 168 deletions(-)
 delete mode 100644 lib/ext2fs/kernel-list.h
 create mode 100644 lib/support/list.h
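
For readers less familiar with kernel-style intrusive lists, here is a minimal, self-contained sketch of the usage pattern that the richer helpers (container_of, the _safe iterators, list_del_init) support. It re-declares only a small subset of lib/support/list.h so it compiles on its own. struct cache_node, lru_touch() and lru_evict_odd() are hypothetical illustrations and are not the xfsprogs cache manager code.

```c
#include <stddef.h>

/* Subset of lib/support/list.h, re-declared so this sketch stands alone. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each_safe(pos, n, head) \
	for (pos = (head)->next, n = pos->next; pos != (head); \
	     pos = n, n = pos->next)

static inline void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* Link @new in just before @head, i.e. at the tail of the list. */
static inline void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/* Unlink @entry and make it an empty list of its own. */
static inline void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);
}

/* Hypothetical cache object that embeds its own list linkage. */
struct cache_node {
	unsigned long long blkno;
	struct list_head lru;
};

/* Append @node at the tail (most recently used end) of @lru. */
static void lru_touch(struct list_head *lru, struct cache_node *node)
{
	list_add_tail(&node->lru, lru);
}

/*
 * Unlink every node with an odd block number while walking the list.
 * The _safe iterator caches the next pointer, so deleting @pos is fine.
 * Returns the number of nodes left on @lru.
 */
static size_t lru_evict_odd(struct list_head *lru)
{
	struct list_head *pos, *n;
	size_t remaining = 0;

	list_for_each_safe(pos, n, lru) {
		struct cache_node *node =
			container_of(pos, struct cache_node, lru);

		if (node->blkno & 1)
			list_del_init(&node->lru);
		else
			remaining++;
	}
	return remaining;
}
```

Because the linkage lives inside each object, adding or removing a node never allocates, and container_of() recovers the enclosing struct from the embedded list_head.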


diff --git a/lib/ext2fs/jfs_compat.h b/lib/ext2fs/jfs_compat.h
index 30b05822b6fd4d..8e598bcfa73ef7 100644
--- a/lib/ext2fs/jfs_compat.h
+++ b/lib/ext2fs/jfs_compat.h
@@ -2,7 +2,7 @@
 #ifndef _JFS_COMPAT_H
 #define _JFS_COMPAT_H
 
-#include "kernel-list.h"
+#include "support/list.h"
 #include <errno.h>
 #ifdef HAVE_NETINET_IN_H
 #include <netinet/in.h>
diff --git a/lib/ext2fs/kernel-list.h b/lib/ext2fs/kernel-list.h
deleted file mode 100644
index dd7b8e07dd56c4..00000000000000
--- a/lib/ext2fs/kernel-list.h
+++ /dev/null
@@ -1,111 +0,0 @@
-#ifndef _LINUX_LIST_H
-#define _LINUX_LIST_H
-
-#include "compiler.h"
-
-/*
- * Simple doubly linked list implementation.
- *
- * Some of the internal functions ("__xxx") are useful when
- * manipulating whole lists rather than single entries, as
- * sometimes we already know the next/prev entries and we can
- * generate better code by using them directly rather than
- * using the generic single-entry routines.
- */
-
-struct list_head {
-	struct list_head *next, *prev;
-};
-
-#define LIST_HEAD_INIT(name) { &(name), &(name) }
-
-#define INIT_LIST_HEAD(ptr) do { \
-	(ptr)->next = (ptr); (ptr)->prev = (ptr); \
-} while (0)
-
-#if (!defined(__GNUC__) && !defined(__WATCOMC__))
-#define __inline__
-#endif
-
-/*
- * Insert a new entry between two known consecutive entries.
- *
- * This is only for internal list manipulation where we know
- * the prev/next entries already!
- */
-static __inline__ void __list_add(struct list_head * new,
-	struct list_head * prev,
-	struct list_head * next)
-{
-	next->prev = new;
-	new->next = next;
-	new->prev = prev;
-	prev->next = new;
-}
-
-/*
- * Insert a new entry after the specified head..
- */
-static __inline__ void list_add(struct list_head *new, struct list_head *head)
-{
-	__list_add(new, head, head->next);
-}
-
-/*
- * Insert a new entry at the tail
- */
-static __inline__ void list_add_tail(struct list_head *new, struct list_head *head)
-{
-	__list_add(new, head->prev, head);
-}
-
-/*
- * Delete a list entry by making the prev/next entries
- * point to each other.
- *
- * This is only for internal list manipulation where we know
- * the prev/next entries already!
- */
-static __inline__ void __list_del(struct list_head * prev,
-				  struct list_head * next)
-{
-	next->prev = prev;
-	prev->next = next;
-}
-
-static __inline__ void list_del(struct list_head *entry)
-{
-	__list_del(entry->prev, entry->next);
-}
-
-static __inline__ int list_empty(struct list_head *head)
-{
-	return head->next == head;
-}
-
-/*
- * Splice in "list" into "head"
- */
-static __inline__ void list_splice(struct list_head *list, struct list_head *head)
-{
-	struct list_head *first = list->next;
-
-	if (first != list) {
-		struct list_head *last = list->prev;
-		struct list_head *at = head->next;
-
-		first->prev = head;
-		head->next = first;
-
-		last->next = at;
-		at->prev = last;
-	}
-}
-
-#define list_entry(ptr, type, member) \
-	container_of(ptr, type, member)
-
-#define list_for_each(pos, head) \
-        for (pos = (head)->next; pos != (head); pos = pos->next)
-
-#endif
diff --git a/lib/support/list.h b/lib/support/list.h
new file mode 100644
index 00000000000000..df6c99708e4a8e
--- /dev/null
+++ b/lib/support/list.h
@@ -0,0 +1,894 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_LIST_H
+#define _LINUX_LIST_H
+
+#include <stdbool.h>
+#include <stddef.h>
+struct list_head {
+	struct list_head *next, *prev;
+};
+
+#ifdef __GNUC__
+#define container_of(ptr, type, member) ({				\
+	__typeof__( ((type *)0)->member ) *__mptr = (ptr);	\
+	(type *)( (char *)__mptr - offsetof(type,member) );})
+#else
+#define container_of(ptr, type, member)				\
+	((type *)((char *)(ptr) - offsetof(type, member)))
+#endif
+
+/*
+ * Circular doubly linked list implementation.
+ *
+ * Some of the internal functions ("__xxx") are useful when
+ * manipulating whole lists rather than single entries, as
+ * sometimes we already know the next/prev entries and we can
+ * generate better code by using them directly rather than
+ * using the generic single-entry routines.
+ */
+
+#define LIST_HEAD_INIT(name) { &(name), &(name) }
+
+#define LIST_HEAD(name) \
+	struct list_head name = LIST_HEAD_INIT(name)
+
+/**
+ * INIT_LIST_HEAD - Initialize a list_head structure
+ * @list: list_head structure to be initialized.
+ *
+ * Initializes the list_head to point to itself.  If it is a list header,
+ * the result is an empty list.
+ */
+static inline void INIT_LIST_HEAD(struct list_head *list)
+{
+	list->next = list;
+	list->prev = list;
+}
+
+#ifdef CONFIG_LIST_HARDENED
+
+#ifdef CONFIG_DEBUG_LIST
+# define __list_valid_slowpath
+#else
+# define __list_valid_slowpath __cold __preserve_most
+#endif
+
+/*
+ * Performs the full set of list corruption checks before __list_add().
+ * On list corruption reports a warning, and returns false.
+ */
+bool __list_valid_slowpath __list_add_valid_or_report(struct list_head *new,
+						      struct list_head *prev,
+						      struct list_head *next);
+
+/*
+ * Performs list corruption checks before __list_add(). Returns false if a
+ * corruption is detected, true otherwise.
+ *
+ * With CONFIG_LIST_HARDENED only, performs minimal list integrity checking
+ * inline to catch non-faulting corruptions, and only if a corruption is
+ * detected calls the reporting function __list_add_valid_or_report().
+ */
+static __always_inline bool __list_add_valid(struct list_head *new,
+					     struct list_head *prev,
+					     struct list_head *next)
+{
+	bool ret = true;
+
+	if (!IS_ENABLED(CONFIG_DEBUG_LIST)) {
+		/*
+		 * With the hardening version, elide checking if next and prev
+		 * are NULL, since the immediate dereference of them below would
+		 * result in a fault if NULL.
+		 *
+		 * With the reduced set of checks, we can afford to inline the
+		 * checks, which also gives the compiler a chance to elide some
+		 * of them completely if they can be proven at compile-time. If
+		 * one of the pre-conditions does not hold, the slow-path will
+		 * show a report which pre-condition failed.
+		 */
+		if (likely(next->prev == prev && prev->next == next && new != prev && new != next))
+			return true;
+		ret = false;
+	}
+
+	ret &= __list_add_valid_or_report(new, prev, next);
+	return ret;
+}
+
+/*
+ * Performs the full set of list corruption checks before __list_del_entry().
+ * On list corruption reports a warning, and returns false.
+ */
+bool __list_valid_slowpath __list_del_entry_valid_or_report(struct list_head *entry);
+
+/*
+ * Performs list corruption checks before __list_del_entry(). Returns false if a
+ * corruption is detected, true otherwise.
+ *
+ * With CONFIG_LIST_HARDENED only, performs minimal list integrity checking
+ * inline to catch non-faulting corruptions, and only if a corruption is
+ * detected calls the reporting function __list_del_entry_valid_or_report().
+ */
+static __always_inline bool __list_del_entry_valid(struct list_head *entry)
+{
+	bool ret = true;
+
+	if (!IS_ENABLED(CONFIG_DEBUG_LIST)) {
+		struct list_head *prev = entry->prev;
+		struct list_head *next = entry->next;
+
+		/*
+		 * With the hardening version, elide checking if next and prev
+		 * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
+		 * dereference of them below would result in a fault.
+		 */
+		if (likely(prev->next == entry && next->prev == entry))
+			return true;
+		ret = false;
+	}
+
+	ret &= __list_del_entry_valid_or_report(entry);
+	return ret;
+}
+#else
+static inline bool __list_add_valid(struct list_head *new,
+				struct list_head *prev,
+				struct list_head *next)
+{
+	return true;
+}
+static inline bool __list_del_entry_valid(struct list_head *entry)
+{
+	return true;
+}
+#endif
+
+/*
+ * Insert a new entry between two known consecutive entries.
+ *
+ * This is only for internal list manipulation where we know
+ * the prev/next entries already!
+ */
+static inline void __list_add(struct list_head *new,
+			      struct list_head *prev,
+			      struct list_head *next)
+{
+	if (!__list_add_valid(new, prev, next))
+		return;
+
+	next->prev = new;
+	new->next = next;
+	new->prev = prev;
+	prev->next = new;
+}
+
+/**
+ * list_add - add a new entry
+ * @new: new entry to be added
+ * @head: list head to add it after
+ *
+ * Insert a new entry after the specified head.
+ * This is good for implementing stacks.
+ */
+static inline void list_add(struct list_head *new, struct list_head *head)
+{
+	__list_add(new, head, head->next);
+}
+
+
+/**
+ * list_add_tail - add a new entry
+ * @new: new entry to be added
+ * @head: list head to add it before
+ *
+ * Insert a new entry before the specified head.
+ * This is useful for implementing queues.
+ */
+static inline void list_add_tail(struct list_head *new, struct list_head *head)
+{
+	__list_add(new, head->prev, head);
+}
+
+/*
+ * Delete a list entry by making the prev/next entries
+ * point to each other.
+ *
+ * This is only for internal list manipulation where we know
+ * the prev/next entries already!
+ */
+static inline void __list_del(struct list_head * prev, struct list_head * next)
+{
+	next->prev = prev;
+	prev->next = next;
+}
+
+/*
+ * Delete a list entry and clear the 'prev' pointer.
+ *
+ * This is a special-purpose list clearing method used in the networking code
+ * for lists allocated as per-cpu, where we don't want to incur the extra
+ * WRITE_ONCE() overhead of a regular list_del_init(). The code that uses this
+ * needs to check the node 'prev' pointer instead of calling list_empty().
+ */
+static inline void __list_del_clearprev(struct list_head *entry)
+{
+	__list_del(entry->prev, entry->next);
+	entry->prev = NULL;
+}
+
+static inline void __list_del_entry(struct list_head *entry)
+{
+	if (!__list_del_entry_valid(entry))
+		return;
+
+	__list_del(entry->prev, entry->next);
+}
+
+/**
+ * list_del - deletes entry from list.
+ * @entry: the element to delete from the list.
+ * Note: list_empty() on @entry does not return true after this; the
+ * entry's next and prev pointers are left NULL.
+ */
+static inline void list_del(struct list_head *entry)
+{
+	__list_del_entry(entry);
+	entry->next = NULL;
+	entry->prev = NULL;
+}
+
+/**
+ * list_replace - replace old entry by new one
+ * @old : the element to be replaced
+ * @new : the new element to insert
+ *
+ * If @old was empty, it will be overwritten.
+ */
+static inline void list_replace(struct list_head *old,
+				struct list_head *new)
+{
+	new->next = old->next;
+	new->next->prev = new;
+	new->prev = old->prev;
+	new->prev->next = new;
+}
+
+/**
+ * list_replace_init - replace old entry by new one and initialize the old one
+ * @old : the element to be replaced
+ * @new : the new element to insert
+ *
+ * If @old was empty, it will be overwritten.
+ */
+static inline void list_replace_init(struct list_head *old,
+				     struct list_head *new)
+{
+	list_replace(old, new);
+	INIT_LIST_HEAD(old);
+}
+
+/**
+ * list_swap - replace entry1 with entry2 and re-add entry1 at entry2's position
+ * @entry1: the location to place entry2
+ * @entry2: the location to place entry1
+ */
+static inline void list_swap(struct list_head *entry1,
+			     struct list_head *entry2)
+{
+	struct list_head *pos = entry2->prev;
+
+	list_del(entry2);
+	list_replace(entry1, entry2);
+	if (pos == entry1)
+		pos = entry2;
+	list_add(entry1, pos);
+}
+
+/**
+ * list_del_init - deletes entry from list and reinitialize it.
+ * @entry: the element to delete from the list.
+ */
+static inline void list_del_init(struct list_head *entry)
+{
+	__list_del_entry(entry);
+	INIT_LIST_HEAD(entry);
+}
+
+/**
+ * list_move - delete from one list and add as another's head
+ * @list: the entry to move
+ * @head: the head that will precede our entry
+ */
+static inline void list_move(struct list_head *list, struct list_head *head)
+{
+	__list_del_entry(list);
+	list_add(list, head);
+}
+
+/**
+ * list_move_tail - delete from one list and add as another's tail
+ * @list: the entry to move
+ * @head: the head that will follow our entry
+ */
+static inline void list_move_tail(struct list_head *list,
+				  struct list_head *head)
+{
+	__list_del_entry(list);
+	list_add_tail(list, head);
+}
+
+/**
+ * list_bulk_move_tail - move a subsection of a list to its tail
+ * @head: the head that will follow our entry
+ * @first: first entry to move
+ * @last: last entry to move, can be the same as first
+ *
+ * Move all entries between @first and including @last before @head.
+ * All three entries must belong to the same linked list.
+ */
+static inline void list_bulk_move_tail(struct list_head *head,
+				       struct list_head *first,
+				       struct list_head *last)
+{
+	first->prev->next = last->next;
+	last->next->prev = first->prev;
+
+	head->prev->next = first;
+	first->prev = head->prev;
+
+	last->next = head;
+	head->prev = last;
+}
+
+/**
+ * list_is_first - tests whether @list is the first entry in list @head
+ * @list: the entry to test
+ * @head: the head of the list
+ */
+static inline int list_is_first(const struct list_head *list, const struct list_head *head)
+{
+	return list->prev == head;
+}
+
+/**
+ * list_is_last - tests whether @list is the last entry in list @head
+ * @list: the entry to test
+ * @head: the head of the list
+ */
+static inline int list_is_last(const struct list_head *list, const struct list_head *head)
+{
+	return list->next == head;
+}
+
+/**
+ * list_is_head - tests whether @list is the list @head
+ * @list: the entry to test
+ * @head: the head of the list
+ */
+static inline int list_is_head(const struct list_head *list, const struct list_head *head)
+{
+	return list == head;
+}
+
+/**
+ * list_empty - tests whether a list is empty
+ * @head: the list to test.
+ */
+static inline int list_empty(const struct list_head *head)
+{
+	return head->next == head;
+}
+
+/**
+ * list_rotate_left - rotate the list to the left
+ * @head: the head of the list
+ */
+static inline void list_rotate_left(struct list_head *head)
+{
+	struct list_head *first;
+
+	if (!list_empty(head)) {
+		first = head->next;
+		list_move_tail(first, head);
+	}
+}
+
+/**
+ * list_rotate_to_front() - Rotate list to specific item.
+ * @list: The desired new front of the list.
+ * @head: The head of the list.
+ *
+ * Rotates list so that @list becomes the new front of the list.
+ */
+static inline void list_rotate_to_front(struct list_head *list,
+					struct list_head *head)
+{
+	/*
+	 * Deletes the list head from the list denoted by @head and
+	 * places it as the tail of @list, this effectively rotates the
+	 * list so that @list is at the front.
+	 */
+	list_move_tail(head, list);
+}
+
+/**
+ * list_is_singular - tests whether a list has just one entry.
+ * @head: the list to test.
+ */
+static inline int list_is_singular(const struct list_head *head)
+{
+	return !list_empty(head) && (head->next == head->prev);
+}
+
+static inline void __list_cut_position(struct list_head *list,
+		struct list_head *head, struct list_head *entry)
+{
+	struct list_head *new_first = entry->next;
+	list->next = head->next;
+	list->next->prev = list;
+	list->prev = entry;
+	entry->next = list;
+	head->next = new_first;
+	new_first->prev = head;
+}
+
+/**
+ * list_cut_position - cut a list into two
+ * @list: a new list to add all removed entries
+ * @head: a list with entries
+ * @entry: an entry within head, could be the head itself
+ *	and if so we won't cut the list
+ *
+ * This helper moves the initial part of @head, up to and
+ * including @entry, from @head to @list. You should
+ * pass on @entry an element you know is on @head. @list
+ * should be an empty list or a list you do not care about
+ * losing its data.
+ *
+ */
+static inline void list_cut_position(struct list_head *list,
+		struct list_head *head, struct list_head *entry)
+{
+	if (list_empty(head))
+		return;
+	if (list_is_singular(head) && !list_is_head(entry, head) && (entry != head->next))
+		return;
+	if (list_is_head(entry, head))
+		INIT_LIST_HEAD(list);
+	else
+		__list_cut_position(list, head, entry);
+}
+
+/**
+ * list_cut_before - cut a list into two, before given entry
+ * @list: a new list to add all removed entries
+ * @head: a list with entries
+ * @entry: an entry within head, could be the head itself
+ *
+ * This helper moves the initial part of @head, up to but
+ * excluding @entry, from @head to @list.  You should pass
+ * in @entry an element you know is on @head.  @list should
+ * be an empty list or a list you do not care about losing
+ * its data.
+ * If @entry == @head, all entries on @head are moved to
+ * @list.
+ */
+static inline void list_cut_before(struct list_head *list,
+				   struct list_head *head,
+				   struct list_head *entry)
+{
+	if (head->next == entry) {
+		INIT_LIST_HEAD(list);
+		return;
+	}
+	list->next = head->next;
+	list->next->prev = list;
+	list->prev = entry->prev;
+	list->prev->next = list;
+	head->next = entry;
+	entry->prev = head;
+}
+
+static inline void __list_splice(const struct list_head *list,
+				 struct list_head *prev,
+				 struct list_head *next)
+{
+	struct list_head *first = list->next;
+	struct list_head *last = list->prev;
+
+	first->prev = prev;
+	prev->next = first;
+
+	last->next = next;
+	next->prev = last;
+}
+
+/**
+ * list_splice - join two lists, this is designed for stacks
+ * @list: the new list to add.
+ * @head: the place to add it in the first list.
+ */
+static inline void list_splice(const struct list_head *list,
+				struct list_head *head)
+{
+	if (!list_empty(list))
+		__list_splice(list, head, head->next);
+}
+
+/**
+ * list_splice_tail - join two lists, each list being a queue
+ * @list: the new list to add.
+ * @head: the place to add it in the first list.
+ */
+static inline void list_splice_tail(struct list_head *list,
+				struct list_head *head)
+{
+	if (!list_empty(list))
+		__list_splice(list, head->prev, head);
+}
+
+/**
+ * list_splice_init - join two lists and reinitialise the emptied list.
+ * @list: the new list to add.
+ * @head: the place to add it in the first list.
+ *
+ * The list at @list is reinitialised
+ */
+static inline void list_splice_init(struct list_head *list,
+				    struct list_head *head)
+{
+	if (!list_empty(list)) {
+		__list_splice(list, head, head->next);
+		INIT_LIST_HEAD(list);
+	}
+}
+
+/**
+ * list_splice_tail_init - join two lists and reinitialise the emptied list
+ * @list: the new list to add.
+ * @head: the place to add it in the first list.
+ *
+ * Each of the lists is a queue.
+ * The list at @list is reinitialised
+ */
+static inline void list_splice_tail_init(struct list_head *list,
+					 struct list_head *head)
+{
+	if (!list_empty(list)) {
+		__list_splice(list, head->prev, head);
+		INIT_LIST_HEAD(list);
+	}
+}
+
+/**
+ * list_entry - get the struct for this entry
+ * @ptr:	the &struct list_head pointer.
+ * @type:	the type of the struct this is embedded in.
+ * @member:	the name of the list_head within the struct.
+ */
+#define list_entry(ptr, type, member) \
+	container_of(ptr, type, member)
+
+/**
+ * list_first_entry - get the first element from a list
+ * @ptr:	the list head to take the element from.
+ * @type:	the type of the struct this is embedded in.
+ * @member:	the name of the list_head within the struct.
+ *
+ * Note that the list is expected to be non-empty.
+ */
+#define list_first_entry(ptr, type, member) \
+	list_entry((ptr)->next, type, member)
+
+/**
+ * list_last_entry - get the last element from a list
+ * @ptr:	the list head to take the element from.
+ * @type:	the type of the struct this is embedded in.
+ * @member:	the name of the list_head within the struct.
+ *
+ * Note that the list is expected to be non-empty.
+ */
+#define list_last_entry(ptr, type, member) \
+	list_entry((ptr)->prev, type, member)
+
+/**
+ * list_first_entry_or_null - get the first element from a list
+ * @ptr:	the list head to take the element from.
+ * @type:	the type of the struct this is embedded in.
+ * @member:	the name of the list_head within the struct.
+ *
+ * Note that if the list is empty, it returns NULL.
+ */
+#define list_first_entry_or_null(ptr, type, member) ({ \
+	struct list_head *head__ = (ptr); \
+	struct list_head *pos__ = head__->next; \
+	pos__ != head__ ? list_entry(pos__, type, member) : NULL; \
+})
+
+/**
+ * list_next_entry - get the next element in list
+ * @pos:	the type * to cursor
+ * @member:	the name of the list_head within the struct.
+ */
+#define list_next_entry(pos, member) \
+	list_entry((pos)->member.next, typeof(*(pos)), member)
+
+/**
+ * list_next_entry_circular - get the next element in list
+ * @pos:	the type * to cursor.
+ * @head:	the list head to take the element from.
+ * @member:	the name of the list_head within the struct.
+ *
+ * Wraparound if pos is the last element (return the first element).
+ * Note that the list is expected to be non-empty.
+ */
+#define list_next_entry_circular(pos, head, member) \
+	(list_is_last(&(pos)->member, head) ? \
+	list_first_entry(head, typeof(*(pos)), member) : list_next_entry(pos, member))
+
+/**
+ * list_prev_entry - get the prev element in list
+ * @pos:	the type * to cursor
+ * @member:	the name of the list_head within the struct.
+ */
+#define list_prev_entry(pos, member) \
+	list_entry((pos)->member.prev, typeof(*(pos)), member)
+
+/**
+ * list_prev_entry_circular - get the prev element in list
+ * @pos:	the type * to cursor.
+ * @head:	the list head to take the element from.
+ * @member:	the name of the list_head within the struct.
+ *
+ * Wraparound if pos is the first element (return the last element).
+ * Note that the list is expected to be non-empty.
+ */
+#define list_prev_entry_circular(pos, head, member) \
+	(list_is_first(&(pos)->member, head) ? \
+	list_last_entry(head, typeof(*(pos)), member) : list_prev_entry(pos, member))
+
+/**
+ * list_for_each	-	iterate over a list
+ * @pos:	the &struct list_head to use as a loop cursor.
+ * @head:	the head for your list.
+ */
+#define list_for_each(pos, head) \
+	for (pos = (head)->next; !list_is_head(pos, (head)); pos = pos->next)
+
+/**
+ * list_for_each_rcu - Iterate over a list in an RCU-safe fashion
+ * @pos:	the &struct list_head to use as a loop cursor.
+ * @head:	the head for your list.
+ */
+#define list_for_each_rcu(pos, head)		  \
+	for (pos = rcu_dereference((head)->next); \
+	     !list_is_head(pos, (head)); \
+	     pos = rcu_dereference(pos->next))
+
+/**
+ * list_for_each_continue - continue iteration over a list
+ * @pos:	the &struct list_head to use as a loop cursor.
+ * @head:	the head for your list.
+ *
+ * Continue to iterate over a list, continuing after the current position.
+ */
+#define list_for_each_continue(pos, head) \
+	for (pos = pos->next; !list_is_head(pos, (head)); pos = pos->next)
+
+/**
+ * list_for_each_prev	-	iterate over a list backwards
+ * @pos:	the &struct list_head to use as a loop cursor.
+ * @head:	the head for your list.
+ */
+#define list_for_each_prev(pos, head) \
+	for (pos = (head)->prev; !list_is_head(pos, (head)); pos = pos->prev)
+
+/**
+ * list_for_each_safe - iterate over a list safe against removal of list entry
+ * @pos:	the &struct list_head to use as a loop cursor.
+ * @n:		another &struct list_head to use as temporary storage
+ * @head:	the head for your list.
+ */
+#define list_for_each_safe(pos, n, head) \
+	for (pos = (head)->next, n = pos->next; \
+	     !list_is_head(pos, (head)); \
+	     pos = n, n = pos->next)
+
+/**
+ * list_for_each_prev_safe - iterate over a list backwards safe against removal of list entry
+ * @pos:	the &struct list_head to use as a loop cursor.
+ * @n:		another &struct list_head to use as temporary storage
+ * @head:	the head for your list.
+ */
+#define list_for_each_prev_safe(pos, n, head) \
+	for (pos = (head)->prev, n = pos->prev; \
+	     !list_is_head(pos, (head)); \
+	     pos = n, n = pos->prev)
+
+/**
+ * list_count_nodes - count nodes in the list
+ * @head:	the head for your list.
+ */
+static inline size_t list_count_nodes(struct list_head *head)
+{
+	struct list_head *pos;
+	size_t count = 0;
+
+	list_for_each(pos, head)
+		count++;
+
+	return count;
+}
+
+/**
+ * list_entry_is_head - test if the entry points to the head of the list
+ * @pos:	the type * to cursor
+ * @head:	the head for your list.
+ * @member:	the name of the list_head within the struct.
+ */
+#define list_entry_is_head(pos, head, member)				\
+	list_is_head(&pos->member, (head))
+
+/**
+ * list_for_each_entry	-	iterate over list of given type
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_head within the struct.
+ */
+#define list_for_each_entry(pos, head, member)				\
+	for (pos = list_first_entry(head, typeof(*pos), member);	\
+	     !list_entry_is_head(pos, head, member);			\
+	     pos = list_next_entry(pos, member))
+
+/**
+ * list_for_each_entry_reverse - iterate backwards over list of given type.
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_head within the struct.
+ */
+#define list_for_each_entry_reverse(pos, head, member)			\
+	for (pos = list_last_entry(head, typeof(*pos), member);		\
+	     !list_entry_is_head(pos, head, member); 			\
+	     pos = list_prev_entry(pos, member))
+
+/**
+ * list_prepare_entry - prepare a pos entry for use in list_for_each_entry_continue()
+ * @pos:	the type * to use as a start point
+ * @head:	the head of the list
+ * @member:	the name of the list_head within the struct.
+ *
+ * Prepares a pos entry for use as a start point in list_for_each_entry_continue().
+ */
+#define list_prepare_entry(pos, head, member) \
+	((pos) ? : list_entry(head, typeof(*pos), member))
+
+/**
+ * list_for_each_entry_continue - continue iteration over list of given type
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_head within the struct.
+ *
+ * Continue to iterate over list of given type, continuing after
+ * the current position.
+ */
+#define list_for_each_entry_continue(pos, head, member) 		\
+	for (pos = list_next_entry(pos, member);			\
+	     !list_entry_is_head(pos, head, member);			\
+	     pos = list_next_entry(pos, member))
+
+/**
+ * list_for_each_entry_continue_reverse - iterate backwards from the given point
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_head within the struct.
+ *
+ * Start to iterate over list of given type backwards, continuing after
+ * the current position.
+ */
+#define list_for_each_entry_continue_reverse(pos, head, member)		\
+	for (pos = list_prev_entry(pos, member);			\
+	     !list_entry_is_head(pos, head, member);			\
+	     pos = list_prev_entry(pos, member))
+
+/**
+ * list_for_each_entry_from - iterate over list of given type from the current point
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_head within the struct.
+ *
+ * Iterate over list of given type, continuing from current position.
+ */
+#define list_for_each_entry_from(pos, head, member) 			\
+	for (; !list_entry_is_head(pos, head, member);			\
+	     pos = list_next_entry(pos, member))
+
+/**
+ * list_for_each_entry_from_reverse - iterate backwards over list of given type
+ *                                    from the current point
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_head within the struct.
+ *
+ * Iterate backwards over list of given type, continuing from current position.
+ */
+#define list_for_each_entry_from_reverse(pos, head, member)		\
+	for (; !list_entry_is_head(pos, head, member);			\
+	     pos = list_prev_entry(pos, member))
+
+/**
+ * list_for_each_entry_safe - iterate over list of given type safe against removal of list entry
+ * @pos:	the type * to use as a loop cursor.
+ * @n:		another type * to use as temporary storage
+ * @head:	the head for your list.
+ * @member:	the name of the list_head within the struct.
+ */
+#define list_for_each_entry_safe(pos, n, head, member)			\
+	for (pos = list_first_entry(head, typeof(*pos), member),	\
+		n = list_next_entry(pos, member);			\
+	     !list_entry_is_head(pos, head, member); 			\
+	     pos = n, n = list_next_entry(n, member))
+
+/**
+ * list_for_each_entry_safe_continue - continue list iteration safe against removal
+ * @pos:	the type * to use as a loop cursor.
+ * @n:		another type * to use as temporary storage
+ * @head:	the head for your list.
+ * @member:	the name of the list_head within the struct.
+ *
+ * Iterate over list of given type, continuing after current point,
+ * safe against removal of list entry.
+ */
+#define list_for_each_entry_safe_continue(pos, n, head, member) 		\
+	for (pos = list_next_entry(pos, member), 				\
+		n = list_next_entry(pos, member);				\
+	     !list_entry_is_head(pos, head, member);				\
+	     pos = n, n = list_next_entry(n, member))
+
+/**
+ * list_for_each_entry_safe_from - iterate over list from current point safe against removal
+ * @pos:	the type * to use as a loop cursor.
+ * @n:		another type * to use as temporary storage
+ * @head:	the head for your list.
+ * @member:	the name of the list_head within the struct.
+ *
+ * Iterate over list of given type from current point, safe against
+ * removal of list entry.
+ */
+#define list_for_each_entry_safe_from(pos, n, head, member) 			\
+	for (n = list_next_entry(pos, member);					\
+	     !list_entry_is_head(pos, head, member);				\
+	     pos = n, n = list_next_entry(n, member))
+
+/**
+ * list_for_each_entry_safe_reverse - iterate backwards over list safe against removal
+ * @pos:	the type * to use as a loop cursor.
+ * @n:		another type * to use as temporary storage
+ * @head:	the head for your list.
+ * @member:	the name of the list_head within the struct.
+ *
+ * Iterate backwards over list of given type, safe against removal
+ * of list entry.
+ */
+#define list_for_each_entry_safe_reverse(pos, n, head, member)		\
+	for (pos = list_last_entry(head, typeof(*pos), member),		\
+		n = list_prev_entry(pos, member);			\
+	     !list_entry_is_head(pos, head, member); 			\
+	     pos = n, n = list_prev_entry(n, member))
+
+/**
+ * list_safe_reset_next - reset a stale list_for_each_entry_safe loop
+ * @pos:	the loop cursor used in the list_for_each_entry_safe loop
+ * @n:		temporary storage used in list_for_each_entry_safe
+ * @member:	the name of the list_head within the struct.
+ *
+ * list_safe_reset_next is not safe to use in general if the list may be
+ * modified concurrently (eg. the lock is dropped in the loop body). An
+ * exception to this is if the cursor element (pos) is pinned in the list,
+ * and list_safe_reset_next is called after re-taking the lock and before
+ * completing the current iteration of the loop body.
+ */
+#define list_safe_reset_next(pos, n, member)				\
+	n = list_next_entry(pos, member)
+
+#endif
diff --git a/debugfs/Makefile.in b/debugfs/Makefile.in
index 689bf0c4a3c13d..700ae87418c268 100644
--- a/debugfs/Makefile.in
+++ b/debugfs/Makefile.in
@@ -195,7 +195,7 @@ debugfs.o: $(srcdir)/debugfs.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h $(top_srcdir)/version.h \
  $(srcdir)/../e2fsck/jfs_user.h $(top_srcdir)/lib/ext2fs/kernel-jbd.h \
- $(top_srcdir)/lib/ext2fs/jfs_compat.h $(top_srcdir)/lib/ext2fs/kernel-list.h \
+ $(top_srcdir)/lib/ext2fs/jfs_compat.h $(top_srcdir)/lib/support/list.h \
  $(top_srcdir)/lib/ext2fs/compiler.h $(top_srcdir)/lib/support/plausible.h
 util.o: $(srcdir)/util.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(top_srcdir)/lib/ss/ss.h \
@@ -287,7 +287,7 @@ logdump.o: $(srcdir)/logdump.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h $(srcdir)/../e2fsck/jfs_user.h \
  $(top_srcdir)/lib/ext2fs/kernel-jbd.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h
 htree.o: $(srcdir)/htree.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/debugfs.h $(top_srcdir)/lib/ss/ss.h \
@@ -408,7 +408,7 @@ journal.o: $(srcdir)/journal.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/ext2fs/ext2_io.h $(top_builddir)/lib/ext2fs/ext2_err.h \
  $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/hashmap.h \
  $(top_srcdir)/lib/ext2fs/bitops.h $(top_srcdir)/lib/ext2fs/kernel-jbd.h \
- $(top_srcdir)/lib/ext2fs/jfs_compat.h $(top_srcdir)/lib/ext2fs/kernel-list.h \
+ $(top_srcdir)/lib/ext2fs/jfs_compat.h $(top_srcdir)/lib/support/list.h \
  $(top_srcdir)/lib/ext2fs/compiler.h
 revoke.o: $(srcdir)/../e2fsck/revoke.c $(srcdir)/../e2fsck/jfs_user.h \
  $(top_builddir)/lib/config.h $(top_builddir)/lib/dirpaths.h \
@@ -418,7 +418,7 @@ revoke.o: $(srcdir)/../e2fsck/revoke.c $(srcdir)/../e2fsck/jfs_user.h \
  $(top_builddir)/lib/ext2fs/ext2_err.h \
  $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/hashmap.h \
  $(top_srcdir)/lib/ext2fs/bitops.h $(top_srcdir)/lib/ext2fs/kernel-jbd.h \
- $(top_srcdir)/lib/ext2fs/jfs_compat.h $(top_srcdir)/lib/ext2fs/kernel-list.h \
+ $(top_srcdir)/lib/ext2fs/jfs_compat.h $(top_srcdir)/lib/support/list.h \
  $(top_srcdir)/lib/ext2fs/compiler.h
 recovery.o: $(srcdir)/../e2fsck/recovery.c $(srcdir)/../e2fsck/jfs_user.h \
  $(top_builddir)/lib/config.h $(top_builddir)/lib/dirpaths.h \
@@ -428,7 +428,7 @@ recovery.o: $(srcdir)/../e2fsck/recovery.c $(srcdir)/../e2fsck/jfs_user.h \
  $(top_builddir)/lib/ext2fs/ext2_err.h \
  $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/hashmap.h \
  $(top_srcdir)/lib/ext2fs/bitops.h $(top_srcdir)/lib/ext2fs/kernel-jbd.h \
- $(top_srcdir)/lib/ext2fs/jfs_compat.h $(top_srcdir)/lib/ext2fs/kernel-list.h \
+ $(top_srcdir)/lib/ext2fs/jfs_compat.h $(top_srcdir)/lib/support/list.h \
  $(top_srcdir)/lib/ext2fs/compiler.h
 do_journal.o: $(srcdir)/do_journal.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/debugfs.h $(top_srcdir)/lib/ss/ss.h \
@@ -442,7 +442,7 @@ do_journal.o: $(srcdir)/do_journal.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/kernel-jbd.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(srcdir)/journal.h $(srcdir)/../e2fsck/jfs_user.h
 do_orphan.o: $(srcdir)/do_orphan.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/debugfs.h $(top_srcdir)/lib/ss/ss.h \
diff --git a/e2fsck/Makefile.in b/e2fsck/Makefile.in
index fbb7b156d5c759..52fad9cbfd2b23 100644
--- a/e2fsck/Makefile.in
+++ b/e2fsck/Makefile.in
@@ -282,7 +282,7 @@ e2fsck.o: $(srcdir)/e2fsck.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(srcdir)/problem.h
 super.o: $(srcdir)/super.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
@@ -296,7 +296,7 @@ super.o: $(srcdir)/super.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(srcdir)/problem.h
 pass1.o: $(srcdir)/pass1.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
@@ -310,7 +310,7 @@ pass1.o: $(srcdir)/pass1.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(top_srcdir)/lib/e2p/e2p.h $(srcdir)/problem.h
 pass1b.o: $(srcdir)/pass1b.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(top_srcdir)/lib/et/com_err.h \
@@ -324,7 +324,7 @@ pass1b.o: $(srcdir)/pass1b.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(srcdir)/problem.h $(top_srcdir)/lib/support/dict.h
 pass2.o: $(srcdir)/pass2.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
@@ -338,7 +338,7 @@ pass2.o: $(srcdir)/pass2.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(srcdir)/problem.h $(top_srcdir)/lib/support/dict.h
 pass3.o: $(srcdir)/pass3.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
@@ -352,7 +352,7 @@ pass3.o: $(srcdir)/pass3.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(srcdir)/problem.h
 pass4.o: $(srcdir)/pass4.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
@@ -366,7 +366,7 @@ pass4.o: $(srcdir)/pass4.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(srcdir)/problem.h
 pass5.o: $(srcdir)/pass5.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
@@ -380,7 +380,7 @@ pass5.o: $(srcdir)/pass5.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(srcdir)/problem.h
 journal.o: $(srcdir)/journal.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/jfs_user.h $(srcdir)/e2fsck.h \
@@ -394,7 +394,7 @@ journal.o: $(srcdir)/journal.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(top_srcdir)/lib/ext2fs/kernel-jbd.h $(srcdir)/problem.h
 recovery.o: $(srcdir)/recovery.c $(srcdir)/jfs_user.h \
  $(top_builddir)/lib/config.h $(top_builddir)/lib/dirpaths.h \
@@ -408,7 +408,7 @@ recovery.o: $(srcdir)/recovery.c $(srcdir)/jfs_user.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(top_srcdir)/lib/ext2fs/kernel-jbd.h
 revoke.o: $(srcdir)/revoke.c $(srcdir)/jfs_user.h \
  $(top_builddir)/lib/config.h $(top_builddir)/lib/dirpaths.h \
@@ -422,7 +422,7 @@ revoke.o: $(srcdir)/revoke.c $(srcdir)/jfs_user.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(top_srcdir)/lib/ext2fs/kernel-jbd.h
 badblocks.o: $(srcdir)/badblocks.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(top_srcdir)/lib/et/com_err.h \
@@ -436,7 +436,7 @@ badblocks.o: $(srcdir)/badblocks.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h
 util.o: $(srcdir)/util.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
  $(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h \
@@ -449,7 +449,7 @@ util.o: $(srcdir)/util.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h
 unix.o: $(srcdir)/unix.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(top_srcdir)/lib/e2p/e2p.h \
  $(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h \
@@ -463,7 +463,7 @@ unix.o: $(srcdir)/unix.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(srcdir)/problem.h $(srcdir)/jfs_user.h \
  $(top_srcdir)/lib/ext2fs/kernel-jbd.h $(top_srcdir)/version.h
 dirinfo.o: $(srcdir)/dirinfo.c $(top_builddir)/lib/config.h \
@@ -478,7 +478,7 @@ dirinfo.o: $(srcdir)/dirinfo.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(top_srcdir)/lib/ext2fs/tdb.h
 dx_dirinfo.o: $(srcdir)/dx_dirinfo.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
@@ -492,7 +492,7 @@ dx_dirinfo.o: $(srcdir)/dx_dirinfo.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h
 ehandler.o: $(srcdir)/ehandler.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
  $(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h \
@@ -505,7 +505,7 @@ ehandler.o: $(srcdir)/ehandler.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h
 problem.o: $(srcdir)/problem.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
  $(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h \
@@ -518,7 +518,7 @@ problem.o: $(srcdir)/problem.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(srcdir)/problem.h $(srcdir)/problemP.h
 message.o: $(srcdir)/message.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(top_srcdir)/lib/support/quotaio.h \
@@ -531,7 +531,7 @@ message.o: $(srcdir)/message.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/quotaio_tree.h $(srcdir)/e2fsck.h \
  $(top_srcdir)/lib/support/profile.h $(top_builddir)/lib/support/prof_err.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(srcdir)/problem.h
 ea_refcount.o: $(srcdir)/ea_refcount.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
@@ -545,7 +545,7 @@ ea_refcount.o: $(srcdir)/ea_refcount.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h
 rehash.o: $(srcdir)/rehash.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
  $(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h \
@@ -558,7 +558,7 @@ rehash.o: $(srcdir)/rehash.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(srcdir)/problem.h $(top_srcdir)/lib/support/sort_r.h
 readahead.o: $(srcdir)/readahead.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
@@ -572,7 +572,7 @@ readahead.o: $(srcdir)/readahead.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h
 region.o: $(srcdir)/region.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
  $(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h \
@@ -585,7 +585,7 @@ region.o: $(srcdir)/region.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h
 sigcatcher.o: $(srcdir)/sigcatcher.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
  $(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h \
@@ -598,7 +598,7 @@ sigcatcher.o: $(srcdir)/sigcatcher.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h
 logfile.o: $(srcdir)/logfile.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
  $(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h \
@@ -611,7 +611,7 @@ logfile.o: $(srcdir)/logfile.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h
 quota.o: $(srcdir)/quota.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
  $(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h \
@@ -624,7 +624,7 @@ quota.o: $(srcdir)/quota.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(srcdir)/problem.h
 extents.o: $(srcdir)/extents.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
@@ -638,7 +638,7 @@ extents.o: $(srcdir)/extents.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(srcdir)/problem.h
 encrypted_files.o: $(srcdir)/encrypted_files.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
@@ -652,5 +652,5 @@ encrypted_files.o: $(srcdir)/encrypted_files.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(srcdir)/problem.h $(top_srcdir)/lib/ext2fs/rbtree.h
diff --git a/fuse4fs/Makefile.in b/fuse4fs/Makefile.in
index bc137a765ee2b7..6b41d1dd5ffe8d 100644
--- a/fuse4fs/Makefile.in
+++ b/fuse4fs/Makefile.in
@@ -160,7 +160,7 @@ journal.o: $(srcdir)/../debugfs/journal.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(top_srcdir)/lib/ext2fs/kernel-jbd.h
 revoke.o: $(srcdir)/../e2fsck/revoke.c $(srcdir)/../e2fsck/jfs_user.h \
  $(top_builddir)/lib/config.h $(top_builddir)/lib/dirpaths.h \
@@ -174,7 +174,7 @@ revoke.o: $(srcdir)/../e2fsck/revoke.c $(srcdir)/../e2fsck/jfs_user.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(top_srcdir)/lib/ext2fs/kernel-jbd.h
 recovery.o: $(srcdir)/../e2fsck/recovery.c $(srcdir)/../e2fsck/jfs_user.h \
  $(top_builddir)/lib/config.h $(top_builddir)/lib/dirpaths.h \
@@ -188,5 +188,5 @@ recovery.o: $(srcdir)/../e2fsck/recovery.c $(srcdir)/../e2fsck/jfs_user.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(top_srcdir)/lib/ext2fs/kernel-jbd.h
diff --git a/lib/e2p/Makefile.in b/lib/e2p/Makefile.in
index 92d9c018fe46c8..f642f5ec367c93 100644
--- a/lib/e2p/Makefile.in
+++ b/lib/e2p/Makefile.in
@@ -130,7 +130,7 @@ feature.o: $(srcdir)/feature.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/ext2fs/ext2_err.h \
  $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/hashmap.h \
  $(top_srcdir)/lib/ext2fs/bitops.h $(top_srcdir)/lib/ext2fs/kernel-jbd.h \
- $(top_srcdir)/lib/ext2fs/jfs_compat.h $(top_srcdir)/lib/ext2fs/kernel-list.h \
+ $(top_srcdir)/lib/ext2fs/jfs_compat.h $(top_srcdir)/lib/support/list.h \
  $(top_srcdir)/lib/ext2fs/compiler.h
 fgetflags.o: $(srcdir)/fgetflags.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2p.h \
@@ -173,7 +173,7 @@ ljs.o: $(srcdir)/ljs.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/hashmap.h \
  $(top_srcdir)/lib/ext2fs/bitops.h $(srcdir)/e2p.h \
  $(top_srcdir)/lib/ext2fs/kernel-jbd.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h
 mntopts.o: $(srcdir)/mntopts.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2p.h \
  $(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h
diff --git a/lib/ext2fs/Makefile.in b/lib/ext2fs/Makefile.in
index e9a6ced244ea26..1d0991defff804 100644
--- a/lib/ext2fs/Makefile.in
+++ b/lib/ext2fs/Makefile.in
@@ -1032,7 +1032,7 @@ mkjournal.o: $(srcdir)/mkjournal.c $(top_builddir)/lib/config.h \
  $(srcdir)/ext3_extents.h $(top_srcdir)/lib/et/com_err.h $(srcdir)/ext2_io.h \
  $(top_builddir)/lib/ext2fs/ext2_err.h $(srcdir)/ext2_ext_attr.h \
  $(srcdir)/hashmap.h $(srcdir)/bitops.h $(srcdir)/kernel-jbd.h \
- $(srcdir)/jfs_compat.h $(srcdir)/kernel-list.h $(srcdir)/compiler.h
+ $(srcdir)/jfs_compat.h $(srcdir)/../support/list.h $(srcdir)/compiler.h
 mmp.o: $(srcdir)/mmp.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/ext2_fs.h \
  $(top_builddir)/lib/ext2fs/ext2_types.h $(srcdir)/ext2fs.h \
@@ -1263,7 +1263,7 @@ debugfs.o: $(top_srcdir)/debugfs/debugfs.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/quotaio.h $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h $(top_srcdir)/debugfs/../version.h \
  $(srcdir)/../../e2fsck/jfs_user.h $(srcdir)/kernel-jbd.h \
- $(srcdir)/jfs_compat.h $(srcdir)/kernel-list.h $(srcdir)/compiler.h \
+ $(srcdir)/jfs_compat.h $(srcdir)/../support/list.h $(srcdir)/compiler.h \
  $(top_srcdir)/lib/support/plausible.h
 util.o: $(top_srcdir)/debugfs/util.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(top_srcdir)/lib/ss/ss.h \
@@ -1353,7 +1353,7 @@ logdump.o: $(top_srcdir)/debugfs/logdump.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/debugfs/../misc/create_inode.h $(top_srcdir)/lib/e2p/e2p.h \
  $(top_srcdir)/lib/support/quotaio.h $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h $(srcdir)/../../e2fsck/jfs_user.h \
- $(srcdir)/kernel-jbd.h $(srcdir)/jfs_compat.h $(srcdir)/kernel-list.h \
+ $(srcdir)/kernel-jbd.h $(srcdir)/jfs_compat.h $(srcdir)/../support/list.h \
  $(srcdir)/compiler.h $(srcdir)/fast_commit.h
 htree.o: $(top_srcdir)/debugfs/htree.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(top_srcdir)/debugfs/debugfs.h \
@@ -1469,14 +1469,14 @@ journal.o: $(top_srcdir)/debugfs/journal.c $(top_builddir)/lib/config.h \
  $(srcdir)/ext3_extents.h $(top_srcdir)/lib/et/com_err.h $(srcdir)/ext2_io.h \
  $(top_builddir)/lib/ext2fs/ext2_err.h $(srcdir)/ext2_ext_attr.h \
  $(srcdir)/hashmap.h $(srcdir)/bitops.h $(srcdir)/kernel-jbd.h \
- $(srcdir)/jfs_compat.h $(srcdir)/kernel-list.h $(srcdir)/compiler.h
+ $(srcdir)/jfs_compat.h $(srcdir)/../support/list.h $(srcdir)/compiler.h
 revoke.o: $(top_srcdir)/e2fsck/revoke.c $(top_srcdir)/e2fsck/jfs_user.h \
  $(top_builddir)/lib/config.h $(top_builddir)/lib/dirpaths.h \
  $(srcdir)/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h \
  $(srcdir)/ext2fs.h $(srcdir)/ext3_extents.h $(top_srcdir)/lib/et/com_err.h \
  $(srcdir)/ext2_io.h $(top_builddir)/lib/ext2fs/ext2_err.h \
  $(srcdir)/ext2_ext_attr.h $(srcdir)/hashmap.h $(srcdir)/bitops.h \
- $(srcdir)/kernel-jbd.h $(srcdir)/jfs_compat.h $(srcdir)/kernel-list.h \
+ $(srcdir)/kernel-jbd.h $(srcdir)/jfs_compat.h $(srcdir)/../support/list.h \
  $(srcdir)/compiler.h
 recovery.o: $(top_srcdir)/e2fsck/recovery.c $(top_srcdir)/e2fsck/jfs_user.h \
  $(top_builddir)/lib/config.h $(top_builddir)/lib/dirpaths.h \
@@ -1484,7 +1484,7 @@ recovery.o: $(top_srcdir)/e2fsck/recovery.c $(top_srcdir)/e2fsck/jfs_user.h \
  $(srcdir)/ext2fs.h $(srcdir)/ext3_extents.h $(top_srcdir)/lib/et/com_err.h \
  $(srcdir)/ext2_io.h $(top_builddir)/lib/ext2fs/ext2_err.h \
  $(srcdir)/ext2_ext_attr.h $(srcdir)/hashmap.h $(srcdir)/bitops.h \
- $(srcdir)/kernel-jbd.h $(srcdir)/jfs_compat.h $(srcdir)/kernel-list.h \
+ $(srcdir)/kernel-jbd.h $(srcdir)/jfs_compat.h $(srcdir)/../support/list.h \
  $(srcdir)/compiler.h
 do_journal.o: $(top_srcdir)/debugfs/do_journal.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(top_srcdir)/debugfs/debugfs.h \
@@ -1497,7 +1497,7 @@ do_journal.o: $(top_srcdir)/debugfs/do_journal.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/debugfs/../misc/create_inode.h $(top_srcdir)/lib/e2p/e2p.h \
  $(top_srcdir)/lib/support/quotaio.h $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h $(srcdir)/kernel-jbd.h \
- $(srcdir)/jfs_compat.h $(srcdir)/kernel-list.h $(srcdir)/compiler.h \
+ $(srcdir)/jfs_compat.h $(srcdir)/../support/list.h $(srcdir)/compiler.h \
  $(top_srcdir)/debugfs/journal.h $(srcdir)/../../e2fsck/jfs_user.h
 do_orphan.o: $(top_srcdir)/debugfs/do_orphan.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(top_srcdir)/debugfs/debugfs.h \
diff --git a/misc/Makefile.in b/misc/Makefile.in
index b63a0424b19fec..ec964688acd623 100644
--- a/misc/Makefile.in
+++ b/misc/Makefile.in
@@ -736,7 +736,7 @@ tune2fs.o: $(srcdir)/tune2fs.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/ext2fs/ext2_io.h $(top_builddir)/lib/ext2fs/ext2_err.h \
  $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/hashmap.h \
  $(top_srcdir)/lib/ext2fs/bitops.h $(top_srcdir)/lib/ext2fs/kernel-jbd.h \
- $(top_srcdir)/lib/ext2fs/jfs_compat.h $(top_srcdir)/lib/ext2fs/kernel-list.h \
+ $(top_srcdir)/lib/ext2fs/jfs_compat.h $(top_srcdir)/lib/support/list.h \
  $(top_srcdir)/lib/ext2fs/compiler.h $(top_srcdir)/lib/support/plausible.h \
  $(top_srcdir)/lib/support/quotaio.h $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h $(top_srcdir)/lib/support/devname.h \
@@ -789,7 +789,7 @@ dumpe2fs.o: $(srcdir)/dumpe2fs.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/hashmap.h \
  $(top_srcdir)/lib/ext2fs/bitops.h $(top_srcdir)/lib/e2p/e2p.h \
  $(top_srcdir)/lib/ext2fs/kernel-jbd.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(top_srcdir)/lib/support/devname.h $(top_srcdir)/lib/support/nls-enable.h \
  $(top_srcdir)/lib/support/plausible.h $(top_srcdir)/version.h
 badblocks.o: $(srcdir)/badblocks.c $(top_builddir)/lib/config.h \
@@ -812,7 +812,7 @@ util.o: $(srcdir)/util.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/ext2fs/ext2_err.h \
  $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/hashmap.h \
  $(top_srcdir)/lib/ext2fs/bitops.h $(top_srcdir)/lib/ext2fs/kernel-jbd.h \
- $(top_srcdir)/lib/ext2fs/jfs_compat.h $(top_srcdir)/lib/ext2fs/kernel-list.h \
+ $(top_srcdir)/lib/ext2fs/jfs_compat.h $(top_srcdir)/lib/support/list.h \
  $(top_srcdir)/lib/ext2fs/compiler.h $(top_srcdir)/lib/support/nls-enable.h \
  $(top_srcdir)/lib/support/devname.h $(srcdir)/util.h
 uuidgen.o: $(srcdir)/uuidgen.c $(top_builddir)/lib/config.h \
@@ -907,7 +907,7 @@ journal.o: $(srcdir)/../debugfs/journal.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(top_srcdir)/lib/ext2fs/kernel-jbd.h
 revoke.o: $(srcdir)/../e2fsck/revoke.c $(srcdir)/../e2fsck/jfs_user.h \
  $(top_builddir)/lib/config.h $(top_builddir)/lib/dirpaths.h \
@@ -921,7 +921,7 @@ revoke.o: $(srcdir)/../e2fsck/revoke.c $(srcdir)/../e2fsck/jfs_user.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(top_srcdir)/lib/ext2fs/kernel-jbd.h
 recovery.o: $(srcdir)/../e2fsck/recovery.c $(srcdir)/../e2fsck/jfs_user.h \
  $(top_builddir)/lib/config.h $(top_builddir)/lib/dirpaths.h \
@@ -935,5 +935,5 @@ recovery.o: $(srcdir)/../e2fsck/recovery.c $(srcdir)/../e2fsck/jfs_user.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h \
  $(top_srcdir)/lib/ext2fs/fast_commit.h $(top_srcdir)/lib/ext2fs/jfs_compat.h \
- $(top_srcdir)/lib/ext2fs/kernel-list.h $(top_srcdir)/lib/ext2fs/compiler.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/ext2fs/compiler.h \
  $(top_srcdir)/lib/ext2fs/kernel-jbd.h
diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index 3db57632c88d42..ac440176351e83 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -2857,10 +2857,6 @@ static int expand_inode_table(ext2_filsys fs, unsigned long new_ino_size)
 }
 
 
-#define list_for_each_safe(pos, pnext, head) \
-	for (pos = (head)->next, pnext = pos->next; pos != (head); \
-	     pos = pnext, pnext = pos->next)
-
 static void free_blk_move_list(void)
 {
 	struct list_head *entry, *tmp;
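
The tune2fs.c hunk above drops its private list_for_each_safe() definition, presumably because the kernel-style list.h shared through libsupport now supplies it. Here is a minimal standalone sketch of why free_blk_move_list() needs the "safe" variant. The macro loads the next pointer before the loop body runs, so the body can unlink and free the current entry. The list helpers and the blk_move payload below are simplified stand-ins written for this example. They are not the libsupport API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel-style list in lib/support/list.h. */
struct list_head {
	struct list_head *next, *prev;
};

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same shape as the macro removed from tune2fs.c. */
#define list_for_each_safe(pos, pnext, head) \
	for (pos = (head)->next, pnext = pos->next; pos != (head); \
	     pos = pnext, pnext = pos->next)

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

static void list_unlink(struct list_head *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
}

/* Hypothetical payload, loosely modelled on tune2fs's block move list. */
struct blk_move {
	struct list_head list;
	unsigned long long old_loc, new_loc;
};

/*
 * Free every entry and return how many were freed.  pnext is read before
 * the body frees "pos", so the iterator never touches released memory.
 */
static unsigned int free_move_list(struct list_head *head)
{
	struct list_head *pos, *pnext;
	unsigned int n = 0;

	list_for_each_safe(pos, pnext, head) {
		struct blk_move *bm = list_entry(pos, struct blk_move, list);

		list_unlink(pos);
		free(bm);
		n++;
	}
	/* the head is left pointing at itself, i.e. an empty list */
	assert(head->next == head && head->prev == head);
	return n;
}
```

A plain list_for_each() loop would instead read pos->next after pos had been freed.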



* [PATCH 07/21] libsupport: add a cache
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                     ` (5 preceding siblings ...)
  2025-09-16  0:52   ` [PATCH 06/21] libsupport: port the kernel list.h to libsupport Darrick J. Wong
@ 2025-09-16  0:52   ` Darrick J. Wong
  2025-09-16  0:52   ` [PATCH 08/21] cache: disable debugging Darrick J. Wong
                     ` (13 subsequent siblings)
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:52 UTC
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Reuse the cache code from xfsprogs.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/support/cache.h     |  139 +++++++++
 lib/support/list.h      |    7 
 lib/support/xbitops.h   |  128 ++++++++
 lib/support/Makefile.in |    8 -
 lib/support/cache.c     |  739 +++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 1019 insertions(+), 2 deletions(-)
 create mode 100644 lib/support/cache.h
 create mode 100644 lib/support/xbitops.h
 create mode 100644 lib/support/cache.c
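
cache_node_allocate() refuses a new node once c_count reaches c_maxcount. When that happens, cache_node_get() in cache.c below falls back to cache_shake(). It walks up the MRU priorities and, after trying every level, doubles the limit through cache_expand(). The following is a single-threaded model of that loop, sketched under simplifying assumptions. It ignores lock contention and dirty nodes. model_shake() and model_make_room() are hypothetical names and are not functions in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Constants copied from lib/support/cache.h and cache.c. */
#define CACHE_MAX_PRIORITY	15
#define CACHE_SHAKE_COUNT	64

/*
 * Model of cache_shake(): free up to CACHE_SHAKE_COUNT clean nodes from the
 * MRU list at @priority, report how many were freed, and return the
 * priority to try next.
 */
static unsigned int model_shake(unsigned int mru[], unsigned int priority,
				unsigned int *freed)
{
	unsigned int n;

	assert(priority <= CACHE_MAX_PRIORITY);
	n = mru[priority] < CACHE_SHAKE_COUNT ? mru[priority] : CACHE_SHAKE_COUNT;
	mru[priority] -= n;
	*freed = n;
	/* a full batch means this list may hold more; otherwise move up */
	return n == CACHE_SHAKE_COUNT ? priority : priority + 1;
}

/*
 * Model of the allocation-failure path in cache_node_get().  Returns true
 * if the cache limit had to be doubled.
 */
static bool model_make_room(unsigned int mru[], unsigned int *maxcount)
{
	unsigned int priority = 0;

	for (;;) {
		unsigned int freed;
		bool expanded = false;

		priority = model_shake(mru, priority, &freed);
		if (priority > CACHE_MAX_PRIORITY) {
			/* every level tried: cache_expand() doubles the limit */
			priority = 0;
			*maxcount *= 2;
			expanded = true;
		}
		/* either way, the next cache_node_allocate() now succeeds */
		if (freed || expanded)
			return expanded;
	}
}
```

The original loop behaves the same way at the top of the range: running past CACHE_MAX_PRIORITY triggers an expansion even when that last shake freed a few nodes.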


diff --git a/lib/support/cache.h b/lib/support/cache.h
new file mode 100644
index 00000000000000..16b17a9b7a1a51
--- /dev/null
+++ b/lib/support/cache.h
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2006 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ */
+#ifndef __CACHE_H__
+#define __CACHE_H__
+
+/*
+ * initialisation flags
+ */
+/*
+ * xfs_db always writes changes immediately, and so we need to purge buffers
+ * when we get a buffer lookup mismatch due to reading the same block with a
+ * different buffer configuration.
+ */
+#define CACHE_MISCOMPARE_PURGE	(1 << 0)
+
+/*
+ * cache object compare return values
+ */
+enum {
+	CACHE_HIT,
+	CACHE_MISS,
+	CACHE_PURGE,
+};
+
+#define	HASH_CACHE_RATIO	8
+
+/*
+ * Cache priorities range from BASE to MAX.
+ *
+ * For prefetch support, the top half of the range starts at
+ * CACHE_PREFETCH_PRIORITY and every time the buffer is fetched and is at or
+ * above this priority level, it is reduced to below this level (refer to
+ * libxfs_buf_get).
+ *
+ * If we have dirty nodes, we can't recycle them until they've been cleaned. To
+ * keep these out of the reclaimable lists (as there can be lots of them) give
+ * them their own priority that the shaker doesn't attempt to walk.
+ */
+
+#define CACHE_BASE_PRIORITY	0
+#define CACHE_PREFETCH_PRIORITY	8
+#define CACHE_MAX_PRIORITY	15
+#define CACHE_DIRTY_PRIORITY	(CACHE_MAX_PRIORITY + 1)
+#define CACHE_NR_PRIORITIES	CACHE_DIRTY_PRIORITY
+
+/*
+ * Simple, generic implementation of a cache (arbitrary data).
+ * Provides a hash table with a capped number of cache entries.
+ */
+
+struct cache;
+struct cache_node;
+
+typedef void *cache_key_t;
+
+typedef void (*cache_walk_t)(struct cache_node *);
+typedef struct cache_node * (*cache_node_alloc_t)(cache_key_t);
+typedef int (*cache_node_flush_t)(struct cache_node *);
+typedef void (*cache_node_relse_t)(struct cache_node *);
+typedef unsigned int (*cache_node_hash_t)(cache_key_t, unsigned int,
+					  unsigned int);
+typedef int (*cache_node_compare_t)(struct cache_node *, cache_key_t);
+typedef unsigned int (*cache_bulk_relse_t)(struct cache *, struct list_head *);
+typedef int (*cache_node_get_t)(struct cache_node *);
+typedef void (*cache_node_put_t)(struct cache_node *);
+
+struct cache_operations {
+	cache_node_hash_t	hash;
+	cache_node_alloc_t	alloc;
+	cache_node_flush_t	flush;
+	cache_node_relse_t	relse;
+	cache_node_compare_t	compare;
+	cache_bulk_relse_t	bulkrelse;	/* optional */
+	cache_node_get_t	get;		/* optional */
+	cache_node_put_t	put;		/* optional */
+};
+
+struct cache_hash {
+	struct list_head	ch_list;	/* hash chain head */
+	unsigned int		ch_count;	/* hash chain length */
+	pthread_mutex_t		ch_mutex;	/* hash chain mutex */
+};
+
+struct cache_mru {
+	struct list_head	cm_list;	/* MRU head */
+	unsigned int		cm_count;	/* MRU length */
+	pthread_mutex_t		cm_mutex;	/* MRU lock */
+};
+
+struct cache_node {
+	struct list_head	cn_hash;	/* hash chain */
+	struct list_head	cn_mru;		/* MRU chain */
+	unsigned int		cn_count;	/* reference count */
+	unsigned int		cn_hashidx;	/* hash chain index */
+	int			cn_priority;	/* priority, -1 = free list */
+	int			cn_old_priority;/* saved pre-dirty prio */
+	pthread_mutex_t		cn_mutex;	/* node mutex */
+};
+
+struct cache {
+	int			c_flags;	/* behavioural flags */
+	unsigned int		c_maxcount;	/* max cache nodes */
+	unsigned int		c_count;	/* count of nodes */
+	pthread_mutex_t		c_mutex;	/* node count mutex */
+	cache_node_hash_t	hash;		/* node hash function */
+	cache_node_alloc_t	alloc;		/* allocation function */
+	cache_node_flush_t	flush;		/* flush dirty data function */
+	cache_node_relse_t	relse;		/* memory free function */
+	cache_node_compare_t	compare;	/* comparison routine */
+	cache_bulk_relse_t	bulkrelse;	/* bulk release routine */
+	cache_node_get_t	get;		/* prepare cache node after get */
+	cache_node_put_t	put;		/* prepare to put cache node */
+	unsigned int		c_hashsize;	/* hash bucket count */
+	unsigned int		c_hashshift;	/* hash key shift */
+	struct cache_hash	*c_hash;	/* hash table buckets */
+	struct cache_mru	c_mrus[CACHE_DIRTY_PRIORITY + 1];
+	unsigned long long	c_misses;	/* cache misses */
+	unsigned long long	c_hits;		/* cache hits */
+	unsigned int 		c_max;		/* max nodes ever used */
+};
+
+struct cache *cache_init(int, unsigned int, const struct cache_operations *);
+void cache_destroy(struct cache *);
+void cache_walk(struct cache *, cache_walk_t);
+void cache_purge(struct cache *);
+void cache_flush(struct cache *);
+
+int cache_node_get(struct cache *, cache_key_t, struct cache_node **);
+void cache_node_put(struct cache *, struct cache_node *);
+void cache_node_set_priority(struct cache *, struct cache_node *, int);
+int cache_node_get_priority(struct cache_node *);
+int cache_node_purge(struct cache *, cache_key_t, struct cache_node *);
+void cache_report(FILE *fp, const char *, struct cache *);
+int cache_overflowed(struct cache *);
+
+#endif	/* __CACHE_H__ */
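
cache.h leaves the type of the cached object up to the caller. A client embeds a struct cache_node in its own structure and supplies callbacks through struct cache_operations. A hypothetical client that caches disk blocks might look like the sketch below. blk_key, blk_buf and the blk_* callbacks are invented for illustration. The cache.h types are re-declared as trimmed stand-ins so that the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Trimmed stand-ins for lib/support/cache.h so this compiles alone. */
typedef void *cache_key_t;
struct cache_node { int placeholder; };
enum { CACHE_HIT, CACHE_MISS, CACHE_PURGE };

/* Hypothetical key and cached object: a run of disk blocks. */
struct blk_key {
	uint64_t	blkno;
	unsigned int	len;
};

struct blk_buf {
	struct cache_node	node;	/* embedded cache bookkeeping */
	uint64_t		blkno;
	unsigned int		len;
};

#define node_to_buf(n) \
	((struct blk_buf *)((char *)(n) - offsetof(struct blk_buf, node)))

/* cache_node_hash_t: must return a bucket index below hashsize. */
static unsigned int blk_hash(cache_key_t key, unsigned int hashsize,
			     unsigned int hashshift)
{
	const struct blk_key *k = key;
	uint64_t h;

	assert(hashsize > 0);
	h = k->blkno ^ (k->blkno >> hashshift);
	return (unsigned int)(h % hashsize);
}

/* cache_node_compare_t: HIT, MISS, or PURGE for a stale lookalike. */
static int blk_compare(struct cache_node *n, cache_key_t key)
{
	const struct blk_key *k = key;
	const struct blk_buf *b = node_to_buf(n);

	if (b->blkno != k->blkno)
		return CACHE_MISS;
	if (b->len != k->len)
		return CACHE_PURGE;	/* same block, different size */
	return CACHE_HIT;
}

/* cache_node_alloc_t: build a new object for a key that missed. */
static struct cache_node *blk_alloc(cache_key_t key)
{
	const struct blk_key *k = key;
	struct blk_buf *b = calloc(1, sizeof(*b));

	if (!b)
		return NULL;
	b->blkno = k->blkno;
	b->len = k->len;
	return &b->node;
}

/* cache_node_relse_t: free the object once the cache drops it. */
static void blk_relse(struct cache_node *n)
{
	free(node_to_buf(n));
}
```

Callbacks like these would be gathered into a struct cache_operations and passed to cache_init() together with the hash table size. Judging by cache_node_get() in cache.c, a CACHE_PURGE result evicts the stale node only when the cache was created with CACHE_MISCOMPARE_PURGE. Otherwise the lookup treats it like a miss.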
diff --git a/lib/support/list.h b/lib/support/list.h
index df6c99708e4a8e..0e00e446dd7214 100644
--- a/lib/support/list.h
+++ b/lib/support/list.h
@@ -17,6 +17,13 @@ struct list_head {
 	((type *)((char *)(ptr) - offsetof(type, member)))
 #endif
 
+static inline void list_head_destroy(struct list_head *list)
+{
+	list->next = list->prev = NULL;
+}
+
+#define list_head_init(list) INIT_LIST_HEAD(list)
+
 /*
  * Circular doubly linked list implementation.
  *
diff --git a/lib/support/xbitops.h b/lib/support/xbitops.h
new file mode 100644
index 00000000000000..78a8f2a8545f4c
--- /dev/null
+++ b/lib/support/xbitops.h
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef __BITOPS_H__
+#define __BITOPS_H__
+
+/*
+ * fls: find last bit set.
+ */
+
+static inline int fls(int x)
+{
+	int r = 32;
+
+	if (!x)
+		return 0;
+	if (!(x & 0xffff0000u)) {
+		x = (x & 0xffffu) << 16;
+		r -= 16;
+	}
+	if (!(x & 0xff000000u)) {
+		x = (x & 0xffffffu) << 8;
+		r -= 8;
+	}
+	if (!(x & 0xf0000000u)) {
+		x = (x & 0xfffffffu) << 4;
+		r -= 4;
+	}
+	if (!(x & 0xc0000000u)) {
+		x = (x & 0x3fffffffu) << 2;
+		r -= 2;
+	}
+	if (!(x & 0x80000000u)) {
+		r -= 1;
+	}
+	return r;
+}
+
+static inline int fls64(uint64_t x)
+{
+	uint32_t h = x >> 32;
+	if (h)
+		return fls(h) + 32;
+	return fls(x);
+}
+
+static inline unsigned fls_long(unsigned long l)
+{
+        if (sizeof(l) == 4)
+                return fls(l);
+        return fls64(l);
+}
+
+/*
+ * ffz: find first zero bit.
+ * Result is undefined if no zero bit exists.
+ */
+#define ffz(x)	ffs(~(x))
+
+/*
+ * XFS bit manipulation routines.  Repeated here so that some programs
+ * don't have to link in all of libxfs just to have bit manipulation.
+ */
+
+/*
+ * masks with n high/low bits set, 64-bit values
+ */
+static inline uint64_t mask64hi(int n)
+{
+	return (uint64_t)-1 << (64 - (n));
+}
+static inline uint32_t mask32lo(int n)
+{
+	return ((uint32_t)1 << (n)) - 1;
+}
+static inline uint64_t mask64lo(int n)
+{
+	return ((uint64_t)1 << (n)) - 1;
+}
+
+/* Get high bit set out of 32-bit argument, -1 if none set */
+static inline int highbit32(uint32_t v)
+{
+	return fls(v) - 1;
+}
+
+/* Get high bit set out of 64-bit argument, -1 if none set */
+static inline int highbit64(uint64_t v)
+{
+	return fls64(v) - 1;
+}
+
+/* Get low bit set out of 32-bit argument, -1 if none set */
+static inline int lowbit32(uint32_t v)
+{
+	return ffs(v) - 1;
+}
+
+/* Get low bit set out of 64-bit argument, -1 if none set */
+static inline int lowbit64(uint64_t v)
+{
+	uint32_t	w = (uint32_t)v;
+	int		n = 0;
+
+	if (w) {	/* lower bits */
+		n = ffs(w);
+	} else {	/* upper bits */
+		w = (uint32_t)(v >> 32);
+		if (w) {
+			n = ffs(w);
+			if (n)
+				n += 32;
+		}
+	}
+	return n - 1;
+}
+
+/**
+ * __rounddown_pow_of_two() - round down to nearest power of two
+ * @n: value to round down
+ */
+static inline __attribute__((const))
+unsigned long __rounddown_pow_of_two(unsigned long n)
+{
+	return 1UL << (fls_long(n) - 1);
+}
+
+#define rounddown_pow_of_two(n) __rounddown_pow_of_two(n)
+
+#endif
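
cache_init() derives c_hashshift as fls(hashsize) - 1. fls() returns a 1-based bit index. lowbit64() and highbit64() return 0-based indices. The sketch below gives naive loop-based references that can cross-check the unrolled helpers in xbitops.h. The ref_* names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based index of the most significant set bit, 0 if none (like fls). */
static int ref_fls64(uint64_t x)
{
	int r = 0;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

/* Mirrors cache_init(): c_hashshift = fls(hashsize) - 1 */
static unsigned int ref_hashshift(unsigned int hashsize)
{
	assert(hashsize > 0);
	return (unsigned int)(ref_fls64(hashsize) - 1);
}

/* Largest power of two <= n, as rounddown_pow_of_two(); n must be nonzero. */
static uint64_t ref_rounddown_pow_of_two(uint64_t n)
{
	assert(n != 0);
	return (uint64_t)1 << (ref_fls64(n) - 1);
}

/* Mirrors lowbit64(): 0-based index of the lowest set bit, -1 if none. */
static int ref_lowbit64(uint64_t v)
{
	int i;

	for (i = 0; i < 64; i++)
		if (v & ((uint64_t)1 << i))
			return i;
	return -1;
}
```

If the hash size is not a power of two, such as 100, the hash callback receives a shift of 6, which is floor(log2(hashsize)).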
diff --git a/lib/support/Makefile.in b/lib/support/Makefile.in
index 3f26cd30172f51..13d6f06f150afd 100644
--- a/lib/support/Makefile.in
+++ b/lib/support/Makefile.in
@@ -25,7 +25,8 @@ OBJS=		cstring.o \
 		quotaio_v2.o \
 		quotaio_tree.o \
 		dict.o \
-		devname.o
+		devname.o \
+		cache.o
 
 SRCS=		$(srcdir)/argv_parse.c \
 		$(srcdir)/cstring.c \
@@ -40,7 +41,8 @@ SRCS=		$(srcdir)/argv_parse.c \
 		$(srcdir)/quotaio_tree.c \
 		$(srcdir)/quotaio_v2.c \
 		$(srcdir)/dict.c \
-		$(srcdir)/devname.c
+		$(srcdir)/devname.c \
+		$(srcdir)/cache.c
 
 LIBRARY= libsupport
 LIBDIR= support
@@ -183,3 +185,5 @@ dict.o: $(srcdir)/dict.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/dict.h
 devname.o: $(srcdir)/devname.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/devname.h $(srcdir)/nls-enable.h
+cache.o: $(srcdir)/cache.c $(top_builddir)/lib/config.h \
+ $(srcdir)/cache.h $(srcdir)/list.h $(srcdir)/xbitops.h
diff --git a/lib/support/cache.c b/lib/support/cache.c
new file mode 100644
index 00000000000000..fe04f62f262aaa
--- /dev/null
+++ b/lib/support/cache.c
@@ -0,0 +1,739 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2006 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <pthread.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+
+#include "list.h"
+#include "cache.h"
+#include "xbitops.h"
+
+#define CACHE_DEBUG 1
+#undef CACHE_DEBUG
+#define CACHE_DEBUG 1
+#undef CACHE_ABORT
+/* #define CACHE_ABORT 1 */
+
+#define CACHE_SHAKE_COUNT	64
+
+#ifdef CACHE_DEBUG
+# include <assert.h>
+# define ASSERT(x)		assert(x)
+#endif
+
+static unsigned int cache_generic_bulkrelse(struct cache *, struct list_head *);
+
+struct cache *
+cache_init(
+	int			flags,
+	unsigned int		hashsize,
+	const struct cache_operations	*cache_operations)
+{
+	struct cache *		cache;
+	unsigned int		i, maxcount;
+
+	maxcount = hashsize * HASH_CACHE_RATIO;
+
+	if (!(cache = malloc(sizeof(struct cache))))
+		return NULL;
+	if (!(cache->c_hash = calloc(hashsize, sizeof(struct cache_hash)))) {
+		free(cache);
+		return NULL;
+	}
+
+	cache->c_flags = flags;
+	cache->c_count = 0;
+	cache->c_max = 0;
+	cache->c_hits = 0;
+	cache->c_misses = 0;
+	cache->c_maxcount = maxcount;
+	cache->c_hashsize = hashsize;
+	cache->c_hashshift = fls(hashsize) - 1;
+	cache->hash = cache_operations->hash;
+	cache->alloc = cache_operations->alloc;
+	cache->flush = cache_operations->flush;
+	cache->relse = cache_operations->relse;
+	cache->compare = cache_operations->compare;
+	cache->bulkrelse = cache_operations->bulkrelse ?
+		cache_operations->bulkrelse : cache_generic_bulkrelse;
+	cache->get = cache_operations->get;
+	cache->put = cache_operations->put;
+	pthread_mutex_init(&cache->c_mutex, NULL);
+
+	for (i = 0; i < hashsize; i++) {
+		list_head_init(&cache->c_hash[i].ch_list);
+		cache->c_hash[i].ch_count = 0;
+		pthread_mutex_init(&cache->c_hash[i].ch_mutex, NULL);
+	}
+
+	for (i = 0; i <= CACHE_DIRTY_PRIORITY; i++) {
+		list_head_init(&cache->c_mrus[i].cm_list);
+		cache->c_mrus[i].cm_count = 0;
+		pthread_mutex_init(&cache->c_mrus[i].cm_mutex, NULL);
+	}
+	return cache;
+}
+
+static void
+cache_expand(
+	struct cache *		cache)
+{
+	pthread_mutex_lock(&cache->c_mutex);
+#ifdef CACHE_DEBUG
+	fprintf(stderr, "doubling cache size to %u\n", 2 * cache->c_maxcount);
+#endif
+	cache->c_maxcount *= 2;
+	pthread_mutex_unlock(&cache->c_mutex);
+}
+
+void
+cache_walk(
+	struct cache *		cache,
+	cache_walk_t		visit)
+{
+	struct cache_hash *	hash;
+	struct list_head *	head;
+	struct list_head *	pos;
+	unsigned int		i;
+
+	for (i = 0; i < cache->c_hashsize; i++) {
+		hash = &cache->c_hash[i];
+		head = &hash->ch_list;
+		pthread_mutex_lock(&hash->ch_mutex);
+		for (pos = head->next; pos != head; pos = pos->next)
+			visit((struct cache_node *)pos);
+		pthread_mutex_unlock(&hash->ch_mutex);
+	}
+}
+
+#ifdef CACHE_ABORT
+#define cache_abort()	abort()
+#else
+#define cache_abort()	do { } while (0)
+#endif
+
+#ifdef CACHE_DEBUG
+static void
+cache_zero_check(
+	struct cache_node *	node)
+{
+	if (node->cn_count > 0) {
+		fprintf(stderr, "%s: refcount is %u, not zero (node=%p)\n",
+			__FUNCTION__, node->cn_count, node);
+		cache_abort();
+	}
+}
+#define cache_destroy_check(c)	cache_walk((c), cache_zero_check)
+#else
+#define cache_destroy_check(c)	do { } while (0)
+#endif
+
+void
+cache_destroy(
+	struct cache *		cache)
+{
+	unsigned int		i;
+
+	cache_destroy_check(cache);
+	for (i = 0; i < cache->c_hashsize; i++) {
+		list_head_destroy(&cache->c_hash[i].ch_list);
+		pthread_mutex_destroy(&cache->c_hash[i].ch_mutex);
+	}
+	for (i = 0; i <= CACHE_DIRTY_PRIORITY; i++) {
+		list_head_destroy(&cache->c_mrus[i].cm_list);
+		pthread_mutex_destroy(&cache->c_mrus[i].cm_mutex);
+	}
+	pthread_mutex_destroy(&cache->c_mutex);
+	free(cache->c_hash);
+	free(cache);
+}
+
+static unsigned int
+cache_generic_bulkrelse(
+	struct cache *		cache,
+	struct list_head *	list)
+{
+	struct cache_node *	node;
+	unsigned int		count = 0;
+
+	while (!list_empty(list)) {
+		node = list_entry(list->next, struct cache_node, cn_mru);
+		pthread_mutex_destroy(&node->cn_mutex);
+		list_del_init(&node->cn_mru);
+		cache->relse(node);
+		count++;
+	}
+
+	return count;
+}
+
+/*
+ * Park unflushable nodes on their own special MRU so that cache_shake() doesn't
+ * end up repeatedly scanning them in the futile attempt to clean them before
+ * reclaim.
+ */
+static void
+cache_add_to_dirty_mru(
+	struct cache		*cache,
+	struct cache_node	*node)
+{
+	struct cache_mru	*mru = &cache->c_mrus[CACHE_DIRTY_PRIORITY];
+
+	pthread_mutex_lock(&mru->cm_mutex);
+	node->cn_old_priority = node->cn_priority;
+	node->cn_priority = CACHE_DIRTY_PRIORITY;
+	list_add(&node->cn_mru, &mru->cm_list);
+	mru->cm_count++;
+	pthread_mutex_unlock(&mru->cm_mutex);
+}
+
+/*
+ * We've hit the limit on cache size, so we need to start reclaiming nodes we've
+ * used. The MRU specified by the priority is shaken.  Returns new priority at
+ * end of the call (in case we call again). We are not allowed to reclaim dirty
+ * objects, so we have to flush them first. If flushing fails, we move them to
+ * the "dirty, unreclaimable" list.
+ *
+ * Hence we skip priorities > CACHE_MAX_PRIORITY unless "purge" is set as we
+ * park unflushable (and hence unreclaimable) buffers at these priorities.
+ * Trying to shake unreclaimable buffer lists when there is memory pressure is a
+ * waste of time and CPU and greatly slows down cache node recycling operations.
+ * Hence we only try to free them if we are being asked to purge the cache of
+ * all entries.
+ */
+static unsigned int
+cache_shake(
+	struct cache *		cache,
+	unsigned int		priority,
+	bool			purge)
+{
+	struct cache_mru	*mru;
+	struct cache_hash *	hash;
+	struct list_head	temp;
+	struct list_head *	head;
+	struct list_head *	pos;
+	struct list_head *	n;
+	struct cache_node *	node;
+	unsigned int		count;
+
+	ASSERT(priority <= CACHE_DIRTY_PRIORITY);
+	if (priority > CACHE_MAX_PRIORITY && !purge)
+		priority = 0;
+
+	mru = &cache->c_mrus[priority];
+	count = 0;
+	list_head_init(&temp);
+	head = &mru->cm_list;
+
+	pthread_mutex_lock(&mru->cm_mutex);
+	for (pos = head->prev, n = pos->prev; pos != head;
+						pos = n, n = pos->prev) {
+		node = list_entry(pos, struct cache_node, cn_mru);
+
+		if (pthread_mutex_trylock(&node->cn_mutex) != 0)
+			continue;
+
+		/* memory pressure is not allowed to release dirty objects */
+		if (cache->flush(node) && !purge) {
+			list_del(&node->cn_mru);
+			mru->cm_count--;
+			node->cn_priority = -1;
+			pthread_mutex_unlock(&node->cn_mutex);
+			cache_add_to_dirty_mru(cache, node);
+			continue;
+		}
+
+		hash = cache->c_hash + node->cn_hashidx;
+		if (pthread_mutex_trylock(&hash->ch_mutex) != 0) {
+			pthread_mutex_unlock(&node->cn_mutex);
+			continue;
+		}
+		ASSERT(node->cn_count == 0);
+		ASSERT(node->cn_priority == priority);
+		node->cn_priority = -1;
+
+		list_move(&node->cn_mru, &temp);
+		list_del_init(&node->cn_hash);
+		hash->ch_count--;
+		mru->cm_count--;
+		pthread_mutex_unlock(&hash->ch_mutex);
+		pthread_mutex_unlock(&node->cn_mutex);
+
+		count++;
+		if (!purge && count == CACHE_SHAKE_COUNT)
+			break;
+	}
+	pthread_mutex_unlock(&mru->cm_mutex);
+
+	if (count > 0) {
+		cache->bulkrelse(cache, &temp);
+
+		pthread_mutex_lock(&cache->c_mutex);
+		cache->c_count -= count;
+		pthread_mutex_unlock(&cache->c_mutex);
+	}
+
+	return (count == CACHE_SHAKE_COUNT) ? priority : ++priority;
+}
+
+/*
+ * Allocate a new hash node (bumping the c_count node counter under c_mutex),
+ * unless doing so will push us over the maximum cache size.
+ */
+static struct cache_node *
+cache_node_allocate(
+	struct cache *		cache,
+	cache_key_t		key)
+{
+	unsigned int		nodesfree;
+	struct cache_node *	node;
+
+	pthread_mutex_lock(&cache->c_mutex);
+	nodesfree = (cache->c_count < cache->c_maxcount);
+	if (nodesfree) {
+		cache->c_count++;
+		if (cache->c_count > cache->c_max)
+			cache->c_max = cache->c_count;
+	}
+	cache->c_misses++;
+	pthread_mutex_unlock(&cache->c_mutex);
+	if (!nodesfree)
+		return NULL;
+	node = cache->alloc(key);
+	if (node == NULL) {	/* uh-oh */
+		pthread_mutex_lock(&cache->c_mutex);
+		cache->c_count--;
+		pthread_mutex_unlock(&cache->c_mutex);
+		return NULL;
+	}
+	pthread_mutex_init(&node->cn_mutex, NULL);
+	list_head_init(&node->cn_mru);
+	node->cn_count = 1;
+	node->cn_priority = 0;
+	node->cn_old_priority = -1;
+	return node;
+}
+
+int
+cache_overflowed(
+	struct cache *		cache)
+{
+	return cache->c_maxcount == cache->c_max;
+}
+
+
+static int
+__cache_node_purge(
+	struct cache *		cache,
+	struct cache_node *	node)
+{
+	int			count;
+	struct cache_mru *	mru;
+
+	pthread_mutex_lock(&node->cn_mutex);
+	count = node->cn_count;
+	if (count != 0) {
+		pthread_mutex_unlock(&node->cn_mutex);
+		return count;
+	}
+
+	/* can't purge dirty objects */
+	if (cache->flush(node)) {
+		pthread_mutex_unlock(&node->cn_mutex);
+		return 1;
+	}
+
+	mru = &cache->c_mrus[node->cn_priority];
+	pthread_mutex_lock(&mru->cm_mutex);
+	list_del_init(&node->cn_mru);
+	mru->cm_count--;
+	pthread_mutex_unlock(&mru->cm_mutex);
+
+	pthread_mutex_unlock(&node->cn_mutex);
+	pthread_mutex_destroy(&node->cn_mutex);
+	list_del_init(&node->cn_hash);
+	cache->relse(node);
+	return 0;
+}
+
+/*
+ * Lookup in the cache hash table.  With any luck we'll get a cache
+ * hit, in which case this will all be over quickly and painlessly.
+ * Otherwise, we allocate a new node, taking care not to expand the
+ * cache beyond the requested maximum size (shrink it if it would).
+ * Returns zero on a cache hit, or one if a new node had to be allocated.
+ * Either way, a node is _always_ returned.
+ */
+int
+cache_node_get(
+	struct cache *		cache,
+	cache_key_t		key,
+	struct cache_node **	nodep)
+{
+	struct cache_node *	node = NULL;
+	struct cache_hash *	hash;
+	struct cache_mru *	mru;
+	struct list_head *	head;
+	struct list_head *	pos;
+	struct list_head *	n;
+	unsigned int		hashidx;
+	int			priority = 0;
+	int			purged = 0;
+
+	hashidx = cache->hash(key, cache->c_hashsize, cache->c_hashshift);
+	hash = cache->c_hash + hashidx;
+	head = &hash->ch_list;
+
+	for (;;) {
+		pthread_mutex_lock(&hash->ch_mutex);
+		for (pos = head->next, n = pos->next; pos != head;
+						pos = n, n = pos->next) {
+			int result;
+
+			node = list_entry(pos, struct cache_node, cn_hash);
+			result = cache->compare(node, key);
+			switch (result) {
+			case CACHE_HIT:
+				break;
+			case CACHE_PURGE:
+				if ((cache->c_flags & CACHE_MISCOMPARE_PURGE) &&
+				    !__cache_node_purge(cache, node)) {
+					purged++;
+					hash->ch_count--;
+				}
+				/* FALL THROUGH */
+			case CACHE_MISS:
+				goto next_object;
+			}
+
+			/*
+			 * node found, bump node's reference count, remove it
+			 * from its MRU list, and update stats.
+			 */
+			pthread_mutex_lock(&node->cn_mutex);
+
+			if (node->cn_count == 0 && cache->get) {
+				int err = cache->get(node);
+				if (err) {
+					pthread_mutex_unlock(&node->cn_mutex);
+					goto next_object;
+				}
+			}
+			if (node->cn_count == 0) {
+				ASSERT(node->cn_priority >= 0);
+				ASSERT(!list_empty(&node->cn_mru));
+				mru = &cache->c_mrus[node->cn_priority];
+				pthread_mutex_lock(&mru->cm_mutex);
+				mru->cm_count--;
+				list_del_init(&node->cn_mru);
+				pthread_mutex_unlock(&mru->cm_mutex);
+				if (node->cn_old_priority != -1) {
+					ASSERT(node->cn_priority ==
+							CACHE_DIRTY_PRIORITY);
+					node->cn_priority = node->cn_old_priority;
+					node->cn_old_priority = -1;
+				}
+			}
+			node->cn_count++;
+
+			pthread_mutex_unlock(&node->cn_mutex);
+			pthread_mutex_unlock(&hash->ch_mutex);
+
+			pthread_mutex_lock(&cache->c_mutex);
+			cache->c_hits++;
+			pthread_mutex_unlock(&cache->c_mutex);
+
+			*nodep = node;
+			return 0;
+next_object:
+			continue;	/* what the hell, gcc? */
+		}
+		pthread_mutex_unlock(&hash->ch_mutex);
+		/*
+		 * not found, allocate a new entry
+		 */
+		node = cache_node_allocate(cache, key);
+		if (node)
+			break;
+		priority = cache_shake(cache, priority, false);
+		/*
+		 * We start at 0; if we free CACHE_SHAKE_COUNT we get
+		 * back the same priority, if not we get back priority+1.
+		 * If we exceed CACHE_MAX_PRIORITY all slots are full; grow it.
+		 */
+		if (priority > CACHE_MAX_PRIORITY) {
+			priority = 0;
+			cache_expand(cache);
+		}
+	}
+
+	node->cn_hashidx = hashidx;
+
+	/* add new node to appropriate hash */
+	pthread_mutex_lock(&hash->ch_mutex);
+	hash->ch_count++;
+	list_add(&node->cn_hash, &hash->ch_list);
+	pthread_mutex_unlock(&hash->ch_mutex);
+
+	if (purged) {
+		pthread_mutex_lock(&cache->c_mutex);
+		cache->c_count -= purged;
+		pthread_mutex_unlock(&cache->c_mutex);
+	}
+
+	*nodep = node;
+	return 1;
+}
+
+void
+cache_node_put(
+	struct cache *		cache,
+	struct cache_node *	node)
+{
+	struct cache_mru *	mru;
+
+	pthread_mutex_lock(&node->cn_mutex);
+#ifdef CACHE_DEBUG
+	if (node->cn_count < 1) {
+		fprintf(stderr, "%s: node put on refcount %u (node=%p)\n",
+				__FUNCTION__, node->cn_count, node);
+		cache_abort();
+	}
+	if (!list_empty(&node->cn_mru)) {
+		fprintf(stderr, "%s: node put on node (%p) in MRU list\n",
+				__FUNCTION__, node);
+		cache_abort();
+	}
+#endif
+	node->cn_count--;
+
+	if (node->cn_count == 0 && cache->put)
+		cache->put(node);
+	if (node->cn_count == 0) {
+		/* add unreferenced node to appropriate MRU for shaker */
+		mru = &cache->c_mrus[node->cn_priority];
+		pthread_mutex_lock(&mru->cm_mutex);
+		mru->cm_count++;
+		list_add(&node->cn_mru, &mru->cm_list);
+		pthread_mutex_unlock(&mru->cm_mutex);
+	}
+
+	pthread_mutex_unlock(&node->cn_mutex);
+}
+
+void
+cache_node_set_priority(
+	struct cache *		cache,
+	struct cache_node *	node,
+	int			priority)
+{
+	if (priority < 0)
+		priority = 0;
+	else if (priority > CACHE_MAX_PRIORITY)
+		priority = CACHE_MAX_PRIORITY;
+
+	pthread_mutex_lock(&node->cn_mutex);
+	ASSERT(node->cn_count > 0);
+	node->cn_priority = priority;
+	node->cn_old_priority = -1;
+	pthread_mutex_unlock(&node->cn_mutex);
+}
+
+int
+cache_node_get_priority(
+	struct cache_node *	node)
+{
+	int			priority;
+
+	pthread_mutex_lock(&node->cn_mutex);
+	priority = node->cn_priority;
+	pthread_mutex_unlock(&node->cn_mutex);
+
+	return priority;
+}
+
+
+/*
+ * Purge a specific node from the cache.  Reference count must be zero.
+ */
+int
+cache_node_purge(
+	struct cache *		cache,
+	cache_key_t		key,
+	struct cache_node *	node)
+{
+	struct list_head *	head;
+	struct list_head *	pos;
+	struct list_head *	n;
+	struct cache_hash *	hash;
+	int			count = -1;
+
+	hash = cache->c_hash + cache->hash(key, cache->c_hashsize,
+					   cache->c_hashshift);
+	head = &hash->ch_list;
+	pthread_mutex_lock(&hash->ch_mutex);
+	for (pos = head->next, n = pos->next; pos != head;
+						pos = n, n = pos->next) {
+		if ((struct cache_node *)pos != node)
+			continue;
+
+		count = __cache_node_purge(cache, node);
+		if (!count)
+			hash->ch_count--;
+		break;
+	}
+	pthread_mutex_unlock(&hash->ch_mutex);
+
+	if (count == 0) {
+		pthread_mutex_lock(&cache->c_mutex);
+		cache->c_count--;
+		pthread_mutex_unlock(&cache->c_mutex);
+	}
+#ifdef CACHE_DEBUG
+	if (count >= 1) {
+		fprintf(stderr, "%s: refcount was %d, not zero (node=%p)\n",
+				__FUNCTION__, count, node);
+		cache_abort();
+	}
+	if (count == -1) {
+		fprintf(stderr, "%s: purge node not found! (node=%p)\n",
+			__FUNCTION__, node);
+		cache_abort();
+	}
+#endif
+	return count == 0;
+}
+
+/*
+ * Purge all nodes from the cache.  All reference counts must be zero.
+ */
+void
+cache_purge(
+	struct cache *		cache)
+{
+	int			i;
+
+	for (i = 0; i <= CACHE_DIRTY_PRIORITY; i++)
+		cache_shake(cache, i, true);
+
+#ifdef CACHE_DEBUG
+	if (cache->c_count != 0) {
+		/* flush referenced nodes to disk */
+		cache_flush(cache);
+		fprintf(stderr, "%s: shake on cache %p left %u nodes!?\n",
+				__FUNCTION__, cache, cache->c_count);
+		cache_abort();
+	}
+#endif
+}
+
+/*
+ * Flush all nodes in the cache to disk.
+ */
+void
+cache_flush(
+	struct cache *		cache)
+{
+	struct cache_hash *	hash;
+	struct list_head *	head;
+	struct list_head *	pos;
+	struct cache_node *	node;
+	int			i;
+
+	if (!cache->flush)
+		return;
+
+	for (i = 0; i < cache->c_hashsize; i++) {
+		hash = &cache->c_hash[i];
+
+		pthread_mutex_lock(&hash->ch_mutex);
+		head = &hash->ch_list;
+		for (pos = head->next; pos != head; pos = pos->next) {
+			node = (struct cache_node *)pos;
+			pthread_mutex_lock(&node->cn_mutex);
+			cache->flush(node);
+			pthread_mutex_unlock(&node->cn_mutex);
+		}
+		pthread_mutex_unlock(&hash->ch_mutex);
+	}
+}
+
+#define	HASH_REPORT	(3 * HASH_CACHE_RATIO)
+void
+cache_report(
+	FILE		*fp,
+	const char	*name,
+	struct cache	*cache)
+{
+	int		i;
+	unsigned long	count, index, total;
+	unsigned long	hash_bucket_lengths[HASH_REPORT + 2];
+
+	if ((cache->c_hits + cache->c_misses) == 0)
+		return;
+
+	/* report cache summary */
+	fprintf(fp, "%s: %p\n"
+			"Max supported entries = %u\n"
+			"Max utilized entries = %u\n"
+			"Active entries = %u\n"
+			"Hash table size = %u\n"
+			"Hits = %llu\n"
+			"Misses = %llu\n"
+			"Hit ratio = %5.2f\n",
+			name, cache,
+			cache->c_maxcount,
+			cache->c_max,
+			cache->c_count,
+			cache->c_hashsize,
+			cache->c_hits,
+			cache->c_misses,
+			(double)cache->c_hits * 100 /
+				(cache->c_hits + cache->c_misses)
+	);
+
+	for (i = 0; i <= CACHE_MAX_PRIORITY; i++)
+		fprintf(fp, "MRU %d entries = %6u (%3u%%)\n",
+			i, cache->c_mrus[i].cm_count,
+			cache->c_mrus[i].cm_count * 100 / cache->c_count);
+
+	i = CACHE_DIRTY_PRIORITY;
+	fprintf(fp, "Dirty MRU %d entries = %6u (%3u%%)\n",
+		i, cache->c_mrus[i].cm_count,
+		cache->c_mrus[i].cm_count * 100 / cache->c_count);
+
+	/* report hash bucket lengths */
+	bzero(hash_bucket_lengths, sizeof(hash_bucket_lengths));
+
+	for (i = 0; i < cache->c_hashsize; i++) {
+		count = cache->c_hash[i].ch_count;
+		if (count > HASH_REPORT)
+			index = HASH_REPORT + 1;
+		else
+			index = count;
+		hash_bucket_lengths[index]++;
+	}
+
+	total = 0;
+	for (i = 0; i < HASH_REPORT + 1; i++) {
+		total += i * hash_bucket_lengths[i];
+		if (hash_bucket_lengths[i] == 0)
+			continue;
+		fprintf(fp, "Hash buckets with  %2d entries %6ld (%3ld%%)\n",
+			i, hash_bucket_lengths[i],
+			(i * hash_bucket_lengths[i] * 100) / cache->c_count);
+	}
+	if (hash_bucket_lengths[i])	/* last report bucket is the overflow bucket */
+		fprintf(fp, "Hash buckets with >%2d entries %6ld (%3ld%%)\n",
+			i - 1, hash_bucket_lengths[i],
+			((cache->c_count - total) * 100) / cache->c_count);
+}


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 08/21] cache: disable debugging
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                     ` (6 preceding siblings ...)
  2025-09-16  0:52   ` [PATCH 07/21] libsupport: add a cache Darrick J. Wong
@ 2025-09-16  0:52   ` Darrick J. Wong
  2025-09-16  0:53   ` [PATCH 09/21] cache: use modern list iterator macros Darrick J. Wong
                     ` (12 subsequent siblings)
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:52 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

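Not sure why debugging is turned on by default in the xfsprogs cache
code, but let's turn it off.  Since ASSERT() was only defined when
CACHE_DEBUG is set, also define it as a no-op otherwise so that the
existing ASSERT() calls still compile.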
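A minimal standalone sketch of the macro pattern this patch relies on (not the e2fsprogs code itself). With CACHE_DEBUG unset, ASSERT() expands to `do { } while (0)`: one complete statement that does nothing and never evaluates its argument, so it still behaves correctly inside an unbraced if/else. The functions `note_evaluated`, `demo_side_effect` and `demo_check` and the variable `evaluated` are hypothetical, added only for illustration.

```c
#include <assert.h>

/* Uncomment to turn ASSERT() into a real assert() call. */
/* #define CACHE_DEBUG 1 */

#ifdef CACHE_DEBUG
# define ASSERT(x)		assert(x)
#else
/* One complete statement that does nothing; x is never evaluated. */
# define ASSERT(x)		do { } while (0)
#endif

/* Hypothetical counter showing that a disabled ASSERT() skips its argument. */
int evaluated;

int note_evaluated(void)
{
	evaluated++;
	return 1;
}

void demo_side_effect(void)
{
	ASSERT(note_evaluated());
}

/* Hypothetical caller: the unbraced if/else works in both configurations
 * because ASSERT() always expands to a single statement. */
int demo_check(int priority, int max_priority)
{
	if (priority >= 0)
		ASSERT(priority <= max_priority);
	else
		return -1;
	return priority;
}
```

Because the disabled form never evaluates its argument, any side effects inside an ASSERT() silently disappear in non-debug builds.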

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/support/cache.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


diff --git a/lib/support/cache.c b/lib/support/cache.c
index fe04f62f262aaa..08e0b484cca298 100644
--- a/lib/support/cache.c
+++ b/lib/support/cache.c
@@ -17,9 +17,8 @@
 #include "cache.h"
 #include "xbitops.h"
 
-#define CACHE_DEBUG 1
 #undef CACHE_DEBUG
-#define CACHE_DEBUG 1
+/* #define CACHE_DEBUG 1 */
 #undef CACHE_ABORT
 /* #define CACHE_ABORT 1 */
 
@@ -28,6 +27,8 @@
 #ifdef CACHE_DEBUG
 # include <assert.h>
 # define ASSERT(x)		assert(x)
+#else
+# define ASSERT(x)		do { } while (0)
 #endif
 
 static unsigned int cache_generic_bulkrelse(struct cache *, struct list_head *);



* [PATCH 09/21] cache: use modern list iterator macros
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                     ` (7 preceding siblings ...)
  2025-09-16  0:52   ` [PATCH 08/21] cache: disable debugging Darrick J. Wong
@ 2025-09-16  0:53   ` Darrick J. Wong
  2025-09-16  0:53   ` [PATCH 10/21] cache: embed struct cache in the owner Darrick J. Wong
                     ` (11 subsequent siblings)
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:53 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Use the list iterator macros from list.h instead of open-coding the
loops and casting list_head pointers to struct cache_node.
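A self-contained sketch of what the iterator macros do, using a stripped-down list implementation modeled on the usual kernel-style list.h (an assumption; the real lib/support/list.h may differ in detail). The old `(struct cache_node *)pos` casts only work while the list_head is the first member of the node; `list_entry()`/`container_of()` subtract the member offset instead. The `_safe` variant reads the next pointer before the loop body runs, so the body may unlink `pos`. `struct demo_node`, `demo_purge_even` and `demo_sum` are hypothetical; `__typeof__` is a GNU C extension.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member)	container_of(ptr, type, member)

#define list_for_each_entry(pos, head, member)				\
	for (pos = list_entry((head)->next, __typeof__(*pos), member);	\
	     &pos->member != (head);					\
	     pos = list_entry(pos->member.next, __typeof__(*pos), member))

/* Caches the next entry in 'n' so the body may delete 'pos'. */
#define list_for_each_entry_safe(pos, n, head, member)			\
	for (pos = list_entry((head)->next, __typeof__(*pos), member),	\
	     n = list_entry(pos->member.next, __typeof__(*pos), member);	\
	     &pos->member != (head);					\
	     pos = n, n = list_entry(n->member.next, __typeof__(*n), member))

static void list_head_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = e;
}

/* Hypothetical node: cn_hash is deliberately NOT the first member, so a
 * plain pointer cast would be wrong here while container_of() is not. */
struct demo_node {
	int			key;
	struct list_head	cn_hash;
};

/* Unlink every node with an even key; safe because of the _safe iterator. */
static int demo_purge_even(struct list_head *head)
{
	struct demo_node *pos, *n;
	int removed = 0;

	list_for_each_entry_safe(pos, n, head, cn_hash) {
		if (pos->key % 2 == 0) {
			list_del_init(&pos->cn_hash);
			removed++;
		}
	}
	return removed;
}

static int demo_sum(struct list_head *head)
{
	struct demo_node *pos;
	int sum = 0;

	list_for_each_entry(pos, head, cn_hash)
		sum += pos->key;
	return sum;
}
```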

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/support/cache.c |   71 +++++++++++++++++----------------------------------
 1 file changed, 24 insertions(+), 47 deletions(-)


diff --git a/lib/support/cache.c b/lib/support/cache.c
index 08e0b484cca298..d8f8231ac36d28 100644
--- a/lib/support/cache.c
+++ b/lib/support/cache.c
@@ -98,20 +98,18 @@ cache_expand(
 
 void
 cache_walk(
-	struct cache *		cache,
+	struct cache		*cache,
 	cache_walk_t		visit)
 {
-	struct cache_hash *	hash;
-	struct list_head *	head;
-	struct list_head *	pos;
+	struct cache_hash	*hash;
+	struct cache_node	*pos;
 	unsigned int		i;
 
 	for (i = 0; i < cache->c_hashsize; i++) {
 		hash = &cache->c_hash[i];
-		head = &hash->ch_list;
 		pthread_mutex_lock(&hash->ch_mutex);
-		for (pos = head->next; pos != head; pos = pos->next)
-			visit((struct cache_node *)pos);
+		list_for_each_entry(pos, &hash->ch_list, cn_hash)
+			visit(pos);
 		pthread_mutex_unlock(&hash->ch_mutex);
 	}
 }
@@ -218,12 +216,9 @@ cache_shake(
 	bool			purge)
 {
 	struct cache_mru	*mru;
-	struct cache_hash *	hash;
+	struct cache_hash	*hash;
 	struct list_head	temp;
-	struct list_head *	head;
-	struct list_head *	pos;
-	struct list_head *	n;
-	struct cache_node *	node;
+	struct cache_node	*node, *n;
 	unsigned int		count;
 
 	ASSERT(priority <= CACHE_DIRTY_PRIORITY);
@@ -233,13 +228,9 @@ cache_shake(
 	mru = &cache->c_mrus[priority];
 	count = 0;
 	list_head_init(&temp);
-	head = &mru->cm_list;
 
 	pthread_mutex_lock(&mru->cm_mutex);
-	for (pos = head->prev, n = pos->prev; pos != head;
-						pos = n, n = pos->prev) {
-		node = list_entry(pos, struct cache_node, cn_mru);
-
+	list_for_each_entry_safe_reverse(node, n, &mru->cm_list, cn_mru) {
 		if (pthread_mutex_trylock(&node->cn_mutex) != 0)
 			continue;
 
@@ -376,31 +367,25 @@ __cache_node_purge(
  */
 int
 cache_node_get(
-	struct cache *		cache,
+	struct cache		*cache,
 	cache_key_t		key,
-	struct cache_node **	nodep)
+	struct cache_node	**nodep)
 {
-	struct cache_node *	node = NULL;
-	struct cache_hash *	hash;
-	struct cache_mru *	mru;
-	struct list_head *	head;
-	struct list_head *	pos;
-	struct list_head *	n;
+	struct cache_hash	*hash;
+	struct cache_mru	*mru;
+	struct cache_node	*node = NULL, *n;
 	unsigned int		hashidx;
 	int			priority = 0;
 	int			purged = 0;
 
 	hashidx = cache->hash(key, cache->c_hashsize, cache->c_hashshift);
 	hash = cache->c_hash + hashidx;
-	head = &hash->ch_list;
 
 	for (;;) {
 		pthread_mutex_lock(&hash->ch_mutex);
-		for (pos = head->next, n = pos->next; pos != head;
-						pos = n, n = pos->next) {
+		list_for_each_entry_safe(node, n, &hash->ch_list, cn_hash) {
 			int result;
 
-			node = list_entry(pos, struct cache_node, cn_hash);
 			result = cache->compare(node, key);
 			switch (result) {
 			case CACHE_HIT:
@@ -568,23 +553,19 @@ cache_node_get_priority(
  */
 int
 cache_node_purge(
-	struct cache *		cache,
+	struct cache		*cache,
 	cache_key_t		key,
-	struct cache_node *	node)
+	struct cache_node	*node)
 {
-	struct list_head *	head;
-	struct list_head *	pos;
-	struct list_head *	n;
-	struct cache_hash *	hash;
+	struct cache_node	*pos, *n;
+	struct cache_hash	*hash;
 	int			count = -1;
 
 	hash = cache->c_hash + cache->hash(key, cache->c_hashsize,
 					   cache->c_hashshift);
-	head = &hash->ch_list;
 	pthread_mutex_lock(&hash->ch_mutex);
-	for (pos = head->next, n = pos->next; pos != head;
-						pos = n, n = pos->next) {
-		if ((struct cache_node *)pos != node)
+	list_for_each_entry_safe(pos, n, &hash->ch_list, cn_hash) {
+		if (pos != node)
 			continue;
 
 		count = __cache_node_purge(cache, node);
@@ -642,12 +623,10 @@ cache_purge(
  */
 void
 cache_flush(
-	struct cache *		cache)
+	struct cache		*cache)
 {
-	struct cache_hash *	hash;
-	struct list_head *	head;
-	struct list_head *	pos;
-	struct cache_node *	node;
+	struct cache_hash	*hash;
+	struct cache_node	*node;
 	int			i;
 
 	if (!cache->flush)
@@ -657,9 +636,7 @@ cache_flush(
 		hash = &cache->c_hash[i];
 
 		pthread_mutex_lock(&hash->ch_mutex);
-		head = &hash->ch_list;
-		for (pos = head->next; pos != head; pos = pos->next) {
-			node = (struct cache_node *)pos;
+		list_for_each_entry(node, &hash->ch_list, cn_hash) {
 			pthread_mutex_lock(&node->cn_mutex);
 			cache->flush(node);
 			pthread_mutex_unlock(&node->cn_mutex);



* [PATCH 10/21] cache: embed struct cache in the owner
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                     ` (8 preceding siblings ...)
  2025-09-16  0:53   ` [PATCH 09/21] cache: use modern list iterator macros Darrick J. Wong
@ 2025-09-16  0:53   ` Darrick J. Wong
  2025-09-16  0:53   ` [PATCH 11/21] cache: pass cache pointer to callbacks Darrick J. Wong
                     ` (10 subsequent siblings)
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:53 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

It'll be easier to embed a struct cache into the object that owns the
cache rather than passing pointers around.  This is the prelude to the
next patch, which will enable cache functions to walk back to the owning
struct.
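A minimal sketch of the caller-provided-storage pattern the new cache_init()/cache_destroy() follow: the owner embeds the cache, init fills it in and returns 0 or a positive errno, and destroy zeroes it so an "initialized" check reports false afterwards. `struct demo_cache`, `struct demo_owner` and the `demo_*` functions are hypothetical, heavily trimmed stand-ins. One deliberate difference: the patch's cache_initialized() tests the `hash` callback pointer, whereas this sketch keys off the bucket array, so a failed init here leaves the cache reporting "not initialized".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily trimmed stand-in for struct cache. */
struct demo_cache {
	unsigned int	*c_hash;	/* stands in for the hash bucket array */
	unsigned int	c_hashsize;
};

/* Fills in caller-provided storage; returns 0 or a positive errno. */
static int demo_cache_init(unsigned int hashsize, struct demo_cache *cache)
{
	memset(cache, 0, sizeof(*cache));
	cache->c_hash = calloc(hashsize, sizeof(*cache->c_hash));
	if (!cache->c_hash)
		return ENOMEM;
	cache->c_hashsize = hashsize;
	return 0;
}

static bool demo_cache_initialized(const struct demo_cache *cache)
{
	return cache->c_hash != NULL;
}

static void demo_cache_destroy(struct demo_cache *cache)
{
	free(cache->c_hash);
	/* zero it so demo_cache_initialized() reports false afterwards */
	memset(cache, 0, sizeof(*cache));
}

/* Hypothetical owner that embeds the cache instead of pointing to one. */
struct demo_owner {
	int			fd;
	struct demo_cache	inodes;
};
```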

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/support/cache.h |   10 ++++++++--
 lib/support/cache.c |   38 ++++++++++++++++++++------------------
 2 files changed, 28 insertions(+), 20 deletions(-)


diff --git a/lib/support/cache.h b/lib/support/cache.h
index 16b17a9b7a1a51..993f1385dedcee 100644
--- a/lib/support/cache.h
+++ b/lib/support/cache.h
@@ -122,8 +122,14 @@ struct cache {
 	unsigned int 		c_max;		/* max nodes ever used */
 };
 
-struct cache *cache_init(int, unsigned int, const struct cache_operations *);
-void cache_destroy(struct cache *);
+static inline bool cache_initialized(const struct cache *cache)
+{
+	return cache->hash != NULL;
+}
+
+int cache_init(int flags, unsigned int size,
+	       const struct cache_operations *ops, struct cache *cache);
+void cache_destroy(struct cache *cache);
 void cache_walk(struct cache *, cache_walk_t);
 void cache_purge(struct cache *);
 void cache_flush(struct cache *);
diff --git a/lib/support/cache.c b/lib/support/cache.c
index d8f8231ac36d28..8b4f9f03c3899b 100644
--- a/lib/support/cache.c
+++ b/lib/support/cache.c
@@ -12,6 +12,7 @@
 #include <stdbool.h>
 #include <stddef.h>
 #include <stdint.h>
+#include <errno.h>
 
 #include "list.h"
 #include "cache.h"
@@ -33,23 +34,18 @@
 
 static unsigned int cache_generic_bulkrelse(struct cache *, struct list_head *);
 
-struct cache *
+int
 cache_init(
 	int			flags,
 	unsigned int		hashsize,
-	const struct cache_operations	*cache_operations)
+	const struct cache_operations	*cache_operations,
+	struct cache		*cache)
 {
-	struct cache *		cache;
 	unsigned int		i, maxcount;
 
 	maxcount = hashsize * HASH_CACHE_RATIO;
 
-	if (!(cache = malloc(sizeof(struct cache))))
-		return NULL;
-	if (!(cache->c_hash = calloc(hashsize, sizeof(struct cache_hash)))) {
-		free(cache);
-		return NULL;
-	}
+	memset(cache, 0, sizeof(*cache));
 
 	cache->c_flags = flags;
 	cache->c_count = 0;
@@ -57,8 +53,6 @@ cache_init(
 	cache->c_hits = 0;
 	cache->c_misses = 0;
 	cache->c_maxcount = maxcount;
-	cache->c_hashsize = hashsize;
-	cache->c_hashshift = fls(hashsize) - 1;
 	cache->hash = cache_operations->hash;
 	cache->alloc = cache_operations->alloc;
 	cache->flush = cache_operations->flush;
@@ -70,18 +64,26 @@ cache_init(
 	cache->put = cache_operations->put;
 	pthread_mutex_init(&cache->c_mutex, NULL);
 
+	for (i = 0; i <= CACHE_DIRTY_PRIORITY; i++) {
+		list_head_init(&cache->c_mrus[i].cm_list);
+		cache->c_mrus[i].cm_count = 0;
+		pthread_mutex_init(&cache->c_mrus[i].cm_mutex, NULL);
+	}
+
+	cache->c_hash = calloc(hashsize, sizeof(struct cache_hash));
+	if (!cache->c_hash)
+		return ENOMEM;
+
+	cache->c_hashsize = hashsize;
+	cache->c_hashshift = fls(hashsize) - 1;
+
 	for (i = 0; i < hashsize; i++) {
 		list_head_init(&cache->c_hash[i].ch_list);
 		cache->c_hash[i].ch_count = 0;
 		pthread_mutex_init(&cache->c_hash[i].ch_mutex, NULL);
 	}
 
-	for (i = 0; i <= CACHE_DIRTY_PRIORITY; i++) {
-		list_head_init(&cache->c_mrus[i].cm_list);
-		cache->c_mrus[i].cm_count = 0;
-		pthread_mutex_init(&cache->c_mrus[i].cm_mutex, NULL);
-	}
-	return cache;
+	return 0;
 }
 
 static void
@@ -153,7 +155,7 @@ cache_destroy(
 	}
 	pthread_mutex_destroy(&cache->c_mutex);
 	free(cache->c_hash);
-	free(cache);
+	memset(cache, 0, sizeof(*cache));
 }
 
 static unsigned int



* [PATCH 11/21] cache: pass cache pointer to callbacks
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                     ` (9 preceding siblings ...)
  2025-09-16  0:53   ` [PATCH 10/21] cache: embed struct cache in the owner Darrick J. Wong
@ 2025-09-16  0:53   ` Darrick J. Wong
  2025-09-16  0:53   ` [PATCH 12/21] cache: pass a private data pointer through cache_walk Darrick J. Wong
                     ` (9 subsequent siblings)
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:53 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Pass the cache pointer to the cache node callbacks so that subsequent
patches don't have to waste memory putting pointers to struct fuse4fs in
the cached objects.
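A minimal sketch of why passing the cache pointer to the callbacks saves memory once the cache is embedded in its owner: the callback can recover the owner with container_of() instead of every node carrying a back pointer. `struct demo_fs` stands in for something like struct fuse4fs; it and the other `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_cache;			/* hypothetical stand-in for struct cache */
struct demo_node { unsigned long key; };	/* note: no owner pointer */

/* New-style callback: the cache pointer comes first. */
typedef struct demo_node *(*demo_alloc_t)(struct demo_cache *c,
					  unsigned long key);

struct demo_cache {
	demo_alloc_t	alloc;
};

/* Hypothetical owner embedding the cache, as in the previous patch. */
struct demo_fs {
	unsigned long		allocs;		/* per-filesystem statistic */
	struct demo_cache	cache;
};

/* Recover the owner from the cache pointer; nodes need no back pointer. */
static struct demo_node *demo_alloc(struct demo_cache *c, unsigned long key)
{
	struct demo_fs *fs = container_of(c, struct demo_fs, cache);
	struct demo_node *node = malloc(sizeof(*node));

	if (!node)
		return NULL;
	node->key = key;
	fs->allocs++;
	return node;
}
```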

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/support/cache.h |   12 ++++++------
 lib/support/cache.c |   21 +++++++++++----------
 2 files changed, 17 insertions(+), 16 deletions(-)


diff --git a/lib/support/cache.h b/lib/support/cache.h
index 993f1385dedcee..0168fdca027896 100644
--- a/lib/support/cache.h
+++ b/lib/support/cache.h
@@ -56,16 +56,16 @@ struct cache_node;
 
 typedef void *cache_key_t;
 
-typedef void (*cache_walk_t)(struct cache_node *);
-typedef struct cache_node * (*cache_node_alloc_t)(cache_key_t);
-typedef int (*cache_node_flush_t)(struct cache_node *);
-typedef void (*cache_node_relse_t)(struct cache_node *);
+typedef void (*cache_walk_t)(struct cache *c, struct cache_node *cn);
+typedef struct cache_node * (*cache_node_alloc_t)(struct cache *c, cache_key_t k);
+typedef int (*cache_node_flush_t)(struct cache *c, struct cache_node *cn);
+typedef void (*cache_node_relse_t)(struct cache *c, struct cache_node *cn);
 typedef unsigned int (*cache_node_hash_t)(cache_key_t, unsigned int,
 					  unsigned int);
 typedef int (*cache_node_compare_t)(struct cache_node *, cache_key_t);
 typedef unsigned int (*cache_bulk_relse_t)(struct cache *, struct list_head *);
-typedef int (*cache_node_get_t)(struct cache_node *);
-typedef void (*cache_node_put_t)(struct cache_node *);
+typedef int (*cache_node_get_t)(struct cache *c, struct cache_node *cn);
+typedef void (*cache_node_put_t)(struct cache *c, struct cache_node *cn);
 
 struct cache_operations {
 	cache_node_hash_t	hash;
diff --git a/lib/support/cache.c b/lib/support/cache.c
index 8b4f9f03c3899b..2e2e36ccc3ef78 100644
--- a/lib/support/cache.c
+++ b/lib/support/cache.c
@@ -111,7 +111,7 @@ cache_walk(
 		hash = &cache->c_hash[i];
 		pthread_mutex_lock(&hash->ch_mutex);
 		list_for_each_entry(pos, &hash->ch_list, cn_hash)
-			visit(pos);
+			visit(cache, pos);
 		pthread_mutex_unlock(&hash->ch_mutex);
 	}
 }
@@ -125,7 +125,8 @@ cache_walk(
 #ifdef CACHE_DEBUG
 static void
 cache_zero_check(
-	struct cache_node *	node)
+	struct cache		*cache,
+	struct cache_node	*node)
 {
 	if (node->cn_count > 0) {
 		fprintf(stderr, "%s: refcount is %u, not zero (node=%p)\n",
@@ -170,7 +171,7 @@ cache_generic_bulkrelse(
 		node = list_entry(list->next, struct cache_node, cn_mru);
 		pthread_mutex_destroy(&node->cn_mutex);
 		list_del_init(&node->cn_mru);
-		cache->relse(node);
+		cache->relse(cache, node);
 		count++;
 	}
 
@@ -237,7 +238,7 @@ cache_shake(
 			continue;
 
 		/* memory pressure is not allowed to release dirty objects */
-		if (cache->flush(node) && !purge) {
+		if (cache->flush(cache, node) && !purge) {
 			list_del(&node->cn_mru);
 			mru->cm_count--;
 			node->cn_priority = -1;
@@ -302,7 +303,7 @@ cache_node_allocate(
 	pthread_mutex_unlock(&cache->c_mutex);
 	if (!nodesfree)
 		return NULL;
-	node = cache->alloc(key);
+	node = cache->alloc(cache, key);
 	if (node == NULL) {	/* uh-oh */
 		pthread_mutex_lock(&cache->c_mutex);
 		cache->c_count--;
@@ -341,7 +342,7 @@ __cache_node_purge(
 	}
 
 	/* can't purge dirty objects */
-	if (cache->flush(node)) {
+	if (cache->flush(cache, node)) {
 		pthread_mutex_unlock(&node->cn_mutex);
 		return 1;
 	}
@@ -355,7 +356,7 @@ __cache_node_purge(
 	pthread_mutex_unlock(&node->cn_mutex);
 	pthread_mutex_destroy(&node->cn_mutex);
 	list_del_init(&node->cn_hash);
-	cache->relse(node);
+	cache->relse(cache, node);
 	return 0;
 }
 
@@ -410,7 +411,7 @@ cache_node_get(
 			pthread_mutex_lock(&node->cn_mutex);
 
 			if (node->cn_count == 0 && cache->get) {
-				int err = cache->get(node);
+				int err = cache->get(cache, node);
 				if (err) {
 					pthread_mutex_unlock(&node->cn_mutex);
 					goto next_object;
@@ -505,7 +506,7 @@ cache_node_put(
 	node->cn_count--;
 
 	if (node->cn_count == 0 && cache->put)
-		cache->put(node);
+		cache->put(cache, node);
 	if (node->cn_count == 0) {
 		/* add unreferenced node to appropriate MRU for shaker */
 		mru = &cache->c_mrus[node->cn_priority];
@@ -640,7 +641,7 @@ cache_flush(
 		pthread_mutex_lock(&hash->ch_mutex);
 		list_for_each_entry(node, &hash->ch_list, cn_hash) {
 			pthread_mutex_lock(&node->cn_mutex);
-			cache->flush(node);
+			cache->flush(cache, node);
 			pthread_mutex_unlock(&node->cn_mutex);
 		}
 		pthread_mutex_unlock(&hash->ch_mutex);



* [PATCH 12/21] cache: pass a private data pointer through cache_walk
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                     ` (10 preceding siblings ...)
  2025-09-16  0:53   ` [PATCH 11/21] cache: pass cache pointer to callbacks Darrick J. Wong
@ 2025-09-16  0:53   ` Darrick J. Wong
  2025-09-16  0:54   ` [PATCH 13/21] cache: add a helper to grab a new refcount for a cache_node Darrick J. Wong
                     ` (8 subsequent siblings)
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:53 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Allow cache_walk callers to pass a private data pointer through to the
callback function.
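A minimal sketch of what the extra `void *data` argument enables: a visitor can accumulate results into caller-owned state instead of a global. The real cache_walk() iterates hash buckets under each ch_mutex; this sketch uses a flat array, and `demo_walk`, `count_busy` and the `demo_*` types are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct demo_node { unsigned int cn_count; };
struct demo_cache { struct demo_node *nodes; size_t nr; };

typedef void (*demo_walk_t)(struct demo_cache *c, struct demo_node *n,
			    void *d);

/* Visit every node, handing the caller's private pointer to the callback. */
static void demo_walk(struct demo_cache *c, demo_walk_t visit, void *data)
{
	for (size_t i = 0; i < c->nr; i++)
		visit(c, &c->nodes[i], data);
}

/* Callback: count referenced nodes into the caller's counter. */
static void count_busy(struct demo_cache *c, struct demo_node *n, void *data)
{
	unsigned int *busy = data;

	(void)c;
	if (n->cn_count > 0)
		(*busy)++;
}
```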

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/support/cache.h |    4 ++--
 lib/support/cache.c |   10 ++++++----
 2 files changed, 8 insertions(+), 6 deletions(-)


diff --git a/lib/support/cache.h b/lib/support/cache.h
index 0168fdca027896..b18b6d3325e9ad 100644
--- a/lib/support/cache.h
+++ b/lib/support/cache.h
@@ -56,7 +56,7 @@ struct cache_node;
 
 typedef void *cache_key_t;
 
-typedef void (*cache_walk_t)(struct cache *c, struct cache_node *cn);
+typedef void (*cache_walk_t)(struct cache *c, struct cache_node *cn, void *d);
 typedef struct cache_node * (*cache_node_alloc_t)(struct cache *c, cache_key_t k);
 typedef int (*cache_node_flush_t)(struct cache *c, struct cache_node *cn);
 typedef void (*cache_node_relse_t)(struct cache *c, struct cache_node *cn);
@@ -130,7 +130,7 @@ static inline bool cache_initialized(const struct cache *cache)
 int cache_init(int flags, unsigned int size,
 	       const struct cache_operations *ops, struct cache *cache);
 void cache_destroy(struct cache *cache);
-void cache_walk(struct cache *, cache_walk_t);
+void cache_walk(struct cache *cache, cache_walk_t fn, void *data);
 void cache_purge(struct cache *);
 void cache_flush(struct cache *);
 
diff --git a/lib/support/cache.c b/lib/support/cache.c
index 2e2e36ccc3ef78..606acd5453cf10 100644
--- a/lib/support/cache.c
+++ b/lib/support/cache.c
@@ -101,7 +101,8 @@ cache_expand(
 void
 cache_walk(
 	struct cache		*cache,
-	cache_walk_t		visit)
+	cache_walk_t		visit,
+	void			*data)
 {
 	struct cache_hash	*hash;
 	struct cache_node	*pos;
@@ -111,7 +112,7 @@ cache_walk(
 		hash = &cache->c_hash[i];
 		pthread_mutex_lock(&hash->ch_mutex);
 		list_for_each_entry(pos, &hash->ch_list, cn_hash)
-			visit(cache, pos);
+			visit(cache, pos, data);
 		pthread_mutex_unlock(&hash->ch_mutex);
 	}
 }
@@ -126,7 +127,8 @@ cache_walk(
 static void
 cache_zero_check(
 	struct cache		*cache,
-	struct cache_node	*node)
+	struct cache_node	*node,
+	void			*data)
 {
 	if (node->cn_count > 0) {
 		fprintf(stderr, "%s: refcount is %u, not zero (node=%p)\n",
@@ -134,7 +136,7 @@ cache_zero_check(
 		cache_abort();
 	}
 }
-#define cache_destroy_check(c)	cache_walk((c), cache_zero_check)
+#define cache_destroy_check(c)	cache_walk((c), cache_zero_check, NULL)
 #else
 #define cache_destroy_check(c)	do { } while (0)
 #endif



* [PATCH 13/21] cache: add a helper to grab a new refcount for a cache_node
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                     ` (11 preceding siblings ...)
  2025-09-16  0:53   ` [PATCH 12/21] cache: pass a private data pointer through cache_walk Darrick J. Wong
@ 2025-09-16  0:54   ` Darrick J. Wong
  2025-09-16  0:54   ` [PATCH 14/21] cache: return results of a cache flush Darrick J. Wong
                     ` (7 subsequent siblings)
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:54 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Create a helper, cache_node_grab(), to bump the refcount of a cache
node, and use it in cache_node_get.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/support/cache.h |    1 +
 lib/support/cache.c |   57 +++++++++++++++++++++++++++++----------------------
 2 files changed, 33 insertions(+), 25 deletions(-)


diff --git a/lib/support/cache.h b/lib/support/cache.h
index b18b6d3325e9ad..e8f1c82ef7869c 100644
--- a/lib/support/cache.h
+++ b/lib/support/cache.h
@@ -141,5 +141,6 @@ int cache_node_get_priority(struct cache_node *);
 int cache_node_purge(struct cache *, cache_key_t, struct cache_node *);
 void cache_report(FILE *fp, const char *, struct cache *);
 int cache_overflowed(struct cache *);
+struct cache_node *cache_node_grab(struct cache *cache, struct cache_node *node);
 
 #endif	/* __CACHE_H__ */
diff --git a/lib/support/cache.c b/lib/support/cache.c
index 606acd5453cf10..49568ffa6de2e4 100644
--- a/lib/support/cache.c
+++ b/lib/support/cache.c
@@ -362,6 +362,35 @@ __cache_node_purge(
 	return 0;
 }
 
+/* Grab a new refcount to the cache node object.  Caller must hold cn_mutex. */
+struct cache_node *cache_node_grab(struct cache *cache, struct cache_node *node)
+{
+	struct cache_mru *mru;
+
+	if (node->cn_count == 0 && cache->get) {
+		int err = cache->get(cache, node);
+		if (err)
+			return NULL;
+	}
+	if (node->cn_count == 0) {
+		ASSERT(node->cn_priority >= 0);
+		ASSERT(!list_empty(&node->cn_mru));
+		mru = &cache->c_mrus[node->cn_priority];
+		pthread_mutex_lock(&mru->cm_mutex);
+		mru->cm_count--;
+		list_del_init(&node->cn_mru);
+		pthread_mutex_unlock(&mru->cm_mutex);
+		if (node->cn_old_priority != -1) {
+			ASSERT(node->cn_priority ==
+					CACHE_DIRTY_PRIORITY);
+			node->cn_priority = node->cn_old_priority;
+			node->cn_old_priority = -1;
+		}
+	}
+	node->cn_count++;
+	return node;
+}
+
 /*
  * Lookup in the cache hash table.  With any luck we'll get a cache
  * hit, in which case this will all be over quickly and painlessly.
@@ -377,7 +406,6 @@ cache_node_get(
 	struct cache_node	**nodep)
 {
 	struct cache_hash	*hash;
-	struct cache_mru	*mru;
 	struct cache_node	*node = NULL, *n;
 	unsigned int		hashidx;
 	int			priority = 0;
@@ -411,31 +439,10 @@ cache_node_get(
 			 * from its MRU list, and update stats.
 			 */
 			pthread_mutex_lock(&node->cn_mutex);
-
-			if (node->cn_count == 0 && cache->get) {
-				int err = cache->get(cache, node);
-				if (err) {
-					pthread_mutex_unlock(&node->cn_mutex);
-					goto next_object;
-				}
+			if (!cache_node_grab(cache, node)) {
+				pthread_mutex_unlock(&node->cn_mutex);
+				goto next_object;
 			}
-			if (node->cn_count == 0) {
-				ASSERT(node->cn_priority >= 0);
-				ASSERT(!list_empty(&node->cn_mru));
-				mru = &cache->c_mrus[node->cn_priority];
-				pthread_mutex_lock(&mru->cm_mutex);
-				mru->cm_count--;
-				list_del_init(&node->cn_mru);
-				pthread_mutex_unlock(&mru->cm_mutex);
-				if (node->cn_old_priority != -1) {
-					ASSERT(node->cn_priority ==
-							CACHE_DIRTY_PRIORITY);
-					node->cn_priority = node->cn_old_priority;
-					node->cn_old_priority = -1;
-				}
-			}
-			node->cn_count++;
-
 			pthread_mutex_unlock(&node->cn_mutex);
 			pthread_mutex_unlock(&hash->ch_mutex);
 



* [PATCH 14/21] cache: return results of a cache flush
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                     ` (12 preceding siblings ...)
  2025-09-16  0:54   ` [PATCH 13/21] cache: add a helper to grab a new refcount for a cache_node Darrick J. Wong
@ 2025-09-16  0:54   ` Darrick J. Wong
  2025-09-16  0:54   ` [PATCH 15/21] cache: add a "get only if incore" flag to cache_node_get Darrick J. Wong
                     ` (6 subsequent siblings)
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:54 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Modify cache_flush to return whether or not the flush succeeded: it
returns false if the flush callback reports that any node is still
dirty afterwards.
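A minimal sketch of the aggregation used in the new cache_flush(): each flush callback returns true if its node is still dirty, and the results are OR-ed together. Using `|=` rather than short-circuiting means every node still gets a flush attempt after an earlier failure. `struct demo_node`, `demo_flush_node` and `demo_flush_all` are hypothetical; the real code also walks hash buckets under locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_node {
	bool	dirty;
	bool	write_fails;	/* hypothetical: simulate an I/O error */
};

/* Flush callback: returns true if the node is still dirty afterwards. */
static bool demo_flush_node(struct demo_node *n)
{
	if (n->dirty && !n->write_fails)
		n->dirty = false;
	return n->dirty;
}

/* Returns true only if every node ended up clean. */
static bool demo_flush_all(struct demo_node *nodes, size_t nr)
{
	bool still_dirty = false;

	for (size_t i = 0; i < nr; i++)
		still_dirty |= demo_flush_node(&nodes[i]);
	return !still_dirty;
}
```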

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/support/cache.h |    4 ++--
 lib/support/cache.c |   11 +++++++----
 2 files changed, 9 insertions(+), 6 deletions(-)


diff --git a/lib/support/cache.h b/lib/support/cache.h
index e8f1c82ef7869c..8d39ca5c02a285 100644
--- a/lib/support/cache.h
+++ b/lib/support/cache.h
@@ -58,7 +58,7 @@ typedef void *cache_key_t;
 
 typedef void (*cache_walk_t)(struct cache *c, struct cache_node *cn, void *d);
 typedef struct cache_node * (*cache_node_alloc_t)(struct cache *c, cache_key_t k);
-typedef int (*cache_node_flush_t)(struct cache *c, struct cache_node *cn);
+typedef bool (*cache_node_flush_t)(struct cache *c, struct cache_node *cn);
 typedef void (*cache_node_relse_t)(struct cache *c, struct cache_node *cn);
 typedef unsigned int (*cache_node_hash_t)(cache_key_t, unsigned int,
 					  unsigned int);
@@ -132,7 +132,7 @@ int cache_init(int flags, unsigned int size,
 void cache_destroy(struct cache *cache);
 void cache_walk(struct cache *cache, cache_walk_t fn, void *data);
 void cache_purge(struct cache *);
-void cache_flush(struct cache *);
+bool cache_flush(struct cache *cache);
 
 int cache_node_get(struct cache *, cache_key_t, struct cache_node **);
 void cache_node_put(struct cache *, struct cache_node *);
diff --git a/lib/support/cache.c b/lib/support/cache.c
index 49568ffa6de2e4..fa07b4ad8222d2 100644
--- a/lib/support/cache.c
+++ b/lib/support/cache.c
@@ -631,18 +631,19 @@ cache_purge(
 }
 
 /*
- * Flush all nodes in the cache to disk.
+ * Flush all nodes in the cache to disk.  Returns true if the flush succeeded.
  */
-void
+bool
 cache_flush(
 	struct cache		*cache)
 {
 	struct cache_hash	*hash;
 	struct cache_node	*node;
 	int			i;
+	bool			still_dirty = false;
 
 	if (!cache->flush)
-		return;
+		return true;
 
 	for (i = 0; i < cache->c_hashsize; i++) {
 		hash = &cache->c_hash[i];
@@ -650,11 +651,13 @@ cache_flush(
 		pthread_mutex_lock(&hash->ch_mutex);
 		list_for_each_entry(node, &hash->ch_list, cn_hash) {
 			pthread_mutex_lock(&node->cn_mutex);
-			cache->flush(cache, node);
+			still_dirty |= cache->flush(cache, node);
 			pthread_mutex_unlock(&node->cn_mutex);
 		}
 		pthread_mutex_unlock(&hash->ch_mutex);
 	}
+
+	return !still_dirty;
 }
 
 #define	HASH_REPORT	(3 * HASH_CACHE_RATIO)



* [PATCH 15/21] cache: add a "get only if incore" flag to cache_node_get
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                     ` (13 preceding siblings ...)
  2025-09-16  0:54   ` [PATCH 14/21] cache: return results of a cache flush Darrick J. Wong
@ 2025-09-16  0:54   ` Darrick J. Wong
  2025-09-16  0:54   ` [PATCH 16/21] cache: support gradual expansion Darrick J. Wong
                     ` (5 subsequent siblings)
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:54 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Add a new flag to cache_node_get so that callers can specify that they
only want the cache to return an existing cache node, and not create a
new one.
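A toy lookup showing the caller-visible contract of the new flag. Per the diff, an incore miss returns 0 with `*nodep` set to NULL, so callers passing CACHE_GET_INCORE must check `*nodep` rather than the return value alone. The fixed-slot "cache", the `demo_*` names, and the sketch's return convention (0 for an existing node, 1 for a newly allocated one, -1 when full) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_GET_INCORE	(1U << 0)	/* same value as CACHE_GET_INCORE */
#define DEMO_SLOTS	4

struct demo_node { int key; int in_use; };

static struct demo_node demo_slots[DEMO_SLOTS];

/* Look up 'key'; allocate a slot on a miss unless DEMO_GET_INCORE is set. */
static int demo_get(int key, unsigned int flags, struct demo_node **nodep)
{
	struct demo_node *free_slot = NULL;

	for (int i = 0; i < DEMO_SLOTS; i++) {
		if (demo_slots[i].in_use && demo_slots[i].key == key) {
			*nodep = &demo_slots[i];
			return 0;		/* existing node */
		}
		if (!demo_slots[i].in_use && !free_slot)
			free_slot = &demo_slots[i];
	}
	if (flags & DEMO_GET_INCORE) {
		*nodep = NULL;			/* miss: do not allocate */
		return 0;
	}
	if (!free_slot) {
		*nodep = NULL;
		return -1;			/* sketch only: no eviction */
	}
	free_slot->in_use = 1;
	free_slot->key = key;
	*nodep = free_slot;
	return 1;				/* newly allocated node */
}
```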

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/support/cache.h |    5 ++++-
 lib/support/cache.c |    7 +++++++
 2 files changed, 11 insertions(+), 1 deletion(-)


diff --git a/lib/support/cache.h b/lib/support/cache.h
index 8d39ca5c02a285..98b2182d49a6e0 100644
--- a/lib/support/cache.h
+++ b/lib/support/cache.h
@@ -134,7 +134,10 @@ void cache_walk(struct cache *cache, cache_walk_t fn, void *data);
 void cache_purge(struct cache *);
 bool cache_flush(struct cache *cache);
 
-int cache_node_get(struct cache *, cache_key_t, struct cache_node **);
+/* don't allocate a new node */
+#define CACHE_GET_INCORE	(1U << 0)
+int cache_node_get(struct cache *c, cache_key_t key, unsigned int cgflags,
+		   struct cache_node **nodep);
 void cache_node_put(struct cache *, struct cache_node *);
 void cache_node_set_priority(struct cache *, struct cache_node *, int);
 int cache_node_get_priority(struct cache_node *);
diff --git a/lib/support/cache.c b/lib/support/cache.c
index fa07b4ad8222d2..9da6c59b3b6391 100644
--- a/lib/support/cache.c
+++ b/lib/support/cache.c
@@ -403,6 +403,7 @@ int
 cache_node_get(
 	struct cache		*cache,
 	cache_key_t		key,
+	unsigned int		cgflags,
 	struct cache_node	**nodep)
 {
 	struct cache_hash	*hash;
@@ -456,6 +457,12 @@ cache_node_get(
 			continue;	/* what the hell, gcc? */
 		}
 		pthread_mutex_unlock(&hash->ch_mutex);
+
+		if (cgflags & CACHE_GET_INCORE) {
+			*nodep = NULL;
+			return 0;
+		}
+
 		/*
 		 * not found, allocate a new entry
 		 */



* [PATCH 16/21] cache: support gradual expansion
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                     ` (14 preceding siblings ...)
  2025-09-16  0:54   ` [PATCH 15/21] cache: add a "get only if incore" flag to cache_node_get Darrick J. Wong
@ 2025-09-16  0:54   ` Darrick J. Wong
  2025-09-16  0:55   ` [PATCH 17/21] cache: implement automatic shrinking Darrick J. Wong
                     ` (4 subsequent siblings)
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:54 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Doubling the cache size every time it fills up is probably not a good
idea beyond some arbitrary limit, so let cache users supply a resize
function that picks a gentler growth rate if they want one.
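A minimal sketch of the growth policy as cache_expand() applies it after this patch: the callback's answer is used only if it actually grows the cache, otherwise the code falls back to doubling. The callback here is simplified to take only the current size (the real cache_node_resize_t also receives the cache), and `resize_fn`, `grow_gradually` and `next_maxcount` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified callback: the real cache_node_resize_t also takes the cache. */
typedef unsigned int (*resize_fn)(unsigned int curr_size);

/* Same arithmetic as cache_gradual_resize(): grow by 25%. */
static unsigned int grow_gradually(unsigned int curr_size)
{
	return curr_size * 5 / 4;
}

/* Policy from cache_expand(): double unless the callback grows the cache. */
static unsigned int next_maxcount(resize_fn resize, unsigned int curr)
{
	unsigned int new_size = 0;

	assert(curr > 0);	/* doubling zero would never grow */
	if (resize)
		new_size = resize(curr);
	if (new_size <= curr)
		new_size = curr * 2;
	return new_size;
}
```

Starting from 1024 entries, the gradual policy steps through 1280, 1600, 2000, ... instead of 2048, 4096, .... For very small caches, where `curr * 5 / 4` rounds down to `curr` (e.g. 3), the fallback still guarantees forward progress by doubling.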

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/support/cache.h |   10 ++++++++++
 lib/support/cache.c |   12 ++++++++++--
 2 files changed, 20 insertions(+), 2 deletions(-)


diff --git a/lib/support/cache.h b/lib/support/cache.h
index 98b2182d49a6e0..ae37945c545f46 100644
--- a/lib/support/cache.h
+++ b/lib/support/cache.h
@@ -66,6 +66,14 @@ typedef int (*cache_node_compare_t)(struct cache_node *, cache_key_t);
 typedef unsigned int (*cache_bulk_relse_t)(struct cache *, struct list_head *);
 typedef int (*cache_node_get_t)(struct cache *c, struct cache_node *cn);
 typedef void (*cache_node_put_t)(struct cache *c, struct cache_node *cn);
+typedef unsigned int (*cache_node_resize_t)(const struct cache *c,
+					    unsigned int curr_size);
+
+static inline unsigned int cache_gradual_resize(const struct cache *cache,
+						unsigned int curr_size)
+{
+	return curr_size * 5 / 4;
+}
 
 struct cache_operations {
 	cache_node_hash_t	hash;
@@ -76,6 +84,7 @@ struct cache_operations {
 	cache_bulk_relse_t	bulkrelse;	/* optional */
 	cache_node_get_t	get;		/* optional */
 	cache_node_put_t	put;		/* optional */
+	cache_node_resize_t	resize;		/* optional */
 };
 
 struct cache_hash {
@@ -113,6 +122,7 @@ struct cache {
 	cache_bulk_relse_t	bulkrelse;	/* bulk release routine */
 	cache_node_get_t	get;		/* prepare cache node after get */
 	cache_node_put_t	put;		/* prepare to put cache node */
+	cache_node_resize_t	resize;		/* compute new maxcount */
 	unsigned int		c_hashsize;	/* hash bucket count */
 	unsigned int		c_hashshift;	/* hash key shift */
 	struct cache_hash	*c_hash;	/* hash table buckets */
diff --git a/lib/support/cache.c b/lib/support/cache.c
index 9da6c59b3b6391..dbaddc1bd36d3d 100644
--- a/lib/support/cache.c
+++ b/lib/support/cache.c
@@ -62,6 +62,7 @@ cache_init(
 		cache_operations->bulkrelse : cache_generic_bulkrelse;
 	cache->get = cache_operations->get;
 	cache->put = cache_operations->put;
+	cache->resize = cache_operations->resize;
 	pthread_mutex_init(&cache->c_mutex, NULL);
 
 	for (i = 0; i <= CACHE_DIRTY_PRIORITY; i++) {
@@ -90,11 +91,18 @@ static void
 cache_expand(
 	struct cache *		cache)
 {
+	unsigned int		new_size = 0;
+
 	pthread_mutex_lock(&cache->c_mutex);
+	if (cache->resize)
+		new_size = cache->resize(cache, cache->c_maxcount);
+	if (new_size <= cache->c_maxcount)
+		new_size = cache->c_maxcount * 2;
 #ifdef CACHE_DEBUG
-	fprintf(stderr, "doubling cache size to %d\n", 2 * cache->c_maxcount);
+	fprintf(stderr, "increasing cache max size from %u to %u\n",
+			cache->c_maxcount, new_size);
 #endif
-	cache->c_maxcount *= 2;
+	cache->c_maxcount = new_size;
 	pthread_mutex_unlock(&cache->c_mutex);
 }
 


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 17/21] cache: implement automatic shrinking
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                     ` (15 preceding siblings ...)
  2025-09-16  0:54   ` [PATCH 16/21] cache: support gradual expansion Darrick J. Wong
@ 2025-09-16  0:55   ` Darrick J. Wong
  2025-09-16  0:55   ` [PATCH 18/21] fuse4fs: add cache to track open files Darrick J. Wong
                     ` (3 subsequent siblings)
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:55 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Shrink the cache whenever three conditions hold: maxcount has been
expanded beyond its initial value, we release a cached object to one of
the MRU lists, and enough objects are sitting on the MRU lists to drop
the cache count down a level.  This enables a cache to reduce its memory
consumption after a spike during which reclamation wasn't possible.
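The decision part of cache_shrink() can be modelled as a pure function of the cache counters. This is a sketch that assumes the cache_gradual_resize() callback (shrink by 10%); `shrink_target` is a hypothetical name, and `shake_batch` stands in for CACHE_SHAKE_COUNT, whose value is not shown in this patch.

```c
#include <assert.h>

/*
 * Pick the new maxcount the way cache_shrink() does, assuming the
 * gradual resize callback.  Returns maxcount unchanged when shrinking
 * is not possible or not worthwhile.
 */
static unsigned int shrink_target(unsigned int orig_max, unsigned int maxcount,
				  unsigned int count, unsigned int mru_count,
				  unsigned int shake_batch)
{
	unsigned int new_size;

	assert(count >= mru_count);
	if (maxcount <= orig_max)
		return maxcount;	/* never shrink below the initial size */
	if (mru_count < shake_batch)
		return maxcount;	/* not even one batch is reclaimable */

	new_size = maxcount * 9 / 10;
	if (new_size >= maxcount)
		return maxcount;
	if (new_size < orig_max)
		new_size = orig_max;

	/* Nodes still in use (not on an MRU list) must fit under new_size. */
	if (count - mru_count >= new_size)
		return maxcount;
	return new_size;
}
```

For example, a cache that started at 1024 entries, grew to 1600, and now has 600 idle nodes on its MRU lists would step down to 1440; with only 100 idle nodes, the 1500 busy nodes would not fit under 1440, so maxcount stays put.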

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/support/cache.h |   17 ++++++-
 lib/support/cache.c |  118 ++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 126 insertions(+), 9 deletions(-)


diff --git a/lib/support/cache.h b/lib/support/cache.h
index ae37945c545f46..cd738b6cd3a460 100644
--- a/lib/support/cache.h
+++ b/lib/support/cache.h
@@ -16,6 +16,9 @@
  */
 #define CACHE_MISCOMPARE_PURGE	(1 << 0)
 
+/* Automatically shrink the cache's max_count when possible. */
+#define CACHE_CAN_SHRINK	(1U << 1)
+
 /*
  * cache object campare return values
  */
@@ -67,12 +70,18 @@ typedef unsigned int (*cache_bulk_relse_t)(struct cache *, struct list_head *);
 typedef int (*cache_node_get_t)(struct cache *c, struct cache_node *cn);
 typedef void (*cache_node_put_t)(struct cache *c, struct cache_node *cn);
 typedef unsigned int (*cache_node_resize_t)(const struct cache *c,
-					    unsigned int curr_size);
+					    unsigned int curr_size,
+					    int dir);
 
 static inline unsigned int cache_gradual_resize(const struct cache *cache,
-						unsigned int curr_size)
+						unsigned int curr_size,
+						int dir)
 {
-	return curr_size * 5 / 4;
+	if (dir < 0)
+		return curr_size * 9 / 10;
+	else if (dir > 0)
+		return curr_size * 5 / 4;
+	return curr_size;
 }
 
 struct cache_operations {
@@ -111,6 +120,7 @@ struct cache_node {
 
 struct cache {
 	int			c_flags;	/* behavioural flags */
+	unsigned int		c_orig_max;	/* original max cache nodes */
 	unsigned int		c_maxcount;	/* max cache nodes */
 	unsigned int		c_count;	/* count of nodes */
 	pthread_mutex_t		c_mutex;	/* node count mutex */
@@ -143,6 +153,7 @@ void cache_destroy(struct cache *cache);
 void cache_walk(struct cache *cache, cache_walk_t fn, void *data);
 void cache_purge(struct cache *);
 bool cache_flush(struct cache *cache);
+void cache_shrink(struct cache *cache);
 
 /* don't allocate a new node */
 #define CACHE_GET_INCORE	(1U << 0)
diff --git a/lib/support/cache.c b/lib/support/cache.c
index dbaddc1bd36d3d..7e1ddc3cc8788d 100644
--- a/lib/support/cache.c
+++ b/lib/support/cache.c
@@ -53,6 +53,7 @@ cache_init(
 	cache->c_hits = 0;
 	cache->c_misses = 0;
 	cache->c_maxcount = maxcount;
+	cache->c_orig_max = maxcount;
 	cache->hash = cache_operations->hash;
 	cache->alloc = cache_operations->alloc;
 	cache->flush = cache_operations->flush;
@@ -95,7 +96,7 @@ cache_expand(
 
 	pthread_mutex_lock(&cache->c_mutex);
 	if (cache->resize)
-		new_size = cache->resize(cache, cache->c_maxcount);
+		new_size = cache->resize(cache, cache->c_maxcount, 1);
 	if (new_size <= cache->c_maxcount)
 		new_size = cache->c_maxcount * 2;
 #ifdef CACHE_DEBUG
@@ -226,7 +227,8 @@ static unsigned int
 cache_shake(
 	struct cache *		cache,
 	unsigned int		priority,
-	bool			purge)
+	bool			purge,
+	unsigned int		nr_to_shake)
 {
 	struct cache_mru	*mru;
 	struct cache_hash	*hash;
@@ -274,7 +276,7 @@ cache_shake(
 		pthread_mutex_unlock(&node->cn_mutex);
 
 		count++;
-		if (!purge && count == CACHE_SHAKE_COUNT)
+		if (!purge && count == nr_to_shake)
 			break;
 	}
 	pthread_mutex_unlock(&mru->cm_mutex);
@@ -287,7 +289,7 @@ cache_shake(
 		pthread_mutex_unlock(&cache->c_mutex);
 	}
 
-	return (count == CACHE_SHAKE_COUNT) ? priority : ++priority;
+	return (count == nr_to_shake) ? priority : ++priority;
 }
 
 /*
@@ -477,7 +479,7 @@ cache_node_get(
 		node = cache_node_allocate(cache, key);
 		if (node)
 			break;
-		priority = cache_shake(cache, priority, false);
+		priority = cache_shake(cache, priority, false, CACHE_SHAKE_COUNT);
 		/*
 		 * We start at 0; if we free CACHE_SHAKE_COUNT we get
 		 * back the same priority, if not we get back priority+1.
@@ -507,12 +509,112 @@ cache_node_get(
 	return 1;
 }
 
+static unsigned int cache_mru_count(const struct cache *cache)
+{
+	const struct cache_mru	*mru = cache->c_mrus;
+	unsigned int		mru_count = 0;
+	unsigned int		i;
+
+	for (i = 0; i < CACHE_NR_PRIORITIES; i++, mru++)
+		mru_count += mru->cm_count;
+
+	return mru_count;
+}
+
+
+void cache_shrink(struct cache *cache)
+{
+	unsigned int		mru_count = 0;
+	unsigned int		threshold = 0;
+	unsigned int		priority = 0;
+	unsigned int		new_size;
+
+	pthread_mutex_lock(&cache->c_mutex);
+	/* Don't shrink below the original cache size */
+	if (cache->c_maxcount <= cache->c_orig_max)
+		goto out_unlock;
+
+	mru_count = cache_mru_count(cache);
+
+	/*
+	 * If there's not even a batch of nodes on the MRU to try to free,
+	 * don't bother with the rest.
+	 */
+	if (mru_count < CACHE_SHAKE_COUNT)
+		goto out_unlock;
+
+	/*
+	 * Figure out the next step down in size, but don't go below the
+	 * original size.
+	 */
+	if (cache->resize)
+		new_size = cache->resize(cache, cache->c_maxcount, -1);
+	else
+		new_size = cache->c_maxcount / 2;
+	if (new_size >= cache->c_maxcount)
+		goto out_unlock;
+	if (new_size < cache->c_orig_max)
+		new_size = cache->c_orig_max;
+
+	/*
+	 * If we can't purge enough nodes to get the node count below new_size,
+	 * don't resize the cache.
+	 */
+	if (cache->c_count - mru_count >= new_size)
+		goto out_unlock;
+
+#ifdef CACHE_DEBUG
+	fprintf(stderr, "decreasing cache max size from %u to %u (currently %u)\n",
+		cache->c_maxcount, new_size, cache->c_count);
+#endif
+	cache->c_maxcount = new_size;
+
+	/* Try to reduce the number of cached objects. */
+	do {
+		unsigned int new_priority;
+
+		/*
+		 * The threshold is the amount we need to purge to get c_count
+		 * below the new maxcount.  Try to free some objects off the
+		 * MRU.  Drop c_mutex because cache_shake will take it.
+		 */
+		threshold = cache->c_count - new_size;
+		pthread_mutex_unlock(&cache->c_mutex);
+
+		new_priority = cache_shake(cache, priority, false, threshold);
+
+		/* Either we made no progress or we ran out of MRU levels */
+		if (new_priority == priority ||
+		    new_priority > CACHE_MAX_PRIORITY)
+			return;
+		priority = new_priority;
+
+		pthread_mutex_lock(&cache->c_mutex);
+		/*
+		 * Someone could have walked in and changed the cache maxsize
+		 * again while we had the lock dropped.  If that happened, stop
+		 * clearing.
+		 */
+		if (cache->c_maxcount != new_size)
+			goto out_unlock;
+
+		mru_count = cache_mru_count(cache);
+		if (cache->c_count - mru_count >= new_size)
+			goto out_unlock;
+	} while (1);
+
+out_unlock:
+	pthread_mutex_unlock(&cache->c_mutex);
+	return;
+}
+
 void
 cache_node_put(
 	struct cache *		cache,
 	struct cache_node *	node)
 {
 	struct cache_mru *	mru;
+	bool was_put = false;
 
 	pthread_mutex_lock(&node->cn_mutex);
 #ifdef CACHE_DEBUG
@@ -528,6 +630,7 @@ cache_node_put(
 	}
 #endif
 	node->cn_count--;
+	was_put = (node->cn_count == 0);
 
 	if (node->cn_count == 0 && cache->put)
 		cache->put(cache, node);
@@ -541,6 +644,9 @@ cache_node_put(
 	}
 
 	pthread_mutex_unlock(&node->cn_mutex);
+
+	if (was_put && (cache->c_flags & CACHE_CAN_SHRINK))
+		cache_shrink(cache);
 }
 
 void
@@ -632,7 +738,7 @@ cache_purge(
 	int			i;
 
 	for (i = 0; i <= CACHE_DIRTY_PRIORITY; i++)
-		cache_shake(cache, i, true);
+		cache_shake(cache, i, true, CACHE_SHAKE_COUNT);
 
 #ifdef CACHE_DEBUG
 	if (cache->c_count != 0) {


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 18/21] fuse4fs: add cache to track open files
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                     ` (16 preceding siblings ...)
  2025-09-16  0:55   ` [PATCH 17/21] cache: implement automatic shrinking Darrick J. Wong
@ 2025-09-16  0:55   ` Darrick J. Wong
  2025-09-16  0:55   ` [PATCH 19/21] fuse4fs: use the orphaned inode list Darrick J. Wong
                     ` (2 subsequent siblings)
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:55 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Add our own inode cache so that we can track open files.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/support/cache.h |    7 +++
 fuse4fs/Makefile.in |    3 +
 fuse4fs/fuse4fs.c   |  132 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 141 insertions(+), 1 deletion(-)


diff --git a/lib/support/cache.h b/lib/support/cache.h
index cd738b6cd3a460..f482948a3b6331 100644
--- a/lib/support/cache.h
+++ b/lib/support/cache.h
@@ -6,6 +6,13 @@
 #ifndef __CACHE_H__
 #define __CACHE_H__
 
+/*  2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1 */
+#define GOLDEN_RATIO_PRIME	0x9e37fffffffc0001UL
+#ifndef CACHE_LINE_SIZE
+/* if the system didn't tell us, guess something reasonable */
+#define CACHE_LINE_SIZE		64
+#endif
+
 /*
  * initialisation flags
  */
diff --git a/fuse4fs/Makefile.in b/fuse4fs/Makefile.in
index 6b41d1dd5ffe8d..9f3547c271638f 100644
--- a/fuse4fs/Makefile.in
+++ b/fuse4fs/Makefile.in
@@ -146,7 +146,8 @@ fuse4fs.o: $(srcdir)/fuse4fs.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/hashmap.h \
  $(top_srcdir)/lib/ext2fs/bitops.h $(top_srcdir)/lib/ext2fs/ext2fsP.h \
  $(top_srcdir)/lib/ext2fs/ext2fs.h $(top_srcdir)/version.h \
- $(top_srcdir)/lib/e2p/e2p.h
+ $(top_srcdir)/lib/e2p/e2p.h $(top_srcdir)/lib/support/cache.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/support/xbitops.h
 journal.o: $(srcdir)/../debugfs/journal.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/../debugfs/journal.h \
  $(top_srcdir)/e2fsck/jfs_user.h $(top_srcdir)/e2fsck/e2fsck.h \
diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 8b65dd1b419eaa..5b06e5a5b9668e 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -27,6 +27,7 @@
 #include <unistd.h>
 #include <ctype.h>
 #include <stdbool.h>
+#include <assert.h>
 #define FUSE_DARWIN_ENABLE_EXTENSIONS 0
 #ifdef __SET_FOB_FOR_FUSE
 # error Do not set magic value __SET_FOB_FOR_FUSE!!!!
@@ -49,6 +50,8 @@
 #include "ext2fs/ext2fs.h"
 #include "ext2fs/ext2_fs.h"
 #include "ext2fs/ext2fsP.h"
+#include "support/list.h"
+#include "support/cache.h"
 
 #include "../version.h"
 #include "uuid/uuid.h"
@@ -205,6 +208,7 @@ int journal_enable_debug = -1;
 #define FUSE4FS_FILE_MAGIC	(0xEF53DEAFUL)
 struct fuse4fs_file_handle {
 	unsigned long magic;
+	struct fuse4fs_inode *fi;
 	ext2_ino_t ino;
 	int open_flags;
 	int check_flags;
@@ -257,6 +261,7 @@ struct fuse4fs {
 	int timing;
 #endif
 	struct fuse_session *fuse;
+	struct cache inodes;
 };
 
 #define FUSE4FS_CHECK_HANDLE(req, fh) \
@@ -351,6 +356,115 @@ static inline int u_log2(unsigned int arg)
 	return l;
 }
 
+struct fuse4fs_inode {
+	struct cache_node	i_cnode;
+	ext2_ino_t		i_ino;
+	unsigned int		i_open_count;
+};
+
+struct fuse4fs_ikey {
+	ext2_ino_t		i_ino;
+};
+
+#define ICKEY(key)	((struct fuse4fs_ikey *)(key))
+#define ICNODE(node)	(container_of((node), struct fuse4fs_inode, i_cnode))
+
+static unsigned int
+icache_hash(cache_key_t key, unsigned int hashsize, unsigned int hashshift)
+{
+	uint64_t	hashval = ICKEY(key)->i_ino;
+	uint64_t	tmp;
+
+	tmp = hashval ^ (GOLDEN_RATIO_PRIME + hashval) / CACHE_LINE_SIZE;
+	tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> hashshift);
+	return tmp % hashsize;
+}
+
+static int icache_compare(struct cache_node *node, cache_key_t key)
+{
+	struct fuse4fs_inode *fi = ICNODE(node);
+	struct fuse4fs_ikey *ikey = ICKEY(key);
+
+	if (fi->i_ino == ikey->i_ino)
+		return CACHE_HIT;
+
+	return CACHE_MISS;
+}
+
+static struct cache_node *icache_alloc(struct cache *c, cache_key_t key)
+{
+	struct fuse4fs_ikey *ikey = ICKEY(key);
+	struct fuse4fs_inode *fi;
+
+	fi = calloc(1, sizeof(struct fuse4fs_inode));
+	if (!fi)
+		return NULL;
+
+	fi->i_ino = ikey->i_ino;
+	return &fi->i_cnode;
+}
+
+static bool icache_flush(struct cache *c, struct cache_node *node)
+{
+	return false;
+}
+
+static void icache_relse(struct cache *c, struct cache_node *node)
+{
+	struct fuse4fs_inode *fi = ICNODE(node);
+
+	assert(fi->i_open_count == 0);
+	free(fi);
+}
+
+static unsigned int icache_bulkrelse(struct cache *cache,
+				     struct list_head *list)
+{
+	struct cache_node *cn, *n;
+	int count = 0;
+
+	if (list_empty(list))
+		return 0;
+
+	list_for_each_entry_safe(cn, n, list, cn_mru) {
+		icache_relse(cache, cn);
+		count++;
+	}
+
+	return count;
+}
+
+static const struct cache_operations icache_ops = {
+	.hash		= icache_hash,
+	.alloc		= icache_alloc,
+	.flush		= icache_flush,
+	.relse		= icache_relse,
+	.compare	= icache_compare,
+	.bulkrelse	= icache_bulkrelse,
+	.resize		= cache_gradual_resize,
+};
+
+static errcode_t fuse4fs_iget(struct fuse4fs *ff, ext2_ino_t ino,
+			      struct fuse4fs_inode **fip)
+{
+	struct fuse4fs_ikey ikey = {
+		.i_ino = ino,
+	};
+	struct cache_node *node = NULL;
+
+	cache_node_get(&ff->inodes, &ikey, 0, &node);
+	if (!node)
+		return ENOMEM;
+
+	*fip = ICNODE(node);
+	return 0;
+}
+
+static void fuse4fs_iput(struct fuse4fs *ff, struct fuse4fs_inode *fi)
+{
+	cache_node_put(&ff->inodes, &fi->i_cnode);
+}
+
 static inline blk64_t FUSE4FS_B_TO_FSBT(const struct fuse4fs *ff, off_t pos)
 {
 	return pos >> ff->blocklog;
@@ -954,6 +1068,11 @@ static void fuse4fs_unmount(struct fuse4fs *ff)
 	if (!ff->fs)
 		return;
 
+	if (cache_initialized(&ff->inodes)) {
+		cache_purge(&ff->inodes);
+		cache_destroy(&ff->inodes);
+	}
+
 	err = ext2fs_close(ff->fs);
 	if (err) {
 		err_printf(ff, "%s: %s\n", _("while closing fs"),
@@ -1002,6 +1121,10 @@ static errcode_t fuse4fs_open(struct fuse4fs *ff, int libext2_flags)
 		return err;
 	}
 
+	err = cache_init(CACHE_CAN_SHRINK, 1U << 10, &icache_ops, &ff->inodes);
+	if (err)
+		return translate_error(ff->fs, 0, err);
+
 	ff->fs->priv_data = ff;
 	ff->blocklog = u_log2(ff->fs->blocksize);
 	ff->blockmask = ff->fs->blocksize - 1;
@@ -2071,6 +2194,7 @@ static int fuse4fs_remove_inode(struct fuse4fs *ff, ext2_ino_t ino)
 	if (inode.i_links_count)
 		goto write_out;
 
+
 	if (ext2fs_has_feature_ea_inode(fs->super)) {
 		ret = fuse4fs_remove_ea_inodes(ff, ino, &inode);
 		if (ret)
@@ -2987,6 +3111,13 @@ static int fuse4fs_open_file(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
 			goto out;
 	}
 
+	err = fuse4fs_iget(ff, file->ino, &file->fi);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+	file->fi->i_open_count++;
+
 	file->check_flags = check;
 	fuse4fs_set_handle(fp, file);
 
@@ -3175,6 +3306,7 @@ static void op_release(fuse_req_t req, fuse_ino_t fino EXT2FS_ATTR((unused)),
 			ret = translate_error(fs, fh->ino, err);
 	}
 
+	fuse4fs_iput(ff, fh->fi);
 	fp->fh = 0;
 	fuse4fs_finish(ff, ret);
 


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 19/21] fuse4fs: use the orphaned inode list
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                     ` (17 preceding siblings ...)
  2025-09-16  0:55   ` [PATCH 18/21] fuse4fs: add cache to track open files Darrick J. Wong
@ 2025-09-16  0:55   ` Darrick J. Wong
  2025-09-16  0:55   ` [PATCH 20/21] fuse4fs: implement FUSE_TMPFILE Darrick J. Wong
  2025-09-16  0:56   ` [PATCH 21/21] fuse4fs: create incore reverse orphan list Darrick J. Wong
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:55 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Put open but unlinked files on the orphan list, and remove them when the
last open fd releases the inode.
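The ondisk orphan list is a singly linked list threaded through i_dtime, with s_last_orphan as its head and 0 as the terminator. A minimal in-memory model of the add and remove operations in this patch (`toy_fs`, `orphan_add` and `orphan_remove` are hypothetical names). Like the patch, it rejects a self-loop; unlike the patch, it also bounds the walk so that a longer cycle cannot spin forever.

```c
#include <assert.h>
#include <stdint.h>

#define NR_INODES	32

/* Hypothetical model holding only the fields the orphan list uses. */
struct toy_fs {
	uint32_t	s_last_orphan;		/* head of the orphan list */
	uint32_t	i_dtime[NR_INODES];	/* reused as "next orphan" */
};

static void orphan_add(struct toy_fs *fs, uint32_t ino)
{
	assert(ino > 0 && ino < NR_INODES);
	fs->i_dtime[ino] = fs->s_last_orphan;	/* new head points at old head */
	fs->s_last_orphan = ino;
}

/* Returns 0 on success, -1 if ino is missing or the list is corrupt. */
static int orphan_remove(struct toy_fs *fs, uint32_t ino)
{
	uint32_t prev = fs->s_last_orphan;
	unsigned int steps;

	assert(ino > 0 && ino < NR_INODES);
	if (prev == ino) {
		fs->s_last_orphan = fs->i_dtime[ino];
		fs->i_dtime[ino] = 0;
		return 0;
	}

	for (steps = 0; prev != 0 && steps < NR_INODES; steps++) {
		uint32_t next;

		if (prev >= NR_INODES)
			return -1;	/* link points outside the inode table */
		next = fs->i_dtime[prev];
		if (next == prev)
			return -1;	/* self-loop: corrupted list */
		if (next == ino) {
			fs->i_dtime[prev] = fs->i_dtime[ino];
			fs->i_dtime[ino] = 0;
			return 0;
		}
		prev = next;
	}
	return -1;
}
```

Insertion is O(1) at the head, while removing an inode from the middle requires walking from s_last_orphan, which is why the fast path checks the superblock first.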

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |  181 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 178 insertions(+), 3 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 5b06e5a5b9668e..e046c782957e60 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -960,6 +960,13 @@ static int fuse4fs_inum_access(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
 		   inode_uid(inode), inode_gid(inode),
 		   ctxt->uid, ctxt->gid);
 
+	/* linked files cannot be on the unlinked list or deleted */
+	if (inode.i_dtime != 0) {
+		dbg_printf(ff, "%s: unlinked ino=%d dtime=0x%x\n",
+			   __func__, ino, inode.i_dtime);
+		return -ENOENT;
+	}
+
 	/* existence check */
 	if (mask == 0)
 		return 0;
@@ -2162,9 +2169,80 @@ static int fuse4fs_remove_ea_inodes(struct fuse4fs *ff, ext2_ino_t ino,
 	return 0;
 }
 
+static int fuse4fs_add_to_orphans(struct fuse4fs *ff, ext2_ino_t ino,
+				  struct ext2_inode_large *inode)
+{
+	ext2_filsys fs = ff->fs;
+
+	dbg_printf(ff, "%s: orphan ino=%d dtime=%d next=%d\n",
+		   __func__, ino, inode->i_dtime, fs->super->s_last_orphan);
+
+	inode->i_dtime = fs->super->s_last_orphan;
+	fs->super->s_last_orphan = ino;
+	ext2fs_mark_super_dirty(fs);
+
+	return 0;
+}
+
+static int fuse4fs_remove_from_orphans(struct fuse4fs *ff, ext2_ino_t ino,
+				       struct ext2_inode_large *inode)
+{
+	ext2_filsys fs = ff->fs;
+	ext2_ino_t prev_orphan;
+	errcode_t err;
+
+	dbg_printf(ff, "%s: super=%d ino=%d next=%d\n",
+		   __func__, fs->super->s_last_orphan, ino, inode->i_dtime);
+
+	/* If we're lucky, the ondisk superblock points to us */
+	if (fs->super->s_last_orphan == ino) {
+		dbg_printf(ff, "%s: superblock\n", __func__);
+
+		fs->super->s_last_orphan = inode->i_dtime;
+		inode->i_dtime = 0;
+		ext2fs_mark_super_dirty(fs);
+		return 0;
+	}
+
+	/* Otherwise walk the ondisk orphan list. */
+	prev_orphan = fs->super->s_last_orphan;
+	while (prev_orphan != 0) {
+		struct ext2_inode_large orphan;
+
+		err = fuse4fs_read_inode(fs, prev_orphan, &orphan);
+		if (err)
+			return translate_error(fs, prev_orphan, err);
+
+		if (orphan.i_dtime == prev_orphan)
+			return translate_error(fs, prev_orphan,
+					       EXT2_ET_FILESYSTEM_CORRUPTED);
+
+		if (orphan.i_dtime == ino) {
+			dbg_printf(ff, "%s: prev=%d\n",
+				   __func__, prev_orphan);
+
+			orphan.i_dtime = inode->i_dtime;
+			inode->i_dtime = 0;
+
+			err = fuse4fs_write_inode(fs, prev_orphan, &orphan);
+			if (err)
+				return translate_error(fs, prev_orphan, err);
+
+			return 0;
+		}
+
+		dbg_printf(ff, "%s: orphan=%d next=%d\n",
+			   __func__, prev_orphan, orphan.i_dtime);
+		prev_orphan = orphan.i_dtime;
+	}
+
+	return translate_error(fs, ino, EXT2_ET_FILESYSTEM_CORRUPTED);
+}
+
 static int fuse4fs_remove_inode(struct fuse4fs *ff, ext2_ino_t ino)
 {
 	ext2_filsys fs = ff->fs;
+	struct fuse4fs_inode *fi;
 	errcode_t err;
 	struct ext2_inode_large inode;
 	int ret = 0;
@@ -2181,7 +2259,6 @@ static int fuse4fs_remove_inode(struct fuse4fs *ff, ext2_ino_t ino)
 		return 0; /* XXX: already done? */
 	case 1:
 		inode.i_links_count--;
-		ext2fs_set_dtime(fs, EXT2_INODE(&inode));
 		break;
 	default:
 		inode.i_links_count--;
@@ -2194,6 +2271,26 @@ static int fuse4fs_remove_inode(struct fuse4fs *ff, ext2_ino_t ino)
 	if (inode.i_links_count)
 		goto write_out;
 
+	err = fuse4fs_iget(ff, ino, &fi);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	dbg_printf(ff, "%s: put ino=%d opencount=%d\n", __func__, ino,
+		   fi->i_open_count);
+
+	/*
+	 * The file is unlinked but still open; add it to the orphan list and
+	 * free it later.
+	 */
+	if (fi->i_open_count > 0) {
+		fuse4fs_iput(ff, fi);
+		ret = fuse4fs_add_to_orphans(ff, ino, &inode);
+		if (ret)
+			return ret;
+
+		goto write_out;
+	}
+	fuse4fs_iput(ff, fi);
 
 	if (ext2fs_has_feature_ea_inode(fs->super)) {
 		ret = fuse4fs_remove_ea_inodes(ff, ino, &inode);
@@ -2213,6 +2310,7 @@ static int fuse4fs_remove_inode(struct fuse4fs *ff, ext2_ino_t ino)
 			return translate_error(fs, ino, err);
 	}
 
+	ext2fs_set_dtime(fs, EXT2_INODE(&inode));
 	ext2fs_inode_alloc_stats2(fs, ino, -1,
 				  LINUX_S_ISDIR(inode.i_mode));
 
@@ -2761,6 +2859,16 @@ static void op_link(fuse_req_t req, fuse_ino_t child_fino,
 	if (ret)
 		goto out2;
 
+	/*
+	 * Linking a file back into the filesystem requires removing it from
+	 * the orphan list.
+	 */
+	if (inode.i_links_count == 0) {
+		ret = fuse4fs_remove_from_orphans(ff, child, &inode);
+		if (ret)
+			goto out2;
+	}
+
 	inode.i_links_count++;
 	ret = update_ctime(fs, child, &inode);
 	if (ret)
@@ -3044,7 +3152,8 @@ static void detect_linux_executable_open(int kernel_flags, int *access_check,
 #endif /* __linux__ */
 
 static int fuse4fs_open_file(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
-			     ext2_ino_t ino, struct fuse_file_info *fp)
+			     ext2_ino_t ino,
+			     struct fuse_file_info *fp)
 {
 	ext2_filsys fs = ff->fs;
 	errcode_t err;
@@ -3120,6 +3229,8 @@ static int fuse4fs_open_file(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
 
 	file->check_flags = check;
 	fuse4fs_set_handle(fp, file);
+	dbg_printf(ff, "%s: ino=%d fh=%p opencount=%d\n", __func__, ino, file,
+		   file->fi->i_open_count);
 
 out:
 	if (ret)
@@ -3136,6 +3247,8 @@ static void op_open(fuse_req_t req, fuse_ino_t fino, struct fuse_file_info *fp)
 
 	FUSE4FS_CHECK_CONTEXT(req);
 	FUSE4FS_CONVERT_FINO(req, &ino, fino);
+	dbg_printf(ff, "%s: ino=%d\n", __func__, ino);
+
 	fuse4fs_start(ff);
 	ret = fuse4fs_open_file(ff, ctxt, ino, fp);
 	fuse4fs_finish(ff, ret);
@@ -3284,6 +3397,55 @@ static void op_write(fuse_req_t req, fuse_ino_t fino EXT2FS_ATTR((unused)),
 		fuse_reply_err(req, -ret);
 }
 
+static int fuse4fs_free_unlinked(struct fuse4fs *ff, ext2_ino_t ino)
+{
+	struct ext2_inode_large inode;
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	int ret = 0;
+
+	err = fuse4fs_read_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	if (inode.i_links_count > 0)
+		return 0;
+
+	dbg_printf(ff, "%s: ino=%d links=%d\n", __func__, ino,
+		   inode.i_links_count);
+
+	if (ext2fs_has_feature_ea_inode(fs->super)) {
+		ret = fuse4fs_remove_ea_inodes(ff, ino, &inode);
+		if (ret)
+			return ret;
+	}
+
+	/* Nobody holds this file; free its blocks! */
+	err = ext2fs_free_ext_attr(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	if (ext2fs_inode_has_valid_blocks2(fs, EXT2_INODE(&inode))) {
+		err = ext2fs_punch(fs, ino, EXT2_INODE(&inode), NULL,
+				   0, ~0ULL);
+		if (err)
+			return translate_error(fs, ino, err);
+	}
+
+	ret = fuse4fs_remove_from_orphans(ff, ino, &inode);
+	if (ret)
+		return ret;
+
+	ext2fs_set_dtime(fs, EXT2_INODE(&inode));
+	ext2fs_inode_alloc_stats2(fs, ino, -1, LINUX_S_ISDIR(inode.i_mode));
+
+	err = fuse4fs_write_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	return 0;
+}
+
 static void op_release(fuse_req_t req, fuse_ino_t fino EXT2FS_ATTR((unused)),
 		       struct fuse_file_info *fp)
 {
@@ -3295,9 +3457,21 @@ static void op_release(fuse_req_t req, fuse_ino_t fino EXT2FS_ATTR((unused)),
 
 	FUSE4FS_CHECK_CONTEXT(req);
 	FUSE4FS_CHECK_HANDLE(req, fh);
-	dbg_printf(ff, "%s: ino=%d\n", __func__, fh->ino);
+	dbg_printf(ff, "%s: ino=%d fh=%p opencount=%u\n",
+		   __func__, fh->ino, fh, fh->fi->i_open_count);
+
 	fs = fuse4fs_start(ff);
 
+	/*
+	 * If the file is no longer open and is unlinked, free it, which
+	 * removes it from the ondisk list.
+	 */
+	if (--fh->fi->i_open_count == 0) {
+		ret = fuse4fs_free_unlinked(ff, fh->ino);
+		if (ret)
+			goto out_iput;
+	}
+
 	if ((fp->flags & O_SYNC) &&
 	    fuse4fs_is_writeable(ff) &&
 	    (fh->open_flags & EXT2_FILE_WRITE)) {
@@ -3306,6 +3480,7 @@ static void op_release(fuse_req_t req, fuse_ino_t fino EXT2FS_ATTR((unused)),
 			ret = translate_error(fs, fh->ino, err);
 	}
 
+out_iput:
 	fuse4fs_iput(ff, fh->fi);
 	fp->fh = 0;
 	fuse4fs_finish(ff, ret);


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 20/21] fuse4fs: implement FUSE_TMPFILE
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                     ` (18 preceding siblings ...)
  2025-09-16  0:55   ` [PATCH 19/21] fuse4fs: use the orphaned inode list Darrick J. Wong
@ 2025-09-16  0:55   ` Darrick J. Wong
  2025-09-16  0:56   ` [PATCH 21/21] fuse4fs: create incore reverse orphan list Darrick J. Wong
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:55 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Allow creation of O_TMPFILE files now that we know how to use the
unlinked list.
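For a nameless create, op_create skips ext2fs_link(), sets i_links_count to 0, and puts the new inode straight onto the orphan list. Opening such a file therefore cannot insist on i_dtime == 0, so the access check gains an L_OK bit to tell "open by name" apart from "open of an orphan". A sketch of the resulting i_dtime test, modelled on fuse4fs_inum_access() (`dtime_check` and `want_linked` are hypothetical names; the heuristic assumes deletion timestamps are always larger than s_inodes_count):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* want_linked corresponds to passing L_OK (a normal open by name). */
static int dtime_check(uint32_t dtime, uint32_t s_inodes_count,
		       bool want_linked)
{
	if (want_linked)
		return dtime != 0 ? -ENOENT : 0;	/* orphaned or freed */

	/*
	 * Orphan-list links are inode numbers, so they are smaller than
	 * s_inodes_count; a larger value is taken to be a deletion time.
	 */
	return dtime >= s_inodes_count ? -ENOENT : 0;
}
```

The tail of the orphan list also carries i_dtime == 0, since that is how the list terminates, so this test alone cannot distinguish a linked inode from the last orphan; i_links_count is needed for that.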

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   93 ++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 67 insertions(+), 26 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index e046c782957e60..1015b13f4adac7 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -902,22 +902,25 @@ static inline int fuse4fs_want_check_owner(struct fuse4fs *ff,
 
 /* Test for append permission */
 #define A_OK	16
+/* Test for linked file */
+#define L_OK	32
 
 static int fuse4fs_iflags_access(struct fuse4fs *ff, ext2_ino_t ino,
 				 const struct ext2_inode *inode, int mask)
 {
-	EXT2FS_BUILD_BUG_ON((A_OK & (R_OK | W_OK | X_OK | F_OK)) != 0);
+	EXT2FS_BUILD_BUG_ON(((A_OK | L_OK) & (R_OK | W_OK | X_OK | F_OK)) != 0);
 
 	/* no writing or metadata changes to read-only or broken fs */
 	if ((mask & (W_OK | A_OK)) && !fuse4fs_is_writeable(ff))
 		return -EROFS;
 
-	dbg_printf(ff, "access ino=%d mask=e%s%s%s%s iflags=0x%x\n",
+	dbg_printf(ff, "access ino=%d mask=e%s%s%s%s%s iflags=0x%x\n",
 		   ino,
 		   (mask & R_OK ? "r" : ""),
 		   (mask & W_OK ? "w" : ""),
 		   (mask & X_OK ? "x" : ""),
 		   (mask & A_OK ? "a" : ""),
+		   (mask & L_OK ? "l" : ""),
 		   inode->i_flags);
 
 	/* is immutable? */
@@ -950,21 +953,31 @@ static int fuse4fs_inum_access(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
 		return translate_error(fs, ino, err);
 	perms = inode.i_mode & 0777;
 
-	dbg_printf(ff, "access ino=%d mask=e%s%s%s%s perms=0%o iflags=0x%x "
+	dbg_printf(ff, "access ino=%d mask=e%s%s%s%s%s perms=0%o iflags=0x%x "
 		   "fuid=%d fgid=%d uid=%d gid=%d\n", ino,
 		   (mask & R_OK ? "r" : ""),
 		   (mask & W_OK ? "w" : ""),
 		   (mask & X_OK ? "x" : ""),
 		   (mask & A_OK ? "a" : ""),
+		   (mask & L_OK ? "l" : ""),
 		   perms, inode.i_flags,
 		   inode_uid(inode), inode_gid(inode),
 		   ctxt->uid, ctxt->gid);
 
-	/* linked files cannot be on the unlinked list or deleted */
-	if (inode.i_dtime != 0) {
-		dbg_printf(ff, "%s: unlinked ino=%d dtime=0x%x\n",
-			   __func__, ino, inode.i_dtime);
-		return -ENOENT;
+	if (mask & L_OK) {
+		/* linked files cannot be on the unlinked list or deleted */
+		if (inode.i_dtime != 0) {
+			dbg_printf(ff, "%s: unlinked ino=%d dtime=0x%x\n",
+				   __func__, ino, inode.i_dtime);
+			return -ENOENT;
+		}
+	} else {
+		/* unlinked files cannot be deleted */
+		if (inode.i_dtime >= fs->super->s_inodes_count) {
+			dbg_printf(ff, "%s: deleted ino=%d dtime=0x%x\n",
+				   __func__, ino, inode.i_dtime);
+			return -ENOENT;
+		}
 	}
 
 	/* existence check */
@@ -3152,7 +3165,7 @@ static void detect_linux_executable_open(int kernel_flags, int *access_check,
 #endif /* __linux__ */
 
 static int fuse4fs_open_file(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
-			     ext2_ino_t ino,
+			     ext2_ino_t ino, bool linked,
 			     struct fuse_file_info *fp)
 {
 	ext2_filsys fs = ff->fs;
@@ -3182,6 +3195,9 @@ static int fuse4fs_open_file(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
 		break;
 	}
 
+	if (linked)
+		check |= L_OK;
+
 	/*
 	 * If the caller wants to truncate the file, we need to ask for full
 	 * write access even if the caller claims to be appending.
@@ -3250,7 +3266,7 @@ static void op_open(fuse_req_t req, fuse_ino_t fino, struct fuse_file_info *fp)
 	dbg_printf(ff, "%s: ino=%d\n", __func__, ino);
 
 	fuse4fs_start(ff);
-	ret = fuse4fs_open_file(ff, ctxt, ino, fp);
+	ret = fuse4fs_open_file(ff, ctxt, ino, true, fp);
 	fuse4fs_finish(ff, ret);
 
 	if (ret)
@@ -4159,22 +4175,28 @@ static void op_create(fuse_req_t req, fuse_ino_t fino, const char *name,
 		goto out2;
 	}
 
-	dbg_printf(ff, "%s: creating dir=%d name='%s' child=%d\n",
-		   __func__, parent, name, child);
-	err = ext2fs_link(fs, parent, name, child,
-			  filetype | EXT2FS_LINK_EXPAND);
-	if (err) {
-		ret = translate_error(fs, parent, err);
-		goto out2;
+	if (name) {
+		dbg_printf(ff, "%s: creating dir=%d name='%s' child=%d\n",
+			   __func__, parent, name, child);
+
+		err = ext2fs_link(fs, parent, name, child,
+				  filetype | EXT2FS_LINK_EXPAND);
+		if (err) {
+			ret = translate_error(fs, parent, err);
+			goto out2;
+		}
+
+		ret = update_mtime(fs, parent, NULL);
+		if (ret)
+			goto out2;
+	} else {
+		dbg_printf(ff, "%s: creating dir=%d tempfile=%d\n",
+			   __func__, parent, child);
 	}
 
-	ret = update_mtime(fs, parent, NULL);
-	if (ret)
-		goto out2;
-
 	memset(&inode, 0, sizeof(inode));
 	inode.i_mode = mode;
-	inode.i_links_count = 1;
+	inode.i_links_count = name ? 1 : 0;
 	fuse4fs_set_extra_isize(ff, child, &inode);
 	fuse4fs_set_uid(&inode, ctxt->uid);
 	fuse4fs_set_gid(&inode, gid);
@@ -4192,6 +4214,12 @@ static void op_create(fuse_req_t req, fuse_ino_t fino, const char *name,
 		ext2fs_extent_free(handle);
 	}
 
+	if (!name) {
+		ret = fuse4fs_add_to_orphans(ff, child, &inode);
+		if (ret)
+			goto out2;
+	}
+
 	err = ext2fs_write_new_inode(fs, child, EXT2_INODE(&inode));
 	if (err) {
 		ret = translate_error(fs, child, err);
@@ -4213,13 +4241,15 @@ static void op_create(fuse_req_t req, fuse_ino_t fino, const char *name,
 		goto out2;
 
 	fp->flags &= ~O_TRUNC;
-	ret = fuse4fs_open_file(ff, ctxt, child, fp);
+	ret = fuse4fs_open_file(ff, ctxt, child, name != NULL, fp);
 	if (ret)
 		goto out2;
 
-	ret = fuse4fs_dirsync_flush(ff, parent, NULL);
-	if (ret)
-		goto out2;
+	if (name) {
+		ret = fuse4fs_dirsync_flush(ff, parent, NULL);
+		if (ret)
+			goto out2;
+	}
 
 	ret = fuse4fs_stat_inode(ff, child, NULL, &fstat);
 	if (ret)
@@ -4234,6 +4264,14 @@ static void op_create(fuse_req_t req, fuse_ino_t fino, const char *name,
 		fuse_reply_create(req, &fstat.entry, fp);
 }
 
+#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 17)
+static void op_tmpfile(fuse_req_t req, fuse_ino_t fino, mode_t mode,
+		       struct fuse_file_info *fp)
+{
+	op_create(req, fino, NULL, mode, fp);
+}
+#endif
+
 enum fuse4fs_time_action {
 	TA_NOW,		/* set to current time */
 	TA_OMIT,	/* do not set timestamp */
@@ -5225,6 +5263,9 @@ static struct fuse_lowlevel_ops fs_ops = {
 	.fsyncdir = op_fsync,
 	.access = op_access,
 	.create = op_create,
+#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 17)
+	.tmpfile = op_tmpfile,
+#endif
 	.bmap = op_bmap,
 #ifdef SUPERFLUOUS
 	.lock = op_lock,



* [PATCH 21/21] fuse4fs: create incore reverse orphan list
  2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
                     ` (19 preceding siblings ...)
  2025-09-16  0:55   ` [PATCH 20/21] fuse4fs: implement FUSE_TMPFILE Darrick J. Wong
@ 2025-09-16  0:56   ` Darrick J. Wong
  20 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:56 UTC (permalink / raw)
  To: tytso
  Cc: miklos, neal, amir73il, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

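Create an incore reverse orphan list (a backpointer from each cached
orphan inode to its predecessor) so that removing an open unlinked inode
from the ondisk orphan list doesn't require walking the entire list.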
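To make the data structure concrete, here is a minimal model of the two links involved, assuming a tiny fixed-size inode table. The ondisk list is singly linked: `s_last_orphan` names the head and each orphan's `i_dtime` names the next orphan, with 0 ending the list. The incore `i_prev_orphan` backpointer (0 for the head, a null sentinel when untracked) is what lets removal unlink an inode without walking from the head. All names below (`struct orphan_model`, `orphan_add`, `orphan_remove`, `NULL_INO`) are hypothetical; this is an illustration of the technique, not fuse4fs code.

```c
#include <stdint.h>
#include <string.h>

#define NINODES  16
#define NULL_INO ((uint32_t)~0U)	/* hypothetical: not on the incore list */

/*
 * Hypothetical model: next[] mirrors the ondisk i_dtime chain and
 * prev[] mirrors the incore i_prev_orphan backpointers.
 */
struct orphan_model {
	uint32_t last_orphan;		/* superblock s_last_orphan; 0 == empty */
	uint32_t next[NINODES];		/* ondisk forward link (i_dtime) */
	uint32_t prev[NINODES];		/* incore backlink; 0 == list head */
};

static void orphan_init(struct orphan_model *m)
{
	m->last_orphan = 0;
	memset(m->next, 0, sizeof(m->next));
	for (int i = 0; i < NINODES; i++)
		m->prev[i] = NULL_INO;
}

/* Push ino onto the head of the list, like fuse4fs_add_to_orphans. */
static void orphan_add(struct orphan_model *m, uint32_t ino)
{
	uint32_t old_head = m->last_orphan;

	if (old_head != 0)
		m->prev[old_head] = ino;	/* old head points back to us */
	m->prev[ino] = 0;			/* we are the new head */
	m->next[ino] = old_head;
	m->last_orphan = ino;
}

/* O(1) unlink via the backpointer; returns -1 if ino is not tracked. */
static int orphan_remove(struct orphan_model *m, uint32_t ino)
{
	uint32_t prev = m->prev[ino];
	uint32_t next = m->next[ino];

	if (prev == NULL_INO)
		return -1;			/* caller must walk the ondisk list */
	if (prev == 0)
		m->last_orphan = next;		/* we were the head */
	else
		m->next[prev] = next;		/* predecessor skips over us */
	if (next != 0)
		m->prev[next] = prev;		/* successor points back past us */
	m->next[ino] = 0;
	m->prev[ino] = NULL_INO;
	return 0;
}
```

As in the patch, an inode that was already orphaned at mount time has no backpointer, so `orphan_remove` reports that and the caller would fall back to walking the ondisk list.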

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |  178 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 174 insertions(+), 4 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 1015b13f4adac7..9a6913f6eef16a 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -356,10 +356,20 @@ static inline int u_log2(unsigned int arg)
 	return l;
 }
 
+/* inode is not on unlinked list */
+#define FUSE4FS_NULL_INO	((ext2_ino_t)~0ULL)
+
 struct fuse4fs_inode {
 	struct cache_node	i_cnode;
 	ext2_ino_t		i_ino;
 	unsigned int		i_open_count;
+
+	/*
+	 * FUSE4FS_NULL_INO: inode is not on the orphan list
+	 * 0: inode is the first on the orphan list
+	 * otherwise: the inode number of the previous orphan on the list
+	 */
+	ext2_ino_t		i_prev_orphan;
 };
 
 struct fuse4fs_ikey {
@@ -401,12 +411,15 @@ static struct cache_node *icache_alloc(struct cache *c, cache_key_t key)
 		return NULL;
 
 	fi->i_ino = ikey->i_ino;
+	fi->i_prev_orphan = FUSE4FS_NULL_INO;
 	return &fi->i_cnode;
 }
 
 static bool icache_flush(struct cache *c, struct cache_node *node)
 {
-	return false;
+	struct fuse4fs_inode *fi = ICNODE(node);
+
+	return fi->i_prev_orphan != FUSE4FS_NULL_INO;
 }
 
 static void icache_relse(struct cache *c, struct cache_node *node)
@@ -2186,10 +2199,31 @@ static int fuse4fs_add_to_orphans(struct fuse4fs *ff, ext2_ino_t ino,
 				  struct ext2_inode_large *inode)
 {
 	ext2_filsys fs = ff->fs;
+	struct fuse4fs_inode *fi;
+	ext2_ino_t orphan_ino = fs->super->s_last_orphan;
+	errcode_t err;
 
 	dbg_printf(ff, "%s: orphan ino=%d dtime=%d next=%d\n",
 		   __func__, ino, inode->i_dtime, fs->super->s_last_orphan);
 
+	/* Make the first orphan on the list point back to us */
+	if (orphan_ino != 0) {
+		err = fuse4fs_iget(ff, orphan_ino, &fi);
+		if (err)
+			return translate_error(fs, orphan_ino, err);
+
+		fi->i_prev_orphan = ino;
+		fuse4fs_iput(ff, fi);
+	}
+
+	/* Add ourselves to the head of the orphan list */
+	err = fuse4fs_iget(ff, ino, &fi);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	fi->i_prev_orphan = 0;
+	fuse4fs_iput(ff, fi);
+
 	inode->i_dtime = fs->super->s_last_orphan;
 	fs->super->s_last_orphan = ino;
 	ext2fs_mark_super_dirty(fs);
@@ -2197,24 +2231,158 @@ static int fuse4fs_add_to_orphans(struct fuse4fs *ff, ext2_ino_t ino,
 	return 0;
 }
 
+/*
+ * Given the orphan list excerpt: prev_orphan -> ino -> next_orphan, set
+ * next_orphan's backpointer to ino's backpointer (prev_orphan), having removed
+ * ino from the orphan list.
+ */
+static int fuse2fs_update_next_orphan_backlink(struct fuse4fs *ff,
+					       ext2_ino_t prev_orphan,
+					       ext2_ino_t ino,
+					       ext2_ino_t next_orphan)
+{
+	struct fuse4fs_inode *fi;
+	errcode_t err;
+	int ret = 0;
+
+	err = fuse4fs_iget(ff, next_orphan, &fi);
+	if (err)
+		return translate_error(ff->fs, next_orphan, err);
+
+	dbg_printf(ff, "%s: ino=%d cached next=%d nextprev=%d prev=%d\n",
+		   __func__, ino, next_orphan, fi->i_prev_orphan,
+		   prev_orphan);
+
+	if (fi->i_prev_orphan != ino) {
+		ret = translate_error(ff->fs, next_orphan,
+				      EXT2_ET_FILESYSTEM_CORRUPTED);
+		goto out_iput;
+	}
+
+	fi->i_prev_orphan = prev_orphan;
+out_iput:
+	fuse4fs_iput(ff, fi);
+	return ret;
+}
+
+/*
+ * Remove ino from the orphan list the fast way.  Returns 1 for success, 0 if
+ * it didn't do anything, or a negative errno.
+ */
+static int fuse4fs_fast_remove_from_orphans(struct fuse4fs *ff, ext2_ino_t ino,
+					    struct ext2_inode_large *inode)
+{
+	struct ext2_inode_large orphan;
+	ext2_filsys fs = ff->fs;
+	struct fuse4fs_inode *fi;
+	ext2_ino_t prev_orphan;
+	ext2_ino_t next_orphan = 0;
+	errcode_t err;
+	int ret = 0;
+
+	err = fuse4fs_iget(ff, ino, &fi);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	prev_orphan = fi->i_prev_orphan;
+	switch (prev_orphan) {
+	case 0:
+		/* First inode in the list */
+		dbg_printf(ff, "%s: ino=%d cached superblock\n", __func__, ino);
+
+		fs->super->s_last_orphan = inode->i_dtime;
+		next_orphan = inode->i_dtime;
+		inode->i_dtime = 0;
+		ext2fs_mark_super_dirty(fs);
+		fi->i_prev_orphan = FUSE4FS_NULL_INO;
+		break;
+	case FUSE4FS_NULL_INO:
+		/* unknown */
+		dbg_printf(ff, "%s: ino=%d broken list??\n", __func__, ino);
+		ret = 0;
+		goto out_iput;
+	default:
+		/* We're in the middle of the list */
+		err = fuse4fs_read_inode(fs, prev_orphan, &orphan);
+		if (err) {
+			ret = translate_error(fs, prev_orphan, err);
+			goto out_iput;
+		}
+
+		dbg_printf(ff,
+ "%s: ino=%d cached prev=%d prevnext=%d next=%d\n",
+			   __func__, ino, prev_orphan, orphan.i_dtime,
+			   inode->i_dtime);
+
+		if (orphan.i_dtime != ino) {
+			ret = translate_error(fs, prev_orphan,
+					      EXT2_ET_FILESYSTEM_CORRUPTED);
+			goto out_iput;
+		}
+
+		fi->i_prev_orphan = FUSE4FS_NULL_INO;
+		orphan.i_dtime = inode->i_dtime;
+		next_orphan = inode->i_dtime;
+		inode->i_dtime = 0;
+
+		err = fuse4fs_write_inode(fs, prev_orphan, &orphan);
+		if (err) {
+			ret = translate_error(fs, prev_orphan, err);
+			goto out_iput;
+		}
+
+		break;
+	}
+
+	/*
+	 * Make the next orphaned inode point back to our own previous list
+	 * entry
+	 */
+	if (next_orphan != 0) {
+		ret = fuse2fs_update_next_orphan_backlink(ff, prev_orphan, ino,
+							  next_orphan);
+		if (ret)
+			goto out_iput;
+	}
+	ret = 1;
+
+out_iput:
+	fuse4fs_iput(ff, fi);
+	return ret;
+}
+
 static int fuse4fs_remove_from_orphans(struct fuse4fs *ff, ext2_ino_t ino,
 				       struct ext2_inode_large *inode)
 {
 	ext2_filsys fs = ff->fs;
 	ext2_ino_t prev_orphan;
+	ext2_ino_t next_orphan;
 	errcode_t err;
+	int ret;
 
 	dbg_printf(ff, "%s: super=%d ino=%d next=%d\n",
 		   __func__, fs->super->s_last_orphan, ino, inode->i_dtime);
 
-	/* If we're lucky, the ondisk superblock points to us */
+	/*
+	 * Fast way: use the incore list, which doesn't include any orphans
+	 * that were already on the superblock when we mounted.
+	 */
+	ret = fuse4fs_fast_remove_from_orphans(ff, ino, inode);
+	if (ret < 0)
+		return ret;
+	if (ret == 1)
+		return 0;
+
+	/* Slow way: If we're lucky, the ondisk superblock points to us */
 	if (fs->super->s_last_orphan == ino) {
 		dbg_printf(ff, "%s: superblock\n", __func__);
 
+		next_orphan = inode->i_dtime;
 		fs->super->s_last_orphan = inode->i_dtime;
 		inode->i_dtime = 0;
 		ext2fs_mark_super_dirty(fs);
-		return 0;
+		return fuse2fs_update_next_orphan_backlink(ff, 0, ino,
+							   next_orphan);
 	}
 
 	/* Otherwise walk the ondisk orphan list. */
@@ -2234,6 +2402,7 @@ static int fuse4fs_remove_from_orphans(struct fuse4fs *ff, ext2_ino_t ino,
 			dbg_printf(ff, "%s: prev=%d\n",
 				   __func__, prev_orphan);
 
+			next_orphan = inode->i_dtime;
 			orphan.i_dtime = inode->i_dtime;
 			inode->i_dtime = 0;
 
@@ -2241,7 +2410,8 @@ static int fuse4fs_remove_from_orphans(struct fuse4fs *ff, ext2_ino_t ino,
 			if (err)
 				return translate_error(fs, prev_orphan, err);
 
-			return 0;
+			return fuse2fs_update_next_orphan_backlink(ff,
+					prev_orphan, ino, next_orphan);
 		}
 
 		dbg_printf(ff, "%s: orphan=%d next=%d\n",



* [PATCH 01/10] libext2fs: make it possible to extract the fd from an IO manager
  2025-09-16  0:22 ` [PATCHSET RFC v5 3/9] libext2fs: refactoring for fuse2fs iomap support Darrick J. Wong
@ 2025-09-16  0:56   ` Darrick J. Wong
  2025-09-16  0:56   ` [PATCH 02/10] libext2fs: always fsync the device when flushing the cache Darrick J. Wong
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:56 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Make it so that we can extract the fd from an open IO manager.  This
will be used in subsequent patches to register the open block device
with the fuse iomap kernel driver.
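The new hook follows the usual pattern for optional IO manager operations: the wrapper checks for a NULL method and returns a "not supported" error, so managers that have no file descriptor simply leave it unset. A minimal sketch of that dispatch, using hypothetical `demo_*` types and POSIX `EOPNOTSUPP` in place of `EXT2_ET_OP_NOT_SUPPORTED`:

```c
#include <errno.h>
#include <stddef.h>

struct demo_channel;

/* Hypothetical, cut-down IO manager: only the hook we care about. */
struct demo_io_manager {
	const char *name;
	int (*get_fd)(struct demo_channel *chan, int *fd);	/* may be NULL */
};

struct demo_channel {
	const struct demo_io_manager *manager;
	int dev;			/* backing file descriptor */
};

/*
 * Wrapper in the style of io_channel_get_fd(): managers without a file
 * descriptor (e.g. an in-memory test manager) leave get_fd NULL.
 */
static int demo_channel_get_fd(struct demo_channel *chan, int *fd)
{
	if (!chan->manager->get_fd)
		return EOPNOTSUPP;	/* stands in for EXT2_ET_OP_NOT_SUPPORTED */
	return chan->manager->get_fd(chan, fd);
}

/* Unix-style implementation: hand back the open device fd. */
static int demo_unix_get_fd(struct demo_channel *chan, int *fd)
{
	*fd = chan->dev;
	return 0;
}

static const struct demo_io_manager demo_unix_manager = {
	.name = "unix", .get_fd = demo_unix_get_fd,
};

static const struct demo_io_manager demo_mem_manager = {
	.name = "memory", .get_fd = NULL,
};
```

The patch also takes the new pointer out of `reserved[]` (14 slots down to 13), which appears intended to keep the size and layout of `struct struct_io_manager` unchanged for existing binaries, assuming a function pointer and a `long` are the same size, as on common Linux ABIs.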

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/ext2fs/ext2_io.h         |    4 +++-
 debian/libext2fs2t64.symbols |    1 +
 lib/ext2fs/io_manager.c      |    8 ++++++++
 lib/ext2fs/unix_io.c         |   15 +++++++++++++++
 4 files changed, 27 insertions(+), 1 deletion(-)


diff --git a/lib/ext2fs/ext2_io.h b/lib/ext2fs/ext2_io.h
index 39a4e8fcf6b515..f53983b30996b4 100644
--- a/lib/ext2fs/ext2_io.h
+++ b/lib/ext2fs/ext2_io.h
@@ -102,7 +102,8 @@ struct struct_io_manager {
 				     unsigned long long count);
 	errcode_t (*zeroout)(io_channel channel, unsigned long long block,
 			     unsigned long long count);
-	long	reserved[14];
+	errcode_t (*get_fd)(io_channel channel, int *fd);
+	long	reserved[13];
 };
 
 #define IO_FLAG_RW		0x0001
@@ -145,6 +146,7 @@ extern errcode_t io_channel_alloc_buf(io_channel channel,
 extern errcode_t io_channel_cache_readahead(io_channel io,
 					    unsigned long long block,
 					    unsigned long long count);
+extern errcode_t io_channel_get_fd(io_channel io, int *fd);
 
 #ifdef _WIN32
 /* windows_io.c */
diff --git a/debian/libext2fs2t64.symbols b/debian/libext2fs2t64.symbols
index a3042c3292da93..8e3214ee31e337 100644
--- a/debian/libext2fs2t64.symbols
+++ b/debian/libext2fs2t64.symbols
@@ -693,6 +693,7 @@ libext2fs.so.2 libext2fs2t64 #MINVER#
  io_channel_alloc_buf@Base 1.42.3
  io_channel_cache_readahead@Base 1.43
  io_channel_discard@Base 1.42
+ io_channel_get_fd@Base 1.47.99
  io_channel_read_blk64@Base 1.41.1
  io_channel_set_options@Base 1.37
  io_channel_write_blk64@Base 1.41.1
diff --git a/lib/ext2fs/io_manager.c b/lib/ext2fs/io_manager.c
index dca6af09996b70..6b4dca5e4dbca2 100644
--- a/lib/ext2fs/io_manager.c
+++ b/lib/ext2fs/io_manager.c
@@ -150,3 +150,11 @@ errcode_t io_channel_cache_readahead(io_channel io, unsigned long long block,
 
 	return io->manager->cache_readahead(io, block, count);
 }
+
+errcode_t io_channel_get_fd(io_channel io, int *fd)
+{
+	if (!io->manager->get_fd)
+		return EXT2_ET_OP_NOT_SUPPORTED;
+
+	return io->manager->get_fd(io, fd);
+}
diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
index 723a5c2474cdd5..a540572a840d17 100644
--- a/lib/ext2fs/unix_io.c
+++ b/lib/ext2fs/unix_io.c
@@ -1662,6 +1662,19 @@ static errcode_t unix_zeroout(io_channel channel, unsigned long long block,
 unimplemented:
 	return EXT2_ET_UNIMPLEMENTED;
 }
+
+static errcode_t unix_get_fd(io_channel channel, int *fd)
+{
+	struct unix_private_data *data;
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	data = (struct unix_private_data *) channel->private_data;
+	EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
+
+	*fd = data->dev;
+	return 0;
+}
+
 #if __GNUC_PREREQ (4, 6)
 #pragma GCC diagnostic pop
 #endif
@@ -1683,6 +1696,7 @@ static struct struct_io_manager struct_unix_manager = {
 	.discard	= unix_discard,
 	.cache_readahead	= unix_cache_readahead,
 	.zeroout	= unix_zeroout,
+	.get_fd		= unix_get_fd,
 };
 
 io_manager unix_io_manager = &struct_unix_manager;
@@ -1704,6 +1718,7 @@ static struct struct_io_manager struct_unixfd_manager = {
 	.discard	= unix_discard,
 	.cache_readahead	= unix_cache_readahead,
 	.zeroout	= unix_zeroout,
+	.get_fd		= unix_get_fd,
 };
 
 io_manager unixfd_io_manager = &struct_unixfd_manager;



* [PATCH 02/10] libext2fs: always fsync the device when flushing the cache
  2025-09-16  0:22 ` [PATCHSET RFC v5 3/9] libext2fs: refactoring for fuse2fs iomap support Darrick J. Wong
  2025-09-16  0:56   ` [PATCH 01/10] libext2fs: make it possible to extract the fd from an IO manager Darrick J. Wong
@ 2025-09-16  0:56   ` Darrick J. Wong
  2025-09-16  0:56   ` [PATCH 03/10] libext2fs: always fsync the device when closing the unix IO manager Darrick J. Wong
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:56 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

When we're flushing the unix IO manager's buffer cache, we should always
fsync the block device, because something could have written to it:
either the buffer cache itself or a direct write.  Either way, callers
want all dirtied regions persisted to stable media, so fsync even if
flushing the cache failed, but still report the first error.
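The resulting error handling is a "first error wins" pattern: the fsync is attempted unconditionally, but its errno is only reported if the cache flush succeeded. A small sketch, with a hypothetical `flush_fn` callback standing in for `flush_cached_blocks()` and two hypothetical demo flushers:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for flush_cached_blocks(): returns 0 or an errno. */
typedef int (*flush_fn)(void *ctx);

/*
 * Flush the cache, then fsync unconditionally.  The fsync still runs if
 * the cache flush failed, but the earlier error is the one reported.
 */
static int flush_then_fsync(int fd, flush_fn flush_cache, void *ctx)
{
	int retval = flush_cache(ctx);

	if (fsync(fd) != 0 && !retval)
		retval = errno;
	return retval;
}

/* Demo flushers for illustration only. */
static int flush_ok(void *ctx)  { (void)ctx; return 0; }
static int flush_eio(void *ctx) { (void)ctx; return EIO; }
```

With an invalid descriptor, `flush_then_fsync(-1, flush_ok, NULL)` surfaces the fsync's `EBADF`, while `flush_then_fsync(-1, flush_eio, NULL)` still attempts the fsync but returns the cache's `EIO`.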

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/ext2fs/unix_io.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
index a540572a840d17..f716de35cf5cb1 100644
--- a/lib/ext2fs/unix_io.c
+++ b/lib/ext2fs/unix_io.c
@@ -1462,7 +1462,8 @@ static errcode_t unix_flush(io_channel channel)
 	retval = flush_cached_blocks(channel, data, 0);
 #endif
 #ifdef HAVE_FSYNC
-	if (!retval && fsync(data->dev) != 0)
+	/* always fsync the device, even if flushing our own cache failed */
+	if (fsync(data->dev) != 0 && !retval)
 		return errno;
 #endif
 	return retval;



* [PATCH 03/10] libext2fs: always fsync the device when closing the unix IO manager
  2025-09-16  0:22 ` [PATCHSET RFC v5 3/9] libext2fs: refactoring for fuse2fs iomap support Darrick J. Wong
  2025-09-16  0:56   ` [PATCH 01/10] libext2fs: make it possible to extract the fd from an IO manager Darrick J. Wong
  2025-09-16  0:56   ` [PATCH 02/10] libext2fs: always fsync the device when flushing the cache Darrick J. Wong
@ 2025-09-16  0:56   ` Darrick J. Wong
  2025-09-16  0:57   ` [PATCH 04/10] libext2fs: only fsync the unix fd if we wrote to the device Darrick J. Wong
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:56 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

unix_close is the last chance that libext2fs has to report write
failures to users.  Although ext2fs_close has most likely already called
ext2fs_flush and told the IO manager to flush, issue one more fsync
before closing the file descriptor so that any writeback error is
reported.  Also don't let the close's errno override the fsync's errno.
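Sketched on its own, the close path after this patch looks roughly like this; `close_device()` is a hypothetical helper and `cache_err` stands in for the result of `flush_cached_blocks()`:

```c
#include <errno.h>
#include <unistd.h>

/*
 * fsync before close, and never let a later error overwrite an earlier
 * one: cache flush error, then fsync error, then close error.
 */
static int close_device(int fd, int cache_err)
{
	int retval = cache_err;

	if (fsync(fd) != 0 && !retval)
		retval = errno;		/* report writeback failure */
	if (close(fd) < 0 && !retval)
		retval = errno;		/* only if nothing failed earlier */
	return retval;
}
```

Passing an invalid descriptor with a nonzero `cache_err` shows the ordering: both the fsync and the close fail with `EBADF`, yet the caller still sees the original cache error.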

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/ext2fs/unix_io.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)


diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
index f716de35cf5cb1..b04e8a89a951dd 100644
--- a/lib/ext2fs/unix_io.c
+++ b/lib/ext2fs/unix_io.c
@@ -1146,8 +1146,11 @@ static errcode_t unix_close(io_channel channel)
 #ifndef NO_IO_CACHE
 	retval = flush_cached_blocks(channel, data, 0);
 #endif
+	/* always fsync the device, even if flushing our own cache failed */
+	if (fsync(data->dev) != 0 && !retval)
+		retval = errno;
 
-	if (close(data->dev) < 0)
+	if (close(data->dev) < 0 && !retval)
 		retval = errno;
 	free_cache(data);
 	free(data->cache);



* [PATCH 04/10] libext2fs: only fsync the unix fd if we wrote to the device
  2025-09-16  0:22 ` [PATCHSET RFC v5 3/9] libext2fs: refactoring for fuse2fs iomap support Darrick J. Wong
                     ` (2 preceding siblings ...)
  2025-09-16  0:56   ` [PATCH 03/10] libext2fs: always fsync the device when closing the unix IO manager Darrick J. Wong
@ 2025-09-16  0:57   ` Darrick J. Wong
  2025-09-16  0:57   ` [PATCH 05/10] libext2fs: invalidate cached blocks when freeing them Darrick J. Wong
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:57 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

As an optimization, only fsync the block device fd if we tried to write
to the io channel since the last fsync.
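The dirty-tracking scheme can be modelled with a flag protected by a mutex: every write path sets the flag, and the flush and close paths test-and-clear it under the lock before deciding whether to call fsync. A minimal sketch using plain pthreads in place of the `mutex_lock(data, CACHE_MTX)` helpers; `struct demo_dev` and its `fsync_calls` counter are hypothetical and only there to make the behaviour observable:

```c
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

#define DEMO_STATE_DIRTY	(1U << 0)	/* device needs fsyncing */

/* Hypothetical cut-down private data. */
struct demo_dev {
	int fd;
	unsigned int state;		/* DEMO_STATE_* */
	unsigned int fsync_calls;	/* demo-only counter, not locked */
	pthread_mutex_t lock;
};

/* Call on every write path before issuing the write. */
static void demo_mark_dirty(struct demo_dev *d)
{
	pthread_mutex_lock(&d->lock);
	d->state |= DEMO_STATE_DIRTY;
	pthread_mutex_unlock(&d->lock);
}

/* fsync only if something was written since the last call. */
static int demo_maybe_fsync(struct demo_dev *d)
{
	unsigned int was_dirty;

	pthread_mutex_lock(&d->lock);
	was_dirty = d->state & DEMO_STATE_DIRTY;
	d->state &= ~DEMO_STATE_DIRTY;
	pthread_mutex_unlock(&d->lock);

	if (!was_dirty)
		return 0;
	d->fsync_calls++;
	return fsync(d->fd) != 0 ? errno : 0;
}
```

As in the patch, the flag is cleared before fsync runs, so a second flush with no intervening write does not fsync again even if the first fsync failed.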

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/ext2fs/unix_io.c |   48 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 42 insertions(+), 6 deletions(-)


diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
index b04e8a89a951dd..b462c587e3e2ac 100644
--- a/lib/ext2fs/unix_io.c
+++ b/lib/ext2fs/unix_io.c
@@ -129,10 +129,13 @@ struct unix_cache {
 #define WRITE_DIRECT_SIZE 4	/* Must be smaller than CACHE_SIZE */
 #define READ_DIRECT_SIZE 4	/* Should be smaller than CACHE_SIZE */
 
+#define UNIX_STATE_DIRTY	(1U << 0) /* device needs fsyncing */
+
 struct unix_private_data {
 	int	magic;
 	int	dev;
 	int	flags;
+	unsigned int	state; /* UNIX_STATE_* */
 	int	align;
 	int	access_time;
 	ext2_loff_t offset;
@@ -1131,10 +1134,37 @@ static errcode_t unix_open(const char *name, int flags,
 	return unix_open_channel(name, fd, flags, channel, unix_io_manager);
 }
 
+static void mark_dirty(io_channel channel)
+{
+	struct unix_private_data *data =
+		(struct unix_private_data *) channel->private_data;
+
+	mutex_lock(data, CACHE_MTX);
+	data->state |= UNIX_STATE_DIRTY;
+	mutex_unlock(data, CACHE_MTX);
+}
+
+static errcode_t maybe_fsync(io_channel channel)
+{
+	struct unix_private_data *data =
+		(struct unix_private_data *) channel->private_data;
+	int was_dirty;
+
+	mutex_lock(data, CACHE_MTX);
+	was_dirty = data->state & UNIX_STATE_DIRTY;
+	data->state &= ~UNIX_STATE_DIRTY;
+	mutex_unlock(data, CACHE_MTX);
+
+	if (was_dirty && fsync(data->dev) != 0)
+		return errno;
+
+	return 0;
+}
+
 static errcode_t unix_close(io_channel channel)
 {
 	struct unix_private_data *data;
-	errcode_t	retval = 0;
+	errcode_t	retval = 0, retval2;
 
 	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
 	data = (struct unix_private_data *) channel->private_data;
@@ -1147,8 +1177,9 @@ static errcode_t unix_close(io_channel channel)
 	retval = flush_cached_blocks(channel, data, 0);
 #endif
 	/* always fsync the device, even if flushing our own cache failed */
-	if (fsync(data->dev) != 0 && !retval)
-		retval = errno;
+	retval2 = maybe_fsync(channel);
+	if (retval2 && !retval)
+		retval = retval2;
 
 	if (close(data->dev) < 0 && !retval)
 		retval = errno;
@@ -1316,6 +1347,8 @@ static errcode_t unix_write_blk64(io_channel channel, unsigned long long block,
 	data = (struct unix_private_data *) channel->private_data;
 	EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
 
+	mark_dirty(channel);
+
 #ifdef NO_IO_CACHE
 	return raw_write_blk(channel, data, block, count, buf, 0);
 #else
@@ -1440,6 +1473,8 @@ static errcode_t unix_write_byte(io_channel channel, unsigned long offset,
 	if (lseek(data->dev, offset + data->offset, SEEK_SET) < 0)
 		return errno;
 
+	mark_dirty(channel);
+
 	actual = write(data->dev, buf, size);
 	if (actual < 0)
 		return errno;
@@ -1455,7 +1490,7 @@ static errcode_t unix_write_byte(io_channel channel, unsigned long offset,
 static errcode_t unix_flush(io_channel channel)
 {
 	struct unix_private_data *data;
-	errcode_t retval = 0;
+	errcode_t retval = 0, retval2;
 
 	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
 	data = (struct unix_private_data *) channel->private_data;
@@ -1466,8 +1501,9 @@ static errcode_t unix_flush(io_channel channel)
 #endif
 #ifdef HAVE_FSYNC
 	/* always fsync the device, even if flushing our own cache failed */
-	if (fsync(data->dev) != 0 && !retval)
-		return errno;
+	retval2 = maybe_fsync(channel);
+	if (retval2 && !retval)
+		retval = retval2;
 #endif
 	return retval;
 }



* [PATCH 05/10] libext2fs: invalidate cached blocks when freeing them
  2025-09-16  0:22 ` [PATCHSET RFC v5 3/9] libext2fs: refactoring for fuse2fs iomap support Darrick J. Wong
                     ` (3 preceding siblings ...)
  2025-09-16  0:57   ` [PATCH 04/10] libext2fs: only fsync the unix fd if we wrote to the device Darrick J. Wong
@ 2025-09-16  0:57   ` Darrick J. Wong
  2025-09-16  0:57   ` [PATCH 06/10] libext2fs: only flush affected blocks in unix_write_byte Darrick J. Wong
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:57 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

When we're freeing blocks, we should tell the IO manager to drop them
from any cache it might be maintaining to improve performance.
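The invalidation itself is a linear scan that drops any in-use cache slot whose block number falls in the half-open range `[block, block + count)`. A sketch with a hypothetical `struct demo_cache_slot` modelled loosely on `struct unix_cache`; unlike the patch, which only clears `in_use`, it also clears a `dirty` flag for clarity:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache slot, modelled loosely on struct unix_cache. */
struct demo_cache_slot {
	unsigned long long block;
	bool in_use;
	bool dirty;
};

/*
 * Drop every cached block in [block, block + count).  Dirty contents are
 * discarded: the blocks are being freed, so their old contents no longer
 * matter.  Returns the number of slots dropped.
 */
static size_t demo_invalidate(struct demo_cache_slot *cache, size_t nslots,
			      unsigned long long block,
			      unsigned long long count)
{
	size_t dropped = 0;

	for (size_t i = 0; i < nslots; i++) {
		if (!cache[i].in_use || cache[i].block < block ||
		    cache[i].block >= block + count)
			continue;
		cache[i].in_use = false;
		cache[i].dirty = false;
		dropped++;
	}
	return dropped;
}
```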

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/ext2fs/ext2_io.h         |    8 +++++++-
 debian/libext2fs2t64.symbols |    1 +
 lib/ext2fs/alloc_stats.c     |    6 ++++++
 lib/ext2fs/io_manager.c      |    9 +++++++++
 lib/ext2fs/unix_io.c         |   35 +++++++++++++++++++++++++++++++++++
 5 files changed, 58 insertions(+), 1 deletion(-)


diff --git a/lib/ext2fs/ext2_io.h b/lib/ext2fs/ext2_io.h
index f53983b30996b4..26ecd128954a0e 100644
--- a/lib/ext2fs/ext2_io.h
+++ b/lib/ext2fs/ext2_io.h
@@ -103,7 +103,10 @@ struct struct_io_manager {
 	errcode_t (*zeroout)(io_channel channel, unsigned long long block,
 			     unsigned long long count);
 	errcode_t (*get_fd)(io_channel channel, int *fd);
-	long	reserved[13];
+	errcode_t (*invalidate_blocks)(io_channel channel,
+				       unsigned long long block,
+				       unsigned long long count);
+	long	reserved[12];
 };
 
 #define IO_FLAG_RW		0x0001
@@ -147,6 +150,9 @@ extern errcode_t io_channel_cache_readahead(io_channel io,
 					    unsigned long long block,
 					    unsigned long long count);
 extern errcode_t io_channel_get_fd(io_channel io, int *fd);
+extern errcode_t io_channel_invalidate_blocks(io_channel io,
+					      unsigned long long block,
+					      unsigned long long count);
 
 #ifdef _WIN32
 /* windows_io.c */
diff --git a/debian/libext2fs2t64.symbols b/debian/libext2fs2t64.symbols
index 8e3214ee31e337..864a284b940009 100644
--- a/debian/libext2fs2t64.symbols
+++ b/debian/libext2fs2t64.symbols
@@ -694,6 +694,7 @@ libext2fs.so.2 libext2fs2t64 #MINVER#
  io_channel_cache_readahead@Base 1.43
  io_channel_discard@Base 1.42
  io_channel_get_fd@Base 1.47.99
+ io_channel_invalidate_blocks@Base 1.47.99
  io_channel_read_blk64@Base 1.41.1
  io_channel_set_options@Base 1.37
  io_channel_write_blk64@Base 1.41.1
diff --git a/lib/ext2fs/alloc_stats.c b/lib/ext2fs/alloc_stats.c
index 95a6438f252e0f..68bbe6807a8ed3 100644
--- a/lib/ext2fs/alloc_stats.c
+++ b/lib/ext2fs/alloc_stats.c
@@ -82,6 +82,9 @@ void ext2fs_block_alloc_stats2(ext2_filsys fs, blk64_t blk, int inuse)
 			     -inuse * (blk64_t) EXT2FS_CLUSTER_RATIO(fs));
 	ext2fs_mark_super_dirty(fs);
 	ext2fs_mark_bb_dirty(fs);
+	if (inuse < 0)
+		io_channel_invalidate_blocks(fs->io, blk,
+					     EXT2FS_CLUSTER_RATIO(fs));
 	if (fs->block_alloc_stats)
 		(fs->block_alloc_stats)(fs, (blk64_t) blk, inuse);
 }
@@ -144,11 +147,14 @@ void ext2fs_block_alloc_stats_range(ext2_filsys fs, blk64_t blk,
 		ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
 		ext2fs_group_desc_csum_set(fs, group);
 		ext2fs_free_blocks_count_add(fs->super, -inuse * (blk64_t) n);
+
 		blk += n;
 		num -= n;
 	}
 	ext2fs_mark_super_dirty(fs);
 	ext2fs_mark_bb_dirty(fs);
+	if (inuse < 0)
+		io_channel_invalidate_blocks(fs->io, orig_blk, orig_num);
 	if (fs->block_alloc_stats_range)
 		(fs->block_alloc_stats_range)(fs, orig_blk, orig_num, inuse);
 }
diff --git a/lib/ext2fs/io_manager.c b/lib/ext2fs/io_manager.c
index 6b4dca5e4dbca2..c91fab4eb290d5 100644
--- a/lib/ext2fs/io_manager.c
+++ b/lib/ext2fs/io_manager.c
@@ -158,3 +158,12 @@ errcode_t io_channel_get_fd(io_channel io, int *fd)
 
 	return io->manager->get_fd(io, fd);
 }
+
+errcode_t io_channel_invalidate_blocks(io_channel io, unsigned long long block,
+				       unsigned long long count)
+{
+	if (!io->manager->invalidate_blocks)
+		return EXT2_ET_OP_NOT_SUPPORTED;
+
+	return io->manager->invalidate_blocks(io, block, count);
+}
diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
index b462c587e3e2ac..be253b5fddf281 100644
--- a/lib/ext2fs/unix_io.c
+++ b/lib/ext2fs/unix_io.c
@@ -667,6 +667,25 @@ static errcode_t reuse_cache(io_channel channel,
 #define FLUSH_INVALIDATE	0x01
 #define FLUSH_NOLOCK		0x02
 
+/* Remove blocks from the cache.  Dirty contents are discarded. */
+static void invalidate_cached_blocks(io_channel channel,
+				     struct unix_private_data *data,
+				     unsigned long long block,
+				     unsigned long long count)
+{
+	struct unix_cache	*cache;
+	int			i;
+
+	mutex_lock(data, CACHE_MTX);
+	for (i = 0, cache = data->cache; i < data->cache_size; i++, cache++) {
+		if (!cache->in_use || cache->block < block ||
+		    cache->block >= block + count)
+			continue;
+		cache->in_use = 0;
+	}
+	mutex_unlock(data, CACHE_MTX);
+}
+
 /*
  * Flush all of the blocks in the cache
  */
@@ -1715,6 +1734,20 @@ static errcode_t unix_get_fd(io_channel channel, int *fd)
 	return 0;
 }
 
+static errcode_t unix_invalidate_blocks(io_channel channel,
+					unsigned long long block,
+					unsigned long long count)
+{
+	struct unix_private_data *data;
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	data = (struct unix_private_data *) channel->private_data;
+	EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
+
+	invalidate_cached_blocks(channel, data, block, count);
+	return 0;
+}
+
 #if __GNUC_PREREQ (4, 6)
 #pragma GCC diagnostic pop
 #endif
@@ -1737,6 +1770,7 @@ static struct struct_io_manager struct_unix_manager = {
 	.cache_readahead	= unix_cache_readahead,
 	.zeroout	= unix_zeroout,
 	.get_fd		= unix_get_fd,
+	.invalidate_blocks = unix_invalidate_blocks,
 };
 
 io_manager unix_io_manager = &struct_unix_manager;
@@ -1759,6 +1793,7 @@ static struct struct_io_manager struct_unixfd_manager = {
 	.cache_readahead	= unix_cache_readahead,
 	.zeroout	= unix_zeroout,
 	.get_fd		= unix_get_fd,
+	.invalidate_blocks = unix_invalidate_blocks,
 };
 
 io_manager unixfd_io_manager = &struct_unixfd_manager;



* [PATCH 06/10] libext2fs: only flush affected blocks in unix_write_byte
  2025-09-16  0:22 ` [PATCHSET RFC v5 3/9] libext2fs: refactoring for fuse2fs iomap support Darrick J. Wong
                     ` (4 preceding siblings ...)
  2025-09-16  0:57   ` [PATCH 05/10] libext2fs: invalidate cached blocks when freeing them Darrick J. Wong
@ 2025-09-16  0:57   ` Darrick J. Wong
  2025-09-16  0:57   ` [PATCH 07/10] libext2fs: allow unix_write_byte when the write would be aligned Darrick J. Wong
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:57 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

There's no need to invalidate the entire cache when writing a range of
bytes to the device.  Flush the dirty blocks as before, but only
invalidate the cached blocks that overlap the byte range being written.
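The range passed to `invalidate_cached_blocks()` covers every block the byte range touches: the start is rounded down and the end rounded up to a block boundary. The same arithmetic as a standalone sketch (the helper name is hypothetical):

```c
/*
 * Compute the first block and the number of blocks touched by the byte
 * range [offset, offset + size), rounding the end up to a whole block.
 */
static void demo_byte_range_to_blocks(unsigned long long offset,
				      unsigned long long size,
				      unsigned int block_size,
				      unsigned long long *first,
				      unsigned long long *count)
{
	unsigned long long end = (offset + size + block_size - 1) / block_size;

	*first = offset / block_size;
	*count = end - *first;
}
```

For example, writing 100 bytes at offset 1000 with 1024-byte blocks straddles a boundary, so blocks 0 and 1 are invalidated; a block-aligned 1024-byte write at offset 1024 touches only block 1.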

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/ext2fs/unix_io.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)


diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
index be253b5fddf281..d4973d1a878057 100644
--- a/lib/ext2fs/unix_io.c
+++ b/lib/ext2fs/unix_io.c
@@ -1468,6 +1468,7 @@ static errcode_t unix_write_byte(io_channel channel, unsigned long offset,
 {
 	struct unix_private_data *data;
 	errcode_t	retval = 0;
+	unsigned long long bno, nbno;
 	ssize_t		actual;
 
 	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
@@ -1483,10 +1484,17 @@ static errcode_t unix_write_byte(io_channel channel, unsigned long offset,
 
 #ifndef NO_IO_CACHE
 	/*
-	 * Flush out the cache completely
+	 * Flush all the dirty blocks, then invalidate the blocks we're about
+	 * to write.
 	 */
-	if ((retval = flush_cached_blocks(channel, data, FLUSH_INVALIDATE)))
+	retval = flush_cached_blocks(channel, data, 0);
+	if (retval)
 		return retval;
+
+	bno = offset / channel->block_size;
+	nbno = (offset + size + channel->block_size - 1) / channel->block_size;
+
+	invalidate_cached_blocks(channel, data, bno, nbno - bno);
 #endif
 
 	if (lseek(data->dev, offset + data->offset, SEEK_SET) < 0)



* [PATCH 07/10] libext2fs: allow unix_write_byte when the write would be aligned
  2025-09-16  0:22 ` [PATCHSET RFC v5 3/9] libext2fs: refactoring for fuse2fs iomap support Darrick J. Wong
                     ` (5 preceding siblings ...)
  2025-09-16  0:57   ` [PATCH 06/10] libext2fs: only flush affected blocks in unix_write_byte Darrick J. Wong
@ 2025-09-16  0:57   ` Darrick J. Wong
  2025-09-16  0:58   ` [PATCH 08/10] libext2fs: allow clients to ask to write full superblocks Darrick J. Wong
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:57 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

If someone calls write_byte on an IO channel with an alignment
requirement and the range to be written is aligned correctly, go ahead
and do the write.  This will be needed later when we try to speed up
superblock writes.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/ext2fs/unix_io.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
index d4973d1a878057..068be689326443 100644
--- a/lib/ext2fs/unix_io.c
+++ b/lib/ext2fs/unix_io.c
@@ -1479,7 +1479,9 @@ static errcode_t unix_write_byte(io_channel channel, unsigned long offset,
 #ifdef ALIGN_DEBUG
 		printf("unix_write_byte: O_DIRECT fallback\n");
 #endif
-		return EXT2_ET_UNIMPLEMENTED;
+		if (!IS_ALIGNED(data->offset + offset, channel->align) ||
+		    !IS_ALIGNED(data->offset + offset + size, channel->align))
+			return EXT2_ET_UNIMPLEMENTED;
 	}
 
 #ifndef NO_IO_CACHE


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 08/10] libext2fs: allow clients to ask to write full superblocks
  2025-09-16  0:22 ` [PATCHSET RFC v5 3/9] libext2fs: refactoring for fuse2fs iomap support Darrick J. Wong
                     ` (6 preceding siblings ...)
  2025-09-16  0:57   ` [PATCH 07/10] libext2fs: allow unix_write_byte when the write would be aligned Darrick J. Wong
@ 2025-09-16  0:58   ` Darrick J. Wong
  2025-09-16  0:58   ` [PATCH 09/10] libext2fs: allow callers to disallow I/O to file data blocks Darrick J. Wong
  2025-09-16  0:58   ` [PATCH 10/10] libext2fs: add posix advisory locking to the unix IO manager Darrick J. Wong
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:58 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

write_primary_superblock currently does this weird dance where it tries
to write only the dirty bytes of the primary superblock to disk.  In
theory, this is done so that tune2fs can incrementally update superblock
bytes while the filesystem is mounted; ext2 was famous for letting
administrators use this dance to set new fs parameters and have them take
effect in real time.

The ability to do this safely was obliterated back in 2001 when ext3 was
introduced with journalling, because tune2fs has no way to know whether
the journal has already logged an updated primary superblock that has not
yet been written to disk.  tune2fs and the kernel can then race to write
the superblock, and changes can be lost.

This (non-)safety was further obliterated back in 2012 when I added
checksums to all the metadata blocks in ext4, because anyone else with
the block device open can see the primary superblock in an intermediate
state where the checksum does not match the superblock contents.

At this point in 2025 it's kind of stupid for fuse2fs to be doing this,
because you can't have the kernel and fuse2fs mount the same filesystem
at the same time.  It also makes fuse2fs op_fsync slow, because libext2fs
performs a bunch of small writes and introduces extra fsyncs.

So, add a new flag to ask for full superblock writes, which fuse2fs will
use later.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/ext2fs/ext2fs.h  |    1 +
 lib/ext2fs/closefs.c |    7 +++++++
 2 files changed, 8 insertions(+)


diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index bb2170b78d6308..dee9feb02624ed 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -220,6 +220,7 @@ typedef struct ext2_file *ext2_file_t;
 #define EXT2_FLAG_IBITMAP_TAIL_PROBLEM	0x2000000
 #define EXT2_FLAG_THREADS		0x4000000
 #define EXT2_FLAG_IGNORE_SWAP_DIRENT	0x8000000
+#define EXT2_FLAG_WRITE_FULL_SUPER	0x10000000
 
 /*
  * Internal flags for use by the ext2fs library only
diff --git a/lib/ext2fs/closefs.c b/lib/ext2fs/closefs.c
index 8e5bec03a050de..9a67db76e7b326 100644
--- a/lib/ext2fs/closefs.c
+++ b/lib/ext2fs/closefs.c
@@ -196,6 +196,13 @@ static errcode_t write_primary_superblock(ext2_filsys fs,
 	int		check_idx, write_idx, size;
 	errcode_t	retval;
 
+	if (fs->flags & EXT2_FLAG_WRITE_FULL_SUPER) {
+		retval = io_channel_write_byte(fs->io, SUPERBLOCK_OFFSET,
+					       SUPERBLOCK_SIZE, super);
+		if (!retval)
+			return 0;
+	}
+
 	if (!fs->io->manager->write_byte || !fs->orig_super) {
 	fallback:
 		io_channel_set_blksize(fs->io, SUPERBLOCK_OFFSET);


* [PATCH 09/10] libext2fs: allow callers to disallow I/O to file data blocks
  2025-09-16  0:22 ` [PATCHSET RFC v5 3/9] libext2fs: refactoring for fuse2fs iomap support Darrick J. Wong
                     ` (7 preceding siblings ...)
  2025-09-16  0:58   ` [PATCH 08/10] libext2fs: allow clients to ask to write full superblocks Darrick J. Wong
@ 2025-09-16  0:58   ` Darrick J. Wong
  2025-09-16  0:58   ` [PATCH 10/10] libext2fs: add posix advisory locking to the unix IO manager Darrick J. Wong
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:58 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Add a flag to ext2_file_t to disallow read and write I/O to file data
blocks.  This is needed for fuse2fs iomap support, which will keep all
file data I/O inside the kernel.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/ext2fs/ext2fs.h |    3 +++
 lib/ext2fs/fileio.c |   12 +++++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)


diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index dee9feb02624ed..7d36b1a839dc57 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -178,6 +178,9 @@ typedef struct ext2_struct_dblist *ext2_dblist;
 #define EXT2_FILE_WRITE		0x0001
 #define EXT2_FILE_CREATE	0x0002
 
+/* no file I/O to disk blocks, only to inline data */
+#define EXT2_FILE_NOBLOCKIO	0x0004
+
 #define EXT2_FILE_MASK		0x00FF
 
 #define EXT2_FILE_BUF_DIRTY	0x4000
diff --git a/lib/ext2fs/fileio.c b/lib/ext2fs/fileio.c
index 3a36e9e7fff43b..95ee45ec7371ae 100644
--- a/lib/ext2fs/fileio.c
+++ b/lib/ext2fs/fileio.c
@@ -314,6 +314,11 @@ errcode_t ext2fs_file_read(ext2_file_t file, void *buf,
 	if (file->inode.i_flags & EXT4_INLINE_DATA_FL)
 		return ext2fs_file_read_inline_data(file, buf, wanted, got);
 
+	if (file->flags & EXT2_FILE_NOBLOCKIO) {
+		retval = EXT2_ET_OP_NOT_SUPPORTED;
+		goto fail;
+	}
+
 	while ((file->pos < EXT2_I_SIZE(&file->inode)) && (wanted > 0)) {
 		retval = sync_buffer_position(file);
 		if (retval)
@@ -441,6 +446,11 @@ errcode_t ext2fs_file_write(ext2_file_t file, const void *buf,
 		retval = 0;
 	}
 
+	if (file->flags & EXT2_FILE_NOBLOCKIO) {
+		retval = EXT2_ET_OP_NOT_SUPPORTED;
+		goto fail;
+	}
+
 	while (nbytes > 0) {
 		retval = sync_buffer_position(file);
 		if (retval)
@@ -609,7 +619,7 @@ static errcode_t ext2fs_file_zero_past_offset(ext2_file_t file,
 	int ret_flags;
 	errcode_t retval;
 
-	if (off == 0)
+	if (off == 0 || (file->flags & EXT2_FILE_NOBLOCKIO))
 		return 0;
 
 	retval = sync_buffer_position(file);


* [PATCH 10/10] libext2fs: add posix advisory locking to the unix IO manager
  2025-09-16  0:22 ` [PATCHSET RFC v5 3/9] libext2fs: refactoring for fuse2fs iomap support Darrick J. Wong
                     ` (8 preceding siblings ...)
  2025-09-16  0:58   ` [PATCH 09/10] libext2fs: allow callers to disallow I/O to file data blocks Darrick J. Wong
@ 2025-09-16  0:58   ` Darrick J. Wong
  2025-10-08 22:09     ` Darrick J. Wong
  9 siblings, 1 reply; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:58 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

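Add support for using flock() to protect the files opened by the Unix IO
manager so that the same fs can't be mounted multiple times at once.
Read-write opens take a shared lock, or an exclusive lock if
IO_FLAG_EXCLUSIVE is set, and give up after waiting five seconds.  This
also prevents systemd and udev from accessing the device while e2fsprogs
is doing something with it.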

Link: https://systemd.io/BLOCK_DEVICE_LOCKING/
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/ext2fs/unix_io.c |   64 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)


diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
index 068be689326443..55007ad7d2ae15 100644
--- a/lib/ext2fs/unix_io.c
+++ b/lib/ext2fs/unix_io.c
@@ -65,6 +65,12 @@
 #include <pthread.h>
 #endif
 
+#if defined(HAVE_SYS_FILE_H) && defined(HAVE_SIGNAL_H)
+# include <sys/file.h>
+# include <signal.h>
+# define WANT_LOCK_UNIX_FD
+#endif
+
 #if defined(__linux__) && defined(_IO) && !defined(BLKROGET)
 #define BLKROGET   _IO(0x12, 94) /* Get read-only status (0 = read_write).  */
 #endif
@@ -149,6 +155,9 @@ struct unix_private_data {
 	pthread_mutex_t bounce_mutex;
 	pthread_mutex_t stats_mutex;
 #endif
+#ifdef WANT_LOCK_UNIX_FD
+	int	lock_flags;
+#endif
 };
 
 #define IS_ALIGNED(n, align) ((((uintptr_t) n) & \
@@ -897,6 +906,47 @@ int ext2fs_fstat(int fd, ext2fs_struct_stat *buf)
 #endif
 }
 
+#ifdef WANT_LOCK_UNIX_FD
+static void unix_lock_alarm_handler(int signal, siginfo_t *data, void *p)
+{
+	/* do nothing, the signal will abort the flock operation */
+}
+
+static int unix_lock_fd(int fd, int flags)
+{
+	struct sigaction newsa = {
+		.sa_flags = SA_SIGINFO,
+		.sa_sigaction = unix_lock_alarm_handler,
+	};
+	struct sigaction oldsa;
+	const int operation = (flags & IO_FLAG_EXCLUSIVE) ? LOCK_EX : LOCK_SH;
+	int ret;
+
+	/* wait five seconds for the lock */
+	ret = sigaction(SIGALRM, &newsa, &oldsa);
+	if (ret)
+		return ret;
+
+	alarm(5);
+
+	ret = flock(fd, operation);
+	if (ret == 0)
+		ret = operation;
+	else if (errno == EINTR) {
+		errno = EWOULDBLOCK;
+		ret = -1;
+	}
+
+	alarm(0);
+	sigaction(SIGALRM, &oldsa, NULL);
+	return ret;
+}
+
+static void unix_unlock_fd(int fd)
+{
+	flock(fd, LOCK_UN);
+}
+#endif
 
 static errcode_t unix_open_channel(const char *name, int fd,
 				   int flags, io_channel *channel,
@@ -935,6 +985,16 @@ static errcode_t unix_open_channel(const char *name, int fd,
 	if (retval)
 		goto cleanup;
 
+#ifdef WANT_LOCK_UNIX_FD
+	if (flags & IO_FLAG_RW) {
+		data->lock_flags = unix_lock_fd(fd, flags);
+		if (data->lock_flags < 0) {
+			retval = errno;
+			goto cleanup;
+		}
+	}
+#endif
+
 	strcpy(io->name, name);
 	io->private_data = data;
 	io->block_size = 1024;
@@ -1200,6 +1260,10 @@ static errcode_t unix_close(io_channel channel)
 	if (retval2 && !retval)
 		retval = retval2;
 
+#ifdef WANT_LOCK_UNIX_FD
+	if (data->lock_flags)
+		unix_unlock_fd(data->dev);
+#endif
 	if (close(data->dev) < 0 && !retval)
 		retval = errno;
 	free_cache(data);


* [PATCH 01/17] fuse2fs: implement bare minimum iomap for file mapping reporting
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
@ 2025-09-16  0:58   ` Darrick J. Wong
  2025-09-16  0:59   ` [PATCH 02/17] fuse2fs: add iomap= mount option Darrick J. Wong
                     ` (15 subsequent siblings)
  16 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:58 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Add just enough of an iomap implementation to support FIEMAP,
SEEK_DATA, and SEEK_HOLE.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 configure         |   48 +++++
 configure.ac      |   31 +++
 fuse4fs/fuse4fs.c |  525 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 lib/config.h.in   |    3 
 misc/fuse2fs.c    |  525 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 5 files changed, 1118 insertions(+), 14 deletions(-)


diff --git a/configure b/configure
index 7f5fb7c1a62084..4137f942efaef5 100755
--- a/configure
+++ b/configure
@@ -14212,6 +14212,7 @@ printf "%s\n" "yes" >&6; }
 fi
 
 
+have_fuse_iomap=
 if test -n "$FUSE_LIB"
 then
 	FUSE_USE_VERSION=314
@@ -14237,12 +14238,59 @@ See \`config.log' for more details" "$LINENO" 5; }
 fi
 
 done
+
+					{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for iomap_begin in libfuse" >&5
+printf %s "checking for iomap_begin in libfuse... " >&6; }
+	cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+	#define _GNU_SOURCE
+	#define _FILE_OFFSET_BITS	64
+	#define FUSE_USE_VERSION	399
+	#include <fuse.h>
+
+int
+main (void)
+{
+
+	struct fuse_operations fs_ops = {
+		.iomap_begin = NULL,
+		.iomap_end = NULL,
+	};
+	struct fuse_file_iomap narf = { };
+
+  ;
+  return 0;
+}
+
+_ACEOF
+if ac_fn_c_try_link "$LINENO"
+then :
+  have_fuse_iomap=yes
+	   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+printf "%s\n" "yes" >&6; }
+else $as_nop
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
+printf "%s\n" "no" >&6; }
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.beam \
+    conftest$ac_exeext conftest.$ac_ext
+	if test "$have_fuse_iomap" = yes
+	then
+		FUSE_USE_VERSION=399
+	fi
 fi
 if test -n "$FUSE_USE_VERSION"
 then
 
 printf "%s\n" "#define FUSE_USE_VERSION $FUSE_USE_VERSION" >>confdefs.h
 
+fi
+if test -n "$have_fuse_iomap"
+then
+
+printf "%s\n" "#define HAVE_FUSE_IOMAP 1" >>confdefs.h
+
 fi
 
 have_fuse_lowlevel=
diff --git a/configure.ac b/configure.ac
index 2eb11873ea0e50..a1057c07b8c056 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1382,6 +1382,7 @@ dnl
 dnl Set FUSE_USE_VERSION, which is how fuse servers build against a particular
 dnl libfuse ABI.  Currently we link against the libfuse 3.14 ABI (hence 314)
 dnl
+have_fuse_iomap=
 if test -n "$FUSE_LIB"
 then
 	FUSE_USE_VERSION=314
@@ -1391,12 +1392,42 @@ then
 		[AC_MSG_FAILURE([Cannot build against fuse3 headers])],
 [#define _FILE_OFFSET_BITS	64
 #define FUSE_USE_VERSION	314])
+
+	dnl
+	dnl Check if the fuse library supports iomap, which requires a higher
+	dnl FUSE_USE_VERSION ABI version (3.99)
+	dnl
+	AC_MSG_CHECKING(for iomap_begin in libfuse)
+	AC_LINK_IFELSE(
+	[	AC_LANG_PROGRAM([[
+	#define _GNU_SOURCE
+	#define _FILE_OFFSET_BITS	64
+	#define FUSE_USE_VERSION	399
+	#include <fuse.h>
+		]], [[
+	struct fuse_operations fs_ops = {
+		.iomap_begin = NULL,
+		.iomap_end = NULL,
+	};
+	struct fuse_file_iomap narf = { };
+		]])
+	], have_fuse_iomap=yes
+	   AC_MSG_RESULT(yes),
+	   AC_MSG_RESULT(no))
+	if test "$have_fuse_iomap" = yes
+	then
+		FUSE_USE_VERSION=399
+	fi
 fi
 if test -n "$FUSE_USE_VERSION"
 then
 	AC_DEFINE_UNQUOTED(FUSE_USE_VERSION, $FUSE_USE_VERSION,
 		[Define to the version of FUSE to use])
 fi
+if test -n "$have_fuse_iomap"
+then
+	AC_DEFINE(HAVE_FUSE_IOMAP, 1, [Define to 1 if fuse supports iomap])
+fi
 
 dnl
 dnl Check if the FUSE lowlevel library is supported
diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 9a6913f6eef16a..bf9c2081702132 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -143,6 +143,9 @@ static inline uint64_t round_down(uint64_t b, unsigned int align)
 	return b - m;
 }
 
+#define max(a, b)	((a) > (b) ? (a) : (b))
+#define min(a, b)	((a) < (b) ? (a) : (b))
+
 #define dbg_printf(fuse4fs, format, ...) \
 	while ((fuse4fs)->debug) { \
 		printf("FUSE4FS (%s): tid=%d " format, (fuse4fs)->shortdev, gettid(), ##__VA_ARGS__); \
@@ -220,6 +223,14 @@ enum fuse4fs_opstate {
 	F4OP_SHUTDOWN,
 };
 
+#ifdef HAVE_FUSE_IOMAP
+enum fuse4fs_iomap_state {
+	IOMAP_DISABLED,
+	IOMAP_UNKNOWN,
+	IOMAP_ENABLED,
+};
+#endif
+
 /* Main program context */
 #define FUSE4FS_MAGIC		(0xEF53DEADUL)
 struct fuse4fs {
@@ -248,6 +259,9 @@ struct fuse4fs {
 	enum fuse4fs_opstate opstate;
 	int logfd;
 	int blocklog;
+#ifdef HAVE_FUSE_IOMAP
+	enum fuse4fs_iomap_state iomap_state;
+#endif
 	unsigned int blockmask;
 	unsigned long offset;
 	unsigned int next_generation;
@@ -700,6 +714,15 @@ static inline void __fuse4fs_finish(struct fuse4fs *ff, int ret,
 }
 #define fuse4fs_finish(ff, ret) __fuse4fs_finish((ff), (ret), __func__)
 
+#ifdef HAVE_FUSE_IOMAP
+static inline int fuse4fs_iomap_enabled(const struct fuse4fs *ff)
+{
+	return ff->iomap_state >= IOMAP_ENABLED;
+}
+#else
+# define fuse4fs_iomap_enabled(...)	(0)
+#endif
+
 static void get_now(struct timespec *now)
 {
 #ifdef CLOCK_REALTIME
@@ -1122,7 +1145,7 @@ static errcode_t fuse4fs_open(struct fuse4fs *ff, int libext2_flags)
 {
 	char options[128];
 	int flags = EXT2_FLAG_64BITS | EXT2_FLAG_THREADS | EXT2_FLAG_RW |
-		    libext2_flags;
+		    EXT2_FLAG_WRITE_FULL_SUPER | libext2_flags;
 	errcode_t err;
 
 	if (ff->lockfile) {
@@ -1494,6 +1517,33 @@ static inline int fuse_set_feature_flag(struct fuse_conn_info *conn,
 }
 #endif
 
+#ifdef HAVE_FUSE_IOMAP
+static void fuse4fs_iomap_enable(struct fuse_conn_info *conn,
+				 struct fuse4fs *ff)
+{
+	/* Don't let anyone touch iomap until the end of the patchset. */
+	ff->iomap_state = IOMAP_DISABLED;
+	return;
+
+	/* iomap only works with block devices */
+	if (ff->iomap_state != IOMAP_DISABLED && fuse4fs_on_bdev(ff) &&
+	    fuse_set_feature_flag(conn, FUSE_CAP_IOMAP)) {
+		/*
+		 * If we're mounting in iomap mode, we need to unmount in
+		 * op_destroy so that the block device will be released before
+		 * umount(2) returns.
+		 */
+		ff->unmount_in_destroy = 1;
+		ff->iomap_state = IOMAP_ENABLED;
+	}
+
+	if (ff->iomap_state == IOMAP_UNKNOWN)
+		ff->iomap_state = IOMAP_DISABLED;
+}
+#else
+# define fuse4fs_iomap_enable(...)	((void)0)
+#endif
+
 static void op_init(void *userdata, struct fuse_conn_info *conn)
 {
 	struct fuse4fs *ff = userdata;
@@ -1516,6 +1566,7 @@ static void op_init(void *userdata, struct fuse_conn_info *conn)
 #ifdef FUSE_CAP_NO_EXPORT_SUPPORT
 	fuse_set_feature_flag(conn, FUSE_CAP_NO_EXPORT_SUPPORT);
 #endif
+	fuse4fs_iomap_enable(conn, ff);
 	conn->time_gran = 1;
 
 	if (ff->kernel) {
@@ -5402,6 +5453,460 @@ static void op_fallocate(fuse_req_t req, fuse_ino_t fino EXT2FS_ATTR((unused)),
 }
 #endif /* SUPPORT_FALLOCATE */
 
+#ifdef HAVE_FUSE_IOMAP
+static void fuse4fs_iomap_hole(struct fuse4fs *ff, struct fuse_file_iomap *iomap,
+			       off_t pos, uint64_t count)
+{
+	iomap->dev = FUSE_IOMAP_DEV_NULL;
+	iomap->addr = FUSE_IOMAP_NULL_ADDR;
+	iomap->offset = pos;
+	iomap->length = count;
+	iomap->type = FUSE_IOMAP_TYPE_HOLE;
+}
+
+static void fuse4fs_iomap_hole_to_eof(struct fuse4fs *ff,
+				      struct fuse_file_iomap *iomap, off_t pos,
+				      off_t count,
+				      const struct ext2_inode_large *inode)
+{
+	ext2_filsys fs = ff->fs;
+	uint64_t isize = EXT2_I_SIZE(inode);
+
+	/*
+	 * We have to be careful about handling a hole to the right of the
+	 * entire mapping tree.  First, the mapping must start and end on a
+	 * block boundary because they must be aligned to at least an LBA for
+	 * the block layer; and to the fsblock for smoother operation.
+	 *
+	 * As for the length -- we could return a mapping all the way to
+	 * i_size, but i_size could be less than pos/count if we're zeroing the
+	 * EOF block in anticipation of a truncate operation.  Similarly, we
+	 * don't want to end the mapping at pos+count because we know there's
+	 * nothing mapped beyond here.
+	 */
+	uint64_t startoff = round_down(pos, fs->blocksize);
+	uint64_t eofoff = round_up(max(pos + count, isize), fs->blocksize);
+
+	dbg_printf(ff,
+ "pos=0x%llx count=0x%llx isize=0x%llx startoff=0x%llx eofoff=0x%llx\n",
+		   (unsigned long long)pos,
+		   (unsigned long long)count,
+		   (unsigned long long)isize,
+		   (unsigned long long)startoff,
+		   (unsigned long long)eofoff);
+
+	fuse4fs_iomap_hole(ff, iomap, startoff, eofoff - startoff);
+}
+
+#define DEBUG_IOMAP
+#ifdef DEBUG_IOMAP
+# define __DUMP_EXTENT(ff, func, tag, startoff, err, extent) \
+	do { \
+		dbg_printf((ff), \
+ "%s: %s startoff 0x%llx err %ld lblk 0x%llx pblk 0x%llx len 0x%x flags 0x%x\n", \
+			   (func), (tag), (startoff), (err), (extent)->e_lblk, \
+			   (extent)->e_pblk, (extent)->e_len, \
+			   (extent)->e_flags & EXT2_EXTENT_FLAGS_UNINIT); \
+	} while(0)
+# define DUMP_EXTENT(ff, tag, startoff, err, extent) \
+	__DUMP_EXTENT((ff), __func__, (tag), (startoff), (err), (extent))
+
+# define __DUMP_INFO(ff, func, tag, startoff, err, info) \
+	do { \
+		dbg_printf((ff), \
+ "%s: %s startoff 0x%llx err %ld entry %d/%d/%d level  %d/%d\n", \
+			   (func), (tag), (startoff), (err), \
+			   (info)->curr_entry, (info)->num_entries, \
+			   (info)->max_entries, (info)->curr_level, \
+			   (info)->max_depth); \
+	} while(0)
+# define DUMP_INFO(ff, tag, startoff, err, info) \
+	__DUMP_INFO((ff), __func__, (tag), (startoff), (err), (info))
+#else
+# define __DUMP_EXTENT(...)	((void)0)
+# define DUMP_EXTENT(...)	((void)0)
+# define DUMP_INFO(...)		((void)0)
+#endif
+
+static inline errcode_t __fuse4fs_get_mapping_at(struct fuse4fs *ff,
+						 ext2_extent_handle_t handle,
+						 blk64_t startoff,
+						 struct ext2fs_extent *bmap,
+						 const char *func)
+{
+	errcode_t err;
+
+	/*
+	 * Find the file mapping at startoff.  We don't check the return value
+	 * of _goto because _get will error out if _goto failed.  There's a
+	 * subtlety to the outcome of _goto when startoff falls in a sparse
+	 * hole however:
+	 *
+	 * Most of the time, _goto points the cursor at the mapping whose lblk
+	 * is just to the left of startoff.  The mapping may or may not overlap
+	 * startoff; this is ok.  In other words, the tree lookup behaves as if
+	 * we asked it to use a less than or equals comparison.
+	 *
+	 * However, if startoff is to the left of the first mapping in the
+	 * extent tree, _goto points the cursor at that first mapping because
+	 * it doesn't know how to deal with this situation.  In this case,
+	 * the tree lookup behaves as if we asked it to use a greater than
+	 * or equals comparison.
+	 *
+	 * Note: If _get() returns 'no current node', that means that there
+	 * aren't any mappings at all.
+	 */
+	ext2fs_extent_goto(handle, startoff);
+	err = ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT, bmap);
+	__DUMP_EXTENT(ff, func, "lookup", startoff, err, bmap);
+	if (err == EXT2_ET_NO_CURRENT_NODE)
+		err = EXT2_ET_EXTENT_NOT_FOUND;
+	return err;
+}
+
+static inline errcode_t __fuse4fs_get_next_mapping(struct fuse4fs *ff,
+						   ext2_extent_handle_t handle,
+						   blk64_t startoff,
+						   struct ext2fs_extent *bmap,
+						   const char *func)
+{
+	struct ext2fs_extent newex;
+	struct ext2_extent_info info;
+	errcode_t err;
+
+	/*
+	 * The extent tree code has this (probably broken) behavior that if
+	 * more than two of the highest levels of the cursor point at the
+	 * rightmost edge of an extent tree block, a _NEXT_LEAF movement fails
+	 * to move the cursor position of any of the lower levels.  IOWs, if
+	 * leaf level N is at the right edge, it will only advance level N-1
+	 * to the right.  If N-1 was at the right edge, the cursor resets to
+	 * record 0 of that level and goes down to the wrong leaf.
+	 *
+	 * Work around this by walking up (towards root level 0) the extent
+	 * tree until we find a level where we're not already at the rightmost
+	 * edge.  The _NEXT_LEAF movement will walk down the tree to find the
+	 * leaves.
+	 */
+	err = ext2fs_extent_get_info(handle, &info);
+	DUMP_INFO(ff, "UP?", startoff, err, &info);
+	if (err)
+		return err;
+
+	while (info.curr_entry == info.num_entries && info.curr_level > 0) {
+		err = ext2fs_extent_get(handle, EXT2_EXTENT_UP, &newex);
+		DUMP_EXTENT(ff, "UP", startoff, err, &newex);
+		if (err)
+			return err;
+		err = ext2fs_extent_get_info(handle, &info);
+		DUMP_INFO(ff, "UP", startoff, err, &info);
+		if (err)
+			return err;
+	}
+
+	/*
+	 * If we're at the root and there are no more entries, there's nothing
+	 * else to be found.
+	 */
+	if (info.curr_level == 0 && info.curr_entry == info.num_entries)
+		return EXT2_ET_EXTENT_NOT_FOUND;
+
+	/* Otherwise grab this next leaf and return it. */
+	err = ext2fs_extent_get(handle, EXT2_EXTENT_NEXT_LEAF, &newex);
+	DUMP_EXTENT(ff, "NEXT", startoff, err, &newex);
+	if (err)
+		return err;
+
+	*bmap = newex;
+	return 0;
+}
+
+#define fuse4fs_get_mapping_at(ff, handle, startoff, bmap) \
+	__fuse4fs_get_mapping_at((ff), (handle), (startoff), (bmap), __func__)
+#define fuse4fs_get_next_mapping(ff, handle, startoff, bmap) \
+	__fuse4fs_get_next_mapping((ff), (handle), (startoff), (bmap), __func__)
+
+static errcode_t fuse4fs_iomap_begin_extent(struct fuse4fs *ff, uint64_t ino,
+					    struct ext2_inode_large *inode,
+					    off_t pos, uint64_t count,
+					    uint32_t opflags,
+					    struct fuse_file_iomap *iomap)
+{
+	ext2_extent_handle_t handle;
+	struct ext2fs_extent extent = { };
+	ext2_filsys fs = ff->fs;
+	const blk64_t startoff = FUSE4FS_B_TO_FSBT(ff, pos);
+	errcode_t err;
+	int ret = 0;
+
+	err = ext2fs_extent_open2(fs, ino, EXT2_INODE(inode), &handle);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	err = fuse4fs_get_mapping_at(ff, handle, startoff, &extent);
+	if (err == EXT2_ET_EXTENT_NOT_FOUND) {
+		/* No mappings at all; the whole range is a hole. */
+		fuse4fs_iomap_hole_to_eof(ff, iomap, pos, count, inode);
+		goto out_handle;
+	}
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out_handle;
+	}
+
+	if (startoff < extent.e_lblk) {
+		/*
+		 * Mapping starts to the right of the current position.
+		 * Synthesize a hole going to that next extent.
+		 */
+		fuse4fs_iomap_hole(ff, iomap, FUSE4FS_FSB_TO_B(ff, startoff),
+				FUSE4FS_FSB_TO_B(ff, extent.e_lblk - startoff));
+		goto out_handle;
+	}
+
+	if (startoff >= extent.e_lblk + extent.e_len) {
+		/*
+		 * Mapping ends to the left of the current position.  Try to
+		 * find the next mapping.  If there is no next mapping, the
+		 * whole range is in a hole.
+		 */
+		err = fuse4fs_get_next_mapping(ff, handle, startoff, &extent);
+		if (err == EXT2_ET_EXTENT_NOT_FOUND) {
+			fuse4fs_iomap_hole_to_eof(ff, iomap, pos, count, inode);
+			goto out_handle;
+		}
+
+		/*
+		 * If the new mapping starts to the right of startoff, there's
+		 * a hole from startoff to the start of the new mapping.
+		 */
+		if (startoff < extent.e_lblk) {
+			fuse4fs_iomap_hole(ff, iomap,
+				FUSE4FS_FSB_TO_B(ff, startoff),
+				FUSE4FS_FSB_TO_B(ff, extent.e_lblk - startoff));
+			goto out_handle;
+		}
+
+		/*
+		 * The new mapping starts at startoff.  Something weird
+		 * happened in the extent tree lookup, but we found a valid
+		 * mapping so we'll run with it.
+		 */
+	}
+
+	/* Mapping overlaps startoff, report this. */
+	iomap->dev = FUSE_IOMAP_DEV_NULL;
+	iomap->addr = FUSE4FS_FSB_TO_B(ff, extent.e_pblk);
+	iomap->offset = FUSE4FS_FSB_TO_B(ff, extent.e_lblk);
+	iomap->length = FUSE4FS_FSB_TO_B(ff, extent.e_len);
+	if (extent.e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+		iomap->type = FUSE_IOMAP_TYPE_UNWRITTEN;
+	else
+		iomap->type = FUSE_IOMAP_TYPE_MAPPED;
+
+out_handle:
+	ext2fs_extent_free(handle);
+	return ret;
+}
+
+static int fuse4fs_iomap_begin_indirect(struct fuse4fs *ff, uint64_t ino,
+					struct ext2_inode_large *inode,
+					off_t pos, uint64_t count,
+					uint32_t opflags,
+					struct fuse_file_iomap *iomap)
+{
+	ext2_filsys fs = ff->fs;
+	blk64_t startoff = FUSE4FS_B_TO_FSBT(ff, pos);
+	uint64_t isize = EXT2_I_SIZE(inode);
+	uint64_t real_count = min(count, 131072);
+	const blk64_t endoff = FUSE4FS_B_TO_FSB(ff, pos + real_count);
+	blk64_t startblock;
+	errcode_t err;
+
+	err = ext2fs_bmap2(fs, ino, EXT2_INODE(inode), NULL, 0, startoff, NULL,
+			   &startblock);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	iomap->dev = FUSE_IOMAP_DEV_NULL;
+	iomap->offset = FUSE4FS_FSB_TO_B(ff, startoff);
+	iomap->flags |= FUSE_IOMAP_F_MERGED;
+	if (startblock) {
+		iomap->addr = FUSE4FS_FSB_TO_B(ff, startblock);
+		iomap->type = FUSE_IOMAP_TYPE_MAPPED;
+	} else {
+		iomap->addr = FUSE_IOMAP_NULL_ADDR;
+		iomap->type = FUSE_IOMAP_TYPE_HOLE;
+	}
+	iomap->length = fs->blocksize;
+
+	/* See how long the mapping goes for. */
+	for (startoff++; startoff < endoff; startoff++) {
+		blk64_t prev_startblock = startblock;
+
+		err = ext2fs_bmap2(fs, ino, EXT2_INODE(inode), NULL, 0,
+				   startoff, NULL, &startblock);
+		if (err)
+			break;
+
+		if (iomap->type == FUSE_IOMAP_TYPE_MAPPED) {
+			if (startblock == prev_startblock + 1)
+				iomap->length += fs->blocksize;
+			else
+				break;
+		} else {
+			if (startblock == 0)
+				iomap->length += fs->blocksize;
+			else
+				break;
+		}
+	}
+
+	/*
+	 * If this is a hole that goes beyond EOF, report this as a hole to the
+	 * end of the range queried so that FIEMAP doesn't go mad.
+	 */
+	if (iomap->type == FUSE_IOMAP_TYPE_HOLE &&
+	    iomap->offset + iomap->length >= isize)
+		fuse4fs_iomap_hole_to_eof(ff, iomap, pos, count, inode);
+
+	return 0;
+}
+
+static int fuse4fs_iomap_begin_inline(struct fuse4fs *ff, ext2_ino_t ino,
+				      struct ext2_inode_large *inode, off_t pos,
+				      uint64_t count, struct fuse_file_iomap *iomap)
+{
+	uint64_t one_fsb = FUSE4FS_FSB_TO_B(ff, 1);
+
+	if (pos >= one_fsb) {
+		fuse4fs_iomap_hole_to_eof(ff, iomap, pos, count, inode);
+	} else {
+		/* ext4 only supports inline data files up to 1 fsb */
+		iomap->dev = FUSE_IOMAP_DEV_NULL;
+		iomap->addr = FUSE_IOMAP_NULL_ADDR;
+		iomap->offset = 0;
+		iomap->length = one_fsb;
+		iomap->type = FUSE_IOMAP_TYPE_INLINE;
+	}
+
+	return 0;
+}
+
+static int fuse4fs_iomap_begin_report(struct fuse4fs *ff, ext2_ino_t ino,
+				      struct ext2_inode_large *inode,
+				      off_t pos, uint64_t count,
+				      uint32_t opflags,
+				      struct fuse_file_iomap *read)
+{
+	if (inode->i_flags & EXT4_INLINE_DATA_FL)
+		return fuse4fs_iomap_begin_inline(ff, ino, inode, pos, count,
+						  read);
+
+	if (inode->i_flags & EXT4_EXTENTS_FL)
+		return fuse4fs_iomap_begin_extent(ff, ino, inode, pos, count,
+						  opflags, read);
+
+	return fuse4fs_iomap_begin_indirect(ff, ino, inode, pos, count,
+					    opflags, read);
+}
+
+static int fuse4fs_iomap_begin_read(struct fuse4fs *ff, ext2_ino_t ino,
+				    struct ext2_inode_large *inode, off_t pos,
+				    uint64_t count, uint32_t opflags,
+				    struct fuse_file_iomap *read)
+{
+	return -ENOSYS;
+}
+
+static int fuse4fs_iomap_begin_write(struct fuse4fs *ff, ext2_ino_t ino,
+				     struct ext2_inode_large *inode, off_t pos,
+				     uint64_t count, uint32_t opflags,
+				     struct fuse_file_iomap *read)
+{
+	return -ENOSYS;
+}
+
+static void op_iomap_begin(fuse_req_t req, fuse_ino_t fino, uint64_t dontcare,
+			   off_t pos, uint64_t count, uint32_t opflags)
+{
+	struct fuse4fs *ff = fuse4fs_get(req);
+	struct ext2_inode_large inode;
+	struct fuse_file_iomap read = { };
+	ext2_filsys fs;
+	ext2_ino_t ino;
+	errcode_t err;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &ino, fino);
+
+	dbg_printf(ff, "%s: ino=%d pos=0x%llx count=0x%llx opflags=0x%x\n",
+		   __func__, ino,
+		   (unsigned long long)pos,
+		   (unsigned long long)count,
+		   opflags);
+
+	fs = fuse4fs_start(ff);
+	err = fuse4fs_read_inode(fs, ino, &inode);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out_unlock;
+	}
+
+	if (opflags & FUSE_IOMAP_OP_REPORT)
+		ret = fuse4fs_iomap_begin_report(ff, ino, &inode, pos, count,
+						 opflags, &read);
+	else if (fuse_iomap_is_write(opflags))
+		ret = fuse4fs_iomap_begin_write(ff, ino, &inode, pos, count,
+						opflags, &read);
+	else
+		ret = fuse4fs_iomap_begin_read(ff, ino, &inode, pos, count,
+					       opflags, &read);
+	if (ret)
+		goto out_unlock;
+
+	dbg_printf(ff,
+ "%s: ino=%d pos=0x%llx -> addr=0x%llx offset=0x%llx length=0x%llx type=%u flags=0x%x\n",
+		   __func__, ino,
+		   (unsigned long long)pos,
+		   (unsigned long long)read.addr,
+		   (unsigned long long)read.offset,
+		   (unsigned long long)read.length,
+		   read.type,
+		   read.flags);
+
+out_unlock:
+	fuse4fs_finish(ff, ret);
+	if (ret)
+		fuse_reply_err(req, -ret);
+	else
+		fuse_reply_iomap_begin(req, &read, NULL);
+}
+
+static void op_iomap_end(fuse_req_t req, fuse_ino_t fino, uint64_t dontcare,
+			 off_t pos, uint64_t count, uint32_t opflags,
+			 ssize_t written, const struct fuse_file_iomap *iomap)
+{
+	struct fuse4fs *ff = fuse4fs_get(req);
+	ext2_ino_t ino;
+
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &ino, fino);
+
+	dbg_printf(ff,
+ "%s: ino=%d pos=0x%llx count=0x%llx opflags=0x%x written=0x%zx mapflags=0x%x\n",
+		   __func__, ino,
+		   (unsigned long long)pos,
+		   (unsigned long long)count,
+		   opflags,
+		   written,
+		   iomap->flags);
+
+	fuse_reply_err(req, 0);
+}
+#endif /* HAVE_FUSE_IOMAP */
+
 static struct fuse_lowlevel_ops fs_ops = {
 	.lookup = op_lookup,
 	.setattr = op_setattr,
@@ -5445,6 +5950,10 @@ static struct fuse_lowlevel_ops fs_ops = {
 #ifdef SUPPORT_FALLOCATE
 	.fallocate = op_fallocate,
 #endif
+#ifdef HAVE_FUSE_IOMAP
+	.iomap_begin = op_iomap_begin,
+	.iomap_end = op_iomap_end,
+#endif /* HAVE_FUSE_IOMAP */
 };
 
 static int get_random_bytes(void *p, size_t sz)
@@ -5768,17 +6277,19 @@ static int fuse4fs_main(struct fuse_args *args, struct fuse4fs *ff)
 int main(int argc, char *argv[])
 {
 	struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
-	struct fuse4fs fctx;
+	struct fuse4fs fctx = {
+		.magic = FUSE4FS_MAGIC,
+		.opstate = F4OP_WRITABLE,
+		.logfd = -1,
+#ifdef HAVE_FUSE_IOMAP
+		.iomap_state = IOMAP_UNKNOWN,
+#endif
+	};
 	errcode_t err;
 	FILE *orig_stderr = stderr;
 	char extra_args[BUFSIZ];
 	int ret;
 
-	memset(&fctx, 0, sizeof(fctx));
-	fctx.magic = FUSE4FS_MAGIC;
-	fctx.logfd = -1;
-	fctx.opstate = F4OP_WRITABLE;
-
 	ret = fuse_opt_parse(&args, &fctx, fuse4fs_opts, fuse4fs_opt_proc);
 	if (ret)
 		exit(1);
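The hunk above swaps memset() plus field assignments for a designated initializer. This is safe because C99 (section 6.7.8) zero-initializes every member the initializer does not name: arithmetic members become 0 and pointers become NULL. It also keeps every non-zero default in one place, which matters here because IOMAP_UNKNOWN is not the first enumerator of the new iomap state enum. A minimal sketch, using a hypothetical `demo_ctx` that is only loosely modeled on struct fuse4fs:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAGIC	(0xEF53DEADUL)

enum demo_opstate { DEMO_OP_READONLY, DEMO_OP_WRITABLE };

/* Hypothetical context; not the real fuse4fs layout. */
struct demo_ctx {
	unsigned long magic;
	enum demo_opstate opstate;
	int logfd;
	const char *device;	/* not named below, so it starts out NULL */
	int debug;		/* not named below, so it starts out 0 */
};

static struct demo_ctx demo_ctx_init(void)
{
	/*
	 * Members missing from a designated initializer are initialized as
	 * if they had static storage duration: 0 for arithmetic types, NULL
	 * for pointers.  No separate memset() is needed.
	 */
	struct demo_ctx ctx = {
		.magic = DEMO_MAGIC,
		.opstate = DEMO_OP_WRITABLE,
		.logfd = -1,
	};

	return ctx;
}

/* The defaults that the old memset() + assignment sequence produced. */
static void demo_ctx_check_defaults(const struct demo_ctx *ctx)
{
	assert(ctx->magic == DEMO_MAGIC);
	assert(ctx->opstate == DEMO_OP_WRITABLE);
	assert(ctx->logfd == -1);
	assert(ctx->device == NULL);
	assert(ctx->debug == 0);
}
```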
diff --git a/lib/config.h.in b/lib/config.h.in
index c3379758c3c9bc..55e515020af422 100644
--- a/lib/config.h.in
+++ b/lib/config.h.in
@@ -76,6 +76,9 @@
 /* Define to 1 if fuse supports lowlevel API */
 #undef HAVE_FUSE_LOWLEVEL
 
+/* Define to 1 if fuse supports iomap */
+#undef HAVE_FUSE_IOMAP
+
 /* Define to 1 if you have the Mac OS X function
    CFLocaleCopyPreferredLanguages in the CoreFoundation framework. */
 #undef HAVE_CFLOCALECOPYPREFERREDLANGUAGES
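configure sets HAVE_FUSE_IOMAP in lib/config.h. When it is undefined, the patch replaces helpers such as fuse2fs_iomap_enabled() with variadic macros that expand to a constant, so call sites need no #ifdef of their own. A small sketch of that pattern. The `demo_*` names and the enum are hypothetical stand-ins, and the flag is left undefined here to exercise the stub:

```c
#include <assert.h>

/* Uncomment to mimic configure detecting iomap support in libfuse. */
/* #define HAVE_FUSE_IOMAP 1 */

enum demo_iomap_state {
	DEMO_IOMAP_DISABLED,
	DEMO_IOMAP_UNKNOWN,
	DEMO_IOMAP_ENABLED,
};

struct demo_fs {
	int debug;
#ifdef HAVE_FUSE_IOMAP
	enum demo_iomap_state iomap_state;
#endif
};

#ifdef HAVE_FUSE_IOMAP
static inline int demo_iomap_enabled(const struct demo_fs *fs)
{
	return fs->iomap_state >= DEMO_IOMAP_ENABLED;
}
#else
/* Variadic so it swallows any argument list; always false. */
# define demo_iomap_enabled(...)	(0)
#endif

/* Call sites compile unchanged whether or not iomap support exists. */
static const char *demo_io_path(const struct demo_fs *fs)
{
	(void)fs;	/* unused when the stub macro is in effect */
	return demo_iomap_enabled(fs) ? "iomap" : "classic";
}
```

The stub never evaluates its argument, so the struct member it would read can be compiled out entirely.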
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 6290d22f2b9658..ca61fbc89f5fda 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -137,6 +137,9 @@ static inline uint64_t round_down(uint64_t b, unsigned int align)
 	return b - m;
 }
 
+#define max(a, b)	((a) > (b) ? (a) : (b))
+#define min(a, b)	((a) < (b) ? (a) : (b))
+
 #define dbg_printf(fuse2fs, format, ...) \
 	while ((fuse2fs)->debug) { \
 		printf("FUSE2FS (%s): tid=%d " format, (fuse2fs)->shortdev, gettid(), ##__VA_ARGS__); \
@@ -213,6 +216,14 @@ enum fuse2fs_opstate {
 	F2OP_SHUTDOWN,
 };
 
+#ifdef HAVE_FUSE_IOMAP
+enum fuse2fs_iomap_state {
+	IOMAP_DISABLED,
+	IOMAP_UNKNOWN,
+	IOMAP_ENABLED,
+};
+#endif
+
 /* Main program context */
 #define FUSE2FS_MAGIC		(0xEF53DEADUL)
 struct fuse2fs {
@@ -241,6 +252,9 @@ struct fuse2fs {
 	enum fuse2fs_opstate opstate;
 	int logfd;
 	int blocklog;
+#ifdef HAVE_FUSE_IOMAP
+	enum fuse2fs_iomap_state iomap_state;
+#endif
 	unsigned int blockmask;
 	unsigned long offset;
 	unsigned int next_generation;
@@ -536,6 +550,15 @@ static inline void __fuse2fs_finish(struct fuse2fs *ff, int ret,
 }
 #define fuse2fs_finish(ff, ret) __fuse2fs_finish((ff), (ret), __func__)
 
+#ifdef HAVE_FUSE_IOMAP
+static inline int fuse2fs_iomap_enabled(const struct fuse2fs *ff)
+{
+	return ff->iomap_state >= IOMAP_ENABLED;
+}
+#else
+# define fuse2fs_iomap_enabled(...)	(0)
+#endif
+
 static void get_now(struct timespec *now)
 {
 #ifdef CLOCK_REALTIME
@@ -932,7 +955,7 @@ static errcode_t fuse2fs_open(struct fuse2fs *ff, int libext2_flags)
 {
 	char options[128];
 	int flags = EXT2_FLAG_64BITS | EXT2_FLAG_THREADS | EXT2_FLAG_RW |
-		    libext2_flags;
+		    EXT2_FLAG_WRITE_FULL_SUPER | libext2_flags;
 	errcode_t err;
 
 	if (ff->lockfile) {
@@ -1300,6 +1323,33 @@ static inline int fuse_set_feature_flag(struct fuse_conn_info *conn,
 }
 #endif
 
+#ifdef HAVE_FUSE_IOMAP
+static void fuse2fs_iomap_enable(struct fuse_conn_info *conn,
+				 struct fuse2fs *ff)
+{
+	/* Don't let anyone touch iomap until the end of the patchset. */
+	ff->iomap_state = IOMAP_DISABLED;
+	return;
+
+	/* iomap only works with block devices */
+	if (ff->iomap_state != IOMAP_DISABLED && fuse2fs_on_bdev(ff) &&
+	    fuse_set_feature_flag(conn, FUSE_CAP_IOMAP)) {
+		/*
+		 * If we're mounting in iomap mode, we need to unmount in
+		 * op_destroy so that the block device will be released before
+		 * umount(2) returns.
+		 */
+		ff->unmount_in_destroy = 1;
+		ff->iomap_state = IOMAP_ENABLED;
+	}
+
+	if (ff->iomap_state == IOMAP_UNKNOWN)
+		ff->iomap_state = IOMAP_DISABLED;
+}
+#else
+# define fuse2fs_iomap_enable(...)	((void)0)
+#endif
+
 static void *op_init(struct fuse_conn_info *conn,
 		     struct fuse_config *cfg EXT2FS_ATTR((unused)))
 {
@@ -1333,6 +1383,8 @@ static void *op_init(struct fuse_conn_info *conn,
 #ifdef FUSE_CAP_NO_EXPORT_SUPPORT
 	fuse_set_feature_flag(conn, FUSE_CAP_NO_EXPORT_SUPPORT);
 #endif
+	fuse2fs_iomap_enable(conn, ff);
+
 	conn->time_gran = 1;
 	cfg->use_ino = 1;
 	if (ff->debug)
@@ -4842,6 +4894,459 @@ static int op_fallocate(const char *path EXT2FS_ATTR((unused)), int mode,
 }
 #endif /* SUPPORT_FALLOCATE */
 
+#ifdef HAVE_FUSE_IOMAP
+static void fuse2fs_iomap_hole(struct fuse2fs *ff, struct fuse_file_iomap *iomap,
+			       off_t pos, uint64_t count)
+{
+	iomap->dev = FUSE_IOMAP_DEV_NULL;
+	iomap->addr = FUSE_IOMAP_NULL_ADDR;
+	iomap->offset = pos;
+	iomap->length = count;
+	iomap->type = FUSE_IOMAP_TYPE_HOLE;
+}
+
+static void fuse2fs_iomap_hole_to_eof(struct fuse2fs *ff,
+				      struct fuse_file_iomap *iomap, off_t pos,
+				      off_t count,
+				      const struct ext2_inode_large *inode)
+{
+	ext2_filsys fs = ff->fs;
+	uint64_t isize = EXT2_I_SIZE(inode);
+
+	/*
+	 * We have to be careful about handling a hole to the right of the
+	 * entire mapping tree.  First, the mapping must start and end on a
+	 * block boundary: it must be aligned to at least an LBA for the block
+	 * layer, and to the fsblock for smoother operation.
+	 *
+	 * As for the length -- we could return a mapping all the way to
+	 * i_size, but i_size could be less than pos/count if we're zeroing the
+	 * EOF block in anticipation of a truncate operation.  Similarly, we
+	 * don't want to end the mapping at pos+count because we know there's
+	 * nothing mapped beyond here.
+	 */
+	uint64_t startoff = round_down(pos, fs->blocksize);
+	uint64_t eofoff = round_up(max(pos + count, isize), fs->blocksize);
+
+	dbg_printf(ff,
+ "pos=0x%llx count=0x%llx isize=0x%llx startoff=0x%llx eofoff=0x%llx\n",
+		   (unsigned long long)pos,
+		   (unsigned long long)count,
+		   (unsigned long long)isize,
+		   (unsigned long long)startoff,
+		   (unsigned long long)eofoff);
+
+	fuse2fs_iomap_hole(ff, iomap, startoff, eofoff - startoff);
+}
+
+#define DEBUG_IOMAP
+#ifdef DEBUG_IOMAP
+# define __DUMP_EXTENT(ff, func, tag, startoff, err, extent) \
+	do { \
+		dbg_printf((ff), \
+ "%s: %s startoff 0x%llx err %ld lblk 0x%llx pblk 0x%llx len 0x%x flags 0x%x\n", \
+			   (func), (tag), (startoff), (err), (extent)->e_lblk, \
+			   (extent)->e_pblk, (extent)->e_len, \
+			   (extent)->e_flags & EXT2_EXTENT_FLAGS_UNINIT); \
+	} while(0)
+# define DUMP_EXTENT(ff, tag, startoff, err, extent) \
+	__DUMP_EXTENT((ff), __func__, (tag), (startoff), (err), (extent))
+
+# define __DUMP_INFO(ff, func, tag, startoff, err, info) \
+	do { \
+		dbg_printf((ff), \
+ "%s: %s startoff 0x%llx err %ld entry %d/%d/%d level  %d/%d\n", \
+			   (func), (tag), (startoff), (err), \
+			   (info)->curr_entry, (info)->num_entries, \
+			   (info)->max_entries, (info)->curr_level, \
+			   (info)->max_depth); \
+	} while(0)
+# define DUMP_INFO(ff, tag, startoff, err, info) \
+	__DUMP_INFO((ff), __func__, (tag), (startoff), (err), (info))
+#else
+# define __DUMP_EXTENT(...)	((void)0)
+# define DUMP_EXTENT(...)	((void)0)
+# define DUMP_INFO(...)		((void)0)
+#endif
+
+static inline errcode_t __fuse2fs_get_mapping_at(struct fuse2fs *ff,
+						 ext2_extent_handle_t handle,
+						 blk64_t startoff,
+						 struct ext2fs_extent *bmap,
+						 const char *func)
+{
+	errcode_t err;
+
+	/*
+	 * Find the file mapping at startoff.  We don't check the return value
+	 * of _goto because _get will error out if _goto failed.  There's a
+	 * subtlety to the outcome of _goto when startoff falls in a sparse
+	 * hole however:
+	 *
+	 * Most of the time, _goto points the cursor at the mapping whose lblk
+	 * is just to the left of startoff.  The mapping may or may not overlap
+	 * startoff; this is ok.  In other words, the tree lookup behaves as if
+	 * we asked it to use a less than or equals comparison.
+	 *
+	 * However, if startoff is to the left of the first mapping in the
+	 * extent tree, _goto points the cursor at that first mapping because
+	 * it doesn't know how to deal with this situation.  In this case,
+	 * the tree lookup behaves as if we asked it to use a greater than
+	 * or equals comparison.
+	 *
+	 * Note: If _get() returns 'no current node', that means that there
+	 * aren't any mappings at all.
+	 */
+	ext2fs_extent_goto(handle, startoff);
+	err = ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT, bmap);
+	__DUMP_EXTENT(ff, func, "lookup", startoff, err, bmap);
+	if (err == EXT2_ET_NO_CURRENT_NODE)
+		err = EXT2_ET_EXTENT_NOT_FOUND;
+	return err;
+}
+
+static inline errcode_t __fuse2fs_get_next_mapping(struct fuse2fs *ff,
+						   ext2_extent_handle_t handle,
+						   blk64_t startoff,
+						   struct ext2fs_extent *bmap,
+						   const char *func)
+{
+	struct ext2fs_extent newex;
+	struct ext2_extent_info info;
+	errcode_t err;
+
+	/*
+	 * The extent tree code has this (probably broken) behavior that if
+	 * more than two of the highest levels of the cursor point at the
+	 * rightmost edge of an extent tree block, a _NEXT_LEAF movement fails
+	 * to move the cursor position of any of the lower levels.  IOWs, if
+	 * leaf level N is at the right edge, it will only advance level N-1
+	 * to the right.  If N-1 was at the right edge, the cursor resets to
+	 * record 0 of that level and goes down to the wrong leaf.
+	 *
+	 * Work around this by walking up (towards root level 0) the extent
+	 * tree until we find a level where we're not already at the rightmost
+	 * edge.  The _NEXT_LEAF movement will walk down the tree to find the
+	 * leaves.
+	 */
+	err = ext2fs_extent_get_info(handle, &info);
+	DUMP_INFO(ff, "UP?", startoff, err, &info);
+	if (err)
+		return err;
+
+	while (info.curr_entry == info.num_entries && info.curr_level > 0) {
+		err = ext2fs_extent_get(handle, EXT2_EXTENT_UP, &newex);
+		DUMP_EXTENT(ff, "UP", startoff, err, &newex);
+		if (err)
+			return err;
+		err = ext2fs_extent_get_info(handle, &info);
+		DUMP_INFO(ff, "UP", startoff, err, &info);
+		if (err)
+			return err;
+	}
+
+	/*
+	 * If we're at the root and there are no more entries, there's nothing
+	 * else to be found.
+	 */
+	if (info.curr_level == 0 && info.curr_entry == info.num_entries)
+		return EXT2_ET_EXTENT_NOT_FOUND;
+
+	/* Otherwise grab this next leaf and return it. */
+	err = ext2fs_extent_get(handle, EXT2_EXTENT_NEXT_LEAF, &newex);
+	DUMP_EXTENT(ff, "NEXT", startoff, err, &newex);
+	if (err)
+		return err;
+
+	*bmap = newex;
+	return 0;
+}
+
+#define fuse2fs_get_mapping_at(ff, handle, startoff, bmap) \
+	__fuse2fs_get_mapping_at((ff), (handle), (startoff), (bmap), __func__)
+#define fuse2fs_get_next_mapping(ff, handle, startoff, bmap) \
+	__fuse2fs_get_next_mapping((ff), (handle), (startoff), (bmap), __func__)
+
+static errcode_t fuse2fs_iomap_begin_extent(struct fuse2fs *ff, uint64_t ino,
+					    struct ext2_inode_large *inode,
+					    off_t pos, uint64_t count,
+					    uint32_t opflags,
+					    struct fuse_file_iomap *iomap)
+{
+	ext2_extent_handle_t handle;
+	struct ext2fs_extent extent = { };
+	ext2_filsys fs = ff->fs;
+	const blk64_t startoff = FUSE2FS_B_TO_FSBT(ff, pos);
+	errcode_t err;
+	int ret = 0;
+
+	err = ext2fs_extent_open2(fs, ino, EXT2_INODE(inode), &handle);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	err = fuse2fs_get_mapping_at(ff, handle, startoff, &extent);
+	if (err == EXT2_ET_EXTENT_NOT_FOUND) {
+		/* No mappings at all; the whole range is a hole. */
+		fuse2fs_iomap_hole_to_eof(ff, iomap, pos, count, inode);
+		goto out_handle;
+	}
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out_handle;
+	}
+
+	if (startoff < extent.e_lblk) {
+		/*
+		 * Mapping starts to the right of the current position.
+		 * Synthesize a hole going to that next extent.
+		 */
+		fuse2fs_iomap_hole(ff, iomap, FUSE2FS_FSB_TO_B(ff, startoff),
+				FUSE2FS_FSB_TO_B(ff, extent.e_lblk - startoff));
+		goto out_handle;
+	}
+
+	if (startoff >= extent.e_lblk + extent.e_len) {
+		/*
+		 * Mapping ends to the left of the current position.  Try to
+		 * find the next mapping.  If there is no next mapping, the
+		 * whole range is in a hole.
+		 */
+		err = fuse2fs_get_next_mapping(ff, handle, startoff, &extent);
+		if (err == EXT2_ET_EXTENT_NOT_FOUND) {
+			fuse2fs_iomap_hole_to_eof(ff, iomap, pos, count, inode);
+			goto out_handle;
+		}
+		if (err) {
+			ret = translate_error(fs, ino, err);
+			goto out_handle;
+		}
+
+		/*
+		 * If the new mapping starts to the right of startoff, there's
+		 * a hole from startoff to the start of the new mapping.
+		 */
+		if (startoff < extent.e_lblk) {
+			fuse2fs_iomap_hole(ff, iomap,
+				FUSE2FS_FSB_TO_B(ff, startoff),
+				FUSE2FS_FSB_TO_B(ff, extent.e_lblk - startoff));
+			goto out_handle;
+		}
+
+		/*
+		 * The new mapping starts at startoff.  Something weird
+		 * happened in the extent tree lookup, but we found a valid
+		 * mapping so we'll run with it.
+		 */
+	}
+
+	/* Mapping overlaps startoff, report this. */
+	iomap->dev = FUSE_IOMAP_DEV_NULL;
+	iomap->addr = FUSE2FS_FSB_TO_B(ff, extent.e_pblk);
+	iomap->offset = FUSE2FS_FSB_TO_B(ff, extent.e_lblk);
+	iomap->length = FUSE2FS_FSB_TO_B(ff, extent.e_len);
+	if (extent.e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+		iomap->type = FUSE_IOMAP_TYPE_UNWRITTEN;
+	else
+		iomap->type = FUSE_IOMAP_TYPE_MAPPED;
+
+out_handle:
+	ext2fs_extent_free(handle);
+	return ret;
+}
+
+static int fuse2fs_iomap_begin_indirect(struct fuse2fs *ff, uint64_t ino,
+					struct ext2_inode_large *inode,
+					off_t pos, uint64_t count,
+					uint32_t opflags,
+					struct fuse_file_iomap *iomap)
+{
+	ext2_filsys fs = ff->fs;
+	blk64_t startoff = FUSE2FS_B_TO_FSBT(ff, pos);
+	uint64_t isize = EXT2_I_SIZE(inode);
+	uint64_t real_count = min(count, 131072);
+	const blk64_t endoff = FUSE2FS_B_TO_FSB(ff, pos + real_count);
+	blk64_t startblock;
+	errcode_t err;
+
+	err = ext2fs_bmap2(fs, ino, EXT2_INODE(inode), NULL, 0, startoff, NULL,
+			   &startblock);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	iomap->dev = FUSE_IOMAP_DEV_NULL;
+	iomap->offset = FUSE2FS_FSB_TO_B(ff, startoff);
+	iomap->flags |= FUSE_IOMAP_F_MERGED;
+	if (startblock) {
+		iomap->addr = FUSE2FS_FSB_TO_B(ff, startblock);
+		iomap->type = FUSE_IOMAP_TYPE_MAPPED;
+	} else {
+		iomap->addr = FUSE_IOMAP_NULL_ADDR;
+		iomap->type = FUSE_IOMAP_TYPE_HOLE;
+	}
+	iomap->length = fs->blocksize;
+
+	/* See how long the mapping goes for. */
+	for (startoff++; startoff < endoff; startoff++) {
+		blk64_t prev_startblock = startblock;
+
+		err = ext2fs_bmap2(fs, ino, EXT2_INODE(inode), NULL, 0,
+				   startoff, NULL, &startblock);
+		if (err)
+			break;
+
+		if (iomap->type == FUSE_IOMAP_TYPE_MAPPED) {
+			if (startblock == prev_startblock + 1)
+				iomap->length += fs->blocksize;
+			else
+				break;
+		} else {
+			if (startblock == 0)
+				iomap->length += fs->blocksize;
+			else
+				break;
+		}
+	}
+
+	/*
+	 * If this is a hole that goes beyond EOF, report this as a hole to the
+	 * end of the range queried so that FIEMAP doesn't go mad.
+	 */
+	if (iomap->type == FUSE_IOMAP_TYPE_HOLE &&
+	    iomap->offset + iomap->length >= isize)
+		fuse2fs_iomap_hole_to_eof(ff, iomap, pos, count, inode);
+
+	return 0;
+}
+
+static int fuse2fs_iomap_begin_inline(struct fuse2fs *ff, ext2_ino_t ino,
+				      struct ext2_inode_large *inode, off_t pos,
+				      uint64_t count, struct fuse_file_iomap *iomap)
+{
+	uint64_t one_fsb = FUSE2FS_FSB_TO_B(ff, 1);
+
+	if (pos >= one_fsb) {
+		fuse2fs_iomap_hole_to_eof(ff, iomap, pos, count, inode);
+	} else {
+		/* ext4 only supports inline data files up to 1 fsb */
+		iomap->dev = FUSE_IOMAP_DEV_NULL;
+		iomap->addr = FUSE_IOMAP_NULL_ADDR;
+		iomap->offset = 0;
+		iomap->length = one_fsb;
+		iomap->type = FUSE_IOMAP_TYPE_INLINE;
+	}
+
+	return 0;
+}
+
+static int fuse2fs_iomap_begin_report(struct fuse2fs *ff, ext2_ino_t ino,
+				      struct ext2_inode_large *inode,
+				      off_t pos, uint64_t count,
+				      uint32_t opflags,
+				      struct fuse_file_iomap *read)
+{
+	if (inode->i_flags & EXT4_INLINE_DATA_FL)
+		return fuse2fs_iomap_begin_inline(ff, ino, inode, pos, count,
+						  read);
+
+	if (inode->i_flags & EXT4_EXTENTS_FL)
+		return fuse2fs_iomap_begin_extent(ff, ino, inode, pos, count,
+						  opflags, read);
+
+	return fuse2fs_iomap_begin_indirect(ff, ino, inode, pos, count,
+					    opflags, read);
+}
+
+static int fuse2fs_iomap_begin_read(struct fuse2fs *ff, ext2_ino_t ino,
+				    struct ext2_inode_large *inode, off_t pos,
+				    uint64_t count, uint32_t opflags,
+				    struct fuse_file_iomap *read)
+{
+	return -ENOSYS;
+}
+
+static int fuse2fs_iomap_begin_write(struct fuse2fs *ff, ext2_ino_t ino,
+				     struct ext2_inode_large *inode, off_t pos,
+				     uint64_t count, uint32_t opflags,
+				     struct fuse_file_iomap *read)
+{
+	return -ENOSYS;
+}
+
+static int op_iomap_begin(const char *path, uint64_t nodeid, uint64_t attr_ino,
+			  off_t pos, uint64_t count, uint32_t opflags,
+			  struct fuse_file_iomap *read,
+			  struct fuse_file_iomap *write)
+{
+	struct fuse2fs *ff = fuse2fs_get();
+	struct ext2_inode_large inode;
+	ext2_filsys fs;
+	errcode_t err;
+	int ret = 0;
+
+	FUSE2FS_CHECK_CONTEXT(ff);
+
+	dbg_printf(ff,
+ "%s: path=%s nodeid=%llu attr_ino=%llu pos=0x%llx count=0x%llx opflags=0x%x\n",
+		   __func__, path,
+		   (unsigned long long)nodeid,
+		   (unsigned long long)attr_ino,
+		   (unsigned long long)pos,
+		   (unsigned long long)count,
+		   opflags);
+
+	fs = fuse2fs_start(ff);
+	err = fuse2fs_read_inode(fs, attr_ino, &inode);
+	if (err) {
+		ret = translate_error(fs, attr_ino, err);
+		goto out_unlock;
+	}
+
+	if (opflags & FUSE_IOMAP_OP_REPORT)
+		ret = fuse2fs_iomap_begin_report(ff, attr_ino, &inode, pos,
+						 count, opflags, read);
+	else if (fuse_iomap_is_write(opflags))
+		ret = fuse2fs_iomap_begin_write(ff, attr_ino, &inode, pos,
+						count, opflags, read);
+	else
+		ret = fuse2fs_iomap_begin_read(ff, attr_ino, &inode, pos,
+					       count, opflags, read);
+	if (ret)
+		goto out_unlock;
+
+	dbg_printf(ff, "%s: nodeid=%llu attr_ino=%llu pos=0x%llx -> addr=0x%llx offset=0x%llx length=0x%llx type=%u\n",
+		   __func__,
+		   (unsigned long long)nodeid,
+		   (unsigned long long)attr_ino,
+		   (unsigned long long)pos,
+		   (unsigned long long)read->addr,
+		   (unsigned long long)read->offset,
+		   (unsigned long long)read->length,
+		   read->type);
+
+out_unlock:
+	fuse2fs_finish(ff, ret);
+	return ret;
+}
+
+static int op_iomap_end(const char *path, uint64_t nodeid, uint64_t attr_ino,
+			off_t pos, uint64_t count, uint32_t opflags,
+			ssize_t written, const struct fuse_file_iomap *iomap)
+{
+	struct fuse2fs *ff = fuse2fs_get();
+
+	FUSE2FS_CHECK_CONTEXT(ff);
+
+	dbg_printf(ff,
+ "%s: path=%s nodeid=%llu attr_ino=%llu pos=0x%llx count=0x%llx opflags=0x%x written=0x%zx mapflags=0x%x\n",
+		   __func__, path,
+		   (unsigned long long)nodeid,
+		   (unsigned long long)attr_ino,
+		   (unsigned long long)pos,
+		   (unsigned long long)count,
+		   opflags,
+		   written,
+		   iomap->flags);
+
+	return 0;
+}
+#endif /* HAVE_FUSE_IOMAP */
+
 static struct fuse_operations fs_ops = {
 	.init = op_init,
 	.destroy = op_destroy,
@@ -4883,6 +5388,10 @@ static struct fuse_operations fs_ops = {
 #ifdef SUPPORT_FALLOCATE
 	.fallocate = op_fallocate,
 #endif
+#ifdef HAVE_FUSE_IOMAP
+	.iomap_begin = op_iomap_begin,
+	.iomap_end = op_iomap_end,
+#endif /* HAVE_FUSE_IOMAP */
 };
 
 static int get_random_bytes(void *p, size_t sz)
@@ -5106,17 +5615,19 @@ static void fuse2fs_com_err_proc(const char *whoami, errcode_t code,
 int main(int argc, char *argv[])
 {
 	struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
-	struct fuse2fs fctx;
+	struct fuse2fs fctx = {
+		.magic = FUSE2FS_MAGIC,
+		.opstate = F2OP_WRITABLE,
+		.logfd = -1,
+#ifdef HAVE_FUSE_IOMAP
+		.iomap_state = IOMAP_UNKNOWN,
+#endif
+	};
 	errcode_t err;
 	FILE *orig_stderr = stderr;
 	char extra_args[BUFSIZ];
 	int ret;
 
-	memset(&fctx, 0, sizeof(fctx));
-	fctx.magic = FUSE2FS_MAGIC;
-	fctx.logfd = -1;
-	fctx.opstate = F2OP_WRITABLE;
-
 	ret = fuse_opt_parse(&args, &fctx, fuse2fs_opts, fuse2fs_opt_proc);
 	if (ret)
 		exit(1);
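The alignment rules described in fuse2fs_iomap_hole_to_eof() come down to a little arithmetic: round the start down to a block boundary, and round the larger of pos+count and i_size up to a block boundary. The following standalone sketch assumes a nonzero block size and offsets that fit in uint64_t. The `demo_` helpers are hypothetical re-creations, not the e2fsprogs round_up()/round_down():

```c
#include <assert.h>
#include <stdint.h>

/* A hole as reported back to the kernel, in bytes. */
struct demo_hole {
	uint64_t offset;	/* block-aligned start */
	uint64_t length;	/* multiple of the block size */
};

static uint64_t demo_round_down(uint64_t b, uint64_t align)
{
	return b - (b % align);
}

static uint64_t demo_round_up(uint64_t b, uint64_t align)
{
	return demo_round_down(b + align - 1, align);
}

/*
 * Cover [pos, pos + count) and everything up to i_size, widened outward
 * to fs block boundaries so the kernel never sees a partial block.
 */
static struct demo_hole demo_hole_to_eof(uint64_t pos, uint64_t count,
					 uint64_t isize, uint64_t blocksize)
{
	struct demo_hole hole;
	uint64_t end;

	assert(blocksize != 0);

	end = pos + count > isize ? pos + count : isize;
	hole.offset = demo_round_down(pos, blocksize);
	hole.length = demo_round_up(end, blocksize) - hole.offset;
	return hole;
}
```

Suppose the blocks are 4096 bytes, pos is 5000, count is 100, and i_size is 4096, as when zeroing past EOF. The hole then runs from byte 4096 to byte 8192.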

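fuse2fs_iomap_begin_extent() has three possible answers:

- a synthesized hole up to the next extent,
- the extent that overlaps startoff, or
- a hole running to EOF when nothing is mapped at or after startoff.

The sketch below reproduces only those outcomes. A linear scan over a sorted array (hypothetical `demo_extent`) stands in for the ext2fs_extent_goto() / EXT2_EXTENT_NEXT_LEAF tree walk, so none of the cursor subtleties described in the patch comments apply. Units are fs blocks:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_extent {
	uint64_t lblk;		/* first logical block */
	uint64_t pblk;		/* first physical block */
	uint32_t len;		/* length in blocks */
	int unwritten;		/* nonzero for preallocated (uninit) extents */
};

enum demo_map_type { DEMO_HOLE, DEMO_HOLE_TO_EOF, DEMO_MAPPED, DEMO_UNWRITTEN };

struct demo_map {
	enum demo_map_type type;
	uint64_t lblk;		/* start of the reported range */
	uint64_t pblk;		/* only meaningful for mapped/unwritten */
	uint64_t len;		/* blocks; unused for DEMO_HOLE_TO_EOF */
};

/* ext[] must be sorted by lblk and non-overlapping. */
static struct demo_map demo_map_at(const struct demo_extent *ext, size_t n,
				   uint64_t startoff)
{
	struct demo_map map = { DEMO_HOLE_TO_EOF, startoff, 0, 0 };
	size_t i;

	assert(ext != NULL || n == 0);

	/* Find the first extent that ends beyond startoff. */
	for (i = 0; i < n; i++)
		if (ext[i].lblk + ext[i].len > startoff)
			break;

	if (i == n)
		return map;	/* nothing mapped at or after startoff */

	if (startoff < ext[i].lblk) {
		/* Synthesize a hole up to the next extent. */
		map.type = DEMO_HOLE;
		map.len = ext[i].lblk - startoff;
		return map;
	}

	/* The extent overlaps startoff; report the whole extent. */
	map.type = ext[i].unwritten ? DEMO_UNWRITTEN : DEMO_MAPPED;
	map.lblk = ext[i].lblk;
	map.pblk = ext[i].pblk;
	map.len = ext[i].len;
	return map;
}
```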

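Block-mapped (indirect) files have no extent records, so fuse2fs_iomap_begin_indirect() probes one logical block at a time. It grows the mapping while the physical blocks stay contiguous, or while they stay unmapped, and never scans more than 128 KiB per call. In the sketch below, a plain array (hypothetical) stands in for the per-block ext2fs_bmap2() lookups. It also omits the hole-past-EOF widening that the real function applies afterwards:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_SCAN_BYTES	131072ULL	/* the patch's 128 KiB cap */

struct demo_run {
	int mapped;		/* 1 = contiguous mapped run, 0 = hole */
	uint64_t lblk;		/* first logical block of the run */
	uint64_t pblk;		/* first physical block; 0 for a hole */
	uint64_t nblocks;	/* run length in fs blocks */
};

/*
 * blockmap[i] is the physical block behind logical block i, or 0 for a
 * hole; it stands in for one ext2fs_bmap2() call per block.  The byte
 * offset pos must fall inside the nr_lblks blocks described by blockmap.
 */
static struct demo_run demo_indirect_run(const uint64_t *blockmap,
					 uint64_t nr_lblks, uint64_t pos,
					 uint64_t count, uint64_t blocksize)
{
	uint64_t scan = count < DEMO_MAX_SCAN_BYTES ? count : DEMO_MAX_SCAN_BYTES;
	uint64_t startoff, endoff, off;
	struct demo_run run;

	assert(blocksize != 0 && pos / blocksize < nr_lblks);

	startoff = pos / blocksize;
	/* Round up so that a partial last block is still examined. */
	endoff = (pos + scan + blocksize - 1) / blocksize;
	if (endoff > nr_lblks)
		endoff = nr_lblks;

	run.lblk = startoff;
	run.pblk = blockmap[startoff];
	run.mapped = run.pblk != 0;
	run.nblocks = 1;

	/* Grow the run while blocks stay physically contiguous (or unmapped). */
	for (off = startoff + 1; off < endoff; off++) {
		int same = run.mapped ? (blockmap[off] == blockmap[off - 1] + 1)
				      : (blockmap[off] == 0);

		if (!same)
			break;
		run.nblocks++;
	}
	return run;
}
```

The cap bounds the cost of a single request at 32 lookups with 4 KiB blocks, at the price of possibly splitting one long contiguous run into several mappings.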

* [PATCH 02/17] fuse2fs: add iomap= mount option
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
  2025-09-16  0:58   ` [PATCH 01/17] fuse2fs: implement bare minimum iomap for file mapping reporting Darrick J. Wong
@ 2025-09-16  0:59   ` Darrick J. Wong
  2025-09-16  0:59   ` [PATCH 03/17] fuse2fs: implement iomap configuration Darrick J. Wong
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:59 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Add a mount option to control iomap usage so that we can test before and
after scenarios.
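The option handler below maps `iomap`, `iomap=1`, `iomap=0` and `iomap=default` onto a three-way toggle. The patch skips the 6-byte "iomap=" prefix with `arg + 6`. Here is a standalone sketch of the same mapping. The `demo_` names are hypothetical, and the prefix length is derived from the string rather than hard-coded:

```c
#include <assert.h>
#include <string.h>

enum demo_toggle { DEMO_FT_DISABLE, DEMO_FT_ENABLE, DEMO_FT_DEFAULT };

/*
 * Parse "iomap" or "iomap=<value>" into a tri-state toggle.
 * Returns 0 on success, -1 for an unrecognized option or value.
 */
static int demo_parse_iomap(const char *arg, enum demo_toggle *want)
{
	static const char prefix[] = "iomap=";
	const size_t plen = sizeof(prefix) - 1;	/* strlen("iomap=") == 6 */
	const char *val;

	if (strcmp(arg, "iomap") == 0) {
		*want = DEMO_FT_ENABLE;	/* bare flag means "iomap=1" */
		return 0;
	}
	if (strncmp(arg, prefix, plen) != 0)
		return -1;

	val = arg + plen;
	if (strcmp(val, "1") == 0)
		*want = DEMO_FT_ENABLE;
	else if (strcmp(val, "0") == 0)
		*want = DEMO_FT_DISABLE;
	else if (strcmp(val, "default") == 0)
		*want = DEMO_FT_DEFAULT;
	else
		return -1;
	return 0;
}
```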

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.1.in |    6 ++++++
 fuse4fs/fuse4fs.c    |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 misc/fuse2fs.1.in    |    6 ++++++
 misc/fuse2fs.c       |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 104 insertions(+)


diff --git a/fuse4fs/fuse4fs.1.in b/fuse4fs/fuse4fs.1.in
index 8bef5f48802385..8855867d27101d 100644
--- a/fuse4fs/fuse4fs.1.in
+++ b/fuse4fs/fuse4fs.1.in
@@ -75,6 +75,12 @@ .SS "fuse4fs options:"
 \fB-o\fR fuse4fs_debug
 enable fuse4fs debugging
 .TP
+\fB-o\fR iomap=
+If set to \fI1\fR, requires iomap to be enabled.
+If set to \fI0\fR, forbids use of iomap.
+If set to \fIdefault\fR (or not set), enables iomap if the kernel supports it.
+Using iomap substantially improves the file I/O performance of the fuse4fs server.
+.TP
 \fB-o\fR kernel
 Behave more like the kernel ext4 driver in the following ways:
 Allows processes owned by other users to access the filesystem.
diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index bf9c2081702132..2d7b40911ce0f7 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -223,6 +223,12 @@ enum fuse4fs_opstate {
 	F4OP_SHUTDOWN,
 };
 
+enum fuse4fs_feature_toggle {
+	FT_DISABLE,
+	FT_ENABLE,
+	FT_DEFAULT,
+};
+
 #ifdef HAVE_FUSE_IOMAP
 enum fuse4fs_iomap_state {
 	IOMAP_DISABLED,
@@ -260,6 +266,7 @@ struct fuse4fs {
 	int logfd;
 	int blocklog;
 #ifdef HAVE_FUSE_IOMAP
+	enum fuse4fs_feature_toggle iomap_want;
 	enum fuse4fs_iomap_state iomap_state;
 #endif
 	unsigned int blockmask;
@@ -1539,6 +1546,12 @@ static void fuse4fs_iomap_enable(struct fuse_conn_info *conn,
 
 	if (ff->iomap_state == IOMAP_UNKNOWN)
 		ff->iomap_state = IOMAP_DISABLED;
+
+	if (!fuse4fs_iomap_enabled(ff)) {
+		if (ff->iomap_want == FT_ENABLE)
+			err_printf(ff, "%s\n", _("Could not enable iomap."));
+		return;
+	}
 }
 #else
 # define fuse4fs_iomap_enable(...)	((void)0)
@@ -5981,6 +5994,9 @@ enum {
 	FUSE4FS_CACHE_SIZE,
 	FUSE4FS_DIRSYNC,
 	FUSE4FS_ERRORS_BEHAVIOR,
+#ifdef HAVE_FUSE_IOMAP
+	FUSE4FS_IOMAP,
+#endif
 };
 
 #define FUSE4FS_OPT(t, p, v) { t, offsetof(struct fuse4fs, p), v }
@@ -6012,6 +6028,10 @@ static struct fuse_opt fuse4fs_opts[] = {
 	FUSE_OPT_KEY("cache_size=%s",	FUSE4FS_CACHE_SIZE),
 	FUSE_OPT_KEY("dirsync",		FUSE4FS_DIRSYNC),
 	FUSE_OPT_KEY("errors=%s",	FUSE4FS_ERRORS_BEHAVIOR),
+#ifdef HAVE_FUSE_IOMAP
+	FUSE_OPT_KEY("iomap=%s",	FUSE4FS_IOMAP),
+	FUSE_OPT_KEY("iomap",		FUSE4FS_IOMAP),
+#endif
 
 	FUSE_OPT_KEY("-V",             FUSE4FS_VERSION),
 	FUSE_OPT_KEY("--version",      FUSE4FS_VERSION),
@@ -6063,6 +6083,23 @@ static int fuse4fs_opt_proc(void *data, const char *arg,
 
 		/* do not pass through to libfuse */
 		return 0;
+#ifdef HAVE_FUSE_IOMAP
+	case FUSE4FS_IOMAP:
+		if (strcmp(arg, "iomap") == 0 || strcmp(arg + 6, "1") == 0)
+			ff->iomap_want = FT_ENABLE;
+		else if (strcmp(arg + 6, "0") == 0)
+			ff->iomap_want = FT_DISABLE;
+		else if (strcmp(arg + 6, "default") == 0)
+			ff->iomap_want = FT_DEFAULT;
+		else {
+			fprintf(stderr, "%s: %s\n", arg,
+ _("unknown iomap= behavior."));
+			return -1;
+		}
+
+		/* do not pass through to libfuse */
+		return 0;
+#endif
 	case FUSE4FS_IGNORED:
 		return 0;
 	case FUSE4FS_HELP:
@@ -6090,6 +6127,9 @@ static int fuse4fs_opt_proc(void *data, const char *arg,
 	"    -o cache_size=N[KMG]   use a disk cache of this size\n"
 	"    -o errors=             behavior when an error is encountered:\n"
 	"                           continue|remount-ro|panic\n"
+#ifdef HAVE_FUSE_IOMAP
+	"    -o iomap=              0 to disable, 1 to require, default to autodetect\n"
+#endif
 	"\n",
 			outargs->argv[0]);
 		if (key == FUSE4FS_HELPFULL) {
@@ -6282,6 +6322,7 @@ int main(int argc, char *argv[])
 		.opstate = F4OP_WRITABLE,
 		.logfd = -1,
 #ifdef HAVE_FUSE_IOMAP
+		.iomap_want = FT_DEFAULT,
 		.iomap_state = IOMAP_UNKNOWN,
 #endif
 	};
@@ -6299,6 +6340,11 @@ int main(int argc, char *argv[])
 		exit(1);
 	}
 
+#ifdef HAVE_FUSE_IOMAP
+	if (fctx.iomap_want == FT_DISABLE)
+		fctx.iomap_state = IOMAP_DISABLED;
+#endif
+
 	/* /dev/sda -> sda for reporting */
 	fctx.shortdev = strrchr(fctx.device, '/');
 	if (fctx.shortdev)
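Taken together, main() and fuse4fs_iomap_enable() resolve the requested toggle into a final iomap state in three steps:

1. `iomap=0` rules iomap out before the kernel is consulted.
2. At init time, iomap is turned on only for a block device whose kernel accepts FUSE_CAP_IOMAP.
3. Anything still undecided falls back to disabled.

An unmet `iomap=1` request is only logged at this point in the series. The sketch below condenses that flow into one pure function. `on_bdev` and `kernel_has_iomap` are hypothetical inputs standing in for the bdev check and fuse_set_feature_flag(), and it ignores the temporary force-disable this series keeps until its final patch:

```c
#include <assert.h>
#include <stdio.h>

enum demo_toggle { DEMO_FT_DISABLE, DEMO_FT_ENABLE, DEMO_FT_DEFAULT };
enum demo_state { DEMO_IOMAP_DISABLED, DEMO_IOMAP_UNKNOWN, DEMO_IOMAP_ENABLED };

static enum demo_state demo_resolve_iomap(enum demo_toggle want, int on_bdev,
					  int kernel_has_iomap)
{
	/* main(): iomap=0 rules iomap out before the kernel is asked. */
	enum demo_state state = want == DEMO_FT_DISABLE ? DEMO_IOMAP_DISABLED
							: DEMO_IOMAP_UNKNOWN;

	/* init: iomap needs a block device and kernel support. */
	if (state != DEMO_IOMAP_DISABLED && on_bdev && kernel_has_iomap)
		state = DEMO_IOMAP_ENABLED;
	if (state == DEMO_IOMAP_UNKNOWN)
		state = DEMO_IOMAP_DISABLED;

	/* An iomap=1 request that could not be honored is reported. */
	if (state != DEMO_IOMAP_ENABLED && want == DEMO_FT_ENABLE)
		fprintf(stderr, "Could not enable iomap.\n");

	return state;
}
```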
diff --git a/misc/fuse2fs.1.in b/misc/fuse2fs.1.in
index 6acfa092851292..2b55fa0e723966 100644
--- a/misc/fuse2fs.1.in
+++ b/misc/fuse2fs.1.in
@@ -75,6 +75,12 @@ .SS "fuse2fs options:"
 \fB-o\fR fuse2fs_debug
 enable fuse2fs debugging
 .TP
+\fB-o\fR iomap=
+If set to \fI1\fR, requires iomap to be enabled.
+If set to \fI0\fR, forbids use of iomap.
+If set to \fIdefault\fR (or not set), enables iomap if the kernel supports it.
+Using iomap substantially improves the file I/O performance of the fuse2fs server.
+.TP
 \fB-o\fR kernel
 Behave more like the kernel ext4 driver in the following ways:
 Allows processes owned by other users to access the filesystem.
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index ca61fbc89f5fda..95d3dedbdb8b80 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -216,6 +216,12 @@ enum fuse2fs_opstate {
 	F2OP_SHUTDOWN,
 };
 
+enum fuse2fs_feature_toggle {
+	FT_DISABLE,
+	FT_ENABLE,
+	FT_DEFAULT,
+};
+
 #ifdef HAVE_FUSE_IOMAP
 enum fuse2fs_iomap_state {
 	IOMAP_DISABLED,
@@ -253,6 +259,7 @@ struct fuse2fs {
 	int logfd;
 	int blocklog;
 #ifdef HAVE_FUSE_IOMAP
+	enum fuse2fs_feature_toggle iomap_want;
 	enum fuse2fs_iomap_state iomap_state;
 #endif
 	unsigned int blockmask;
@@ -1345,6 +1352,12 @@ static void fuse2fs_iomap_enable(struct fuse_conn_info *conn,
 
 	if (ff->iomap_state == IOMAP_UNKNOWN)
 		ff->iomap_state = IOMAP_DISABLED;
+
+	if (!fuse2fs_iomap_enabled(ff)) {
+		if (ff->iomap_want == FT_ENABLE)
+			err_printf(ff, "%s\n", _("Could not enable iomap."));
+		return;
+	}
 }
 #else
 # define fuse2fs_iomap_enable(...)	((void)0)
@@ -5419,6 +5432,9 @@ enum {
 	FUSE2FS_CACHE_SIZE,
 	FUSE2FS_DIRSYNC,
 	FUSE2FS_ERRORS_BEHAVIOR,
+#ifdef HAVE_FUSE_IOMAP
+	FUSE2FS_IOMAP,
+#endif
 };
 
 #define FUSE2FS_OPT(t, p, v) { t, offsetof(struct fuse2fs, p), v }
@@ -5450,6 +5466,10 @@ static struct fuse_opt fuse2fs_opts[] = {
 	FUSE_OPT_KEY("cache_size=%s",	FUSE2FS_CACHE_SIZE),
 	FUSE_OPT_KEY("dirsync",		FUSE2FS_DIRSYNC),
 	FUSE_OPT_KEY("errors=%s",	FUSE2FS_ERRORS_BEHAVIOR),
+#ifdef HAVE_FUSE_IOMAP
+	FUSE_OPT_KEY("iomap=%s",	FUSE2FS_IOMAP),
+	FUSE_OPT_KEY("iomap",		FUSE2FS_IOMAP),
+#endif
 
 	FUSE_OPT_KEY("-V",             FUSE2FS_VERSION),
 	FUSE_OPT_KEY("--version",      FUSE2FS_VERSION),
@@ -5501,6 +5521,23 @@ static int fuse2fs_opt_proc(void *data, const char *arg,
 
 		/* do not pass through to libfuse */
 		return 0;
+#ifdef HAVE_FUSE_IOMAP
+	case FUSE2FS_IOMAP:
+		if (strcmp(arg, "iomap") == 0 || strcmp(arg + 6, "1") == 0)
+			ff->iomap_want = FT_ENABLE;
+		else if (strcmp(arg + 6, "0") == 0)
+			ff->iomap_want = FT_DISABLE;
+		else if (strcmp(arg + 6, "default") == 0)
+			ff->iomap_want = FT_DEFAULT;
+		else {
+			fprintf(stderr, "%s: %s\n", arg,
+ _("unknown iomap= behavior."));
+			return -1;
+		}
+
+		/* do not pass through to libfuse */
+		return 0;
+#endif
 	case FUSE2FS_IGNORED:
 		return 0;
 	case FUSE2FS_HELP:
@@ -5528,6 +5565,9 @@ static int fuse2fs_opt_proc(void *data, const char *arg,
 	"    -o cache_size=N[KMG]   use a disk cache of this size\n"
 	"    -o errors=             behavior when an error is encountered:\n"
 	"                           continue|remount-ro|panic\n"
+#ifdef HAVE_FUSE_IOMAP
+	"    -o iomap=              0 to disable, 1 to require, default to autodetect\n"
+#endif
 	"\n",
 			outargs->argv[0]);
 		if (key == FUSE2FS_HELPFULL) {
@@ -5620,6 +5660,7 @@ int main(int argc, char *argv[])
 		.opstate = F2OP_WRITABLE,
 		.logfd = -1,
 #ifdef HAVE_FUSE_IOMAP
+		.iomap_want = FT_DEFAULT,
 		.iomap_state = IOMAP_UNKNOWN,
 #endif
 	};
@@ -5637,6 +5678,11 @@ int main(int argc, char *argv[])
 		exit(1);
 	}
 
+#ifdef HAVE_FUSE_IOMAP
+	if (fctx.iomap_want == FT_DISABLE)
+		fctx.iomap_state = IOMAP_DISABLED;
+#endif
+
 	/* /dev/sda -> sda for reporting */
 	fctx.shortdev = strrchr(fctx.device, '/');
 	if (fctx.shortdev)



* [PATCH 03/17] fuse2fs: implement iomap configuration
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
  2025-09-16  0:58   ` [PATCH 01/17] fuse2fs: implement bare minimum iomap for file mapping reporting Darrick J. Wong
  2025-09-16  0:59   ` [PATCH 02/17] fuse2fs: add iomap= mount option Darrick J. Wong
@ 2025-09-16  0:59   ` Darrick J. Wong
  2025-09-16  0:59   ` [PATCH 04/17] fuse2fs: register block devices for use with iomap Darrick J. Wong
                     ` (13 subsequent siblings)
  16 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:59 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Upload the filesystem geometry to the kernel when asked.
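Part of that geometry is the largest file size, which fuse4fs_max_size() computes the same way as the kernel's ext4_max_size(). The sketch below isolates that arithmetic. The `demo_ext4_max_size` name is hypothetical, and it takes the block size log, the huge_file feature bit and the caller's cap (upper_limit in the patch) as plain parameters:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest byte size of an extent-mapped ext4 file.  blocklog is
 * log2(block size); upper_limit is the caller-imposed cap in bytes.
 */
static int64_t demo_ext4_max_size(unsigned int blocklog, int huge_file,
				  int64_t upper_limit)
{
	int64_t res;

	assert(blocklog >= 10 && blocklog <= 16);

	if (!huge_file) {
		/*
		 * Without huge_file, i_blocks counts 512-byte sectors in
		 * 32 bits; convert that to whole fs blocks, then to bytes.
		 * This replaces the caller's cap, as in the original code.
		 */
		upper_limit = (1LL << 32) - 1;
		upper_limit >>= (blocklog - 9);
		upper_limit <<= blocklog;
	}

	/* ee_block is 32 bits; stop one block short so ee_len can cover it. */
	res = (1LL << 32) - 1;
	res <<= blocklog;

	return res > upper_limit ? upper_limit : res;
}
```

With 4 KiB blocks this gives 16 TiB minus one block when huge_file is set, and 2 TiB minus one block without it.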

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   96 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 misc/fuse2fs.c    |   96 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 186 insertions(+), 6 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 2d7b40911ce0f7..66683a416749d8 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -196,6 +196,10 @@ static inline uint64_t round_down(uint64_t b, unsigned int align)
 # define FL_ZERO_RANGE_FLAG (0)
 #endif
 
+#ifndef NSEC_PER_SEC
+# define NSEC_PER_SEC	(1000000000L)
+#endif
+
 errcode_t ext2fs_run_ext3_journal(ext2_filsys *fs);
 
 const char *err_shortdev;
@@ -813,9 +817,9 @@ static int update_atime(ext2_filsys fs, ext2_ino_t ino)
 	EXT4_INODE_GET_XTIME(i_mtime, &mtime, pinode);
 	get_now(&now);
 
-	datime = atime.tv_sec + ((double)atime.tv_nsec / 1000000000);
-	dmtime = mtime.tv_sec + ((double)mtime.tv_nsec / 1000000000);
-	dnow = now.tv_sec + ((double)now.tv_nsec / 1000000000);
+	datime = atime.tv_sec + ((double)atime.tv_nsec / NSEC_PER_SEC);
+	dmtime = mtime.tv_sec + ((double)mtime.tv_nsec / NSEC_PER_SEC);
+	dnow = now.tv_sec + ((double)now.tv_nsec / NSEC_PER_SEC);
 
 	/*
 	 * If atime is newer than mtime and atime hasn't been updated in thirty
@@ -5918,6 +5922,91 @@ static void op_iomap_end(fuse_req_t req, fuse_ino_t fino, uint64_t dontcare,
 
 	fuse_reply_err(req, 0);
 }
+
+/*
+ * Maximal extent format file size.
+ * Resulting logical blkno at s_maxbytes must fit in our on-disk
+ * extent format containers, within a sector_t, and within i_blocks
+ * in the vfs.  ext4 inode has 48 bits of i_block in fsblock units,
+ * so that won't be a limiting factor.
+ *
+ * However, there is another limiting factor. We store extents in the form
+ * of starting block and length, hence the resulting length of the extent
+ * covering the maximum file size must fit into on-disk format containers
+ * as well. Since a length is always one unit bigger than the largest block
+ * number (because we count 0 as well), we lower s_maxbytes by one fs block.
+ *
+ * Note, this does *not* consider any metadata overhead for vfs i_blocks.
+ */
+static off_t fuse4fs_max_size(struct fuse4fs *ff, off_t upper_limit)
+{
+	off_t res;
+
+	if (!ext2fs_has_feature_huge_file(ff->fs->super)) {
+		upper_limit = (1LL << 32) - 1;
+
+		/* total blocks in file system block size */
+		upper_limit >>= (ff->blocklog - 9);
+		upper_limit <<= ff->blocklog;
+	}
+
+	/*
+	 * 32-bit extent-start container, ee_block. We lower the maxbytes
+	 * by one fs block, so ee_len can cover the extent of maximum file
+	 * size
+	 */
+	res = (1LL << 32) - 1;
+	res <<= ff->blocklog;
+
+	/* Sanity check against vm- & vfs- imposed limits */
+	if (res > upper_limit)
+		res = upper_limit;
+
+	return res;
+}
+
+static void op_iomap_config(fuse_req_t req, uint64_t flags, uint64_t maxbytes)
+{
+	struct fuse_iomap_config cfg = { };
+	struct fuse4fs *ff = fuse4fs_get(req);
+	ext2_filsys fs;
+
+	FUSE4FS_CHECK_CONTEXT(req);
+
+	dbg_printf(ff, "%s: flags=0x%llx maxbytes=0x%llx\n", __func__,
+		   (unsigned long long)flags,
+		   (unsigned long long)maxbytes);
+	fs = fuse4fs_start(ff);
+
+	cfg.flags |= FUSE_IOMAP_CONFIG_UUID;
+	memcpy(cfg.s_uuid, fs->super->s_uuid, sizeof(cfg.s_uuid));
+	cfg.s_uuid_len = sizeof(fs->super->s_uuid);
+
+	cfg.flags |= FUSE_IOMAP_CONFIG_BLOCKSIZE;
+	cfg.s_blocksize = FUSE4FS_FSB_TO_B(ff, 1);
+
+	/*
+	 * If the inode is large enough to house i_[acm]time_extra then we
+	 * can turn on nanosecond timestamps; i_crtime was the next field added
+	 * after i_atime_extra.
+	 */
+	cfg.flags |= FUSE_IOMAP_CONFIG_TIME;
+	if (fs->super->s_inode_size >=
+	    offsetof(struct ext2_inode_large, i_crtime)) {
+		cfg.s_time_gran = 1;
+		cfg.s_time_max = EXT4_EXTRA_TIMESTAMP_MAX;
+	} else {
+		cfg.s_time_gran = NSEC_PER_SEC;
+		cfg.s_time_max = EXT4_NON_EXTRA_TIMESTAMP_MAX;
+	}
+	cfg.s_time_min = EXT4_TIMESTAMP_MIN;
+
+	cfg.flags |= FUSE_IOMAP_CONFIG_MAXBYTES;
+	cfg.s_maxbytes = fuse4fs_max_size(ff, maxbytes);
+
+	fuse4fs_finish(ff, 0);
+	fuse_reply_iomap_config(req, &cfg);
+}
 #endif /* HAVE_FUSE_IOMAP */
 
 static struct fuse_lowlevel_ops fs_ops = {
@@ -5966,6 +6055,7 @@ static struct fuse_lowlevel_ops fs_ops = {
 #ifdef HAVE_FUSE_IOMAP
 	.iomap_begin = op_iomap_begin,
 	.iomap_end = op_iomap_end,
+	.iomap_config = op_iomap_config,
 #endif /* HAVE_FUSE_IOMAP */
 };
 
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 95d3dedbdb8b80..5b5a0934062b64 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -190,6 +190,10 @@ static inline uint64_t round_down(uint64_t b, unsigned int align)
 # define FL_ZERO_RANGE_FLAG (0)
 #endif
 
+#ifndef NSEC_PER_SEC
+# define NSEC_PER_SEC	(1000000000L)
+#endif
+
 errcode_t ext2fs_run_ext3_journal(ext2_filsys *fs);
 
 const char *err_shortdev;
@@ -649,9 +653,9 @@ static int update_atime(ext2_filsys fs, ext2_ino_t ino)
 	EXT4_INODE_GET_XTIME(i_mtime, &mtime, pinode);
 	get_now(&now);
 
-	datime = atime.tv_sec + ((double)atime.tv_nsec / 1000000000);
-	dmtime = mtime.tv_sec + ((double)mtime.tv_nsec / 1000000000);
-	dnow = now.tv_sec + ((double)now.tv_nsec / 1000000000);
+	datime = atime.tv_sec + ((double)atime.tv_nsec / NSEC_PER_SEC);
+	dmtime = mtime.tv_sec + ((double)mtime.tv_nsec / NSEC_PER_SEC);
+	dnow = now.tv_sec + ((double)now.tv_nsec / NSEC_PER_SEC);
 
 	/*
 	 * If atime is newer than mtime and atime hasn't been updated in thirty
@@ -5358,6 +5362,91 @@ static int op_iomap_end(const char *path, uint64_t nodeid, uint64_t attr_ino,
 
 	return 0;
 }
+
+/*
+ * Maximal extent format file size.
+ * Resulting logical blkno at s_maxbytes must fit in our on-disk
+ * extent format containers, within a sector_t, and within i_blocks
+ * in the vfs.  ext4 inode has 48 bits of i_block in fsblock units,
+ * so that won't be a limiting factor.
+ *
+ * However, there is another limiting factor. We store extents in the form
+ * of starting block and length, hence the resulting length of the extent
+ * covering the maximum file size must fit into on-disk format containers
+ * as well. Since a length is always one unit bigger than the largest block
+ * number (because we count 0 as well), we lower s_maxbytes by one fs block.
+ *
+ * Note, this does *not* consider any metadata overhead for vfs i_blocks.
+ */
+static off_t fuse2fs_max_size(struct fuse2fs *ff, off_t upper_limit)
+{
+	off_t res;
+
+	if (!ext2fs_has_feature_huge_file(ff->fs->super)) {
+		upper_limit = (1LL << 32) - 1;
+
+		/* total blocks in file system block size */
+		upper_limit >>= (ff->blocklog - 9);
+		upper_limit <<= ff->blocklog;
+	}
+
+	/*
+	 * 32-bit extent-start container, ee_block. We lower the maxbytes
+	 * by one fs block, so ee_len can cover the extent of maximum file
+	 * size
+	 */
+	res = (1LL << 32) - 1;
+	res <<= ff->blocklog;
+
+	/* Sanity check against vm- & vfs- imposed limits */
+	if (res > upper_limit)
+		res = upper_limit;
+
+	return res;
+}
+
+static int op_iomap_config(uint64_t flags, off_t maxbytes,
+			   struct fuse_iomap_config *cfg)
+{
+	struct fuse2fs *ff = fuse2fs_get();
+	ext2_filsys fs;
+
+	FUSE2FS_CHECK_CONTEXT(ff);
+
+	dbg_printf(ff, "%s: flags=0x%llx maxbytes=0x%llx\n", __func__,
+		   (unsigned long long)flags,
+		   (unsigned long long)maxbytes);
+	fs = fuse2fs_start(ff);
+
+	cfg->flags |= FUSE_IOMAP_CONFIG_UUID;
+	memcpy(cfg->s_uuid, fs->super->s_uuid, sizeof(cfg->s_uuid));
+	cfg->s_uuid_len = sizeof(fs->super->s_uuid);
+
+	cfg->flags |= FUSE_IOMAP_CONFIG_BLOCKSIZE;
+	cfg->s_blocksize = FUSE2FS_FSB_TO_B(ff, 1);
+
+	/*
+	 * If the inode is large enough to house i_[acm]time_extra then we
+	 * can turn on nanosecond timestamps; i_crtime was the next field added
+	 * after i_atime_extra.
+	 */
+	cfg->flags |= FUSE_IOMAP_CONFIG_TIME;
+	if (fs->super->s_inode_size >=
+	    offsetof(struct ext2_inode_large, i_crtime)) {
+		cfg->s_time_gran = 1;
+		cfg->s_time_max = EXT4_EXTRA_TIMESTAMP_MAX;
+	} else {
+		cfg->s_time_gran = NSEC_PER_SEC;
+		cfg->s_time_max = EXT4_NON_EXTRA_TIMESTAMP_MAX;
+	}
+	cfg->s_time_min = EXT4_TIMESTAMP_MIN;
+
+	cfg->flags |= FUSE_IOMAP_CONFIG_MAXBYTES;
+	cfg->s_maxbytes = fuse2fs_max_size(ff, maxbytes);
+
+	fuse2fs_finish(ff, 0);
+	return 0;
+}
 #endif /* HAVE_FUSE_IOMAP */
 
 static struct fuse_operations fs_ops = {
@@ -5404,6 +5493,7 @@ static struct fuse_operations fs_ops = {
 #ifdef HAVE_FUSE_IOMAP
 	.iomap_begin = op_iomap_begin,
 	.iomap_end = op_iomap_end,
+	.iomap_config = op_iomap_config,
 #endif /* HAVE_FUSE_IOMAP */
 };
 

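For readers checking the arithmetic in fuse4fs_max_size() and
fuse2fs_max_size() above, here is a minimal standalone sketch of the same
computation. It is not the patch code: int64_t stands in for off_t, the
has_huge_file argument stands in for ext2fs_has_feature_huge_file(), and
sketch_max_size() is a hypothetical name. With 4k blocks and huge_file
enabled, the limit works out to (2^32 - 1) * 4096 bytes, i.e. 16 TiB minus
one block; without huge_file, the 32-bit i_blocks sector count caps it at
just under 2 TiB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the s_maxbytes calculation.  blocklog is log2(fs block size);
 * upper_limit is the limit passed in by the kernel (maxbytes).
 */
static int64_t sketch_max_size(unsigned int blocklog, bool has_huge_file,
			       int64_t upper_limit)
{
	int64_t res;

	assert(blocklog >= 9 && blocklog <= 16);

	if (!has_huge_file) {
		/* i_blocks is a 32-bit count of 512-byte sectors... */
		upper_limit = (1LL << 32) - 1;
		/* ...so round down to whole fs blocks, then convert to bytes */
		upper_limit >>= (blocklog - 9);
		upper_limit <<= blocklog;
	}

	/* ee_block is 32 bits wide; stop one block short of 2^32 blocks */
	res = (1LL << 32) - 1;
	res <<= blocklog;

	/* whichever limit is stricter wins */
	if (res > upper_limit)
		res = upper_limit;
	return res;
}
```

For example, sketch_max_size(12, false, INT64_MAX) yields 2199023251456,
which is 2 TiB minus 4096 bytes.
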

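The nanosecond-timestamp test in op_iomap_config() reduces to asking
whether the on-disk inode reaches the i_crtime field. The sketch below uses
a hypothetical struct sketch_inode_large that copies only the start of the
ext2_inode_large layout (the 128-byte base inode, then i_extra_isize,
i_checksum_hi and the three *_extra words), which puts i_crtime at byte
offset 144. Under that assumption, 128-byte inodes get one-second
granularity and 256-byte inodes get nanosecond granularity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the first part of struct ext2_inode_large. */
struct sketch_inode_large {
	uint8_t  base[128];		/* classic struct ext2_inode */
	uint16_t i_extra_isize;
	uint16_t i_checksum_hi;
	uint32_t i_ctime_extra;
	uint32_t i_mtime_extra;
	uint32_t i_atime_extra;
	uint32_t i_crtime;		/* first field after i_atime_extra */
};

#define SKETCH_NSEC_PER_SEC	1000000000L

/* Timestamp granularity, in nanoseconds, for a given on-disk inode size. */
static long sketch_time_gran(unsigned int s_inode_size)
{
	assert(s_inode_size >= 128);

	/* all three i_[acm]time_extra words fit below i_crtime */
	if (s_inode_size >= offsetof(struct sketch_inode_large, i_crtime))
		return 1;			/* nanosecond timestamps */
	return SKETCH_NSEC_PER_SEC;		/* whole seconds only */
}
```
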

* [PATCH 04/17] fuse2fs: register block devices for use with iomap
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
                     ` (2 preceding siblings ...)
  2025-09-16  0:59   ` [PATCH 03/17] fuse2fs: implement iomap configuration Darrick J. Wong
@ 2025-09-16  0:59   ` Darrick J. Wong
  2025-09-16  1:00   ` [PATCH 05/17] fuse2fs: implement directio file reads Darrick J. Wong
                     ` (12 subsequent siblings)
  16 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  0:59 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Register the ext4 block device with the kernel for use with iomap.  For
now this is redundant with using fuseblk mode because the kernel
automatically registers any fuseblk devices, but eventually we'll go
back to regular fuse mode and we'll have to pin the bdev ourselves.
In theory this interface supports strange beasts where the metadata can
exist somewhere else entirely (or be made up by AI) while the file data
persists to real disks.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   44 ++++++++++++++++++++++++++++++++++++++++----
 misc/fuse2fs.c    |   42 ++++++++++++++++++++++++++++++++++++++----
 2 files changed, 78 insertions(+), 8 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 66683a416749d8..958427efef04b7 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -272,6 +272,7 @@ struct fuse4fs {
 #ifdef HAVE_FUSE_IOMAP
 	enum fuse4fs_feature_toggle iomap_want;
 	enum fuse4fs_iomap_state iomap_state;
+	uint32_t iomap_dev;
 #endif
 	unsigned int blockmask;
 	unsigned long offset;
@@ -5712,7 +5713,7 @@ static errcode_t fuse4fs_iomap_begin_extent(struct fuse4fs *ff, uint64_t ino,
 	}
 
 	/* Mapping overlaps startoff, report this. */
-	iomap->dev = FUSE_IOMAP_DEV_NULL;
+	iomap->dev = ff->iomap_dev;
 	iomap->addr = FUSE4FS_FSB_TO_B(ff, extent.e_pblk);
 	iomap->offset = FUSE4FS_FSB_TO_B(ff, extent.e_lblk);
 	iomap->length = FUSE4FS_FSB_TO_B(ff, extent.e_len);
@@ -5745,13 +5746,14 @@ static int fuse4fs_iomap_begin_indirect(struct fuse4fs *ff, uint64_t ino,
 	if (err)
 		return translate_error(fs, ino, err);
 
-	iomap->dev = FUSE_IOMAP_DEV_NULL;
 	iomap->offset = FUSE4FS_FSB_TO_B(ff, startoff);
 	iomap->flags |= FUSE_IOMAP_F_MERGED;
 	if (startblock) {
+		iomap->dev = ff->iomap_dev;
 		iomap->addr = FUSE4FS_FSB_TO_B(ff, startblock);
 		iomap->type = FUSE_IOMAP_TYPE_MAPPED;
 	} else {
+		iomap->dev = FUSE_IOMAP_DEV_NULL;
 		iomap->addr = FUSE_IOMAP_NULL_ADDR;
 		iomap->type = FUSE_IOMAP_TYPE_HOLE;
 	}
@@ -5965,11 +5967,36 @@ static off_t fuse4fs_max_size(struct fuse4fs *ff, off_t upper_limit)
 	return res;
 }
 
+static int fuse4fs_iomap_config_devices(struct fuse4fs *ff)
+{
+	errcode_t err;
+	int fd;
+	int ret;
+
+	err = io_channel_get_fd(ff->fs->io, &fd);
+	if (err)
+		return translate_error(ff->fs, 0, err);
+
+	ret = fuse_lowlevel_iomap_device_add(ff->fuse, fd, 0);
+	if (ret < 0) {
+		dbg_printf(ff, "%s: cannot register iomap dev fd=%d, err=%d\n",
+			   __func__, fd, -ret);
+		return translate_error(ff->fs, 0, -ret);
+	}
+
+	ff->iomap_dev = ret;
+	dbg_printf(ff, "%s: registered iomap dev fd=%d iomap_dev=%u\n",
+		   __func__, fd, ff->iomap_dev);
+
+	return 0;
+}
+
 static void op_iomap_config(fuse_req_t req, uint64_t flags, uint64_t maxbytes)
 {
 	struct fuse_iomap_config cfg = { };
 	struct fuse4fs *ff = fuse4fs_get(req);
 	ext2_filsys fs;
+	int ret = 0;
 
 	FUSE4FS_CHECK_CONTEXT(req);
 
@@ -6004,8 +6031,16 @@ static void op_iomap_config(fuse_req_t req, uint64_t flags, uint64_t maxbytes)
 	cfg.flags |= FUSE_IOMAP_CONFIG_MAXBYTES;
 	cfg.s_maxbytes = fuse4fs_max_size(ff, maxbytes);
 
-	fuse4fs_finish(ff, 0);
-	fuse_reply_iomap_config(req, &cfg);
+	ret = fuse4fs_iomap_config_devices(ff);
+	if (ret)
+		goto out_unlock;
+
+out_unlock:
+	fuse4fs_finish(ff, ret);
+	if (ret)
+		fuse_reply_err(req, -ret);
+	else
+		fuse_reply_iomap_config(req, &cfg);
 }
 #endif /* HAVE_FUSE_IOMAP */
 
@@ -6414,6 +6449,7 @@ int main(int argc, char *argv[])
 #ifdef HAVE_FUSE_IOMAP
 		.iomap_want = FT_DEFAULT,
 		.iomap_state = IOMAP_UNKNOWN,
+		.iomap_dev = FUSE_IOMAP_DEV_NULL,
 #endif
 	};
 	errcode_t err;
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 5b5a0934062b64..adaa25718ddaaf 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -41,6 +41,7 @@
 # define _FILE_OFFSET_BITS 64
 #endif /* _FILE_OFFSET_BITS */
 #include <fuse.h>
+#include <fuse_lowlevel.h>
 #ifdef __SET_FOB_FOR_FUSE
 # undef _FILE_OFFSET_BITS
 #endif /* __SET_FOB_FOR_FUSE */
@@ -265,6 +266,7 @@ struct fuse2fs {
 #ifdef HAVE_FUSE_IOMAP
 	enum fuse2fs_feature_toggle iomap_want;
 	enum fuse2fs_iomap_state iomap_state;
+	uint32_t iomap_dev;
 #endif
 	unsigned int blockmask;
 	unsigned long offset;
@@ -5153,7 +5155,7 @@ static errcode_t fuse2fs_iomap_begin_extent(struct fuse2fs *ff, uint64_t ino,
 	}
 
 	/* Mapping overlaps startoff, report this. */
-	iomap->dev = FUSE_IOMAP_DEV_NULL;
+	iomap->dev = ff->iomap_dev;
 	iomap->addr = FUSE2FS_FSB_TO_B(ff, extent.e_pblk);
 	iomap->offset = FUSE2FS_FSB_TO_B(ff, extent.e_lblk);
 	iomap->length = FUSE2FS_FSB_TO_B(ff, extent.e_len);
@@ -5186,13 +5188,14 @@ static int fuse2fs_iomap_begin_indirect(struct fuse2fs *ff, uint64_t ino,
 	if (err)
 		return translate_error(fs, ino, err);
 
-	iomap->dev = FUSE_IOMAP_DEV_NULL;
 	iomap->offset = FUSE2FS_FSB_TO_B(ff, startoff);
 	iomap->flags |= FUSE_IOMAP_F_MERGED;
 	if (startblock) {
+		iomap->dev = ff->iomap_dev;
 		iomap->addr = FUSE2FS_FSB_TO_B(ff, startblock);
 		iomap->type = FUSE_IOMAP_TYPE_MAPPED;
 	} else {
+		iomap->dev = FUSE_IOMAP_DEV_NULL;
 		iomap->addr = FUSE_IOMAP_NULL_ADDR;
 		iomap->type = FUSE_IOMAP_TYPE_HOLE;
 	}
@@ -5405,11 +5408,36 @@ static off_t fuse2fs_max_size(struct fuse2fs *ff, off_t upper_limit)
 	return res;
 }
 
+static int fuse2fs_iomap_config_devices(struct fuse2fs *ff)
+{
+	errcode_t err;
+	int fd;
+	int ret;
+
+	err = io_channel_get_fd(ff->fs->io, &fd);
+	if (err)
+		return translate_error(ff->fs, 0, err);
+
+	ret = fuse_fs_iomap_device_add(fd, 0);
+	if (ret < 0) {
+		dbg_printf(ff, "%s: cannot register iomap dev fd=%d, err=%d\n",
+			   __func__, fd, -ret);
+		return translate_error(ff->fs, 0, -ret);
+	}
+
+	ff->iomap_dev = ret;
+	dbg_printf(ff, "%s: registered iomap dev fd=%d iomap_dev=%u\n",
+		   __func__, fd, ff->iomap_dev);
+
+	return 0;
+}
+
 static int op_iomap_config(uint64_t flags, off_t maxbytes,
 			   struct fuse_iomap_config *cfg)
 {
 	struct fuse2fs *ff = fuse2fs_get();
 	ext2_filsys fs;
+	int ret = 0;
 
 	FUSE2FS_CHECK_CONTEXT(ff);
 
@@ -5444,8 +5472,13 @@ static int op_iomap_config(uint64_t flags, off_t maxbytes,
 	cfg->flags |= FUSE_IOMAP_CONFIG_MAXBYTES;
 	cfg->s_maxbytes = fuse2fs_max_size(ff, maxbytes);
 
-	fuse2fs_finish(ff, 0);
-	return 0;
+	ret = fuse2fs_iomap_config_devices(ff);
+	if (ret)
+		goto out_unlock;
+
+out_unlock:
+	fuse2fs_finish(ff, ret);
+	return ret;
 }
 #endif /* HAVE_FUSE_IOMAP */
 
@@ -5752,6 +5785,7 @@ int main(int argc, char *argv[])
 #ifdef HAVE_FUSE_IOMAP
 		.iomap_want = FT_DEFAULT,
 		.iomap_state = IOMAP_UNKNOWN,
+		.iomap_dev = FUSE_IOMAP_DEV_NULL,
 #endif
 	};
 	errcode_t err;

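The effect of this patch on reported mappings is that mapped ranges now
name the registered device while holes still report no device. A minimal
sketch of that rule for the indirect-mapped case follows. SKETCH_DEV_NULL,
SKETCH_NULL_ADDR and struct sketch_mapping are hypothetical stand-ins for
FUSE_IOMAP_DEV_NULL, FUSE_IOMAP_NULL_ADDR and struct fuse_file_iomap, whose
real definitions come from the fuse iomap headers. Unlike the patch, the
sketch asserts that a device was registered before a mapped range is
reported.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the fuse iomap constants. */
#define SKETCH_DEV_NULL		UINT32_MAX
#define SKETCH_NULL_ADDR	UINT64_MAX

enum sketch_type { SKETCH_HOLE, SKETCH_MAPPED };

struct sketch_mapping {
	uint32_t dev;
	uint64_t addr;
	enum sketch_type type;
};

/*
 * Describe one indirect-mapped block: a nonzero physical block is reported
 * against the registered device; a hole carries no device at all.
 */
static struct sketch_mapping sketch_map_block(uint32_t iomap_dev,
					      uint64_t startblock,
					      unsigned int blocklog)
{
	struct sketch_mapping m;

	if (startblock) {
		/* this sketch insists the device was registered first */
		assert(iomap_dev != SKETCH_DEV_NULL);
		m.dev = iomap_dev;
		m.addr = startblock << blocklog;
		m.type = SKETCH_MAPPED;
	} else {
		m.dev = SKETCH_DEV_NULL;
		m.addr = SKETCH_NULL_ADDR;
		m.type = SKETCH_HOLE;
	}
	return m;
}
```
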


* [PATCH 05/17] fuse2fs: implement directio file reads
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
                     ` (3 preceding siblings ...)
  2025-09-16  0:59   ` [PATCH 04/17] fuse2fs: register block devices for use with iomap Darrick J. Wong
@ 2025-09-16  1:00   ` Darrick J. Wong
  2025-09-16  1:00   ` [PATCH 06/17] fuse2fs: add extent dump function for debugging Darrick J. Wong
                     ` (11 subsequent siblings)
  16 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:00 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Implement file reads via iomap.  Currently only directio is supported.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   14 +++++++++++++-
 misc/fuse2fs.c    |   14 +++++++++++++-
 2 files changed, 26 insertions(+), 2 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 958427efef04b7..90d94bb7404f90 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -5835,7 +5835,19 @@ static int fuse4fs_iomap_begin_read(struct fuse4fs *ff, ext2_ino_t ino,
 				    uint64_t count, uint32_t opflags,
 				    struct fuse_file_iomap *read)
 {
-	return -ENOSYS;
+	if (!(opflags & FUSE_IOMAP_OP_DIRECT))
+		return -ENOSYS;
+
+	/* fall back to slow path for inline data reads */
+	if (inode->i_flags & EXT4_INLINE_DATA_FL)
+		return -ENOSYS;
+
+	if (inode->i_flags & EXT4_EXTENTS_FL)
+		return fuse4fs_iomap_begin_extent(ff, ino, inode, pos, count,
+						  opflags, read);
+
+	return fuse4fs_iomap_begin_indirect(ff, ino, inode, pos, count,
+					    opflags, read);
 }
 
 static int fuse4fs_iomap_begin_write(struct fuse4fs *ff, ext2_ino_t ino,
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index adaa25718ddaaf..31fd882dac4ef6 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -5277,7 +5277,19 @@ static int fuse2fs_iomap_begin_read(struct fuse2fs *ff, ext2_ino_t ino,
 				    uint64_t count, uint32_t opflags,
 				    struct fuse_file_iomap *read)
 {
-	return -ENOSYS;
+	if (!(opflags & FUSE_IOMAP_OP_DIRECT))
+		return -ENOSYS;
+
+	/* fall back to slow path for inline data reads */
+	if (inode->i_flags & EXT4_INLINE_DATA_FL)
+		return -ENOSYS;
+
+	if (inode->i_flags & EXT4_EXTENTS_FL)
+		return fuse2fs_iomap_begin_extent(ff, ino, inode, pos, count,
+						  opflags, read);
+
+	return fuse2fs_iomap_begin_indirect(ff, ino, inode, pos, count,
+					    opflags, read);
 }
 
 static int fuse2fs_iomap_begin_write(struct fuse2fs *ff, ext2_ino_t ino,

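The order of checks in fuse4fs_iomap_begin_read() and
fuse2fs_iomap_begin_read() can be modelled as a small pure function. In
this sketch the ext4 flag values are the on-disk EXT4_EXTENTS_FL and
EXT4_INLINE_DATA_FL bits, but SKETCH_OP_DIRECT is an assumed placeholder
for FUSE_IOMAP_OP_DIRECT and the enum return codes are hypothetical. A
negative errno plays the role of the -ENOSYS returns that the patch uses to
fall back to the slow path.

```c
#include <errno.h>
#include <stdint.h>

/* ext4 on-disk inode flag bits. */
#define SKETCH_EXTENTS_FL	0x00080000U	/* EXT4_EXTENTS_FL */
#define SKETCH_INLINE_DATA_FL	0x10000000U	/* EXT4_INLINE_DATA_FL */

/* Assumed placeholder; the real flag is defined by libfuse. */
#define SKETCH_OP_DIRECT	(1U << 4)

enum sketch_read_path {
	SKETCH_READ_EXTENT = 1,		/* walk the extent tree */
	SKETCH_READ_INDIRECT = 2,	/* walk the block map */
};

/* Pick a mapping routine, or return -ENOSYS to use the slow path. */
static int sketch_read_path(uint32_t opflags, uint32_t i_flags)
{
	if (!(opflags & SKETCH_OP_DIRECT))
		return -ENOSYS;		/* only directio for now */
	if (i_flags & SKETCH_INLINE_DATA_FL)
		return -ENOSYS;		/* inline data has no block mapping */
	if (i_flags & SKETCH_EXTENTS_FL)
		return SKETCH_READ_EXTENT;
	return SKETCH_READ_INDIRECT;
}
```

Note that the inline-data check comes before the extents check, so an
inode with both flags set still falls back.
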


* [PATCH 06/17] fuse2fs: add extent dump function for debugging
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
                     ` (4 preceding siblings ...)
  2025-09-16  1:00   ` [PATCH 05/17] fuse2fs: implement directio file reads Darrick J. Wong
@ 2025-09-16  1:00   ` Darrick J. Wong
  2025-09-16  1:00   ` [PATCH 07/17] fuse2fs: implement direct write support Darrick J. Wong
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:00 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Add a function to dump an inode's extent map for debugging purposes.
This helped debug a problem with generic/299 failing on 1k fsblock
filesystems:

 --- a/tests/generic/299.out	2025-07-15 14:45:15.030113607 -0700
 +++ b/tests/generic/299.out.bad	2025-07-16 19:33:50.889344998 -0700
 @@ -3,3 +3,4 @@ QA output created by 299
  Run fio with random aio-dio pattern

  Start fallocate/truncate loop
 +fio: io_u error on file /opt/direct_aio.0.0: Input/output error: write offset=2602827776, buflen=131072

(The cause of this was misuse of the libext2fs extent code.)

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   73 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 misc/fuse2fs.c    |   73 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 146 insertions(+)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 90d94bb7404f90..03fc25de7b6fbb 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -735,6 +735,74 @@ static inline int fuse4fs_iomap_enabled(const struct fuse4fs *ff)
 # define fuse4fs_iomap_enabled(...)	(0)
 #endif
 
+static inline void fuse4fs_dump_extents(struct fuse4fs *ff, ext2_ino_t ino,
+					struct ext2_inode_large *inode,
+					const char *why)
+{
+	ext2_filsys fs = ff->fs;
+	unsigned int nr = 0;
+	blk64_t blockcount = 0;
+	struct ext2_inode_large xinode;
+	struct ext2fs_extent extent;
+	ext2_extent_handle_t extents;
+	int op = EXT2_EXTENT_ROOT;
+	errcode_t retval;
+
+	if (!inode) {
+		inode = &xinode;
+
+		retval = fuse4fs_read_inode(fs, ino, inode);
+		if (retval) {
+			com_err(__func__, retval, _("reading ino %u"), ino);
+			return;
+		}
+	}
+
+	if (!(inode->i_flags & EXT4_EXTENTS_FL))
+		return;
+
+	printf("%s: %s ino=%u isize %llu iblocks %llu\n", __func__, why, ino,
+	       EXT2_I_SIZE(inode),
+	       (ext2fs_get_stat_i_blocks(fs, EXT2_INODE(inode)) * 512) /
+	        fs->blocksize);
+	fflush(stdout);
+
+	retval = ext2fs_extent_open(fs, ino, &extents);
+	if (retval) {
+		com_err(__func__, retval, _("opening extents of ino \"%u\""),
+			ino);
+		return;
+	}
+
+	while ((retval = ext2fs_extent_get(extents, op, &extent)) == 0) {
+		op = EXT2_EXTENT_NEXT;
+
+		if (extent.e_flags & EXT2_EXTENT_FLAGS_SECOND_VISIT)
+			continue;
+
+		printf("[%u]: %s ino=%u lblk 0x%llx pblk 0x%llx len 0x%x flags 0x%x\n",
+		       nr++, why, ino, extent.e_lblk, extent.e_pblk,
+		       extent.e_len, extent.e_flags);
+		fflush(stdout);
+		if (extent.e_flags & EXT2_EXTENT_FLAGS_LEAF)
+			blockcount += extent.e_len;
+		else
+			blockcount++;
+	}
+	if (retval == EXT2_ET_EXTENT_NO_NEXT)
+		retval = 0;
+	if (retval) {
+		com_err(__func__, retval, _("getting extents of ino %u"),
+			ino);
+	}
+	if (inode->i_file_acl)
+		blockcount++;
+	printf("%s: %s sum(e_len) %llu\n", __func__, why, blockcount);
+	fflush(stdout);
+
+	ext2fs_extent_free(extents);
+}
+
 static void get_now(struct timespec *now)
 {
 #ifdef CLOCK_REALTIME
@@ -5907,6 +5975,11 @@ static void op_iomap_begin(fuse_req_t req, fuse_ino_t fino, uint64_t dontcare,
 		   read.type,
 		   read.flags);
 
+	/* Not filling even the first byte will make the kernel unhappy. */
+	if (ff->debug && (read.offset > pos ||
+			  read.offset + read.length <= pos))
+		fuse4fs_dump_extents(ff, ino, &inode, "BAD DATA");
+
 out_unlock:
 	fuse4fs_finish(ff, ret);
 	if (ret)
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 31fd882dac4ef6..76540f4fc3c694 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -572,6 +572,74 @@ static inline int fuse2fs_iomap_enabled(const struct fuse2fs *ff)
 # define fuse2fs_iomap_enabled(...)	(0)
 #endif
 
+static inline void fuse2fs_dump_extents(struct fuse2fs *ff, ext2_ino_t ino,
+					struct ext2_inode_large *inode,
+					const char *why)
+{
+	ext2_filsys fs = ff->fs;
+	unsigned int nr = 0;
+	blk64_t blockcount = 0;
+	struct ext2_inode_large xinode;
+	struct ext2fs_extent extent;
+	ext2_extent_handle_t extents;
+	int op = EXT2_EXTENT_ROOT;
+	errcode_t retval;
+
+	if (!inode) {
+		inode = &xinode;
+
+		retval = fuse2fs_read_inode(fs, ino, inode);
+		if (retval) {
+			com_err(__func__, retval, _("reading ino %u"), ino);
+			return;
+		}
+	}
+
+	if (!(inode->i_flags & EXT4_EXTENTS_FL))
+		return;
+
+	printf("%s: %s ino=%u isize %llu iblocks %llu\n", __func__, why, ino,
+	       EXT2_I_SIZE(inode),
+	       (ext2fs_get_stat_i_blocks(fs, EXT2_INODE(inode)) * 512) /
+	        fs->blocksize);
+	fflush(stdout);
+
+	retval = ext2fs_extent_open(fs, ino, &extents);
+	if (retval) {
+		com_err(__func__, retval, _("opening extents of ino \"%u\""),
+			ino);
+		return;
+	}
+
+	while ((retval = ext2fs_extent_get(extents, op, &extent)) == 0) {
+		op = EXT2_EXTENT_NEXT;
+
+		if (extent.e_flags & EXT2_EXTENT_FLAGS_SECOND_VISIT)
+			continue;
+
+		printf("[%u]: %s ino=%u lblk 0x%llx pblk 0x%llx len 0x%x flags 0x%x\n",
+		       nr++, why, ino, extent.e_lblk, extent.e_pblk,
+		       extent.e_len, extent.e_flags);
+		fflush(stdout);
+		if (extent.e_flags & EXT2_EXTENT_FLAGS_LEAF)
+			blockcount += extent.e_len;
+		else
+			blockcount++;
+	}
+	if (retval == EXT2_ET_EXTENT_NO_NEXT)
+		retval = 0;
+	if (retval) {
+		com_err(__func__, retval, _("getting extents of ino %u"),
+			ino);
+	}
+	if (inode->i_file_acl)
+		blockcount++;
+	printf("%s: %s sum(e_len) %llu\n", __func__, why, blockcount);
+	fflush(stdout);
+
+	ext2fs_extent_free(extents);
+}
+
 static void get_now(struct timespec *now)
 {
 #ifdef CLOCK_REALTIME
@@ -5351,6 +5419,11 @@ static int op_iomap_begin(const char *path, uint64_t nodeid, uint64_t attr_ino,
 		   (unsigned long long)read->length,
 		   read->type);
 
+	/* Not filling even the first byte will make the kernel unhappy. */
+	if (ff->debug && (read->offset > pos ||
+			  read->offset + read->length <= pos))
+		fuse2fs_dump_extents(ff, attr_ino, &inode, "BAD DATA");
+
 out_unlock:
 	fuse2fs_finish(ff, ret);
 	return ret;

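The bookkeeping in fuse4fs_dump_extents() and fuse2fs_dump_extents()
compares a block sum against i_blocks. A minimal sketch of that sum,
assuming a hypothetical flattened array of extent records instead of a
live ext2_extent_handle_t: each leaf record contributes e_len data blocks,
each index (non-leaf) record contributes the one tree block it points to,
and an external xattr block adds one more. For an ordinary (non-bigalloc)
file one would expect the total to match i_blocks converted from 512-byte
units to filesystem blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of the records an extent walk returns. */
struct sketch_extent {
	uint64_t e_lblk;
	uint64_t e_pblk;
	uint32_t e_len;
	bool	 leaf;		/* EXT2_EXTENT_FLAGS_LEAF */
};

/*
 * Example: the inode root holds one index record pointing at tree block
 * 1000, which holds two leaf extents (8 blocks at lblk 0, 4 at lblk 8).
 */
static const struct sketch_extent sketch_example_tree[] = {
	{ .e_lblk = 0, .e_pblk = 1000, .e_len = 12, .leaf = false },
	{ .e_lblk = 0, .e_pblk = 2000, .e_len = 8,  .leaf = true  },
	{ .e_lblk = 8, .e_pblk = 3000, .e_len = 4,  .leaf = true  },
};

/* Leaf records count e_len blocks; index records count one tree block. */
static uint64_t sketch_sum_blocks(const struct sketch_extent *ext, size_t nr,
				  bool has_xattr_block)
{
	uint64_t blockcount = 0;

	for (size_t i = 0; i < nr; i++)
		blockcount += ext[i].leaf ? ext[i].e_len : 1;
	if (has_xattr_block)		/* i_file_acl != 0 */
		blockcount++;
	return blockcount;
}

/* i_blocks is kept in 512-byte units; convert to filesystem blocks. */
static uint64_t sketch_iblocks_to_fsb(uint64_t i_blocks, unsigned int blocksize)
{
	assert(blocksize >= 512 && blocksize % 512 == 0);
	return i_blocks * 512 / blocksize;
}
```
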


* [PATCH 07/17] fuse2fs: implement direct write support
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
                     ` (5 preceding siblings ...)
  2025-09-16  1:00   ` [PATCH 06/17] fuse2fs: add extent dump function for debugging Darrick J. Wong
@ 2025-09-16  1:00   ` Darrick J. Wong
  2025-09-16  1:00   ` [PATCH 08/17] fuse2fs: turn on iomap for pagecache IO Darrick J. Wong
                     ` (9 subsequent siblings)
  16 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:00 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Wire up an iomap_begin method that can allocate into holes so that we
can do directio writes.
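
Before allocating, fuse4fs_iomap_begin_write() rejects writes that start at
or beyond the largest file offset and trims the rest. A minimal sketch of
that clamp follows, with int64_t standing in for off_t and a hypothetical
sketch_clamp_write() name; comparing count against max_size - pos (rather
than pos against max_size - count) avoids unsigned wraparound when count is
larger than max_size. The second helper reproduces the block count summed
in fuse4fs_max_file_size() for indirect-mapped files: 12 direct pointers
plus single, double and triple indirect trees of blocksize / 4 entries each.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Return the possibly shortened byte count, or -EFBIG if pos is too far. */
static int64_t sketch_clamp_write(int64_t pos, uint64_t count, int64_t max_size)
{
	assert(pos >= 0 && max_size >= 0);

	if (pos >= max_size)
		return -EFBIG;
	/* max_size - pos is positive here, so the cast cannot wrap */
	if (count > (uint64_t)(max_size - pos))
		count = (uint64_t)(max_size - pos);
	return (int64_t)count;
}

/* Blocks addressable through a block map: 12 direct + 3 indirect levels. */
static uint64_t sketch_indirect_blocks(unsigned int blocksize)
{
	uint64_t a = blocksize / 4;	/* 32-bit block pointers per block */

	return 12 + a + a * a + a * a * a;
}
```

With 4k blocks, sketch_indirect_blocks() gives 1074791436 blocks.
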

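fuse4fs_can_merge_mappings() in the diff below decides when two adjacent
mappings can be folded into one extent. Here is a standalone sketch of the
same predicate, assuming a simplified struct with a bool in place of
EXT2_EXTENT_FLAGS_UNINIT; the length limits are the ext4 on-disk values
EXT_INIT_MAX_LEN (32768) and EXT_UNINIT_MAX_LEN (32767), since unwritten
extents give up one length value to mark their state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extent length limits from the ext4 on-disk format. */
#define SKETCH_INIT_MAX_LEN	32768U	/* EXT_INIT_MAX_LEN */
#define SKETCH_UNINIT_MAX_LEN	32767U	/* EXT_UNINIT_MAX_LEN */

struct sketch_extent {
	uint64_t e_lblk;
	uint64_t e_pblk;
	uint32_t e_len;
	bool	 uninit;	/* EXT2_EXTENT_FLAGS_UNINIT */
};

/*
 * Mergeable only if logically and physically contiguous, in the same
 * written/unwritten state, and short enough to fit in one extent.
 */
static bool sketch_can_merge(const struct sketch_extent *left,
			     const struct sketch_extent *right)
{
	uint64_t max_len = left->uninit ? SKETCH_UNINIT_MAX_LEN :
					  SKETCH_INIT_MAX_LEN;

	return left->e_lblk + left->e_len == right->e_lblk &&
	       left->e_pblk + left->e_len == right->e_pblk &&
	       left->uninit == right->uninit &&
	       (uint64_t)left->e_len + right->e_len <= max_len;
}
```
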
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |  473 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 misc/fuse2fs.c    |  470 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 937 insertions(+), 6 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 03fc25de7b6fbb..b7184e3416860d 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -5918,12 +5918,106 @@ static int fuse4fs_iomap_begin_read(struct fuse4fs *ff, ext2_ino_t ino,
 					    opflags, read);
 }
 
+static int fuse4fs_iomap_write_allocate(struct fuse4fs *ff, ext2_ino_t ino,
+					struct ext2_inode_large *inode,
+					off_t pos, uint64_t count,
+					uint32_t opflags,
+					struct fuse_file_iomap *read,
+					bool *dirty)
+{
+	ext2_filsys fs = ff->fs;
+	blk64_t startoff = FUSE4FS_B_TO_FSBT(ff, pos);
+	blk64_t stopoff = FUSE4FS_B_TO_FSB(ff, pos + count);
+	blk64_t old_iblocks;
+	errcode_t err;
+	int ret;
+
+	dbg_printf(ff, "%s: ino=%u startoff 0x%llx blockcount 0x%llx\n",
+		   __func__, ino, (unsigned long long)startoff,
+		   (unsigned long long)(stopoff - startoff));
+
+	if (!fuse4fs_can_allocate(ff, stopoff - startoff))
+		return -ENOSPC;
+
+	old_iblocks = ext2fs_get_stat_i_blocks(fs, EXT2_INODE(inode));
+	err = ext2fs_fallocate(fs, EXT2_FALLOCATE_FORCE_UNINIT, ino,
+			       EXT2_INODE(inode), ~0ULL, startoff,
+			       stopoff - startoff);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	/*
+	 * New allocations for file data blocks on indirect mapped files are
+	 * zeroed through the IO manager so we have to flush it to disk.
+	 */
+	if (!(inode->i_flags & EXT4_EXTENTS_FL) &&
+	    old_iblocks != ext2fs_get_stat_i_blocks(fs, EXT2_INODE(inode))) {
+		err = io_channel_flush(fs->io);
+		if (err)
+			return translate_error(fs, ino, err);
+	}
+
+	/* pick up the newly allocated mapping */
+	ret = fuse4fs_iomap_begin_read(ff, ino, inode, pos, count, opflags,
+				       read);
+	if (ret)
+		return ret;
+
+	read->flags |= FUSE_IOMAP_F_DIRTY;
+	*dirty = true;
+	return 0;
+}
+
+static off_t fuse4fs_max_file_size(const struct fuse4fs *ff,
+				   const struct ext2_inode_large *inode)
+{
+	ext2_filsys fs = ff->fs;
+	blk64_t addr_per_block, max_map_block;
+
+	if (inode->i_flags & EXT4_EXTENTS_FL) {
+		max_map_block = (1ULL << 32) - 1;
+	} else {
+		addr_per_block = fs->blocksize >> 2;
+		max_map_block = addr_per_block;
+		max_map_block += addr_per_block * addr_per_block;
+		max_map_block += addr_per_block * addr_per_block * addr_per_block;
+		max_map_block += 12;
+	}
+
+	return FUSE4FS_FSB_TO_B(ff, max_map_block) + (fs->blocksize - 1);
+}
+
 static int fuse4fs_iomap_begin_write(struct fuse4fs *ff, ext2_ino_t ino,
 				     struct ext2_inode_large *inode, off_t pos,
 				     uint64_t count, uint32_t opflags,
-				     struct fuse_file_iomap *read)
+				     struct fuse_file_iomap *read,
+				     bool *dirty)
 {
-	return -ENOSYS;
+	off_t max_size = fuse4fs_max_file_size(ff, inode);
+	int ret;
+
+	if (!(opflags & FUSE_IOMAP_OP_DIRECT))
+		return -ENOSYS;
+
+	if (pos >= max_size)
+		return -EFBIG;
+
+	if (count > (uint64_t)(max_size - pos))
+		count = max_size - pos;
+
+	ret = fuse4fs_iomap_begin_read(ff, ino, inode, pos, count, opflags,
+				       read);
+	if (ret)
+		return ret;
+
+	if (fuse_iomap_need_write_allocate(opflags, read)) {
+		ret = fuse4fs_iomap_write_allocate(ff, ino, inode, pos, count,
+						   opflags, read, dirty);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
 }
 
 static void op_iomap_begin(fuse_req_t req, fuse_ino_t fino, uint64_t dontcare,
@@ -5935,6 +6029,7 @@ static void op_iomap_begin(fuse_req_t req, fuse_ino_t fino, uint64_t dontcare,
 	ext2_filsys fs;
 	ext2_ino_t ino;
 	errcode_t err;
+	bool dirty = false;
 	int ret = 0;
 
 	FUSE4FS_CHECK_CONTEXT(req);
@@ -5958,7 +6053,7 @@ static void op_iomap_begin(fuse_req_t req, fuse_ino_t fino, uint64_t dontcare,
 						 opflags, &read);
 	else if (fuse_iomap_is_write(opflags))
 		ret = fuse4fs_iomap_begin_write(ff, ino, &inode, pos, count,
-						opflags, &read);
+						opflags, &read, &dirty);
 	else
 		ret = fuse4fs_iomap_begin_read(ff, ino, &inode, pos, count,
 					       opflags, &read);
@@ -5980,6 +6075,14 @@ static void op_iomap_begin(fuse_req_t req, fuse_ino_t fino, uint64_t dontcare,
 			  read.offset + read.length <= pos))
 		fuse4fs_dump_extents(ff, ino, &inode, "BAD DATA");
 
+	if (dirty) {
+		err = fuse4fs_write_inode(fs, ino, &inode);
+		if (err) {
+			ret = translate_error(fs, ino, err);
+			goto out_unlock;
+		}
+	}
+
 out_unlock:
 	fuse4fs_finish(ff, ret);
 	if (ret)
@@ -6127,6 +6230,369 @@ static void op_iomap_config(fuse_req_t req, uint64_t flags, uint64_t maxbytes)
 	else
 		fuse_reply_iomap_config(req, &cfg);
 }
+
+static inline bool fuse4fs_can_merge_mappings(const struct ext2fs_extent *left,
+					      const struct ext2fs_extent *right)
+{
+	uint64_t max_len = (left->e_flags & EXT2_EXTENT_FLAGS_UNINIT) ?
+				EXT_UNINIT_MAX_LEN : EXT_INIT_MAX_LEN;
+
+	return left->e_lblk + left->e_len == right->e_lblk &&
+	       left->e_pblk + left->e_len == right->e_pblk &&
+	       (left->e_flags & EXT2_EXTENT_FLAGS_UNINIT) ==
+	        (right->e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+	       (uint64_t)left->e_len + right->e_len <= max_len;
+}
+
+static int fuse4fs_try_merge_mappings(struct fuse4fs *ff, ext2_ino_t ino,
+				      ext2_extent_handle_t handle,
+				      blk64_t startoff)
+{
+	ext2_filsys fs = ff->fs;
+	struct ext2fs_extent left, right;
+	errcode_t err;
+
+	/* Look up the mapping before startoff */
+	err = fuse4fs_get_mapping_at(ff, handle, startoff - 1, &left);
+	if (err == EXT2_ET_EXTENT_NOT_FOUND)
+		return 0;
+	if (err)
+		return translate_error(fs, ino, err);
+
+	/* Look up the mapping at startoff */
+	err = fuse4fs_get_mapping_at(ff, handle, startoff, &right);
+	if (err == EXT2_ET_EXTENT_NOT_FOUND)
+		return 0;
+	if (err)
+		return translate_error(fs, ino, err);
+
+	/* Can we combine them? */
+	if (!fuse4fs_can_merge_mappings(&left, &right))
+		return 0;
+
+	/*
+	 * Delete the mapping at startoff because libext2fs cannot handle
+	 * overlapping mappings.
+	 */
+	err = ext2fs_extent_delete(handle, 0);
+	DUMP_EXTENT(ff, "remover", startoff, err, &right);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	err = ext2fs_extent_fix_parents(handle);
+	DUMP_EXTENT(ff, "fixremover", startoff, err, &right);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	/* Move back and lengthen the mapping before startoff */
+	err = ext2fs_extent_goto(handle, left.e_lblk);
+	DUMP_EXTENT(ff, "movel", startoff - 1, err, &left);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	left.e_len += right.e_len;
+	err = ext2fs_extent_replace(handle, 0, &left);
+	DUMP_EXTENT(ff, "replacel", startoff - 1, err, &left);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	err = ext2fs_extent_fix_parents(handle);
+	DUMP_EXTENT(ff, "fixreplacel", startoff - 1, err, &left);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	return 0;
+}
+
+static int fuse4fs_convert_unwritten_mapping(struct fuse4fs *ff,
+					     ext2_ino_t ino,
+					     struct ext2_inode_large *inode,
+					     ext2_extent_handle_t handle,
+					     blk64_t *cursor, blk64_t stopoff)
+{
+	ext2_filsys fs = ff->fs;
+	struct ext2fs_extent extent;
+	blk64_t startoff = *cursor;
+	errcode_t err;
+
+	/*
+	 * Find the mapping at startoff.  Note that we can find holes because
+	 * the mapping data can change due to racing writes.
+	 */
+	err = fuse4fs_get_mapping_at(ff, handle, startoff, &extent);
+	if (err == EXT2_ET_EXTENT_NOT_FOUND) {
+		/*
+		 * If we didn't find any mappings at all then the file is
+		 * completely sparse.  There's nothing to convert.
+		 */
+		*cursor = stopoff;
+		return 0;
+	}
+	if (err)
+		return translate_error(fs, ino, err);
+
+	/*
+	 * The mapping is completely to the left of the range that we want.
+	 * Let's see what's in the next extent, if there is one.
+	 */
+	if (startoff >= extent.e_lblk + extent.e_len) {
+		/*
+		 * Mapping ends to the left of the current position.  Try to
+		 * find the next mapping.  If there is no next mapping, then
+		 * we're done.
+		 */
+		err = fuse4fs_get_next_mapping(ff, handle, startoff, &extent);
+		if (err == EXT2_ET_EXTENT_NOT_FOUND) {
+			*cursor = stopoff;
+			return 0;
+		}
+		if (err)
+			return translate_error(fs, ino, err);
+	}
+
+	/*
+	 * The mapping is completely to the right of the range that we want,
+	 * so we're done.
+	 */
+	if (extent.e_lblk >= stopoff) {
+		*cursor = stopoff;
+		return 0;
+	}
+
+	/*
+	 * At this point, we have a mapping that overlaps [startoff, stopoff).
+	 * If the mapping is already written, move on to the next one.
+	 */
+	if (!(extent.e_flags & EXT2_EXTENT_FLAGS_UNINIT))
+		goto next;
+
+	if (startoff > extent.e_lblk) {
+		struct ext2fs_extent newex = extent;
+
+		/*
+		 * Unwritten mapping starts before startoff.  Shorten
+		 * the previous mapping...
+		 */
+		newex.e_len = startoff - extent.e_lblk;
+		err = ext2fs_extent_replace(handle, 0, &newex);
+		DUMP_EXTENT(ff, "shortenp", startoff, err, &newex);
+		if (err)
+			return translate_error(fs, ino, err);
+
+		err = ext2fs_extent_fix_parents(handle);
+		DUMP_EXTENT(ff, "fixshortenp", startoff, err, &newex);
+		if (err)
+			return translate_error(fs, ino, err);
+
+		/* ...and create new written mapping at startoff. */
+		extent.e_len -= newex.e_len;
+		extent.e_lblk += newex.e_len;
+		extent.e_pblk += newex.e_len;
+		extent.e_flags = newex.e_flags & ~EXT2_EXTENT_FLAGS_UNINIT;
+
+		err = ext2fs_extent_insert(handle,
+					   EXT2_EXTENT_INSERT_AFTER,
+					   &extent);
+		DUMP_EXTENT(ff, "insertx", startoff, err, &extent);
+		if (err)
+			return translate_error(fs, ino, err);
+
+		err = ext2fs_extent_fix_parents(handle);
+		DUMP_EXTENT(ff, "fixinsertx", startoff, err, &extent);
+		if (err)
+			return translate_error(fs, ino, err);
+	}
+
+	if (extent.e_lblk + extent.e_len > stopoff) {
+		struct ext2fs_extent newex = extent;
+
+		/*
+		 * Unwritten mapping ends after stopoff.  Shorten the current
+		 * mapping...
+		 */
+		extent.e_len = stopoff - extent.e_lblk;
+		extent.e_flags &= ~EXT2_EXTENT_FLAGS_UNINIT;
+
+		err = ext2fs_extent_replace(handle, 0, &extent);
+		DUMP_EXTENT(ff, "shortenn", startoff, err, &extent);
+		if (err)
+			return translate_error(fs, ino, err);
+
+		err = ext2fs_extent_fix_parents(handle);
+		DUMP_EXTENT(ff, "fixshortenn", startoff, err, &extent);
+		if (err)
+			return translate_error(fs, ino, err);
+
+		/* ...and create a new unwritten mapping at stopoff. */
+		newex.e_pblk += extent.e_len;
+		newex.e_lblk += extent.e_len;
+		newex.e_len -= extent.e_len;
+		newex.e_flags |= EXT2_EXTENT_FLAGS_UNINIT;
+
+		err = ext2fs_extent_insert(handle,
+					   EXT2_EXTENT_INSERT_AFTER,
+					   &newex);
+		DUMP_EXTENT(ff, "insertn", startoff, err, &newex);
+		if (err)
+			return translate_error(fs, ino, err);
+
+		err = ext2fs_extent_fix_parents(handle);
+		DUMP_EXTENT(ff, "fixinsertn", startoff, err, &newex);
+		if (err)
+			return translate_error(fs, ino, err);
+	}
+
+	/* Still unwritten?  Update the state. */
+	if (extent.e_flags & EXT2_EXTENT_FLAGS_UNINIT) {
+		extent.e_flags &= ~EXT2_EXTENT_FLAGS_UNINIT;
+
+		err = ext2fs_extent_replace(handle, 0, &extent);
+		DUMP_EXTENT(ff, "replacex", startoff, err, &extent);
+		if (err)
+			return translate_error(fs, ino, err);
+
+		err = ext2fs_extent_fix_parents(handle);
+		DUMP_EXTENT(ff, "fixreplacex", startoff, err, &extent);
+		if (err)
+			return translate_error(fs, ino, err);
+	}
+
+next:
+	/* Try to merge with the previous extent */
+	if (startoff > 0) {
+		err = fuse4fs_try_merge_mappings(ff, ino, handle, startoff);
+		if (err)
+			return err;
+	}
+
+	*cursor = extent.e_lblk + extent.e_len;
+	return 0;
+}
+
+static int fuse4fs_convert_unwritten_mappings(struct fuse4fs *ff,
+					      ext2_ino_t ino,
+					      struct ext2_inode_large *inode,
+					      off_t pos, size_t written)
+{
+	ext2_extent_handle_t handle;
+	ext2_filsys fs = ff->fs;
+	blk64_t startoff = FUSE4FS_B_TO_FSBT(ff, pos);
+	const blk64_t stopoff = FUSE4FS_B_TO_FSB(ff, pos + written);
+	errcode_t err;
+	int ret;
+
+	err = ext2fs_extent_open2(fs, ino, EXT2_INODE(inode), &handle);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	/* Walk every mapping in the range, converting them. */
+	while (startoff < stopoff) {
+		blk64_t old_startoff = startoff;
+
+		ret = fuse4fs_convert_unwritten_mapping(ff, ino, inode, handle,
+							&startoff, stopoff);
+		if (ret)
+			goto out_handle;
+		if (startoff <= old_startoff) {
+			/* Do not go backwards. */
+			ret = translate_error(fs, ino, EXT2_ET_INODE_CORRUPTED);
+			goto out_handle;
+		}
+	}
+
+	/* Try to merge the right edge */
+	ret = fuse4fs_try_merge_mappings(ff, ino, handle, stopoff);
+out_handle:
+	ext2fs_extent_free(handle);
+	return ret;
+}
+
+static void op_iomap_ioend(fuse_req_t req, fuse_ino_t fino, uint64_t dontcare,
+			   off_t pos, size_t written, uint32_t ioendflags,
+			   int error, uint64_t new_addr)
+{
+	struct fuse4fs *ff = fuse4fs_get(req);
+	struct ext2_inode_large inode;
+	ext2_filsys fs;
+	ext2_ino_t ino;
+	errcode_t err;
+	bool dirty = false;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &ino, fino);
+
+	dbg_printf(ff,
+ "%s: ino=%u pos=0x%llx written=0x%zx ioendflags=0x%x error=%d new_addr=0x%llx\n",
+		   __func__, ino,
+		   (unsigned long long)pos,
+		   written,
+		   ioendflags,
+		   error,
+		   (unsigned long long)new_addr);
+
+	if (error) {
+		fuse_reply_err(req, -error);
+		return;
+	}
+
+	fs = fuse4fs_start(ff);
+
+	/* should never see these ioend types */
+	if (ioendflags & FUSE_IOMAP_IOEND_SHARED) {
+		ret = translate_error(fs, ino, EXT2_ET_FILESYSTEM_CORRUPTED);
+		goto out_unlock;
+	}
+
+	err = fuse4fs_read_inode(fs, ino, &inode);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out_unlock;
+	}
+
+	if (ioendflags & FUSE_IOMAP_IOEND_UNWRITTEN) {
+		/* unwritten extents are only supported on extents files */
+		if (!(inode.i_flags & EXT4_EXTENTS_FL)) {
+			ret = translate_error(fs, ino,
+					      EXT2_ET_FILESYSTEM_CORRUPTED);
+			goto out_unlock;
+		}
+
+		ret = fuse4fs_convert_unwritten_mappings(ff, ino, &inode,
+							 pos, written);
+		if (ret)
+			goto out_unlock;
+
+		dirty = true;
+	}
+
+	if (ioendflags & FUSE_IOMAP_IOEND_APPEND) {
+		ext2_off64_t isize = EXT2_I_SIZE(&inode);
+
+		if (pos + written > isize) {
+			err = ext2fs_inode_size_set(fs, EXT2_INODE(&inode),
+						    pos + written);
+			if (err) {
+				ret = translate_error(fs, ino, err);
+				goto out_unlock;
+			}
+
+			dirty = true;
+		}
+	}
+
+	if (dirty) {
+		err = fuse4fs_write_inode(fs, ino, &inode);
+		if (err) {
+			ret = translate_error(fs, ino, err);
+			goto out_unlock;
+		}
+	}
+
+out_unlock:
+	fuse4fs_finish(ff, ret);
+	fuse_reply_err(req, -ret);
+}
 #endif /* HAVE_FUSE_IOMAP */
 
 static struct fuse_lowlevel_ops fs_ops = {
@@ -6176,6 +6642,7 @@ static struct fuse_lowlevel_ops fs_ops = {
 	.iomap_begin = op_iomap_begin,
 	.iomap_end = op_iomap_end,
 	.iomap_config = op_iomap_config,
+	.iomap_ioend = op_iomap_ioend,
 #endif /* HAVE_FUSE_IOMAP */
 };
 
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 76540f4fc3c694..9bcf2c81b7e732 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -5360,12 +5360,103 @@ static int fuse2fs_iomap_begin_read(struct fuse2fs *ff, ext2_ino_t ino,
 					    opflags, read);
 }
 
+static int fuse2fs_iomap_write_allocate(struct fuse2fs *ff, ext2_ino_t ino,
+				     struct ext2_inode_large *inode, off_t pos,
+				     uint64_t count, uint32_t opflags,
+				     struct fuse_file_iomap *read, bool *dirty)
+{
+	ext2_filsys fs = ff->fs;
+	blk64_t startoff = FUSE2FS_B_TO_FSBT(ff, pos);
+	blk64_t stopoff = FUSE2FS_B_TO_FSB(ff, pos + count);
+	blk64_t old_iblocks;
+	errcode_t err;
+	int ret;
+
+	dbg_printf(ff, "%s: write_alloc ino=%u startoff 0x%llx blockcount 0x%llx\n",
+		   __func__, ino, startoff, stopoff - startoff);
+
+	if (!fs_can_allocate(ff, stopoff - startoff))
+		return -ENOSPC;
+
+	old_iblocks = ext2fs_get_stat_i_blocks(fs, EXT2_INODE(inode));
+	err = ext2fs_fallocate(fs, EXT2_FALLOCATE_FORCE_UNINIT, ino,
+			       EXT2_INODE(inode), ~0ULL, startoff,
+			       stopoff - startoff);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	/*
+	 * New allocations of file data blocks on indirect-mapped files are
+	 * zeroed through the IO manager, so we have to flush it to disk.
+	 */
+	if (!(inode->i_flags & EXT4_EXTENTS_FL) &&
+	    old_iblocks != ext2fs_get_stat_i_blocks(fs, EXT2_INODE(inode))) {
+		err = io_channel_flush(fs->io);
+		if (err)
+			return translate_error(fs, ino, err);
+	}
+
+	/* pick up the newly allocated mapping */
+	ret = fuse2fs_iomap_begin_read(ff, ino, inode, pos, count, opflags,
+				       read);
+	if (ret)
+		return ret;
+
+	read->flags |= FUSE_IOMAP_F_DIRTY;
+	*dirty = true;
+	return 0;
+}
+
+static off_t fuse2fs_max_file_size(const struct fuse2fs *ff,
+				   const struct ext2_inode_large *inode)
+{
+	ext2_filsys fs = ff->fs;
+	blk64_t addr_per_block, max_map_block;
+
+	if (inode->i_flags & EXT4_EXTENTS_FL) {
+		max_map_block = (1ULL << 32) - 1;
+	} else {
+		addr_per_block = fs->blocksize >> 2;
+		max_map_block = addr_per_block;
+		max_map_block += addr_per_block * addr_per_block;
+		max_map_block += addr_per_block * addr_per_block * addr_per_block;
+		max_map_block += 12;
+	}
+
+	return FUSE2FS_FSB_TO_B(ff, max_map_block) + (fs->blocksize - 1);
+}
+
 static int fuse2fs_iomap_begin_write(struct fuse2fs *ff, ext2_ino_t ino,
 				     struct ext2_inode_large *inode, off_t pos,
 				     uint64_t count, uint32_t opflags,
-				     struct fuse_file_iomap *read)
+				     struct fuse_file_iomap *read,
+				     bool *dirty)
 {
-	return -ENOSYS;
+	off_t max_size = fuse2fs_max_file_size(ff, inode);
+	int ret;
+
+	if (!(opflags & FUSE_IOMAP_OP_DIRECT))
+		return -ENOSYS;
+
+	if (pos >= max_size)
+		return -EFBIG;
+
+	if (pos >= max_size - count)
+		count = max_size - pos;
+
+	ret = fuse2fs_iomap_begin_read(ff, ino, inode, pos, count, opflags,
+				       read);
+	if (ret)
+		return ret;
+
+	if (fuse_iomap_need_write_allocate(opflags, read)) {
+		ret = fuse2fs_iomap_write_allocate(ff, ino, inode, pos, count,
+						   opflags, read, dirty);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
 }
 
 static int op_iomap_begin(const char *path, uint64_t nodeid, uint64_t attr_ino,
@@ -5377,6 +5468,7 @@ static int op_iomap_begin(const char *path, uint64_t nodeid, uint64_t attr_ino,
 	struct ext2_inode_large inode;
 	ext2_filsys fs;
 	errcode_t err;
+	bool dirty = false;
 	int ret = 0;
 
 	FUSE2FS_CHECK_CONTEXT(ff);
@@ -5402,7 +5494,7 @@ static int op_iomap_begin(const char *path, uint64_t nodeid, uint64_t attr_ino,
 						 count, opflags, read);
 	else if (fuse_iomap_is_write(opflags))
 		ret = fuse2fs_iomap_begin_write(ff, attr_ino, &inode, pos,
-						count, opflags, read);
+						count, opflags, read, &dirty);
 	else
 		ret = fuse2fs_iomap_begin_read(ff, attr_ino, &inode, pos,
 					       count, opflags, read);
@@ -5424,6 +5516,14 @@ static int op_iomap_begin(const char *path, uint64_t nodeid, uint64_t attr_ino,
 			  read->offset + read->length <= pos))
 		fuse2fs_dump_extents(ff, attr_ino, &inode, "BAD DATA");
 
+	if (dirty) {
+		err = fuse2fs_write_inode(fs, attr_ino, &inode);
+		if (err) {
+			ret = translate_error(fs, attr_ino, err);
+			goto out_unlock;
+		}
+	}
+
 out_unlock:
 	fuse2fs_finish(ff, ret);
 	return ret;
@@ -5561,6 +5661,369 @@ static int op_iomap_config(uint64_t flags, off_t maxbytes,
 	if (ret)
 		goto out_unlock;
 
+out_unlock:
+	fuse2fs_finish(ff, ret);
+	return ret;
+}
+
+static inline bool fuse2fs_can_merge_mappings(const struct ext2fs_extent *left,
+					      const struct ext2fs_extent *right)
+{
+	uint64_t max_len = (left->e_flags & EXT2_EXTENT_FLAGS_UNINIT) ?
+				EXT_UNINIT_MAX_LEN : EXT_INIT_MAX_LEN;
+
+	return left->e_lblk + left->e_len == right->e_lblk &&
+	       left->e_pblk + left->e_len == right->e_pblk &&
+	       (left->e_flags & EXT2_EXTENT_FLAGS_UNINIT) ==
+	        (right->e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+	       (uint64_t)left->e_len + right->e_len <= max_len;
+}
+
+static int fuse2fs_try_merge_mappings(struct fuse2fs *ff, ext2_ino_t ino,
+				      ext2_extent_handle_t handle,
+				      blk64_t startoff)
+{
+	ext2_filsys fs = ff->fs;
+	struct ext2fs_extent left, right;
+	errcode_t err;
+
+	/* Look up the mapping before startoff */
+	err = fuse2fs_get_mapping_at(ff, handle, startoff - 1, &left);
+	if (err == EXT2_ET_EXTENT_NOT_FOUND)
+		return 0;
+	if (err)
+		return translate_error(fs, ino, err);
+
+	/* Look up the mapping at startoff */
+	err = fuse2fs_get_mapping_at(ff, handle, startoff, &right);
+	if (err == EXT2_ET_EXTENT_NOT_FOUND)
+		return 0;
+	if (err)
+		return translate_error(fs, ino, err);
+
+	/* Can we combine them? */
+	if (!fuse2fs_can_merge_mappings(&left, &right))
+		return 0;
+
+	/*
+	 * Delete the mapping after startoff because libext2fs cannot handle
+	 * overlapping mappings.
+	 */
+	err = ext2fs_extent_delete(handle, 0);
+	DUMP_EXTENT(ff, "remover", startoff, err, &right);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	err = ext2fs_extent_fix_parents(handle);
+	DUMP_EXTENT(ff, "fixremover", startoff, err, &right);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	/* Move back and lengthen the mapping before startoff */
+	err = ext2fs_extent_goto(handle, left.e_lblk);
+	DUMP_EXTENT(ff, "movel", startoff - 1, err, &left);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	left.e_len += right.e_len;
+	err = ext2fs_extent_replace(handle, 0, &left);
+	DUMP_EXTENT(ff, "replacel", startoff - 1, err, &left);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	err = ext2fs_extent_fix_parents(handle);
+	DUMP_EXTENT(ff, "fixreplacel", startoff - 1, err, &left);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	return 0;
+}
+
+static int fuse2fs_convert_unwritten_mapping(struct fuse2fs *ff,
+					     ext2_ino_t ino,
+					     struct ext2_inode_large *inode,
+					     ext2_extent_handle_t handle,
+					     blk64_t *cursor, blk64_t stopoff)
+{
+	ext2_filsys fs = ff->fs;
+	struct ext2fs_extent extent;
+	blk64_t startoff = *cursor;
+	errcode_t err;
+
+	/*
+	 * Find the mapping at startoff.  Note that we can find holes because
+	 * the mapping data can change due to racing writes.
+	 */
+	err = fuse2fs_get_mapping_at(ff, handle, startoff, &extent);
+	if (err == EXT2_ET_EXTENT_NOT_FOUND) {
+		/*
+		 * If we didn't find any mappings at all then the file is
+		 * completely sparse.  There's nothing to convert.
+		 */
+		*cursor = stopoff;
+		return 0;
+	}
+	if (err)
+		return translate_error(fs, ino, err);
+
+	/*
+	 * The mapping is completely to the left of the range that we want.
+	 * Let's see what's in the next extent, if there is one.
+	 */
+	if (startoff >= extent.e_lblk + extent.e_len) {
+		/*
+		 * Mapping ends to the left of the current position.  Try to
+		 * find the next mapping.  If there is no next mapping, then
+		 * we're done.
+		 */
+		err = fuse2fs_get_next_mapping(ff, handle, startoff, &extent);
+		if (err == EXT2_ET_EXTENT_NOT_FOUND) {
+			*cursor = stopoff;
+			return 0;
+		}
+		if (err)
+			return translate_error(fs, ino, err);
+	}
+
+	/*
+	 * The mapping is completely to the right of the range that we want,
+	 * so we're done.
+	 */
+	if (extent.e_lblk >= stopoff) {
+		*cursor = stopoff;
+		return 0;
+	}
+
+	/*
+	 * At this point, we have a mapping that overlaps [startoff, stopoff).
+	 * If the mapping is already written, move on to the next one.
+	 */
+	if (!(extent.e_flags & EXT2_EXTENT_FLAGS_UNINIT))
+		goto next;
+
+	if (startoff > extent.e_lblk) {
+		struct ext2fs_extent newex = extent;
+
+		/*
+		 * Unwritten mapping starts before startoff.  Shorten
+		 * the previous mapping...
+		 */
+		newex.e_len = startoff - extent.e_lblk;
+		err = ext2fs_extent_replace(handle, 0, &newex);
+		DUMP_EXTENT(ff, "shortenp", startoff, err, &newex);
+		if (err)
+			return translate_error(fs, ino, err);
+
+		err = ext2fs_extent_fix_parents(handle);
+		DUMP_EXTENT(ff, "fixshortenp", startoff, err, &newex);
+		if (err)
+			return translate_error(fs, ino, err);
+
+		/* ...and create new written mapping at startoff. */
+		extent.e_len -= newex.e_len;
+		extent.e_lblk += newex.e_len;
+		extent.e_pblk += newex.e_len;
+		extent.e_flags = newex.e_flags & ~EXT2_EXTENT_FLAGS_UNINIT;
+
+		err = ext2fs_extent_insert(handle,
+					   EXT2_EXTENT_INSERT_AFTER,
+					   &extent);
+		DUMP_EXTENT(ff, "insertx", startoff, err, &extent);
+		if (err)
+			return translate_error(fs, ino, err);
+
+		err = ext2fs_extent_fix_parents(handle);
+		DUMP_EXTENT(ff, "fixinsertx", startoff, err, &extent);
+		if (err)
+			return translate_error(fs, ino, err);
+	}
+
+	if (extent.e_lblk + extent.e_len > stopoff) {
+		struct ext2fs_extent newex = extent;
+
+		/*
+		 * Unwritten mapping ends after stopoff.  Shorten the current
+		 * mapping...
+		 */
+		extent.e_len = stopoff - extent.e_lblk;
+		extent.e_flags &= ~EXT2_EXTENT_FLAGS_UNINIT;
+
+		err = ext2fs_extent_replace(handle, 0, &extent);
+		DUMP_EXTENT(ff, "shortenn", startoff, err, &extent);
+		if (err)
+			return translate_error(fs, ino, err);
+
+		err = ext2fs_extent_fix_parents(handle);
+		DUMP_EXTENT(ff, "fixshortenn", startoff, err, &extent);
+		if (err)
+			return translate_error(fs, ino, err);
+
+		/* ...and create a new unwritten mapping at stopoff. */
+		newex.e_pblk += extent.e_len;
+		newex.e_lblk += extent.e_len;
+		newex.e_len -= extent.e_len;
+		newex.e_flags |= EXT2_EXTENT_FLAGS_UNINIT;
+
+		err = ext2fs_extent_insert(handle,
+					   EXT2_EXTENT_INSERT_AFTER,
+					   &newex);
+		DUMP_EXTENT(ff, "insertn", startoff, err, &newex);
+		if (err)
+			return translate_error(fs, ino, err);
+
+		err = ext2fs_extent_fix_parents(handle);
+		DUMP_EXTENT(ff, "fixinsertn", startoff, err, &newex);
+		if (err)
+			return translate_error(fs, ino, err);
+	}
+
+	/* Still unwritten?  Update the state. */
+	if (extent.e_flags & EXT2_EXTENT_FLAGS_UNINIT) {
+		extent.e_flags &= ~EXT2_EXTENT_FLAGS_UNINIT;
+
+		err = ext2fs_extent_replace(handle, 0, &extent);
+		DUMP_EXTENT(ff, "replacex", startoff, err, &extent);
+		if (err)
+			return translate_error(fs, ino, err);
+
+		err = ext2fs_extent_fix_parents(handle);
+		DUMP_EXTENT(ff, "fixreplacex", startoff, err, &extent);
+		if (err)
+			return translate_error(fs, ino, err);
+	}
+
+next:
+	/* Try to merge with the previous extent */
+	if (startoff > 0) {
+		err = fuse2fs_try_merge_mappings(ff, ino, handle, startoff);
+		if (err)
+			return err;
+	}
+
+	*cursor = extent.e_lblk + extent.e_len;
+	return 0;
+}
+
+static int fuse2fs_convert_unwritten_mappings(struct fuse2fs *ff,
+					      ext2_ino_t ino,
+					      struct ext2_inode_large *inode,
+					      off_t pos, size_t written)
+{
+	ext2_extent_handle_t handle;
+	ext2_filsys fs = ff->fs;
+	blk64_t startoff = FUSE2FS_B_TO_FSBT(ff, pos);
+	const blk64_t stopoff = FUSE2FS_B_TO_FSB(ff, pos + written);
+	errcode_t err;
+	int ret;
+
+	err = ext2fs_extent_open2(fs, ino, EXT2_INODE(inode), &handle);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	/* Walk every mapping in the range, converting them. */
+	while (startoff < stopoff) {
+		blk64_t old_startoff = startoff;
+
+		ret = fuse2fs_convert_unwritten_mapping(ff, ino, inode, handle,
+							&startoff, stopoff);
+		if (ret)
+			goto out_handle;
+		if (startoff <= old_startoff) {
+			/* Do not go backwards. */
+			ret = translate_error(fs, ino, EXT2_ET_INODE_CORRUPTED);
+			goto out_handle;
+		}
+	}
+
+	/* Try to merge the right edge */
+	ret = fuse2fs_try_merge_mappings(ff, ino, handle, stopoff);
+out_handle:
+	ext2fs_extent_free(handle);
+	return ret;
+}
+
+static int op_iomap_ioend(const char *path, uint64_t nodeid, uint64_t attr_ino,
+			  off_t pos, size_t written, uint32_t ioendflags,
+			  int error, uint64_t new_addr)
+{
+	struct fuse2fs *ff = fuse2fs_get();
+	struct ext2_inode_large inode;
+	ext2_filsys fs;
+	errcode_t err;
+	bool dirty = false;
+	int ret = 0;
+
+	FUSE2FS_CHECK_CONTEXT(ff);
+
+	dbg_printf(ff,
+ "%s: path=%s nodeid=%llu attr_ino=%llu pos=0x%llx written=0x%zx ioendflags=0x%x error=%d new_addr=%llu\n",
+		   __func__, path,
+		   (unsigned long long)nodeid,
+		   (unsigned long long)attr_ino,
+		   (unsigned long long)pos,
+		   written,
+		   ioendflags,
+		   error,
+		   (unsigned long long)new_addr);
+
+	fs = fuse2fs_start(ff);
+	if (error) {
+		ret = error;
+		goto out_unlock;
+	}
+
+	/* should never see these ioend types */
+	if (ioendflags & FUSE_IOMAP_IOEND_SHARED) {
+		ret = translate_error(fs, attr_ino,
+				      EXT2_ET_FILESYSTEM_CORRUPTED);
+		goto out_unlock;
+	}
+
+	err = fuse2fs_read_inode(fs, attr_ino, &inode);
+	if (err) {
+		ret = translate_error(fs, attr_ino, err);
+		goto out_unlock;
+	}
+
+	if (ioendflags & FUSE_IOMAP_IOEND_UNWRITTEN) {
+		/* unwritten extents are only supported on extents files */
+		if (!(inode.i_flags & EXT4_EXTENTS_FL)) {
+			ret = translate_error(fs, attr_ino,
+					      EXT2_ET_FILESYSTEM_CORRUPTED);
+			goto out_unlock;
+		}
+
+		ret = fuse2fs_convert_unwritten_mappings(ff, attr_ino, &inode,
+							 pos, written);
+		if (ret)
+			goto out_unlock;
+
+		dirty = true;
+	}
+
+	if (ioendflags & FUSE_IOMAP_IOEND_APPEND) {
+		ext2_off64_t isize = EXT2_I_SIZE(&inode);
+
+		if (pos + written > isize) {
+			err = ext2fs_inode_size_set(fs, EXT2_INODE(&inode),
+						    pos + written);
+			if (err) {
+				ret = translate_error(fs, attr_ino, err);
+				goto out_unlock;
+			}
+
+			dirty = true;
+		}
+	}
+
+	if (dirty) {
+		err = fuse2fs_write_inode(fs, attr_ino, &inode);
+		if (err) {
+			ret = translate_error(fs, attr_ino, err);
+			goto out_unlock;
+		}
+	}
+
 out_unlock:
 	fuse2fs_finish(ff, ret);
 	return ret;
@@ -5612,6 +6075,7 @@ static struct fuse_operations fs_ops = {
 	.iomap_begin = op_iomap_begin,
 	.iomap_end = op_iomap_end,
 	.iomap_config = op_iomap_config,
+	.iomap_ioend = op_iomap_ioend,
 #endif /* HAVE_FUSE_IOMAP */
 };
 

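For readers following the extent bookkeeping above: fuse2fs_can_merge_mappings() (and its fuse4fs twin) only joins two mappings when they are adjacent both logically and physically, share the same written/unwritten state, and the combined length still fits in a single on-disk extent. The sketch below restates that predicate as a standalone function. struct sketch_extent and the SKETCH_* constants are hypothetical stand-ins, and the limits assume ext4's usual cap of 32768 blocks for a written extent and 32767 for an unwritten one (EXT_INIT_MAX_LEN / EXT_UNINIT_MAX_LEN).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ext2fs_extent. */
struct sketch_extent {
	uint64_t lblk;		/* first logical block */
	uint64_t pblk;		/* first physical block */
	uint32_t len;		/* length in blocks */
	bool unwritten;		/* EXT2_EXTENT_FLAGS_UNINIT set? */
};

/* Assumed ext4 limits: 2^15 blocks if written, 2^15 - 1 if unwritten. */
#define SKETCH_INIT_MAX_LEN	32768u
#define SKETCH_UNINIT_MAX_LEN	(SKETCH_INIT_MAX_LEN - 1)

static bool sketch_can_merge(const struct sketch_extent *l,
			     const struct sketch_extent *r)
{
	uint64_t max_len = l->unwritten ? SKETCH_UNINIT_MAX_LEN :
					  SKETCH_INIT_MAX_LEN;

	return l->lblk + l->len == r->lblk &&		/* logically adjacent */
	       l->pblk + l->len == r->pblk &&		/* physically adjacent */
	       l->unwritten == r->unwritten &&		/* same state */
	       (uint64_t)l->len + r->len <= max_len;	/* fits one extent */
}
```

The length check is why the patch picks max_len from the left mapping's flags: an unwritten extent spends one bit of the on-disk length field on the unwritten marker, so it can hold one block fewer.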

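fuse2fs_convert_unwritten_mappings() first turns the completed byte range into a block range, rounding the start down (FUSE2FS_B_TO_FSBT) and the end up (FUSE2FS_B_TO_FSB). fuse2fs_convert_unwritten_mapping() then carves each overlapping unwritten extent into at most three pieces: an unwritten head before startoff, a written middle, and an unwritten tail starting at stopoff. A minimal model of that arithmetic follows. It assumes hypothetical helpers bytes_to_blocks() and split_unwritten() that only compute the pieces and never touch libext2fs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct piece {
	uint64_t lblk, pblk, len;
	bool unwritten;
};

/* Byte range [pos, pos + count) -> block range [*start, *stop). */
static void bytes_to_blocks(uint64_t pos, uint64_t count, uint64_t bs,
			    uint64_t *start, uint64_t *stop)
{
	*start = pos / bs;			/* round down, like B_TO_FSBT */
	*stop = (pos + count + bs - 1) / bs;	/* round up, like B_TO_FSB */
}

/*
 * Split an unwritten extent so that only the part inside [start, stop)
 * becomes written.  Stores up to three pieces in out[], returns the count.
 */
static int split_unwritten(const struct piece *ext, uint64_t start,
			   uint64_t stop, struct piece out[3])
{
	uint64_t end = ext->lblk + ext->len;
	uint64_t lo = ext->lblk > start ? ext->lblk : start;
	uint64_t hi = end < stop ? end : stop;
	int n = 0;

	assert(ext->unwritten && lo < hi);	/* caller guarantees overlap */

	if (lo > ext->lblk)			/* unwritten head */
		out[n++] = (struct piece){ ext->lblk, ext->pblk,
					   lo - ext->lblk, true };
	/* newly written middle */
	out[n++] = (struct piece){ lo, ext->pblk + (lo - ext->lblk),
				   hi - lo, false };
	if (hi < end)				/* unwritten tail */
		out[n++] = (struct piece){ hi, ext->pblk + (hi - ext->lblk),
					   end - hi, true };
	return n;
}
```

The patch reaches the same layout with ext2fs_extent_replace(), ext2fs_extent_insert() and ext2fs_extent_fix_parents(), handling one extent per call and advancing a cursor, and then tries to merge neighbouring mappings back together. The sketch only models the per-extent split.
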

* [PATCH 08/17] fuse2fs: turn on iomap for pagecache IO
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
                     ` (6 preceding siblings ...)
  2025-09-16  1:00   ` [PATCH 07/17] fuse2fs: implement direct write support Darrick J. Wong
@ 2025-09-16  1:00   ` Darrick J. Wong
  2025-09-16  1:01   ` [PATCH 09/17] fuse2fs: don't zero bytes in punch hole Darrick J. Wong
                     ` (8 subsequent siblings)
  16 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:00 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

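Turn on iomap for pagecache IO to regular files.  Drop the
FUSE_IOMAP_OP_DIRECT checks from the read and write iomap_begin paths,
and teach iomap_end to extend the ondisk file size after a buffered
write past EOF.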
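
In pagecache mode the size update for buffered writes happens in op_iomap_end(). If the mapping carries FUSE_IOMAP_F_SIZE_CHANGED and the write made progress, i_size is raised to pos + written when that lies past the current EOF, and it is never lowered. Direct writes keep using the FUSE_IOMAP_IOEND_APPEND handling in iomap_ioend instead. The sketch below restates only that decision as a pure function. The SK_* flag values are made up for illustration and are not the real FUSE_IOMAP_* constants.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits standing in for FUSE_IOMAP_OP_* and F_SIZE_CHANGED. */
#define SK_OP_WRITE		(1u << 0)
#define SK_OP_DIRECT		(1u << 1)
#define SK_F_SIZE_CHANGED	(1u << 0)

/* Return the i_size that should be stored after an iomap_end call. */
static uint64_t sketch_iomap_end_size(uint64_t isize, uint32_t opflags,
				      uint32_t mapflags, uint64_t pos,
				      int64_t written)
{
	if (!(opflags & SK_OP_WRITE) || (opflags & SK_OP_DIRECT))
		return isize;		/* only buffered writes get here */
	if (!(mapflags & SK_F_SIZE_CHANGED) || written <= 0)
		return isize;
	if (pos + (uint64_t)written <= isize)
		return isize;		/* never shrink the file */
	return pos + (uint64_t)written;
}
```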

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   61 +++++++++++++++++++++++++++++++++++++++++++++++------
 misc/fuse2fs.c    |   61 +++++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 108 insertions(+), 14 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index b7184e3416860d..6b5d14e4f044cb 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -5903,9 +5903,6 @@ static int fuse4fs_iomap_begin_read(struct fuse4fs *ff, ext2_ino_t ino,
 				    uint64_t count, uint32_t opflags,
 				    struct fuse_file_iomap *read)
 {
-	if (!(opflags & FUSE_IOMAP_OP_DIRECT))
-		return -ENOSYS;
-
 	/* fall back to slow path for inline data reads */
 	if (inode->i_flags & EXT4_INLINE_DATA_FL)
 		return -ENOSYS;
@@ -5996,9 +5993,6 @@ static int fuse4fs_iomap_begin_write(struct fuse4fs *ff, ext2_ino_t ino,
 	off_t max_size = fuse4fs_max_file_size(ff, inode);
 	int ret;
 
-	if (!(opflags & FUSE_IOMAP_OP_DIRECT))
-		return -ENOSYS;
-
 	if (pos >= max_size)
 		return -EFBIG;
 
@@ -6091,12 +6085,51 @@ static void op_iomap_begin(fuse_req_t req, fuse_ino_t fino, uint64_t dontcare,
 		fuse_reply_iomap_begin(req, &read, NULL);
 }
 
+static int fuse4fs_iomap_append_setsize(struct fuse4fs *ff, ext2_ino_t ino,
+					loff_t newsize)
+{
+	ext2_filsys fs = ff->fs;
+	struct ext2_inode_large inode;
+	ext2_off64_t isize;
+	errcode_t err;
+
+	dbg_printf(ff, "%s: ino=%u newsize=%llu\n", __func__, ino,
+		   (unsigned long long)newsize);
+
+	err = fuse4fs_read_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	isize = EXT2_I_SIZE(&inode);
+	if (newsize <= isize)
+		return 0;
+
+	dbg_printf(ff, "%s: ino=%u oldsize=%llu newsize=%llu\n", __func__, ino,
+		   (unsigned long long)isize,
+		   (unsigned long long)newsize);
+
+	/*
+	 * XXX cheesily update the ondisk size even though we only want to do
+	 * the incore size until writeback happens
+	 */
+	err = ext2fs_inode_size_set(fs, EXT2_INODE(&inode), newsize);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	err = fuse4fs_write_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	return 0;
+}
+
 static void op_iomap_end(fuse_req_t req, fuse_ino_t fino, uint64_t dontcare,
 			 off_t pos, uint64_t count, uint32_t opflags,
 			 ssize_t written, const struct fuse_file_iomap *iomap)
 {
 	struct fuse4fs *ff = fuse4fs_get(req);
 	ext2_ino_t ino;
+	int ret = 0;
 
 	FUSE4FS_CHECK_CONTEXT(req);
 	FUSE4FS_CONVERT_FINO(req, &ino, fino);
@@ -6110,7 +6143,21 @@ static void op_iomap_end(fuse_req_t req, fuse_ino_t fino, uint64_t dontcare,
 		   written,
 		   iomap->flags);
 
-	fuse_reply_err(req, 0);
+	fuse4fs_start(ff);
+
+	/* XXX is this really necessary? */
+	if ((opflags & FUSE_IOMAP_OP_WRITE) &&
+	    !(opflags & FUSE_IOMAP_OP_DIRECT) &&
+	    (iomap->flags & FUSE_IOMAP_F_SIZE_CHANGED) &&
+	    written > 0) {
+		ret = fuse4fs_iomap_append_setsize(ff, ino, pos + written);
+		if (ret)
+			goto out_unlock;
+	}
+
+out_unlock:
+	fuse4fs_finish(ff, ret);
+	fuse_reply_err(req, -ret);
 }
 
 /*
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 9bcf2c81b7e732..afc65c774dc148 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -5345,9 +5345,6 @@ static int fuse2fs_iomap_begin_read(struct fuse2fs *ff, ext2_ino_t ino,
 				    uint64_t count, uint32_t opflags,
 				    struct fuse_file_iomap *read)
 {
-	if (!(opflags & FUSE_IOMAP_OP_DIRECT))
-		return -ENOSYS;
-
 	/* fall back to slow path for inline data reads */
 	if (inode->i_flags & EXT4_INLINE_DATA_FL)
 		return -ENOSYS;
@@ -5435,9 +5432,6 @@ static int fuse2fs_iomap_begin_write(struct fuse2fs *ff, ext2_ino_t ino,
 	off_t max_size = fuse2fs_max_file_size(ff, inode);
 	int ret;
 
-	if (!(opflags & FUSE_IOMAP_OP_DIRECT))
-		return -ENOSYS;
-
 	if (pos >= max_size)
 		return -EFBIG;
 
@@ -5529,11 +5523,50 @@ static int op_iomap_begin(const char *path, uint64_t nodeid, uint64_t attr_ino,
 	return ret;
 }
 
+static int fuse2fs_iomap_append_setsize(struct fuse2fs *ff, ext2_ino_t ino,
+					loff_t newsize)
+{
+	ext2_filsys fs = ff->fs;
+	struct ext2_inode_large inode;
+	ext2_off64_t isize;
+	errcode_t err;
+
+	dbg_printf(ff, "%s: ino=%u newsize=%llu\n", __func__, ino,
+		   (unsigned long long)newsize);
+
+	err = fuse2fs_read_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	isize = EXT2_I_SIZE(&inode);
+	if (newsize <= isize)
+		return 0;
+
+	dbg_printf(ff, "%s: ino=%u oldsize=%llu newsize=%llu\n", __func__, ino,
+		   (unsigned long long)isize,
+		   (unsigned long long)newsize);
+
+	/*
+	 * XXX cheesily update the ondisk size even though we only want to do
+	 * the incore size until writeback happens
+	 */
+	err = ext2fs_inode_size_set(fs, EXT2_INODE(&inode), newsize);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	err = fuse2fs_write_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	return 0;
+}
+
 static int op_iomap_end(const char *path, uint64_t nodeid, uint64_t attr_ino,
 			off_t pos, uint64_t count, uint32_t opflags,
 			ssize_t written, const struct fuse_file_iomap *iomap)
 {
 	struct fuse2fs *ff = fuse2fs_get();
+	int ret = 0;
 
 	FUSE2FS_CHECK_CONTEXT(ff);
 
@@ -5548,7 +5581,21 @@ static int op_iomap_end(const char *path, uint64_t nodeid, uint64_t attr_ino,
 		   written,
 		   iomap->flags);
 
-	return 0;
+	fuse2fs_start(ff);
+
+	/* XXX is this really necessary? */
+	if ((opflags & FUSE_IOMAP_OP_WRITE) &&
+	    !(opflags & FUSE_IOMAP_OP_DIRECT) &&
+	    (iomap->flags & FUSE_IOMAP_F_SIZE_CHANGED) &&
+	    written > 0) {
+		ret = fuse2fs_iomap_append_setsize(ff, attr_ino, pos + written);
+		if (ret)
+			goto out_unlock;
+	}
+
+out_unlock:
+	fuse2fs_finish(ff, ret);
+	return ret;
 }
 
 /*



* [PATCH 09/17] fuse2fs: don't zero bytes in punch hole
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
                     ` (7 preceding siblings ...)
  2025-09-16  1:00   ` [PATCH 08/17] fuse2fs: turn on iomap for pagecache IO Darrick J. Wong
@ 2025-09-16  1:01   ` Darrick J. Wong
  2025-09-16  1:01   ` [PATCH 10/17] fuse2fs: don't do file data block IO when iomap is enabled Darrick J. Wong
                     ` (7 subsequent siblings)
  16 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:01 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

When iomap is in use for the pagecache, the kernel takes care of zeroing
the unaligned parts of punched-out regions, so we don't have to do it
ourselves.
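
To make the split concrete: for a punch of [off, off + len), only the bytes from off up to the next block boundary, and from the last block boundary up to off + len, need zeroing. The whole blocks in between are simply unmapped. The sketch below computes those ranges. This is roughly the work that fuse4fs_zero_edge()/fuse4fs_zero_middle() (clean_block_edge()/clean_block_middle() in fuse2fs) now skip in iomap mode. struct punch_plan and plan_punch() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct punch_plan {
	uint64_t head_start, head_end;	/* partial bytes to zero at the front */
	uint64_t tail_start, tail_end;	/* partial bytes to zero at the back */
	uint64_t first_blk, end_blk;	/* whole blocks [first_blk, end_blk) */
};

static struct punch_plan plan_punch(uint64_t off, uint64_t len, uint64_t bs)
{
	struct punch_plan p = { 0 };
	uint64_t end = off + len;
	uint64_t up, down;

	assert(bs != 0);
	up = (off + bs - 1) / bs * bs;	/* round start up to a block */
	down = end / bs * bs;		/* round end down to a block */

	if (up >= down) {
		/* No whole block inside the range: every byte is partial. */
		p.head_start = off;
		p.head_end = end;
		return p;
	}
	p.head_start = off;
	p.head_end = up;
	p.tail_start = down;
	p.tail_end = end;
	p.first_blk = up / bs;
	p.end_blk = down / bs;
	return p;
}
```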

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |    8 ++++++++
 misc/fuse2fs.c    |    9 +++++++++
 2 files changed, 17 insertions(+)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 6b5d14e4f044cb..6c9e725d54b87a 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -5333,6 +5333,10 @@ static errcode_t fuse4fs_zero_middle(struct fuse4fs *ff, ext2_ino_t ino,
 	int retflags;
 	errcode_t err;
 
+	/* the kernel does this for us in iomap mode */
+	if (fuse4fs_iomap_enabled(ff))
+		return 0;
+
 	if (!*buf) {
 		err = ext2fs_get_mem(fs->blocksize, buf);
 		if (err)
@@ -5369,6 +5373,10 @@ static errcode_t fuse4fs_zero_edge(struct fuse4fs *ff, ext2_ino_t ino,
 	off_t residue;
 	errcode_t err;
 
+	/* the kernel does this for us in iomap mode */
+	if (fuse4fs_iomap_enabled(ff))
+		return 0;
+
 	residue = FUSE4FS_OFF_IN_FSB(ff, offset);
 	if (residue == 0)
 		return 0;
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index afc65c774dc148..5dbd8c5a17f79d 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -570,6 +570,7 @@ static inline int fuse2fs_iomap_enabled(const struct fuse2fs *ff)
 }
 #else
 # define fuse2fs_iomap_enabled(...)	(0)
+# define fuse2fs_iomap_enabled(...)	(0)
 #endif
 
 static inline void fuse2fs_dump_extents(struct fuse2fs *ff, ext2_ino_t ino,
@@ -4776,6 +4777,10 @@ static errcode_t clean_block_middle(struct fuse2fs *ff, ext2_ino_t ino,
 	int retflags;
 	errcode_t err;
 
+	/* the kernel does this for us in iomap mode */
+	if (fuse2fs_iomap_enabled(ff))
+		return 0;
+
 	if (!*buf) {
 		err = ext2fs_get_mem(fs->blocksize, buf);
 		if (err)
@@ -4812,6 +4817,10 @@ static errcode_t clean_block_edge(struct fuse2fs *ff, ext2_ino_t ino,
 	off_t residue;
 	errcode_t err;
 
+	/* the kernel does this for us in iomap mode */
+	if (fuse2fs_iomap_enabled(ff))
+		return 0;
+
 	residue = FUSE2FS_OFF_IN_FSB(ff, offset);
 	if (residue == 0)
 		return 0;


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 10/17] fuse2fs: don't do file data block IO when iomap is enabled
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
                     ` (8 preceding siblings ...)
  2025-09-16  1:01   ` [PATCH 09/17] fuse2fs: don't zero bytes in punch hole Darrick J. Wong
@ 2025-09-16  1:01   ` Darrick J. Wong
  2025-09-16  1:01   ` [PATCH 11/17] fuse2fs: avoid fuseblk mode if fuse-iomap support is likely Darrick J. Wong
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:01 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

When iomap is in use for the page cache, the kernel will take care of
all the file data block IO for us, including zeroing of punched ranges
and post-EOF bytes.  fuse2fs only needs to do IO for inline data.

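Therefore, set the EXT2_FILE_NOBLOCKIO flag when opening the ext2_file
so that libext2fs will not do any regular file IO to or from disk blocks
at all.  fuse2fs still performs block IO for files that cannot use
iomap, such as existing hardlinked files.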

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   11 +++++++-
 misc/fuse2fs.c    |   72 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 81 insertions(+), 2 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 6c9e725d54b87a..e482b00f14d572 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -3408,9 +3408,14 @@ static int fuse4fs_truncate(struct fuse4fs *ff, ext2_ino_t ino, off_t new_size)
 	ext2_file_t file;
 	__u64 old_isize;
 	errcode_t err;
+	int flags = EXT2_FILE_WRITE;
 	int ret = 0;
 
-	err = ext2fs_file_open(fs, ino, EXT2_FILE_WRITE, &file);
+	/* the kernel handles all eof zeroing for us in iomap mode */
+	if (fuse4fs_iomap_enabled(ff))
+		flags |= EXT2_FILE_NOBLOCKIO;
+
+	err = ext2fs_file_open(fs, ino, flags, &file);
 	if (err)
 		return translate_error(fs, ino, err);
 
@@ -3505,6 +3510,10 @@ static int fuse4fs_open_file(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
 	if (linked)
 		check |= L_OK;
 
+	/* the kernel handles all block IO for us in iomap mode */
+	if (fuse4fs_iomap_enabled(ff))
+		file->open_flags |= EXT2_FILE_NOBLOCKIO;
+
 	/*
 	 * If the caller wants to truncate the file, we need to ask for full
 	 * write access even if the caller claims to be appending.
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 5dbd8c5a17f79d..c13bd6c3baf9c9 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -3145,15 +3145,72 @@ static int fuse2fs_punch_posteof(struct fuse2fs *ff, ext2_ino_t ino,
 	return 0;
 }
 
+/*
+ * Decide if file IO for this inode can use iomap.
+ *
+ * It turns out that libfuse creates internal node ids that have nothing to do
+ * with the ext2_ino_t that we give it.  These internal node ids are what
+ * actually gets igetted in the kernel, which means that there can be multiple
+ * fuse_inode objects in the kernel for a single hardlinked ondisk ext2 inode.
+ *
+ * What this means, horrifyingly, is that on a fuse filesystem that supports
+ * hard links, the in-kernel i_rwsem does not protect against concurrent writes
+ * between files that point to the same inode.  That in turn means that the
+ * file mode and size can get desynchronized between the multiple fuse_inode
+ * objects.  This also means that we cannot cache iomaps in the kernel AT ALL
+ * because the caches will get out of sync, leading to WARN_ONs from the iomap
+ * zeroing code and probably data corruption after that.
+ *
+ * Therefore, libfuse won't let us create hardlinks of iomap files, and we must
+ * never turn on iomap for existing hardlinked files.  Long term it means we
+ * have to find a way around this loss of functionality.  fuse4fs gets around
+ * this by being a low level fuse driver and controlling the nodeids itself.
+ *
+ * Returns 0 for no, 1 for yes, or a negative errno.
+ */
+#ifdef HAVE_FUSE_IOMAP
+static int fuse2fs_file_uses_iomap(struct fuse2fs *ff, ext2_ino_t ino)
+{
+	struct stat statbuf;
+	int ret;
+
+	if (!fuse2fs_iomap_enabled(ff))
+		return 0;
+
+	ret = stat_inode(ff->fs, ino, &statbuf);
+	if (ret)
+		return ret;
+
+	/* the kernel handles all block IO for us in iomap mode */
+	return fuse_fs_can_enable_iomap(&statbuf);
+}
+#else
+# define fuse2fs_file_uses_iomap(...)	(0)
+#endif
+
 static int fuse2fs_truncate(struct fuse2fs *ff, ext2_ino_t ino, off_t new_size)
 {
 	ext2_filsys fs = ff->fs;
 	ext2_file_t file;
 	__u64 old_isize;
 	errcode_t err;
+	int flags = EXT2_FILE_WRITE;
 	int ret = 0;
 
-	err = ext2fs_file_open(fs, ino, EXT2_FILE_WRITE, &file);
+	/* the kernel handles all eof zeroing for us in iomap mode */
+	ret = fuse2fs_file_uses_iomap(ff, ino);
+	switch (ret) {
+	case 0:
+		break;
+	case 1:
+		flags |= EXT2_FILE_NOBLOCKIO;
+		ret = 0;
+		break;
+	default:
+		return ret;
+	}
+
+	err = ext2fs_file_open(fs, ino, flags, &file);
 	if (err)
 		return translate_error(fs, ino, err);
 
@@ -3308,6 +3365,19 @@ static int __op_open(struct fuse2fs *ff, const char *path,
 			goto out;
 	}
 
+	/* the kernel handles all block IO for us in iomap mode */
+	ret = fuse2fs_file_uses_iomap(ff, file->ino);
+	switch (ret) {
+	case 0:
+		break;
+	case 1:
+		file->open_flags |= EXT2_FILE_NOBLOCKIO;
+		ret = 0;
+		break;
+	default:
+		goto out;
+	}
+
 	if (fp->flags & O_TRUNC) {
 		ret = fuse2fs_truncate(ff, file->ino, 0);
 		if (ret)



* [PATCH 11/17] fuse2fs: avoid fuseblk mode if fuse-iomap support is likely
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
                     ` (9 preceding siblings ...)
  2025-09-16  1:01   ` [PATCH 10/17] fuse2fs: don't do file data block IO when iomap is enabled Darrick J. Wong
@ 2025-09-16  1:01   ` Darrick J. Wong
  2025-09-16  1:01   ` [PATCH 12/17] fuse2fs: enable file IO to inline data files Darrick J. Wong
                     ` (5 subsequent siblings)
  16 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:01 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Since fuse in iomap mode guarantees that op_destroy will be called
before umount returns, we don't need to use fuseblk mode to get that
guarantee.  If the kernel advertises fuse-iomap file IO support, skip
fuseblk mode, which saves us the trouble of closing and reopening the
device.
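
The mount-time choice reduces to a small predicate: probe for iomap
file IO support unless the user disabled it, and only fall back to
fuseblk when the probe says no.  As the patch title says, the probe
only indicates that iomap support is likely.  A minimal sketch;
SK_IOMAP_SUPPORT_FILEIO, enum sk_toggle and is_blockdev are
hypothetical stand-ins for FUSE_IOMAP_SUPPORT_FILEIO, the iomap_want
toggle and the rest of *_want_fuseblk():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for FUSE_IOMAP_SUPPORT_FILEIO. */
#define SK_IOMAP_SUPPORT_FILEIO	(1ULL << 0)

enum sk_toggle { SK_DISABLE, SK_ENABLE, SK_DEFAULT };

/* Same shape as *_discover_iomap(): don't probe if the user said no. */
static bool sk_discover_iomap(enum sk_toggle want, uint64_t kernel_caps)
{
	if (want == SK_DISABLE)
		return false;
	return (kernel_caps & SK_IOMAP_SUPPORT_FILEIO) != 0;
}

/*
 * fuseblk mode is only worth its close/reopen dance when iomap is not
 * going to give us the "destroy before umount returns" guarantee.
 */
static bool sk_use_fuseblk(enum sk_toggle want, uint64_t kernel_caps,
			   bool is_blockdev)
{
	return !sk_discover_iomap(want, kernel_caps) && is_blockdev;
}
```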

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   20 +++++++++++++++++++-
 misc/fuse2fs.c    |   20 +++++++++++++++++++-
 2 files changed, 38 insertions(+), 2 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index e482b00f14d572..8965edbaf9b834 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -273,6 +273,7 @@ struct fuse4fs {
 	enum fuse4fs_feature_toggle iomap_want;
 	enum fuse4fs_iomap_state iomap_state;
 	uint32_t iomap_dev;
+	uint64_t iomap_cap;
 #endif
 	unsigned int blockmask;
 	unsigned long offset;
@@ -1240,6 +1241,8 @@ static errcode_t fuse4fs_open(struct fuse4fs *ff, int libext2_flags)
 	if (ff->directio)
 		flags |= EXT2_FLAG_DIRECT_IO;
 
+	dbg_printf(ff, "opening with flags=0x%x\n", flags);
+
 	err = ext2fs_open2(ff->device, options, flags, 0, 0, unix_io_manager,
 			   &ff->fs);
 	if (err == EPERM) {
@@ -6930,6 +6933,19 @@ static unsigned long long default_cache_size(void)
 	return ret;
 }
 
+#ifdef HAVE_FUSE_IOMAP
+static inline bool fuse4fs_discover_iomap(struct fuse4fs *ff)
+{
+	if (ff->iomap_want == FT_DISABLE)
+		return false;
+
+	ff->iomap_cap = fuse_lowlevel_discover_iomap(-1);
+	return ff->iomap_cap & FUSE_IOMAP_SUPPORT_FILEIO;
+}
+#else
+# define fuse4fs_discover_iomap(...)	(false)
+#endif
+
 static inline bool fuse4fs_want_fuseblk(const struct fuse4fs *ff)
 {
 	if (ff->noblkdev)
@@ -7071,6 +7087,7 @@ int main(int argc, char *argv[])
 	errcode_t err;
 	FILE *orig_stderr = stderr;
 	char extra_args[BUFSIZ];
+	bool iomap_detected = false;
 	int ret;
 
 	ret = fuse_opt_parse(&args, &fctx, fuse4fs_opts, fuse4fs_opt_proc);
@@ -7144,7 +7161,8 @@ int main(int argc, char *argv[])
 		goto out;
 	}
 
-	if (fuse4fs_want_fuseblk(&fctx)) {
+	iomap_detected = fuse4fs_discover_iomap(&fctx);
+	if (!iomap_detected && fuse4fs_want_fuseblk(&fctx)) {
 		/*
 		 * If this is a block device, we want to close the fs, reopen
 		 * the block device in non-exclusive mode, and start the fuse
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index c13bd6c3baf9c9..7fa4070dee0367 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -267,6 +267,7 @@ struct fuse2fs {
 	enum fuse2fs_feature_toggle iomap_want;
 	enum fuse2fs_iomap_state iomap_state;
 	uint32_t iomap_dev;
+	uint64_t iomap_cap;
 #endif
 	unsigned int blockmask;
 	unsigned long offset;
@@ -1052,6 +1053,8 @@ static errcode_t fuse2fs_open(struct fuse2fs *ff, int libext2_flags)
 	if (ff->directio)
 		flags |= EXT2_FLAG_DIRECT_IO;
 
+	dbg_printf(ff, "opening with flags=0x%x\n", flags);
+
 	err = ext2fs_open2(ff->device, options, flags, 0, 0, unix_io_manager,
 			   &ff->fs);
 	if (err == EPERM) {
@@ -6426,6 +6429,19 @@ static unsigned long long default_cache_size(void)
 	return ret;
 }
 
+#ifdef HAVE_FUSE_IOMAP
+static inline bool fuse2fs_discover_iomap(struct fuse2fs *ff)
+{
+	if (ff->iomap_want == FT_DISABLE)
+		return false;
+
+	ff->iomap_cap = fuse_lowlevel_discover_iomap(-1);
+	return ff->iomap_cap & FUSE_IOMAP_SUPPORT_FILEIO;
+}
+#else
+# define fuse2fs_discover_iomap(...)	(false)
+#endif
+
 static inline bool fuse2fs_want_fuseblk(const struct fuse2fs *ff)
 {
 	if (ff->noblkdev)
@@ -6466,6 +6482,7 @@ int main(int argc, char *argv[])
 	errcode_t err;
 	FILE *orig_stderr = stderr;
 	char extra_args[BUFSIZ];
+	bool iomap_detected = false;
 	int ret;
 
 	ret = fuse_opt_parse(&args, &fctx, fuse2fs_opts, fuse2fs_opt_proc);
@@ -6539,7 +6556,8 @@ int main(int argc, char *argv[])
 		goto out;
 	}
 
-	if (fuse2fs_want_fuseblk(&fctx)) {
+	iomap_detected = fuse2fs_discover_iomap(&fctx);
+	if (!iomap_detected && fuse2fs_want_fuseblk(&fctx)) {
 		/*
 		 * If this is a block device, we want to close the fs, reopen
 		 * the block device in non-exclusive mode, and start the fuse



* [PATCH 12/17] fuse2fs: enable file IO to inline data files
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
                     ` (10 preceding siblings ...)
  2025-09-16  1:01   ` [PATCH 11/17] fuse2fs: avoid fuseblk mode if fuse-iomap support is likely Darrick J. Wong
@ 2025-09-16  1:01   ` Darrick J. Wong
  2025-09-16  1:02   ` [PATCH 13/17] fuse2fs: set iomap-related inode flags Darrick J. Wong
                     ` (4 subsequent siblings)
  16 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:01 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

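Enable file reads and writes to and from inline data files.  fuse2fs
needs a path in op_read/op_write to find such a file when no file
handle is supplied, so it turns off nullpath_ok when iomap and the
inline_data feature are both enabled.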
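
The op_read/op_write changes use a fallback-handle pattern: if no open
file handle is supplied, build a temporary one on the stack and resolve
the inode number from the path.  That only works if libfuse passes a
path, which is why nullpath_ok has to be cleared.  A minimal sketch;
struct sk_handle, sk_lookup_path() (standing in for fuse2fs_file_ino())
and the one-entry path table are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define SK_HANDLE_MAGIC 0x5EA1u	/* hypothetical stand-in for FUSE2FS_FILE_MAGIC */

struct sk_handle {
	unsigned int magic;
	unsigned long ino;
	int open_flags;		/* initialized by the caller */
};

/* Hypothetical path -> inode lookup with a single known file. */
static int sk_lookup_path(const char *path, unsigned long *ino)
{
	if (path && strcmp(path, "/inline.txt") == 0) {
		*ino = 12;
		return 0;
	}
	return -ENOENT;
}

/*
 * Pick the handle an IO request should use.  With an open handle we use
 * it as-is; without one, fill in the caller's temporary handle from the
 * path and use that instead.
 */
static int sk_resolve_handle(struct sk_handle *fh, const char *path,
			     struct sk_handle *tmp, struct sk_handle **out)
{
	int ret;

	if (fh) {
		*out = fh;
		return 0;
	}
	tmp->magic = SK_HANDLE_MAGIC;
	ret = sk_lookup_path(path, &tmp->ino);
	if (ret)
		return ret;
	*out = tmp;
	return 0;
}
```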

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |    3 ++-
 misc/fuse2fs.c    |   42 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 42 insertions(+), 3 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 8965edbaf9b834..47ad8215c3e1d1 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -5925,7 +5925,8 @@ static int fuse4fs_iomap_begin_read(struct fuse4fs *ff, ext2_ino_t ino,
 {
 	/* fall back to slow path for inline data reads */
 	if (inode->i_flags & EXT4_INLINE_DATA_FL)
-		return -ENOSYS;
+		return fuse4fs_iomap_begin_inline(ff, ino, inode, pos, count,
+						  read);
 
 	if (inode->i_flags & EXT4_EXTENTS_FL)
 		return fuse4fs_iomap_begin_extent(ff, ino, inode, pos, count,
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 7fa4070dee0367..5dc0b0606112af 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -1480,7 +1480,16 @@ static void *op_init(struct fuse_conn_info *conn,
 	cfg->use_ino = 1;
 	if (ff->debug)
 		cfg->debug = 1;
-	cfg->nullpath_ok = 1;
+
+	/*
+	 * Inline data file io depends on op_read/write being fed a path, so we
+	 * have to slow everyone down to look up the path from the nodeid.
+	 */
+	if (fuse2fs_iomap_enabled(ff) &&
+	    ext2fs_has_feature_inline_data(ff->fs->super))
+		cfg->nullpath_ok = 0;
+	else
+		cfg->nullpath_ok = 1;
 
 	if (ff->kernel) {
 		char uuid[UUID_STR_SIZE];
@@ -3412,6 +3421,9 @@ static int op_read(const char *path EXT2FS_ATTR((unused)), char *buf,
 		   size_t len, off_t offset,
 		   struct fuse_file_info *fp)
 {
+	struct fuse2fs_file_handle fhurk = {
+		.magic = FUSE2FS_FILE_MAGIC,
+	};
 	struct fuse2fs *ff = fuse2fs_get();
 	struct fuse2fs_file_handle *fh = fuse2fs_get_handle(fp);
 	ext2_filsys fs;
@@ -3421,10 +3433,21 @@ static int op_read(const char *path EXT2FS_ATTR((unused)), char *buf,
 	int ret = 0;
 
 	FUSE2FS_CHECK_CONTEXT(ff);
+
+	if (!fh)
+		fh = &fhurk;
+
 	FUSE2FS_CHECK_HANDLE(ff, fh);
 	dbg_printf(ff, "%s: ino=%d off=0x%llx len=0x%zx\n", __func__, fh->ino,
 		   (unsigned long long)offset, len);
 	fs = fuse2fs_start(ff);
+
+	if (fh == &fhurk) {
+		ret = fuse2fs_file_ino(ff, path, NULL, &fhurk.ino);
+		if (ret)
+			goto out;
+	}
+
 	err = ext2fs_file_open(fs, fh->ino, fh->open_flags, &efp);
 	if (err) {
 		ret = translate_error(fs, fh->ino, err);
@@ -3466,6 +3489,10 @@ static int op_write(const char *path EXT2FS_ATTR((unused)),
 		    const char *buf, size_t len, off_t offset,
 		    struct fuse_file_info *fp)
 {
+	struct fuse2fs_file_handle fhurk = {
+		.magic = FUSE2FS_FILE_MAGIC,
+		.open_flags = EXT2_FILE_WRITE,
+	};
 	struct fuse2fs *ff = fuse2fs_get();
 	struct fuse2fs_file_handle *fh = fuse2fs_get_handle(fp);
 	ext2_filsys fs;
@@ -3475,6 +3502,10 @@ static int op_write(const char *path EXT2FS_ATTR((unused)),
 	int ret = 0;
 
 	FUSE2FS_CHECK_CONTEXT(ff);
+
+	if (!fh)
+		fh = &fhurk;
+
 	FUSE2FS_CHECK_HANDLE(ff, fh);
 	dbg_printf(ff, "%s: ino=%d off=0x%llx len=0x%zx\n", __func__, fh->ino,
 		   (unsigned long long) offset, len);
@@ -3489,6 +3520,12 @@ static int op_write(const char *path EXT2FS_ATTR((unused)),
 		goto out;
 	}
 
+	if (fh == &fhurk) {
+		ret = fuse2fs_file_ino(ff, path, NULL, &fhurk.ino);
+		if (ret)
+			goto out;
+	}
+
 	err = ext2fs_file_open(fs, fh->ino, fh->open_flags, &efp);
 	if (err) {
 		ret = translate_error(fs, fh->ino, err);
@@ -5429,7 +5466,8 @@ static int fuse2fs_iomap_begin_read(struct fuse2fs *ff, ext2_ino_t ino,
 {
 	/* fall back to slow path for inline data reads */
 	if (inode->i_flags & EXT4_INLINE_DATA_FL)
-		return -ENOSYS;
+		return fuse2fs_iomap_begin_inline(ff, ino, inode, pos, count,
+						  read);
 
 	if (inode->i_flags & EXT4_EXTENTS_FL)
 		return fuse2fs_iomap_begin_extent(ff, ino, inode, pos, count,



* [PATCH 13/17] fuse2fs: set iomap-related inode flags
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
                     ` (11 preceding siblings ...)
  2025-09-16  1:01   ` [PATCH 12/17] fuse2fs: enable file IO to inline data files Darrick J. Wong
@ 2025-09-16  1:02   ` Darrick J. Wong
  2025-09-16  1:02   ` [PATCH 14/17] fuse2fs: configure block device block size Darrick J. Wong
                     ` (3 subsequent siblings)
  16 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:02 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

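Set FUSE_IFLAG_* when we return file attributes to the kernel, so that
files will have iomap enabled.  fuse4fs sets FUSE_IFLAG_IOMAP on every
inode whenever iomap is enabled; fuse2fs sets it from getattr only for
files that fuse_fs_can_enable_iomap() accepts.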

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   46 +++++++++++++++++++++++++++++++++++-----------
 misc/fuse2fs.c    |   20 ++++++++++++++++++++
 2 files changed, 55 insertions(+), 11 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 47ad8215c3e1d1..37999d864a05e5 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -1685,6 +1685,7 @@ static void op_init(void *userdata, struct fuse_conn_info *conn)
 
 struct fuse4fs_stat {
 	struct fuse_entry_param	entry;
+	unsigned int iflags;
 };
 
 static int fuse4fs_stat_inode(struct fuse4fs *ff, ext2_ino_t ino,
@@ -1750,9 +1751,29 @@ static int fuse4fs_stat_inode(struct fuse4fs *ff, ext2_ino_t ino,
 	entry->attr_timeout = FUSE4FS_ATTR_TIMEOUT;
 	entry->entry_timeout = FUSE4FS_ATTR_TIMEOUT;
 
+	fstat->iflags = 0;
+#ifdef HAVE_FUSE_IOMAP
+	if (fuse4fs_iomap_enabled(ff))
+		fstat->iflags |= FUSE_IFLAG_IOMAP;
+#endif
+
 	return 0;
 }
 
+#if FUSE_VERSION < FUSE_MAKE_VERSION(3, 99)
+#define fuse_reply_entry_iflags(req, entry, iflags) \
+	fuse_reply_entry((req), (entry))
+
+#define fuse_reply_attr_iflags(req, entry, iflags, timeout) \
+	fuse_reply_attr((req), (entry), (timeout))
+
+#define fuse_add_direntry_plus_iflags(req, buf, sz, name, iflags, entry, dirpos) \
+	fuse_add_direntry_plus((req), (buf), (sz), (name), (entry), (dirpos))
+
+#define fuse_reply_create_iflags(req, entry, iflags, fp) \
+	fuse_reply_create((req), (entry), (fp))
+#endif
+
 static void op_lookup(fuse_req_t req, fuse_ino_t fino, const char *name)
 {
 	struct fuse4fs_stat fstat;
@@ -1783,7 +1804,7 @@ static void op_lookup(fuse_req_t req, fuse_ino_t fino, const char *name)
 	if (ret)
 		fuse_reply_err(req, -ret);
 	else
-		fuse_reply_entry(req, &fstat.entry);
+		fuse_reply_entry_iflags(req, &fstat.entry, fstat.iflags);
 }
 
 static void op_getattr(fuse_req_t req, fuse_ino_t fino,
@@ -1803,8 +1824,8 @@ static void op_getattr(fuse_req_t req, fuse_ino_t fino,
 	if (ret)
 		fuse_reply_err(req, -ret);
 	else
-		fuse_reply_attr(req, &fstat.entry.attr,
-				fstat.entry.attr_timeout);
+		fuse_reply_attr_iflags(req, &fstat.entry.attr, fstat.iflags,
+				       fstat.entry.attr_timeout);
 }
 
 static void op_readlink(fuse_req_t req, fuse_ino_t fino)
@@ -2082,7 +2103,7 @@ static void fuse4fs_reply_entry(fuse_req_t req, ext2_ino_t ino,
 		return;
 	}
 
-	fuse_reply_entry(req, &fstat.entry);
+	fuse_reply_entry_iflags(req, &fstat.entry, fstat.iflags);
 }
 
 static void op_mknod(fuse_req_t req, fuse_ino_t fino, const char *name,
@@ -4352,10 +4373,13 @@ static int op_readdir_iter(ext2_ino_t dir EXT2FS_ATTR((unused)),
 	namebuf[dirent->name_len & 0xFF] = 0;
 
 	if (i->readdirplus) {
-		entrysize = fuse_add_direntry_plus(i->req, i->buf + i->bufused,
-						   i->bufsz - i->bufused,
-						   namebuf, &fstat.entry,
-						   i->dirpos);
+		entrysize = fuse_add_direntry_plus_iflags(i->req,
+							  i->buf + i->bufused,
+							  i->bufsz - i->bufused,
+							  namebuf,
+							  fstat.iflags,
+							  &fstat.entry,
+							  i->dirpos);
 	} else {
 		entrysize = fuse_add_direntry(i->req, i->buf + i->bufused,
 					      i->bufsz - i->bufused, namebuf,
@@ -4580,7 +4604,7 @@ static void op_create(fuse_req_t req, fuse_ino_t fino, const char *name,
 	if (ret)
 		fuse_reply_err(req, -ret);
 	else
-		fuse_reply_create(req, &fstat.entry, fp);
+		fuse_reply_create_iflags(req, &fstat.entry, fstat.iflags, fp);
 }
 
 #if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 17)
@@ -4779,8 +4803,8 @@ static void op_setattr(fuse_req_t req, fuse_ino_t fino, struct stat *attr,
 	if (ret)
 		fuse_reply_err(req, -ret);
 	else
-		fuse_reply_attr(req, &fstat.entry.attr,
-				fstat.entry.attr_timeout);
+		fuse_reply_attr_iflags(req, &fstat.entry.attr, fstat.iflags,
+				       fstat.entry.attr_timeout);
 }
 
 #define FUSE4FS_MODIFIABLE_IFLAGS \
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 5dc0b0606112af..32fcada0426752 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -1619,6 +1619,23 @@ static int op_getattr(const char *path, struct stat *statbuf,
 	return ret;
 }
 
+#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 99)
+static int op_getattr_iflags(const char *path, struct stat *statbuf,
+			     unsigned int *iflags, struct fuse_file_info *fi)
+{
+	int ret = op_getattr(path, statbuf, fi);
+
+	if (ret)
+		return ret;
+
+	if (fuse_fs_can_enable_iomap(statbuf))
+		*iflags |= FUSE_IFLAG_IOMAP;
+
+	return 0;
+}
+#endif
+
+
 static int op_readlink(const char *path, char *buf, size_t len)
 {
 	struct fuse2fs *ff = fuse2fs_get();
@@ -6238,6 +6255,9 @@ static struct fuse_operations fs_ops = {
 #ifdef SUPPORT_FALLOCATE
 	.fallocate = op_fallocate,
 #endif
+#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 99)
+	.getattr_iflags = op_getattr_iflags,
+#endif
 #ifdef HAVE_FUSE_IOMAP
 	.iomap_begin = op_iomap_begin,
 	.iomap_end = op_iomap_end,



* [PATCH 14/17] fuse2fs: configure block device block size
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
                     ` (12 preceding siblings ...)
  2025-09-16  1:02   ` [PATCH 13/17] fuse2fs: set iomap-related inode flags Darrick J. Wong
@ 2025-09-16  1:02   ` Darrick J. Wong
  2025-09-16  1:02   ` [PATCH 15/17] fuse4fs: separate invalidation Darrick J. Wong
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:02 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Set the blocksize of the block device to the filesystem blocksize.
This prevents the bdev pagecache from caching file data blocks that
iomap will read and write directly.  Cache duplication is dangerous
because two cached copies of the same disk block can get out of sync.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   43 +++++++++++++++++++++++++++++++++++++++++++
 misc/fuse2fs.c    |   43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 86 insertions(+)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 37999d864a05e5..6f3ddceea85c27 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -6247,6 +6247,45 @@ static off_t fuse4fs_max_size(struct fuse4fs *ff, off_t upper_limit)
 	return res;
 }
 
+/*
+ * Set the block device's blocksize to the fs blocksize.
+ *
+ * This is required to avoid creating uptodate bdev pagecache that aliases file
+ * data blocks because iomap reads and writes directly to file data blocks.
+ */
+static int fuse4fs_set_bdev_blocksize(struct fuse4fs *ff, int fd)
+{
+	int blocksize = ff->fs->blocksize;
+	int set_error;
+	int ret;
+
+	ret = ioctl(fd, BLKBSZSET, &blocksize);
+	if (!ret)
+		return 0;
+
+	/*
+	 * Save the original errno so we can report that if the block device
+	 * blocksize isn't set in an agreeable way.
+	 */
+	set_error = errno;
+
+	ret = ioctl(fd, BLKBSZGET, &blocksize);
+	if (ret)
+		goto out_bad;
+
+	/* Pretend that BLKBSZSET rejected our proposed block size */
+	if (blocksize > ff->fs->blocksize) {
+		set_error = EINVAL;
+		goto out_bad;
+	}
+
+	return 0;
+out_bad:
+	err_printf(ff, "%s: cannot set blocksize %u: %s\n", __func__,
+		   blocksize, strerror(set_error));
+	return -EIO;
+}
+
 static int fuse4fs_iomap_config_devices(struct fuse4fs *ff)
 {
 	errcode_t err;
@@ -6257,6 +6296,10 @@ static int fuse4fs_iomap_config_devices(struct fuse4fs *ff)
 	if (err)
 		return translate_error(ff->fs, 0, err);
 
+	ret = fuse4fs_set_bdev_blocksize(ff, fd);
+	if (ret)
+		return ret;
+
 	ret = fuse_lowlevel_iomap_device_add(ff->fuse, fd, 0);
 	if (ret < 0) {
 		dbg_printf(ff, "%s: cannot register iomap dev fd=%d, err=%d\n",
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 32fcada0426752..9212235495dc22 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -5777,6 +5777,45 @@ static off_t fuse2fs_max_size(struct fuse2fs *ff, off_t upper_limit)
 	return res;
 }
 
+/*
+ * Set the block device's blocksize to the fs blocksize.
+ *
+ * This is required to avoid creating uptodate bdev pagecache that aliases file
+ * data blocks because iomap reads and writes directly to file data blocks.
+ */
+static int fuse2fs_set_bdev_blocksize(struct fuse2fs *ff, int fd)
+{
+	int blocksize = ff->fs->blocksize;
+	int set_error;
+	int ret;
+
+	ret = ioctl(fd, BLKBSZSET, &blocksize);
+	if (!ret)
+		return 0;
+
+	/*
+	 * Save the original errno so we can report that if the block device
+	 * blocksize isn't set in an agreeable way.
+	 */
+	set_error = errno;
+
+	ret = ioctl(fd, BLKBSZGET, &blocksize);
+	if (ret)
+		goto out_bad;
+
+	/* Pretend that BLKBSZSET rejected our proposed block size */
+	if (blocksize > ff->fs->blocksize) {
+		set_error = EINVAL;
+		goto out_bad;
+	}
+
+	return 0;
+out_bad:
+	err_printf(ff, "%s: cannot set blocksize %u: %s\n", __func__,
+		   blocksize, strerror(set_error));
+	return -EIO;
+}
+
 static int fuse2fs_iomap_config_devices(struct fuse2fs *ff)
 {
 	errcode_t err;
@@ -5787,6 +5826,10 @@ static int fuse2fs_iomap_config_devices(struct fuse2fs *ff)
 	if (err)
 		return translate_error(ff->fs, 0, err);
 
+	ret = fuse2fs_set_bdev_blocksize(ff, fd);
+	if (ret)
+		return ret;
+
 	ret = fuse_fs_iomap_device_add(fd, 0);
 	if (ret < 0) {
 		dbg_printf(ff, "%s: cannot register iomap dev fd=%d, err=%d\n",



* [PATCH 15/17] fuse4fs: separate invalidation
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
                     ` (13 preceding siblings ...)
  2025-09-16  1:02   ` [PATCH 14/17] fuse2fs: configure block device block size Darrick J. Wong
@ 2025-09-16  1:02   ` Darrick J. Wong
  2025-09-16  1:02   ` [PATCH 16/17] fuse2fs: implement statx Darrick J. Wong
  2025-09-16  1:03   ` [PATCH 17/17] fuse2fs: enable atomic writes Darrick J. Wong
  16 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:02 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

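Use the new iomap device invalidation calls: hook the libext2fs block
allocation statistics callbacks so that whenever blocks are freed, we
invalidate the matching byte range of the block device's page cache.
Because iomap does all file block IO directly, stale bdev pagecache for
freed blocks must not get written to the block device.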
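
The hook follows a save-and-chain pattern: remember the previously
installed callback, do our own work on frees (inuse < 0), then call the
old callback.  The sketch below models that with hypothetical types
(struct sk_fs standing in for ext2_filsys, a recorder standing in for
the real invalidation call); in the patch, installation goes through
ext2fs_set_block_alloc_stats_callback() and
ext2fs_set_block_alloc_stats_range_callback():

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sk_blk_t;

/* Hypothetical, trimmed-down stand-in for ext2_filsys. */
struct sk_fs {
	unsigned int blocksize;
	void (*alloc_stats_range)(struct sk_fs *fs, sk_blk_t blk,
				  sk_blk_t num, int inuse);
	void *priv_data;
};

/* Hypothetical server state: saved old hook plus recorded invalidation. */
struct sk_server {
	void (*old_alloc_stats_range)(struct sk_fs *fs, sk_blk_t blk,
				      sk_blk_t num, int inuse);
	uint64_t last_inval_offset;
	uint64_t last_inval_length;
	int old_calls;
};

/* Stand-in for the device invalidation call: record the byte range. */
static void sk_invalidate_bdev(struct sk_server *srv, struct sk_fs *fs,
			       sk_blk_t blk, sk_blk_t num)
{
	srv->last_inval_offset = blk * fs->blocksize;
	srv->last_inval_length = num * fs->blocksize;
}

/* Our hook: invalidate on free, then chain to whatever was there. */
static void sk_alloc_stats_range(struct sk_fs *fs, sk_blk_t blk,
				 sk_blk_t num, int inuse)
{
	struct sk_server *srv = fs->priv_data;

	if (inuse < 0)
		sk_invalidate_bdev(srv, fs, blk, num);
	if (srv->old_alloc_stats_range)
		srv->old_alloc_stats_range(fs, blk, num, inuse);
}

/* Example "previous" hook that only counts how often it runs. */
static void sk_previous_hook(struct sk_fs *fs, sk_blk_t blk, sk_blk_t num,
			     int inuse)
{
	struct sk_server *srv = fs->priv_data;

	(void)blk; (void)num; (void)inuse;
	srv->old_calls++;
}

/* Install our hook, remembering the previous one. */
static void sk_install_hook(struct sk_fs *fs, struct sk_server *srv)
{
	assert(fs->priv_data == srv);
	srv->old_alloc_stats_range = fs->alloc_stats_range;
	fs->alloc_stats_range = sk_alloc_stats_range;
}
```

Block numbers become byte ranges by multiplying by the block size; the
patch uses its FSB_TO_B macros for that conversion.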

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   61 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 misc/fuse2fs.c    |   60 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 121 insertions(+)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 6f3ddceea85c27..c633bb9eca068a 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -274,6 +274,9 @@ struct fuse4fs {
 	enum fuse4fs_iomap_state iomap_state;
 	uint32_t iomap_dev;
 	uint64_t iomap_cap;
+	void (*old_alloc_stats)(ext2_filsys fs, blk64_t blk, int inuse);
+	void (*old_alloc_stats_range)(ext2_filsys fs, blk64_t blk, blk_t num,
+				      int inuse);
 #endif
 	unsigned int blockmask;
 	unsigned long offset;
@@ -6314,6 +6317,51 @@ static int fuse4fs_iomap_config_devices(struct fuse4fs *ff)
 	return 0;
 }
 
+static void fuse4fs_invalidate_bdev(struct fuse4fs *ff, blk64_t blk, blk_t num)
+{
+	off_t offset = FUSE4FS_FSB_TO_B(ff, blk);
+	off_t length = FUSE4FS_FSB_TO_B(ff, num);
+	int ret;
+
+	ret = fuse_lowlevel_iomap_device_invalidate(ff->fuse, ff->iomap_dev,
+						    offset, length);
+	if (!ret)
+		return;
+
+	if (num == 1)
+		err_printf(ff, "%s %llu: %s\n",
+			   _("error invalidating block"),
+			   (unsigned long long)blk,
+			   strerror(ret));
+	else
+		err_printf(ff, "%s %llu-%llu: %s\n",
+			   _("error invalidating blocks"),
+			   (unsigned long long)blk,
+			   (unsigned long long)blk + num - 1,
+			   strerror(ret));
+}
+
+static void fuse4fs_alloc_stats(ext2_filsys fs, blk64_t blk, int inuse)
+{
+	struct fuse4fs *ff = fs->priv_data;
+
+	if (inuse < 0)
+		fuse4fs_invalidate_bdev(ff, blk, 1);
+	if (ff->old_alloc_stats)
+		ff->old_alloc_stats(fs, blk, inuse);
+}
+
+static void fuse4fs_alloc_stats_range(ext2_filsys fs, blk64_t blk, blk_t num,
+				      int inuse)
+{
+	struct fuse4fs *ff = fs->priv_data;
+
+	if (inuse < 0)
+		fuse4fs_invalidate_bdev(ff, blk, num);
+	if (ff->old_alloc_stats_range)
+		ff->old_alloc_stats_range(fs, blk, num, inuse);
+}
+
 static void op_iomap_config(fuse_req_t req, uint64_t flags, uint64_t maxbytes)
 {
 	struct fuse_iomap_config cfg = { };
@@ -6358,6 +6406,19 @@ static void op_iomap_config(fuse_req_t req, uint64_t flags, uint64_t maxbytes)
 	if (ret)
 		goto out_unlock;
 
+	/*
+	 * If we let iomap do all file block IO, then we need to watch for
+	 * freed blocks so that we can invalidate any page cache that might
+	 * get written to the block device.
+	 */
+	if (fuse4fs_iomap_enabled(ff)) {
+		ext2fs_set_block_alloc_stats_callback(ff->fs,
+				fuse4fs_alloc_stats, &ff->old_alloc_stats);
+		ext2fs_set_block_alloc_stats_range_callback(ff->fs,
+				fuse4fs_alloc_stats_range,
+				&ff->old_alloc_stats_range);
+	}
+
 out_unlock:
 	fuse4fs_finish(ff, ret);
 	if (ret)
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 9212235495dc22..1567c2e72279c2 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -268,6 +268,9 @@ struct fuse2fs {
 	enum fuse2fs_iomap_state iomap_state;
 	uint32_t iomap_dev;
 	uint64_t iomap_cap;
+	void (*old_alloc_stats)(ext2_filsys fs, blk64_t blk, int inuse);
+	void (*old_alloc_stats_range)(ext2_filsys fs, blk64_t blk, blk_t num,
+				      int inuse);
 #endif
 	unsigned int blockmask;
 	unsigned long offset;
@@ -5844,6 +5847,50 @@ static int fuse2fs_iomap_config_devices(struct fuse2fs *ff)
 	return 0;
 }
 
+static void fuse2fs_invalidate_bdev(struct fuse2fs *ff, blk64_t blk, blk_t num)
+{
+	off_t offset = FUSE2FS_FSB_TO_B(ff, blk);
+	off_t length = FUSE2FS_FSB_TO_B(ff, num);
+	int ret;
+
+	ret = fuse_fs_iomap_device_invalidate(ff->iomap_dev, offset, length);
+	if (!ret)
+		return;
+
+	if (num == 1)
+		err_printf(ff, "%s %llu: %s\n",
+			   _("error invalidating block"),
+			   (unsigned long long)blk,
+			   strerror(ret));
+	else
+		err_printf(ff, "%s %llu-%llu: %s\n",
+			   _("error invalidating blocks"),
+			   (unsigned long long)blk,
+			   (unsigned long long)blk + num - 1,
+			   strerror(ret));
+}
+
+static void fuse2fs_alloc_stats(ext2_filsys fs, blk64_t blk, int inuse)
+{
+	struct fuse2fs *ff = fs->priv_data;
+
+	if (inuse < 0)
+		fuse2fs_invalidate_bdev(ff, blk, 1);
+	if (ff->old_alloc_stats)
+		ff->old_alloc_stats(fs, blk, inuse);
+}
+
+static void fuse2fs_alloc_stats_range(ext2_filsys fs, blk64_t blk, blk_t num,
+				      int inuse)
+{
+	struct fuse2fs *ff = fs->priv_data;
+
+	if (inuse < 0)
+		fuse2fs_invalidate_bdev(ff, blk, num);
+	if (ff->old_alloc_stats_range)
+		ff->old_alloc_stats_range(fs, blk, num, inuse);
+}
+
 static int op_iomap_config(uint64_t flags, off_t maxbytes,
 			   struct fuse_iomap_config *cfg)
 {
@@ -5888,6 +5935,19 @@ static int op_iomap_config(uint64_t flags, off_t maxbytes,
 	if (ret)
 		goto out_unlock;
 
+	/*
+	 * If we let iomap do all file block IO, then we need to watch for
+	 * freed blocks so that we can invalidate any page cache that might
+	 * get written to the block device.
+	 */
+	if (fuse2fs_iomap_enabled(ff)) {
+		ext2fs_set_block_alloc_stats_callback(ff->fs,
+				fuse2fs_alloc_stats, &ff->old_alloc_stats);
+		ext2fs_set_block_alloc_stats_range_callback(ff->fs,
+				fuse2fs_alloc_stats_range,
+				&ff->old_alloc_stats_range);
+	}
+
 out_unlock:
 	fuse2fs_finish(ff, ret);
 	return ret;



* [PATCH 16/17] fuse2fs: implement statx
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
                     ` (14 preceding siblings ...)
  2025-09-16  1:02   ` [PATCH 15/17] fuse4fs: separate invalidation Darrick J. Wong
@ 2025-09-16  1:02   ` Darrick J. Wong
  2025-09-16  1:03   ` [PATCH 17/17] fuse2fs: enable atomic writes Darrick J. Wong
  16 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:02 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

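Implement statx so that callers can retrieve the birth time, the
compressed/immutable/append/nodump attribute flags, and the direct I/O
alignment of the underlying block device.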

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |  133 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 misc/fuse2fs.c    |  128 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 261 insertions(+)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index c633bb9eca068a..6c3e2992a04211 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -24,6 +24,7 @@
 #include <sys/xattr.h>
 #endif
 #include <sys/ioctl.h>
+#include <sys/sysmacros.h>
 #include <unistd.h>
 #include <ctype.h>
 #include <stdbool.h>
@@ -1831,6 +1832,135 @@ static void op_getattr(fuse_req_t req, fuse_ino_t fino,
 				       fstat.entry.attr_timeout);
 }
 
+#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 18) && defined(STATX_BASIC_STATS)
+static inline void fuse4fs_set_statx_attr(struct statx *stx,
+					  uint64_t statx_flag, int set)
+{
+	if (set)
+		stx->stx_attributes |= statx_flag;
+	stx->stx_attributes_mask |= statx_flag;
+}
+
+static void fuse4fs_statx_directio(struct fuse4fs *ff, struct statx *stx)
+{
+	struct statx devx;
+	errcode_t err;
+	int fd;
+
+	err = io_channel_get_fd(ff->fs->io, &fd);
+	if (err)
+		return;
+
+	err = statx(fd, "", AT_EMPTY_PATH, STATX_DIOALIGN, &devx);
+	if (err)
+		return;
+	if (!(devx.stx_mask & STATX_DIOALIGN))
+		return;
+
+	stx->stx_mask |= STATX_DIOALIGN;
+	stx->stx_dio_mem_align = devx.stx_dio_mem_align;
+	stx->stx_dio_offset_align = devx.stx_dio_offset_align;
+}
+
+static int fuse4fs_statx(struct fuse4fs *ff, ext2_ino_t ino, int statx_mask,
+			 struct statx *stx)
+{
+	struct ext2_inode_large inode;
+	ext2_filsys fs = ff->fs;
+	dev_t fakedev = 0;
+	errcode_t err;
+	struct timespec tv;
+
+	err = fuse4fs_read_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	memcpy(&fakedev, fs->super->s_uuid, sizeof(fakedev));
+	stx->stx_mask = STATX_BASIC_STATS | STATX_BTIME;
+	stx->stx_dev_major = major(fakedev);
+	stx->stx_dev_minor = minor(fakedev);
+	stx->stx_ino = ino;
+	stx->stx_mode = inode.i_mode;
+	stx->stx_nlink = inode.i_links_count;
+	stx->stx_uid = inode_uid(inode);
+	stx->stx_gid = inode_gid(inode);
+	stx->stx_size = EXT2_I_SIZE(&inode);
+	stx->stx_blksize = fs->blocksize;
+	stx->stx_blocks = ext2fs_get_stat_i_blocks(fs,
+						EXT2_INODE(&inode));
+	EXT4_INODE_GET_XTIME(i_atime, &tv, &inode);
+	stx->stx_atime.tv_sec = tv.tv_sec;
+	stx->stx_atime.tv_nsec = tv.tv_nsec;
+
+	EXT4_INODE_GET_XTIME(i_mtime, &tv, &inode);
+	stx->stx_mtime.tv_sec = tv.tv_sec;
+	stx->stx_mtime.tv_nsec = tv.tv_nsec;
+
+	EXT4_INODE_GET_XTIME(i_ctime, &tv, &inode);
+	stx->stx_ctime.tv_sec = tv.tv_sec;
+	stx->stx_ctime.tv_nsec = tv.tv_nsec;
+
+	EXT4_INODE_GET_XTIME(i_crtime, &tv, &inode);
+	stx->stx_btime.tv_sec = tv.tv_sec;
+	stx->stx_btime.tv_nsec = tv.tv_nsec;
+
+	dbg_printf(ff, "%s: ino=%u atime=%lld.%u mtime=%lld.%u ctime=%lld.%u btime=%lld.%u\n",
+		   __func__, ino,
+		   (long long int)stx->stx_atime.tv_sec, stx->stx_atime.tv_nsec,
+		   (long long int)stx->stx_mtime.tv_sec, stx->stx_mtime.tv_nsec,
+		   (long long int)stx->stx_ctime.tv_sec, stx->stx_ctime.tv_nsec,
+		   (long long int)stx->stx_btime.tv_sec, stx->stx_btime.tv_nsec);
+
+	if (LINUX_S_ISCHR(inode.i_mode) ||
+	    LINUX_S_ISBLK(inode.i_mode)) {
+		if (inode.i_block[0]) {
+			stx->stx_rdev_major = major(inode.i_block[0]);
+			stx->stx_rdev_minor = minor(inode.i_block[0]);
+		} else {
+			stx->stx_rdev_major = major(inode.i_block[1]);
+			stx->stx_rdev_minor = minor(inode.i_block[1]);
+		}
+	}
+
+	fuse4fs_set_statx_attr(stx, STATX_ATTR_COMPRESSED,
+			       inode.i_flags & EXT2_COMPR_FL);
+	fuse4fs_set_statx_attr(stx, STATX_ATTR_IMMUTABLE,
+			       inode.i_flags & EXT2_IMMUTABLE_FL);
+	fuse4fs_set_statx_attr(stx, STATX_ATTR_APPEND,
+			       inode.i_flags & EXT2_APPEND_FL);
+	fuse4fs_set_statx_attr(stx, STATX_ATTR_NODUMP,
+			       inode.i_flags & EXT2_NODUMP_FL);
+
+	fuse4fs_statx_directio(ff, stx);
+
+	return 0;
+}
+
+static void op_statx(fuse_req_t req, fuse_ino_t fino, int flags, int mask,
+		     struct fuse_file_info *fi)
+{
+	struct statx stx = { 0 };
+	struct fuse4fs *ff = fuse4fs_get(req);
+	ext2_ino_t ino;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(req);
+	FUSE4FS_CONVERT_FINO(req, &ino, fino);
+	fuse4fs_start(ff);
+	ret = fuse4fs_statx(ff, ino, mask, &stx);
+	if (ret)
+		goto out;
+out:
+	fuse4fs_finish(ff, ret);
+	if (ret)
+		fuse_reply_err(req, -ret);
+	else
+		fuse_reply_statx(req, 0, &stx, FUSE4FS_ATTR_TIMEOUT);
+}
+#else
+# define op_statx		NULL
+#endif
+
 static void op_readlink(fuse_req_t req, fuse_ino_t fino)
 {
 	struct ext2_inode inode;
@@ -6834,6 +6964,9 @@ static struct fuse_lowlevel_ops fs_ops = {
 #ifdef SUPPORT_FALLOCATE
 	.fallocate = op_fallocate,
 #endif
+#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 18)
+	.statx = op_statx,
+#endif
 #ifdef HAVE_FUSE_IOMAP
 	.iomap_begin = op_iomap_begin,
 	.iomap_end = op_iomap_end,
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 1567c2e72279c2..d6bf7357653acd 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -23,6 +23,7 @@
 #include <sys/xattr.h>
 #endif
 #include <sys/ioctl.h>
+#include <sys/sysmacros.h>
 #include <unistd.h>
 #include <ctype.h>
 #include <stdbool.h>
@@ -1638,6 +1639,130 @@ static int op_getattr_iflags(const char *path, struct stat *statbuf,
 }
 #endif
 
+#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 18) && defined(STATX_BASIC_STATS)
+static inline void fuse2fs_set_statx_attr(struct statx *stx,
+					  uint64_t statx_flag, int set)
+{
+	if (set)
+		stx->stx_attributes |= statx_flag;
+	stx->stx_attributes_mask |= statx_flag;
+}
+
+static void fuse2fs_statx_directio(struct fuse2fs *ff, struct statx *stx)
+{
+	struct statx devx;
+	errcode_t err;
+	int fd;
+
+	err = io_channel_get_fd(ff->fs->io, &fd);
+	if (err)
+		return;
+
+	err = statx(fd, "", AT_EMPTY_PATH, STATX_DIOALIGN, &devx);
+	if (err)
+		return;
+	if (!(devx.stx_mask & STATX_DIOALIGN))
+		return;
+
+	stx->stx_mask |= STATX_DIOALIGN;
+	stx->stx_dio_mem_align = devx.stx_dio_mem_align;
+	stx->stx_dio_offset_align = devx.stx_dio_offset_align;
+}
+
+static int fuse2fs_statx(struct fuse2fs *ff, ext2_ino_t ino, int statx_mask,
+			 struct statx *stx)
+{
+	struct ext2_inode_large inode;
+	ext2_filsys fs = ff->fs;
+	dev_t fakedev = 0;
+	errcode_t err;
+	struct timespec tv;
+
+	err = fuse2fs_read_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	memcpy(&fakedev, fs->super->s_uuid, sizeof(fakedev));
+	stx->stx_mask = STATX_BASIC_STATS | STATX_BTIME;
+	stx->stx_dev_major = major(fakedev);
+	stx->stx_dev_minor = minor(fakedev);
+	stx->stx_ino = ino;
+	stx->stx_mode = inode.i_mode;
+	stx->stx_nlink = inode.i_links_count;
+	stx->stx_uid = inode_uid(inode);
+	stx->stx_gid = inode_gid(inode);
+	stx->stx_size = EXT2_I_SIZE(&inode);
+	stx->stx_blksize = fs->blocksize;
+	stx->stx_blocks = ext2fs_get_stat_i_blocks(fs,
+						EXT2_INODE(&inode));
+	EXT4_INODE_GET_XTIME(i_atime, &tv, &inode);
+	stx->stx_atime.tv_sec = tv.tv_sec;
+	stx->stx_atime.tv_nsec = tv.tv_nsec;
+
+	EXT4_INODE_GET_XTIME(i_mtime, &tv, &inode);
+	stx->stx_mtime.tv_sec = tv.tv_sec;
+	stx->stx_mtime.tv_nsec = tv.tv_nsec;
+
+	EXT4_INODE_GET_XTIME(i_ctime, &tv, &inode);
+	stx->stx_ctime.tv_sec = tv.tv_sec;
+	stx->stx_ctime.tv_nsec = tv.tv_nsec;
+
+	EXT4_INODE_GET_XTIME(i_crtime, &tv, &inode);
+	stx->stx_btime.tv_sec = tv.tv_sec;
+	stx->stx_btime.tv_nsec = tv.tv_nsec;
+
+	dbg_printf(ff, "%s: ino=%u atime=%lld.%u mtime=%lld.%u ctime=%lld.%u btime=%lld.%u\n",
+		   __func__, ino,
+		   (long long int)stx->stx_atime.tv_sec, stx->stx_atime.tv_nsec,
+		   (long long int)stx->stx_mtime.tv_sec, stx->stx_mtime.tv_nsec,
+		   (long long int)stx->stx_ctime.tv_sec, stx->stx_ctime.tv_nsec,
+		   (long long int)stx->stx_btime.tv_sec, stx->stx_btime.tv_nsec);
+
+	if (LINUX_S_ISCHR(inode.i_mode) ||
+	    LINUX_S_ISBLK(inode.i_mode)) {
+		if (inode.i_block[0]) {
+			stx->stx_rdev_major = major(inode.i_block[0]);
+			stx->stx_rdev_minor = minor(inode.i_block[0]);
+		} else {
+			stx->stx_rdev_major = major(inode.i_block[1]);
+			stx->stx_rdev_minor = minor(inode.i_block[1]);
+		}
+	}
+
+	fuse2fs_set_statx_attr(stx, STATX_ATTR_COMPRESSED,
+			       inode.i_flags & EXT2_COMPR_FL);
+	fuse2fs_set_statx_attr(stx, STATX_ATTR_IMMUTABLE,
+			       inode.i_flags & EXT2_IMMUTABLE_FL);
+	fuse2fs_set_statx_attr(stx, STATX_ATTR_APPEND,
+			       inode.i_flags & EXT2_APPEND_FL);
+	fuse2fs_set_statx_attr(stx, STATX_ATTR_NODUMP,
+			       inode.i_flags & EXT2_NODUMP_FL);
+
+	fuse2fs_statx_directio(ff, stx);
+
+	return 0;
+}
+
+static int op_statx(const char *path, int statx_flags, int statx_mask,
+		    struct statx *stx, struct fuse_file_info *fi)
+{
+	struct fuse2fs *ff = fuse2fs_get();
+	ext2_ino_t ino;
+	int ret = 0;
+
+	FUSE2FS_CHECK_CONTEXT(ff);
+	fuse2fs_start(ff);
+	ret = fuse2fs_file_ino(ff, path, fi, &ino);
+	if (ret)
+		goto out;
+	ret = fuse2fs_statx(ff, ino, statx_mask, stx);
+out:
+	fuse2fs_finish(ff, ret);
+	return ret;
+}
+#else
+# define op_statx		NULL
+#endif
 
 static int op_readlink(const char *path, char *buf, size_t len)
 {
@@ -6358,6 +6483,9 @@ static struct fuse_operations fs_ops = {
 #ifdef SUPPORT_FALLOCATE
 	.fallocate = op_fallocate,
 #endif
+#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 18)
+	.statx = op_statx,
+#endif
 #if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 99)
 	.getattr_iflags = op_getattr_iflags,
 #endif
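The attribute helper sets a bit in `stx_attributes_mask` whenever the filesystem can report that attribute. It sets the same bit in `stx_attributes` only when the attribute is actually present on the inode.

The sketch below isolates that mapping using hypothetical names (`struct attr_report`, `SX_ATTR_*`, `E2_*_FL`). The numeric values are copied from the kernel's `STATX_ATTR_*` and ext2's `EXT2_*_FL` definitions, which coincide for these four flags.

```c
#include <stdint.h>

/* Same values as STATX_ATTR_* in <linux/stat.h>. */
#define SX_ATTR_COMPRESSED	0x00000004ULL
#define SX_ATTR_IMMUTABLE	0x00000010ULL
#define SX_ATTR_APPEND		0x00000020ULL
#define SX_ATTR_NODUMP		0x00000040ULL

/* Same values as the ext2 EXT2_*_FL inode flags. */
#define E2_COMPR_FL		0x00000004U
#define E2_IMMUTABLE_FL		0x00000010U
#define E2_APPEND_FL		0x00000020U
#define E2_NODUMP_FL		0x00000040U

struct attr_report {
	uint64_t attributes;		/* attributes set on this file */
	uint64_t attributes_mask;	/* attributes the fs can report */
};

static void set_attr(struct attr_report *r, uint64_t flag, int set)
{
	if (set)
		r->attributes |= flag;
	/* Supported (and therefore meaningful) even when not set. */
	r->attributes_mask |= flag;
}

static void report_from_iflags(struct attr_report *r, uint32_t i_flags)
{
	r->attributes = 0;
	r->attributes_mask = 0;
	set_attr(r, SX_ATTR_COMPRESSED, (i_flags & E2_COMPR_FL) != 0);
	set_attr(r, SX_ATTR_IMMUTABLE, (i_flags & E2_IMMUTABLE_FL) != 0);
	set_attr(r, SX_ATTR_APPEND, (i_flags & E2_APPEND_FL) != 0);
	set_attr(r, SX_ATTR_NODUMP, (i_flags & E2_NODUMP_FL) != 0);
}
```

A caller can therefore tell "not immutable" (bit clear in `attributes`, set in the mask) apart from "unknown" (bit clear in both).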



* [PATCH 17/17] fuse2fs: enable atomic writes
  2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
                     ` (15 preceding siblings ...)
  2025-09-16  1:02   ` [PATCH 16/17] fuse2fs: implement statx Darrick J. Wong
@ 2025-09-16  1:03   ` Darrick J. Wong
  16 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:03 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Advertise support for single-fsblock atomic writes, which iomap can
perform.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   67 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 misc/fuse2fs.c    |   68 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 133 insertions(+), 2 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 6c3e2992a04211..abbc67bccef786 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -278,6 +278,9 @@ struct fuse4fs {
 	void (*old_alloc_stats)(ext2_filsys fs, blk64_t blk, int inuse);
 	void (*old_alloc_stats_range)(ext2_filsys fs, blk64_t blk, blk_t num,
 				      int inuse);
+#ifdef STATX_WRITE_ATOMIC
+	unsigned int awu_min, awu_max;
+#endif
 #endif
 	unsigned int blockmask;
 	unsigned long offset;
@@ -736,8 +739,20 @@ static inline int fuse4fs_iomap_enabled(const struct fuse4fs *ff)
 {
 	return ff->iomap_state >= IOMAP_ENABLED;
 }
+
+static inline int fuse4fs_iomap_can_hw_atomic(const struct fuse4fs *ff)
+{
+	return fuse4fs_iomap_enabled(ff) &&
+	       (ff->iomap_cap & FUSE_IOMAP_SUPPORT_ATOMIC) &&
+#ifdef STATX_WRITE_ATOMIC
+		ff->awu_min > 0 && ff->awu_max > 0;
+#else
+		0;
+#endif
+}
 #else
 # define fuse4fs_iomap_enabled(...)	(0)
+# define fuse4fs_iomap_can_hw_atomic(...)	(0)
 #endif
 
 static inline void fuse4fs_dump_extents(struct fuse4fs *ff, ext2_ino_t ino,
@@ -1757,8 +1772,12 @@ static int fuse4fs_stat_inode(struct fuse4fs *ff, ext2_ino_t ino,
 
 	fstat->iflags = 0;
 #ifdef HAVE_FUSE_IOMAP
-	if (fuse4fs_iomap_enabled(ff))
+	if (fuse4fs_iomap_enabled(ff)) {
 		fstat->iflags |= FUSE_IFLAG_IOMAP;
+
+		if (fuse4fs_iomap_can_hw_atomic(ff))
+			fstat->iflags |= FUSE_IFLAG_ATOMIC;
+	}
 #endif
 
 	return 0;
@@ -1933,6 +1952,15 @@ static int fuse4fs_statx(struct fuse4fs *ff, ext2_ino_t ino, int statx_mask,
 
 	fuse4fs_statx_directio(ff, stx);
 
+#ifdef STATX_WRITE_ATOMIC
+	if (fuse4fs_iomap_can_hw_atomic(ff)) {
+		stx->stx_mask |= STATX_WRITE_ATOMIC;
+		stx->stx_atomic_write_unit_min = ff->awu_min;
+		stx->stx_atomic_write_unit_max = ff->awu_max;
+		stx->stx_atomic_write_segments_max = 1;
+	}
+#endif
+
 	return 0;
 }
 
@@ -6255,6 +6283,9 @@ static void op_iomap_begin(fuse_req_t req, fuse_ino_t fino, uint64_t dontcare,
 		}
 	}
 
+	if (opflags & FUSE_IOMAP_OP_ATOMIC)
+		read.flags |= FUSE_IOMAP_F_ATOMIC_BIO;
+
 out_unlock:
 	fuse4fs_finish(ff, ret);
 	if (ret)
@@ -6419,6 +6450,38 @@ static int fuse4fs_set_bdev_blocksize(struct fuse4fs *ff, int fd)
 	return -EIO;
 }
 
+#ifdef STATX_WRITE_ATOMIC
+static void fuse4fs_configure_atomic_write(struct fuse4fs *ff, int bdev_fd)
+{
+	struct statx devx;
+	unsigned int awu_min, awu_max;
+	int ret;
+
+	if (!ext2fs_has_feature_extents(ff->fs->super))
+		return;
+
+	ret = statx(bdev_fd, "", AT_EMPTY_PATH, STATX_WRITE_ATOMIC, &devx);
+	if (ret)
+		return;
+	if (!(devx.stx_mask & STATX_WRITE_ATOMIC))
+		return;
+
+	awu_min = max(ff->fs->blocksize, devx.stx_atomic_write_unit_min);
+	awu_max = min(ff->fs->blocksize, devx.stx_atomic_write_unit_max);
+	if (awu_min > awu_max)
+		return;
+
+	log_printf(ff, "%s awu_min: %u, awu_max: %u\n",
+		   _("Supports (experimental) DIO atomic writes"),
+		   awu_min, awu_max);
+
+	ff->awu_min = awu_min;
+	ff->awu_max = awu_max;
+}
+#else
+# define fuse4fs_configure_atomic_write(...)	((void)0)
+#endif
+
 static int fuse4fs_iomap_config_devices(struct fuse4fs *ff)
 {
 	errcode_t err;
@@ -6443,6 +6506,8 @@ static int fuse4fs_iomap_config_devices(struct fuse4fs *ff)
 	dbg_printf(ff, "%s: registered iomap dev fd=%d iomap_dev=%u\n",
 		   __func__, fd, ff->iomap_dev);
 
+	fuse4fs_configure_atomic_write(ff, fd);
+
 	ff->iomap_dev = ret;
 	return 0;
 }
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index d6bf7357653acd..0832a758bdad79 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -272,6 +272,9 @@ struct fuse2fs {
 	void (*old_alloc_stats)(ext2_filsys fs, blk64_t blk, int inuse);
 	void (*old_alloc_stats_range)(ext2_filsys fs, blk64_t blk, blk_t num,
 				      int inuse);
+#ifdef STATX_WRITE_ATOMIC
+	unsigned int awu_min, awu_max;
+#endif
 #endif
 	unsigned int blockmask;
 	unsigned long offset;
@@ -573,9 +576,21 @@ static inline int fuse2fs_iomap_enabled(const struct fuse2fs *ff)
 {
 	return ff->iomap_state >= IOMAP_ENABLED;
 }
+
+static inline int fuse2fs_iomap_can_hw_atomic(const struct fuse2fs *ff)
+{
+	return fuse2fs_iomap_enabled(ff) &&
+	       (ff->iomap_cap & FUSE_IOMAP_SUPPORT_ATOMIC) &&
+#ifdef STATX_WRITE_ATOMIC
+		ff->awu_min > 0 && ff->awu_max > 0;
+#else
+		0;
+#endif
+}
 #else
 # define fuse2fs_iomap_enabled(...)	(0)
 # define fuse2fs_iomap_enabled(...)	(0)
+# define fuse2fs_iomap_can_hw_atomic(...)	(0)
 #endif
 
 static inline void fuse2fs_dump_extents(struct fuse2fs *ff, ext2_ino_t ino,
@@ -1627,14 +1642,19 @@ static int op_getattr(const char *path, struct stat *statbuf,
 static int op_getattr_iflags(const char *path, struct stat *statbuf,
 			     unsigned int *iflags, struct fuse_file_info *fi)
 {
+	struct fuse2fs *ff = fuse2fs_get();
 	int ret = op_getattr(path, statbuf, fi);
 
 	if (ret)
 		return ret;
 
-	if (fuse_fs_can_enable_iomap(statbuf))
+	if (fuse_fs_can_enable_iomap(statbuf)) {
 		*iflags |= FUSE_IFLAG_IOMAP;
 
+		if (fuse2fs_iomap_can_hw_atomic(ff))
+			*iflags |= FUSE_IFLAG_ATOMIC;
+	}
+
 	return 0;
 }
 #endif
@@ -1740,6 +1760,15 @@ static int fuse2fs_statx(struct fuse2fs *ff, ext2_ino_t ino, int statx_mask,
 
 	fuse2fs_statx_directio(ff, stx);
 
+#ifdef STATX_WRITE_ATOMIC
+	if (fuse_fs_can_enable_iomapx(stx) && fuse2fs_iomap_can_hw_atomic(ff)) {
+		stx->stx_mask |= STATX_WRITE_ATOMIC;
+		stx->stx_atomic_write_unit_min = ff->awu_min;
+		stx->stx_atomic_write_unit_max = ff->awu_max;
+		stx->stx_atomic_write_segments_max = 1;
+	}
+#endif
+
 	return 0;
 }
 
@@ -5783,6 +5812,9 @@ static int op_iomap_begin(const char *path, uint64_t nodeid, uint64_t attr_ino,
 		}
 	}
 
+	if (opflags & FUSE_IOMAP_OP_ATOMIC)
+		read->flags |= FUSE_IOMAP_F_ATOMIC_BIO;
+
 out_unlock:
 	fuse2fs_finish(ff, ret);
 	return ret;
@@ -5944,6 +5976,38 @@ static int fuse2fs_set_bdev_blocksize(struct fuse2fs *ff, int fd)
 	return -EIO;
 }
 
+#ifdef STATX_WRITE_ATOMIC
+static void fuse2fs_configure_atomic_write(struct fuse2fs *ff, int bdev_fd)
+{
+	struct statx devx;
+	unsigned int awu_min, awu_max;
+	int ret;
+
+	if (!ext2fs_has_feature_extents(ff->fs->super))
+		return;
+
+	ret = statx(bdev_fd, "", AT_EMPTY_PATH, STATX_WRITE_ATOMIC, &devx);
+	if (ret)
+		return;
+	if (!(devx.stx_mask & STATX_WRITE_ATOMIC))
+		return;
+
+	awu_min = max(ff->fs->blocksize, devx.stx_atomic_write_unit_min);
+	awu_max = min(ff->fs->blocksize, devx.stx_atomic_write_unit_max);
+	if (awu_min > awu_max)
+		return;
+
+	log_printf(ff, "%s awu_min: %u, awu_max: %u\n",
+		   _("Supports (experimental) DIO atomic writes"),
+		   awu_min, awu_max);
+
+	ff->awu_min = awu_min;
+	ff->awu_max = awu_max;
+}
+#else
+# define fuse2fs_configure_atomic_write(...)	((void)0)
+#endif
+
 static int fuse2fs_iomap_config_devices(struct fuse2fs *ff)
 {
 	errcode_t err;
@@ -5968,6 +6032,8 @@ static int fuse2fs_iomap_config_devices(struct fuse2fs *ff)
 	dbg_printf(ff, "%s: registered iomap dev fd=%d iomap_dev=%u\n",
 		   __func__, fd, ff->iomap_dev);
 
+	fuse2fs_configure_atomic_write(ff, fd);
+
 	ff->iomap_dev = ret;
 	return 0;
 }
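Here is a sketch of the unit clamping in the `configure_atomic_write` helpers, plus the `can_hw_atomic` predicate. It uses hypothetical names (`struct awu`, `clamp_atomic_units()`, `can_hw_atomic()`).

Only single-fsblock atomic writes are advertised, so both units collapse to the filesystem block size. The device must therefore accept atomic writes of exactly that size. The predicate requires both units to be nonzero.

```c
#include <stdbool.h>

struct awu {
	unsigned int min, max;
};

static unsigned int umax_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

static unsigned int umin_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Intersect [blocksize, blocksize] with the device's [dev_min, dev_max].
 * Returns false (and leaves *out untouched) if the ranges do not overlap.
 */
static bool clamp_atomic_units(unsigned int blocksize, unsigned int dev_min,
			       unsigned int dev_max, struct awu *out)
{
	unsigned int lo = umax_u(blocksize, dev_min);
	unsigned int hi = umin_u(blocksize, dev_max);

	if (lo > hi)
		return false;
	out->min = lo;
	out->max = hi;
	return true;
}

static bool can_hw_atomic(bool iomap_enabled, bool kernel_atomic,
			  const struct awu *u)
{
	return iomap_enabled && kernel_atomic && u->min > 0 && u->max > 0;
}
```

For example, with a 4096-byte block size a device advertising 512..65536 yields 4096..4096. A device whose minimum is 8192, or whose maximum is 2048, yields no atomic write support at all.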



* [PATCH 1/1] fuse4fs: don't use inode number translation when possible
  2025-09-16  0:22 ` [PATCHSET RFC v5 5/9] fuse4fs: specify the root node id Darrick J. Wong
@ 2025-09-16  1:03   ` Darrick J. Wong
  0 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:03 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Prior to the integration of iomap into fuse, the fuse client (aka the
kernel) required that the root directory have an inumber of
FUSE_ROOT_ID, which is 1.  However, the ext2 filesystem defines the root
inode number to be EXT2_ROOT_INO, which is 2.  This dissonance means
that we need translation functions, and that any access to inumber 1
(the ext2 badblocks file) will instead redirect to the root directory.

That's horrible.  Use the new root_nodeid mount option to set the root
directory nodeid to EXT2_ROOT_INO so that we don't need this translation.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index abbc67bccef786..3be19f59fc3976 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -266,6 +266,7 @@ struct fuse4fs {
 	int dirsync;
 	int unmount_in_destroy;
 	int noblkdev;
+	int translate_inums;
 
 	enum fuse4fs_opstate opstate;
 	int logfd;
@@ -335,17 +336,19 @@ struct fuse4fs {
 #define FUSE4FS_CHECK_CONTEXT_ABORT(ff) \
 	__FUSE4FS_CHECK_CONTEXT((ff), abort(), abort())
 
-static inline void fuse4fs_ino_from_fuse(ext2_ino_t *inop, fuse_ino_t fino)
+static inline void fuse4fs_ino_from_fuse(const struct fuse4fs *ff,
+					 ext2_ino_t *inop, fuse_ino_t fino)
 {
-	if (fino == FUSE_ROOT_ID)
+	if (ff->translate_inums && fino == FUSE_ROOT_ID)
 		*inop = EXT2_ROOT_INO;
 	else
 		*inop = fino;
 }
 
-static inline void fuse4fs_ino_to_fuse(fuse_ino_t *finop, ext2_ino_t ino)
+static inline void fuse4fs_ino_to_fuse(const struct fuse4fs *ff,
+				       fuse_ino_t *finop, ext2_ino_t ino)
 {
-	if (ino == EXT2_ROOT_INO)
+	if (ff->translate_inums && ino == EXT2_ROOT_INO)
 		*finop = FUSE_ROOT_ID;
 	else
 		*finop = ino;
@@ -361,7 +364,7 @@ static inline void fuse4fs_ino_to_fuse(fuse_ino_t *finop, ext2_ino_t ino)
 			fuse_reply_err((req), EIO); \
 			return; \
 		} \
-		fuse4fs_ino_from_fuse(ext2_inop, fuse_ino); \
+		fuse4fs_ino_from_fuse(fuse4fs_get(req), ext2_inop, fuse_ino); \
 	} while (0)
 
 static int __translate_error(ext2_filsys fs, ext2_ino_t ino, errcode_t err,
@@ -1765,7 +1768,7 @@ static int fuse4fs_stat_inode(struct fuse4fs *ff, ext2_ino_t ino,
 			statbuf->st_rdev = inodep->i_block[1];
 	}
 
-	fuse4fs_ino_to_fuse(&entry->ino, ino);
+	fuse4fs_ino_to_fuse(ff, &entry->ino, ino);
 	entry->generation = inodep->i_generation;
 	entry->attr_timeout = FUSE4FS_ATTR_TIMEOUT;
 	entry->entry_timeout = FUSE4FS_ATTR_TIMEOUT;
@@ -7410,6 +7413,7 @@ int main(int argc, char *argv[])
 		.iomap_state = IOMAP_UNKNOWN,
 		.iomap_dev = FUSE_IOMAP_DEV_NULL,
 #endif
+		.translate_inums = 1,
 	};
 	errcode_t err;
 	FILE *orig_stderr = stderr;
@@ -7511,6 +7515,19 @@ int main(int argc, char *argv[])
 		fctx.unmount_in_destroy = 1;
 	}
 
+	if (iomap_detected) {
+		/*
+		 * The root_nodeid mount option was added when iomap support
+		 * was added to fuse.  It lets us set the root nodeid in the
+		 * kernel, which allows a 1:1 mapping of ext2 inode numbers to
+		 * kernel nodeids.
+		 */
+		snprintf(extra_args, BUFSIZ, "-oroot_nodeid=%d",
+			 EXT2_ROOT_INO);
+		fuse_opt_add_arg(&args, extra_args);
+		fctx.translate_inums = 0;
+	}
+
 	if (!fctx.cache_size)
 		fctx.cache_size = default_cache_size();
 	if (fctx.cache_size) {
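This sketch makes the badblocks problem concrete. It restates `fuse4fs_ino_from_fuse()` and `fuse4fs_ino_to_fuse()` as pure functions that take the translate flag directly; the real functions read `ff->translate_inums`. The function and macro names here are hypothetical.

- With translation on, inode 1 maps out to nodeid 1 and back in to inode 2, so the badblocks inode can never be addressed.
- With translation off, both directions are the identity.

```c
#include <stdint.h>

/* Values given in the commit message. */
#define SKETCH_FUSE_ROOT_ID	1ULL	/* FUSE_ROOT_ID */
#define SKETCH_EXT2_ROOT_INO	2U	/* EXT2_ROOT_INO */

/* fuse nodeid -> ext2 inode number */
static uint32_t ino_from_fuse(int translate, uint64_t fino)
{
	if (translate && fino == SKETCH_FUSE_ROOT_ID)
		return SKETCH_EXT2_ROOT_INO;
	return (uint32_t)fino;
}

/* ext2 inode number -> fuse nodeid */
static uint64_t ino_to_fuse(int translate, uint32_t ino)
{
	if (translate && ino == SKETCH_EXT2_ROOT_INO)
		return SKETCH_FUSE_ROOT_ID;
	return ino;
}
```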



* [PATCH 01/10] fuse2fs: add strictatime/lazytime mount options
  2025-09-16  0:23 ` [PATCHSET RFC v5 6/9] fuse2fs: handle timestamps and ACLs correctly when iomap is enabled Darrick J. Wong
@ 2025-09-16  1:03   ` Darrick J. Wong
  2025-09-16  1:03   ` [PATCH 02/10] fuse2fs: skip permission checking on utimens when iomap is enabled Darrick J. Wong
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:03 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

In iomap mode, we can support the strictatime/lazytime mount options.
Add them to fuse2fs and fuse4fs, and refuse to mount if they are given
but iomap is not available.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.1.in |    6 ++++++
 fuse4fs/fuse4fs.c    |   28 ++++++++++++++++++++++++++++
 misc/fuse2fs.1.in    |    6 ++++++
 misc/fuse2fs.c       |   28 ++++++++++++++++++++++++++++
 4 files changed, 68 insertions(+)


diff --git a/fuse4fs/fuse4fs.1.in b/fuse4fs/fuse4fs.1.in
index 8855867d27101d..119cbcc903d8af 100644
--- a/fuse4fs/fuse4fs.1.in
+++ b/fuse4fs/fuse4fs.1.in
@@ -90,6 +90,9 @@ .SS "fuse4fs options:"
 .I nosuid
 ) later.
 .TP
+\fB-o\fR lazytime
+if iomap is enabled, enable lazy updates of timestamps
+.TP
 \fB-o\fR lockfile=path
 use this file to control access to the filesystem
 .TP
@@ -98,6 +101,9 @@ .SS "fuse4fs options:"
 .TP
 \fB-o\fR norecovery
 do not replay the journal and mount the file system read-only
+.TP
+\fB-o\fR strictatime
+if iomap is enabled, update atime on every access
 .SS "FUSE options:"
 .TP
 \fB-d -o\fR debug
diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 3be19f59fc3976..bc2cf41085695f 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -267,6 +267,7 @@ struct fuse4fs {
 	int unmount_in_destroy;
 	int noblkdev;
 	int translate_inums;
+	int iomap_passthrough_options;
 
 	enum fuse4fs_opstate opstate;
 	int logfd;
@@ -1648,6 +1649,8 @@ static void fuse4fs_iomap_enable(struct fuse_conn_info *conn,
 	if (!fuse4fs_iomap_enabled(ff)) {
 		if (ff->iomap_want == FT_ENABLE)
 			err_printf(ff, "%s\n", _("Could not enable iomap."));
+		if (ff->iomap_passthrough_options)
+			err_printf(ff, "%s\n", _("Some mount options require iomap."));
 		return;
 	}
 }
@@ -7070,6 +7073,7 @@ enum {
 	FUSE4FS_ERRORS_BEHAVIOR,
 #ifdef HAVE_FUSE_IOMAP
 	FUSE4FS_IOMAP,
+	FUSE4FS_IOMAP_PASSTHROUGH,
 #endif
 };
 
@@ -7096,6 +7100,17 @@ static struct fuse_opt fuse4fs_opts[] = {
 #endif
 	FUSE4FS_OPT("noblkdev",		noblkdev,		1),
 
+#ifdef HAVE_FUSE_IOMAP
+#ifdef MS_LAZYTIME
+	FUSE_OPT_KEY("lazytime",	FUSE4FS_IOMAP_PASSTHROUGH),
+	FUSE_OPT_KEY("nolazytime",	FUSE4FS_IOMAP_PASSTHROUGH),
+#endif
+#ifdef MS_STRICTATIME
+	FUSE_OPT_KEY("strictatime",	FUSE4FS_IOMAP_PASSTHROUGH),
+	FUSE_OPT_KEY("nostrictatime",	FUSE4FS_IOMAP_PASSTHROUGH),
+#endif
+#endif
+
 	FUSE_OPT_KEY("user_xattr",	FUSE4FS_IGNORED),
 	FUSE_OPT_KEY("noblock_validity", FUSE4FS_IGNORED),
 	FUSE_OPT_KEY("nodelalloc",	FUSE4FS_IGNORED),
@@ -7122,6 +7137,12 @@ static int fuse4fs_opt_proc(void *data, const char *arg,
 	struct fuse4fs *ff = data;
 
 	switch (key) {
+#ifdef HAVE_FUSE_IOMAP
+	case FUSE4FS_IOMAP_PASSTHROUGH:
+		ff->iomap_passthrough_options = 1;
+		/* pass through to libfuse */
+		return 1;
+#endif
 	case FUSE4FS_DIRSYNC:
 		ff->dirsync = 1;
 		/* pass through to libfuse */
@@ -7515,6 +7536,13 @@ int main(int argc, char *argv[])
 		fctx.unmount_in_destroy = 1;
 	}
 
+	if (fctx.iomap_passthrough_options && !iomap_detected) {
+		err_printf(&fctx, "%s\n",
+			   _("Some mount options require iomap."));
+		ret |= 1;
+		goto out;
+	}
+
 	if (iomap_detected) {
 		/*
 		 * The root_nodeid mount option was added when iomap support
diff --git a/misc/fuse2fs.1.in b/misc/fuse2fs.1.in
index 2b55fa0e723966..0c0934f03c9543 100644
--- a/misc/fuse2fs.1.in
+++ b/misc/fuse2fs.1.in
@@ -90,6 +90,9 @@ .SS "fuse2fs options:"
 .I nosuid
 ) later.
 .TP
+\fB-o\fR lazytime
+if iomap is enabled, enable lazy updates of timestamps
+.TP
 \fB-o\fR lockfile=path
 use this file to control access to the filesystem
 .TP
@@ -98,6 +101,9 @@ .SS "fuse2fs options:"
 .TP
 \fB-o\fR norecovery
 do not replay the journal and mount the file system read-only
+.TP
+\fB-o\fR strictatime
+if iomap is enabled, update atime on every access
 .SS "FUSE options:"
 .TP
 \fB-d -o\fR debug
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 0832a758bdad79..8f7194f4f815ee 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -260,6 +260,7 @@ struct fuse2fs {
 	int dirsync;
 	int unmount_in_destroy;
 	int noblkdev;
+	int iomap_passthrough_options;
 
 	enum fuse2fs_opstate opstate;
 	int logfd;
@@ -1453,6 +1454,8 @@ static void fuse2fs_iomap_enable(struct fuse_conn_info *conn,
 	if (!fuse2fs_iomap_enabled(ff)) {
 		if (ff->iomap_want == FT_ENABLE)
 			err_printf(ff, "%s\n", _("Could not enable iomap."));
+		if (ff->iomap_passthrough_options)
+			err_printf(ff, "%s\n", _("Some mount options require iomap."));
 		return;
 	}
 }
@@ -6590,6 +6593,7 @@ enum {
 	FUSE2FS_ERRORS_BEHAVIOR,
 #ifdef HAVE_FUSE_IOMAP
 	FUSE2FS_IOMAP,
+	FUSE2FS_IOMAP_PASSTHROUGH,
 #endif
 };
 
@@ -6616,6 +6620,17 @@ static struct fuse_opt fuse2fs_opts[] = {
 #endif
 	FUSE2FS_OPT("noblkdev",		noblkdev,		1),
 
+#ifdef HAVE_FUSE_IOMAP
+#ifdef MS_LAZYTIME
+	FUSE_OPT_KEY("lazytime",	FUSE2FS_IOMAP_PASSTHROUGH),
+	FUSE_OPT_KEY("nolazytime",	FUSE2FS_IOMAP_PASSTHROUGH),
+#endif
+#ifdef MS_STRICTATIME
+	FUSE_OPT_KEY("strictatime",	FUSE2FS_IOMAP_PASSTHROUGH),
+	FUSE_OPT_KEY("nostrictatime",	FUSE2FS_IOMAP_PASSTHROUGH),
+#endif
+#endif
+
 	FUSE_OPT_KEY("user_xattr",	FUSE2FS_IGNORED),
 	FUSE_OPT_KEY("noblock_validity", FUSE2FS_IGNORED),
 	FUSE_OPT_KEY("nodelalloc",	FUSE2FS_IGNORED),
@@ -6642,6 +6657,12 @@ static int fuse2fs_opt_proc(void *data, const char *arg,
 	struct fuse2fs *ff = data;
 
 	switch (key) {
+#ifdef HAVE_FUSE_IOMAP
+	case FUSE2FS_IOMAP_PASSTHROUGH:
+		ff->iomap_passthrough_options = 1;
+		/* pass through to libfuse */
+		return 1;
+#endif
 	case FUSE2FS_DIRSYNC:
 		ff->dirsync = 1;
 		/* pass through to libfuse */
@@ -6934,6 +6955,13 @@ int main(int argc, char *argv[])
 		fctx.unmount_in_destroy = 1;
 	}
 
+	if (fctx.iomap_passthrough_options && !iomap_detected) {
+		err_printf(&fctx, "%s\n",
+			   _("Some mount options require iomap."));
+		ret |= 1;
+		goto out;
+	}
+
 	if (!fctx.cache_size)
 		fctx.cache_size = default_cache_size();
 	if (fctx.cache_size) {
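The patch uses libfuse's `fuse_opt` machinery: `FUSE_OPT_KEY` plus a processing callback that returns 1 to pass the option through. The sketch below is a library-free illustration only, and `check_mount_opts()` is a hypothetical helper, not part of fuse2fs or libfuse.

It scans a comma-separated `-o` string and records whether any iomap-only option appeared. It then refuses the mount when iomap was not detected, mirroring the check added to `main()`.

```c
#include <stdbool.h>
#include <string.h>

static const char *const iomap_only_opts[] = {
	"lazytime", "nolazytime", "strictatime", "nostrictatime",
};

/* Exact match of the option token opt[0..len) against the list. */
static bool is_iomap_only(const char *opt, size_t len)
{
	size_t n = sizeof(iomap_only_opts) / sizeof(iomap_only_opts[0]);

	for (size_t i = 0; i < n; i++)
		if (strlen(iomap_only_opts[i]) == len &&
		    strncmp(iomap_only_opts[i], opt, len) == 0)
			return true;
	return false;
}

/* Returns 0 if the mount may proceed, -1 if an iomap-only option was
 * requested but iomap is unavailable. */
static int check_mount_opts(const char *opts, bool iomap_detected)
{
	bool wants_iomap = false;
	const char *p = opts;

	while (*p) {
		const char *comma = strchr(p, ',');
		size_t len = comma ? (size_t)(comma - p) : strlen(p);

		if (is_iomap_only(p, len))
			wants_iomap = true;
		if (!comma)
			break;
		p = comma + 1;
	}
	return (wants_iomap && !iomap_detected) ? -1 : 0;
}
```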



* [PATCH 02/10] fuse2fs: skip permission checking on utimens when iomap is enabled
  2025-09-16  0:23 ` [PATCHSET RFC v5 6/9] fuse2fs: handle timestamps and ACLs correctly when iomap is enabled Darrick J. Wong
  2025-09-16  1:03   ` [PATCH 01/10] fuse2fs: add strictatime/lazytime mount options Darrick J. Wong
@ 2025-09-16  1:03   ` Darrick J. Wong
  2025-09-16  1:04   ` [PATCH 03/10] fuse2fs: let the kernel tell us about acl/mode updates Darrick J. Wong
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:03 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

When iomap is enabled, the kernel is in charge of enforcing permission
checks on timestamp updates for files.  We needn't do that in userspace
anymore.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   11 +++++++----
 misc/fuse2fs.c    |   11 +++++++----
 2 files changed, 14 insertions(+), 8 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index bc2cf41085695f..06be49164c783d 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -4850,13 +4850,16 @@ static int fuse4fs_utimens(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
 
 	/*
 	 * ext4 allows timestamp updates of append-only files but only if we're
-	 * setting to current time
+	 * setting to current time.  If iomap is enabled, the kernel does the
+	 * permission checking for timestamp updates; skip the access check.
 	 */
 	if (aact == TA_NOW && mact == TA_NOW)
 		access |= A_OK;
-	ret = fuse4fs_inum_access(ff, ctxt, ino, access);
-	if (ret)
-		return ret;
+	if (!fuse4fs_iomap_enabled(ff)) {
+		ret = fuse4fs_inum_access(ff, ctxt, ino, access);
+		if (ret)
+			return ret;
+	}
 
 	if (aact != TA_OMIT)
 		EXT4_INODE_SET_XTIME(i_atime, &atime, inode);
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 8f7194f4f815ee..716793b5fa485c 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -4500,13 +4500,16 @@ static int op_utimens(const char *path, const struct timespec ctv[2],
 
 	/*
 	 * ext4 allows timestamp updates of append-only files but only if we're
-	 * setting to current time
+	 * setting to current time.  If iomap is enabled, the kernel does the
+	 * permission checking for timestamp updates; skip the access check.
 	 */
 	if (ctv[0].tv_nsec == UTIME_NOW && ctv[1].tv_nsec == UTIME_NOW)
 		access |= A_OK;
-	ret = check_inum_access(ff, ino, access);
-	if (ret)
-		goto out;
+	if (!fuse2fs_iomap_enabled(ff)) {
+		ret = check_inum_access(ff, ino, access);
+		if (ret)
+			goto out;
+	}
 
 	err = fuse2fs_read_inode(fs, ino, &inode);
 	if (err) {
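Here is a minimal sketch of the resulting decision. The diff does not show the initial access mask, so the sketch assumes the base requirement is write permission (`W_OK`). `SKETCH_A_OK` is a hypothetical stand-in for fuse2fs's own `A_OK` bit, and `utimens_access_mask()` is not a real fuse2fs function.

The function returns -1 when userspace should skip the check because the kernel has already performed it.

```c
#include <stdbool.h>
#include <unistd.h>	/* W_OK */

/* Hypothetical stand-in for fuse2fs's A_OK bit. */
#define SKETCH_A_OK	0x100

/*
 * Access mask userspace must check before applying a utimens request,
 * or -1 if the kernel (iomap mode) already enforced permissions.
 * The W_OK base requirement is an assumption of this sketch.
 */
static int utimens_access_mask(bool iomap_enabled, bool atime_now,
			       bool mtime_now)
{
	int access = W_OK;

	/* Append-only files may only have timestamps set to "now". */
	if (atime_now && mtime_now)
		access |= SKETCH_A_OK;

	if (iomap_enabled)
		return -1;
	return access;
}
```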



* [PATCH 03/10] fuse2fs: let the kernel tell us about acl/mode updates
  2025-09-16  0:23 ` [PATCHSET RFC v5 6/9] fuse2fs: handle timestamps and ACLs correctly when iomap is enabled Darrick J. Wong
  2025-09-16  1:03   ` [PATCH 01/10] fuse2fs: add strictatime/lazytime mount options Darrick J. Wong
  2025-09-16  1:03   ` [PATCH 02/10] fuse2fs: skip permission checking on utimens when iomap is enabled Darrick J. Wong
@ 2025-09-16  1:04   ` Darrick J. Wong
  2025-09-16  1:04   ` [PATCH 04/10] fuse2fs: better debugging for file mode updates Darrick J. Wong
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:04 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

When the kernel is running in iomap mode, it also manages all ACL
updates and the resulting file mode changes for us.  Disable fuse2fs's
manual implementation of this when iomap is enabled.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |    4 ++--
 misc/fuse2fs.c    |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)
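
The gating that this patch adds can be modeled in isolation.  What follows is a minimal sketch, not fuse2fs code.  `struct acl_gate_state`, `server_propagates_default_acl()`, `server_chmod_mode()` and `acl_gate_example()` are hypothetical names.  The two flags stand in for `ff->acl` and `fuse2fs_iomap_enabled(ff)`, and error returns from `in_file_group()` are not modeled.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for the per-mount server state
 * (struct fuse2fs / struct fuse4fs); only the two flags
 * that matter here are modeled.
 */
struct acl_gate_state {
	bool acl;	/* POSIX ACL support enabled for this mount */
	bool iomap;	/* kernel iomap mode is active */
};

/* Should the server copy the parent's default ACL onto a new inode? */
static bool server_propagates_default_acl(const struct acl_gate_state *s,
					  mode_t mode)
{
	/* Same condition as the early return in propagate_default_acls(). */
	return s->acl && !S_ISDIR(mode) && !s->iomap;
}

/*
 * Mode the server would store for chmod.  Outside iomap mode, an
 * unprivileged caller who is not in the file's group loses S_ISGID.
 * In iomap mode the kernel is expected to have handled this already.
 */
static mode_t server_chmod_mode(const struct acl_gate_state *s,
				bool superuser, bool in_file_group,
				mode_t mode)
{
	if (!s->iomap && !superuser && !in_file_group)
		mode &= ~S_ISGID;
	return mode;
}

/* Usage: the same request takes a different path depending on iomap. */
static void acl_gate_example(void)
{
	struct acl_gate_state classic = { .acl = true, .iomap = false };
	struct acl_gate_state iomap = { .acl = true, .iomap = true };

	assert(server_propagates_default_acl(&classic, S_IFREG | 0644));
	assert(!server_propagates_default_acl(&iomap, S_IFREG | 0644));
	assert(server_chmod_mode(&classic, false, false, 02755) == 0755);
	assert(server_chmod_mode(&iomap, false, false, 02755) == 02755);
}
```

The design point is that each check is skipped, not reimplemented, once the kernel owns the policy.  The server then stores whatever mode the kernel sends.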


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 06be49164c783d..184066855517b1 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -2137,7 +2137,7 @@ static int fuse4fs_propagate_default_acls(struct fuse4fs *ff, ext2_ino_t parent,
 	size_t deflen;
 	int ret;
 
-	if (!ff->acl || S_ISDIR(mode))
+	if (!ff->acl || S_ISDIR(mode) || fuse4fs_iomap_enabled(ff))
 		return 0;
 
 	ret = fuse4fs_getxattr(ff, parent, XATTR_NAME_POSIX_ACL_DEFAULT, &def,
@@ -3512,7 +3512,7 @@ static int fuse4fs_chmod(struct fuse4fs *ff, fuse_req_t req, ext2_ino_t ino,
 	 * of the user's groups, but FUSE only tells us about the primary
 	 * group.
 	 */
-	if (!fuse4fs_is_superuser(ff, ctxt)) {
+	if (!fuse4fs_iomap_enabled(ff) && !fuse4fs_is_superuser(ff, ctxt)) {
 		ret = fuse4fs_in_file_group(ff, req, inode);
 		if (ret < 0)
 			return ret;
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 716793b5fa485c..2b3c09a59270bc 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -1937,7 +1937,7 @@ static int propagate_default_acls(struct fuse2fs *ff, ext2_ino_t parent,
 	size_t deflen;
 	int ret;
 
-	if (!ff->acl || S_ISDIR(mode))
+	if (!ff->acl || S_ISDIR(mode) || fuse2fs_iomap_enabled(ff))
 		return 0;
 
 	ret = __getxattr(ff, parent, XATTR_NAME_POSIX_ACL_DEFAULT, &def,
@@ -3213,7 +3213,7 @@ static int op_chmod(const char *path, mode_t mode, struct fuse_file_info *fi)
 	 * of the user's groups, but FUSE only tells us about the primary
 	 * group.
 	 */
-	if (!is_superuser(ff, ctxt)) {
+	if (!fuse2fs_iomap_enabled(ff) && !is_superuser(ff, ctxt)) {
 		ret = in_file_group(ctxt, &inode);
 		if (ret < 0)
 			goto out;



* [PATCH 04/10] fuse2fs: better debugging for file mode updates
  2025-09-16  0:23 ` [PATCHSET RFC v5 6/9] fuse2fs: handle timestamps and ACLs correctly when iomap is enabled Darrick J. Wong
                     ` (2 preceding siblings ...)
  2025-09-16  1:04   ` [PATCH 03/10] fuse2fs: let the kernel tell us about acl/mode updates Darrick J. Wong
@ 2025-09-16  1:04   ` Darrick J. Wong
  2025-09-16  1:04   ` [PATCH 05/10] fuse2fs: debug timestamp updates Darrick J. Wong
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:04 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Improve chmod tracing so that it logs both the old and the new file
mode, making file mode updates easier to debug.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   10 ++++++----
 misc/fuse2fs.c    |   12 +++++++-----
 2 files changed, 13 insertions(+), 9 deletions(-)
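
Here is a standalone sketch of the mode computation in this patch.  It assumes only that chmod replaces the low 12 bits (`0xFFF`, i.e. `07777`: permissions plus setuid, setgid and sticky) and keeps the file type bits.  `chmod_merge_mode()`, `chmod_apply()`, `chmod_merge_example()` and `MODE_PERM_BITS` are hypothetical names.  The patch computes `new_mode` before assigning it, and that ordering is what lets a single trace line print both the old and the new value.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * The low 12 bits (0xFFF == 07777) hold the permission, setuid,
 * setgid and sticky bits; the bits above them (S_IFMT) hold the
 * file type, which chmod must never change.
 */
#define MODE_PERM_BITS 07777

/* Compute the chmod result without touching the caller's inode copy. */
static mode_t chmod_merge_mode(mode_t old_mode, mode_t requested)
{
	return (old_mode & ~(mode_t)MODE_PERM_BITS) |
	       (requested & MODE_PERM_BITS);
}

/* Log old and new modes on one line, then return the value to store. */
static mode_t chmod_apply(FILE *log, unsigned ino, mode_t old_mode,
			  mode_t requested)
{
	mode_t new_mode = chmod_merge_mode(old_mode, requested);

	fprintf(log, "chmod: ino=%u old_mode=0%o new_mode=0%o\n", ino,
		(unsigned)old_mode, (unsigned)new_mode);
	return new_mode;
}

static void chmod_merge_example(void)
{
	/* File type survives; permissions come from the request. */
	assert(chmod_merge_mode(S_IFREG | 0644, 0755) == (S_IFREG | 0755));
	/* Type bits in the request are ignored. */
	assert(chmod_merge_mode(S_IFDIR | 0755, S_IFREG | 02700) ==
	       (S_IFDIR | 02700));
}
```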


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 184066855517b1..ab807f1479870c 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -3495,6 +3495,7 @@ static int fuse4fs_chmod(struct fuse4fs *ff, fuse_req_t req, ext2_ino_t ino,
 			 mode_t mode, struct ext2_inode_large *inode)
 {
 	const struct fuse_ctx *ctxt = fuse_req_ctx(req);
+	mode_t new_mode;
 	int ret = 0;
 
 	dbg_printf(ff, "%s: ino=%d mode=0%o\n", __func__, ino, mode);
@@ -3521,11 +3522,12 @@ static int fuse4fs_chmod(struct fuse4fs *ff, fuse_req_t req, ext2_ino_t ino,
 			mode &= ~S_ISGID;
 	}
 
-	inode->i_mode &= ~0xFFF;
-	inode->i_mode |= mode & 0xFFF;
+	new_mode = (inode->i_mode & ~0xFFF) | (mode & 0xFFF);
 
-	dbg_printf(ff, "%s: ino=%d new_mode=0%o\n",
-		   __func__, ino, inode->i_mode);
+	dbg_printf(ff, "%s: ino=%d old_mode=0%o new_mode=0%o\n",
+		   __func__, ino, inode->i_mode, new_mode);
+
+	inode->i_mode = new_mode;
 
 	return 0;
 }
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 2b3c09a59270bc..53adf8542e2f42 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -3184,6 +3184,7 @@ static int op_chmod(const char *path, mode_t mode, struct fuse_file_info *fi)
 	errcode_t err;
 	ext2_ino_t ino;
 	struct ext2_inode_large inode;
+	mode_t new_mode;
 	int ret = 0;
 
 	FUSE2FS_CHECK_CONTEXT(ff);
@@ -3222,11 +3223,12 @@ static int op_chmod(const char *path, mode_t mode, struct fuse_file_info *fi)
 			mode &= ~S_ISGID;
 	}
 
-	inode.i_mode &= ~0xFFF;
-	inode.i_mode |= mode & 0xFFF;
+	new_mode = (inode.i_mode & ~0xFFF) | (mode & 0xFFF);
 
-	dbg_printf(ff, "%s: path=%s new_mode=0%o ino=%d\n", __func__,
-		   path, inode.i_mode, ino);
+	dbg_printf(ff, "%s: path=%s old_mode=0%o new_mode=0%o ino=%d\n",
+		   __func__, path, inode.i_mode, new_mode, ino);
+
+	inode.i_mode = new_mode;
 
 	ret = update_ctime(fs, ino, &inode);
 	if (ret)
@@ -3246,12 +3248,12 @@ static int op_chmod(const char *path, mode_t mode, struct fuse_file_info *fi)
 static int op_chown(const char *path, uid_t owner, gid_t group,
 		    struct fuse_file_info *fi)
 {
+	struct ext2_inode_large inode;
 	struct fuse_context *ctxt = fuse_get_context();
 	struct fuse2fs *ff = fuse2fs_get();
 	ext2_filsys fs;
 	errcode_t err;
 	ext2_ino_t ino;
-	struct ext2_inode_large inode;
 	int ret = 0;
 
 	FUSE2FS_CHECK_CONTEXT(ff);



* [PATCH 05/10] fuse2fs: debug timestamp updates
  2025-09-16  0:23 ` [PATCHSET RFC v5 6/9] fuse2fs: handle timestamps and ACLs correctly when iomap is enabled Darrick J. Wong
                     ` (3 preceding siblings ...)
  2025-09-16  1:04   ` [PATCH 04/10] fuse2fs: better debugging for file mode updates Darrick J. Wong
@ 2025-09-16  1:04   ` Darrick J. Wong
  2025-09-16  1:05   ` [PATCH 06/10] fuse2fs: use coarse timestamps for iomap mode Darrick J. Wong
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:04 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

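Add tracing for file timestamp updates.  To let the timestamp helpers
call dbg_printf, pass them the struct fuse2fs instead of the
ext2_filsys, and rename them with a fuse2fs_ prefix.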

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 misc/fuse2fs.c |   97 +++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 61 insertions(+), 36 deletions(-)
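
The new trace lines in this patch print timestamps as `sec:nsec`.  Below is a minimal sketch of an equivalent formatter.  `format_trace_ts()`, `trace_mtime_ctime()` and `format_trace_ts_example()` are hypothetical helpers, not part of fuse2fs.  The formatter widens `tv_sec` to `long long`, because a `time_t` has no portable printf conversion.  It prints `tv_nsec`, which POSIX declares as `long`, with a matching `%ld`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Format a timestamp as "sec:nsec" for a trace line.  time_t has no
 * portable printf conversion, so widen it to long long; tv_nsec is a
 * long per POSIX, so cast to long and print it with %ld.
 */
static int format_trace_ts(char *buf, size_t len, const struct timespec *ts)
{
	return snprintf(buf, len, "%lld:%ld", (long long)ts->tv_sec,
			(long)ts->tv_nsec);
}

/* Emit one trace line in the style of the mtime/ctime debug output. */
static void trace_mtime_ctime(FILE *out, const char *func, unsigned ino,
			      const struct timespec *now)
{
	char ts[64];

	format_trace_ts(ts, sizeof(ts), now);
	fprintf(out, "%s: ino=%u mtime/ctime %s\n", func, ino, ts);
}

static void format_trace_ts_example(void)
{
	char buf[64];
	struct timespec ts = { .tv_sec = 1757984640, .tv_nsec = 42 };

	format_trace_ts(buf, sizeof(buf), &ts);
	assert(strcmp(buf, "1757984640:42") == 0);
}
```

Like the patch's format, this prints the nanoseconds without zero padding.  As a result, `5:42` means 42 ns past second 5, not 0.42 s.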


diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 53adf8542e2f42..aedd1add48db82 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -686,7 +686,8 @@ static void increment_version(struct ext2_inode_large *inode)
 		inode->i_version_hi = ver >> 32;
 }
 
-static void init_times(struct ext2_inode_large *inode)
+static void fuse2fs_init_timestamps(struct fuse2fs *ff, ext2_ino_t ino,
+				    struct ext2_inode_large *inode)
 {
 	struct timespec now;
 
@@ -696,11 +697,15 @@ static void init_times(struct ext2_inode_large *inode)
 	EXT4_INODE_SET_XTIME(i_mtime, &now, inode);
 	EXT4_EINODE_SET_XTIME(i_crtime, &now, inode);
 	increment_version(inode);
+
+	dbg_printf(ff, "%s: ino=%u time %ld:%lu\n", __func__, ino, now.tv_sec,
+		   now.tv_nsec);
 }
 
-static int update_ctime(ext2_filsys fs, ext2_ino_t ino,
-			struct ext2_inode_large *pinode)
+static int fuse2fs_update_ctime(struct fuse2fs *ff, ext2_ino_t ino,
+				struct ext2_inode_large *pinode)
 {
+	ext2_filsys fs = ff->fs;
 	errcode_t err;
 	struct timespec now;
 	struct ext2_inode_large inode;
@@ -711,6 +716,10 @@ static int update_ctime(ext2_filsys fs, ext2_ino_t ino,
 	if (pinode) {
 		increment_version(pinode);
 		EXT4_INODE_SET_XTIME(i_ctime, &now, pinode);
+
+		dbg_printf(ff, "%s: ino=%u ctime %ld:%lu\n", __func__, ino,
+			   now.tv_sec, now.tv_nsec);
+
 		return 0;
 	}
 
@@ -722,6 +731,9 @@ static int update_ctime(ext2_filsys fs, ext2_ino_t ino,
 	increment_version(&inode);
 	EXT4_INODE_SET_XTIME(i_ctime, &now, &inode);
 
+	dbg_printf(ff, "%s: ino=%u ctime %ld:%lu\n", __func__, ino,
+		   now.tv_sec, now.tv_nsec);
+
 	err = fuse2fs_write_inode(fs, ino, &inode);
 	if (err)
 		return translate_error(fs, ino, err);
@@ -729,8 +741,9 @@ static int update_ctime(ext2_filsys fs, ext2_ino_t ino,
 	return 0;
 }
 
-static int update_atime(ext2_filsys fs, ext2_ino_t ino)
+static int fuse2fs_update_atime(struct fuse2fs *ff, ext2_ino_t ino)
 {
+	ext2_filsys fs = ff->fs;
 	errcode_t err;
 	struct ext2_inode_large inode, *pinode;
 	struct timespec atime, mtime, now;
@@ -749,6 +762,10 @@ static int update_atime(ext2_filsys fs, ext2_ino_t ino)
 	dmtime = mtime.tv_sec + ((double)mtime.tv_nsec / NSEC_PER_SEC);
 	dnow = now.tv_sec + ((double)now.tv_nsec / NSEC_PER_SEC);
 
+	dbg_printf(ff, "%s: ino=%u atime %ld:%lu mtime %ld:%lu now %ld:%lu\n",
+		   __func__, ino, atime.tv_sec, atime.tv_nsec, mtime.tv_sec,
+		   mtime.tv_nsec, now.tv_sec, now.tv_nsec);
+
 	/*
 	 * If atime is newer than mtime and atime hasn't been updated in thirty
 	 * seconds, skip the atime update.  Same idea as Linux "relatime".  Use
@@ -765,9 +782,10 @@ static int update_atime(ext2_filsys fs, ext2_ino_t ino)
 	return 0;
 }
 
-static int update_mtime(ext2_filsys fs, ext2_ino_t ino,
-			struct ext2_inode_large *pinode)
+static int fuse2fs_update_mtime(struct fuse2fs *ff, ext2_ino_t ino,
+				struct ext2_inode_large *pinode)
 {
+	ext2_filsys fs = ff->fs;
 	errcode_t err;
 	struct ext2_inode_large inode;
 	struct timespec now;
@@ -777,6 +795,10 @@ static int update_mtime(ext2_filsys fs, ext2_ino_t ino,
 		EXT4_INODE_SET_XTIME(i_mtime, &now, pinode);
 		EXT4_INODE_SET_XTIME(i_ctime, &now, pinode);
 		increment_version(pinode);
+
+		dbg_printf(ff, "%s: ino=%u mtime/ctime %ld:%lu\n",
+			   __func__, ino, now.tv_sec, now.tv_nsec);
+
 		return 0;
 	}
 
@@ -789,6 +811,9 @@ static int update_mtime(ext2_filsys fs, ext2_ino_t ino,
 	EXT4_INODE_SET_XTIME(i_ctime, &now, &inode);
 	increment_version(&inode);
 
+	dbg_printf(ff, "%s: ino=%u mtime/ctime %ld:%lu\n",
+		   __func__, ino, now.tv_sec, now.tv_nsec);
+
 	err = fuse2fs_write_inode(fs, ino, &inode);
 	if (err)
 		return translate_error(fs, ino, err);
@@ -1858,7 +1883,7 @@ static int op_readlink(const char *path, char *buf, size_t len)
 	buf[len] = 0;
 
 	if (fuse2fs_is_writeable(ff)) {
-		ret = update_atime(fs, ino);
+		ret = fuse2fs_update_atime(ff, ino);
 		if (ret)
 			goto out;
 	}
@@ -2132,7 +2157,7 @@ static int op_mknod(const char *path, mode_t mode, dev_t dev)
 		goto out2;
 	}
 
-	ret = update_mtime(fs, parent, NULL);
+	ret = fuse2fs_update_mtime(ff, parent, NULL);
 	if (ret)
 		goto out2;
 
@@ -2155,7 +2180,7 @@ static int op_mknod(const char *path, mode_t mode, dev_t dev)
 	}
 
 	inode.i_generation = ff->next_generation++;
-	init_times(&inode);
+	fuse2fs_init_timestamps(ff, child, &inode);
 	err = fuse2fs_write_inode(fs, child, &inode);
 	if (err) {
 		ret = translate_error(fs, child, err);
@@ -2241,7 +2266,7 @@ static int op_mkdir(const char *path, mode_t mode)
 		goto out2;
 	}
 
-	ret = update_mtime(fs, parent, NULL);
+	ret = fuse2fs_update_mtime(ff, parent, NULL);
 	if (ret)
 		goto out2;
 
@@ -2268,7 +2293,7 @@ static int op_mkdir(const char *path, mode_t mode)
 	if (parent_sgid)
 		inode.i_mode |= S_ISGID;
 	inode.i_generation = ff->next_generation++;
-	init_times(&inode);
+	fuse2fs_init_timestamps(ff, child, &inode);
 
 	err = fuse2fs_write_inode(fs, child, &inode);
 	if (err) {
@@ -2351,7 +2376,7 @@ static int fuse2fs_unlink(struct fuse2fs *ff, const char *path,
 	if (err)
 		return translate_error(fs, dir, err);
 
-	ret = update_mtime(fs, dir, NULL);
+	ret = fuse2fs_update_mtime(ff, dir, NULL);
 	if (ret)
 		return ret;
 
@@ -2430,7 +2455,7 @@ static int remove_inode(struct fuse2fs *ff, ext2_ino_t ino)
 		inode.i_links_count--;
 	}
 
-	ret = update_ctime(fs, ino, &inode);
+	ret = fuse2fs_update_ctime(ff, ino, &inode);
 	if (ret)
 		return ret;
 
@@ -2604,7 +2629,7 @@ static int __op_rmdir(struct fuse2fs *ff, const char *path)
 		}
 		if (inode.i_links_count > 1)
 			inode.i_links_count--;
-		ret = update_mtime(fs, rds.parent, &inode);
+		ret = fuse2fs_update_mtime(ff, rds.parent, &inode);
 		if (ret)
 			goto out;
 		err = fuse2fs_write_inode(fs, rds.parent, &inode);
@@ -2701,7 +2726,7 @@ static int op_symlink(const char *src, const char *dest)
 	}
 
 	/* Update parent dir's mtime */
-	ret = update_mtime(fs, parent, NULL);
+	ret = fuse2fs_update_mtime(ff, parent, NULL);
 	if (ret)
 		goto out2;
 
@@ -2725,7 +2750,7 @@ static int op_symlink(const char *src, const char *dest)
 	fuse2fs_set_uid(&inode, ctxt->uid);
 	fuse2fs_set_gid(&inode, gid);
 	inode.i_generation = ff->next_generation++;
-	init_times(&inode);
+	fuse2fs_init_timestamps(ff, child, &inode);
 
 	err = fuse2fs_write_inode(fs, child, &inode);
 	if (err) {
@@ -2970,11 +2995,11 @@ static int op_rename(const char *from, const char *to,
 	}
 
 	/* Update timestamps */
-	ret = update_ctime(fs, from_ino, NULL);
+	ret = fuse2fs_update_ctime(ff, from_ino, NULL);
 	if (ret)
 		goto out2;
 
-	ret = update_mtime(fs, to_dir_ino, NULL);
+	ret = fuse2fs_update_mtime(ff, to_dir_ino, NULL);
 	if (ret)
 		goto out2;
 
@@ -3063,7 +3088,7 @@ static int op_link(const char *src, const char *dest)
 		goto out2;
 
 	inode.i_links_count++;
-	ret = update_ctime(fs, ino, &inode);
+	ret = fuse2fs_update_ctime(ff, ino, &inode);
 	if (ret)
 		goto out2;
 
@@ -3082,7 +3107,7 @@ static int op_link(const char *src, const char *dest)
 		goto out2;
 	}
 
-	ret = update_mtime(fs, parent, NULL);
+	ret = fuse2fs_update_mtime(ff, parent, NULL);
 	if (ret)
 		goto out2;
 
@@ -3230,7 +3255,7 @@ static int op_chmod(const char *path, mode_t mode, struct fuse_file_info *fi)
 
 	inode.i_mode = new_mode;
 
-	ret = update_ctime(fs, ino, &inode);
+	ret = fuse2fs_update_ctime(ff, ino, &inode);
 	if (ret)
 		goto out;
 
@@ -3297,7 +3322,7 @@ static int op_chown(const char *path, uid_t owner, gid_t group,
 		fuse2fs_set_gid(&inode, group);
 	}
 
-	ret = update_ctime(fs, ino, &inode);
+	ret = fuse2fs_update_ctime(ff, ino, &inode);
 	if (ret)
 		goto out;
 
@@ -3427,7 +3452,7 @@ static int fuse2fs_truncate(struct fuse2fs *ff, ext2_ino_t ino, off_t new_size)
 	if (err)
 		return translate_error(fs, ino, err);
 
-	ret = update_mtime(fs, ino, NULL);
+	ret = fuse2fs_update_mtime(ff, ino, NULL);
 	if (ret)
 		return ret;
 
@@ -3655,7 +3680,7 @@ static int op_read(const char *path EXT2FS_ATTR((unused)), char *buf,
 	}
 
 	if (fh->check_flags != X_OK && fuse2fs_is_writeable(ff)) {
-		ret = update_atime(fs, fh->ino);
+		ret = fuse2fs_update_atime(ff, fh->ino);
 		if (ret)
 			goto out;
 	}
@@ -3739,7 +3764,7 @@ static int op_write(const char *path EXT2FS_ATTR((unused)),
 		goto out;
 	}
 
-	ret = update_mtime(fs, fh->ino, NULL);
+	ret = fuse2fs_update_mtime(ff, fh->ino, NULL);
 	if (ret)
 		goto out;
 
@@ -4101,7 +4126,7 @@ static int op_setxattr(const char *path EXT2FS_ATTR((unused)),
 		goto out2;
 	}
 
-	ret = update_ctime(fs, ino, NULL);
+	ret = fuse2fs_update_ctime(ff, ino, NULL);
 out2:
 	err = ext2fs_xattrs_close(&h);
 	if (!ret && err)
@@ -4195,7 +4220,7 @@ static int op_removexattr(const char *path, const char *key)
 		goto out2;
 	}
 
-	ret = update_ctime(fs, ino, NULL);
+	ret = fuse2fs_update_ctime(ff, ino, NULL);
 out2:
 	err = ext2fs_xattrs_close(&h);
 	if (err && !ret)
@@ -4313,7 +4338,7 @@ static int op_readdir(const char *path EXT2FS_ATTR((unused)), void *buf,
 	}
 
 	if (fuse2fs_is_writeable(ff)) {
-		ret = update_atime(i.fs, fh->ino);
+		ret = fuse2fs_update_atime(ff, fh->ino);
 		if (ret)
 			goto out;
 	}
@@ -4418,7 +4443,7 @@ static int op_create(const char *path, mode_t mode, struct fuse_file_info *fp)
 		goto out2;
 	}
 
-	ret = update_mtime(fs, parent, NULL);
+	ret = fuse2fs_update_mtime(ff, parent, NULL);
 	if (ret)
 		goto out2;
 
@@ -4449,7 +4474,7 @@ static int op_create(const char *path, mode_t mode, struct fuse_file_info *fp)
 	}
 
 	inode.i_generation = ff->next_generation++;
-	init_times(&inode);
+	fuse2fs_init_timestamps(ff, child, &inode);
 	err = fuse2fs_write_inode(fs, child, &inode);
 	if (err) {
 		ret = translate_error(fs, child, err);
@@ -4533,7 +4558,7 @@ static int op_utimens(const char *path, const struct timespec ctv[2],
 	if (tv[1].tv_nsec != UTIME_OMIT)
 		EXT4_INODE_SET_XTIME(i_mtime, &tv[1], &inode);
 #endif /* UTIME_OMIT */
-	ret = update_ctime(fs, ino, &inode);
+	ret = fuse2fs_update_ctime(ff, ino, &inode);
 	if (ret)
 		goto out;
 
@@ -4601,7 +4626,7 @@ static int ioctl_setflags(struct fuse2fs *ff, struct fuse2fs_file_handle *fh,
 	if (ret)
 		return ret;
 
-	ret = update_ctime(fs, fh->ino, &inode);
+	ret = fuse2fs_update_ctime(ff, fh->ino, &inode);
 	if (ret)
 		return ret;
 
@@ -4648,7 +4673,7 @@ static int ioctl_setversion(struct fuse2fs *ff, struct fuse2fs_file_handle *fh,
 
 	inode.i_generation = generation;
 
-	ret = update_ctime(fs, fh->ino, &inode);
+	ret = fuse2fs_update_ctime(ff, fh->ino, &inode);
 	if (ret)
 		return ret;
 
@@ -4779,7 +4804,7 @@ static int ioctl_fssetxattr(struct fuse2fs *ff, struct fuse2fs_file_handle *fh,
 	if (ext2fs_inode_includes(inode_size, i_projid))
 		inode.i_projid = fsx->fsx_projid;
 
-	ret = update_ctime(fs, fh->ino, &inode);
+	ret = fuse2fs_update_ctime(ff, fh->ino, &inode);
 	if (ret)
 		return ret;
 
@@ -5048,7 +5073,7 @@ static int fuse2fs_allocate_range(struct fuse2fs *ff,
 		}
 	}
 
-	err = update_mtime(fs, fh->ino, &inode);
+	err = fuse2fs_update_mtime(ff, fh->ino, &inode);
 	if (err)
 		return err;
 
@@ -5221,7 +5246,7 @@ static int fuse2fs_punch_range(struct fuse2fs *ff,
 			return translate_error(fs, fh->ino, err);
 	}
 
-	err = update_mtime(fs, fh->ino, &inode);
+	err = fuse2fs_update_mtime(ff, fh->ino, &inode);
 	if (err)
 		return err;
 



* [PATCH 06/10] fuse2fs: use coarse timestamps for iomap mode
  2025-09-16  0:23 ` [PATCHSET RFC v5 6/9] fuse2fs: handle timestamps and ACLs correctly when iomap is enabled Darrick J. Wong
                     ` (4 preceding siblings ...)
  2025-09-16  1:04   ` [PATCH 05/10] fuse2fs: debug timestamp updates Darrick J. Wong
@ 2025-09-16  1:05   ` Darrick J. Wong
  2025-09-16  1:05   ` [PATCH 07/10] fuse2fs: add tracing for retrieving timestamps Darrick J. Wong
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:05 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

In iomap mode, the kernel is responsible for maintaining timestamps
because file writes don't upcall to fuse2fs.  The kernel decides whether
[cm]time needs updating by checking whether [cm]time exactly matches the
coarse clock, instead of checking that [cm]time < coarse_clock.  If
fuse2fs sets a fine-grained timestamp that is slightly ahead of the
coarse clock, the next kernel update replaces it with the earlier
coarse value, so timestamps can appear to go backwards.  generic/423
fails if statx reports btime > ctime, so use the coarse clock in iomap
mode.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |  110 +++++++++++++++++++++++++++++++----------------------
 misc/fuse2fs.c    |   34 ++++++++++++----
 2 files changed, 90 insertions(+), 54 deletions(-)
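
The failure described in the commit message can be reproduced with a toy model.  In this sketch, `model_kernel_touch()` is a deliberate simplification of the kernel predicate as the commit message describes it: rewrite [cm]time whenever it is not equal to the coarse clock.  It is not kernel code.  `sketch_get_now()` mirrors the shape of the new `fuse2fs_get_now()`, with a plain `bool` standing in for `fuse2fs_iomap_enabled(ff)`.  `ts_cmp()` and `coarse_clock_example()` are hypothetical helpers.  The numbers assume a coarse clock that lags the fine clock by a few milliseconds.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Three-way compare of two timestamps: <0, 0, >0. */
static int ts_cmp(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec ? -1 : 1;
	if (a->tv_nsec != b->tv_nsec)
		return a->tv_nsec < b->tv_nsec ? -1 : 1;
	return 0;
}

/*
 * Simplified model of the kernel's decision as described in the commit
 * message: [cm]time is rewritten whenever it is not *equal* to the coarse
 * clock, even if the stored value is newer.  Returns what gets stored.
 */
static struct timespec model_kernel_touch(struct timespec cur,
					  struct timespec coarse_now)
{
	return ts_cmp(&cur, &coarse_now) != 0 ? coarse_now : cur;
}

/* Prefer the coarse clock when the kernel owns [cm]time updates. */
static void sketch_get_now(bool iomap, struct timespec *now)
{
#ifdef CLOCK_REALTIME_COARSE
	if (iomap && !clock_gettime(CLOCK_REALTIME_COARSE, now))
		return;
#endif
	if (!clock_gettime(CLOCK_REALTIME, now))
		return;
	now->tv_sec = time(NULL);
	now->tv_nsec = 0;
}

static void coarse_clock_example(void)
{
	/* At create time, the server stamps btime and ctime with the fine clock. */
	struct timespec btime = { .tv_sec = 100, .tv_nsec = 500000000 };
	struct timespec ctime_now = btime;
	/* The kernel's coarse clock still reads the previous tick. */
	struct timespec coarse = { .tv_sec = 100, .tv_nsec = 496000000 };

	/* A write in the same tick lets the kernel "update" ctime backwards. */
	ctime_now = model_kernel_touch(ctime_now, coarse);
	assert(ts_cmp(&btime, &ctime_now) > 0);	/* btime > ctime */

	/* Stamping with the coarse clock in the first place avoids that. */
	btime = coarse;
	ctime_now = model_kernel_touch(btime, coarse);
	assert(ts_cmp(&btime, &ctime_now) == 0);
}
```

`CLOCK_REALTIME_COARSE` is Linux-specific.  The `#ifdef` keeps the sketch building elsewhere, where it simply falls back to `CLOCK_REALTIME`, as the patch does.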


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index ab807f1479870c..5673a545b06b31 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -827,8 +827,24 @@ static inline void fuse4fs_dump_extents(struct fuse4fs *ff, ext2_ino_t ino,
 	ext2fs_extent_free(extents);
 }
 
-static void get_now(struct timespec *now)
+static void fuse4fs_get_now(struct fuse4fs *ff, struct timespec *now)
 {
+#ifdef CLOCK_REALTIME_COARSE
+	/*
+	 * In iomap mode, the kernel is responsible for maintaining timestamps
+	 * because file writes don't upcall to fuse4fs.  The kernel's predicate
+	 * for deciding if [cm]time should be updated bases its decisions off
+	 * [cm]time being an exact match for the coarse clock (instead of
+	 * checking that [cm]time < coarse_clock) which means that fuse4fs
+	 * setting a fine-grained timestamp that is slightly ahead of the
+	 * coarse clock can result in timestamps appearing to go backwards.
+	 * generic/423 doesn't like seeing btime > ctime from statx, so we'll
+	 * use the coarse clock in iomap mode.
+	 */
+	if (fuse4fs_iomap_enabled(ff) &&
+	    !clock_gettime(CLOCK_REALTIME_COARSE, now))
+		return;
+#endif
 #ifdef CLOCK_REALTIME
 	if (!clock_gettime(CLOCK_REALTIME, now))
 		return;
@@ -851,11 +867,12 @@ static void increment_version(struct ext2_inode_large *inode)
 		inode->i_version_hi = ver >> 32;
 }
 
-static void init_times(struct ext2_inode_large *inode)
+static void fuse4fs_init_timestamps(struct fuse4fs *ff,
+				    struct ext2_inode_large *inode)
 {
 	struct timespec now;
 
-	get_now(&now);
+	fuse4fs_get_now(ff, &now);
 	EXT4_INODE_SET_XTIME(i_atime, &now, inode);
 	EXT4_INODE_SET_XTIME(i_ctime, &now, inode);
 	EXT4_INODE_SET_XTIME(i_mtime, &now, inode);
@@ -863,14 +880,15 @@ static void init_times(struct ext2_inode_large *inode)
 	increment_version(inode);
 }
 
-static int update_ctime(ext2_filsys fs, ext2_ino_t ino,
-			struct ext2_inode_large *pinode)
+static int fuse4fs_update_ctime(struct fuse4fs *ff, ext2_ino_t ino,
+				struct ext2_inode_large *pinode)
 {
-	errcode_t err;
 	struct timespec now;
 	struct ext2_inode_large inode;
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
 
-	get_now(&now);
+	fuse4fs_get_now(ff, &now);
 
 	/* If user already has a inode buffer, just update that */
 	if (pinode) {
@@ -894,12 +912,13 @@ static int update_ctime(ext2_filsys fs, ext2_ino_t ino,
 	return 0;
 }
 
-static int update_atime(ext2_filsys fs, ext2_ino_t ino)
+static int fuse4fs_update_atime(struct fuse4fs *ff, ext2_ino_t ino)
 {
-	errcode_t err;
 	struct ext2_inode_large inode, *pinode;
 	struct timespec atime, mtime, now;
+	ext2_filsys fs = ff->fs;
 	double datime, dmtime, dnow;
+	errcode_t err;
 
 	err = fuse4fs_read_inode(fs, ino, &inode);
 	if (err)
@@ -908,7 +927,7 @@ static int update_atime(ext2_filsys fs, ext2_ino_t ino)
 	pinode = &inode;
 	EXT4_INODE_GET_XTIME(i_atime, &atime, pinode);
 	EXT4_INODE_GET_XTIME(i_mtime, &mtime, pinode);
-	get_now(&now);
+	fuse4fs_get_now(ff, &now);
 
 	datime = atime.tv_sec + ((double)atime.tv_nsec / NSEC_PER_SEC);
 	dmtime = mtime.tv_sec + ((double)mtime.tv_nsec / NSEC_PER_SEC);
@@ -930,15 +949,16 @@ static int update_atime(ext2_filsys fs, ext2_ino_t ino)
 	return 0;
 }
 
-static int update_mtime(ext2_filsys fs, ext2_ino_t ino,
-			struct ext2_inode_large *pinode)
+static int fuse4fs_update_mtime(struct fuse4fs *ff, ext2_ino_t ino,
+				struct ext2_inode_large *pinode)
 {
-	errcode_t err;
 	struct ext2_inode_large inode;
 	struct timespec now;
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
 
 	if (pinode) {
-		get_now(&now);
+		fuse4fs_get_now(ff, &now);
 		EXT4_INODE_SET_XTIME(i_mtime, &now, pinode);
 		EXT4_INODE_SET_XTIME(i_ctime, &now, pinode);
 		increment_version(pinode);
@@ -949,7 +969,7 @@ static int update_mtime(ext2_filsys fs, ext2_ino_t ino,
 	if (err)
 		return translate_error(fs, ino, err);
 
-	get_now(&now);
+	fuse4fs_get_now(ff, &now);
 	EXT4_INODE_SET_XTIME(i_mtime, &now, &inode);
 	EXT4_INODE_SET_XTIME(i_ctime, &now, &inode);
 	increment_version(&inode);
@@ -2054,7 +2074,7 @@ static void op_readlink(fuse_req_t req, fuse_ino_t fino)
 	buf[len] = 0;
 
 	if (fuse4fs_is_writeable(ff)) {
-		ret = update_atime(fs, ino);
+		ret = fuse4fs_update_atime(ff, ino);
 		if (ret)
 			goto out;
 	}
@@ -2323,7 +2343,7 @@ static void op_mknod(fuse_req_t req, fuse_ino_t fino, const char *name,
 		goto out2;
 	}
 
-	ret = update_mtime(fs, parent, NULL);
+	ret = fuse4fs_update_mtime(ff, parent, NULL);
 	if (ret)
 		goto out2;
 
@@ -2346,7 +2366,7 @@ static void op_mknod(fuse_req_t req, fuse_ino_t fino, const char *name,
 	}
 
 	inode.i_generation = ff->next_generation++;
-	init_times(&inode);
+	fuse4fs_init_timestamps(ff, &inode);
 	err = fuse4fs_write_inode(fs, child, &inode);
 	if (err) {
 		ret = translate_error(fs, child, err);
@@ -2408,7 +2428,7 @@ static void op_mkdir(fuse_req_t req, fuse_ino_t fino, const char *name,
 		goto out2;
 	}
 
-	ret = update_mtime(fs, parent, NULL);
+	ret = fuse4fs_update_mtime(ff, parent, NULL);
 	if (ret)
 		goto out2;
 
@@ -2434,7 +2454,7 @@ static void op_mkdir(fuse_req_t req, fuse_ino_t fino, const char *name,
 	if (parent_sgid)
 		inode.i_mode |= S_ISGID;
 	inode.i_generation = ff->next_generation++;
-	init_times(&inode);
+	fuse4fs_init_timestamps(ff, &inode);
 
 	err = fuse4fs_write_inode(fs, child, &inode);
 	if (err) {
@@ -2775,7 +2795,7 @@ static int fuse4fs_remove_inode(struct fuse4fs *ff, ext2_ino_t ino)
 		inode.i_links_count--;
 	}
 
-	ret = update_ctime(fs, ino, &inode);
+	ret = fuse4fs_update_ctime(ff, ino, &inode);
 	if (ret)
 		return ret;
 
@@ -2846,7 +2866,7 @@ static int fuse4fs_unlink(struct fuse4fs *ff, ext2_ino_t parent,
 		goto out;
 	}
 
-	ret = update_mtime(fs, parent, NULL);
+	ret = fuse4fs_update_mtime(ff, parent, NULL);
 	if (ret)
 		goto out;
 out:
@@ -2985,7 +3005,7 @@ static int fuse4fs_rmdir(struct fuse4fs *ff, ext2_ino_t parent,
 		}
 		if (inode.i_links_count > 1)
 			inode.i_links_count--;
-		ret = update_mtime(fs, rds.parent, &inode);
+		ret = fuse4fs_update_mtime(ff, rds.parent, &inode);
 		if (ret)
 			goto out;
 		err = fuse4fs_write_inode(fs, rds.parent, &inode);
@@ -3089,7 +3109,7 @@ static void op_symlink(fuse_req_t req, const char *target, fuse_ino_t fino,
 	}
 
 	/* Update parent dir's mtime */
-	ret = update_mtime(fs, parent, NULL);
+	ret = fuse4fs_update_mtime(ff, parent, NULL);
 	if (ret)
 		goto out2;
 
@@ -3112,7 +3132,7 @@ static void op_symlink(fuse_req_t req, const char *target, fuse_ino_t fino,
 	fuse4fs_set_uid(&inode, ctxt->uid);
 	fuse4fs_set_gid(&inode, gid);
 	inode.i_generation = ff->next_generation++;
-	init_times(&inode);
+	fuse4fs_init_timestamps(ff, &inode);
 
 	err = fuse4fs_write_inode(fs, child, &inode);
 	if (err) {
@@ -3303,11 +3323,11 @@ static void op_rename(fuse_req_t req, fuse_ino_t from_parent, const char *from,
 	}
 
 	/* Update timestamps */
-	ret = update_ctime(fs, from_ino, NULL);
+	ret = fuse4fs_update_ctime(ff, from_ino, NULL);
 	if (ret)
 		goto out;
 
-	ret = update_mtime(fs, to_dir_ino, NULL);
+	ret = fuse4fs_update_mtime(ff, to_dir_ino, NULL);
 	if (ret)
 		goto out;
 
@@ -3381,7 +3401,7 @@ static void op_link(fuse_req_t req, fuse_ino_t child_fino,
 	}
 
 	inode.i_links_count++;
-	ret = update_ctime(fs, child, &inode);
+	ret = fuse4fs_update_ctime(ff, child, &inode);
 	if (ret)
 		goto out2;
 
@@ -3398,7 +3418,7 @@ static void op_link(fuse_req_t req, fuse_ino_t child_fino,
 		goto out2;
 	}
 
-	ret = update_mtime(fs, parent, NULL);
+	ret = fuse4fs_update_mtime(ff, parent, NULL);
 	if (ret)
 		goto out2;
 
@@ -3634,7 +3654,7 @@ static int fuse4fs_truncate(struct fuse4fs *ff, ext2_ino_t ino, off_t new_size)
 	if (err)
 		return translate_error(fs, ino, err);
 
-	ret = update_mtime(fs, ino, NULL);
+	ret = fuse4fs_update_mtime(ff, ino, NULL);
 	if (ret)
 		return ret;
 
@@ -3836,7 +3856,7 @@ static void op_read(fuse_req_t req, fuse_ino_t fino EXT2FS_ATTR((unused)),
 	}
 
 	if (fh->check_flags != X_OK && fuse4fs_is_writeable(ff)) {
-		ret = update_atime(fs, fh->ino);
+		ret = fuse4fs_update_atime(ff, fh->ino);
 		if (ret)
 			goto out;
 	}
@@ -3910,7 +3930,7 @@ static void op_write(fuse_req_t req, fuse_ino_t fino EXT2FS_ATTR((unused)),
 		goto out;
 	}
 
-	ret = update_mtime(fs, fh->ino, NULL);
+	ret = fuse4fs_update_mtime(ff, fh->ino, NULL);
 	if (ret)
 		goto out;
 
@@ -4357,7 +4377,7 @@ static void op_setxattr(fuse_req_t req, fuse_ino_t fino, const char *key,
 		goto out2;
 	}
 
-	ret = update_ctime(fs, ino, NULL);
+	ret = fuse4fs_update_ctime(ff, ino, NULL);
 out2:
 	err = ext2fs_xattrs_close(&h);
 	if (!ret && err)
@@ -4451,7 +4471,7 @@ static void op_removexattr(fuse_req_t req, fuse_ino_t fino, const char *key)
 		goto out2;
 	}
 
-	ret = update_ctime(fs, ino, NULL);
+	ret = fuse4fs_update_ctime(ff, ino, NULL);
 out2:
 	err = ext2fs_xattrs_close(&h);
 	if (err && !ret)
@@ -4598,7 +4618,7 @@ static void __op_readdir(fuse_req_t req, fuse_ino_t fino, size_t size,
 	}
 
 	if (fuse4fs_is_writeable(ff)) {
-		ret = update_atime(i.fs, fh->ino);
+		ret = fuse4fs_update_atime(i.ff, fh->ino);
 		if (ret)
 			goto out;
 	}
@@ -4698,7 +4718,7 @@ static void op_create(fuse_req_t req, fuse_ino_t fino, const char *name,
 			goto out2;
 		}
 
-		ret = update_mtime(fs, parent, NULL);
+		ret = fuse4fs_update_mtime(ff, parent, NULL);
 		if (ret)
 			goto out2;
 	} else {
@@ -4739,7 +4759,7 @@ static void op_create(fuse_req_t req, fuse_ino_t fino, const char *name,
 	}
 
 	inode.i_generation = ff->next_generation++;
-	init_times(&inode);
+	fuse4fs_init_timestamps(ff, &inode);
 	err = fuse4fs_write_inode(fs, child, &inode);
 	if (err) {
 		ret = translate_error(fs, child, err);
@@ -4818,7 +4838,7 @@ static int fuse4fs_utimens(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
 	int ret = 0;
 
 	if (to_set & (FUSE_SET_ATTR_ATIME_NOW | FUSE_SET_ATTR_MTIME_NOW))
-		get_now(&now);
+		fuse4fs_get_now(ff, &now);
 
 	if (to_set & FUSE_SET_ATTR_ATIME_NOW) {
 		atime = now;
@@ -4956,7 +4976,7 @@ static void op_setattr(fuse_req_t req, fuse_ino_t fino, struct stat *attr,
 	}
 
 	/* Update ctime for any attribute change */
-	ret = update_ctime(fs, ino, &inode);
+	ret = fuse4fs_update_ctime(ff, ino, &inode);
 	if (ret)
 		goto out;
 
@@ -5038,7 +5058,7 @@ static int ioctl_setflags(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
 	if (ret)
 		return ret;
 
-	ret = update_ctime(fs, fh->ino, &inode);
+	ret = fuse4fs_update_ctime(ff, fh->ino, &inode);
 	if (ret)
 		return ret;
 
@@ -5091,7 +5111,7 @@ static int ioctl_setversion(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
 
 	inode.i_generation = *indata;
 
-	ret = update_ctime(fs, fh->ino, &inode);
+	ret = fuse4fs_update_ctime(ff, fh->ino, &inode);
 	if (ret)
 		return ret;
 
@@ -5227,7 +5247,7 @@ static int ioctl_fssetxattr(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
 	if (ext2fs_inode_includes(inode_size, i_projid))
 		inode.i_projid = fsx->fsx_projid;
 
-	ret = update_ctime(fs, fh->ino, &inode);
+	ret = fuse4fs_update_ctime(ff, fh->ino, &inode);
 	if (ret)
 		return ret;
 
@@ -5520,7 +5540,7 @@ static int fuse4fs_allocate_range(struct fuse4fs *ff,
 		}
 	}
 
-	err = update_mtime(fs, fh->ino, &inode);
+	err = fuse4fs_update_mtime(ff, fh->ino, &inode);
 	if (err)
 		return err;
 
@@ -5693,7 +5713,7 @@ static int fuse4fs_punch_range(struct fuse4fs *ff,
 			return translate_error(fs, fh->ino, err);
 	}
 
-	err = update_mtime(fs, fh->ino, &inode);
+	err = fuse4fs_update_mtime(ff, fh->ino, &inode);
 	if (err)
 		return err;
 
@@ -7819,7 +7839,7 @@ static int __translate_error(ext2_filsys fs, ext2_ino_t ino, errcode_t err,
 			error_message(err), func, line);
 
 	/* Make a note in the error log */
-	get_now(&now);
+	fuse4fs_get_now(ff, &now);
 	ext2fs_set_tstamp(fs->super, s_last_error_time, now.tv_sec);
 	fs->super->s_last_error_ino = ino;
 	fs->super->s_last_error_line = line;
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index aedd1add48db82..fa4359133a79fc 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -662,8 +662,24 @@ static inline void fuse2fs_dump_extents(struct fuse2fs *ff, ext2_ino_t ino,
 	ext2fs_extent_free(extents);
 }
 
-static void get_now(struct timespec *now)
+static void fuse2fs_get_now(struct fuse2fs *ff, struct timespec *now)
 {
+#ifdef CLOCK_REALTIME_COARSE
+	/*
+	 * In iomap mode, the kernel is responsible for maintaining timestamps
+	 * because file writes don't upcall to fuse2fs.  The kernel's predicate
+	 * for deciding if [cm]time should be updated bases its decisions off
+	 * [cm]time being an exact match for the coarse clock (instead of
+	 * checking that [cm]time < coarse_clock) which means that fuse2fs
+	 * setting a fine-grained timestamp that is slightly ahead of the
+	 * coarse clock can result in timestamps appearing to go backwards.
+	 * generic/423 doesn't like seeing btime > ctime from statx, so we'll
+	 * use the coarse clock in iomap mode.
+	 */
+	if (fuse2fs_iomap_enabled(ff) &&
+	    !clock_gettime(CLOCK_REALTIME_COARSE, now))
+		return;
+#endif
 #ifdef CLOCK_REALTIME
 	if (!clock_gettime(CLOCK_REALTIME, now))
 		return;
@@ -691,7 +707,7 @@ static void fuse2fs_init_timestamps(struct fuse2fs *ff, ext2_ino_t ino,
 {
 	struct timespec now;
 
-	get_now(&now);
+	fuse2fs_get_now(ff, &now);
 	EXT4_INODE_SET_XTIME(i_atime, &now, inode);
 	EXT4_INODE_SET_XTIME(i_ctime, &now, inode);
 	EXT4_INODE_SET_XTIME(i_mtime, &now, inode);
@@ -710,7 +726,7 @@ static int fuse2fs_update_ctime(struct fuse2fs *ff, ext2_ino_t ino,
 	struct timespec now;
 	struct ext2_inode_large inode;
 
-	get_now(&now);
+	fuse2fs_get_now(ff, &now);
 
 	/* If user already has a inode buffer, just update that */
 	if (pinode) {
@@ -756,7 +772,7 @@ static int fuse2fs_update_atime(struct fuse2fs *ff, ext2_ino_t ino)
 	pinode = &inode;
 	EXT4_INODE_GET_XTIME(i_atime, &atime, pinode);
 	EXT4_INODE_GET_XTIME(i_mtime, &mtime, pinode);
-	get_now(&now);
+	fuse2fs_get_now(ff, &now);
 
 	datime = atime.tv_sec + ((double)atime.tv_nsec / NSEC_PER_SEC);
 	dmtime = mtime.tv_sec + ((double)mtime.tv_nsec / NSEC_PER_SEC);
@@ -791,7 +807,7 @@ static int fuse2fs_update_mtime(struct fuse2fs *ff, ext2_ino_t ino,
 	struct timespec now;
 
 	if (pinode) {
-		get_now(&now);
+		fuse2fs_get_now(ff, &now);
 		EXT4_INODE_SET_XTIME(i_mtime, &now, pinode);
 		EXT4_INODE_SET_XTIME(i_ctime, &now, pinode);
 		increment_version(pinode);
@@ -806,7 +822,7 @@ static int fuse2fs_update_mtime(struct fuse2fs *ff, ext2_ino_t ino,
 	if (err)
 		return translate_error(fs, ino, err);
 
-	get_now(&now);
+	fuse2fs_get_now(ff, &now);
 	EXT4_INODE_SET_XTIME(i_mtime, &now, &inode);
 	EXT4_INODE_SET_XTIME(i_ctime, &now, &inode);
 	increment_version(&inode);
@@ -4548,9 +4564,9 @@ static int op_utimens(const char *path, const struct timespec ctv[2],
 	tv[1] = ctv[1];
 #ifdef UTIME_NOW
 	if (tv[0].tv_nsec == UTIME_NOW)
-		get_now(tv);
+		fuse2fs_get_now(ff, tv);
 	if (tv[1].tv_nsec == UTIME_NOW)
-		get_now(tv + 1);
+		fuse2fs_get_now(ff, tv + 1);
 #endif /* UTIME_NOW */
 #ifdef UTIME_OMIT
 	if (tv[0].tv_nsec != UTIME_OMIT)
@@ -7259,7 +7275,7 @@ static int __translate_error(ext2_filsys fs, ext2_ino_t ino, errcode_t err,
 			error_message(err), func, line);
 
 	/* Make a note in the error log */
-	get_now(&now);
+	fuse2fs_get_now(ff, &now);
 	ext2fs_set_tstamp(fs->super, s_last_error_time, now.tv_sec);
 	fs->super->s_last_error_ino = ino;
 	fs->super->s_last_error_line = line;


^ permalink raw reply related	[flat|nested] 87+ messages in thread
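To see why a fine-grained timestamp can appear to go backwards, here is a toy model of the update check described in the comment in fuse2fs_get_now(). This is a sketch, not kernel code. touch_ctime() is a hypothetical name, and the predicate follows only the comment's description: update whenever [cm]time is not an exact match for the coarse clock.

```c
#include <stdbool.h>
#include <time.h>

/* Exact-match comparison, which is how the comment describes the check. */
static bool ts_equal(struct timespec a, struct timespec b)
{
	return a.tv_sec == b.tv_sec && a.tv_nsec == b.tv_nsec;
}

/* Ordinary three-way comparison, used to spot time going backwards. */
static int ts_cmp(struct timespec a, struct timespec b)
{
	if (a.tv_sec != b.tv_sec)
		return a.tv_sec < b.tv_sec ? -1 : 1;
	if (a.tv_nsec != b.tv_nsec)
		return a.tv_nsec < b.tv_nsec ? -1 : 1;
	return 0;
}

/*
 * Hypothetical write-path update: anything that is not an exact match for
 * the coarse clock gets overwritten with it, even if it was *ahead* of it.
 * Returns the ctime after the write.
 */
static struct timespec touch_ctime(struct timespec stored,
				   struct timespec coarse)
{
	if (!ts_equal(stored, coarse))
		return coarse;
	return stored;
}
```

Suppose ctime was stamped from CLOCK_REALTIME at 100.750000000 while the coarse clock still reads 100.500000000. In that case touch_ctime() moves ctime back by 250 ms, and a btime stamped at 100.75 ends up newer than ctime. If the server stamps from CLOCK_REALTIME_COARSE instead, the two values match and the update is a no-op.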

* [PATCH 07/10] fuse2fs: add tracing for retrieving timestamps
  2025-09-16  0:23 ` [PATCHSET RFC v5 6/9] fuse2fs: handle timestamps and ACLs correctly when iomap is enabled Darrick J. Wong
                     ` (5 preceding siblings ...)
  2025-09-16  1:05   ` [PATCH 06/10] fuse2fs: use coarse timestamps for iomap mode Darrick J. Wong
@ 2025-09-16  1:05   ` Darrick J. Wong
  2025-09-16  1:05   ` [PATCH 08/10] fuse2fs: enable syncfs Darrick J. Wong
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:05 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Add tracing for the timestamps that fuse2fs_stat retrieves, so that we
can debug the odd timestamp behavior seen when iomap is enabled.
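One thing to watch when reading these traces: the dbg_printf() that this patch adds prints tv_nsec with a plain "%ld". As a result, 5 ns past second 100 and 500 ms past second 100 both print as "100.5". Here is a sketch of a zero-padded formatter. format_ts() is a hypothetical helper and is not part of fuse2fs.

```c
#include <stdio.h>
#include <time.h>

/*
 * Print a timespec as seconds.nanoseconds.  Padding tv_nsec to nine digits
 * keeps 100 s + 5 ns from printing as "100.5", which is what an unpadded
 * "%lld.%ld" would produce.  Returns snprintf's length.
 */
static int format_ts(char *buf, size_t len, const struct timespec *ts)
{
	return snprintf(buf, len, "%lld.%09ld",
			(long long)ts->tv_sec, (long)ts->tv_nsec);
}
```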

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 misc/fuse2fs.c |   20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)


diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index fa4359133a79fc..00dafec79f7766 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -1580,9 +1580,11 @@ static void *op_init(struct fuse_conn_info *conn,
 	return ff;
 }
 
-static int stat_inode(ext2_filsys fs, ext2_ino_t ino, struct stat *statbuf)
+static int fuse2fs_stat(struct fuse2fs *ff, ext2_ino_t ino,
+			struct stat *statbuf)
 {
 	struct ext2_inode_large inode;
+	ext2_filsys fs = ff->fs;
 	dev_t fakedev = 0;
 	errcode_t err;
 	int ret = 0;
@@ -1621,6 +1623,13 @@ static int stat_inode(ext2_filsys fs, ext2_ino_t ino, struct stat *statbuf)
 #else
 	statbuf->st_ctime = tv.tv_sec;
 #endif
+
+	dbg_printf(ff, "%s: ino=%d atime=%lld.%ld mtime=%lld.%ld ctime=%lld.%ld\n",
+		   __func__, ino,
+		   (long long int)statbuf->st_atim.tv_sec, statbuf->st_atim.tv_nsec,
+		   (long long int)statbuf->st_mtim.tv_sec, statbuf->st_mtim.tv_nsec,
+		   (long long int)statbuf->st_ctim.tv_sec, statbuf->st_ctim.tv_nsec);
+
 	if (LINUX_S_ISCHR(inode.i_mode) ||
 	    LINUX_S_ISBLK(inode.i_mode)) {
 		if (inode.i_block[0])
@@ -1667,16 +1676,15 @@ static int op_getattr(const char *path, struct stat *statbuf,
 		      struct fuse_file_info *fi)
 {
 	struct fuse2fs *ff = fuse2fs_get();
-	ext2_filsys fs;
 	ext2_ino_t ino;
 	int ret = 0;
 
 	FUSE2FS_CHECK_CONTEXT(ff);
-	fs = fuse2fs_start(ff);
+	fuse2fs_start(ff);
 	ret = fuse2fs_file_ino(ff, path, fi, &ino);
 	if (ret)
 		goto out;
-	ret = stat_inode(fs, ino, statbuf);
+	ret = fuse2fs_stat(ff, ino, statbuf);
 out:
 	fuse2fs_finish(ff, ret);
 	return ret;
@@ -3409,7 +3417,7 @@ static int fuse2fs_file_uses_iomap(struct fuse2fs *ff, ext2_ino_t ino)
 	if (!fuse2fs_iomap_enabled(ff))
 		return 0;
 
-	ret = stat_inode(ff->fs, ino, &statbuf);
+	ret = fuse2fs_stat(ff, ino, &statbuf);
 	if (ret)
 		return ret;
 
@@ -4311,7 +4319,7 @@ static int op_readdir_iter(ext2_ino_t dir EXT2FS_ATTR((unused)),
 			(unsigned long long)i->dirpos);
 
 	if (i->flags == FUSE_READDIR_PLUS) {
-		ret = stat_inode(i->fs, dirent->inode, &stat);
+		ret = fuse2fs_stat(i->ff, dirent->inode, &stat);
 		if (ret)
 			return DIRENT_ABORT;
 	}



* [PATCH 08/10] fuse2fs: enable syncfs
  2025-09-16  0:23 ` [PATCHSET RFC v5 6/9] fuse2fs: handle timestamps and ACLs correctly when iomap is enabled Darrick J. Wong
                     ` (6 preceding siblings ...)
  2025-09-16  1:05   ` [PATCH 07/10] fuse2fs: add tracing for retrieving timestamps Darrick J. Wong
@ 2025-09-16  1:05   ` Darrick J. Wong
  2025-09-16  1:05   ` [PATCH 09/10] fuse2fs: skip the gdt write in op_destroy if syncfs is working Darrick J. Wong
  2025-09-16  1:06   ` [PATCH 10/10] fuse2fs: set sync, immutable, and append at file load time Darrick J. Wong
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:05 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

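Implement the syncfs operation in fuse2fs and fuse4fs so that a syncfs
call recomputes the group descriptor checksums and flushes the
filesystem metadata to disk.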
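From userspace, this path is normally exercised with syncfs(2) on a file descriptor that lives inside the mount. Whether the kernel forwards that call to the fuse server depends on the kernel and libfuse in use. The handler itself is only built when FUSE_VERSION >= 3.99. Here is a minimal caller sketch. sync_mount() is a hypothetical helper, and @path must name a directory.

```c
#define _GNU_SOURCE	/* syncfs() is a GNU extension in glibc */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Flush the filesystem containing the directory @path.  Returns 0 on
 * success or a negative errno, like fuse operation handlers do.
 */
static int sync_mount(const char *path)
{
	int fd = open(path, O_RDONLY | O_DIRECTORY);
	int ret = 0;

	if (fd < 0)
		return -errno;
	if (syncfs(fd) < 0)
		ret = -errno;	/* capture errno before close() can change it */
	close(fd);
	return ret;
}
```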

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   39 ++++++++++++++++++++++++++++++++++++++-
 misc/fuse2fs.c    |   36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+), 1 deletion(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 5673a545b06b31..25c19c0f0deca0 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -1993,7 +1993,7 @@ static int fuse4fs_statx(struct fuse4fs *ff, ext2_ino_t ino, int statx_mask,
 static void op_statx(fuse_req_t req, fuse_ino_t fino, int flags, int mask,
 		     struct fuse_file_info *fi)
 {
-	struct statx stx;
+	struct statx stx = { };
 	struct fuse4fs *ff = fuse4fs_get(req);
 	ext2_ino_t ino;
 	int ret = 0;
@@ -5775,6 +5775,40 @@ static void op_fallocate(fuse_req_t req, fuse_ino_t fino EXT2FS_ATTR((unused)),
 }
 #endif /* SUPPORT_FALLOCATE */
 
+#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 99)
+static void op_syncfs(fuse_req_t req, fuse_ino_t ino)
+{
+	struct fuse4fs *ff = fuse4fs_get(req);
+	ext2_filsys fs;
+	errcode_t err;
+	int ret = 0;
+
+	FUSE4FS_CHECK_CONTEXT(req);
+	fs = fuse4fs_start(ff);
+
+	if (ff->opstate == F4OP_WRITABLE) {
+		if (fs->super->s_error_count)
+			fs->super->s_state |= EXT2_ERROR_FS;
+		ext2fs_mark_super_dirty(fs);
+		err = ext2fs_set_gdt_csum(fs);
+		if (err) {
+			ret = translate_error(fs, 0, err);
+			goto out_unlock;
+		}
+
+		err = ext2fs_flush2(fs, 0);
+		if (err) {
+			ret = translate_error(fs, 0, err);
+			goto out_unlock;
+		}
+	}
+
+out_unlock:
+	fuse4fs_finish(ff, ret);
+	fuse_reply_err(req, -ret);
+}
+#endif
+
 #ifdef HAVE_FUSE_IOMAP
 static void fuse4fs_iomap_hole(struct fuse4fs *ff, struct fuse_file_iomap *iomap,
 			       off_t pos, uint64_t count)
@@ -7063,6 +7097,9 @@ static struct fuse_lowlevel_ops fs_ops = {
 #if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 18)
 	.statx = op_statx,
 #endif
+#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 99)
+	.syncfs = op_syncfs,
+#endif
 #ifdef HAVE_FUSE_IOMAP
 	.iomap_begin = op_iomap_begin,
 	.iomap_end = op_iomap_end,
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 00dafec79f7766..d4f1825dd695ad 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -5331,6 +5331,41 @@ static int op_fallocate(const char *path EXT2FS_ATTR((unused)), int mode,
 }
 #endif /* SUPPORT_FALLOCATE */
 
+#if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 99)
+static int op_syncfs(const char *path)
+{
+	struct fuse2fs *ff = fuse2fs_get();
+	ext2_filsys fs;
+	errcode_t err;
+	int ret = 0;
+
+	FUSE2FS_CHECK_CONTEXT(ff);
+	dbg_printf(ff, "%s: path=%s\n", __func__, path);
+	fs = fuse2fs_start(ff);
+
+	if (ff->opstate == F2OP_WRITABLE) {
+		if (fs->super->s_error_count)
+			fs->super->s_state |= EXT2_ERROR_FS;
+		ext2fs_mark_super_dirty(fs);
+		err = ext2fs_set_gdt_csum(fs);
+		if (err) {
+			ret = translate_error(fs, 0, err);
+			goto out_unlock;
+		}
+
+		err = ext2fs_flush2(fs, 0);
+		if (err) {
+			ret = translate_error(fs, 0, err);
+			goto out_unlock;
+		}
+	}
+
+out_unlock:
+	fuse2fs_finish(ff, ret);
+	return ret;
+}
+#endif
+
 #ifdef HAVE_FUSE_IOMAP
 static void fuse2fs_iomap_hole(struct fuse2fs *ff, struct fuse_file_iomap *iomap,
 			       off_t pos, uint64_t count)
@@ -6611,6 +6646,7 @@ static struct fuse_operations fs_ops = {
 #endif
 #if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 99)
 	.getattr_iflags = op_getattr_iflags,
+	.syncfs = op_syncfs,
 #endif
 #ifdef HAVE_FUSE_IOMAP
 	.iomap_begin = op_iomap_begin,



* [PATCH 09/10] fuse2fs: skip the gdt write in op_destroy if syncfs is working
  2025-09-16  0:23 ` [PATCHSET RFC v5 6/9] fuse2fs: handle timestamps and ACLs correctly when iomap is enabled Darrick J. Wong
                     ` (7 preceding siblings ...)
  2025-09-16  1:05   ` [PATCH 08/10] fuse2fs: enable syncfs Darrick J. Wong
@ 2025-09-16  1:05   ` Darrick J. Wong
  2025-09-16  1:06   ` [PATCH 10/10] fuse2fs: set sync, immutable, and append at file load time Darrick J. Wong
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:05 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

As an umount-time performance enhancement, don't bother to write the
group descriptor tables in op_destroy if we know that op_syncfs will do
it for us.  That only happens if iomap is enabled.
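Here is a toy model of how the new write_gdt_on_destroy flag interacts with op_syncfs and op_destroy. struct srv and the srv_* functions are hypothetical stand-ins. The counter tracks only the ext2fs_set_gdt_csum() step, because that is the part the flag guards; op_destroy still calls ext2fs_flush2() either way. The error paths in op_syncfs jump to out_unlock and skip the flag update, so a failed syncfs leaves destroy doing the full work.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down server state; not the real struct fuse2fs. */
struct srv {
	bool iomap;			/* iomap mode is in use */
	bool write_gdt_on_destroy;	/* starts true, as in main() */
	int gdt_csum_writes;		/* stand-in for ext2fs_set_gdt_csum() */
};

static void srv_set_gdt_csum(struct srv *s)
{
	s->gdt_csum_writes++;
}

/* Mirrors op_syncfs: @flush_err simulates ext2fs_flush2() failing. */
static int srv_syncfs(struct srv *s, int flush_err)
{
	srv_set_gdt_csum(s);
	if (flush_err)
		return flush_err;	/* error path skips the flag update */
	if (s->iomap)
		s->write_gdt_on_destroy = false;
	return 0;
}

/* Mirrors op_destroy: redo the checksums only if no syncfs covered them. */
static void srv_destroy(struct srv *s)
{
	if (s->write_gdt_on_destroy)
		srv_set_gdt_csum(s);
}
```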

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   19 ++++++++++++++++---
 misc/fuse2fs.c    |   19 ++++++++++++++++---
 2 files changed, 32 insertions(+), 6 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 25c19c0f0deca0..4f5618e64a93c3 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -268,6 +268,7 @@ struct fuse4fs {
 	int noblkdev;
 	int translate_inums;
 	int iomap_passthrough_options;
+	int write_gdt_on_destroy;
 
 	enum fuse4fs_opstate opstate;
 	int logfd;
@@ -1475,9 +1476,11 @@ static void op_destroy(void *userdata)
 		if (fs->super->s_error_count)
 			fs->super->s_state |= EXT2_ERROR_FS;
 		ext2fs_mark_super_dirty(fs);
-		err = ext2fs_set_gdt_csum(fs);
-		if (err)
-			translate_error(fs, 0, err);
+		if (ff->write_gdt_on_destroy) {
+			err = ext2fs_set_gdt_csum(fs);
+			if (err)
+				translate_error(fs, 0, err);
+		}
 
 		err = ext2fs_flush2(fs, 0);
 		if (err)
@@ -5803,6 +5806,15 @@ static void op_syncfs(fuse_req_t req, fuse_ino_t ino)
 		}
 	}
 
+	/*
+	 * When iomap is enabled, the kernel will call syncfs right before
+	 * calling the destroy method.  If any syncfs succeeds, then we know
+	 * that there will be a last syncfs and that it will write the GDT, so
+	 * destroy doesn't need to waste time doing that.
+	 */
+	if (fuse4fs_iomap_enabled(ff))
+		ff->write_gdt_on_destroy = 0;
+
 out_unlock:
 	fuse4fs_finish(ff, ret);
 	fuse_reply_err(req, -ret);
@@ -7497,6 +7509,7 @@ int main(int argc, char *argv[])
 		.iomap_dev = FUSE_IOMAP_DEV_NULL,
 #endif
 		.translate_inums = 1,
+		.write_gdt_on_destroy = 1,
 	};
 	errcode_t err;
 	FILE *orig_stderr = stderr;
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index d4f1825dd695ad..b193e0b2c06b69 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -261,6 +261,7 @@ struct fuse2fs {
 	int unmount_in_destroy;
 	int noblkdev;
 	int iomap_passthrough_options;
+	int write_gdt_on_destroy;
 
 	enum fuse2fs_opstate opstate;
 	int logfd;
@@ -1301,9 +1302,11 @@ static void op_destroy(void *p EXT2FS_ATTR((unused)))
 		if (fs->super->s_error_count)
 			fs->super->s_state |= EXT2_ERROR_FS;
 		ext2fs_mark_super_dirty(fs);
-		err = ext2fs_set_gdt_csum(fs);
-		if (err)
-			translate_error(fs, 0, err);
+		if (ff->write_gdt_on_destroy) {
+			err = ext2fs_set_gdt_csum(fs);
+			if (err)
+				translate_error(fs, 0, err);
+		}
 
 		err = ext2fs_flush2(fs, 0);
 		if (err)
@@ -5360,6 +5363,15 @@ static int op_syncfs(const char *path)
 		}
 	}
 
+	/*
+	 * When iomap is enabled, the kernel will call syncfs right before
+	 * calling the destroy method.  If any syncfs succeeds, then we know
+	 * that there will be a last syncfs and that it will write the GDT, so
+	 * destroy doesn't need to waste time doing that.
+	 */
+	if (fuse2fs_iomap_enabled(ff))
+		ff->write_gdt_on_destroy = 0;
+
 out_unlock:
 	fuse2fs_finish(ff, ret);
 	return ret;
@@ -6944,6 +6956,7 @@ int main(int argc, char *argv[])
 		.iomap_state = IOMAP_UNKNOWN,
 		.iomap_dev = FUSE_IOMAP_DEV_NULL,
 #endif
+		.write_gdt_on_destroy = 1,
 	};
 	errcode_t err;
 	FILE *orig_stderr = stderr;



* [PATCH 10/10] fuse2fs: set sync, immutable, and append at file load time
  2025-09-16  0:23 ` [PATCHSET RFC v5 6/9] fuse2fs: handle timestamps and ACLs correctly when iomap is enabled Darrick J. Wong
                     ` (8 preceding siblings ...)
  2025-09-16  1:05   ` [PATCH 09/10] fuse2fs: skip the gdt write in op_destroy if syncfs is working Darrick J. Wong
@ 2025-09-16  1:06   ` Darrick J. Wong
  9 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:06 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Convey the sync, immutable, and append-only inode flags to the kernel
when we're loading a file.  This way the kernel can advertise and
enforce those flags so that the fuse server doesn't have to.
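The translation both servers perform can be sketched as a small table-driven function. The EXT2_*_FL values below match ext2_fs.h. The FUSE_IFLAG_* values are placeholders, because the real constants come from the fuse headers in this series. The sketch starts from zero, as fuse4fs does with fstat->iflags. The fuse2fs hunk only ORs bits into *iflags, so it presumably relies on libfuse passing in a zeroed value.

```c
#include <stddef.h>
#include <stdint.h>

/* ext2/ext4 on-disk inode flag bits (same values as ext2_fs.h). */
#define EXT2_SYNC_FL		0x00000008
#define EXT2_IMMUTABLE_FL	0x00000010
#define EXT2_APPEND_FL		0x00000020

/* Placeholder values; not the real FUSE_IFLAG_* constants. */
#define FUSE_IFLAG_SYNC		(1U << 0)
#define FUSE_IFLAG_IMMUTABLE	(1U << 1)
#define FUSE_IFLAG_APPEND	(1U << 2)

static const struct {
	uint32_t ext2;
	unsigned int fuse;
} iflag_map[] = {
	{ EXT2_SYNC_FL,		FUSE_IFLAG_SYNC },
	{ EXT2_IMMUTABLE_FL,	FUSE_IFLAG_IMMUTABLE },
	{ EXT2_APPEND_FL,	FUSE_IFLAG_APPEND },
};

/* Translate on-disk i_flags into the iflags reported to the kernel. */
static unsigned int ext2_to_fuse_iflags(uint32_t i_flags)
{
	unsigned int iflags = 0;	/* start clean; other bits are ignored */
	size_t i;

	for (i = 0; i < sizeof(iflag_map) / sizeof(iflag_map[0]); i++)
		if (i_flags & iflag_map[i].ext2)
			iflags |= iflag_map[i].fuse;
	return iflags;
}
```

A table keeps each on-disk/kernel pair in one place. The patch's open-coded if statements are equivalent for three flags.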

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   10 ++++++++++
 misc/fuse2fs.c    |   53 ++++++++++++++++++++++++++++++++++++++---------------
 2 files changed, 48 insertions(+), 15 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 4f5618e64a93c3..a7709a7e6fb699 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -1800,6 +1800,16 @@ static int fuse4fs_stat_inode(struct fuse4fs *ff, ext2_ino_t ino,
 	entry->entry_timeout = FUSE4FS_ATTR_TIMEOUT;
 
 	fstat->iflags = 0;
+
+	if (inodep->i_flags & EXT2_SYNC_FL)
+		fstat->iflags |= FUSE_IFLAG_SYNC;
+
+	if (inodep->i_flags & EXT2_IMMUTABLE_FL)
+		fstat->iflags |= FUSE_IFLAG_IMMUTABLE;
+
+	if (inodep->i_flags & EXT2_APPEND_FL)
+		fstat->iflags |= FUSE_IFLAG_APPEND;
+
 #ifdef HAVE_FUSE_IOMAP
 	if (fuse4fs_iomap_enabled(ff)) {
 		fstat->iflags |= FUSE_IFLAG_IOMAP;
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index b193e0b2c06b69..260d1b77e3f24b 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -1584,7 +1584,7 @@ static void *op_init(struct fuse_conn_info *conn,
 }
 
 static int fuse2fs_stat(struct fuse2fs *ff, ext2_ino_t ino,
-			struct stat *statbuf)
+			struct stat *statbuf, unsigned int *iflags)
 {
 	struct ext2_inode_large inode;
 	ext2_filsys fs = ff->fs;
@@ -1641,6 +1641,7 @@ static int fuse2fs_stat(struct fuse2fs *ff, ext2_ino_t ino,
 			statbuf->st_rdev = inode.i_block[1];
 	}
 
+	*iflags = inode.i_flags;
 	return ret;
 }
 
@@ -1675,22 +1676,31 @@ static int __fuse2fs_file_ino(struct fuse2fs *ff, const char *path,
 # define fuse2fs_file_ino(ff, path, fp, inop) \
 	__fuse2fs_file_ino((ff), (path), (fp), (inop), __func__, __LINE__)
 
+static int fuse2fs_getattr(struct fuse2fs *ff, const char *path,
+			   struct stat *statbuf, struct fuse_file_info *fi,
+			   unsigned int *iflags)
+{
+	ext2_ino_t ino;
+	int ret = 0;
+
+	FUSE2FS_CHECK_CONTEXT(ff);
+	fuse2fs_start(ff);
+	ret = fuse2fs_file_ino(ff, path, fi, &ino);
+	if (ret)
+		goto out;
+	ret = fuse2fs_stat(ff, ino, statbuf, iflags);
+out:
+	fuse2fs_finish(ff, ret);
+	return ret;
+}
+
 static int op_getattr(const char *path, struct stat *statbuf,
 		      struct fuse_file_info *fi)
 {
 	struct fuse2fs *ff = fuse2fs_get();
-	ext2_ino_t ino;
-	int ret = 0;
+	unsigned int dontcare;
 
-	FUSE2FS_CHECK_CONTEXT(ff);
-	fuse2fs_start(ff);
-	ret = fuse2fs_file_ino(ff, path, fi, &ino);
-	if (ret)
-		goto out;
-	ret = fuse2fs_stat(ff, ino, statbuf);
-out:
-	fuse2fs_finish(ff, ret);
-	return ret;
+	return fuse2fs_getattr(ff, path, statbuf, fi, &dontcare);
 }
 
 #if FUSE_VERSION >= FUSE_MAKE_VERSION(3, 99)
@@ -1698,11 +1708,21 @@ static int op_getattr_iflags(const char *path, struct stat *statbuf,
 			     unsigned int *iflags, struct fuse_file_info *fi)
 {
 	struct fuse2fs *ff = fuse2fs_get();
-	int ret = op_getattr(path, statbuf, fi);
+	unsigned int i_flags;
+	int ret = fuse2fs_getattr(ff, path, statbuf, fi, &i_flags);
 
 	if (ret)
 		return ret;
 
+	if (i_flags & EXT2_IMMUTABLE_FL)
+		*iflags |= FUSE_IFLAG_IMMUTABLE;
+
+	if (i_flags & EXT2_SYNC_FL)
+		*iflags |= FUSE_IFLAG_SYNC;
+
+	if (i_flags & EXT2_APPEND_FL)
+		*iflags |= FUSE_IFLAG_APPEND;
+
 	if (fuse_fs_can_enable_iomap(statbuf)) {
 		*iflags |= FUSE_IFLAG_IOMAP;
 
@@ -3415,12 +3435,13 @@ static int fuse2fs_punch_posteof(struct fuse2fs *ff, ext2_ino_t ino,
 static int fuse2fs_file_uses_iomap(struct fuse2fs *ff, ext2_ino_t ino)
 {
 	struct stat statbuf;
+	unsigned int dontcare;
 	int ret;
 
 	if (!fuse2fs_iomap_enabled(ff))
 		return 0;
 
-	ret = fuse2fs_stat(ff, ino, &statbuf);
+	ret = fuse2fs_stat(ff, ino, &statbuf, &dontcare);
 	if (ret)
 		return ret;
 
@@ -4322,7 +4343,9 @@ static int op_readdir_iter(ext2_ino_t dir EXT2FS_ATTR((unused)),
 			(unsigned long long)i->dirpos);
 
 	if (i->flags == FUSE_READDIR_PLUS) {
-		ret = fuse2fs_stat(i->ff, dirent->inode, &stat);
+		unsigned int dontcare;
+
+		ret = fuse2fs_stat(i->ff, dirent->inode, &stat, &dontcare);
 		if (ret)
 			return DIRENT_ABORT;
 	}



* [PATCH 1/3] fuse2fs: enable caching of iomaps
  2025-09-16  0:23 ` [PATCHSET RFC v5 7/9] fuse2fs: cache iomap mappings for even better file IO performance Darrick J. Wong
@ 2025-09-16  1:06   ` Darrick J. Wong
  2025-09-16  1:06   ` [PATCH 2/3] fuse2fs: be smarter about caching iomaps Darrick J. Wong
  2025-09-16  1:06   ` [PATCH 3/3] fuse2fs: enable iomap Darrick J. Wong
  2 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:06 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Have the kernel cache the iomaps we generate, so that it can reuse them
for subsequent IO instead of asking the fuse server again.
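The upsert-then-retry handshake can be modelled in a few lines. In this toy version, the server pushes the mapping into a cache and replies "retry from cache". The kernel side resolves the IO from that cache and only upcalls again on a miss. Every name here (struct map, map_cache, kernel_map() and so on) is hypothetical, and the linear 1 MiB extents are made up for illustration. As in the patch, a failed upsert is reported as an error rather than silently ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct fuse_file_iomap and its type codes. */
enum map_type { MAP_MAPPED, MAP_RETRY_CACHE };

struct map {
	enum map_type type;
	uint64_t offset;	/* file offset, bytes */
	uint64_t length;	/* bytes */
	uint64_t addr;		/* disk address, bytes */
};

#define CACHE_SLOTS	8
#define CHUNK		((uint64_t)1 << 20)	/* toy 1 MiB extents */

/* Toy stand-in for the kernel-side mapping cache. */
struct map_cache {
	struct map slot[CACHE_SLOTS];
	int nr;
	int upcalls;		/* how often we had to ask the server */
};

/* Replace the entry with the same file offset, else append a new one. */
static int cache_upsert(struct map_cache *c, const struct map *m)
{
	for (int i = 0; i < c->nr; i++) {
		if (c->slot[i].offset == m->offset) {
			c->slot[i] = *m;
			return 0;
		}
	}
	if (c->nr == CACHE_SLOTS)
		return -1;
	c->slot[c->nr++] = *m;
	return 0;
}

static bool cache_lookup(const struct map_cache *c, uint64_t pos,
			 struct map *out)
{
	for (int i = 0; i < c->nr; i++) {
		const struct map *m = &c->slot[i];

		if (pos >= m->offset && pos - m->offset < m->length) {
			*out = *m;
			return true;
		}
	}
	return false;
}

/* Server side: upsert the mapping, then reply "retry from cache". */
static int server_iomap_begin(struct map_cache *c, uint64_t pos,
			      struct map *reply)
{
	uint64_t start = pos & ~(CHUNK - 1);

	/* Pretend the file maps linearly, 4 MiB into the disk. */
	*reply = (struct map){ MAP_MAPPED, start, CHUNK, start + 4 * CHUNK };
	if (cache_upsert(c, reply))
		return -1;
	reply->type = MAP_RETRY_CACHE;
	reply->addr = 0;
	return 0;
}

/* Kernel side: try the cache first, upcall only on a miss. */
static int kernel_map(struct map_cache *c, uint64_t pos, struct map *out)
{
	int err;

	if (cache_lookup(c, pos, out))
		return 0;
	c->upcalls++;
	err = server_iomap_begin(c, pos, out);
	if (err)
		return err;
	if (out->type == MAP_RETRY_CACHE && !cache_lookup(c, pos, out))
		return -1;
	return 0;
}
```

Two IOs that land in the same extent cost one upcall. That saving is the performance win this patch is after.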

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   25 +++++++++++++++++++++++++
 misc/fuse2fs.c    |   24 ++++++++++++++++++++++++
 2 files changed, 49 insertions(+)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index a7709a7e6fb699..6ab660b36d0472 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -284,6 +284,8 @@ struct fuse4fs {
 #ifdef STATX_WRITE_ATOMIC
 	unsigned int awu_min, awu_max;
 #endif
+	/* options set by fuse_opt_parse must be of type int */
+	int iomap_cache;
 #endif
 	unsigned int blockmask;
 	unsigned long offset;
@@ -6373,6 +6375,24 @@ static void op_iomap_begin(fuse_req_t req, fuse_ino_t fino, uint64_t dontcare,
 	if (opflags & FUSE_IOMAP_OP_ATOMIC)
 		read.flags |= FUSE_IOMAP_F_ATOMIC_BIO;
 
+	/*
+	 * Cache the mapping in the kernel so that we can reuse them for
+	 * subsequent IO.
+	 */
+	if (ff->iomap_cache) {
+		ret = fuse_lowlevel_notify_iomap_upsert(ff->fuse, fino, ino,
+							&read, NULL);
+		if (ret) {
+			ret = translate_error(fs, ino, -ret);
+			goto out_unlock;
+		} else {
+			/* Tell the kernel to retry from cache */
+			read.type = FUSE_IOMAP_TYPE_RETRY_CACHE;
+			read.dev = FUSE_IOMAP_DEV_NULL;
+			read.addr = FUSE_IOMAP_NULL_ADDR;
+		}
+	}
+
 out_unlock:
 	fuse4fs_finish(ff, ret);
 	if (ret)
@@ -7183,6 +7203,10 @@ static struct fuse_opt fuse4fs_opts[] = {
 	FUSE4FS_OPT("timing",		timing,			1),
 #endif
 	FUSE4FS_OPT("noblkdev",		noblkdev,		1),
+#ifdef HAVE_FUSE_IOMAP
+	FUSE4FS_OPT("iomap_cache",	iomap_cache,		1),
+	FUSE4FS_OPT("noiomap_cache",	iomap_cache,		0),
+#endif
 
 #ifdef HAVE_FUSE_IOMAP
 #ifdef MS_LAZYTIME
@@ -7517,6 +7541,7 @@ int main(int argc, char *argv[])
 		.iomap_want = FT_DEFAULT,
 		.iomap_state = IOMAP_UNKNOWN,
 		.iomap_dev = FUSE_IOMAP_DEV_NULL,
+		.iomap_cache = 1,
 #endif
 		.translate_inums = 1,
 		.write_gdt_on_destroy = 1,
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 260d1b77e3f24b..14a1ceeea46a0b 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -277,6 +277,8 @@ struct fuse2fs {
 #ifdef STATX_WRITE_ATOMIC
 	unsigned int awu_min, awu_max;
 #endif
+	/* options set by fuse_opt_parse must be of type int */
+	int iomap_cache;
 #endif
 	unsigned int blockmask;
 	unsigned long offset;
@@ -5942,6 +5944,23 @@ static int op_iomap_begin(const char *path, uint64_t nodeid, uint64_t attr_ino,
 	if (opflags & FUSE_IOMAP_OP_ATOMIC)
 		read->flags |= FUSE_IOMAP_F_ATOMIC_BIO;
 
+	/*
+	 * Cache the mapping in the kernel so that we can reuse them for
+	 * subsequent IO.
+	 */
+	if (ff->iomap_cache) {
+		ret = fuse_fs_iomap_upsert(nodeid, attr_ino, read, NULL);
+		if (ret) {
+			ret = translate_error(fs, attr_ino, -ret);
+			goto out_unlock;
+		} else {
+			/* Tell the kernel to retry from cache */
+			read->type = FUSE_IOMAP_TYPE_RETRY_CACHE;
+			read->dev = FUSE_IOMAP_DEV_NULL;
+			read->addr = FUSE_IOMAP_NULL_ADDR;
+		}
+	}
+
 out_unlock:
 	fuse2fs_finish(ff, ret);
 	return ret;
@@ -6744,6 +6763,10 @@ static struct fuse_opt fuse2fs_opts[] = {
 	FUSE2FS_OPT("timing",		timing,			1),
 #endif
 	FUSE2FS_OPT("noblkdev",		noblkdev,		1),
+#ifdef HAVE_FUSE_IOMAP
+	FUSE2FS_OPT("iomap_cache",	iomap_cache,		1),
+	FUSE2FS_OPT("noiomap_cache",	iomap_cache,		0),
+#endif
 
 #ifdef HAVE_FUSE_IOMAP
 #ifdef MS_LAZYTIME
@@ -6978,6 +7001,7 @@ int main(int argc, char *argv[])
 		.iomap_want = FT_DEFAULT,
 		.iomap_state = IOMAP_UNKNOWN,
 		.iomap_dev = FUSE_IOMAP_DEV_NULL,
+		.iomap_cache = 1,
 #endif
 		.write_gdt_on_destroy = 1,
 	};



* [PATCH 2/3] fuse2fs: be smarter about caching iomaps
  2025-09-16  0:23 ` [PATCHSET RFC v5 7/9] fuse2fs: cache iomap mappings for even better file IO performance Darrick J. Wong
  2025-09-16  1:06   ` [PATCH 1/3] fuse2fs: enable caching of iomaps Darrick J. Wong
@ 2025-09-16  1:06   ` Darrick J. Wong
  2025-09-16  1:06   ` [PATCH 3/3] fuse2fs: enable iomap Darrick J. Wong
  2 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:06 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

There's no point in caching iomaps when we're initiating a disk write to
a small unwritten region -- we'll just replace the mapping in the ioend.
Save ourselves a bit of overhead by screening for that.
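The screening rule can be checked in isolation. This standalone sketch follows the same decision order as fuse2fs_should_cache_iomap(). OP_WRITE, OP_DIRECT and the TYPE_* values are placeholders for the fuse iomap constants. The threshold assumes that FUSE2FS_FSB_TO_B(ff, 16) means 16 filesystem blocks expressed in bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values; the real ones live in the fuse iomap headers. */
#define OP_WRITE	(1U << 0)
#define OP_DIRECT	(1U << 1)
enum { TYPE_MAPPED, TYPE_UNWRITTEN };

/*
 * Skip the cache only for write or direct IO to an unwritten extent
 * shorter than 16 filesystem blocks; cache everything else.
 */
static bool should_cache(bool cache_enabled, uint32_t opflags, int type,
			 uint64_t length, uint32_t blocksize)
{
	if (!cache_enabled)
		return false;
	if ((opflags & (OP_WRITE | OP_DIRECT)) == 0)
		return true;
	if (type != TYPE_UNWRITTEN)
		return true;
	if (length >= (uint64_t)blocksize * 16)
		return true;
	return false;
}
```

As posted, the fuse4fs copy returns 1 right after the "XXX I think this is stupid" comment. That makes its heuristic unreachable, so fuse4fs caches every mapping. Only the fuse2fs copy actually applies the screening.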

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   27 ++++++++++++++++++++++++++-
 misc/fuse2fs.c    |   24 +++++++++++++++++++++++-
 2 files changed, 49 insertions(+), 2 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 6ab660b36d0472..5c563eff1c38c1 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -6309,6 +6309,31 @@ static int fuse4fs_iomap_begin_write(struct fuse4fs *ff, ext2_ino_t ino,
 	return 0;
 }
 
+static inline int fuse4fs_should_cache_iomap(struct fuse4fs *ff,
+					     uint32_t opflags,
+					     const struct fuse_file_iomap *map)
+{
+	if (!ff->iomap_cache)
+		return 0;
+
+	/* XXX I think this is stupid */
+	return 1;
+
+	/*
+	 * Don't cache small unwritten extents that are being written to the
+	 * device because the overhead of keeping the cache updated will tank
+	 * performance.
+	 */
+	if ((opflags & (FUSE_IOMAP_OP_WRITE | FUSE_IOMAP_OP_DIRECT)) == 0)
+		return 1;
+	if (map->type != FUSE_IOMAP_TYPE_UNWRITTEN)
+		return 1;
+	if (map->length >= FUSE4FS_FSB_TO_B(ff, 16))
+		return 1;
+
+	return 0;
+}
+
 static void op_iomap_begin(fuse_req_t req, fuse_ino_t fino, uint64_t dontcare,
 			   off_t pos, uint64_t count, uint32_t opflags)
 {
@@ -6379,7 +6404,7 @@ static void op_iomap_begin(fuse_req_t req, fuse_ino_t fino, uint64_t dontcare,
 	 * Cache the mapping in the kernel so that we can reuse them for
 	 * subsequent IO.
 	 */
-	if (ff->iomap_cache) {
+	if (fuse4fs_should_cache_iomap(ff, opflags, &read)) {
 		ret = fuse_lowlevel_notify_iomap_upsert(ff->fuse, fino, ino,
 							&read, NULL);
 		if (ret) {
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 14a1ceeea46a0b..7a10b6cab87f7c 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -5876,6 +5876,28 @@ static int fuse2fs_iomap_begin_write(struct fuse2fs *ff, ext2_ino_t ino,
 	return 0;
 }
 
+static inline int fuse2fs_should_cache_iomap(struct fuse2fs *ff,
+					     uint32_t opflags,
+					     const struct fuse_file_iomap *map)
+{
+	if (!ff->iomap_cache)
+		return 0;
+
+	/*
+	 * Don't cache small unwritten extents that are being written to the
+	 * device because the overhead of keeping the cache updated will tank
+	 * performance.
+	 */
+	if ((opflags & (FUSE_IOMAP_OP_WRITE | FUSE_IOMAP_OP_DIRECT)) == 0)
+		return 1;
+	if (map->type != FUSE_IOMAP_TYPE_UNWRITTEN)
+		return 1;
+	if (map->length >= FUSE2FS_FSB_TO_B(ff, 16))
+		return 1;
+
+	return 0;
+}
+
 static int op_iomap_begin(const char *path, uint64_t nodeid, uint64_t attr_ino,
 			  off_t pos, uint64_t count, uint32_t opflags,
 			  struct fuse_file_iomap *read,
@@ -5948,7 +5970,7 @@ static int op_iomap_begin(const char *path, uint64_t nodeid, uint64_t attr_ino,
 	 * Cache the mapping in the kernel so that we can reuse them for
 	 * subsequent IO.
 	 */
-	if (ff->iomap_cache) {
+	if (fuse2fs_should_cache_iomap(ff, opflags, read)) {
 		ret = fuse_fs_iomap_upsert(nodeid, attr_ino, read, NULL);
 		if (ret) {
 			ret = translate_error(fs, attr_ino, -ret);



* [PATCH 3/3] fuse2fs: enable iomap
  2025-09-16  0:23 ` [PATCHSET RFC v5 7/9] fuse2fs: cache iomap mappings for even better file IO performance Darrick J. Wong
  2025-09-16  1:06   ` [PATCH 1/3] fuse2fs: enable caching of iomaps Darrick J. Wong
  2025-09-16  1:06   ` [PATCH 2/3] fuse2fs: be smarter about caching iomaps Darrick J. Wong
@ 2025-09-16  1:06   ` Darrick J. Wong
  2 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:06 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Now that iomap functionality is complete, enable this for users.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |    4 ----
 misc/fuse2fs.c    |    4 ----
 2 files changed, 8 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 5c563eff1c38c1..c4397fc365ced7 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -1652,10 +1652,6 @@ static inline int fuse_set_feature_flag(struct fuse_conn_info *conn,
 static void fuse4fs_iomap_enable(struct fuse_conn_info *conn,
 				 struct fuse4fs *ff)
 {
-	/* Don't let anyone touch iomap until the end of the patchset. */
-	ff->iomap_state = IOMAP_DISABLED;
-	return;
-
 	/* iomap only works with block devices */
 	if (ff->iomap_state != IOMAP_DISABLED && fuse4fs_on_bdev(ff) &&
 	    fuse_set_feature_flag(conn, FUSE_CAP_IOMAP)) {
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 7a10b6cab87f7c..5e4680ca023282 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -1478,10 +1478,6 @@ static inline int fuse_set_feature_flag(struct fuse_conn_info *conn,
 static void fuse2fs_iomap_enable(struct fuse_conn_info *conn,
 				 struct fuse2fs *ff)
 {
-	/* Don't let anyone touch iomap until the end of the patchset. */
-	ff->iomap_state = IOMAP_DISABLED;
-	return;
-
 	/* iomap only works with block devices */
 	if (ff->iomap_state != IOMAP_DISABLED && fuse2fs_on_bdev(ff) &&
 	    fuse_set_feature_flag(conn, FUSE_CAP_IOMAP)) {



* [PATCH 1/6] libsupport: add caching IO manager
  2025-09-16  0:23 ` [PATCHSET RFC v5 8/9] fuse2fs: improve block and inode caching Darrick J. Wong
@ 2025-09-16  1:07   ` Darrick J. Wong
  2025-09-16  1:07   ` [PATCH 2/6] iocache: add the actual buffer cache Darrick J. Wong
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:07 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Start creating a caching IO manager so that we can have better caching
of metadata blocks in fuse2fs.  For now it's just a passthrough cache.
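The passthrough stage amounts to a wrapper channel with the same interface as the backing one. It keeps a pointer to the real channel in its private data, much like iocache_private_data->real, and forwards every call. Here is a much-reduced sketch of that layering. struct chan, chan_ops, memdev and the cache_* functions are hypothetical and are not libext2fs's io_manager API.

```c
#include <string.h>

/* Hypothetical, much-reduced channel interface; not libext2fs's io_manager. */
struct chan;

struct chan_ops {
	int (*read_blk)(struct chan *c, unsigned long blk, void *buf);
	int (*write_blk)(struct chan *c, unsigned long blk, const void *buf);
};

struct chan {
	const struct chan_ops *ops;
	void *private_data;
	unsigned int block_size;
};

/* Backing "device": a flat in-memory array of blocks. */
struct memdev {
	unsigned char *bytes;
	unsigned long nr_blocks;
};

static int mem_read(struct chan *c, unsigned long blk, void *buf)
{
	struct memdev *d = c->private_data;

	if (blk >= d->nr_blocks)
		return -1;
	memcpy(buf, d->bytes + blk * c->block_size, c->block_size);
	return 0;
}

static int mem_write(struct chan *c, unsigned long blk, const void *buf)
{
	struct memdev *d = c->private_data;

	if (blk >= d->nr_blocks)
		return -1;
	memcpy(d->bytes + blk * c->block_size, buf, c->block_size);
	return 0;
}

static const struct chan_ops mem_ops = { mem_read, mem_write };

/* Passthrough layer: same interface, forwards every call to ->real. */
struct cache_priv {
	struct chan *real;
	unsigned long reads, writes;	/* hooks for a future real cache */
};

static int cache_read(struct chan *c, unsigned long blk, void *buf)
{
	struct cache_priv *p = c->private_data;

	p->reads++;
	return p->real->ops->read_blk(p->real, blk, buf);
}

static int cache_write(struct chan *c, unsigned long blk, const void *buf)
{
	struct cache_priv *p = c->private_data;

	p->writes++;
	return p->real->ops->write_blk(p->real, blk, buf);
}

static const struct chan_ops cache_ops = { cache_read, cache_write };

/* Stack the passthrough layer on top of an already-open backing channel. */
static void cache_wrap(struct chan *outer, struct cache_priv *p,
		       struct chan *real)
{
	p->real = real;
	p->reads = p->writes = 0;
	outer->ops = &cache_ops;
	outer->private_data = p;
	outer->block_size = real->block_size;
}
```

Callers only ever see the outer channel. That lets a real buffer cache be slotted into cache_read()/cache_write() later without touching them.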

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/support/iocache.h   |   17 +++
 lib/ext2fs/io_manager.c |    3 
 lib/support/Makefile.in |    6 +
 lib/support/iocache.c   |  306 +++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 331 insertions(+), 1 deletion(-)
 create mode 100644 lib/support/iocache.h
 create mode 100644 lib/support/iocache.c


diff --git a/lib/support/iocache.h b/lib/support/iocache.h
new file mode 100644
index 00000000000000..3c1d1df00e25bd
--- /dev/null
+++ b/lib/support/iocache.h
@@ -0,0 +1,17 @@
+/*
+ * iocache.h - IO cache
+ *
+ * Copyright (C) 2025 Oracle.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Public
+ * License.
+ * %End-Header%
+ */
+#ifndef __IOCACHE_H__
+#define __IOCACHE_H__
+
+errcode_t iocache_set_backing_manager(io_manager manager);
+extern io_manager iocache_io_manager;
+
+#endif /* __IOCACHE_H__ */
diff --git a/lib/ext2fs/io_manager.c b/lib/ext2fs/io_manager.c
index c91fab4eb290d5..7a6a6bfedc8a1c 100644
--- a/lib/ext2fs/io_manager.c
+++ b/lib/ext2fs/io_manager.c
@@ -16,9 +16,12 @@
 #if HAVE_SYS_TYPES_H
 #include <sys/types.h>
 #endif
+#include <stdbool.h>
 
 #include "ext2_fs.h"
 #include "ext2fs.h"
+#include "support/list.h"
+#include "support/cache.h"
 
 errcode_t io_channel_set_options(io_channel channel, const char *opts)
 {
diff --git a/lib/support/Makefile.in b/lib/support/Makefile.in
index 13d6f06f150afd..98a9bd42eef55e 100644
--- a/lib/support/Makefile.in
+++ b/lib/support/Makefile.in
@@ -14,6 +14,7 @@ MKDIR_P = @MKDIR_P@
 all::
 
 OBJS=		cstring.o \
+		iocache.o \
 		mkquota.o \
 		plausible.o \
 		profile.o \
@@ -42,7 +43,8 @@ SRCS=		$(srcdir)/argv_parse.c \
 		$(srcdir)/quotaio_v2.c \
 		$(srcdir)/dict.c \
 		$(srcdir)/devname.c \
-		$(srcdir)/cache.c
+		$(srcdir)/cache.c \
+		$(srcdir)/iocache.c
 
 LIBRARY= libsupport
 LIBDIR= support
@@ -187,3 +189,5 @@ devname.o: $(srcdir)/devname.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/devname.h $(srcdir)/nls-enable.h
 cache.o: $(srcdir)/cache.c $(top_builddir)/lib/config.h \
  $(srcdir)/cache.h $(srcdir)/list.h $(srcdir)/xbitops.h
+iocache.o: $(srcdir)/iocache.c $(top_builddir)/lib/config.h \
+ $(srcdir)/iocache.h $(srcdir)/cache.h $(srcdir)/list.h $(srcdir)/xbitops.h
diff --git a/lib/support/iocache.c b/lib/support/iocache.c
new file mode 100644
index 00000000000000..9870780d65ef61
--- /dev/null
+++ b/lib/support/iocache.c
@@ -0,0 +1,306 @@
+/*
+ * iocache.c - caching IO manager for e2fsprogs.
+ *
+ * Copyright (C) 2025 Oracle.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Public
+ * License.
+ * %End-Header%
+ */
+#include "config.h"
+#include "ext2fs/ext2_fs.h"
+#include "ext2fs/ext2fs.h"
+#include "ext2fs/ext2fsP.h"
+#include "support/iocache.h"
+
+#define IOCACHE_IO_CHANNEL_MAGIC	0x424F5254	/* BORT */
+
+static io_manager iocache_backing_manager;
+
+struct iocache_private_data {
+	int			magic;
+	io_channel		real;
+};
+
+static struct iocache_private_data *IOCACHE(io_channel channel)
+{
+	return (struct iocache_private_data *)channel->private_data;
+}
+
+static errcode_t iocache_read_error(io_channel channel, unsigned long block,
+				    int count, void *data, size_t size,
+				    int actual_bytes_read, errcode_t error)
+{
+	io_channel iocache_channel = channel->app_data;
+
+	return iocache_channel->read_error(iocache_channel, block, count, data,
+					   size, actual_bytes_read, error);
+}
+
+static errcode_t iocache_write_error(io_channel channel, unsigned long block,
+				     int count, const void *data, size_t size,
+				     int actual_bytes_written,
+				     errcode_t error)
+{
+	io_channel iocache_channel = channel->app_data;
+
+	return iocache_channel->write_error(iocache_channel, block, count, data,
+					    size, actual_bytes_written, error);
+}
+
+static errcode_t iocache_open(const char *name, int flags, io_channel *channel)
+{
+	io_channel	io = NULL;
+	io_channel	real;
+	struct iocache_private_data *data = NULL;
+	errcode_t	retval;
+
+	if (!name)
+		return EXT2_ET_BAD_DEVICE_NAME;
+	if (!iocache_backing_manager)
+		return EXT2_ET_INVALID_ARGUMENT;
+
+	retval = iocache_backing_manager->open(name, flags, &real);
+	if (retval)
+		return retval;
+
+	retval = ext2fs_get_mem(sizeof(struct struct_io_channel), &io);
+	if (retval)
+		goto out_backing;
+	memset(io, 0, sizeof(struct struct_io_channel));
+	io->magic = EXT2_ET_MAGIC_IO_CHANNEL;
+
+	retval = ext2fs_get_mem(sizeof(struct iocache_private_data), &data);
+	if (retval)
+		goto out_channel;
+	memset(data, 0, sizeof(struct iocache_private_data));
+	data->magic = IOCACHE_IO_CHANNEL_MAGIC;
+
+	io->manager = iocache_io_manager;
+	retval = ext2fs_get_mem(strlen(name) + 1, &io->name);
+	if (retval)
+		goto out_data;
+
+	strcpy(io->name, name);
+	io->private_data = data;
+	io->block_size = real->block_size;
+	io->read_error = 0;
+	io->write_error = 0;
+	io->refcount = 1;
+	io->flags = real->flags;
+	data->real = real;
+	real->app_data = io;
+	real->read_error = iocache_read_error;
+	real->write_error = iocache_write_error;
+
+	*channel = io;
+	return 0;
+
+out_data:
+	ext2fs_free_mem(&data);
+out_channel:
+	ext2fs_free_mem(&io);
+out_backing:
+	io_channel_close(real);
+	return retval;
+}
+
+static errcode_t iocache_close(io_channel channel)
+{
+	struct iocache_private_data *data = IOCACHE(channel);
+	errcode_t	retval = 0;
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
+
+	if (--channel->refcount > 0)
+		return 0;
+	if (data->real)
+		retval = io_channel_close(data->real);
+	ext2fs_free_mem(&channel->private_data);
+	if (channel->name)
+		ext2fs_free_mem(&channel->name);
+	ext2fs_free_mem(&channel);
+
+	return retval;
+}
+
+static errcode_t iocache_set_blksize(io_channel channel, int blksize)
+{
+	struct iocache_private_data *data = IOCACHE(channel);
+	errcode_t retval;
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
+
+	retval = io_channel_set_blksize(data->real, blksize);
+	if (retval)
+		return retval;
+
+	channel->block_size = data->real->block_size;
+	return 0;
+}
+
+static errcode_t iocache_flush(io_channel channel)
+{
+	struct iocache_private_data *data = IOCACHE(channel);
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
+
+	return io_channel_flush(data->real);
+}
+
+static errcode_t iocache_write_byte(io_channel channel, unsigned long offset,
+				    int count, const void *buf)
+{
+	struct iocache_private_data *data = IOCACHE(channel);
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
+
+	return io_channel_write_byte(data->real, offset, count, buf);
+}
+
+static errcode_t iocache_set_option(io_channel channel, const char *option,
+				    const char *arg)
+{
+	struct iocache_private_data *data = IOCACHE(channel);
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
+
+	return data->real->manager->set_option(data->real, option, arg);
+}
+
+static errcode_t iocache_get_stats(io_channel channel, io_stats *io_stats)
+{
+	struct iocache_private_data *data = IOCACHE(channel);
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
+
+	return data->real->manager->get_stats(data->real, io_stats);
+}
+
+static errcode_t iocache_read_blk64(io_channel channel,
+				    unsigned long long block, int count,
+				    void *buf)
+{
+	struct iocache_private_data *data = IOCACHE(channel);
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
+
+	return io_channel_read_blk64(data->real, block, count, buf);
+}
+
+static errcode_t iocache_write_blk64(io_channel channel,
+				     unsigned long long block, int count,
+				     const void *buf)
+{
+	struct iocache_private_data *data = IOCACHE(channel);
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
+
+	return io_channel_write_blk64(data->real, block, count, buf);
+}
+
+static errcode_t iocache_read_blk(io_channel channel, unsigned long block,
+				  int count, void *buf)
+{
+	return iocache_read_blk64(channel, block, count, buf);
+}
+
+static errcode_t iocache_write_blk(io_channel channel, unsigned long block,
+				   int count, const void *buf)
+{
+	return iocache_write_blk64(channel, block, count, buf);
+}
+
+static errcode_t iocache_discard(io_channel channel, unsigned long long block,
+				 unsigned long long count)
+{
+	struct iocache_private_data *data = IOCACHE(channel);
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
+
+	return io_channel_discard(data->real, block, count);
+}
+
+static errcode_t iocache_cache_readahead(io_channel channel,
+					 unsigned long long block,
+					 unsigned long long count)
+{
+	struct iocache_private_data *data = IOCACHE(channel);
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
+
+	return io_channel_cache_readahead(data->real, block, count);
+}
+
+static errcode_t iocache_zeroout(io_channel channel, unsigned long long block,
+				 unsigned long long count)
+{
+	struct iocache_private_data *data = IOCACHE(channel);
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
+
+	return io_channel_zeroout(data->real, block, count);
+}
+
+static errcode_t iocache_get_fd(io_channel channel, int *fd)
+{
+	struct iocache_private_data *data = IOCACHE(channel);
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
+
+	return io_channel_get_fd(data->real, fd);
+}
+
+static errcode_t iocache_invalidate_blocks(io_channel channel,
+					   unsigned long long block,
+					   unsigned long long count)
+{
+	struct iocache_private_data *data = IOCACHE(channel);
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
+
+	return io_channel_invalidate_blocks(data->real, block, count);
+}
+
+static struct struct_io_manager struct_iocache_manager = {
+	.magic			= EXT2_ET_MAGIC_IO_MANAGER,
+	.name			= "iocache I/O manager",
+	.open			= iocache_open,
+	.close			= iocache_close,
+	.set_blksize		= iocache_set_blksize,
+	.read_blk		= iocache_read_blk,
+	.write_blk		= iocache_write_blk,
+	.flush			= iocache_flush,
+	.write_byte		= iocache_write_byte,
+	.set_option		= iocache_set_option,
+	.get_stats		= iocache_get_stats,
+	.read_blk64		= iocache_read_blk64,
+	.write_blk64		= iocache_write_blk64,
+	.discard		= iocache_discard,
+	.cache_readahead	= iocache_cache_readahead,
+	.zeroout		= iocache_zeroout,
+	.get_fd			= iocache_get_fd,
+	.invalidate_blocks	= iocache_invalidate_blocks,
+};
+
+io_manager iocache_io_manager = &struct_iocache_manager;
+
+errcode_t iocache_set_backing_manager(io_manager manager)
+{
+	iocache_backing_manager = manager;
+	return 0;
+}

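As introduced in this patch, iocache is a pure stacking layer. It owns a lower-level channel (data->real) and forwards every method to it, and the next patch adds the caching on top. The self-contained sketch below shows only that forwarding pattern. struct blk_ops, struct mem_dev and the stack_* functions are hypothetical simplifications for illustration, not the e2fsprogs io_manager API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLKSZ	16
#define NBLKS	8

/* Hypothetical, simplified stand-in for an I/O manager's method table. */
struct blk_ops {
	int (*read_blk)(void *priv, unsigned long long blk, void *buf);
	int (*write_blk)(void *priv, unsigned long long blk, const void *buf);
};

/* A channel is a method table plus private state. */
struct blk_chan {
	const struct blk_ops	*ops;
	void			*priv;
};

/* Backing "device": a small in-memory array of blocks. */
struct mem_dev {
	unsigned char	data[NBLKS][BLKSZ];
};

static int mem_read(void *priv, unsigned long long blk, void *buf)
{
	struct mem_dev *dev = priv;

	if (blk >= NBLKS)
		return -1;
	memcpy(buf, dev->data[blk], BLKSZ);
	return 0;
}

static int mem_write(void *priv, unsigned long long blk, const void *buf)
{
	struct mem_dev *dev = priv;

	if (blk >= NBLKS)
		return -1;
	memcpy(dev->data[blk], buf, BLKSZ);
	return 0;
}

static const struct blk_ops mem_ops = { mem_read, mem_write };

/* Stacked layer: remembers the lower channel and forwards every call. */
struct stack_priv {
	struct blk_chan	*real;		/* lower level channel */
	unsigned long	calls;		/* number of forwarded calls */
};

static int stack_read(void *priv, unsigned long long blk, void *buf)
{
	struct stack_priv *sp = priv;

	sp->calls++;
	return sp->real->ops->read_blk(sp->real->priv, blk, buf);
}

static int stack_write(void *priv, unsigned long long blk, const void *buf)
{
	struct stack_priv *sp = priv;

	sp->calls++;
	return sp->real->ops->write_blk(sp->real->priv, blk, buf);
}

static const struct blk_ops stack_ops = { stack_read, stack_write };
```

Callers only ever hold the top channel. That is why a cache can later be slotted into the stacked methods without changing any caller.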

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 2/6] iocache: add the actual buffer cache
  2025-09-16  0:23 ` [PATCHSET RFC v5 8/9] fuse2fs: improve block and inode caching Darrick J. Wong
  2025-09-16  1:07   ` [PATCH 1/6] libsupport: add caching IO manager Darrick J. Wong
@ 2025-09-16  1:07   ` Darrick J. Wong
  2025-09-16  1:07   ` [PATCH 3/6] iocache: bump buffer mru priority every 50 accesses Darrick J. Wong
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:07 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Wire up buffer caching into our new caching IO manager.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/support/iocache.c |  469 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 447 insertions(+), 22 deletions(-)


diff --git a/lib/support/iocache.c b/lib/support/iocache.c
index 9870780d65ef61..ab879e85d18f2a 100644
--- a/lib/support/iocache.c
+++ b/lib/support/iocache.c
@@ -9,46 +9,288 @@
  * %End-Header%
  */
 #include "config.h"
+#include <assert.h>
+#include <stdbool.h>
+#include <pthread.h>
+#include <unistd.h>
 #include "ext2fs/ext2_fs.h"
 #include "ext2fs/ext2fs.h"
 #include "ext2fs/ext2fsP.h"
 #include "support/iocache.h"
+#include "support/list.h"
+#include "support/cache.h"
 
 #define IOCACHE_IO_CHANNEL_MAGIC	0x424F5254	/* BORT */
 
 static io_manager iocache_backing_manager;
 
+static inline uint64_t B_TO_FSBT(io_channel channel, uint64_t number) {
+	return number / channel->block_size;
+}
+
+static inline uint64_t B_TO_FSB(io_channel channel, uint64_t number) {
+	return (number + channel->block_size - 1) / channel->block_size;
+}
+
 struct iocache_private_data {
 	int			magic;
-	io_channel		real;
+	io_channel		real;		/* lower level io channel */
+	io_channel		channel;	/* cache channel */
+	struct cache		cache;
+	pthread_mutex_t		stats_lock;
+	struct struct_io_stats	io_stats;
+	unsigned long long	write_errors;
 };
 
+#define IOCACHEDATA(cache) \
+	(container_of(cache, struct iocache_private_data, cache))
+
 static struct iocache_private_data *IOCACHE(io_channel channel)
 {
 	return (struct iocache_private_data *)channel->private_data;
 }
 
-static errcode_t iocache_read_error(io_channel channel, unsigned long block,
-				    int count, void *data, size_t size,
-				    int actual_bytes_read, errcode_t error)
+struct iocache_buf {
+	struct cache_node	node;
+	struct list_head	list;
+	blk64_t			block;
+	void			*buf;
+	errcode_t		write_error;
+	unsigned int		uptodate:1;
+	unsigned int		dirty:1;
+};
+
+static inline void iocache_buf_lock(struct iocache_buf *ubuf)
 {
-	io_channel iocache_channel = channel->app_data;
+	pthread_mutex_lock(&ubuf->node.cn_mutex);
+}
 
-	return iocache_channel->read_error(iocache_channel, block, count, data,
-					   size, actual_bytes_read, error);
+static inline void iocache_buf_unlock(struct iocache_buf *ubuf)
+{
+	pthread_mutex_unlock(&ubuf->node.cn_mutex);
 }
 
-static errcode_t iocache_write_error(io_channel channel, unsigned long block,
-				     int count, const void *data, size_t size,
-				     int actual_bytes_written,
-				     errcode_t error)
+struct iocache_key {
+	blk64_t			block;
+};
+
+#define IOKEY(key)	((struct iocache_key *)(key))
+#define IOBUF(node)	(container_of((node), struct iocache_buf, node))
+
+static unsigned int
+iocache_hash(cache_key_t key, unsigned int hashsize, unsigned int hashshift)
 {
-	io_channel iocache_channel = channel->app_data;
+	uint64_t	hashval = IOKEY(key)->block;
+	uint64_t	tmp;
 
-	return iocache_channel->write_error(iocache_channel, block, count, data,
-					    size, actual_bytes_written, error);
+	tmp = hashval ^ (GOLDEN_RATIO_PRIME + hashval) / CACHE_LINE_SIZE;
+	tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> hashshift);
+	return tmp % hashsize;
 }
 
+static int iocache_compare(struct cache_node *node, cache_key_t key)
+{
+	struct iocache_buf *ubuf = IOBUF(node);
+	struct iocache_key *ukey = IOKEY(key);
+
+	if (ubuf->block == ukey->block)
+		return CACHE_HIT;
+
+	return CACHE_MISS;
+}
+
+static struct cache_node *iocache_alloc_node(struct cache *cache,
+					     cache_key_t key)
+{
+	struct iocache_private_data *data = IOCACHEDATA(cache);
+	struct iocache_key *ukey = IOKEY(key);
+	struct iocache_buf *ubuf;
+	errcode_t retval;
+
+	retval = ext2fs_get_mem(sizeof(struct iocache_buf), &ubuf);
+	if (retval)
+		return NULL;
+	memset(ubuf, 0, sizeof(*ubuf));
+
+	retval = io_channel_alloc_buf(data->channel, 0, &ubuf->buf);
+	if (retval) {
+		ext2fs_free_mem(&ubuf);
+		return NULL;
+	}
+	memset(ubuf->buf, 0, data->channel->block_size);
+
+	INIT_LIST_HEAD(&ubuf->list);
+	ubuf->block = ukey->block;
+	return &ubuf->node;
+}
+
+static bool iocache_flush_node(struct cache *cache, struct cache_node *node)
+{
+	struct iocache_private_data *data = IOCACHEDATA(cache);
+	struct iocache_buf *ubuf = IOBUF(node);
+	errcode_t retval;
+
+	if (ubuf->dirty) {
+		retval = io_channel_write_blk64(data->real, ubuf->block, 1,
+						ubuf->buf);
+		if (retval) {
+			ubuf->write_error = retval;
+			data->write_errors++;
+		} else {
+			ubuf->dirty = 0;
+			ubuf->write_error = 0;
+		}
+	}
+
+	return ubuf->dirty;
+}
+
+static void iocache_relse(struct cache *cache, struct cache_node *node)
+{
+	struct iocache_buf *ubuf = IOBUF(node);
+
+	assert(!ubuf->dirty);
+
+	ext2fs_free_mem(&ubuf->buf);
+	ext2fs_free_mem(&ubuf);
+}
+
+static unsigned int iocache_bulkrelse(struct cache *cache,
+				      struct list_head *list)
+{
+	struct cache_node *cn, *n;
+	int count = 0;
+
+	if (list_empty(list))
+		return 0;
+
+	list_for_each_entry_safe(cn, n, list, cn_mru) {
+		iocache_relse(cache, cn);
+		count++;
+	}
+
+	return count;
+}
+
+/* Flush all dirty buffers in the cache to disk. */
+static errcode_t iocache_flush_cache(struct iocache_private_data *data)
+{
+	return cache_flush(&data->cache) ? 0 : EIO;
+}
+
+/* Flush all dirty buffers in this range of the cache to disk. */
+static errcode_t iocache_flush_range(struct iocache_private_data *data,
+				     blk64_t block, uint64_t count)
+{
+	uint64_t i;
+	bool still_dirty = false;
+
+	for (i = 0; i < count; i++) {
+		struct iocache_key ukey = {
+			.block = block + i,
+		};
+		struct cache_node *node;
+
+		cache_node_get(&data->cache, &ukey, CACHE_GET_INCORE,
+			       &node);
+		if (!node)
+			continue;
+
+		/* hold cn_mutex across the node flush, as cache_flush does */
+		pthread_mutex_lock(&node->cn_mutex);
+		still_dirty |= iocache_flush_node(&data->cache, node);
+		pthread_mutex_unlock(&node->cn_mutex);
+
+		cache_node_put(&data->cache, node);
+	}
+
+	return still_dirty ? EIO : 0;
+}
+
+static void iocache_add_list(struct cache *cache, struct cache_node *node,
+			     void *data)
+{
+	struct iocache_buf *ubuf = IOBUF(node);
+	struct list_head *list = data;
+
+	assert(node->cn_count == 0 || node->cn_count == 1);
+
+	iocache_buf_lock(ubuf);
+	cache_node_grab(cache, node);
+	list_add_tail(&ubuf->list, list);
+	iocache_buf_unlock(ubuf);
+}
+
+static void iocache_invalidate_bufs(struct iocache_private_data *data,
+				    struct list_head *list)
+{
+	struct iocache_buf *ubuf, *n;
+
+	list_for_each_entry_safe(ubuf, n, list, list) {
+		struct iocache_key ukey = {
+			.block = ubuf->block,
+		};
+
+		assert(ubuf->node.cn_count == 1);
+
+		iocache_buf_lock(ubuf);
+		ubuf->dirty = 0;
+		list_del_init(&ubuf->list);
+		iocache_buf_unlock(ubuf);
+
+		cache_node_put(&data->cache, &ubuf->node);
+		cache_node_purge(&data->cache, &ukey, &ubuf->node);
+	}
+}
+
+/*
+ * Remove all blocks from the cache.  Dirty contents are discarded.  Buffer
+ * refcounts must be zero!
+ */
+static void iocache_invalidate_cache(struct iocache_private_data *data)
+{
+	LIST_HEAD(list);
+
+	cache_walk(&data->cache, iocache_add_list, &list);
+	iocache_invalidate_bufs(data, &list);
+}
+
+/*
+ * Remove a range of blocks from the cache.  Dirty contents are discarded.
+ * Buffer refcounts must be zero!
+ */
+static void iocache_invalidate_range(struct iocache_private_data *data,
+				     blk64_t block, uint64_t count)
+{
+	LIST_HEAD(list);
+	uint64_t i;
+
+	for (i = 0; i < count; i++) {
+		struct iocache_key ukey = {
+			.block = block + i,
+		};
+		struct cache_node *node;
+
+		cache_node_get(&data->cache, &ukey, CACHE_GET_INCORE,
+			       &node);
+		if (node) {
+			iocache_add_list(&data->cache, node, &list);
+			cache_node_put(&data->cache, node);
+		}
+	}
+	iocache_invalidate_bufs(data, &list);
+}
+
+static const struct cache_operations iocache_ops = {
+	.hash		= iocache_hash,
+	.alloc		= iocache_alloc_node,
+	.flush		= iocache_flush_node,
+	.relse		= iocache_relse,
+	.compare	= iocache_compare,
+	.bulkrelse	= iocache_bulkrelse,
+	.resize		= cache_gradual_resize,
+};
+
 static errcode_t iocache_open(const char *name, int flags, io_channel *channel)
 {
 	io_channel	io = NULL;
@@ -65,6 +307,9 @@ static errcode_t iocache_open(const char *name, int flags, io_channel *channel)
 	if (retval)
 		return retval;
 
+	/* disable any static cache in the lower io manager */
+	real->manager->set_option(real, "cache", "off");
+
 	retval = ext2fs_get_mem(sizeof(struct struct_io_channel), &io);
 	if (retval)
 		goto out_backing;
@@ -76,12 +321,19 @@ static errcode_t iocache_open(const char *name, int flags, io_channel *channel)
 		goto out_channel;
 	memset(data, 0, sizeof(struct iocache_private_data));
 	data->magic = IOCACHE_IO_CHANNEL_MAGIC;
+	data->io_stats.num_fields = 4;
+	data->channel = io;
 
 	io->manager = iocache_io_manager;
 	retval = ext2fs_get_mem(strlen(name) + 1, &io->name);
 	if (retval)
 		goto out_data;
 
+	retval = cache_init(CACHE_CAN_SHRINK, 1U << 10, &iocache_ops,
+			    &data->cache);
+	if (retval)
+		goto out_name;
+
 	strcpy(io->name, name);
 	io->private_data = data;
 	io->block_size = real->block_size;
@@ -91,12 +343,14 @@ static errcode_t iocache_open(const char *name, int flags, io_channel *channel)
 	io->flags = real->flags;
 	data->real = real;
 	real->app_data = io;
-	real->read_error = iocache_read_error;
-	real->write_error = iocache_write_error;
+
+	pthread_mutex_init(&data->stats_lock, NULL);
 
 	*channel = io;
 	return 0;
 
+out_name:
+	ext2fs_free_mem(&io->name);
 out_data:
 	ext2fs_free_mem(&data);
 out_channel:
@@ -116,6 +370,10 @@ static errcode_t iocache_close(io_channel channel)
 
 	if (--channel->refcount > 0)
 		return 0;
+	pthread_mutex_destroy(&data->stats_lock);
+	cache_flush(&data->cache);
+	cache_purge(&data->cache);
+	cache_destroy(&data->cache);
 	if (data->real)
 		retval = io_channel_close(data->real);
 	ext2fs_free_mem(&channel->private_data);
@@ -134,6 +392,11 @@ static errcode_t iocache_set_blksize(io_channel channel, int blksize)
 	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
 	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
 
+	retval = iocache_flush_cache(data);
+	if (retval)
+		return retval;
+	iocache_invalidate_cache(data);
+
 	retval = io_channel_set_blksize(data->real, blksize);
 	if (retval)
 		return retval;
@@ -145,21 +408,34 @@ static errcode_t iocache_set_blksize(io_channel channel, int blksize)
 static errcode_t iocache_flush(io_channel channel)
 {
 	struct iocache_private_data *data = IOCACHE(channel);
+	errcode_t retval = 0;
+	errcode_t retval2;
 
 	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
 	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
 
-	return io_channel_flush(data->real);
+	retval = iocache_flush_cache(data);
+	retval2 = io_channel_flush(data->real);
+	if (retval)
+		return retval;
+	return retval2;
 }
 
 static errcode_t iocache_write_byte(io_channel channel, unsigned long offset,
 				    int count, const void *buf)
 {
 	struct iocache_private_data *data = IOCACHE(channel);
+	blk64_t bno = B_TO_FSBT(channel, offset);
+	blk64_t next_bno = B_TO_FSB(channel, offset + count);
+	errcode_t retval;
 
 	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
 	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
 
+	retval = iocache_flush_range(data, bno, next_bno - bno);
+	if (retval)
+		return retval;
+	iocache_invalidate_range(data, bno, next_bno - bno);
 	return io_channel_write_byte(data->real, offset, count, buf);
 }
 
@@ -170,6 +446,16 @@ static errcode_t iocache_set_option(io_channel channel, const char *option,
 
 	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
 	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
+	errcode_t retval;
+
+	/* don't let unix io cache options leak through */
+	if (!strcmp(option, "cache_blocks") || !strcmp(option, "cache"))
+		return 0;
+
+	retval = iocache_flush_cache(data);
+	if (retval)
+		return retval;
+	iocache_invalidate_cache(data);
 
 	return data->real->manager->set_option(data->real, option, arg);
 }
@@ -181,31 +467,157 @@ static errcode_t iocache_get_stats(io_channel channel, io_stats *io_stats)
 	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
 	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
 
-	return data->real->manager->get_stats(data->real, io_stats);
+	/*
+	 * Yes, io_stats is a double-pointer, and we let the caller scribble on
+	 * our stats struct WITHOUT LOCKING!
+	 */
+	if (io_stats)
+		*io_stats = &data->io_stats;
+	return 0;
+}
+
+static void iocache_update_stats(struct iocache_private_data *data,
+				 unsigned long long bytes_read,
+				 unsigned long long bytes_written,
+				 int cache_op)
+{
+	pthread_mutex_lock(&data->stats_lock);
+	data->io_stats.bytes_read += bytes_read;
+	data->io_stats.bytes_written += bytes_written;
+	if (cache_op == CACHE_HIT)
+		data->io_stats.cache_hits++;
+	else
+		data->io_stats.cache_misses++;
+	pthread_mutex_unlock(&data->stats_lock);
 }
 
 static errcode_t iocache_read_blk64(io_channel channel,
 				    unsigned long long block, int count,
 				    void *buf)
 {
+	struct iocache_key ukey = {
+		.block = block,
+	};
 	struct iocache_private_data *data = IOCACHE(channel);
+	unsigned long long i;
+	errcode_t retval = 0;
 
 	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
 	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
 
-	return io_channel_read_blk64(data->real, block, count, buf);
+	/*
+	 * If we're doing an odd-sized read, flush out the cache and then do a
+	 * direct read.
+	 */
+	if (count < 0) {
+		uint64_t fsbcount = B_TO_FSB(channel, -count);
+
+		retval = iocache_flush_range(data, block, fsbcount);
+		if (retval)
+			return retval;
+		iocache_invalidate_range(data, block, fsbcount);
+		iocache_update_stats(data, 0, 0, CACHE_MISS);
+		return io_channel_read_blk64(data->real, block, count, buf);
+	}
+
+	for (i = 0; i < count; i++, ukey.block++, buf += channel->block_size) {
+		struct cache_node *node;
+		struct iocache_buf *ubuf;
+
+		cache_node_get(&data->cache, &ukey, 0, &node);
+		if (!node) {
+			/* cannot instantiate cache, just do a direct read */
+			retval = io_channel_read_blk64(data->real, ukey.block,
+						       1, buf);
+			if (retval)
+				return retval;
+			iocache_update_stats(data, channel->block_size, 0,
+					     CACHE_MISS);
+			continue;
+		}
+
+		ubuf = IOBUF(node);
+		iocache_buf_lock(ubuf);
+		if (!ubuf->uptodate) {
+			retval = io_channel_read_blk64(data->real, ukey.block,
+						       1, ubuf->buf);
+			if (!retval) {
+				ubuf->uptodate = 1;
+				iocache_update_stats(data, channel->block_size,
+						     0, CACHE_MISS);
+			}
+		} else {
+			iocache_update_stats(data, channel->block_size, 0,
+					     CACHE_HIT);
+		}
+		if (ubuf->uptodate)
+			memcpy(buf, ubuf->buf, channel->block_size);
+		iocache_buf_unlock(ubuf);
+		cache_node_put(&data->cache, node);
+		if (retval)
+			return retval;
+	}
+
+	return 0;
 }
 
 static errcode_t iocache_write_blk64(io_channel channel,
 				     unsigned long long block, int count,
 				     const void *buf)
 {
+	struct iocache_key ukey = {
+		.block = block,
+	};
 	struct iocache_private_data *data = IOCACHE(channel);
+	unsigned long long i;
+	errcode_t retval;
 
 	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
 	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
 
-	return io_channel_write_blk64(data->real, block, count, buf);
+	/*
+	 * If we're doing an odd-sized write, flush out the cache and then do a
+	 * direct write.
+	 */
+	if (count < 0) {
+		uint64_t fsbcount = B_TO_FSB(channel, -count);
+
+		retval = iocache_flush_range(data, block, fsbcount);
+		if (retval)
+			return retval;
+		iocache_invalidate_range(data, block, fsbcount);
+		iocache_update_stats(data, 0, 0, CACHE_MISS);
+		return io_channel_write_blk64(data->real, block, count, buf);
+	}
+
+	for (i = 0; i < count; i++, ukey.block++, buf += channel->block_size) {
+		struct cache_node *node;
+		struct iocache_buf *ubuf;
+
+		cache_node_get(&data->cache, &ukey, 0, &node);
+		if (!node) {
+			/* cannot instantiate cache, do a direct write */
+			retval = io_channel_write_blk64(data->real, ukey.block,
+							1, buf);
+			if (retval)
+				return retval;
+			iocache_update_stats(data, 0, channel->block_size,
+					     CACHE_MISS);
+			continue;
+		}
+
+		ubuf = IOBUF(node);
+		iocache_buf_lock(ubuf);
+		memcpy(ubuf->buf, buf, channel->block_size);
+		iocache_update_stats(data, 0, channel->block_size,
+				     ubuf->uptodate ? CACHE_HIT : CACHE_MISS);
+		ubuf->dirty = 1;
+		ubuf->uptodate = 1;
+		iocache_buf_unlock(ubuf);
+		cache_node_put(&data->cache, node);
+	}
+
+	return 0;
 }
 
 static errcode_t iocache_read_blk(io_channel channel, unsigned long block,
@@ -224,11 +636,17 @@ static errcode_t iocache_discard(io_channel channel, unsigned long long block,
 				 unsigned long long count)
 {
 	struct iocache_private_data *data = IOCACHE(channel);
+	errcode_t retval;
 
 	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
 	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
 
-	return io_channel_discard(data->real, block, count);
+	retval = io_channel_discard(data->real, block, count);
+	if (retval)
+		return retval;
+
+	iocache_invalidate_range(data, block, count);
+	return 0;
 }
 
 static errcode_t iocache_cache_readahead(io_channel channel,
@@ -247,11 +665,17 @@ static errcode_t iocache_zeroout(io_channel channel, unsigned long long block,
 				 unsigned long long count)
 {
 	struct iocache_private_data *data = IOCACHE(channel);
+	errcode_t retval;
 
 	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
 	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
 
-	return io_channel_zeroout(data->real, block, count);
+	retval = io_channel_zeroout(data->real, block, count);
+	if (retval)
+		return retval;
+
+	iocache_invalidate_range(data, block, count);
+	return 0;
 }
 
 static errcode_t iocache_get_fd(io_channel channel, int *fd)
@@ -273,6 +697,7 @@ static errcode_t iocache_invalidate_blocks(io_channel channel,
 	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
 	EXT2_CHECK_MAGIC(data, IOCACHE_IO_CHANNEL_MAGIC);
 
+	iocache_invalidate_range(data, block, count);
 	return io_channel_invalidate_blocks(data->real, block, count);
 }
 

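Several paths above convert a byte range into filesystem blocks before flushing and invalidating the cache. These are iocache_write_byte and the count < 0 branches, which follow the e2fsprogs convention that a negative count means a byte count. B_TO_FSBT rounds down and B_TO_FSB rounds up, so every block that overlaps the I/O is covered. For example, a 100-byte write at offset 1000 with 1024-byte blocks touches bytes 1000..1099, which is blocks 0 and 1. The sketch below reproduces that arithmetic. byte_range_to_blocks and count_to_blocks are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte offset down to the block containing it (like B_TO_FSBT). */
static uint64_t b_to_fsbt(uint64_t bytes, uint64_t blksz)
{
	return bytes / blksz;
}

/* Round a byte offset up to the next block boundary (like B_TO_FSB). */
static uint64_t b_to_fsb(uint64_t bytes, uint64_t blksz)
{
	return (bytes + blksz - 1) / blksz;
}

/* Blocks [*first, *first + *nr) overlapped by @count bytes at @offset. */
static void byte_range_to_blocks(uint64_t offset, uint64_t count,
				 uint64_t blksz, uint64_t *first,
				 uint64_t *nr)
{
	uint64_t start = b_to_fsbt(offset, blksz);
	uint64_t end = b_to_fsb(offset + count, blksz);

	*first = start;
	*nr = end - start;
}

/* Negative count means "-count bytes"; otherwise it is a block count. */
static uint64_t count_to_blocks(int count, uint64_t blksz)
{
	if (count < 0)
		return b_to_fsb((uint64_t)(-(int64_t)count), blksz);
	return (uint64_t)count;
}
```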

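The buffer cache wired up here is write-back. A write only dirties the cached copy, and the flush path pushes dirty buffers down to the lower channel. Invalidation (used by discard, zeroout and odd-sized I/O) deliberately throws dirty contents away. The sketch below models those three rules with a small direct-mapped array instead of the hash-based libsupport cache that the patch actually uses. The slot layout, the names, and the "write back on eviction" policy are assumptions of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLKSZ	8
#define NBLKS	16	/* blocks on the backing "disk" */
#define NSLOTS	4	/* cache slots, direct-mapped by block % NSLOTS */

static unsigned char disk[NBLKS][BLKSZ];
static unsigned long disk_writes;

struct slot {
	bool		valid;		/* slot holds a copy (like uptodate) */
	bool		dirty;		/* copy is newer than the disk */
	unsigned	block;
	unsigned char	buf[BLKSZ];
};

static struct slot cache[NSLOTS];
static unsigned long hits, misses;

/* Write a dirty slot back to the disk and clear its dirty bit. */
static void slot_flush(struct slot *s)
{
	if (s->valid && s->dirty) {
		memcpy(disk[s->block], s->buf, BLKSZ);
		disk_writes++;
		s->dirty = false;
	}
}

/* Find or claim the slot for @block; write back the old occupant first. */
static struct slot *slot_get(unsigned block)
{
	struct slot *s = &cache[block % NSLOTS];

	if (s->valid && s->block == block) {
		hits++;
		return s;
	}
	misses++;
	slot_flush(s);
	s->valid = false;
	s->block = block;
	return s;
}

/* Read-through: on a miss, fill the slot from the disk first. */
static void cache_read(unsigned block, void *buf)
{
	struct slot *s = slot_get(block);

	if (!s->valid) {
		memcpy(s->buf, disk[block], BLKSZ);
		s->valid = true;
	}
	memcpy(buf, s->buf, BLKSZ);
}

/* Write-back: only the cached copy changes until a flush or eviction. */
static void cache_write(unsigned block, const void *buf)
{
	struct slot *s = slot_get(block);

	memcpy(s->buf, buf, BLKSZ);
	s->valid = true;
	s->dirty = true;
}

static void cache_flush_all(void)
{
	for (unsigned i = 0; i < NSLOTS; i++)
		slot_flush(&cache[i]);
}

/* Drop cached copies of [block, block + count); dirty data is discarded. */
static void cache_invalidate(unsigned block, unsigned count)
{
	for (unsigned b = block; b < block + count; b++) {
		struct slot *s = &cache[b % NSLOTS];

		if (s->valid && s->block == b) {
			s->valid = false;
			s->dirty = false;
		}
	}
}
```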

* [PATCH 3/6] iocache: bump buffer mru priority every 50 accesses
  2025-09-16  0:23 ` [PATCHSET RFC v5 8/9] fuse2fs: improve block and inode caching Darrick J. Wong
  2025-09-16  1:07   ` [PATCH 1/6] libsupport: add caching IO manager Darrick J. Wong
  2025-09-16  1:07   ` [PATCH 2/6] iocache: add the actual buffer cache Darrick J. Wong
@ 2025-09-16  1:07   ` Darrick J. Wong
  2025-09-16  1:07   ` [PATCH 4/6] fuse2fs: enable caching IO manager Darrick J. Wong
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:07 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

If a buffer is hot enough to survive more than 50 accesses without being
reclaimed, bump its priority to the next MRU list so it sticks around longer.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/support/cache.h   |    1 +
 lib/support/cache.c   |   16 ++++++++++++++++
 lib/support/iocache.c |    9 +++++++++
 3 files changed, 26 insertions(+)


diff --git a/lib/support/cache.h b/lib/support/cache.h
index f482948a3b6331..5a8e19f5d18e78 100644
--- a/lib/support/cache.h
+++ b/lib/support/cache.h
@@ -173,5 +173,6 @@ int cache_node_purge(struct cache *, cache_key_t, struct cache_node *);
 void cache_report(FILE *fp, const char *, struct cache *);
 int cache_overflowed(struct cache *);
 struct cache_node *cache_node_grab(struct cache *cache, struct cache_node *node);
+void cache_node_bump_priority(struct cache *cache, struct cache_node *node);
 
 #endif	/* __CACHE_H__ */
diff --git a/lib/support/cache.c b/lib/support/cache.c
index 7e1ddc3cc8788d..34df5cb51cd5e4 100644
--- a/lib/support/cache.c
+++ b/lib/support/cache.c
@@ -649,6 +649,22 @@ cache_node_put(
 		cache_shrink(cache);
 }
 
+/* Bump the priority of a cache node.  Caller must hold cn_mutex. */
+void
+cache_node_bump_priority(
+	struct cache		*cache,
+	struct cache_node	*node)
+{
+	int			*priop;
+
+	if (node->cn_priority == CACHE_DIRTY_PRIORITY)
+		priop = &node->cn_old_priority;
+	else
+		priop = &node->cn_priority;
+	if (*priop < CACHE_MAX_PRIORITY)
+		(*priop)++;
+}
+
 void
 cache_node_set_priority(
 	struct cache *		cache,
diff --git a/lib/support/iocache.c b/lib/support/iocache.c
index ab879e85d18f2a..92d88331bfa54d 100644
--- a/lib/support/iocache.c
+++ b/lib/support/iocache.c
@@ -56,6 +56,7 @@ struct iocache_buf {
 	blk64_t			block;
 	void			*buf;
 	errcode_t		write_error;
+	uint8_t			access;
 	unsigned int		uptodate:1;
 	unsigned int		dirty:1;
 };
@@ -552,6 +553,10 @@ static errcode_t iocache_read_blk64(io_channel channel,
 		}
 		if (ubuf->uptodate)
 			memcpy(buf, ubuf->buf, channel->block_size);
+		if (++ubuf->access > 50) {
+			cache_node_bump_priority(&data->cache, node);
+			ubuf->access = 0;
+		}
 		iocache_buf_unlock(ubuf);
 		cache_node_put(&data->cache, node);
 		if (retval)
@@ -613,6 +618,10 @@ static errcode_t iocache_write_blk64(io_channel channel,
 				     ubuf->uptodate ? CACHE_HIT : CACHE_MISS);
 		ubuf->dirty = 1;
 		ubuf->uptodate = 1;
+		if (++ubuf->access > 50) {
+			cache_node_bump_priority(&data->cache, node);
+			ubuf->access = 0;
+		}
 		iocache_buf_unlock(ubuf);
 		cache_node_put(&data->cache, node);
 	}

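The check `++ubuf->access > 50` fires on the 51st access after the last reset. cache_node_bump_priority then raises the priority by one, saturating at the maximum. A dirty node's real priority is parked in cn_old_priority, so that field is bumped instead. The sketch below models that logic. MAX_PRIORITY and DIRTY_PRIORITY are hypothetical stand-ins for CACHE_MAX_PRIORITY and CACHE_DIRTY_PRIORITY, whose real values live in cache.h.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PRIORITY	7	/* hypothetical stand-in for CACHE_MAX_PRIORITY */
#define DIRTY_PRIORITY	(-1)	/* hypothetical stand-in for CACHE_DIRTY_PRIORITY */
#define BUMP_INTERVAL	50

struct node {
	int	priority;	/* current MRU list index */
	int	old_priority;	/* saved priority while on the dirty list */
	uint8_t	access;		/* accesses since the last bump */
};

/* Raise the effective priority by one, saturating at MAX_PRIORITY. */
static void node_bump_priority(struct node *n)
{
	int *priop = (n->priority == DIRTY_PRIORITY) ?
			&n->old_priority : &n->priority;

	if (*priop < MAX_PRIORITY)
		(*priop)++;
}

/* Called on every cached read or write; bumps on the 51st access. */
static void node_access(struct node *n)
{
	if (++n->access > BUMP_INTERVAL) {
		node_bump_priority(n);
		n->access = 0;
	}
}
```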


* [PATCH 4/6] fuse2fs: enable caching IO manager
  2025-09-16  0:23 ` [PATCHSET RFC v5 8/9] fuse2fs: improve block and inode caching Darrick J. Wong
                     ` (2 preceding siblings ...)
  2025-09-16  1:07   ` [PATCH 3/6] iocache: bump buffer mru priority every 50 accesses Darrick J. Wong
@ 2025-09-16  1:07   ` Darrick J. Wong
  2025-09-16  1:08   ` [PATCH 5/6] fuse2fs: increase inode cache size Darrick J. Wong
  2025-09-16  1:08   ` [PATCH 6/6] libext2fs: improve caching for inodes Darrick J. Wong
  5 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:07 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

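Enable the new dynamic iocache I/O manager in the fuse server, stacked on
top of unix_io_manager, and remove the old cache control: the cache_size
mount option and the code that sized the unix I/O manager's block cache.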
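The stacking done here, where iocache_set_backing_manager(unix_io_manager) is called and iocache_io_manager is then handed to ext2fs_open2(), can be illustrated with a small standalone sketch. It is not the iocache implementation. The io_ops interface, the fixed direct-mapped slots (iocache itself is dynamically sized), and the in-memory "memdisk" backing layer are all hypothetical simplifications. The point it shows is that misses and write-backs go to whatever backing ops the caching layer was given, so the caller no longer needs to size a cache in the lower layer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLK_SIZE	16	/* tiny blocks, for illustration only */
#define NR_SLOTS	4	/* direct-mapped cache slots */
#define NR_BLOCKS	32	/* size of the in-memory backing "disk" */

/* Hypothetical minimal IO-manager interface: block read/write ops. */
struct io_ops {
	int (*read_blk)(void *priv, uint64_t blk, void *buf);
	int (*write_blk)(void *priv, uint64_t blk, const void *buf);
};

/* Backing layer for the demo: an in-memory disk that counts reads. */
struct memdisk {
	char	blocks[NR_BLOCKS][BLK_SIZE];
	int	nr_reads;
};

static int memdisk_read(void *priv, uint64_t blk, void *buf)
{
	struct memdisk *md = priv;

	if (blk >= NR_BLOCKS)
		return -1;
	memcpy(buf, md->blocks[blk], BLK_SIZE);
	md->nr_reads++;
	return 0;
}

static int memdisk_write(void *priv, uint64_t blk, const void *buf)
{
	struct memdisk *md = priv;

	if (blk >= NR_BLOCKS)
		return -1;
	memcpy(md->blocks[blk], buf, BLK_SIZE);
	return 0;
}

static const struct io_ops memdisk_ops = { memdisk_read, memdisk_write };

/* Caching layer stacked on top of any backing io_ops. */
struct cache_slot {
	uint64_t	blk;
	int		valid, dirty;
	char		data[BLK_SIZE];
};

struct caching_io {
	const struct io_ops	*backing;
	void			*backing_priv;
	struct cache_slot	slot[NR_SLOTS];
};

/* Write a dirty slot back to the backing layer before reuse or flush. */
static int slot_writeback(struct caching_io *io, struct cache_slot *s)
{
	if (s->valid && s->dirty) {
		if (io->backing->write_blk(io->backing_priv, s->blk, s->data))
			return -1;
		s->dirty = 0;
	}
	return 0;
}

static int caching_read(struct caching_io *io, uint64_t blk, void *buf)
{
	struct cache_slot *s = &io->slot[blk % NR_SLOTS];

	if (!s->valid || s->blk != blk) {
		/* Miss: evict the old occupant, then fill from the backing layer. */
		if (slot_writeback(io, s))
			return -1;
		s->valid = 0;
		if (io->backing->read_blk(io->backing_priv, blk, s->data))
			return -1;
		s->blk = blk;
		s->valid = 1;
	}
	memcpy(buf, s->data, BLK_SIZE);
	return 0;
}

/* Writes are absorbed by the cache and only reach the disk on eviction/flush. */
static int caching_write(struct caching_io *io, uint64_t blk, const void *buf)
{
	struct cache_slot *s = &io->slot[blk % NR_SLOTS];

	if (s->valid && s->blk != blk && slot_writeback(io, s))
		return -1;
	memcpy(s->data, buf, BLK_SIZE);
	s->blk = blk;
	s->valid = 1;
	s->dirty = 1;
	return 0;
}

/* Push every dirty block down to the backing layer. */
static int caching_flush(struct caching_io *io)
{
	for (int i = 0; i < NR_SLOTS; i++)
		if (slot_writeback(io, &io->slot[i]))
			return -1;
	return 0;
}
```

Usage is the same two-step pattern as the patch: fill in io.backing (here &memdisk_ops) and io.backing_priv first, then issue all reads and writes through the caching layer.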

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/Makefile.in  |    3 +-
 fuse4fs/fuse4fs.1.in |    6 ----
 fuse4fs/fuse4fs.c    |   71 +++----------------------------------------------
 misc/Makefile.in     |    4 ++-
 misc/fuse2fs.1.in    |    6 ----
 misc/fuse2fs.c       |   73 ++++----------------------------------------------
 6 files changed, 15 insertions(+), 148 deletions(-)


diff --git a/fuse4fs/Makefile.in b/fuse4fs/Makefile.in
index 9f3547c271638f..0a558da23ced81 100644
--- a/fuse4fs/Makefile.in
+++ b/fuse4fs/Makefile.in
@@ -147,7 +147,8 @@ fuse4fs.o: $(srcdir)/fuse4fs.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/ext2fs/bitops.h $(top_srcdir)/lib/ext2fs/ext2fsP.h \
  $(top_srcdir)/lib/ext2fs/ext2fs.h $(top_srcdir)/version.h \
  $(top_srcdir)/lib/e2p/e2p.h $(top_srcdir)/lib/support/cache.h \
- $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/support/xbitops.h
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/support/xbitops.h \
+ $(top_srcdir)/lib/support/iocache.h
 journal.o: $(srcdir)/../debugfs/journal.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/../debugfs/journal.h \
  $(top_srcdir)/e2fsck/jfs_user.h $(top_srcdir)/e2fsck/e2fsck.h \
diff --git a/fuse4fs/fuse4fs.1.in b/fuse4fs/fuse4fs.1.in
index 119cbcc903d8af..7ab197465c9713 100644
--- a/fuse4fs/fuse4fs.1.in
+++ b/fuse4fs/fuse4fs.1.in
@@ -48,12 +48,6 @@ .SS "fuse4fs options:"
 \fB-o\fR acl
 enable file access control lists
 .TP
-\fB-o\fR cache_size
-Set the disk cache size to this quantity.
-The quantity may contain the suffixes k, m, or g.
-By default, the size is 32MB.
-The size may not be larger than 2GB.
-.TP
 \fB-o\fR direct
 Use O_DIRECT to access the block device.
 .TP
diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index c4397fc365ced7..2dd7c0f6759de5 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -53,6 +53,7 @@
 #include "ext2fs/ext2fsP.h"
 #include "support/list.h"
 #include "support/cache.h"
+#include "support/iocache.h"
 
 #include "../version.h"
 #include "uuid/uuid.h"
@@ -290,7 +291,6 @@ struct fuse4fs {
 	unsigned int blockmask;
 	unsigned long offset;
 	unsigned int next_generation;
-	unsigned long long cache_size;
 	char *lockfile;
 #ifdef HAVE_CLOCK_MONOTONIC
 	struct timespec lock_start_time;
@@ -1289,7 +1289,8 @@ static errcode_t fuse4fs_open(struct fuse4fs *ff, int libext2_flags)
 
 	dbg_printf(ff, "opening with flags=0x%x\n", flags);
 
-	err = ext2fs_open2(ff->device, options, flags, 0, 0, unix_io_manager,
+	iocache_set_backing_manager(unix_io_manager);
+	err = ext2fs_open2(ff->device, options, flags, 0, 0, iocache_io_manager,
 			   &ff->fs);
 	if (err == EPERM) {
 		err_printf(ff, "%s.\n",
@@ -1298,7 +1299,7 @@ static errcode_t fuse4fs_open(struct fuse4fs *ff, int libext2_flags)
 		ff->ro = 1;
 		ff->norecovery = 1;
 		err = ext2fs_open2(ff->device, options, flags, 0, 0,
-				   unix_io_manager, &ff->fs);
+				   iocache_io_manager, &ff->fs);
 	}
 	if (err) {
 		err_printf(ff, "%s.\n", error_message(err));
@@ -1321,25 +1322,6 @@ static inline bool fuse4fs_on_bdev(const struct fuse4fs *ff)
 	return ff->fs->io->flags & CHANNEL_FLAGS_BLOCK_DEVICE;
 }
 
-static errcode_t fuse4fs_config_cache(struct fuse4fs *ff)
-{
-	char buf[128];
-	errcode_t err;
-
-	snprintf(buf, sizeof(buf), "cache_blocks=%llu",
-		 FUSE4FS_B_TO_FSBT(ff, ff->cache_size));
-	err = io_channel_set_options(ff->fs->io, buf);
-	if (err) {
-		err_printf(ff, "%s %lluk: %s\n",
-			   _("cannot set disk cache size to"),
-			   ff->cache_size >> 10,
-			   error_message(err));
-		return err;
-	}
-
-	return 0;
-}
-
 static errcode_t fuse4fs_check_support(struct fuse4fs *ff)
 {
 	ext2_filsys fs = ff->fs;
@@ -7193,7 +7175,6 @@ enum {
 	FUSE4FS_VERSION,
 	FUSE4FS_HELP,
 	FUSE4FS_HELPFULL,
-	FUSE4FS_CACHE_SIZE,
 	FUSE4FS_DIRSYNC,
 	FUSE4FS_ERRORS_BEHAVIOR,
 #ifdef HAVE_FUSE_IOMAP
@@ -7243,7 +7224,6 @@ static struct fuse_opt fuse4fs_opts[] = {
 	FUSE_OPT_KEY("user_xattr",	FUSE4FS_IGNORED),
 	FUSE_OPT_KEY("noblock_validity", FUSE4FS_IGNORED),
 	FUSE_OPT_KEY("nodelalloc",	FUSE4FS_IGNORED),
-	FUSE_OPT_KEY("cache_size=%s",	FUSE4FS_CACHE_SIZE),
 	FUSE_OPT_KEY("dirsync",		FUSE4FS_DIRSYNC),
 	FUSE_OPT_KEY("errors=%s",	FUSE4FS_ERRORS_BEHAVIOR),
 #ifdef HAVE_FUSE_IOMAP
@@ -7282,16 +7262,6 @@ static int fuse4fs_opt_proc(void *data, const char *arg,
 			return 0;
 		}
 		return 1;
-	case FUSE4FS_CACHE_SIZE:
-		ff->cache_size = parse_num_blocks2(arg + 11, -1);
-		if (ff->cache_size < 1 || ff->cache_size > INT32_MAX) {
-			fprintf(stderr, "%s: %s\n", arg,
- _("cache size must be between 1 block and 2GB."));
-			return -1;
-		}
-
-		/* do not pass through to libfuse */
-		return 0;
 	case FUSE4FS_ERRORS_BEHAVIOR:
 		if (strcmp(arg + 7, "continue") == 0)
 			ff->errors_behavior = EXT2_ERRORS_CONTINUE;
@@ -7348,7 +7318,6 @@ static int fuse4fs_opt_proc(void *data, const char *arg,
 	"    -o kernel              run this as if it were the kernel, which sets:\n"
 	"                           allow_others,default_permissions,suid,dev\n"
 	"    -o directio            use O_DIRECT to read and write the disk\n"
-	"    -o cache_size=N[KMG]   use a disk cache of this size\n"
 	"    -o errors=             behavior when an error is encountered:\n"
 	"                           continue|remount-ro|panic\n"
 #ifdef HAVE_FUSE_IOMAP
@@ -7391,28 +7360,6 @@ static const char *get_subtype(const char *argv0)
 	return "ext4";
 }
 
-/* Figure out a reasonable default size for the disk cache */
-static unsigned long long default_cache_size(void)
-{
-	long pages = 0, pagesize = 0;
-	unsigned long long max_cache;
-	unsigned long long ret = 32ULL << 20; /* 32 MB */
-
-#ifdef _SC_PHYS_PAGES
-	pages = sysconf(_SC_PHYS_PAGES);
-#endif
-#ifdef _SC_PAGESIZE
-	pagesize = sysconf(_SC_PAGESIZE);
-#endif
-	if (pages > 0 && pagesize > 0) {
-		max_cache = (unsigned long long)pagesize * pages / 20;
-
-		if (max_cache > 0 && ret > max_cache)
-			ret = max_cache;
-	}
-	return ret;
-}
-
 #ifdef HAVE_FUSE_IOMAP
 static inline bool fuse4fs_discover_iomap(struct fuse4fs *ff)
 {
@@ -7687,16 +7634,6 @@ int main(int argc, char *argv[])
 		fctx.translate_inums = 0;
 	}
 
-	if (!fctx.cache_size)
-		fctx.cache_size = default_cache_size();
-	if (fctx.cache_size) {
-		err = fuse4fs_config_cache(&fctx);
-		if (err) {
-			ret = 32;
-			goto out;
-		}
-	}
-
 	err = fuse4fs_check_support(&fctx);
 	if (err) {
 		ret = 32;
diff --git a/misc/Makefile.in b/misc/Makefile.in
index ec964688acd623..8a3adc70fb736e 100644
--- a/misc/Makefile.in
+++ b/misc/Makefile.in
@@ -880,7 +880,9 @@ fuse2fs.o: $(srcdir)/fuse2fs.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/hashmap.h \
  $(top_srcdir)/lib/ext2fs/bitops.h $(top_srcdir)/lib/ext2fs/ext2fsP.h \
  $(top_srcdir)/lib/ext2fs/ext2fs.h $(top_srcdir)/version.h \
- $(top_srcdir)/lib/e2p/e2p.h
+ $(top_srcdir)/lib/e2p/e2p.h $(top_srcdir)/lib/support/cache.h \
+ $(top_srcdir)/lib/support/list.h $(top_srcdir)/lib/support/xbitops.h \
+ $(top_srcdir)/lib/support/iocache.h
 e2fuzz.o: $(srcdir)/e2fuzz.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(top_srcdir)/lib/ext2fs/ext2_fs.h \
  $(top_builddir)/lib/ext2fs/ext2_types.h $(top_srcdir)/lib/ext2fs/ext2fs.h \
diff --git a/misc/fuse2fs.1.in b/misc/fuse2fs.1.in
index 0c0934f03c9543..21917bdda31a12 100644
--- a/misc/fuse2fs.1.in
+++ b/misc/fuse2fs.1.in
@@ -48,12 +48,6 @@ .SS "fuse2fs options:"
 \fB-o\fR acl
 enable file access control lists
 .TP
-\fB-o\fR cache_size
-Set the disk cache size to this quantity.
-The quantity may contain the suffixes k, m, or g.
-By default, the size is 32MB.
-The size may not be larger than 2GB.
-.TP
 \fB-o\fR direct
 Use O_DIRECT to access the block device.
 .TP
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 5e4680ca023282..8bd7cedc9f1ca8 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -50,6 +50,9 @@
 #include "ext2fs/ext2fs.h"
 #include "ext2fs/ext2_fs.h"
 #include "ext2fs/ext2fsP.h"
+#include "support/list.h"
+#include "support/cache.h"
+#include "support/iocache.h"
 
 #include "../version.h"
 #include "uuid/uuid.h"
@@ -283,7 +286,6 @@ struct fuse2fs {
 	unsigned int blockmask;
 	unsigned long offset;
 	unsigned int next_generation;
-	unsigned long long cache_size;
 	char *lockfile;
 #ifdef HAVE_CLOCK_MONOTONIC
 	struct timespec lock_start_time;
@@ -1119,7 +1121,8 @@ static errcode_t fuse2fs_open(struct fuse2fs *ff, int libext2_flags)
 
 	dbg_printf(ff, "opening with flags=0x%x\n", flags);
 
-	err = ext2fs_open2(ff->device, options, flags, 0, 0, unix_io_manager,
+	iocache_set_backing_manager(unix_io_manager);
+	err = ext2fs_open2(ff->device, options, flags, 0, 0, iocache_io_manager,
 			   &ff->fs);
 	if (err == EPERM) {
 		err_printf(ff, "%s.\n",
@@ -1128,7 +1131,7 @@ static errcode_t fuse2fs_open(struct fuse2fs *ff, int libext2_flags)
 		ff->ro = 1;
 		ff->norecovery = 1;
 		err = ext2fs_open2(ff->device, options, flags, 0, 0,
-				   unix_io_manager, &ff->fs);
+				   iocache_io_manager, &ff->fs);
 	}
 	if (err) {
 		err_printf(ff, "%s.\n", error_message(err));
@@ -1147,25 +1150,6 @@ static inline bool fuse2fs_on_bdev(const struct fuse2fs *ff)
 	return ff->fs->io->flags & CHANNEL_FLAGS_BLOCK_DEVICE;
 }
 
-static errcode_t fuse2fs_config_cache(struct fuse2fs *ff)
-{
-	char buf[128];
-	errcode_t err;
-
-	snprintf(buf, sizeof(buf), "cache_blocks=%llu",
-		 FUSE2FS_B_TO_FSBT(ff, ff->cache_size));
-	err = io_channel_set_options(ff->fs->io, buf);
-	if (err) {
-		err_printf(ff, "%s %lluk: %s\n",
-			   _("cannot set disk cache size to"),
-			   ff->cache_size >> 10,
-			   error_message(err));
-		return err;
-	}
-
-	return 0;
-}
-
 static errcode_t fuse2fs_check_support(struct fuse2fs *ff)
 {
 	ext2_filsys fs = ff->fs;
@@ -6750,7 +6734,6 @@ enum {
 	FUSE2FS_VERSION,
 	FUSE2FS_HELP,
 	FUSE2FS_HELPFULL,
-	FUSE2FS_CACHE_SIZE,
 	FUSE2FS_DIRSYNC,
 	FUSE2FS_ERRORS_BEHAVIOR,
 #ifdef HAVE_FUSE_IOMAP
@@ -6800,7 +6783,6 @@ static struct fuse_opt fuse2fs_opts[] = {
 	FUSE_OPT_KEY("user_xattr",	FUSE2FS_IGNORED),
 	FUSE_OPT_KEY("noblock_validity", FUSE2FS_IGNORED),
 	FUSE_OPT_KEY("nodelalloc",	FUSE2FS_IGNORED),
-	FUSE_OPT_KEY("cache_size=%s",	FUSE2FS_CACHE_SIZE),
 	FUSE_OPT_KEY("dirsync",		FUSE2FS_DIRSYNC),
 	FUSE_OPT_KEY("errors=%s",	FUSE2FS_ERRORS_BEHAVIOR),
 #ifdef HAVE_FUSE_IOMAP
@@ -6839,16 +6821,6 @@ static int fuse2fs_opt_proc(void *data, const char *arg,
 			return 0;
 		}
 		return 1;
-	case FUSE2FS_CACHE_SIZE:
-		ff->cache_size = parse_num_blocks2(arg + 11, -1);
-		if (ff->cache_size < 1 || ff->cache_size > INT32_MAX) {
-			fprintf(stderr, "%s: %s\n", arg,
- _("cache size must be between 1 block and 2GB."));
-			return -1;
-		}
-
-		/* do not pass through to libfuse */
-		return 0;
 	case FUSE2FS_ERRORS_BEHAVIOR:
 		if (strcmp(arg + 7, "continue") == 0)
 			ff->errors_behavior = EXT2_ERRORS_CONTINUE;
@@ -6905,7 +6877,6 @@ static int fuse2fs_opt_proc(void *data, const char *arg,
 	"    -o kernel              run this as if it were the kernel, which sets:\n"
 	"                           allow_others,default_permissions,suid,dev\n"
 	"    -o directio            use O_DIRECT to read and write the disk\n"
-	"    -o cache_size=N[KMG]   use a disk cache of this size\n"
 	"    -o errors=             behavior when an error is encountered:\n"
 	"                           continue|remount-ro|panic\n"
 #ifdef HAVE_FUSE_IOMAP
@@ -6949,28 +6920,6 @@ static const char *get_subtype(const char *argv0)
 	return "ext4";
 }
 
-/* Figure out a reasonable default size for the disk cache */
-static unsigned long long default_cache_size(void)
-{
-	long pages = 0, pagesize = 0;
-	unsigned long long max_cache;
-	unsigned long long ret = 32ULL << 20; /* 32 MB */
-
-#ifdef _SC_PHYS_PAGES
-	pages = sysconf(_SC_PHYS_PAGES);
-#endif
-#ifdef _SC_PAGESIZE
-	pagesize = sysconf(_SC_PAGESIZE);
-#endif
-	if (pages > 0 && pagesize > 0) {
-		max_cache = (unsigned long long)pagesize * pages / 20;
-
-		if (max_cache > 0 && ret > max_cache)
-			ret = max_cache;
-	}
-	return ret;
-}
-
 #ifdef HAVE_FUSE_IOMAP
 static inline bool fuse2fs_discover_iomap(struct fuse2fs *ff)
 {
@@ -7130,16 +7079,6 @@ int main(int argc, char *argv[])
 		goto out;
 	}
 
-	if (!fctx.cache_size)
-		fctx.cache_size = default_cache_size();
-	if (fctx.cache_size) {
-		err = fuse2fs_config_cache(&fctx);
-		if (err) {
-			ret = 32;
-			goto out;
-		}
-	}
-
 	err = fuse2fs_check_support(&fctx);
 	if (err) {
 		ret = 32;


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 5/6] fuse2fs: increase inode cache size
  2025-09-16  0:23 ` [PATCHSET RFC v5 8/9] fuse2fs: improve block and inode caching Darrick J. Wong
                     ` (3 preceding siblings ...)
  2025-09-16  1:07   ` [PATCH 4/6] fuse2fs: enable caching IO manager Darrick J. Wong
@ 2025-09-16  1:08   ` Darrick J. Wong
  2025-09-16  1:08   ` [PATCH 6/6] libext2fs: improve caching for inodes Darrick J. Wong
  5 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:08 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Increase the internal libext2fs inode cache to 1024 entries.  Does this
improve performance at all?

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |    4 ++++
 misc/fuse2fs.c    |    4 ++++
 2 files changed, 8 insertions(+)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 2dd7c0f6759de5..3e8822fac08630 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -1311,6 +1311,10 @@ static errcode_t fuse4fs_open(struct fuse4fs *ff, int libext2_flags)
 	if (err)
 		return translate_error(ff->fs, 0, err);
 
+	err = ext2fs_create_inode_cache(ff->fs, 1024);
+	if (err)
+		return translate_error(ff->fs, 0, err);
+
 	ff->fs->priv_data = ff;
 	ff->blocklog = u_log2(ff->fs->blocksize);
 	ff->blockmask = ff->fs->blocksize - 1;
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 8bd7cedc9f1ca8..83a3ed3ac3450e 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -1139,6 +1139,10 @@ static errcode_t fuse2fs_open(struct fuse2fs *ff, int libext2_flags)
 		return err;
 	}
 
+	err = ext2fs_create_inode_cache(ff->fs, 1024);
+	if (err)
+		return translate_error(ff->fs, 0, err);
+
 	ff->fs->priv_data = ff;
 	ff->blocklog = u_log2(ff->fs->blocksize);
 	ff->blockmask = ff->fs->blocksize - 1;


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 6/6] libext2fs: improve caching for inodes
  2025-09-16  0:23 ` [PATCHSET RFC v5 8/9] fuse2fs: improve block and inode caching Darrick J. Wong
                     ` (4 preceding siblings ...)
  2025-09-16  1:08   ` [PATCH 5/6] fuse2fs: increase inode cache size Darrick J. Wong
@ 2025-09-16  1:08   ` Darrick J. Wong
  5 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:08 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Use our new cache code to improve the ondisk inode cache inside
libext2fs.  Caveats: this duplicates list.h, and libext2fs now needs to
link against libsupport.
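A minimal standalone sketch of the entry layout this patch moves to: an inode number plus a flexible raw[] copy of the ondisk inode, with an access counter that raises the entry's priority once it exceeds 50 (the same threshold the patch uses). It assumes a simple chained hash table in place of libsupport's struct cache, and every name in it is hypothetical. Eviction by priority and the cache_node_get()/cache_node_put() reference counting are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ICACHE_BUCKETS	64
#define BUMP_THRESHOLD	50	/* bump priority once access exceeds this */
#define MAX_PRIORITY	7

struct icache_ent {
	struct icache_ent	*next;		/* hash chain */
	uint32_t		ino;
	uint8_t			access;		/* hits since last bump */
	uint8_t			priority;	/* higher = keep longer */
	size_t			len;
	char			raw[];		/* copy of the ondisk inode */
};

struct icache {
	struct icache_ent	*buckets[ICACHE_BUCKETS];
};

static unsigned int icache_hash(uint32_t ino)
{
	/* Multiplicative hash; unsigned overflow wraps, which is fine here. */
	return (ino * 2654435761u) % ICACHE_BUCKETS;
}

static struct icache_ent *icache_lookup(struct icache *c, uint32_t ino)
{
	struct icache_ent *ent;

	for (ent = c->buckets[icache_hash(ino)]; ent; ent = ent->next)
		if (ent->ino == ino)
			return ent;
	return NULL;
}

/* Count a hit; promote the entry each time the counter passes the threshold. */
static void icache_bump(struct icache_ent *ent)
{
	if (++ent->access > BUMP_THRESHOLD) {
		if (ent->priority < MAX_PRIORITY)
			ent->priority++;
		ent->access = 0;
	}
}

/* Insert or overwrite the cached copy of an inode (write_inode path). */
static int icache_store(struct icache *c, uint32_t ino, const void *buf,
			size_t len)
{
	struct icache_ent *ent = icache_lookup(c, ino);

	if (ent && ent->len != len)
		return -1;	/* sketch assumes one inode size per fs */
	if (!ent) {
		unsigned int h = icache_hash(ino);

		ent = calloc(1, sizeof(*ent) + len);
		if (!ent)
			return -1;
		ent->ino = ino;
		ent->len = len;
		ent->next = c->buckets[h];
		c->buckets[h] = ent;
	}
	memcpy(ent->raw, buf, len);
	icache_bump(ent);
	return 0;
}

/* Copy a cached inode out (read_inode path); 0 on hit, -1 on miss. */
static int icache_load(struct icache *c, uint32_t ino, void *buf, size_t len)
{
	struct icache_ent *ent = icache_lookup(c, ino);

	if (!ent)
		return -1;
	memcpy(buf, ent->raw, len < ent->len ? len : ent->len);
	icache_bump(ent);
	return 0;
}

/* Drop one entry, e.g. after a checksum failure. */
static void icache_purge(struct icache *c, uint32_t ino)
{
	struct icache_ent **pp = &c->buckets[icache_hash(ino)];

	while (*pp) {
		if ((*pp)->ino == ino) {
			struct icache_ent *victim = *pp;

			*pp = victim->next;
			free(victim);
			return;
		}
		pp = &(*pp)->next;
	}
}
```

Keeping the inode bytes inline in the entry, rather than behind a separate pointer as the old ext2_inode_cache_ent did, means one allocation per cached inode.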

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/ext2fs/ext2fsP.h    |   13 ++-
 debugfs/Makefile.in     |    4 -
 e2fsck/Makefile.in      |    4 -
 lib/ext2fs/Makefile.in  |    4 +
 lib/ext2fs/inode.c      |  215 +++++++++++++++++++++++++++++++++++++----------
 resize/Makefile.in      |    4 -
 tests/progs/Makefile.in |    4 -
 7 files changed, 187 insertions(+), 61 deletions(-)


diff --git a/lib/ext2fs/ext2fsP.h b/lib/ext2fs/ext2fsP.h
index 428081c9e2ff38..8490dd5139d543 100644
--- a/lib/ext2fs/ext2fsP.h
+++ b/lib/ext2fs/ext2fsP.h
@@ -82,21 +82,26 @@ struct dir_context {
 	errcode_t	errcode;
 };
 
+#include "support/list.h"
+#include "support/cache.h"
+
 /*
  * Inode cache structure
  */
 struct ext2_inode_cache {
 	void *				buffer;
 	blk64_t				buffer_blk;
-	int				cache_last;
-	unsigned int			cache_size;
 	int				refcount;
-	struct ext2_inode_cache_ent	*cache;
+	struct cache			cache;
 };
 
 struct ext2_inode_cache_ent {
+	struct cache_node	node;
 	ext2_ino_t		ino;
-	struct ext2_inode	*inode;
+	uint8_t			access;
+
+	/* bytes representing a host-endian ext2_inode_large object */
+	char			raw[];
 };
 
 /*
diff --git a/debugfs/Makefile.in b/debugfs/Makefile.in
index 700ae87418c268..8dfd802692b839 100644
--- a/debugfs/Makefile.in
+++ b/debugfs/Makefile.in
@@ -38,9 +38,9 @@ SRCS= debug_cmds.c $(srcdir)/debugfs.c $(srcdir)/util.c $(srcdir)/ls.c \
 	$(srcdir)/../e2fsck/recovery.c $(srcdir)/do_journal.c \
 	$(srcdir)/do_orphan.c
 
-LIBS= $(LIBSUPPORT) $(LIBEXT2FS) $(LIBE2P) $(LIBSS) $(LIBCOM_ERR) $(LIBBLKID) \
+LIBS= $(LIBEXT2FS) $(LIBSUPPORT) $(LIBE2P) $(LIBSS) $(LIBCOM_ERR) $(LIBBLKID) \
 	$(LIBUUID) $(LIBMAGIC) $(SYSLIBS) $(LIBARCHIVE)
-DEPLIBS= $(DEPLIBSUPPORT) $(LIBEXT2FS) $(LIBE2P) $(DEPLIBSS) $(DEPLIBCOM_ERR) \
+DEPLIBS= $(LIBEXT2FS) $(DEPLIBSUPPORT) $(LIBE2P) $(DEPLIBSS) $(DEPLIBCOM_ERR) \
 	$(DEPLIBBLKID) $(DEPLIBUUID)
 
 STATIC_LIBS= $(STATIC_LIBSUPPORT) $(STATIC_LIBEXT2FS) $(STATIC_LIBSS) \
diff --git a/e2fsck/Makefile.in b/e2fsck/Makefile.in
index 52fad9cbfd2b23..61451f2d9e3276 100644
--- a/e2fsck/Makefile.in
+++ b/e2fsck/Makefile.in
@@ -16,9 +16,9 @@ PROGS=		e2fsck
 MANPAGES=	e2fsck.8
 FMANPAGES=	e2fsck.conf.5
 
-LIBS= $(LIBSUPPORT) $(LIBEXT2FS) $(LIBCOM_ERR) $(LIBBLKID) $(LIBUUID) \
+LIBS= $(LIBEXT2FS) $(LIBSUPPORT) $(LIBCOM_ERR) $(LIBBLKID) $(LIBUUID) \
 	$(LIBINTL) $(LIBE2P) $(LIBMAGIC) $(SYSLIBS)
-DEPLIBS= $(DEPLIBSUPPORT) $(LIBEXT2FS) $(DEPLIBCOM_ERR) $(DEPLIBBLKID) \
+DEPLIBS= $(LIBEXT2FS) $(DEPLIBSUPPORT) $(DEPLIBCOM_ERR) $(DEPLIBBLKID) \
 	 $(DEPLIBUUID) $(DEPLIBE2P)
 
 STATIC_LIBS= $(STATIC_LIBSUPPORT) $(STATIC_LIBEXT2FS) $(STATIC_LIBCOM_ERR) \
diff --git a/lib/ext2fs/Makefile.in b/lib/ext2fs/Makefile.in
index 1d0991defff804..89254ded7c0723 100644
--- a/lib/ext2fs/Makefile.in
+++ b/lib/ext2fs/Makefile.in
@@ -976,7 +976,9 @@ inode.o: $(srcdir)/inode.c $(top_builddir)/lib/config.h \
  $(srcdir)/ext2fs.h $(srcdir)/ext2_fs.h $(srcdir)/ext3_extents.h \
  $(top_srcdir)/lib/et/com_err.h $(srcdir)/ext2_io.h \
  $(top_builddir)/lib/ext2fs/ext2_err.h $(srcdir)/ext2_ext_attr.h \
- $(srcdir)/hashmap.h $(srcdir)/bitops.h $(srcdir)/e2image.h
+ $(srcdir)/hashmap.h $(srcdir)/bitops.h $(srcdir)/e2image.h \
+ $(srcdir)/../support/cache.h $(srcdir)/../support/list.h \
+ $(srcdir)/../support/xbitops.h 
 inode_io.o: $(srcdir)/inode_io.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/ext2_fs.h \
  $(top_builddir)/lib/ext2fs/ext2_types.h $(srcdir)/ext2fs.h \
diff --git a/lib/ext2fs/inode.c b/lib/ext2fs/inode.c
index c9389a2324be07..8ca82af1ab35d3 100644
--- a/lib/ext2fs/inode.c
+++ b/lib/ext2fs/inode.c
@@ -59,18 +59,145 @@ struct ext2_struct_inode_scan {
 	int			reserved[6];
 };
 
+struct ext2_inode_cache_key {
+	ext2_filsys		fs;
+	ext2_ino_t		ino;
+};
+
+#define ICKEY(key)	((struct ext2_inode_cache_key *)(key))
+#define ICNODE(node)	(container_of((node), struct ext2_inode_cache_ent, node))
+
+static unsigned int
+ext2_inode_cache_hash(cache_key_t key, unsigned int hashsize,
+		      unsigned int hashshift)
+{
+	uint64_t	hashval = ICKEY(key)->ino;
+	uint64_t	tmp;
+
+	tmp = hashval ^ (GOLDEN_RATIO_PRIME + hashval) / CACHE_LINE_SIZE;
+	tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> hashshift);
+	return tmp % hashsize;
+}
+
+static int ext2_inode_cache_compare(struct cache_node *node, cache_key_t key)
+{
+	struct ext2_inode_cache_ent *ent = ICNODE(node);
+	struct ext2_inode_cache_key *ikey = ICKEY(key);
+
+	if (ent->ino == ikey->ino)
+		return CACHE_HIT;
+
+	return CACHE_MISS;
+}
+
+static struct cache_node *ext2_inode_cache_alloc(struct cache *c,
+						 cache_key_t key)
+{
+	struct ext2_inode_cache_key *ikey = ICKEY(key);
+	struct ext2_inode_cache_ent *ent;
+
+	ent = calloc(1, sizeof(struct ext2_inode_cache_ent) +
+			EXT2_INODE_SIZE(ikey->fs->super));
+	if (!ent)
+		return NULL;
+
+	ent->ino = ikey->ino;
+	return &ent->node;
+}
+
+static bool ext2_inode_cache_flush(struct cache *c, struct cache_node *node)
+{
+	/* can always drop inode cache */
+	return 0;
+}
+
+static void ext2_inode_cache_relse(struct cache *c, struct cache_node *node)
+{
+	struct ext2_inode_cache_ent *ent = ICNODE(node);
+
+	free(ent);
+}
+
+static unsigned int ext2_inode_cache_bulkrelse(struct cache *cache,
+					       struct list_head *list)
+{
+	struct cache_node *cn, *n;
+	int count = 0;
+
+	if (list_empty(list))
+		return 0;
+
+	list_for_each_entry_safe(cn, n, list, cn_mru) {
+		ext2_inode_cache_relse(cache, cn);
+		count++;
+	}
+
+	return count;
+}
+
+static const struct cache_operations ext2_inode_cache_ops = {
+	.hash		= ext2_inode_cache_hash,
+	.alloc		= ext2_inode_cache_alloc,
+	.flush		= ext2_inode_cache_flush,
+	.relse		= ext2_inode_cache_relse,
+	.compare	= ext2_inode_cache_compare,
+	.bulkrelse	= ext2_inode_cache_bulkrelse,
+	.resize		= cache_gradual_resize,
+};
+
+static errcode_t ext2_inode_cache_iget(ext2_filsys fs, ext2_ino_t ino,
+				       unsigned int getflags,
+				       struct ext2_inode_cache_ent **entp)
+{
+	struct ext2_inode_cache_key ikey = {
+		.fs = fs,
+		.ino = ino,
+	};
+	struct cache_node *node = NULL;
+
+	cache_node_get(&fs->icache->cache, &ikey, getflags, &node);
+	if (!node)
+		return ENOMEM;
+
+	*entp = ICNODE(node);
+	return 0;
+}
+
+static void ext2_inode_cache_iput(ext2_filsys fs,
+				  struct ext2_inode_cache_ent *ent)
+{
+	cache_node_put(&fs->icache->cache, &ent->node);
+}
+
+static int ext2_inode_cache_ipurge(ext2_filsys fs, ext2_ino_t ino,
+				   struct ext2_inode_cache_ent *ent)
+{
+	struct ext2_inode_cache_key ikey = {
+		.fs = fs,
+		.ino = ino,
+	};
+
+	return cache_node_purge(&fs->icache->cache, &ikey, &ent->node);
+}
+
+static void ext2_inode_cache_ibump(ext2_filsys fs,
+				   struct ext2_inode_cache_ent *ent)
+{
+	if (++ent->access > 50) {
+		cache_node_bump_priority(&fs->icache->cache, &ent->node);
+		ent->access = 0;
+	}
+}
+
 /*
  * This routine flushes the icache, if it exists.
  */
 errcode_t ext2fs_flush_icache(ext2_filsys fs)
 {
-	unsigned	i;
-
 	if (!fs->icache)
 		return 0;
 
-	for (i=0; i < fs->icache->cache_size; i++)
-		fs->icache->cache[i].ino = 0;
+	cache_purge(&fs->icache->cache);
 
 	fs->icache->buffer_blk = 0;
 	return 0;
@@ -81,23 +208,20 @@ errcode_t ext2fs_flush_icache(ext2_filsys fs)
  */
 void ext2fs_free_inode_cache(struct ext2_inode_cache *icache)
 {
-	unsigned i;
-
 	if (--icache->refcount)
 		return;
 	if (icache->buffer)
 		ext2fs_free_mem(&icache->buffer);
-	for (i = 0; i < icache->cache_size; i++)
-		ext2fs_free_mem(&icache->cache[i].inode);
-	if (icache->cache)
-		ext2fs_free_mem(&icache->cache);
+	if (cache_initialized(&icache->cache)) {
+		cache_purge(&icache->cache);
+		cache_destroy(&icache->cache);
+	}
 	icache->buffer_blk = 0;
 	ext2fs_free_mem(&icache);
 }
 
 errcode_t ext2fs_create_inode_cache(ext2_filsys fs, unsigned int cache_size)
 {
-	unsigned	i;
 	errcode_t	retval;
 
 	if (fs->icache)
@@ -112,22 +236,12 @@ errcode_t ext2fs_create_inode_cache(ext2_filsys fs, unsigned int cache_size)
 		goto errout;
 
 	fs->icache->buffer_blk = 0;
-	fs->icache->cache_last = -1;
-	fs->icache->cache_size = cache_size;
 	fs->icache->refcount = 1;
-	retval = ext2fs_get_array(fs->icache->cache_size,
-				  sizeof(struct ext2_inode_cache_ent),
-				  &fs->icache->cache);
+	retval = cache_init(0, cache_size, &ext2_inode_cache_ops,
+			    &fs->icache->cache);
 	if (retval)
 		goto errout;
 
-	for (i = 0; i < fs->icache->cache_size; i++) {
-		retval = ext2fs_get_mem(EXT2_INODE_SIZE(fs->super),
-					&fs->icache->cache[i].inode);
-		if (retval)
-			goto errout;
-	}
-
 	ext2fs_flush_icache(fs);
 	return 0;
 errout:
@@ -762,12 +876,12 @@ errcode_t ext2fs_read_inode2(ext2_filsys fs, ext2_ino_t ino,
 	unsigned long 	block, offset;
 	char 		*ptr;
 	errcode_t	retval;
-	unsigned	i;
 	int		clen, inodes_per_block;
 	io_channel	io;
 	int		length = EXT2_INODE_SIZE(fs->super);
 	struct ext2_inode_large	*iptr;
-	int		cache_slot, fail_csum;
+	struct ext2_inode_cache_ent *ent = NULL;
+	int		fail_csum;
 
 	EXT2_CHECK_MAGIC(fs, EXT2_ET_MAGIC_EXT2FS_FILSYS);
 
@@ -794,12 +908,12 @@ errcode_t ext2fs_read_inode2(ext2_filsys fs, ext2_ino_t ino,
 			return retval;
 	}
 	/* Check to see if it's in the inode cache */
-	for (i = 0; i < fs->icache->cache_size; i++) {
-		if (fs->icache->cache[i].ino == ino) {
-			memcpy(inode, fs->icache->cache[i].inode,
-			       (bufsize > length) ? length : bufsize);
-			return 0;
-		}
+	ext2_inode_cache_iget(fs, ino, CACHE_GET_INCORE, &ent);
+	if (ent) {
+		memcpy(inode, ent->raw, (bufsize > length) ? length : bufsize);
+		ext2_inode_cache_ibump(fs, ent);
+		ext2_inode_cache_iput(fs, ent);
+		return 0;
 	}
 	if (fs->flags & EXT2_FLAG_IMAGE_FILE) {
 		inodes_per_block = fs->blocksize / EXT2_INODE_SIZE(fs->super);
@@ -827,8 +941,10 @@ errcode_t ext2fs_read_inode2(ext2_filsys fs, ext2_ino_t ino,
 	}
 	offset &= (EXT2_BLOCK_SIZE(fs->super) - 1);
 
-	cache_slot = (fs->icache->cache_last + 1) % fs->icache->cache_size;
-	iptr = (struct ext2_inode_large *)fs->icache->cache[cache_slot].inode;
+	retval = ext2_inode_cache_iget(fs, ino, 0, &ent);
+	if (retval)
+		return retval;
+	iptr = (struct ext2_inode_large *)ent->raw;
 
 	ptr = (char *) iptr;
 	while (length) {
@@ -863,13 +979,15 @@ errcode_t ext2fs_read_inode2(ext2_filsys fs, ext2_ino_t ino,
 			       0, length);
 #endif
 
-	/* Update the inode cache bookkeeping */
-	if (!fail_csum) {
-		fs->icache->cache_last = cache_slot;
-		fs->icache->cache[cache_slot].ino = ino;
-	}
 	memcpy(inode, iptr, (bufsize > length) ? length : bufsize);
 
+	/* Update the inode cache bookkeeping */
+	if (!fail_csum)
+		ext2_inode_cache_ibump(fs, ent);
+	ext2_inode_cache_iput(fs, ent);
+	if (fail_csum)
+		ext2_inode_cache_ipurge(fs, ino, ent);
+
 	if (!(fs->flags & EXT2_FLAG_IGNORE_CSUM_ERRORS) &&
 	    !(flags & READ_INODE_NOCSUM) && fail_csum)
 		return EXT2_ET_INODE_CSUM_INVALID;
@@ -899,8 +1017,8 @@ errcode_t ext2fs_write_inode2(ext2_filsys fs, ext2_ino_t ino,
 	unsigned long block, offset;
 	errcode_t retval = 0;
 	struct ext2_inode_large *w_inode;
+	struct ext2_inode_cache_ent *ent;
 	char *ptr;
-	unsigned i;
 	int clen;
 	int length = EXT2_INODE_SIZE(fs->super);
 
@@ -933,19 +1051,20 @@ errcode_t ext2fs_write_inode2(ext2_filsys fs, ext2_ino_t ino,
 	}
 
 	/* Check to see if the inode cache needs to be updated */
-	if (fs->icache) {
-		for (i=0; i < fs->icache->cache_size; i++) {
-			if (fs->icache->cache[i].ino == ino) {
-				memcpy(fs->icache->cache[i].inode, inode,
-				       (bufsize > length) ? length : bufsize);
-				break;
-			}
-		}
-	} else {
+	if (!fs->icache) {
 		retval = ext2fs_create_inode_cache(fs, 4);
 		if (retval)
 			goto errout;
 	}
+
+	retval = ext2_inode_cache_iget(fs, ino, 0, &ent);
+	if (retval)
+		goto errout;
+
+	memcpy(ent->raw, inode, (bufsize > length) ? length : bufsize);
+	ext2_inode_cache_ibump(fs, ent);
+	ext2_inode_cache_iput(fs, ent);
+
 	memcpy(w_inode, inode, (bufsize > length) ? length : bufsize);
 
 	if (!(fs->flags & EXT2_FLAG_RW)) {
diff --git a/resize/Makefile.in b/resize/Makefile.in
index 27f721305e052e..d03d3bfc309968 100644
--- a/resize/Makefile.in
+++ b/resize/Makefile.in
@@ -28,8 +28,8 @@ SRCS= $(srcdir)/extent.c \
 	$(srcdir)/resource_track.c \
 	$(srcdir)/sim_progress.c
 
-LIBS= $(LIBE2P) $(LIBEXT2FS) $(LIBCOM_ERR) $(LIBINTL) $(SYSLIBS)
-DEPLIBS= $(LIBE2P) $(LIBEXT2FS) $(DEPLIBCOM_ERR)
+LIBS= $(LIBE2P) $(LIBEXT2FS) $(LIBSUPPORT) $(LIBCOM_ERR) $(LIBINTL) $(SYSLIBS)
+DEPLIBS= $(LIBE2P) $(LIBEXT2FS) $(DEPLIBSUPPORT) $(DEPLIBCOM_ERR)
 
 STATIC_LIBS= $(STATIC_LIBE2P) $(STATIC_LIBEXT2FS) $(STATIC_LIBCOM_ERR) \
 	$(LIBINTL) $(SYSLIBS)
diff --git a/tests/progs/Makefile.in b/tests/progs/Makefile.in
index 1a8e9299a1c1ca..64069a52c57cd3 100644
--- a/tests/progs/Makefile.in
+++ b/tests/progs/Makefile.in
@@ -23,8 +23,8 @@ TEST_ICOUNT_OBJS=	test_icount.o test_icount_cmds.o
 SRCS=	$(srcdir)/test_icount.c \
 	$(srcdir)/test_rel.c
 
-LIBS= $(LIBEXT2FS) $(LIBSS) $(LIBCOM_ERR) $(SYSLIBS)
-DEPLIBS= $(LIBEXT2FS) $(DEPLIBSS) $(DEPLIBCOM_ERR)
+LIBS= $(LIBEXT2FS) $(LIBSUPPORT) $(LIBSS) $(LIBCOM_ERR) $(SYSLIBS)
+DEPLIBS= $(LIBEXT2FS) $(DEPLIBSUPPORT) $(DEPLIBSS) $(DEPLIBCOM_ERR)
 
 .c.o:
 	$(E) "	CC $<"


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 1/4] libext2fs: fix MMP code to work with unixfd IO manager
  2025-09-16  0:24 ` [PATCHSET RFC v5 9/9] fuse4fs: run servers as a contained service Darrick J. Wong
@ 2025-09-16  1:08   ` Darrick J. Wong
  2025-09-16  1:08   ` [PATCH 2/4] fuse4fs: enable safe service mode Darrick J. Wong
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:08 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

The MMP code assumes that it can (re)open() the filesystem via
fs->device_name.  However, if the Unix FD IO manager is in use, the path
will be the string representation of the fd number.  This is a horrible
layering violation, but let's take the easy route and reroute the open()
call to dup() when the device name parses as an fd number.
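A standalone sketch of the device-name check, assuming the name is either a path or a bare decimal fd number. The helper names are hypothetical. It differs slightly from the patch's ext2fs_mmp_open_device(). It omits the O_DIRECT handling, treats any name that is not a non-negative decimal integer (including a negative number) as a path, and rejects an empty string explicitly, since strtol() performs no conversion on "" and returns 0.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Return the fd number if device_name is a decimal integer in [0, INT_MAX], else -1. */
static int parse_fd_device_name(const char *device_name)
{
	char *endptr = NULL;
	long maybe_fd;

	if (*device_name == '\0')
		return -1;
	errno = 0;
	maybe_fd = strtol(device_name, &endptr, 10);
	if (errno || *endptr != '\0')
		return -1;	/* overflow or trailing non-digits */
	if (maybe_fd < 0 || maybe_fd > INT_MAX)
		return -1;
	return (int)maybe_fd;
}

/*
 * dup() a numeric fd so the caller owns (and may close) its own copy;
 * otherwise treat the name as a path and open() it.
 */
static int open_device_or_dup(const char *device_name, int flags)
{
	int fd = parse_fd_device_name(device_name);

	if (fd < 0)
		return open(device_name, flags);
	return dup(fd);
}
```

Note that a dup()ed descriptor shares its open file description, and therefore its file status flags, with the original fd.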

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 lib/ext2fs/mmp.c |   46 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)


diff --git a/lib/ext2fs/mmp.c b/lib/ext2fs/mmp.c
index eb9417020e6d3f..936ae920563fc5 100644
--- a/lib/ext2fs/mmp.c
+++ b/lib/ext2fs/mmp.c
@@ -26,6 +26,7 @@
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
+#include <limits.h>
 
 #include "ext2fs/ext2_fs.h"
 #include "ext2fs/ext2fs.h"
@@ -41,6 +42,49 @@
 #endif
 #endif
 
+static int ext2fs_mmp_open_device(ext2_filsys fs, int flags)
+{
+	struct stat st;
+	char *endptr = NULL;
+	long maybe_fd;
+	int new_fd;
+	int want_directio = 1;
+	int ret;
+
+	/*
+	 * If the device name is only a number, then most likely the unixfd IO
+	 * manager is in use here.  Try to extract the fd number; if we can't,
+	 * then fall back to regular open.
+	 */
+	errno = 0;
+	maybe_fd = strtol(fs->device_name, &endptr, 10);
+	if (errno || endptr != fs->device_name + strlen(fs->device_name))
+		return open(fs->device_name, flags);
+
+	if (maybe_fd < 0 || maybe_fd > INT_MAX)
+		return -1;
+
+	/* Skip directio if this is a regular file, just like below */
+	ret = fstat(maybe_fd, &st);
+	if (ret == 0 && S_ISREG(st.st_mode))
+		want_directio = 0;
+
+	/* Duplicate the fd so that the MMP code can close it later */
+	new_fd = dup(maybe_fd);
+	if (new_fd < 0)
+		return -1;
+
+	if (want_directio) {
+		ret = fcntl(new_fd, F_SETFL, O_DIRECT);
+		if (ret) {
+			close(new_fd);
+			return -1;
+		}
+	}
+
+	return new_fd;
+}
+
 errcode_t ext2fs_mmp_read(ext2_filsys fs, blk64_t mmp_blk, void *buf)
 {
 #ifdef CONFIG_MMP
@@ -70,7 +114,7 @@ errcode_t ext2fs_mmp_read(ext2_filsys fs, blk64_t mmp_blk, void *buf)
 		    S_ISREG(st.st_mode))
 			flags &= ~O_DIRECT;
 
-		fs->mmp_fd = open(fs->device_name, flags);
+		fs->mmp_fd = ext2fs_mmp_open_device(fs, flags);
 		if (fs->mmp_fd < 0) {
 			retval = EXT2_ET_MMP_OPEN_DIRECT;
 			goto out;


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 2/4] fuse4fs: enable safe service mode
  2025-09-16  0:24 ` [PATCHSET RFC v5 9/9] fuse4fs: run servers as a contained service Darrick J. Wong
  2025-09-16  1:08   ` [PATCH 1/4] libext2fs: fix MMP code to work with unixfd IO manager Darrick J. Wong
@ 2025-09-16  1:08   ` Darrick J. Wong
  2025-09-16  1:09   ` [PATCH 3/4] fuse4fs: set proc title when in fuse " Darrick J. Wong
  2025-09-16  1:09   ` [PATCH 4/4] fuse4fs: set iomap backing device blocksize Darrick J. Wong
  3 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:08 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

Make it possible to run fuse4fs as a safe systemd service, wherein the
fuse server only has access to the fds that we pass in.
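The helpers that do the handover (fuse_service_request_file(), fuse_service_receive_file() and friends) belong to the proposed libfuse service API, and this series does not show their wire format. We assume the descriptors travel over the service socket as SCM_RIGHTS ancillary data. The sketch below illustrates only that generic mechanism: it passes one descriptor over an AF_UNIX SOCK_SEQPACKET socket, the socket type that ListenSequentialPacket= in fuse4fs.socket creates. send_fd() and recv_fd() are hypothetical names.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd across a connected AF_UNIX socket; returns 0 or -1. */
static int send_fd(int sock, int fd)
{
	char byte = 'F';	/* at least one byte of real data is required */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;	/* forces correct alignment of buf */
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; returns a new descriptor number or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;
	int fd;

	if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC))
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver gets a new descriptor that refers to the same open file as the sender's. This is how a process with no access to /dev can still do I/O to a block device that a more privileged peer opened on its behalf.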

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 MCONFIG.in                  |    1 
 configure                   |  133 +++++++++++++++++++++++
 configure.ac                |   45 ++++++++
 debian/fuse4fs.install      |    2 
 fuse4fs/Makefile.in         |   40 ++++++-
 fuse4fs/fuse4fs.c           |  253 ++++++++++++++++++++++++++++++++++++++++++-
 fuse4fs/fuse4fs.socket.in   |   14 ++
 fuse4fs/fuse4fs@.service.in |   95 ++++++++++++++++
 lib/config.h.in             |    3 +
 util/subst.conf.in          |    2 
 10 files changed, 577 insertions(+), 11 deletions(-)
 create mode 100644 fuse4fs/fuse4fs.socket.in
 create mode 100644 fuse4fs/fuse4fs@.service.in


diff --git a/MCONFIG.in b/MCONFIG.in
index 96c6fe8928b1d6..7f94ebf23c2124 100644
--- a/MCONFIG.in
+++ b/MCONFIG.in
@@ -42,6 +42,7 @@ HAVE_CROND = @have_crond@
 CROND_DIR = @crond_dir@
 HAVE_SYSTEMD = @have_systemd@
 SYSTEMD_SYSTEM_UNIT_DIR = @systemd_system_unit_dir@
+HAVE_FUSE_SERVICE = @have_fuse_service@
 
 @SET_MAKE@
 
diff --git a/configure b/configure
index 4137f942efaef5..b2b8bbf2f92ea3 100755
--- a/configure
+++ b/configure
@@ -703,6 +703,8 @@ UNI_DIFF_OPTS
 SEM_INIT_LIB
 FUSE4FS_CMT
 FUSE2FS_CMT
+fuse_service_socket_dir
+have_fuse_service
 FUSE_LIB
 fuse3_LIBS
 fuse3_CFLAGS
@@ -933,6 +935,7 @@ with_libiconv_prefix
 with_libintl_prefix
 enable_largefile
 with_libarchive
+with_fuse_service_socket_dir
 enable_fuse2fs
 enable_fuse4fs
 enable_lto
@@ -1654,6 +1657,8 @@ Optional Packages:
   --with-libintl-prefix[=DIR]  search for libintl in DIR/include and DIR/lib
   --without-libintl-prefix     don't search for libintl in includedir and libdir
   --without-libarchive    disable use of libarchive
+  --with-fuse-service-socket-dir[=DIR]
+                          Create fuse3 filesystem service sockets in DIR.
   --with-multiarch=ARCH   specify the multiarch triplet
   --with-udev-rules-dir[=DIR]
                           Install udev rules into DIR.
@@ -14336,6 +14341,134 @@ printf "%s\n" "#define HAVE_FUSE_LOWLEVEL 1" >>confdefs.h
 
 fi
 
+have_fuse_service=
+fuse_service_socket_dir=
+if test -n "$have_fuse_lowlevel"
+then
+
+# Check whether --with-fuse_service_socket_dir was given.
+if test ${with_fuse_service_socket_dir+y}
+then :
+  withval=$with_fuse_service_socket_dir;
+else $as_nop
+  with_fuse_service_socket_dir=yes
+fi
+
+	if test "x${with_fuse_service_socket_dir}" != "xno"
+then :
+
+		if test "x${with_fuse_service_socket_dir}" = "xyes"
+then :
+
+
+pkg_failed=no
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse3" >&5
+printf %s "checking for fuse3... " >&6; }
+
+if test -n "$fuse3_CFLAGS"; then
+    pkg_cv_fuse3_CFLAGS="$fuse3_CFLAGS"
+ elif test -n "$PKG_CONFIG"; then
+    if test -n "$PKG_CONFIG" && \
+    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"fuse3\""; } >&5
+  ($PKG_CONFIG --exists --print-errors "fuse3") 2>&5
+  ac_status=$?
+  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; then
+  pkg_cv_fuse3_CFLAGS=`$PKG_CONFIG --cflags "fuse3" 2>/dev/null`
+		      test "x$?" != "x0" && pkg_failed=yes
+else
+  pkg_failed=yes
+fi
+ else
+    pkg_failed=untried
+fi
+if test -n "$fuse3_LIBS"; then
+    pkg_cv_fuse3_LIBS="$fuse3_LIBS"
+ elif test -n "$PKG_CONFIG"; then
+    if test -n "$PKG_CONFIG" && \
+    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"fuse3\""; } >&5
+  ($PKG_CONFIG --exists --print-errors "fuse3") 2>&5
+  ac_status=$?
+  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; then
+  pkg_cv_fuse3_LIBS=`$PKG_CONFIG --libs "fuse3" 2>/dev/null`
+		      test "x$?" != "x0" && pkg_failed=yes
+else
+  pkg_failed=yes
+fi
+ else
+    pkg_failed=untried
+fi
+
+
+
+if test $pkg_failed = yes; then
+        { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
+printf "%s\n" "no" >&6; }
+
+if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
+        _pkg_short_errors_supported=yes
+else
+        _pkg_short_errors_supported=no
+fi
+        if test $_pkg_short_errors_supported = yes; then
+                fuse3_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "fuse3" 2>&1`
+        else
+                fuse3_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "fuse3" 2>&1`
+        fi
+        # Put the nasty error message in config.log where it belongs
+        echo "$fuse3_PKG_ERRORS" >&5
+
+
+				with_fuse_service_socket_dir=""
+
+elif test $pkg_failed = untried; then
+        { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
+printf "%s\n" "no" >&6; }
+
+				with_fuse_service_socket_dir=""
+
+else
+        fuse3_CFLAGS=$pkg_cv_fuse3_CFLAGS
+        fuse3_LIBS=$pkg_cv_fuse3_LIBS
+        { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+printf "%s\n" "yes" >&6; }
+
+				with_fuse_service_socket_dir="$($PKG_CONFIG --variable=service_socket_dir fuse3)"
+
+fi
+
+
+fi
+		{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fuse3 service socket dir" >&5
+printf %s "checking for fuse3 service socket dir... " >&6; }
+		fuse_service_socket_dir="${with_fuse_service_socket_dir}"
+		if test -n "${fuse_service_socket_dir}"
+then :
+
+			{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: ${fuse_service_socket_dir}" >&5
+printf "%s\n" "${fuse_service_socket_dir}" >&6; }
+			have_fuse_service="yes"
+
+else $as_nop
+
+			{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
+printf "%s\n" "no" >&6; }
+			have_fuse_service="no"
+
+fi
+
+fi
+fi
+
+
+if test "$have_fuse_service" = yes
+then
+
+printf "%s\n" "#define HAVE_FUSE_SERVICE 1" >>confdefs.h
+
+fi
+
 FUSE2FS_CMT=
 # Check whether --enable-fuse2fs was given.
 if test ${enable_fuse2fs+y}
diff --git a/configure.ac b/configure.ac
index a1057c07b8c056..7d3e3d86fff94e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1455,6 +1455,51 @@ then
 		  [Define to 1 if fuse supports lowlevel API])
 fi
 
+dnl
+dnl Check if the FUSE library tells us where to put fs service sockets
+dnl
+have_fuse_service=
+fuse_service_socket_dir=
+if test -n "$have_fuse_lowlevel"
+then
+	AC_ARG_WITH([fuse_service_socket_dir],
+	  [AS_HELP_STRING([--with-fuse-service-socket-dir@<:@=DIR@:>@],
+		  [Create fuse3 filesystem service sockets in DIR.])],
+	  [],
+	  [with_fuse_service_socket_dir=yes])
+	AS_IF([test "x${with_fuse_service_socket_dir}" != "xno"],
+	  [
+		AS_IF([test "x${with_fuse_service_socket_dir}" = "xyes"],
+		  [
+			PKG_CHECK_MODULES([fuse3], [fuse3],
+			  [
+				with_fuse_service_socket_dir="$($PKG_CONFIG --variable=service_socket_dir fuse3)"
+			  ], [
+				with_fuse_service_socket_dir=""
+			  ])
+			m4_pattern_allow([^PKG_(MAJOR|MINOR|BUILD|REVISION)$])
+		  ])
+		AC_MSG_CHECKING([for fuse3 service socket dir])
+		fuse_service_socket_dir="${with_fuse_service_socket_dir}"
+		AS_IF([test -n "${fuse_service_socket_dir}"],
+		  [
+			AC_MSG_RESULT(${fuse_service_socket_dir})
+			have_fuse_service="yes"
+		  ],
+		  [
+			AC_MSG_RESULT(no)
+			have_fuse_service="no"
+		  ])
+	  ],
+	  [])
+fi
+AC_SUBST(have_fuse_service)
+AC_SUBST(fuse_service_socket_dir)
+if test "$have_fuse_service" = yes
+then
+	AC_DEFINE(HAVE_FUSE_SERVICE, 1, [Define to 1 if fuse supports service])
+fi
+
 dnl
 dnl Check if fuse2fs is actually built.
 dnl
diff --git a/debian/fuse4fs.install b/debian/fuse4fs.install
index fb8c8ab671c73c..2da71546e8c1d5 100644
--- a/debian/fuse4fs.install
+++ b/debian/fuse4fs.install
@@ -1,2 +1,4 @@
 /usr/bin/fuse4fs
 /usr/share/man/man1/fuse4fs.1
+[linux-any] lib/systemd/system/fuse4fs.socket
+[linux-any] lib/systemd/system/fuse4fs@.service
diff --git a/fuse4fs/Makefile.in b/fuse4fs/Makefile.in
index 0a558da23ced81..ef15316eff59ca 100644
--- a/fuse4fs/Makefile.in
+++ b/fuse4fs/Makefile.in
@@ -17,6 +17,13 @@ UMANPAGES=
 @FUSE4FS_CMT@UPROGS+=fuse4fs
 @FUSE4FS_CMT@UMANPAGES+=fuse4fs.1
 
+ifeq ($(HAVE_SYSTEMD),yes)
+SERVICE_FILES	+= fuse4fs.socket fuse4fs@.service
+INSTALLDIRS_TGT	+= installdirs-systemd
+INSTALL_TGT	+= install-systemd
+UNINSTALL_TGT	+= uninstall-systemd
+endif
+
 FUSE4FS_OBJS=	fuse4fs.o journal.o recovery.o revoke.o
 
 PROFILED_FUSE4FS_OJBS=	profiled/fuse4fs.o profiled/journal.o \
@@ -54,7 +61,7 @@ DEPEND_CFLAGS = -I$(top_srcdir)/e2fsck
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 
 all:: profiled $(SPROGS) $(UPROGS) $(USPROGS) $(SMANPAGES) $(UMANPAGES) \
-	$(FMANPAGES) $(LPROGS)
+	$(FMANPAGES) $(LPROGS) $(SERVICE_FILES)
 
 all-static::
 
@@ -71,6 +78,14 @@ fuse4fs: $(FUSE4FS_OBJS) $(DEPLIBS) $(DEPLIBBLKID) $(DEPLIBUUID) \
 		$(LIBFUSE) $(LIBBLKID) $(LIBUUID) $(LIBEXT2FS) $(LIBINTL) \
 		$(CLOCK_GETTIME_LIB) $(SYSLIBS) $(LIBS_E2P)
 
+%.socket: %.socket.in $(DEP_SUBSTITUTE)
+	$(E) "	SUBST $@"
+	$(Q) $(SUBSTITUTE_UPTIME) $< $@
+
+%.service: %.service.in $(DEP_SUBSTITUTE)
+	$(E) "	SUBST $@"
+	$(Q) $(SUBSTITUTE_UPTIME) $< $@
+
 journal.o: $(srcdir)/../debugfs/journal.c
 	$(E) "	CC $<"
 	$(Q) $(CC) -c $(JOURNAL_CFLAGS) -I$(srcdir) \
@@ -93,11 +108,15 @@ fuse4fs.1: $(DEP_SUBSTITUTE) $(srcdir)/fuse4fs.1.in
 	$(E) "	SUBST $@"
 	$(Q) $(SUBSTITUTE_UPTIME) $(srcdir)/fuse4fs.1.in fuse4fs.1
 
-installdirs:
+installdirs: $(INSTALLDIRS_TGT)
 	$(E) "	MKDIR_P $(bindir) $(man1dir)"
 	$(Q) $(MKDIR_P) $(DESTDIR)$(bindir) $(DESTDIR)$(man1dir)
 
-install: all $(UMANPAGES) installdirs
+installdirs-systemd:
+	$(E) "	MKDIR_P $(SYSTEMD_SYSTEM_UNIT_DIR)"
+	$(Q) $(MKDIR_P) $(DESTDIR)$(SYSTEMD_SYSTEM_UNIT_DIR)
+
+install: all $(UMANPAGES) installdirs $(INSTALL_TGT)
 	$(Q) for i in $(UPROGS); do \
 		$(ES) "	INSTALL $(bindir)/$$i"; \
 		$(INSTALL_PROGRAM) $$i $(DESTDIR)$(bindir)/$$i; \
@@ -110,13 +129,19 @@ install: all $(UMANPAGES) installdirs
 		$(INSTALL_DATA) $$i $(DESTDIR)$(man1dir)/$$i; \
 	done
 
+install-systemd: $(SERVICE_FILES) installdirs-systemd
+	$(Q) for i in $(SERVICE_FILES); do \
+		$(ES) "	INSTALL_DATA $(SYSTEMD_SYSTEM_UNIT_DIR)/$$i"; \
+		$(INSTALL_DATA) $$i $(DESTDIR)$(SYSTEMD_SYSTEM_UNIT_DIR)/$$i; \
+	done
+
 install-strip: install
 	$(Q) for i in $(UPROGS); do \
 		$(E) "	STRIP $(bindir)/$$i"; \
 		$(STRIP) $(DESTDIR)$(bindir)/$$i; \
 	done
 
-uninstall:
+uninstall: $(UNINSTALL_TGT)
 	for i in $(UPROGS); do \
 		$(RM) -f $(DESTDIR)$(bindir)/$$i; \
 	done
@@ -124,9 +149,16 @@ uninstall:
 		$(RM) -f $(DESTDIR)$(man1dir)/$$i; \
 	done
 
+uninstall-systemd:
+	for i in $(SERVICE_FILES); do \
+		$(RM) -f $(DESTDIR)$(SYSTEMD_SYSTEM_UNIT_DIR)/$$i; \
+	done
+
 clean::
 	$(RM) -f $(UPROGS) $(UMANPAGES) profile.h \
 		fuse4fs.profiled \
+		$(SERVICE_FILES) \
+		fuse4fs.socket \
 		profiled/*.o \#* *.s *.o *.a *~ core gmon.out
 
 mostlyclean: clean
diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 3e8822fac08630..db86a749b74af0 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -44,6 +44,10 @@
 # define _FILE_OFFSET_BITS 64
 #endif /* _FILE_OFFSET_BITS */
 #include <fuse_lowlevel.h>
+#ifdef HAVE_FUSE_SERVICE
+# include <sys/mount.h>
+# include <fuse_service.h>
+#endif
 #ifdef __SET_FOB_FOR_FUSE
 # undef _FILE_OFFSET_BITS
 #endif /* __SET_FOB_FOR_FUSE */
@@ -301,6 +305,11 @@ struct fuse4fs {
 #endif
 	struct fuse_session *fuse;
 	struct cache inodes;
+#ifdef HAVE_FUSE_SERVICE
+	struct fuse_service *service;
+	int bdev_fd;
+	int fusedev_fd;
+#endif
 };
 
 #define FUSE4FS_CHECK_HANDLE(req, fh) \
@@ -1196,6 +1205,167 @@ static int fuse4fs_inum_access(struct fuse4fs *ff, const struct fuse_ctx *ctxt,
 	return -EACCES;
 }
 
+#ifdef HAVE_FUSE_SERVICE
+static inline bool fuse4fs_is_service(const struct fuse4fs *ff)
+{
+	return fuse_service_accepted(ff->service);
+}
+
+static int fuse4fs_service_connect(struct fuse4fs *ff, struct fuse_args *args)
+{
+	int ret;
+
+	ret = fuse_service_accept(&ff->service);
+	if (ret)
+		return ret;
+
+	if (fuse4fs_is_service(ff))
+		fuse_service_append_args(ff->service, args);
+
+	return 0;
+}
+
+static inline int
+fuse4fs_service_parse_cmdline(struct fuse_args *args,
+			      struct fuse_cmdline_opts *opts)
+{
+	return fuse_service_parse_cmdline_opts(args, opts);
+}
+
+static void fuse4fs_service_release(struct fuse4fs *ff, int mount_ret)
+{
+	if (fuse4fs_is_service(ff)) {
+		fuse_service_send_goodbye(ff->service, mount_ret);
+		fuse_service_release(ff->service);
+	}
+}
+
+static int fuse4fs_service_finish(struct fuse4fs *ff, int ret)
+{
+	if (!fuse4fs_is_service(ff))
+		return ret;
+
+	fuse_service_destroy(&ff->service);
+
+	/*
+	 * If we're being run as a service, the return code must fit the LSB
+	 * init script action error guidelines, which is to say that we
+	 * compress all errors to 1 ("generic or unspecified error", LSB 5.0
+	 * section 22.2) and hope the admin will scan the log for what actually
+	 * happened.
+	 *
+	 * We have to sleep 2 seconds here because journald uses the pid to
+	 * connect our log messages to the systemd service.  This is critical
+	 * for capturing all the log messages if the service fails, because the
+	 * fail service uses the service name to gather log messages for the
+	 * error report.
+	 */
+	sleep(2);
+	if (ret != EXIT_SUCCESS)
+		return EXIT_FAILURE;
+	return EXIT_SUCCESS;
+}
+
+static int fuse4fs_service_get_config(struct fuse4fs *ff)
+{
+	int open_flags = O_RDWR | O_EXCL;
+	int ret;
+
+retry:
+	ret = fuse_service_request_file(ff->service, ff->device, open_flags,
+					0);
+	if (ret)
+		return ret;
+
+	ret = fuse_service_receive_file(ff->service, ff->device, &ff->bdev_fd);
+	if (ret == -2) {
+		if (errno == EACCES && (open_flags & O_ACCMODE) != O_RDONLY) {
+			open_flags = O_RDONLY | O_EXCL;
+			goto retry;
+		}
+		err_printf(ff, "opening %s: %s.\n", ff->device, strerror(errno));
+		return ret;
+	}
+	if (ret)
+		return ret;
+
+	if (ff->bdev_fd < 0) {
+		err_printf(ff, "%s: %s: %s.\n", ff->device,
+			   _("opening service"), strerror(-ff->bdev_fd));
+		return -1;
+	}
+
+	ret = fuse_service_finish_file_requests(ff->service);
+	if (ret)
+		return ret;
+
+	ff->fusedev_fd = fuse_service_take_fusedev(ff->service);
+	return 0;
+}
+
+static errcode_t fuse4fs_service_openfs(struct fuse4fs *ff, char *options,
+					int flags)
+{
+	char path[32];
+
+	snprintf(path, sizeof(path), "%d", ff->bdev_fd);
+	iocache_set_backing_manager(unixfd_io_manager);
+	return ext2fs_open2(path, options, flags, 0, 0, iocache_io_manager,
+			&ff->fs);
+}
+
+static int fuse4fs_service_configure_iomap(struct fuse4fs *ff)
+{
+	int error = 0;
+	int ret;
+
+	ret = fuse_service_configure_iomap(ff->service,
+					   ff->iomap_want == FT_ENABLE,
+					   &error);
+	if (ret)
+		return -1;
+
+	if (error) {
+		err_printf(ff, "%s: %s.\n", _("enabling iomap"),
+			   strerror(error));
+		return -1;
+	}
+
+	return 0;
+}
+
+static int fuse4fs_service(struct fuse4fs *ff, struct fuse_session *se,
+			   const char *mountpoint)
+{
+	char path[32];
+	int ret = 0;
+
+	snprintf(path, sizeof(path), "/dev/fd/%d", ff->fusedev_fd);
+	ret = fuse_session_mount(se, path);
+	if (ret)
+		return ret;
+
+	ret = fuse_service_mount(ff->service, se, mountpoint);
+	if (ret) {
+		err_printf(ff, "%s: %s\n", _("mounting filesystem"),
+			   strerror(errno));
+		return ret;
+	}
+
+	return 0;
+}
+#else
+# define fuse4fs_is_service(...)		(false)
+# define fuse4fs_service_connect(...)		(0)
+# define fuse4fs_service_parse_cmdline(...)	(EOPNOTSUPP)
+# define fuse4fs_service_release(...)		((void)0)
+# define fuse4fs_service_finish(ret)		(ret)
+# define fuse4fs_service_get_config(...)	(EOPNOTSUPP)
+# define fuse4fs_service_openfs(...)		(EOPNOTSUPP)
+# define fuse4fs_service_configure_iomap(...)	(EOPNOTSUPP)
+# define fuse4fs_service(...)			(EOPNOTSUPP)
+#endif
+
 static errcode_t fuse4fs_acquire_lockfile(struct fuse4fs *ff)
 {
 	char *resolved;
@@ -1290,16 +1460,22 @@ static errcode_t fuse4fs_open(struct fuse4fs *ff, int libext2_flags)
 	dbg_printf(ff, "opening with flags=0x%x\n", flags);
 
 	iocache_set_backing_manager(unix_io_manager);
-	err = ext2fs_open2(ff->device, options, flags, 0, 0, iocache_io_manager,
-			   &ff->fs);
+	if (fuse4fs_is_service(ff))
+		err = fuse4fs_service_openfs(ff, options, flags);
+	else
+		err = ext2fs_open2(ff->device, options, flags, 0, 0,
+				   iocache_io_manager, &ff->fs);
 	if (err == EPERM) {
 		err_printf(ff, "%s.\n",
 			   _("read-only device, trying to mount norecovery"));
 		flags &= ~EXT2_FLAG_RW;
 		ff->ro = 1;
 		ff->norecovery = 1;
-		err = ext2fs_open2(ff->device, options, flags, 0, 0,
-				   iocache_io_manager, &ff->fs);
+		if (fuse4fs_is_service(ff))
+			err = fuse4fs_service_openfs(ff, options, flags);
+		else
+			err = ext2fs_open2(ff->device, options, flags, 0, 0,
+					   iocache_io_manager, &ff->fs);
 	}
 	if (err) {
 		err_printf(ff, "%s.\n", error_message(err));
@@ -1599,6 +1775,10 @@ static int fuse4fs_setup_logging(struct fuse4fs *ff)
 	if (logfile)
 		return fuse4fs_capture_output(ff, logfile);
 
+	/* systemd already hooked us up to /dev/ttyprintk */
+	if (fuse4fs_is_service(ff))
+		return 0;
+
 	/* in kernel mode, try to log errors to the kernel log */
 	if (ff->kernel)
 		fuse4fs_capture_output(ff, "/dev/ttyprintk");
@@ -7370,7 +7550,11 @@ static inline bool fuse4fs_discover_iomap(struct fuse4fs *ff)
 	if (ff->iomap_want == FT_DISABLE)
 		return false;
 
+#ifdef HAVE_FUSE_SERVICE
+	ff->iomap_cap = fuse_lowlevel_discover_iomap(ff->fusedev_fd);
+#else
 	ff->iomap_cap = fuse_lowlevel_discover_iomap(-1);
+#endif
 	return ff->iomap_cap & FUSE_IOMAP_SUPPORT_FILEIO;
 }
 #else
@@ -7408,7 +7592,11 @@ static int fuse4fs_main(struct fuse_args *args, struct fuse4fs *ff)
 	struct fuse_loop_config *loop_config = NULL;
 	int ret;
 
-	if (fuse_parse_cmdline(args, &opts) != 0) {
+	if (fuse4fs_is_service(ff))
+		ret = fuse4fs_service_parse_cmdline(args, &opts);
+	else
+		ret = fuse_parse_cmdline(args, &opts);
+	if (ret != 0) {
 		ret = 1;
 		goto out;
 	}
@@ -7441,7 +7629,18 @@ static int fuse4fs_main(struct fuse_args *args, struct fuse4fs *ff)
 	}
 	ff->fuse = se;
 
-	if (fuse_session_mount(se, opts.mountpoint) != 0) {
+	if (fuse4fs_is_service(ff)) {
+		/*
+		 * foreground mode is needed so that systemd actually tracks
+		 * the service correctly and doesn't try to kill it, and so that
+		 * stdout/stderr don't get zapped
+		 */
+		opts.foreground = 1;
+		ret = fuse4fs_service(ff, se, opts.mountpoint);
+	} else {
+		ret = fuse_session_mount(se, opts.mountpoint);
+	}
+	if (ret != 0) {
 		ret = 4;
 		goto out_destroy_session;
 	}
@@ -7482,6 +7681,8 @@ static int fuse4fs_main(struct fuse_args *args, struct fuse4fs *ff)
 	fuse_loop_cfg_set_idle_threads(loop_config, opts.max_idle_threads);
 	fuse_loop_cfg_set_max_threads(loop_config, 4);
 
+	fuse4fs_service_release(ff, 0);
+
 	if (fuse_session_loop_mt(se, loop_config) != 0) {
 		ret = 8;
 		goto out_loopcfg;
@@ -7499,6 +7700,7 @@ static int fuse4fs_main(struct fuse_args *args, struct fuse4fs *ff)
 out_free_opts:
 	free(opts.mountpoint);
 out:
+	fuse4fs_service_release(ff, ret);
 	return ret;
 }
 
@@ -7517,6 +7719,10 @@ int main(int argc, char *argv[])
 #endif
 		.translate_inums = 1,
 		.write_gdt_on_destroy = 1,
+#ifdef HAVE_FUSE_SERVICE
+		.bdev_fd = -1,
+		.fusedev_fd = -1,
+#endif
 	};
 	errcode_t err;
 	FILE *orig_stderr = stderr;
@@ -7524,6 +7730,22 @@ int main(int argc, char *argv[])
 	bool iomap_detected = false;
 	int ret;
 
+	/* XXX */
+	if (getenv("FUSE4FS_DEBUGGER")) {
+		char *moo = getenv("FUSE4FS_DEBUGGER");
+		int del = atoi(moo);
+
+		fprintf(stderr, "WAITING %ds FOR DEBUGGER\n", del);
+		fflush(stderr);
+		sleep(del);
+	}
+
+	ret = fuse4fs_service_connect(&fctx, &args);
+	if (ret) {
+		fprintf(stderr, "Could not connect to service socket!\n");
+		exit(1);
+	}
+
 	ret = fuse_opt_parse(&args, &fctx, fuse4fs_opts, fuse4fs_opt_proc);
 	if (ret)
 		exit(1);
@@ -7565,6 +7787,22 @@ int main(int argc, char *argv[])
 		goto out;
 	}
 
+	if (fuse4fs_is_service(&fctx)) {
+		ret = fuse4fs_service_get_config(&fctx);
+		if (ret) {
+			ret = 2;
+			goto out;
+		}
+
+		if (fctx.iomap_want != FT_DISABLE) {
+			ret = fuse4fs_service_configure_iomap(&fctx);
+			if (ret) {
+				ret = 2;
+				goto out;
+			}
+		}
+	}
+
 #ifdef HAVE_PR_SET_IO_FLUSHER
 	/*
 	 * Register as a filesystem I/O server process so that our memory
@@ -7668,7 +7906,7 @@ int main(int argc, char *argv[])
 
 	/* Set up default fuse parameters */
 	snprintf(extra_args, BUFSIZ, "-osubtype=%s,fsname=%s",
-		 get_subtype(argv[0]),
+		 get_subtype(args.argv[0]),
 		 fctx.device);
 	if (fctx.no_default_opts == 0)
 		fuse_opt_add_arg(&args, extra_args);
@@ -7748,6 +7986,7 @@ int main(int argc, char *argv[])
 	err_shortdev = NULL;
 	if (fctx.device)
 		free(fctx.device);
+	ret = fuse4fs_service_finish(&fctx, ret);
 	fuse_opt_free_args(&args);
 	return ret;
 }
diff --git a/fuse4fs/fuse4fs.socket.in b/fuse4fs/fuse4fs.socket.in
new file mode 100644
index 00000000000000..58b9173c0bd727
--- /dev/null
+++ b/fuse4fs/fuse4fs.socket.in
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: GPL-2.0-or-later
+#
+# Copyright (C) 2025 Oracle.  All Rights Reserved.
+# Author: Darrick J. Wong <djwong@kernel.org>
+[Unit]
+Description=Socket for ext4 Service
+
+[Socket]
+ListenSequentialPacket=@fuse_service_socket_dir@/ext2
+ListenSequentialPacket=@fuse_service_socket_dir@/ext3
+ListenSequentialPacket=@fuse_service_socket_dir@/ext4
+Accept=yes
+SocketMode=0660
+RemoveOnStop=yes
diff --git a/fuse4fs/fuse4fs@.service.in b/fuse4fs/fuse4fs@.service.in
new file mode 100644
index 00000000000000..4765df462c6461
--- /dev/null
+++ b/fuse4fs/fuse4fs@.service.in
@@ -0,0 +1,95 @@
+# SPDX-License-Identifier: GPL-2.0-or-later
+#
+# Copyright (C) 2025 Oracle.  All Rights Reserved.
+# Author: Darrick J. Wong <djwong@kernel.org>
+[Unit]
+Description=ext4 Service
+
+[Service]
+Type=exec
+ExecStart=@bindir@/fuse4fs -o kernel
+
+# Try to capture core dumps
+LimitCORE=infinity
+
+SyslogIdentifier=%N
+
+# No realtime CPU scheduling
+RestrictRealtime=true
+
+# Don't let us see anything in the regular system, and don't run as root
+DynamicUser=true
+ProtectSystem=strict
+ProtectHome=true
+PrivateTmp=true
+PrivateDevices=true
+PrivateUsers=true
+
+# No network access
+PrivateNetwork=true
+ProtectHostname=true
+RestrictAddressFamilies=none
+IPAddressDeny=any
+
+# Don't let the program mess with the kernel configuration at all
+ProtectKernelLogs=true
+ProtectKernelModules=true
+ProtectKernelTunables=true
+ProtectControlGroups=true
+ProtectProc=invisible
+RestrictNamespaces=true
+RestrictFileSystems=
+
+# Hide everything in /proc, even /proc/mounts
+ProcSubset=pid
+
+# Only allow the default Linux personality
+LockPersonality=true
+
+# No memory pages that are both writable and executable
+MemoryDenyWriteExecute=true
+
+# Don't let our mounts leak out to the host
+PrivateMounts=true
+
+# Restrict system calls to the native arch and only enough to get things going
+SystemCallArchitectures=native
+SystemCallFilter=@system-service
+SystemCallFilter=~@privileged
+SystemCallFilter=~@resources
+
+SystemCallFilter=~@clock
+SystemCallFilter=~@cpu-emulation
+SystemCallFilter=~@debug
+SystemCallFilter=~@module
+SystemCallFilter=~@reboot
+SystemCallFilter=~@swap
+
+SystemCallFilter=~@mount
+
+# Leave a breadcrumb if we get whacked by the system call filter
+SystemCallErrorNumber=EL3RST
+
+# Log to the kernel dmesg, just like an in-kernel ext4 driver
+StandardOutput=append:/dev/ttyprintk
+StandardError=append:/dev/ttyprintk
+
+# Run with no capabilities at all
+CapabilityBoundingSet=
+AmbientCapabilities=
+NoNewPrivileges=true
+
+# fuse4fs doesn't create files
+UMask=7777
+
+# No access to hardware /dev files at all
+ProtectClock=true
+DevicePolicy=closed
+
+# Don't mess with set[ug]id anything.
+RestrictSUIDSGID=true
+
+# Don't let OOM kills of processes in this containment group kill the whole
+# service, because we don't want filesystem drivers to go down.
+OOMPolicy=continue
+OOMScoreAdjust=-1000
diff --git a/lib/config.h.in b/lib/config.h.in
index 55e515020af422..dcbbb3a7bf1ac4 100644
--- a/lib/config.h.in
+++ b/lib/config.h.in
@@ -79,6 +79,9 @@
 /* Define to 1 if fuse supports iomap */
 #undef HAVE_FUSE_IOMAP
 
+/* Define to 1 if fuse supports service */
+#undef HAVE_FUSE_SERVICE
+
 /* Define to 1 if you have the Mac OS X function
    CFLocaleCopyPreferredLanguages in the CoreFoundation framework. */
 #undef HAVE_CFLOCALECOPYPREFERREDLANGUAGES
diff --git a/util/subst.conf.in b/util/subst.conf.in
index 5af5e356d46ac7..5fc7cf8f33fa76 100644
--- a/util/subst.conf.in
+++ b/util/subst.conf.in
@@ -24,3 +24,5 @@ root_bindir		@root_bindir@
 libdir			@libdir@
 $exec_prefix		@exec_prefix@
 pkglibexecdir		@libexecdir@/e2fsprogs
+bindir			@bindir@
+fuse_service_socket_dir	@fuse_service_socket_dir@

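fuse4fs.socket sets Accept=yes. With that setting, systemd spawns one fuse4fs@.service instance per connection and passes only the connected socket, following the sd_listen_fds(3) convention: LISTEN_PID names the receiving process, LISTEN_FDS counts the passed sockets, and the first socket is fd 3. fuse_service_accept() presumably detects this, but its internals are not part of this series. The sketch below re-implements just the environment check without libsystemd; real code should call sd_listen_fds(). activation_socket_for() is a hypothetical name. It takes the expected pid as a parameter so it can be tested, and callers would pass getpid().

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

#define LISTEN_FDS_START 3	/* same value as SD_LISTEN_FDS_START */

/*
 * Hypothetical helper: return the connection socket handed to process
 * `self` by a systemd Accept=yes socket unit, or -1 if there is none.
 */
static int activation_socket_for(pid_t self)
{
	const char *pid_str = getenv("LISTEN_PID");
	const char *fds_str = getenv("LISTEN_FDS");
	char *end;
	long val;

	if (pid_str == NULL || fds_str == NULL)
		return -1;

	/* The variables are only meant for the process whose pid they name. */
	errno = 0;
	val = strtol(pid_str, &end, 10);
	if (errno != 0 || end == pid_str || *end != '\0' || val != (long)self)
		return -1;

	/* Accept=yes passes exactly one socket: the accepted connection. */
	errno = 0;
	val = strtol(fds_str, &end, 10);
	if (errno != 0 || end == fds_str || *end != '\0' || val != 1)
		return -1;

	return LISTEN_FDS_START;
}
```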


* [PATCH 3/4] fuse4fs: set proc title when in fuse service mode
  2025-09-16  0:24 ` [PATCHSET RFC v5 9/9] fuse4fs: run servers as a contained service Darrick J. Wong
  2025-09-16  1:08   ` [PATCH 1/4] libext2fs: fix MMP code to work with unixfd IO manager Darrick J. Wong
  2025-09-16  1:08   ` [PATCH 2/4] fuse4fs: enable safe service mode Darrick J. Wong
@ 2025-09-16  1:09   ` Darrick J. Wong
  2025-09-16  1:09   ` [PATCH 4/4] fuse4fs: set iomap backing device blocksize Darrick J. Wong
  3 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:09 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

When in fuse service mode, set the proc title so that we can identify
fuse servers by mount arguments.
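fuse_service_cmdline() comes from the proposed libfuse service API, and this series does not spell out its output format. Assuming it flattens the mount arguments into one string, the hypothetical join_argv() below shows the kind of string that would end up in the process title. With libbsd, a format beginning with '-', as in setproctitle("-%s", cmdline), suppresses the usual program-name prefix.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: join argv[0..argc-1] with single spaces. Caller frees. */
static char *join_argv(int argc, char *argv[])
{
	size_t len = 1;		/* trailing NUL */
	char *buf, *p;
	int i;

	for (i = 0; i < argc; i++)
		len += strlen(argv[i]) + 1;	/* word plus separator */

	buf = malloc(len);
	if (buf == NULL)
		return NULL;

	p = buf;
	for (i = 0; i < argc; i++) {
		size_t n = strlen(argv[i]);

		if (i > 0)
			*p++ = ' ';
		memcpy(p, argv[i], n);
		p += n;
	}
	*p = '\0';
	return buf;
}
```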

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 configure           |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
 configure.ac        |   24 ++++++++++++++++++++++++
 fuse4fs/Makefile.in |    2 +-
 fuse4fs/fuse4fs.c   |   23 ++++++++++++++++++++++-
 lib/config.h.in     |    3 +++
 5 files changed, 98 insertions(+), 2 deletions(-)


diff --git a/configure b/configure
index b2b8bbf2f92ea3..d089e5e35a66c3 100755
--- a/configure
+++ b/configure
@@ -701,6 +701,7 @@ gcc_ranlib
 gcc_ar
 UNI_DIFF_OPTS
 SEM_INIT_LIB
+LIBBSD_LIB
 FUSE4FS_CMT
 FUSE2FS_CMT
 fuse_service_socket_dir
@@ -14599,6 +14600,53 @@ fi
 
 
 
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for setproctitle in -lbsd" >&5
+printf %s "checking for setproctitle in -lbsd... " >&6; }
+if test ${ac_cv_lib_bsd_setproctitle+y}
+then :
+  printf %s "(cached) " >&6
+else $as_nop
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lbsd  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+char setproctitle ();
+int
+main (void)
+{
+return setproctitle ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"
+then :
+  ac_cv_lib_bsd_setproctitle=yes
+else $as_nop
+  ac_cv_lib_bsd_setproctitle=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.beam \
+    conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_bsd_setproctitle" >&5
+printf "%s\n" "$ac_cv_lib_bsd_setproctitle" >&6; }
+if test "x$ac_cv_lib_bsd_setproctitle" = xyes
+then :
+  LIBBSD_LIB=-lbsd
+fi
+
+
+if test "$ac_cv_lib_bsd_setproctitle" = yes ; then
+	printf "%s\n" "#define HAVE_SETPROCTITLE 1" >>confdefs.h
+
+fi
+
+
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for PR_SET_IO_FLUSHER" >&5
 printf %s "checking for PR_SET_IO_FLUSHER... " >&6; }
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
diff --git a/configure.ac b/configure.ac
index 7d3e3d86fff94e..603d6ec1a1712c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1574,6 +1574,30 @@ AS_HELP_STRING([--disable-fuse4fs],[do not build fuse4fs]),
 )
 AC_SUBST(FUSE4FS_CMT)
 
+dnl
+dnl see if setproctitle exists
+dnl
+AC_CHECK_LIB(bsd, setproctitle, [LIBBSD_LIB=-lbsd])
+AC_SUBST(LIBBSD_LIB)
+if test "$ac_cv_lib_bsd_setproctitle" = yes ; then
+	AC_DEFINE(HAVE_SETPROCTITLE, 1, [Define to 1 if setproctitle exists])
+fi
+
+dnl AC_LINK_IFELSE(
+dnl [	AC_LANG_PROGRAM([[
+dnl #define _GNU_SOURCE
+dnl #include <bsd/unistd.h>
+dnl 	]], [[
+dnl setproctitle_init(argc, argv, environ);
+dnl setproctitle("-What sourcery is this???");
+dnl 	]])
+dnl ], have_setproctitle=yes
+dnl    AC_MSG_RESULT(yes),
+dnl    AC_MSG_RESULT(no))
+dnl if test "$setproctitle" = yes; then
+dnl   AC_DEFINE(HAVE_SETPROCTITLE, 1, [Define to 1 if setproctitle exists])
+dnl fi
+
 dnl
 dnl see if PR_SET_IO_FLUSHER exists
 dnl
diff --git a/fuse4fs/Makefile.in b/fuse4fs/Makefile.in
index ef15316eff59ca..447212f836cbc0 100644
--- a/fuse4fs/Makefile.in
+++ b/fuse4fs/Makefile.in
@@ -76,7 +76,7 @@ fuse4fs: $(FUSE4FS_OBJS) $(DEPLIBS) $(DEPLIBBLKID) $(DEPLIBUUID) \
 	$(E) "	LD $@"
 	$(Q) $(CC) $(ALL_LDFLAGS) -o fuse4fs $(FUSE4FS_OBJS) $(LIBS) \
 		$(LIBFUSE) $(LIBBLKID) $(LIBUUID) $(LIBEXT2FS) $(LIBINTL) \
-		$(CLOCK_GETTIME_LIB) $(SYSLIBS) $(LIBS_E2P)
+		$(CLOCK_GETTIME_LIB) $(SYSLIBS) $(LIBS_E2P) @LIBBSD_LIB@
 
 %.socket: %.socket.in $(DEP_SUBSTITUTE)
 	$(E) "	SUBST $@"
diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index db86a749b74af0..0e43e99c3c080d 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -47,6 +47,9 @@
 #ifdef HAVE_FUSE_SERVICE
 # include <sys/mount.h>
 # include <fuse_service.h>
+# ifdef HAVE_SETPROCTITLE
+#  include <bsd/unistd.h>
+# endif
 #endif
 #ifdef __SET_FOB_FOR_FUSE
 # undef _FILE_OFFSET_BITS
@@ -1221,10 +1224,24 @@ static int fuse4fs_service_connect(struct fuse4fs *ff, struct fuse_args *args)
 
 	if (fuse4fs_is_service(ff))
 		fuse_service_append_args(ff->service, args);
-
 	return 0;
 }
 
+static void fuse4fs_service_set_proc_cmdline(struct fuse4fs *ff, int argc,
+					     char *argv[],
+					     struct fuse_args *args)
+{
+	char *cmdline;
+
+	setproctitle_init(argc, argv, environ);
+	cmdline = fuse_service_cmdline(argc, argv, args);
+	if (!cmdline)
+		return;
+
+	setproctitle("-%s", cmdline);
+	free(cmdline);
+}
+
 static inline int
 fuse4fs_service_parse_cmdline(struct fuse_args *args,
 			      struct fuse_cmdline_opts *opts)
@@ -1357,6 +1374,7 @@ static int fuse4fs_service(struct fuse4fs *ff, struct fuse_session *se,
 #else
 # define fuse4fs_is_service(...)		(false)
 # define fuse4fs_service_connect(...)		(0)
+# define fuse4fs_service_set_proc_cmdline(...)	(0)
 # define fuse4fs_service_parse_cmdline(...)	(EOPNOTSUPP)
 # define fuse4fs_service_release(...)		((void)0)
 # define fuse4fs_service_finish(ret)		(ret)
@@ -7746,6 +7764,9 @@ int main(int argc, char *argv[])
 		exit(1);
 	}
 
+	if (fuse4fs_is_service(&fctx))
+		fuse4fs_service_set_proc_cmdline(&fctx, argc, argv, &args);
+
 	ret = fuse_opt_parse(&args, &fctx, fuse4fs_opts, fuse4fs_opt_proc);
 	if (ret)
 		exit(1);
diff --git a/lib/config.h.in b/lib/config.h.in
index dcbbb3a7bf1ac4..7ef6a815213856 100644
--- a/lib/config.h.in
+++ b/lib/config.h.in
@@ -358,6 +358,9 @@
 /* Define to 1 if you have the `setmntent' function. */
 #undef HAVE_SETMNTENT
 
+/* Define to 1 if setproctitle exists */
+#undef HAVE_SETPROCTITLE
+
 /* Define to 1 if you have the `setresgid' function. */
 #undef HAVE_SETRESGID
 



* [PATCH 4/4] fuse4fs: set iomap backing device blocksize
  2025-09-16  0:24 ` [PATCHSET RFC v5 9/9] fuse4fs: run servers as a contained service Darrick J. Wong
                     ` (2 preceding siblings ...)
  2025-09-16  1:09   ` [PATCH 3/4] fuse4fs: set proc title when in fuse " Darrick J. Wong
@ 2025-09-16  1:09   ` Darrick J. Wong
  3 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-09-16  1:09 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

From: Darrick J. Wong <djwong@kernel.org>

If we're running as an unprivileged iomap fuse server, we must ask the
kernel to set the blocksize of the block device.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fuse4fs/fuse4fs.c |   41 +++++++++++++++++++++++++++++++----------
 1 file changed, 31 insertions(+), 10 deletions(-)


diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 0e43e99c3c080d..40171a8cab5279 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -1371,6 +1371,21 @@ static int fuse4fs_service(struct fuse4fs *ff, struct fuse_session *se,
 
 	return 0;
 }
+
+int fuse4fs_service_set_bdev_blocksize(struct fuse4fs *ff, int dev_index)
+{
+	int ret;
+
+	ret = fuse_lowlevel_iomap_set_blocksize(ff->fusedev_fd, dev_index,
+						ff->fs->blocksize);
+	if (ret) {
+		err_printf(ff, "%s: cannot set blocksize %u: %s\n", __func__,
+			   ff->fs->blocksize, strerror(errno));
+		return -EIO;
+	}
+
+	return 0;
+}
 #else
 # define fuse4fs_is_service(...)		(false)
 # define fuse4fs_service_connect(...)		(0)
@@ -1382,6 +1397,7 @@ static int fuse4fs_service(struct fuse4fs *ff, struct fuse_session *se,
 # define fuse4fs_service_openfs(...)		(EOPNOTSUPP)
 # define fuse4fs_service_configure_iomap(...)	(EOPNOTSUPP)
 # define fuse4fs_service(...)			(EOPNOTSUPP)
+# define fuse4fs_service_set_bdev_blocksize(...) (EOPNOTSUPP)
 #endif
 
 static errcode_t fuse4fs_acquire_lockfile(struct fuse4fs *ff)
@@ -6798,21 +6814,19 @@ static int fuse4fs_iomap_config_devices(struct fuse4fs *ff)
 {
 	errcode_t err;
 	int fd;
+	int dev_index;
 	int ret;
 
 	err = io_channel_get_fd(ff->fs->io, &fd);
 	if (err)
 		return translate_error(ff->fs, 0, err);
 
-	ret = fuse4fs_set_bdev_blocksize(ff, fd);
-	if (ret)
-		return ret;
-
-	ret = fuse_lowlevel_iomap_device_add(ff->fuse, fd, 0);
-	if (ret < 0) {
-		dbg_printf(ff, "%s: cannot register iomap dev fd=%d, err=%d\n",
-			   __func__, fd, -ret);
-		return translate_error(ff->fs, 0, -ret);
+	dev_index = fuse_lowlevel_iomap_device_add(ff->fuse, fd, 0);
+	if (dev_index < 0) {
+		ret = -dev_index;
+		dbg_printf(ff, "%s: cannot register iomap dev fd=%d: %s\n",
+			   __func__, fd, strerror(ret));
+		return translate_error(ff->fs, 0, ret);
 	}
 
 	dbg_printf(ff, "%s: registered iomap dev fd=%d iomap_dev=%u\n",
@@ -6820,7 +6834,14 @@ static int fuse4fs_iomap_config_devices(struct fuse4fs *ff)
 
 	fuse4fs_configure_atomic_write(ff, fd);
 
-	ff->iomap_dev = ret;
+	if (fuse4fs_is_service(ff))
+		ret = fuse4fs_service_set_bdev_blocksize(ff, dev_index);
+	else
+		ret = fuse4fs_set_bdev_blocksize(ff, fd);
+	if (ret)
+		return ret;
+
+	ff->iomap_dev = dev_index;
 	return 0;
 }
 

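The reordering matters because the service path identifies the device by the index that fuse_lowlevel_iomap_device_add() returns. Registration therefore has to happen before the blocksize call. The sketch below models that flow with hypothetical callbacks. struct iomap_ops, config_device() and the stub_* functions are illustrative only and are not libfuse or e2fsprogs APIs. The sketch publishes the index only after both steps succeed, mirroring the patch's final ff->iomap_dev assignment.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real registration and blocksize calls. */
struct iomap_ops {
	int (*device_add)(int fd);                /* index >= 0, or -errno */
	int (*set_bsize_via_fuse)(int dev_index); /* unprivileged service */
	int (*set_bsize_direct)(int fd);          /* privileged ioctl path */
};

static int config_device(const struct iomap_ops *ops, bool is_service,
			 int fd, int *iomap_dev)
{
	int dev_index;
	int ret;

	/* Register first: the service path needs the returned index. */
	dev_index = ops->device_add(fd);
	if (dev_index < 0)
		return dev_index;

	if (is_service)
		ret = ops->set_bsize_via_fuse(dev_index);
	else
		ret = ops->set_bsize_direct(fd);
	if (ret)
		return ret;

	/* Publish the index only once the device is fully configured. */
	*iomap_dev = dev_index;
	return 0;
}

/* Stub callbacks for exercising config_device(). */
static int last_fuse_index = -1;

static int stub_add_ok(int fd)
{
	(void)fd;
	return 3;
}

static int stub_add_fail(int fd)
{
	(void)fd;
	return -ENODEV;
}

static int stub_fuse(int dev_index)
{
	last_fuse_index = dev_index;
	return 0;
}

static int stub_direct_denied(int fd)
{
	(void)fd;
	return -EACCES;
}
```

With these stubs, a successful service-mode call records index 3. A denied direct path or a failed registration leaves *iomap_dev untouched.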


* Re: [PATCH 10/10] libext2fs: add posix advisory locking to the unix IO manager
  2025-09-16  0:58   ` [PATCH 10/10] libext2fs: add posix advisory locking to the unix IO manager Darrick J. Wong
@ 2025-10-08 22:09     ` Darrick J. Wong
  0 siblings, 0 replies; 87+ messages in thread
From: Darrick J. Wong @ 2025-10-08 22:09 UTC (permalink / raw)
  To: tytso; +Cc: miklos, neal, linux-fsdevel, linux-ext4, John, bernd,
	joannelkoong

On Mon, Sep 15, 2025 at 05:58:43PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Add support for using flock() to protect the files opened by the Unix IO
> manager so that we can't mount the same fs multiple times.  This also
> prevents systemd and udev from accessing the device while e2fsprogs is
> doing something with the device.
> 
> Link: https://systemd.io/BLOCK_DEVICE_LOCKING/
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

This actually causes a lot of problems with fstests -- if fuse2fs
flock()s the block device, then udevd will spin in a slow trylock loop
until the bdev can be locked.  Meanwhile, any script that calls udevadm
settle will block until fuse2fs exits (or until udevadm gives up after
two minutes), because udev still has a uevent that it cannot settle.
This makes every test that uses udevadm settle take forever to run.

In general, we don't want to block udev from reading the block device
while fuse2fs has it mounted.  For block devices the flock is unnecessary
anyway, because O_EXCL already prevents a second exclusive open (and thus
a second mount).

However, the advisory locking is still useful for coordinating access to
filesystem images in regular files, so I'll rework this to only do it
for regular files.
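Here is a rough sketch of the direction described above. It assumes the rework keys off the file type: regular files get an advisory flock(), while block devices rely on O_EXCL and stay readable by udev. open_image() is a hypothetical helper. To keep the example short, it replaces the quoted patch's five-second SIGALRM timeout with a single non-blocking LOCK_NB attempt.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a filesystem image read-write, locking it
 * only if it is a regular file.  Returns an fd, or -1 with errno set;
 * *lock_op reports which flock() lock was taken (0 if none).
 */
static int open_image(const char *path, int *lock_op)
{
	struct stat st;
	int fd;

	*lock_op = 0;
	if (stat(path, &st) < 0)
		return -1;

	/* On Linux, O_EXCL on a block device fails with EBUSY if it is in use. */
	if (S_ISBLK(st.st_mode))
		return open(path, O_RDWR | O_EXCL);

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	if (S_ISREG(st.st_mode)) {
		/* Fail fast instead of waiting for the current holder. */
		if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
			int saved = errno;

			close(fd);
			errno = saved;
			return -1;
		}
		*lock_op = LOCK_EX;
	}
	return fd;
}
```

Calling stat() before open() leaves a small window in which the path could be replaced. Real code would probably fstat() the opened fd and re-check. Because flock() locks belong to the open file description, a second open_image() on the same file fails with EWOULDBLOCK even from the same process.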

--D

> ---
>  lib/ext2fs/unix_io.c |   64 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
> 
> 
> diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
> index 068be689326443..55007ad7d2ae15 100644
> --- a/lib/ext2fs/unix_io.c
> +++ b/lib/ext2fs/unix_io.c
> @@ -65,6 +65,12 @@
>  #include <pthread.h>
>  #endif
>  
> +#if defined(HAVE_SYS_FILE_H) && defined(HAVE_SIGNAL_H)
> +# include <sys/file.h>
> +# include <signal.h>
> +# define WANT_LOCK_UNIX_FD
> +#endif
> +
>  #if defined(__linux__) && defined(_IO) && !defined(BLKROGET)
>  #define BLKROGET   _IO(0x12, 94) /* Get read-only status (0 = read_write).  */
>  #endif
> @@ -149,6 +155,9 @@ struct unix_private_data {
>  	pthread_mutex_t bounce_mutex;
>  	pthread_mutex_t stats_mutex;
>  #endif
> +#ifdef WANT_LOCK_UNIX_FD
> +	int	lock_flags;
> +#endif
>  };
>  
>  #define IS_ALIGNED(n, align) ((((uintptr_t) n) & \
> @@ -897,6 +906,47 @@ int ext2fs_fstat(int fd, ext2fs_struct_stat *buf)
>  #endif
>  }
>  
> +#ifdef WANT_LOCK_UNIX_FD
> +static void unix_lock_alarm_handler(int signal, siginfo_t *data, void *p)
> +{
> +	/* do nothing, the signal will abort the flock operation */
> +}
> +
> +static int unix_lock_fd(int fd, int flags)
> +{
> +	struct sigaction newsa = {
> +		.sa_flags = SA_SIGINFO,
> +		.sa_sigaction = unix_lock_alarm_handler,
> +	};
> +	struct sigaction oldsa;
> +	const int operation = (flags & IO_FLAG_EXCLUSIVE) ? LOCK_EX : LOCK_SH;
> +	int ret;
> +
> +	/* wait five seconds for the lock */
> +	ret = sigaction(SIGALRM, &newsa, &oldsa);
> +	if (ret)
> +		return ret;
> +
> +	alarm(5);
> +
> +	ret = flock(fd, operation);
> +	if (ret == 0)
> +		ret = operation;
> +	else if (errno == EINTR) {
> +		errno = EWOULDBLOCK;
> +		ret = -1;
> +	}
> +
> +	alarm(0);
> +	sigaction(SIGALRM, &oldsa, NULL);
> +	return ret;
> +}
> +
> +static void unix_unlock_fd(int fd)
> +{
> +	flock(fd, LOCK_UN);
> +}
> +#endif
>  
>  static errcode_t unix_open_channel(const char *name, int fd,
>  				   int flags, io_channel *channel,
> @@ -935,6 +985,16 @@ static errcode_t unix_open_channel(const char *name, int fd,
>  	if (retval)
>  		goto cleanup;
>  
> +#ifdef WANT_LOCK_UNIX_FD
> +	if (flags & IO_FLAG_RW) {
> +		data->lock_flags = unix_lock_fd(fd, flags);
> +		if (data->lock_flags < 0) {
> +			retval = errno;
> +			goto cleanup;
> +		}
> +	}
> +#endif
> +
>  	strcpy(io->name, name);
>  	io->private_data = data;
>  	io->block_size = 1024;
> @@ -1200,6 +1260,10 @@ static errcode_t unix_close(io_channel channel)
>  	if (retval2 && !retval)
>  		retval = retval2;
>  
> +#ifdef WANT_LOCK_UNIX_FD
> +	if (data->lock_flags)
> +		unix_unlock_fd(data->dev);
> +#endif
>  	if (close(data->dev) < 0 && !retval)
>  		retval = errno;
>  	free_cache(data);
> 
> 

Thread overview: 87+ messages
2025-09-16  0:07 [RFC v5] fuse: containerize ext4 for safer operation Darrick J. Wong
2025-09-16  0:21 ` [PATCHSET 1/9] fuse2fs: upgrade to libfuse 3.17 Darrick J. Wong
2025-09-16  0:49   ` [PATCH 1/4] fuse2fs: bump library version Darrick J. Wong
2025-09-16  0:50   ` [PATCH 2/4] fuse2fs: wrap the fuse_set_feature_flag helper for older libfuse Darrick J. Wong
2025-09-16  0:50   ` [PATCH 3/4] fuse2fs: disable nfs exports Darrick J. Wong
2025-09-16  0:50   ` [PATCH 4/4] fuse2fs: drop fuse 2.x support code Darrick J. Wong
2025-09-16  0:22 ` [PATCHSET RFC v5 2/9] fuse4fs: fork a low level fuse server Darrick J. Wong
2025-09-16  0:50   ` [PATCH 01/21] fuse2fs: separate libfuse3 and fuse2fs detection in configure Darrick J. Wong
2025-09-16  0:51   ` [PATCH 02/21] fuse2fs: start porting fuse2fs to lowlevel libfuse API Darrick J. Wong
2025-09-16  0:51   ` [PATCH 03/21] debian: create new package for fuse4fs Darrick J. Wong
2025-09-16  0:51   ` [PATCH 04/21] fuse4fs: namespace some helpers Darrick J. Wong
2025-09-16  0:51   ` [PATCH 05/21] fuse4fs: convert to low level API Darrick J. Wong
2025-09-16  0:52   ` [PATCH 06/21] libsupport: port the kernel list.h to libsupport Darrick J. Wong
2025-09-16  0:52   ` [PATCH 07/21] libsupport: add a cache Darrick J. Wong
2025-09-16  0:52   ` [PATCH 08/21] cache: disable debugging Darrick J. Wong
2025-09-16  0:53   ` [PATCH 09/21] cache: use modern list iterator macros Darrick J. Wong
2025-09-16  0:53   ` [PATCH 10/21] cache: embed struct cache in the owner Darrick J. Wong
2025-09-16  0:53   ` [PATCH 11/21] cache: pass cache pointer to callbacks Darrick J. Wong
2025-09-16  0:53   ` [PATCH 12/21] cache: pass a private data pointer through cache_walk Darrick J. Wong
2025-09-16  0:54   ` [PATCH 13/21] cache: add a helper to grab a new refcount for a cache_node Darrick J. Wong
2025-09-16  0:54   ` [PATCH 14/21] cache: return results of a cache flush Darrick J. Wong
2025-09-16  0:54   ` [PATCH 15/21] cache: add a "get only if incore" flag to cache_node_get Darrick J. Wong
2025-09-16  0:54   ` [PATCH 16/21] cache: support gradual expansion Darrick J. Wong
2025-09-16  0:55   ` [PATCH 17/21] cache: implement automatic shrinking Darrick J. Wong
2025-09-16  0:55   ` [PATCH 18/21] fuse4fs: add cache to track open files Darrick J. Wong
2025-09-16  0:55   ` [PATCH 19/21] fuse4fs: use the orphaned inode list Darrick J. Wong
2025-09-16  0:55   ` [PATCH 20/21] fuse4fs: implement FUSE_TMPFILE Darrick J. Wong
2025-09-16  0:56   ` [PATCH 21/21] fuse4fs: create incore reverse orphan list Darrick J. Wong
2025-09-16  0:22 ` [PATCHSET RFC v5 3/9] libext2fs: refactoring for fuse2fs iomap support Darrick J. Wong
2025-09-16  0:56   ` [PATCH 01/10] libext2fs: make it possible to extract the fd from an IO manager Darrick J. Wong
2025-09-16  0:56   ` [PATCH 02/10] libext2fs: always fsync the device when flushing the cache Darrick J. Wong
2025-09-16  0:56   ` [PATCH 03/10] libext2fs: always fsync the device when closing the unix IO manager Darrick J. Wong
2025-09-16  0:57   ` [PATCH 04/10] libext2fs: only fsync the unix fd if we wrote to the device Darrick J. Wong
2025-09-16  0:57   ` [PATCH 05/10] libext2fs: invalidate cached blocks when freeing them Darrick J. Wong
2025-09-16  0:57   ` [PATCH 06/10] libext2fs: only flush affected blocks in unix_write_byte Darrick J. Wong
2025-09-16  0:57   ` [PATCH 07/10] libext2fs: allow unix_write_byte when the write would be aligned Darrick J. Wong
2025-09-16  0:58   ` [PATCH 08/10] libext2fs: allow clients to ask to write full superblocks Darrick J. Wong
2025-09-16  0:58   ` [PATCH 09/10] libext2fs: allow callers to disallow I/O to file data blocks Darrick J. Wong
2025-09-16  0:58   ` [PATCH 10/10] libext2fs: add posix advisory locking to the unix IO manager Darrick J. Wong
2025-10-08 22:09     ` Darrick J. Wong
2025-09-16  0:22 ` [PATCHSET RFC v5 4/9] fuse2fs: use fuse iomap data paths for better file I/O performance Darrick J. Wong
2025-09-16  0:58   ` [PATCH 01/17] fuse2fs: implement bare minimum iomap for file mapping reporting Darrick J. Wong
2025-09-16  0:59   ` [PATCH 02/17] fuse2fs: add iomap= mount option Darrick J. Wong
2025-09-16  0:59   ` [PATCH 03/17] fuse2fs: implement iomap configuration Darrick J. Wong
2025-09-16  0:59   ` [PATCH 04/17] fuse2fs: register block devices for use with iomap Darrick J. Wong
2025-09-16  1:00   ` [PATCH 05/17] fuse2fs: implement directio file reads Darrick J. Wong
2025-09-16  1:00   ` [PATCH 06/17] fuse2fs: add extent dump function for debugging Darrick J. Wong
2025-09-16  1:00   ` [PATCH 07/17] fuse2fs: implement direct write support Darrick J. Wong
2025-09-16  1:00   ` [PATCH 08/17] fuse2fs: turn on iomap for pagecache IO Darrick J. Wong
2025-09-16  1:01   ` [PATCH 09/17] fuse2fs: don't zero bytes in punch hole Darrick J. Wong
2025-09-16  1:01   ` [PATCH 10/17] fuse2fs: don't do file data block IO when iomap is enabled Darrick J. Wong
2025-09-16  1:01   ` [PATCH 11/17] fuse2fs: avoid fuseblk mode if fuse-iomap support is likely Darrick J. Wong
2025-09-16  1:01   ` [PATCH 12/17] fuse2fs: enable file IO to inline data files Darrick J. Wong
2025-09-16  1:02   ` [PATCH 13/17] fuse2fs: set iomap-related inode flags Darrick J. Wong
2025-09-16  1:02   ` [PATCH 14/17] fuse2fs: configure block device block size Darrick J. Wong
2025-09-16  1:02   ` [PATCH 15/17] fuse4fs: separate invalidation Darrick J. Wong
2025-09-16  1:02   ` [PATCH 16/17] fuse2fs: implement statx Darrick J. Wong
2025-09-16  1:03   ` [PATCH 17/17] fuse2fs: enable atomic writes Darrick J. Wong
2025-09-16  0:22 ` [PATCHSET RFC v5 5/9] fuse4fs: specify the root node id Darrick J. Wong
2025-09-16  1:03   ` [PATCH 1/1] fuse4fs: don't use inode number translation when possible Darrick J. Wong
2025-09-16  0:23 ` [PATCHSET RFC v5 6/9] fuse2fs: handle timestamps and ACLs correctly when iomap is enabled Darrick J. Wong
2025-09-16  1:03   ` [PATCH 01/10] fuse2fs: add strictatime/lazytime mount options Darrick J. Wong
2025-09-16  1:03   ` [PATCH 02/10] fuse2fs: skip permission checking on utimens when iomap is enabled Darrick J. Wong
2025-09-16  1:04   ` [PATCH 03/10] fuse2fs: let the kernel tell us about acl/mode updates Darrick J. Wong
2025-09-16  1:04   ` [PATCH 04/10] fuse2fs: better debugging for file mode updates Darrick J. Wong
2025-09-16  1:04   ` [PATCH 05/10] fuse2fs: debug timestamp updates Darrick J. Wong
2025-09-16  1:05   ` [PATCH 06/10] fuse2fs: use coarse timestamps for iomap mode Darrick J. Wong
2025-09-16  1:05   ` [PATCH 07/10] fuse2fs: add tracing for retrieving timestamps Darrick J. Wong
2025-09-16  1:05   ` [PATCH 08/10] fuse2fs: enable syncfs Darrick J. Wong
2025-09-16  1:05   ` [PATCH 09/10] fuse2fs: skip the gdt write in op_destroy if syncfs is working Darrick J. Wong
2025-09-16  1:06   ` [PATCH 10/10] fuse2fs: set sync, immutable, and append at file load time Darrick J. Wong
2025-09-16  0:23 ` [PATCHSET RFC v5 7/9] fuse2fs: cache iomap mappings for even better file IO performance Darrick J. Wong
2025-09-16  1:06   ` [PATCH 1/3] fuse2fs: enable caching of iomaps Darrick J. Wong
2025-09-16  1:06   ` [PATCH 2/3] fuse2fs: be smarter about caching iomaps Darrick J. Wong
2025-09-16  1:06   ` [PATCH 3/3] fuse2fs: enable iomap Darrick J. Wong
2025-09-16  0:23 ` [PATCHSET RFC v5 8/9] fuse2fs: improve block and inode caching Darrick J. Wong
2025-09-16  1:07   ` [PATCH 1/6] libsupport: add caching IO manager Darrick J. Wong
2025-09-16  1:07   ` [PATCH 2/6] iocache: add the actual buffer cache Darrick J. Wong
2025-09-16  1:07   ` [PATCH 3/6] iocache: bump buffer mru priority every 50 accesses Darrick J. Wong
2025-09-16  1:07   ` [PATCH 4/6] fuse2fs: enable caching IO manager Darrick J. Wong
2025-09-16  1:08   ` [PATCH 5/6] fuse2fs: increase inode cache size Darrick J. Wong
2025-09-16  1:08   ` [PATCH 6/6] libext2fs: improve caching for inodes Darrick J. Wong
2025-09-16  0:24 ` [PATCHSET RFC v5 9/9] fuse4fs: run servers as a contained service Darrick J. Wong
2025-09-16  1:08   ` [PATCH 1/4] libext2fs: fix MMP code to work with unixfd IO manager Darrick J. Wong
2025-09-16  1:08   ` [PATCH 2/4] fuse4fs: enable safe service mode Darrick J. Wong
2025-09-16  1:09   ` [PATCH 3/4] fuse4fs: set proc title when in fuse " Darrick J. Wong
2025-09-16  1:09   ` [PATCH 4/4] fuse4fs: set iomap backing device blocksize Darrick J. Wong
