* [PATCH 05/12] swap: cleanup setup_swap_extents
From: Christoph Hellwig @ 2026-05-12 5:35 UTC
To: Andrew Morton, Chris Li, Kairui Song
Cc: Christian Brauner, Darrick J . Wong , Jens Axboe, David Sterba,
Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
Paulo Alcantara, Carlos Maiolino, Damien Le Moal, Naohiro Aota,
linux-xfs, linux-fsdevel, linux-doc, linux-mm, linux-block,
linux-btrfs, linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-1-hch@lst.de>
Reflow setup_swap_extents so that the flag checking is not conditional on
a swap_activate method. This is currently a no-op because the swapoff
code still checks the presence of a swap_deactivate method, but it
simplifies adding a new check, and also makes the SWP_ACTIVATED flag
more consistent.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
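Not part of the patch: a minimal userspace sketch of the reflowed control
flow described above. All demo_* names, the flag values and the
pool_init_result parameter are hypothetical stand-ins for illustration; only
the ordering of the checks is taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; the values are illustrative, not the kernel's. */
#define DEMO_SWP_ACTIVATED	0x1u
#define DEMO_SWP_FS_OPS		0x2u

struct demo_swap_info {
	unsigned int flags;
	bool extents_destroyed;	/* set where the patch calls destroy_swap_extents() */
};

typedef int (*demo_activate_fn)(struct demo_swap_info *sis);

/* Stand-in for the generic fallback: succeeds and sets no extra flags. */
int demo_generic_activate(struct demo_swap_info *sis)
{
	(void)sis;
	return 0;
}

/* Stand-in for a method that asks for I/O through the file system. */
int demo_fs_ops_activate(struct demo_swap_info *sis)
{
	sis->flags |= DEMO_SWP_FS_OPS;
	return 0;
}

/* Stand-in for a method that rejects the file. */
int demo_failing_activate(struct demo_swap_info *sis)
{
	(void)sis;
	return -EINVAL;
}

/*
 * Same shape as the reflowed function: pick the method or the fallback,
 * bail out on failure, then do the flag handling unconditionally.
 * pool_init_result models the return value of the pool setup step.
 */
int demo_setup_swap_extents(struct demo_swap_info *sis,
		demo_activate_fn activate, int pool_init_result)
{
	int ret, error = 0;

	if (activate)
		ret = activate(sis);
	else
		ret = demo_generic_activate(sis);
	if (ret < 0)
		return ret;

	sis->flags |= DEMO_SWP_ACTIVATED;
	if (sis->flags & DEMO_SWP_FS_OPS)
		error = pool_init_result;
	if (error)
		sis->extents_destroyed = true;
	return error;
}
```

In this sketch, calling demo_setup_swap_extents(&sis, NULL, 0) takes the
fallback path and still sets the activated flag, which is the consistency
the commit message refers to.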
mm/swapfile.c | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 651c1b59ff9f..1b7fc03612f4 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2783,25 +2783,24 @@ static int setup_swap_extents(struct swap_info_struct *sis,
{
struct address_space *mapping = swap_file->f_mapping;
struct inode *inode = mapping->host;
- int ret;
+ int ret, error = 0;
if (S_ISBLK(inode->i_mode))
return add_swap_extent(sis, sis->max, 0);
- if (swap_file->f_op->swap_activate) {
+ if (swap_file->f_op->swap_activate)
ret = swap_file->f_op->swap_activate(swap_file, sis);
- if (ret < 0)
- return ret;
- sis->flags |= SWP_ACTIVATED;
- if ((sis->flags & SWP_FS_OPS) &&
- sio_pool_init() != 0) {
- destroy_swap_extents(sis, swap_file);
- return -ENOMEM;
- }
+ else
+ ret = generic_swap_activate(swap_file, sis);
+ if (ret < 0)
return ret;
- }
- return generic_swap_activate(swap_file, sis);
+ sis->flags |= SWP_ACTIVATED;
+ if (sis->flags & SWP_FS_OPS)
+ error = sio_pool_init();
+ if (error)
+ destroy_swap_extents(sis, swap_file);
+ return error;
}
static void _enable_swap_info(struct swap_info_struct *si)
--
2.53.0
* [PATCH 04/12] swap: restrict to regular files or block devices
From: Christoph Hellwig @ 2026-05-12 5:35 UTC
To: Andrew Morton, Chris Li, Kairui Song
Cc: Christian Brauner, Darrick J . Wong , Jens Axboe, David Sterba,
Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
Paulo Alcantara, Carlos Maiolino, Damien Le Moal, Naohiro Aota,
linux-xfs, linux-fsdevel, linux-doc, linux-mm, linux-block,
linux-btrfs, linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-1-hch@lst.de>
Various swap code assumes it runs either on a block device or on a
regular file. Make this restriction explicit using checks right
after opening the file.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/swapfile.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index a183c9c95695..651c1b59ff9f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3515,6 +3515,10 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
error = -ENOENT;
goto bad_swap_unlock_inode;
}
+ if (!S_ISBLK(inode->i_mode) && !S_ISREG(inode->i_mode)) {
+ error = -EINVAL;
+ goto bad_swap_unlock_inode;
+ }
if (IS_SWAPFILE(inode)) {
error = -EBUSY;
goto bad_swap_unlock_inode;
--
2.53.0
* Re: [PATCH v4 0/9] fstests: add test coverage for cloned filesystem ids
From: Christoph Hellwig @ 2026-05-12 5:37 UTC
To: Anand Jain
Cc: fstests, linux-btrfs, linux-ext4, linux-xfs, linux-f2fs, amir73il,
zlang, hch
In-Reply-To: <cover.1777357320.git.asj@kernel.org>
This is missing a real cover letter saying what this series is trying
to achieve.
On Tue, Apr 28, 2026 at 02:42:50PM +0800, Anand Jain wrote:
> v4:
> In _loop_image_create_clone() (Patch 1/9):
> . Added _require_fs_space $TEST_DIR $((size * 1024)).
> . Switched to the _create_file_sized() helper.
> . Used the loop device in mkfs instead of the image file directly.
> . Added a sync on the loop device before copying to ensure consistency.
> For test cases (Patches 3/9 to 9/9):
> . Added _require_block_device $TEST_DEV.
> For test case (Patch 4/9):
> . Removed the ext4 patch reference in _fixed_by_kernel_commit since
> that part of the plan was dropped.
>
> v3:
> https://lore.kernel.org/fstests/cover.1777281778.git.asj@kernel.org
>
> Anand Jain (9):
> fstests: add _loop_image_create_clone() helper
> fstests: add _clone_mount_option() helper
> fstests: add test for inotify isolation on cloned devices
> fstests: verify fanotify isolation on cloned filesystems
> fstests: verify f_fsid for cloned filesystems
> fstests: verify libblkid resolution of duplicate UUIDs
> fstests: verify IMA isolation on cloned filesystems
> fstests: verify exportfs file handles on cloned filesystems
> fstests: btrfs: test UUID consistency for clones with metadata_uuid
>
> common/config | 2 +
> common/rc | 61 +++++++++++++++++++++++
> tests/btrfs/348 | 92 ++++++++++++++++++++++++++++++++++
> tests/btrfs/348.out | 19 +++++++
> tests/generic/800 | 89 +++++++++++++++++++++++++++++++++
> tests/generic/800.out | 7 +++
> tests/generic/801 | 113 ++++++++++++++++++++++++++++++++++++++++++
> tests/generic/801.out | 7 +++
> tests/generic/802 | 62 +++++++++++++++++++++++
> tests/generic/802.out | 7 +++
> tests/generic/803 | 76 ++++++++++++++++++++++++++++
> tests/generic/803.out | 19 +++++++
> tests/generic/804 | 103 ++++++++++++++++++++++++++++++++++++++
> tests/generic/804.out | 10 ++++
> tests/generic/805 | 73 +++++++++++++++++++++++++++
> tests/generic/805.out | 2 +
> 16 files changed, 742 insertions(+)
> create mode 100644 tests/btrfs/348
> create mode 100644 tests/btrfs/348.out
> create mode 100644 tests/generic/800
> create mode 100644 tests/generic/800.out
> create mode 100644 tests/generic/801
> create mode 100644 tests/generic/801.out
> create mode 100644 tests/generic/802
> create mode 100644 tests/generic/802.out
> create mode 100644 tests/generic/803
> create mode 100644 tests/generic/803.out
> create mode 100644 tests/generic/804
> create mode 100644 tests/generic/804.out
> create mode 100644 tests/generic/805
> create mode 100644 tests/generic/805.out
>
> --
> 2.43.0
>
---end quoted text---
* [PATCH 03/12] swap,fs: move swapfile operations to struct file_operations
From: Christoph Hellwig @ 2026-05-12 5:35 UTC
To: Andrew Morton, Chris Li, Kairui Song
Cc: Christian Brauner, Darrick J . Wong , Jens Axboe, David Sterba,
Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
Paulo Alcantara, Carlos Maiolino, Damien Le Moal, Naohiro Aota,
linux-xfs, linux-fsdevel, linux-doc, linux-mm, linux-block,
linux-btrfs, linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-1-hch@lst.de>
The swap operations have nothing to do with the address_space, which is
used for pagecache operations. Move them to struct file_operations
instead. This will allow moving the block device special cases into
block/fops.c subsequently.
Pass struct file first to ->swap_activate, as file operations typically
get the file or iocb as their first argument, and use swap_activate instead
of swapfile_activate in all names to be consistent.
Note that while the trivial iomap wrappers are moved to a new file when
applicable to keep them local to the file operation instances, complex
implementations are kept in their existing place. It might be worth
moving them in follow-on patches if the maintainers so desire.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
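Not part of the patch: a minimal userspace sketch of the calling convention
described above, with the file as the first argument, a trivial per-filesystem
wrapper around a shared helper, and the method looked up through the file's
operations table. Every demo_* name and the extents_to_report field are
hypothetical, and returning -EOPNOTSUPP for a missing method is a
simplification of this sketch, not the kernel's behaviour.

```c
#include <errno.h>
#include <stddef.h>

struct demo_file;

struct demo_swap_info {
	int nr_extents;
};

/* Hypothetical stand-in for the mapping ops a file system hands to the helper. */
struct demo_iomap_ops {
	int extents_to_report;
};

/* File-level operations table: the file comes first, then the swap info. */
struct demo_file_operations {
	int (*swap_activate)(struct demo_file *file, struct demo_swap_info *sis);
};

struct demo_file {
	const struct demo_file_operations *f_op;
};

/* Shared helper using the same (file, sis, ops) argument order. */
int demo_iomap_swap_activate(struct demo_file *file,
		struct demo_swap_info *sis, const struct demo_iomap_ops *ops)
{
	(void)file;
	sis->nr_extents = ops->extents_to_report;
	return 0;
}

static const struct demo_iomap_ops demo_fs_report_ops = {
	.extents_to_report = 3,
};

/* Trivial wrapper, kept next to the operations table that uses it. */
int demo_fs_swap_activate(struct demo_file *file, struct demo_swap_info *sis)
{
	return demo_iomap_swap_activate(file, sis, &demo_fs_report_ops);
}

const struct demo_file_operations demo_fs_file_operations = {
	.swap_activate	= demo_fs_swap_activate,
};

/* Caller side: dispatch through file->f_op. */
int demo_activate(struct demo_file *file, struct demo_swap_info *sis)
{
	if (!file->f_op || !file->f_op->swap_activate)
		return -EOPNOTSUPP;
	return file->f_op->swap_activate(file, sis);
}
```

Keeping the wrapper beside the table it populates is the layout the ext4 and
ntfs hunks below follow for their one-line iomap wrappers.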
.../filesystems/iomap/operations.rst | 3 +-
Documentation/filesystems/locking.rst | 35 +++++++-------
Documentation/filesystems/vfs.rst | 40 ++++++++--------
fs/btrfs/btrfs_inode.h | 3 ++
fs/btrfs/file.c | 4 ++
fs/btrfs/inode.c | 15 +-----
fs/ext4/file.c | 6 +++
fs/ext4/inode.c | 10 ----
fs/f2fs/data.c | 15 +-----
fs/f2fs/f2fs.h | 2 +
fs/f2fs/file.c | 4 ++
fs/iomap/swapfile.c | 12 ++---
fs/nfs/direct.c | 1 +
fs/nfs/file.c | 12 +++--
fs/nfs/nfs4file.c | 3 ++
fs/ntfs/aops.c | 7 ---
fs/ntfs/file.c | 6 +++
fs/smb/client/cifsfs.c | 18 ++++++++
fs/smb/client/cifsfs.h | 3 ++
fs/smb/client/file.c | 12 ++---
fs/xfs/xfs_aops.c | 46 -------------------
fs/xfs/xfs_file.c | 45 ++++++++++++++++++
fs/zonefs/file.c | 29 ++++++------
include/linux/fs.h | 10 ++--
include/linux/iomap.h | 6 +--
include/linux/nfs_fs.h | 3 ++
include/linux/swap.h | 2 +-
mm/page_io.c | 9 ++--
mm/swapfile.c | 12 ++---
29 files changed, 187 insertions(+), 186 deletions(-)
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index da982ca7e413..2a78037665b7 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -55,7 +55,6 @@ The following address space operations can be wrapped easily:
* ``readahead``
* ``writepages``
* ``bmap``
- * ``swap_activate``
``struct iomap_write_ops``
--------------------------
@@ -747,7 +746,7 @@ function.
Swap File Activation
====================
-The ``iomap_swapfile_activate`` function finds all the base-page aligned
+The ``iomap_swap_activate`` function finds all the base-page aligned
regions in a file and sets them up as swap space.
The file will be ``fsync()``'d before activation.
``IOMAP_REPORT`` will be passed as the ``flags`` argument to
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index f3658204d070..e79d72a12273 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -264,9 +264,6 @@ prototypes::
int (*launder_folio)(struct folio *);
bool (*is_partially_uptodate)(struct folio *, size_t from, size_t count);
int (*error_remove_folio)(struct address_space *, struct folio *);
- int (*swap_activate)(struct swap_info_struct *sis, struct file *f)
- int (*swap_deactivate)(struct file *);
- int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);
locking rules:
All except dirty_folio and free_folio may block
@@ -289,9 +286,6 @@ migrate_folio: yes (both)
launder_folio: yes
is_partially_uptodate: yes
error_remove_folio: yes
-swap_activate: no
-swap_deactivate: no
-swap_rw: yes, unlocks
====================== ======================== ========= ===============
->write_begin(), ->write_end() and ->read_folio() may be called from
@@ -350,19 +344,6 @@ cleaned, or an error value if not. Note that in order to prevent the folio
getting mapped back in and redirtied, it needs to be kept locked
across the entire operation.
-->swap_activate() will be called to prepare the given file for swap. It
-should perform any validation and preparation necessary to ensure that
-writes can be performed with minimal memory allocation. It should call
-add_swap_extent(), or the helper iomap_swapfile_activate(), and return
-the number of extents added. If IO should be submitted through
-->swap_rw(), it should set SWP_FS_OPS, otherwise IO will be submitted
-directly to the block device ``sis->bdev``.
-
-->swap_deactivate() will be called in the sys_swapoff()
-path after ->swap_activate() returned success.
-
-->swap_rw will be called for swap IO if SWP_FS_OPS was set by ->swap_activate().
-
file_lock_operations
====================
@@ -503,6 +484,9 @@ prototypes::
struct file *file_out, loff_t pos_out,
loff_t len, unsigned int remap_flags);
int (*fadvise)(struct file *, loff_t, loff_t, int);
+ int (*swap_activate)(struct file *file, struct swap_info_struct *sis);
+ int (*swap_deactivate)(struct file *);
+ int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);
locking rules:
All may block.
@@ -555,6 +539,19 @@ used. To block changes to file contents via a memory mapping during the
operation, the filesystem must take mapping->invalidate_lock to coordinate
with ->page_mkwrite.
+->swap_activate() is called to prepare the given file for swap. It should
+perform any validation and preparation necessary to ensure that writes can be
+performed with minimal memory allocation. It should call add_swap_extent(),
+or the helper iomap_swap_activate(), and return the number of extents added.
+If IO should be submitted through ->swap_rw(), the file system must set
+SWP_FS_OPS from ->swap_activate(), otherwise IO will be submitted directly to
+the block device ``sis->bdev``.
+
+->swap_deactivate() is called from the swapoff path to disable a swapfile
+successfully activated using ->swap_activate().
+
+->swap_rw will be called for swap IO if SWP_FS_OPS was set by ->swap_activate().
+
dquot_operations
================
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 4092b2149a5d..1624c1ee82d6 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -774,9 +774,6 @@ cache in your filesystem. The following members are defined:
size_t count);
void (*is_dirty_writeback)(struct folio *, bool *, bool *);
int (*error_remove_folio)(struct mapping *mapping, struct folio *);
- int (*swap_activate)(struct swap_info_struct *sis, struct file *f);
- int (*swap_deactivate)(struct file *);
- int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);
};
``read_folio``
@@ -970,23 +967,6 @@ cache in your filesystem. The following members are defined:
Setting this implies you deal with pages going away under you,
unless you have them locked or reference counts increased.
-``swap_activate``
-
- Called to prepare the given file for swap. It should perform
- any validation and preparation necessary to ensure that writes
- can be performed with minimal memory allocation. It should call
- add_swap_extent(), or the helper iomap_swapfile_activate(), and
- return the number of extents added. If IO should be submitted
- through ->swap_rw(), it should set SWP_FS_OPS, otherwise IO will
- be submitted directly to the block device ``sis->bdev``.
-
-``swap_deactivate``
- Called during swapoff on files where swap_activate was
- successful.
-
-``swap_rw``
- Called to read or write swap pages when SWP_FS_OPS is set.
-
The File Object
===============
@@ -1046,6 +1026,9 @@ This describes how the VFS can manipulate an open file. As of kernel
int (*uring_cmd_iopoll)(struct io_uring_cmd *, struct io_comp_batch *,
unsigned int poll_flags);
int (*mmap_prepare)(struct vm_area_desc *);
+ int (*swap_activate)(struct file *file, struct swap_info_struct *sis);
+ int (*swap_deactivate)(struct file *);
+ int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);
};
Again, all methods are called without any locks being held, unless
@@ -1175,6 +1158,23 @@ otherwise noted.
this can be specified by the vm_area_desc->action field and related
parameters.
+``swap_activate``
+
+ Called to prepare the given file for swap. It should perform
+ any validation and preparation necessary to ensure that writes
+ can be performed with minimal memory allocation. It should call
+ add_swap_extent(), or the helper iomap_swap_activate(), and
+ return the number of extents added. If IO should be submitted
+ through ->swap_rw(), it should set SWP_FS_OPS, otherwise IO will
+ be submitted directly to the block device ``sis->bdev``.
+
+``swap_deactivate``
+ Called during swapoff on files where swap_activate was
+ successful.
+
+``swap_rw``
+ Called to read or write swap pages when SWP_FS_OPS is set.
+
Note that the file operations are implemented by the specific
filesystem in which the inode resides. When opening a device node
(character or block special) most filesystems will call special
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 55c272fe5d92..f527126882d6 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -670,4 +670,7 @@ struct extent_map *btrfs_create_io_em(struct btrfs_inode *inode, u64 start,
const struct btrfs_file_extent *file_extent,
int type);
+int btrfs_swap_activate(struct file *file, struct swap_info_struct *sis);
+void btrfs_swap_deactivate(struct file *file);
+
#endif
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index cf1cb5c4db75..165b8da1d7db 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -3867,6 +3867,10 @@ const struct file_operations btrfs_file_operations = {
.uring_cmd = btrfs_uring_cmd,
.fop_flags = FOP_BUFFER_RASYNC | FOP_BUFFER_WASYNC,
.setlease = generic_setlease,
+#ifdef CONFIG_SWAP
+ .swap_activate = btrfs_swap_activate,
+ .swap_deactivate = btrfs_swap_deactivate,
+#endif
};
int btrfs_fdatawrite_range(struct btrfs_inode *inode, loff_t start, loff_t end)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 198d87e6f19a..ee0a7947706a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -10217,7 +10217,7 @@ static int btrfs_add_swap_extent(struct swap_info_struct *sis,
return add_swap_extent(sis, next_ppage - first_ppage, first_ppage);
}
-static void btrfs_swap_deactivate(struct file *file)
+void btrfs_swap_deactivate(struct file *file)
{
struct inode *inode = file_inode(file);
@@ -10225,7 +10225,7 @@ static void btrfs_swap_deactivate(struct file *file)
atomic_dec(&BTRFS_I(inode)->root->nr_swapfiles);
}
-static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file)
+int btrfs_swap_activate(struct file *file, struct swap_info_struct *sis)
{
struct inode *inode = file_inode(file);
struct btrfs_root *root = BTRFS_I(inode)->root;
@@ -10537,15 +10537,6 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file)
sis->bdev = device->bdev;
return ret;
}
-#else
-static void btrfs_swap_deactivate(struct file *file)
-{
-}
-
-static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file)
-{
- return -EOPNOTSUPP;
-}
#endif
/*
@@ -10692,8 +10683,6 @@ static const struct address_space_operations btrfs_aops = {
.migrate_folio = btrfs_migrate_folio,
.dirty_folio = filemap_dirty_folio,
.error_remove_folio = generic_error_remove_folio,
- .swap_activate = btrfs_swap_activate,
- .swap_deactivate = btrfs_swap_deactivate,
};
static const struct inode_operations btrfs_file_inode_operations = {
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index eb1a323962b1..fad3ed05c02a 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -971,6 +971,11 @@ loff_t ext4_llseek(struct file *file, loff_t offset, int whence)
return vfs_setpos(file, offset, maxbytes);
}
+static int ext4_swap_activate(struct file *file, struct swap_info_struct *sis)
+{
+ return iomap_swap_activate(file, sis, &ext4_iomap_report_ops);
+}
+
const struct file_operations ext4_file_operations = {
.llseek = ext4_llseek,
.read_iter = ext4_file_read_iter,
@@ -992,6 +997,7 @@ const struct file_operations ext4_file_operations = {
FOP_DIO_PARALLEL_WRITE |
FOP_DONTCACHE,
.setlease = generic_setlease,
+ .swap_activate = ext4_swap_activate,
};
const struct inode_operations ext4_file_inode_operations = {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ca7bac4a8b4a..efbb2ddad363 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3939,12 +3939,6 @@ static bool ext4_dirty_folio(struct address_space *mapping, struct folio *folio)
return block_dirty_folio(mapping, folio);
}
-static int ext4_iomap_swap_activate(struct swap_info_struct *sis,
- struct file *file)
-{
- return iomap_swapfile_activate(sis, file, &ext4_iomap_report_ops);
-}
-
static const struct address_space_operations ext4_aops = {
.read_folio = ext4_read_folio,
.readahead = ext4_readahead,
@@ -3958,7 +3952,6 @@ static const struct address_space_operations ext4_aops = {
.migrate_folio = buffer_migrate_folio,
.is_partially_uptodate = block_is_partially_uptodate,
.error_remove_folio = generic_error_remove_folio,
- .swap_activate = ext4_iomap_swap_activate,
};
static const struct address_space_operations ext4_journalled_aops = {
@@ -3974,7 +3967,6 @@ static const struct address_space_operations ext4_journalled_aops = {
.migrate_folio = buffer_migrate_folio_norefs,
.is_partially_uptodate = block_is_partially_uptodate,
.error_remove_folio = generic_error_remove_folio,
- .swap_activate = ext4_iomap_swap_activate,
};
static const struct address_space_operations ext4_da_aops = {
@@ -3990,14 +3982,12 @@ static const struct address_space_operations ext4_da_aops = {
.migrate_folio = buffer_migrate_folio,
.is_partially_uptodate = block_is_partially_uptodate,
.error_remove_folio = generic_error_remove_folio,
- .swap_activate = ext4_iomap_swap_activate,
};
static const struct address_space_operations ext4_dax_aops = {
.writepages = ext4_dax_writepages,
.dirty_folio = noop_dirty_folio,
.bmap = ext4_bmap,
- .swap_activate = ext4_iomap_swap_activate,
};
void ext4_set_aops(struct inode *inode)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 86fabacc67e6..8bcf630df557 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -4338,7 +4338,7 @@ static int check_swap_activate(struct swap_info_struct *sis,
return ret;
}
-static int f2fs_swap_activate(struct swap_info_struct *sis, struct file *file)
+int f2fs_swap_activate(struct file *file, struct swap_info_struct *sis)
{
struct inode *inode = file_inode(file);
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
@@ -4378,22 +4378,13 @@ static int f2fs_swap_activate(struct swap_info_struct *sis, struct file *file)
return 0;
}
-static void f2fs_swap_deactivate(struct file *file)
+void f2fs_swap_deactivate(struct file *file)
{
struct inode *inode = file_inode(file);
stat_dec_swapfile_inode(inode);
clear_inode_flag(inode, FI_PIN_FILE);
}
-#else
-static int f2fs_swap_activate(struct swap_info_struct *sis, struct file *file)
-{
- return -EOPNOTSUPP;
-}
-
-static void f2fs_swap_deactivate(struct file *file)
-{
-}
#endif
const struct address_space_operations f2fs_dblock_aops = {
@@ -4407,8 +4398,6 @@ const struct address_space_operations f2fs_dblock_aops = {
.invalidate_folio = f2fs_invalidate_folio,
.release_folio = f2fs_release_folio,
.bmap = f2fs_bmap,
- .swap_activate = f2fs_swap_activate,
- .swap_deactivate = f2fs_swap_deactivate,
};
void f2fs_clear_page_cache_dirty_tag(struct folio *folio)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 91f506e7c9cf..93e9709f26fa 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4195,6 +4195,8 @@ int f2fs_init_post_read_processing(void);
void f2fs_destroy_post_read_processing(void);
int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi);
void f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi);
+int f2fs_swap_activate(struct file *file, struct swap_info_struct *sis);
+void f2fs_swap_deactivate(struct file *file);
extern const struct iomap_ops f2fs_iomap_ops;
/*
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index fb12c5c9affd..aa91d5fff1cf 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -5488,4 +5488,8 @@ const struct file_operations f2fs_file_operations = {
.fadvise = f2fs_file_fadvise,
.fop_flags = FOP_BUFFER_RASYNC,
.setlease = generic_setlease,
+#ifdef CONFIG_SWAP
+ .swap_activate = f2fs_swap_activate,
+ .swap_deactivate = f2fs_swap_deactivate,
+#endif
};
diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
index f778b2c6c922..cf354fdfb7c3 100644
--- a/fs/iomap/swapfile.c
+++ b/fs/iomap/swapfile.c
@@ -100,10 +100,10 @@ static int iomap_swapfile_iter(struct iomap_iter *iter,
* Iterate a swap file's iomaps to construct physical extents that can be
* passed to the swapfile subsystem.
*/
-int iomap_swapfile_activate(struct swap_info_struct *sis,
- struct file *swap_file, const struct iomap_ops *ops)
+int iomap_swap_activate(struct file *file, struct swap_info_struct *sis,
+ const struct iomap_ops *ops)
{
- struct inode *inode = swap_file->f_mapping->host;
+ struct inode *inode = file->f_mapping->host;
struct iomap_iter iter = {
.inode = inode,
.pos = 0,
@@ -112,7 +112,7 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
};
struct iomap_swapfile_info isi = {
.sis = sis,
- .file = swap_file,
+ .file = file,
};
int ret;
@@ -120,7 +120,7 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
* Persist all file mapping metadata so that we won't have any
* IOMAP_F_DIRTY iomaps.
*/
- ret = vfs_fsync(swap_file, 1);
+ ret = vfs_fsync(file, 1);
if (ret)
return ret;
@@ -137,4 +137,4 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
return 0;
}
-EXPORT_SYMBOL_GPL(iomap_swapfile_activate);
+EXPORT_SYMBOL_GPL(iomap_swap_activate);
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 48d89716193a..e92a4c8f8f77 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -164,6 +164,7 @@ int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
return ret;
return 0;
}
+EXPORT_SYMBOL_GPL(nfs_swap_rw);
static void nfs_direct_release_pages(struct page **pages, unsigned int npages)
{
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 74b401aa2b3a..2bc55d9d71e1 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -567,7 +567,7 @@ static int nfs_launder_folio(struct folio *folio)
return ret;
}
-static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file)
+int nfs_swap_activate(struct file *file, struct swap_info_struct *sis)
{
unsigned long blocks;
long long isize;
@@ -600,8 +600,9 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file)
sis->flags |= SWP_FS_OPS;
return 0;
}
+EXPORT_SYMBOL_GPL(nfs_swap_activate);
-static void nfs_swap_deactivate(struct file *file)
+void nfs_swap_deactivate(struct file *file)
{
struct inode *inode = file_inode(file);
struct rpc_clnt *clnt = NFS_CLIENT(inode);
@@ -611,6 +612,7 @@ static void nfs_swap_deactivate(struct file *file)
if (cl->rpc_ops->disable_swap)
cl->rpc_ops->disable_swap(file_inode(file));
}
+EXPORT_SYMBOL_GPL(nfs_swap_deactivate);
const struct address_space_operations nfs_file_aops = {
.read_folio = nfs_read_folio,
@@ -625,9 +627,6 @@ const struct address_space_operations nfs_file_aops = {
.launder_folio = nfs_launder_folio,
.is_dirty_writeback = nfs_check_dirty_writeback,
.error_remove_folio = generic_error_remove_folio,
- .swap_activate = nfs_swap_activate,
- .swap_deactivate = nfs_swap_deactivate,
- .swap_rw = nfs_swap_rw,
};
/*
@@ -960,6 +959,9 @@ const struct file_operations nfs_file_operations = {
.splice_read = nfs_file_splice_read,
.splice_write = iter_file_splice_write,
.check_flags = nfs_check_flags,
+ .swap_activate = nfs_swap_activate,
+ .swap_deactivate = nfs_swap_deactivate,
+ .swap_rw = nfs_swap_rw,
.fop_flags = FOP_DONTCACHE,
};
EXPORT_SYMBOL_GPL(nfs_file_operations);
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index be40e126c539..eb1a8dbab55a 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -455,5 +455,8 @@ const struct file_operations nfs4_file_operations = {
#else
.llseek = nfs_file_llseek,
#endif
+ .swap_activate = nfs_swap_activate,
+ .swap_deactivate = nfs_swap_deactivate,
+ .swap_rw = nfs_swap_rw,
.fop_flags = FOP_DONTCACHE,
};
diff --git a/fs/ntfs/aops.c b/fs/ntfs/aops.c
index 4b7d019bc6ed..a94f5f675790 100644
--- a/fs/ntfs/aops.c
+++ b/fs/ntfs/aops.c
@@ -270,12 +270,6 @@ static int ntfs_writepages(struct address_space *mapping,
return iomap_writepages(&wpc);
}
-static int ntfs_swap_activate(struct swap_info_struct *sis,
- struct file *swap_file)
-{
- return iomap_swapfile_activate(sis, swap_file, &ntfs_read_iomap_ops);
-}
-
const struct address_space_operations ntfs_aops = {
.read_folio = ntfs_read_folio,
.readahead = ntfs_readahead,
@@ -287,7 +281,6 @@ const struct address_space_operations ntfs_aops = {
.error_remove_folio = generic_error_remove_folio,
.release_folio = iomap_release_folio,
.invalidate_folio = iomap_invalidate_folio,
- .swap_activate = ntfs_swap_activate,
};
const struct address_space_operations ntfs_mft_aops = {
diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index e8bea22b81a7..0dcf8479362a 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -1114,6 +1114,11 @@ static long ntfs_fallocate(struct file *file, int mode, loff_t offset, loff_t le
return err;
}
+static int ntfs_swap_activate(struct file *file, struct swap_info_struct *sis)
+{
+ return iomap_swap_activate(file, sis, &ntfs_read_iomap_ops);
+}
+
const struct file_operations ntfs_file_ops = {
.llseek = ntfs_file_llseek,
.read_iter = ntfs_file_read_iter,
@@ -1130,6 +1135,7 @@ const struct file_operations ntfs_file_ops = {
#endif
.fallocate = ntfs_fallocate,
.setlease = generic_setlease,
+ .swap_activate = ntfs_swap_activate,
};
const struct inode_operations ntfs_file_inode_ops = {
diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index 9f76b0347fa9..f0d8a3a46074 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -1577,6 +1577,9 @@ const struct file_operations cifs_file_ops = {
.remap_file_range = cifs_remap_file_range,
.setlease = cifs_setlease,
.fallocate = cifs_fallocate,
+ .swap_activate = cifs_swap_activate,
+ .swap_deactivate = cifs_swap_deactivate,
+ .swap_rw = cifs_swap_rw,
};
const struct file_operations cifs_file_strict_ops = {
@@ -1597,6 +1600,9 @@ const struct file_operations cifs_file_strict_ops = {
.remap_file_range = cifs_remap_file_range,
.setlease = cifs_setlease,
.fallocate = cifs_fallocate,
+ .swap_activate = cifs_swap_activate,
+ .swap_deactivate = cifs_swap_deactivate,
+ .swap_rw = cifs_swap_rw,
};
const struct file_operations cifs_file_direct_ops = {
@@ -1617,6 +1623,9 @@ const struct file_operations cifs_file_direct_ops = {
.llseek = cifs_llseek,
.setlease = cifs_setlease,
.fallocate = cifs_fallocate,
+ .swap_activate = cifs_swap_activate,
+ .swap_deactivate = cifs_swap_deactivate,
+ .swap_rw = cifs_swap_rw,
};
const struct file_operations cifs_file_nobrl_ops = {
@@ -1635,6 +1644,9 @@ const struct file_operations cifs_file_nobrl_ops = {
.remap_file_range = cifs_remap_file_range,
.setlease = cifs_setlease,
.fallocate = cifs_fallocate,
+ .swap_activate = cifs_swap_activate,
+ .swap_deactivate = cifs_swap_deactivate,
+ .swap_rw = cifs_swap_rw,
};
const struct file_operations cifs_file_strict_nobrl_ops = {
@@ -1653,6 +1665,9 @@ const struct file_operations cifs_file_strict_nobrl_ops = {
.remap_file_range = cifs_remap_file_range,
.setlease = cifs_setlease,
.fallocate = cifs_fallocate,
+ .swap_activate = cifs_swap_activate,
+ .swap_deactivate = cifs_swap_deactivate,
+ .swap_rw = cifs_swap_rw,
};
const struct file_operations cifs_file_direct_nobrl_ops = {
@@ -1671,6 +1686,9 @@ const struct file_operations cifs_file_direct_nobrl_ops = {
.llseek = cifs_llseek,
.setlease = cifs_setlease,
.fallocate = cifs_fallocate,
+ .swap_activate = cifs_swap_activate,
+ .swap_deactivate = cifs_swap_deactivate,
+ .swap_rw = cifs_swap_rw,
};
const struct file_operations cifs_dir_ops = {
diff --git a/fs/smb/client/cifsfs.h b/fs/smb/client/cifsfs.h
index c455b15f2778..1e5b9fce84f9 100644
--- a/fs/smb/client/cifsfs.h
+++ b/fs/smb/client/cifsfs.h
@@ -115,6 +115,9 @@ int cifs_file_mmap_prepare(struct vm_area_desc *desc);
int cifs_file_strict_mmap_prepare(struct vm_area_desc *desc);
extern const struct file_operations cifs_dir_ops;
int cifs_readdir(struct file *file, struct dir_context *ctx);
+int cifs_swap_activate(struct file *swap_file, struct swap_info_struct *sis);
+void cifs_swap_deactivate(struct file *file);
+int cifs_swap_rw(struct kiocb *iocb, struct iov_iter *iter);
/* Functions related to dir entries */
extern const struct dentry_operations cifs_dentry_ops;
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 11d4655ef490..84459f87907e 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -3286,8 +3286,7 @@ void cifs_oplock_break(struct work_struct *work)
cifs_done_oplock_break(cinode);
}
-static int cifs_swap_activate(struct swap_info_struct *sis,
- struct file *swap_file)
+int cifs_swap_activate(struct file *swap_file, struct swap_info_struct *sis)
{
struct cifsFileInfo *cfile = swap_file->private_data;
struct inode *inode = swap_file->f_mapping->host;
@@ -3296,7 +3295,7 @@ static int cifs_swap_activate(struct swap_info_struct *sis,
cifs_dbg(FYI, "swap activate\n");
- if (!swap_file->f_mapping->a_ops->swap_rw)
+ if (!swap_file->f_op->swap_rw)
/* Cannot support swap */
return -EINVAL;
@@ -3331,7 +3330,7 @@ static int cifs_swap_activate(struct swap_info_struct *sis,
return add_swap_extent(sis, sis->max, 0);
}
-static void cifs_swap_deactivate(struct file *file)
+void cifs_swap_deactivate(struct file *file)
{
struct cifsFileInfo *cfile = file->private_data;
@@ -3352,7 +3351,7 @@ static void cifs_swap_deactivate(struct file *file)
*
* Perform IO to the swap-file. This is much like direct IO.
*/
-static int cifs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
+int cifs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
{
ssize_t ret;
@@ -3378,9 +3377,6 @@ const struct address_space_operations cifs_addr_ops = {
* TODO: investigate and if useful we could add an is_dirty_writeback
* helper if needed
*/
- .swap_activate = cifs_swap_activate,
- .swap_deactivate = cifs_swap_deactivate,
- .swap_rw = cifs_swap_rw,
};
/*
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1e8662e0e7cd..7488fc6a7b78 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -796,50 +796,6 @@ xfs_vm_readahead(
iomap_readahead(&xfs_read_iomap_ops, &ctx, NULL);
}
-static int
-xfs_vm_swap_activate(
- struct swap_info_struct *sis,
- struct file *swap_file)
-{
- struct xfs_inode *ip = XFS_I(file_inode(swap_file));
-
- if (xfs_is_zoned_inode(ip))
- return -EINVAL;
-
- /*
- * Swap file activation can race against concurrent shared extent
- * removal in files that have been cloned. If this happens,
- * iomap_swapfile_iter() can fail because it encountered a shared
- * extent even though an operation is in progress to remove those
- * shared extents.
- *
- * This race becomes problematic when we defer extent removal
- * operations beyond the end of a syscall (i.e. use async background
- * processing algorithms). Users think the extents are no longer
- * shared, but iomap_swapfile_iter() still sees them as shared
- * because the refcountbt entries for the extents being removed have
- * not yet been updated. Hence the swapon call fails unexpectedly.
- *
- * The race condition is currently most obvious from the unlink()
- * operation as extent removal is deferred until after the last
- * reference to the inode goes away. We then process the extent
- * removal asynchronously, hence triggers the "syscall completed but
- * work not done" condition mentioned above. To close this race
- * window, we need to flush any pending inodegc operations to ensure
- * they have updated the refcountbt records before we try to map the
- * swapfile.
- */
- xfs_inodegc_flush(ip->i_mount);
-
- /*
- * Direct the swap code to the correct block device when this file
- * sits on the RT device.
- */
- sis->bdev = xfs_inode_buftarg(ip)->bt_bdev;
-
- return iomap_swapfile_activate(sis, swap_file, &xfs_read_iomap_ops);
-}
-
const struct address_space_operations xfs_address_space_operations = {
.read_folio = xfs_vm_read_folio,
.readahead = xfs_vm_readahead,
@@ -851,11 +807,9 @@ const struct address_space_operations xfs_address_space_operations = {
.migrate_folio = filemap_migrate_folio,
.is_partially_uptodate = iomap_is_partially_uptodate,
.error_remove_folio = generic_error_remove_folio,
- .swap_activate = xfs_vm_swap_activate,
};
const struct address_space_operations xfs_dax_aops = {
.writepages = xfs_dax_writepages,
.dirty_folio = noop_dirty_folio,
- .swap_activate = xfs_vm_swap_activate,
};
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 845a97c9b063..41f7e19bd31f 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -2081,6 +2081,50 @@ xfs_file_mmap_prepare(
return 0;
}
+static int
+xfs_file_swap_activate(
+ struct file *file,
+ struct swap_info_struct *sis)
+{
+ struct xfs_inode *ip = XFS_I(file_inode(file));
+
+ if (xfs_is_zoned_inode(ip))
+ return -EINVAL;
+
+ /*
+ * Swap file activation can race against concurrent shared extent
+ * removal in files that have been cloned. If this happens,
+ * iomap_swapfile_iter() can fail because it encountered a shared
+ * extent even though an operation is in progress to remove those
+ * shared extents.
+ *
+ * This race becomes problematic when we defer extent removal
+ * operations beyond the end of a syscall (i.e. use async background
+ * processing algorithms). Users think the extents are no longer
+ * shared, but iomap_swapfile_iter() still sees them as shared
+ * because the refcountbt entries for the extents being removed have
+ * not yet been updated. Hence the swapon call fails unexpectedly.
+ *
+ * The race condition is currently most obvious from the unlink()
+ * operation as extent removal is deferred until after the last
+ * reference to the inode goes away. We then process the extent
+ * removal asynchronously, hence triggers the "syscall completed but
+ * work not done" condition mentioned above. To close this race
+ * window, we need to flush any pending inodegc operations to ensure
+ * they have updated the refcountbt records before we try to map the
+ * swapfile.
+ */
+ xfs_inodegc_flush(ip->i_mount);
+
+ /*
+ * Direct the swap code to the correct block device when this file
+ * sits on the RT device.
+ */
+ sis->bdev = xfs_inode_buftarg(ip)->bt_bdev;
+
+ return iomap_swap_activate(file, sis, &xfs_read_iomap_ops);
+}
+
const struct file_operations xfs_file_operations = {
.llseek = xfs_file_llseek,
.read_iter = xfs_file_read_iter,
@@ -2104,6 +2148,7 @@ const struct file_operations xfs_file_operations = {
FOP_BUFFER_WASYNC | FOP_DIO_PARALLEL_WRITE |
FOP_DONTCACHE,
.setlease = generic_setlease,
+ .swap_activate = xfs_file_swap_activate,
};
const struct file_operations xfs_dir_file_operations = {
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 214e4bf8e30a..2c817917a13d 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -167,20 +167,6 @@ static int zonefs_writepages(struct address_space *mapping,
return iomap_writepages(&wpc);
}
-static int zonefs_swap_activate(struct swap_info_struct *sis,
- struct file *swap_file)
-{
- struct inode *inode = file_inode(swap_file);
-
- if (zonefs_inode_is_seq(inode)) {
- zonefs_err(inode->i_sb,
- "swap file: not a conventional zone file\n");
- return -EINVAL;
- }
-
- return iomap_swapfile_activate(sis, swap_file, &zonefs_read_iomap_ops);
-}
-
const struct address_space_operations zonefs_file_aops = {
.read_folio = zonefs_read_folio,
.readahead = zonefs_readahead,
@@ -191,7 +177,6 @@ const struct address_space_operations zonefs_file_aops = {
.migrate_folio = filemap_migrate_folio,
.is_partially_uptodate = iomap_is_partially_uptodate,
.error_remove_folio = generic_error_remove_folio,
- .swap_activate = zonefs_swap_activate,
};
int zonefs_file_truncate(struct inode *inode, loff_t isize)
@@ -858,6 +843,19 @@ static int zonefs_file_release(struct inode *inode, struct file *file)
return 0;
}
+static int zonefs_swap_activate(struct file *file, struct swap_info_struct *sis)
+{
+ struct inode *inode = file_inode(file);
+
+ if (zonefs_inode_is_seq(inode)) {
+ zonefs_err(inode->i_sb,
+ "swap file: not a conventional zone file\n");
+ return -EINVAL;
+ }
+
+ return iomap_swap_activate(file, sis, &zonefs_read_iomap_ops);
+}
+
const struct file_operations zonefs_file_operations = {
.open = zonefs_file_open,
.release = zonefs_file_release,
@@ -869,4 +867,5 @@ const struct file_operations zonefs_file_operations = {
.splice_read = zonefs_file_splice_read,
.splice_write = iter_file_splice_write,
.iopoll = iocb_bio_iopoll,
+ .swap_activate = zonefs_swap_activate,
};
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b8b6f7a38f4d..7564cef5405d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -433,11 +433,6 @@ struct address_space_operations {
size_t count);
void (*is_dirty_writeback) (struct folio *, bool *dirty, bool *wb);
int (*error_remove_folio)(struct address_space *, struct folio *);
-
- /* swapfile support */
- int (*swap_activate)(struct swap_info_struct *sis, struct file *file);
- void (*swap_deactivate)(struct file *file);
- int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);
};
extern const struct address_space_operations empty_aops;
@@ -1966,6 +1961,11 @@ struct file_operations {
int (*uring_cmd_iopoll)(struct io_uring_cmd *, struct io_comp_batch *,
unsigned int poll_flags);
int (*mmap_prepare)(struct vm_area_desc *);
+
+ /* swapfile support */
+ int (*swap_activate)(struct file *file, struct swap_info_struct *sis);
+ void (*swap_deactivate)(struct file *file);
+ int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);
} __randomize_layout;
/* Supports async buffered reads */
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index d82126e3d086..3fd582d375b6 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -603,10 +603,10 @@ void iomap_dio_bio_end_io(struct bio *bio);
struct file;
struct swap_info_struct;
-int iomap_swapfile_activate(struct swap_info_struct *sis,
- struct file *swap_file, const struct iomap_ops *ops);
+int iomap_swap_activate(struct file *file, struct swap_info_struct *sis,
+ const struct iomap_ops *ops);
#else
-# define iomap_swapfile_activate(sis, swapfile, ops) (-EIO)
+# define iomap_swap_activate(file, sis, ops) (-EIO)
#endif /* CONFIG_SWAP */
extern struct bio_set iomap_ioend_bioset;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 4623262da3c0..9746212a085e 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -538,6 +538,9 @@ extern __be32 root_nfs_parse_addr(char *name); /*__init*/
/*
* linux/fs/nfs/file.c
*/
+int nfs_swap_activate(struct file *file, struct swap_info_struct *sis);
+void nfs_swap_deactivate(struct file *file);
+
extern const struct file_operations nfs_file_operations;
#if IS_ENABLED(CONFIG_NFS_V4)
extern const struct file_operations nfs4_file_operations;
diff --git a/include/linux/swap.h b/include/linux/swap.h
index b8dfe2c6bc98..657779485ae4 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -405,7 +405,7 @@ extern void __meminit kswapd_stop(int nid);
int add_swap_extent(struct swap_info_struct *sis, unsigned long nr_pages,
sector_t start_block);
-int generic_swapfile_activate(struct swap_info_struct *, struct file *);
+int generic_swap_activate(struct file *swap_file, struct swap_info_struct *sis);
static inline unsigned long total_swapcache_pages(void)
{
diff --git a/mm/page_io.c b/mm/page_io.c
index f30f36ec1ed0..3e1c12649448 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -75,8 +75,7 @@ static void end_swap_bio_read(struct bio *bio)
bio_put(bio);
}
-int generic_swapfile_activate(struct swap_info_struct *sis,
- struct file *swap_file)
+int generic_swap_activate(struct file *swap_file, struct swap_info_struct *sis)
{
struct address_space *mapping = swap_file->f_mapping;
struct inode *inode = mapping->host;
@@ -451,11 +450,10 @@ void __swap_writepage(struct folio *folio, struct swap_iocb **swap_plug)
void swap_write_unplug(struct swap_iocb *sio)
{
struct iov_iter from;
- struct address_space *mapping = sio->iocb.ki_filp->f_mapping;
int ret;
iov_iter_bvec(&from, ITER_SOURCE, sio->bvec, sio->pages, sio->len);
- ret = mapping->a_ops->swap_rw(&sio->iocb, &from);
+ ret = sio->iocb.ki_filp->f_op->swap_rw(&sio->iocb, &from);
if (ret != -EIOCBQUEUED)
sio_write_complete(&sio->iocb, ret);
}
@@ -640,11 +638,10 @@ void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
void __swap_read_unplug(struct swap_iocb *sio)
{
struct iov_iter from;
- struct address_space *mapping = sio->iocb.ki_filp->f_mapping;
int ret;
iov_iter_bvec(&from, ITER_DEST, sio->bvec, sio->pages, sio->len);
- ret = mapping->a_ops->swap_rw(&sio->iocb, &from);
+ ret = sio->iocb.ki_filp->f_op->swap_rw(&sio->iocb, &from);
if (ret != -EIOCBQUEUED)
sio_read_complete(&sio->iocb, ret);
}
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 158620fd2978..a183c9c95695 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2692,11 +2692,9 @@ static void destroy_swap_extents(struct swap_info_struct *sis,
}
if (sis->flags & SWP_ACTIVATED) {
- struct address_space *mapping = swap_file->f_mapping;
-
sis->flags &= ~SWP_ACTIVATED;
- if (mapping->a_ops->swap_deactivate)
- mapping->a_ops->swap_deactivate(swap_file);
+ if (swap_file->f_op->swap_deactivate)
+ swap_file->f_op->swap_deactivate(swap_file);
}
}
@@ -2790,8 +2788,8 @@ static int setup_swap_extents(struct swap_info_struct *sis,
if (S_ISBLK(inode->i_mode))
return add_swap_extent(sis, sis->max, 0);
- if (mapping->a_ops->swap_activate) {
- ret = mapping->a_ops->swap_activate(sis, swap_file);
+ if (swap_file->f_op->swap_activate) {
+ ret = swap_file->f_op->swap_activate(swap_file, sis);
if (ret < 0)
return ret;
sis->flags |= SWP_ACTIVATED;
@@ -2803,7 +2801,7 @@ static int setup_swap_extents(struct swap_info_struct *sis,
return ret;
}
- return generic_swapfile_activate(sis, swap_file);
+ return generic_swap_activate(swap_file, sis);
}
static void _enable_swap_info(struct swap_info_struct *si)
--
2.53.0
* [PATCH 02/12] swap: move boilerplate code into the core swap code
From: Christoph Hellwig @ 2026-05-12 5:35 UTC
To: Andrew Morton, Chris Li, Kairui Song
Cc: Christian Brauner, Darrick J. Wong, Jens Axboe, David Sterba,
Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
Paulo Alcantara, Carlos Maiolino, Damien Le Moal, Naohiro Aota,
linux-xfs, linux-fsdevel, linux-doc, linux-mm, linux-block,
linux-btrfs, linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-1-hch@lst.de>
Make the core swap code calculate sis->pages, nr_extents and the span,
and re-set sis->max based on them. Don't require passing the current
offset into the swap file to add_swap_extent, as all of that can
trivially be calculated internally. Also truncate the spans based on
the available information.
All this removes a lot of boilerplate code in the callers.
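As a rough illustration only (this is not the code in mm/swapfile.c), the
bookkeeping that moves into the core can be modelled in user space as below.
The struct and the swap_model_* / model_* names are hypothetical and exist
only for this sketch. It assumes that the first page of the file holds the
swap header and is excluded from the span, and it does not model the merging
of adjacent extents that the real add_swap_extent performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; NOT the kernel's struct swap_info_struct. */
struct swap_model {
	unsigned long max;	/* page limit taken from the swap header */
	unsigned long nr_pages;	/* pages collected so far == next start page */
	unsigned int nr_extents;	/* runs recorded so far (no merging here) */
	uint64_t lowest_block;	/* lowest physical page seen, header excluded */
	uint64_t highest_block;	/* highest physical page seen */
};

static void swap_model_init(struct swap_model *m, unsigned long max)
{
	m->max = max;
	m->nr_pages = 0;
	m->nr_extents = 0;
	m->lowest_block = UINT64_MAX;
	m->highest_block = 0;
}

/*
 * Record one physically contiguous run of nr_pages pages starting at
 * start_block. The caller no longer passes the offset into the swap file:
 * it is simply the number of pages recorded so far.
 */
static int model_add_extent(struct swap_model *m, unsigned long nr_pages,
			    uint64_t start_block)
{
	uint64_t first_counted = start_block;
	uint64_t last;

	assert(m != NULL);

	/* Skip empty runs and anything beyond the header's page limit. */
	if (nr_pages == 0 || m->nr_pages >= m->max)
		return 0;
	/* Truncate a run that would cross the limit. */
	if (nr_pages > m->max - m->nr_pages)
		nr_pages = m->max - m->nr_pages;

	last = start_block + nr_pages - 1;

	/* The very first page is the swap header; keep it out of the span. */
	if (m->nr_pages == 0)
		first_counted++;
	if (first_counted <= last) {
		if (first_counted < m->lowest_block)
			m->lowest_block = first_counted;
		if (last > m->highest_block)
			m->highest_block = last;
	}

	m->nr_extents++;
	m->nr_pages += nr_pages;
	return 0;
}

/* Span covered by the recorded runs, in pages; 0 if nothing was counted. */
static uint64_t model_span(const struct swap_model *m)
{
	if (m->lowest_block > m->highest_block)
		return 0;
	return m->highest_block - m->lowest_block + 1;
}

/* Usable swap pages: everything recorded minus the header page. */
static unsigned long model_usable_pages(const struct swap_model *m)
{
	return m->nr_pages ? m->nr_pages - 1 : 0;
}
```

With this shape a filesystem callback only reports (nr_pages, start_block)
pairs and returns 0 or a negative errno, which is what lets the per-caller
lowest/highest tracking and the sis->max / sis->pages fixups go away.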
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
Documentation/filesystems/locking.rst | 2 +-
Documentation/filesystems/vfs.rst | 2 +-
fs/btrfs/inode.c | 58 ++-----------
fs/ext4/inode.c | 5 +-
fs/f2fs/data.c | 38 ++-------
fs/iomap/swapfile.c | 58 +------------
fs/nfs/file.c | 9 +-
fs/ntfs/aops.c | 5 +-
fs/smb/client/file.c | 5 +-
fs/xfs/xfs_aops.c | 6 +-
fs/zonefs/file.c | 5 +-
include/linux/fs.h | 3 +-
include/linux/iomap.h | 5 +-
include/linux/swap.h | 11 ++-
mm/page_io.c | 39 ++-------
mm/swapfile.c | 116 ++++++++++++++++----------
16 files changed, 121 insertions(+), 246 deletions(-)
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 8421ea21bd35..f3658204d070 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -264,7 +264,7 @@ prototypes::
int (*launder_folio)(struct folio *);
bool (*is_partially_uptodate)(struct folio *, size_t from, size_t count);
int (*error_remove_folio)(struct address_space *, struct folio *);
- int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span)
+ int (*swap_activate)(struct swap_info_struct *sis, struct file *f)
int (*swap_deactivate)(struct file *);
int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7c753148af88..4092b2149a5d 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -774,7 +774,7 @@ cache in your filesystem. The following members are defined:
size_t count);
void (*is_dirty_writeback)(struct folio *, bool *, bool *);
int (*error_remove_folio)(struct mapping *mapping, struct folio *);
- int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span)
+ int (*swap_activate)(struct swap_info_struct *sis, struct file *f);
int (*swap_deactivate)(struct file *);
int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);
};
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 906d5c21ebc4..198d87e6f19a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -10204,51 +10204,17 @@ struct btrfs_swap_info {
u64 start;
u64 block_start;
u64 block_len;
- u64 lowest_ppage;
- u64 highest_ppage;
- unsigned long nr_pages;
- int nr_extents;
};
static int btrfs_add_swap_extent(struct swap_info_struct *sis,
struct btrfs_swap_info *bsi)
{
- unsigned long nr_pages;
- unsigned long max_pages;
- u64 first_ppage, first_ppage_reported, next_ppage;
- int ret;
-
- /*
- * Our swapfile may have had its size extended after the swap header was
- * written. In that case activating the swapfile should not go beyond
- * the max size set in the swap header.
- */
- if (bsi->nr_pages >= sis->max)
- return 0;
+ u64 first_ppage, next_ppage;
- max_pages = sis->max - bsi->nr_pages;
first_ppage = PAGE_ALIGN(bsi->block_start) >> PAGE_SHIFT;
next_ppage = PAGE_ALIGN_DOWN(bsi->block_start + bsi->block_len) >> PAGE_SHIFT;
- if (first_ppage >= next_ppage)
- return 0;
- nr_pages = next_ppage - first_ppage;
- nr_pages = min(nr_pages, max_pages);
-
- first_ppage_reported = first_ppage;
- if (bsi->start == 0)
- first_ppage_reported++;
- if (bsi->lowest_ppage > first_ppage_reported)
- bsi->lowest_ppage = first_ppage_reported;
- if (bsi->highest_ppage < (next_ppage - 1))
- bsi->highest_ppage = next_ppage - 1;
-
- ret = add_swap_extent(sis, bsi->nr_pages, nr_pages, first_ppage);
- if (ret < 0)
- return ret;
- bsi->nr_extents += ret;
- bsi->nr_pages += nr_pages;
- return 0;
+ return add_swap_extent(sis, next_ppage - first_ppage, first_ppage);
}
static void btrfs_swap_deactivate(struct file *file)
@@ -10259,8 +10225,7 @@ static void btrfs_swap_deactivate(struct file *file)
atomic_dec(&BTRFS_I(inode)->root->nr_swapfiles);
}
-static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
- sector_t *span)
+static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file)
{
struct inode *inode = file_inode(file);
struct btrfs_root *root = BTRFS_I(inode)->root;
@@ -10269,9 +10234,7 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
struct extent_state *cached_state = NULL;
struct btrfs_chunk_map *map = NULL;
struct btrfs_device *device = NULL;
- struct btrfs_swap_info bsi = {
- .lowest_ppage = (sector_t)-1ULL,
- };
+ struct btrfs_swap_info bsi = {};
struct btrfs_backref_share_check_ctx *backref_ctx = NULL;
struct btrfs_path *path = NULL;
int ret = 0;
@@ -10570,23 +10533,16 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
up_write(&BTRFS_I(inode)->i_mmap_lock);
btrfs_free_backref_share_ctx(backref_ctx);
btrfs_free_path(path);
- if (ret)
- return ret;
-
- if (device)
+ if (!ret && device)
sis->bdev = device->bdev;
- *span = bsi.highest_ppage - bsi.lowest_ppage + 1;
- sis->max = bsi.nr_pages;
- sis->pages = bsi.nr_pages - 1;
- return bsi.nr_extents;
+ return ret;
}
#else
static void btrfs_swap_deactivate(struct file *file)
{
}
-static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
- sector_t *span)
+static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file)
{
return -EOPNOTSUPP;
}
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c2c2d6ac7f3d..ca7bac4a8b4a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3940,10 +3940,9 @@ static bool ext4_dirty_folio(struct address_space *mapping, struct folio *folio)
}
static int ext4_iomap_swap_activate(struct swap_info_struct *sis,
- struct file *file, sector_t *span)
+ struct file *file)
{
- return iomap_swapfile_activate(sis, file, span,
- &ext4_iomap_report_ops);
+ return iomap_swapfile_activate(sis, file, &ext4_iomap_report_ops);
}
static const struct address_space_operations ext4_aops = {
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 8d4f1e75dee3..86fabacc67e6 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -4249,7 +4249,7 @@ static int f2fs_migrate_blocks(struct inode *inode, block_t start_blk,
}
static int check_swap_activate(struct swap_info_struct *sis,
- struct file *swap_file, sector_t *span)
+ struct file *swap_file)
{
struct address_space *mapping = swap_file->f_mapping;
struct inode *inode = mapping->host;
@@ -4257,9 +4257,6 @@ static int check_swap_activate(struct swap_info_struct *sis,
block_t cur_lblock;
block_t last_lblock;
block_t pblock;
- block_t lowest_pblock = -1;
- block_t highest_pblock = 0;
- int nr_extents = 0;
unsigned int nr_pblocks;
unsigned int blks_per_sec = BLKS_PER_SEC(sbi);
unsigned int not_aligned = 0;
@@ -4272,7 +4269,7 @@ static int check_swap_activate(struct swap_info_struct *sis,
cur_lblock = 0;
last_lblock = F2FS_BYTES_TO_BLK(i_size_read(inode));
- while (cur_lblock < last_lblock && cur_lblock < sis->max) {
+ while (cur_lblock < last_lblock) {
struct f2fs_map_blocks map;
bool last_extent = false;
retry:
@@ -4307,8 +4304,6 @@ static int check_swap_activate(struct swap_info_struct *sis,
not_aligned++;
nr_pblocks = roundup(nr_pblocks, blks_per_sec);
- if (cur_lblock + nr_pblocks > sis->max)
- nr_pblocks -= blks_per_sec;
/* this extent is last one */
if (!nr_pblocks) {
@@ -4328,31 +4323,14 @@ static int check_swap_activate(struct swap_info_struct *sis,
goto retry;
}
- if (cur_lblock + nr_pblocks >= sis->max)
- nr_pblocks = sis->max - cur_lblock;
-
- if (cur_lblock) { /* exclude the header page */
- if (pblock < lowest_pblock)
- lowest_pblock = pblock;
- if (pblock + nr_pblocks - 1 > highest_pblock)
- highest_pblock = pblock + nr_pblocks - 1;
- }
-
/*
* We found a PAGE_SIZE-length, PAGE_SIZE-aligned run of blocks
*/
- ret = add_swap_extent(sis, cur_lblock, nr_pblocks, pblock);
+ ret = add_swap_extent(sis, nr_pblocks, pblock);
if (ret < 0)
goto out;
- nr_extents += ret;
cur_lblock += nr_pblocks;
}
- ret = nr_extents;
- *span = 1 + highest_pblock - lowest_pblock;
- if (cur_lblock == 0)
- cur_lblock = 1; /* force Empty message */
- sis->max = cur_lblock;
- sis->pages = cur_lblock - 1;
out:
if (not_aligned)
f2fs_warn(sbi, "Swapfile (%u) is not align to section: 1) creat(), 2) ioctl(F2FS_IOC_SET_PIN_FILE), 3) fallocate(%lu * N)",
@@ -4360,8 +4338,7 @@ static int check_swap_activate(struct swap_info_struct *sis,
return ret;
}
-static int f2fs_swap_activate(struct swap_info_struct *sis, struct file *file,
- sector_t *span)
+static int f2fs_swap_activate(struct swap_info_struct *sis, struct file *file)
{
struct inode *inode = file_inode(file);
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
@@ -4391,14 +4368,14 @@ static int f2fs_swap_activate(struct swap_info_struct *sis, struct file *file,
f2fs_precache_extents(inode);
- ret = check_swap_activate(sis, file, span);
+ ret = check_swap_activate(sis, file);
if (ret < 0)
return ret;
stat_inc_swapfile_inode(inode);
set_inode_flag(inode, FI_PIN_FILE);
f2fs_update_time(sbi, REQ_TIME);
- return ret;
+ return 0;
}
static void f2fs_swap_deactivate(struct file *file)
@@ -4409,8 +4386,7 @@ static void f2fs_swap_deactivate(struct file *file)
clear_inode_flag(inode, FI_PIN_FILE);
}
#else
-static int f2fs_swap_activate(struct swap_info_struct *sis, struct file *file,
- sector_t *span)
+static int f2fs_swap_activate(struct swap_info_struct *sis, struct file *file)
{
return -EOPNOTSUPP;
}
diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
index 0db77c449467..f778b2c6c922 100644
--- a/fs/iomap/swapfile.c
+++ b/fs/iomap/swapfile.c
@@ -11,10 +11,7 @@
struct iomap_swapfile_info {
struct iomap iomap; /* accumulated iomap */
struct swap_info_struct *sis;
- uint64_t lowest_ppage; /* lowest physical addr seen (pages) */
- uint64_t highest_ppage; /* highest physical addr seen (pages) */
unsigned long nr_pages; /* number of pages collected */
- int nr_extents; /* extent count */
struct file *file;
};
@@ -27,16 +24,8 @@ struct iomap_swapfile_info {
static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
{
struct iomap *iomap = &isi->iomap;
- unsigned long nr_pages;
- unsigned long max_pages;
uint64_t first_ppage;
- uint64_t first_ppage_reported;
uint64_t next_ppage;
- int error;
-
- if (unlikely(isi->nr_pages >= isi->sis->max))
- return 0;
- max_pages = isi->sis->max - isi->nr_pages;
/*
* Round the start up and the end down so that the physical
@@ -45,33 +34,7 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
first_ppage = ALIGN(iomap->addr, PAGE_SIZE) >> PAGE_SHIFT;
next_ppage = ALIGN_DOWN(iomap->addr + iomap->length, PAGE_SIZE) >>
PAGE_SHIFT;
-
- /* Skip too-short physical extents. */
- if (first_ppage >= next_ppage)
- return 0;
- nr_pages = next_ppage - first_ppage;
- nr_pages = min(nr_pages, max_pages);
-
- /*
- * Calculate how much swap space we're adding; the first page contains
- * the swap header and doesn't count. The mm still wants that first
- * page fed to add_swap_extent, however.
- */
- first_ppage_reported = first_ppage;
- if (iomap->offset == 0)
- first_ppage_reported++;
- if (isi->lowest_ppage > first_ppage_reported)
- isi->lowest_ppage = first_ppage_reported;
- if (isi->highest_ppage < (next_ppage - 1))
- isi->highest_ppage = next_ppage - 1;
-
- /* Add extent, set up for the next call. */
- error = add_swap_extent(isi->sis, isi->nr_pages, nr_pages, first_ppage);
- if (error < 0)
- return error;
- isi->nr_extents += error;
- isi->nr_pages += nr_pages;
- return 0;
+ return add_swap_extent(isi->sis, next_ppage - first_ppage, first_ppage);
}
static int iomap_swapfile_fail(struct iomap_swapfile_info *isi, const char *str)
@@ -138,8 +101,7 @@ static int iomap_swapfile_iter(struct iomap_iter *iter,
* passed to the swapfile subsystem.
*/
int iomap_swapfile_activate(struct swap_info_struct *sis,
- struct file *swap_file, sector_t *pagespan,
- const struct iomap_ops *ops)
+ struct file *swap_file, const struct iomap_ops *ops)
{
struct inode *inode = swap_file->f_mapping->host;
struct iomap_iter iter = {
@@ -150,7 +112,6 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
};
struct iomap_swapfile_info isi = {
.sis = sis,
- .lowest_ppage = (sector_t)-1ULL,
.file = swap_file,
};
int ret;
@@ -174,19 +135,6 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
return ret;
}
- /*
- * If this swapfile doesn't contain even a single page-aligned
- * contiguous range of blocks, reject this useless swapfile to
- * prevent confusion later on.
- */
- if (isi.nr_pages == 0) {
- pr_warn("swapon: Cannot find a single usable page in file.\n");
- return -EINVAL;
- }
-
- *pagespan = 1 + isi.highest_ppage - isi.lowest_ppage;
- sis->max = isi.nr_pages;
- sis->pages = isi.nr_pages - 1;
- return isi.nr_extents;
+ return 0;
}
EXPORT_SYMBOL_GPL(iomap_swapfile_activate);
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 25048a3c2364..74b401aa2b3a 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -567,8 +567,7 @@ static int nfs_launder_folio(struct folio *folio)
return ret;
}
-static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
- sector_t *span)
+static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file)
{
unsigned long blocks;
long long isize;
@@ -589,19 +588,17 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
ret = rpc_clnt_swap_activate(clnt);
if (ret)
return ret;
- ret = add_swap_extent(sis, 0, sis->max, 0);
+ ret = add_swap_extent(sis, sis->max, 0);
if (ret < 0) {
rpc_clnt_swap_deactivate(clnt);
return ret;
}
- *span = sis->pages;
-
if (cl->rpc_ops->enable_swap)
cl->rpc_ops->enable_swap(inode);
sis->flags |= SWP_FS_OPS;
- return ret;
+ return 0;
}
static void nfs_swap_deactivate(struct file *file)
diff --git a/fs/ntfs/aops.c b/fs/ntfs/aops.c
index 1fbf832ad165..4b7d019bc6ed 100644
--- a/fs/ntfs/aops.c
+++ b/fs/ntfs/aops.c
@@ -271,10 +271,9 @@ static int ntfs_writepages(struct address_space *mapping,
}
static int ntfs_swap_activate(struct swap_info_struct *sis,
- struct file *swap_file, sector_t *span)
+ struct file *swap_file)
{
- return iomap_swapfile_activate(sis, swap_file, span,
- &ntfs_read_iomap_ops);
+ return iomap_swapfile_activate(sis, swap_file, &ntfs_read_iomap_ops);
}
const struct address_space_operations ntfs_aops = {
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 664a2c223089..11d4655ef490 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -3287,7 +3287,7 @@ void cifs_oplock_break(struct work_struct *work)
}
static int cifs_swap_activate(struct swap_info_struct *sis,
- struct file *swap_file, sector_t *span)
+ struct file *swap_file)
{
struct cifsFileInfo *cfile = swap_file->private_data;
struct inode *inode = swap_file->f_mapping->host;
@@ -3308,7 +3308,6 @@ static int cifs_swap_activate(struct swap_info_struct *sis,
pr_warn("swap activate: swapfile has holes\n");
return -EINVAL;
}
- *span = sis->pages;
pr_warn_once("Swap support over SMB3 is experimental\n");
@@ -3329,7 +3328,7 @@ static int cifs_swap_activate(struct swap_info_struct *sis,
*/
sis->flags |= SWP_FS_OPS;
- return add_swap_extent(sis, 0, sis->max, 0);
+ return add_swap_extent(sis, sis->max, 0);
}
static void cifs_swap_deactivate(struct file *file)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index f279055fcea0..1e8662e0e7cd 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -799,8 +799,7 @@ xfs_vm_readahead(
static int
xfs_vm_swap_activate(
struct swap_info_struct *sis,
- struct file *swap_file,
- sector_t *span)
+ struct file *swap_file)
{
struct xfs_inode *ip = XFS_I(file_inode(swap_file));
@@ -838,8 +837,7 @@ xfs_vm_swap_activate(
*/
sis->bdev = xfs_inode_buftarg(ip)->bt_bdev;
- return iomap_swapfile_activate(sis, swap_file, span,
- &xfs_read_iomap_ops);
+ return iomap_swapfile_activate(sis, swap_file, &xfs_read_iomap_ops);
}
const struct address_space_operations xfs_address_space_operations = {
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 5ada33f70bb4..214e4bf8e30a 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -168,7 +168,7 @@ static int zonefs_writepages(struct address_space *mapping,
}
static int zonefs_swap_activate(struct swap_info_struct *sis,
- struct file *swap_file, sector_t *span)
+ struct file *swap_file)
{
struct inode *inode = file_inode(swap_file);
@@ -178,8 +178,7 @@ static int zonefs_swap_activate(struct swap_info_struct *sis,
return -EINVAL;
}
- return iomap_swapfile_activate(sis, swap_file, span,
- &zonefs_read_iomap_ops);
+ return iomap_swapfile_activate(sis, swap_file, &zonefs_read_iomap_ops);
}
const struct address_space_operations zonefs_file_aops = {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 11559c513dfb..b8b6f7a38f4d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -435,8 +435,7 @@ struct address_space_operations {
int (*error_remove_folio)(struct address_space *, struct folio *);
/* swapfile support */
- int (*swap_activate)(struct swap_info_struct *sis, struct file *file,
- sector_t *span);
+ int (*swap_activate)(struct swap_info_struct *sis, struct file *file);
void (*swap_deactivate)(struct file *file);
int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);
};
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 2c5685adf3a9..d82126e3d086 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -604,10 +604,9 @@ struct file;
struct swap_info_struct;
int iomap_swapfile_activate(struct swap_info_struct *sis,
- struct file *swap_file, sector_t *pagespan,
- const struct iomap_ops *ops);
+ struct file *swap_file, const struct iomap_ops *ops);
#else
-# define iomap_swapfile_activate(sis, swapfile, pagespan, ops) (-EIO)
+# define iomap_swapfile_activate(sis, swapfile, ops) (-EIO)
#endif /* CONFIG_SWAP */
extern struct bio_set iomap_ioend_bioset;
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 7a09df6977a5..b8dfe2c6bc98 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -403,10 +403,9 @@ extern void __meminit kswapd_stop(int nid);
#ifdef CONFIG_SWAP
-int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
- unsigned long nr_pages, sector_t start_block);
-int generic_swapfile_activate(struct swap_info_struct *, struct file *,
- sector_t *);
+int add_swap_extent(struct swap_info_struct *sis, unsigned long nr_pages,
+ sector_t start_block);
+int generic_swapfile_activate(struct swap_info_struct *, struct file *);
static inline unsigned long total_swapcache_pages(void)
{
@@ -528,8 +527,8 @@ static inline bool folio_free_swap(struct folio *folio)
}
static inline int add_swap_extent(struct swap_info_struct *sis,
- unsigned long start_page,
- unsigned long nr_pages, sector_t start_block)
+ unsigned long start_page, unsigned long nr_pages,
+ sector_t start_block)
{
return -EINVAL;
}
diff --git a/mm/page_io.c b/mm/page_io.c
index 70cea9e24d2f..f30f36ec1ed0 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -76,19 +76,14 @@ static void end_swap_bio_read(struct bio *bio)
}
int generic_swapfile_activate(struct swap_info_struct *sis,
- struct file *swap_file,
- sector_t *span)
+ struct file *swap_file)
{
struct address_space *mapping = swap_file->f_mapping;
struct inode *inode = mapping->host;
unsigned blocks_per_page;
- unsigned long page_no;
unsigned blkbits;
sector_t probe_block;
sector_t last_block;
- sector_t lowest_block = -1;
- sector_t highest_block = 0;
- int nr_extents = 0;
int ret;
blkbits = inode->i_blkbits;
@@ -99,10 +94,8 @@ int generic_swapfile_activate(struct swap_info_struct *sis,
* to be very smart.
*/
probe_block = 0;
- page_no = 0;
last_block = i_size_read(inode) >> blkbits;
- while ((probe_block + blocks_per_page) <= last_block &&
- page_no < sis->max) {
+ while ((probe_block + blocks_per_page) <= last_block) {
unsigned block_in_page;
sector_t first_block;
@@ -137,38 +130,22 @@ int generic_swapfile_activate(struct swap_info_struct *sis,
}
}
- first_block >>= (PAGE_SHIFT - blkbits);
- if (page_no) { /* exclude the header page */
- if (first_block < lowest_block)
- lowest_block = first_block;
- if (first_block > highest_block)
- highest_block = first_block;
- }
-
/*
* We found a PAGE_SIZE-length, PAGE_SIZE-aligned run of blocks
*/
- ret = add_swap_extent(sis, page_no, 1, first_block);
+ ret = add_swap_extent(sis, 1,
+ first_block >> (PAGE_SHIFT - blkbits));
if (ret < 0)
- goto out;
- nr_extents += ret;
- page_no++;
+ return ret;
probe_block += blocks_per_page;
reprobe:
continue;
}
- ret = nr_extents;
- *span = 1 + highest_block - lowest_block;
- if (page_no == 0)
- page_no = 1; /* force Empty message */
- sis->max = page_no;
- sis->pages = page_no - 1;
-out:
- return ret;
+ return 0;
+
bad_bmap:
pr_err("swapon: swapfile has holes\n");
- ret = -EINVAL;
- goto out;
+ return -EINVAL;
}
static bool is_folio_zero_filled(struct folio *folio)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index f7ebd97e28a3..158620fd2978 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2704,15 +2704,21 @@ static void destroy_swap_extents(struct swap_info_struct *sis,
* Add a block range (and the corresponding page range) into this swapdev's
* extent tree.
*
- * This function rather assumes that it is called in ascending page order.
+ * Note that start_block is in units of PAGE_SIZE and not actually in block
+ * layer sectors as the sector_t would suggest.
*/
int
-add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
- unsigned long nr_pages, sector_t start_block)
+add_swap_extent(struct swap_info_struct *sis, unsigned long nr_pages,
+ sector_t start_block)
{
struct rb_node **link = &sis->swap_extent_root.rb_node, *parent = NULL;
struct swap_extent *se;
- struct swap_extent *new_se;
+
+ if (!nr_pages)
+ return 0;
+ if (unlikely(sis->pages >= sis->max))
+ return 0;
+ nr_pages = min(nr_pages, sis->max - sis->pages);
/*
* place the new node at the right most since the
@@ -2725,25 +2731,25 @@ add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
if (parent) {
se = rb_entry(parent, struct swap_extent, rb_node);
- BUG_ON(se->start_page + se->nr_pages != start_page);
- if (se->start_block + se->nr_pages == start_block) {
- /* Merge it */
- se->nr_pages += nr_pages;
- return 0;
- }
+ if (WARN_ON_ONCE(se->start_page + se->nr_pages != sis->pages))
+ return -EINVAL;
+ if (se->start_block + se->nr_pages == start_block)
+ goto add;
}
/* No merge, insert a new extent. */
- new_se = kmalloc_obj(*se);
- if (new_se == NULL)
+ se = kzalloc_obj(*se);
+ if (!se)
return -ENOMEM;
- new_se->start_page = start_page;
- new_se->nr_pages = nr_pages;
- new_se->start_block = start_block;
-
- rb_link_node(&new_se->rb_node, parent, link);
- rb_insert_color(&new_se->rb_node, &sis->swap_extent_root);
- return 1;
+ rb_link_node(&se->rb_node, parent, link);
+ rb_insert_color(&se->rb_node, &sis->swap_extent_root);
+
+ se->start_page = sis->pages;
+ se->start_block = start_block;
+add:
+ se->nr_pages += nr_pages;
+ sis->pages += nr_pages;
+ return 0;
}
EXPORT_SYMBOL_GPL(add_swap_extent);
@@ -2775,20 +2781,17 @@ EXPORT_SYMBOL_GPL(add_swap_extent);
* extents in the rbtree. - akpm.
*/
static int setup_swap_extents(struct swap_info_struct *sis,
- struct file *swap_file, sector_t *span)
+ struct file *swap_file)
{
struct address_space *mapping = swap_file->f_mapping;
struct inode *inode = mapping->host;
int ret;
- if (S_ISBLK(inode->i_mode)) {
- ret = add_swap_extent(sis, 0, sis->max, 0);
- *span = sis->pages;
- return ret;
- }
+ if (S_ISBLK(inode->i_mode))
+ return add_swap_extent(sis, sis->max, 0);
if (mapping->a_ops->swap_activate) {
- ret = mapping->a_ops->swap_activate(sis, swap_file, span);
+ ret = mapping->a_ops->swap_activate(sis, swap_file);
if (ret < 0)
return ret;
sis->flags |= SWP_ACTIVATED;
@@ -2800,7 +2803,7 @@ static int setup_swap_extents(struct swap_info_struct *sis,
return ret;
}
- return generic_swapfile_activate(sis, swap_file, span);
+ return generic_swapfile_activate(sis, swap_file);
}
static void _enable_swap_info(struct swap_info_struct *si)
@@ -3428,6 +3431,40 @@ static int setup_swap_clusters_info(struct swap_info_struct *si,
return err;
}
+static void swap_print_info(struct swap_info_struct *si, const char *name)
+{
+ unsigned int nr_extents = 0;
+ u64 lowest_ppage = (u64)-1;
+ u64 highest_ppage = 0;
+ struct swap_extent *se;
+
+ /*
+ * Calculate how much swap space we're adding; the first page contains
+ * the swap header and doesn't count.
+ */
+ for (se = first_se(si); se; se = next_se(se)) {
+ u64 first_ppage = se->start_block;
+ u64 next_ppage = se->start_block + se->nr_pages;
+
+ if (se->start_page == 0)
+ first_ppage++;
+
+ if (lowest_ppage > first_ppage)
+ lowest_ppage = first_ppage;
+ if (highest_ppage < next_ppage - 1)
+ highest_ppage = next_ppage - 1;
+ nr_extents++;
+ }
+
+ pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s\n",
+ K(si->pages), name, si->prio, nr_extents,
+ K(highest_ppage - lowest_ppage),
+ (si->flags & SWP_SOLIDSTATE) ? "SS" : "",
+ (si->flags & SWP_DISCARDABLE) ? "D" : "",
+ (si->flags & SWP_AREA_DISCARD) ? "s" : "",
+ (si->flags & SWP_PAGE_DISCARD) ? "c" : "");
+}
+
SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
{
struct swap_info_struct *si;
@@ -3437,8 +3474,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
int prio;
int error;
union swap_header *swap_header;
- int nr_extents;
- sector_t span;
struct folio *folio = NULL;
struct inode *inode = NULL;
bool inced_nr_rotate_swap = false;
@@ -3510,24 +3545,25 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
}
swap_header = kmap_local_folio(folio, 0);
+ si->pages = 0;
si->max = read_swap_header(si, swap_header, inode);
if (unlikely(!si->max)) {
error = -EINVAL;
goto bad_swap_unlock_inode;
}
- si->pages = si->max - 1;
- nr_extents = setup_swap_extents(si, swap_file, &span);
- if (nr_extents < 0) {
- error = nr_extents;
+ error = setup_swap_extents(si, swap_file);
+ if (error < 0)
goto bad_swap_unlock_inode;
- }
- if (si->pages != si->max - 1) {
- pr_err("swap:%u != (max:%u - 1)\n", si->pages, si->max);
+ if (si->pages != si->max) {
+ pr_err("swap:%u != (max:%u)\n", si->pages, si->max);
error = -EINVAL;
goto bad_swap_unlock_inode;
}
+ /* Remove the first page containing the swap header. */
+ si->pages--;
+
/* Set up the swap cluster info */
error = setup_swap_clusters_info(si, swap_header);
if (error)
@@ -3624,13 +3660,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
/* Sets SWP_WRITEOK, resurrect the percpu ref, expose the swap device */
enable_swap_info(si);
- pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s\n",
- K(si->pages), name->name, si->prio, nr_extents,
- K((unsigned long long)span),
- (si->flags & SWP_SOLIDSTATE) ? "SS" : "",
- (si->flags & SWP_DISCARDABLE) ? "D" : "",
- (si->flags & SWP_AREA_DISCARD) ? "s" : "",
- (si->flags & SWP_PAGE_DISCARD) ? "c" : "");
+ swap_print_info(si, name->name);
mutex_unlock(&swapon_mutex);
atomic_inc(&proc_poll_event);
--
2.53.0
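
As a rough illustration of the accounting this patch moves into add_swap_extent and swap_print_info, here is a user-space sketch. It is not the kernel code: struct extent, struct swap_area, area_add_extent and area_span are hypothetical names, a linked list stands in for the rbtree, and the final underflow guard in area_span is an addition of this sketch. It shows the page counter doubling as the next start_page, the merge of physically contiguous runs, the clamp against max, and the span calculation that skips the header page.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; these are not the kernel structures. */
struct extent {
	uint64_t start_page;	/* first swap page covered by this extent */
	uint64_t nr_pages;	/* length of the extent in pages */
	uint64_t start_block;	/* first backing block, in PAGE_SIZE units */
	struct extent *next;
};

struct swap_area {
	uint64_t max;		/* page count taken from the swap header */
	uint64_t pages;		/* pages mapped so far */
	struct extent *head, *tail;
	unsigned int nr_extents;
};

/* Append a run of pages. Returns 0 on success, -1 on allocation failure. */
int area_add_extent(struct swap_area *sa, uint64_t nr_pages,
		    uint64_t start_block)
{
	struct extent *se = sa->tail;

	if (!nr_pages || sa->pages >= sa->max)
		return 0;			/* nothing to add, or already full */
	if (nr_pages > sa->max - sa->pages)
		nr_pages = sa->max - sa->pages;	/* clamp to the header size */

	/* A new run always begins where the previous one ended. */
	assert(!se || se->start_page + se->nr_pages == sa->pages);

	/* Start a new extent unless the run is physically contiguous. */
	if (!se || se->start_block + se->nr_pages != start_block) {
		se = calloc(1, sizeof(*se));
		if (!se)
			return -1;
		se->start_page = sa->pages;
		se->start_block = start_block;
		if (sa->tail)
			sa->tail->next = se;
		else
			sa->head = se;
		sa->tail = se;
		sa->nr_extents++;
	}
	se->nr_pages += nr_pages;
	sa->pages += nr_pages;
	return 0;
}

/* Distance in pages between the lowest and highest backing page. */
uint64_t area_span(const struct swap_area *sa)
{
	uint64_t lowest = UINT64_MAX, highest = 0;
	const struct extent *se;

	for (se = sa->head; se; se = se->next) {
		uint64_t first = se->start_block;
		uint64_t last = se->start_block + se->nr_pages - 1;

		if (se->start_page == 0)
			first++;		/* the header page does not count */
		if (first < lowest)
			lowest = first;
		if (last > highest)
			highest = last;
	}
	/* Guard added for this sketch: avoid underflow on a header-only area. */
	return highest >= lowest ? highest - lowest : 0;
}
```

With max set to 10, adding 4 pages at block 100 and then 2 pages at block 104 leaves a single merged extent, and a further request for 8 pages at block 200 is clamped to the 4 pages that remain.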
* [PATCH 01/12] swap: remove the maxpages variable in sys_swapon
From: Christoph Hellwig @ 2026-05-12 5:35 UTC (permalink / raw)
To: Andrew Morton, Chris Li, Kairui Song
Cc: Christian Brauner, Darrick J . Wong , Jens Axboe, David Sterba,
Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
Paulo Alcantara, Carlos Maiolino, Damien Le Moal, Naohiro Aota,
linux-xfs, linux-fsdevel, linux-doc, linux-mm, linux-block,
linux-btrfs, linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-1-hch@lst.de>
Always use si->max, which is updated by setup_swap_extents, instead of
copying into and out of maxpages.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/swapfile.c | 27 +++++++++++----------------
1 file changed, 11 insertions(+), 16 deletions(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 9174f1eeffb0..f7ebd97e28a3 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3350,10 +3350,9 @@ static unsigned long read_swap_header(struct swap_info_struct *si,
}
static int setup_swap_clusters_info(struct swap_info_struct *si,
- union swap_header *swap_header,
- unsigned long maxpages)
+ union swap_header *swap_header)
{
- unsigned long nr_clusters = DIV_ROUND_UP(maxpages, SWAPFILE_CLUSTER);
+ unsigned long nr_clusters = DIV_ROUND_UP(si->max, SWAPFILE_CLUSTER);
struct swap_cluster_info *cluster_info;
int err = -ENOMEM;
unsigned long i;
@@ -3395,7 +3394,7 @@ static int setup_swap_clusters_info(struct swap_info_struct *si,
if (err)
goto err;
}
- for (i = maxpages; i < round_up(maxpages, SWAPFILE_CLUSTER); i++) {
+ for (i = si->max; i < round_up(si->max, SWAPFILE_CLUSTER); i++) {
err = swap_cluster_setup_bad_slot(si, cluster_info, i, true);
if (err)
goto err;
@@ -3425,7 +3424,7 @@ static int setup_swap_clusters_info(struct swap_info_struct *si,
si->cluster_info = cluster_info;
return 0;
err:
- free_swap_cluster_info(cluster_info, maxpages);
+ free_swap_cluster_info(cluster_info, si->max);
return err;
}
@@ -3440,7 +3439,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
union swap_header *swap_header;
int nr_extents;
sector_t span;
- unsigned long maxpages;
struct folio *folio = NULL;
struct inode *inode = NULL;
bool inced_nr_rotate_swap = false;
@@ -3512,14 +3510,13 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
}
swap_header = kmap_local_folio(folio, 0);
- maxpages = read_swap_header(si, swap_header, inode);
- if (unlikely(!maxpages)) {
+ si->max = read_swap_header(si, swap_header, inode);
+ if (unlikely(!si->max)) {
error = -EINVAL;
goto bad_swap_unlock_inode;
}
- si->max = maxpages;
- si->pages = maxpages - 1;
+ si->pages = si->max - 1;
nr_extents = setup_swap_extents(si, swap_file, &span);
if (nr_extents < 0) {
error = nr_extents;
@@ -3531,14 +3528,12 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
goto bad_swap_unlock_inode;
}
- maxpages = si->max;
-
/* Set up the swap cluster info */
- error = setup_swap_clusters_info(si, swap_header, maxpages);
+ error = setup_swap_clusters_info(si, swap_header);
if (error)
goto bad_swap_unlock_inode;
- error = swap_cgroup_swapon(si->type, maxpages);
+ error = swap_cgroup_swapon(si->type, si->max);
if (error)
goto bad_swap_unlock_inode;
@@ -3546,7 +3541,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
* Use kvmalloc_array instead of bitmap_zalloc as the allocation order might
* be above MAX_PAGE_ORDER incase of a large swap file.
*/
- si->zeromap = kvmalloc_array(BITS_TO_LONGS(maxpages), sizeof(long),
+ si->zeromap = kvmalloc_array(BITS_TO_LONGS(si->max), sizeof(long),
GFP_KERNEL | __GFP_ZERO);
if (!si->zeromap) {
error = -ENOMEM;
@@ -3597,7 +3592,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
}
}
- error = zswap_swapon(si->type, maxpages);
+ error = zswap_swapon(si->type, si->max);
if (error)
goto bad_swap_unlock_inode;
--
2.53.0
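
The cluster arithmetic that now reads si->max directly can be sketched in user space as follows. This is only an illustration: CLUSTER_PAGES is an assumed value (the kernel's SWAPFILE_CLUSTER depends on configuration), and nr_clusters and nr_padding_slots are hypothetical helper names. It shows the round-up division and the number of slots past max in the last cluster, which the patched loop marks as bad.

```c
#include <assert.h>

/* Assumed cluster size for illustration only. */
#define CLUSTER_PAGES 512UL

/* Same arithmetic as DIV_ROUND_UP(max_pages, CLUSTER_PAGES). */
unsigned long nr_clusters(unsigned long max_pages)
{
	assert(max_pages != 0);	/* swapon rejects an empty swap area earlier */
	return (max_pages + CLUSTER_PAGES - 1) / CLUSTER_PAGES;
}

/* Slots between max_pages and the end of the last cluster. */
unsigned long nr_padding_slots(unsigned long max_pages)
{
	unsigned long rounded = nr_clusters(max_pages) * CLUSTER_PAGES;

	return rounded - max_pages;
}
```

Under that assumed cluster size, a 1000-page area needs 2 clusters and leaves 24 padding slots, while a 1024-page area needs 2 clusters and leaves none.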
* improve the swap_activate interface
From: Christoph Hellwig @ 2026-05-12 5:35 UTC (permalink / raw)
To: Andrew Morton, Chris Li, Kairui Song
Cc: Christian Brauner, Darrick J . Wong , Jens Axboe, David Sterba,
Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
Paulo Alcantara, Carlos Maiolino, Damien Le Moal, Naohiro Aota,
linux-xfs, linux-fsdevel, linux-doc, linux-mm, linux-block,
linux-btrfs, linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
Hi all,
Darrick recently posted iomap support for fuse-iomap, which was trivial
but a bit ugly, and that prompted me to look into how this could be done
in a cleaner way. The result is this fairly big series, which reworks
how the MM code calls into the file system to activate swap files, to
make it much cleaner and easier to use.
I've tested this with swap devices manually, and using the swap tests
in xfstests on btrfs, ext3, ext4, f2fs and xfs to exercise the different
implementations. All of those passed, but f2fs actually notruns all
tests even in the baseline, as it requires special preparation for
swapfiles which never got wired up in xfstests.
Diffstat:
Documentation/filesystems/iomap/operations.rst | 3
Documentation/filesystems/locking.rst | 35 +--
Documentation/filesystems/vfs.rst | 40 ++--
block/fops.c | 15 +
fs/btrfs/btrfs_inode.h | 3
fs/btrfs/file.c | 4
fs/btrfs/inode.c | 72 -------
fs/ext4/file.c | 6
fs/ext4/inode.c | 11 -
fs/f2fs/data.c | 50 -----
fs/f2fs/f2fs.h | 2
fs/f2fs/file.c | 4
fs/iomap/swapfile.c | 165 +++---------------
fs/nfs/direct.c | 1
fs/nfs/file.c | 21 --
fs/nfs/nfs4file.c | 3
fs/ntfs/aops.c | 8
fs/ntfs/file.c | 6
fs/smb/client/cifsfs.c | 18 +
fs/smb/client/cifsfs.h | 3
fs/smb/client/file.c | 16 -
fs/xfs/xfs_aops.c | 48 -----
fs/xfs/xfs_file.c | 39 ++++
fs/zonefs/file.c | 30 +--
include/linux/fs.h | 11 -
include/linux/iomap.h | 5
include/linux/nfs_fs.h | 3
include/linux/swap.h | 129 +-------------
mm/page_io.c | 45 ----
mm/swap.h | 92 ++++++++++
mm/swapfile.c | 227 ++++++++++++++-----------
31 files changed, 471 insertions(+), 644 deletions(-)
* Re: [PATCH v4 v4 2/3] ext4: use atomic operations for s_mb_stats accesses
From: liubaolin @ 2026-05-12 2:11 UTC (permalink / raw)
To: sashiko, libaokun, adilger.kernel, tytso
Cc: linux-ext4, linux-kernel, adilger, wangguanyu
In-Reply-To: <20260508200245.A703BC2BCC7@smtp.kernel.org>
Dear All,
Sashiko AI reviewed the patch and provided two points. I will take
these into consideration and submit a revised version for v5 as soon as
possible.
Thanks,
Baolin
On 2026/5/9 04:02, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [Low] Architectural Anti-pattern: Using `atomic_t` for a simple configuration flag
> --
>
> commit 4ef770887d49b5db285ae4848dba912e51abd333
> Author: Baolin Liu <liubaolin@kylinos.cn>
> ext4: use atomic operations for s_mb_stats accesses
>
> This commit converts the s_mb_stats configuration flag in ext4 from an
> unsigned int to an atomic_t to fix data races between sysfs updates and
> mballoc reads.
>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 94283a991e5c4..4fb19fcb1f5f3 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
> [ ... ]
>> @@ -1622,7 +1622,7 @@ struct ext4_sb_info {
>> unsigned int s_mb_stream_request;
>> unsigned int s_mb_max_to_scan;
>> unsigned int s_mb_min_to_scan;
>> - unsigned int s_mb_stats;
>> + atomic_t s_mb_stats;
>
> Since s_mb_stats is only loaded and stored concurrently without any
> read-modify-write operations (like atomic_inc), would it be better to retain
> the unsigned int type and use READ_ONCE() and WRITE_ONCE() instead? Using
> atomic_t for simple configuration flags can add unnecessary conceptual
> overhead.
>
>> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
>> index 923b375e017fa..fad2819f36319 100644
>> --- a/fs/ext4/sysfs.c
>> +++ b/fs/ext4/sysfs.c
> [ ... ]
>> @@ -540,6 +541,12 @@ static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
>> return ret;
>> *((unsigned long *) ptr) = lt;
>> return len;
>> + case attr_pointer_atomic:
>> + ret = kstrtoint(skip_spaces(buf), 0, &i);
>> + if (ret)
>> + return ret;
>> + atomic_set((atomic_t *)ptr, i);
>> + return len;
>> }
>> return 0;
>
> Does migrating from attr_pointer_ui (which uses kstrtouint) to
> attr_pointer_atomic unintentionally allow logically invalid negative
> values to be written to s_mb_stats, since kstrtoint accepts negative numbers?
>
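
The second question in the review above comes down to which inputs the parser accepts. A user-space analogy, not the kernel helpers: parse_int_signed and parse_uint_strict are hypothetical functions built on strtol and strtoul. The signed variant accepts "-1", while the strict unsigned variant rejects a leading minus sign explicitly, because strtoul on its own would wrap the value instead of failing.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Skip trailing whitespace and report whether the string ended cleanly. */
static int only_space_left(const char *p)
{
	while (isspace((unsigned char)*p))
		p++;
	return *p == '\0';
}

/* Hypothetical signed parser: negative input is accepted. */
int parse_int_signed(const char *buf, int *out)
{
	char *end;
	long v;

	assert(buf && out);
	errno = 0;
	v = strtol(buf, &end, 0);
	if (end == buf || errno == ERANGE || v < INT_MIN || v > INT_MAX)
		return -EINVAL;
	if (!only_space_left(end))
		return -EINVAL;
	*out = (int)v;
	return 0;
}

/* Hypothetical unsigned parser: a leading '-' is an error. */
int parse_uint_strict(const char *buf, unsigned int *out)
{
	char *end;
	unsigned long v;

	assert(buf && out);
	while (isspace((unsigned char)*buf))
		buf++;
	if (*buf == '-')
		return -EINVAL;	/* strtoul alone would wrap this silently */
	errno = 0;
	v = strtoul(buf, &end, 0);
	if (end == buf || errno == ERANGE || v > UINT_MAX)
		return -EINVAL;
	if (!only_space_left(end))
		return -EINVAL;
	*out = (unsigned int)v;
	return 0;
}
```

If the flag is meant to stay non-negative, either an unsigned parse of this kind or an explicit range check after the signed parse would keep "-1" out.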
* [syzbot ci] Re: fs: Fix missed inode write during fsync
From: syzbot ci @ 2026-05-11 20:49 UTC (permalink / raw)
To: aivazian.tigran, brauner, dsterba, hirofumi, jack, linux-ext4,
linux-fsdevel, tytso
Cc: syzbot, syzkaller-bugs
In-Reply-To: <20260511115725.28441-1-jack@suse.cz>
syzbot ci has tested the following series
[v1] fs: Fix missed inode write during fsync
https://lore.kernel.org/all/20260511115725.28441-1-jack@suse.cz
* [PATCH 1/9] affs: Drop support for metadata bh tracking
* [PATCH 2/9] ext4: Allocate mapping_metadata_bhs struct on demand
* [PATCH 3/9] fs: Writeout inode buffer from mmb_sync()
* [PATCH 4/9] ext2: Fix possibly missing inode write on fsync(2)
* [PATCH 5/9] udf: Fix possibly missing inode write on fsync(2)
* [PATCH 6/9] fat: Fix possibly missing inode write on fsync(2)
* [PATCH 7/9] minix: Fix possibly missing inode write on fsync(2)
* [PATCH 8/9] bfs: Fix possibly missing inode write on fsync(2)
* [PATCH 9/9] ext4: Use mmb infrastructure for inode buffer writeout
and found the following issue:
KASAN: null-ptr-deref Write in write_dirty_buffer
Full report is available here:
https://ci.syzbot.org/series/d987d2d8-3775-4aa9-959f-8a045778888c
***
KASAN: null-ptr-deref Write in write_dirty_buffer
tree: torvalds
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base: 5d6919055dec134de3c40167a490f33c74c12581
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/567d596c-ca65-43c9-bd7d-1e60cfe9da2a/config
syz repro: https://ci.syzbot.org/findings/1bc13af8-2d91-4fbd-b43e-fbe72f29ca41/syz_repro
EXT4-fs (loop2): unmounting filesystem 00000000-0000-0000-0000-000000000000.
==================================================================
BUG: KASAN: null-ptr-deref in instrument_atomic_read_write include/linux/instrumented.h:112 [inline]
BUG: KASAN: null-ptr-deref in test_and_set_bit_lock include/asm-generic/bitops/instrumented-lock.h:57 [inline]
BUG: KASAN: null-ptr-deref in trylock_buffer include/linux/buffer_head.h:425 [inline]
BUG: KASAN: null-ptr-deref in lock_buffer include/linux/buffer_head.h:431 [inline]
BUG: KASAN: null-ptr-deref in write_dirty_buffer+0x37/0x190 fs/buffer.c:2760
Write of size 8 at addr 0000000000000000 by task syz-executor/5742
CPU: 1 UID: 0 PID: 5742 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
kasan_report+0x117/0x150 mm/kasan/report.c:595
check_region_inline mm/kasan/generic.c:-1 [inline]
kasan_check_range+0x264/0x2c0 mm/kasan/generic.c:200
instrument_atomic_read_write include/linux/instrumented.h:112 [inline]
test_and_set_bit_lock include/asm-generic/bitops/instrumented-lock.h:57 [inline]
trylock_buffer include/linux/buffer_head.h:425 [inline]
lock_buffer include/linux/buffer_head.h:431 [inline]
write_dirty_buffer+0x37/0x190 fs/buffer.c:2760
mmb_sync+0x74c/0xed0 fs/buffer.c:603
ext4_evict_inode+0x2fa/0x1040 fs/ext4/inode.c:199
evict+0x61e/0xb10 fs/inode.c:841
ext4_quota_off+0x470/0x580 fs/ext4/super.c:7326
ext4_quotas_off fs/ext4/super.c:1195 [inline]
ext4_put_super+0xdf/0xd80 fs/ext4/super.c:1306
generic_shutdown_super+0x13d/0x2d0 fs/super.c:646
kill_block_super+0x44/0x90 fs/super.c:1725
ext4_kill_sb+0x68/0xb0 fs/ext4/super.c:7494
deactivate_locked_super+0xbc/0x130 fs/super.c:476
cleanup_mnt+0x437/0x4d0 fs/namespace.c:1312
task_work_run+0x1d9/0x270 kernel/task_work.c:233
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
__exit_to_user_mode_loop kernel/entry/common.c:67 [inline]
exit_to_user_mode_loop+0xf3/0x4d0 kernel/entry/common.c:98
__exit_to_user_mode_prepare include/linux/irq-entry-common.h:207 [inline]
syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:238 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:318 [inline]
do_syscall_64+0x33e/0xf80 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fd8d1b9e017
Code: a2 c7 05 dc 06 25 00 00 00 00 00 eb 96 e8 e1 12 00 00 90 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 c7 c2 e8 ff ff ff f7 d8 64 89 02 b8
RSP: 002b:00007ffef04ebf88 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 00007fd8d1c32120 RCX: 00007fd8d1b9e017
RDX: 0000000000000000 RSI: 0000000000000009 RDI: 00007ffef04ec040
RBP: 00007ffef04ec040 R08: 00007ffef04ed040 R09: 00000000ffffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffef04ed0d0
R13: 00007fd8d1c32120 R14: 0000000000014595 R15: 00007ffef04ed110
</TASK>
==================================================================
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).
The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.
* Re: [PATCH 6/9] fat: Fix possibly missing inode write on fsync(2)
From: OGAWA Hirofumi @ 2026-05-11 18:02 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, Christian Brauner, aivazian.tigran, Ted Tso,
linux-ext4
In-Reply-To: <rnl552jwa6x72ibx3sg2oeitn6sh6jp5tnfu7evycol2fopw7v@ahwlfjnuoiwy>
Jan Kara <jack@suse.cz> writes:
> On Mon 11-05-26 23:32:45, OGAWA Hirofumi wrote:
>> Jan Kara <jack@suse.cz> writes:
>>
>> > Use mmb inode buffer writeout infrastructure to reliably write out
>> > inode's buffer on fsync(2).
>>
>> > Signed-off-by: Jan Kara <jack@suse.cz>
>> > ---
>> > fs/fat/inode.c | 3 ++-
>> > 1 file changed, 2 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/fs/fat/inode.c b/fs/fat/inode.c
>> > index 28f78df086ef..4ca00b7a618b 100644
>> > --- a/fs/fat/inode.c
>> > +++ b/fs/fat/inode.c
>> > @@ -907,6 +907,7 @@ static int __fat_write_inode(struct inode *inode, int wait)
>> > }
>> > spin_unlock(&sbi->inode_hash_lock);
>> > mark_buffer_dirty(bh);
>> > + MSDOS_I(inode)->i_metadata_bhs.inode_blk = bh->b_blocknr;
>>
>> When the inode position is changed or removed, this will point to the
>> wrong block, and may sync an unrelated block and wait on it.
>
> So I didn't realize that e.g. rename does change the backing inode block.
> But given we set i_metadata_bhs.inode_blk on each inode write, inode_blk
> should always contain the current position where the inode was written so
> fsync should be syncing the right block. Or am I still missing something?
I didn't check the rename case completely; I just recalled it when I
saw this code, so it needs to be confirmed. But at least in the remove
case, inode_blk will be left set even after the block is reused.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
* Re: [PATCH 6/9] fat: Fix possibly missing inode write on fsync(2)
From: Jan Kara @ 2026-05-11 17:03 UTC (permalink / raw)
To: OGAWA Hirofumi
Cc: Jan Kara, linux-fsdevel, Christian Brauner, aivazian.tigran,
Ted Tso, linux-ext4
In-Reply-To: <87pl32yq2a.fsf@mail.parknet.co.jp>
On Mon 11-05-26 23:32:45, OGAWA Hirofumi wrote:
> Jan Kara <jack@suse.cz> writes:
>
> > Use mmb inode buffer writeout infrastructure to reliably write out
> > inode's buffer on fsync(2).
>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > fs/fat/inode.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/fat/inode.c b/fs/fat/inode.c
> > index 28f78df086ef..4ca00b7a618b 100644
> > --- a/fs/fat/inode.c
> > +++ b/fs/fat/inode.c
> > @@ -907,6 +907,7 @@ static int __fat_write_inode(struct inode *inode, int wait)
> > }
> > spin_unlock(&sbi->inode_hash_lock);
> > mark_buffer_dirty(bh);
> > + MSDOS_I(inode)->i_metadata_bhs.inode_blk = bh->b_blocknr;
>
> When the inode position is changed or removed, this will point to the
> wrong block, and may sync an unrelated block and wait on it.
So I didn't realize that e.g. rename does change the backing inode block.
But given we set i_metadata_bhs.inode_blk on each inode write, inode_blk
should always contain the current position where the inode was written so
fsync should be syncing the right block. Or am I still missing something?
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH v14 00/15] Exposing case folding behavior
From: Christian Brauner @ 2026-05-11 14:55 UTC (permalink / raw)
To: Chuck Lever
Cc: Christian Brauner, Al Viro, Jan Kara, linux-fsdevel, linux-ext4,
linux-xfs, linux-cifs, linux-nfs, linux-api, linux-f2fs-devel,
hirofumi, linkinjeon, sj1557.seo, yuezhang.mo,
almaz.alexandrovich, slava, glaubitz, frank.li, tytso,
adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Darrick J. Wong,
Roland Mainz, Steve French
In-Reply-To: <20260507-case-sensitivity-v14-0-e62cc8200435@oracle.com>
On Thu, 07 May 2026 04:52:53 -0400, Chuck Lever wrote:
> Christian, let's lock this one in. I will post subsequent changes
> as delta patches.
>
> Following on from:
>
> https://lore.kernel.org/linux-nfs/20251021-zypressen-bazillus-545a44af57fd@brauner/T/#m0ba197d75b7921d994cf284f3cef3a62abb11aaa
>
> [...]
Now on the correct branch and pushed out.
---
Applied to the vfs-7.2.casefold branch of the vfs/vfs.git tree.
Patches in the vfs-7.2.casefold branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: master
[01/15] fs: Move file_kattr initialization to callers
https://git.kernel.org/vfs/vfs/c/14c3197ecf07
[02/15] fs: Add case sensitivity flags to file_kattr
https://git.kernel.org/vfs/vfs/c/3035e4454142
[03/15] fat: Implement fileattr_get for case sensitivity
https://git.kernel.org/vfs/vfs/c/c92db2ca726f
[04/15] exfat: Implement fileattr_get for case sensitivity
https://git.kernel.org/vfs/vfs/c/27e0b573dd4a
[05/15] ntfs3: Implement fileattr_get for case sensitivity
https://git.kernel.org/vfs/vfs/c/eeb7b37b9700
[06/15] hfs: Implement fileattr_get for case sensitivity
https://git.kernel.org/vfs/vfs/c/b6fe046c3023
[07/15] hfsplus: Report case sensitivity in fileattr_get
https://git.kernel.org/vfs/vfs/c/a6469a15eefe
[08/15] xfs: Report case sensitivity in fileattr_get
https://git.kernel.org/vfs/vfs/c/c9da43e4e5c3
[09/15] cifs: Implement fileattr_get for case sensitivity
https://git.kernel.org/vfs/vfs/c/e50bc12f5a36
[10/15] nfs: Implement fileattr_get for case sensitivity
https://git.kernel.org/vfs/vfs/c/92d67628a1a9
[11/15] vboxsf: Implement fileattr_get for case sensitivity
https://git.kernel.org/vfs/vfs/c/ef14aa143f1d
[12/15] isofs: Implement fileattr_get for case sensitivity
https://git.kernel.org/vfs/vfs/c/7bbd51b1d748
[13/15] nfsd: Report export case-folding via NFSv3 PATHCONF
https://git.kernel.org/vfs/vfs/c/211cb2ba4877
[14/15] nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
https://git.kernel.org/vfs/vfs/c/01ee7c3d2e23
[15/15] ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION
https://git.kernel.org/vfs/vfs/c/0164df1d1de7
* Re: [PATCH v14 00/15] Exposing case folding behavior
From: Christian Brauner @ 2026-05-11 14:55 UTC (permalink / raw)
To: Chuck Lever
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Darrick J. Wong,
Roland Mainz, Steve French
In-Reply-To: <705e1769-6a5e-440d-bf50-5e5feec2b88d@oracle.com>
On Mon, May 11, 2026 at 10:07:44AM -0400, Chuck Lever wrote:
> On 5/11/26 10:02 AM, Christian Brauner wrote:
> > On Thu, 07 May 2026 04:52:53 -0400, Chuck Lever wrote:
> >> Christian, let's lock this one in. I will post subsequent changes
> >> as delta patches.
> >>
> >> Following on from:
> >>
> >> https://lore.kernel.org/linux-nfs/20251021-zypressen-bazillus-545a44af57fd@brauner/T/#m0ba197d75b7921d994cf284f3cef3a62abb11aaa
> >>
> >> [...]
> >
> > Applied to the vfs-7.2.exportfs branch of the vfs/vfs.git tree.
> > Patches in the vfs-7.2.exportfs branch should appear in linux-next soon.
>
> Not vfs-7.2-casefold ?
>
> Fwiw, I was intending to rebase nfsd-next on the vfs integration branch,
> which should have both vfs-7.2-casefold and vfs-7.2.exportfs merged in,
> along with Jeff's series that implements the infrastructure to support
> directory delegation properly. LMK if that's crazy talk.
I think you should just merge:
vfs-7.2.exportfs
vfs-7.2.casefold
as the order doesn't matter and they don't depend on each other. I have
marked both branches as shared so they won't get touched again. If more
ends up on either of these branches it doesn't matter to you. IOW, I think
you should just pick the minimum you need and then you don't need to
hear from me again, and by the time you send your pull request all the
prerequisites will already be in Linus' tree.
If you merge in vfs.all you're subject to problems from rebases. So I'd
not recommend that.
* Re: [PATCH 6/9] fat: Fix possibly missing inode write on fsync(2)
From: OGAWA Hirofumi @ 2026-05-11 14:32 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, Christian Brauner, aivazian.tigran, Ted Tso,
linux-ext4
In-Reply-To: <20260511121356.241821-15-jack@suse.cz>
Jan Kara <jack@suse.cz> writes:
> Use mmb inode buffer writeout infrastructure to reliably write out
> inode's buffer on fsync(2).
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/fat/inode.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/fat/inode.c b/fs/fat/inode.c
> index 28f78df086ef..4ca00b7a618b 100644
> --- a/fs/fat/inode.c
> +++ b/fs/fat/inode.c
> @@ -907,6 +907,7 @@ static int __fat_write_inode(struct inode *inode, int wait)
> }
> spin_unlock(&sbi->inode_hash_lock);
> mark_buffer_dirty(bh);
> + MSDOS_I(inode)->i_metadata_bhs.inode_blk = bh->b_blocknr;
When the inode position is changed or removed, this will point to the
wrong block, and may sync an unrelated block and wait on it.
> err = 0;
> if (wait)
> err = sync_dirty_buffer(bh);
> @@ -925,7 +926,7 @@ static int fat_write_inode(struct inode *inode, struct writeback_control *wbc)
> err = fat_clusters_flush(sb);
> mutex_unlock(&MSDOS_SB(sb)->s_lock);
> } else
> - err = __fat_write_inode(inode, wbc->sync_mode == WB_SYNC_ALL);
> + err = __fat_write_inode(inode, 0);
>
> return err;
> }
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
* Re: [PATCH v14 00/15] Exposing case folding behavior
From: Chuck Lever @ 2026-05-11 14:07 UTC (permalink / raw)
To: Christian Brauner
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Darrick J. Wong,
Roland Mainz, Steve French
In-Reply-To: <20260511-wertverlust-vorbringen-070f016f3bd4@brauner>
On 5/11/26 10:02 AM, Christian Brauner wrote:
> On Thu, 07 May 2026 04:52:53 -0400, Chuck Lever wrote:
>> Christian, let's lock this one in. I will post subsequent changes
>> as delta patches.
>>
>> Following on from:
>>
>> https://lore.kernel.org/linux-nfs/20251021-zypressen-bazillus-545a44af57fd@brauner/T/#m0ba197d75b7921d994cf284f3cef3a62abb11aaa
>>
>> [...]
>
> Applied to the vfs-7.2.exportfs branch of the vfs/vfs.git tree.
> Patches in the vfs-7.2.exportfs branch should appear in linux-next soon.
Not vfs-7.2-casefold?
Fwiw, I was intending to rebase nfsd-next on the vfs integration branch,
which should have both vfs-7.2-casefold and vfs-7.2.exportfs merged in,
along with Jeff's series that implements the infrastructure to support
directory delegation properly. LMK if that's crazy talk.
--
Chuck Lever
^ permalink raw reply
* Re: [PATCH v14 00/15] Exposing case folding behavior
From: Christian Brauner @ 2026-05-11 14:02 UTC (permalink / raw)
To: Chuck Lever
Cc: Al Viro, Jan Kara, linux-fsdevel, linux-ext4, linux-xfs,
linux-cifs, linux-nfs, linux-api, linux-f2fs-devel, hirofumi,
linkinjeon, sj1557.seo, yuezhang.mo, almaz.alexandrovich, slava,
glaubitz, frank.li, tytso, adilger.kernel, cem, sfrench, pc,
ronniesahlberg, sprasad, trondmy, anna, jaegeuk, chao, hansg,
senozhatsky, Chuck Lever, Darrick J. Wong, Roland Mainz,
Steve French
In-Reply-To: <20260507-case-sensitivity-v14-0-e62cc8200435@oracle.com>
On Thu, May 07, 2026 at 04:52:53AM -0400, Chuck Lever wrote:
> Christian, let's lock this one in. I will post subsequent changes
> as delta patches.
Perfect!
^ permalink raw reply
* Re: [PATCH v14 00/15] Exposing case folding behavior
From: Christian Brauner @ 2026-05-11 14:02 UTC (permalink / raw)
To: Chuck Lever
Cc: Christian Brauner, linux-fsdevel, linux-ext4, linux-xfs,
linux-cifs, linux-nfs, linux-api, linux-f2fs-devel, hirofumi,
linkinjeon, sj1557.seo, yuezhang.mo, almaz.alexandrovich, slava,
glaubitz, frank.li, tytso, adilger.kernel, cem, sfrench, pc,
ronniesahlberg, sprasad, trondmy, anna, jaegeuk, chao, hansg,
senozhatsky, Darrick J. Wong, Roland Mainz, Steve French
In-Reply-To: <20260507-case-sensitivity-v14-0-e62cc8200435@oracle.com>
On Thu, 07 May 2026 04:52:53 -0400, Chuck Lever wrote:
> Christian, let's lock this one in. I will post subsequent changes
> as delta patches.
>
> Following on from:
>
> https://lore.kernel.org/linux-nfs/20251021-zypressen-bazillus-545a44af57fd@brauner/T/#m0ba197d75b7921d994cf284f3cef3a62abb11aaa
>
> [...]
Applied to the vfs-7.2.exportfs branch of the vfs/vfs.git tree.
Patches in the vfs-7.2.exportfs branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-7.2.exportfs
[01/15] fs: Move file_kattr initialization to callers
https://git.kernel.org/vfs/vfs/c/9d3942fa6a55
[02/15] fs: Add case sensitivity flags to file_kattr
https://git.kernel.org/vfs/vfs/c/72504a889e52
[03/15] fat: Implement fileattr_get for case sensitivity
https://git.kernel.org/vfs/vfs/c/d0d06cfce960
[04/15] exfat: Implement fileattr_get for case sensitivity
https://git.kernel.org/vfs/vfs/c/64a4f2090cb2
[05/15] ntfs3: Implement fileattr_get for case sensitivity
https://git.kernel.org/vfs/vfs/c/5fff53318cbf
[06/15] hfs: Implement fileattr_get for case sensitivity
https://git.kernel.org/vfs/vfs/c/1b25c01375e0
[07/15] hfsplus: Report case sensitivity in fileattr_get
https://git.kernel.org/vfs/vfs/c/b9e976dd58ff
[08/15] xfs: Report case sensitivity in fileattr_get
https://git.kernel.org/vfs/vfs/c/30617d630d2f
[09/15] cifs: Implement fileattr_get for case sensitivity
https://git.kernel.org/vfs/vfs/c/0f372b05c80c
[10/15] nfs: Implement fileattr_get for case sensitivity
https://git.kernel.org/vfs/vfs/c/3ca9954cdc04
[11/15] vboxsf: Implement fileattr_get for case sensitivity
https://git.kernel.org/vfs/vfs/c/0f5f23d411ac
[12/15] isofs: Implement fileattr_get for case sensitivity
https://git.kernel.org/vfs/vfs/c/d56f6094035c
[13/15] nfsd: Report export case-folding via NFSv3 PATHCONF
https://git.kernel.org/vfs/vfs/c/5ca2c8f14428
[14/15] nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
https://git.kernel.org/vfs/vfs/c/62c9555937ca
[15/15] ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION
https://git.kernel.org/vfs/vfs/c/35188379f010
^ permalink raw reply
* Re: [PATCH 9/9] ext4: Use mmb infrastructure for inode buffer writeout
From: Christian Brauner @ 2026-05-11 13:30 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, aivazian.tigran, OGAWA Hirofumi, Ted Tso,
linux-ext4
In-Reply-To: <20260511121356.241821-18-jack@suse.cz>
On Mon, May 11, 2026 at 02:13:59PM +0200, Jan Kara wrote:
> Use mmb inode buffer writeout infrastructure to reliably write out
> inode's inode table block on fsync(2) in nojournal mode (from
> ext4_sync_parent() and ext4_fsync_nojournal()). This significantly
> simplifies the code as we don't have to explicitly handle inode buffer
> writeback in ext4_write_inode() and thus we can also remove
> sync_inode_metadata() calls from ext4_sync_parent() and
> ext4_write_inode() call from ext4_fsync_nojournal().
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/ext4/ext4_jbd2.c | 2 +-
> fs/ext4/ext4_jbd2.h | 2 ++
> fs/ext4/fsync.c | 12 ------------
> fs/ext4/inode.c | 24 +++++-------------------
> 4 files changed, 8 insertions(+), 32 deletions(-)
>
> diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
> index 74f05bd0cdde..6bbaf72108fd 100644
> --- a/fs/ext4/ext4_jbd2.c
> +++ b/fs/ext4/ext4_jbd2.c
> @@ -350,7 +350,7 @@ int __ext4_journal_get_create_access(const char *where, unsigned int line,
> return 0;
> }
>
> -static void ext4_inode_attach_mmb(struct inode *inode)
> +void ext4_inode_attach_mmb(struct inode *inode)
> {
> struct mapping_metadata_bhs *mmb;
>
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index 63d17c5201b5..2a01b8279c88 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -122,6 +122,8 @@
> #define EXT4_HT_EXT_CONVERT 11
> #define EXT4_HT_MAX 12
>
> +void ext4_inode_attach_mmb(struct inode *inode);
> +
> int
> ext4_mark_iloc_dirty(handle_t *handle,
> struct inode *inode,
> diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> index e25d365e1179..af84489e57c6 100644
> --- a/fs/ext4/fsync.c
> +++ b/fs/ext4/fsync.c
> @@ -75,9 +75,6 @@ static int ext4_sync_parent(struct inode *inode)
> if (ret)
> break;
> }
> - ret = sync_inode_metadata(inode, 1);
> - if (ret)
> - break;
> }
> dput(dentry);
> return ret;
> @@ -87,10 +84,6 @@ static int ext4_fsync_nojournal(struct file *file, loff_t start, loff_t end,
> int datasync, bool *needs_barrier)
> {
> struct inode *inode = file->f_inode;
> - struct writeback_control wbc = {
> - .sync_mode = WB_SYNC_ALL,
> - .nr_to_write = 0,
> - };
> int ret;
>
> ret = mmb_fsync_noflush(file, EXT4_I(inode)->i_metadata_bhs,
> @@ -98,11 +91,6 @@ static int ext4_fsync_nojournal(struct file *file, loff_t start, loff_t end,
> if (ret)
> return ret;
>
> - /* Force writeout of inode table buffer to disk */
> - ret = ext4_write_inode(inode, &wbc);
> - if (ret)
> - return ret;
> -
> ret = ext4_sync_parent(inode);
>
> if (test_opt(inode->i_sb, BARRIER))
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 3e66e9510909..09506b4de1b2 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5786,24 +5786,6 @@ int ext4_write_inode(struct inode *inode, struct writeback_control *wbc)
>
> err = ext4_fc_commit(EXT4_SB(inode->i_sb)->s_journal,
> EXT4_I(inode)->i_sync_tid);
> - } else {
> - struct ext4_iloc iloc;
> -
> - err = __ext4_get_inode_loc_noinmem(inode, &iloc);
> - if (err)
> - return err;
> - /*
> - * sync(2) will flush the whole buffer cache. No need to do
> - * it here separately for each inode.
> - */
> - if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync)
> - sync_dirty_buffer(iloc.bh);
> - if (buffer_req(iloc.bh) && !buffer_uptodate(iloc.bh)) {
> - ext4_error_inode_block(inode, iloc.bh->b_blocknr, EIO,
> - "IO error syncing inode");
> - err = -EIO;
> - }
> - brelse(iloc.bh);
> }
> return err;
> }
> @@ -6348,7 +6330,11 @@ int ext4_mark_iloc_dirty(handle_t *handle,
>
> /* the do_update_inode consumes one bh->b_count */
> get_bh(iloc->bh);
> -
> + if (!ext4_handle_valid(handle)) {
> + if (!EXT4_I(inode)->i_metadata_bhs)
> + ext4_inode_attach_mmb(inode);
> + EXT4_I(inode)->i_metadata_bhs->inode_blk = iloc->bh->b_blocknr;
The series is great overall. The only thing I think we should change is
that we should hide this
EXT4_I(inode)->i_metadata_bhs->inode_blk = iloc->bh->b_blocknr;
behind a dedicated static inline/regular function call instead of
open-coding it everywhere. Can then also be paired with some
VFS_WARN_ON_ONCE() to detect garbage bh->b_blocknr.
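
A minimal userspace sketch of what such a helper could look like follows. It is not taken from the series: the name `mmb_set_inode_blk()`, the `warn_count` counter, and the choice to treat only the `INVALID_BLK` sentinel as "garbage" are assumptions made for illustration, and the `fprintf()` call merely stands in for `VFS_WARN_ON_ONCE()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-ins; the kernel types are richer than this. */
typedef uint64_t sector_t;
#define INVALID_BLK (~0ULL)

struct mapping_metadata_bhs {
	sector_t inode_blk;	/* block holding the inode, or INVALID_BLK */
};

static unsigned int warn_count;	/* lets callers observe rejected values */

/* Hypothetical helper: the single place that records the inode's block. */
static inline bool mmb_set_inode_blk(struct mapping_metadata_bhs *mmb,
				     sector_t blk)
{
	assert(mmb != NULL);		/* caller must have attached an mmb */
	if (blk == INVALID_BLK) {	/* stand-in for VFS_WARN_ON_ONCE() */
		if (warn_count++ == 0)
			fprintf(stderr, "mmb: refusing invalid inode block\n");
		return false;
	}
	mmb->inode_blk = blk;
	return true;
}
```

With a helper of this shape, each filesystem would call one function instead of open-coding the assignment, and any sanity check would live in a single place.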
^ permalink raw reply
* Re: [PATCH 3/9] fs: Writeout inode buffer from mmb_sync()
From: Christian Brauner @ 2026-05-11 13:27 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, aivazian.tigran, OGAWA Hirofumi, Ted Tso,
linux-ext4
In-Reply-To: <20260511121356.241821-12-jack@suse.cz>
On Mon, May 11, 2026 at 02:13:53PM +0200, Jan Kara wrote:
> Currently metadata bh tracking does not track inode buffers because they
> are usually shared by several inodes and so our linked list tracking
> cannot be used. On fsync we call sync_inode_metadata() to write inode
> instead where filesystems' .write_inode methods detect data integrity
> writeback and take care to submit inode buffer to disk and wait for it
> in that case. This is however racy as for example flush worker can
> submit normal (WB_SYNC_NONE) inode writeback first, which makes the
> inode clean and copies the inode to the buffer but doesn't submit the
> buffer for IO. Thus sync_inode_metadata() call does nothing and we fail
> to persist inode buffer to disk on fsync(2).
>
> Fix the problem by allowing the filesystem to record the number of the
> block backing the inode in the mmb structure; mmb_sync() then takes care
> to write out the corresponding buffer and wait for it.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/buffer.c | 34 +++++++++++++++++++++++-----------
> include/linux/fs.h | 1 +
> 2 files changed, 24 insertions(+), 11 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index b0b3792b1496..dba29a45346b 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -477,12 +477,14 @@ EXPORT_SYMBOL(mark_buffer_async_write);
> * using RCU, grab the lock, verify we didn't race with somebody detaching the
> * bh / moving it to different inode and only then proceeding.
> */
> +#define INVALID_BLK (~0ULL)
>
> void mmb_init(struct mapping_metadata_bhs *mmb, struct address_space *mapping)
> {
> spin_lock_init(&mmb->lock);
> INIT_LIST_HEAD(&mmb->list);
> mmb->mapping = mapping;
> + mmb->inode_blk = INVALID_BLK;
> }
> EXPORT_SYMBOL(mmb_init);
>
> @@ -593,8 +595,18 @@ int mmb_sync(struct mapping_metadata_bhs *mmb)
> }
> }
> }
> -
> spin_unlock(&mmb->lock);
> +
> + /* Writeout inode buffer head */
> + if (mmb->inode_blk != INVALID_BLK) {
> + bh = sb_find_get_block(mmb->mapping->host->i_sb, mmb->inode_blk);
> + write_dirty_buffer(bh, REQ_SYNC);
> + wait_on_buffer(bh);
> + if (!buffer_uptodate(bh))
> + err = -EIO;
> + brelse(bh);
> + }
> +
> blk_finish_plug(&plug);
> spin_lock(&mmb->lock);
>
> @@ -646,18 +658,18 @@ int mmb_fsync_noflush(struct file *file, struct mapping_metadata_bhs *mmb,
> if (err)
> return err;
>
> - if (mmb)
> - ret = mmb_sync(mmb);
> if (!(inode_state_read_once(inode) & I_DIRTY_ALL))
> - goto out;
> + goto sync_buffers;
> if (datasync && !(inode_state_read_once(inode) & I_DIRTY_DATASYNC))
> - goto out;
> -
> - err = sync_inode_metadata(inode, 1);
> - if (ret == 0)
> - ret = err;
> -
> -out:
> + goto sync_buffers;
> +
> + ret = sync_inode_metadata(inode, 1);
> +sync_buffers:
> + if (mmb) {
> + err = mmb_sync(mmb);
> + if (ret == 0)
> + ret = err;
> + }
> /* check and advance again to catch errors after syncing out buffers */
> err = file_check_and_advance_wb_err(file);
> if (ret == 0)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 11559c513dfb..435a41e4c90f 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -446,6 +446,7 @@ extern const struct address_space_operations empty_aops;
> /* Structure for tracking metadata buffer heads associated with the mapping */
> struct mapping_metadata_bhs {
> struct address_space *mapping; /* Mapping bhs are associated with */
> + sector_t inode_blk; /* Number of block containing the inode */
This is great, thanks!
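
The race described in the quoted commit message can be modelled in a few lines of userspace C. This is a deliberately simplified sketch, not the kernel code: `model_inode`, `write_inode()`, `fsync_old()` and `fsync_new()` are hypothetical names, and three booleans stand in for the inode's dirty state, the buffer's dirty state, and whether the latest inode contents reached the disk.

```c
#include <stdbool.h>

struct model_inode {
	bool inode_dirty;	/* in-memory inode differs from its buffer */
	bool buffer_dirty;	/* buffer differs from what is on disk */
	bool on_disk;		/* latest inode state reached the disk */
};

enum sync_mode { WB_SYNC_NONE, WB_SYNC_ALL };

/* Model of a .write_inode method that only submits I/O for WB_SYNC_ALL. */
static void write_inode(struct model_inode *i, enum sync_mode mode)
{
	if (!i->inode_dirty)
		return;			/* clean inode: nothing to do */
	i->inode_dirty = false;		/* copy the inode into its buffer */
	i->buffer_dirty = true;
	if (mode == WB_SYNC_ALL) {	/* data integrity: submit and wait */
		i->buffer_dirty = false;
		i->on_disk = true;
	}
}

/* Old fsync path in this model: relies on write_inode() alone. */
static void fsync_old(struct model_inode *i)
{
	write_inode(i, WB_SYNC_ALL);
}

/* Patched path in this model: also writes out the recorded buffer. */
static void fsync_new(struct model_inode *i)
{
	write_inode(i, WB_SYNC_ALL);
	if (i->buffer_dirty) {		/* models the mmb_sync() addition */
		i->buffer_dirty = false;
		i->on_disk = true;
	}
}
```

If a `WB_SYNC_NONE` writeback runs first, `fsync_old()` finds the inode clean and returns with the buffer still dirty, whereas `fsync_new()` writes the buffer regardless of which writeback cleaned the inode.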
^ permalink raw reply
* Re: [PATCH v2] iomap: add simple read path for small direct I/O
From: changfengnan @ 2026-05-11 12:36 UTC (permalink / raw)
To: Christoph Hellwig
Cc: brauner, djwong, hch, ojaswin, dgc, linux-xfs, linux-fsdevel,
linux-ext4, linux-kernel, lidiangang
In-Reply-To: <agHJoGlwx4MxTQbr@infradead.org>
> From: "Christoph Hellwig"<hch@infradead.org>
> Date: Mon, May 11, 2026, 20:21
> Subject: Re: [PATCH v2] iomap: add simple read path for small direct I/O
> To: "changfengnan"<changfengnan@bytedance.com>
> Cc: <brauner@kernel.org>, <djwong@kernel.org>, <hch@infradead.org>, <ojaswin@linux.ibm.com>, <dgc@kernel.org>, <linux-xfs@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>, <linux-ext4@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <lidiangang@bytedance.com>
> On Mon, May 11, 2026 at 08:09:51PM +0800, changfengnan wrote:
> > Ping.
>
> Assisted-by: AI slop is at the end of my todo list sorry.
Got it.
>
> If you take all responsibility for it and understand what you are doing,
> please drop it. If not it'll need to wait until everyone else is
> served.
I fully understand what I’m editing, and I’ll take full responsibility for it.
I used AI to help me review and test this, and I’ll remove the tag in a later version.
>
^ permalink raw reply
* Re: [PATCH v2] iomap: add simple read path for small direct I/O
From: Christoph Hellwig @ 2026-05-11 12:20 UTC (permalink / raw)
To: changfengnan
Cc: brauner, djwong, hch, ojaswin, dgc, linux-xfs, linux-fsdevel,
linux-ext4, linux-kernel, lidiangang
In-Reply-To: <d9210bcdf73fbe1ac8b6ec132865609a3ed68688.4a56b91c.8a12.4668.b740.76ff3633f48f@bytedance.com>
On Mon, May 11, 2026 at 08:09:51PM +0800, changfengnan wrote:
> Ping.
Assisted-by: AI slop is at the end of my todo list sorry.
If you take all responsibility for it and understand what you are doing,
please drop it. If not it'll need to wait until everyone else is
served.
^ permalink raw reply
* [PATCH 9/9] ext4: Use mmb infrastructure for inode buffer writeout
From: Jan Kara @ 2026-05-11 12:13 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, aivazian.tigran, OGAWA Hirofumi, Ted Tso,
linux-ext4, Jan Kara
In-Reply-To: <20260511115725.28441-1-jack@suse.cz>
Use mmb inode buffer writeout infrastructure to reliably write out
inode's inode table block on fsync(2) in nojournal mode (from
ext4_sync_parent() and ext4_fsync_nojournal()). This significantly
simplifies the code as we don't have to explicitly handle inode buffer
writeback in ext4_write_inode() and thus we can also remove
sync_inode_metadata() calls from ext4_sync_parent() and
ext4_write_inode() call from ext4_fsync_nojournal().
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/ext4_jbd2.c | 2 +-
fs/ext4/ext4_jbd2.h | 2 ++
fs/ext4/fsync.c | 12 ------------
fs/ext4/inode.c | 24 +++++-------------------
4 files changed, 8 insertions(+), 32 deletions(-)
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 74f05bd0cdde..6bbaf72108fd 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -350,7 +350,7 @@ int __ext4_journal_get_create_access(const char *where, unsigned int line,
return 0;
}
-static void ext4_inode_attach_mmb(struct inode *inode)
+void ext4_inode_attach_mmb(struct inode *inode)
{
struct mapping_metadata_bhs *mmb;
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 63d17c5201b5..2a01b8279c88 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -122,6 +122,8 @@
#define EXT4_HT_EXT_CONVERT 11
#define EXT4_HT_MAX 12
+void ext4_inode_attach_mmb(struct inode *inode);
+
int
ext4_mark_iloc_dirty(handle_t *handle,
struct inode *inode,
diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index e25d365e1179..af84489e57c6 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -75,9 +75,6 @@ static int ext4_sync_parent(struct inode *inode)
if (ret)
break;
}
- ret = sync_inode_metadata(inode, 1);
- if (ret)
- break;
}
dput(dentry);
return ret;
@@ -87,10 +84,6 @@ static int ext4_fsync_nojournal(struct file *file, loff_t start, loff_t end,
int datasync, bool *needs_barrier)
{
struct inode *inode = file->f_inode;
- struct writeback_control wbc = {
- .sync_mode = WB_SYNC_ALL,
- .nr_to_write = 0,
- };
int ret;
ret = mmb_fsync_noflush(file, EXT4_I(inode)->i_metadata_bhs,
@@ -98,11 +91,6 @@ static int ext4_fsync_nojournal(struct file *file, loff_t start, loff_t end,
if (ret)
return ret;
- /* Force writeout of inode table buffer to disk */
- ret = ext4_write_inode(inode, &wbc);
- if (ret)
- return ret;
-
ret = ext4_sync_parent(inode);
if (test_opt(inode->i_sb, BARRIER))
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3e66e9510909..09506b4de1b2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5786,24 +5786,6 @@ int ext4_write_inode(struct inode *inode, struct writeback_control *wbc)
err = ext4_fc_commit(EXT4_SB(inode->i_sb)->s_journal,
EXT4_I(inode)->i_sync_tid);
- } else {
- struct ext4_iloc iloc;
-
- err = __ext4_get_inode_loc_noinmem(inode, &iloc);
- if (err)
- return err;
- /*
- * sync(2) will flush the whole buffer cache. No need to do
- * it here separately for each inode.
- */
- if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync)
- sync_dirty_buffer(iloc.bh);
- if (buffer_req(iloc.bh) && !buffer_uptodate(iloc.bh)) {
- ext4_error_inode_block(inode, iloc.bh->b_blocknr, EIO,
- "IO error syncing inode");
- err = -EIO;
- }
- brelse(iloc.bh);
}
return err;
}
@@ -6348,7 +6330,11 @@ int ext4_mark_iloc_dirty(handle_t *handle,
/* the do_update_inode consumes one bh->b_count */
get_bh(iloc->bh);
-
+ if (!ext4_handle_valid(handle)) {
+ if (!EXT4_I(inode)->i_metadata_bhs)
+ ext4_inode_attach_mmb(inode);
+ EXT4_I(inode)->i_metadata_bhs->inode_blk = iloc->bh->b_blocknr;
+ }
/* ext4_do_update_inode() does jbd2_journal_dirty_metadata */
err = ext4_do_update_inode(handle, inode, iloc);
put_bh(iloc->bh);
--
2.51.0
^ permalink raw reply related
* [PATCH 8/9] bfs: Fix possibly missing inode write on fsync(2)
From: Jan Kara @ 2026-05-11 12:13 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, aivazian.tigran, OGAWA Hirofumi, Ted Tso,
linux-ext4, Jan Kara
In-Reply-To: <20260511115725.28441-1-jack@suse.cz>
Use mmb inode buffer writeout infrastructure to reliably write out
inode's buffer on fsync(2).
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/bfs/inode.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c
index 19e49c8cf750..16d351b2f122 100644
--- a/fs/bfs/inode.c
+++ b/fs/bfs/inode.c
@@ -165,11 +165,7 @@ static int bfs_write_inode(struct inode *inode, struct writeback_control *wbc)
di->i_eoffset = cpu_to_le32(i_sblock * BFS_BSIZE + inode->i_size - 1);
mark_buffer_dirty(bh);
- if (wbc->sync_mode == WB_SYNC_ALL) {
- sync_dirty_buffer(bh);
- if (buffer_req(bh) && !buffer_uptodate(bh))
- err = -EIO;
- }
+ BFS_I(inode)->i_metadata_bhs.inode_blk = bh->b_blocknr;
brelse(bh);
mutex_unlock(&info->bfs_lock);
return err;
--
2.51.0
^ permalink raw reply related
* [PATCH 7/9] minix: Fix possibly missing inode write on fsync(2)
From: Jan Kara @ 2026-05-11 12:13 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, aivazian.tigran, OGAWA Hirofumi, Ted Tso,
linux-ext4, Jan Kara
In-Reply-To: <20260511115725.28441-1-jack@suse.cz>
Use mmb inode buffer writeout infrastructure to reliably write out
inode's buffer on fsync(2).
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/minix/inode.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/fs/minix/inode.c b/fs/minix/inode.c
index 9c6bac248907..e3e05c9308bd 100644
--- a/fs/minix/inode.c
+++ b/fs/minix/inode.c
@@ -693,14 +693,7 @@ static int minix_write_inode(struct inode *inode, struct writeback_control *wbc)
bh = V2_minix_update_inode(inode);
if (!bh)
return -EIO;
- if (wbc->sync_mode == WB_SYNC_ALL && buffer_dirty(bh)) {
- sync_dirty_buffer(bh);
- if (buffer_req(bh) && !buffer_uptodate(bh)) {
- printk("IO error syncing minix inode [%s:%08llx]\n",
- inode->i_sb->s_id, inode->i_ino);
- err = -EIO;
- }
- }
+ minix_i(inode)->i_metadata_bhs.inode_blk = bh->b_blocknr;
brelse (bh);
return err;
}
--
2.51.0
^ permalink raw reply related
* [PATCH 6/9] fat: Fix possibly missing inode write on fsync(2)
From: Jan Kara @ 2026-05-11 12:13 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, aivazian.tigran, OGAWA Hirofumi, Ted Tso,
linux-ext4, Jan Kara
In-Reply-To: <20260511115725.28441-1-jack@suse.cz>
Use mmb inode buffer writeout infrastructure to reliably write out
inode's buffer on fsync(2).
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/fat/inode.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 28f78df086ef..4ca00b7a618b 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -907,6 +907,7 @@ static int __fat_write_inode(struct inode *inode, int wait)
}
spin_unlock(&sbi->inode_hash_lock);
mark_buffer_dirty(bh);
+ MSDOS_I(inode)->i_metadata_bhs.inode_blk = bh->b_blocknr;
err = 0;
if (wait)
err = sync_dirty_buffer(bh);
@@ -925,7 +926,7 @@ static int fat_write_inode(struct inode *inode, struct writeback_control *wbc)
err = fat_clusters_flush(sb);
mutex_unlock(&MSDOS_SB(sb)->s_lock);
} else
- err = __fat_write_inode(inode, wbc->sync_mode == WB_SYNC_ALL);
+ err = __fat_write_inode(inode, 0);
return err;
}
--
2.51.0
^ permalink raw reply related