[f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap
@ 2025-08-13  9:21 Nanzhe Zhao
  2025-08-13  9:21 ` [RFC PATCH 1/9] f2fs: Introduce f2fs_iomap_folio_state Nanzhe Zhao
                   ` (9 more replies)
  0 siblings, 10 replies; 14+ messages in thread
From: Nanzhe Zhao @ 2025-08-13  9:21 UTC (permalink / raw)
  To: Jaegeuk Kim, linux-f2fs, linux-fsdevel
  Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=y, Size: 6441 bytes --]

This RFC series enable buffered read/write paths large folio support
with F2FS-specific extended iomap, combined with some other preparation
work for large folio integration.

Because this is my first time to send a patch series to the kernel
mailing list,
I might have missed some conventions.
The patch passes checkpatch.pl with no errors,
but a few warnings remain.
I wasn't sure about the best way to address them,
so I would appreciate your guidance.
I am happy to fix them if needed.

Motivations:

* **Why iomap**:
  * F2FS couples pages directly to BIOs without a per-block tracking
    struct like buffer-head or sub-page.
    A naive large-folio port would cause:
    * Write-amplification.
    * Premature folio_end_read() / folio_end_writeback()
      when multi sub-ranges of a large folio are in io concurrently.
    Above issues has already been handled cleanly by iomap_folio_state.

  * Original buffered write path unlocks a folio halfway,causes status
    recheck for large folios carried with iomap_folio_state
    or partially trucnated folio tricky.iomap handles all locking
    unlocking operations automatically.

* **Why extends iomap**
  * F2FS stores its flags in the folio's private field,
    which conflicts with iomap_folio_state.
  * To resolve this, we designed f2fs_iomap_folio_state,
    compatible with iomap_folio_state's layout while extending
    its flexible state array for F2FS private flags.
  * We store a magic number in read_bytes_pending to distinguish
    whether a folio uses the original or F2FS's iomap_folio_state.
    It's chosen because it remains 0 after readahead completes.

Implementation notes:

* New Kconfig: `CONFIG_F2FS_IOMAP_FOLIO_STATE`; when off,falls back
  to the legacy buffered io path.

Limitations

* Don't support BLOCK_SIZE > PAGE_SIZE now.
* Don't support large folios for encrypted and fsverity files.
* Page writeback and compressed files large folios support is still WIP.

Why RFC:

* Need review and potential improvement on
 `f2fs_iomap_folio_state` design and implementation.
* Limited test coverage so far.Any extra testing is highly appreciated.
* Two runtime issues remain (see below).

Performance Testing:

* Platform: x86-64 laptop (PCIe 4.0 NVMe) -> qemu-arm64 VM, 4 GiB RAM
* Kernel: gcc-13.2, defconfig + `CONFIG_F2FS_IOMAP_FOLIO_STATE=y`
fio 3.35, `ioengine=psync`, `size=1G`, `numjobs=1`

Read throughput (MiB/s):

--- Kernel: iomap_v1 file type: normal ---
Block Size (bs)      | Avg. Bandwidth (MiB/s)       | Avg. IOPS
---------------------+------------------------------+-----------------
100M                 | 2809.60                      | 27.50
10M                  | 3184.60                      | 317.90
128k                 | 1376.20                      | 11000.80
1G                   | 1954.70                      | 1.20
1M                   | 2717.00                      | 2716.70
4k                   | 616.50                       | 157800.00

--- Kernel: vanilla  file type: normal ---
Block Size (bs)      | Avg. Bandwidth (MiB/s)       | Avg. IOPS
---------------------+------------------------------+-----------------
100M                 | 994.60                       | 9.60
10M                  | 986.50                       | 98.10
128k                 | 693.80                       | 5550.90
1G                   | 816.90                       | 0.00
1M                   | 968.90                       | 968.40
4k                   | 429.80                       | 109990.00
--- Kernel: iomap_v1 file type: hole ---
Block Size (bs)      | Avg. Bandwidth (MiB/s)       | Avg. IOPS
---------------------+------------------------------+-----------------
100M                 | 1825.60                      | 17.70
10M                  | 1989.24                      | 198.42
1G                   | 1312.80                      | 0.90
1M                   | 2326.02                      | 2325.42
4k                   | 799.40                       | 204700.00

--- Kernel: vanilla file type: hole ---
Block Size (bs)      | Avg. Bandwidth (MiB/s)       | Avg. IOPS
---------------------+------------------------------+-----------------
100M                 | 708.90                       | 6.50
10M                  | 735.00                       | 73.10
128k                 | 786.70                       | 6292.20
1G                   | 613.20                       | 0.00
1M                   | 764.50                       | 764.25
4k                   | 478.80                       | 122400.00

Sparse-file numbers on qemu look skewed; further bare-metal tests planned.

Write benchmarks are currently blocked by the issues below.

Known issues (help appreciated)

**Write throttling stalls**
  ```sh
  dd if=/dev/zero of=test.img bs=1G count=1 conv=fsync
  ```
  Write speed decays; task spins in `iomap_write_iter`
  ->`balance_dirty_pages_ratelimited_flags`.

**fsync dead-lock**
  ```sh
  fio --rw=write --bs=4K --fsync=1 --size=1G --ioengine=psync …
  ```
  Task Hangs in `f2fs_issue_flush`->'submit_bio_wait'

Full traces will be posted in a follow-up.

Nanzhe Zhao (9):
  f2fs: Introduce f2fs_iomap_folio_state
  f2fs: Integrate f2fs_iomap_folio_state into f2fs page private helpers
  f2fs: Using `folio_detach_f2fs_private` in invalidate and release
    folio
  f2fs: Convert outplace write path page private funcions to folio
    private functions.
  f2fs:Refactor `f2fs_is_compressed_page` to `f2fs_is_compressed_folio`
  f2fs: Extend f2fs_io_info to support sub-folio ranges
  f2fs:Make GC aware of large folios
  f2fs: Introduce F2FS_GET_BLOCK_IOMAP and map_blocks he lpers
  f2fs: Enable buffered read/write path large folios support for normal
    and atomic file with iomap

 fs/f2fs/Kconfig    |  10 ++
 fs/f2fs/Makefile   |   1 +
 fs/f2fs/compress.c |  11 +-
 fs/f2fs/data.c     | 389 ++++++++++++++++++++++++++++++++++++------
 fs/f2fs/f2fs.h     | 412 ++++++++++++++++++++++++++++++++++-----------
 fs/f2fs/f2fs_ifs.c | 221 ++++++++++++++++++++++++
 fs/f2fs/f2fs_ifs.h |  79 +++++++++
 fs/f2fs/file.c     |  33 +++-
 fs/f2fs/gc.c       |  37 ++--
 fs/f2fs/inline.c   |  15 +-
 fs/f2fs/inode.c    |  27 +++
 fs/f2fs/namei.c    |   7 +
 fs/f2fs/segment.c  |   2 +-
 fs/f2fs/super.c    |   3 +
 14 files changed, 1082 insertions(+), 165 deletions(-)
 create mode 100644 fs/f2fs/f2fs_ifs.c
 create mode 100644 fs/f2fs/f2fs_ifs.h


base-commit: b45116aef78ff0059abf563b339e62a734487a50
--
2.34.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC PATCH 1/9] f2fs: Introduce f2fs_iomap_folio_state
  2025-08-13  9:21 [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
@ 2025-08-13  9:21 ` Nanzhe Zhao
  2025-08-13  9:21 ` [RFC PATCH 2/9] f2fs: Integrate f2fs_iomap_folio_state into f2fs page private helpers Nanzhe Zhao
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Nanzhe Zhao @ 2025-08-13  9:21 UTC (permalink / raw)
  To: Jaegeuk Kim, linux-f2fs, linux-fsdevel
  Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao

Add f2fs's own per-folio structure to track
per-block dirty state of a folio.

The reason for introducing this structure is that f2fs's private flag
would conflict with iomap_folio_state's use of the folio->private field.
Thanks to Mr. Matthew for providing the idea. See for details:
[https://lore.kernel.org/linux-f2fs-devel/Z-oPTUrF7kkhzJg_
@casper.infradead.org/]

The memory layout of this structure is the same as iomap_folio_state,
except that we set read_bytes_pending to a magic number. This is because
we need to be able to distinguish it from the original iomap_folio_state.
We additionally allocate an unsigned long at the end of the state array
to store f2fs-specific flags.

This implementation is compatible with high-order folios, order-0 folios,
and metadata folios.
However, it does not support compressed data folios.

Introduction to related functions:

- f2fs_ifs_alloc: Allocates f2fs's own f2fs_iomap_folio_state. If it
  detects that folio->private already has a value, we distinguish
  whether it is f2fs's own flag value or an iomap_folio_state. If it is
  the latter, we will copy its content to our f2fs_iomap_folio_state
  and then free it.

- folio_detach_f2fs_private: Serves as a unified interface to release
  f2fs's private resources, no matter what it is.

- f2fs_ifs_clear_range_uptodate && f2fs_ifs_set_range_dirty: Helper
  functions copied and slightly modified from fs/iomap.

- folio_get_f2fs_ifs: Specifically used to get f2fs_iomap_folio_state.
  It cannot be used to get f2fs's own fields used on compressed folios.
  For the former, we return a null pointer to indicate that the current
  folio does not hold an f2fs_iomap_folio_state. For the latter, we
  directly BUG_ON.

Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
 fs/f2fs/Kconfig    |  10 ++
 fs/f2fs/Makefile   |   1 +
 fs/f2fs/f2fs_ifs.c | 221 +++++++++++++++++++++++++++++++++++++++++++++
 fs/f2fs/f2fs_ifs.h |  79 ++++++++++++++++
 4 files changed, 311 insertions(+)
 create mode 100644 fs/f2fs/f2fs_ifs.c
 create mode 100644 fs/f2fs/f2fs_ifs.h

diff --git a/fs/f2fs/Kconfig b/fs/f2fs/Kconfig
index 5916a02fb46d..480b8536fa39 100644
--- a/fs/f2fs/Kconfig
+++ b/fs/f2fs/Kconfig
@@ -150,3 +150,13 @@ config F2FS_UNFAIR_RWSEM
 	help
 	  Use unfair rw_semaphore, if system configured IO priority by block
 	  cgroup.
+
+config F2FS_IOMAP_FOLIO_STATE
+	bool "F2FS folio per-block I/O state tracking"
+	depends on F2FS_FS && FS_IOMAP
+	help
+	  Enable a custom F2FS structure for tracking the I/O state
+	  (up-to-date, dirty) on a per-block basis within a memory folio.
+	  This structure stores F2FS private flag in its state flexible
+	  array while keeping compatibility with generic iomap_folio_state.
+	  Must be enabled if using iomap large folios support in F2FS.
\ No newline at end of file
diff --git a/fs/f2fs/Makefile b/fs/f2fs/Makefile
index 8a7322d229e4..3b9270d774e8 100644
--- a/fs/f2fs/Makefile
+++ b/fs/f2fs/Makefile
@@ -10,3 +10,4 @@ f2fs-$(CONFIG_F2FS_FS_POSIX_ACL) += acl.o
 f2fs-$(CONFIG_FS_VERITY) += verity.o
 f2fs-$(CONFIG_F2FS_FS_COMPRESSION) += compress.o
 f2fs-$(CONFIG_F2FS_IOSTAT) += iostat.o
+f2fs-$(CONFIG_F2FS_IOMAP_FOLIO_STATE) += f2fs_ifs.o
diff --git a/fs/f2fs/f2fs_ifs.c b/fs/f2fs/f2fs_ifs.c
new file mode 100644
index 000000000000..6b7503474580
--- /dev/null
+++ b/fs/f2fs/f2fs_ifs.c
@@ -0,0 +1,221 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/fs.h>
+#include <linux/f2fs_fs.h>
+
+#include "f2fs.h"
+#include "f2fs_ifs.h"
+
+/*
+ * Have to set parameter ifs's type to void*
+ * and have to interpret ifs as f2fs_ifs to access its fields because
+ * we cannot see iomap_folio_state definition
+ */
+static void ifs_to_f2fs_ifs(void *ifs, struct f2fs_iomap_folio_state *fifs,
+			    struct folio *folio)
+{
+	struct f2fs_iomap_folio_state *src_ifs =
+		(struct f2fs_iomap_folio_state *)ifs;
+	size_t iomap_longs = f2fs_ifs_iomap_longs(folio);
+
+	fifs->read_bytes_pending = READ_ONCE(src_ifs->read_bytes_pending);
+	atomic_set(&fifs->write_bytes_pending,
+		   atomic_read(&src_ifs->write_bytes_pending));
+	memcpy(fifs->state, src_ifs->state,
+	       iomap_longs * sizeof(unsigned long));
+}
+
+static inline bool is_f2fs_ifs(struct folio *folio)
+{
+	struct f2fs_iomap_folio_state *fifs;
+
+	if (!folio_test_private(folio))
+		return false;
+
+	// first directly test no pointer flag is set or not
+	if (test_bit(PAGE_PRIVATE_NOT_POINTER,
+		     (unsigned long *)&folio->private))
+		return false;
+
+	fifs = (struct f2fs_iomap_folio_state *)folio->private;
+	if (!fifs)
+		return false;
+
+	if (READ_ONCE(fifs->read_bytes_pending) == F2FS_IFS_MAGIC)
+		return true;
+
+	return false;
+}
+
+struct f2fs_iomap_folio_state *f2fs_ifs_alloc(struct folio *folio, gfp_t gfp,
+					      bool force_alloc)
+{
+	struct inode *inode = folio->mapping->host;
+	size_t alloc_size = 0;
+
+	if (!folio_test_large(folio)) {
+		if (!force_alloc) {
+			WARN_ON_ONCE(1);
+			return NULL;
+		}
+		/*
+		 * GC can store private flag in 0 order folio's folio->private
+		 * causes iomap buffered write mistakenly interpret as a pointer
+		 * we add a bool force_alloc to deal with this case
+		 */
+		struct f2fs_iomap_folio_state *fifs;
+
+		alloc_size = sizeof(*fifs) + 2 * sizeof(unsigned long);
+		fifs = kmalloc(alloc_size, gfp);
+		if (!fifs)
+			return NULL;
+		spin_lock_init(&fifs->state_lock);
+		WRITE_ONCE(fifs->read_bytes_pending, F2FS_IFS_MAGIC);
+		atomic_set(&fifs->write_bytes_pending, 0);
+		unsigned int nr_blocks =
+			i_blocks_per_folio(inode, folio);
+		if (folio_test_uptodate(folio))
+			bitmap_set(fifs->state, 0, nr_blocks);
+		if (folio_test_dirty(folio))
+			bitmap_set(fifs->state, nr_blocks, nr_blocks);
+		*f2fs_ifs_private_flags_ptr(fifs, folio) = 0;
+		folio_attach_private(folio, fifs);
+		return fifs;
+	}
+
+	struct f2fs_iomap_folio_state *fifs;
+	void *old_private;
+	size_t iomap_longs;
+	size_t total_longs;
+
+	WARN_ON_ONCE(!inode); // Should have an inode
+
+	old_private = folio_get_private(folio);
+
+	if (old_private) {
+		// Check if it's already our type using the magic number directly
+		if (READ_ONCE(((struct f2fs_iomap_folio_state *)old_private)
+				      ->read_bytes_pending) == F2FS_IFS_MAGIC) {
+			return (struct f2fs_iomap_folio_state *)
+				old_private; // Already ours
+		}
+		// Non-NULL, not ours -> Allocate, Copy, Replace path
+		total_longs = f2fs_ifs_total_longs(folio);
+		alloc_size = sizeof(*fifs) +
+				total_longs * sizeof(unsigned long);
+
+		fifs = kmalloc(alloc_size, gfp);
+		if (!fifs)
+			return NULL;
+
+		spin_lock_init(&fifs->state_lock);
+		*f2fs_ifs_private_flags_ptr(fifs, folio) = 0;
+		// Copy data from the presumed iomap_folio_state (old_private)
+		ifs_to_f2fs_ifs(old_private, fifs, folio);
+		WRITE_ONCE(fifs->read_bytes_pending, F2FS_IFS_MAGIC);
+		folio_change_private(folio, fifs);
+		kfree(old_private);
+		return fifs;
+	}
+
+	iomap_longs = f2fs_ifs_iomap_longs(folio);
+	total_longs = iomap_longs + 1;
+	alloc_size =
+		sizeof(*fifs) + total_longs * sizeof(unsigned long);
+
+	fifs = kzalloc(alloc_size, gfp);
+	if (!fifs)
+		return NULL;
+
+	spin_lock_init(&fifs->state_lock);
+
+	unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
+
+	if (folio_test_uptodate(folio))
+		bitmap_set(fifs->state, 0, nr_blocks);
+	if (folio_test_dirty(folio))
+		bitmap_set(fifs->state, nr_blocks, nr_blocks);
+	WRITE_ONCE(fifs->read_bytes_pending, F2FS_IFS_MAGIC);
+	atomic_set(&fifs->write_bytes_pending, 0);
+	folio_attach_private(folio, fifs);
+	return fifs;
+}
+
+void folio_detach_f2fs_private(struct folio *folio)
+{
+	struct f2fs_iomap_folio_state *fifs;
+
+	if (!folio_test_private(folio))
+		return;
+
+	// Check if it's using direct flags
+	if (test_bit(PAGE_PRIVATE_NOT_POINTER,
+		     (unsigned long *)&folio->private)) {
+		folio_detach_private(folio);
+		return;
+	}
+
+	fifs = folio_detach_private(folio);
+	if (!fifs)
+		return;
+
+	if (is_f2fs_ifs(folio)) {
+		WARN_ON_ONCE(READ_ONCE(fifs->read_bytes_pending) !=
+			     F2FS_IFS_MAGIC);
+		WARN_ON_ONCE(atomic_read(&fifs->write_bytes_pending));
+	} else {
+		WARN_ON_ONCE(READ_ONCE(fifs->read_bytes_pending) != 0);
+		WARN_ON_ONCE(atomic_read(&fifs->write_bytes_pending));
+	}
+
+	kfree(fifs);
+}
+
+struct f2fs_iomap_folio_state *folio_get_f2fs_ifs(struct folio *folio)
+{
+	if (!folio_test_private(folio))
+		return NULL;
+
+	if (test_bit(PAGE_PRIVATE_NOT_POINTER,
+		     (unsigned long *)&folio->private))
+		return NULL;
+	/*
+	 * Note we assume folio->private can be either ifs or f2fs_ifs here.
+	 * Compresssed folios should not call this function
+	 */
+	f2fs_bug_on(F2FS_F_SB(folio),
+		    *((u32 *)folio->private) == F2FS_COMPRESSED_PAGE_MAGIC);
+	return folio->private;
+}
+
+void f2fs_ifs_clear_range_uptodate(struct folio *folio,
+				   struct f2fs_iomap_folio_state *fifs,
+				   size_t off, size_t len)
+{
+	struct inode *inode = folio->mapping->host;
+	unsigned int first_blk = (off >> inode->i_blkbits);
+	unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
+	unsigned int nr_blks = last_blk - first_blk + 1;
+	unsigned long flags;
+
+	spin_lock_irqsave(&fifs->state_lock, flags);
+	bitmap_clear(fifs->state, first_blk, nr_blks);
+	spin_unlock_irqrestore(&fifs->state_lock, flags);
+}
+
+void f2fs_iomap_set_range_dirty(struct folio *folio, size_t off, size_t len)
+{
+	struct f2fs_iomap_folio_state *fifs = folio_get_f2fs_ifs(folio);
+
+	if (fifs) {
+		struct inode *inode = folio->mapping->host;
+		unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
+		unsigned int first_blk = (off >> inode->i_blkbits);
+		unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
+		unsigned int nr_blks = last_blk - first_blk + 1;
+		unsigned long flags;
+
+		spin_lock_irqsave(&fifs->state_lock, flags);
+		bitmap_set(fifs->state, first_blk + blks_per_folio, nr_blks);
+		spin_unlock_irqrestore(&fifs->state_lock, flags);
+	}
+}
diff --git a/fs/f2fs/f2fs_ifs.h b/fs/f2fs/f2fs_ifs.h
new file mode 100644
index 000000000000..3b16deda8a1e
--- /dev/null
+++ b/fs/f2fs/f2fs_ifs.h
@@ -0,0 +1,79 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef F2FS_IFS_H
+#define F2FS_IFS_H
+
+#include <linux/fs.h>
+#include <linux/bug.h>
+#include <linux/f2fs_fs.h>
+#include <linux/mm.h>
+#include <linux/iomap.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/atomic.h>
+
+#include "f2fs.h"
+
+#define F2FS_IFS_MAGIC 0xf2f5
+#define F2FS_IFS_PRIVATE_LONGS 1
+
+/*
+ * F2FS structure for folio private data, mimicking iomap_folio_state layout.
+ * F2FS private flags/data are stored in extra space allocated at the end
+ */
+struct f2fs_iomap_folio_state {
+	spinlock_t state_lock;
+	unsigned int read_bytes_pending;
+	atomic_t write_bytes_pending;
+	/*
+	 * Flexible array member.
+	 * Holds [0...iomap_longs-1] for iomap uptodate/dirty bits.
+	 * Holds [iomap_longs] for F2FS private flags/data (unsigned long).
+	 */
+	unsigned long state[];
+};
+
+static inline bool
+f2fs_ifs_block_is_uptodate(struct f2fs_iomap_folio_state *ifs,
+			   unsigned int block)
+{
+	return test_bit(block, ifs->state);
+}
+
+static inline size_t f2fs_ifs_iomap_longs(const struct folio *folio)
+{
+	struct inode *inode = folio->mapping->host;
+
+	WARN_ON_ONCE(!inode);
+	unsigned int nr_blocks =
+		i_blocks_per_folio(inode, (struct folio *)folio);
+	return BITS_TO_LONGS(2 * nr_blocks);
+}
+
+static inline size_t f2fs_ifs_total_longs(struct folio *folio)
+{
+	return f2fs_ifs_iomap_longs(folio) + F2FS_IFS_PRIVATE_LONGS;
+}
+
+static inline unsigned long *
+f2fs_ifs_private_flags_ptr(struct f2fs_iomap_folio_state *fifs,
+			   const struct folio *folio)
+{
+	return &fifs->state[f2fs_ifs_iomap_longs(folio)];
+}
+
+struct f2fs_iomap_folio_state *f2fs_ifs_alloc(struct folio *folio, gfp_t gfp,
+					      bool force_alloc);
+void folio_detach_f2fs_private(struct folio *folio);
+struct f2fs_iomap_folio_state *folio_get_f2fs_ifs(struct folio *folio);
+
+/*
+ * 0-order and fully dirty folio has no fifs
+ * they store private flag directly in their folio->private field
+ * as original f2fs page private behaviour
+ */
+void f2fs_ifs_clear_range_uptodate(struct folio *folio,
+				   struct f2fs_iomap_folio_state *fifs,
+				   size_t off, size_t len);
+void f2fs_iomap_set_range_dirty(struct folio *folio, size_t off, size_t len);
+
+#endif /* F2FS_IFS_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 2/9] f2fs: Integrate f2fs_iomap_folio_state into f2fs page private helpers
  2025-08-13  9:21 [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
  2025-08-13  9:21 ` [RFC PATCH 1/9] f2fs: Introduce f2fs_iomap_folio_state Nanzhe Zhao
@ 2025-08-13  9:21 ` Nanzhe Zhao
  2025-08-13  9:21 ` [RFC PATCH 3/9] f2fs: Using `folio_detach_f2fs_private` in invalidate and release folio Nanzhe Zhao
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Nanzhe Zhao @ 2025-08-13  9:21 UTC (permalink / raw)
  To: Jaegeuk Kim, linux-f2fs, linux-fsdevel
  Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao

Integrate f2fs_iomap_folio_state into the f2fs page private helper
functions.

In these functions, we adopt a two-stage strategy to handle the
folio->private field, now supporting both direct bit flags and the
new f2fs_iomap_folio_state pointer.

Note that my implementation does not rely on checking the folio's
order to distinguish whether the folio's private field stores
a flag or an f2fs_iomap_folio_state.
This is because in the folio_set_f2fs_xxx
functions, we will forcibly allocate an f2fs_iomap_folio_state
struct even for order-0 folios under certain conditions.

The reason for doing this is that if an order-0 folio's private field
is set to an f2fs private flag by a thread like gc, the generic
iomap_folio_state helper functions used in iomap buffered write will
mistakenly interpret it as an iomap_folio_state pointer.
We cannot, or rather should not, modify fs/iomap to make it recognize
f2fs's private flags.
Therefore, for now, I have to uniformly allocate an
f2fs_iomap_folio_state for all folios that will need to store an
f2fs private flag to ensure correctness.

I am also thinking about other ways to eliminate the extra memory
overhead this approach introduces. Suggestions would be grateful.

Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
 fs/f2fs/f2fs.h | 278 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 225 insertions(+), 53 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8df0443dd189..a14bef4dc394 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -27,7 +27,10 @@
 
 #include <linux/fscrypt.h>
 #include <linux/fsverity.h>
-
+#include <linux/iomap.h>
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+#include "f2fs_ifs.h"
+#endif
 struct pagevec;
 
 #ifdef CONFIG_F2FS_CHECK_FS
@@ -2509,58 +2512,227 @@ static inline int inc_valid_block_count(struct f2fs_sb_info *sbi,
 	return -ENOSPC;
 }
 
-#define PAGE_PRIVATE_GET_FUNC(name, flagname) \
-static inline bool folio_test_f2fs_##name(const struct folio *folio)	\
-{									\
-	unsigned long priv = (unsigned long)folio->private;		\
-	unsigned long v = (1UL << PAGE_PRIVATE_NOT_POINTER) |		\
-			     (1UL << PAGE_PRIVATE_##flagname);		\
-	return (priv & v) == v;						\
-}									\
-static inline bool page_private_##name(struct page *page) \
-{ \
-	return PagePrivate(page) && \
-		test_bit(PAGE_PRIVATE_NOT_POINTER, &page_private(page)) && \
-		test_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
-}
-
-#define PAGE_PRIVATE_SET_FUNC(name, flagname) \
-static inline void folio_set_f2fs_##name(struct folio *folio)		\
-{									\
-	unsigned long v = (1UL << PAGE_PRIVATE_NOT_POINTER) |		\
-			     (1UL << PAGE_PRIVATE_##flagname);		\
-	if (!folio->private)						\
-		folio_attach_private(folio, (void *)v);			\
-	else {								\
-		v |= (unsigned long)folio->private;			\
-		folio->private = (void *)v;				\
-	}								\
-}									\
-static inline void set_page_private_##name(struct page *page) \
-{ \
-	if (!PagePrivate(page)) \
-		attach_page_private(page, (void *)0); \
-	set_bit(PAGE_PRIVATE_NOT_POINTER, &page_private(page)); \
-	set_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
-}
-
-#define PAGE_PRIVATE_CLEAR_FUNC(name, flagname) \
-static inline void folio_clear_f2fs_##name(struct folio *folio)		\
-{									\
-	unsigned long v = (unsigned long)folio->private;		\
-									\
-	v &= ~(1UL << PAGE_PRIVATE_##flagname);				\
-	if (v == (1UL << PAGE_PRIVATE_NOT_POINTER))			\
-		folio_detach_private(folio);				\
-	else								\
-		folio->private = (void *)v;				\
-}									\
-static inline void clear_page_private_##name(struct page *page) \
-{ \
-	clear_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
-	if (page_private(page) == BIT(PAGE_PRIVATE_NOT_POINTER)) \
-		detach_page_private(page); \
+extern bool f2fs_should_use_buffered_iomap(struct inode *inode);
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+#define F2FS_FOLIO_PRIVATE_GET_FUNC(name, flagname)                            \
+	static inline bool folio_test_f2fs_##name(const struct folio *folio)   \
+	{                                                                      \
+		/* First try direct folio->private access for meta folio */    \
+		if (folio_test_private(folio) &&                               \
+		    test_bit(PAGE_PRIVATE_NOT_POINTER,                         \
+			     (unsigned long *)&folio->private)) {              \
+			return test_bit(PAGE_PRIVATE_##flagname,               \
+					(unsigned long *)&folio->private);     \
+		}                                                              \
+		/* For higher-order folios, use iomap folio state */           \
+		struct f2fs_iomap_folio_state *fifs =                          \
+			(struct f2fs_iomap_folio_state *)folio->private;       \
+		unsigned long *private_p;                                      \
+		if (unlikely(!fifs || !folio->mapping))                        \
+			return false;                                          \
+		/* Check magic number before accessing private data */         \
+		if (READ_ONCE(fifs->read_bytes_pending) != F2FS_IFS_MAGIC)     \
+			return false;                                          \
+		private_p = f2fs_ifs_private_flags_ptr(fifs, folio);           \
+		if (!private_p)                                                \
+			return false;                                          \
+		/* Test bits directly on the 'private' slot */                 \
+		return test_bit(PAGE_PRIVATE_##flagname, private_p);           \
+	}                                                                      \
+	static inline bool page_private_##name(struct page *page)              \
+	{                                                                      \
+		return PagePrivate(page) &&                                    \
+		       test_bit(PAGE_PRIVATE_NOT_POINTER,                      \
+				&page_private(page)) &&                        \
+		       test_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
+	}
+#define F2FS_FOLIO_PRIVATE_SET_FUNC(name, flagname)                              \
+	static inline int folio_set_f2fs_##name(struct folio *folio)             \
+	{                                                                        \
+		/* For higher-order folios, use iomap folio state */             \
+		if (unlikely(!folio->mapping))                                   \
+			return -ENOENT;                                          \
+		bool force_alloc =                                               \
+			f2fs_should_use_buffered_iomap(folio_inode(folio));      \
+		if (!force_alloc && !folio_test_private(folio)) {                \
+			folio_attach_private(folio, (void *)0);                  \
+			set_bit(PAGE_PRIVATE_NOT_POINTER,                        \
+				(unsigned long *)&folio->private);               \
+			set_bit(PAGE_PRIVATE_##flagname,                         \
+				(unsigned long *)&folio->private);               \
+			return 0;                                                \
+		}                                                                \
+		struct f2fs_iomap_folio_state *fifs =                            \
+			f2fs_ifs_alloc(folio, GFP_NOFS, true);                   \
+		if (unlikely(!fifs))                                             \
+			return -ENOMEM;                                          \
+		unsigned long *private_p;                                        \
+		WRITE_ONCE(fifs->read_bytes_pending, F2FS_IFS_MAGIC);            \
+		private_p = f2fs_ifs_private_flags_ptr(fifs, folio);             \
+		if (!private_p)                                                  \
+			return -EINVAL;                                          \
+		/* Set the bit atomically */                                     \
+		set_bit(PAGE_PRIVATE_##flagname, private_p);                     \
+		/* Ensure NOT_POINTER bit is also set if any F2FS flag is set */ \
+		if (PAGE_PRIVATE_##flagname != PAGE_PRIVATE_NOT_POINTER)         \
+			set_bit(PAGE_PRIVATE_NOT_POINTER, private_p);            \
+		return 0;                                                        \
+	}                                                                        \
+	static inline void set_page_private_##name(struct page *page)            \
+	{                                                                        \
+		if (!PagePrivate(page))                                          \
+			attach_page_private(page, (void *)0);                    \
+		set_bit(PAGE_PRIVATE_NOT_POINTER, &page_private(page));          \
+		set_bit(PAGE_PRIVATE_##flagname, &page_private(page));           \
+	}
+
+#define F2FS_FOLIO_PRIVATE_CLEAR_FUNC(name, flagname)                      \
+	static inline void folio_clear_f2fs_##name(struct folio *folio)    \
+	{                                                                  \
+		/* First try direct folio->private access */               \
+		if (folio_test_private(folio) &&                           \
+		    test_bit(PAGE_PRIVATE_NOT_POINTER,                     \
+			     (unsigned long *)&folio->private)) {          \
+			clear_bit(PAGE_PRIVATE_##flagname,                 \
+				  (unsigned long *)&folio->private);       \
+			folio_detach_private(folio);                       \
+			return;                                            \
+		}                                                          \
+		/* For higher-order folios, use iomap folio state */       \
+		struct f2fs_iomap_folio_state *fifs =                      \
+			(struct f2fs_iomap_folio_state *)folio->private;   \
+		unsigned long *private_p;                                  \
+		if (unlikely(!fifs || !folio->mapping))                    \
+			return;                                            \
+		/* Check magic number before clearing */                   \
+		if (READ_ONCE(fifs->read_bytes_pending) != F2FS_IFS_MAGIC) \
+			return; /* Not ours or state unclear */            \
+		private_p = f2fs_ifs_private_flags_ptr(fifs, folio);       \
+		if (!private_p)                                            \
+			return;                                            \
+		clear_bit(PAGE_PRIVATE_##flagname, private_p);             \
+	}                                                                  \
+	static inline void clear_page_private_##name(struct page *page)    \
+	{                                                                  \
+		clear_bit(PAGE_PRIVATE_##flagname, &page_private(page));   \
+		if (page_private(page) == BIT(PAGE_PRIVATE_NOT_POINTER))   \
+			detach_page_private(page);                         \
+	}
+// Generate the accessor functions using the macros
+F2FS_FOLIO_PRIVATE_GET_FUNC(nonpointer, NOT_POINTER);
+F2FS_FOLIO_PRIVATE_GET_FUNC(inline, INLINE_INODE);
+F2FS_FOLIO_PRIVATE_GET_FUNC(gcing, ONGOING_MIGRATION);
+F2FS_FOLIO_PRIVATE_GET_FUNC(atomic, ATOMIC_WRITE);
+F2FS_FOLIO_PRIVATE_GET_FUNC(reference, REF_RESOURCE);
+
+F2FS_FOLIO_PRIVATE_SET_FUNC(reference, REF_RESOURCE);
+F2FS_FOLIO_PRIVATE_SET_FUNC(inline, INLINE_INODE);
+F2FS_FOLIO_PRIVATE_SET_FUNC(gcing, ONGOING_MIGRATION);
+F2FS_FOLIO_PRIVATE_SET_FUNC(atomic, ATOMIC_WRITE);
+
+F2FS_FOLIO_PRIVATE_CLEAR_FUNC(reference, REF_RESOURCE);
+F2FS_FOLIO_PRIVATE_CLEAR_FUNC(inline, INLINE_INODE);
+F2FS_FOLIO_PRIVATE_CLEAR_FUNC(gcing, ONGOING_MIGRATION);
+F2FS_FOLIO_PRIVATE_CLEAR_FUNC(atomic, ATOMIC_WRITE);
+static inline int folio_set_f2fs_data(struct folio *folio, unsigned long data)
+{
+	if (unlikely(!folio->mapping))
+		return -ENOENT;
+
+	struct f2fs_iomap_folio_state *fifs =
+		f2fs_ifs_alloc(folio, GFP_NOFS, true);
+	if (unlikely(!fifs))
+		return -ENOMEM;
+
+	unsigned long *private_p;
+
+	private_p = f2fs_ifs_private_flags_ptr(fifs, folio);
+	if (!private_p)
+		return -EINVAL;
+
+	*private_p &= GENMASK(PAGE_PRIVATE_MAX - 1, 0);
+	*private_p |= (data << PAGE_PRIVATE_MAX);
+	set_bit(PAGE_PRIVATE_NOT_POINTER, private_p);
+
+	return 0;
 }
+static inline unsigned long folio_get_f2fs_data(struct folio *folio)
+{
+	struct f2fs_iomap_folio_state *fifs =
+		(struct f2fs_iomap_folio_state *)folio->private;
+	unsigned long *private_p;
+	unsigned long data_val;
+
+	if (!folio->mapping)
+		return 0;
+	f2fs_bug_on(F2FS_I_SB(folio_inode(folio)), !fifs);
+	if (READ_ONCE(fifs->read_bytes_pending) != F2FS_IFS_MAGIC)
+		return 0;
+
+	private_p = f2fs_ifs_private_flags_ptr(fifs, folio);
+	if (!private_p)
+		return 0;
+
+	data_val = READ_ONCE(*private_p);
+
+	if (!test_bit(PAGE_PRIVATE_NOT_POINTER, &data_val))
+		return 0;
+
+	return data_val >> PAGE_PRIVATE_MAX;
+}
+#else
+#define PAGE_PRIVATE_GET_FUNC(name, flagname)                                  \
+	static inline bool folio_test_f2fs_##name(const struct folio *folio)   \
+	{                                                                      \
+		unsigned long priv = (unsigned long)folio->private;            \
+		unsigned long v = (1UL << PAGE_PRIVATE_NOT_POINTER) |          \
+				  (1UL << PAGE_PRIVATE_##flagname);            \
+		return (priv & v) == v;                                        \
+	}                                                                      \
+	static inline bool page_private_##name(struct page *page)              \
+	{                                                                      \
+		return PagePrivate(page) &&                                    \
+		       test_bit(PAGE_PRIVATE_NOT_POINTER,                      \
+				&page_private(page)) &&                        \
+		       test_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
+	}
+
+#define PAGE_PRIVATE_SET_FUNC(name, flagname)                           \
+	static inline void folio_set_f2fs_##name(struct folio *folio)   \
+	{                                                               \
+		unsigned long v = (1UL << PAGE_PRIVATE_NOT_POINTER) |   \
+				  (1UL << PAGE_PRIVATE_##flagname);     \
+		if (!folio->private)                                    \
+			folio_attach_private(folio, (void *)v);         \
+		else {                                                  \
+			v |= (unsigned long)folio->private;             \
+			folio->private = (void *)v;                     \
+		}                                                       \
+	}                                                               \
+	static inline void set_page_private_##name(struct page *page)   \
+	{                                                               \
+		if (!PagePrivate(page))                                 \
+			attach_page_private(page, (void *)0);           \
+		set_bit(PAGE_PRIVATE_NOT_POINTER, &page_private(page)); \
+		set_bit(PAGE_PRIVATE_##flagname, &page_private(page));  \
+	}
+
+#define PAGE_PRIVATE_CLEAR_FUNC(name, flagname)                          \
+	static inline void folio_clear_f2fs_##name(struct folio *folio)  \
+	{                                                                \
+		unsigned long v = (unsigned long)folio->private;         \
+		v &= ~(1UL << PAGE_PRIVATE_##flagname);                  \
+		if (v == (1UL << PAGE_PRIVATE_NOT_POINTER))              \
+			folio_detach_private(folio);                     \
+		else                                                     \
+			folio->private = (void *)v;                      \
+	}                                                                \
+	static inline void clear_page_private_##name(struct page *page)  \
+	{                                                                \
+		clear_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
+		if (page_private(page) == BIT(PAGE_PRIVATE_NOT_POINTER)) \
+			detach_page_private(page);                       \
+	}
 
 PAGE_PRIVATE_GET_FUNC(nonpointer, NOT_POINTER);
 PAGE_PRIVATE_GET_FUNC(inline, INLINE_INODE);
@@ -2595,7 +2767,7 @@ static inline void folio_set_f2fs_data(struct folio *folio, unsigned long data)
 	else
 		folio->private = (void *)((unsigned long)folio->private | data);
 }
-
+#endif
 static inline void dec_valid_block_count(struct f2fs_sb_info *sbi,
 						struct inode *inode,
 						block_t count)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 3/9] f2fs: Using `folio_detach_f2fs_private` in invalidate and release folio
  2025-08-13  9:21 [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
  2025-08-13  9:21 ` [RFC PATCH 1/9] f2fs: Introduce f2fs_iomap_folio_state Nanzhe Zhao
  2025-08-13  9:21 ` [RFC PATCH 2/9] f2fs: Integrate f2fs_iomap_folio_state into f2fs page private helpers Nanzhe Zhao
@ 2025-08-13  9:21 ` Nanzhe Zhao
  2025-08-13  9:21 ` [RFC PATCH 4/9] f2fs: Convert outplace write path page private funcions to folio private functions Nanzhe Zhao
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Nanzhe Zhao @ 2025-08-13  9:21 UTC (permalink / raw)
  To: Jaegeuk Kim, linux-f2fs, linux-fsdevel
  Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao

Since `folio_detach_f2fs_private` can handle all case for a
folio to free it's private date , intergrate it as a subtitute
for `folio_detach_private`.

Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
 fs/f2fs/data.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index ed1174430827..415f51602492 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -3748,7 +3748,16 @@ void f2fs_invalidate_folio(struct folio *folio, size_t offset, size_t length)
 			f2fs_remove_dirty_inode(inode);
 		}
 	}
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+	/* Same to iomap_invalidate_folio*/
+	if (offset == 0 && length == folio_size(folio)) {
+		WARN_ON_ONCE(folio_test_writeback(folio));
+		folio_cancel_dirty(folio);
+		folio_detach_f2fs_private(folio);
+	}
+#else
 	folio_detach_private(folio);
+#endif
 }
 
 bool f2fs_release_folio(struct folio *folio, gfp_t wait)
@@ -3757,7 +3766,11 @@ bool f2fs_release_folio(struct folio *folio, gfp_t wait)
 	if (folio_test_dirty(folio))
 		return false;
 
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+	folio_detach_f2fs_private(folio);
+#else
 	folio_detach_private(folio);
+#endif
 	return true;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 4/9] f2fs: Convert outplace write path page private funcions to folio private functions.
  2025-08-13  9:21 [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
                   ` (2 preceding siblings ...)
  2025-08-13  9:21 ` [RFC PATCH 3/9] f2fs: Using `folio_detach_f2fs_private` in invalidate and release folio Nanzhe Zhao
@ 2025-08-13  9:21 ` Nanzhe Zhao
  2025-08-13  9:21 ` [RFC PATCH 5/9] f2fs:Refactor `f2fs_is_compressed_page` to `f2fs_is_compressed_folio` Nanzhe Zhao
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Nanzhe Zhao @ 2025-08-13  9:21 UTC (permalink / raw)
  To: Jaegeuk Kim, linux-f2fs, linux-fsdevel
  Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao

    The core function `f2fs_out_place_write` and `__get_segment_type_6`
    in outplace write path haven't got their legacy page private functions
    converted which can be harmful for large folios support.
    Convert them to use our folio private funcions.

Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
 fs/f2fs/data.c    | 2 +-
 fs/f2fs/segment.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 415f51602492..5589280294c1 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2637,7 +2637,7 @@ bool f2fs_should_update_outplace(struct inode *inode, struct f2fs_io_info *fio)
 		return true;
 
 	if (fio) {
-		if (page_private_gcing(fio->page))
+		if (folio_test_f2fs_gcing(fio->folio))
 			return true;
 		if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) &&
 			f2fs_is_checkpointed_data(sbi, fio->old_blkaddr)))
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 949ee1f8fb5c..7e9dd045b55d 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3653,7 +3653,7 @@ static int __get_segment_type_6(struct f2fs_io_info *fio)
 		if (is_inode_flag_set(inode, FI_ALIGNED_WRITE))
 			return CURSEG_COLD_DATA_PINNED;
 
-		if (page_private_gcing(fio->page)) {
+		if (folio_test_f2fs_gcing(fio->folio)) {
 			if (fio->sbi->am.atgc_enabled &&
 				(fio->io_type == FS_DATA_IO) &&
 				(fio->sbi->gc_mode != GC_URGENT_HIGH) &&
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 5/9] f2fs:Refactor `f2fs_is_compressed_page` to `f2fs_is_compressed_folio`
  2025-08-13  9:21 [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
                   ` (3 preceding siblings ...)
  2025-08-13  9:21 ` [RFC PATCH 4/9] f2fs: Convert outplace write path page private funcions to folio private functions Nanzhe Zhao
@ 2025-08-13  9:21 ` Nanzhe Zhao
  2025-08-13  9:21 ` [RFC PATCH 6/9] f2fs: Extend f2fs_io_info to support sub-folio ranges Nanzhe Zhao
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Nanzhe Zhao @ 2025-08-13  9:21 UTC (permalink / raw)
  To: Jaegeuk Kim, linux-f2fs, linux-fsdevel
  Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao

`f2fs_is_compressed_page` now already accept a folio as
a parameter.So the name now is confusing.
Rename it to `f2fs_is_compressed_folio`.
If a folio has f2fs_iomap_folio_state then it must not be
a compressed folio.

Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
 fs/f2fs/compress.c | 11 ++++++-----
 fs/f2fs/data.c     | 10 +++++-----
 fs/f2fs/f2fs.h     |  7 +++++--
 3 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index 6ad8d3bc6df7..627013ef856c 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -71,13 +71,14 @@ static pgoff_t start_idx_of_cluster(struct compress_ctx *cc)
 	return cc->cluster_idx << cc->log_cluster_size;
 }
 
-bool f2fs_is_compressed_page(struct folio *folio)
+bool f2fs_is_compressed_folio(struct folio *folio)
 {
-	if (!folio->private)
+	if (!folio_test_private(folio))
 		return false;
 	if (folio_test_f2fs_nonpointer(folio))
 		return false;
-
+	if (folio_get_f2fs_ifs(folio)) /*compressed folio current don't support higer order*/
+		return false;
 	f2fs_bug_on(F2FS_F_SB(folio),
 		*((u32 *)folio->private) != F2FS_COMPRESSED_PAGE_MAGIC);
 	return true;
@@ -1483,8 +1484,8 @@ void f2fs_compress_write_end_io(struct bio *bio, struct folio *folio)
 	struct page *page = &folio->page;
 	struct f2fs_sb_info *sbi = bio->bi_private;
 	struct compress_io_ctx *cic = folio->private;
-	enum count_type type = WB_DATA_TYPE(folio,
-				f2fs_is_compressed_page(folio));
+	enum count_type type =
+		WB_DATA_TYPE(folio, f2fs_is_compressed_folio(folio));
 	int i;
 
 	if (unlikely(bio->bi_status != BLK_STS_OK))
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 5589280294c1..a9dc2572bdc4 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -142,7 +142,7 @@ static void f2fs_finish_read_bio(struct bio *bio, bool in_task)
 	bio_for_each_folio_all(fi, bio) {
 		struct folio *folio = fi.folio;
 
-		if (f2fs_is_compressed_page(folio)) {
+		if (f2fs_is_compressed_folio(folio)) {
 			if (ctx && !ctx->decompression_attempted)
 				f2fs_end_read_compressed_page(folio, true, 0,
 							in_task);
@@ -186,7 +186,7 @@ static void f2fs_verify_bio(struct work_struct *work)
 		bio_for_each_folio_all(fi, bio) {
 			struct folio *folio = fi.folio;
 
-			if (!f2fs_is_compressed_page(folio) &&
+			if (!f2fs_is_compressed_folio(folio) &&
 			    !fsverity_verify_page(&folio->page)) {
 				bio->bi_status = BLK_STS_IOERR;
 				break;
@@ -239,7 +239,7 @@ static void f2fs_handle_step_decompress(struct bio_post_read_ctx *ctx,
 	bio_for_each_folio_all(fi, ctx->bio) {
 		struct folio *folio = fi.folio;
 
-		if (f2fs_is_compressed_page(folio))
+		if (f2fs_is_compressed_folio(folio))
 			f2fs_end_read_compressed_page(folio, false, blkaddr,
 						      in_task);
 		else
@@ -344,7 +344,7 @@ static void f2fs_write_end_io(struct bio *bio)
 		}
 
 #ifdef CONFIG_F2FS_FS_COMPRESSION
-		if (f2fs_is_compressed_page(folio)) {
+		if (f2fs_is_compressed_folio(folio)) {
 			f2fs_compress_write_end_io(bio, folio);
 			continue;
 		}
@@ -568,7 +568,7 @@ static bool __has_merged_page(struct bio *bio, struct inode *inode,
 			if (IS_ERR(target))
 				continue;
 		}
-		if (f2fs_is_compressed_page(target)) {
+		if (f2fs_is_compressed_folio(target)) {
 			target = f2fs_compress_control_folio(target);
 			if (IS_ERR(target))
 				continue;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index a14bef4dc394..9f88be53174b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4677,7 +4677,7 @@ enum cluster_check_type {
 	CLUSTER_COMPR_BLKS, /* return # of compressed blocks in a cluster */
 	CLUSTER_RAW_BLKS    /* return # of raw blocks in a cluster */
 };
-bool f2fs_is_compressed_page(struct folio *folio);
+bool f2fs_is_compressed_folio(struct folio *folio);
 struct folio *f2fs_compress_control_folio(struct folio *folio);
 int f2fs_prepare_compress_overwrite(struct inode *inode,
 			struct page **pagep, pgoff_t index, void **fsdata);
@@ -4744,7 +4744,10 @@ void f2fs_invalidate_compress_pages(struct f2fs_sb_info *sbi, nid_t ino);
 		sbi->compr_saved_block += diff;				\
 	} while (0)
 #else
-static inline bool f2fs_is_compressed_page(struct folio *folio) { return false; }
+static inline bool f2fs_is_compressed_folio(struct folio *folio)
+{
+	return false;
+}
 static inline bool f2fs_is_compress_backend_ready(struct inode *inode)
 {
 	if (!f2fs_compressed_file(inode))
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 6/9] f2fs: Extend f2fs_io_info to support sub-folio ranges
  2025-08-13  9:21 [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
                   ` (4 preceding siblings ...)
  2025-08-13  9:21 ` [RFC PATCH 5/9] f2fs:Refactor `f2fs_is_compressed_page` to `f2fs_is_compressed_folio` Nanzhe Zhao
@ 2025-08-13  9:21 ` Nanzhe Zhao
  2025-08-13  9:21 ` [RFC PATCH 7/9] f2fs:Make GC aware of large folios Nanzhe Zhao
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Nanzhe Zhao @ 2025-08-13  9:21 UTC (permalink / raw)
  To: Jaegeuk Kim, linux-f2fs, linux-fsdevel
  Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao

Since f2fs_io_info (hereafter fio) has been converted to use fio->folio
and fio->page is deprecated, we must now track which sub-part of a
folio is being submitted to a bio in order to support large folios.

To achieve this, we add `idx` and `cnt` fields to the fio struct.
`fio->idx` represents the offset (in pages) within the current folio
for this I/O operation, and `fio->cnt` represents the number of
contiguous blocks being processed.

With the introduction of these two fields, the existing `old_blkaddr`
and `new_blkaddr` fields in fio are reinterpreted. They now represent
the starting old and new block addresses corresponding to `fio->idx`.
Consequently, an fio no longer represents a single mapping from one
old_blkaddr to one new_blkaddr, but rather a range mapping from
[old_blkaddr, old_blkaddr + fio->cnt - 1] to
[new_blkaddr, new_blkaddr + fio->cnt - 1].

In bio submission paths, for cases where `fio->cnt` is not explicitly
initialized, we default it to 1 and `fio->idx` to 0. This ensures
backward compatibility with all existing f2fs logic that operates on
single pages.

Discussion: Now I don't know if it's better to store bytes-unit
logical file offset and LBA length instead of block-unit cnt and
page idx in fio if we are to support BLOCK_SIZE > PAGE_SIZE.
Suggestions are appreciated.

Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
 fs/f2fs/data.c | 16 ++++++++++------
 fs/f2fs/f2fs.h |  2 ++
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index a9dc2572bdc4..b7bef2a28c8e 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -711,7 +711,9 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
 
 	f2fs_set_bio_crypt_ctx(bio, fio_folio->mapping->host,
 			fio_folio->index, fio, GFP_NOIO);
-	bio_add_folio_nofail(bio, data_folio, folio_size(data_folio), 0);
+	bio_add_folio_nofail(bio, data_folio,
+			     F2FS_BLK_TO_BYTES(fio->cnt ? fio->cnt : 1),
+			     fio->idx << PAGE_SHIFT);
 
 	if (fio->io_wbc && !is_read_io(fio->op))
 		wbc_account_cgroup_owner(fio->io_wbc, fio_folio, PAGE_SIZE);
@@ -1010,16 +1012,18 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
 		io->fio = *fio;
 	}
 
-	if (!bio_add_folio(io->bio, bio_folio, folio_size(bio_folio), 0)) {
+	if (!bio_add_folio(io->bio, bio_folio,
+			   F2FS_BLK_TO_BYTES(fio->cnt ? fio->cnt : 1),
+			   fio->idx << PAGE_SHIFT)) {
 		__submit_merged_bio(io);
 		goto alloc_new;
 	}
 
 	if (fio->io_wbc)
 		wbc_account_cgroup_owner(fio->io_wbc, fio->folio,
-				folio_size(fio->folio));
+					 F2FS_BLK_TO_BYTES(fio->cnt));
 
-	io->last_block_in_bio = fio->new_blkaddr;
+	io->last_block_in_bio = fio->new_blkaddr + fio->cnt - 1;
 
 	trace_f2fs_submit_folio_write(fio->folio, fio);
 #ifdef CONFIG_BLK_DEV_ZONED
@@ -2675,7 +2679,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio)
 		set_new_dnode(&dn, inode, NULL, NULL, 0);
 
 	if (need_inplace_update(fio) &&
-	    f2fs_lookup_read_extent_cache_block(inode, folio->index,
+	    f2fs_lookup_read_extent_cache_block(inode, folio->index + fio->idx,
 						&fio->old_blkaddr)) {
 		if (!f2fs_is_valid_blkaddr(fio->sbi, fio->old_blkaddr,
 						DATA_GENERIC_ENHANCE))
@@ -2690,7 +2694,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio)
 	if (fio->need_lock == LOCK_REQ && !f2fs_trylock_op(fio->sbi))
 		return -EAGAIN;
 
-	err = f2fs_get_dnode_of_data(&dn, folio->index, LOOKUP_NODE);
+	err = f2fs_get_dnode_of_data(&dn, folio->index + fio->idx, LOOKUP_NODE);
 	if (err)
 		goto out;
 
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 9f88be53174b..c6b23fa63588 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1281,6 +1281,8 @@ struct f2fs_io_info {
 	blk_opf_t op_flags;	/* req_flag_bits */
 	block_t new_blkaddr;	/* new block address to be written */
 	block_t old_blkaddr;	/* old block address before Cow */
+	pgoff_t idx; /*start page index within current active folio in this fio*/
+	unsigned int cnt; /*block cnts of the active folio, we assume they are continuous.*/
 	union {
 		struct page *page;	/* page to be written */
 		struct folio *folio;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 7/9] f2fs:Make GC aware of large folios
  2025-08-13  9:21 [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
                   ` (5 preceding siblings ...)
  2025-08-13  9:21 ` [RFC PATCH 6/9] f2fs: Extend f2fs_io_info to support sub-folio ranges Nanzhe Zhao
@ 2025-08-13  9:21 ` Nanzhe Zhao
  2025-08-13  9:21 ` [RFC PATCH 8/9] f2fs: Introduce F2FS_GET_BLOCK_IOMAP and map_blocks he lpers Nanzhe Zhao
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Nanzhe Zhao @ 2025-08-13  9:21 UTC (permalink / raw)
  To: Jaegeuk Kim, linux-f2fs, linux-fsdevel
  Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao

Previously, the GC (Garbage Collection) logic for performing I/O and
marking folios dirty only supported order-0 folios and lacked awareness
of higher-order folios. To enable GC to correctly handle higher-order
folios, we made two changes:

- In `move_data_page`, we now use `f2fs_iomap_set_range_dirty` to mark
  only the sub-part of the folio corresponding to `bidx` as dirty,
  instead of the entire folio.

- The `f2fs_submit_page_read` function has been augmented with an
  `index` parameter, allowing it to precisely identify which sub-page
  of the current folio is being submitted.

Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
 fs/f2fs/data.c | 13 +++++++------
 fs/f2fs/gc.c   | 37 +++++++++++++++++++++++--------------
 2 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index b7bef2a28c8e..5ecd08a3dd0b 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1096,7 +1096,7 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
 /* This can handle encryption stuffs */
 static int f2fs_submit_page_read(struct inode *inode, struct folio *folio,
 				 block_t blkaddr, blk_opf_t op_flags,
-				 bool for_write)
+				 pgoff_t index, bool for_write)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	struct bio *bio;
@@ -1109,7 +1109,8 @@ static int f2fs_submit_page_read(struct inode *inode, struct folio *folio,
 	/* wait for GCed page writeback via META_MAPPING */
 	f2fs_wait_on_block_writeback(inode, blkaddr);
 
-	if (!bio_add_folio(bio, folio, PAGE_SIZE, 0)) {
+	if (!bio_add_folio(bio, folio, PAGE_SIZE,
+			   (index - folio->index) << PAGE_SHIFT)) {
 		iostat_update_and_unbind_ctx(bio);
 		if (bio->bi_private)
 			mempool_free(bio->bi_private, bio_post_read_ctx_pool);
@@ -1276,8 +1277,8 @@ struct folio *f2fs_get_read_data_folio(struct inode *inode, pgoff_t index,
 		return folio;
 	}
 
-	err = f2fs_submit_page_read(inode, folio, dn.data_blkaddr,
-						op_flags, for_write);
+	err = f2fs_submit_page_read(inode, folio, dn.data_blkaddr, op_flags,
+				    index, for_write);
 	if (err)
 		goto put_err;
 	return folio;
@@ -3651,8 +3652,8 @@ static int f2fs_write_begin(const struct kiocb *iocb,
 			goto put_folio;
 		}
 		err = f2fs_submit_page_read(use_cow ?
-				F2FS_I(inode)->cow_inode : inode,
-				folio, blkaddr, 0, true);
+			F2FS_I(inode)->cow_inode : inode, folio,
+			blkaddr, 0, folio->index, true);
 		if (err)
 			goto put_folio;
 
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 098e9f71421e..6d28f01bec42 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1475,22 +1475,31 @@ static int move_data_page(struct inode *inode, block_t bidx, int gc_type,
 			err = -EAGAIN;
 			goto out;
 		}
-		folio_mark_dirty(folio);
 		folio_set_f2fs_gcing(folio);
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+		if (!folio_test_large(folio)) {
+			folio_mark_dirty(folio);
+		} else {
+			f2fs_iomap_set_range_dirty(folio, (bidx - folio->index) << PAGE_SHIFT,
+				PAGE_SIZE);
+		}
+#else
+		folio_mark_dirty(folio);
+#endif
 	} else {
-		struct f2fs_io_info fio = {
-			.sbi = F2FS_I_SB(inode),
-			.ino = inode->i_ino,
-			.type = DATA,
-			.temp = COLD,
-			.op = REQ_OP_WRITE,
-			.op_flags = REQ_SYNC,
-			.old_blkaddr = NULL_ADDR,
-			.folio = folio,
-			.encrypted_page = NULL,
-			.need_lock = LOCK_REQ,
-			.io_type = FS_GC_DATA_IO,
-		};
+		struct f2fs_io_info fio = { .sbi = F2FS_I_SB(inode),
+					    .ino = inode->i_ino,
+					    .type = DATA,
+					    .temp = COLD,
+					    .op = REQ_OP_WRITE,
+					    .op_flags = REQ_SYNC,
+					    .old_blkaddr = NULL_ADDR,
+					    .folio = folio,
+					    .encrypted_page = NULL,
+					    .need_lock = LOCK_REQ,
+					    .io_type = FS_GC_DATA_IO,
+					    .idx = bidx - folio->index,
+					    .cnt = 1 };
 		bool is_dirty = folio_test_dirty(folio);
 
 retry:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 8/9] f2fs: Introduce F2FS_GET_BLOCK_IOMAP and map_blocks he lpers
  2025-08-13  9:21 [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
                   ` (6 preceding siblings ...)
  2025-08-13  9:21 ` [RFC PATCH 7/9] f2fs:Make GC aware of large folios Nanzhe Zhao
@ 2025-08-13  9:21 ` Nanzhe Zhao
  2025-08-13  9:21 ` [RFC PATCH 9/9] f2fs: Enable buffered read/write path large folios support for normal and atomic file with iomap Nanzhe Zhao
  2025-08-13 15:22 ` [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Christoph Hellwig
  9 siblings, 0 replies; 14+ messages in thread
From: Nanzhe Zhao @ 2025-08-13  9:21 UTC (permalink / raw)
  To: Jaegeuk Kim, linux-f2fs, linux-fsdevel
  Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao

    Introduce the `F2FS_GET_BLOCK_IOMAP` flag for `f2fs_ma
p_blocks`.

    With this flag, holes encountered during buffered I/O
iterative mapping
    can now be merged under `map_is_mergeable`. Furthermor
e, when this flag
    is passed, `f2fs_map_blocks` will by default store the
 mapped block
    information (from the `f2fs_map_blocks` structure) int
o the extent cache,
    provided the resulting extent size is greater than the
 minimum allowed
    length for the f2fs extent cache.
    Notably, both holes and `NEW_ADDR`
    extents will also be cached under the influence of thi
s flag.
    This improves buffered write performance for sparse fi
les.

    Additionally, two helper functions are introduced:
    - `f2fs_map_blocks_iomap`: A simple wrapper for `f2fs_
map_blocks` that
      enables the `F2FS_GET_BLOCK_IOMAP` flag.
    - `f2fs_map_blocks_prealloc`: A simple wrapper for usi
ng
      `f2fs_map_blocks` to preallocate blocks.

Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
 fs/f2fs/data.c | 49 +++++++++++++++++++++++++++++++++++++++++++------
 fs/f2fs/f2fs.h |  5 +++++
 2 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 5ecd08a3dd0b..37eaf431ab42 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1537,8 +1537,11 @@ static bool map_is_mergeable(struct f2fs_sb_info *sbi,
 		return true;
 	if (flag == F2FS_GET_BLOCK_PRE_DIO)
 		return true;
-	if (flag == F2FS_GET_BLOCK_DIO &&
-		map->m_pblk == NULL_ADDR && blkaddr == NULL_ADDR)
+	if (flag == F2FS_GET_BLOCK_DIO && map->m_pblk == NULL_ADDR &&
+	    blkaddr == NULL_ADDR)
+		return true;
+	if (flag == F2FS_GET_BLOCK_IOMAP && map->m_pblk == NULL_ADDR &&
+	    blkaddr == NULL_ADDR)
 		return true;
 	return false;
 }
@@ -1676,6 +1679,10 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
 			if (map->m_next_pgofs)
 				*map->m_next_pgofs = pgofs + 1;
 			break;
+		case F2FS_GET_BLOCK_IOMAP:
+			if (map->m_next_pgofs)
+				*map->m_next_pgofs = pgofs + 1;
+			break;
 		default:
 			/* for defragment case */
 			if (map->m_next_pgofs)
@@ -1741,8 +1748,9 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
 	else if (dn.ofs_in_node < end_offset)
 		goto next_block;
 
-	if (flag == F2FS_GET_BLOCK_PRECACHE) {
-		if (map->m_flags & F2FS_MAP_MAPPED) {
+	if (flag == F2FS_GET_BLOCK_PRECACHE || flag == F2FS_GET_BLOCK_IOMAP) {
+		if (map->m_flags & F2FS_MAP_MAPPED &&
+		    map->m_len > F2FS_MIN_EXTENT_LEN) {
 			unsigned int ofs = start_pgofs - map->m_lblk;
 
 			f2fs_update_read_extent_cache_range(&dn,
@@ -1786,8 +1794,9 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
 		}
 	}
 
-	if (flag == F2FS_GET_BLOCK_PRECACHE) {
-		if (map->m_flags & F2FS_MAP_MAPPED) {
+	if (flag == F2FS_GET_BLOCK_PRECACHE || flag == F2FS_GET_BLOCK_IOMAP) {
+		if (map->m_flags & F2FS_MAP_MAPPED &&
+		    map->m_len > F2FS_MIN_EXTENT_LEN) {
 			unsigned int ofs = start_pgofs - map->m_lblk;
 
 			f2fs_update_read_extent_cache_range(&dn,
@@ -1808,6 +1817,34 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
 	return err;
 }
 
+int f2fs_map_blocks_iomap(struct inode *inode, block_t start, block_t len,
+			  struct f2fs_map_blocks *map)
+{
+	int err = 0;
+
+	map->m_lblk = start; // Logical block number for the start pos
+	map->m_len = len; // Length in blocks
+	map->m_may_create = false;
+	map->m_seg_type =
+		f2fs_rw_hint_to_seg_type(F2FS_I_SB(inode), inode->i_write_hint);
+	err = f2fs_map_blocks(inode, map, F2FS_GET_BLOCK_IOMAP);
+	return err;
+}
+
+int f2fs_map_blocks_preallocate(struct inode *inode, block_t start, block_t len,
+				struct f2fs_map_blocks *map)
+{
+	int err = 0;
+
+	map->m_lblk = start;
+	map->m_len = len; // Length in blocks
+	map->m_may_create = true;
+	map->m_seg_type =
+		f2fs_rw_hint_to_seg_type(F2FS_I_SB(inode), inode->i_write_hint);
+	err = f2fs_map_blocks(inode, map, F2FS_GET_BLOCK_PRE_AIO);
+	return err;
+}
+
 bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len)
 {
 	struct f2fs_map_blocks map;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index c6b23fa63588..ac9a6ac13e1f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -788,6 +788,7 @@ enum {
 	F2FS_GET_BLOCK_PRE_DIO,
 	F2FS_GET_BLOCK_PRE_AIO,
 	F2FS_GET_BLOCK_PRECACHE,
+	F2FS_GET_BLOCK_IOMAP,
 };
 
 /*
@@ -4232,6 +4233,10 @@ struct folio *f2fs_get_new_data_folio(struct inode *inode,
 			struct folio *ifolio, pgoff_t index, bool new_i_size);
 int f2fs_do_write_data_page(struct f2fs_io_info *fio);
 int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag);
+int f2fs_map_blocks_iomap(struct inode *inode, block_t start, block_t len,
+			  struct f2fs_map_blocks *map);
+int f2fs_map_blocks_preallocate(struct inode *inode, block_t start, block_t len,
+				struct f2fs_map_blocks *map);
 int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 			u64 start, u64 len);
 int f2fs_encrypt_one_page(struct f2fs_io_info *fio);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 9/9] f2fs: Enable buffered read/write path large folios support for normal and atomic file with iomap
  2025-08-13  9:21 [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
                   ` (7 preceding siblings ...)
  2025-08-13  9:21 ` [RFC PATCH 8/9] f2fs: Introduce F2FS_GET_BLOCK_IOMAP and map_blocks he lpers Nanzhe Zhao
@ 2025-08-13  9:21 ` Nanzhe Zhao
  2025-08-13 15:22 ` [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Christoph Hellwig
  9 siblings, 0 replies; 14+ messages in thread
From: Nanzhe Zhao @ 2025-08-13  9:21 UTC (permalink / raw)
  To: Jaegeuk Kim, linux-f2fs, linux-fsdevel
  Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao

This commit enables large folios support for F2FS's buffered read and
write paths.

We introduce a helper function `f2fs_set_iomap` to handle all the logic
that converts a f2fs_map_blocks to iomap.

Currently, compressed files, encrypted files, and fsverity are not
supported with iomap large folios.

Since F2FS requires `f2fs_iomap_folio_state` (or a similar equivalent
mechanism) to correctly support the iomap framework, when
`CONFIG_F2FS_IOMAP_FOLIO_STATE` is not enabled, we will not use the
iomap buffered read/write paths.

Note: Since holes reported by f2fs_map_blocks come in two types
(NULL_ADDR and unmapped dnodes).
They requiring different handle logic to set iomap.length,
So we add a new block state flag for f2fs_map_blocks

Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
 fs/f2fs/data.c   | 286 +++++++++++++++++++++++++++++++++++++++++++----
 fs/f2fs/f2fs.h   | 120 +++++++++++++-------
 fs/f2fs/file.c   |  33 +++++-
 fs/f2fs/inline.c |  15 ++-
 fs/f2fs/inode.c  |  27 +++++
 fs/f2fs/namei.c  |   7 ++
 fs/f2fs/super.c  |   3 +
 7 files changed, 425 insertions(+), 66 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 37eaf431ab42..243c6305b0c5 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1149,6 +1149,9 @@ void f2fs_update_data_blkaddr(struct dnode_of_data *dn, block_t blkaddr)
 {
 	f2fs_set_data_blkaddr(dn, blkaddr);
 	f2fs_update_read_extent_cache(dn);
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+	f2fs_iomap_seq_inc(dn->inode);
+#endif
 }
 
 /* dn->ofs_in_node will be returned with up-to-date last block pointer */
@@ -1182,6 +1185,9 @@ int f2fs_reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count)
 
 	if (folio_mark_dirty(dn->node_folio))
 		dn->node_changed = true;
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+	f2fs_iomap_seq_inc(dn->inode);
+#endif
 	return 0;
 }
 
@@ -1486,6 +1492,7 @@ static int f2fs_map_no_dnode(struct inode *inode,
 		*map->m_next_pgofs = f2fs_get_next_page_offset(dn, pgoff);
 	if (map->m_next_extent)
 		*map->m_next_extent = f2fs_get_next_page_offset(dn, pgoff);
+	map->m_flags |= F2FS_MAP_NODNODE;
 	return 0;
 }
 
@@ -1702,7 +1709,9 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
 		if (blkaddr == NEW_ADDR)
 			map->m_flags |= F2FS_MAP_DELALLOC;
 		/* DIO READ and hole case, should not map the blocks. */
-		if (!(flag == F2FS_GET_BLOCK_DIO && is_hole && !map->m_may_create))
+		if (!(flag == F2FS_GET_BLOCK_DIO && is_hole &&
+		      !map->m_may_create) &&
+		    !(flag == F2FS_GET_BLOCK_IOMAP && is_hole))
 			map->m_flags |= F2FS_MAP_MAPPED;
 
 		map->m_pblk = blkaddr;
@@ -1736,6 +1745,10 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
 			goto sync_out;
 
 		map->m_len += dn.ofs_in_node - ofs_in_node;
+		/* Since we successfully reserved blocks, we can update the pblk now.
+		 * No need to perform inefficient look up in write_begin again
+		 */
+		map->m_pblk = dn.data_blkaddr;
 		if (prealloc && dn.ofs_in_node != last_ofs_in_node + 1) {
 			err = -ENOSPC;
 			goto sync_out;
@@ -4255,9 +4268,6 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	err = f2fs_map_blocks(inode, &map, F2FS_GET_BLOCK_DIO);
 	if (err)
 		return err;
-
-	iomap->offset = F2FS_BLK_TO_BYTES(map.m_lblk);
-
 	/*
 	 * When inline encryption is enabled, sometimes I/O to an encrypted file
 	 * has to be broken up to guarantee DUN contiguity.  Handle this by
@@ -4272,28 +4282,44 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	if (WARN_ON_ONCE(map.m_pblk == COMPRESS_ADDR))
 		return -EINVAL;
 
-	if (map.m_flags & F2FS_MAP_MAPPED) {
-		if (WARN_ON_ONCE(map.m_pblk == NEW_ADDR))
-			return -EINVAL;
-
-		iomap->length = F2FS_BLK_TO_BYTES(map.m_len);
-		iomap->type = IOMAP_MAPPED;
-		iomap->flags |= IOMAP_F_MERGED;
-		iomap->bdev = map.m_bdev;
-		iomap->addr = F2FS_BLK_TO_BYTES(map.m_pblk);
-
-		if (flags & IOMAP_WRITE && map.m_last_pblk)
-			iomap->private = (void *)map.m_last_pblk;
+	return f2fs_set_iomap(inode, &map, iomap, flags, offset, length, false);
+}
+int f2fs_set_iomap(struct inode *inode, struct f2fs_map_blocks *map,
+		   struct iomap *iomap, unsigned int flags, loff_t offset,
+		   loff_t length, bool dio)
+{
+	iomap->offset = F2FS_BLK_TO_BYTES(map->m_lblk);
+	if (map->m_flags & F2FS_MAP_MAPPED) {
+		if (dio) {
+			if (WARN_ON_ONCE(map->m_pblk == NEW_ADDR))
+				return -EINVAL;
+		}
+		iomap->length = F2FS_BLK_TO_BYTES(map->m_len);
+		iomap->bdev = map->m_bdev;
+		if (map->m_pblk != NEW_ADDR) {
+			iomap->type = IOMAP_MAPPED;
+			iomap->flags |= IOMAP_F_MERGED;
+			iomap->addr = F2FS_BLK_TO_BYTES(map->m_pblk);
+		} else {
+			iomap->type = IOMAP_UNWRITTEN;
+			iomap->addr = IOMAP_NULL_ADDR;
+		}
+		if (flags & IOMAP_WRITE && map->m_last_pblk)
+			iomap->private = (void *)map->m_last_pblk;
 	} else {
-		if (flags & IOMAP_WRITE)
+		if (dio && flags & IOMAP_WRITE)
 			return -ENOTBLK;
 
-		if (map.m_pblk == NULL_ADDR) {
-			iomap->length = F2FS_BLK_TO_BYTES(next_pgofs) -
-							iomap->offset;
+		if (map->m_pblk == NULL_ADDR) {
+			if (map->m_flags & F2FS_MAP_NODNODE)
+				iomap->length =
+					F2FS_BLK_TO_BYTES(*map->m_next_pgofs) -
+					iomap->offset;
+			else
+				iomap->length = F2FS_BLK_TO_BYTES(map->m_len);
 			iomap->type = IOMAP_HOLE;
-		} else if (map.m_pblk == NEW_ADDR) {
-			iomap->length = F2FS_BLK_TO_BYTES(map.m_len);
+		} else if (map->m_pblk == NEW_ADDR) {
+			iomap->length = F2FS_BLK_TO_BYTES(map->m_len);
 			iomap->type = IOMAP_UNWRITTEN;
 		} else {
 			f2fs_bug_on(F2FS_I_SB(inode), 1);
@@ -4301,7 +4327,7 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 		iomap->addr = IOMAP_NULL_ADDR;
 	}
 
-	if (map.m_flags & F2FS_MAP_NEW)
+	if (map->m_flags & F2FS_MAP_NEW)
 		iomap->flags |= IOMAP_F_NEW;
 	if ((inode->i_state & I_DIRTY_DATASYNC) ||
 	    offset + length > i_size_read(inode))
@@ -4313,3 +4339,217 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 const struct iomap_ops f2fs_iomap_ops = {
 	.iomap_begin	= f2fs_iomap_begin,
 };
+
+/* iomap buffered-io */
+static int f2fs_buffered_read_iomap_begin(struct inode *inode, loff_t offset,
+					  loff_t length, unsigned int flags,
+					  struct iomap *iomap,
+					  struct iomap *srcmap)
+{
+	pgoff_t next_pgofs = 0;
+	int err;
+	struct f2fs_map_blocks map = {};
+
+	map.m_lblk = F2FS_BYTES_TO_BLK(offset);
+	map.m_len = F2FS_BYTES_TO_BLK(offset + length - 1) - map.m_lblk + 1;
+	map.m_next_pgofs = &next_pgofs;
+	map.m_seg_type =
+		f2fs_rw_hint_to_seg_type(F2FS_I_SB(inode), inode->i_write_hint);
+	map.m_may_create = false;
+	if (is_sbi_flag_set(F2FS_I_SB(inode), SBI_IS_SHUTDOWN))
+		return -EIO;
+	/*
+	 * If the blocks being overwritten are already allocated,
+	 * f2fs_map_lock and f2fs_balance_fs are not necessary.
+	 */
+	if (flags & IOMAP_WRITE)
+		return -EINVAL;
+
+	err = f2fs_map_blocks(inode, &map, F2FS_GET_BLOCK_IOMAP);
+	if (err)
+		return err;
+
+	if (WARN_ON_ONCE(map.m_pblk == COMPRESS_ADDR))
+		return -EINVAL;
+
+	return f2fs_set_iomap(inode, &map, iomap, flags, offset, length, false);
+}
+
+const struct iomap_ops f2fs_buffered_read_iomap_ops = {
+	.iomap_begin = f2fs_buffered_read_iomap_begin,
+};
+
+static void f2fs_iomap_readahead(struct readahead_control *rac)
+{
+	struct inode *inode = rac->mapping->host;
+
+	if (!f2fs_is_compress_backend_ready(inode))
+		return;
+
+	/* If the file has inline data, skip readahead */
+	if (f2fs_has_inline_data(inode))
+		return;
+	iomap_readahead(rac, &f2fs_buffered_read_iomap_ops);
+}
+
+static int f2fs_buffered_write_iomap_begin(struct inode *inode, loff_t offset,
+					   loff_t length, unsigned flags,
+					   struct iomap *iomap,
+					   struct iomap *srcmap)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	struct f2fs_map_blocks map = {};
+	struct folio *ifolio = NULL;
+	int err = 0;
+
+	iomap->offset = offset;
+	iomap->bdev = sbi->sb->s_bdev;
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+	iomap->validity_cookie = f2fs_iomap_seq_read(inode);
+#endif
+	if (f2fs_has_inline_data(inode)) {
+		if (offset + length <= MAX_INLINE_DATA(inode)) {
+			ifolio = f2fs_get_inode_folio(sbi, inode->i_ino);
+			if (IS_ERR(ifolio)) {
+				err = PTR_ERR(ifolio);
+				goto failed;
+			}
+			set_inode_flag(inode, FI_DATA_EXIST);
+			f2fs_iomap_prepare_read_inline(inode, ifolio, iomap,
+						       offset, length);
+			if (inode->i_nlink)
+				folio_set_f2fs_inline(ifolio);
+
+			f2fs_folio_put(ifolio, 1);
+			goto out;
+		}
+	}
+	block_t start_blk = F2FS_BYTES_TO_BLK(offset);
+	block_t len_blks =
+		F2FS_BYTES_TO_BLK(offset + length - 1) - start_blk + 1;
+	err = f2fs_map_blocks_iomap(inode, start_blk, len_blks, &map);
+	if (map.m_pblk == NULL_ADDR) {
+		err = f2fs_map_blocks_preallocate(inode, map.m_lblk, len_blks,
+						  &map);
+		if (err)
+			goto failed;
+	}
+	if (WARN_ON_ONCE(map.m_pblk == COMPRESS_ADDR))
+		return -EIO; // Should not happen for buffered write prep
+	err = f2fs_set_iomap(inode, &map, iomap, flags, offset, length, false);
+	if (err)
+		return err;
+failed:
+	f2fs_write_failed(inode, offset + length);
+out:
+	return err;
+}
+
+static int f2fs_buffered_write_atomic_iomap_begin(struct inode *inode,
+						  loff_t offset, loff_t length,
+						  unsigned flags,
+						  struct iomap *iomap,
+						  struct iomap *srcmap)
+{
+	struct inode *cow_inode = F2FS_I(inode)->cow_inode;
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	struct f2fs_map_blocks map = {};
+	int err = 0;
+
+	iomap->offset = offset;
+	iomap->bdev = sbi->sb->s_bdev;
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+	iomap->validity_cookie = f2fs_iomap_seq_read(inode);
+#endif
+	block_t start_blk = F2FS_BYTES_TO_BLK(offset);
+	block_t len_blks =
+		F2FS_BYTES_TO_BLK(offset + length - 1) - start_blk + 1;
+	err = f2fs_map_blocks_iomap(cow_inode, start_blk, len_blks, &map);
+	if (err)
+		return err;
+	if (map.m_pblk == NULL_ADDR &&
+	    is_inode_flag_set(inode, FI_ATOMIC_REPLACE)) {
+		err = f2fs_map_blocks_preallocate(cow_inode, map.m_lblk,
+						  map.m_len, &map);
+		if (err)
+			return err;
+		inc_atomic_write_cnt(inode);
+		goto out;
+	} else if (map.m_pblk != NULL_ADDR) {
+		goto out;
+	}
+	err = f2fs_map_blocks_iomap(inode, start_blk, len_blks, &map);
+	if (err)
+		return err;
+out:
+	if (WARN_ON_ONCE(map.m_pblk == COMPRESS_ADDR))
+		return -EIO;
+
+	return f2fs_set_iomap(inode, &map, iomap, flags, offset, length, false);
+}
+
+static int f2fs_buffered_write_iomap_end(struct inode *inode, loff_t pos,
+					 loff_t length, ssize_t written,
+					 unsigned flags, struct iomap *iomap)
+{
+	return written;
+}
+
+const struct iomap_ops f2fs_buffered_write_iomap_ops = {
+	.iomap_begin = f2fs_buffered_write_iomap_begin,
+	.iomap_end = f2fs_buffered_write_iomap_end,
+};
+
+const struct iomap_ops f2fs_buffered_write_atomic_iomap_ops = {
+	.iomap_begin = f2fs_buffered_write_atomic_iomap_begin,
+};
+
+const struct address_space_operations f2fs_iomap_aops = {
+	.read_folio = f2fs_read_data_folio,
+	.readahead = f2fs_iomap_readahead,
+	.write_begin = f2fs_write_begin,
+	.write_end = f2fs_write_end,
+	.writepages = f2fs_write_data_pages,
+	.dirty_folio = f2fs_dirty_data_folio,
+	.invalidate_folio = f2fs_invalidate_folio,
+	.release_folio = f2fs_release_folio,
+	.migrate_folio = filemap_migrate_folio,
+	.is_partially_uptodate = iomap_is_partially_uptodate,
+	.error_remove_folio = generic_error_remove_folio,
+};
+
+static void f2fs_iomap_put_folio(struct inode *inode, loff_t pos,
+				 unsigned copied, struct folio *folio)
+{
+	if (!copied)
+		goto unlock_out;
+	if (f2fs_is_atomic_file(inode))
+		folio_set_f2fs_atomic(folio);
+
+	if (pos + copied > i_size_read(inode) &&
+	    !f2fs_verity_in_progress(inode)) {
+		if (f2fs_is_atomic_file(inode))
+			f2fs_i_size_write(F2FS_I(inode)->cow_inode,
+					  pos + copied);
+	}
+unlock_out:
+	folio_unlock(folio);
+	folio_put(folio);
+	f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
+}
+
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+static bool f2fs_iomap_valid(struct inode *inode, const struct iomap *iomap)
+{
+	return iomap->validity_cookie == f2fs_iomap_seq_read(inode);
+}
+#else
+static bool f2fs_iomap_valid(struct inode *inode, const struct iomap *iomap)
+{
+	return 1;
+}
+#endif
+const struct iomap_write_ops f2fs_iomap_write_ops = {
+	.put_folio = f2fs_iomap_put_folio,
+	.iomap_valid = f2fs_iomap_valid
+};
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index ac9a6ac13e1f..1cf12b76b09a 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -762,6 +762,7 @@ struct extent_tree_info {
 #define F2FS_MAP_NEW		(1U << 0)
 #define F2FS_MAP_MAPPED		(1U << 1)
 #define F2FS_MAP_DELALLOC	(1U << 2)
+#define F2FS_MAP_NODNODE	(1U << 3)
 #define F2FS_MAP_FLAGS		(F2FS_MAP_NEW | F2FS_MAP_MAPPED |\
 				F2FS_MAP_DELALLOC)
 
@@ -837,49 +838,53 @@ enum {
 
 /* used for f2fs_inode_info->flags */
 enum {
-	FI_NEW_INODE,		/* indicate newly allocated inode */
-	FI_DIRTY_INODE,		/* indicate inode is dirty or not */
-	FI_AUTO_RECOVER,	/* indicate inode is recoverable */
-	FI_DIRTY_DIR,		/* indicate directory has dirty pages */
-	FI_INC_LINK,		/* need to increment i_nlink */
-	FI_ACL_MODE,		/* indicate acl mode */
-	FI_NO_ALLOC,		/* should not allocate any blocks */
-	FI_FREE_NID,		/* free allocated nide */
-	FI_NO_EXTENT,		/* not to use the extent cache */
-	FI_INLINE_XATTR,	/* used for inline xattr */
-	FI_INLINE_DATA,		/* used for inline data*/
-	FI_INLINE_DENTRY,	/* used for inline dentry */
-	FI_APPEND_WRITE,	/* inode has appended data */
-	FI_UPDATE_WRITE,	/* inode has in-place-update data */
-	FI_NEED_IPU,		/* used for ipu per file */
-	FI_ATOMIC_FILE,		/* indicate atomic file */
-	FI_DATA_EXIST,		/* indicate data exists */
-	FI_SKIP_WRITES,		/* should skip data page writeback */
-	FI_OPU_WRITE,		/* used for opu per file */
-	FI_DIRTY_FILE,		/* indicate regular/symlink has dirty pages */
-	FI_PREALLOCATED_ALL,	/* all blocks for write were preallocated */
-	FI_HOT_DATA,		/* indicate file is hot */
-	FI_EXTRA_ATTR,		/* indicate file has extra attribute */
-	FI_PROJ_INHERIT,	/* indicate file inherits projectid */
-	FI_PIN_FILE,		/* indicate file should not be gced */
-	FI_VERITY_IN_PROGRESS,	/* building fs-verity Merkle tree */
-	FI_COMPRESSED_FILE,	/* indicate file's data can be compressed */
-	FI_COMPRESS_CORRUPT,	/* indicate compressed cluster is corrupted */
-	FI_MMAP_FILE,		/* indicate file was mmapped */
-	FI_ENABLE_COMPRESS,	/* enable compression in "user" compression mode */
-	FI_COMPRESS_RELEASED,	/* compressed blocks were released */
-	FI_ALIGNED_WRITE,	/* enable aligned write */
-	FI_COW_FILE,		/* indicate COW file */
-	FI_ATOMIC_COMMITTED,	/* indicate atomic commit completed except disk sync */
-	FI_ATOMIC_DIRTIED,	/* indicate atomic file is dirtied */
-	FI_ATOMIC_REPLACE,	/* indicate atomic replace */
-	FI_OPENED_FILE,		/* indicate file has been opened */
-	FI_DONATE_FINISHED,	/* indicate page donation of file has been finished */
-	FI_MAX,			/* max flag, never be used */
+	FI_NEW_INODE, /* indicate newly allocated inode */
+	FI_DIRTY_INODE, /* indicate inode is dirty or not */
+	FI_AUTO_RECOVER, /* indicate inode is recoverable */
+	FI_DIRTY_DIR, /* indicate directory has dirty pages */
+	FI_INC_LINK, /* need to increment i_nlink */
+	FI_ACL_MODE, /* indicate acl mode */
+	FI_NO_ALLOC, /* should not allocate any blocks */
+	FI_FREE_NID, /* free allocated nide */
+	FI_NO_EXTENT, /* not to use the extent cache */
+	FI_INLINE_XATTR, /* used for inline xattr */
+	FI_INLINE_DATA, /* used for inline data*/
+	FI_INLINE_DENTRY, /* used for inline dentry */
+	FI_APPEND_WRITE, /* inode has appended data */
+	FI_UPDATE_WRITE, /* inode has in-place-update data */
+	FI_NEED_IPU, /* used for ipu per file */
+	FI_ATOMIC_FILE, /* indicate atomic file */
+	FI_DATA_EXIST, /* indicate data exists */
+	FI_SKIP_WRITES, /* should skip data page writeback */
+	FI_OPU_WRITE, /* used for opu per file */
+	FI_DIRTY_FILE, /* indicate regular/symlink has dirty pages */
+	FI_PREALLOCATED_ALL, /* all blocks for write were preallocated */
+	FI_HOT_DATA, /* indicate file is hot */
+	FI_EXTRA_ATTR, /* indicate file has extra attribute */
+	FI_PROJ_INHERIT, /* indicate file inherits projectid */
+	FI_PIN_FILE, /* indicate file should not be gced */
+	FI_VERITY_IN_PROGRESS, /* building fs-verity Merkle tree */
+	FI_COMPRESSED_FILE, /* indicate file's data can be compressed */
+	FI_COMPRESS_CORRUPT, /* indicate compressed cluster is corrupted */
+	FI_MMAP_FILE, /* indicate file was mmapped */
+	FI_ENABLE_COMPRESS, /* enable compression in "user" compression mode */
+	FI_COMPRESS_RELEASED, /* compressed blocks were released */
+	FI_ALIGNED_WRITE, /* enable aligned write */
+	FI_COW_FILE, /* indicate COW file */
+	FI_ATOMIC_COMMITTED, /* indicate atomic commit completed except disk sync */
+	FI_ATOMIC_DIRTIED, /* indicate atomic file is dirtied */
+	FI_ATOMIC_REPLACE, /* indicate atomic replace */
+	FI_OPENED_FILE, /* indicate file has been opened */
+	FI_DONATE_FINISHED, /* indicate page donation of file has been finished */
+	FI_IOMAP, /* indicate whether this inode should enable iomap*/
+	FI_MAX, /* max flag, never be used */
 };
 
 struct f2fs_inode_info {
 	struct inode vfs_inode;		/* serve a vfs inode */
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+	atomic64_t i_iomap_seq; /* for iomap_valid sequence number */
+#endif
 	unsigned long i_flags;		/* keep an inode flags for ioctl */
 	unsigned char i_advise;		/* use to give file attribute hints */
 	unsigned char i_dir_level;	/* use for dentry level for large dir */
@@ -2814,6 +2819,16 @@ static inline void inc_page_count(struct f2fs_sb_info *sbi, int count_type)
 		set_sbi_flag(sbi, SBI_IS_DIRTY);
 }
 
+static inline void inc_page_count_multiple(struct f2fs_sb_info *sbi,
+					   int count_type, int npages)
+{
+	atomic_add(npages, &sbi->nr_pages[count_type]);
+
+	if (count_type == F2FS_DIRTY_DENTS || count_type == F2FS_DIRTY_NODES ||
+	    count_type == F2FS_DIRTY_META || count_type == F2FS_DIRTY_QDATA ||
+	    count_type == F2FS_DIRTY_IMETA)
+		set_sbi_flag(sbi, SBI_IS_DIRTY);
+}
 static inline void inode_inc_dirty_pages(struct inode *inode)
 {
 	atomic_inc(&F2FS_I(inode)->dirty_pages);
@@ -3657,6 +3672,10 @@ static inline bool f2fs_is_cow_file(struct inode *inode)
 	return is_inode_flag_set(inode, FI_COW_FILE);
 }
 
+static inline bool f2fs_iomap_inode(struct inode *inode)
+{
+	return is_inode_flag_set(inode, FI_IOMAP);
+}
 static inline void *inline_data_addr(struct inode *inode, struct folio *folio)
 {
 	__le32 *addr = get_dnode_addr(inode, folio);
@@ -3880,7 +3899,17 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc);
 void f2fs_remove_donate_inode(struct inode *inode);
 void f2fs_evict_inode(struct inode *inode);
 void f2fs_handle_failed_inode(struct inode *inode);
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+static inline void f2fs_iomap_seq_inc(struct inode *inode)
+{
+	atomic64_inc(&F2FS_I(inode)->i_iomap_seq);
+}
 
+static inline u64 f2fs_iomap_seq_read(struct inode *inode)
+{
+	return atomic64_read(&F2FS_I(inode)->i_iomap_seq);
+}
+#endif
 /*
  * namei.c
  */
@@ -4248,6 +4277,9 @@ int f2fs_write_single_data_page(struct folio *folio, int *submitted,
 				enum iostat_type io_type,
 				int compr_blocks, bool allow_balance);
 void f2fs_write_failed(struct inode *inode, loff_t to);
+int f2fs_set_iomap(struct inode *inode, struct f2fs_map_blocks *map,
+		   struct iomap *iomap, unsigned int flags, loff_t offset,
+		   loff_t length, bool dio);
 void f2fs_invalidate_folio(struct folio *folio, size_t offset, size_t length);
 bool f2fs_release_folio(struct folio *folio, gfp_t wait);
 bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len);
@@ -4258,6 +4290,11 @@ int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi);
 void f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi);
 extern const struct iomap_ops f2fs_iomap_ops;
 
+extern const struct iomap_write_ops f2fs_iomap_write_ops;
+extern const struct iomap_ops f2fs_buffered_read_iomap_ops;
+extern const struct iomap_ops f2fs_buffered_write_iomap_ops;
+extern const struct iomap_ops f2fs_buffered_write_atomic_iomap_ops;
+
 /*
  * gc.c
  */
@@ -4540,6 +4577,7 @@ extern const struct file_operations f2fs_dir_operations;
 extern const struct file_operations f2fs_file_operations;
 extern const struct inode_operations f2fs_file_inode_operations;
 extern const struct address_space_operations f2fs_dblock_aops;
+extern const struct address_space_operations f2fs_iomap_aops;
 extern const struct address_space_operations f2fs_node_aops;
 extern const struct address_space_operations f2fs_meta_aops;
 extern const struct inode_operations f2fs_dir_inode_operations;
@@ -4578,7 +4616,9 @@ int f2fs_read_inline_dir(struct file *file, struct dir_context *ctx,
 int f2fs_inline_data_fiemap(struct inode *inode,
 			struct fiemap_extent_info *fieinfo,
 			__u64 start, __u64 len);
-
+void f2fs_iomap_prepare_read_inline(struct inode *inode, struct folio *ifolio,
+				    struct iomap *iomap, loff_t pos,
+				    loff_t length);
 /*
  * shrinker.c
  */
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 42faaed6a02d..6c5b3e632f2b 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4965,7 +4965,14 @@ static int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *iter,
 		if (ret)
 			return ret;
 	}
-
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+	/* Buffered write can convert inline file to large normal file
+	 * when convert success, we uses mapping set large folios here
+	 */
+	if (f2fs_should_use_buffered_iomap(inode))
+		mapping_set_large_folios(inode->i_mapping);
+	set_inode_flag(inode, FI_IOMAP);
+#endif
 	/* Do not preallocate blocks that will be written partially in 4KB. */
 	map.m_lblk = F2FS_BLK_ALIGN(pos);
 	map.m_len = F2FS_BYTES_TO_BLK(pos + count);
@@ -4994,6 +5001,24 @@ static int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *iter,
 	return map.m_len;
 }
 
+static ssize_t f2fs_iomap_buffered_write(struct kiocb *iocb, struct iov_iter *i)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file_inode(file);
+	ssize_t ret;
+
+	if (f2fs_is_atomic_file(inode)) {
+		ret = iomap_file_buffered_write(iocb, i,
+						&f2fs_buffered_write_atomic_iomap_ops,
+						&f2fs_iomap_write_ops, NULL);
+	} else {
+		ret = iomap_file_buffered_write(iocb, i,
+						&f2fs_buffered_write_iomap_ops,
+						&f2fs_iomap_write_ops, NULL);
+	}
+	return ret;
+}
+
 static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
 					struct iov_iter *from)
 {
@@ -5004,7 +5029,11 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
 	if (iocb->ki_flags & IOCB_NOWAIT)
 		return -EOPNOTSUPP;
 
-	ret = generic_perform_write(iocb, from);
+	if (f2fs_iomap_inode(inode)) {
+		ret = f2fs_iomap_buffered_write(iocb, from);
+	} else {
+		ret = generic_perform_write(iocb, from);
+	}
 
 	if (ret > 0) {
 		f2fs_update_iostat(F2FS_I_SB(inode), inode,
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index 58ac831ef704..bda338b4fc22 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -13,7 +13,7 @@
 #include "f2fs.h"
 #include "node.h"
 #include <trace/events/f2fs.h>
-
+#include <linux/iomap.h>
 static bool support_inline_data(struct inode *inode)
 {
 	if (f2fs_used_in_atomic_write(inode))
@@ -832,3 +832,16 @@ int f2fs_inline_data_fiemap(struct inode *inode,
 	f2fs_folio_put(ifolio, true);
 	return err;
 }
+/* fill iomap struct for inline data case for
+ *iomap buffered write
+ */
+void f2fs_iomap_prepare_read_inline(struct inode *inode, struct folio *ifolio,
+				    struct iomap *iomap, loff_t pos,
+				    loff_t length)
+{
+	iomap->addr = IOMAP_NULL_ADDR;
+	iomap->length = length;
+	iomap->type = IOMAP_INLINE;
+	iomap->flags = 0;
+	iomap->inline_data = inline_data_addr(inode, ifolio);
+}
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 8c4eafe9ffac..29378270d561 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -23,6 +23,24 @@
 extern const struct address_space_operations f2fs_compress_aops;
 #endif
 
+bool f2fs_should_use_buffered_iomap(struct inode *inode)
+{
+	if (!S_ISREG(inode->i_mode))
+		return false;
+	if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
+		return false;
+	if (inode->i_mapping == NODE_MAPPING(F2FS_I_SB(inode)))
+		return false;
+	if (inode->i_mapping == META_MAPPING(F2FS_I_SB(inode)))
+		return false;
+	if (f2fs_encrypted_file(inode))
+		return false;
+	if (fsverity_active(inode))
+		return false;
+	if (f2fs_compressed_file(inode))
+		return false;
+	return true;
+}
 void f2fs_mark_inode_dirty_sync(struct inode *inode, bool sync)
 {
 	if (is_inode_flag_set(inode, FI_NEW_INODE))
@@ -611,7 +629,16 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino)
 	} else if (S_ISREG(inode->i_mode)) {
 		inode->i_op = &f2fs_file_inode_operations;
 		inode->i_fop = &f2fs_file_operations;
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+		if (f2fs_should_use_buffered_iomap(inode)) {
+			mapping_set_large_folios(inode->i_mapping);
+			set_inode_flag(inode, FI_IOMAP);
+			inode->i_mapping->a_ops = &f2fs_iomap_aops;
+		} else
+			inode->i_mapping->a_ops = &f2fs_dblock_aops;
+#else
 		inode->i_mapping->a_ops = &f2fs_dblock_aops;
+#endif
 	} else if (S_ISDIR(inode->i_mode)) {
 		inode->i_op = &f2fs_dir_inode_operations;
 		inode->i_fop = &f2fs_dir_operations;
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index b882771e4699..2d995860c488 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -328,6 +328,13 @@ static struct inode *f2fs_new_inode(struct mnt_idmap *idmap,
 	f2fs_init_extent_tree(inode);
 
 	trace_f2fs_new_inode(inode, 0);
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+	if (f2fs_should_use_buffered_iomap(inode)) {
+		set_inode_flag(inode, FI_IOMAP);
+		mapping_set_large_folios(inode->i_mapping);
+	}
+#endif
+
 	return inode;
 
 fail:
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 2000880b7dca..35a42d6214fe 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1719,6 +1719,9 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
 	init_once((void *) fi);
 
 	/* Initialize f2fs-specific inode info */
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+	atomic64_set(&fi->i_iomap_seq, 0);
+#endif
 	atomic_set(&fi->dirty_pages, 0);
 	atomic_set(&fi->i_compr_blocks, 0);
 	atomic_set(&fi->open_count, 0);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 7/9] f2fs:Make GC aware of large folios
  2025-08-13  9:37 [f2fs-dev] [RESEND RFC " Nanzhe Zhao
@ 2025-08-13  9:37 ` Nanzhe Zhao
  0 siblings, 0 replies; 14+ messages in thread
From: Nanzhe Zhao @ 2025-08-13  9:37 UTC (permalink / raw)
  To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
  Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao

Previously, the GC (Garbage Collection) logic for performing I/O and
marking folios dirty only supported order-0 folios and lacked awareness
of higher-order folios. To enable GC to correctly handle higher-order
folios, we made two changes:

- In `move_data_page`, we now use `f2fs_iomap_set_range_dirty` to mark
  only the sub-part of the folio corresponding to `bidx` as dirty,
  instead of the entire folio.

- The `f2fs_submit_page_read` function has been augmented with an
  `index` parameter, allowing it to precisely identify which sub-page
  of the current folio is being submitted.

Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
 fs/f2fs/data.c | 13 +++++++------
 fs/f2fs/gc.c   | 37 +++++++++++++++++++++++--------------
 2 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index b7bef2a28c8e..5ecd08a3dd0b 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1096,7 +1096,7 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
 /* This can handle encryption stuffs */
 static int f2fs_submit_page_read(struct inode *inode, struct folio *folio,
 				 block_t blkaddr, blk_opf_t op_flags,
-				 bool for_write)
+				 pgoff_t index, bool for_write)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	struct bio *bio;
@@ -1109,7 +1109,8 @@ static int f2fs_submit_page_read(struct inode *inode, struct folio *folio,
 	/* wait for GCed page writeback via META_MAPPING */
 	f2fs_wait_on_block_writeback(inode, blkaddr);
 
-	if (!bio_add_folio(bio, folio, PAGE_SIZE, 0)) {
+	if (!bio_add_folio(bio, folio, PAGE_SIZE,
+			   (index - folio->index) << PAGE_SHIFT)) {
 		iostat_update_and_unbind_ctx(bio);
 		if (bio->bi_private)
 			mempool_free(bio->bi_private, bio_post_read_ctx_pool);
@@ -1276,8 +1277,8 @@ struct folio *f2fs_get_read_data_folio(struct inode *inode, pgoff_t index,
 		return folio;
 	}
 
-	err = f2fs_submit_page_read(inode, folio, dn.data_blkaddr,
-						op_flags, for_write);
+	err = f2fs_submit_page_read(inode, folio, dn.data_blkaddr, op_flags,
+				    index, for_write);
 	if (err)
 		goto put_err;
 	return folio;
@@ -3651,8 +3652,8 @@ static int f2fs_write_begin(const struct kiocb *iocb,
 			goto put_folio;
 		}
 		err = f2fs_submit_page_read(use_cow ?
-				F2FS_I(inode)->cow_inode : inode,
-				folio, blkaddr, 0, true);
+			F2FS_I(inode)->cow_inode : inode, folio,
+			blkaddr, 0, folio->index, true);
 		if (err)
 			goto put_folio;
 
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 098e9f71421e..6d28f01bec42 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1475,22 +1475,31 @@ static int move_data_page(struct inode *inode, block_t bidx, int gc_type,
 			err = -EAGAIN;
 			goto out;
 		}
-		folio_mark_dirty(folio);
 		folio_set_f2fs_gcing(folio);
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+		if (!folio_test_large(folio)) {
+			folio_mark_dirty(folio);
+		} else {
+			f2fs_iomap_set_range_dirty(folio, (bidx - folio->index) << PAGE_SHIFT,
+				PAGE_SIZE);
+		}
+#else
+		folio_mark_dirty(folio);
+#endif
 	} else {
-		struct f2fs_io_info fio = {
-			.sbi = F2FS_I_SB(inode),
-			.ino = inode->i_ino,
-			.type = DATA,
-			.temp = COLD,
-			.op = REQ_OP_WRITE,
-			.op_flags = REQ_SYNC,
-			.old_blkaddr = NULL_ADDR,
-			.folio = folio,
-			.encrypted_page = NULL,
-			.need_lock = LOCK_REQ,
-			.io_type = FS_GC_DATA_IO,
-		};
+		struct f2fs_io_info fio = { .sbi = F2FS_I_SB(inode),
+					    .ino = inode->i_ino,
+					    .type = DATA,
+					    .temp = COLD,
+					    .op = REQ_OP_WRITE,
+					    .op_flags = REQ_SYNC,
+					    .old_blkaddr = NULL_ADDR,
+					    .folio = folio,
+					    .encrypted_page = NULL,
+					    .need_lock = LOCK_REQ,
+					    .io_type = FS_GC_DATA_IO,
+					    .idx = bidx - folio->index,
+					    .cnt = 1 };
 		bool is_dirty = folio_test_dirty(folio);
 
 retry:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap
  2025-08-13  9:21 [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
                   ` (8 preceding siblings ...)
  2025-08-13  9:21 ` [RFC PATCH 9/9] f2fs: Enable buffered read/write path large folios support for normal and atomic file with iomap Nanzhe Zhao
@ 2025-08-13 15:22 ` Christoph Hellwig
  2025-08-14  0:39   ` 赵南哲 
  9 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2025-08-13 15:22 UTC (permalink / raw)
  To: Nanzhe Zhao
  Cc: Jaegeuk Kim, linux-f2fs, linux-fsdevel, Matthew Wilcox, Chao Yu,
	Yi Zhang, Barry Song, Darrick J. Wong, linux-xfs

On Wed, Aug 13, 2025 at 05:21:22PM +0800, Nanzhe Zhao wrote:
> * **Why extends iomap**
>   * F2FS stores its flags in the folio's private field,
>     which conflicts with iomap_folio_state.
>   * To resolve this, we designed f2fs_iomap_folio_state,
>     compatible with iomap_folio_state's layout while extending
>     its flexible state array for F2FS private flags.
>   * We store a magic number in read_bytes_pending to distinguish
>     whether a folio uses the original or F2FS's iomap_folio_state.
>     It's chosen because it remains 0 after readahead completes.

That's pretty ugly.  What additionals flags do you need?  We should
try to figure out if there is a sensible way to support the needs
with a single codebase and data structure if that the requirements
are sensible.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re:Re: [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap
  2025-08-13 15:22 ` [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Christoph Hellwig
@ 2025-08-14  0:39   ` 赵南哲 
  2025-08-17  4:43     ` Nanzhe Zhao
  0 siblings, 1 reply; 14+ messages in thread
From: 赵南哲  @ 2025-08-14  0:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel, Matthew Wilcox,
	Chao Yu, Yi Zhang, Barry Song, Darrick J. Wong, linux-xfs

Hi Mr.Christoph,

Thanks for the quick feedback!

> That's pretty ugly.  What additional flags do you need?  

F2FS can utilize the folio's private field in a non-pointer mode to store its extra flags, which indicate the folio's additional status. 
Please take a look at the f2fs.h file from PAGE_PRIVATE_GET_FUNC to the end of clear_page_private_all().

These flags persist throughout the entire lifetime of a folio, which conflicts with the iomap_folio_state pointer.
Currently, the private fields of iomap's existing data structures,namely struct iomap's private, struct iomap_iter's private, 
and struct iomap_ioend's io_private,are either allocated locally on the stack or have a lifecycle on the heap that only exists 
for the duration of the I/O routine. This cannot meet F2FS's requirements.

> We should  try to figure out if there is a sensible way to support the needs
> with a single codebase and data structure.

As far as I know, only F2FS has this requirement, while other file systems do not. 
Therefore, my initial thought was to avoid directly modifying the generic logic in fs/iomap. Instead, I propose designing 
a wrapper structure for iomap_folio_state specifically for F2FS to satisfy both iomap's and F2FS's own needs.

Another issue is the handling of order-0 folios. Since the iomap framework does not allocate an iomap_folio_state for these folios, 
F2FS will always stores its private flags in the folio->private field. Then iomap framework would mistakenly interpret these flags as a pointer. 

If we are to solve this issue in generic iomap layer, a minimal changes method to iomap framework I suppose is to let iomap logic can
both distinguish pointer and non pointer mode of folio->private. We should also add a private field to iomap_folio_state , or extend he state 
flexible array to store the extra infomation. If iomap detects a order>0 folio's folio->private is used in non pointer mode, then it store the flags in a newly 
allocted iomap_folio_state first , clear the private field and then store's its address in it.

P.S.  I just noticed you didn't reply via my resend patch. I misspelled f2fs's subsytem mail address in the original patch and I sincerely apologize for that.
I already re-sent the series as  
 "[f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap"
Could we continue the discussion on that thread so the right list gets the
full context?  Thanks!

Best regards,
Nanzhe Zhao

At 2025-08-13 23:22:37, "Christoph Hellwig" <hch@infradead.org> wrote:
>On Wed, Aug 13, 2025 at 05:21:22PM +0800, Nanzhe Zhao wrote:
>> * **Why extends iomap**
>>   * F2FS stores its flags in the folio's private field,
>>     which conflicts with iomap_folio_state.
>>   * To resolve this, we designed f2fs_iomap_folio_state,
>>     compatible with iomap_folio_state's layout while extending
>>     its flexible state array for F2FS private flags.
>>   * We store a magic number in read_bytes_pending to distinguish
>>     whether a folio uses the original or F2FS's iomap_folio_state.
>>     It's chosen because it remains 0 after readahead completes.
>
>That's pretty ugly.  What additionals flags do you need?  We should
>try to figure out if there is a sensible way to support the needs
>with a single codebase and data structure if that the requirements
>are sensible.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re:Re:Re: [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap
  2025-08-14  0:39   ` 赵南哲 
@ 2025-08-17  4:43     ` Nanzhe Zhao
  0 siblings, 0 replies; 14+ messages in thread
From: Nanzhe Zhao @ 2025-08-17  4:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel, Matthew Wilcox,
	Chao Yu, Yi Zhang, Barry Song, Darrick J. Wong, linux-xfs


There's another important reason to utilize an f2fs_iomap_folio_state. 

Because f2fs doesn't possess a per block state tracking data structure 
like buffer heads or subpages, it can't track per block dirty state or 
read/write bytes pending itself. Growing such a structure for f2fs and  
applying it to all code paths could be a tremendous and destructive 
task. 

So I think it's convenient to possess an f2fs own per folio 
private data structure that can both be compatible 
with iomap and f2fs's  needs, especially helpful for other f2fs's i/o paths
 that need to  support large folios altogether with buffered io but can't 
go into iomap  path (i.e., garbage collection).

 It can also be extended with fields to
 meet the needs of other types of f2fs files (e.g., compressed files) if
 they need to support large folios too.





At 2025-08-14 08:39:31, "赵南哲 " <nzzhao@126.com> wrote:
>Hi Mr.Christoph,
>
>Thanks for the quick feedback!
>
>> That's pretty ugly.  What additional flags do you need?  
>
>F2FS can utilize the folio's private field in a non-pointer mode to store its extra flags, which indicate the folio's additional status. 
>Please take a look at the f2fs.h file from PAGE_PRIVATE_GET_FUNC to the end of clear_page_private_all().
>
>These flags persist throughout the entire lifetime of a folio, which conflicts with the iomap_folio_state pointer.
>Currently, the private fields of iomap's existing data structures,namely struct iomap's private, struct iomap_iter's private, 
>and struct iomap_ioend's io_private,are either allocated locally on the stack or have a lifecycle on the heap that only exists 
>for the duration of the I/O routine. This cannot meet F2FS's requirements.
>
>> We should  try to figure out if there is a sensible way to support the needs
>> with a single codebase and data structure.
>
>As far as I know, only F2FS has this requirement, while other file systems do not. 
>Therefore, my initial thought was to avoid directly modifying the generic logic in fs/iomap. Instead, I propose designing 
>a wrapper structure for iomap_folio_state specifically for F2FS to satisfy both iomap's and F2FS's own needs.
>
>Another issue is the handling of order-0 folios. Since the iomap framework does not allocate an iomap_folio_state for these folios, 
>F2FS will always stores its private flags in the folio->private field. Then iomap framework would mistakenly interpret these flags as a pointer. 
>
>If we are to solve this issue in generic iomap layer, a minimal changes method to iomap framework I suppose is to let iomap logic can
>both distinguish pointer and non pointer mode of folio->private. We should also add a private field to iomap_folio_state , or extend he state 
>flexible array to store the extra infomation. If iomap detects a order>0 folio's folio->private is used in non pointer mode, then it store the flags in a newly 
>allocted iomap_folio_state first , clear the private field and then store's its address in it.
>
>P.S.  I just noticed you didn't reply via my resend patch. I misspelled f2fs's subsytem mail address in the original patch and I sincerely apologize for that.
>I already re-sent the series as  
> "[f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap"
>Could we continue the discussion on that thread so the right list gets the
>full context?  Thanks!
>
>Best regards,
>Nanzhe Zhao
>
>At 2025-08-13 23:22:37, "Christoph Hellwig" <hch@infradead.org> wrote:
>>On Wed, Aug 13, 2025 at 05:21:22PM +0800, Nanzhe Zhao wrote:
>>> * **Why extends iomap**
>>>   * F2FS stores its flags in the folio's private field,
>>>     which conflicts with iomap_folio_state.
>>>   * To resolve this, we designed f2fs_iomap_folio_state,
>>>     compatible with iomap_folio_state's layout while extending
>>>     its flexible state array for F2FS private flags.
>>>   * We store a magic number in read_bytes_pending to distinguish
>>>     whether a folio uses the original or F2FS's iomap_folio_state.
>>>     It's chosen because it remains 0 after readahead completes.
>>
>>That's pretty ugly.  What additionals flags do you need?  We should
>>try to figure out if there is a sensible way to support the needs
>>with a single codebase and data structure if that the requirements
>>are sensible.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-08-17  4:44 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-13  9:21 [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
2025-08-13  9:21 ` [RFC PATCH 1/9] f2fs: Introduce f2fs_iomap_folio_state Nanzhe Zhao
2025-08-13  9:21 ` [RFC PATCH 2/9] f2fs: Integrate f2fs_iomap_folio_state into f2fs page private helpers Nanzhe Zhao
2025-08-13  9:21 ` [RFC PATCH 3/9] f2fs: Using `folio_detach_f2fs_private` in invalidate and release folio Nanzhe Zhao
2025-08-13  9:21 ` [RFC PATCH 4/9] f2fs: Convert outplace write path page private funcions to folio private functions Nanzhe Zhao
2025-08-13  9:21 ` [RFC PATCH 5/9] f2fs:Refactor `f2fs_is_compressed_page` to `f2fs_is_compressed_folio` Nanzhe Zhao
2025-08-13  9:21 ` [RFC PATCH 6/9] f2fs: Extend f2fs_io_info to support sub-folio ranges Nanzhe Zhao
2025-08-13  9:21 ` [RFC PATCH 7/9] f2fs:Make GC aware of large folios Nanzhe Zhao
2025-08-13  9:21 ` [RFC PATCH 8/9] f2fs: Introduce F2FS_GET_BLOCK_IOMAP and map_blocks he lpers Nanzhe Zhao
2025-08-13  9:21 ` [RFC PATCH 9/9] f2fs: Enable buffered read/write path large folios support for normal and atomic file with iomap Nanzhe Zhao
2025-08-13 15:22 ` [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Christoph Hellwig
2025-08-14  0:39   ` 赵南哲 
2025-08-17  4:43     ` Nanzhe Zhao
  -- strict thread matches above, loose matches on Subject: below --
2025-08-13  9:37 [f2fs-dev] [RESEND RFC " Nanzhe Zhao
2025-08-13  9:37 ` [RFC PATCH 7/9] f2fs:Make GC aware of large folios Nanzhe Zhao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).