* [RFC PATCH 7/9] f2fs: Make GC aware of large folios
2025-08-13 9:21 [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
@ 2025-08-13 9:21 ` Nanzhe Zhao
0 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:21 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
Previously, the GC (Garbage Collection) logic for performing I/O and
marking folios dirty only supported order-0 folios and lacked awareness
of higher-order folios. To enable GC to correctly handle higher-order
folios, we made two changes:
- In `move_data_page`, we now use `f2fs_iomap_set_range_dirty` to mark
only the sub-part of the folio corresponding to `bidx` as dirty,
instead of the entire folio.
- The `f2fs_submit_page_read` function has been augmented with an
`index` parameter, allowing it to precisely identify which sub-page
of the current folio is being submitted.
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/data.c | 13 +++++++------
fs/f2fs/gc.c | 37 +++++++++++++++++++++++--------------
2 files changed, 30 insertions(+), 20 deletions(-)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index b7bef2a28c8e..5ecd08a3dd0b 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1096,7 +1096,7 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
/* This can handle encryption stuffs */
static int f2fs_submit_page_read(struct inode *inode, struct folio *folio,
block_t blkaddr, blk_opf_t op_flags,
- bool for_write)
+ pgoff_t index, bool for_write)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct bio *bio;
@@ -1109,7 +1109,8 @@ static int f2fs_submit_page_read(struct inode *inode, struct folio *folio,
/* wait for GCed page writeback via META_MAPPING */
f2fs_wait_on_block_writeback(inode, blkaddr);
- if (!bio_add_folio(bio, folio, PAGE_SIZE, 0)) {
+ if (!bio_add_folio(bio, folio, PAGE_SIZE,
+ (index - folio->index) << PAGE_SHIFT)) {
iostat_update_and_unbind_ctx(bio);
if (bio->bi_private)
mempool_free(bio->bi_private, bio_post_read_ctx_pool);
@@ -1276,8 +1277,8 @@ struct folio *f2fs_get_read_data_folio(struct inode *inode, pgoff_t index,
return folio;
}
- err = f2fs_submit_page_read(inode, folio, dn.data_blkaddr,
- op_flags, for_write);
+ err = f2fs_submit_page_read(inode, folio, dn.data_blkaddr, op_flags,
+ index, for_write);
if (err)
goto put_err;
return folio;
@@ -3651,8 +3652,8 @@ static int f2fs_write_begin(const struct kiocb *iocb,
goto put_folio;
}
err = f2fs_submit_page_read(use_cow ?
- F2FS_I(inode)->cow_inode : inode,
- folio, blkaddr, 0, true);
+ F2FS_I(inode)->cow_inode : inode, folio,
+ blkaddr, 0, folio->index, true);
if (err)
goto put_folio;
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 098e9f71421e..6d28f01bec42 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1475,22 +1475,31 @@ static int move_data_page(struct inode *inode, block_t bidx, int gc_type,
err = -EAGAIN;
goto out;
}
- folio_mark_dirty(folio);
folio_set_f2fs_gcing(folio);
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ if (!folio_test_large(folio)) {
+ folio_mark_dirty(folio);
+ } else {
+ f2fs_iomap_set_range_dirty(folio, (bidx - folio->index) << PAGE_SHIFT,
+ PAGE_SIZE);
+ }
+#else
+ folio_mark_dirty(folio);
+#endif
} else {
- struct f2fs_io_info fio = {
- .sbi = F2FS_I_SB(inode),
- .ino = inode->i_ino,
- .type = DATA,
- .temp = COLD,
- .op = REQ_OP_WRITE,
- .op_flags = REQ_SYNC,
- .old_blkaddr = NULL_ADDR,
- .folio = folio,
- .encrypted_page = NULL,
- .need_lock = LOCK_REQ,
- .io_type = FS_GC_DATA_IO,
- };
+ struct f2fs_io_info fio = { .sbi = F2FS_I_SB(inode),
+ .ino = inode->i_ino,
+ .type = DATA,
+ .temp = COLD,
+ .op = REQ_OP_WRITE,
+ .op_flags = REQ_SYNC,
+ .old_blkaddr = NULL_ADDR,
+ .folio = folio,
+ .encrypted_page = NULL,
+ .need_lock = LOCK_REQ,
+ .io_type = FS_GC_DATA_IO,
+ .idx = bidx - folio->index,
+ .cnt = 1 };
bool is_dirty = folio_test_dirty(folio);
retry:
--
2.34.1
* [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap
@ 2025-08-13 9:37 Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 1/9] f2fs: Introduce f2fs_iomap_folio_state Nanzhe Zhao
` (8 more replies)
0 siblings, 9 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
Resend: the original posting misspelled
the linux-f2fs-devel@lists.sourceforge.net address.
No code changes.
This RFC series enables large folio support in the buffered read/write
paths through an F2FS-specific extension of iomap, together with some
other preparation work for large folio integration.
Because this is my first time sending a patch series to the kernel
mailing list, I may have missed some conventions.
The series passes checkpatch.pl with no errors, but a few warnings
remain. I wasn't sure of the best way to address them, so I would
appreciate your guidance. I am happy to fix them if needed.
Motivations:
* **Why iomap**:
  * F2FS couples pages directly to BIOs without a per-block tracking
    struct like buffer_head or sub-page.
    A naive large-folio port would cause:
    * Write amplification.
    * Premature folio_end_read() / folio_end_writeback()
      when multiple sub-ranges of a large folio are under I/O
      concurrently.
    Both issues are already handled cleanly by iomap_folio_state.
  * The original buffered write path unlocks the folio halfway through,
    which makes rechecking the state of a large folio carrying an
    iomap_folio_state, or of a partially truncated folio, tricky.
    iomap handles all locking and unlocking automatically.
* **Why extend iomap**:
  * F2FS stores its flags in the folio's private field,
    which conflicts with iomap_folio_state.
  * To resolve this, we designed f2fs_iomap_folio_state, which is
    compatible with iomap_folio_state's layout while extending its
    flexible state array to hold F2FS private flags.
  * We store a magic number in read_bytes_pending to distinguish
    whether a folio carries the original iomap_folio_state or F2FS's
    variant. That field was chosen because it remains 0 after
    readahead completes.
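For reference, a minimal sketch of the discrimination check, condensed
from `is_f2fs_ifs()` in patch 1 (illustrative, not the exact code):
```c
/* Does folio->private hold F2FS's variant of iomap_folio_state? */
static bool private_is_f2fs_ifs(struct folio *folio)
{
	struct f2fs_iomap_folio_state *fifs;

	if (!folio_test_private(folio))
		return false;
	/* Direct flag words always have PAGE_PRIVATE_NOT_POINTER set. */
	if (test_bit(PAGE_PRIVATE_NOT_POINTER,
		     (unsigned long *)&folio->private))
		return false;
	fifs = folio->private;
	/*
	 * A generic iomap_folio_state has read_bytes_pending == 0 once
	 * readahead completes; F2FS stores F2FS_IFS_MAGIC there instead.
	 */
	return READ_ONCE(fifs->read_bytes_pending) == F2FS_IFS_MAGIC;
}
```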
Implementation notes:
* New Kconfig: `CONFIG_F2FS_IOMAP_FOLIO_STATE`; when off, F2FS falls
  back to the legacy buffered I/O path.
Limitations:
* BLOCK_SIZE > PAGE_SIZE is not supported yet.
* Large folios are not supported for encrypted or fsverity files.
* Large folio support for page writeback and compressed files is
  still WIP.
Why RFC:
* The design and implementation of `f2fs_iomap_folio_state` need
  review and potentially improvement.
* Limited test coverage so far. Any extra testing is highly appreciated.
* Two runtime issues remain (see below).
Performance Testing:
* Platform: x86-64 laptop (PCIe 4.0 NVMe) -> qemu-arm64 VM, 4 GiB RAM
* Kernel: gcc-13.2, defconfig + `CONFIG_F2FS_IOMAP_FOLIO_STATE=y`
fio 3.35, `ioengine=psync`, `size=1G`, `numjobs=1`
Read results (average bandwidth and IOPS):
--- Kernel: iomap_v1 file type: normal ---
Block Size (bs) | Avg. Bandwidth (MiB/s) | Avg. IOPS
---------------------+------------------------------+-----------------
100M | 2809.60 | 27.50
10M | 3184.60 | 317.90
128k | 1376.20 | 11000.80
1G | 1954.70 | 1.20
1M | 2717.00 | 2716.70
4k | 616.50 | 157800.00
--- Kernel: vanilla file type: normal ---
Block Size (bs) | Avg. Bandwidth (MiB/s) | Avg. IOPS
---------------------+------------------------------+-----------------
100M | 994.60 | 9.60
10M | 986.50 | 98.10
128k | 693.80 | 5550.90
1G | 816.90 | 0.00
1M | 968.90 | 968.40
4k | 429.80 | 109990.00
--- Kernel: iomap_v1 file type: hole ---
Block Size (bs) | Avg. Bandwidth (MiB/s) | Avg. IOPS
---------------------+------------------------------+-----------------
100M | 1825.60 | 17.70
10M | 1989.24 | 198.42
1G | 1312.80 | 0.90
1M | 2326.02 | 2325.42
4k | 799.40 | 204700.00
--- Kernel: vanilla file type: hole ---
Block Size (bs) | Avg. Bandwidth (MiB/s) | Avg. IOPS
---------------------+------------------------------+-----------------
100M | 708.90 | 6.50
10M | 735.00 | 73.10
128k | 786.70 | 6292.20
1G | 613.20 | 0.00
1M | 764.50 | 764.25
4k | 478.80 | 122400.00
Sparse-file numbers on qemu look skewed; further bare-metal tests planned.
Write benchmarks are currently blocked by the issues below.
Known issues (help appreciated):
**Write throttling stalls**
```sh
dd if=/dev/zero of=test.img bs=1G count=1 conv=fsync
```
Write speed decays; the task spins in `iomap_write_iter` ->
`balance_dirty_pages_ratelimited_flags`.
**fsync deadlock**
```sh
fio --rw=write --bs=4K --fsync=1 --size=1G --ioengine=psync …
```
The task hangs in `f2fs_issue_flush` -> `submit_bio_wait`.
Full traces will be posted in a follow-up.
Nanzhe Zhao (9):
f2fs: Introduce f2fs_iomap_folio_state
f2fs: Integrate f2fs_iomap_folio_state into f2fs page private helpers
f2fs: Using `folio_detach_f2fs_private` in invalidate and release
folio
f2fs: Convert outplace write path page private functions to folio
private functions.
f2fs: Refactor `f2fs_is_compressed_page` to `f2fs_is_compressed_folio`
f2fs: Extend f2fs_io_info to support sub-folio ranges
f2fs: Make GC aware of large folios
f2fs: Introduce F2FS_GET_BLOCK_IOMAP and map_blocks helpers
f2fs: Enable buffered read/write path large folios support for normal
and atomic file with iomap
fs/f2fs/Kconfig | 10 ++
fs/f2fs/Makefile | 1 +
fs/f2fs/compress.c | 11 +-
fs/f2fs/data.c | 389 ++++++++++++++++++++++++++++++++++++------
fs/f2fs/f2fs.h | 412 ++++++++++++++++++++++++++++++++++-----------
fs/f2fs/f2fs_ifs.c | 221 ++++++++++++++++++++++++
fs/f2fs/f2fs_ifs.h | 79 +++++++++
fs/f2fs/file.c | 33 +++-
fs/f2fs/gc.c | 37 ++--
fs/f2fs/inline.c | 15 +-
fs/f2fs/inode.c | 27 +++
fs/f2fs/namei.c | 7 +
fs/f2fs/segment.c | 2 +-
fs/f2fs/super.c | 3 +
14 files changed, 1082 insertions(+), 165 deletions(-)
create mode 100644 fs/f2fs/f2fs_ifs.c
create mode 100644 fs/f2fs/f2fs_ifs.h
base-commit: b45116aef78ff0059abf563b339e62a734487a50
--
2.34.1
* [RFC PATCH 1/9] f2fs: Introduce f2fs_iomap_folio_state
2025-08-13 9:37 [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
@ 2025-08-13 9:37 ` Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 2/9] f2fs: Integrate f2fs_iomap_folio_state into f2fs page private helpers Nanzhe Zhao
` (7 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
Add f2fs's own per-folio structure to track the per-block dirty state
of a folio.
The reason for introducing this structure is that f2fs's private flag
would conflict with iomap_folio_state's use of the folio->private field.
Thanks to Matthew Wilcox for providing the idea. For details, see:
https://lore.kernel.org/linux-f2fs-devel/Z-oPTUrF7kkhzJg_@casper.infradead.org/
The memory layout of this structure is the same as iomap_folio_state,
except that we set read_bytes_pending to a magic number. This is because
we need to be able to distinguish it from the original iomap_folio_state.
We additionally allocate an unsigned long at the end of the state array
to store f2fs-specific flags.
This implementation is compatible with high-order folios, order-0 folios,
and metadata folios.
However, it does not support compressed data folios.
Introduction to related functions:
- f2fs_ifs_alloc: Allocates f2fs's own f2fs_iomap_folio_state. If it
detects that folio->private already has a value, we distinguish
whether it is f2fs's own flag value or an iomap_folio_state. If it is
the latter, we copy its contents into our f2fs_iomap_folio_state and
then free it.
- folio_detach_f2fs_private: Serves as a unified interface to release
f2fs's private resources, whatever form they take.
- f2fs_ifs_clear_range_uptodate / f2fs_iomap_set_range_dirty: Helper
functions copied and slightly modified from fs/iomap.
- folio_get_f2fs_ifs: Used specifically to retrieve an
f2fs_iomap_folio_state; it must not be used on compressed folios.
When the folio's private field holds direct flag bits rather than a
pointer, it returns NULL to indicate that the folio does not hold an
f2fs_iomap_folio_state. For compressed folios, it BUG()s directly.
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/Kconfig | 10 ++
fs/f2fs/Makefile | 1 +
fs/f2fs/f2fs_ifs.c | 221 +++++++++++++++++++++++++++++++++++++++++++++
fs/f2fs/f2fs_ifs.h | 79 ++++++++++++++++
4 files changed, 311 insertions(+)
create mode 100644 fs/f2fs/f2fs_ifs.c
create mode 100644 fs/f2fs/f2fs_ifs.h
diff --git a/fs/f2fs/Kconfig b/fs/f2fs/Kconfig
index 5916a02fb46d..480b8536fa39 100644
--- a/fs/f2fs/Kconfig
+++ b/fs/f2fs/Kconfig
@@ -150,3 +150,13 @@ config F2FS_UNFAIR_RWSEM
help
Use unfair rw_semaphore, if system configured IO priority by block
cgroup.
+
+config F2FS_IOMAP_FOLIO_STATE
+ bool "F2FS folio per-block I/O state tracking"
+ depends on F2FS_FS && FS_IOMAP
+ help
+ Enable a custom F2FS structure for tracking the I/O state
+ (up-to-date, dirty) on a per-block basis within a memory folio.
+ This structure stores F2FS private flags in its flexible state
+ array while keeping compatibility with the generic iomap_folio_state.
+ Must be enabled to use iomap large folio support in F2FS.
\ No newline at end of file
diff --git a/fs/f2fs/Makefile b/fs/f2fs/Makefile
index 8a7322d229e4..3b9270d774e8 100644
--- a/fs/f2fs/Makefile
+++ b/fs/f2fs/Makefile
@@ -10,3 +10,4 @@ f2fs-$(CONFIG_F2FS_FS_POSIX_ACL) += acl.o
f2fs-$(CONFIG_FS_VERITY) += verity.o
f2fs-$(CONFIG_F2FS_FS_COMPRESSION) += compress.o
f2fs-$(CONFIG_F2FS_IOSTAT) += iostat.o
+f2fs-$(CONFIG_F2FS_IOMAP_FOLIO_STATE) += f2fs_ifs.o
diff --git a/fs/f2fs/f2fs_ifs.c b/fs/f2fs/f2fs_ifs.c
new file mode 100644
index 000000000000..6b7503474580
--- /dev/null
+++ b/fs/f2fs/f2fs_ifs.c
@@ -0,0 +1,221 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/fs.h>
+#include <linux/f2fs_fs.h>
+
+#include "f2fs.h"
+#include "f2fs_ifs.h"
+
+/*
+ * The ifs parameter must be declared void * and reinterpreted as
+ * f2fs_iomap_folio_state to access its fields, because the
+ * iomap_folio_state definition is not visible outside fs/iomap.
+ */
+static void ifs_to_f2fs_ifs(void *ifs, struct f2fs_iomap_folio_state *fifs,
+ struct folio *folio)
+{
+ struct f2fs_iomap_folio_state *src_ifs =
+ (struct f2fs_iomap_folio_state *)ifs;
+ size_t iomap_longs = f2fs_ifs_iomap_longs(folio);
+
+ fifs->read_bytes_pending = READ_ONCE(src_ifs->read_bytes_pending);
+ atomic_set(&fifs->write_bytes_pending,
+ atomic_read(&src_ifs->write_bytes_pending));
+ memcpy(fifs->state, src_ifs->state,
+ iomap_longs * sizeof(unsigned long));
+}
+
+static inline bool is_f2fs_ifs(struct folio *folio)
+{
+ struct f2fs_iomap_folio_state *fifs;
+
+ if (!folio_test_private(folio))
+ return false;
+
+ // first test whether the NOT_POINTER flag is set
+ if (test_bit(PAGE_PRIVATE_NOT_POINTER,
+ (unsigned long *)&folio->private))
+ return false;
+
+ fifs = (struct f2fs_iomap_folio_state *)folio->private;
+ if (!fifs)
+ return false;
+
+ if (READ_ONCE(fifs->read_bytes_pending) == F2FS_IFS_MAGIC)
+ return true;
+
+ return false;
+}
+
+struct f2fs_iomap_folio_state *f2fs_ifs_alloc(struct folio *folio, gfp_t gfp,
+ bool force_alloc)
+{
+ struct inode *inode = folio->mapping->host;
+ size_t alloc_size = 0;
+
+ if (!folio_test_large(folio)) {
+ if (!force_alloc) {
+ WARN_ON_ONCE(1);
+ return NULL;
+ }
+ /*
+ * GC can store a private flag in an order-0 folio's folio->private,
+ * which iomap buffered write would mistakenly interpret as a
+ * pointer; the force_alloc bool handles this case.
+ */
+ struct f2fs_iomap_folio_state *fifs;
+
+ alloc_size = sizeof(*fifs) + 2 * sizeof(unsigned long);
+ fifs = kmalloc(alloc_size, gfp);
+ if (!fifs)
+ return NULL;
+ spin_lock_init(&fifs->state_lock);
+ WRITE_ONCE(fifs->read_bytes_pending, F2FS_IFS_MAGIC);
+ atomic_set(&fifs->write_bytes_pending, 0);
+ unsigned int nr_blocks =
+ i_blocks_per_folio(inode, folio);
+ if (folio_test_uptodate(folio))
+ bitmap_set(fifs->state, 0, nr_blocks);
+ if (folio_test_dirty(folio))
+ bitmap_set(fifs->state, nr_blocks, nr_blocks);
+ *f2fs_ifs_private_flags_ptr(fifs, folio) = 0;
+ folio_attach_private(folio, fifs);
+ return fifs;
+ }
+
+ struct f2fs_iomap_folio_state *fifs;
+ void *old_private;
+ size_t iomap_longs;
+ size_t total_longs;
+
+ WARN_ON_ONCE(!inode); // Should have an inode
+
+ old_private = folio_get_private(folio);
+
+ if (old_private) {
+ // Check if it's already our type using the magic number directly
+ if (READ_ONCE(((struct f2fs_iomap_folio_state *)old_private)
+ ->read_bytes_pending) == F2FS_IFS_MAGIC) {
+ return (struct f2fs_iomap_folio_state *)
+ old_private; // Already ours
+ }
+ // Non-NULL, not ours -> Allocate, Copy, Replace path
+ total_longs = f2fs_ifs_total_longs(folio);
+ alloc_size = sizeof(*fifs) +
+ total_longs * sizeof(unsigned long);
+
+ fifs = kmalloc(alloc_size, gfp);
+ if (!fifs)
+ return NULL;
+
+ spin_lock_init(&fifs->state_lock);
+ *f2fs_ifs_private_flags_ptr(fifs, folio) = 0;
+ // Copy data from the presumed iomap_folio_state (old_private)
+ ifs_to_f2fs_ifs(old_private, fifs, folio);
+ WRITE_ONCE(fifs->read_bytes_pending, F2FS_IFS_MAGIC);
+ folio_change_private(folio, fifs);
+ kfree(old_private);
+ return fifs;
+ }
+
+ iomap_longs = f2fs_ifs_iomap_longs(folio);
+ total_longs = iomap_longs + 1;
+ alloc_size =
+ sizeof(*fifs) + total_longs * sizeof(unsigned long);
+
+ fifs = kzalloc(alloc_size, gfp);
+ if (!fifs)
+ return NULL;
+
+ spin_lock_init(&fifs->state_lock);
+
+ unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
+
+ if (folio_test_uptodate(folio))
+ bitmap_set(fifs->state, 0, nr_blocks);
+ if (folio_test_dirty(folio))
+ bitmap_set(fifs->state, nr_blocks, nr_blocks);
+ WRITE_ONCE(fifs->read_bytes_pending, F2FS_IFS_MAGIC);
+ atomic_set(&fifs->write_bytes_pending, 0);
+ folio_attach_private(folio, fifs);
+ return fifs;
+}
+
+void folio_detach_f2fs_private(struct folio *folio)
+{
+ struct f2fs_iomap_folio_state *fifs;
+
+ if (!folio_test_private(folio))
+ return;
+
+ // Check if it's using direct flags
+ if (test_bit(PAGE_PRIVATE_NOT_POINTER,
+ (unsigned long *)&folio->private)) {
+ folio_detach_private(folio);
+ return;
+ }
+
+ fifs = folio_detach_private(folio);
+ if (!fifs)
+ return;
+
+ if (is_f2fs_ifs(folio)) {
+ WARN_ON_ONCE(READ_ONCE(fifs->read_bytes_pending) !=
+ F2FS_IFS_MAGIC);
+ WARN_ON_ONCE(atomic_read(&fifs->write_bytes_pending));
+ } else {
+ WARN_ON_ONCE(READ_ONCE(fifs->read_bytes_pending) != 0);
+ WARN_ON_ONCE(atomic_read(&fifs->write_bytes_pending));
+ }
+
+ kfree(fifs);
+}
+
+struct f2fs_iomap_folio_state *folio_get_f2fs_ifs(struct folio *folio)
+{
+ if (!folio_test_private(folio))
+ return NULL;
+
+ if (test_bit(PAGE_PRIVATE_NOT_POINTER,
+ (unsigned long *)&folio->private))
+ return NULL;
+ /*
+ * Note we assume folio->private is either an ifs or an f2fs_ifs here.
+ * Compressed folios should not call this function.
+ */
+ f2fs_bug_on(F2FS_F_SB(folio),
+ *((u32 *)folio->private) == F2FS_COMPRESSED_PAGE_MAGIC);
+ return folio->private;
+}
+
+void f2fs_ifs_clear_range_uptodate(struct folio *folio,
+ struct f2fs_iomap_folio_state *fifs,
+ size_t off, size_t len)
+{
+ struct inode *inode = folio->mapping->host;
+ unsigned int first_blk = (off >> inode->i_blkbits);
+ unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
+ unsigned int nr_blks = last_blk - first_blk + 1;
+ unsigned long flags;
+
+ spin_lock_irqsave(&fifs->state_lock, flags);
+ bitmap_clear(fifs->state, first_blk, nr_blks);
+ spin_unlock_irqrestore(&fifs->state_lock, flags);
+}
+
+void f2fs_iomap_set_range_dirty(struct folio *folio, size_t off, size_t len)
+{
+ struct f2fs_iomap_folio_state *fifs = folio_get_f2fs_ifs(folio);
+
+ if (fifs) {
+ struct inode *inode = folio->mapping->host;
+ unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
+ unsigned int first_blk = (off >> inode->i_blkbits);
+ unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
+ unsigned int nr_blks = last_blk - first_blk + 1;
+ unsigned long flags;
+
+ spin_lock_irqsave(&fifs->state_lock, flags);
+ bitmap_set(fifs->state, first_blk + blks_per_folio, nr_blks);
+ spin_unlock_irqrestore(&fifs->state_lock, flags);
+ }
+}
diff --git a/fs/f2fs/f2fs_ifs.h b/fs/f2fs/f2fs_ifs.h
new file mode 100644
index 000000000000..3b16deda8a1e
--- /dev/null
+++ b/fs/f2fs/f2fs_ifs.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef F2FS_IFS_H
+#define F2FS_IFS_H
+
+#include <linux/fs.h>
+#include <linux/bug.h>
+#include <linux/f2fs_fs.h>
+#include <linux/mm.h>
+#include <linux/iomap.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/atomic.h>
+
+#include "f2fs.h"
+
+#define F2FS_IFS_MAGIC 0xf2f5
+#define F2FS_IFS_PRIVATE_LONGS 1
+
+/*
+ * F2FS structure for folio private data, mimicking iomap_folio_state layout.
+ * F2FS private flags/data are stored in extra space allocated at the end
+ */
+struct f2fs_iomap_folio_state {
+ spinlock_t state_lock;
+ unsigned int read_bytes_pending;
+ atomic_t write_bytes_pending;
+ /*
+ * Flexible array member.
+ * Holds [0...iomap_longs-1] for iomap uptodate/dirty bits.
+ * Holds [iomap_longs] for F2FS private flags/data (unsigned long).
+ */
+ unsigned long state[];
+};
+
+static inline bool
+f2fs_ifs_block_is_uptodate(struct f2fs_iomap_folio_state *ifs,
+ unsigned int block)
+{
+ return test_bit(block, ifs->state);
+}
+
+static inline size_t f2fs_ifs_iomap_longs(const struct folio *folio)
+{
+ struct inode *inode = folio->mapping->host;
+
+ WARN_ON_ONCE(!inode);
+ unsigned int nr_blocks =
+ i_blocks_per_folio(inode, (struct folio *)folio);
+ return BITS_TO_LONGS(2 * nr_blocks);
+}
+
+static inline size_t f2fs_ifs_total_longs(struct folio *folio)
+{
+ return f2fs_ifs_iomap_longs(folio) + F2FS_IFS_PRIVATE_LONGS;
+}
+
+static inline unsigned long *
+f2fs_ifs_private_flags_ptr(struct f2fs_iomap_folio_state *fifs,
+ const struct folio *folio)
+{
+ return &fifs->state[f2fs_ifs_iomap_longs(folio)];
+}
+
+struct f2fs_iomap_folio_state *f2fs_ifs_alloc(struct folio *folio, gfp_t gfp,
+ bool force_alloc);
+void folio_detach_f2fs_private(struct folio *folio);
+struct f2fs_iomap_folio_state *folio_get_f2fs_ifs(struct folio *folio);
+
+/*
+ * Order-0 and fully dirty folios may have no fifs;
+ * they store private flags directly in the folio->private field,
+ * following the original f2fs page-private behaviour.
+ */
+void f2fs_ifs_clear_range_uptodate(struct folio *folio,
+ struct f2fs_iomap_folio_state *fifs,
+ size_t off, size_t len);
+void f2fs_iomap_set_range_dirty(struct folio *folio, size_t off, size_t len);
+
+#endif /* F2FS_IFS_H */
--
2.34.1
* [RFC PATCH 2/9] f2fs: Integrate f2fs_iomap_folio_state into f2fs page private helpers
2025-08-13 9:37 [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 1/9] f2fs: Introduce f2fs_iomap_folio_state Nanzhe Zhao
@ 2025-08-13 9:37 ` Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 3/9] f2fs: Using `folio_detach_f2fs_private` in invalidate and release folio Nanzhe Zhao
` (6 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
Integrate f2fs_iomap_folio_state into the f2fs page private helper
functions.
In these functions, we adopt a two-stage strategy to handle the
folio->private field, now supporting both direct bit flags and the
new f2fs_iomap_folio_state pointer.
Note that my implementation does not rely on checking the folio's
order to distinguish whether the folio's private field stores
a flag or an f2fs_iomap_folio_state.
This is because in the folio_set_f2fs_xxx
functions, we will forcibly allocate an f2fs_iomap_folio_state
struct even for order-0 folios under certain conditions.
The reason for doing this is that if an order-0 folio's private field
is set to an f2fs private flag by a thread like gc, the generic
iomap_folio_state helper functions used in iomap buffered write will
mistakenly interpret it as an iomap_folio_state pointer.
We cannot, or rather should not, modify fs/iomap to make it recognize
f2fs's private flags.
Therefore, for now, I have to uniformly allocate an
f2fs_iomap_folio_state for all folios that will need to store an
f2fs private flag to ensure correctness.
I am also thinking about other ways to eliminate the extra memory
overhead this approach introduces. Suggestions would be greatly
appreciated.
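For a sense of scale, a back-of-envelope estimate of that overhead (my
own arithmetic, assuming a 64-bit kernel, 4 KiB blocks, and a non-debug
spinlock_t):
```c
/*
 * 2 MiB folio: nr_blocks = 512
 *   iomap uptodate/dirty bits: BITS_TO_LONGS(2 * 512) = 16 longs = 128 B
 *   F2FS private flags:        1 long                 =   8 B
 *   header (lock + counters, padded):                ~=  16 B
 *   total:                                           ~= 152 B per folio
 *
 * Order-0 folio (forced allocation case): header + 2 longs,
 * roughly 32 B -- versus 0 B for the legacy direct-flag word.
 */
```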
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/f2fs.h | 278 +++++++++++++++++++++++++++++++++++++++----------
1 file changed, 225 insertions(+), 53 deletions(-)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8df0443dd189..a14bef4dc394 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -27,7 +27,10 @@
#include <linux/fscrypt.h>
#include <linux/fsverity.h>
-
+#include <linux/iomap.h>
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+#include "f2fs_ifs.h"
+#endif
struct pagevec;
#ifdef CONFIG_F2FS_CHECK_FS
@@ -2509,58 +2512,227 @@ static inline int inc_valid_block_count(struct f2fs_sb_info *sbi,
return -ENOSPC;
}
-#define PAGE_PRIVATE_GET_FUNC(name, flagname) \
-static inline bool folio_test_f2fs_##name(const struct folio *folio) \
-{ \
- unsigned long priv = (unsigned long)folio->private; \
- unsigned long v = (1UL << PAGE_PRIVATE_NOT_POINTER) | \
- (1UL << PAGE_PRIVATE_##flagname); \
- return (priv & v) == v; \
-} \
-static inline bool page_private_##name(struct page *page) \
-{ \
- return PagePrivate(page) && \
- test_bit(PAGE_PRIVATE_NOT_POINTER, &page_private(page)) && \
- test_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
-}
-
-#define PAGE_PRIVATE_SET_FUNC(name, flagname) \
-static inline void folio_set_f2fs_##name(struct folio *folio) \
-{ \
- unsigned long v = (1UL << PAGE_PRIVATE_NOT_POINTER) | \
- (1UL << PAGE_PRIVATE_##flagname); \
- if (!folio->private) \
- folio_attach_private(folio, (void *)v); \
- else { \
- v |= (unsigned long)folio->private; \
- folio->private = (void *)v; \
- } \
-} \
-static inline void set_page_private_##name(struct page *page) \
-{ \
- if (!PagePrivate(page)) \
- attach_page_private(page, (void *)0); \
- set_bit(PAGE_PRIVATE_NOT_POINTER, &page_private(page)); \
- set_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
-}
-
-#define PAGE_PRIVATE_CLEAR_FUNC(name, flagname) \
-static inline void folio_clear_f2fs_##name(struct folio *folio) \
-{ \
- unsigned long v = (unsigned long)folio->private; \
- \
- v &= ~(1UL << PAGE_PRIVATE_##flagname); \
- if (v == (1UL << PAGE_PRIVATE_NOT_POINTER)) \
- folio_detach_private(folio); \
- else \
- folio->private = (void *)v; \
-} \
-static inline void clear_page_private_##name(struct page *page) \
-{ \
- clear_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
- if (page_private(page) == BIT(PAGE_PRIVATE_NOT_POINTER)) \
- detach_page_private(page); \
+extern bool f2fs_should_use_buffered_iomap(struct inode *inode);
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+#define F2FS_FOLIO_PRIVATE_GET_FUNC(name, flagname) \
+ static inline bool folio_test_f2fs_##name(const struct folio *folio) \
+ { \
+ /* First try direct folio->private access for meta folio */ \
+ if (folio_test_private(folio) && \
+ test_bit(PAGE_PRIVATE_NOT_POINTER, \
+ (unsigned long *)&folio->private)) { \
+ return test_bit(PAGE_PRIVATE_##flagname, \
+ (unsigned long *)&folio->private); \
+ } \
+ /* For higher-order folios, use iomap folio state */ \
+ struct f2fs_iomap_folio_state *fifs = \
+ (struct f2fs_iomap_folio_state *)folio->private; \
+ unsigned long *private_p; \
+ if (unlikely(!fifs || !folio->mapping)) \
+ return false; \
+ /* Check magic number before accessing private data */ \
+ if (READ_ONCE(fifs->read_bytes_pending) != F2FS_IFS_MAGIC) \
+ return false; \
+ private_p = f2fs_ifs_private_flags_ptr(fifs, folio); \
+ if (!private_p) \
+ return false; \
+ /* Test bits directly on the 'private' slot */ \
+ return test_bit(PAGE_PRIVATE_##flagname, private_p); \
+ } \
+ static inline bool page_private_##name(struct page *page) \
+ { \
+ return PagePrivate(page) && \
+ test_bit(PAGE_PRIVATE_NOT_POINTER, \
+ &page_private(page)) && \
+ test_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
+ }
+#define F2FS_FOLIO_PRIVATE_SET_FUNC(name, flagname) \
+ static inline int folio_set_f2fs_##name(struct folio *folio) \
+ { \
+ /* For higher-order folios, use iomap folio state */ \
+ if (unlikely(!folio->mapping)) \
+ return -ENOENT; \
+ bool force_alloc = \
+ f2fs_should_use_buffered_iomap(folio_inode(folio)); \
+ if (!force_alloc && !folio_test_private(folio)) { \
+ folio_attach_private(folio, (void *)0); \
+ set_bit(PAGE_PRIVATE_NOT_POINTER, \
+ (unsigned long *)&folio->private); \
+ set_bit(PAGE_PRIVATE_##flagname, \
+ (unsigned long *)&folio->private); \
+ return 0; \
+ } \
+ struct f2fs_iomap_folio_state *fifs = \
+ f2fs_ifs_alloc(folio, GFP_NOFS, true); \
+ if (unlikely(!fifs)) \
+ return -ENOMEM; \
+ unsigned long *private_p; \
+ WRITE_ONCE(fifs->read_bytes_pending, F2FS_IFS_MAGIC); \
+ private_p = f2fs_ifs_private_flags_ptr(fifs, folio); \
+ if (!private_p) \
+ return -EINVAL; \
+ /* Set the bit atomically */ \
+ set_bit(PAGE_PRIVATE_##flagname, private_p); \
+ /* Ensure NOT_POINTER bit is also set if any F2FS flag is set */ \
+ if (PAGE_PRIVATE_##flagname != PAGE_PRIVATE_NOT_POINTER) \
+ set_bit(PAGE_PRIVATE_NOT_POINTER, private_p); \
+ return 0; \
+ } \
+ static inline void set_page_private_##name(struct page *page) \
+ { \
+ if (!PagePrivate(page)) \
+ attach_page_private(page, (void *)0); \
+ set_bit(PAGE_PRIVATE_NOT_POINTER, &page_private(page)); \
+ set_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
+ }
+
+#define F2FS_FOLIO_PRIVATE_CLEAR_FUNC(name, flagname) \
+ static inline void folio_clear_f2fs_##name(struct folio *folio) \
+ { \
+ /* First try direct folio->private access */ \
+ if (folio_test_private(folio) && \
+ test_bit(PAGE_PRIVATE_NOT_POINTER, \
+ (unsigned long *)&folio->private)) { \
+ clear_bit(PAGE_PRIVATE_##flagname, \
+ (unsigned long *)&folio->private); \
+ folio_detach_private(folio); \
+ return; \
+ } \
+ /* For higher-order folios, use iomap folio state */ \
+ struct f2fs_iomap_folio_state *fifs = \
+ (struct f2fs_iomap_folio_state *)folio->private; \
+ unsigned long *private_p; \
+ if (unlikely(!fifs || !folio->mapping)) \
+ return; \
+ /* Check magic number before clearing */ \
+ if (READ_ONCE(fifs->read_bytes_pending) != F2FS_IFS_MAGIC) \
+ return; /* Not ours or state unclear */ \
+ private_p = f2fs_ifs_private_flags_ptr(fifs, folio); \
+ if (!private_p) \
+ return; \
+ clear_bit(PAGE_PRIVATE_##flagname, private_p); \
+ } \
+ static inline void clear_page_private_##name(struct page *page) \
+ { \
+ clear_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
+ if (page_private(page) == BIT(PAGE_PRIVATE_NOT_POINTER)) \
+ detach_page_private(page); \
+ }
+// Generate the accessor functions using the macros
+F2FS_FOLIO_PRIVATE_GET_FUNC(nonpointer, NOT_POINTER);
+F2FS_FOLIO_PRIVATE_GET_FUNC(inline, INLINE_INODE);
+F2FS_FOLIO_PRIVATE_GET_FUNC(gcing, ONGOING_MIGRATION);
+F2FS_FOLIO_PRIVATE_GET_FUNC(atomic, ATOMIC_WRITE);
+F2FS_FOLIO_PRIVATE_GET_FUNC(reference, REF_RESOURCE);
+
+F2FS_FOLIO_PRIVATE_SET_FUNC(reference, REF_RESOURCE);
+F2FS_FOLIO_PRIVATE_SET_FUNC(inline, INLINE_INODE);
+F2FS_FOLIO_PRIVATE_SET_FUNC(gcing, ONGOING_MIGRATION);
+F2FS_FOLIO_PRIVATE_SET_FUNC(atomic, ATOMIC_WRITE);
+
+F2FS_FOLIO_PRIVATE_CLEAR_FUNC(reference, REF_RESOURCE);
+F2FS_FOLIO_PRIVATE_CLEAR_FUNC(inline, INLINE_INODE);
+F2FS_FOLIO_PRIVATE_CLEAR_FUNC(gcing, ONGOING_MIGRATION);
+F2FS_FOLIO_PRIVATE_CLEAR_FUNC(atomic, ATOMIC_WRITE);
+static inline int folio_set_f2fs_data(struct folio *folio, unsigned long data)
+{
+ if (unlikely(!folio->mapping))
+ return -ENOENT;
+
+ struct f2fs_iomap_folio_state *fifs =
+ f2fs_ifs_alloc(folio, GFP_NOFS, true);
+ if (unlikely(!fifs))
+ return -ENOMEM;
+
+ unsigned long *private_p;
+
+ private_p = f2fs_ifs_private_flags_ptr(fifs, folio);
+ if (!private_p)
+ return -EINVAL;
+
+ *private_p &= GENMASK(PAGE_PRIVATE_MAX - 1, 0);
+ *private_p |= (data << PAGE_PRIVATE_MAX);
+ set_bit(PAGE_PRIVATE_NOT_POINTER, private_p);
+
+ return 0;
}
+static inline unsigned long folio_get_f2fs_data(struct folio *folio)
+{
+ struct f2fs_iomap_folio_state *fifs =
+ (struct f2fs_iomap_folio_state *)folio->private;
+ unsigned long *private_p;
+ unsigned long data_val;
+
+ if (!folio->mapping)
+ return 0;
+ f2fs_bug_on(F2FS_I_SB(folio_inode(folio)), !fifs);
+ if (READ_ONCE(fifs->read_bytes_pending) != F2FS_IFS_MAGIC)
+ return 0;
+
+ private_p = f2fs_ifs_private_flags_ptr(fifs, folio);
+ if (!private_p)
+ return 0;
+
+ data_val = READ_ONCE(*private_p);
+
+ if (!test_bit(PAGE_PRIVATE_NOT_POINTER, &data_val))
+ return 0;
+
+ return data_val >> PAGE_PRIVATE_MAX;
+}
+#else
+#define PAGE_PRIVATE_GET_FUNC(name, flagname) \
+ static inline bool folio_test_f2fs_##name(const struct folio *folio) \
+ { \
+ unsigned long priv = (unsigned long)folio->private; \
+ unsigned long v = (1UL << PAGE_PRIVATE_NOT_POINTER) | \
+ (1UL << PAGE_PRIVATE_##flagname); \
+ return (priv & v) == v; \
+ } \
+ static inline bool page_private_##name(struct page *page) \
+ { \
+ return PagePrivate(page) && \
+ test_bit(PAGE_PRIVATE_NOT_POINTER, \
+ &page_private(page)) && \
+ test_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
+ }
+
+#define PAGE_PRIVATE_SET_FUNC(name, flagname) \
+ static inline void folio_set_f2fs_##name(struct folio *folio) \
+ { \
+ unsigned long v = (1UL << PAGE_PRIVATE_NOT_POINTER) | \
+ (1UL << PAGE_PRIVATE_##flagname); \
+ if (!folio->private) \
+ folio_attach_private(folio, (void *)v); \
+ else { \
+ v |= (unsigned long)folio->private; \
+ folio->private = (void *)v; \
+ } \
+ } \
+ static inline void set_page_private_##name(struct page *page) \
+ { \
+ if (!PagePrivate(page)) \
+ attach_page_private(page, (void *)0); \
+ set_bit(PAGE_PRIVATE_NOT_POINTER, &page_private(page)); \
+ set_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
+ }
+
+#define PAGE_PRIVATE_CLEAR_FUNC(name, flagname) \
+ static inline void folio_clear_f2fs_##name(struct folio *folio) \
+ { \
+ unsigned long v = (unsigned long)folio->private; \
+ v &= ~(1UL << PAGE_PRIVATE_##flagname); \
+ if (v == (1UL << PAGE_PRIVATE_NOT_POINTER)) \
+ folio_detach_private(folio); \
+ else \
+ folio->private = (void *)v; \
+ } \
+ static inline void clear_page_private_##name(struct page *page) \
+ { \
+ clear_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
+ if (page_private(page) == BIT(PAGE_PRIVATE_NOT_POINTER)) \
+ detach_page_private(page); \
+ }
PAGE_PRIVATE_GET_FUNC(nonpointer, NOT_POINTER);
PAGE_PRIVATE_GET_FUNC(inline, INLINE_INODE);
@@ -2595,7 +2767,7 @@ static inline void folio_set_f2fs_data(struct folio *folio, unsigned long data)
else
folio->private = (void *)((unsigned long)folio->private | data);
}
-
+#endif
static inline void dec_valid_block_count(struct f2fs_sb_info *sbi,
struct inode *inode,
block_t count)
--
2.34.1
* [RFC PATCH 3/9] f2fs: Using `folio_detach_f2fs_private` in invalidate and release folio
2025-08-13 9:37 [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 1/9] f2fs: Introduce f2fs_iomap_folio_state Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 2/9] f2fs: Integrate f2fs_iomap_folio_state into f2fs page private helpers Nanzhe Zhao
@ 2025-08-13 9:37 ` Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 4/9] f2fs: Convert outplace write path page private functions to folio private functions Nanzhe Zhao
` (5 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
Since `folio_detach_f2fs_private` can handle every case when a folio
frees its private data, integrate it as a substitute for
`folio_detach_private`.
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/data.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index ed1174430827..415f51602492 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -3748,7 +3748,16 @@ void f2fs_invalidate_folio(struct folio *folio, size_t offset, size_t length)
f2fs_remove_dirty_inode(inode);
}
}
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ /* Same as iomap_invalidate_folio */
+ if (offset == 0 && length == folio_size(folio)) {
+ WARN_ON_ONCE(folio_test_writeback(folio));
+ folio_cancel_dirty(folio);
+ folio_detach_f2fs_private(folio);
+ }
+#else
folio_detach_private(folio);
+#endif
}
bool f2fs_release_folio(struct folio *folio, gfp_t wait)
@@ -3757,7 +3766,11 @@ bool f2fs_release_folio(struct folio *folio, gfp_t wait)
if (folio_test_dirty(folio))
return false;
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ folio_detach_f2fs_private(folio);
+#else
folio_detach_private(folio);
+#endif
return true;
}
--
2.34.1
* [RFC PATCH 4/9] f2fs: Convert outplace write path page private functions to folio private functions.
2025-08-13 9:37 [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
` (2 preceding siblings ...)
2025-08-13 9:37 ` [RFC PATCH 3/9] f2fs: Using `folio_detach_f2fs_private` in invalidate and release folio Nanzhe Zhao
@ 2025-08-13 9:37 ` Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 5/9] f2fs: Refactor `f2fs_is_compressed_page` to `f2fs_is_compressed_folio` Nanzhe Zhao
` (4 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
The core functions `f2fs_out_place_write` and `__get_segment_type_6`
in the outplace write path still use the legacy page private helpers,
which is harmful for large folio support.
Convert them to use our folio private functions.
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/data.c | 2 +-
fs/f2fs/segment.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 415f51602492..5589280294c1 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2637,7 +2637,7 @@ bool f2fs_should_update_outplace(struct inode *inode, struct f2fs_io_info *fio)
return true;
if (fio) {
- if (page_private_gcing(fio->page))
+ if (folio_test_f2fs_gcing(fio->folio))
return true;
if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) &&
f2fs_is_checkpointed_data(sbi, fio->old_blkaddr)))
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 949ee1f8fb5c..7e9dd045b55d 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3653,7 +3653,7 @@ static int __get_segment_type_6(struct f2fs_io_info *fio)
if (is_inode_flag_set(inode, FI_ALIGNED_WRITE))
return CURSEG_COLD_DATA_PINNED;
- if (page_private_gcing(fio->page)) {
+ if (folio_test_f2fs_gcing(fio->folio)) {
if (fio->sbi->am.atgc_enabled &&
(fio->io_type == FS_DATA_IO) &&
(fio->sbi->gc_mode != GC_URGENT_HIGH) &&
--
2.34.1
* [RFC PATCH 5/9] f2fs: Refactor `f2fs_is_compressed_page` to `f2fs_is_compressed_folio`
2025-08-13 9:37 [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
` (3 preceding siblings ...)
2025-08-13 9:37 ` [RFC PATCH 4/9] f2fs: Convert outplace write path page private functions to folio private functions Nanzhe Zhao
@ 2025-08-13 9:37 ` Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 6/9] f2fs: Extend f2fs_io_info to support sub-folio ranges Nanzhe Zhao
` (3 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
`f2fs_is_compressed_page` already accepts a folio as its parameter,
so the name is now confusing.
Rename it to `f2fs_is_compressed_folio`.
If a folio has an f2fs_iomap_folio_state, it must not be a
compressed folio.
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/compress.c | 11 ++++++-----
fs/f2fs/data.c | 10 +++++-----
fs/f2fs/f2fs.h | 7 +++++--
3 files changed, 16 insertions(+), 12 deletions(-)
diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index 6ad8d3bc6df7..627013ef856c 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -71,13 +71,14 @@ static pgoff_t start_idx_of_cluster(struct compress_ctx *cc)
return cc->cluster_idx << cc->log_cluster_size;
}
-bool f2fs_is_compressed_page(struct folio *folio)
+bool f2fs_is_compressed_folio(struct folio *folio)
{
- if (!folio->private)
+ if (!folio_test_private(folio))
return false;
if (folio_test_f2fs_nonpointer(folio))
return false;
-
+ if (folio_get_f2fs_ifs(folio)) /* compressed folios don't support higher order yet */
+ return false;
f2fs_bug_on(F2FS_F_SB(folio),
*((u32 *)folio->private) != F2FS_COMPRESSED_PAGE_MAGIC);
return true;
@@ -1483,8 +1484,8 @@ void f2fs_compress_write_end_io(struct bio *bio, struct folio *folio)
struct page *page = &folio->page;
struct f2fs_sb_info *sbi = bio->bi_private;
struct compress_io_ctx *cic = folio->private;
- enum count_type type = WB_DATA_TYPE(folio,
- f2fs_is_compressed_page(folio));
+ enum count_type type =
+ WB_DATA_TYPE(folio, f2fs_is_compressed_folio(folio));
int i;
if (unlikely(bio->bi_status != BLK_STS_OK))
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 5589280294c1..a9dc2572bdc4 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -142,7 +142,7 @@ static void f2fs_finish_read_bio(struct bio *bio, bool in_task)
bio_for_each_folio_all(fi, bio) {
struct folio *folio = fi.folio;
- if (f2fs_is_compressed_page(folio)) {
+ if (f2fs_is_compressed_folio(folio)) {
if (ctx && !ctx->decompression_attempted)
f2fs_end_read_compressed_page(folio, true, 0,
in_task);
@@ -186,7 +186,7 @@ static void f2fs_verify_bio(struct work_struct *work)
bio_for_each_folio_all(fi, bio) {
struct folio *folio = fi.folio;
- if (!f2fs_is_compressed_page(folio) &&
+ if (!f2fs_is_compressed_folio(folio) &&
!fsverity_verify_page(&folio->page)) {
bio->bi_status = BLK_STS_IOERR;
break;
@@ -239,7 +239,7 @@ static void f2fs_handle_step_decompress(struct bio_post_read_ctx *ctx,
bio_for_each_folio_all(fi, ctx->bio) {
struct folio *folio = fi.folio;
- if (f2fs_is_compressed_page(folio))
+ if (f2fs_is_compressed_folio(folio))
f2fs_end_read_compressed_page(folio, false, blkaddr,
in_task);
else
@@ -344,7 +344,7 @@ static void f2fs_write_end_io(struct bio *bio)
}
#ifdef CONFIG_F2FS_FS_COMPRESSION
- if (f2fs_is_compressed_page(folio)) {
+ if (f2fs_is_compressed_folio(folio)) {
f2fs_compress_write_end_io(bio, folio);
continue;
}
@@ -568,7 +568,7 @@ static bool __has_merged_page(struct bio *bio, struct inode *inode,
if (IS_ERR(target))
continue;
}
- if (f2fs_is_compressed_page(target)) {
+ if (f2fs_is_compressed_folio(target)) {
target = f2fs_compress_control_folio(target);
if (IS_ERR(target))
continue;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index a14bef4dc394..9f88be53174b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4677,7 +4677,7 @@ enum cluster_check_type {
CLUSTER_COMPR_BLKS, /* return # of compressed blocks in a cluster */
CLUSTER_RAW_BLKS /* return # of raw blocks in a cluster */
};
-bool f2fs_is_compressed_page(struct folio *folio);
+bool f2fs_is_compressed_folio(struct folio *folio);
struct folio *f2fs_compress_control_folio(struct folio *folio);
int f2fs_prepare_compress_overwrite(struct inode *inode,
struct page **pagep, pgoff_t index, void **fsdata);
@@ -4744,7 +4744,10 @@ void f2fs_invalidate_compress_pages(struct f2fs_sb_info *sbi, nid_t ino);
sbi->compr_saved_block += diff; \
} while (0)
#else
-static inline bool f2fs_is_compressed_page(struct folio *folio) { return false; }
+static inline bool f2fs_is_compressed_folio(struct folio *folio)
+{
+ return false;
+}
static inline bool f2fs_is_compress_backend_ready(struct inode *inode)
{
if (!f2fs_compressed_file(inode))
--
2.34.1
* [RFC PATCH 6/9] f2fs: Extend f2fs_io_info to support sub-folio ranges
2025-08-13 9:37 [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
` (4 preceding siblings ...)
2025-08-13 9:37 ` [RFC PATCH 5/9] f2fs: Refactor `f2fs_is_compressed_page` to `f2fs_is_compressed_folio` Nanzhe Zhao
@ 2025-08-13 9:37 ` Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 7/9] f2fs: Make GC aware of large folios Nanzhe Zhao
` (2 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
Since f2fs_io_info (hereafter fio) has been converted to use fio->folio
and fio->page is deprecated, we must now track which sub-part of a
folio is being submitted to a bio in order to support large folios.
To achieve this, we add `idx` and `cnt` fields to the fio struct.
`fio->idx` represents the offset (in pages) within the current folio
for this I/O operation, and `fio->cnt` represents the number of
contiguous blocks being processed.
With the introduction of these two fields, the existing `old_blkaddr`
and `new_blkaddr` fields in fio are reinterpreted. They now represent
the starting old and new block addresses corresponding to `fio->idx`.
Consequently, an fio no longer represents a single mapping from one
old_blkaddr to one new_blkaddr, but rather a range mapping from
[old_blkaddr, old_blkaddr + fio->cnt - 1] to
[new_blkaddr, new_blkaddr + fio->cnt - 1].
In bio submission paths, for cases where `fio->cnt` is not explicitly
initialized, we default it to 1 and `fio->idx` to 0. This ensures
backward compatibility with all existing f2fs logic that operates on
single pages.
Discussion: I am not sure whether it would be better to store a
byte-based logical file offset and LBA length in fio, instead of the
block-unit cnt and page idx, if we are to support
BLOCK_SIZE > PAGE_SIZE. Suggestions are appreciated.
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/data.c | 16 ++++++++++------
fs/f2fs/f2fs.h | 2 ++
2 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index a9dc2572bdc4..b7bef2a28c8e 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -711,7 +711,9 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
f2fs_set_bio_crypt_ctx(bio, fio_folio->mapping->host,
fio_folio->index, fio, GFP_NOIO);
- bio_add_folio_nofail(bio, data_folio, folio_size(data_folio), 0);
+ bio_add_folio_nofail(bio, data_folio,
+ F2FS_BLK_TO_BYTES(fio->cnt ? fio->cnt : 1),
+ fio->idx << PAGE_SHIFT);
if (fio->io_wbc && !is_read_io(fio->op))
wbc_account_cgroup_owner(fio->io_wbc, fio_folio, PAGE_SIZE);
@@ -1010,16 +1012,18 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
io->fio = *fio;
}
- if (!bio_add_folio(io->bio, bio_folio, folio_size(bio_folio), 0)) {
+ if (!bio_add_folio(io->bio, bio_folio,
+ F2FS_BLK_TO_BYTES(fio->cnt ? fio->cnt : 1),
+ fio->idx << PAGE_SHIFT)) {
__submit_merged_bio(io);
goto alloc_new;
}
if (fio->io_wbc)
wbc_account_cgroup_owner(fio->io_wbc, fio->folio,
- folio_size(fio->folio));
+ F2FS_BLK_TO_BYTES(fio->cnt));
- io->last_block_in_bio = fio->new_blkaddr;
+ io->last_block_in_bio = fio->new_blkaddr + fio->cnt - 1;
trace_f2fs_submit_folio_write(fio->folio, fio);
#ifdef CONFIG_BLK_DEV_ZONED
@@ -2675,7 +2679,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio)
set_new_dnode(&dn, inode, NULL, NULL, 0);
if (need_inplace_update(fio) &&
- f2fs_lookup_read_extent_cache_block(inode, folio->index,
+ f2fs_lookup_read_extent_cache_block(inode, folio->index + fio->idx,
&fio->old_blkaddr)) {
if (!f2fs_is_valid_blkaddr(fio->sbi, fio->old_blkaddr,
DATA_GENERIC_ENHANCE))
@@ -2690,7 +2694,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio)
if (fio->need_lock == LOCK_REQ && !f2fs_trylock_op(fio->sbi))
return -EAGAIN;
- err = f2fs_get_dnode_of_data(&dn, folio->index, LOOKUP_NODE);
+ err = f2fs_get_dnode_of_data(&dn, folio->index + fio->idx, LOOKUP_NODE);
if (err)
goto out;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 9f88be53174b..c6b23fa63588 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1281,6 +1281,8 @@ struct f2fs_io_info {
blk_opf_t op_flags; /* req_flag_bits */
block_t new_blkaddr; /* new block address to be written */
block_t old_blkaddr; /* old block address before Cow */
+ pgoff_t idx; /* start page index within the active folio for this fio */
+ unsigned int cnt; /* block count in the active folio; assumed contiguous */
union {
struct page *page; /* page to be written */
struct folio *folio;
--
2.34.1
* [RFC PATCH 7/9] f2fs: Make GC aware of large folios
2025-08-13 9:37 [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
` (5 preceding siblings ...)
2025-08-13 9:37 ` [RFC PATCH 6/9] f2fs: Extend f2fs_io_info to support sub-folio ranges Nanzhe Zhao
@ 2025-08-13 9:37 ` Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 8/9] f2fs: Introduce F2FS_GET_BLOCK_IOMAP and map_blocks helpers Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 9/9] f2fs: Enable buffered read/write path large folios support for normal and atomic file with iomap Nanzhe Zhao
8 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
Previously, the GC (Garbage Collection) logic for performing I/O and
marking folios dirty only supported order-0 folios and lacked awareness
of higher-order folios. To enable GC to correctly handle higher-order
folios, we made two changes:
- In `move_data_page`, we now use `f2fs_iomap_set_range_dirty` to mark
only the sub-part of the folio corresponding to `bidx` as dirty,
instead of the entire folio.
- The `f2fs_submit_page_read` function has been augmented with an
`index` parameter, allowing it to precisely identify which sub-page
of the current folio is being submitted.
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/data.c | 13 +++++++------
fs/f2fs/gc.c | 37 +++++++++++++++++++++++--------------
2 files changed, 30 insertions(+), 20 deletions(-)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index b7bef2a28c8e..5ecd08a3dd0b 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1096,7 +1096,7 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
/* This can handle encryption stuffs */
static int f2fs_submit_page_read(struct inode *inode, struct folio *folio,
block_t blkaddr, blk_opf_t op_flags,
- bool for_write)
+ pgoff_t index, bool for_write)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct bio *bio;
@@ -1109,7 +1109,8 @@ static int f2fs_submit_page_read(struct inode *inode, struct folio *folio,
/* wait for GCed page writeback via META_MAPPING */
f2fs_wait_on_block_writeback(inode, blkaddr);
- if (!bio_add_folio(bio, folio, PAGE_SIZE, 0)) {
+ if (!bio_add_folio(bio, folio, PAGE_SIZE,
+ (index - folio->index) << PAGE_SHIFT)) {
iostat_update_and_unbind_ctx(bio);
if (bio->bi_private)
mempool_free(bio->bi_private, bio_post_read_ctx_pool);
@@ -1276,8 +1277,8 @@ struct folio *f2fs_get_read_data_folio(struct inode *inode, pgoff_t index,
return folio;
}
- err = f2fs_submit_page_read(inode, folio, dn.data_blkaddr,
- op_flags, for_write);
+ err = f2fs_submit_page_read(inode, folio, dn.data_blkaddr, op_flags,
+ index, for_write);
if (err)
goto put_err;
return folio;
@@ -3651,8 +3652,8 @@ static int f2fs_write_begin(const struct kiocb *iocb,
goto put_folio;
}
err = f2fs_submit_page_read(use_cow ?
- F2FS_I(inode)->cow_inode : inode,
- folio, blkaddr, 0, true);
+ F2FS_I(inode)->cow_inode : inode, folio,
+ blkaddr, 0, folio->index, true);
if (err)
goto put_folio;
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 098e9f71421e..6d28f01bec42 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1475,22 +1475,31 @@ static int move_data_page(struct inode *inode, block_t bidx, int gc_type,
err = -EAGAIN;
goto out;
}
- folio_mark_dirty(folio);
folio_set_f2fs_gcing(folio);
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ if (!folio_test_large(folio)) {
+ folio_mark_dirty(folio);
+ } else {
+ f2fs_iomap_set_range_dirty(folio, (bidx - folio->index) << PAGE_SHIFT,
+ PAGE_SIZE);
+ }
+#else
+ folio_mark_dirty(folio);
+#endif
} else {
- struct f2fs_io_info fio = {
- .sbi = F2FS_I_SB(inode),
- .ino = inode->i_ino,
- .type = DATA,
- .temp = COLD,
- .op = REQ_OP_WRITE,
- .op_flags = REQ_SYNC,
- .old_blkaddr = NULL_ADDR,
- .folio = folio,
- .encrypted_page = NULL,
- .need_lock = LOCK_REQ,
- .io_type = FS_GC_DATA_IO,
- };
+ struct f2fs_io_info fio = { .sbi = F2FS_I_SB(inode),
+ .ino = inode->i_ino,
+ .type = DATA,
+ .temp = COLD,
+ .op = REQ_OP_WRITE,
+ .op_flags = REQ_SYNC,
+ .old_blkaddr = NULL_ADDR,
+ .folio = folio,
+ .encrypted_page = NULL,
+ .need_lock = LOCK_REQ,
+ .io_type = FS_GC_DATA_IO,
+ .idx = bidx - folio->index,
+ .cnt = 1 };
bool is_dirty = folio_test_dirty(folio);
retry:
--
2.34.1
* [RFC PATCH 8/9] f2fs: Introduce F2FS_GET_BLOCK_IOMAP and map_blocks helpers
2025-08-13 9:37 [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
` (6 preceding siblings ...)
2025-08-13 9:37 ` [RFC PATCH 7/9] f2fs: Make GC aware of large folios Nanzhe Zhao
@ 2025-08-13 9:37 ` Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 9/9] f2fs: Enable buffered read/write path large folios support for normal and atomic file with iomap Nanzhe Zhao
8 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
Introduce the `F2FS_GET_BLOCK_IOMAP` flag for `f2fs_map_blocks`.
With this flag, holes encountered during buffered I/O iterative
mapping can now be merged under `map_is_mergeable`. Furthermore, when
this flag is passed, `f2fs_map_blocks` will by default store the
mapped block information (from the `f2fs_map_blocks` structure) into
the extent cache, provided the resulting extent size is greater than
the minimum allowed length for the f2fs extent cache.
Notably, both holes and `NEW_ADDR` extents will also be cached under
the influence of this flag.
This improves buffered write performance for sparse files.
Additionally, two helper functions are introduced:
- `f2fs_map_blocks_iomap`: A simple wrapper for `f2fs_map_blocks` that
  enables the `F2FS_GET_BLOCK_IOMAP` flag.
- `f2fs_map_blocks_preallocate`: A simple wrapper around
  `f2fs_map_blocks` for preallocating blocks.
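For context, a sketch of how an iomap_begin-style caller might drive
the new wrapper (hypothetical snippet; pos and length are byte units):
```c
/* Map the byte range [pos, pos + length) for buffered iomap I/O. */
struct f2fs_map_blocks map = {};
block_t start_blk = F2FS_BYTES_TO_BLK(pos);
block_t last_blk = F2FS_BYTES_TO_BLK(pos + length - 1);
int err;

err = f2fs_map_blocks_iomap(inode, start_blk,
			    last_blk - start_blk + 1, &map);
if (err)
	return err;
/* map.m_pblk, map.m_len and map.m_flags now describe the extent. */
```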
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/data.c | 49 +++++++++++++++++++++++++++++++++++++++++++------
fs/f2fs/f2fs.h | 5 +++++
2 files changed, 48 insertions(+), 6 deletions(-)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 5ecd08a3dd0b..37eaf431ab42 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1537,8 +1537,11 @@ static bool map_is_mergeable(struct f2fs_sb_info *sbi,
return true;
if (flag == F2FS_GET_BLOCK_PRE_DIO)
return true;
- if (flag == F2FS_GET_BLOCK_DIO &&
- map->m_pblk == NULL_ADDR && blkaddr == NULL_ADDR)
+ if (flag == F2FS_GET_BLOCK_DIO && map->m_pblk == NULL_ADDR &&
+ blkaddr == NULL_ADDR)
+ return true;
+ if (flag == F2FS_GET_BLOCK_IOMAP && map->m_pblk == NULL_ADDR &&
+ blkaddr == NULL_ADDR)
return true;
return false;
}
@@ -1676,6 +1679,10 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
if (map->m_next_pgofs)
*map->m_next_pgofs = pgofs + 1;
break;
+ case F2FS_GET_BLOCK_IOMAP:
+ if (map->m_next_pgofs)
+ *map->m_next_pgofs = pgofs + 1;
+ break;
default:
/* for defragment case */
if (map->m_next_pgofs)
@@ -1741,8 +1748,9 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
else if (dn.ofs_in_node < end_offset)
goto next_block;
- if (flag == F2FS_GET_BLOCK_PRECACHE) {
- if (map->m_flags & F2FS_MAP_MAPPED) {
+ if (flag == F2FS_GET_BLOCK_PRECACHE || flag == F2FS_GET_BLOCK_IOMAP) {
+ if (map->m_flags & F2FS_MAP_MAPPED &&
+ map->m_len > F2FS_MIN_EXTENT_LEN) {
unsigned int ofs = start_pgofs - map->m_lblk;
f2fs_update_read_extent_cache_range(&dn,
@@ -1786,8 +1794,9 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
}
}
- if (flag == F2FS_GET_BLOCK_PRECACHE) {
- if (map->m_flags & F2FS_MAP_MAPPED) {
+ if (flag == F2FS_GET_BLOCK_PRECACHE || flag == F2FS_GET_BLOCK_IOMAP) {
+ if (map->m_flags & F2FS_MAP_MAPPED &&
+ map->m_len > F2FS_MIN_EXTENT_LEN) {
unsigned int ofs = start_pgofs - map->m_lblk;
f2fs_update_read_extent_cache_range(&dn,
@@ -1808,6 +1817,34 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
return err;
}
+int f2fs_map_blocks_iomap(struct inode *inode, block_t start, block_t len,
+ struct f2fs_map_blocks *map)
+{
+ int err = 0;
+
+ map->m_lblk = start; /* logical block number of the start position */
+ map->m_len = len; /* length in blocks */
+ map->m_may_create = false;
+ map->m_seg_type =
+ f2fs_rw_hint_to_seg_type(F2FS_I_SB(inode), inode->i_write_hint);
+ err = f2fs_map_blocks(inode, map, F2FS_GET_BLOCK_IOMAP);
+ return err;
+}
+
+int f2fs_map_blocks_preallocate(struct inode *inode, block_t start, block_t len,
+ struct f2fs_map_blocks *map)
+{
+ int err = 0;
+
+ map->m_lblk = start;
+ map->m_len = len; /* length in blocks */
+ map->m_may_create = true;
+ map->m_seg_type =
+ f2fs_rw_hint_to_seg_type(F2FS_I_SB(inode), inode->i_write_hint);
+ err = f2fs_map_blocks(inode, map, F2FS_GET_BLOCK_PRE_AIO);
+ return err;
+}
+
bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len)
{
struct f2fs_map_blocks map;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index c6b23fa63588..ac9a6ac13e1f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -788,6 +788,7 @@ enum {
F2FS_GET_BLOCK_PRE_DIO,
F2FS_GET_BLOCK_PRE_AIO,
F2FS_GET_BLOCK_PRECACHE,
+ F2FS_GET_BLOCK_IOMAP,
};
/*
@@ -4232,6 +4233,10 @@ struct folio *f2fs_get_new_data_folio(struct inode *inode,
struct folio *ifolio, pgoff_t index, bool new_i_size);
int f2fs_do_write_data_page(struct f2fs_io_info *fio);
int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag);
+int f2fs_map_blocks_iomap(struct inode *inode, block_t start, block_t len,
+ struct f2fs_map_blocks *map);
+int f2fs_map_blocks_preallocate(struct inode *inode, block_t start, block_t len,
+ struct f2fs_map_blocks *map);
int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 start, u64 len);
int f2fs_encrypt_one_page(struct f2fs_io_info *fio);
--
2.34.1
* [RFC PATCH 9/9] f2fs: Enable buffered read/write path large folios support for normal and atomic file with iomap
2025-08-13 9:37 [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
` (7 preceding siblings ...)
2025-08-13 9:37 ` [RFC PATCH 8/9] f2fs: Introduce F2FS_GET_BLOCK_IOMAP and map_blocks helpers Nanzhe Zhao
@ 2025-08-13 9:37 ` Nanzhe Zhao
8 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
This commit enables large folios support for F2FS's buffered read and
write paths.
We introduce a helper function `f2fs_set_iomap` to handle all the logic
that converts an `f2fs_map_blocks` result into an `iomap`.
Currently, compressed, encrypted, and fsverity-protected files are not
supported with iomap large folios.
Since F2FS requires `f2fs_iomap_folio_state` (or a similar equivalent
mechanism) to correctly support the iomap framework, when
`CONFIG_F2FS_IOMAP_FOLIO_STATE` is not enabled, we will not use the
iomap buffered read/write paths.
Note: holes reported by f2fs_map_blocks come in two types (NULL_ADDR
blocks and unmapped dnodes), which require different logic when setting
iomap->length, so we add a new block state flag to f2fs_map_blocks to
distinguish them.
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/data.c | 286 +++++++++++++++++++++++++++++++++++++++++++----
fs/f2fs/f2fs.h | 120 +++++++++++++-------
fs/f2fs/file.c | 33 +++++-
fs/f2fs/inline.c | 15 ++-
fs/f2fs/inode.c | 27 +++++
fs/f2fs/namei.c | 7 ++
fs/f2fs/super.c | 3 +
7 files changed, 425 insertions(+), 66 deletions(-)
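For illustration (not part of the patch), the revalidation scheme this
series relies on, condensed into one place and assuming
CONFIG_F2FS_IOMAP_FOLIO_STATE: every block-address update bumps a
per-inode sequence number, ->iomap_begin snapshots it into the mapping,
and ->iomap_valid compares the snapshot before each folio is committed:

	/* on any block-address update, e.g. f2fs_update_data_blkaddr() */
	atomic64_inc(&F2FS_I(inode)->i_iomap_seq);

	/* in a buffered ->iomap_begin hook */
	iomap->validity_cookie = atomic64_read(&F2FS_I(inode)->i_iomap_seq);

	/* in iomap_write_ops->iomap_valid: a stale cookie forces a remap */
	return iomap->validity_cookie ==
	       atomic64_read(&F2FS_I(inode)->i_iomap_seq);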
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 37eaf431ab42..243c6305b0c5 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1149,6 +1149,9 @@ void f2fs_update_data_blkaddr(struct dnode_of_data *dn, block_t blkaddr)
{
f2fs_set_data_blkaddr(dn, blkaddr);
f2fs_update_read_extent_cache(dn);
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ f2fs_iomap_seq_inc(dn->inode);
+#endif
}
/* dn->ofs_in_node will be returned with up-to-date last block pointer */
@@ -1182,6 +1185,9 @@ int f2fs_reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count)
if (folio_mark_dirty(dn->node_folio))
dn->node_changed = true;
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ f2fs_iomap_seq_inc(dn->inode);
+#endif
return 0;
}
@@ -1486,6 +1492,7 @@ static int f2fs_map_no_dnode(struct inode *inode,
*map->m_next_pgofs = f2fs_get_next_page_offset(dn, pgoff);
if (map->m_next_extent)
*map->m_next_extent = f2fs_get_next_page_offset(dn, pgoff);
+ map->m_flags |= F2FS_MAP_NODNODE;
return 0;
}
@@ -1702,7 +1709,9 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
if (blkaddr == NEW_ADDR)
map->m_flags |= F2FS_MAP_DELALLOC;
/* DIO READ and hole case, should not map the blocks. */
- if (!(flag == F2FS_GET_BLOCK_DIO && is_hole && !map->m_may_create))
+ if (!(flag == F2FS_GET_BLOCK_DIO && is_hole &&
+ !map->m_may_create) &&
+ !(flag == F2FS_GET_BLOCK_IOMAP && is_hole))
map->m_flags |= F2FS_MAP_MAPPED;
map->m_pblk = blkaddr;
@@ -1736,6 +1745,10 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
goto sync_out;
map->m_len += dn.ofs_in_node - ofs_in_node;
+ /*
+ * Since we successfully reserved blocks, update m_pblk now so
+ * write_begin does not need to repeat the lookup.
+ */
+ map->m_pblk = dn.data_blkaddr;
if (prealloc && dn.ofs_in_node != last_ofs_in_node + 1) {
err = -ENOSPC;
goto sync_out;
@@ -4255,9 +4268,6 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
err = f2fs_map_blocks(inode, &map, F2FS_GET_BLOCK_DIO);
if (err)
return err;
-
- iomap->offset = F2FS_BLK_TO_BYTES(map.m_lblk);
-
/*
* When inline encryption is enabled, sometimes I/O to an encrypted file
* has to be broken up to guarantee DUN contiguity. Handle this by
@@ -4272,28 +4282,44 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
if (WARN_ON_ONCE(map.m_pblk == COMPRESS_ADDR))
return -EINVAL;
- if (map.m_flags & F2FS_MAP_MAPPED) {
- if (WARN_ON_ONCE(map.m_pblk == NEW_ADDR))
- return -EINVAL;
-
- iomap->length = F2FS_BLK_TO_BYTES(map.m_len);
- iomap->type = IOMAP_MAPPED;
- iomap->flags |= IOMAP_F_MERGED;
- iomap->bdev = map.m_bdev;
- iomap->addr = F2FS_BLK_TO_BYTES(map.m_pblk);
-
- if (flags & IOMAP_WRITE && map.m_last_pblk)
- iomap->private = (void *)map.m_last_pblk;
+ return f2fs_set_iomap(inode, &map, iomap, flags, offset, length, false);
+}
+int f2fs_set_iomap(struct inode *inode, struct f2fs_map_blocks *map,
+ struct iomap *iomap, unsigned int flags, loff_t offset,
+ loff_t length, bool dio)
+{
+ iomap->offset = F2FS_BLK_TO_BYTES(map->m_lblk);
+ if (map->m_flags & F2FS_MAP_MAPPED) {
+ if (dio) {
+ if (WARN_ON_ONCE(map->m_pblk == NEW_ADDR))
+ return -EINVAL;
+ }
+ iomap->length = F2FS_BLK_TO_BYTES(map->m_len);
+ iomap->bdev = map->m_bdev;
+ if (map->m_pblk != NEW_ADDR) {
+ iomap->type = IOMAP_MAPPED;
+ iomap->flags |= IOMAP_F_MERGED;
+ iomap->addr = F2FS_BLK_TO_BYTES(map->m_pblk);
+ } else {
+ iomap->type = IOMAP_UNWRITTEN;
+ iomap->addr = IOMAP_NULL_ADDR;
+ }
+ if (flags & IOMAP_WRITE && map->m_last_pblk)
+ iomap->private = (void *)map->m_last_pblk;
} else {
- if (flags & IOMAP_WRITE)
+ if (dio && flags & IOMAP_WRITE)
return -ENOTBLK;
- if (map.m_pblk == NULL_ADDR) {
- iomap->length = F2FS_BLK_TO_BYTES(next_pgofs) -
- iomap->offset;
+ if (map->m_pblk == NULL_ADDR) {
+ if (map->m_flags & F2FS_MAP_NODNODE)
+ iomap->length =
+ F2FS_BLK_TO_BYTES(*map->m_next_pgofs) -
+ iomap->offset;
+ else
+ iomap->length = F2FS_BLK_TO_BYTES(map->m_len);
iomap->type = IOMAP_HOLE;
- } else if (map.m_pblk == NEW_ADDR) {
- iomap->length = F2FS_BLK_TO_BYTES(map.m_len);
+ } else if (map->m_pblk == NEW_ADDR) {
+ iomap->length = F2FS_BLK_TO_BYTES(map->m_len);
iomap->type = IOMAP_UNWRITTEN;
} else {
f2fs_bug_on(F2FS_I_SB(inode), 1);
@@ -4301,7 +4327,7 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
iomap->addr = IOMAP_NULL_ADDR;
}
- if (map.m_flags & F2FS_MAP_NEW)
+ if (map->m_flags & F2FS_MAP_NEW)
iomap->flags |= IOMAP_F_NEW;
if ((inode->i_state & I_DIRTY_DATASYNC) ||
offset + length > i_size_read(inode))
@@ -4313,3 +4339,217 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
const struct iomap_ops f2fs_iomap_ops = {
.iomap_begin = f2fs_iomap_begin,
};
+
+/* iomap buffered-io */
+static int f2fs_buffered_read_iomap_begin(struct inode *inode, loff_t offset,
+ loff_t length, unsigned int flags,
+ struct iomap *iomap,
+ struct iomap *srcmap)
+{
+ pgoff_t next_pgofs = 0;
+ int err;
+ struct f2fs_map_blocks map = {};
+
+ map.m_lblk = F2FS_BYTES_TO_BLK(offset);
+ map.m_len = F2FS_BYTES_TO_BLK(offset + length - 1) - map.m_lblk + 1;
+ map.m_next_pgofs = &next_pgofs;
+ map.m_seg_type =
+ f2fs_rw_hint_to_seg_type(F2FS_I_SB(inode), inode->i_write_hint);
+ map.m_may_create = false;
+ if (is_sbi_flag_set(F2FS_I_SB(inode), SBI_IS_SHUTDOWN))
+ return -EIO;
+ /* This is the read-side iomap_ops; reject write requests here. */
+ if (flags & IOMAP_WRITE)
+ return -EINVAL;
+
+ err = f2fs_map_blocks(inode, &map, F2FS_GET_BLOCK_IOMAP);
+ if (err)
+ return err;
+
+ if (WARN_ON_ONCE(map.m_pblk == COMPRESS_ADDR))
+ return -EINVAL;
+
+ return f2fs_set_iomap(inode, &map, iomap, flags, offset, length, false);
+}
+
+const struct iomap_ops f2fs_buffered_read_iomap_ops = {
+ .iomap_begin = f2fs_buffered_read_iomap_begin,
+};
+
+static void f2fs_iomap_readahead(struct readahead_control *rac)
+{
+ struct inode *inode = rac->mapping->host;
+
+ if (!f2fs_is_compress_backend_ready(inode))
+ return;
+
+ /* If the file has inline data, skip readahead */
+ if (f2fs_has_inline_data(inode))
+ return;
+ iomap_readahead(rac, &f2fs_buffered_read_iomap_ops);
+}
+
+static int f2fs_buffered_write_iomap_begin(struct inode *inode, loff_t offset,
+ loff_t length, unsigned flags,
+ struct iomap *iomap,
+ struct iomap *srcmap)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ struct f2fs_map_blocks map = {};
+ struct folio *ifolio = NULL;
+ int err = 0;
+
+ iomap->offset = offset;
+ iomap->bdev = sbi->sb->s_bdev;
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ iomap->validity_cookie = f2fs_iomap_seq_read(inode);
+#endif
+ if (f2fs_has_inline_data(inode)) {
+ if (offset + length <= MAX_INLINE_DATA(inode)) {
+ ifolio = f2fs_get_inode_folio(sbi, inode->i_ino);
+ if (IS_ERR(ifolio)) {
+ err = PTR_ERR(ifolio);
+ goto failed;
+ }
+ set_inode_flag(inode, FI_DATA_EXIST);
+ f2fs_iomap_prepare_read_inline(inode, ifolio, iomap,
+ offset, length);
+ if (inode->i_nlink)
+ folio_set_f2fs_inline(ifolio);
+
+ f2fs_folio_put(ifolio, 1);
+ goto out;
+ }
+ }
+ block_t start_blk = F2FS_BYTES_TO_BLK(offset);
+ block_t len_blks =
+ F2FS_BYTES_TO_BLK(offset + length - 1) - start_blk + 1;
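+ /*
+ * Try a read-only lookup first and only fall back to the
+ * preallocation path when the range turns out to be a hole.
+ */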
+ err = f2fs_map_blocks_iomap(inode, start_blk, len_blks, &map);
+ if (err)
+ goto failed;
+ if (map.m_pblk == NULL_ADDR) {
+ err = f2fs_map_blocks_preallocate(inode, map.m_lblk, len_blks,
+ &map);
+ if (err)
+ goto failed;
+ }
+ if (WARN_ON_ONCE(map.m_pblk == COMPRESS_ADDR)) {
+ err = -EIO; /* should not happen for buffered write prep */
+ goto failed;
+ }
+ err = f2fs_set_iomap(inode, &map, iomap, flags, offset, length, false);
+ if (!err)
+ goto out;
+failed:
+ f2fs_write_failed(inode, offset + length);
+out:
+ return err;
+}
+
+static int f2fs_buffered_write_atomic_iomap_begin(struct inode *inode,
+ loff_t offset, loff_t length,
+ unsigned flags,
+ struct iomap *iomap,
+ struct iomap *srcmap)
+{
+ struct inode *cow_inode = F2FS_I(inode)->cow_inode;
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ struct f2fs_map_blocks map = {};
+ int err = 0;
+
+ iomap->offset = offset;
+ iomap->bdev = sbi->sb->s_bdev;
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ iomap->validity_cookie = f2fs_iomap_seq_read(inode);
+#endif
+ block_t start_blk = F2FS_BYTES_TO_BLK(offset);
+ block_t len_blks =
+ F2FS_BYTES_TO_BLK(offset + length - 1) - start_blk + 1;
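+ /*
+ * Look up the COW inode first: blocks already staged for this
+ * atomic write take precedence over the original file's blocks.
+ */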
+ err = f2fs_map_blocks_iomap(cow_inode, start_blk, len_blks, &map);
+ if (err)
+ return err;
+ if (map.m_pblk == NULL_ADDR &&
+ is_inode_flag_set(inode, FI_ATOMIC_REPLACE)) {
+ err = f2fs_map_blocks_preallocate(cow_inode, map.m_lblk,
+ map.m_len, &map);
+ if (err)
+ return err;
+ inc_atomic_write_cnt(inode);
+ goto out;
+ } else if (map.m_pblk != NULL_ADDR) {
+ goto out;
+ }
+ err = f2fs_map_blocks_iomap(inode, start_blk, len_blks, &map);
+ if (err)
+ return err;
+out:
+ if (WARN_ON_ONCE(map.m_pblk == COMPRESS_ADDR))
+ return -EIO;
+
+ return f2fs_set_iomap(inode, &map, iomap, flags, offset, length, false);
+}
+
+static int f2fs_buffered_write_iomap_end(struct inode *inode, loff_t pos,
+ loff_t length, ssize_t written,
+ unsigned flags, struct iomap *iomap)
+{
+ return written;
+}
+
+const struct iomap_ops f2fs_buffered_write_iomap_ops = {
+ .iomap_begin = f2fs_buffered_write_iomap_begin,
+ .iomap_end = f2fs_buffered_write_iomap_end,
+};
+
+const struct iomap_ops f2fs_buffered_write_atomic_iomap_ops = {
+ .iomap_begin = f2fs_buffered_write_atomic_iomap_begin,
+};
+
+const struct address_space_operations f2fs_iomap_aops = {
+ .read_folio = f2fs_read_data_folio,
+ .readahead = f2fs_iomap_readahead,
+ .write_begin = f2fs_write_begin,
+ .write_end = f2fs_write_end,
+ .writepages = f2fs_write_data_pages,
+ .dirty_folio = f2fs_dirty_data_folio,
+ .invalidate_folio = f2fs_invalidate_folio,
+ .release_folio = f2fs_release_folio,
+ .migrate_folio = filemap_migrate_folio,
+ .is_partially_uptodate = iomap_is_partially_uptodate,
+ .error_remove_folio = generic_error_remove_folio,
+};
+
+static void f2fs_iomap_put_folio(struct inode *inode, loff_t pos,
+ unsigned copied, struct folio *folio)
+{
+ if (!copied)
+ goto unlock_out;
+ if (f2fs_is_atomic_file(inode))
+ folio_set_f2fs_atomic(folio);
+
+ if (pos + copied > i_size_read(inode) &&
+ !f2fs_verity_in_progress(inode)) {
+ if (f2fs_is_atomic_file(inode))
+ f2fs_i_size_write(F2FS_I(inode)->cow_inode,
+ pos + copied);
+ }
+unlock_out:
+ folio_unlock(folio);
+ folio_put(folio);
+ f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
+}
+
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+static bool f2fs_iomap_valid(struct inode *inode, const struct iomap *iomap)
+{
+ return iomap->validity_cookie == f2fs_iomap_seq_read(inode);
+}
+#else
+static bool f2fs_iomap_valid(struct inode *inode, const struct iomap *iomap)
+{
+ return true;
+}
+#endif
+const struct iomap_write_ops f2fs_iomap_write_ops = {
+ .put_folio = f2fs_iomap_put_folio,
+ .iomap_valid = f2fs_iomap_valid,
+};
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index ac9a6ac13e1f..1cf12b76b09a 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -762,6 +762,7 @@ struct extent_tree_info {
#define F2FS_MAP_NEW (1U << 0)
#define F2FS_MAP_MAPPED (1U << 1)
#define F2FS_MAP_DELALLOC (1U << 2)
+#define F2FS_MAP_NODNODE (1U << 3)
#define F2FS_MAP_FLAGS (F2FS_MAP_NEW | F2FS_MAP_MAPPED |\
F2FS_MAP_DELALLOC)
@@ -837,49 +838,53 @@ enum {
/* used for f2fs_inode_info->flags */
enum {
- FI_NEW_INODE, /* indicate newly allocated inode */
- FI_DIRTY_INODE, /* indicate inode is dirty or not */
- FI_AUTO_RECOVER, /* indicate inode is recoverable */
- FI_DIRTY_DIR, /* indicate directory has dirty pages */
- FI_INC_LINK, /* need to increment i_nlink */
- FI_ACL_MODE, /* indicate acl mode */
- FI_NO_ALLOC, /* should not allocate any blocks */
- FI_FREE_NID, /* free allocated nide */
- FI_NO_EXTENT, /* not to use the extent cache */
- FI_INLINE_XATTR, /* used for inline xattr */
- FI_INLINE_DATA, /* used for inline data*/
- FI_INLINE_DENTRY, /* used for inline dentry */
- FI_APPEND_WRITE, /* inode has appended data */
- FI_UPDATE_WRITE, /* inode has in-place-update data */
- FI_NEED_IPU, /* used for ipu per file */
- FI_ATOMIC_FILE, /* indicate atomic file */
- FI_DATA_EXIST, /* indicate data exists */
- FI_SKIP_WRITES, /* should skip data page writeback */
- FI_OPU_WRITE, /* used for opu per file */
- FI_DIRTY_FILE, /* indicate regular/symlink has dirty pages */
- FI_PREALLOCATED_ALL, /* all blocks for write were preallocated */
- FI_HOT_DATA, /* indicate file is hot */
- FI_EXTRA_ATTR, /* indicate file has extra attribute */
- FI_PROJ_INHERIT, /* indicate file inherits projectid */
- FI_PIN_FILE, /* indicate file should not be gced */
- FI_VERITY_IN_PROGRESS, /* building fs-verity Merkle tree */
- FI_COMPRESSED_FILE, /* indicate file's data can be compressed */
- FI_COMPRESS_CORRUPT, /* indicate compressed cluster is corrupted */
- FI_MMAP_FILE, /* indicate file was mmapped */
- FI_ENABLE_COMPRESS, /* enable compression in "user" compression mode */
- FI_COMPRESS_RELEASED, /* compressed blocks were released */
- FI_ALIGNED_WRITE, /* enable aligned write */
- FI_COW_FILE, /* indicate COW file */
- FI_ATOMIC_COMMITTED, /* indicate atomic commit completed except disk sync */
- FI_ATOMIC_DIRTIED, /* indicate atomic file is dirtied */
- FI_ATOMIC_REPLACE, /* indicate atomic replace */
- FI_OPENED_FILE, /* indicate file has been opened */
- FI_DONATE_FINISHED, /* indicate page donation of file has been finished */
- FI_MAX, /* max flag, never be used */
+ FI_NEW_INODE, /* indicate newly allocated inode */
+ FI_DIRTY_INODE, /* indicate inode is dirty or not */
+ FI_AUTO_RECOVER, /* indicate inode is recoverable */
+ FI_DIRTY_DIR, /* indicate directory has dirty pages */
+ FI_INC_LINK, /* need to increment i_nlink */
+ FI_ACL_MODE, /* indicate acl mode */
+ FI_NO_ALLOC, /* should not allocate any blocks */
+ FI_FREE_NID, /* free allocated nide */
+ FI_NO_EXTENT, /* not to use the extent cache */
+ FI_INLINE_XATTR, /* used for inline xattr */
+ FI_INLINE_DATA, /* used for inline data*/
+ FI_INLINE_DENTRY, /* used for inline dentry */
+ FI_APPEND_WRITE, /* inode has appended data */
+ FI_UPDATE_WRITE, /* inode has in-place-update data */
+ FI_NEED_IPU, /* used for ipu per file */
+ FI_ATOMIC_FILE, /* indicate atomic file */
+ FI_DATA_EXIST, /* indicate data exists */
+ FI_SKIP_WRITES, /* should skip data page writeback */
+ FI_OPU_WRITE, /* used for opu per file */
+ FI_DIRTY_FILE, /* indicate regular/symlink has dirty pages */
+ FI_PREALLOCATED_ALL, /* all blocks for write were preallocated */
+ FI_HOT_DATA, /* indicate file is hot */
+ FI_EXTRA_ATTR, /* indicate file has extra attribute */
+ FI_PROJ_INHERIT, /* indicate file inherits projectid */
+ FI_PIN_FILE, /* indicate file should not be gced */
+ FI_VERITY_IN_PROGRESS, /* building fs-verity Merkle tree */
+ FI_COMPRESSED_FILE, /* indicate file's data can be compressed */
+ FI_COMPRESS_CORRUPT, /* indicate compressed cluster is corrupted */
+ FI_MMAP_FILE, /* indicate file was mmapped */
+ FI_ENABLE_COMPRESS, /* enable compression in "user" compression mode */
+ FI_COMPRESS_RELEASED, /* compressed blocks were released */
+ FI_ALIGNED_WRITE, /* enable aligned write */
+ FI_COW_FILE, /* indicate COW file */
+ FI_ATOMIC_COMMITTED, /* indicate atomic commit completed except disk sync */
+ FI_ATOMIC_DIRTIED, /* indicate atomic file is dirtied */
+ FI_ATOMIC_REPLACE, /* indicate atomic replace */
+ FI_OPENED_FILE, /* indicate file has been opened */
+ FI_DONATE_FINISHED, /* indicate page donation of file has been finished */
+ FI_IOMAP, /* indicate whether this inode should enable iomap */
+ FI_MAX, /* max flag, never be used */
};
struct f2fs_inode_info {
struct inode vfs_inode; /* serve a vfs inode */
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ atomic64_t i_iomap_seq; /* for iomap_valid sequence number */
+#endif
unsigned long i_flags; /* keep an inode flags for ioctl */
unsigned char i_advise; /* use to give file attribute hints */
unsigned char i_dir_level; /* use for dentry level for large dir */
@@ -2814,6 +2819,16 @@ static inline void inc_page_count(struct f2fs_sb_info *sbi, int count_type)
set_sbi_flag(sbi, SBI_IS_DIRTY);
}
+static inline void inc_page_count_multiple(struct f2fs_sb_info *sbi,
+ int count_type, int npages)
+{
+ atomic_add(npages, &sbi->nr_pages[count_type]);
+
+ if (count_type == F2FS_DIRTY_DENTS || count_type == F2FS_DIRTY_NODES ||
+ count_type == F2FS_DIRTY_META || count_type == F2FS_DIRTY_QDATA ||
+ count_type == F2FS_DIRTY_IMETA)
+ set_sbi_flag(sbi, SBI_IS_DIRTY);
+}
static inline void inode_inc_dirty_pages(struct inode *inode)
{
atomic_inc(&F2FS_I(inode)->dirty_pages);
@@ -3657,6 +3672,10 @@ static inline bool f2fs_is_cow_file(struct inode *inode)
return is_inode_flag_set(inode, FI_COW_FILE);
}
+static inline bool f2fs_iomap_inode(struct inode *inode)
+{
+ return is_inode_flag_set(inode, FI_IOMAP);
+}
static inline void *inline_data_addr(struct inode *inode, struct folio *folio)
{
__le32 *addr = get_dnode_addr(inode, folio);
@@ -3880,7 +3899,17 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc);
void f2fs_remove_donate_inode(struct inode *inode);
void f2fs_evict_inode(struct inode *inode);
void f2fs_handle_failed_inode(struct inode *inode);
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+static inline void f2fs_iomap_seq_inc(struct inode *inode)
+{
+ atomic64_inc(&F2FS_I(inode)->i_iomap_seq);
+}
+static inline u64 f2fs_iomap_seq_read(struct inode *inode)
+{
+ return atomic64_read(&F2FS_I(inode)->i_iomap_seq);
+}
+#endif
/*
* namei.c
*/
@@ -4248,6 +4277,9 @@ int f2fs_write_single_data_page(struct folio *folio, int *submitted,
enum iostat_type io_type,
int compr_blocks, bool allow_balance);
void f2fs_write_failed(struct inode *inode, loff_t to);
+int f2fs_set_iomap(struct inode *inode, struct f2fs_map_blocks *map,
+ struct iomap *iomap, unsigned int flags, loff_t offset,
+ loff_t length, bool dio);
void f2fs_invalidate_folio(struct folio *folio, size_t offset, size_t length);
bool f2fs_release_folio(struct folio *folio, gfp_t wait);
bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len);
@@ -4258,6 +4290,11 @@ int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi);
void f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi);
extern const struct iomap_ops f2fs_iomap_ops;
+extern const struct iomap_write_ops f2fs_iomap_write_ops;
+extern const struct iomap_ops f2fs_buffered_read_iomap_ops;
+extern const struct iomap_ops f2fs_buffered_write_iomap_ops;
+extern const struct iomap_ops f2fs_buffered_write_atomic_iomap_ops;
+
/*
* gc.c
*/
@@ -4540,6 +4577,7 @@ extern const struct file_operations f2fs_dir_operations;
extern const struct file_operations f2fs_file_operations;
extern const struct inode_operations f2fs_file_inode_operations;
extern const struct address_space_operations f2fs_dblock_aops;
+extern const struct address_space_operations f2fs_iomap_aops;
extern const struct address_space_operations f2fs_node_aops;
extern const struct address_space_operations f2fs_meta_aops;
extern const struct inode_operations f2fs_dir_inode_operations;
@@ -4578,7 +4616,9 @@ int f2fs_read_inline_dir(struct file *file, struct dir_context *ctx,
int f2fs_inline_data_fiemap(struct inode *inode,
struct fiemap_extent_info *fieinfo,
__u64 start, __u64 len);
-
+void f2fs_iomap_prepare_read_inline(struct inode *inode, struct folio *ifolio,
+ struct iomap *iomap, loff_t pos,
+ loff_t length);
/*
* shrinker.c
*/
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 42faaed6a02d..6c5b3e632f2b 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4965,7 +4965,14 @@ static int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *iter,
if (ret)
return ret;
}
-
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ /*
+ * A buffered write can convert an inline file into a regular
+ * file; once the conversion succeeds, allow large folios on
+ * the mapping and mark the inode for the iomap path.
+ */
+ if (f2fs_should_use_buffered_iomap(inode)) {
+ mapping_set_large_folios(inode->i_mapping);
+ set_inode_flag(inode, FI_IOMAP);
+ }
+#endif
/* Do not preallocate blocks that will be written partially in 4KB. */
map.m_lblk = F2FS_BLK_ALIGN(pos);
map.m_len = F2FS_BYTES_TO_BLK(pos + count);
@@ -4994,6 +5001,24 @@ static int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *iter,
return map.m_len;
}
+static ssize_t f2fs_iomap_buffered_write(struct kiocb *iocb, struct iov_iter *i)
+{
+ struct file *file = iocb->ki_filp;
+ struct inode *inode = file_inode(file);
+ ssize_t ret;
+
+ if (f2fs_is_atomic_file(inode)) {
+ ret = iomap_file_buffered_write(iocb, i,
+ &f2fs_buffered_write_atomic_iomap_ops,
+ &f2fs_iomap_write_ops, NULL);
+ } else {
+ ret = iomap_file_buffered_write(iocb, i,
+ &f2fs_buffered_write_iomap_ops,
+ &f2fs_iomap_write_ops, NULL);
+ }
+ return ret;
+}
+
static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
struct iov_iter *from)
{
@@ -5004,7 +5029,11 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
if (iocb->ki_flags & IOCB_NOWAIT)
return -EOPNOTSUPP;
- ret = generic_perform_write(iocb, from);
+ if (f2fs_iomap_inode(inode))
+ ret = f2fs_iomap_buffered_write(iocb, from);
+ else
+ ret = generic_perform_write(iocb, from);
if (ret > 0) {
f2fs_update_iostat(F2FS_I_SB(inode), inode,
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index 58ac831ef704..bda338b4fc22 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -13,7 +13,7 @@
#include "f2fs.h"
#include "node.h"
#include <trace/events/f2fs.h>
-
+#include <linux/iomap.h>
static bool support_inline_data(struct inode *inode)
{
if (f2fs_used_in_atomic_write(inode))
@@ -832,3 +832,16 @@ int f2fs_inline_data_fiemap(struct inode *inode,
f2fs_folio_put(ifolio, true);
return err;
}
+/*
+ * Fill the iomap structure for the inline-data case of an iomap
+ * buffered write.
+ */
+void f2fs_iomap_prepare_read_inline(struct inode *inode, struct folio *ifolio,
+ struct iomap *iomap, loff_t pos,
+ loff_t length)
+{
+ iomap->addr = IOMAP_NULL_ADDR;
+ iomap->length = length;
+ iomap->type = IOMAP_INLINE;
+ iomap->flags = 0;
+ iomap->inline_data = inline_data_addr(inode, ifolio);
+}
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 8c4eafe9ffac..29378270d561 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -23,6 +23,24 @@
extern const struct address_space_operations f2fs_compress_aops;
#endif
+bool f2fs_should_use_buffered_iomap(struct inode *inode)
+{
+ if (!S_ISREG(inode->i_mode))
+ return false;
+ if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
+ return false;
+ if (inode->i_mapping == NODE_MAPPING(F2FS_I_SB(inode)))
+ return false;
+ if (inode->i_mapping == META_MAPPING(F2FS_I_SB(inode)))
+ return false;
+ if (f2fs_encrypted_file(inode))
+ return false;
+ if (fsverity_active(inode))
+ return false;
+ if (f2fs_compressed_file(inode))
+ return false;
+ return true;
+}
void f2fs_mark_inode_dirty_sync(struct inode *inode, bool sync)
{
if (is_inode_flag_set(inode, FI_NEW_INODE))
@@ -611,7 +629,16 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino)
} else if (S_ISREG(inode->i_mode)) {
inode->i_op = &f2fs_file_inode_operations;
inode->i_fop = &f2fs_file_operations;
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ if (f2fs_should_use_buffered_iomap(inode)) {
+ mapping_set_large_folios(inode->i_mapping);
+ set_inode_flag(inode, FI_IOMAP);
+ inode->i_mapping->a_ops = &f2fs_iomap_aops;
+ } else
+ inode->i_mapping->a_ops = &f2fs_dblock_aops;
+#else
inode->i_mapping->a_ops = &f2fs_dblock_aops;
+#endif
} else if (S_ISDIR(inode->i_mode)) {
inode->i_op = &f2fs_dir_inode_operations;
inode->i_fop = &f2fs_dir_operations;
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index b882771e4699..2d995860c488 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -328,6 +328,13 @@ static struct inode *f2fs_new_inode(struct mnt_idmap *idmap,
f2fs_init_extent_tree(inode);
trace_f2fs_new_inode(inode, 0);
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ if (f2fs_should_use_buffered_iomap(inode)) {
+ set_inode_flag(inode, FI_IOMAP);
+ mapping_set_large_folios(inode->i_mapping);
+ }
+#endif
+
return inode;
fail:
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 2000880b7dca..35a42d6214fe 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1719,6 +1719,9 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
init_once((void *) fi);
/* Initialize f2fs-specific inode info */
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ atomic64_set(&fi->i_iomap_seq, 0);
+#endif
atomic_set(&fi->dirty_pages, 0);
atomic_set(&fi->i_compr_blocks, 0);
atomic_set(&fi->open_count, 0);
--
2.34.1