* [RFC PATCH 7/9] f2fs: Make GC aware of large folios
2025-08-13 9:21 [f2fs-dev] [RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
@ 2025-08-13 9:21 ` Nanzhe Zhao
0 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:21 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
Previously, the GC (Garbage Collection) logic for performing I/O and
marking folios dirty only supported order-0 folios and lacked awareness
of higher-order folios. To enable GC to correctly handle higher-order
folios, we made two changes:
- In `move_data_page`, we now use `f2fs_iomap_set_range_dirty` to mark
only the sub-part of the folio corresponding to `bidx` as dirty,
instead of the entire folio.
- The `f2fs_submit_page_read` function has been augmented with an
`index` parameter, allowing it to precisely identify which sub-page
of the current folio is being submitted.
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/data.c | 13 +++++++------
fs/f2fs/gc.c | 37 +++++++++++++++++++++++--------------
2 files changed, 30 insertions(+), 20 deletions(-)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index b7bef2a28c8e..5ecd08a3dd0b 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1096,7 +1096,7 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
/* This can handle encryption stuffs */
static int f2fs_submit_page_read(struct inode *inode, struct folio *folio,
block_t blkaddr, blk_opf_t op_flags,
- bool for_write)
+ pgoff_t index, bool for_write)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct bio *bio;
@@ -1109,7 +1109,8 @@ static int f2fs_submit_page_read(struct inode *inode, struct folio *folio,
/* wait for GCed page writeback via META_MAPPING */
f2fs_wait_on_block_writeback(inode, blkaddr);
- if (!bio_add_folio(bio, folio, PAGE_SIZE, 0)) {
+ if (!bio_add_folio(bio, folio, PAGE_SIZE,
+ (index - folio->index) << PAGE_SHIFT)) {
iostat_update_and_unbind_ctx(bio);
if (bio->bi_private)
mempool_free(bio->bi_private, bio_post_read_ctx_pool);
@@ -1276,8 +1277,8 @@ struct folio *f2fs_get_read_data_folio(struct inode *inode, pgoff_t index,
return folio;
}
- err = f2fs_submit_page_read(inode, folio, dn.data_blkaddr,
- op_flags, for_write);
+ err = f2fs_submit_page_read(inode, folio, dn.data_blkaddr, op_flags,
+ index, for_write);
if (err)
goto put_err;
return folio;
@@ -3651,8 +3652,8 @@ static int f2fs_write_begin(const struct kiocb *iocb,
goto put_folio;
}
err = f2fs_submit_page_read(use_cow ?
- F2FS_I(inode)->cow_inode : inode,
- folio, blkaddr, 0, true);
+ F2FS_I(inode)->cow_inode : inode, folio,
+ blkaddr, 0, folio->index, true);
if (err)
goto put_folio;
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 098e9f71421e..6d28f01bec42 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1475,22 +1475,31 @@ static int move_data_page(struct inode *inode, block_t bidx, int gc_type,
err = -EAGAIN;
goto out;
}
- folio_mark_dirty(folio);
folio_set_f2fs_gcing(folio);
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ if (!folio_test_large(folio)) {
+ folio_mark_dirty(folio);
+ } else {
+ f2fs_iomap_set_range_dirty(folio, (bidx - folio->index) << PAGE_SHIFT,
+ PAGE_SIZE);
+ }
+#else
+ folio_mark_dirty(folio);
+#endif
} else {
- struct f2fs_io_info fio = {
- .sbi = F2FS_I_SB(inode),
- .ino = inode->i_ino,
- .type = DATA,
- .temp = COLD,
- .op = REQ_OP_WRITE,
- .op_flags = REQ_SYNC,
- .old_blkaddr = NULL_ADDR,
- .folio = folio,
- .encrypted_page = NULL,
- .need_lock = LOCK_REQ,
- .io_type = FS_GC_DATA_IO,
- };
+ struct f2fs_io_info fio = { .sbi = F2FS_I_SB(inode),
+ .ino = inode->i_ino,
+ .type = DATA,
+ .temp = COLD,
+ .op = REQ_OP_WRITE,
+ .op_flags = REQ_SYNC,
+ .old_blkaddr = NULL_ADDR,
+ .folio = folio,
+ .encrypted_page = NULL,
+ .need_lock = LOCK_REQ,
+ .io_type = FS_GC_DATA_IO,
+ .idx = bidx - folio->index,
+ .cnt = 1 };
bool is_dirty = folio_test_dirty(folio);
retry:
--
2.34.1
* [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap
@ 2025-08-13 9:37 Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 1/9] f2fs: Introduce f2fs_iomap_folio_state Nanzhe Zhao
` (8 more replies)
0 siblings, 9 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
Resend: the original posting misspelled
the linux-f2fs-devel@lists.sourceforge.net address.
No code changes.
This RFC series enables large folio support in the buffered read/write
paths through an F2FS-specific extension of iomap, together with some
other preparation work for large folio integration.
Because this is my first time sending a patch series to the kernel
mailing list, I may have missed some conventions.
The series passes checkpatch.pl with no errors, but a few warnings
remain. I wasn't sure of the best way to address them, so I would
appreciate your guidance. I am happy to fix them if needed.
Motivations:
* **Why iomap**:
  * F2FS couples pages directly to BIOs without a per-block tracking
    struct like buffer_head or sub-page.
    A naive large-folio port would cause:
    * Write amplification.
    * Premature folio_end_read() / folio_end_writeback()
      when multiple sub-ranges of a large folio are under I/O
      concurrently.
    Both issues are already handled cleanly by iomap_folio_state.
  * The original buffered write path unlocks the folio halfway through,
    which makes rechecking the state of a large folio carrying an
    iomap_folio_state, or of a partially truncated folio, tricky.
    iomap handles all locking and unlocking automatically.
* **Why extend iomap**:
  * F2FS stores its flags in the folio's private field,
    which conflicts with iomap_folio_state.
  * To resolve this, we designed f2fs_iomap_folio_state, which is
    compatible with iomap_folio_state's layout while extending its
    flexible state array to hold F2FS private flags.
  * We store a magic number in read_bytes_pending to distinguish
    whether a folio carries the original iomap_folio_state or F2FS's
    variant. That field was chosen because it remains 0 after
    readahead completes.
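For reference, a minimal sketch of the discrimination check, condensed
from `is_f2fs_ifs()` in patch 1 (illustrative, not the exact code):
```c
/* Does folio->private hold F2FS's variant of iomap_folio_state? */
static bool private_is_f2fs_ifs(struct folio *folio)
{
	struct f2fs_iomap_folio_state *fifs;

	if (!folio_test_private(folio))
		return false;
	/* Direct flag words always have PAGE_PRIVATE_NOT_POINTER set. */
	if (test_bit(PAGE_PRIVATE_NOT_POINTER,
		     (unsigned long *)&folio->private))
		return false;
	fifs = folio->private;
	/*
	 * A generic iomap_folio_state has read_bytes_pending == 0 once
	 * readahead completes; F2FS stores F2FS_IFS_MAGIC there instead.
	 */
	return READ_ONCE(fifs->read_bytes_pending) == F2FS_IFS_MAGIC;
}
```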
Implementation notes:
* New Kconfig: `CONFIG_F2FS_IOMAP_FOLIO_STATE`; when off, F2FS falls
  back to the legacy buffered I/O path.
Limitations:
* BLOCK_SIZE > PAGE_SIZE is not supported yet.
* Large folios are not supported for encrypted or fsverity files.
* Large folio support for page writeback and compressed files is
  still WIP.
Why RFC:
* The design and implementation of `f2fs_iomap_folio_state` need
  review and potentially improvement.
* Limited test coverage so far. Any extra testing is highly appreciated.
* Two runtime issues remain (see below).
Performance Testing:
* Platform: x86-64 laptop (PCIe 4.0 NVMe) -> qemu-arm64 VM, 4 GiB RAM
* Kernel: gcc-13.2, defconfig + `CONFIG_F2FS_IOMAP_FOLIO_STATE=y`
fio 3.35, `ioengine=psync`, `size=1G`, `numjobs=1`
Read results (average bandwidth and IOPS):
--- Kernel: iomap_v1 file type: normal ---
Block Size (bs) | Avg. Bandwidth (MiB/s) | Avg. IOPS
---------------------+------------------------------+-----------------
100M | 2809.60 | 27.50
10M | 3184.60 | 317.90
128k | 1376.20 | 11000.80
1G | 1954.70 | 1.20
1M | 2717.00 | 2716.70
4k | 616.50 | 157800.00
--- Kernel: vanilla file type: normal ---
Block Size (bs) | Avg. Bandwidth (MiB/s) | Avg. IOPS
---------------------+------------------------------+-----------------
100M | 994.60 | 9.60
10M | 986.50 | 98.10
128k | 693.80 | 5550.90
1G | 816.90 | 0.00
1M | 968.90 | 968.40
4k | 429.80 | 109990.00
--- Kernel: iomap_v1 file type: hole ---
Block Size (bs) | Avg. Bandwidth (MiB/s) | Avg. IOPS
---------------------+------------------------------+-----------------
100M | 1825.60 | 17.70
10M | 1989.24 | 198.42
1G | 1312.80 | 0.90
1M | 2326.02 | 2325.42
4k | 799.40 | 204700.00
--- Kernel: vanilla file type: hole ---
Block Size (bs) | Avg. Bandwidth (MiB/s) | Avg. IOPS
---------------------+------------------------------+-----------------
100M | 708.90 | 6.50
10M | 735.00 | 73.10
128k | 786.70 | 6292.20
1G | 613.20 | 0.00
1M | 764.50 | 764.25
4k | 478.80 | 122400.00
Sparse-file numbers on qemu look skewed; further bare-metal tests planned.
Write benchmarks are currently blocked by the issues below.
Known issues (help appreciated):
**Write throttling stalls**
```sh
dd if=/dev/zero of=test.img bs=1G count=1 conv=fsync
```
Write speed decays; the task spins in `iomap_write_iter` ->
`balance_dirty_pages_ratelimited_flags`.
**fsync deadlock**
```sh
fio --rw=write --bs=4K --fsync=1 --size=1G --ioengine=psync …
```
The task hangs in `f2fs_issue_flush` -> `submit_bio_wait`.
Full traces will be posted in a follow-up.
Nanzhe Zhao (9):
f2fs: Introduce f2fs_iomap_folio_state
f2fs: Integrate f2fs_iomap_folio_state into f2fs page private helpers
f2fs: Using `folio_detach_f2fs_private` in invalidate and release
folio
f2fs: Convert outplace write path page private functions to folio
private functions.
f2fs: Refactor `f2fs_is_compressed_page` to `f2fs_is_compressed_folio`
f2fs: Extend f2fs_io_info to support sub-folio ranges
f2fs: Make GC aware of large folios
f2fs: Introduce F2FS_GET_BLOCK_IOMAP and map_blocks helpers
f2fs: Enable buffered read/write path large folios support for normal
and atomic file with iomap
fs/f2fs/Kconfig | 10 ++
fs/f2fs/Makefile | 1 +
fs/f2fs/compress.c | 11 +-
fs/f2fs/data.c | 389 ++++++++++++++++++++++++++++++++++++------
fs/f2fs/f2fs.h | 412 ++++++++++++++++++++++++++++++++++-----------
fs/f2fs/f2fs_ifs.c | 221 ++++++++++++++++++++++++
fs/f2fs/f2fs_ifs.h | 79 +++++++++
fs/f2fs/file.c | 33 +++-
fs/f2fs/gc.c | 37 ++--
fs/f2fs/inline.c | 15 +-
fs/f2fs/inode.c | 27 +++
fs/f2fs/namei.c | 7 +
fs/f2fs/segment.c | 2 +-
fs/f2fs/super.c | 3 +
14 files changed, 1082 insertions(+), 165 deletions(-)
create mode 100644 fs/f2fs/f2fs_ifs.c
create mode 100644 fs/f2fs/f2fs_ifs.h
base-commit: b45116aef78ff0059abf563b339e62a734487a50
--
2.34.1
* [RFC PATCH 1/9] f2fs: Introduce f2fs_iomap_folio_state
2025-08-13 9:37 [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
@ 2025-08-13 9:37 ` Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 2/9] f2fs: Integrate f2fs_iomap_folio_state into f2fs page private helpers Nanzhe Zhao
` (7 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
Add f2fs's own per-folio structure to track the per-block dirty state
of a folio.
The reason for introducing this structure is that f2fs's private flag
would conflict with iomap_folio_state's use of the folio->private field.
Thanks to Matthew Wilcox for providing the idea. For details, see:
https://lore.kernel.org/linux-f2fs-devel/Z-oPTUrF7kkhzJg_@casper.infradead.org/
The memory layout of this structure is the same as iomap_folio_state,
except that we set read_bytes_pending to a magic number. This is because
we need to be able to distinguish it from the original iomap_folio_state.
We additionally allocate an unsigned long at the end of the state array
to store f2fs-specific flags.
This implementation is compatible with high-order folios, order-0 folios,
and metadata folios.
However, it does not support compressed data folios.
Introduction to related functions:
- f2fs_ifs_alloc: Allocates f2fs's own f2fs_iomap_folio_state. If it
detects that folio->private already has a value, we distinguish
whether it is f2fs's own flag value or an iomap_folio_state. If it is
the latter, we copy its contents into our f2fs_iomap_folio_state and
then free it.
- folio_detach_f2fs_private: Serves as a unified interface to release
f2fs's private resources, whatever form they take.
- f2fs_ifs_clear_range_uptodate / f2fs_iomap_set_range_dirty: Helper
functions copied and slightly modified from fs/iomap.
- folio_get_f2fs_ifs: Used specifically to retrieve an
f2fs_iomap_folio_state; it must not be used on compressed folios.
When the folio's private field holds direct flag bits rather than a
pointer, it returns NULL to indicate that the folio does not hold an
f2fs_iomap_folio_state. For compressed folios, it BUG()s directly.
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/Kconfig | 10 ++
fs/f2fs/Makefile | 1 +
fs/f2fs/f2fs_ifs.c | 221 +++++++++++++++++++++++++++++++++++++++++++++
fs/f2fs/f2fs_ifs.h | 79 ++++++++++++++++
4 files changed, 311 insertions(+)
create mode 100644 fs/f2fs/f2fs_ifs.c
create mode 100644 fs/f2fs/f2fs_ifs.h
diff --git a/fs/f2fs/Kconfig b/fs/f2fs/Kconfig
index 5916a02fb46d..480b8536fa39 100644
--- a/fs/f2fs/Kconfig
+++ b/fs/f2fs/Kconfig
@@ -150,3 +150,13 @@ config F2FS_UNFAIR_RWSEM
help
Use unfair rw_semaphore, if system configured IO priority by block
cgroup.
+
+config F2FS_IOMAP_FOLIO_STATE
+ bool "F2FS folio per-block I/O state tracking"
+ depends on F2FS_FS && FS_IOMAP
+ help
+ Enable a custom F2FS structure for tracking the I/O state
+ (up-to-date, dirty) on a per-block basis within a memory folio.
+ This structure stores F2FS private flags in its flexible state
+ array while keeping compatibility with the generic iomap_folio_state.
+ Must be enabled to use iomap large folio support in F2FS.
\ No newline at end of file
diff --git a/fs/f2fs/Makefile b/fs/f2fs/Makefile
index 8a7322d229e4..3b9270d774e8 100644
--- a/fs/f2fs/Makefile
+++ b/fs/f2fs/Makefile
@@ -10,3 +10,4 @@ f2fs-$(CONFIG_F2FS_FS_POSIX_ACL) += acl.o
f2fs-$(CONFIG_FS_VERITY) += verity.o
f2fs-$(CONFIG_F2FS_FS_COMPRESSION) += compress.o
f2fs-$(CONFIG_F2FS_IOSTAT) += iostat.o
+f2fs-$(CONFIG_F2FS_IOMAP_FOLIO_STATE) += f2fs_ifs.o
diff --git a/fs/f2fs/f2fs_ifs.c b/fs/f2fs/f2fs_ifs.c
new file mode 100644
index 000000000000..6b7503474580
--- /dev/null
+++ b/fs/f2fs/f2fs_ifs.c
@@ -0,0 +1,221 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/fs.h>
+#include <linux/f2fs_fs.h>
+
+#include "f2fs.h"
+#include "f2fs_ifs.h"
+
+/*
+ * The ifs parameter must be declared void * and reinterpreted as
+ * f2fs_iomap_folio_state to access its fields, because the
+ * iomap_folio_state definition is not visible outside fs/iomap.
+ */
+static void ifs_to_f2fs_ifs(void *ifs, struct f2fs_iomap_folio_state *fifs,
+ struct folio *folio)
+{
+ struct f2fs_iomap_folio_state *src_ifs =
+ (struct f2fs_iomap_folio_state *)ifs;
+ size_t iomap_longs = f2fs_ifs_iomap_longs(folio);
+
+ fifs->read_bytes_pending = READ_ONCE(src_ifs->read_bytes_pending);
+ atomic_set(&fifs->write_bytes_pending,
+ atomic_read(&src_ifs->write_bytes_pending));
+ memcpy(fifs->state, src_ifs->state,
+ iomap_longs * sizeof(unsigned long));
+}
+
+static inline bool is_f2fs_ifs(struct folio *folio)
+{
+ struct f2fs_iomap_folio_state *fifs;
+
+ if (!folio_test_private(folio))
+ return false;
+
+ // first test whether the NOT_POINTER flag is set
+ if (test_bit(PAGE_PRIVATE_NOT_POINTER,
+ (unsigned long *)&folio->private))
+ return false;
+
+ fifs = (struct f2fs_iomap_folio_state *)folio->private;
+ if (!fifs)
+ return false;
+
+ if (READ_ONCE(fifs->read_bytes_pending) == F2FS_IFS_MAGIC)
+ return true;
+
+ return false;
+}
+
+struct f2fs_iomap_folio_state *f2fs_ifs_alloc(struct folio *folio, gfp_t gfp,
+ bool force_alloc)
+{
+ struct inode *inode = folio->mapping->host;
+ size_t alloc_size = 0;
+
+ if (!folio_test_large(folio)) {
+ if (!force_alloc) {
+ WARN_ON_ONCE(1);
+ return NULL;
+ }
+ /*
+ * GC can store a private flag in an order-0 folio's folio->private,
+ * which iomap buffered write would mistakenly interpret as a
+ * pointer; the force_alloc bool handles this case.
+ */
+ struct f2fs_iomap_folio_state *fifs;
+
+ alloc_size = sizeof(*fifs) + 2 * sizeof(unsigned long);
+ fifs = kmalloc(alloc_size, gfp);
+ if (!fifs)
+ return NULL;
+ spin_lock_init(&fifs->state_lock);
+ WRITE_ONCE(fifs->read_bytes_pending, F2FS_IFS_MAGIC);
+ atomic_set(&fifs->write_bytes_pending, 0);
+ unsigned int nr_blocks =
+ i_blocks_per_folio(inode, folio);
+ if (folio_test_uptodate(folio))
+ bitmap_set(fifs->state, 0, nr_blocks);
+ if (folio_test_dirty(folio))
+ bitmap_set(fifs->state, nr_blocks, nr_blocks);
+ *f2fs_ifs_private_flags_ptr(fifs, folio) = 0;
+ folio_attach_private(folio, fifs);
+ return fifs;
+ }
+
+ struct f2fs_iomap_folio_state *fifs;
+ void *old_private;
+ size_t iomap_longs;
+ size_t total_longs;
+
+ WARN_ON_ONCE(!inode); // Should have an inode
+
+ old_private = folio_get_private(folio);
+
+ if (old_private) {
+ // Check if it's already our type using the magic number directly
+ if (READ_ONCE(((struct f2fs_iomap_folio_state *)old_private)
+ ->read_bytes_pending) == F2FS_IFS_MAGIC) {
+ return (struct f2fs_iomap_folio_state *)
+ old_private; // Already ours
+ }
+ // Non-NULL, not ours -> Allocate, Copy, Replace path
+ total_longs = f2fs_ifs_total_longs(folio);
+ alloc_size = sizeof(*fifs) +
+ total_longs * sizeof(unsigned long);
+
+ fifs = kmalloc(alloc_size, gfp);
+ if (!fifs)
+ return NULL;
+
+ spin_lock_init(&fifs->state_lock);
+ *f2fs_ifs_private_flags_ptr(fifs, folio) = 0;
+ // Copy data from the presumed iomap_folio_state (old_private)
+ ifs_to_f2fs_ifs(old_private, fifs, folio);
+ WRITE_ONCE(fifs->read_bytes_pending, F2FS_IFS_MAGIC);
+ folio_change_private(folio, fifs);
+ kfree(old_private);
+ return fifs;
+ }
+
+ iomap_longs = f2fs_ifs_iomap_longs(folio);
+ total_longs = iomap_longs + 1;
+ alloc_size =
+ sizeof(*fifs) + total_longs * sizeof(unsigned long);
+
+ fifs = kzalloc(alloc_size, gfp);
+ if (!fifs)
+ return NULL;
+
+ spin_lock_init(&fifs->state_lock);
+
+ unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
+
+ if (folio_test_uptodate(folio))
+ bitmap_set(fifs->state, 0, nr_blocks);
+ if (folio_test_dirty(folio))
+ bitmap_set(fifs->state, nr_blocks, nr_blocks);
+ WRITE_ONCE(fifs->read_bytes_pending, F2FS_IFS_MAGIC);
+ atomic_set(&fifs->write_bytes_pending, 0);
+ folio_attach_private(folio, fifs);
+ return fifs;
+}
+
+void folio_detach_f2fs_private(struct folio *folio)
+{
+ struct f2fs_iomap_folio_state *fifs;
+
+ if (!folio_test_private(folio))
+ return;
+
+ // Check if it's using direct flags
+ if (test_bit(PAGE_PRIVATE_NOT_POINTER,
+ (unsigned long *)&folio->private)) {
+ folio_detach_private(folio);
+ return;
+ }
+
+ fifs = folio_detach_private(folio);
+ if (!fifs)
+ return;
+
+ if (is_f2fs_ifs(folio)) {
+ WARN_ON_ONCE(READ_ONCE(fifs->read_bytes_pending) !=
+ F2FS_IFS_MAGIC);
+ WARN_ON_ONCE(atomic_read(&fifs->write_bytes_pending));
+ } else {
+ WARN_ON_ONCE(READ_ONCE(fifs->read_bytes_pending) != 0);
+ WARN_ON_ONCE(atomic_read(&fifs->write_bytes_pending));
+ }
+
+ kfree(fifs);
+}
+
+struct f2fs_iomap_folio_state *folio_get_f2fs_ifs(struct folio *folio)
+{
+ if (!folio_test_private(folio))
+ return NULL;
+
+ if (test_bit(PAGE_PRIVATE_NOT_POINTER,
+ (unsigned long *)&folio->private))
+ return NULL;
+ /*
+ * Note we assume folio->private is either an ifs or an f2fs_ifs here.
+ * Compressed folios should not call this function.
+ */
+ f2fs_bug_on(F2FS_F_SB(folio),
+ *((u32 *)folio->private) == F2FS_COMPRESSED_PAGE_MAGIC);
+ return folio->private;
+}
+
+void f2fs_ifs_clear_range_uptodate(struct folio *folio,
+ struct f2fs_iomap_folio_state *fifs,
+ size_t off, size_t len)
+{
+ struct inode *inode = folio->mapping->host;
+ unsigned int first_blk = (off >> inode->i_blkbits);
+ unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
+ unsigned int nr_blks = last_blk - first_blk + 1;
+ unsigned long flags;
+
+ spin_lock_irqsave(&fifs->state_lock, flags);
+ bitmap_clear(fifs->state, first_blk, nr_blks);
+ spin_unlock_irqrestore(&fifs->state_lock, flags);
+}
+
+void f2fs_iomap_set_range_dirty(struct folio *folio, size_t off, size_t len)
+{
+ struct f2fs_iomap_folio_state *fifs = folio_get_f2fs_ifs(folio);
+
+ if (fifs) {
+ struct inode *inode = folio->mapping->host;
+ unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
+ unsigned int first_blk = (off >> inode->i_blkbits);
+ unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
+ unsigned int nr_blks = last_blk - first_blk + 1;
+ unsigned long flags;
+
+ spin_lock_irqsave(&fifs->state_lock, flags);
+ bitmap_set(fifs->state, first_blk + blks_per_folio, nr_blks);
+ spin_unlock_irqrestore(&fifs->state_lock, flags);
+ }
+}
diff --git a/fs/f2fs/f2fs_ifs.h b/fs/f2fs/f2fs_ifs.h
new file mode 100644
index 000000000000..3b16deda8a1e
--- /dev/null
+++ b/fs/f2fs/f2fs_ifs.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef F2FS_IFS_H
+#define F2FS_IFS_H
+
+#include <linux/fs.h>
+#include <linux/bug.h>
+#include <linux/f2fs_fs.h>
+#include <linux/mm.h>
+#include <linux/iomap.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/atomic.h>
+
+#include "f2fs.h"
+
+#define F2FS_IFS_MAGIC 0xf2f5
+#define F2FS_IFS_PRIVATE_LONGS 1
+
+/*
+ * F2FS structure for folio private data, mimicking iomap_folio_state layout.
+ * F2FS private flags/data are stored in extra space allocated at the end
+ */
+struct f2fs_iomap_folio_state {
+ spinlock_t state_lock;
+ unsigned int read_bytes_pending;
+ atomic_t write_bytes_pending;
+ /*
+ * Flexible array member.
+ * Holds [0...iomap_longs-1] for iomap uptodate/dirty bits.
+ * Holds [iomap_longs] for F2FS private flags/data (unsigned long).
+ */
+ unsigned long state[];
+};
+
+static inline bool
+f2fs_ifs_block_is_uptodate(struct f2fs_iomap_folio_state *ifs,
+ unsigned int block)
+{
+ return test_bit(block, ifs->state);
+}
+
+static inline size_t f2fs_ifs_iomap_longs(const struct folio *folio)
+{
+ struct inode *inode = folio->mapping->host;
+
+ WARN_ON_ONCE(!inode);
+ unsigned int nr_blocks =
+ i_blocks_per_folio(inode, (struct folio *)folio);
+ return BITS_TO_LONGS(2 * nr_blocks);
+}
+
+static inline size_t f2fs_ifs_total_longs(struct folio *folio)
+{
+ return f2fs_ifs_iomap_longs(folio) + F2FS_IFS_PRIVATE_LONGS;
+}
+
+static inline unsigned long *
+f2fs_ifs_private_flags_ptr(struct f2fs_iomap_folio_state *fifs,
+ const struct folio *folio)
+{
+ return &fifs->state[f2fs_ifs_iomap_longs(folio)];
+}
+
+struct f2fs_iomap_folio_state *f2fs_ifs_alloc(struct folio *folio, gfp_t gfp,
+ bool force_alloc);
+void folio_detach_f2fs_private(struct folio *folio);
+struct f2fs_iomap_folio_state *folio_get_f2fs_ifs(struct folio *folio);
+
+/*
+ * Order-0 and fully dirty folios may have no fifs;
+ * they store private flags directly in the folio->private field,
+ * following the original f2fs page-private behaviour.
+ */
+void f2fs_ifs_clear_range_uptodate(struct folio *folio,
+ struct f2fs_iomap_folio_state *fifs,
+ size_t off, size_t len);
+void f2fs_iomap_set_range_dirty(struct folio *folio, size_t off, size_t len);
+
+#endif /* F2FS_IFS_H */
--
2.34.1
* [RFC PATCH 2/9] f2fs: Integrate f2fs_iomap_folio_state into f2fs page private helpers
2025-08-13 9:37 [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 1/9] f2fs: Introduce f2fs_iomap_folio_state Nanzhe Zhao
@ 2025-08-13 9:37 ` Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 3/9] f2fs: Using `folio_detach_f2fs_private` in invalidate and release folio Nanzhe Zhao
` (6 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
Integrate f2fs_iomap_folio_state into the f2fs page private helper
functions.
In these functions, we adopt a two-stage strategy to handle the
folio->private field, now supporting both direct bit flags and the
new f2fs_iomap_folio_state pointer.
Note that my implementation does not rely on checking the folio's
order to distinguish whether the folio's private field stores
a flag or an f2fs_iomap_folio_state.
This is because in the folio_set_f2fs_xxx
functions, we will forcibly allocate an f2fs_iomap_folio_state
struct even for order-0 folios under certain conditions.
The reason for doing this is that if an order-0 folio's private field
is set to an f2fs private flag by a thread like gc, the generic
iomap_folio_state helper functions used in iomap buffered write will
mistakenly interpret it as an iomap_folio_state pointer.
We cannot, or rather should not, modify fs/iomap to make it recognize
f2fs's private flags.
Therefore, for now, I have to uniformly allocate an
f2fs_iomap_folio_state for all folios that will need to store an
f2fs private flag to ensure correctness.
I am also thinking about other ways to eliminate the extra memory
overhead this approach introduces. Suggestions would be greatly
appreciated.
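For a sense of scale, a back-of-envelope estimate of that overhead (my
own arithmetic, assuming a 64-bit kernel, 4 KiB blocks, and a non-debug
spinlock_t):
```c
/*
 * 2 MiB folio: nr_blocks = 512
 *   iomap uptodate/dirty bits: BITS_TO_LONGS(2 * 512) = 16 longs = 128 B
 *   F2FS private flags:        1 long                 =   8 B
 *   header (lock + counters, padded):                ~=  16 B
 *   total:                                           ~= 152 B per folio
 *
 * Order-0 folio (forced allocation case): header + 2 longs,
 * roughly 32 B -- versus 0 B for the legacy direct-flag word.
 */
```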
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/f2fs.h | 278 +++++++++++++++++++++++++++++++++++++++----------
1 file changed, 225 insertions(+), 53 deletions(-)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8df0443dd189..a14bef4dc394 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -27,7 +27,10 @@
#include <linux/fscrypt.h>
#include <linux/fsverity.h>
-
+#include <linux/iomap.h>
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+#include "f2fs_ifs.h"
+#endif
struct pagevec;
#ifdef CONFIG_F2FS_CHECK_FS
@@ -2509,58 +2512,227 @@ static inline int inc_valid_block_count(struct f2fs_sb_info *sbi,
return -ENOSPC;
}
-#define PAGE_PRIVATE_GET_FUNC(name, flagname) \
-static inline bool folio_test_f2fs_##name(const struct folio *folio) \
-{ \
- unsigned long priv = (unsigned long)folio->private; \
- unsigned long v = (1UL << PAGE_PRIVATE_NOT_POINTER) | \
- (1UL << PAGE_PRIVATE_##flagname); \
- return (priv & v) == v; \
-} \
-static inline bool page_private_##name(struct page *page) \
-{ \
- return PagePrivate(page) && \
- test_bit(PAGE_PRIVATE_NOT_POINTER, &page_private(page)) && \
- test_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
-}
-
-#define PAGE_PRIVATE_SET_FUNC(name, flagname) \
-static inline void folio_set_f2fs_##name(struct folio *folio) \
-{ \
- unsigned long v = (1UL << PAGE_PRIVATE_NOT_POINTER) | \
- (1UL << PAGE_PRIVATE_##flagname); \
- if (!folio->private) \
- folio_attach_private(folio, (void *)v); \
- else { \
- v |= (unsigned long)folio->private; \
- folio->private = (void *)v; \
- } \
-} \
-static inline void set_page_private_##name(struct page *page) \
-{ \
- if (!PagePrivate(page)) \
- attach_page_private(page, (void *)0); \
- set_bit(PAGE_PRIVATE_NOT_POINTER, &page_private(page)); \
- set_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
-}
-
-#define PAGE_PRIVATE_CLEAR_FUNC(name, flagname) \
-static inline void folio_clear_f2fs_##name(struct folio *folio) \
-{ \
- unsigned long v = (unsigned long)folio->private; \
- \
- v &= ~(1UL << PAGE_PRIVATE_##flagname); \
- if (v == (1UL << PAGE_PRIVATE_NOT_POINTER)) \
- folio_detach_private(folio); \
- else \
- folio->private = (void *)v; \
-} \
-static inline void clear_page_private_##name(struct page *page) \
-{ \
- clear_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
- if (page_private(page) == BIT(PAGE_PRIVATE_NOT_POINTER)) \
- detach_page_private(page); \
+extern bool f2fs_should_use_buffered_iomap(struct inode *inode);
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+#define F2FS_FOLIO_PRIVATE_GET_FUNC(name, flagname) \
+ static inline bool folio_test_f2fs_##name(const struct folio *folio) \
+ { \
+ /* First try direct folio->private access for meta folio */ \
+ if (folio_test_private(folio) && \
+ test_bit(PAGE_PRIVATE_NOT_POINTER, \
+ (unsigned long *)&folio->private)) { \
+ return test_bit(PAGE_PRIVATE_##flagname, \
+ (unsigned long *)&folio->private); \
+ } \
+ /* For higher-order folios, use iomap folio state */ \
+ struct f2fs_iomap_folio_state *fifs = \
+ (struct f2fs_iomap_folio_state *)folio->private; \
+ unsigned long *private_p; \
+ if (unlikely(!fifs || !folio->mapping)) \
+ return false; \
+ /* Check magic number before accessing private data */ \
+ if (READ_ONCE(fifs->read_bytes_pending) != F2FS_IFS_MAGIC) \
+ return false; \
+ private_p = f2fs_ifs_private_flags_ptr(fifs, folio); \
+ if (!private_p) \
+ return false; \
+ /* Test bits directly on the 'private' slot */ \
+ return test_bit(PAGE_PRIVATE_##flagname, private_p); \
+ } \
+ static inline bool page_private_##name(struct page *page) \
+ { \
+ return PagePrivate(page) && \
+ test_bit(PAGE_PRIVATE_NOT_POINTER, \
+ &page_private(page)) && \
+ test_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
+ }
+#define F2FS_FOLIO_PRIVATE_SET_FUNC(name, flagname) \
+ static inline int folio_set_f2fs_##name(struct folio *folio) \
+ { \
+ /* For higher-order folios, use iomap folio state */ \
+ if (unlikely(!folio->mapping)) \
+ return -ENOENT; \
+ bool force_alloc = \
+ f2fs_should_use_buffered_iomap(folio_inode(folio)); \
+ if (!force_alloc && !folio_test_private(folio)) { \
+ folio_attach_private(folio, (void *)0); \
+ set_bit(PAGE_PRIVATE_NOT_POINTER, \
+ (unsigned long *)&folio->private); \
+ set_bit(PAGE_PRIVATE_##flagname, \
+ (unsigned long *)&folio->private); \
+ return 0; \
+ } \
+ struct f2fs_iomap_folio_state *fifs = \
+ f2fs_ifs_alloc(folio, GFP_NOFS, true); \
+ if (unlikely(!fifs)) \
+ return -ENOMEM; \
+ unsigned long *private_p; \
+ WRITE_ONCE(fifs->read_bytes_pending, F2FS_IFS_MAGIC); \
+ private_p = f2fs_ifs_private_flags_ptr(fifs, folio); \
+ if (!private_p) \
+ return -EINVAL; \
+ /* Set the bit atomically */ \
+ set_bit(PAGE_PRIVATE_##flagname, private_p); \
+ /* Ensure NOT_POINTER bit is also set if any F2FS flag is set */ \
+ if (PAGE_PRIVATE_##flagname != PAGE_PRIVATE_NOT_POINTER) \
+ set_bit(PAGE_PRIVATE_NOT_POINTER, private_p); \
+ return 0; \
+ } \
+ static inline void set_page_private_##name(struct page *page) \
+ { \
+ if (!PagePrivate(page)) \
+ attach_page_private(page, (void *)0); \
+ set_bit(PAGE_PRIVATE_NOT_POINTER, &page_private(page)); \
+ set_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
+ }
+
+#define F2FS_FOLIO_PRIVATE_CLEAR_FUNC(name, flagname) \
+ static inline void folio_clear_f2fs_##name(struct folio *folio) \
+ { \
+ /* First try direct folio->private access */ \
+ if (folio_test_private(folio) && \
+ test_bit(PAGE_PRIVATE_NOT_POINTER, \
+ (unsigned long *)&folio->private)) { \
+ clear_bit(PAGE_PRIVATE_##flagname, \
+ (unsigned long *)&folio->private); \
+ folio_detach_private(folio); \
+ return; \
+ } \
+ /* For higher-order folios, use iomap folio state */ \
+ struct f2fs_iomap_folio_state *fifs = \
+ (struct f2fs_iomap_folio_state *)folio->private; \
+ unsigned long *private_p; \
+ if (unlikely(!fifs || !folio->mapping)) \
+ return; \
+ /* Check magic number before clearing */ \
+ if (READ_ONCE(fifs->read_bytes_pending) != F2FS_IFS_MAGIC) \
+ return; /* Not ours or state unclear */ \
+ private_p = f2fs_ifs_private_flags_ptr(fifs, folio); \
+ if (!private_p) \
+ return; \
+ clear_bit(PAGE_PRIVATE_##flagname, private_p); \
+ } \
+ static inline void clear_page_private_##name(struct page *page) \
+ { \
+ clear_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
+ if (page_private(page) == BIT(PAGE_PRIVATE_NOT_POINTER)) \
+ detach_page_private(page); \
+ }
+// Generate the accessor functions using the macros
+F2FS_FOLIO_PRIVATE_GET_FUNC(nonpointer, NOT_POINTER);
+F2FS_FOLIO_PRIVATE_GET_FUNC(inline, INLINE_INODE);
+F2FS_FOLIO_PRIVATE_GET_FUNC(gcing, ONGOING_MIGRATION);
+F2FS_FOLIO_PRIVATE_GET_FUNC(atomic, ATOMIC_WRITE);
+F2FS_FOLIO_PRIVATE_GET_FUNC(reference, REF_RESOURCE);
+
+F2FS_FOLIO_PRIVATE_SET_FUNC(reference, REF_RESOURCE);
+F2FS_FOLIO_PRIVATE_SET_FUNC(inline, INLINE_INODE);
+F2FS_FOLIO_PRIVATE_SET_FUNC(gcing, ONGOING_MIGRATION);
+F2FS_FOLIO_PRIVATE_SET_FUNC(atomic, ATOMIC_WRITE);
+
+F2FS_FOLIO_PRIVATE_CLEAR_FUNC(reference, REF_RESOURCE);
+F2FS_FOLIO_PRIVATE_CLEAR_FUNC(inline, INLINE_INODE);
+F2FS_FOLIO_PRIVATE_CLEAR_FUNC(gcing, ONGOING_MIGRATION);
+F2FS_FOLIO_PRIVATE_CLEAR_FUNC(atomic, ATOMIC_WRITE);
+static inline int folio_set_f2fs_data(struct folio *folio, unsigned long data)
+{
+ if (unlikely(!folio->mapping))
+ return -ENOENT;
+
+ struct f2fs_iomap_folio_state *fifs =
+ f2fs_ifs_alloc(folio, GFP_NOFS, true);
+ if (unlikely(!fifs))
+ return -ENOMEM;
+
+ unsigned long *private_p;
+
+ private_p = f2fs_ifs_private_flags_ptr(fifs, folio);
+ if (!private_p)
+ return -EINVAL;
+
+ *private_p &= GENMASK(PAGE_PRIVATE_MAX - 1, 0);
+ *private_p |= (data << PAGE_PRIVATE_MAX);
+ set_bit(PAGE_PRIVATE_NOT_POINTER, private_p);
+
+ return 0;
}
+static inline unsigned long folio_get_f2fs_data(struct folio *folio)
+{
+ struct f2fs_iomap_folio_state *fifs =
+ (struct f2fs_iomap_folio_state *)folio->private;
+ unsigned long *private_p;
+ unsigned long data_val;
+
+ if (!folio->mapping)
+ return 0;
+ f2fs_bug_on(F2FS_I_SB(folio_inode(folio)), !fifs);
+ if (READ_ONCE(fifs->read_bytes_pending) != F2FS_IFS_MAGIC)
+ return 0;
+
+ private_p = f2fs_ifs_private_flags_ptr(fifs, folio);
+ if (!private_p)
+ return 0;
+
+ data_val = READ_ONCE(*private_p);
+
+ if (!test_bit(PAGE_PRIVATE_NOT_POINTER, &data_val))
+ return 0;
+
+ return data_val >> PAGE_PRIVATE_MAX;
+}
+#else
+#define PAGE_PRIVATE_GET_FUNC(name, flagname) \
+ static inline bool folio_test_f2fs_##name(const struct folio *folio) \
+ { \
+ unsigned long priv = (unsigned long)folio->private; \
+ unsigned long v = (1UL << PAGE_PRIVATE_NOT_POINTER) | \
+ (1UL << PAGE_PRIVATE_##flagname); \
+ return (priv & v) == v; \
+ } \
+ static inline bool page_private_##name(struct page *page) \
+ { \
+ return PagePrivate(page) && \
+ test_bit(PAGE_PRIVATE_NOT_POINTER, \
+ &page_private(page)) && \
+ test_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
+ }
+
+#define PAGE_PRIVATE_SET_FUNC(name, flagname) \
+ static inline void folio_set_f2fs_##name(struct folio *folio) \
+ { \
+ unsigned long v = (1UL << PAGE_PRIVATE_NOT_POINTER) | \
+ (1UL << PAGE_PRIVATE_##flagname); \
+ if (!folio->private) \
+ folio_attach_private(folio, (void *)v); \
+ else { \
+ v |= (unsigned long)folio->private; \
+ folio->private = (void *)v; \
+ } \
+ } \
+ static inline void set_page_private_##name(struct page *page) \
+ { \
+ if (!PagePrivate(page)) \
+ attach_page_private(page, (void *)0); \
+ set_bit(PAGE_PRIVATE_NOT_POINTER, &page_private(page)); \
+ set_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
+ }
+
+#define PAGE_PRIVATE_CLEAR_FUNC(name, flagname) \
+ static inline void folio_clear_f2fs_##name(struct folio *folio) \
+ { \
+ unsigned long v = (unsigned long)folio->private; \
+ v &= ~(1UL << PAGE_PRIVATE_##flagname); \
+ if (v == (1UL << PAGE_PRIVATE_NOT_POINTER)) \
+ folio_detach_private(folio); \
+ else \
+ folio->private = (void *)v; \
+ } \
+ static inline void clear_page_private_##name(struct page *page) \
+ { \
+ clear_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
+ if (page_private(page) == BIT(PAGE_PRIVATE_NOT_POINTER)) \
+ detach_page_private(page); \
+ }
PAGE_PRIVATE_GET_FUNC(nonpointer, NOT_POINTER);
PAGE_PRIVATE_GET_FUNC(inline, INLINE_INODE);
@@ -2595,7 +2767,7 @@ static inline void folio_set_f2fs_data(struct folio *folio, unsigned long data)
else
folio->private = (void *)((unsigned long)folio->private | data);
}
-
+#endif
static inline void dec_valid_block_count(struct f2fs_sb_info *sbi,
struct inode *inode,
block_t count)
--
2.34.1
* [RFC PATCH 3/9] f2fs: Using `folio_detach_f2fs_private` in invalidate and release folio
2025-08-13 9:37 [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 1/9] f2fs: Introduce f2fs_iomap_folio_state Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 2/9] f2fs: Integrate f2fs_iomap_folio_state into f2fs page private helpers Nanzhe Zhao
@ 2025-08-13 9:37 ` Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 4/9] f2fs: Convert outplace write path page private functions to folio private functions Nanzhe Zhao
` (5 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
Since `folio_detach_f2fs_private` can handle every case when a folio
frees its private data, integrate it as a substitute for
`folio_detach_private`.
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/data.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index ed1174430827..415f51602492 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -3748,7 +3748,16 @@ void f2fs_invalidate_folio(struct folio *folio, size_t offset, size_t length)
f2fs_remove_dirty_inode(inode);
}
}
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ /* Same as iomap_invalidate_folio */
+ if (offset == 0 && length == folio_size(folio)) {
+ WARN_ON_ONCE(folio_test_writeback(folio));
+ folio_cancel_dirty(folio);
+ folio_detach_f2fs_private(folio);
+ }
+#else
folio_detach_private(folio);
+#endif
}
bool f2fs_release_folio(struct folio *folio, gfp_t wait)
@@ -3757,7 +3766,11 @@ bool f2fs_release_folio(struct folio *folio, gfp_t wait)
if (folio_test_dirty(folio))
return false;
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ folio_detach_f2fs_private(folio);
+#else
folio_detach_private(folio);
+#endif
return true;
}
--
2.34.1
* [RFC PATCH 4/9] f2fs: Convert outplace write path page private functions to folio private functions.
2025-08-13 9:37 [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
` (2 preceding siblings ...)
2025-08-13 9:37 ` [RFC PATCH 3/9] f2fs: Using `folio_detach_f2fs_private` in invalidate and release folio Nanzhe Zhao
@ 2025-08-13 9:37 ` Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 5/9] f2fs: Refactor `f2fs_is_compressed_page` to `f2fs_is_compressed_folio` Nanzhe Zhao
` (4 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
The core functions `f2fs_out_place_write` and `__get_segment_type_6`
in the outplace write path still use the legacy page private helpers,
which is harmful for large folio support.
Convert them to use our folio private functions.
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/data.c | 2 +-
fs/f2fs/segment.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 415f51602492..5589280294c1 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2637,7 +2637,7 @@ bool f2fs_should_update_outplace(struct inode *inode, struct f2fs_io_info *fio)
return true;
if (fio) {
- if (page_private_gcing(fio->page))
+ if (folio_test_f2fs_gcing(fio->folio))
return true;
if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) &&
f2fs_is_checkpointed_data(sbi, fio->old_blkaddr)))
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 949ee1f8fb5c..7e9dd045b55d 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3653,7 +3653,7 @@ static int __get_segment_type_6(struct f2fs_io_info *fio)
if (is_inode_flag_set(inode, FI_ALIGNED_WRITE))
return CURSEG_COLD_DATA_PINNED;
- if (page_private_gcing(fio->page)) {
+ if (folio_test_f2fs_gcing(fio->folio)) {
if (fio->sbi->am.atgc_enabled &&
(fio->io_type == FS_DATA_IO) &&
(fio->sbi->gc_mode != GC_URGENT_HIGH) &&
--
2.34.1
* [RFC PATCH 5/9] f2fs: Refactor `f2fs_is_compressed_page` to `f2fs_is_compressed_folio`
2025-08-13 9:37 [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
` (3 preceding siblings ...)
2025-08-13 9:37 ` [RFC PATCH 4/9] f2fs: Convert outplace write path page private functions to folio private functions Nanzhe Zhao
@ 2025-08-13 9:37 ` Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 6/9] f2fs: Extend f2fs_io_info to support sub-folio ranges Nanzhe Zhao
` (3 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
`f2fs_is_compressed_page` already accepts a folio as its parameter,
so the name is now confusing.
Rename it to `f2fs_is_compressed_folio`.
If a folio has an f2fs_iomap_folio_state, it must not be a
compressed folio.
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/compress.c | 11 ++++++-----
fs/f2fs/data.c | 10 +++++-----
fs/f2fs/f2fs.h | 7 +++++--
3 files changed, 16 insertions(+), 12 deletions(-)
diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index 6ad8d3bc6df7..627013ef856c 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -71,13 +71,14 @@ static pgoff_t start_idx_of_cluster(struct compress_ctx *cc)
return cc->cluster_idx << cc->log_cluster_size;
}
-bool f2fs_is_compressed_page(struct folio *folio)
+bool f2fs_is_compressed_folio(struct folio *folio)
{
- if (!folio->private)
+ if (!folio_test_private(folio))
return false;
if (folio_test_f2fs_nonpointer(folio))
return false;
-
+ if (folio_get_f2fs_ifs(folio)) /* compressed folios don't support higher order yet */
+ return false;
f2fs_bug_on(F2FS_F_SB(folio),
*((u32 *)folio->private) != F2FS_COMPRESSED_PAGE_MAGIC);
return true;
@@ -1483,8 +1484,8 @@ void f2fs_compress_write_end_io(struct bio *bio, struct folio *folio)
struct page *page = &folio->page;
struct f2fs_sb_info *sbi = bio->bi_private;
struct compress_io_ctx *cic = folio->private;
- enum count_type type = WB_DATA_TYPE(folio,
- f2fs_is_compressed_page(folio));
+ enum count_type type =
+ WB_DATA_TYPE(folio, f2fs_is_compressed_folio(folio));
int i;
if (unlikely(bio->bi_status != BLK_STS_OK))
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 5589280294c1..a9dc2572bdc4 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -142,7 +142,7 @@ static void f2fs_finish_read_bio(struct bio *bio, bool in_task)
bio_for_each_folio_all(fi, bio) {
struct folio *folio = fi.folio;
- if (f2fs_is_compressed_page(folio)) {
+ if (f2fs_is_compressed_folio(folio)) {
if (ctx && !ctx->decompression_attempted)
f2fs_end_read_compressed_page(folio, true, 0,
in_task);
@@ -186,7 +186,7 @@ static void f2fs_verify_bio(struct work_struct *work)
bio_for_each_folio_all(fi, bio) {
struct folio *folio = fi.folio;
- if (!f2fs_is_compressed_page(folio) &&
+ if (!f2fs_is_compressed_folio(folio) &&
!fsverity_verify_page(&folio->page)) {
bio->bi_status = BLK_STS_IOERR;
break;
@@ -239,7 +239,7 @@ static void f2fs_handle_step_decompress(struct bio_post_read_ctx *ctx,
bio_for_each_folio_all(fi, ctx->bio) {
struct folio *folio = fi.folio;
- if (f2fs_is_compressed_page(folio))
+ if (f2fs_is_compressed_folio(folio))
f2fs_end_read_compressed_page(folio, false, blkaddr,
in_task);
else
@@ -344,7 +344,7 @@ static void f2fs_write_end_io(struct bio *bio)
}
#ifdef CONFIG_F2FS_FS_COMPRESSION
- if (f2fs_is_compressed_page(folio)) {
+ if (f2fs_is_compressed_folio(folio)) {
f2fs_compress_write_end_io(bio, folio);
continue;
}
@@ -568,7 +568,7 @@ static bool __has_merged_page(struct bio *bio, struct inode *inode,
if (IS_ERR(target))
continue;
}
- if (f2fs_is_compressed_page(target)) {
+ if (f2fs_is_compressed_folio(target)) {
target = f2fs_compress_control_folio(target);
if (IS_ERR(target))
continue;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index a14bef4dc394..9f88be53174b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4677,7 +4677,7 @@ enum cluster_check_type {
CLUSTER_COMPR_BLKS, /* return # of compressed blocks in a cluster */
CLUSTER_RAW_BLKS /* return # of raw blocks in a cluster */
};
-bool f2fs_is_compressed_page(struct folio *folio);
+bool f2fs_is_compressed_folio(struct folio *folio);
struct folio *f2fs_compress_control_folio(struct folio *folio);
int f2fs_prepare_compress_overwrite(struct inode *inode,
struct page **pagep, pgoff_t index, void **fsdata);
@@ -4744,7 +4744,10 @@ void f2fs_invalidate_compress_pages(struct f2fs_sb_info *sbi, nid_t ino);
sbi->compr_saved_block += diff; \
} while (0)
#else
-static inline bool f2fs_is_compressed_page(struct folio *folio) { return false; }
+static inline bool f2fs_is_compressed_folio(struct folio *folio)
+{
+ return false;
+}
static inline bool f2fs_is_compress_backend_ready(struct inode *inode)
{
if (!f2fs_compressed_file(inode))
--
2.34.1
* [RFC PATCH 6/9] f2fs: Extend f2fs_io_info to support sub-folio ranges
2025-08-13 9:37 [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
` (4 preceding siblings ...)
2025-08-13 9:37 ` [RFC PATCH 5/9] f2fs: Refactor `f2fs_is_compressed_page` to `f2fs_is_compressed_folio` Nanzhe Zhao
@ 2025-08-13 9:37 ` Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 7/9] f2fs: Make GC aware of large folios Nanzhe Zhao
` (2 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
Since f2fs_io_info (hereafter fio) has been converted to use fio->folio
and fio->page is deprecated, we must now track which sub-part of a
folio is being submitted to a bio in order to support large folios.
To achieve this, we add `idx` and `cnt` fields to the fio struct.
`fio->idx` represents the offset (in pages) within the current folio
for this I/O operation, and `fio->cnt` represents the number of
contiguous blocks being processed.
With the introduction of these two fields, the existing `old_blkaddr`
and `new_blkaddr` fields in fio are reinterpreted. They now represent
the starting old and new block addresses corresponding to `fio->idx`.
Consequently, an fio no longer represents a single mapping from one
old_blkaddr to one new_blkaddr, but rather a range mapping from
[old_blkaddr, old_blkaddr + fio->cnt - 1] to
[new_blkaddr, new_blkaddr + fio->cnt - 1].
In bio submission paths, for cases where `fio->cnt` is not explicitly
initialized, we default it to 1 and `fio->idx` to 0. This ensures
backward compatibility with all existing f2fs logic that operates on
single pages.
Discussion: I am not sure whether it would be better to store a
byte-based logical file offset and LBA length in fio, instead of the
block-unit cnt and page idx, if we are to support
BLOCK_SIZE > PAGE_SIZE. Suggestions are appreciated.
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/data.c | 16 ++++++++++------
fs/f2fs/f2fs.h | 2 ++
2 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index a9dc2572bdc4..b7bef2a28c8e 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -711,7 +711,9 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
f2fs_set_bio_crypt_ctx(bio, fio_folio->mapping->host,
fio_folio->index, fio, GFP_NOIO);
- bio_add_folio_nofail(bio, data_folio, folio_size(data_folio), 0);
+ bio_add_folio_nofail(bio, data_folio,
+ F2FS_BLK_TO_BYTES(fio->cnt ? fio->cnt : 1),
+ fio->idx << PAGE_SHIFT);
if (fio->io_wbc && !is_read_io(fio->op))
wbc_account_cgroup_owner(fio->io_wbc, fio_folio, PAGE_SIZE);
@@ -1010,16 +1012,18 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
io->fio = *fio;
}
- if (!bio_add_folio(io->bio, bio_folio, folio_size(bio_folio), 0)) {
+ if (!bio_add_folio(io->bio, bio_folio,
+ F2FS_BLK_TO_BYTES(fio->cnt ? fio->cnt : 1),
+ fio->idx << PAGE_SHIFT)) {
__submit_merged_bio(io);
goto alloc_new;
}
if (fio->io_wbc)
wbc_account_cgroup_owner(fio->io_wbc, fio->folio,
- folio_size(fio->folio));
+ F2FS_BLK_TO_BYTES(fio->cnt));
- io->last_block_in_bio = fio->new_blkaddr;
+ io->last_block_in_bio = fio->new_blkaddr + fio->cnt - 1;
trace_f2fs_submit_folio_write(fio->folio, fio);
#ifdef CONFIG_BLK_DEV_ZONED
@@ -2675,7 +2679,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio)
set_new_dnode(&dn, inode, NULL, NULL, 0);
if (need_inplace_update(fio) &&
- f2fs_lookup_read_extent_cache_block(inode, folio->index,
+ f2fs_lookup_read_extent_cache_block(inode, folio->index + fio->idx,
&fio->old_blkaddr)) {
if (!f2fs_is_valid_blkaddr(fio->sbi, fio->old_blkaddr,
DATA_GENERIC_ENHANCE))
@@ -2690,7 +2694,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio)
if (fio->need_lock == LOCK_REQ && !f2fs_trylock_op(fio->sbi))
return -EAGAIN;
- err = f2fs_get_dnode_of_data(&dn, folio->index, LOOKUP_NODE);
+ err = f2fs_get_dnode_of_data(&dn, folio->index + fio->idx, LOOKUP_NODE);
if (err)
goto out;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 9f88be53174b..c6b23fa63588 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1281,6 +1281,8 @@ struct f2fs_io_info {
blk_opf_t op_flags; /* req_flag_bits */
block_t new_blkaddr; /* new block address to be written */
block_t old_blkaddr; /* old block address before Cow */
+ pgoff_t idx; /* start page index within the active folio for this fio */
+ unsigned int cnt; /* block count in the active folio; assumed contiguous */
union {
struct page *page; /* page to be written */
struct folio *folio;
--
2.34.1
* [RFC PATCH 7/9] f2fs: Make GC aware of large folios
2025-08-13 9:37 [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
` (5 preceding siblings ...)
2025-08-13 9:37 ` [RFC PATCH 6/9] f2fs: Extend f2fs_io_info to support sub-folio ranges Nanzhe Zhao
@ 2025-08-13 9:37 ` Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 8/9] f2fs: Introduce F2FS_GET_BLOCK_IOMAP and map_blocks helpers Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 9/9] f2fs: Enable buffered read/write path large folios support for normal and atomic file with iomap Nanzhe Zhao
8 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
Previously, the GC (Garbage Collection) logic for performing I/O and
marking folios dirty only supported order-0 folios and lacked awareness
of higher-order folios. To enable GC to correctly handle higher-order
folios, we made two changes:
- In `move_data_page`, we now use `f2fs_iomap_set_range_dirty` to mark
only the sub-part of the folio corresponding to `bidx` as dirty,
instead of the entire folio.
- The `f2fs_submit_page_read` function has been augmented with an
`index` parameter, allowing it to precisely identify which sub-page
of the current folio is being submitted.
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/data.c | 13 +++++++------
fs/f2fs/gc.c | 37 +++++++++++++++++++++++--------------
2 files changed, 30 insertions(+), 20 deletions(-)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index b7bef2a28c8e..5ecd08a3dd0b 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1096,7 +1096,7 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
/* This can handle encryption stuffs */
static int f2fs_submit_page_read(struct inode *inode, struct folio *folio,
block_t blkaddr, blk_opf_t op_flags,
- bool for_write)
+ pgoff_t index, bool for_write)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct bio *bio;
@@ -1109,7 +1109,8 @@ static int f2fs_submit_page_read(struct inode *inode, struct folio *folio,
/* wait for GCed page writeback via META_MAPPING */
f2fs_wait_on_block_writeback(inode, blkaddr);
- if (!bio_add_folio(bio, folio, PAGE_SIZE, 0)) {
+ if (!bio_add_folio(bio, folio, PAGE_SIZE,
+ (index - folio->index) << PAGE_SHIFT)) {
iostat_update_and_unbind_ctx(bio);
if (bio->bi_private)
mempool_free(bio->bi_private, bio_post_read_ctx_pool);
@@ -1276,8 +1277,8 @@ struct folio *f2fs_get_read_data_folio(struct inode *inode, pgoff_t index,
return folio;
}
- err = f2fs_submit_page_read(inode, folio, dn.data_blkaddr,
- op_flags, for_write);
+ err = f2fs_submit_page_read(inode, folio, dn.data_blkaddr, op_flags,
+ index, for_write);
if (err)
goto put_err;
return folio;
@@ -3651,8 +3652,8 @@ static int f2fs_write_begin(const struct kiocb *iocb,
goto put_folio;
}
err = f2fs_submit_page_read(use_cow ?
- F2FS_I(inode)->cow_inode : inode,
- folio, blkaddr, 0, true);
+ F2FS_I(inode)->cow_inode : inode, folio,
+ blkaddr, 0, folio->index, true);
if (err)
goto put_folio;
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 098e9f71421e..6d28f01bec42 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1475,22 +1475,31 @@ static int move_data_page(struct inode *inode, block_t bidx, int gc_type,
err = -EAGAIN;
goto out;
}
- folio_mark_dirty(folio);
folio_set_f2fs_gcing(folio);
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ if (!folio_test_large(folio)) {
+ folio_mark_dirty(folio);
+ } else {
+ f2fs_iomap_set_range_dirty(folio, (bidx - folio->index) << PAGE_SHIFT,
+ PAGE_SIZE);
+ }
+#else
+ folio_mark_dirty(folio);
+#endif
} else {
- struct f2fs_io_info fio = {
- .sbi = F2FS_I_SB(inode),
- .ino = inode->i_ino,
- .type = DATA,
- .temp = COLD,
- .op = REQ_OP_WRITE,
- .op_flags = REQ_SYNC,
- .old_blkaddr = NULL_ADDR,
- .folio = folio,
- .encrypted_page = NULL,
- .need_lock = LOCK_REQ,
- .io_type = FS_GC_DATA_IO,
- };
+ struct f2fs_io_info fio = { .sbi = F2FS_I_SB(inode),
+ .ino = inode->i_ino,
+ .type = DATA,
+ .temp = COLD,
+ .op = REQ_OP_WRITE,
+ .op_flags = REQ_SYNC,
+ .old_blkaddr = NULL_ADDR,
+ .folio = folio,
+ .encrypted_page = NULL,
+ .need_lock = LOCK_REQ,
+ .io_type = FS_GC_DATA_IO,
+ .idx = bidx - folio->index,
+ .cnt = 1 };
bool is_dirty = folio_test_dirty(folio);
retry:
--
2.34.1
* [RFC PATCH 8/9] f2fs: Introduce F2FS_GET_BLOCK_IOMAP and map_blocks helpers
2025-08-13 9:37 [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
` (6 preceding siblings ...)
2025-08-13 9:37 ` [RFC PATCH 7/9] f2fs: Make GC aware of large folios Nanzhe Zhao
@ 2025-08-13 9:37 ` Nanzhe Zhao
2025-08-13 9:37 ` [RFC PATCH 9/9] f2fs: Enable buffered read/write path large folios support for normal and atomic file with iomap Nanzhe Zhao
8 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
Introduce the `F2FS_GET_BLOCK_IOMAP` flag for `f2fs_map_blocks`.
With this flag, holes encountered during buffered I/O iterative
mapping can now be merged under `map_is_mergeable`. Furthermore, when
this flag is passed, `f2fs_map_blocks` will by default store the
mapped block information (from the `f2fs_map_blocks` structure) into
the extent cache, provided the resulting extent size is greater than
the minimum allowed length for the f2fs extent cache.
Notably, both holes and `NEW_ADDR` extents will also be cached under
the influence of this flag.
This improves buffered write performance for sparse files.
Additionally, two helper functions are introduced:
- `f2fs_map_blocks_iomap`: A simple wrapper for `f2fs_map_blocks` that
  enables the `F2FS_GET_BLOCK_IOMAP` flag.
- `f2fs_map_blocks_preallocate`: A simple wrapper around
  `f2fs_map_blocks` for preallocating blocks.
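For context, a sketch of how an iomap_begin-style caller might drive
the new wrapper (hypothetical snippet; pos and length are byte units):
```c
/* Map the byte range [pos, pos + length) for buffered iomap I/O. */
struct f2fs_map_blocks map = {};
block_t start_blk = F2FS_BYTES_TO_BLK(pos);
block_t last_blk = F2FS_BYTES_TO_BLK(pos + length - 1);
int err;

err = f2fs_map_blocks_iomap(inode, start_blk,
			    last_blk - start_blk + 1, &map);
if (err)
	return err;
/* map.m_pblk, map.m_len and map.m_flags now describe the extent. */
```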
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/data.c | 49 +++++++++++++++++++++++++++++++++++++++++++------
fs/f2fs/f2fs.h | 5 +++++
2 files changed, 48 insertions(+), 6 deletions(-)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 5ecd08a3dd0b..37eaf431ab42 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1537,8 +1537,11 @@ static bool map_is_mergeable(struct f2fs_sb_info *sbi,
return true;
if (flag == F2FS_GET_BLOCK_PRE_DIO)
return true;
- if (flag == F2FS_GET_BLOCK_DIO &&
- map->m_pblk == NULL_ADDR && blkaddr == NULL_ADDR)
+ if (flag == F2FS_GET_BLOCK_DIO && map->m_pblk == NULL_ADDR &&
+ blkaddr == NULL_ADDR)
+ return true;
+ if (flag == F2FS_GET_BLOCK_IOMAP && map->m_pblk == NULL_ADDR &&
+ blkaddr == NULL_ADDR)
return true;
return false;
}
@@ -1676,6 +1679,10 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
if (map->m_next_pgofs)
*map->m_next_pgofs = pgofs + 1;
break;
+ case F2FS_GET_BLOCK_IOMAP:
+ if (map->m_next_pgofs)
+ *map->m_next_pgofs = pgofs + 1;
+ break;
default:
/* for defragment case */
if (map->m_next_pgofs)
@@ -1741,8 +1748,9 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
else if (dn.ofs_in_node < end_offset)
goto next_block;
- if (flag == F2FS_GET_BLOCK_PRECACHE) {
- if (map->m_flags & F2FS_MAP_MAPPED) {
+ if (flag == F2FS_GET_BLOCK_PRECACHE || flag == F2FS_GET_BLOCK_IOMAP) {
+ if (map->m_flags & F2FS_MAP_MAPPED &&
+ map->m_len > F2FS_MIN_EXTENT_LEN) {
unsigned int ofs = start_pgofs - map->m_lblk;
f2fs_update_read_extent_cache_range(&dn,
@@ -1786,8 +1794,9 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
}
}
- if (flag == F2FS_GET_BLOCK_PRECACHE) {
- if (map->m_flags & F2FS_MAP_MAPPED) {
+ if (flag == F2FS_GET_BLOCK_PRECACHE || flag == F2FS_GET_BLOCK_IOMAP) {
+ if (map->m_flags & F2FS_MAP_MAPPED &&
+ map->m_len > F2FS_MIN_EXTENT_LEN) {
unsigned int ofs = start_pgofs - map->m_lblk;
f2fs_update_read_extent_cache_range(&dn,
@@ -1808,6 +1817,34 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
return err;
}
+int f2fs_map_blocks_iomap(struct inode *inode, block_t start, block_t len,
+ struct f2fs_map_blocks *map)
+{
+ int err = 0;
+
+ map->m_lblk = start; /* logical block number of the start position */
+ map->m_len = len; /* length in blocks */
+ map->m_may_create = false;
+ map->m_seg_type =
+ f2fs_rw_hint_to_seg_type(F2FS_I_SB(inode), inode->i_write_hint);
+ err = f2fs_map_blocks(inode, map, F2FS_GET_BLOCK_IOMAP);
+ return err;
+}
+
+int f2fs_map_blocks_preallocate(struct inode *inode, block_t start, block_t len,
+ struct f2fs_map_blocks *map)
+{
+ int err = 0;
+
+ map->m_lblk = start;
+ map->m_len = len; /* length in blocks */
+ map->m_may_create = true;
+ map->m_seg_type =
+ f2fs_rw_hint_to_seg_type(F2FS_I_SB(inode), inode->i_write_hint);
+ err = f2fs_map_blocks(inode, map, F2FS_GET_BLOCK_PRE_AIO);
+ return err;
+}
+
bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len)
{
struct f2fs_map_blocks map;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index c6b23fa63588..ac9a6ac13e1f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -788,6 +788,7 @@ enum {
F2FS_GET_BLOCK_PRE_DIO,
F2FS_GET_BLOCK_PRE_AIO,
F2FS_GET_BLOCK_PRECACHE,
+ F2FS_GET_BLOCK_IOMAP,
};
/*
@@ -4232,6 +4233,10 @@ struct folio *f2fs_get_new_data_folio(struct inode *inode,
struct folio *ifolio, pgoff_t index, bool new_i_size);
int f2fs_do_write_data_page(struct f2fs_io_info *fio);
int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag);
+int f2fs_map_blocks_iomap(struct inode *inode, block_t start, block_t len,
+ struct f2fs_map_blocks *map);
+int f2fs_map_blocks_preallocate(struct inode *inode, block_t start, block_t len,
+ struct f2fs_map_blocks *map);
int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 start, u64 len);
int f2fs_encrypt_one_page(struct f2fs_io_info *fio);
--
2.34.1
* [RFC PATCH 9/9] f2fs: Enable buffered read/write path large folios support for normal and atomic file with iomap
2025-08-13 9:37 [f2fs-dev] [RESEND RFC PATCH 0/9] f2fs: Enable buffered read/write large folios support with extended iomap Nanzhe Zhao
` (7 preceding siblings ...)
2025-08-13 9:37 ` [RFC PATCH 8/9] f2fs: Introduce F2FS_GET_BLOCK_IOMAP and map_blocks helpers Nanzhe Zhao
@ 2025-08-13 9:37 ` Nanzhe Zhao
8 siblings, 0 replies; 11+ messages in thread
From: Nanzhe Zhao @ 2025-08-13 9:37 UTC (permalink / raw)
To: Jaegeuk Kim, linux-f2fs-devel, linux-fsdevel
Cc: Matthew Wilcox, Chao Yu, Yi Zhang, Barry Song, Nanzhe Zhao
This commit enables large folios support for F2FS's buffered read and
write paths.
We introduce a helper function `f2fs_set_iomap` to handle all the logic
that converts an `f2fs_map_blocks` result into an `iomap`.
Currently, compressed, encrypted, and fsverity-protected files are not
supported with iomap large folios.
Since F2FS requires `f2fs_iomap_folio_state` (or a similar equivalent
mechanism) to correctly support the iomap framework, when
`CONFIG_F2FS_IOMAP_FOLIO_STATE` is not enabled, we will not use the
iomap buffered read/write paths.
Note: holes reported by f2fs_map_blocks come in two types (NULL_ADDR
blocks and unmapped dnodes), which require different logic when setting
iomap->length, so we add a new block state flag to f2fs_map_blocks to
distinguish them.
Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
fs/f2fs/data.c | 286 +++++++++++++++++++++++++++++++++++++++++++----
fs/f2fs/f2fs.h | 120 +++++++++++++-------
fs/f2fs/file.c | 33 +++++-
fs/f2fs/inline.c | 15 ++-
fs/f2fs/inode.c | 27 +++++
fs/f2fs/namei.c | 7 ++
fs/f2fs/super.c | 3 +
7 files changed, 425 insertions(+), 66 deletions(-)
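For illustration (not part of the patch), the revalidation scheme this
series relies on, condensed into one place and assuming
CONFIG_F2FS_IOMAP_FOLIO_STATE: every block-address update bumps a
per-inode sequence number, ->iomap_begin snapshots it into the mapping,
and ->iomap_valid compares the snapshot before each folio is committed:

	/* on any block-address update, e.g. f2fs_update_data_blkaddr() */
	atomic64_inc(&F2FS_I(inode)->i_iomap_seq);

	/* in a buffered ->iomap_begin hook */
	iomap->validity_cookie = atomic64_read(&F2FS_I(inode)->i_iomap_seq);

	/* in iomap_write_ops->iomap_valid: a stale cookie forces a remap */
	return iomap->validity_cookie ==
	       atomic64_read(&F2FS_I(inode)->i_iomap_seq);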
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 37eaf431ab42..243c6305b0c5 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1149,6 +1149,9 @@ void f2fs_update_data_blkaddr(struct dnode_of_data *dn, block_t blkaddr)
{
f2fs_set_data_blkaddr(dn, blkaddr);
f2fs_update_read_extent_cache(dn);
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ f2fs_iomap_seq_inc(dn->inode);
+#endif
}
/* dn->ofs_in_node will be returned with up-to-date last block pointer */
@@ -1182,6 +1185,9 @@ int f2fs_reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count)
if (folio_mark_dirty(dn->node_folio))
dn->node_changed = true;
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ f2fs_iomap_seq_inc(dn->inode);
+#endif
return 0;
}
@@ -1486,6 +1492,7 @@ static int f2fs_map_no_dnode(struct inode *inode,
*map->m_next_pgofs = f2fs_get_next_page_offset(dn, pgoff);
if (map->m_next_extent)
*map->m_next_extent = f2fs_get_next_page_offset(dn, pgoff);
+ map->m_flags |= F2FS_MAP_NODNODE;
return 0;
}
@@ -1702,7 +1709,9 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
if (blkaddr == NEW_ADDR)
map->m_flags |= F2FS_MAP_DELALLOC;
/* DIO READ and hole case, should not map the blocks. */
- if (!(flag == F2FS_GET_BLOCK_DIO && is_hole && !map->m_may_create))
+ if (!(flag == F2FS_GET_BLOCK_DIO && is_hole &&
+ !map->m_may_create) &&
+ !(flag == F2FS_GET_BLOCK_IOMAP && is_hole))
map->m_flags |= F2FS_MAP_MAPPED;
map->m_pblk = blkaddr;
@@ -1736,6 +1745,10 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
goto sync_out;
map->m_len += dn.ofs_in_node - ofs_in_node;
+ /*
+ * Since we successfully reserved blocks, update m_pblk now so
+ * write_begin does not need to repeat the lookup.
+ */
+ map->m_pblk = dn.data_blkaddr;
if (prealloc && dn.ofs_in_node != last_ofs_in_node + 1) {
err = -ENOSPC;
goto sync_out;
@@ -4255,9 +4268,6 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
err = f2fs_map_blocks(inode, &map, F2FS_GET_BLOCK_DIO);
if (err)
return err;
-
- iomap->offset = F2FS_BLK_TO_BYTES(map.m_lblk);
-
/*
* When inline encryption is enabled, sometimes I/O to an encrypted file
* has to be broken up to guarantee DUN contiguity. Handle this by
@@ -4272,28 +4282,44 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
if (WARN_ON_ONCE(map.m_pblk == COMPRESS_ADDR))
return -EINVAL;
- if (map.m_flags & F2FS_MAP_MAPPED) {
- if (WARN_ON_ONCE(map.m_pblk == NEW_ADDR))
- return -EINVAL;
-
- iomap->length = F2FS_BLK_TO_BYTES(map.m_len);
- iomap->type = IOMAP_MAPPED;
- iomap->flags |= IOMAP_F_MERGED;
- iomap->bdev = map.m_bdev;
- iomap->addr = F2FS_BLK_TO_BYTES(map.m_pblk);
-
- if (flags & IOMAP_WRITE && map.m_last_pblk)
- iomap->private = (void *)map.m_last_pblk;
+ return f2fs_set_iomap(inode, &map, iomap, flags, offset, length, false);
+}
+int f2fs_set_iomap(struct inode *inode, struct f2fs_map_blocks *map,
+ struct iomap *iomap, unsigned int flags, loff_t offset,
+ loff_t length, bool dio)
+{
+ iomap->offset = F2FS_BLK_TO_BYTES(map->m_lblk);
+ if (map->m_flags & F2FS_MAP_MAPPED) {
+ if (dio) {
+ if (WARN_ON_ONCE(map->m_pblk == NEW_ADDR))
+ return -EINVAL;
+ }
+ iomap->length = F2FS_BLK_TO_BYTES(map->m_len);
+ iomap->bdev = map->m_bdev;
+ if (map->m_pblk != NEW_ADDR) {
+ iomap->type = IOMAP_MAPPED;
+ iomap->flags |= IOMAP_F_MERGED;
+ iomap->addr = F2FS_BLK_TO_BYTES(map->m_pblk);
+ } else {
+ iomap->type = IOMAP_UNWRITTEN;
+ iomap->addr = IOMAP_NULL_ADDR;
+ }
+ if (flags & IOMAP_WRITE && map->m_last_pblk)
+ iomap->private = (void *)map->m_last_pblk;
} else {
- if (flags & IOMAP_WRITE)
+ if (dio && flags & IOMAP_WRITE)
return -ENOTBLK;
- if (map.m_pblk == NULL_ADDR) {
- iomap->length = F2FS_BLK_TO_BYTES(next_pgofs) -
- iomap->offset;
+ if (map->m_pblk == NULL_ADDR) {
+ if (map->m_flags & F2FS_MAP_NODNODE)
+ iomap->length =
+ F2FS_BLK_TO_BYTES(*map->m_next_pgofs) -
+ iomap->offset;
+ else
+ iomap->length = F2FS_BLK_TO_BYTES(map->m_len);
iomap->type = IOMAP_HOLE;
- } else if (map.m_pblk == NEW_ADDR) {
- iomap->length = F2FS_BLK_TO_BYTES(map.m_len);
+ } else if (map->m_pblk == NEW_ADDR) {
+ iomap->length = F2FS_BLK_TO_BYTES(map->m_len);
iomap->type = IOMAP_UNWRITTEN;
} else {
f2fs_bug_on(F2FS_I_SB(inode), 1);
@@ -4301,7 +4327,7 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
iomap->addr = IOMAP_NULL_ADDR;
}
- if (map.m_flags & F2FS_MAP_NEW)
+ if (map->m_flags & F2FS_MAP_NEW)
iomap->flags |= IOMAP_F_NEW;
if ((inode->i_state & I_DIRTY_DATASYNC) ||
offset + length > i_size_read(inode))
@@ -4313,3 +4339,217 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
const struct iomap_ops f2fs_iomap_ops = {
.iomap_begin = f2fs_iomap_begin,
};
+
+/* iomap buffered-io */
+static int f2fs_buffered_read_iomap_begin(struct inode *inode, loff_t offset,
+ loff_t length, unsigned int flags,
+ struct iomap *iomap,
+ struct iomap *srcmap)
+{
+ pgoff_t next_pgofs = 0;
+ int err;
+ struct f2fs_map_blocks map = {};
+
+ map.m_lblk = F2FS_BYTES_TO_BLK(offset);
+ map.m_len = F2FS_BYTES_TO_BLK(offset + length - 1) - map.m_lblk + 1;
+ map.m_next_pgofs = &next_pgofs;
+ map.m_seg_type =
+ f2fs_rw_hint_to_seg_type(F2FS_I_SB(inode), inode->i_write_hint);
+ map.m_may_create = false;
+ if (is_sbi_flag_set(F2FS_I_SB(inode), SBI_IS_SHUTDOWN))
+ return -EIO;
+ /* This is the read-side iomap_ops; reject write requests here. */
+ if (flags & IOMAP_WRITE)
+ return -EINVAL;
+
+ err = f2fs_map_blocks(inode, &map, F2FS_GET_BLOCK_IOMAP);
+ if (err)
+ return err;
+
+ if (WARN_ON_ONCE(map.m_pblk == COMPRESS_ADDR))
+ return -EINVAL;
+
+ return f2fs_set_iomap(inode, &map, iomap, flags, offset, length, false);
+}
+
+const struct iomap_ops f2fs_buffered_read_iomap_ops = {
+ .iomap_begin = f2fs_buffered_read_iomap_begin,
+};
+
+static void f2fs_iomap_readahead(struct readahead_control *rac)
+{
+ struct inode *inode = rac->mapping->host;
+
+ if (!f2fs_is_compress_backend_ready(inode))
+ return;
+
+ /* If the file has inline data, skip readahead */
+ if (f2fs_has_inline_data(inode))
+ return;
+ iomap_readahead(rac, &f2fs_buffered_read_iomap_ops);
+}
+
+static int f2fs_buffered_write_iomap_begin(struct inode *inode, loff_t offset,
+ loff_t length, unsigned flags,
+ struct iomap *iomap,
+ struct iomap *srcmap)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ struct f2fs_map_blocks map = {};
+ struct folio *ifolio = NULL;
+ int err = 0;
+
+ iomap->offset = offset;
+ iomap->bdev = sbi->sb->s_bdev;
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ iomap->validity_cookie = f2fs_iomap_seq_read(inode);
+#endif
+ if (f2fs_has_inline_data(inode)) {
+ if (offset + length <= MAX_INLINE_DATA(inode)) {
+ ifolio = f2fs_get_inode_folio(sbi, inode->i_ino);
+ if (IS_ERR(ifolio)) {
+ err = PTR_ERR(ifolio);
+ goto failed;
+ }
+ set_inode_flag(inode, FI_DATA_EXIST);
+ f2fs_iomap_prepare_read_inline(inode, ifolio, iomap,
+ offset, length);
+ if (inode->i_nlink)
+ folio_set_f2fs_inline(ifolio);
+
+ f2fs_folio_put(ifolio, 1);
+ goto out;
+ }
+ }
+ block_t start_blk = F2FS_BYTES_TO_BLK(offset);
+ block_t len_blks =
+ F2FS_BYTES_TO_BLK(offset + length - 1) - start_blk + 1;
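+ /*
+ * Try a read-only lookup first and only fall back to the
+ * preallocation path when the range turns out to be a hole.
+ */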
+ err = f2fs_map_blocks_iomap(inode, start_blk, len_blks, &map);
+ if (err)
+ goto failed;
+ if (map.m_pblk == NULL_ADDR) {
+ err = f2fs_map_blocks_preallocate(inode, map.m_lblk, len_blks,
+ &map);
+ if (err)
+ goto failed;
+ }
+ if (WARN_ON_ONCE(map.m_pblk == COMPRESS_ADDR)) {
+ err = -EIO; /* should not happen for buffered write prep */
+ goto failed;
+ }
+ err = f2fs_set_iomap(inode, &map, iomap, flags, offset, length, false);
+ if (!err)
+ goto out;
+failed:
+ f2fs_write_failed(inode, offset + length);
+out:
+ return err;
+}
+
+static int f2fs_buffered_write_atomic_iomap_begin(struct inode *inode,
+ loff_t offset, loff_t length,
+ unsigned flags,
+ struct iomap *iomap,
+ struct iomap *srcmap)
+{
+ struct inode *cow_inode = F2FS_I(inode)->cow_inode;
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ struct f2fs_map_blocks map = {};
+ int err = 0;
+
+ iomap->offset = offset;
+ iomap->bdev = sbi->sb->s_bdev;
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ iomap->validity_cookie = f2fs_iomap_seq_read(inode);
+#endif
+ block_t start_blk = F2FS_BYTES_TO_BLK(offset);
+ block_t len_blks =
+ F2FS_BYTES_TO_BLK(offset + length - 1) - start_blk + 1;
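+ /*
+ * Look up the COW inode first: blocks already staged for this
+ * atomic write take precedence over the original file's blocks.
+ */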
+ err = f2fs_map_blocks_iomap(cow_inode, start_blk, len_blks, &map);
+ if (err)
+ return err;
+ if (map.m_pblk == NULL_ADDR &&
+ is_inode_flag_set(inode, FI_ATOMIC_REPLACE)) {
+ err = f2fs_map_blocks_preallocate(cow_inode, map.m_lblk,
+ map.m_len, &map);
+ if (err)
+ return err;
+ inc_atomic_write_cnt(inode);
+ goto out;
+ } else if (map.m_pblk != NULL_ADDR) {
+ goto out;
+ }
+ err = f2fs_map_blocks_iomap(inode, start_blk, len_blks, &map);
+ if (err)
+ return err;
+out:
+ if (WARN_ON_ONCE(map.m_pblk == COMPRESS_ADDR))
+ return -EIO;
+
+ return f2fs_set_iomap(inode, &map, iomap, flags, offset, length, false);
+}
+
+static int f2fs_buffered_write_iomap_end(struct inode *inode, loff_t pos,
+ loff_t length, ssize_t written,
+ unsigned flags, struct iomap *iomap)
+{
+ return written;
+}
+
+const struct iomap_ops f2fs_buffered_write_iomap_ops = {
+ .iomap_begin = f2fs_buffered_write_iomap_begin,
+ .iomap_end = f2fs_buffered_write_iomap_end,
+};
+
+const struct iomap_ops f2fs_buffered_write_atomic_iomap_ops = {
+ .iomap_begin = f2fs_buffered_write_atomic_iomap_begin,
+};
+
+const struct address_space_operations f2fs_iomap_aops = {
+ .read_folio = f2fs_read_data_folio,
+ .readahead = f2fs_iomap_readahead,
+ .write_begin = f2fs_write_begin,
+ .write_end = f2fs_write_end,
+ .writepages = f2fs_write_data_pages,
+ .dirty_folio = f2fs_dirty_data_folio,
+ .invalidate_folio = f2fs_invalidate_folio,
+ .release_folio = f2fs_release_folio,
+ .migrate_folio = filemap_migrate_folio,
+ .is_partially_uptodate = iomap_is_partially_uptodate,
+ .error_remove_folio = generic_error_remove_folio,
+};
+
+static void f2fs_iomap_put_folio(struct inode *inode, loff_t pos,
+ unsigned copied, struct folio *folio)
+{
+ if (!copied)
+ goto unlock_out;
+ if (f2fs_is_atomic_file(inode))
+ folio_set_f2fs_atomic(folio);
+
+ if (pos + copied > i_size_read(inode) &&
+ !f2fs_verity_in_progress(inode)) {
+ if (f2fs_is_atomic_file(inode))
+ f2fs_i_size_write(F2FS_I(inode)->cow_inode,
+ pos + copied);
+ }
+unlock_out:
+ folio_unlock(folio);
+ folio_put(folio);
+ f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
+}
+
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+static bool f2fs_iomap_valid(struct inode *inode, const struct iomap *iomap)
+{
+ return iomap->validity_cookie == f2fs_iomap_seq_read(inode);
+}
+#else
+static bool f2fs_iomap_valid(struct inode *inode, const struct iomap *iomap)
+{
+ return true;
+}
+#endif
+const struct iomap_write_ops f2fs_iomap_write_ops = {
+ .put_folio = f2fs_iomap_put_folio,
+ .iomap_valid = f2fs_iomap_valid,
+};
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index ac9a6ac13e1f..1cf12b76b09a 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -762,6 +762,7 @@ struct extent_tree_info {
#define F2FS_MAP_NEW (1U << 0)
#define F2FS_MAP_MAPPED (1U << 1)
#define F2FS_MAP_DELALLOC (1U << 2)
+#define F2FS_MAP_NODNODE (1U << 3)
#define F2FS_MAP_FLAGS (F2FS_MAP_NEW | F2FS_MAP_MAPPED |\
F2FS_MAP_DELALLOC)
@@ -837,49 +838,53 @@ enum {
/* used for f2fs_inode_info->flags */
enum {
- FI_NEW_INODE, /* indicate newly allocated inode */
- FI_DIRTY_INODE, /* indicate inode is dirty or not */
- FI_AUTO_RECOVER, /* indicate inode is recoverable */
- FI_DIRTY_DIR, /* indicate directory has dirty pages */
- FI_INC_LINK, /* need to increment i_nlink */
- FI_ACL_MODE, /* indicate acl mode */
- FI_NO_ALLOC, /* should not allocate any blocks */
- FI_FREE_NID, /* free allocated nide */
- FI_NO_EXTENT, /* not to use the extent cache */
- FI_INLINE_XATTR, /* used for inline xattr */
- FI_INLINE_DATA, /* used for inline data*/
- FI_INLINE_DENTRY, /* used for inline dentry */
- FI_APPEND_WRITE, /* inode has appended data */
- FI_UPDATE_WRITE, /* inode has in-place-update data */
- FI_NEED_IPU, /* used for ipu per file */
- FI_ATOMIC_FILE, /* indicate atomic file */
- FI_DATA_EXIST, /* indicate data exists */
- FI_SKIP_WRITES, /* should skip data page writeback */
- FI_OPU_WRITE, /* used for opu per file */
- FI_DIRTY_FILE, /* indicate regular/symlink has dirty pages */
- FI_PREALLOCATED_ALL, /* all blocks for write were preallocated */
- FI_HOT_DATA, /* indicate file is hot */
- FI_EXTRA_ATTR, /* indicate file has extra attribute */
- FI_PROJ_INHERIT, /* indicate file inherits projectid */
- FI_PIN_FILE, /* indicate file should not be gced */
- FI_VERITY_IN_PROGRESS, /* building fs-verity Merkle tree */
- FI_COMPRESSED_FILE, /* indicate file's data can be compressed */
- FI_COMPRESS_CORRUPT, /* indicate compressed cluster is corrupted */
- FI_MMAP_FILE, /* indicate file was mmapped */
- FI_ENABLE_COMPRESS, /* enable compression in "user" compression mode */
- FI_COMPRESS_RELEASED, /* compressed blocks were released */
- FI_ALIGNED_WRITE, /* enable aligned write */
- FI_COW_FILE, /* indicate COW file */
- FI_ATOMIC_COMMITTED, /* indicate atomic commit completed except disk sync */
- FI_ATOMIC_DIRTIED, /* indicate atomic file is dirtied */
- FI_ATOMIC_REPLACE, /* indicate atomic replace */
- FI_OPENED_FILE, /* indicate file has been opened */
- FI_DONATE_FINISHED, /* indicate page donation of file has been finished */
- FI_MAX, /* max flag, never be used */
+ FI_NEW_INODE, /* indicate newly allocated inode */
+ FI_DIRTY_INODE, /* indicate inode is dirty or not */
+ FI_AUTO_RECOVER, /* indicate inode is recoverable */
+ FI_DIRTY_DIR, /* indicate directory has dirty pages */
+ FI_INC_LINK, /* need to increment i_nlink */
+ FI_ACL_MODE, /* indicate acl mode */
+ FI_NO_ALLOC, /* should not allocate any blocks */
+ FI_FREE_NID, /* free allocated nide */
+ FI_NO_EXTENT, /* not to use the extent cache */
+ FI_INLINE_XATTR, /* used for inline xattr */
+ FI_INLINE_DATA, /* used for inline data*/
+ FI_INLINE_DENTRY, /* used for inline dentry */
+ FI_APPEND_WRITE, /* inode has appended data */
+ FI_UPDATE_WRITE, /* inode has in-place-update data */
+ FI_NEED_IPU, /* used for ipu per file */
+ FI_ATOMIC_FILE, /* indicate atomic file */
+ FI_DATA_EXIST, /* indicate data exists */
+ FI_SKIP_WRITES, /* should skip data page writeback */
+ FI_OPU_WRITE, /* used for opu per file */
+ FI_DIRTY_FILE, /* indicate regular/symlink has dirty pages */
+ FI_PREALLOCATED_ALL, /* all blocks for write were preallocated */
+ FI_HOT_DATA, /* indicate file is hot */
+ FI_EXTRA_ATTR, /* indicate file has extra attribute */
+ FI_PROJ_INHERIT, /* indicate file inherits projectid */
+ FI_PIN_FILE, /* indicate file should not be gced */
+ FI_VERITY_IN_PROGRESS, /* building fs-verity Merkle tree */
+ FI_COMPRESSED_FILE, /* indicate file's data can be compressed */
+ FI_COMPRESS_CORRUPT, /* indicate compressed cluster is corrupted */
+ FI_MMAP_FILE, /* indicate file was mmapped */
+ FI_ENABLE_COMPRESS, /* enable compression in "user" compression mode */
+ FI_COMPRESS_RELEASED, /* compressed blocks were released */
+ FI_ALIGNED_WRITE, /* enable aligned write */
+ FI_COW_FILE, /* indicate COW file */
+ FI_ATOMIC_COMMITTED, /* indicate atomic commit completed except disk sync */
+ FI_ATOMIC_DIRTIED, /* indicate atomic file is dirtied */
+ FI_ATOMIC_REPLACE, /* indicate atomic replace */
+ FI_OPENED_FILE, /* indicate file has been opened */
+ FI_DONATE_FINISHED, /* indicate page donation of file has been finished */
+ FI_IOMAP, /* indicate whether this inode should enable iomap */
+ FI_MAX, /* max flag, never be used */
};
struct f2fs_inode_info {
struct inode vfs_inode; /* serve a vfs inode */
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ atomic64_t i_iomap_seq; /* for iomap_valid sequence number */
+#endif
unsigned long i_flags; /* keep an inode flags for ioctl */
unsigned char i_advise; /* use to give file attribute hints */
unsigned char i_dir_level; /* use for dentry level for large dir */
@@ -2814,6 +2819,16 @@ static inline void inc_page_count(struct f2fs_sb_info *sbi, int count_type)
set_sbi_flag(sbi, SBI_IS_DIRTY);
}
+static inline void inc_page_count_multiple(struct f2fs_sb_info *sbi,
+ int count_type, int npages)
+{
+ atomic_add(npages, &sbi->nr_pages[count_type]);
+
+ if (count_type == F2FS_DIRTY_DENTS || count_type == F2FS_DIRTY_NODES ||
+ count_type == F2FS_DIRTY_META || count_type == F2FS_DIRTY_QDATA ||
+ count_type == F2FS_DIRTY_IMETA)
+ set_sbi_flag(sbi, SBI_IS_DIRTY);
+}
static inline void inode_inc_dirty_pages(struct inode *inode)
{
atomic_inc(&F2FS_I(inode)->dirty_pages);
@@ -3657,6 +3672,10 @@ static inline bool f2fs_is_cow_file(struct inode *inode)
return is_inode_flag_set(inode, FI_COW_FILE);
}
+static inline bool f2fs_iomap_inode(struct inode *inode)
+{
+ return is_inode_flag_set(inode, FI_IOMAP);
+}
static inline void *inline_data_addr(struct inode *inode, struct folio *folio)
{
__le32 *addr = get_dnode_addr(inode, folio);
@@ -3880,7 +3899,17 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc);
void f2fs_remove_donate_inode(struct inode *inode);
void f2fs_evict_inode(struct inode *inode);
void f2fs_handle_failed_inode(struct inode *inode);
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+static inline void f2fs_iomap_seq_inc(struct inode *inode)
+{
+ atomic64_inc(&F2FS_I(inode)->i_iomap_seq);
+}
+static inline u64 f2fs_iomap_seq_read(struct inode *inode)
+{
+ return atomic64_read(&F2FS_I(inode)->i_iomap_seq);
+}
+#endif
/*
* namei.c
*/
@@ -4248,6 +4277,9 @@ int f2fs_write_single_data_page(struct folio *folio, int *submitted,
enum iostat_type io_type,
int compr_blocks, bool allow_balance);
void f2fs_write_failed(struct inode *inode, loff_t to);
+int f2fs_set_iomap(struct inode *inode, struct f2fs_map_blocks *map,
+ struct iomap *iomap, unsigned int flags, loff_t offset,
+ loff_t length, bool dio);
void f2fs_invalidate_folio(struct folio *folio, size_t offset, size_t length);
bool f2fs_release_folio(struct folio *folio, gfp_t wait);
bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len);
@@ -4258,6 +4290,11 @@ int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi);
void f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi);
extern const struct iomap_ops f2fs_iomap_ops;
+extern const struct iomap_write_ops f2fs_iomap_write_ops;
+extern const struct iomap_ops f2fs_buffered_read_iomap_ops;
+extern const struct iomap_ops f2fs_buffered_write_iomap_ops;
+extern const struct iomap_ops f2fs_buffered_write_atomic_iomap_ops;
+
/*
* gc.c
*/
@@ -4540,6 +4577,7 @@ extern const struct file_operations f2fs_dir_operations;
extern const struct file_operations f2fs_file_operations;
extern const struct inode_operations f2fs_file_inode_operations;
extern const struct address_space_operations f2fs_dblock_aops;
+extern const struct address_space_operations f2fs_iomap_aops;
extern const struct address_space_operations f2fs_node_aops;
extern const struct address_space_operations f2fs_meta_aops;
extern const struct inode_operations f2fs_dir_inode_operations;
@@ -4578,7 +4616,9 @@ int f2fs_read_inline_dir(struct file *file, struct dir_context *ctx,
int f2fs_inline_data_fiemap(struct inode *inode,
struct fiemap_extent_info *fieinfo,
__u64 start, __u64 len);
-
+void f2fs_iomap_prepare_read_inline(struct inode *inode, struct folio *ifolio,
+ struct iomap *iomap, loff_t pos,
+ loff_t length);
/*
* shrinker.c
*/
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 42faaed6a02d..6c5b3e632f2b 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4965,7 +4965,14 @@ static int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *iter,
if (ret)
return ret;
}
-
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ /*
+ * A buffered write can convert an inline file into a regular
+ * file; once the conversion succeeds, allow large folios on
+ * the mapping and mark the inode for the iomap path.
+ */
+ if (f2fs_should_use_buffered_iomap(inode)) {
+ mapping_set_large_folios(inode->i_mapping);
+ set_inode_flag(inode, FI_IOMAP);
+ }
+#endif
/* Do not preallocate blocks that will be written partially in 4KB. */
map.m_lblk = F2FS_BLK_ALIGN(pos);
map.m_len = F2FS_BYTES_TO_BLK(pos + count);
@@ -4994,6 +5001,24 @@ static int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *iter,
return map.m_len;
}
+static ssize_t f2fs_iomap_buffered_write(struct kiocb *iocb, struct iov_iter *i)
+{
+ struct file *file = iocb->ki_filp;
+ struct inode *inode = file_inode(file);
+ ssize_t ret;
+
+ if (f2fs_is_atomic_file(inode)) {
+ ret = iomap_file_buffered_write(iocb, i,
+ &f2fs_buffered_write_atomic_iomap_ops,
+ &f2fs_iomap_write_ops, NULL);
+ } else {
+ ret = iomap_file_buffered_write(iocb, i,
+ &f2fs_buffered_write_iomap_ops,
+ &f2fs_iomap_write_ops, NULL);
+ }
+ return ret;
+}
+
static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
struct iov_iter *from)
{
@@ -5004,7 +5029,11 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
if (iocb->ki_flags & IOCB_NOWAIT)
return -EOPNOTSUPP;
- ret = generic_perform_write(iocb, from);
+ if (f2fs_iomap_inode(inode))
+ ret = f2fs_iomap_buffered_write(iocb, from);
+ else
+ ret = generic_perform_write(iocb, from);
if (ret > 0) {
f2fs_update_iostat(F2FS_I_SB(inode), inode,
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index 58ac831ef704..bda338b4fc22 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -13,7 +13,7 @@
#include "f2fs.h"
#include "node.h"
#include <trace/events/f2fs.h>
-
+#include <linux/iomap.h>
static bool support_inline_data(struct inode *inode)
{
if (f2fs_used_in_atomic_write(inode))
@@ -832,3 +832,16 @@ int f2fs_inline_data_fiemap(struct inode *inode,
f2fs_folio_put(ifolio, true);
return err;
}
+/*
+ * Fill the iomap structure for the inline-data case of an iomap
+ * buffered write.
+ */
+void f2fs_iomap_prepare_read_inline(struct inode *inode, struct folio *ifolio,
+ struct iomap *iomap, loff_t pos,
+ loff_t length)
+{
+ iomap->addr = IOMAP_NULL_ADDR;
+ iomap->length = length;
+ iomap->type = IOMAP_INLINE;
+ iomap->flags = 0;
+ iomap->inline_data = inline_data_addr(inode, ifolio);
+}
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 8c4eafe9ffac..29378270d561 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -23,6 +23,24 @@
extern const struct address_space_operations f2fs_compress_aops;
#endif
+bool f2fs_should_use_buffered_iomap(struct inode *inode)
+{
+ if (!S_ISREG(inode->i_mode))
+ return false;
+ if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
+ return false;
+ if (inode->i_mapping == NODE_MAPPING(F2FS_I_SB(inode)))
+ return false;
+ if (inode->i_mapping == META_MAPPING(F2FS_I_SB(inode)))
+ return false;
+ if (f2fs_encrypted_file(inode))
+ return false;
+ if (fsverity_active(inode))
+ return false;
+ if (f2fs_compressed_file(inode))
+ return false;
+ return true;
+}
void f2fs_mark_inode_dirty_sync(struct inode *inode, bool sync)
{
if (is_inode_flag_set(inode, FI_NEW_INODE))
@@ -611,7 +629,16 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino)
} else if (S_ISREG(inode->i_mode)) {
inode->i_op = &f2fs_file_inode_operations;
inode->i_fop = &f2fs_file_operations;
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ if (f2fs_should_use_buffered_iomap(inode)) {
+ mapping_set_large_folios(inode->i_mapping);
+ set_inode_flag(inode, FI_IOMAP);
+ inode->i_mapping->a_ops = &f2fs_iomap_aops;
+ } else
+ inode->i_mapping->a_ops = &f2fs_dblock_aops;
+#else
inode->i_mapping->a_ops = &f2fs_dblock_aops;
+#endif
} else if (S_ISDIR(inode->i_mode)) {
inode->i_op = &f2fs_dir_inode_operations;
inode->i_fop = &f2fs_dir_operations;
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index b882771e4699..2d995860c488 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -328,6 +328,13 @@ static struct inode *f2fs_new_inode(struct mnt_idmap *idmap,
f2fs_init_extent_tree(inode);
trace_f2fs_new_inode(inode, 0);
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ if (f2fs_should_use_buffered_iomap(inode)) {
+ set_inode_flag(inode, FI_IOMAP);
+ mapping_set_large_folios(inode->i_mapping);
+ }
+#endif
+
return inode;
fail:
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 2000880b7dca..35a42d6214fe 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1719,6 +1719,9 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
init_once((void *) fi);
/* Initialize f2fs-specific inode info */
+#ifdef CONFIG_F2FS_IOMAP_FOLIO_STATE
+ atomic64_set(&fi->i_iomap_seq, 0);
+#endif
atomic_set(&fi->dirty_pages, 0);
atomic_set(&fi->i_compr_blocks, 0);
atomic_set(&fi->open_count, 0);
--
2.34.1