[PATCH AUTOSEL 6.19-6.12] ext4: use reserved metadata blocks when splitting extent on endio

public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH AUTOSEL 6.19-6.12] ext4: use reserved metadata blocks when splitting extent on endio
       [not found] <20260214212452.782265-1-sashal@kernel.org>
@ 2026-02-14 21:23 ` Sasha Levin
  2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.6] ext4: move ext4_percpu_param_init() before ext4_mb_init() Sasha Levin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2026-02-14 21:23 UTC (permalink / raw)
  To: patches, stable
  Cc: Zhang Yi, Jan Kara, Baokun Li, Ojaswin Mujoo, Theodore Ts'o,
	Sasha Levin, adilger.kernel, linux-ext4

From: Zhang Yi <yi.zhang@huawei.com>

[ Upstream commit 01942af95ab6c9d98e64ae01fdc243a03e4b973f ]

When performing buffered writes, we may need to split and convert an
unwritten extent into a written one during the end I/O process. However,
we do not reserve space specifically for these metadata changes, we only
reserve 2% of space or 4096 blocks. To address this, we use
EXT4_GET_BLOCKS_PRE_IO to potentially split extents in advance and
EXT4_GET_BLOCKS_METADATA_NOFAIL to utilize reserved space if necessary.

These two approaches can reduce the likelihood of running out of space
and losing data. However, these methods are merely best efforts, we
could still run out of space, and there is not much difference between
converting an extent during the writeback process and the end I/O
process, it won't increase the risk of losing data if we postpone the
conversion.

Therefore, also use EXT4_GET_BLOCKS_METADATA_NOFAIL in
ext4_convert_unwritten_extents_endio() to prepare for the buffered I/O
iomap conversion, which may perform extent conversion during the end I/O
process.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20260105014522.1937690-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

The commit message says "to prepare for the buffered I/O iomap
conversion, which may perform extent conversion during the end I/O
process." This suggests it's preparation for a future change. However,
the bug exists **independently** — even in the current code path through
`ext4_ext_handle_unwritten_extents`, the flag is set at line 3908 but
the `ext4_convert_unwritten_extents_endio` function hardcodes only
`EXT4_GET_BLOCKS_CONVERT` when calling `ext4_split_convert_extents` at
line 3779-3780.

Wait — let me re-read the flow more carefully. When
`ext4_ext_handle_unwritten_extents()` calls
`ext4_convert_unwritten_extents_endio()`, the flags variable with
`METADATA_NOFAIL` is local to `ext4_ext_handle_unwritten_extents()`.
`ext4_convert_unwritten_extents_endio()` doesn't receive those flags as
a parameter — it constructs its own flags (`EXT4_GET_BLOCKS_CONVERT`)
internally at line 3780. So **the METADATA_NOFAIL flag is NOT
propagated** to the split operation inside
`ext4_convert_unwritten_extents_endio()`.

This is a real bug that exists in the current codebase, not just a
preparation for future code. The split operation during endio can fail
with ENOSPC because it doesn't use reserved metadata blocks.

### 3. Classification

**Bug fix**: Prevents potential data loss on near-full ext4 filesystems
when extent splitting is needed during endio. When the filesystem is
nearly full, the extent conversion can fail because it doesn't tap into
the reserved metadata pool. This failure at endio means written data may
appear as unwritten (zeroed), which is **data loss**.

### 4. Scope and Risk Assessment

- **Lines changed**: ~5 lines (adding one flag to an existing call)
- **Files changed**: 1 (fs/ext4/extents.c)
- **Risk**: Extremely low. The `EXT4_GET_BLOCKS_METADATA_NOFAIL` flag is
  already used elsewhere in the same function's caller
  (`ext4_ext_handle_unwritten_extents`). This just ensures the flag is
  also used when the called function internally needs to split extents.
- **Subsystem**: ext4 — the most widely used Linux filesystem. Affects
  everyone.

### 5. User Impact

- **Who**: Any user with an ext4 filesystem that is near-full performing
  buffered writes where extent splitting is needed during endio
- **Severity**: Data loss — written data appears zeroed because the
  extent remains marked as unwritten
- **Likelihood**: Increases as filesystem fills up; real-world scenario
  on busy servers

### 6. Stability Indicators

- **Reviewed-by**: Jan Kara (ext4 co-maintainer), Baokun Li, Ojaswin
  Mujoo — three reviewers
- **Committed-by**: Theodore Ts'o (ext4 maintainer)
- This level of review indicates high confidence in the fix

### 7. Dependency Check

The change is entirely self-contained. It only adds an existing flag
(`EXT4_GET_BLOCKS_METADATA_NOFAIL`) to an existing function call. No
dependencies on other commits. The affected code
(`ext4_convert_unwritten_extents_endio`) has been in the kernel for many
years and exists in all stable trees.

### Summary

This is a small, surgical fix for a real data loss scenario in ext4 —
the most widely used Linux filesystem. When the filesystem is near full,
extent conversion during endio can fail because it doesn't use the
reserved metadata block pool. The fix adds a single flag
(`EXT4_GET_BLOCKS_METADATA_NOFAIL`) that was already supposed to be
propagated but wasn't. It's been reviewed by three ext4 experts
including the subsystem maintainer. The risk is minimal and the benefit
is preventing data loss.

**YES**

 fs/ext4/extents.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 2cf5759ba6894..f1322f64071ff 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3770,6 +3770,8 @@ ext4_convert_unwritten_extents_endio(handle_t *handle, struct inode *inode,
 	 * illegal.
 	 */
 	if (ee_block != map->m_lblk || ee_len > map->m_len) {
+		int flags = EXT4_GET_BLOCKS_CONVERT |
+			    EXT4_GET_BLOCKS_METADATA_NOFAIL;
 #ifdef CONFIG_EXT4_DEBUG
 		ext4_warning(inode->i_sb, "Inode (%ld) finished: extent logical block %llu,"
 			     " len %u; IO logical block %llu, len %u",
@@ -3777,7 +3779,7 @@ ext4_convert_unwritten_extents_endio(handle_t *handle, struct inode *inode,
 			     (unsigned long long)map->m_lblk, map->m_len);
 #endif
 		path = ext4_split_convert_extents(handle, inode, map, path,
-						EXT4_GET_BLOCKS_CONVERT, NULL);
+						  flags, NULL);
 		if (IS_ERR(path))
 			return path;

-- 
2.51.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH AUTOSEL 6.19-6.6] ext4: move ext4_percpu_param_init() before ext4_mb_init()
       [not found] <20260214212452.782265-1-sashal@kernel.org>
  2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.12] ext4: use reserved metadata blocks when splitting extent on endio Sasha Levin
@ 2026-02-14 21:23 ` Sasha Levin
  2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.15] ext4: mark group add fast-commit ineligible Sasha Levin
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2026-02-14 21:23 UTC (permalink / raw)
  To: patches, stable
  Cc: Baokun Li, Zhang Yi, Jan Kara, Theodore Ts'o, Sasha Levin,
	adilger.kernel, linux-ext4

From: Baokun Li <libaokun1@huawei.com>

[ Upstream commit 270564513489d98b721a1e4a10017978d5213bff ]

When running `kvm-xfstests -c ext4/1k -C 1 generic/383` with the
`DOUBLE_CHECK` macro defined, the following panic is triggered:

==================================================================
EXT4-fs error (device vdc): ext4_validate_block_bitmap:423:
                        comm mount: bg 0: bad block bitmap checksum
BUG: unable to handle page fault for address: ff110000fa2cc000
PGD 3e01067 P4D 3e02067 PUD 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 2386 Comm: mount Tainted: G W
                        6.18.0-gba65a4e7120a-dirty #1152 PREEMPT(none)
RIP: 0010:percpu_counter_add_batch+0x13/0xa0
Call Trace:
 <TASK>
 ext4_mark_group_bitmap_corrupted+0xcb/0xe0
 ext4_validate_block_bitmap+0x2a1/0x2f0
 ext4_read_block_bitmap+0x33/0x50
 mb_group_bb_bitmap_alloc+0x33/0x80
 ext4_mb_add_groupinfo+0x190/0x250
 ext4_mb_init_backend+0x87/0x290
 ext4_mb_init+0x456/0x640
 __ext4_fill_super+0x1072/0x1680
 ext4_fill_super+0xd3/0x280
 get_tree_bdev_flags+0x132/0x1d0
 vfs_get_tree+0x29/0xd0
 vfs_cmd_create+0x59/0xe0
 __do_sys_fsconfig+0x4f6/0x6b0
 do_syscall_64+0x50/0x1f0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
==================================================================

This issue can be reproduced using the following commands:
        mkfs.ext4 -F -q -b 1024 /dev/sda 5G
        tune2fs -O quota,project /dev/sda
        mount /dev/sda /tmp/test

With DOUBLE_CHECK defined, mb_group_bb_bitmap_alloc() reads
and validates the block bitmap. When the validation fails,
ext4_mark_group_bitmap_corrupted() attempts to update
sbi->s_freeclusters_counter. However, this percpu_counter has not been
initialized yet at this point, which leads to the panic described above.

Fix this by moving the execution of ext4_percpu_param_init() to occur
before ext4_mb_init(), ensuring the per-CPU counters are initialized
before they are used.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20251209133116.731350-1-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Analysis of ext4: move ext4_percpu_param_init() before ext4_mb_init()

### 1. COMMIT MESSAGE ANALYSIS

The commit message is exceptionally detailed and clearly describes:
- **A concrete, reproducible crash** (kernel panic / page fault / oops)
- **Exact reproduction steps** (`mkfs.ext4 -F -q -b 1024 /dev/sda 5G;
  tune2fs -O quota,project /dev/sda; mount /dev/sda /tmp/test`)
- **Full stack trace** showing the crash in `percpu_counter_add_batch`
  called from `ext4_mark_group_bitmap_corrupted` during `ext4_mb_init`
- **Root cause**: `sbi->s_freeclusters_counter` (a percpu_counter) is
  used before it's initialized because `ext4_percpu_param_init()` was
  called *after* `ext4_mb_init()`

The trigger is a corrupted block bitmap encountered during mount with
`DOUBLE_CHECK` defined. When `ext4_validate_block_bitmap` fails and
calls `ext4_mark_group_bitmap_corrupted`, it tries to update a percpu
counter that hasn't been initialized yet.

### 2. CODE CHANGE ANALYSIS

The fix is straightforward reordering:

1. **Move `ext4_percpu_param_init(sbi)` call BEFORE `ext4_mb_init(sb)`**
   — This ensures percpu counters are initialized before `ext4_mb_init`
   can potentially use them via bitmap validation error paths.

2. **Move `ext4_percpu_param_destroy(sbi)` to the `failed_mount5`
   label** — This adjusts the cleanup ordering to match the new
   initialization ordering. Previously it was at `failed_mount6` (after
   `ext4_mb_release`), now it's at `failed_mount5` (before
   `ext4_mb_release` but still cleaning up properly).

The error path handling is also correctly adjusted: if
`ext4_percpu_param_init` fails, it jumps to `failed_mount5` which now
calls `ext4_percpu_param_destroy`. If `ext4_mb_init` fails, it also
jumps to `failed_mount5`, which will now properly destroy the percpu
params that were initialized.

Let me verify the error path correctness more carefully:

- **Before the change**: `failed_mount6` → `ext4_mb_release`,
  `ext4_flex_groups_free`, `ext4_percpu_param_destroy` → falls through
  to `failed_mount5` → `ext4_ext_release`, etc.
- **After the change**: `failed_mount6` → `ext4_mb_release`,
  `ext4_flex_groups_free` → falls through to `failed_mount5` →
  `ext4_percpu_param_destroy` → `ext4_ext_release`, etc.

This is correct: the destroy is still called on all paths where init
succeeded, and the ordering is proper (mb_release before
percpu_param_destroy, which mirrors the new init ordering).

### 3. CLASSIFICATION

This is a clear **bug fix** — it fixes a **NULL pointer dereference /
page fault** during mount. The crash occurs when:
1. A filesystem with corrupted block bitmap is mounted
2. `ext4_mb_init` reads and validates block bitmaps
3. Validation fails, triggering `ext4_mark_group_bitmap_corrupted`
4. That function tries to update an uninitialized percpu counter →
   **crash**

While the reproduction requires `DOUBLE_CHECK` to be defined in the test
scenario described, the underlying issue is that `ext4_mb_init` can
access percpu counters before they're initialized. This could
potentially be triggered in other scenarios too.

### 4. SCOPE AND RISK ASSESSMENT

- **Lines changed**: Very small — moving a few lines of code (function
  call reordering + error label adjustment)
- **Files touched**: 1 file (`fs/ext4/super.c`)
- **Subsystem**: ext4 filesystem — one of the most widely used Linux
  filesystems
- **Risk**: Very low. The change is a simple reordering of
  initialization. `ext4_percpu_param_init` has no dependency on
  `ext4_mb_init`, so moving it before is safe. The error paths are
  correctly adjusted.
- **Could this break something?** Extremely unlikely — it just
  initializes counters earlier, which is strictly safer.

### 5. USER IMPACT

- **Who is affected**: Any user mounting an ext4 filesystem with
  corrupted block bitmaps
- **How severe**: Kernel panic / crash during mount — **HIGH severity**
- **Real-world scenario**: Filesystem corruption can happen due to power
  failure, hardware issues, etc. A mount attempt on a corrupted
  filesystem should not crash the kernel.

### 6. STABILITY INDICATORS

- **Reviewed-by: Zhang Yi** and **Reviewed-by: Jan Kara** (Jan Kara is a
  well-known ext4/filesystem developer)
- **Committed by: Theodore Ts'o** (ext4 maintainer)
- The fix is obviously correct — it's a simple initialization order fix

### 7. DEPENDENCY CHECK

This commit is self-contained. It only reorders existing function calls
in `__ext4_fill_super()`. The functions `ext4_percpu_param_init()` and
`ext4_mb_init()` both exist in stable trees. No additional commits are
needed.

### Summary

This is a textbook stable backport candidate:
- **Fixes a real crash** (kernel panic on mount with corrupted
  filesystem)
- **Small, surgical fix** (just reordering initialization + adjusting
  error cleanup)
- **No new features** — purely defensive initialization ordering
- **Low risk** — the change is obviously correct
- **Important subsystem** — ext4 is used by millions of systems
- **Reviewed by subsystem experts** including Jan Kara and committed by
  Ted Ts'o
- **Meets all stable kernel rules**: obviously correct, fixes real bug,
  small scope, tested

**YES**

 fs/ext4/super.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 87205660c5d02..5c2e931d8a533 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5604,6 +5604,10 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 			clear_opt2(sb, MB_OPTIMIZE_SCAN);
 	}

+	err = ext4_percpu_param_init(sbi);
+	if (err)
+		goto failed_mount5;
+
 	err = ext4_mb_init(sb);
 	if (err) {
 		ext4_msg(sb, KERN_ERR, "failed to initialize mballoc (%d)",
@@ -5619,10 +5623,6 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 		sbi->s_journal->j_commit_callback =
 			ext4_journal_commit_callback;

-	err = ext4_percpu_param_init(sbi);
-	if (err)
-		goto failed_mount6;
-
 	if (ext4_has_feature_flex_bg(sb))
 		if (!ext4_fill_flex_info(sb)) {
 			ext4_msg(sb, KERN_ERR,
@@ -5704,8 +5704,8 @@ failed_mount8: __maybe_unused
 failed_mount6:
 	ext4_mb_release(sb);
 	ext4_flex_groups_free(sbi);
-	ext4_percpu_param_destroy(sbi);
 failed_mount5:
+	ext4_percpu_param_destroy(sbi);
 	ext4_ext_release(sb);
 	ext4_release_system_zone(sb);
 failed_mount4a:
-- 
2.51.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH AUTOSEL 6.19-5.15] ext4: mark group add fast-commit ineligible
       [not found] <20260214212452.782265-1-sashal@kernel.org>
  2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.12] ext4: use reserved metadata blocks when splitting extent on endio Sasha Levin
  2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.6] ext4: move ext4_percpu_param_init() before ext4_mb_init() Sasha Levin
@ 2026-02-14 21:23 ` Sasha Levin
  2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.12] ext4: propagate flags to convert_initialized_extent() Sasha Levin
  2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.15] ext4: mark group extend fast-commit ineligible Sasha Levin
  4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2026-02-14 21:23 UTC (permalink / raw)
  To: patches, stable
  Cc: Li Chen, Theodore Ts'o, Sasha Levin, adilger.kernel,
	linux-ext4

From: Li Chen <me@linux.beauty>

[ Upstream commit 89b4336fd5ec78f51f9d3a1d100f3ffa3228e604 ]

Fast commits only log operations that have dedicated replay support.
Online resize via EXT4_IOC_GROUP_ADD updates the superblock and group
descriptor metadata without going through the fast commit tracking
paths.
In practice these operations are rare and usually followed by further
updates, but mixing them into a fast commit makes the overall
semantics harder to reason about and risks replay gaps if new call
sites appear.

Teach ext4 to mark the filesystem fast-commit ineligible when
ext4_ioctl_group_add() adds new block groups.
This forces those transactions to fall back to a full commit,
ensuring that the filesystem geometry updates are captured by the
normal journal rather than partially encoded in fast commit TLVs.
This change should not affect common workloads but makes online
resize via GROUP_ADD safer and easier to reason about under fast
commit.

Testing:
1. prepare:
    dd if=/dev/zero of=/root/fc_resize.img bs=1M count=0 seek=256
    mkfs.ext4 -O fast_commit -F /root/fc_resize.img
    mkdir -p /mnt/fc_resize && mount -t ext4 -o loop /root/fc_resize.img /mnt/fc_resize
2. Ran a helper that issues EXT4_IOC_GROUP_ADD on the mounted
   filesystem and checked the resize ineligible reason:
    ./group_add_helper /mnt/fc_resize
    cat /proc/fs/ext4/loop0/fc_info
   shows "Resize": > 0.
3. Fsynced a file on the resized filesystem and verified that the fast
   commit stats report at least one ineligible commit:
    touch /mnt/fc_resize/file
    /root/fsync_file /mnt/fc_resize/file
    sync
    cat /proc/fs/ext4/loop0/fc_info
   shows fc stats ineligible > 0.

Signed-off-by: Li Chen <me@linux.beauty>
Link: https://patch.msgid.link/20251211115146.897420-5-me@linux.beauty
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

So this is part of a series by the same author. There are three resize
paths:
1. `EXT4_IOC_RESIZE_FS` (already had `ext4_fc_mark_ineligible` from the
   original fast commit implementation)
2. `EXT4_IOC_GROUP_EXTEND` (fixed by `1f8dd813a1c77` - the companion
   commit)
3. `EXT4_IOC_GROUP_ADD` (fixed by this commit under review)

Both this commit and `1f8dd813a1c77` are from the same patch series
(patches 5 and 6 of the series based on the msgid link). They're
independent fixes to two different ioctl paths.

### 7. DEPENDENCY CHECK

This commit depends on:
- `EXT4_FC_REASON_RESIZE` existing in `fast_commit.h` — this was added
  in `aa75f4d3daaeb` (5.10 era, "ext4: main fast-commit commit path")
- The `ext4_fc_mark_ineligible()` API accepting `(sb, reason, NULL)` —
  the NULL handle variant was introduced in `e85c81ba8859a` which went
  to stable

The fast commit feature itself was added in Linux 5.10, so this fix
applies to 5.10+ stable trees. The API with 3 arguments (sb, reason,
handle) was introduced in `e85c81ba8859a` which was 5.17-era and was
already tagged Cc: stable. So the function signature should be available
in 5.15+ stable trees at minimum.

### SUMMARY

**What the commit fixes**: A missing fast-commit ineligibility marking
in the `EXT4_IOC_GROUP_ADD` resize path. Without this, filesystem
geometry changes from GROUP_ADD could be mixed with fast commits,
leading to potential filesystem inconsistency after crash recovery
because the fast commit replay has no dedicated handler for resize
operations.

**Severity**: Medium-high. While the scenario requires specific
conditions (fast_commit enabled + GROUP_ADD resize + crash timing), the
consequence is filesystem corruption/inconsistency, which is a data
integrity issue.

**Risk**: Extremely low. Single line addition that mirrors existing code
in the same file. The only effect is forcing a full journal commit
instead of fast commit during GROUP_ADD, which is the correct and safe
behavior. No regression possible.

**Meets stable criteria**:
- Obviously correct: YES (mirrors existing pattern)
- Fixes a real bug: YES (potential filesystem corruption on crash)
- Small and contained: YES (1 line)
- No new features: YES
- Already tested: YES

**Concern**: This is one of a pair of patches (the other being
`1f8dd813a1c77` for GROUP_EXTEND). Both should be backported together
for completeness, but each is independently valuable — they fix
different ioctl paths.

**YES**

 fs/ext4/ioctl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 7ce0fc40aec2f..5109b005e0286 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -966,6 +966,7 @@ static long ext4_ioctl_group_add(struct file *file,

 	err = ext4_group_add(sb, input);
 	if (EXT4_SB(sb)->s_journal) {
+		ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_RESIZE, NULL);
 		jbd2_journal_lock_updates(EXT4_SB(sb)->s_journal);
 		err2 = jbd2_journal_flush(EXT4_SB(sb)->s_journal, 0);
 		jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
-- 
2.51.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH AUTOSEL 6.19-6.12] ext4: propagate flags to convert_initialized_extent()
       [not found] <20260214212452.782265-1-sashal@kernel.org>
                   ` (2 preceding siblings ...)
  2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.15] ext4: mark group add fast-commit ineligible Sasha Levin
@ 2026-02-14 21:23 ` Sasha Levin
  2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.15] ext4: mark group extend fast-commit ineligible Sasha Levin
  4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2026-02-14 21:23 UTC (permalink / raw)
  To: patches, stable
  Cc: Ojaswin Mujoo, Zhang Yi, Jan Kara, Theodore Ts'o, Sasha Levin,
	adilger.kernel, linux-ext4

From: Ojaswin Mujoo <ojaswin@linux.ibm.com>

[ Upstream commit 3fffa44b6ebf65be92a562a5063303979385a1c9 ]

Currently, ext4_zero_range passes EXT4_EX_NOCACHE flag to avoid caching
extents however this is not respected by convert_initialized_extent().
Hence, modify it to accept flags from the caller and to pass the flags
on to other extent manipulation functions it calls. This makes
sure the NOCACHE flag is respected throughout the code path.

Also, we no longer explicitly pass CONVERT_UNWRITTEN as the caller takes
care of this.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/07008fbb14db727fddcaf4c30e2346c49f6c8fe0.1769149131.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

### 4. Deeper Analysis: Is This Just Performance or Correctness?

The investigation reveals important nuances:

**EXT4_EX_NOCACHE** prevents populating the extent status (ES) tree
cache during operations like `ext4_zero_range`. The purpose is stated in
the code: "we shouldn't be caching the extents when reading from the
extent tree while a truncate or punch hole operation is in progress."

The reason is **correctness, not just performance**: During operations
that modify the extent tree (like zero_range, which splits/converts
extents), caching intermediate extent states can leave **stale entries**
in the ES tree. Recent fixes (commits `6d882ea3b0931` and
`c2b3575dfcc7f` from Nov 2025, both Cc: stable) explicitly address data
corruption from stale extent cache entries during these exact
operations.

**However**, the specific code path in this commit:

1. `ext4_find_extent()` with flags=0 instead of flags with NOCACHE -
   this means extents get cached when they shouldn't be during
   `convert_initialized_extent()`
2. `ext4_split_convert_extents()` was getting hardcoded
   `EXT4_GET_BLOCKS_CONVERT_UNWRITTEN` instead of the caller's flags
   (which include both `CONVERT_UNWRITTEN` and `NOCACHE`)

### 5. Risk and Scope Assessment

**Changes are small and surgical:**
- Only 1 file changed (fs/ext4/extents.c)
- Function signature change: adding `int flags` parameter
- 3 call sites modified: passing `flags` instead of hardcoded values
- Net code change is minimal

**Risk of regression:** LOW
- The flags being propagated are the same flags the caller already has
- The hardcoded `EXT4_GET_BLOCKS_CONVERT_UNWRITTEN` was already included
  in the `flags` from the caller (the call site check `flags &
  EXT4_GET_BLOCKS_CONVERT_UNWRITTEN` confirms this)
- The main behavioral change is that `EXT4_EX_NOCACHE` now reaches these
  inner functions

**Subsystem impact:** HIGH - ext4 is the most widely used Linux
filesystem

### 6. Does It Fix a Real Bug?

Yes. While the commit message is understated, the actual bug is:
- `ext4_zero_range` deliberately sets `EXT4_EX_NOCACHE` to prevent stale
  extent cache entries during extent tree modifications
- But `convert_initialized_extent()` was dropping this flag, leading to
  extents being cached during the modification operation
- This can cause stale extent cache entries, which related commits have
  proven can lead to **data corruption**

The commit message says "makes sure the NOCACHE flag is respected
throughout the code path" - this is a correctness fix for extent cache
consistency.

### 7. Dependency Check

The commit is self-contained. It modifies a static function and its
single call site within the same file. No external dependencies.

### 8. Summary

| Criterion | Assessment |
|-----------|-----------|
| Fixes real bug | YES - stale extent cache during zero_range can cause
corruption |
| Obviously correct | YES - propagating caller flags is the right
pattern |
| Small and contained | YES - minimal changes to one file |
| Risk of regression | LOW - only adds flag propagation that was already
intended |
| Subsystem importance | HIGH - ext4 filesystem |
| User impact | MEDIUM-HIGH - affects anyone using fallocate/zero_range
on ext4 |

This commit fixes a real correctness issue in ext4's extent cache
management during `ext4_zero_range` operations. The NOCACHE flag was
being dropped, which could lead to stale extent cache entries - a class
of bug that has been directly linked to data corruption in related
recent fixes. The change is small, surgical, obviously correct, and
carries minimal regression risk.

**YES**

 fs/ext4/extents.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index f1322f64071ff..4e97d1e7213f3 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3818,6 +3818,7 @@ static struct ext4_ext_path *
 convert_initialized_extent(handle_t *handle, struct inode *inode,
 			   struct ext4_map_blocks *map,
 			   struct ext4_ext_path *path,
+			   int flags,
 			   unsigned int *allocated)
 {
 	struct ext4_extent *ex;
@@ -3843,11 +3844,11 @@ convert_initialized_extent(handle_t *handle, struct inode *inode,

 	if (ee_block != map->m_lblk || ee_len > map->m_len) {
 		path = ext4_split_convert_extents(handle, inode, map, path,
-				EXT4_GET_BLOCKS_CONVERT_UNWRITTEN, NULL);
+						  flags, NULL);
 		if (IS_ERR(path))
 			return path;

-		path = ext4_find_extent(inode, map->m_lblk, path, 0);
+		path = ext4_find_extent(inode, map->m_lblk, path, flags);
 		if (IS_ERR(path))
 			return path;
 		depth = ext_depth(inode);
@@ -4259,7 +4260,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 			if ((!ext4_ext_is_unwritten(ex)) &&
 			    (flags & EXT4_GET_BLOCKS_CONVERT_UNWRITTEN)) {
 				path = convert_initialized_extent(handle,
-					inode, map, path, &allocated);
+					inode, map, path, flags, &allocated);
 				if (IS_ERR(path))
 					err = PTR_ERR(path);
 				goto out;
-- 
2.51.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH AUTOSEL 6.19-5.15] ext4: mark group extend fast-commit ineligible
       [not found] <20260214212452.782265-1-sashal@kernel.org>
                   ` (3 preceding siblings ...)
  2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.12] ext4: propagate flags to convert_initialized_extent() Sasha Levin
@ 2026-02-14 21:23 ` Sasha Levin
  4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2026-02-14 21:23 UTC (permalink / raw)
  To: patches, stable
  Cc: Li Chen, Theodore Ts'o, Sasha Levin, adilger.kernel,
	linux-ext4

From: Li Chen <me@linux.beauty>

[ Upstream commit 1f8dd813a1c771b13c303f73d876164bc9b327cc ]

Fast commits only log operations that have dedicated replay support.
EXT4_IOC_GROUP_EXTEND grows the filesystem to the end of the last
block group and updates the same on-disk metadata without going
through the fast commit tracking paths.
In practice these operations are rare and usually followed by further
updates, but mixing them into a fast commit makes the overall
semantics harder to reason about and risks replay gaps if new call
sites appear.

Teach ext4 to mark the filesystem fast-commit ineligible when
EXT4_IOC_GROUP_EXTEND grows the filesystem.
This forces those transactions to fall back to a full commit,
ensuring that the group extension changes are captured by the normal
journal rather than partially encoded in fast commit TLVs.
This change should not affect common workloads but makes online
resize via GROUP_EXTEND safer and easier to reason about under fast
commit.

Testing:
1. prepare:
    dd if=/dev/zero of=/root/fc_resize.img bs=1M count=0 seek=256
    mkfs.ext4 -O fast_commit -F /root/fc_resize.img
    mkdir -p /mnt/fc_resize && mount -t ext4 -o loop /root/fc_resize.img /mnt/fc_resize
2. Extended the filesystem to the end of the last block group using a
   helper that calls EXT4_IOC_GROUP_EXTEND on the mounted filesystem
   and checked fc_info:
    ./group_extend_helper /mnt/fc_resize
    cat /proc/fs/ext4/loop0/fc_info
   shows the "Resize" ineligible reason increased.
3. Fsynced a file on the resized filesystem and confirmed that the fast
   commit ineligible counter incremented for the resize transaction:
    touch /mnt/fc_resize/file
    /root/fsync_file /mnt/fc_resize/file
    sync
    cat /proc/fs/ext4/loop0/fc_info

Signed-off-by: Li Chen <me@linux.beauty>
Link: https://patch.msgid.link/20251211115146.897420-6-me@linux.beauty
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

### 4. DETAILED FINDINGS

This is extremely revealing. The investigation shows:

**A real gap exists:** `EXT4_IOC_RESIZE_FS` already has
`ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_RESIZE, NULL)`, but
`EXT4_IOC_GROUP_EXTEND` (and `EXT4_IOC_GROUP_ADD` in a sibling commit)
do **not**. This is an inconsistency — all three ioctls perform
filesystem geometry changes that are not supported by fast commit
replay.

**No replay support for geometry changes:** The fast commit replay logic
only handles file-level operations (add_range, del_range, link, unlink,
creat, inode). There are NO replay tags for superblock or group
descriptor changes. If `GROUP_EXTEND` runs and a fast commit is used
instead of a full journal commit, the geometry changes would not be
replayed on crash recovery.

**This is a data integrity issue:** If a fast commit transaction
includes both file operations (which are tracked) and geometry changes
from `GROUP_EXTEND` (which are untracked), crash recovery could replay
the file operations against stale filesystem geometry, leading to silent
metadata inconsistency or corruption.

### 5. CLASSIFICATION

This is a **correctness/data integrity fix**, not a feature addition. It
closes a gap where filesystem metadata changes could be lost or
inconsistently replayed after a crash. The `EXT4_FC_REASON_RESIZE` enum
value already exists — only the missing call site is added.

### 6. SCOPE AND RISK

- **Size:** 2 lines added. Minimal.
- **Risk:** Extremely low. The only effect is forcing a full journal
  commit instead of a fast commit during `GROUP_EXTEND`. Full commits
  are always correct; fast commits are the optimization. This is a
  fallback to the safe path.
- **Subsystem:** ext4 filesystem — widely used, data integrity matters
  enormously.
- **Could it break anything?** No. A full commit is strictly more
  conservative than a fast commit. The worst case is a minor performance
  impact during online resize, which is an infrequent operation.

### 7. USER IMPACT

- **Who is affected?** Anyone using ext4 with fast_commit enabled who
  performs online resize via `GROUP_EXTEND`. While this is not a common
  operation, when it does happen, data integrity on crash recovery is
  critical.
- **Severity if triggered:** Without this fix, crash recovery after
  `GROUP_EXTEND` + fast commit could result in filesystem metadata
  inconsistency — potentially data corruption.
- **Practical likelihood:** Low frequency, but high severity when it
  occurs.

### 8. DEPENDENCY CHECK

- The `EXT4_FC_REASON_RESIZE` enum already exists in stable trees (it
  was added when `RESIZE_FS` was given this marking).
- The `ext4_fc_mark_ineligible()` function already exists.
- This is a self-contained, standalone fix with no dependencies on other
  commits.
- There is a sibling commit for `GROUP_ADD` that would be good to
  backport together, but each stands alone.

### 9. STABLE KERNEL RULES ASSESSMENT

| Criterion | Assessment |
|-----------|-----------|
| Obviously correct and tested | Yes — mirrors existing pattern for
RESIZE_FS, tested by author |
| Fixes a real bug | Yes — missing fast-commit ineligibility marking can
lead to incomplete crash recovery |
| Important issue | Yes — data integrity / potential corruption on crash
recovery |
| Small and contained | Yes — 2 lines in one file |
| No new features | Correct — uses existing infrastructure |
| Applies cleanly | Very likely — minimal change to stable code |

### 10. RISK vs BENEFIT

- **Risk:** Near zero. Forces a safe fallback path for a rare operation.
- **Benefit:** Prevents potential filesystem metadata inconsistency
  after crash during online resize with fast commit enabled.

### CONCLUSION

This is a small, surgical, correctness fix for a real gap in ext4's fast
commit mechanism. The `RESIZE_FS` ioctl already had this protection, but
`GROUP_EXTEND` was missed. The fix prevents potential data integrity
issues during crash recovery. It uses existing infrastructure
(`EXT4_FC_REASON_RESIZE` and `ext4_fc_mark_ineligible()`), is self-
contained, and has effectively zero regression risk. It meets all stable
kernel criteria.

**YES**

 fs/ext4/ioctl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 5109b005e0286..e5e197ac7d88b 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -1612,6 +1612,8 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)

 		err = ext4_group_extend(sb, EXT4_SB(sb)->s_es, n_blocks_count);
 		if (EXT4_SB(sb)->s_journal) {
+			ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_RESIZE,
+						NULL);
 			jbd2_journal_lock_updates(EXT4_SB(sb)->s_journal);
 			err2 = jbd2_journal_flush(EXT4_SB(sb)->s_journal, 0);
 			jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
-- 
2.51.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-02-14 21:27 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260214212452.782265-1-sashal@kernel.org>
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.12] ext4: use reserved metadata blocks when splitting extent on endio Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.6] ext4: move ext4_percpu_param_init() before ext4_mb_init() Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.15] ext4: mark group add fast-commit ineligible Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.12] ext4: propagate flags to convert_initialized_extent() Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.15] ext4: mark group extend fast-commit ineligible Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox