From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Zhang Yi <yi.zhang@huawei.com>, Jan Kara <jack@suse.cz>,
Baokun Li <libaokun1@huawei.com>,
Ojaswin Mujoo <ojaswin@linux.ibm.com>,
Theodore Ts'o <tytso@mit.edu>, Sasha Levin <sashal@kernel.org>,
adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org
Subject: [PATCH AUTOSEL 6.19-6.12] ext4: use reserved metadata blocks when splitting extent on endio
Date: Sat, 14 Feb 2026 16:23:09 -0500
Message-ID: <20260214212452.782265-44-sashal@kernel.org>
In-Reply-To: <20260214212452.782265-1-sashal@kernel.org>
From: Zhang Yi <yi.zhang@huawei.com>
[ Upstream commit 01942af95ab6c9d98e64ae01fdc243a03e4b973f ]
When performing buffered writes, we may need to split and convert an
unwritten extent into a written one during the end I/O process. However,
we do not reserve space specifically for these metadata changes, we only
reserve 2% of space or 4096 blocks. To address this, we use
EXT4_GET_BLOCKS_PRE_IO to potentially split extents in advance and
EXT4_GET_BLOCKS_METADATA_NOFAIL to utilize reserved space if necessary.
These two approaches can reduce the likelihood of running out of space
and losing data. However, these methods are merely best efforts, we
could still run out of space, and there is not much difference between
converting an extent during the writeback process and the end I/O
process, it won't increase the risk of losing data if we postpone the
conversion.
Therefore, also use EXT4_GET_BLOCKS_METADATA_NOFAIL in
ext4_convert_unwritten_extents_endio() to prepare for the buffered I/O
iomap conversion, which may perform extent conversion during the end I/O
process.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20260105014522.1937690-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
The commit message says the change is "to prepare for the buffered I/O
iomap conversion, which may perform extent conversion during the end I/O
process", which suggests preparation for a future change. However, the
bug exists **independently** in the current code path.
`ext4_ext_handle_unwritten_extents()` sets `METADATA_NOFAIL` at line
3908, but only in its own local flags variable.
`ext4_convert_unwritten_extents_endio()` doesn't receive those flags as
a parameter. It constructs its own flags internally, hardcoding only
`EXT4_GET_BLOCKS_CONVERT` when calling `ext4_split_convert_extents()` at
lines 3779-3780. So **the METADATA_NOFAIL flag is NOT propagated** to
the split operation inside `ext4_convert_unwritten_extents_endio()`.
This is a real bug that exists in the current codebase, not just a
preparation for future code. The split operation during endio can fail
with ENOSPC because it doesn't use reserved metadata blocks.
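The propagation gap described above can be modelled in a few lines of
standalone C. This is only a sketch: every `demo_*` name and both flag
values are made up for illustration and do not match the kernel's
definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag bits; NOT the kernel's actual values. */
#define DEMO_CONVERT         0x1u
#define DEMO_METADATA_NOFAIL 0x2u

/* Stand-in for the split step: it succeeds if ordinary free blocks
 * remain, or if the flags allow dipping into the reserved pool. */
static int demo_split(unsigned int flags, bool free_blocks_left)
{
    assert(flags & DEMO_CONVERT);
    if (free_blocks_left || (flags & DEMO_METADATA_NOFAIL))
        return 0;
    return -ENOSPC;
}

/* Before the patch: the callee hardcodes its own flags. */
static int demo_endio_before(bool free_blocks_left)
{
    return demo_split(DEMO_CONVERT, free_blocks_left);
}

/* After the patch: the callee adds the NOFAIL bit itself. */
static int demo_endio_after(bool free_blocks_left)
{
    unsigned int flags = DEMO_CONVERT | DEMO_METADATA_NOFAIL;

    return demo_split(flags, free_blocks_left);
}

/* The caller's local flags never reach the callee, because the
 * callee takes no flags parameter. */
static int demo_caller(bool free_blocks_left, bool patched)
{
    unsigned int flags = DEMO_CONVERT | DEMO_METADATA_NOFAIL;

    (void)flags; /* set locally, but not passed down */
    return patched ? demo_endio_after(free_blocks_left)
                   : demo_endio_before(free_blocks_left);
}
```

In this model, `demo_caller(false, false)` returns `-ENOSPC` even though
the caller set the NOFAIL bit, while `demo_caller(false, true)` succeeds.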
### 3. Classification
**Bug fix**: Prevents potential data loss on near-full ext4 filesystems
when extent splitting is needed during endio. When the filesystem is
nearly full, the extent conversion can fail because it doesn't tap into
the reserved metadata pool. This failure at endio means written data may
appear as unwritten (zeroed), which is **data loss**.
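A toy model of the reserved pool may make the failure mode easier to
follow. It is a sketch under stated assumptions, not ext4's allocator:
the `demo_*` names are hypothetical, and the reserve is assumed to be
the smaller of 2% of the filesystem and 4096 blocks, which is one
reading of the "2% of space or 4096 blocks" wording in the commit
message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_fs {
    uint64_t free_blocks; /* all free blocks, including the reserve */
    uint64_t reserved;    /* blocks held back for metadata updates */
};

/* Assumption: reserve = min(2% of total blocks, 4096 blocks). */
static uint64_t demo_reserve_size(uint64_t total_blocks)
{
    uint64_t two_percent = total_blocks / 50;

    return two_percent < 4096 ? two_percent : 4096;
}

/* Allocate n blocks. Without nofail, the reserve is off limits;
 * with nofail, the allocation may consume it. */
static int demo_alloc(struct demo_fs *fs, uint64_t n, bool nofail)
{
    uint64_t usable = fs->free_blocks;

    if (!nofail)
        usable = fs->free_blocks > fs->reserved ?
                 fs->free_blocks - fs->reserved : 0;
    if (n > usable)
        return -ENOSPC;
    fs->free_blocks -= n;
    return 0;
}
```

With `free_blocks == reserved`, an ordinary allocation fails with
`-ENOSPC`, while the same request with `nofail` set succeeds. Even with
`nofail`, the allocation still fails once the reserve itself is
exhausted, which matches the commit message's point that these
measures are best effort.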
### 4. Scope and Risk Assessment
- **Lines changed**: 4 (3 insertions, 1 deletion; adds one flag to an existing call)
- **Files changed**: 1 (fs/ext4/extents.c)
- **Risk**: Extremely low. The `EXT4_GET_BLOCKS_METADATA_NOFAIL` flag is
already used elsewhere in the same function's caller
(`ext4_ext_handle_unwritten_extents`). This just ensures the flag is
also used when the called function internally needs to split extents.
- **Subsystem**: ext4, the most widely used Linux filesystem, so the
affected code path is present on a large share of installations.
### 5. User Impact
- **Who**: Any user with an ext4 filesystem that is near-full performing
buffered writes where extent splitting is needed during endio
- **Severity**: Data loss — written data appears zeroed because the
extent remains marked as unwritten
- **Likelihood**: Increases as filesystem fills up; real-world scenario
on busy servers
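The following standalone C sketch shows why a partial write into an
unwritten extent needs a split, and why a failed conversion makes the
data read back as zeros. The `demo_*` types and functions are
hypothetical and far simpler than ext4's on-disk extent tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_extent {
    uint32_t start;     /* first logical block */
    uint32_t len;       /* number of blocks */
    bool     unwritten; /* true: reads return zeros */
};

/* Carve [io_start, io_start + io_len) out of ex as a written extent.
 * Stores up to three extents in out[] and returns how many. */
static int demo_split_convert(struct demo_extent ex, uint32_t io_start,
                              uint32_t io_len, struct demo_extent out[3])
{
    uint32_t ex_end = ex.start + ex.len;
    uint32_t io_end = io_start + io_len;
    int n = 0;

    assert(io_len > 0 && io_start >= ex.start && io_end <= ex_end);

    if (io_start > ex.start) /* unwritten head remains */
        out[n++] = (struct demo_extent){ ex.start, io_start - ex.start, true };
    out[n++] = (struct demo_extent){ io_start, io_len, false };
    if (io_end < ex_end)     /* unwritten tail remains */
        out[n++] = (struct demo_extent){ io_end, ex_end - io_end, true };
    return n;
}

/* A block inside an unwritten extent (or in a hole) reads as zeros. */
static bool demo_reads_as_zero(const struct demo_extent *exts, int n,
                               uint32_t blk)
{
    for (int i = 0; i < n; i++)
        if (blk >= exts[i].start && blk < exts[i].start + exts[i].len)
            return exts[i].unwritten;
    return true;
}
```

Writing blocks 10-19 of a 100-block unwritten extent turns one extent
into three, so the extent tree may need extra space for the new entries.
If the split fails, the original single extent stays unwritten and block
15 still reads as zeros even though the data reached the disk.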
### 6. Stability Indicators
- **Reviewed-by**: Jan Kara (ext4 co-maintainer), Baokun Li, Ojaswin
Mujoo — three reviewers
- **Committed-by**: Theodore Ts'o (ext4 maintainer)
- This level of review indicates high confidence in the fix
### 7. Dependency Check
The change is entirely self-contained. It only adds an existing flag
(`EXT4_GET_BLOCKS_METADATA_NOFAIL`) to an existing function call. No
dependencies on other commits. The affected code
(`ext4_convert_unwritten_extents_endio`) has been in the kernel for many
years and exists in all stable trees.
### Summary
This is a small, surgical fix for a real data loss scenario in ext4,
the most widely used Linux filesystem. When the filesystem is near full,
extent conversion during endio can fail because it doesn't use the
reserved metadata block pool. The fix adds a single flag
(`EXT4_GET_BLOCKS_METADATA_NOFAIL`) that the caller already sets but
that never reached the split. It has been reviewed by three ext4
developers and committed by the subsystem maintainer. The risk is
minimal and the benefit is preventing data loss.
**YES**
fs/ext4/extents.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 2cf5759ba6894..f1322f64071ff 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3770,6 +3770,8 @@ ext4_convert_unwritten_extents_endio(handle_t *handle, struct inode *inode,
* illegal.
*/
if (ee_block != map->m_lblk || ee_len > map->m_len) {
+ int flags = EXT4_GET_BLOCKS_CONVERT |
+ EXT4_GET_BLOCKS_METADATA_NOFAIL;
#ifdef CONFIG_EXT4_DEBUG
ext4_warning(inode->i_sb, "Inode (%ld) finished: extent logical block %llu,"
" len %u; IO logical block %llu, len %u",
@@ -3777,7 +3779,7 @@ ext4_convert_unwritten_extents_endio(handle_t *handle, struct inode *inode,
(unsigned long long)map->m_lblk, map->m_len);
#endif
path = ext4_split_convert_extents(handle, inode, map, path,
- EXT4_GET_BLOCKS_CONVERT, NULL);
+ flags, NULL);
if (IS_ERR(path))
return path;
--
2.51.0