* [PATCH v2 0/8] ext4: fix divide error in mb_update_avg_fragment_size()
@ 2023-12-21 15:05 Baokun Li
2023-12-21 15:05 ` [PATCH v2 1/8] ext4: fix double-free of blocks due to wrong extents moved_len Baokun Li
` (7 more replies)
0 siblings, 8 replies; 19+ messages in thread
From: Baokun Li @ 2023-12-21 15:05 UTC (permalink / raw)
To: linux-ext4
Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
yangerkun, yukuai3, libaokun1
V1->V2:
Fixed some things pointed out by Jan Kara.
Fixed more cases where blocks could be allocated from corrupted groups.
[V1]: https://lore.kernel.org/all/20231218141814.1477338-1-libaokun1@huawei.com/
Baokun Li (8):
ext4: fix double-free of blocks due to wrong extents moved_len
ext4: do not trim the group with corrupted block bitmap
ext4: regenerate buddy after block freeing failed if under fc replay
ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block
bitmap corrupt
ext4: avoid allocating blocks from corrupted group in
ext4_mb_try_best_found()
ext4: avoid allocating blocks from corrupted group in
ext4_mb_find_by_goal()
ext4: mark the group block bitmap as corrupted before reporting an
error
fs/ext4/mballoc.c | 76 +++++++++++++++++++++++++++++--------------
fs/ext4/move_extent.c | 6 ++--
2 files changed, 53 insertions(+), 29 deletions(-)
--
2.31.1
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v2 1/8] ext4: fix double-free of blocks due to wrong extents moved_len
2023-12-21 15:05 [PATCH v2 0/8] ext4: fix divide error in mb_update_avg_fragment_size() Baokun Li
@ 2023-12-21 15:05 ` Baokun Li
2024-01-04 10:27 ` Jan Kara
2023-12-21 15:05 ` [PATCH v2 2/8] ext4: do not trim the group with corrupted block bitmap Baokun Li
` (6 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Baokun Li @ 2023-12-21 15:05 UTC (permalink / raw)
To: linux-ext4
Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
yangerkun, yukuai3, libaokun1, Wei Chen, xingwei lee, stable
In ext4_move_extents(), moved_len is only updated when all moves are
successfully executed, and only discards orig_inode and donor_inode
preallocations when moved_len is not zero. When the loop fails to exit
after successfully moving some extents, moved_len is not updated and
remains at 0, so it does not discard the preallocations.
If the moved extents overlap with the preallocated extents, the
overlapped extents are freed twice in ext4_mb_release_inode_pa() and
ext4_process_freed_data() (as described in commit 94d7c16cbbbd ("ext4:
Fix double-free of blocks with EXT4_IOC_MOVE_EXT")), and bb_free is
incremented twice. Hence when trim is executed, a zero-division bug is
triggered in mb_update_avg_fragment_size() because bb_free is not zero
and bb_fragments is zero.
Therefore, update move_len after each extent move to avoid the issue.
Reported-by: Wei Chen <harperchen1110@gmail.com>
Reported-by: xingwei lee <xrivendell7@gmail.com>
Closes: https://lore.kernel.org/r/CAO4mrferzqBUnCag8R3m2zf897ts9UEuhjFQGPtODT92rYyR2Q@mail.gmail.com
Fixes: fcf6b1b729bc ("ext4: refactor ext4_move_extents code base")
CC: stable@vger.kernel.org # 3.18
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
fs/ext4/move_extent.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 3aa57376d9c2..391efa6d4c56 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -618,6 +618,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
goto out;
o_end = o_start + len;
+ *moved_len = 0;
while (o_start < o_end) {
struct ext4_extent *ex;
ext4_lblk_t cur_blk, next_blk;
@@ -672,7 +673,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
*/
ext4_double_up_write_data_sem(orig_inode, donor_inode);
/* Swap original branches with new branches */
- move_extent_per_page(o_filp, donor_inode,
+ *moved_len += move_extent_per_page(o_filp, donor_inode,
orig_page_index, donor_page_index,
offset_in_page, cur_len,
unwritten, &ret);
@@ -682,9 +683,6 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
o_start += cur_len;
d_start += cur_len;
}
- *moved_len = o_start - orig_blk;
- if (*moved_len > len)
- *moved_len = len;
out:
if (*moved_len) {
--
2.31.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v2 2/8] ext4: do not trim the group with corrupted block bitmap
2023-12-21 15:05 [PATCH v2 0/8] ext4: fix divide error in mb_update_avg_fragment_size() Baokun Li
2023-12-21 15:05 ` [PATCH v2 1/8] ext4: fix double-free of blocks due to wrong extents moved_len Baokun Li
@ 2023-12-21 15:05 ` Baokun Li
2023-12-21 15:05 ` [PATCH v2 3/8] ext4: regenerate buddy after block freeing failed if under fc replay Baokun Li
` (5 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: Baokun Li @ 2023-12-21 15:05 UTC (permalink / raw)
To: linux-ext4
Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
yangerkun, yukuai3, libaokun1
Otherwise operating on an incorrupted block bitmap can lead to all sorts
of unknown problems.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
fs/ext4/mballoc.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index d72b5e3c92ec..a95fa6e2b0f9 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6757,6 +6757,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
bool set_trimmed = false;
void *bitmap;
+ if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)))
+ return 0;
+
bitmap = e4b->bd_bitmap;
if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
set_trimmed = true;
--
2.31.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v2 3/8] ext4: regenerate buddy after block freeing failed if under fc replay
2023-12-21 15:05 [PATCH v2 0/8] ext4: fix divide error in mb_update_avg_fragment_size() Baokun Li
2023-12-21 15:05 ` [PATCH v2 1/8] ext4: fix double-free of blocks due to wrong extents moved_len Baokun Li
2023-12-21 15:05 ` [PATCH v2 2/8] ext4: do not trim the group with corrupted block bitmap Baokun Li
@ 2023-12-21 15:05 ` Baokun Li
2024-01-04 10:33 ` Jan Kara
2023-12-21 15:05 ` [PATCH v2 4/8] ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks() Baokun Li
` (4 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Baokun Li @ 2023-12-21 15:05 UTC (permalink / raw)
To: linux-ext4
Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
yangerkun, yukuai3, libaokun1
This reverts [Fixes] under fast commit replay. When we are freeing blocks
that have already been freed, the buddy may be corrupted, and we need to
regenerate the buddy when the fast commit is being replayed in order to
avoid using an corrupted buddy, since it will not mark the group block
bitmap as corrupted at that point.
Reported-by: Jan Kara <jack@suse.cz>
Fixes: 6bd97bf273bd ("ext4: remove redundant mb_regenerate_buddy()")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
fs/ext4/mballoc.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a95fa6e2b0f9..f6131ba514c8 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1233,6 +1233,24 @@ void ext4_mb_generate_buddy(struct super_block *sb,
atomic64_add(period, &sbi->s_mb_generation_time);
}
+static void mb_regenerate_buddy(struct ext4_buddy *e4b)
+{
+ int count;
+ int order = 1;
+ void *buddy;
+
+ while ((buddy = mb_find_buddy(e4b, order++, &count)))
+ mb_set_bits(buddy, 0, count);
+
+ e4b->bd_info->bb_fragments = 0;
+ memset(e4b->bd_info->bb_counters, 0,
+ sizeof(*e4b->bd_info->bb_counters) *
+ (e4b->bd_sb->s_blocksize_bits + 2));
+
+ ext4_mb_generate_buddy(e4b->bd_sb, e4b->bd_buddy,
+ e4b->bd_bitmap, e4b->bd_group, e4b->bd_info);
+}
+
/* The buddy information is attached the buddy cache inode
* for convenience. The information regarding each group
* is loaded via ext4_mb_load_buddy. The information involve
@@ -1921,6 +1939,8 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
ext4_mark_group_bitmap_corrupted(
sb, e4b->bd_group,
EXT4_GROUP_INFO_BBITMAP_CORRUPT);
+ } else {
+ mb_regenerate_buddy(e4b);
}
goto done;
}
--
2.31.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v2 4/8] ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
2023-12-21 15:05 [PATCH v2 0/8] ext4: fix divide error in mb_update_avg_fragment_size() Baokun Li
` (2 preceding siblings ...)
2023-12-21 15:05 ` [PATCH v2 3/8] ext4: regenerate buddy after block freeing failed if under fc replay Baokun Li
@ 2023-12-21 15:05 ` Baokun Li
2024-01-04 10:42 ` Jan Kara
2023-12-21 15:05 ` [PATCH v2 5/8] ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt Baokun Li
` (3 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Baokun Li @ 2023-12-21 15:05 UTC (permalink / raw)
To: linux-ext4
Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
yangerkun, yukuai3, libaokun1, stable
After updating bb_free in mb_free_blocks, it is possible to return without
updating bb_fragments because the block being freed is found to have
already been freed, which leads to inconsistency between bb_free and
bb_fragments.
Since the group may be unlocked in ext4_grp_locked_error(), this can lead
to problems such as dividing by zero when calculating the average fragment
length. Hence move the update of bb_free to after the block double-free
check guarantees that the corresponding statistics are updated only after
the core block bitmap is modified.
Fixes: eabe0444df90 ("ext4: speed-up releasing blocks on commit")
CC: stable@vger.kernel.org # 3.10
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
fs/ext4/mballoc.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index f6131ba514c8..1f15774971d7 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1910,11 +1910,6 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
mb_check_buddy(e4b);
mb_free_blocks_double(inode, e4b, first, count);
- this_cpu_inc(discard_pa_seq);
- e4b->bd_info->bb_free += count;
- if (first < e4b->bd_info->bb_first_free)
- e4b->bd_info->bb_first_free = first;
-
/* access memory sequentially: check left neighbour,
* clear range and then check right neighbour
*/
@@ -1941,10 +1936,16 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
EXT4_GROUP_INFO_BBITMAP_CORRUPT);
} else {
mb_regenerate_buddy(e4b);
+ goto check;
}
- goto done;
+ return;
}
+ this_cpu_inc(discard_pa_seq);
+ e4b->bd_info->bb_free += count;
+ if (first < e4b->bd_info->bb_first_free)
+ e4b->bd_info->bb_first_free = first;
+
/* let's maintain fragments counter */
if (left_is_free && right_is_free)
e4b->bd_info->bb_fragments--;
@@ -1969,9 +1970,9 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
if (first <= last)
mb_buddy_mark_free(e4b, first >> 1, last >> 1);
-done:
mb_set_largest_free_order(sb, e4b->bd_info);
mb_update_avg_fragment_size(sb, e4b->bd_info);
+check:
mb_check_buddy(e4b);
}
--
2.31.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v2 5/8] ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt
2023-12-21 15:05 [PATCH v2 0/8] ext4: fix divide error in mb_update_avg_fragment_size() Baokun Li
` (3 preceding siblings ...)
2023-12-21 15:05 ` [PATCH v2 4/8] ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks() Baokun Li
@ 2023-12-21 15:05 ` Baokun Li
2024-01-04 10:43 ` Jan Kara
2023-12-21 15:05 ` [PATCH v2 6/8] ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found() Baokun Li
` (2 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Baokun Li @ 2023-12-21 15:05 UTC (permalink / raw)
To: linux-ext4
Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
yangerkun, yukuai3, libaokun1
Determine if bb_fragments is 0 instead of determining bb_free to eliminate
the risk of dividing by zero when the block bitmap is corrupted.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
fs/ext4/mballoc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 1f15774971d7..03500aec43ac 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -842,7 +842,7 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
struct ext4_sb_info *sbi = EXT4_SB(sb);
int new_order;
- if (!test_opt2(sb, MB_OPTIMIZE_SCAN) || grp->bb_free == 0)
+ if (!test_opt2(sb, MB_OPTIMIZE_SCAN) || grp->bb_fragments == 0)
return;
new_order = mb_avg_fragment_size_order(sb,
--
2.31.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v2 6/8] ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found()
2023-12-21 15:05 [PATCH v2 0/8] ext4: fix divide error in mb_update_avg_fragment_size() Baokun Li
` (4 preceding siblings ...)
2023-12-21 15:05 ` [PATCH v2 5/8] ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt Baokun Li
@ 2023-12-21 15:05 ` Baokun Li
2024-01-04 10:45 ` Jan Kara
2023-12-21 15:05 ` [PATCH v2 7/8] ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal() Baokun Li
2023-12-21 15:05 ` [PATCH v2 8/8] ext4: mark the group block bitmap as corrupted before reporting an error Baokun Li
7 siblings, 1 reply; 19+ messages in thread
From: Baokun Li @ 2023-12-21 15:05 UTC (permalink / raw)
To: linux-ext4
Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
yangerkun, yukuai3, libaokun1
Determine if the group block bitmap is corrupted before using ac_b_ex in
ext4_mb_try_best_found() to avoid allocating blocks from a group with a
corrupted block bitmap in the following concurrency and making the
situation worse.
ext4_mb_regular_allocator
ext4_lock_group(sb, group)
ext4_mb_good_group
// check if the group bbitmap is corrupted
ext4_mb_complex_scan_group
// Scan group gets ac_b_ex but doesn't use it
ext4_unlock_group(sb, group)
ext4_mark_group_bitmap_corrupted(group)
// The block bitmap was corrupted during
// the group unlock gap.
ext4_mb_try_best_found
ext4_lock_group(ac->ac_sb, group)
ext4_mb_use_best_found
mb_mark_used
// Allocating blocks in block bitmap corrupted group
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
fs/ext4/mballoc.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 03500aec43ac..2bb29f0077bd 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2303,6 +2303,9 @@ void ext4_mb_try_best_found(struct ext4_allocation_context *ac,
return;
ext4_lock_group(ac->ac_sb, group);
+ if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)))
+ goto out;
+
max = mb_find_extent(e4b, ex.fe_start, ex.fe_len, &ex);
if (max > 0) {
@@ -2310,6 +2313,7 @@ void ext4_mb_try_best_found(struct ext4_allocation_context *ac,
ext4_mb_use_best_found(ac, e4b);
}
+out:
ext4_unlock_group(ac->ac_sb, group);
ext4_mb_unload_buddy(e4b);
}
--
2.31.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v2 7/8] ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal()
2023-12-21 15:05 [PATCH v2 0/8] ext4: fix divide error in mb_update_avg_fragment_size() Baokun Li
` (5 preceding siblings ...)
2023-12-21 15:05 ` [PATCH v2 6/8] ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found() Baokun Li
@ 2023-12-21 15:05 ` Baokun Li
2024-01-04 10:47 ` Jan Kara
2023-12-21 15:05 ` [PATCH v2 8/8] ext4: mark the group block bitmap as corrupted before reporting an error Baokun Li
7 siblings, 1 reply; 19+ messages in thread
From: Baokun Li @ 2023-12-21 15:05 UTC (permalink / raw)
To: linux-ext4
Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
yangerkun, yukuai3, libaokun1
Places the logic for checking if the group's block bitmap is corrupt under
the protection of the group lock to avoid allocating blocks from the group
with a corrupted block bitmap.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
fs/ext4/mballoc.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 2bb29f0077bd..b862ca2750fd 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2340,12 +2340,10 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
if (err)
return err;
- if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) {
- ext4_mb_unload_buddy(e4b);
- return 0;
- }
-
ext4_lock_group(ac->ac_sb, group);
+ if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)))
+ goto out;
+
max = mb_find_extent(e4b, ac->ac_g_ex.fe_start,
ac->ac_g_ex.fe_len, &ex);
ex.fe_logical = 0xDEADFA11; /* debug value */
@@ -2378,6 +2376,7 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
ac->ac_b_ex = ex;
ext4_mb_use_best_found(ac, e4b);
}
+out:
ext4_unlock_group(ac->ac_sb, group);
ext4_mb_unload_buddy(e4b);
--
2.31.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v2 8/8] ext4: mark the group block bitmap as corrupted before reporting an error
2023-12-21 15:05 [PATCH v2 0/8] ext4: fix divide error in mb_update_avg_fragment_size() Baokun Li
` (6 preceding siblings ...)
2023-12-21 15:05 ` [PATCH v2 7/8] ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal() Baokun Li
@ 2023-12-21 15:05 ` Baokun Li
2024-01-04 10:51 ` Jan Kara
7 siblings, 1 reply; 19+ messages in thread
From: Baokun Li @ 2023-12-21 15:05 UTC (permalink / raw)
To: linux-ext4
Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
yangerkun, yukuai3, libaokun1
Otherwise unlocking the group in ext4_grp_locked_error may allow other
processes to modify the core block bitmap that is known to be corrupt.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
fs/ext4/mballoc.c | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index b862ca2750fd..c43eefebdaa3 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -564,14 +564,14 @@ static void mb_free_blocks_double(struct inode *inode, struct ext4_buddy *e4b,
blocknr = ext4_group_first_block_no(sb, e4b->bd_group);
blocknr += EXT4_C2B(EXT4_SB(sb), first + i);
+ ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
+ EXT4_GROUP_INFO_BBITMAP_CORRUPT);
ext4_grp_locked_error(sb, e4b->bd_group,
inode ? inode->i_ino : 0,
blocknr,
"freeing block already freed "
"(bit %u)",
first + i);
- ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
- EXT4_GROUP_INFO_BBITMAP_CORRUPT);
}
mb_clear_bit(first + i, e4b->bd_info->bb_bitmap);
}
@@ -1926,14 +1926,13 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
blocknr = ext4_group_first_block_no(sb, e4b->bd_group);
blocknr += EXT4_C2B(sbi, block);
if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
+ ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
+ EXT4_GROUP_INFO_BBITMAP_CORRUPT);
ext4_grp_locked_error(sb, e4b->bd_group,
inode ? inode->i_ino : 0,
blocknr,
"freeing already freed block (bit %u); block bitmap corrupt.",
block);
- ext4_mark_group_bitmap_corrupted(
- sb, e4b->bd_group,
- EXT4_GROUP_INFO_BBITMAP_CORRUPT);
} else {
mb_regenerate_buddy(e4b);
goto check;
@@ -2410,12 +2409,12 @@ void ext4_mb_simple_scan_group(struct ext4_allocation_context *ac,
k = mb_find_next_zero_bit(buddy, max, 0);
if (k >= max) {
+ ext4_mark_group_bitmap_corrupted(ac->ac_sb,
+ e4b->bd_group,
+ EXT4_GROUP_INFO_BBITMAP_CORRUPT);
ext4_grp_locked_error(ac->ac_sb, e4b->bd_group, 0, 0,
"%d free clusters of order %d. But found 0",
grp->bb_counters[i], i);
- ext4_mark_group_bitmap_corrupted(ac->ac_sb,
- e4b->bd_group,
- EXT4_GROUP_INFO_BBITMAP_CORRUPT);
break;
}
ac->ac_found++;
@@ -2466,12 +2465,12 @@ void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac,
* free blocks even though group info says we
* have free blocks
*/
+ ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
+ EXT4_GROUP_INFO_BBITMAP_CORRUPT);
ext4_grp_locked_error(sb, e4b->bd_group, 0, 0,
"%d free clusters as per "
"group info. But bitmap says 0",
free);
- ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
- EXT4_GROUP_INFO_BBITMAP_CORRUPT);
break;
}
@@ -2497,12 +2496,12 @@ void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac,
if (WARN_ON(ex.fe_len <= 0))
break;
if (free < ex.fe_len) {
+ ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
+ EXT4_GROUP_INFO_BBITMAP_CORRUPT);
ext4_grp_locked_error(sb, e4b->bd_group, 0, 0,
"%d free clusters as per "
"group info. But got %d blocks",
free, ex.fe_len);
- ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
- EXT4_GROUP_INFO_BBITMAP_CORRUPT);
/*
* The number of free blocks differs. This mostly
* indicate that the bitmap is corrupt. So exit
--
2.31.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v2 1/8] ext4: fix double-free of blocks due to wrong extents moved_len
2023-12-21 15:05 ` [PATCH v2 1/8] ext4: fix double-free of blocks due to wrong extents moved_len Baokun Li
@ 2024-01-04 10:27 ` Jan Kara
0 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2024-01-04 10:27 UTC (permalink / raw)
To: Baokun Li
Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
linux-kernel, yi.zhang, yangerkun, yukuai3, Wei Chen, xingwei lee,
stable
On Thu 21-12-23 23:05:51, Baokun Li wrote:
> In ext4_move_extents(), moved_len is only updated when all moves are
> successfully executed, and only discards orig_inode and donor_inode
> preallocations when moved_len is not zero. When the loop fails to exit
> after successfully moving some extents, moved_len is not updated and
> remains at 0, so it does not discard the preallocations.
>
> If the moved extents overlap with the preallocated extents, the
> overlapped extents are freed twice in ext4_mb_release_inode_pa() and
> ext4_process_freed_data() (as described in commit 94d7c16cbbbd ("ext4:
> Fix double-free of blocks with EXT4_IOC_MOVE_EXT")), and bb_free is
> incremented twice. Hence when trim is executed, a zero-division bug is
> triggered in mb_update_avg_fragment_size() because bb_free is not zero
> and bb_fragments is zero.
>
> Therefore, update move_len after each extent move to avoid the issue.
>
> Reported-by: Wei Chen <harperchen1110@gmail.com>
> Reported-by: xingwei lee <xrivendell7@gmail.com>
> Closes: https://lore.kernel.org/r/CAO4mrferzqBUnCag8R3m2zf897ts9UEuhjFQGPtODT92rYyR2Q@mail.gmail.com
> Fixes: fcf6b1b729bc ("ext4: refactor ext4_move_extents code base")
> CC: stable@vger.kernel.org # 3.18
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
Looks good! Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/ext4/move_extent.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
> index 3aa57376d9c2..391efa6d4c56 100644
> --- a/fs/ext4/move_extent.c
> +++ b/fs/ext4/move_extent.c
> @@ -618,6 +618,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
> goto out;
> o_end = o_start + len;
>
> + *moved_len = 0;
> while (o_start < o_end) {
> struct ext4_extent *ex;
> ext4_lblk_t cur_blk, next_blk;
> @@ -672,7 +673,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
> */
> ext4_double_up_write_data_sem(orig_inode, donor_inode);
> /* Swap original branches with new branches */
> - move_extent_per_page(o_filp, donor_inode,
> + *moved_len += move_extent_per_page(o_filp, donor_inode,
> orig_page_index, donor_page_index,
> offset_in_page, cur_len,
> unwritten, &ret);
> @@ -682,9 +683,6 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
> o_start += cur_len;
> d_start += cur_len;
> }
> - *moved_len = o_start - orig_blk;
> - if (*moved_len > len)
> - *moved_len = len;
>
> out:
> if (*moved_len) {
> --
> 2.31.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 3/8] ext4: regenerate buddy after block freeing failed if under fc replay
2023-12-21 15:05 ` [PATCH v2 3/8] ext4: regenerate buddy after block freeing failed if under fc replay Baokun Li
@ 2024-01-04 10:33 ` Jan Kara
2024-01-04 11:31 ` Baokun Li
0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2024-01-04 10:33 UTC (permalink / raw)
To: Baokun Li
Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
linux-kernel, yi.zhang, yangerkun, yukuai3
On Thu 21-12-23 23:05:53, Baokun Li wrote:
> This reverts [Fixes] under fast commit replay. When we are freeing blocks
> that have already been freed, the buddy may be corrupted, and we need to
> regenerate the buddy when the fast commit is being replayed in order to
> avoid using an corrupted buddy, since it will not mark the group block
> bitmap as corrupted at that point.
I'd rephrase the changelog as:
This mostly reverts commit 6bd97bf273bd ("ext4: remove redundant
mb_regenerate_buddy()") and reintroduces mb_regenerate_buddy(). Based on
code in mb_free_blocks(), fast commit replay can end up marking as free
blocks that are already marked as such. This causes corruption of the
buddy bitmap so we need to regenerate it in that case.
Otherwise the patch looks good to me so feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
>
> Reported-by: Jan Kara <jack@suse.cz>
> Fixes: 6bd97bf273bd ("ext4: remove redundant mb_regenerate_buddy()")
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> ---
> fs/ext4/mballoc.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index a95fa6e2b0f9..f6131ba514c8 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1233,6 +1233,24 @@ void ext4_mb_generate_buddy(struct super_block *sb,
> atomic64_add(period, &sbi->s_mb_generation_time);
> }
>
> +static void mb_regenerate_buddy(struct ext4_buddy *e4b)
> +{
> + int count;
> + int order = 1;
> + void *buddy;
> +
> + while ((buddy = mb_find_buddy(e4b, order++, &count)))
> + mb_set_bits(buddy, 0, count);
> +
> + e4b->bd_info->bb_fragments = 0;
> + memset(e4b->bd_info->bb_counters, 0,
> + sizeof(*e4b->bd_info->bb_counters) *
> + (e4b->bd_sb->s_blocksize_bits + 2));
> +
> + ext4_mb_generate_buddy(e4b->bd_sb, e4b->bd_buddy,
> + e4b->bd_bitmap, e4b->bd_group, e4b->bd_info);
> +}
> +
> /* The buddy information is attached the buddy cache inode
> * for convenience. The information regarding each group
> * is loaded via ext4_mb_load_buddy. The information involve
> @@ -1921,6 +1939,8 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
> ext4_mark_group_bitmap_corrupted(
> sb, e4b->bd_group,
> EXT4_GROUP_INFO_BBITMAP_CORRUPT);
> + } else {
> + mb_regenerate_buddy(e4b);
> }
> goto done;
> }
> --
> 2.31.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 4/8] ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
2023-12-21 15:05 ` [PATCH v2 4/8] ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks() Baokun Li
@ 2024-01-04 10:42 ` Jan Kara
2024-01-04 11:43 ` Baokun Li
0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2024-01-04 10:42 UTC (permalink / raw)
To: Baokun Li
Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
linux-kernel, yi.zhang, yangerkun, yukuai3, stable
On Thu 21-12-23 23:05:54, Baokun Li wrote:
> After updating bb_free in mb_free_blocks, it is possible to return without
> updating bb_fragments because the block being freed is found to have
> already been freed, which leads to inconsistency between bb_free and
> bb_fragments.
>
> Since the group may be unlocked in ext4_grp_locked_error(), this can lead
> to problems such as dividing by zero when calculating the average fragment
> length. Hence move the update of bb_free to after the block double-free
> check guarantees that the corresponding statistics are updated only after
> the core block bitmap is modified.
>
> Fixes: eabe0444df90 ("ext4: speed-up releasing blocks on commit")
> CC: stable@vger.kernel.org # 3.10
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
Just one nit below but regardless of that feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
> @@ -1941,10 +1936,16 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
> EXT4_GROUP_INFO_BBITMAP_CORRUPT);
> } else {
> mb_regenerate_buddy(e4b);
> + goto check;
> }
> - goto done;
> + return;
> }
I think this might be more readable when we revert the condition like:
/*
* Fastcommit replay can free already freed blocks which
* corrupts allocation info. Regenerate it.
*/
if (sbi->s_mount_state & EXT4_FC_REPLAY) {
mb_regenerate_buddy(e4b);
goto check;
}
ext4_grp_locked_error(sb, e4b->bd_group,
inode ? inode->i_ino : 0, blocknr,
"freeing already freed block (bit %u); block bitmap corrupt.",
block);
ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
EXT4_GROUP_INFO_BBITMAP_CORRUPT);
return;
}
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 5/8] ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt
2023-12-21 15:05 ` [PATCH v2 5/8] ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt Baokun Li
@ 2024-01-04 10:43 ` Jan Kara
0 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2024-01-04 10:43 UTC (permalink / raw)
To: Baokun Li
Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
linux-kernel, yi.zhang, yangerkun, yukuai3
On Thu 21-12-23 23:05:55, Baokun Li wrote:
> Determine if bb_fragments is 0 instead of determining bb_free to eliminate
> the risk of dividing by zero when the block bitmap is corrupted.
>
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/ext4/mballoc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 1f15774971d7..03500aec43ac 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -842,7 +842,7 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
> struct ext4_sb_info *sbi = EXT4_SB(sb);
> int new_order;
>
> - if (!test_opt2(sb, MB_OPTIMIZE_SCAN) || grp->bb_free == 0)
> + if (!test_opt2(sb, MB_OPTIMIZE_SCAN) || grp->bb_fragments == 0)
> return;
>
> new_order = mb_avg_fragment_size_order(sb,
> --
> 2.31.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 6/8] ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found()
2023-12-21 15:05 ` [PATCH v2 6/8] ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found() Baokun Li
@ 2024-01-04 10:45 ` Jan Kara
0 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2024-01-04 10:45 UTC (permalink / raw)
To: Baokun Li
Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
linux-kernel, yi.zhang, yangerkun, yukuai3
On Thu 21-12-23 23:05:56, Baokun Li wrote:
> Determine if the group block bitmap is corrupted before using ac_b_ex in
> ext4_mb_try_best_found() to avoid allocating blocks from a group with a
> corrupted block bitmap in the following concurrency and making the
> situation worse.
>
> ext4_mb_regular_allocator
> ext4_lock_group(sb, group)
> ext4_mb_good_group
> // check if the group bbitmap is corrupted
> ext4_mb_complex_scan_group
> // Scan group gets ac_b_ex but doesn't use it
> ext4_unlock_group(sb, group)
> ext4_mark_group_bitmap_corrupted(group)
> // The block bitmap was corrupted during
> // the group unlock gap.
> ext4_mb_try_best_found
> ext4_lock_group(ac->ac_sb, group)
> ext4_mb_use_best_found
> mb_mark_used
> // Allocating blocks in block bitmap corrupted group
>
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/ext4/mballoc.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 03500aec43ac..2bb29f0077bd 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2303,6 +2303,9 @@ void ext4_mb_try_best_found(struct ext4_allocation_context *ac,
> return;
>
> ext4_lock_group(ac->ac_sb, group);
> + if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)))
> + goto out;
> +
> max = mb_find_extent(e4b, ex.fe_start, ex.fe_len, &ex);
>
> if (max > 0) {
> @@ -2310,6 +2313,7 @@ void ext4_mb_try_best_found(struct ext4_allocation_context *ac,
> ext4_mb_use_best_found(ac, e4b);
> }
>
> +out:
> ext4_unlock_group(ac->ac_sb, group);
> ext4_mb_unload_buddy(e4b);
> }
> --
> 2.31.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 7/8] ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal()
2023-12-21 15:05 ` [PATCH v2 7/8] ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal() Baokun Li
@ 2024-01-04 10:47 ` Jan Kara
0 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2024-01-04 10:47 UTC (permalink / raw)
To: Baokun Li
Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
linux-kernel, yi.zhang, yangerkun, yukuai3
On Thu 21-12-23 23:05:57, Baokun Li wrote:
> Places the logic for checking if the group's block bitmap is corrupt under
> the protection of the group lock to avoid allocating blocks from the group
> with a corrupted block bitmap.
>
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/ext4/mballoc.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 2bb29f0077bd..b862ca2750fd 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2340,12 +2340,10 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
> if (err)
> return err;
>
> - if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) {
> - ext4_mb_unload_buddy(e4b);
> - return 0;
> - }
> -
> ext4_lock_group(ac->ac_sb, group);
> + if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)))
> + goto out;
> +
> max = mb_find_extent(e4b, ac->ac_g_ex.fe_start,
> ac->ac_g_ex.fe_len, &ex);
> ex.fe_logical = 0xDEADFA11; /* debug value */
> @@ -2378,6 +2376,7 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
> ac->ac_b_ex = ex;
> ext4_mb_use_best_found(ac, e4b);
> }
> +out:
> ext4_unlock_group(ac->ac_sb, group);
> ext4_mb_unload_buddy(e4b);
>
> --
> 2.31.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 8/8] ext4: mark the group block bitmap as corrupted before reporting an error
2023-12-21 15:05 ` [PATCH v2 8/8] ext4: mark the group block bitmap as corrupted before reporting an error Baokun Li
@ 2024-01-04 10:51 ` Jan Kara
2024-01-04 12:14 ` Baokun Li
0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2024-01-04 10:51 UTC (permalink / raw)
To: Baokun Li
Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
linux-kernel, yi.zhang, yangerkun, yukuai3
On Thu 21-12-23 23:05:58, Baokun Li wrote:
> Otherwise unlocking the group in ext4_grp_locked_error may allow other
> processes to modify the core block bitmap that is known to be corrupt.
>
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
I'm not opposed but I don't think this matters much.
ext4_grp_locked_error() unlocks the group only in errors=remount-ro case
these days and in that case we abort the journal so none of the changes
should make it to disk anyway. Anyway, in the name of defensive programming
feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
:)
Honza
> ---
> fs/ext4/mballoc.c | 23 +++++++++++------------
> 1 file changed, 11 insertions(+), 12 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index b862ca2750fd..c43eefebdaa3 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -564,14 +564,14 @@ static void mb_free_blocks_double(struct inode *inode, struct ext4_buddy *e4b,
>
> blocknr = ext4_group_first_block_no(sb, e4b->bd_group);
> blocknr += EXT4_C2B(EXT4_SB(sb), first + i);
> + ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
> + EXT4_GROUP_INFO_BBITMAP_CORRUPT);
> ext4_grp_locked_error(sb, e4b->bd_group,
> inode ? inode->i_ino : 0,
> blocknr,
> "freeing block already freed "
> "(bit %u)",
> first + i);
> - ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
> - EXT4_GROUP_INFO_BBITMAP_CORRUPT);
> }
> mb_clear_bit(first + i, e4b->bd_info->bb_bitmap);
> }
> @@ -1926,14 +1926,13 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
> blocknr = ext4_group_first_block_no(sb, e4b->bd_group);
> blocknr += EXT4_C2B(sbi, block);
> if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
> + ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
> + EXT4_GROUP_INFO_BBITMAP_CORRUPT);
> ext4_grp_locked_error(sb, e4b->bd_group,
> inode ? inode->i_ino : 0,
> blocknr,
> "freeing already freed block (bit %u); block bitmap corrupt.",
> block);
> - ext4_mark_group_bitmap_corrupted(
> - sb, e4b->bd_group,
> - EXT4_GROUP_INFO_BBITMAP_CORRUPT);
> } else {
> mb_regenerate_buddy(e4b);
> goto check;
> @@ -2410,12 +2409,12 @@ void ext4_mb_simple_scan_group(struct ext4_allocation_context *ac,
>
> k = mb_find_next_zero_bit(buddy, max, 0);
> if (k >= max) {
> + ext4_mark_group_bitmap_corrupted(ac->ac_sb,
> + e4b->bd_group,
> + EXT4_GROUP_INFO_BBITMAP_CORRUPT);
> ext4_grp_locked_error(ac->ac_sb, e4b->bd_group, 0, 0,
> "%d free clusters of order %d. But found 0",
> grp->bb_counters[i], i);
> - ext4_mark_group_bitmap_corrupted(ac->ac_sb,
> - e4b->bd_group,
> - EXT4_GROUP_INFO_BBITMAP_CORRUPT);
> break;
> }
> ac->ac_found++;
> @@ -2466,12 +2465,12 @@ void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac,
> * free blocks even though group info says we
> * have free blocks
> */
> + ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
> + EXT4_GROUP_INFO_BBITMAP_CORRUPT);
> ext4_grp_locked_error(sb, e4b->bd_group, 0, 0,
> "%d free clusters as per "
> "group info. But bitmap says 0",
> free);
> - ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
> - EXT4_GROUP_INFO_BBITMAP_CORRUPT);
> break;
> }
>
> @@ -2497,12 +2496,12 @@ void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac,
> if (WARN_ON(ex.fe_len <= 0))
> break;
> if (free < ex.fe_len) {
> + ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
> + EXT4_GROUP_INFO_BBITMAP_CORRUPT);
> ext4_grp_locked_error(sb, e4b->bd_group, 0, 0,
> "%d free clusters as per "
> "group info. But got %d blocks",
> free, ex.fe_len);
> - ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
> - EXT4_GROUP_INFO_BBITMAP_CORRUPT);
> /*
> * The number of free blocks differs. This mostly
> * indicate that the bitmap is corrupt. So exit
> --
> 2.31.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 3/8] ext4: regenerate buddy after block freeing failed if under fc replay
2024-01-04 10:33 ` Jan Kara
@ 2024-01-04 11:31 ` Baokun Li
0 siblings, 0 replies; 19+ messages in thread
From: Baokun Li @ 2024-01-04 11:31 UTC (permalink / raw)
To: Jan Kara
Cc: linux-ext4, tytso, adilger.kernel, ritesh.list, linux-kernel,
yi.zhang, yangerkun, yukuai3, Baokun Li
On 2024/1/4 18:33, Jan Kara wrote:
> On Thu 21-12-23 23:05:53, Baokun Li wrote:
>> This reverts [Fixes] under fast commit replay. When we are freeing blocks
>> that have already been freed, the buddy may be corrupted, and we need to
>> regenerate the buddy when the fast commit is being replayed in order to
>> avoid using an corrupted buddy, since it will not mark the group block
>> bitmap as corrupted at that point.
> I'd rephrase the changelog as:
>
> This mostly reverts commit 6bd97bf273bd ("ext4: remove redundant
> mb_regenerate_buddy()") and reintroduces mb_regenerate_buddy(). Based on
> code in mb_free_blocks(), fast commit replay can end up marking as free
> blocks that are already marked as such. This causes corruption of the
> buddy bitmap so we need to regenerate it in that case.
>
> Otherwise the patch looks good to me so feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
> Honza
Hello Honza, Happy New Year!
Thank you very much for your review!
This changelog looks a lot more relevant!
I will use this changelog in the next version.
>> Reported-by: Jan Kara <jack@suse.cz>
>> Fixes: 6bd97bf273bd ("ext4: remove redundant mb_regenerate_buddy()")
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>> ---
>> fs/ext4/mballoc.c | 20 ++++++++++++++++++++
>> 1 file changed, 20 insertions(+)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index a95fa6e2b0f9..f6131ba514c8 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -1233,6 +1233,24 @@ void ext4_mb_generate_buddy(struct super_block *sb,
>> atomic64_add(period, &sbi->s_mb_generation_time);
>> }
>>
>> +static void mb_regenerate_buddy(struct ext4_buddy *e4b)
>> +{
>> + int count;
>> + int order = 1;
>> + void *buddy;
>> +
>> + while ((buddy = mb_find_buddy(e4b, order++, &count)))
>> + mb_set_bits(buddy, 0, count);
>> +
>> + e4b->bd_info->bb_fragments = 0;
>> + memset(e4b->bd_info->bb_counters, 0,
>> + sizeof(*e4b->bd_info->bb_counters) *
>> + (e4b->bd_sb->s_blocksize_bits + 2));
>> +
>> + ext4_mb_generate_buddy(e4b->bd_sb, e4b->bd_buddy,
>> + e4b->bd_bitmap, e4b->bd_group, e4b->bd_info);
>> +}
>> +
>> /* The buddy information is attached the buddy cache inode
>> * for convenience. The information regarding each group
>> * is loaded via ext4_mb_load_buddy. The information involve
>> @@ -1921,6 +1939,8 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
>> ext4_mark_group_bitmap_corrupted(
>> sb, e4b->bd_group,
>> EXT4_GROUP_INFO_BBITMAP_CORRUPT);
>> + } else {
>> + mb_regenerate_buddy(e4b);
>> }
>> goto done;
>> }
>> --
>> 2.31.1
>>
Thanks again!
--
With Best Regards,
Baokun Li
.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 4/8] ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
2024-01-04 10:42 ` Jan Kara
@ 2024-01-04 11:43 ` Baokun Li
0 siblings, 0 replies; 19+ messages in thread
From: Baokun Li @ 2024-01-04 11:43 UTC (permalink / raw)
To: Jan Kara
Cc: linux-ext4, tytso, adilger.kernel, ritesh.list, linux-kernel,
yi.zhang, yangerkun, yukuai3, stable, Baokun Li
On 2024/1/4 18:42, Jan Kara wrote:
> On Thu 21-12-23 23:05:54, Baokun Li wrote:
>> After updating bb_free in mb_free_blocks, it is possible to return without
>> updating bb_fragments because the block being freed is found to have
>> already been freed, which leads to inconsistency between bb_free and
>> bb_fragments.
>>
>> Since the group may be unlocked in ext4_grp_locked_error(), this can lead
>> to problems such as dividing by zero when calculating the average fragment
>> length. Hence move the update of bb_free to after the block double-free
>> check guarantees that the corresponding statistics are updated only after
>> the core block bitmap is modified.
>>
>> Fixes: eabe0444df90 ("ext4: speed-up releasing blocks on commit")
>> CC: stable@vger.kernel.org # 3.10
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> Just one nit below but regardless of that feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
>> @@ -1941,10 +1936,16 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
>> EXT4_GROUP_INFO_BBITMAP_CORRUPT);
>> } else {
>> mb_regenerate_buddy(e4b);
>> + goto check;
>> }
>> - goto done;
>> + return;
>> }
> I think this might be more readable when we revert the condition like:
>
> /*
> * Fastcommit replay can free already freed blocks which
> * corrupts allocation info. Regenerate it.
> */
> if (sbi->s_mount_state & EXT4_FC_REPLAY) {
> mb_regenerate_buddy(e4b);
> goto check;
> }
> ext4_grp_locked_error(sb, e4b->bd_group,
> inode ? inode->i_ino : 0, blocknr,
> "freeing already freed block (bit %u); block bitmap corrupt.",
> block);
> ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
> EXT4_GROUP_INFO_BBITMAP_CORRUPT);
> return;
> }
>
> Honza
Yes, it looks much clearer that way!
I will switch to it in the next version.
Thanks a lot!
--
With Best Regards,
Baokun Li
.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 8/8] ext4: mark the group block bitmap as corrupted before reporting an error
2024-01-04 10:51 ` Jan Kara
@ 2024-01-04 12:14 ` Baokun Li
0 siblings, 0 replies; 19+ messages in thread
From: Baokun Li @ 2024-01-04 12:14 UTC (permalink / raw)
To: Jan Kara
Cc: linux-ext4, tytso, adilger.kernel, ritesh.list, linux-kernel,
yi.zhang, yangerkun, yukuai3, Baokun Li
On 2024/1/4 18:51, Jan Kara wrote:
> On Thu 21-12-23 23:05:58, Baokun Li wrote:
>> Otherwise unlocking the group in ext4_grp_locked_error may allow other
>> processes to modify the core block bitmap that is known to be corrupt.
>>
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> I'm not opposed but I don't think this matters much.
> ext4_grp_locked_error() unlocks the group only in errors=remount-ro case
> these days and in that case we abort the journal so none of the changes
> should make it to disk anyway. Anyway, in the name of defensive programming
> feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
> :)
>
> Honza
Thank you very much for your review!
Yes, the unlock gap here does not cause corrupted data to be written
to disk, which is why no issues have been reported here before.
My concern is that core block bitmap corruption may cause kernel
crash in some corners. That's why inode bitmap corruption is not a
concern here, since there is no core inode bitmap.
We encounter all sorts of hard-to-replicate kernel problems every
day, and hopefully by fixing these trivial little issues, we can reduce
the number of difficult problems that arise from all sorts of
coincidental stacking.
Cheers!😊
>> ---
>> fs/ext4/mballoc.c | 23 +++++++++++------------
>> 1 file changed, 11 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index b862ca2750fd..c43eefebdaa3 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -564,14 +564,14 @@ static void mb_free_blocks_double(struct inode *inode, struct ext4_buddy *e4b,
>>
>> blocknr = ext4_group_first_block_no(sb, e4b->bd_group);
>> blocknr += EXT4_C2B(EXT4_SB(sb), first + i);
>> + ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
>> + EXT4_GROUP_INFO_BBITMAP_CORRUPT);
>> ext4_grp_locked_error(sb, e4b->bd_group,
>> inode ? inode->i_ino : 0,
>> blocknr,
>> "freeing block already freed "
>> "(bit %u)",
>> first + i);
>> - ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
>> - EXT4_GROUP_INFO_BBITMAP_CORRUPT);
>> }
>> mb_clear_bit(first + i, e4b->bd_info->bb_bitmap);
>> }
>> @@ -1926,14 +1926,13 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
>> blocknr = ext4_group_first_block_no(sb, e4b->bd_group);
>> blocknr += EXT4_C2B(sbi, block);
>> if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
>> + ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
>> + EXT4_GROUP_INFO_BBITMAP_CORRUPT);
>> ext4_grp_locked_error(sb, e4b->bd_group,
>> inode ? inode->i_ino : 0,
>> blocknr,
>> "freeing already freed block (bit %u); block bitmap corrupt.",
>> block);
>> - ext4_mark_group_bitmap_corrupted(
>> - sb, e4b->bd_group,
>> - EXT4_GROUP_INFO_BBITMAP_CORRUPT);
>> } else {
>> mb_regenerate_buddy(e4b);
>> goto check;
>> @@ -2410,12 +2409,12 @@ void ext4_mb_simple_scan_group(struct ext4_allocation_context *ac,
>>
>> k = mb_find_next_zero_bit(buddy, max, 0);
>> if (k >= max) {
>> + ext4_mark_group_bitmap_corrupted(ac->ac_sb,
>> + e4b->bd_group,
>> + EXT4_GROUP_INFO_BBITMAP_CORRUPT);
>> ext4_grp_locked_error(ac->ac_sb, e4b->bd_group, 0, 0,
>> "%d free clusters of order %d. But found 0",
>> grp->bb_counters[i], i);
>> - ext4_mark_group_bitmap_corrupted(ac->ac_sb,
>> - e4b->bd_group,
>> - EXT4_GROUP_INFO_BBITMAP_CORRUPT);
>> break;
>> }
>> ac->ac_found++;
>> @@ -2466,12 +2465,12 @@ void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac,
>> * free blocks even though group info says we
>> * have free blocks
>> */
>> + ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
>> + EXT4_GROUP_INFO_BBITMAP_CORRUPT);
>> ext4_grp_locked_error(sb, e4b->bd_group, 0, 0,
>> "%d free clusters as per "
>> "group info. But bitmap says 0",
>> free);
>> - ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
>> - EXT4_GROUP_INFO_BBITMAP_CORRUPT);
>> break;
>> }
>>
>> @@ -2497,12 +2496,12 @@ void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac,
>> if (WARN_ON(ex.fe_len <= 0))
>> break;
>> if (free < ex.fe_len) {
>> + ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
>> + EXT4_GROUP_INFO_BBITMAP_CORRUPT);
>> ext4_grp_locked_error(sb, e4b->bd_group, 0, 0,
>> "%d free clusters as per "
>> "group info. But got %d blocks",
>> free, ex.fe_len);
>> - ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
>> - EXT4_GROUP_INFO_BBITMAP_CORRUPT);
>> /*
>> * The number of free blocks differs. This mostly
>> * indicate that the bitmap is corrupt. So exit
>> --
>> 2.31.1
>>
>>
Thanks!
--
With Best Regards,
Baokun Li
.
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2024-01-04 12:14 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-21 15:05 [PATCH v2 0/8] ext4: fix divide error in mb_update_avg_fragment_size() Baokun Li
2023-12-21 15:05 ` [PATCH v2 1/8] ext4: fix double-free of blocks due to wrong extents moved_len Baokun Li
2024-01-04 10:27 ` Jan Kara
2023-12-21 15:05 ` [PATCH v2 2/8] ext4: do not trim the group with corrupted block bitmap Baokun Li
2023-12-21 15:05 ` [PATCH v2 3/8] ext4: regenerate buddy after block freeing failed if under fc replay Baokun Li
2024-01-04 10:33 ` Jan Kara
2024-01-04 11:31 ` Baokun Li
2023-12-21 15:05 ` [PATCH v2 4/8] ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks() Baokun Li
2024-01-04 10:42 ` Jan Kara
2024-01-04 11:43 ` Baokun Li
2023-12-21 15:05 ` [PATCH v2 5/8] ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt Baokun Li
2024-01-04 10:43 ` Jan Kara
2023-12-21 15:05 ` [PATCH v2 6/8] ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found() Baokun Li
2024-01-04 10:45 ` Jan Kara
2023-12-21 15:05 ` [PATCH v2 7/8] ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal() Baokun Li
2024-01-04 10:47 ` Jan Kara
2023-12-21 15:05 ` [PATCH v2 8/8] ext4: mark the group block bitmap as corrupted before reporting an error Baokun Li
2024-01-04 10:51 ` Jan Kara
2024-01-04 12:14 ` Baokun Li
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox