* [PATCH 0/4] ext4: fix divide error in mb_update_avg_fragment_size()
@ 2023-12-18 14:18 Baokun Li
2023-12-18 14:18 ` [PATCH 1/4] ext4: fix double-free of blocks due to wrong extents moved_len Baokun Li
` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread
From: Baokun Li @ 2023-12-18 14:18 UTC (permalink / raw)
To: linux-ext4
Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
yangerkun, yukuai3, libaokun1
Baokun Li (4):
ext4: fix double-free of blocks due to wrong extents moved_len
ext4: do not trim the group with corrupted block bitmap
ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block
bitmap corrupt
fs/ext4/mballoc.c | 19 ++++++++++++-------
fs/ext4/move_extent.c | 3 +--
2 files changed, 13 insertions(+), 9 deletions(-)
--
2.31.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/4] ext4: fix double-free of blocks due to wrong extents moved_len
2023-12-18 14:18 [PATCH 0/4] ext4: fix divide error in mb_update_avg_fragment_size() Baokun Li
@ 2023-12-18 14:18 ` Baokun Li
2023-12-18 15:32 ` Jan Kara
2023-12-18 14:18 ` [PATCH 2/4] ext4: do not trim the group with corrupted block bitmap Baokun Li
` (2 subsequent siblings)
3 siblings, 1 reply; 15+ messages in thread
From: Baokun Li @ 2023-12-18 14:18 UTC (permalink / raw)
To: linux-ext4
Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
yangerkun, yukuai3, libaokun1, Wei Chen, xingwei lee, stable
In ext4_move_extents(), moved_len is only updated when all moves are
successfully executed, and only discards orig_inode and donor_inode
preallocations when moved_len is not zero. When the loop fails to exit
after successfully moving some extents, moved_len is not updated and
remains at 0, so it does not discard the preallocations.
If the moved extents overlap with the preallocated extents, the
overlapped extents are freed twice in ext4_mb_release_inode_pa() and
ext4_process_freed_data() (as described in commit 94d7c16cbbbd), and
bb_free is incremented twice. Hence when trim is executed, a zero-division
bug is triggered in mb_update_avg_fragment_size() because bb_free is not
zero and bb_fragments is zero.
Therefore, update move_len after each extent move to avoid the issue.
Reported-by: Wei Chen <harperchen1110@gmail.com>
Reported-by: xingwei lee <xrivendell7@gmail.com>
Closes: https://lore.kernel.org/r/CAO4mrferzqBUnCag8R3m2zf897ts9UEuhjFQGPtODT92rYyR2Q@mail.gmail.com
Fixes: fcf6b1b729bc ("ext4: refactor ext4_move_extents code base")
CC: stable@vger.kernel.org # 3.18
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
fs/ext4/move_extent.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 3aa57376d9c2..4b9b503c6346 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -672,7 +672,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
*/
ext4_double_up_write_data_sem(orig_inode, donor_inode);
/* Swap original branches with new branches */
- move_extent_per_page(o_filp, donor_inode,
+ *moved_len += move_extent_per_page(o_filp, donor_inode,
orig_page_index, donor_page_index,
offset_in_page, cur_len,
unwritten, &ret);
@@ -682,7 +682,6 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
o_start += cur_len;
d_start += cur_len;
}
- *moved_len = o_start - orig_blk;
if (*moved_len > len)
*moved_len = len;
--
2.31.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/4] ext4: do not trim the group with corrupted block bitmap
2023-12-18 14:18 [PATCH 0/4] ext4: fix divide error in mb_update_avg_fragment_size() Baokun Li
2023-12-18 14:18 ` [PATCH 1/4] ext4: fix double-free of blocks due to wrong extents moved_len Baokun Li
@ 2023-12-18 14:18 ` Baokun Li
2023-12-18 15:02 ` Jan Kara
2023-12-18 14:18 ` [PATCH 3/4] ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks() Baokun Li
2023-12-18 14:18 ` [PATCH 4/4] ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt Baokun Li
3 siblings, 1 reply; 15+ messages in thread
From: Baokun Li @ 2023-12-18 14:18 UTC (permalink / raw)
To: linux-ext4
Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
yangerkun, yukuai3, libaokun1
Otherwise operating on an incorrupted block bitmap can lead to all sorts
of unknown problems.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
fs/ext4/mballoc.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index d72b5e3c92ec..a95fa6e2b0f9 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6757,6 +6757,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
bool set_trimmed = false;
void *bitmap;
+ if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)))
+ return 0;
+
bitmap = e4b->bd_bitmap;
if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
set_trimmed = true;
--
2.31.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/4] ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
2023-12-18 14:18 [PATCH 0/4] ext4: fix divide error in mb_update_avg_fragment_size() Baokun Li
2023-12-18 14:18 ` [PATCH 1/4] ext4: fix double-free of blocks due to wrong extents moved_len Baokun Li
2023-12-18 14:18 ` [PATCH 2/4] ext4: do not trim the group with corrupted block bitmap Baokun Li
@ 2023-12-18 14:18 ` Baokun Li
2023-12-18 15:14 ` Jan Kara
2023-12-18 14:18 ` [PATCH 4/4] ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt Baokun Li
3 siblings, 1 reply; 15+ messages in thread
From: Baokun Li @ 2023-12-18 14:18 UTC (permalink / raw)
To: linux-ext4
Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
yangerkun, yukuai3, libaokun1, stable
After updating bb_free in mb_free_blocks, it is possible to return without
updating bb_fragments because the block being freed is found to have
already been freed, which leads to inconsistency between bb_free and
bb_fragments.
Since the group may be unlocked in ext4_grp_locked_error(), this can lead
to problems such as dividing by zero when calculating the average fragment
length. Therefore, to ensure consistency, move the update of bb_free to
after the block double-free check.
Fixes: eabe0444df90 ("ext4: speed-up releasing blocks on commit")
CC: stable@vger.kernel.org # 3.10
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
fs/ext4/mballoc.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a95fa6e2b0f9..2fbee0f0f5c3 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1892,11 +1892,6 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
mb_check_buddy(e4b);
mb_free_blocks_double(inode, e4b, first, count);
- this_cpu_inc(discard_pa_seq);
- e4b->bd_info->bb_free += count;
- if (first < e4b->bd_info->bb_first_free)
- e4b->bd_info->bb_first_free = first;
-
/* access memory sequentially: check left neighbour,
* clear range and then check right neighbour
*/
@@ -1922,9 +1917,14 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
sb, e4b->bd_group,
EXT4_GROUP_INFO_BBITMAP_CORRUPT);
}
- goto done;
+ return;
}
+ this_cpu_inc(discard_pa_seq);
+ e4b->bd_info->bb_free += count;
+ if (first < e4b->bd_info->bb_first_free)
+ e4b->bd_info->bb_first_free = first;
+
/* let's maintain fragments counter */
if (left_is_free && right_is_free)
e4b->bd_info->bb_fragments--;
@@ -1949,7 +1949,6 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
if (first <= last)
mb_buddy_mark_free(e4b, first >> 1, last >> 1);
-done:
mb_set_largest_free_order(sb, e4b->bd_info);
mb_update_avg_fragment_size(sb, e4b->bd_info);
mb_check_buddy(e4b);
--
2.31.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 4/4] ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt
2023-12-18 14:18 [PATCH 0/4] ext4: fix divide error in mb_update_avg_fragment_size() Baokun Li
` (2 preceding siblings ...)
2023-12-18 14:18 ` [PATCH 3/4] ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks() Baokun Li
@ 2023-12-18 14:18 ` Baokun Li
2023-12-18 14:43 ` Jan Kara
3 siblings, 1 reply; 15+ messages in thread
From: Baokun Li @ 2023-12-18 14:18 UTC (permalink / raw)
To: linux-ext4
Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
yangerkun, yukuai3, libaokun1, stable
When bb_free is not 0 but bb_fragments is 0, return directly to avoid
system crash due to division by zero.
Fixes: 83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
CC: stable@vger.kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
fs/ext4/mballoc.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 2fbee0f0f5c3..e2a167240335 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -845,6 +845,9 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
if (!test_opt2(sb, MB_OPTIMIZE_SCAN) || grp->bb_free == 0)
return;
+ if (unlikely(grp->bb_fragments == 0))
+ return;
+
new_order = mb_avg_fragment_size_order(sb,
grp->bb_free / grp->bb_fragments);
if (new_order == grp->bb_avg_fragment_size_order)
--
2.31.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt
2023-12-18 14:18 ` [PATCH 4/4] ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt Baokun Li
@ 2023-12-18 14:43 ` Jan Kara
2023-12-18 15:09 ` Jan Kara
0 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2023-12-18 14:43 UTC (permalink / raw)
To: Baokun Li
Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
linux-kernel, yi.zhang, yangerkun, yukuai3, stable
On Mon 18-12-23 22:18:14, Baokun Li wrote:
> When bb_free is not 0 but bb_fragments is 0, return directly to avoid
> system crash due to division by zero.
How could this possibly happen? bb_fragments is the number of free space
extents and bb_free is the number of free blocks. No free space extents =>
no free blocks seems pretty obvious? You can see the logic in
ext4_mb_generate_buddy()...
Honza
>
> Fixes: 83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
> CC: stable@vger.kernel.org
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> ---
> fs/ext4/mballoc.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 2fbee0f0f5c3..e2a167240335 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -845,6 +845,9 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
> if (!test_opt2(sb, MB_OPTIMIZE_SCAN) || grp->bb_free == 0)
> return;
>
> + if (unlikely(grp->bb_fragments == 0))
> + return;
> +
> new_order = mb_avg_fragment_size_order(sb,
> grp->bb_free / grp->bb_fragments);
> if (new_order == grp->bb_avg_fragment_size_order)
> --
> 2.31.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/4] ext4: do not trim the group with corrupted block bitmap
2023-12-18 14:18 ` [PATCH 2/4] ext4: do not trim the group with corrupted block bitmap Baokun Li
@ 2023-12-18 15:02 ` Jan Kara
0 siblings, 0 replies; 15+ messages in thread
From: Jan Kara @ 2023-12-18 15:02 UTC (permalink / raw)
To: Baokun Li
Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
linux-kernel, yi.zhang, yangerkun, yukuai3
On Mon 18-12-23 22:18:12, Baokun Li wrote:
> Otherwise operating on an incorrupted block bitmap can lead to all sorts
> of unknown problems.
>
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
I agree this is a safer option. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/ext4/mballoc.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index d72b5e3c92ec..a95fa6e2b0f9 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6757,6 +6757,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
> bool set_trimmed = false;
> void *bitmap;
>
> + if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)))
> + return 0;
> +
> bitmap = e4b->bd_bitmap;
> if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
> set_trimmed = true;
> --
> 2.31.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt
2023-12-18 14:43 ` Jan Kara
@ 2023-12-18 15:09 ` Jan Kara
2023-12-19 8:02 ` Baokun Li
0 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2023-12-18 15:09 UTC (permalink / raw)
To: Baokun Li
Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
linux-kernel, yi.zhang, yangerkun, yukuai3, stable
On Mon 18-12-23 15:43:42, Jan Kara wrote:
> On Mon 18-12-23 22:18:14, Baokun Li wrote:
> > When bb_free is not 0 but bb_fragments is 0, return directly to avoid
> > system crash due to division by zero.
>
> How could this possibly happen? bb_fragments is the number of free space
> extents and bb_free is the number of free blocks. No free space extents =>
> no free blocks seems pretty obvious? You can see the logic in
> ext4_mb_generate_buddy()...
Oh, I see. This is probably about "bitmap corrupted case". But still both
allocation and freeing of blocks shouldn't operate on bitmaps marked as
corrupted so this should not happen?
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/4] ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
2023-12-18 14:18 ` [PATCH 3/4] ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks() Baokun Li
@ 2023-12-18 15:14 ` Jan Kara
2023-12-19 2:29 ` Baokun Li
2023-12-20 9:00 ` Baokun Li
0 siblings, 2 replies; 15+ messages in thread
From: Jan Kara @ 2023-12-18 15:14 UTC (permalink / raw)
To: Baokun Li
Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
linux-kernel, yi.zhang, yangerkun, yukuai3, stable
On Mon 18-12-23 22:18:13, Baokun Li wrote:
> After updating bb_free in mb_free_blocks, it is possible to return without
> updating bb_fragments because the block being freed is found to have
> already been freed, which leads to inconsistency between bb_free and
> bb_fragments.
>
> Since the group may be unlocked in ext4_grp_locked_error(), this can lead
> to problems such as dividing by zero when calculating the average fragment
> length. Therefore, to ensure consistency, move the update of bb_free to
> after the block double-free check.
>
> Fixes: eabe0444df90 ("ext4: speed-up releasing blocks on commit")
> CC: stable@vger.kernel.org # 3.10
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
I agree there's no point in updating the allocation info if the bitmap is
corrupted. We will not try to allocate (or free) blocks in that group
anymore. I'm just a bit unsure about the EXT4_FC_REPLAY state where we
don't mark the bitmap as corrupted although some blocks were already marked
as freed. So in this case the free space statistics tracking will go
permanently wrong. I'm kind of wondering in which case does fast-commit
free already freed blocks. Ted, any idea?
Honza
> ---
> fs/ext4/mballoc.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index a95fa6e2b0f9..2fbee0f0f5c3 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1892,11 +1892,6 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
> mb_check_buddy(e4b);
> mb_free_blocks_double(inode, e4b, first, count);
>
> - this_cpu_inc(discard_pa_seq);
> - e4b->bd_info->bb_free += count;
> - if (first < e4b->bd_info->bb_first_free)
> - e4b->bd_info->bb_first_free = first;
> -
> /* access memory sequentially: check left neighbour,
> * clear range and then check right neighbour
> */
> @@ -1922,9 +1917,14 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
> sb, e4b->bd_group,
> EXT4_GROUP_INFO_BBITMAP_CORRUPT);
> }
> - goto done;
> + return;
> }
>
> + this_cpu_inc(discard_pa_seq);
> + e4b->bd_info->bb_free += count;
> + if (first < e4b->bd_info->bb_first_free)
> + e4b->bd_info->bb_first_free = first;
> +
> /* let's maintain fragments counter */
> if (left_is_free && right_is_free)
> e4b->bd_info->bb_fragments--;
> @@ -1949,7 +1949,6 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
> if (first <= last)
> mb_buddy_mark_free(e4b, first >> 1, last >> 1);
>
> -done:
> mb_set_largest_free_order(sb, e4b->bd_info);
> mb_update_avg_fragment_size(sb, e4b->bd_info);
> mb_check_buddy(e4b);
> --
> 2.31.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] ext4: fix double-free of blocks due to wrong extents moved_len
2023-12-18 14:18 ` [PATCH 1/4] ext4: fix double-free of blocks due to wrong extents moved_len Baokun Li
@ 2023-12-18 15:32 ` Jan Kara
2023-12-19 1:51 ` Baokun Li
0 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2023-12-18 15:32 UTC (permalink / raw)
To: Baokun Li
Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
linux-kernel, yi.zhang, yangerkun, yukuai3, Wei Chen, xingwei lee,
stable
On Mon 18-12-23 22:18:11, Baokun Li wrote:
> In ext4_move_extents(), moved_len is only updated when all moves are
> successfully executed, and only discards orig_inode and donor_inode
> preallocations when moved_len is not zero. When the loop fails to exit
> after successfully moving some extents, moved_len is not updated and
> remains at 0, so it does not discard the preallocations.
>
> If the moved extents overlap with the preallocated extents, the
> overlapped extents are freed twice in ext4_mb_release_inode_pa() and
> ext4_process_freed_data() (as described in commit 94d7c16cbbbd), and
> bb_free is incremented twice. Hence when trim is executed, a zero-division
> bug is triggered in mb_update_avg_fragment_size() because bb_free is not
> zero and bb_fragments is zero.
>
> Therefore, update move_len after each extent move to avoid the issue.
>
> Reported-by: Wei Chen <harperchen1110@gmail.com>
> Reported-by: xingwei lee <xrivendell7@gmail.com>
> Closes: https://lore.kernel.org/r/CAO4mrferzqBUnCag8R3m2zf897ts9UEuhjFQGPtODT92rYyR2Q@mail.gmail.com
> Fixes: fcf6b1b729bc ("ext4: refactor ext4_move_extents code base")
> CC: stable@vger.kernel.org # 3.18
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> ---
> fs/ext4/move_extent.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
> index 3aa57376d9c2..4b9b503c6346 100644
> --- a/fs/ext4/move_extent.c
> +++ b/fs/ext4/move_extent.c
> @@ -672,7 +672,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
> */
> ext4_double_up_write_data_sem(orig_inode, donor_inode);
> /* Swap original branches with new branches */
> - move_extent_per_page(o_filp, donor_inode,
> + *moved_len += move_extent_per_page(o_filp, donor_inode,
> orig_page_index, donor_page_index,
> offset_in_page, cur_len,
> unwritten, &ret);
Although this is currently fine, I think ext4_move_extents() should be
careful and zero out *moved_len before it starts using it.
> @@ -682,7 +682,6 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
> o_start += cur_len;
> d_start += cur_len;
> }
> - *moved_len = o_start - orig_blk;
> if (*moved_len > len)
> *moved_len = len;
So I'm not sure the *moved_len > len condition can ever trigger but if it
does, we'd need to check it whenever we are returning moved_len. So either
I'd delete the condition or move it to the out label.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] ext4: fix double-free of blocks due to wrong extents moved_len
2023-12-18 15:32 ` Jan Kara
@ 2023-12-19 1:51 ` Baokun Li
0 siblings, 0 replies; 15+ messages in thread
From: Baokun Li @ 2023-12-19 1:51 UTC (permalink / raw)
To: Jan Kara
Cc: linux-ext4, tytso, adilger.kernel, ritesh.list, linux-kernel,
yi.zhang, yangerkun, yukuai3, Wei Chen, xingwei lee, stable,
Baokun Li
On 2023/12/18 23:32, Jan Kara wrote:
> On Mon 18-12-23 22:18:11, Baokun Li wrote:
>> In ext4_move_extents(), moved_len is only updated when all moves are
>> successfully executed, and only discards orig_inode and donor_inode
>> preallocations when moved_len is not zero. When the loop fails to exit
>> after successfully moving some extents, moved_len is not updated and
>> remains at 0, so it does not discard the preallocations.
>>
>> If the moved extents overlap with the preallocated extents, the
>> overlapped extents are freed twice in ext4_mb_release_inode_pa() and
>> ext4_process_freed_data() (as described in commit 94d7c16cbbbd), and
>> bb_free is incremented twice. Hence when trim is executed, a zero-division
>> bug is triggered in mb_update_avg_fragment_size() because bb_free is not
>> zero and bb_fragments is zero.
>>
>> Therefore, update move_len after each extent move to avoid the issue.
>>
>> Reported-by: Wei Chen <harperchen1110@gmail.com>
>> Reported-by: xingwei lee <xrivendell7@gmail.com>
>> Closes: https://lore.kernel.org/r/CAO4mrferzqBUnCag8R3m2zf897ts9UEuhjFQGPtODT92rYyR2Q@mail.gmail.com
>> Fixes: fcf6b1b729bc ("ext4: refactor ext4_move_extents code base")
>> CC: stable@vger.kernel.org # 3.18
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>> ---
>> fs/ext4/move_extent.c | 3 +--
>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
>> index 3aa57376d9c2..4b9b503c6346 100644
>> --- a/fs/ext4/move_extent.c
>> +++ b/fs/ext4/move_extent.c
>> @@ -672,7 +672,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
>> */
>> ext4_double_up_write_data_sem(orig_inode, donor_inode);
>> /* Swap original branches with new branches */
>> - move_extent_per_page(o_filp, donor_inode,
>> + *moved_len += move_extent_per_page(o_filp, donor_inode,
>> orig_page_index, donor_page_index,
>> offset_in_page, cur_len,
>> unwritten, &ret);
> Although this is currently fine, I think ext4_move_extents() should be
> careful and zero out *moved_len before it starts using it.
Totally agree, now __ext4_ioctl zeroes it out, but to avoid code logic
changes or new callers afterward, I'll zero it out before using it in
ext4_move_extents() in the next version.
>
>> @@ -682,7 +682,6 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
>> o_start += cur_len;
>> d_start += cur_len;
>> }
>> - *moved_len = o_start - orig_blk;
>> if (*moved_len > len)
>> *moved_len = len;
> So I'm not sure the *moved_len > len condition can ever trigger but if it
> does, we'd need to check it whenever we are returning moved_len. So either
> I'd delete the condition or move it to the out label.
>
> Honza
As the code stands now, the *moved_len > len condition never
occurs, and is supposed to be code left over from the refactoring
in [Fixes], which I'll remove in the next version.
Thank you for your review!
--
With Best Regards,
Baokun Li
.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/4] ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
2023-12-18 15:14 ` Jan Kara
@ 2023-12-19 2:29 ` Baokun Li
2023-12-20 9:00 ` Baokun Li
1 sibling, 0 replies; 15+ messages in thread
From: Baokun Li @ 2023-12-19 2:29 UTC (permalink / raw)
To: Jan Kara, Theodore Ts'o
Cc: linux-ext4, adilger.kernel, ritesh.list, linux-kernel, yi.zhang,
yangerkun, yukuai3, stable, Baokun Li
On 2023/12/18 23:14, Jan Kara wrote:
> On Mon 18-12-23 22:18:13, Baokun Li wrote:
>> After updating bb_free in mb_free_blocks, it is possible to return without
>> updating bb_fragments because the block being freed is found to have
>> already been freed, which leads to inconsistency between bb_free and
>> bb_fragments.
>>
>> Since the group may be unlocked in ext4_grp_locked_error(), this can lead
>> to problems such as dividing by zero when calculating the average fragment
>> length. Therefore, to ensure consistency, move the update of bb_free to
>> after the block double-free check.
>>
>> Fixes: eabe0444df90 ("ext4: speed-up releasing blocks on commit")
>> CC: stable@vger.kernel.org # 3.10
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> I agree there's no point in updating the allocation info if the bitmap is
> corrupted. We will not try to allocate (or free) blocks in that group
> anymore. I'm just a bit unsure about the EXT4_FC_REPLAY state where we
> don't mark the bitmap as corrupted although some blocks were already marked
> as freed. So in this case the free space statistics tracking will go
> permanently wrong. I'm kind of wondering in which case does fast-commit
> free already freed blocks. Ted, any idea?
>
> Honza
Some additional information, this judgment was introduced in
commit 8016e29f4362 ("ext4: fast commit recovery path") in v5.10-rc1,
at which point mb_regenerate_buddy() was called to regenerate the
buddy when it was found to be freeing a block that had already been
freed, so there was no problem. Until v5.11-rc1 commit 6bd97bf273bd
("ext4: remove redundant mb_regenerate_buddy()") removes the logic
to regenerate the buddy, it looks like the free space statistics will
remain wrong. If this normal scenario exists, perhaps buddy should
be regenerated here?
Thanks,
Baokun
>> ---
>> fs/ext4/mballoc.c | 13 ++++++-------
>> 1 file changed, 6 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index a95fa6e2b0f9..2fbee0f0f5c3 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -1892,11 +1892,6 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
>> mb_check_buddy(e4b);
>> mb_free_blocks_double(inode, e4b, first, count);
>>
>> - this_cpu_inc(discard_pa_seq);
>> - e4b->bd_info->bb_free += count;
>> - if (first < e4b->bd_info->bb_first_free)
>> - e4b->bd_info->bb_first_free = first;
>> -
>> /* access memory sequentially: check left neighbour,
>> * clear range and then check right neighbour
>> */
>> @@ -1922,9 +1917,14 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
>> sb, e4b->bd_group,
>> EXT4_GROUP_INFO_BBITMAP_CORRUPT);
>> }
>> - goto done;
>> + return;
>> }
>>
>> + this_cpu_inc(discard_pa_seq);
>> + e4b->bd_info->bb_free += count;
>> + if (first < e4b->bd_info->bb_first_free)
>> + e4b->bd_info->bb_first_free = first;
>> +
>> /* let's maintain fragments counter */
>> if (left_is_free && right_is_free)
>> e4b->bd_info->bb_fragments--;
>> @@ -1949,7 +1949,6 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
>> if (first <= last)
>> mb_buddy_mark_free(e4b, first >> 1, last >> 1);
>>
>> -done:
>> mb_set_largest_free_order(sb, e4b->bd_info);
>> mb_update_avg_fragment_size(sb, e4b->bd_info);
>> mb_check_buddy(e4b);
>> --
>> 2.31.1
>>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt
2023-12-18 15:09 ` Jan Kara
@ 2023-12-19 8:02 ` Baokun Li
2023-12-20 13:43 ` Baokun Li
0 siblings, 1 reply; 15+ messages in thread
From: Baokun Li @ 2023-12-19 8:02 UTC (permalink / raw)
To: Jan Kara
Cc: linux-ext4, tytso, adilger.kernel, ritesh.list, linux-kernel,
yi.zhang, yangerkun, yukuai3, stable, Baokun Li
On 2023/12/18 23:09, Jan Kara wrote:
> On Mon 18-12-23 15:43:42, Jan Kara wrote:
>> On Mon 18-12-23 22:18:14, Baokun Li wrote:
>>> When bb_free is not 0 but bb_fragments is 0, return directly to avoid
>>> system crash due to division by zero.
>> How could this possibly happen? bb_fragments is the number of free space
>> extents and bb_free is the number of free blocks. No free space extents =>
>> no free blocks seems pretty obvious? You can see the logic in
>> ext4_mb_generate_buddy()...
> Oh, I see. This is probably about "bitmap corrupted case". But still both
> allocation and freeing of blocks shouldn't operate on bitmaps marked as
> corrupted so this should not happen?
>
> Honza
Yes, we should make sure that we don't allocate or free blocks in
groups where the block bitmap has been marked as corrupt, but
there are still some issues here:
1. When a block bitmap is found to be corrupted, ext4_grp_locked_error()
is always called first, and only after that the block bitmap of the group
is marked as corrupted. In ext4_grp_locked_error(), the group may
be unlocked, and then other processes may be able to access the
corrupted bitmap. In this case, we can just put the marking of
corruption before ext4_grp_locked_error().
2. ext4_free_blocks() finds a corrupt bitmap can just return and do
nothing, because there is no problem with not freeing an exception
block. But mb_mark_used() has no logic for determining if a block
bitmap is corrupt, and its caller has no error handling logic, so
mb_mark_used() needs its caller to make sure that it doesn't allocate
blocks in a group with a corrupted block bitmap (which is why it
added the judgment in patch 2). However, it is possible to unlock group
between determining whether the group is corrupt and actually calling
mb_mark_used() to use those blocks. For example, when calling
mb_mark_used() in ext4_mb_try_best_found(), we are determining
whether the group's block bitmap is corrupted or not in the previous
ext4_mb_good_group(), but we are not determining it again when using
the blocks in ext4_mb_try_best_found(), at which point we may be
modifying the corrupted block bitmap.
3. Determine if a block bitmap is corrupted outside of a group lock
in ext4_mb_find_by_goal().
4. In ext4_mb_check_limits(), it may be possible to use the ac_b_ex
found in group 0 while holding a lock in group 1.
In addition to the above, there may be some corner cases that cause
inconsistencies, so here we determine if bb_fragments is 0 to avoid a
crash due to division by zero. Perhaps we could just replace
grp->bb_free == 0 with grp->bb_fragments == 0, which wouldn't look
so strange.
Thanks!
--
With Best Regards,
Baokun Li
.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/4] ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
2023-12-18 15:14 ` Jan Kara
2023-12-19 2:29 ` Baokun Li
@ 2023-12-20 9:00 ` Baokun Li
1 sibling, 0 replies; 15+ messages in thread
From: Baokun Li @ 2023-12-20 9:00 UTC (permalink / raw)
To: Jan Kara, harshadshirwadkar
Cc: linux-ext4, tytso, adilger.kernel, ritesh.list, linux-kernel,
yi.zhang, yangerkun, yukuai3, stable, Baokun Li
On 2023/12/18 23:14, Jan Kara wrote:
> On Mon 18-12-23 22:18:13, Baokun Li wrote:
>> After updating bb_free in mb_free_blocks, it is possible to return without
>> updating bb_fragments because the block being freed is found to have
>> already been freed, which leads to inconsistency between bb_free and
>> bb_fragments.
>>
>> Since the group may be unlocked in ext4_grp_locked_error(), this can lead
>> to problems such as dividing by zero when calculating the average fragment
>> length. Therefore, to ensure consistency, move the update of bb_free to
>> after the block double-free check.
>>
>> Fixes: eabe0444df90 ("ext4: speed-up releasing blocks on commit")
>> CC: stable@vger.kernel.org # 3.10
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> I agree there's no point in updating the allocation info if the bitmap is
> corrupted. We will not try to allocate (or free) blocks in that group
> anymore. I'm just a bit unsure about the EXT4_FC_REPLAY state where we
> don't mark the bitmap as corrupted although some blocks were already marked
> as freed. So in this case the free space statistics tracking will go
> permanently wrong. I'm kind of wondering in which case does fast-commit
> free already freed blocks. Ted, any idea?
>
> Honza
>
Hello Harshad!
Seeing that earlier you added EXT4_FC_REPLAY related judgment to
mb_free_blocks() in commit 8016e29f4362 ("ext4: fast commit recovery path"),
when there is EXT4_FC_REPLAY, even when freeing blocks that have already
been freed, the block bitmap is not marked as corrupted, is there a known
scenario here?
I would be grateful if you could shed some light on this!
Thanks,
Baokun
>> ---
>> fs/ext4/mballoc.c | 13 ++++++-------
>> 1 file changed, 6 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index a95fa6e2b0f9..2fbee0f0f5c3 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -1892,11 +1892,6 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
>> mb_check_buddy(e4b);
>> mb_free_blocks_double(inode, e4b, first, count);
>>
>> - this_cpu_inc(discard_pa_seq);
>> - e4b->bd_info->bb_free += count;
>> - if (first < e4b->bd_info->bb_first_free)
>> - e4b->bd_info->bb_first_free = first;
>> -
>> /* access memory sequentially: check left neighbour,
>> * clear range and then check right neighbour
>> */
>> @@ -1922,9 +1917,14 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
>> sb, e4b->bd_group,
>> EXT4_GROUP_INFO_BBITMAP_CORRUPT);
>> }
>> - goto done;
>> + return;
>> }
>>
>> + this_cpu_inc(discard_pa_seq);
>> + e4b->bd_info->bb_free += count;
>> + if (first < e4b->bd_info->bb_first_free)
>> + e4b->bd_info->bb_first_free = first;
>> +
>> /* let's maintain fragments counter */
>> if (left_is_free && right_is_free)
>> e4b->bd_info->bb_fragments--;
>> @@ -1949,7 +1949,6 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
>> if (first <= last)
>> mb_buddy_mark_free(e4b, first >> 1, last >> 1);
>>
>> -done:
>> mb_set_largest_free_order(sb, e4b->bd_info);
>> mb_update_avg_fragment_size(sb, e4b->bd_info);
>> mb_check_buddy(e4b);
>> --
>> 2.31.1
>>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt
2023-12-19 8:02 ` Baokun Li
@ 2023-12-20 13:43 ` Baokun Li
0 siblings, 0 replies; 15+ messages in thread
From: Baokun Li @ 2023-12-20 13:43 UTC (permalink / raw)
To: Jan Kara
Cc: linux-ext4, tytso, adilger.kernel, ritesh.list, linux-kernel,
yi.zhang, yangerkun, yukuai3, stable, Baokun Li
On 2023/12/19 16:02, Baokun Li wrote:
> On 2023/12/18 23:09, Jan Kara wrote:
>> On Mon 18-12-23 15:43:42, Jan Kara wrote:
>>> On Mon 18-12-23 22:18:14, Baokun Li wrote:
>>>> When bb_free is not 0 but bb_fragments is 0, return directly to avoid
>>>> system crash due to division by zero.
>>> How could this possibly happen? bb_fragments is the number of free
>>> space
>>> extents and bb_free is the number of free blocks. No free space
>>> extents =>
>>> no free blocks seems pretty obvious? You can see the logic in
>>> ext4_mb_generate_buddy()...
>> Oh, I see. This is probably about "bitmap corrupted case". But still
>> both
>> allocation and freeing of blocks shouldn't operate on bitmaps marked as
>> corrupted so this should not happen?
>>
>> Honza
> Yes, we should make sure that we don't allocate or free blocks in
> groups where the block bitmap has been marked as corrupt, but
> there are still some issues here:
>
> 1. When a block bitmap is found to be corrupted, ext4_grp_locked_error()
> is always called first, and only after that the block bitmap of the group
> is marked as corrupted. In ext4_grp_locked_error(), the group may
> be unlocked, and then other processes may be able to access the
> corrupted bitmap. In this case, we can just put the marking of
> corruption before ext4_grp_locked_error().
>
> 2. ext4_free_blocks() finds a corrupt bitmap can just return and do
> nothing, because there is no problem with not freeing an exception
> block. But mb_mark_used() has no logic for determining if a block
> bitmap is corrupt, and its caller has no error handling logic, so
> mb_mark_used() needs its caller to make sure that it doesn't allocate
> blocks in a group with a corrupted block bitmap (which is why it
> added the judgment in patch 2). However, it is possible to unlock group
> between determining whether the group is corrupt and actually calling
> mb_mark_used() to use those blocks. For example, when calling
> mb_mark_used() in ext4_mb_try_best_found(), we are determining
> whether the group's block bitmap is corrupted or not in the previous
> ext4_mb_good_group(), but we are not determining it again when using
> the blocks in ext4_mb_try_best_found(), at which point we may be
> modifying the corrupted block bitmap.
>
> 3. Determine if a block bitmap is corrupted outside of a group lock
> in ext4_mb_find_by_goal().
>
> 4. In ext4_mb_check_limits(), it may be possible to use the ac_b_ex
> found in group 0 while holding a lock in group 1.
I'm very sorry that the fourth point was wrong, I read "||" as "&&" in
ext4_mb_check_limits() :
if (finish_group || ac->ac_found > sbi->s_mb_min_to_scan)
>
> In addition to the above, there may be some corner cases that cause
> inconsistencies, so here we determine if bb_fragments is 0 to avoid a
> crash due to division by zero. Perhaps we could just replace
> grp->bb_free == 0 with grp->bb_fragments == 0, which wouldn't look
> so strange.
Thanks!
--
With Best Regards,
Baokun Li
.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2023-12-20 13:43 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-18 14:18 [PATCH 0/4] ext4: fix divide error in mb_update_avg_fragment_size() Baokun Li
2023-12-18 14:18 ` [PATCH 1/4] ext4: fix double-free of blocks due to wrong extents moved_len Baokun Li
2023-12-18 15:32 ` Jan Kara
2023-12-19 1:51 ` Baokun Li
2023-12-18 14:18 ` [PATCH 2/4] ext4: do not trim the group with corrupted block bitmap Baokun Li
2023-12-18 15:02 ` Jan Kara
2023-12-18 14:18 ` [PATCH 3/4] ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks() Baokun Li
2023-12-18 15:14 ` Jan Kara
2023-12-19 2:29 ` Baokun Li
2023-12-20 9:00 ` Baokun Li
2023-12-18 14:18 ` [PATCH 4/4] ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt Baokun Li
2023-12-18 14:43 ` Jan Kara
2023-12-18 15:09 ` Jan Kara
2023-12-19 8:02 ` Baokun Li
2023-12-20 13:43 ` Baokun Li
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox