* [PATCH AUTOSEL 6.7 11/21] ext4: enable dioread_nolock as default for bs < ps case
[not found] <20240116010422.217925-1-sashal@kernel.org>
@ 2024-01-16 1:03 ` Sasha Levin
2024-01-16 1:03 ` [PATCH AUTOSEL 6.7 12/21] ext4: treat end of range as exclusive in ext4_zero_range() Sasha Levin
` (4 subsequent siblings)
5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2024-01-16 1:03 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Ojaswin Mujoo, Ritesh Harjani, Theodore Ts'o, Sasha Levin,
adilger.kernel, linux-ext4
From: Ojaswin Mujoo <ojaswin@linux.ibm.com>
[ Upstream commit e89fdcc425b6feea4dfb33877e9256757905d763 ]
dioread_nolock was originally disabled as a default option for bs < ps
(block size smaller than page size) scenarios due to a data corruption
issue. Since then, fixes in this area have addressed such issues. Enable
dioread_nolock by default and remove the experimental warning message for
the bs < ps path.
dioread for bs < ps has been tested on a 64k pagesize machine using:
kvm-xfstest -C 3 -g auto
with the following configs:
64k adv bigalloc_4k bigalloc_64k data_journal encrypt
dioread_nolock dioread_nolock_4k ext3 ext3conv nojournal
No new regressions were seen compared to the baseline kernel.
Suggested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20231101154717.531865-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/ext4/super.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c5fcf377ab1f..783a755078cf 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2793,15 +2793,6 @@ static int ext4_check_opt_consistency(struct fs_context *fc,
return -EINVAL;
}
- if (ctx_test_mount_opt(ctx, EXT4_MOUNT_DIOREAD_NOLOCK)) {
- int blocksize =
- BLOCK_SIZE << le32_to_cpu(sbi->s_es->s_log_block_size);
- if (blocksize < PAGE_SIZE)
- ext4_msg(NULL, KERN_WARNING, "Warning: mounting with an "
- "experimental mount option 'dioread_nolock' "
- "for blocksize < PAGE_SIZE");
- }
-
err = ext4_check_test_dummy_encryption(fc, sb);
if (err)
return err;
@@ -4410,7 +4401,7 @@ static void ext4_set_def_opts(struct super_block *sb,
((def_mount_opts & EXT4_DEFM_NODELALLOC) == 0))
set_opt(sb, DELALLOC);
- if (sb->s_blocksize == PAGE_SIZE)
+ if (sb->s_blocksize <= PAGE_SIZE)
set_opt(sb, DIOREAD_NOLOCK);
}
--
2.43.0
* [PATCH AUTOSEL 6.7 12/21] ext4: treat end of range as exclusive in ext4_zero_range()
[not found] <20240116010422.217925-1-sashal@kernel.org>
2024-01-16 1:03 ` [PATCH AUTOSEL 6.7 11/21] ext4: enable dioread_nolock as default for bs < ps case Sasha Levin
@ 2024-01-16 1:03 ` Sasha Levin
2024-01-16 1:03 ` [PATCH AUTOSEL 6.7 18/21] ext4: fix inconsistent between segment fstrim and full fstrim Sasha Levin
` (3 subsequent siblings)
5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2024-01-16 1:03 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Ojaswin Mujoo, Jan Kara, Theodore Ts'o, Sasha Levin,
adilger.kernel, linux-ext4
From: Ojaswin Mujoo <ojaswin@linux.ibm.com>
[ Upstream commit 92573369144f40397e8514440afdf59f24905b40 ]
filemap_write_and_wait_range() treats the end of the range passed to it as
inclusive, while end in ext4_zero_range() is exclusive, so pass end - 1 to
the call.
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/e503107a7c73a2b68dec645c5ad798c437717c45.1698856309.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/ext4/extents.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index d5efe076d3d3..01299b55a567 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4523,7 +4523,8 @@ static long ext4_zero_range(struct file *file, loff_t offset,
* Round up offset. This is not fallocate, we need to zero out
* blocks, so convert interior block aligned part of the range to
* unwritten and possibly manually zero out unaligned parts of the
- * range.
+ * range. Here, start and partial_begin are inclusive, end and
+ * partial_end are exclusive.
*/
start = round_up(offset, 1 << blkbits);
end = round_down((offset + len), 1 << blkbits);
@@ -4609,7 +4610,8 @@ static long ext4_zero_range(struct file *file, loff_t offset,
* disk in case of crash before zeroing trans is committed.
*/
if (ext4_should_journal_data(inode)) {
- ret = filemap_write_and_wait_range(mapping, start, end);
+ ret = filemap_write_and_wait_range(mapping, start,
+ end - 1);
if (ret) {
filemap_invalidate_unlock(mapping);
goto out_mutex;
--
2.43.0
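Not part of the patch: a minimal userspace sketch of the half-open versus inclusive convention the commit message describes. The helpers `round_up_pow2()`, `round_down_pow2()`, `aligned_interior()` and `inclusive_last_byte()` are hypothetical stand-ins written for this illustration, not kernel functions; they only mimic the `round_up()`/`round_down()` arithmetic visible in the hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for round_up()/round_down();
 * both assume align is a power of two. */
static int64_t round_up_pow2(int64_t x, int64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

static int64_t round_down_pow2(int64_t x, int64_t align)
{
	return x & ~(align - 1);
}

/* Half-open byte range: start is inclusive, end is exclusive. */
struct byte_range {
	int64_t start;
	int64_t end;
};

/* Block-aligned interior of [offset, offset + len). */
static struct byte_range aligned_interior(int64_t offset, int64_t len,
					  unsigned int blkbits)
{
	struct byte_range r;

	assert(blkbits < 62);
	r.start = round_up_pow2(offset, (int64_t)1 << blkbits);
	r.end = round_down_pow2(offset + len, (int64_t)1 << blkbits);
	return r;
}

/* Last byte to hand to an API whose end argument is inclusive.
 * Only meaningful when r.end > r.start. */
static int64_t inclusive_last_byte(struct byte_range r)
{
	return r.end - 1;
}
```

With offset 1000, len 10000 and 4096-byte blocks, the interior is [4096, 8192), so an inclusive-end API should be given 8191; passing 8192 would also cover the first byte of the next block.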
* [PATCH AUTOSEL 6.7 18/21] ext4: fix inconsistent between segment fstrim and full fstrim
[not found] <20240116010422.217925-1-sashal@kernel.org>
2024-01-16 1:03 ` [PATCH AUTOSEL 6.7 11/21] ext4: enable dioread_nolock as default for bs < ps case Sasha Levin
2024-01-16 1:03 ` [PATCH AUTOSEL 6.7 12/21] ext4: treat end of range as exclusive in ext4_zero_range() Sasha Levin
@ 2024-01-16 1:03 ` Sasha Levin
2024-01-16 1:03 ` [PATCH AUTOSEL 6.7 19/21] ext4: unify the type of flexbg_size to unsigned int Sasha Levin
` (2 subsequent siblings)
5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2024-01-16 1:03 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Ye Bin, Jan Kara, Theodore Ts'o, Sasha Levin, adilger.kernel,
linux-ext4
From: Ye Bin <yebin10@huawei.com>
[ Upstream commit 68da4c44b994aea797eb9821acb3a4a36015293e ]
Suppose we issue two FITRIM ioctls for ranges [0,15] and [16,31] with
minimum length of trimmed range set to 8 blocks. If we have, say, a range of
blocks 10-22 free, this range will not be trimmed because it straddles the
boundary of the two FITRIM ranges and neither part is big enough. This is a
bit surprising to some users that call FITRIM on smaller ranges of blocks
to limit impact on the system. Also XFS trims all free space extents that
overlap with the specified range so we are inconsistent among filesystems.
Let's change ext4_try_to_trim_range() to consider for trimming the whole
free space extent that straddles the end of specified range, not just the
part of it within the range.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20231216010919.1995851-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/ext4/mballoc.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index d72b5e3c92ec..d195461123d8 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6753,13 +6753,15 @@ static int ext4_try_to_trim_range(struct super_block *sb,
__acquires(ext4_group_lock_ptr(sb, e4b->bd_group))
__releases(ext4_group_lock_ptr(sb, e4b->bd_group))
{
- ext4_grpblk_t next, count, free_count;
+ ext4_grpblk_t next, count, free_count, last, origin_start;
bool set_trimmed = false;
void *bitmap;
+ last = ext4_last_grp_cluster(sb, e4b->bd_group);
bitmap = e4b->bd_bitmap;
- if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
+ if (start == 0 && max >= last)
set_trimmed = true;
+ origin_start = start;
start = max(e4b->bd_info->bb_first_free, start);
count = 0;
free_count = 0;
@@ -6768,7 +6770,10 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
start = mb_find_next_zero_bit(bitmap, max + 1, start);
if (start > max)
break;
- next = mb_find_next_bit(bitmap, max + 1, start);
+
+ next = mb_find_next_bit(bitmap, last + 1, start);
+ if (origin_start == 0 && next >= last)
+ set_trimmed = true;
if ((next - start) >= minblocks) {
int ret = ext4_trim_extent(sb, start, next - start, e4b);
--
2.43.0
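Not part of the patch: a small userspace model of the example in the commit message (blocks 10-22 free, two FITRIM calls for [0,15] and [16,31], minimum length 8). The bitmap is a plain `bool` array and `find_next()`, `trim_range()` and `fill_example()` are hypothetical helpers for this sketch; the real code works on the buddy bitmap under the group lock and issues discards, none of which is modelled here.

```c
#include <assert.h>
#include <stdbool.h>

#define GROUP_BLOCKS 32

/* Index of the first entry equal to want in [from, limit), else limit. */
static int find_next(const bool *in_use, int limit, int from, bool want)
{
	int i = from;

	while (i < limit && in_use[i] != want)
		i++;
	return i;
}

/* Mark every block in use except the free extent 10..22. */
static void fill_example(bool *in_use)
{
	for (int i = 0; i < GROUP_BLOCKS; i++)
		in_use[i] = !(i >= 10 && i <= 22);
}

/*
 * Count the blocks that would be trimmed for the request [start, max].
 * whole_extent == false: a free extent is cut off at max (old behaviour).
 * whole_extent == true:  a free extent that begins inside the request is
 *                        followed to its real end (new behaviour).
 */
static int trim_range(const bool *in_use, int start, int max,
		      int minblocks, bool whole_extent)
{
	int last = GROUP_BLOCKS - 1;
	int trimmed = 0;

	while (start <= max) {
		start = find_next(in_use, max + 1, start, false);
		if (start > max)
			break;
		int scan_limit = whole_extent ? last + 1 : max + 1;
		int next = find_next(in_use, scan_limit, start, true);

		if (next - start >= minblocks)
			trimmed += next - start;
		start = next + 1;
	}
	return trimmed;
}
```

In this model the old behaviour trims nothing across the two calls (the pieces are 6 and 7 blocks long), while the new behaviour trims all 13 blocks during the first call because the extent starting at block 10 is followed past block 15.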
* [PATCH AUTOSEL 6.7 19/21] ext4: unify the type of flexbg_size to unsigned int
[not found] <20240116010422.217925-1-sashal@kernel.org>
` (2 preceding siblings ...)
2024-01-16 1:03 ` [PATCH AUTOSEL 6.7 18/21] ext4: fix inconsistent between segment fstrim and full fstrim Sasha Levin
@ 2024-01-16 1:03 ` Sasha Levin
2024-01-16 1:03 ` [PATCH AUTOSEL 6.7 20/21] ext4: remove unnecessary check from alloc_flex_gd() Sasha Levin
2024-01-16 1:03 ` [PATCH AUTOSEL 6.7 21/21] ext4: avoid online resizing failures due to oversized flex bg Sasha Levin
5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2024-01-16 1:03 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Baokun Li, Jan Kara, Theodore Ts'o, Sasha Levin,
adilger.kernel, linux-ext4
From: Baokun Li <libaokun1@huawei.com>
[ Upstream commit 658a52344fb139f9531e7543a6e0015b630feb38 ]
The maximum value of flexbg_size is 2^31, but the maximum value of int
is (2^31 - 1), so overflow may occur when the type of flexbg_size is
declared as int.
For example, when uninit_mask is initialized in ext4_alloc_group_tables(),
if flexbg_size == 2^31, the initialized uninit_mask is incorrect, and this
may cause set_flexbg_block_bitmap() to trigger a BUG_ON().
Therefore, the flexbg_size type is declared as unsigned int to avoid
overflow and memory waste.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20231023013057.2117948-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 5d1935ac02ca ("ext4: avoid online resizing failures due to oversized flex bg")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/ext4/resize.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 4fe061edefdd..c6d4539d4c1f 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -228,7 +228,7 @@ struct ext4_new_flex_group_data {
*
* Returns NULL on failure otherwise address of the allocated structure.
*/
-static struct ext4_new_flex_group_data *alloc_flex_gd(unsigned long flexbg_size)
+static struct ext4_new_flex_group_data *alloc_flex_gd(unsigned int flexbg_size)
{
struct ext4_new_flex_group_data *flex_gd;
@@ -283,7 +283,7 @@ static void free_flex_gd(struct ext4_new_flex_group_data *flex_gd)
*/
static int ext4_alloc_group_tables(struct super_block *sb,
struct ext4_new_flex_group_data *flex_gd,
- int flexbg_size)
+ unsigned int flexbg_size)
{
struct ext4_new_group_data *group_data = flex_gd->groups;
ext4_fsblk_t start_blk;
@@ -384,12 +384,12 @@ static int ext4_alloc_group_tables(struct super_block *sb,
group = group_data[0].group;
printk(KERN_DEBUG "EXT4-fs: adding a flex group with "
- "%d groups, flexbg size is %d:\n", flex_gd->count,
+ "%u groups, flexbg size is %u:\n", flex_gd->count,
flexbg_size);
for (i = 0; i < flex_gd->count; i++) {
ext4_debug(
- "adding %s group %u: %u blocks (%d free, %d mdata blocks)\n",
+ "adding %s group %u: %u blocks (%u free, %u mdata blocks)\n",
ext4_bg_has_super(sb, group + i) ? "normal" :
"no-super", group + i,
group_data[i].blocks_count,
@@ -1606,7 +1606,7 @@ static int ext4_flex_group_add(struct super_block *sb,
static int ext4_setup_next_flex_gd(struct super_block *sb,
struct ext4_new_flex_group_data *flex_gd,
ext4_fsblk_t n_blocks_count,
- unsigned long flexbg_size)
+ unsigned int flexbg_size)
{
struct ext4_sb_info *sbi = EXT4_SB(sb);
struct ext4_super_block *es = sbi->s_es;
@@ -1990,8 +1990,9 @@ int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count)
ext4_fsblk_t o_blocks_count;
ext4_fsblk_t n_blocks_count_retry = 0;
unsigned long last_update_time = 0;
- int err = 0, flexbg_size = 1 << sbi->s_log_groups_per_flex;
+ int err = 0;
int meta_bg;
+ unsigned int flexbg_size = ext4_flex_bg_size(sbi);
/* See if the device is actually as big as what was requested */
bh = ext4_sb_bread(sb, n_blocks_count - 1, 0);
--
2.43.0
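Not part of the patch: a minimal userspace sketch of why 2^31 needs an unsigned type. It assumes a 32-bit `unsigned int`, as on the platforms Linux supports. `flex_size_from_log()`, `fits_in_int()` and `flex_mask()` are made-up helpers for illustration; in particular `flex_mask()` is not the kernel's uninit_mask computation, only an example of a mask derived from a power-of-two size.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Flex group size for a given log2 value, computed in unsigned
 * arithmetic so that log == 31 is well defined. */
static unsigned int flex_size_from_log(unsigned int log)
{
	assert(log <= 31);
	return 1U << log;
}

/* True if v can be stored in a plain int without overflow. */
static bool fits_in_int(unsigned int v)
{
	return v <= (unsigned int)INT_MAX;
}

/* Mask that clears the low bits of a group number, assuming
 * flexbg_size is a power of two. */
static unsigned int flex_mask(unsigned int flexbg_size)
{
	return ~(flexbg_size - 1);
}
```

Here `flex_size_from_log(31)` is 2147483648, which `fits_in_int()` rejects because INT_MAX is 2147483647; computing the same value with a signed `1 << 31` would be undefined behaviour in C.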
* [PATCH AUTOSEL 6.7 20/21] ext4: remove unnecessary check from alloc_flex_gd()
[not found] <20240116010422.217925-1-sashal@kernel.org>
` (3 preceding siblings ...)
2024-01-16 1:03 ` [PATCH AUTOSEL 6.7 19/21] ext4: unify the type of flexbg_size to unsigned int Sasha Levin
@ 2024-01-16 1:03 ` Sasha Levin
2024-01-16 1:03 ` [PATCH AUTOSEL 6.7 21/21] ext4: avoid online resizing failures due to oversized flex bg Sasha Levin
5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2024-01-16 1:03 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Baokun Li, Jan Kara, Theodore Ts'o, Sasha Levin,
adilger.kernel, linux-ext4
From: Baokun Li <libaokun1@huawei.com>
[ Upstream commit b099eb87de105cf07cad731ded6fb40b2675108b ]
In commit 967ac8af4475 ("ext4: fix potential integer overflow in
alloc_flex_gd()"), an overflow check is added to alloc_flex_gd() to
prevent the allocated memory from being smaller than expected due to
the overflow. However, after kmalloc() is replaced with kmalloc_array()
in commit 6da2ec56059c ("treewide: kmalloc() -> kmalloc_array()"), the
kmalloc_array() function has an overflow check, so the above problem
will not occur. Therefore, the extra check is removed.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20231023013057.2117948-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 5d1935ac02ca ("ext4: avoid online resizing failures due to oversized flex bg")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/ext4/resize.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index c6d4539d4c1f..0a57b199883c 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -236,10 +236,7 @@ static struct ext4_new_flex_group_data *alloc_flex_gd(unsigned int flexbg_size)
if (flex_gd == NULL)
goto out3;
- if (flexbg_size >= UINT_MAX / sizeof(struct ext4_new_group_data))
- goto out2;
flex_gd->count = flexbg_size;
-
flex_gd->groups = kmalloc_array(flexbg_size,
sizeof(struct ext4_new_group_data),
GFP_NOFS);
--
2.43.0
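Not part of the patch: a userspace analogue of the kind of overflow check the commit message attributes to kmalloc_array(). `checked_alloc_array()` is a hypothetical helper built on `malloc()`; it is not the kernel implementation, only a sketch of why a separate caller-side check becomes redundant once the allocator refuses an overflowing `n * size`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate room for n elements of the given size. Returns NULL rather
 * than a too-small buffer when n * size would overflow size_t.
 */
static void *checked_alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}
```

A request such as `checked_alloc_array(SIZE_MAX, 2)` fails cleanly, so the caller only has to test the returned pointer, which alloc_flex_gd() already does.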
* [PATCH AUTOSEL 6.7 21/21] ext4: avoid online resizing failures due to oversized flex bg
[not found] <20240116010422.217925-1-sashal@kernel.org>
` (4 preceding siblings ...)
2024-01-16 1:03 ` [PATCH AUTOSEL 6.7 20/21] ext4: remove unnecessary check from alloc_flex_gd() Sasha Levin
@ 2024-01-16 1:03 ` Sasha Levin
5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2024-01-16 1:03 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Baokun Li, Jan Kara, Theodore Ts'o, Sasha Levin,
adilger.kernel, linux-ext4
From: Baokun Li <libaokun1@huawei.com>
[ Upstream commit 5d1935ac02ca5aee364a449a35e2977ea84509b0 ]
When we online resize an ext4 filesystem with an oversized flexbg_size,
mkfs.ext4 -F -G 67108864 $dev -b 4096 100M
mount $dev $dir
resize2fs $dev 16G
the following WARN_ON is triggered:
==================================================================
WARNING: CPU: 0 PID: 427 at mm/page_alloc.c:4402 __alloc_pages+0x411/0x550
Modules linked in: sg(E)
CPU: 0 PID: 427 Comm: resize2fs Tainted: G E 6.6.0-rc5+ #314
RIP: 0010:__alloc_pages+0x411/0x550
Call Trace:
<TASK>
__kmalloc_large_node+0xa2/0x200
__kmalloc+0x16e/0x290
ext4_resize_fs+0x481/0xd80
__ext4_ioctl+0x1616/0x1d90
ext4_ioctl+0x12/0x20
__x64_sys_ioctl+0xf0/0x150
do_syscall_64+0x3b/0x90
==================================================================
This is because flexbg_size is too large, so the new_group_data array to be
allocated exceeds the largest allocation the page allocator permits
(PAGE_SIZE << MAX_ORDER). Currently, the minimum value of MAX_ORDER is 8 and
the minimum value of PAGE_SIZE is 4096, so the corresponding maximum number
of groups that can be allocated is:
(PAGE_SIZE << MAX_ORDER) / sizeof(struct ext4_new_group_data) ≈ 21845
Rounded down to a power of 2, this is 16384. Therefore, this value is
defined as MAX_RESIZE_BG: during resizing, at most this many groups are
added at a time, and groups are added in multiple rounds to complete the
online resize. The difference is that the metadata in a flex_bg may be more
dispersed.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20231023013057.2117948-4-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/ext4/resize.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 0a57b199883c..e168a9f59600 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -218,10 +218,17 @@ struct ext4_new_flex_group_data {
in the flex group */
__u16 *bg_flags; /* block group flags of groups
in @groups */
+ ext4_group_t resize_bg; /* number of allocated
+ new_group_data */
ext4_group_t count; /* number of groups in @groups
*/
};
+/*
+ * Avoiding memory allocation failures due to too many groups added each time.
+ */
+#define MAX_RESIZE_BG 16384
+
/*
* alloc_flex_gd() allocates a ext4_new_flex_group_data with size of
* @flexbg_size.
@@ -236,14 +243,18 @@ static struct ext4_new_flex_group_data *alloc_flex_gd(unsigned int flexbg_size)
if (flex_gd == NULL)
goto out3;
- flex_gd->count = flexbg_size;
- flex_gd->groups = kmalloc_array(flexbg_size,
+ if (unlikely(flexbg_size > MAX_RESIZE_BG))
+ flex_gd->resize_bg = MAX_RESIZE_BG;
+ else
+ flex_gd->resize_bg = flexbg_size;
+
+ flex_gd->groups = kmalloc_array(flex_gd->resize_bg,
sizeof(struct ext4_new_group_data),
GFP_NOFS);
if (flex_gd->groups == NULL)
goto out2;
- flex_gd->bg_flags = kmalloc_array(flexbg_size, sizeof(__u16),
+ flex_gd->bg_flags = kmalloc_array(flex_gd->resize_bg, sizeof(__u16),
GFP_NOFS);
if (flex_gd->bg_flags == NULL)
goto out1;
@@ -1602,8 +1613,7 @@ static int ext4_flex_group_add(struct super_block *sb,
static int ext4_setup_next_flex_gd(struct super_block *sb,
struct ext4_new_flex_group_data *flex_gd,
- ext4_fsblk_t n_blocks_count,
- unsigned int flexbg_size)
+ ext4_fsblk_t n_blocks_count)
{
struct ext4_sb_info *sbi = EXT4_SB(sb);
struct ext4_super_block *es = sbi->s_es;
@@ -1627,7 +1637,7 @@ static int ext4_setup_next_flex_gd(struct super_block *sb,
BUG_ON(last);
ext4_get_group_no_and_offset(sb, n_blocks_count - 1, &n_group, &last);
- last_group = group | (flexbg_size - 1);
+ last_group = group | (flex_gd->resize_bg - 1);
if (last_group > n_group)
last_group = n_group;
@@ -2130,8 +2140,7 @@ int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count)
/* Add flex groups. Note that a regular group is a
* flex group with 1 group.
*/
- while (ext4_setup_next_flex_gd(sb, flex_gd, n_blocks_count,
- flexbg_size)) {
+ while (ext4_setup_next_flex_gd(sb, flex_gd, n_blocks_count)) {
if (time_is_before_jiffies(last_update_time + HZ * 10)) {
if (last_update_time)
ext4_msg(sb, KERN_INFO,
--
2.43.0
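Not part of the patch: a userspace sketch that reproduces the arithmetic in the commit message and the chunking expression `group | (flex_gd->resize_bg - 1)` from the hunk. The element size of 48 bytes is an assumption inferred from the quoted result of 21845; `max_groups()`, `floor_pow2()` and `chunk_last_group()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements of elem_size bytes that fit in the largest
 * allocation, taken here as page_size << max_order. */
static unsigned long max_groups(unsigned long page_size,
				unsigned int max_order, size_t elem_size)
{
	return (page_size << max_order) / elem_size;
}

/* Largest power of two that is <= n (n must be nonzero). */
static unsigned long floor_pow2(unsigned long n)
{
	unsigned long p = 1;

	assert(n != 0);
	while (p <= n / 2)
		p *= 2;
	return p;
}

/*
 * Last group handled in the round that contains group, where chunk is
 * a power of two, clamped to the final group n_group of the resize.
 */
static unsigned int chunk_last_group(unsigned int group, unsigned int chunk,
				     unsigned int n_group)
{
	unsigned int last = group | (chunk - 1);

	return last > n_group ? n_group : last;
}
```

Under the 48-byte assumption, `max_groups(4096, 8, 48)` is 21845 and `floor_pow2(21845)` is 16384, matching MAX_RESIZE_BG. With a chunk of 16384, a resize starting at group 0 covers groups 0..16383 in the first round and continues from group 16384 in the next, stopping early at n_group when the target is reached.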