* [PATCH v2 0/3] ext4: tighten mount-time superblock geometry validation
@ 2026-06-08 11:11 Baokun Li
2026-06-08 11:11 ` [PATCH v2 1/3] ext4: reject mount if clusters/inodes per group are not 8-aligned Baokun Li
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Baokun Li @ 2026-06-08 11:11 UTC (permalink / raw)
To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, ojaswin, ritesh.list
Changes since v1:
* Patch 1: Removed a spurious newline in the error message format string.
* Added Patches 2 and 3 to fix additional issues reported by Sashiko
(independent of Patch 1).
v1: https://patch.msgid.link/20260608061112.392391-1-libaokun@linux.alibaba.com
This series adds missing mount-time sanity checks for superblock
geometry parameters, preventing crafted filesystem images from causing
bitmap checksum corruption, integer overflow, or out-of-bounds inode
table access.
Patch 1 rejects filesystems where s_clusters_per_group or
s_inodes_per_group is not 8-aligned, since the bitmap checksum
functions operate on whole bytes and would leave trailing bits
unprotected.
Patch 2 reduces EXT4_MAX_CLUSTER_LOG_SIZE from 30 to 28 to match
the documented 256MB limit in mke2fs, preventing a 32-bit overflow
in the blocks-per-group consistency check on bigalloc filesystems.
Patch 3 rejects filesystems where s_inodes_per_group is not a
multiple of s_inodes_per_block, preventing truncation in the
s_itb_per_group calculation that could lead __ext4_get_inode_loc()
to read beyond the inode table.
Baokun Li (3):
ext4: reject mount if clusters/inodes per group are not 8-aligned
ext4: reduce max cluster size to match documented 256MB limit
ext4: reject mount if inodes per group is not a multiple of inodes per
block
fs/ext4/ext4.h | 2 +-
fs/ext4/super.c | 11 +++++++----
2 files changed, 8 insertions(+), 5 deletions(-)
--
2.43.7
^ permalink raw reply [flat|nested] 14+ messages in thread* [PATCH v2 1/3] ext4: reject mount if clusters/inodes per group are not 8-aligned 2026-06-08 11:11 [PATCH v2 0/3] ext4: tighten mount-time superblock geometry validation Baokun Li @ 2026-06-08 11:11 ` Baokun Li 2026-06-08 12:53 ` Zhang Yi 2026-06-09 10:57 ` Jan Kara 2026-06-08 11:11 ` [PATCH v2 2/3] ext4: reduce max cluster size to match documented 256MB limit Baokun Li 2026-06-08 11:11 ` [PATCH v2 3/3] ext4: reject mount if inodes per group is not a multiple of inodes per block Baokun Li 2 siblings, 2 replies; 14+ messages in thread From: Baokun Li @ 2026-06-08 11:11 UTC (permalink / raw) To: linux-ext4 Cc: tytso, adilger.kernel, jack, yi.zhang, ojaswin, ritesh.list, Sashiko The block and inode bitmap checksums are computed over a whole number of bytes: ext4_inode_bitmap_csum_*() use EXT4_INODES_PER_GROUP(sb) >> 3 and ext4_block_bitmap_csum_*() use EXT4_CLUSTERS_PER_GROUP(sb) / 8 as the length passed to ext4_chksum(). If s_inodes_per_group or s_clusters_per_group is not a multiple of 8, the trailing fractional bits are excluded from the checksum. Those bits are then unprotected, and any incremental csum update path that assumes a byte-aligned bitmap can compute a checksum inconsistent with the full recalculation, corrupting the on-disk bitmap checksum. Reject such filesystems at mount time by adding the missing " & 7" alignment checks alongside the existing range validation. Suggested-by: Theodore Ts'o <tytso@mit.edu> Link: https://patch.msgid.link/h3n7jlfhyna64dn5o76qxcspnhxdddcs6crpxftmy7gnl7b3sx@jenszfpcsnit Reported-by: Sashiko <sashiko-bot@kernel.org> Closes: https://sashiko.dev/#/patchset/20260508121539.4174601-1-libaokun%40linux.alibaba.com?part=10 Signed-off-by: Baokun Li <libaokun@linux.alibaba.com> --- fs/ext4/super.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 6a77db4d3124..3ddcb4a8d4db 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -4472,8 +4472,9 @@ static int ext4_handle_clustersize(struct super_block *sb) sbi->s_cluster_bits = 0; } sbi->s_clusters_per_group = le32_to_cpu(es->s_clusters_per_group); - if (sbi->s_clusters_per_group > sb->s_blocksize * 8) { - ext4_msg(sb, KERN_ERR, "#clusters per group too big: %lu", + if (sbi->s_clusters_per_group > sb->s_blocksize * 8 || + sbi->s_clusters_per_group & 7) { + ext4_msg(sb, KERN_ERR, "invalid #clusters per group: %lu", sbi->s_clusters_per_group); return -EINVAL; } @@ -5304,8 +5305,9 @@ static int ext4_block_group_meta_init(struct super_block *sb, int silent) return -EINVAL; } if (sbi->s_inodes_per_group < sbi->s_inodes_per_block || - sbi->s_inodes_per_group > sb->s_blocksize * 8) { - ext4_msg(sb, KERN_ERR, "invalid inodes per group: %lu\n", + sbi->s_inodes_per_group > sb->s_blocksize * 8 || + sbi->s_inodes_per_group & 7) { + ext4_msg(sb, KERN_ERR, "invalid inodes per group: %lu", sbi->s_inodes_per_group); return -EINVAL; } -- 2.43.7 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/3] ext4: reject mount if clusters/inodes per group are not 8-aligned 2026-06-08 11:11 ` [PATCH v2 1/3] ext4: reject mount if clusters/inodes per group are not 8-aligned Baokun Li @ 2026-06-08 12:53 ` Zhang Yi 2026-06-09 10:57 ` Jan Kara 1 sibling, 0 replies; 14+ messages in thread From: Zhang Yi @ 2026-06-08 12:53 UTC (permalink / raw) To: Baokun Li, linux-ext4 Cc: tytso, adilger.kernel, jack, ojaswin, ritesh.list, Sashiko On 6/8/2026 7:11 PM, Baokun Li wrote: > The block and inode bitmap checksums are computed over a whole number of > bytes: ext4_inode_bitmap_csum_*() use EXT4_INODES_PER_GROUP(sb) >> 3 and > ext4_block_bitmap_csum_*() use EXT4_CLUSTERS_PER_GROUP(sb) / 8 as the > length passed to ext4_chksum(). > > If s_inodes_per_group or s_clusters_per_group is not a multiple of 8, the > trailing fractional bits are excluded from the checksum. Those bits are > then unprotected, and any incremental csum update path that assumes a > byte-aligned bitmap can compute a checksum inconsistent with the full > recalculation, corrupting the on-disk bitmap checksum. > > Reject such filesystems at mount time by adding the missing " & 7" > alignment checks alongside the existing range validation. > > Suggested-by: Theodore Ts'o <tytso@mit.edu> > Link: https://patch.msgid.link/h3n7jlfhyna64dn5o76qxcspnhxdddcs6crpxftmy7gnl7b3sx@jenszfpcsnit > Reported-by: Sashiko <sashiko-bot@kernel.org> > Closes: https://sashiko.dev/#/patchset/20260508121539.4174601-1-libaokun%40linux.alibaba.com?part=10 > Signed-off-by: Baokun Li <libaokun@linux.alibaba.com> Looks good to me. Thanks! Reviewed-by: Zhang Yi <yi.zhang@huawei.com> > --- > fs/ext4/super.c | 10 ++++++---- > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index 6a77db4d3124..3ddcb4a8d4db 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -4472,8 +4472,9 @@ static int ext4_handle_clustersize(struct super_block *sb) > sbi->s_cluster_bits = 0; > } > sbi->s_clusters_per_group = le32_to_cpu(es->s_clusters_per_group); > - if (sbi->s_clusters_per_group > sb->s_blocksize * 8) { > - ext4_msg(sb, KERN_ERR, "#clusters per group too big: %lu", > + if (sbi->s_clusters_per_group > sb->s_blocksize * 8 || > + sbi->s_clusters_per_group & 7) { > + ext4_msg(sb, KERN_ERR, "invalid #clusters per group: %lu", > sbi->s_clusters_per_group); > return -EINVAL; > } > @@ -5304,8 +5305,9 @@ static int ext4_block_group_meta_init(struct super_block *sb, int silent) > return -EINVAL; > } > if (sbi->s_inodes_per_group < sbi->s_inodes_per_block || > - sbi->s_inodes_per_group > sb->s_blocksize * 8) { > - ext4_msg(sb, KERN_ERR, "invalid inodes per group: %lu\n", > + sbi->s_inodes_per_group > sb->s_blocksize * 8 || > + sbi->s_inodes_per_group & 7) { > + ext4_msg(sb, KERN_ERR, "invalid inodes per group: %lu", > sbi->s_inodes_per_group); > return -EINVAL; > } ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/3] ext4: reject mount if clusters/inodes per group are not 8-aligned 2026-06-08 11:11 ` [PATCH v2 1/3] ext4: reject mount if clusters/inodes per group are not 8-aligned Baokun Li 2026-06-08 12:53 ` Zhang Yi @ 2026-06-09 10:57 ` Jan Kara 1 sibling, 0 replies; 14+ messages in thread From: Jan Kara @ 2026-06-09 10:57 UTC (permalink / raw) To: Baokun Li Cc: linux-ext4, tytso, adilger.kernel, jack, yi.zhang, ojaswin, ritesh.list, Sashiko On Mon 08-06-26 19:11:48, Baokun Li wrote: > The block and inode bitmap checksums are computed over a whole number of > bytes: ext4_inode_bitmap_csum_*() use EXT4_INODES_PER_GROUP(sb) >> 3 and > ext4_block_bitmap_csum_*() use EXT4_CLUSTERS_PER_GROUP(sb) / 8 as the > length passed to ext4_chksum(). > > If s_inodes_per_group or s_clusters_per_group is not a multiple of 8, the > trailing fractional bits are excluded from the checksum. Those bits are > then unprotected, and any incremental csum update path that assumes a > byte-aligned bitmap can compute a checksum inconsistent with the full > recalculation, corrupting the on-disk bitmap checksum. > > Reject such filesystems at mount time by adding the missing " & 7" > alignment checks alongside the existing range validation. > > Suggested-by: Theodore Ts'o <tytso@mit.edu> > Link: https://patch.msgid.link/h3n7jlfhyna64dn5o76qxcspnhxdddcs6crpxftmy7gnl7b3sx@jenszfpcsnit > Reported-by: Sashiko <sashiko-bot@kernel.org> > Closes: https://sashiko.dev/#/patchset/20260508121539.4174601-1-libaokun%40linux.alibaba.com?part=10 > Signed-off-by: Baokun Li <libaokun@linux.alibaba.com> Looks good. Feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > fs/ext4/super.c | 10 ++++++---- > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index 6a77db4d3124..3ddcb4a8d4db 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -4472,8 +4472,9 @@ static int ext4_handle_clustersize(struct super_block *sb) > sbi->s_cluster_bits = 0; > } > sbi->s_clusters_per_group = le32_to_cpu(es->s_clusters_per_group); > - if (sbi->s_clusters_per_group > sb->s_blocksize * 8) { > - ext4_msg(sb, KERN_ERR, "#clusters per group too big: %lu", > + if (sbi->s_clusters_per_group > sb->s_blocksize * 8 || > + sbi->s_clusters_per_group & 7) { > + ext4_msg(sb, KERN_ERR, "invalid #clusters per group: %lu", > sbi->s_clusters_per_group); > return -EINVAL; > } > @@ -5304,8 +5305,9 @@ static int ext4_block_group_meta_init(struct super_block *sb, int silent) > return -EINVAL; > } > if (sbi->s_inodes_per_group < sbi->s_inodes_per_block || > - sbi->s_inodes_per_group > sb->s_blocksize * 8) { > - ext4_msg(sb, KERN_ERR, "invalid inodes per group: %lu\n", > + sbi->s_inodes_per_group > sb->s_blocksize * 8 || > + sbi->s_inodes_per_group & 7) { > + ext4_msg(sb, KERN_ERR, "invalid inodes per group: %lu", > sbi->s_inodes_per_group); > return -EINVAL; > } > -- > 2.43.7 > -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 2/3] ext4: reduce max cluster size to match documented 256MB limit 2026-06-08 11:11 [PATCH v2 0/3] ext4: tighten mount-time superblock geometry validation Baokun Li 2026-06-08 11:11 ` [PATCH v2 1/3] ext4: reject mount if clusters/inodes per group are not 8-aligned Baokun Li @ 2026-06-08 11:11 ` Baokun Li 2026-06-08 18:24 ` Andreas Dilger ` (2 more replies) 2026-06-08 11:11 ` [PATCH v2 3/3] ext4: reject mount if inodes per group is not a multiple of inodes per block Baokun Li 2 siblings, 3 replies; 14+ messages in thread From: Baokun Li @ 2026-06-08 11:11 UTC (permalink / raw) To: linux-ext4 Cc: tytso, adilger.kernel, jack, yi.zhang, ojaswin, ritesh.list, Sashiko The mke2fs man page documents: Valid cluster-size values are from 2048 to 256M bytes per cluster. but EXT4_MAX_CLUSTER_LOG_SIZE was set to 30 (1GB), allowing crafted filesystem images to specify cluster sizes up to 1GB. On 32-bit systems with bigalloc enabled, the consistency check in ext4_handle_clustersize(): s_blocks_per_group == s_clusters_per_group * (clustersize / blocksize) can overflow when the cluster ratio is large enough. Since s_blocks_per_group is not range-checked in the bigalloc path, the wrapped product can pass the consistency check, leading to inconsistent group geometry and potential out-of-bounds block allocation. Reduce EXT4_MAX_CLUSTER_LOG_SIZE to 28 to match the documented 256MB limit. With this cap, the maximum product is: (blocksize * 8) * (256M / blocksize) = 2^31 which fits safely in a 32-bit unsigned long for all block sizes. Reported-by: Sashiko <sashiko-bot@kernel.org> Closes: https://sashiko.dev/#/patchset/20260608061112.392391-1-libaokun%40linux.alibaba.com Signed-off-by: Baokun Li <libaokun@linux.alibaba.com> --- fs/ext4/ext4.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 94283a991e5c..11e41a864db8 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -334,7 +334,7 @@ struct ext4_io_submit { #define EXT4_MAX_BLOCK_SIZE 65536 #define EXT4_MIN_BLOCK_LOG_SIZE 10 #define EXT4_MAX_BLOCK_LOG_SIZE 16 -#define EXT4_MAX_CLUSTER_LOG_SIZE 30 +#define EXT4_MAX_CLUSTER_LOG_SIZE 28 #ifdef __KERNEL__ # define EXT4_BLOCK_SIZE(s) ((s)->s_blocksize) #else -- 2.43.7 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/3] ext4: reduce max cluster size to match documented 256MB limit 2026-06-08 11:11 ` [PATCH v2 2/3] ext4: reduce max cluster size to match documented 256MB limit Baokun Li @ 2026-06-08 18:24 ` Andreas Dilger 2026-06-09 2:48 ` Zhang Yi 2026-06-09 11:04 ` Jan Kara 2 siblings, 0 replies; 14+ messages in thread From: Andreas Dilger @ 2026-06-08 18:24 UTC (permalink / raw) To: Baokun Li Cc: linux-ext4, tytso, jack, yi.zhang, ojaswin, ritesh.list, Sashiko On Jun 8, 2026, at 05:11, Baokun Li <libaokun@linux.alibaba.com> wrote: > > The mke2fs man page documents: > > Valid cluster-size values are from 2048 to 256M bytes per cluster. > > but EXT4_MAX_CLUSTER_LOG_SIZE was set to 30 (1GB), allowing crafted > filesystem images to specify cluster sizes up to 1GB. > > On 32-bit systems with bigalloc enabled, the consistency check in > ext4_handle_clustersize(): > > s_blocks_per_group == s_clusters_per_group * (clustersize / blocksize) > > can overflow when the cluster ratio is large enough. Since > s_blocks_per_group is not range-checked in the bigalloc path, the > wrapped product can pass the consistency check, leading to inconsistent > group geometry and potential out-of-bounds block allocation. > > Reduce EXT4_MAX_CLUSTER_LOG_SIZE to 28 to match the documented 256MB > limit. With this cap, the maximum product is: > > (blocksize * 8) * (256M / blocksize) = 2^31 > > which fits safely in a 32-bit unsigned long for all block sizes. > > Reported-by: Sashiko <sashiko-bot@kernel.org> > Closes: https://sashiko.dev/#/patchset/20260608061112.392391-1-libaokun%40linux.alibaba.com > Signed-off-by: Baokun Li <libaokun@linux.alibaba.com> It should be pretty safe. I doubt anyone could be using > 1GiB clusters for real-world usage at this point, given the nature of the bug and the amount of space used for metadata (e.g. 1GiB per directory, etc). At that granularity they can use LVM for space management. Reviewed-by: Andreas Dilger <adilger@dilger.ca <mailto:adilger@dilger.ca>> > --- > fs/ext4/ext4.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h > index 94283a991e5c..11e41a864db8 100644 > --- a/fs/ext4/ext4.h > +++ b/fs/ext4/ext4.h > @@ -334,7 +334,7 @@ struct ext4_io_submit { > #define EXT4_MAX_BLOCK_SIZE 65536 > #define EXT4_MIN_BLOCK_LOG_SIZE 10 > #define EXT4_MAX_BLOCK_LOG_SIZE 16 > -#define EXT4_MAX_CLUSTER_LOG_SIZE 30 > +#define EXT4_MAX_CLUSTER_LOG_SIZE 28 > #ifdef __KERNEL__ > # define EXT4_BLOCK_SIZE(s) ((s)->s_blocksize) > #else > -- > 2.43.7 > Cheers, Andreas ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/3] ext4: reduce max cluster size to match documented 256MB limit 2026-06-08 11:11 ` [PATCH v2 2/3] ext4: reduce max cluster size to match documented 256MB limit Baokun Li 2026-06-08 18:24 ` Andreas Dilger @ 2026-06-09 2:48 ` Zhang Yi 2026-06-09 9:43 ` Baokun Li 2026-06-09 11:04 ` Jan Kara 2 siblings, 1 reply; 14+ messages in thread From: Zhang Yi @ 2026-06-09 2:48 UTC (permalink / raw) To: Baokun Li, linux-ext4 Cc: tytso, adilger.kernel, jack, ojaswin, ritesh.list, Sashiko Hi, Baokun! On 6/8/2026 7:11 PM, Baokun Li wrote: > The mke2fs man page documents: > > Valid cluster-size values are from 2048 to 256M bytes per cluster. Hmm, I checked the mke2fs(8)[1] and didn't find this sentence, instead, it said: Valid cluster-size values range from 2 to 32768 times the filesystem block size and must be a power of 2. However, the implementation of mkfs does not seem to respect this constraint and instead directly uses static macros to limit the user-specified cluster size. #define EXT2_MIN_BLOCK_LOG_SIZE 10 /* 1024 */ #define EXT2_MIN_CLUSTER_LOG_SIZE EXT2_MIN_BLOCK_LOG_SIZE #define EXT2_MAX_CLUSTER_LOG_SIZE 29 /* 512MB */ This is confusing, or perhaps I missed something. If I understand correctly, users can now format an image with a maximum cluster size of 512 MB (I tried it, and it worked). If the kernel's EXT4_MAX_CLUSTER_LOG_SIZE is limited to 28, this would cause such existing images to become unmountable, even though I don't think images with such a large cluster size actually exist in practice. Therefore, I'm not sure this is safe. [1] https://man7.org/linux/man-pages/man8/mke2fs.8.html > > but EXT4_MAX_CLUSTER_LOG_SIZE was set to 30 (1GB), allowing crafted > filesystem images to specify cluster sizes up to 1GB. > > On 32-bit systems with bigalloc enabled, the consistency check in > ext4_handle_clustersize(): > > s_blocks_per_group == s_clusters_per_group * (clustersize / blocksize) > > can overflow when the cluster ratio is large enough. Since > s_blocks_per_group is not range-checked in the bigalloc path, the > wrapped product can pass the consistency check, leading to inconsistent > group geometry and potential out-of-bounds block allocation. > > Reduce EXT4_MAX_CLUSTER_LOG_SIZE to 28 to match the documented 256MB > limit. With this cap, the maximum product is: > > (blocksize * 8) * (256M / blocksize) = 2^31 > > which fits safely in a 32-bit unsigned long for all block sizes. > > Reported-by: Sashiko <sashiko-bot@kernel.org> > Closes: https://sashiko.dev/#/patchset/20260608061112.392391-1-libaokun%40linux.alibaba.com > Signed-off-by: Baokun Li <libaokun@linux.alibaba.com> > --- > fs/ext4/ext4.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h > index 94283a991e5c..11e41a864db8 100644 > --- a/fs/ext4/ext4.h > +++ b/fs/ext4/ext4.h > @@ -334,7 +334,7 @@ struct ext4_io_submit { > #define EXT4_MAX_BLOCK_SIZE 65536 > #define EXT4_MIN_BLOCK_LOG_SIZE 10 > #define EXT4_MAX_BLOCK_LOG_SIZE 16 > -#define EXT4_MAX_CLUSTER_LOG_SIZE 30 > +#define EXT4_MAX_CLUSTER_LOG_SIZE 28 > #ifdef __KERNEL__ > # define EXT4_BLOCK_SIZE(s) ((s)->s_blocksize) > #else ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/3] ext4: reduce max cluster size to match documented 256MB limit 2026-06-09 2:48 ` Zhang Yi @ 2026-06-09 9:43 ` Baokun Li 0 siblings, 0 replies; 14+ messages in thread From: Baokun Li @ 2026-06-09 9:43 UTC (permalink / raw) To: Zhang Yi Cc: linux-ext4, tytso, adilger.kernel, jack, ojaswin, ritesh.list, Sashiko On 2026/6/9 10:48, Zhang Yi wrote: > Hi, Baokun! Hi Yi, Thank you for you review! > On 6/8/2026 7:11 PM, Baokun Li wrote: >> The mke2fs man page documents: >> >> Valid cluster-size values are from 2048 to 256M bytes per cluster. > Hmm, I checked the mke2fs(8)[1] and didn't find this sentence, instead, > it said: > > Valid cluster-size values range from 2 to 32768 times the filesystem > block size and must be a power of 2. Oh, I was indeed looking at a slightly older man page. It was changed to the current description in commit 87cbb381f2e2 ("e2fsprogs: misc/mke2fs.8.in: Correct valid cluster-size values"). That commit assumes the number of blocks in a cluster cannot exceed the maximum number of blocks that a single extent can hold (EXT_INIT_MAX_LEN = 32768). I believe this is wrong — a single cluster can be covered by multiple extents. So we should fix the bug in ext_falloc_helper() rather than working around it by adding a restriction to the documentation. The root cause is in ext_falloc_helper(): max_uninit_len = EXT_UNINIT_MAX_LEN & ~EXT2FS_CLUSTER_MASK(fs); max_init_len = EXT_INIT_MAX_LEN & ~EXT2FS_CLUSTER_MASK(fs); When cluster_ratio >= EXT_INIT_MAX_LEN (32768), the `& ~CLUSTER_MASK` operation zeroes out both values, causing ext2fs_new_range() to receive len=0 and return EXT2_ET_INVALID_ARGUMENT. The kernel handles this correctly — ext4_ext_map_blocks() simply truncates m_len to EXT_INIT_MAX_LEN and lets the caller loop. On subsequent iterations, get_implied_cluster_alloc() detects that the requested block falls within an already-allocated cluster by examining adjacent extents, and returns the physical address without allocating a new cluster. I'll send a patch to fix ext_falloc_helper() so that when cluster_ratio >= EXT_INIT_MAX_LEN, the allocation loop can create multiple extents within the same cluster — skipping ext2fs_new_range() and claim_range() for mid-cluster iterations where the physical blocks are already claimed. > > However, the implementation of mkfs does not seem to respect this > constraint and instead directly uses static macros to limit the > user-specified cluster size. > > #define EXT2_MIN_BLOCK_LOG_SIZE 10 /* 1024 */ > #define EXT2_MIN_CLUSTER_LOG_SIZE EXT2_MIN_BLOCK_LOG_SIZE > #define EXT2_MAX_CLUSTER_LOG_SIZE 29 /* 512MB */ I was going to set EXT2_MAX_CLUSTER_LOG_SIZE to 28 to align with the kernel's EXT4_MAX_CLUSTER_LOG_SIZE. > > This is confusing, or perhaps I missed something. If I understand > correctly, users can now format an image with a maximum cluster size of > 512 MB (I tried it, and it worked). Yes, a 512M cluster size also works correctly on the current kernel. The upper limit on cluster size here is to avoid the 32-bit overflow issue that Sashiko mentioned — theoretically the maximum could be 512M. It's just that the old mke2fs documentation originally specified 256M as the intended maximum, so both the kernel and e2fsprogs target 256M as the upper limit. If anyone prefers 512M, I'm happy to change it. > If the kernel's > EXT4_MAX_CLUSTER_LOG_SIZE is limited to 28, this would cause such > existing images to become unmountable, even though I don't think images > with such a large cluster size actually exist in practice. Therefore, > I'm not sure this is safe. > > [1] https://man7.org/linux/man-pages/man8/mke2fs.8.html > I personally don't think such images are likely to exist in practice, since formatting with the default 4K block size would already fail. Thanks, Baokun ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/3] ext4: reduce max cluster size to match documented 256MB limit 2026-06-08 11:11 ` [PATCH v2 2/3] ext4: reduce max cluster size to match documented 256MB limit Baokun Li 2026-06-08 18:24 ` Andreas Dilger 2026-06-09 2:48 ` Zhang Yi @ 2026-06-09 11:04 ` Jan Kara 2 siblings, 0 replies; 14+ messages in thread From: Jan Kara @ 2026-06-09 11:04 UTC (permalink / raw) To: Baokun Li Cc: linux-ext4, tytso, adilger.kernel, jack, yi.zhang, ojaswin, ritesh.list, Sashiko On Mon 08-06-26 19:11:49, Baokun Li wrote: > The mke2fs man page documents: > > Valid cluster-size values are from 2048 to 256M bytes per cluster. > > but EXT4_MAX_CLUSTER_LOG_SIZE was set to 30 (1GB), allowing crafted > filesystem images to specify cluster sizes up to 1GB. > > On 32-bit systems with bigalloc enabled, the consistency check in > ext4_handle_clustersize(): > > s_blocks_per_group == s_clusters_per_group * (clustersize / blocksize) > > can overflow when the cluster ratio is large enough. Since > s_blocks_per_group is not range-checked in the bigalloc path, the > wrapped product can pass the consistency check, leading to inconsistent > group geometry and potential out-of-bounds block allocation. > > Reduce EXT4_MAX_CLUSTER_LOG_SIZE to 28 to match the documented 256MB > limit. With this cap, the maximum product is: > > (blocksize * 8) * (256M / blocksize) = 2^31 > > which fits safely in a 32-bit unsigned long for all block sizes. > > Reported-by: Sashiko <sashiko-bot@kernel.org> > Closes: https://sashiko.dev/#/patchset/20260608061112.392391-1-libaokun%40linux.alibaba.com > Signed-off-by: Baokun Li <libaokun@linux.alibaba.com> I guess I'll leave it up to Ted to decide whether 28 or 29 in the right value. Both are fine with me if we are consistent with them :) Honza > --- > fs/ext4/ext4.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h > index 94283a991e5c..11e41a864db8 100644 > --- a/fs/ext4/ext4.h > +++ b/fs/ext4/ext4.h > @@ -334,7 +334,7 @@ struct ext4_io_submit { > #define EXT4_MAX_BLOCK_SIZE 65536 > #define EXT4_MIN_BLOCK_LOG_SIZE 10 > #define EXT4_MAX_BLOCK_LOG_SIZE 16 > -#define EXT4_MAX_CLUSTER_LOG_SIZE 30 > +#define EXT4_MAX_CLUSTER_LOG_SIZE 28 > #ifdef __KERNEL__ > # define EXT4_BLOCK_SIZE(s) ((s)->s_blocksize) > #else > -- > 2.43.7 > -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 3/3] ext4: reject mount if inodes per group is not a multiple of inodes per block 2026-06-08 11:11 [PATCH v2 0/3] ext4: tighten mount-time superblock geometry validation Baokun Li 2026-06-08 11:11 ` [PATCH v2 1/3] ext4: reject mount if clusters/inodes per group are not 8-aligned Baokun Li 2026-06-08 11:11 ` [PATCH v2 2/3] ext4: reduce max cluster size to match documented 256MB limit Baokun Li @ 2026-06-08 11:11 ` Baokun Li 2026-06-08 18:27 ` Andreas Dilger ` (2 more replies) 2 siblings, 3 replies; 14+ messages in thread From: Baokun Li @ 2026-06-08 11:11 UTC (permalink / raw) To: linux-ext4 Cc: tytso, adilger.kernel, jack, yi.zhang, ojaswin, ritesh.list, Sashiko If s_inodes_per_group is not a multiple of s_inodes_per_block, the division that computes s_itb_per_group truncates, reserving fewer blocks for the inode table than needed. On a crafted filesystem image, this allows __ext4_get_inode_loc() to compute a block offset beyond the inode table, reading unrelated data as an inode structure. Add the missing divisibility check alongside the existing validation in ext4_block_group_meta_init(). Reported-by: Sashiko <sashiko-bot@kernel.org> Closes: https://sashiko.dev/#/patchset/20260608061112.392391-1-libaokun%40linux.alibaba.com Signed-off-by: Baokun Li <libaokun@linux.alibaba.com> --- fs/ext4/super.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 3ddcb4a8d4db..5ec9e1ef00c0 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5306,7 +5306,8 @@ static int ext4_block_group_meta_init(struct super_block *sb, int silent) } if (sbi->s_inodes_per_group < sbi->s_inodes_per_block || sbi->s_inodes_per_group > sb->s_blocksize * 8 || - sbi->s_inodes_per_group & 7) { + sbi->s_inodes_per_group & 7 || + sbi->s_inodes_per_group % sbi->s_inodes_per_block) { ext4_msg(sb, KERN_ERR, "invalid inodes per group: %lu", sbi->s_inodes_per_group); return -EINVAL; -- 2.43.7 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/3] ext4: reject mount if inodes per group is not a multiple of inodes per block 2026-06-08 11:11 ` [PATCH v2 3/3] ext4: reject mount if inodes per group is not a multiple of inodes per block Baokun Li @ 2026-06-08 18:27 ` Andreas Dilger 2026-06-09 1:46 ` Baokun Li 2026-06-09 3:33 ` Zhang Yi 2026-06-09 11:00 ` Jan Kara 2 siblings, 1 reply; 14+ messages in thread From: Andreas Dilger @ 2026-06-08 18:27 UTC (permalink / raw) To: Baokun Li Cc: linux-ext4, tytso, jack, yi.zhang, ojaswin, ritesh.list, Sashiko On Jun 8, 2026, at 05:11, Baokun Li <libaokun@linux.alibaba.com> wrote: > > If s_inodes_per_group is not a multiple of s_inodes_per_block, the > division that computes s_itb_per_group truncates, reserving fewer blocks > for the inode table than needed. > > On a crafted filesystem image, this allows __ext4_get_inode_loc() to > compute a block offset beyond the inode table, reading unrelated data as > an inode structure. > > Add the missing divisibility check alongside the existing validation in > ext4_block_group_meta_init(). > > Reported-by: Sashiko <sashiko-bot@kernel.org> > Closes: https://sashiko.dev/#/patchset/20260608061112.392391-1-libaokun%40linux.alibaba.com > Signed-off-by: Baokun Li <libaokun@linux.alibaba.com> Looks good. Is this also fixed/checked in e2fsprogs? There is really no reason *not* to use all the space in the last itable block for inodes. Reviewed-by: Andreas Dilger <adilger@dilger.ca <mailto:adilger@dilger.ca>> > --- > fs/ext4/super.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index 3ddcb4a8d4db..5ec9e1ef00c0 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -5306,7 +5306,8 @@ static int ext4_block_group_meta_init(struct super_block *sb, > } > if (sbi->s_inodes_per_group < sbi->s_inodes_per_block || > sbi->s_inodes_per_group > sb->s_blocksize * 8 || > - sbi->s_inodes_per_group & 7) { > + sbi->s_inodes_per_group & 7 || > + sbi->s_inodes_per_group % sbi->s_inodes_per_block) { > ext4_msg(sb, KERN_ERR, "invalid inodes per group: %lu", > sbi->s_inodes_per_group); > return -EINVAL; > -- > 2.43.7 > Cheers, Andreas ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/3] ext4: reject mount if inodes per group is not a multiple of inodes per block 2026-06-08 18:27 ` Andreas Dilger @ 2026-06-09 1:46 ` Baokun Li 0 siblings, 0 replies; 14+ messages in thread From: Baokun Li @ 2026-06-09 1:46 UTC (permalink / raw) To: Andreas Dilger Cc: linux-ext4, tytso, jack, yi.zhang, ojaswin, ritesh.list, Sashiko On 2026/6/9 02:27, Andreas Dilger wrote: > On Jun 8, 2026, at 05:11, Baokun Li <libaokun@linux.alibaba.com> wrote: >> If s_inodes_per_group is not a multiple of s_inodes_per_block, the >> division that computes s_itb_per_group truncates, reserving fewer blocks >> for the inode table than needed. >> >> On a crafted filesystem image, this allows __ext4_get_inode_loc() to >> compute a block offset beyond the inode table, reading unrelated data as >> an inode structure. >> >> Add the missing divisibility check alongside the existing validation in >> ext4_block_group_meta_init(). >> >> Reported-by: Sashiko <sashiko-bot@kernel.org> >> Closes: https://sashiko.dev/#/patchset/20260608061112.392391-1-libaokun%40linux.alibaba.com >> Signed-off-by: Baokun Li <libaokun@linux.alibaba.com> Hi Andreas, Thank you for your review! > Looks good. Is this also fixed/checked in e2fsprogs? Yes, a similar patchset adding analogous checks to e2fsprogs is also in the works. > There is really no reason *not* to use all the space in the last itable block for inodes. > > Reviewed-by: Andreas Dilger <adilger@dilger.ca <mailto:adilger@dilger.ca>> Indeed. Images created by mke2fs do not exhibit this issue; it only occurs with certain crafted, malformed images. Thanks, Baokun ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/3] ext4: reject mount if inodes per group is not a multiple of inodes per block 2026-06-08 11:11 ` [PATCH v2 3/3] ext4: reject mount if inodes per group is not a multiple of inodes per block Baokun Li 2026-06-08 18:27 ` Andreas Dilger @ 2026-06-09 3:33 ` Zhang Yi 2026-06-09 11:00 ` Jan Kara 2 siblings, 0 replies; 14+ messages in thread From: Zhang Yi @ 2026-06-09 3:33 UTC (permalink / raw) To: Baokun Li, linux-ext4 Cc: tytso, adilger.kernel, jack, ojaswin, ritesh.list, Sashiko On 6/8/2026 7:11 PM, Baokun Li wrote: > If s_inodes_per_group is not a multiple of s_inodes_per_block, the > division that computes s_itb_per_group truncates, reserving fewer blocks > for the inode table than needed. > > On a crafted filesystem image, this allows __ext4_get_inode_loc() to > compute a block offset beyond the inode table, reading unrelated data as > an inode structure. > > Add the missing divisibility check alongside the existing validation in > ext4_block_group_meta_init(). > > Reported-by: Sashiko <sashiko-bot@kernel.org> > Closes: https://sashiko.dev/#/patchset/20260608061112.392391-1-libaokun%40linux.alibaba.com > Signed-off-by: Baokun Li <libaokun@linux.alibaba.com> Looks good to me. Reviewed-by: Zhang Yi <yi.zhang@huawei.com> > --- > fs/ext4/super.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index 3ddcb4a8d4db..5ec9e1ef00c0 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -5306,7 +5306,8 @@ static int ext4_block_group_meta_init(struct super_block *sb, int silent) > } > if (sbi->s_inodes_per_group < sbi->s_inodes_per_block || > sbi->s_inodes_per_group > sb->s_blocksize * 8 || > - sbi->s_inodes_per_group & 7) { > + sbi->s_inodes_per_group & 7 || > + sbi->s_inodes_per_group % sbi->s_inodes_per_block) { > ext4_msg(sb, KERN_ERR, "invalid inodes per group: %lu", > sbi->s_inodes_per_group); > return -EINVAL; ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/3] ext4: reject mount if inodes per group is not a multiple of inodes per block 2026-06-08 11:11 ` [PATCH v2 3/3] ext4: reject mount if inodes per group is not a multiple of inodes per block Baokun Li 2026-06-08 18:27 ` Andreas Dilger 2026-06-09 3:33 ` Zhang Yi @ 2026-06-09 11:00 ` Jan Kara 2 siblings, 0 replies; 14+ messages in thread From: Jan Kara @ 2026-06-09 11:00 UTC (permalink / raw) To: Baokun Li Cc: linux-ext4, tytso, adilger.kernel, jack, yi.zhang, ojaswin, ritesh.list, Sashiko On Mon 08-06-26 19:11:50, Baokun Li wrote: > If s_inodes_per_group is not a multiple of s_inodes_per_block, the > division that computes s_itb_per_group truncates, reserving fewer blocks > for the inode table than needed. > > On a crafted filesystem image, this allows __ext4_get_inode_loc() to > compute a block offset beyond the inode table, reading unrelated data as > an inode structure. > > Add the missing divisibility check alongside the existing validation in > ext4_block_group_meta_init(). > > Reported-by: Sashiko <sashiko-bot@kernel.org> > Closes: https://sashiko.dev/#/patchset/20260608061112.392391-1-libaokun%40linux.alibaba.com > Signed-off-by: Baokun Li <libaokun@linux.alibaba.com> Looks good. Feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > fs/ext4/super.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index 3ddcb4a8d4db..5ec9e1ef00c0 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -5306,7 +5306,8 @@ static int ext4_block_group_meta_init(struct super_block *sb, int silent) > } > if (sbi->s_inodes_per_group < sbi->s_inodes_per_block || > sbi->s_inodes_per_group > sb->s_blocksize * 8 || > - sbi->s_inodes_per_group & 7) { > + sbi->s_inodes_per_group & 7 || > + sbi->s_inodes_per_group % sbi->s_inodes_per_block) { > ext4_msg(sb, KERN_ERR, "invalid inodes per group: %lu", > sbi->s_inodes_per_group); > return -EINVAL; > -- > 2.43.7 > -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2026-06-09 11:04 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-08 11:11 [PATCH v2 0/3] ext4: tighten mount-time superblock geometry validation Baokun Li 2026-06-08 11:11 ` [PATCH v2 1/3] ext4: reject mount if clusters/inodes per group are not 8-aligned Baokun Li 2026-06-08 12:53 ` Zhang Yi 2026-06-09 10:57 ` Jan Kara 2026-06-08 11:11 ` [PATCH v2 2/3] ext4: reduce max cluster size to match documented 256MB limit Baokun Li 2026-06-08 18:24 ` Andreas Dilger 2026-06-09 2:48 ` Zhang Yi 2026-06-09 9:43 ` Baokun Li 2026-06-09 11:04 ` Jan Kara 2026-06-08 11:11 ` [PATCH v2 3/3] ext4: reject mount if inodes per group is not a multiple of inodes per block Baokun Li 2026-06-08 18:27 ` Andreas Dilger 2026-06-09 1:46 ` Baokun Li 2026-06-09 3:33 ` Zhang Yi 2026-06-09 11:00 ` Jan Kara
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox