public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Fix default orphan file size calculations
@ 2026-03-11 14:11 Theodore Ts'o
  2026-03-11 15:01 ` Andreas Dilger
  2026-03-11 16:27 ` Baokun Li
  0 siblings, 2 replies; 5+ messages in thread
From: Theodore Ts'o @ 2026-03-11 14:11 UTC (permalink / raw)
  To: Ext4 Developers List; +Cc: Baokun Li, Theodore Ts'o

Due to missing parenthesis around the #defines for
EXT4_{MAX,DEFAULT}_ORPHAN_FILE_SIZE, the default number of blocks
assigned to the orphan file was calculated as 2 when the file system
has more than 2**21 blocks:

   % mke2fs  -t ext4 -Fq /tmp/foo.img 8191M
   /tmp/foo.img contains a ext4 file system
           created on Wed Mar 11 09:54:04 2026
   % debugfs -R "extents <12>" /tmp/foo.img
   debugfs 1.47.4 (6-Mar-2025)
   Level Entries       Logical          Physical Length Flags
    0/ 0   1/  1     0 -   510    9255 -    9765    511

   % mke2fs  -t ext4 -Fq /tmp/foo.img 8192M
   /tmp/foo.img contains a ext4 file system
           created on Wed Mar 11 09:54:11 2026
   % debugfs -R "extents <12>" /tmp/foo.img
   debugfs 1.47.4 (6-Mar-2025)
   Level Entries       Logical          Physical Length Flags
    0/ 0   1/  1     0 -     1    9255 -    9256      2  <======

In addition, looking at how the defaults was calculated, there were
some unit mismatches where number of blocks was compared with number
of bytes, and number of blocks divided by the blocksize.  As a result,
the number of blocks was much larger than before when the file system
blocksize was 1k, and much smaller than when the file system was 64k,
which was a bit surprising.

This caused a libblockdev regression test to fail when it created an
ext4 file system using a 1k blocksize and had a size of 138240k.
Before e2fsprogs 1.47.4, the size of the orphan file was 33k; but in
e2fsprogs 1.47.4, the size of the orphan file became 135k.  This
pushed the overhead just over 10%, which triggered the test failure.

So simplify the calculation so that default of the orphan file ranges
from 32 file system blocks and EXT4_DEFAULT_ORPHAN_SIZE, and avoids
the use of the file system blocksize.

Fixes: 6f03c698ef53 ("libext2fs: fix orphan file size > kernel limit with large blocksize")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 lib/ext2fs/ext2fs.h |  4 ++--
 lib/ext2fs/orphan.c | 12 ++++++------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index d9df007c4..92da4d140 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1819,8 +1819,8 @@ errcode_t ext2fs_set_data_io(ext2_filsys fs, io_channel new_io);
 errcode_t ext2fs_rewrite_to_io(ext2_filsys fs, io_channel new_io);
 
 /* orphan.c */
-#define EXT4_MAX_ORPHAN_FILE_SIZE	8 << 20
-#define EXT4_DEFAULT_ORPHAN_FILE_SIZE	2 << 20
+#define EXT4_MAX_ORPHAN_FILE_SIZE	(8 << 20)
+#define EXT4_DEFAULT_ORPHAN_FILE_SIZE	(2 << 20)
 extern errcode_t ext2fs_create_orphan_file(ext2_filsys fs, blk_t num_blocks);
 extern errcode_t ext2fs_truncate_orphan_file(ext2_filsys fs);
 extern e2_blkcnt_t ext2fs_default_orphan_file_blocks(ext2_filsys fs);
diff --git a/lib/ext2fs/orphan.c b/lib/ext2fs/orphan.c
index 40b1c5c72..b894f2468 100644
--- a/lib/ext2fs/orphan.c
+++ b/lib/ext2fs/orphan.c
@@ -218,18 +218,18 @@ out:
 
 /*
  * Find reasonable size for orphan file. We choose orphan file size to be
- * between 32 filesystem blocks and EXT4_DEFAULT_ORPHAN_FILE_SIZE, and not
- * more than 1/fs->blocksize of the filesystem unless it is really small.
+ * between 32 filesystem blocks and EXT4_DEFAULT_ORPHAN_FILE_SIZE.
  */
 e2_blkcnt_t ext2fs_default_orphan_file_blocks(ext2_filsys fs)
 {
 	__u64 num_blocks = ext2fs_blocks_count(fs->super);
-	e2_blkcnt_t blks = EXT4_DEFAULT_ORPHAN_FILE_SIZE / fs->blocksize;
+	e2_blkcnt_t blks = num_blocks / 4096;
+	e2_blkcnt_t max_blks = EXT4_DEFAULT_ORPHAN_FILE_SIZE / fs->blocksize;
 
-	if (num_blocks < 128 * 1024)
+	if (blks < 32)
 		blks = 32;
-	else if (num_blocks < EXT4_DEFAULT_ORPHAN_FILE_SIZE)
-		blks = num_blocks / fs->blocksize;
+	if (blks > max_blks)
+		blks = max_blks;
 	return (blks + EXT2FS_CLUSTER_MASK(fs)) & ~EXT2FS_CLUSTER_MASK(fs);
 }
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] Fix default orphan file size calculations
  2026-03-11 14:11 [PATCH] Fix default orphan file size calculations Theodore Ts'o
@ 2026-03-11 15:01 ` Andreas Dilger
  2026-03-11 16:27 ` Baokun Li
  1 sibling, 0 replies; 5+ messages in thread
From: Andreas Dilger @ 2026-03-11 15:01 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Ext4 Developers List, Baokun Li

On Mar 11, 2026, at 08:11, Theodore Ts'o <tytso@mit.edu> wrote:
> 
> Due to missing parenthesis around the #defines for
> EXT4_{MAX,DEFAULT}_ORPHAN_FILE_SIZE, the default number of blocks
> assigned to the orphan file was calculated as 2 when the file system
> has more than 2**21 blocks:
> 
>   % mke2fs  -t ext4 -Fq /tmp/foo.img 8191M
>   /tmp/foo.img contains a ext4 file system
>           created on Wed Mar 11 09:54:04 2026
>   % debugfs -R "extents <12>" /tmp/foo.img
>   debugfs 1.47.4 (6-Mar-2025)
>   Level Entries       Logical          Physical Length Flags
>    0/ 0   1/  1     0 -   510    9255 -    9765    511
> 
>   % mke2fs  -t ext4 -Fq /tmp/foo.img 8192M
>   /tmp/foo.img contains a ext4 file system
>           created on Wed Mar 11 09:54:11 2026
>   % debugfs -R "extents <12>" /tmp/foo.img
>   debugfs 1.47.4 (6-Mar-2025)
>   Level Entries       Logical          Physical Length Flags
>    0/ 0   1/  1     0 -     1    9255 -    9256      2  <======
> 
> In addition, looking at how the defaults was calculated, there were
> some unit mismatches where number of blocks was compared with number
> of bytes, and number of blocks divided by the blocksize.  As a result,
> the number of blocks was much larger than before when the file system
> blocksize was 1k, and much smaller than when the file system was 64k,
> which was a bit surprising.
> 
> This caused a libblockdev regression test to fail when it created an
> ext4 file system using a 1k blocksize and had a size of 138240k.
> Before e2fsprogs 1.47.4, the size of the orphan file was 33k; but in
> e2fsprogs 1.47.4, the size of the orphan file became 135k.  This
> pushed the overhead just over 10%, which triggered the test failure.
> 
> So simplify the calculation so that default of the orphan file ranges
> from 32 file system blocks and EXT4_DEFAULT_ORPHAN_SIZE, and avoids
> the use of the file system blocksize.
> 
> Fixes: 6f03c698ef53 ("libext2fs: fix orphan file size > kernel limit with large blocksize")
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> ---
> lib/ext2fs/ext2fs.h |  4 ++--
> lib/ext2fs/orphan.c | 12 ++++++------
> 2 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> index d9df007c4..92da4d140 100644
> --- a/lib/ext2fs/ext2fs.h
> +++ b/lib/ext2fs/ext2fs.h
> @@ -1819,8 +1819,8 @@ errcode_t ext2fs_set_data_io(ext2_filsys fs, io_channel new_io);
> errcode_t ext2fs_rewrite_to_io(ext2_filsys fs, io_channel new_io);
> 
> /* orphan.c */
> -#define EXT4_MAX_ORPHAN_FILE_SIZE 8 << 20
> -#define EXT4_DEFAULT_ORPHAN_FILE_SIZE 2 << 20
> +#define EXT4_MAX_ORPHAN_FILE_SIZE (8 << 20)
> +#define EXT4_DEFAULT_ORPHAN_FILE_SIZE (2 << 20)

Some of the issues here would be avoided with better variable naming,
like "EXT4_MAX_ORPHAN_FILE_BYTES" and "EXT4_DEFAULT_ORPHAN_FILE_BYTES"
rather than "size", which is a bit ambiguous.

Cheers, Andreas

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Fix default orphan file size calculations
  2026-03-11 14:11 [PATCH] Fix default orphan file size calculations Theodore Ts'o
  2026-03-11 15:01 ` Andreas Dilger
@ 2026-03-11 16:27 ` Baokun Li
  2026-03-11 17:27   ` Theodore Tso
  1 sibling, 1 reply; 5+ messages in thread
From: Baokun Li @ 2026-03-11 16:27 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Ext4 Developers List, libaokun


On 3/11/26 10:11 PM, Theodore Ts'o wrote:
> Due to missing parenthesis around the #defines for
> EXT4_{MAX,DEFAULT}_ORPHAN_FILE_SIZE, the default number of blocks
> assigned to the orphan file was calculated as 2 when the file system
> has more than 2**21 blocks:
>
>    % mke2fs  -t ext4 -Fq /tmp/foo.img 8191M
>    /tmp/foo.img contains a ext4 file system
>            created on Wed Mar 11 09:54:04 2026
>    % debugfs -R "extents <12>" /tmp/foo.img
>    debugfs 1.47.4 (6-Mar-2025)
>    Level Entries       Logical          Physical Length Flags
>     0/ 0   1/  1     0 -   510    9255 -    9765    511
>
>    % mke2fs  -t ext4 -Fq /tmp/foo.img 8192M
>    /tmp/foo.img contains a ext4 file system
>            created on Wed Mar 11 09:54:11 2026
>    % debugfs -R "extents <12>" /tmp/foo.img
>    debugfs 1.47.4 (6-Mar-2025)
>    Level Entries       Logical          Physical Length Flags
>     0/ 0   1/  1     0 -     1    9255 -    9256      2  <======
>
> In addition, looking at how the defaults was calculated, there were
> some unit mismatches where number of blocks was compared with number
> of bytes, and number of blocks divided by the blocksize.  As a result,
> the number of blocks was much larger than before when the file system
> blocksize was 1k, and much smaller than when the file system was 64k,
> which was a bit surprising.
>
> This caused a libblockdev regression test to fail when it created an
> ext4 file system using a 1k blocksize and had a size of 138240k.
> Before e2fsprogs 1.47.4, the size of the orphan file was 33k; but in
> e2fsprogs 1.47.4, the size of the orphan file became 135k.  This
> pushed the overhead just over 10%, which triggered the test failure.
>
> So simplify the calculation so that default of the orphan file ranges
> from 32 file system blocks and EXT4_DEFAULT_ORPHAN_SIZE, and avoids
> the use of the file system blocksize.
>
> Fixes: 6f03c698ef53 ("libext2fs: fix orphan file size > kernel limit with large blocksize")
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> ---

Hi Ted,

I’m very sorry for introducing this issue. 

In fact, the patch that introduced it was not the latest version —
there was a later v3 that is consistent with the upstream kernel:

https://lore.kernel.org/linux-ext4/20251120135514.3013973-1-libaokun@huaweicloud.com/

I noticed the wrong version had been applied and sent you a private email
at the time, but it seems that message fell through the cracks. I then got
caught up with other things and unfortunately lost track of this as well.
My apologies.

Would it be possible to revert the patch that introduced the issue and
apply the v3 version instead? Alternatively, I can send a new patch that
includes both the revert and the v3 changes.


Thanks,
Baokun

>  lib/ext2fs/ext2fs.h |  4 ++--
>  lib/ext2fs/orphan.c | 12 ++++++------
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> index d9df007c4..92da4d140 100644
> --- a/lib/ext2fs/ext2fs.h
> +++ b/lib/ext2fs/ext2fs.h
> @@ -1819,8 +1819,8 @@ errcode_t ext2fs_set_data_io(ext2_filsys fs, io_channel new_io);
>  errcode_t ext2fs_rewrite_to_io(ext2_filsys fs, io_channel new_io);
>  
>  /* orphan.c */
> -#define EXT4_MAX_ORPHAN_FILE_SIZE	8 << 20
> -#define EXT4_DEFAULT_ORPHAN_FILE_SIZE	2 << 20
> +#define EXT4_MAX_ORPHAN_FILE_SIZE	(8 << 20)
> +#define EXT4_DEFAULT_ORPHAN_FILE_SIZE	(2 << 20)
>  extern errcode_t ext2fs_create_orphan_file(ext2_filsys fs, blk_t num_blocks);
>  extern errcode_t ext2fs_truncate_orphan_file(ext2_filsys fs);
>  extern e2_blkcnt_t ext2fs_default_orphan_file_blocks(ext2_filsys fs);
> diff --git a/lib/ext2fs/orphan.c b/lib/ext2fs/orphan.c
> index 40b1c5c72..b894f2468 100644
> --- a/lib/ext2fs/orphan.c
> +++ b/lib/ext2fs/orphan.c
> @@ -218,18 +218,18 @@ out:
>  
>  /*
>   * Find reasonable size for orphan file. We choose orphan file size to be
> - * between 32 filesystem blocks and EXT4_DEFAULT_ORPHAN_FILE_SIZE, and not
> - * more than 1/fs->blocksize of the filesystem unless it is really small.
> + * between 32 filesystem blocks and EXT4_DEFAULT_ORPHAN_FILE_SIZE.
>   */
>  e2_blkcnt_t ext2fs_default_orphan_file_blocks(ext2_filsys fs)
>  {
>  	__u64 num_blocks = ext2fs_blocks_count(fs->super);
> -	e2_blkcnt_t blks = EXT4_DEFAULT_ORPHAN_FILE_SIZE / fs->blocksize;
> +	e2_blkcnt_t blks = num_blocks / 4096;
> +	e2_blkcnt_t max_blks = EXT4_DEFAULT_ORPHAN_FILE_SIZE / fs->blocksize;
>  
> -	if (num_blocks < 128 * 1024)
> +	if (blks < 32)
>  		blks = 32;
> -	else if (num_blocks < EXT4_DEFAULT_ORPHAN_FILE_SIZE)
> -		blks = num_blocks / fs->blocksize;
> +	if (blks > max_blks)
> +		blks = max_blks;
>  	return (blks + EXT2FS_CLUSTER_MASK(fs)) & ~EXT2FS_CLUSTER_MASK(fs);
>  }
>  

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Fix default orphan file size calculations
  2026-03-11 16:27 ` Baokun Li
@ 2026-03-11 17:27   ` Theodore Tso
  2026-03-12  4:58     ` Baokun Li
  0 siblings, 1 reply; 5+ messages in thread
From: Theodore Tso @ 2026-03-11 17:27 UTC (permalink / raw)
  To: Baokun Li; +Cc: Ext4 Developers List

On Thu, Mar 12, 2026 at 12:27:07AM +0800, Baokun Li wrote:
> 
> In fact, the patch that introduced it was not the latest version —
> there was a later v3 that is consistent with the upstream kernel:

Hmm, I wonder why b4 didn't pick up the newer version of the patch.
Maybe I screwed up and missed the -c option to "b4 am -c".

The main difference between my proposed fix and your v3 patch is that
if the user doesn't specify an explicit orphan file size, with the v3
patch, it might max out to 8MB when the block size is 64k.  With my
fix, it will max out to 2MB in those situations.  The user can
explicitly specify a orphan file size as 8MB, but it makes the default
to be 2MB.

In retrospect, I think we went wrong when we capped the orphan file in
terms of bytes instead of blocks in the kernel.  When the block size
is 64k, 8MB is only 32 blocks.  When the block size is 4k, the orphan
file size can be up to 512 blocks.  And the scalability is really a
function of the number of blocks, not the number of bytes, since with
the orphan file, we use a hash that maps the cpu number to a logical
block number in orphan size.

Now, most of the time, I suspect 32 blocks is plenty most of the time,
since it's unlikely we'll have that many running processes trying to
truncate files or something else that requires adding the inode to the
oprhan file.

So I thought about just using a default orphan inode size of 32 file
system blocks.  I also thought about changing the kernel to allow size
of the orphan file to be say, up to 256 blocks.  And also maybe
allowing mke2fs and tune2fs to accept an extended options
orphan_file_blocks which takes an argument denominated in blocks.

Ultimately, though, *most* of the time, consuming 512MB on the orphan
file inode if the file system is say, 2TB.  So I decided it wasn't
worth the effort to change how things worked.  But if we were starting
from scratch, I think we would have been better of doing things in
terms of blocks, instead of bytes.

But maybe we should go and make that change.  What do folks think?

    	     	       	   	     	      - Ted

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Fix default orphan file size calculations
  2026-03-11 17:27   ` Theodore Tso
@ 2026-03-12  4:58     ` Baokun Li
  0 siblings, 0 replies; 5+ messages in thread
From: Baokun Li @ 2026-03-12  4:58 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Ext4 Developers List, libaokun

Hi Ted,

On 3/12/26 1:27 AM, Theodore Tso wrote:
> On Thu, Mar 12, 2026 at 12:27:07AM +0800, Baokun Li wrote:
>> In fact, the patch that introduced it was not the latest version —
>> there was a later v3 that is consistent with the upstream kernel:
> Hmm, I wonder why b4 didn't pick up the newer version of the patch.
> Maybe I screwed up and missed the -c option to "b4 am -c".
>
> The main difference between my proposed fix and your v3 patch is that
> if the user doesn't specify an explicit orphan file size, with the v3
> patch, it might max out to 8MB when the block size is 64k.  With my
> fix, it will max out to 2MB in those situations.  The user can
> explicitly specify a orphan file size as 8MB, but it makes the default
> to be 2MB.

Yes, but if we cap the maximum size at 8 MB, then the maximum orphan file
created with the previous default settings would be 512 blocks. On 64 KB
page-size systems, older mkfs versions could create a 64 KB-block ext4 fs
with a 32 MB orphan file, so that would break forward compatibility.

This is also why kernel commit 7c11c56eb32e ("ext4: align max orphan file
size with e2fsprogs limit") changed the limit to 512 blocks instead.
That at least preserves compatibility for filesystems created with the
default mkfs options.

>
> In retrospect, I think we went wrong when we capped the orphan file in
> terms of bytes instead of blocks in the kernel.  When the block size
> is 64k, 8MB is only 32 blocks.  When the block size is 4k, the orphan
> file size can be up to 512 blocks.  And the scalability is really a
> function of the number of blocks, not the number of bytes, since with
> the orphan file, we use a hash that maps the cpu number to a logical
> block number in orphan size.

Yes, limiting it by the number of blocks is simpler, and that is exactly
what kernel commit 7c11c56eb32e does.

>
> Now, most of the time, I suspect 32 blocks is plenty most of the time,
> since it's unlikely we'll have that many running processes trying to
> truncate files or something else that requires adding the inode to the
> oprhan file.
>
> So I thought about just using a default orphan inode size of 32 file
> system blocks.  I also thought about changing the kernel to allow size
> of the orphan file to be say, up to 256 blocks.  And also maybe
> allowing mke2fs and tune2fs to accept an extended options
> orphan_file_blocks which takes an argument denominated in blocks.
>
> Ultimately, though, *most* of the time, consuming 512MB on the orphan
> file inode if the file system is say, 2TB.  So I decided it wasn't
> worth the effort to change how things worked.  But if we were starting
> from scratch, I think we would have been better of doing things in
> terms of blocks, instead of bytes.
>
> But maybe we should go and make that change.  What do folks think?
>
I’d prefer reverting e2fsprogs commit 6f03c698ef53 and taking v3 instead.
It looks like the simplest fix for now, and it should still preserve some
compatibility.


Regards,
Baokun


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-03-12  4:58 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-11 14:11 [PATCH] Fix default orphan file size calculations Theodore Ts'o
2026-03-11 15:01 ` Andreas Dilger
2026-03-11 16:27 ` Baokun Li
2026-03-11 17:27   ` Theodore Tso
2026-03-12  4:58     ` Baokun Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox