From: Wang Yugui <wangyugui@e16-tech.com>
To: Dave Chinner <dgc@kernel.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [ RFC ] xfs: 4K inode support
Date: Thu, 23 Apr 2026 07:02:27 +0800 [thread overview]
Message-ID: <20260423070227.B2C6.409509F4@e16-tech.com> (raw)
In-Reply-To: <aelAeiyiAyFiJUgQ@dread>
Hi,
> On Wed, Apr 22, 2026 at 07:05:15AM +0800, Wang Yugui wrote:
> > use case for 4K inode
> > - simpler logic for 4Kn device, and less lock.
>
> Nope, neither of these are true.
>
> There is no change in logic when inode sizes change, and there is no
> change in locking as inode size changes.
>
> This is because inodes are allocated in chunks of 64, and they are
> read and written in clusters of 32 inodes. Hence all that changing
> the size of the inode does is change the size of the inode cluster
> buffer.
>
> And therein lies the problem: 32 x 4kB inodes is 128kB. Looking at
> xfs_types.h:
>
> /*
> * Minimum and maximum blocksize and sectorsize.
> * The blocksize upper limit is pretty much arbitrary.
> * The sectorsize upper limit is due to sizeof(sb_sectsize).
> * CRC enable filesystems use 512 byte inodes, meaning 512 byte block sizes
> * cannot be used.
> */
> #define XFS_MIN_BLOCKSIZE_LOG 9 /* i.e. 512 bytes */
> #define XFS_MAX_BLOCKSIZE_LOG 16 /* i.e. 65536 bytes */
> #define XFS_MIN_BLOCKSIZE (1 << XFS_MIN_BLOCKSIZE_LOG)
> #define XFS_MAX_BLOCKSIZE (1 << XFS_MAX_BLOCKSIZE_LOG)
>
> Yup, XFS defines a maximum block size of 64kB, and inode cluster
> buffers are already at this maximum size for 2kB inodes.
>
> > - better performance for directory with many files.
>
> No, it won't make any difference to large directory performance
> because they are in block/leaf/node form and all the directory
> information is held in extents external to the inode. The size of
> the directory inode really does not influence the performance of the
> directory once it transitions out of inline format.
>
> In fact, larger inode sizes result in lower performance for
> directory ops, because the metadata footprint has increased in
> size and so every inode cluster IO now has higher latency and
> consumes more IO bandwidth. i.e. the -inode operations- that are
> done during directory modifications are slower...
>
> Then there's the larger memory footprint of the buffer cache due to
> cached inode cluster buffers - in most cases that's all wasted space
> because inode metadata is typically just an inode core (176 bytes),
> a couple of extent records (16 bytes each) and maybe a couple of
> xattrs (e.g. selinux). So a typical inode will only contain maybe
> 300 bytes of metadata, yet now they take up 4kB of RAM -each- when
> resident in the buffer cache...
>
> > - maybe inline data support later.
>
> That's a whole different problem - it doesn't require inode sizes to
> be expanded to implement.
>
> > TODO:
> > still crash in xfs_trans_read_buf_map() when mount a 4K inode xfs now.
>
> Good luck with that - there's several issues with on-disk format
> constants that need to be sorted out before IO will work. e.g.
> you'll hit this error through _xfs_trans_bjoin():
>
> xfs_err(mp,
> "buffer item dirty bitmap (%u uints) too small to reflect %u bytes!",
> map_size,
> BBTOB(bp->b_maps[i].bm_len));
>
> and it will shut down with a corruption error. That's indicating
> that the on-disk journal format for buffer logging does not support
> the buffer size being read. i.e. there's a problem with the inode
> cluster size....
>
> IOWs, there are -lots- of complex and critical subsystems that
> increasing the inode size will break and that need to be fixed.
> Changing a fundamental on-disk format constant isn't a simple thing
> to do; an AI will not be able to tell you all the things you need to
> change and test without already knowing where all the architectural
> problems are to begin with....
>
> Without an actual solid reason for making fundamental on-disk format
> changes and a commitment of significant time and testing resources,
> changes of this scope are unlikely to be made...
>
> -Dave.
> --
> Dave Chinner
> dgc@kernel.org
Thanks a lot for this info.
The basic logic is that, for a 4Kn device, the minimum I/O size is already 4K,
and 4Kn devices (SSD and RAID) have become more common now.
On a 4Kn device, could we do I/O on a single 4K inode without interaction with
other inodes? So maybe better performance for high-speed SSDs such as PCIe Gen5/Gen6?
Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2026/04/23
Thread overview: 8+ messages
2026-04-21 1:42 4K inode support on 4Kn device Wang Yugui
2026-04-21 5:48 ` Carlos Maiolino
2026-04-21 23:01 ` [PATCH] xfsprogs: 4K inode support Wang Yugui
2026-04-21 23:05 ` [ RFC ] xfs: " Wang Yugui
2026-04-22 21:41 ` Dave Chinner
2026-04-22 23:02 ` Wang Yugui [this message]
2026-04-23 2:09 ` Eric Sandeen
2026-04-23 2:18 ` Dave Chinner