public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <dgc@kernel.org>
To: Wang Yugui <wangyugui@e16-tech.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [ RFC ] xfs: 4K inode support
Date: Thu, 23 Apr 2026 12:18:59 +1000	[thread overview]
Message-ID: <aemBk-rEnj6Or-76@dread> (raw)
In-Reply-To: <20260423070227.B2C6.409509F4@e16-tech.com>

On Thu, Apr 23, 2026 at 07:02:27AM +0800, Wang Yugui wrote:
> Hi,
> 
> > On Wed, Apr 22, 2026 at 07:05:15AM +0800, Wang Yugui wrote:
> > > use case for 4K inode
> > > - simpler logic for 4Kn device, and less lock.
> > 
> > Nope, neither of these are true.
> > 
> > There is no change in logic when inode sizes change, and there is no
> > change in locking as inode size changes.
> > 
> > This is because inodes are allocated in chunks of 64, and they are
> > read and written in clusters of 32 inodes. Hence all that changing
> > the size of the inode does is change the size of the inode cluster
> > buffer.
.....

> On a 4Kn device, we can I/O one single inode of 4K size without interaction with
> other inodes? So maybe better performance for high speed SSDs such as PCIe gen5/gen6?

Yes, you can do lots of 4kB IOs, but you can move more data in/out
of memory by doing 8kB IOs, yes?

In reality, on-disk inodes are not independent. They are allocated
and freed in contiguous chunks of 64 inodes, and the inode cluster
buffer is used for bulk initialisation, logging unlinked list
changes, etc.
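To make that concrete, here's a back-of-the-envelope sketch (using the
64-inode chunk and 32-inode cluster figures quoted above; this is not
the kernel's actual geometry code):

```python
# Sketch of the geometry described above: inodes are allocated in
# chunks of 64 and read/written 32 to a cluster buffer, so changing
# the inode size only changes the cluster buffer size, not the logic.

INODES_PER_CLUSTER = 32  # per the figures quoted in this thread

def cluster_buffer_bytes(inode_size):
    """Size of one inode cluster buffer for a given inode size."""
    return inode_size * INODES_PER_CLUSTER

for isize in (256, 512, 1024, 2048):
    print(f"{isize}B inodes -> "
          f"{cluster_buffer_bytes(isize) // 1024}kB cluster buffer")
```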

Application operations on inodes often occur in batches, and XFS's
inode allocation algorithms usually provide physical locality of
inodes for a given workload. Hence for typical data set access
patterns, inode clustering usually results in a reduction of inode
IO due to increases in inode cluster buffer cache hit ratios.
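A toy model of that cache-hit effect (the batch size and inode counts
here are illustrative, not measured):

```python
import math

# Toy model of the clustering benefit described above: when an
# application touches a batch of physically adjacent inodes, larger
# cluster buffers turn many cold-cache inode reads into one buffer read.

def buffer_reads(batch_size, inodes_per_buffer):
    """Cold-cache buffer reads needed to touch batch_size adjacent inodes."""
    return math.ceil(batch_size / inodes_per_buffer)

batch = 1000  # e.g. a traversal touching 1000 co-located inodes
for ipb in (32, 8, 1):
    print(f"{ipb} inodes/buffer -> {buffer_reads(batch, ipb)} buffer reads")
```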

If you want to test whether 8kB inode cluster buffers 
result in higher performance than using 32 inodes per buffer,
then you can do that with some tweaks to the sb->sb_inoalignmt
value set by mkfs.xfs. See the xfs_ialloc_setup_geometry() function
for details on how to modify that setting during mkfs to influence
the cluster buffer size the kernel will configure.

If you create a filesystem with 2kB inodes and an 8kB cluster buffer
size, you are going to see a different performance profile compared
to using a 64kB inode cluster buffer. Whether that is a performance
win or a performance degradation will very much depend on the
workload and its cache hit patterns.

The typical situation is that smaller cluster buffers reduce cache
hits and so increase both metadata IOPS (read and write) and
per-metadata-operation CPU overhead due to needing to manage more
buffers (e.g. inode chunk allocation now has to allocate,
initialise, log and write back 16 buffers instead of 2).
Workloads that benefit from smaller buffers tend to have large
working sets of inodes (i.e. they don't fit in cache) and low
physical locality in their inode access patterns (i.e. random file
access patterns). There aren't a lot of workloads with those
characteristics, especially as modern servers have hundreds of GBs
to TBs of RAM in them.
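Working through the "16 buffers instead of 2" arithmetic (512B inodes
are an assumed example; the chunk is always 64 inodes, so the buffer
counts are the same for any inode size):

```python
# A chunk is always 64 inodes, so shrinking the cluster buffer
# multiplies the number of buffers that must be allocated, initialised,
# logged and written back per chunk allocation.

INODES_PER_CHUNK = 64
INODE_SIZE = 512  # assumed inode size for this sketch

def buffers_per_chunk(cluster_buffer_size):
    """Cluster buffers touched when allocating one 64-inode chunk."""
    inodes_per_buffer = cluster_buffer_size // INODE_SIZE
    return INODES_PER_CHUNK // inodes_per_buffer

print(buffers_per_chunk(16 * 1024))  # 32 inodes/buffer -> 2 buffers
print(buffers_per_chunk(2 * 1024))   #  4 inodes/buffer -> 16 buffers
```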

So before you start asking us to review code changes, first show us
that we can meaningfully improve application performance by reducing
inode cluster sizes and increasing the number of inode metadata IOPS
needed for any given inode intensive workload....

-Dave.
-- 
Dave Chinner
dgc@kernel.org


Thread overview: 8+ messages
2026-04-21  1:42 4K inode support on 4Kn device Wang Yugui
2026-04-21  5:48 ` Carlos Maiolino
2026-04-21 23:01 ` [PATCH] xfsprogs: 4K inode support Wang Yugui
2026-04-21 23:05 ` [ RFC ] xfs: " Wang Yugui
2026-04-22 21:41   ` Dave Chinner
2026-04-22 23:02     ` Wang Yugui
2026-04-23  2:09       ` Eric Sandeen
2026-04-23  2:18       ` Dave Chinner [this message]
