From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 23 Apr 2026 12:18:59 +1000
From: Dave Chinner
To: Wang Yugui
Cc: linux-xfs@vger.kernel.org
Subject: Re: [ RFC ] xfs: 4K inode support
Message-ID:
References: <20260421230515.2234-1-wangyugui@e16-tech.com> <20260423070227.B2C6.409509F4@e16-tech.com>
In-Reply-To: <20260423070227.B2C6.409509F4@e16-tech.com>

On Thu, Apr 23, 2026 at 07:02:27AM +0800, Wang Yugui wrote:
> Hi,
>
> > On Wed, Apr 22, 2026 at 07:05:15AM +0800, Wang Yugui wrote:
> > > use case for 4K inode
> > > - simpler logic for 4Kn device, and less lock.
> >
> > Nope, neither of these are true.
> >
> > There is no change in logic when inode sizes change, and there is
> > no change in locking as inode size changes.
> >
> > This is because inodes are allocated in chunks of 64, and they are
> > read and written in clusters of 32 inodes. Hence all that changing
> > the size of the inode does is change the size of the inode cluster
> > buffer.
.....
> On a 4Kn device, we can I/O one single inode of 4K size without
> interaction with other inode? so mabye better performance for high
> speed ssd such as pcie gen5/gen6?

Yes, you can do lots of 4kB IOs, but you can move more data in/out
of memory by doing 8kB IOs, yes?

In reality, on-disk inodes are not independent. They are allocated
and freed in contiguous chunks of 64 inodes, and the inode cluster
buffer is used for bulk initialisation, logging unlinked list
changes, etc. Application operations on inodes often occur in
batches, and XFS's inode allocation algorithms usually provide
physical locality of inodes for a given workload.
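[Editor's note: the buffer-count arithmetic implied by the chunk and
cluster sizes discussed in this thread can be sketched as follows.
This is illustrative only; the constants come from the description in
this mail (64-inode chunks, 32-inode clusters), not from reading the
XFS source.]

```python
# How inode size and cluster buffer size determine the number of
# inode cluster buffers backing one inode chunk.
INODES_PER_CHUNK = 64  # inodes are allocated/freed in chunks of 64

def buffers_per_chunk(inode_size: int, cluster_buffer_size: int) -> int:
    """Number of inode cluster buffers needed to cover one inode chunk."""
    inodes_per_buffer = cluster_buffer_size // inode_size
    return INODES_PER_CHUNK // inodes_per_buffer

# Default geometry: 32 inodes per cluster buffer, so the buffer size
# scales with the inode size and each chunk spans two buffers.
print(buffers_per_chunk(2048, 32 * 2048))  # 2 buffers (64kB clusters)

# Shrinking the cluster buffer to 8kB with 2kB inodes packs only 4
# inodes per buffer, so one chunk allocation now touches 16 buffers.
print(buffers_per_chunk(2048, 8192))       # 16 buffers
```

This reproduces the "16 buffers instead of 2" cost quoted below for
2kB inodes with an 8kB cluster buffer.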
Hence for typical data set access patterns, inode clustering usually
results in a reduction of inode IO due to increases in inode cluster
buffer cache hit ratios.

If you want to test whether 8kB inode cluster buffers result in
higher performance than using 32 inodes per buffer, then you can do
that with some tweaks to the sb->sb_inoalignmt value set by
mkfs.xfs. See the xfs_ialloc_setup_geometry() function for details
on how to modify that setting during mkfs to influence the cluster
buffer size the kernel will configure.

If you create a filesystem with 2kB inodes and an 8kB cluster buffer
size, you are going to see a different performance profile compared
to using a 64kB inode cluster buffer. It will very much depend on
the workload and cache hit patterns as to whether that is a
performance win or a performance degradation.

The typical situation is that smaller cluster buffers reduce cache
hits and so increase both metadata IOPS (read and write) and
per-metadata operation CPU overhead due to needing to manage more
buffers (e.g. inode chunk allocation now has to allocate,
initialise, log and write back 16 buffers instead of 2).

Workloads that benefit from smaller buffers tend to have large
working sets of inodes (i.e. don't fit in cache) and low physical
locality in their inode access patterns (i.e. random file access
patterns). There aren't a lot of workloads with those
characteristics, especially when modern servers have hundreds of GBs
to TBs of RAM in them.

So before you start asking us to review code changes, first show us
that we can meaningfully improve application performance by reducing
inode cluster sizes and increasing the number of inode metadata IOPS
needed for any given inode intensive workload....

-Dave.
-- 
Dave Chinner
dgc@kernel.org