From: Eric Sandeen <sandeen@sandeen.net>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems
Date: Mon, 11 Nov 2013 18:24:45 -0600
Message-ID: <5281754D.3060102@sandeen.net>
In-Reply-To: <20131111224559.GW6188@dastard>
On 11/11/13, 4:45 PM, Dave Chinner wrote:
> On Fri, Nov 08, 2013 at 12:21:20PM -0600, Eric Sandeen wrote:
>> On 10/31/13, 11:27 PM, Dave Chinner wrote:
>>> So, this patch removes most of the performance and CPU usage
>>> differential between v4 and v5 filesystems on traversal related
>>> workloads.
...
>> Just thinking out loud here: So this is runtime only; nothing on disk sets
>> the cluster size explicitly (granted, it never did).
>>
>> So moving back and forth across newer/older kernels will create clusters
>> of different sizes on the same filesystem, right?
>
> No - inodes are allocated in chunks, not clusters. Inode clusters
> are the unit of IO we read and write inodes in.
Ho hum, I confused that fundamental difference. Sorry. Kind of
invalidates my other questions, doesn't it?
>> (In the very distant past, this same change could have happened if
>> the amount of memory in a box changed (!) - see commit
>> 425f9ddd534573f58df8e7b633a534fcfc16d44d; prior to that we set
>> m_inode_cluster_size on the fly as well).
>
> Right, I think I've already pointed that out.
>
>> But sb_inoalignmt is a mkfs-set, on-disk feature. So we might start with
>> i.e. this, where A1 are 8k alignment points, and 512 byte inodes, in clusters
>> of size 8k / 16 inodes:
>>
>> A1 A1 A1 A1
>> [ 16 inodes ][ 16 inodes ] [ 16 inodes ]
>
> Ok, here's where you go wrong. Inode chunks are always 64 inodes,
> and so what you have on disk after any inode allocation is:
>
> A1           A1           A1           A1
> [ 16 inodes ][ 16 inodes ][ 16 inodes ][ 16 inodes ]
>
> and sb_inoalignmt determines where A1 lands in terms of filesystem
> blocks. With sb_inoalignmt = 2 and a 4k filesystem block size, you can
> only align inode *chunks* to even filesystem blocks like so:
>
> ODD   EVEN   ODD   EVEN   ODD   EVEN   ODD   EVEN   ODD   EVEN
>       A1           A1           A1           A1           A1
>       [ 16 inodes ][ 16 inodes ][ 16 inodes ][ 16 inodes ]
>
<snip a few examples>
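The block arithmetic in the example above can be sketched in Python. This is only an illustration under the numbers quoted in the thread (64 inodes per chunk, 512 byte inodes, 4k blocks, 8k clusters, an alignment of 2 blocks); the function name cluster_start_blocks and its parameters are hypothetical and are not taken from the XFS source.

```python
INODES_PER_CHUNK = 64  # chunk size stated in the thread


def cluster_start_blocks(chunk_start_block, inode_size=512, block_size=4096,
                         cluster_size=8192, inoalign_blocks=2):
    """Return the starting filesystem block of each inode cluster in a chunk.

    Assumes cluster_size is a whole multiple of block_size.
    """
    # A chunk may only begin on an alignment point (an even block here).
    if chunk_start_block % inoalign_blocks != 0:
        raise ValueError("chunk does not start on an alignment point")
    inodes_per_block = block_size // inode_size              # 8 with the defaults
    blocks_per_chunk = INODES_PER_CHUNK // inodes_per_block  # 8 with the defaults
    blocks_per_cluster = cluster_size // block_size          # 2 with the defaults
    end = chunk_start_block + blocks_per_chunk
    return list(range(chunk_start_block, end, blocks_per_cluster))
```

With the defaults, a chunk starting at block 6 yields four clusters, each starting on an even block, while a chunk start of 7 is rejected.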
> To be able to bump up the inode cluster size, what we have to
> guarantee is that the inode chunks align to the larger cluster
> size like so:
>
> A2                        A2
> A1           A1           A1           A1
> [ 16 inodes ][ 16 inodes ][ 16 inodes ][ 16 inodes ]  <--- existing
> [       32 inodes        ][       32 inodes        ]  <--- new
>
> i.e. inode chunk allocation needs to be aligned to A2, not A1 for
> the correct alignment of the larger clusters.
>
> If we align to A1, then this will happen:
>
> A2 A2 A2
> A1 A1 A1 A1 A1
> [ 16 inodes ][ 16 inodes ][ 16 inodes ][ 16 inodes ]
> [ 32 inodes ][ 32 inodes ] <--- new
>
> And that is clearly broken. Hence, to ensure we can use larger inode
> clusters, we have to ensure that the inode chunks are aligned
> appropriately for those cluster sizes. If the chunks are
> appropriately aligned for larger inode clusters (e.g. sb_inoalignmt =
> 4), then they are also appropriately aligned for inode cluster sizes
> older kernels support.
*nod* Ok.
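The point that a chunk aligned for the larger cluster size is automatically aligned for the smaller one can be checked with a few lines of Python. This is a sketch, assuming 4k blocks and candidate cluster sizes of 8k and 16k (2 and 4 blocks); usable_cluster_sizes is a hypothetical helper, not a kernel function.

```python
def usable_cluster_sizes(chunk_start_block, block_size=4096,
                         candidate_sizes=(8192, 16384)):
    """Return the candidate cluster sizes whose boundaries the chunk start lands on."""
    usable = []
    for size in candidate_sizes:
        blocks = size // block_size
        # The chunk can be read in clusters of this size only if it
        # starts on a multiple of the cluster length in blocks.
        if chunk_start_block % blocks == 0:
            usable.append(size)
    return usable
```

A chunk starting at block 8 (a multiple of 4) works with both sizes, whereas a chunk starting at block 6 (a multiple of 2 only) works with 8k clusters but not with 16k ones.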
>> So the only other thing I wonder about is when we are handling
>> pre-existing, smaller-than m_inode_cluster_size clusters.
>>
>> i.e. xfs_ifree_cluster() figures out the number of blocks &
>> number of inodes in a cluster, based on the (now not
>> constant) m_inode_cluster_size.
>>
>> What stops us from going off the end of a smaller cluster?
>
> The fact that we calculate the number of inodes to process per
> cluster based on the size of the cluster buffer (in blocks)
> multiplied by the number of inodes per block. If the code didn't
> work, we'd have found out a long time ago ;)
And I was rather stupidly confusing clusters & chunks, so never mind...
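The calculation Dave describes (cluster buffer size in blocks multiplied by inodes per block) can be sketched as follows. This is not the xfs_ifree_cluster() code; the names and the default sizes (4k blocks, 512 byte inodes, 64-inode chunks) are assumptions chosen to match the examples earlier in the thread.

```python
def inodes_per_cluster(cluster_size, block_size=4096, inode_size=512):
    """Inodes covered by one cluster buffer: blocks per cluster times inodes per block."""
    blocks_per_cluster = cluster_size // block_size
    inodes_per_block = block_size // inode_size
    return blocks_per_cluster * inodes_per_block


def walk_chunk(cluster_size, inodes_per_chunk=64):
    """Return (first, last) inode offsets within the chunk for each cluster buffer."""
    step = inodes_per_cluster(cluster_size)
    return [(first, first + step - 1)
            for first in range(0, inodes_per_chunk, step)]
```

Whichever cluster size is in effect, the walk is driven by the chunk, so an 8k cluster size gives four steps of 16 inodes and a 16k cluster size gives two steps of 32, and neither runs past inode 63.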
-Eric
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs