From: Eric Sandeen <sandeen@sandeen.net>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems
Date: Mon, 11 Nov 2013 18:24:45 -0600
Message-ID: <5281754D.3060102@sandeen.net>
In-Reply-To: <20131111224559.GW6188@dastard>
On 11/11/13, 4:45 PM, Dave Chinner wrote:
> On Fri, Nov 08, 2013 at 12:21:20PM -0600, Eric Sandeen wrote:
>> On 10/31/13, 11:27 PM, Dave Chinner wrote:
>>> So, this patch removes most of the performance and CPU usage
>>> differential between v4 and v5 filesystems on traversal related
>>> workloads.
...
>> Just thinking out loud here: So this is runtime only; nothing on disk sets
>> the cluster size explicitly (granted, it never did).
>>
>> So moving back and forth across newer/older kernels will create clusters
>> of different sizes on the same filesystem, right?
>
> No - inodes are allocated in chunks, not clusters. Inode clusters
> are the unit of IO we read and write inodes in.
Hohum, confused that fundamental difference. Sorry. Kind of
invalidates my other questions doesn't it.
>> (In the very distant past, this same change could have happened if
>> the amount of memory in a box changed (!) - see commit
>> 425f9ddd534573f58df8e7b633a534fcfc16d44d; prior to that we set
>> m_inode_cluster_size on the fly as well).
>
> Right, I think I've already pointed that out.
>
>> But sb_inoalignmt is a mkfs-set, on-disk feature. So we might start with
>> i.e. this, where A1 are 8k alignment points, and 512 byte inodes, in clusters
>> of size 8k / 16 inodes:
>>
>>     A1            A1            A1            A1
>>     [ 16 inodes  ][ 16 inodes  ]              [ 16 inodes  ]
>
> Ok, here's where you go wrong. Inode chunks are always 64 inodes,
> and so what you have on disk after any inode allocation is:
>
>     A1            A1            A1            A1
>     [ 16 inodes  ][ 16 inodes  ][ 16 inodes  ][ 16 inodes  ]
>
> and sb_inoalign determines where A1 lands in terms of filesystem
> blocks. With sb_inoalign = 2 and a 4k filesystem block size, you can
> only align inode *chunks* to even filesystem blocks like so:
>
>     ODD    EVEN   ODD    EVEN   ODD    EVEN   ODD    EVEN   ODD    EVEN
>            A1            A1            A1            A1            A1
>            [ 16 inodes  ][ 16 inodes  ][ 16 inodes  ][ 16 inodes  ]
>
<snip a few examples>
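
For anyone following along, a rough sketch of the chunk-vs-block arithmetic
described above. This is a simplified model, not kernel code: the function
names are made up for illustration, blocks are assumed to be numbered from 0,
and the alignment argument stands in for the on-disk sb_inoalignmt value
(written sb_inoalign in the quoted text).

```python
INODES_PER_CHUNK = 64  # inode chunks are always 64 inodes (per the thread)


def chunk_blocks(inode_size, block_size):
    """Filesystem blocks covered by one 64-inode chunk (hypothetical helper)."""
    return INODES_PER_CHUNK * inode_size // block_size


def valid_chunk_starts(num_blocks, inoalign):
    """Block numbers where a chunk may start under the given alignment.

    Assumption: blocks are numbered from 0 and 'inoalign' is expressed in
    filesystem blocks, as in the example with alignment 2 and 4k blocks.
    """
    return [b for b in range(num_blocks) if b % inoalign == 0]


if __name__ == "__main__":
    # 512-byte inodes, 4k blocks: one chunk is 32k, i.e. 8 blocks.
    print(chunk_blocks(512, 4096))
    # With an alignment of 2, only even blocks are candidates.
    print(valid_chunk_starts(10, 2))
```

With 512 byte inodes and 4k blocks a chunk covers 8 blocks, and with an
alignment of 2 it can only begin on an even block, which matches the
ODD/EVEN picture.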
> To be able to bump up the inode cluster size, what we have to
> guarantee is that the inode chunks align to the larger cluster
> size like so:
>
>     A2                          A2
>     A1            A1            A1            A1
>     [ 16 inodes  ][ 16 inodes  ][ 16 inodes  ][ 16 inodes  ]  <--- existing
>     [        32 inodes         ][        32 inodes         ]  <--- new
>
> i.e. inode chunk allocation needs to be aligned to A2, not A1 for
> the correct alignment of the larger clusters.
>
> If we align to A1, then this will happen:
>
>     A2                          A2                          A2
>     A1            A1            A1            A1            A1
>                   [ 16 inodes  ][ 16 inodes  ][ 16 inodes  ][ 16 inodes  ]
>                   [        32 inodes         ][        32 inodes         ]  <--- new
>
> And that is clearly broken. Hence, to ensure we can use larger inode
> clusters, we have to ensure that the inode chunks are aligned
> appropriately for those cluster sizes. If the chunks are
> appropriately aligned for larger inode clusters (e.g. sb_inoalign =
> 4), then they are also appropriately aligned for inode cluster sizes
> older kernels support.
*nod* Ok.
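
To convince myself, here is a small sketch of the A1/A2 argument. It is only
a model under stated assumptions: offsets are plain byte offsets, the helper
names are hypothetical, and nothing here mirrors actual kernel functions.

```python
CHUNK_INODES = 64  # fixed inode chunk size, as stated in the thread


def cluster_starts(chunk_start, inode_size, cluster_size):
    """Byte offsets at which each cluster of the chunk would begin."""
    chunk_bytes = CHUNK_INODES * inode_size
    return list(range(chunk_start, chunk_start + chunk_bytes, cluster_size))


def clusters_aligned(chunk_start, inode_size, cluster_size):
    """True if every cluster in the chunk starts on a cluster_size boundary."""
    starts = cluster_starts(chunk_start, inode_size, cluster_size)
    return all(s % cluster_size == 0 for s in starts)


if __name__ == "__main__":
    A1, A2 = 8192, 16384  # 16-inode and 32-inode clusters of 512-byte inodes
    # Chunk aligned only to A1: the larger clusters straddle A2 boundaries.
    print(clusters_aligned(A1, 512, A2))
    # Chunk aligned to A2: fine for the larger clusters...
    print(clusters_aligned(A2, 512, A2))
    # ...and still fine for the smaller clusters older kernels use.
    print(clusters_aligned(A2, 512, A1))
```

In this model a chunk that starts on an A2 boundary is automatically on an A1
boundary as well, which is the "also appropriately aligned for older kernels"
point.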
>> So the only other thing I wonder about is when we are handling
>> pre-existing, smaller-than m_inode_cluster_size clusters.
>>
>> i.e. xfs_ifree_cluster() figures out the number of blocks &
>> number of inodes in a cluster, based on the (now not
>> constant) m_inode_cluster_size.
>>
>> What stops us from going off the end of a smaller cluster?
>
> The fact that we calculate the number of inodes to process per
> cluster based on the size of the cluster buffer (in blocks)
> multiplied by the number of inodes per block. If the code didn't
> work, we'd have found out a long time ago ;)
And I was rather stupidly confusing clusters & chunks, so never mind...
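
For the record, the calculation described above (cluster buffer size in
blocks multiplied by inodes per block) can be sketched roughly like this. It
is an illustration, not the xfs_ifree_cluster() code; the function name is
hypothetical, and clamping to one block when the cluster is smaller than a
block is my assumption about the small-cluster case.

```python
def inodes_per_cluster(cluster_size, block_size, inode_size):
    """Inodes to process per cluster buffer.

    Computed as (blocks per cluster buffer) * (inodes per block).
    Assumption: a cluster buffer is never smaller than one filesystem block.
    """
    blocks_per_cluster = max(1, cluster_size // block_size)
    inodes_per_block = block_size // inode_size
    return blocks_per_cluster * inodes_per_block


if __name__ == "__main__":
    # 8k clusters, 4k blocks, 512-byte inodes
    print(inodes_per_cluster(8192, 4096, 512))
    # 16k clusters with the same geometry
    print(inodes_per_cluster(16384, 4096, 512))
```

Since the count is derived from the buffer actually being processed, the loop
is bounded by that buffer and not by some other notion of cluster size.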
-Eric