public inbox for linux-xfs@vger.kernel.org
From: Eric Sandeen <sandeen@sandeen.net>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems
Date: Mon, 11 Nov 2013 18:24:45 -0600
Message-ID: <5281754D.3060102@sandeen.net>
In-Reply-To: <20131111224559.GW6188@dastard>

On 11/11/13, 4:45 PM, Dave Chinner wrote:
> On Fri, Nov 08, 2013 at 12:21:20PM -0600, Eric Sandeen wrote:
>> On 10/31/13, 11:27 PM, Dave Chinner wrote:
>>> So, this patch removes most of the performance and CPU usage
>>> differential between v4 and v5 filesystems on traversal related
>>> workloads.

...

>> Just thinking out loud here: So this is runtime only; nothing on disk sets
>> the cluster size explicitly (granted, it never did).
>>
>> So moving back and forth across newer/older kernels will create clusters 
>> of different sizes on the same filesystem, right?
> 
> No - inodes are allocated in chunks, not clusters. Inode clusters
> are the unit of IO we read and write inodes in.

Hohum, I confused that fundamental difference.  Sorry.  Kind of
invalidates my other questions, doesn't it.

>> (In the very distant past, this same change could have happened if
>> the amount of memory in a box changed (!) - see commit
>> 425f9ddd534573f58df8e7b633a534fcfc16d44d; prior to that we set
>> m_inode_cluster_size on the fly as well).
> 
> Right, I think I've already pointed that out.
> 
>> But sb_inoalignmt is a mkfs-set, on-disk feature.  So we might start with,
>> e.g., this, where A1 marks 8k alignment points, with 512 byte inodes in
>> clusters of size 8k / 16 inodes:
>>
>> A1           A1           A1           A1           
>> [ 16 inodes ][ 16 inodes ]             [ 16 inodes ]
> 
> Ok, here's where you go wrong. Inode chunks are always 64 inodes,
> and so what you have on disk after any inode allocation is:
> 
> A1           A1           A1           A1
> [ 16 inodes ][ 16 inodes ][ 16 inodes ][ 16 inodes ]
> 
> and sb_inoalignmt determines where A1 lands in terms of filesystem
> blocks. With sb_inoalignmt = 2 and a 4k filesystem block size, you can
> only align inode *chunks* to even filesystem blocks like so:
> 
> ODD   EVEN   ODD   EVEN   ODD   EVEN   ODD   EVEN   ODD   EVEN
>       A1           A1           A1           A1           A1
>       [ 16 inodes ][ 16 inodes ][ 16 inodes ][ 16 inodes ]
> 

<snip a few examples>

> To be able to bump up the inode cluster size, what we have to
> guarantee is that the inode chunks align to the larger cluster
> size like so:
> 
> A2                        A2
> A1           A1           A1           A1
> [ 16 inodes ][ 16 inodes ][ 16 inodes ][ 16 inodes ] <--- existing
> [        32 inodes       ][       32 inodes        ] <--- new
> 
> i.e. inode chunk allocation needs to be aligned to A2, not A1 for
> the correct alignment of the larger clusters.
> 
> If we align to A1, then this will happen:
> 
> A2                        A2                        A2
> A1           A1           A1           A1           A1
>              [ 16 inodes ][ 16 inodes ][ 16 inodes ][ 16 inodes ]
> [        32 inodes       ][       32 inodes        ] <--- new
> 
> And that is clearly broken. Hence, to ensure we can use larger inode
> clusters, we have to ensure that the inode chunks are aligned
> appropriately for those cluster sizes. If the chunks are
> appropriately aligned for larger inode clusters (e.g. sb_inoalignmt =
> 4), then they are also appropriately aligned for the inode cluster
> sizes that older kernels support.
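
A minimal Python sketch of the alignment argument above, for illustration
only. It assumes the geometry of the diagrams (512 byte inodes, 4k blocks,
64-inode chunks) and that cluster buffers start on multiples of the cluster
size, as drawn at A1 and A2. The function names are hypothetical, not
kernel APIs.

```python
# Hypothetical model of the diagrams above; none of these names exist in XFS.
INODE_SIZE = 512        # bytes per inode, as in the example
BLOCK_SIZE = 4096       # bytes per filesystem block
INODES_PER_CHUNK = 64   # inode chunks are always 64 inodes


def chunk_blocks():
    """Number of filesystem blocks covered by one inode chunk."""
    return INODES_PER_CHUNK * INODE_SIZE // BLOCK_SIZE


def chunk_start_ok(start_block, inoalignmt):
    """True if a chunk starting at start_block honours the alignment (in blocks)."""
    return start_block % inoalignmt == 0


def clusters_fit(start_block, cluster_bytes):
    """True if the chunk begins and ends on a cluster boundary.

    Assumption: cluster buffers are aligned to multiples of the cluster
    size, so a chunk that starts or ends mid-cluster would share a buffer
    with whatever lies next to it.
    """
    cluster_blocks = max(1, cluster_bytes // BLOCK_SIZE)
    end_block = start_block + chunk_blocks()
    return start_block % cluster_blocks == 0 and end_block % cluster_blocks == 0
```

Under these assumptions, a chunk at block 2 satisfies an alignment of 2 and
works for 8k clusters but not for 16k ones, while a chunk at block 4
satisfies an alignment of 4 and works for both cluster sizes.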

*nod*  Ok.

>> So the only other thing I wonder about is when we are handling
>> pre-existing, smaller-than m_inode_cluster_size clusters.
>>
>> i.e. xfs_ifree_cluster() figures out the number of blocks &
>> number of inodes in a cluster, based on the (now not
>> constant) m_inode_cluster_size.
>>
>> What stops us from going off the end of a smaller cluster?
> 
> The fact that we calculate the number of inodes to process per
> cluster based on the size of the cluster buffer (in blocks)
> multiplied by the number of inodes per block. If the code didn't
> work, we'd have found out a long time ago ;)
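
The calculation described above can be sketched roughly as follows. This is
a hypothetical Python model, not the code in xfs_ifree_cluster(); the
function name, the one-block minimum for a cluster buffer, and the
assumption that a buffer never holds more than one chunk are all
illustrative.

```python
INODES_PER_CHUNK = 64  # inode chunks are always 64 inodes


def cluster_geometry(cluster_bytes, block_size, inode_size):
    """Return (blocks per cluster buffer, inodes per buffer, buffers per chunk)."""
    inodes_per_block = block_size // inode_size
    # Assumption: a cluster buffer covers at least one filesystem block.
    blocks_per_cluster = max(1, cluster_bytes // block_size)
    # Inodes to process per buffer: buffer size in blocks times inodes per block.
    inodes_per_buffer = blocks_per_cluster * inodes_per_block
    # Assumption: a buffer holds no more than one chunk's worth of inodes.
    buffers_per_chunk = INODES_PER_CHUNK // inodes_per_buffer
    return blocks_per_cluster, inodes_per_buffer, buffers_per_chunk
```

With 512 byte inodes and 4k blocks, an 8k cluster gives 16 inodes in each of
4 buffers and a 16k cluster gives 32 inodes in each of 2 buffers. Either way
the walk covers exactly the 64 inodes of the chunk.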

And I was rather stupidly confusing clusters & chunks, so never mind...

-Eric
