[RFC 0/3] Convert XFS inode hashes to radix trees

public inbox for linux-xfs@vger.kernel.org

* [RFC 0/3] Convert XFS inode hashes to radix trees
From: David Chinner @ 2006-10-03  6:06 UTC
  To: xfs-dev; +Cc: xfs

One of the long standing problems with XFS on large machines and
filesystems is the sizing of the inode cache hashes used by XFS to
index the xfs_inode_t structures. The mount option ihashsize became
a necessity because the default calculations simply can't get it
right for all situations.

On top of that, as we increase the size of the inode hash and cache
more inodes, the inode cluster hash becomes the limiting factor,
especially when we have sparse cluster population.  The result of
this is that we can always get to the point where either the ihash
or the chash is a scalability or performance limitation.

The following three patches replace the hashes with a more scalable
solution that should not require tweaking in most situations.

I chose a radix tree to replace the hash chains because of a neat
alignment of XFS inode structures and the kernel radix tree fanout.
XFS allocates inodes in clusters of 64 inodes and the radix tree
keeps 64 sequential entries per node.  That means all of the inodes
in a cluster will always sit in the same node of the radix tree.
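A minimal sketch of the alignment argument, assuming a 64-slot radix tree node (a map shift of 6 bits) and a cluster whose first inode number is a multiple of 64. The names `leaf_of` and `cluster_in_one_leaf` are hypothetical. Every index that shares the bits above the low 6 lands in the same leaf, so an aligned 64-inode cluster fills exactly one node:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64 slots per radix tree node, i.e. 6 index bits per level. */
#define MAP_SHIFT          6
#define INODES_PER_CLUSTER 64   /* XFS allocates inodes 64 at a time */

/* The leaf node holding an index is identified by the bits above MAP_SHIFT. */
static uint64_t leaf_of(uint64_t ino)
{
	return ino >> MAP_SHIFT;
}

/* Returns 1 if every inode of the cluster starting at `first` shares one
 * leaf.  Only holds when `first` is 64-aligned, which is assumed here. */
static int cluster_in_one_leaf(uint64_t first)
{
	assert(first % INODES_PER_CLUSTER == 0);
	for (uint64_t i = 1; i < INODES_PER_CLUSTER; i++)
		if (leaf_of(first + i) != leaf_of(first))
			return 0;
	return 1;
}
```

For example, inodes 128 through 191 all report leaf 2, and inode 192 starts leaf 3.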

Using this relationship, we completely remove the need for the
cluster hash to track clusters because we can use a gang lookup on
the radix tree to search for an existing inode in the cluster in an
efficient manner.
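The masking-plus-gang-lookup idea can be sketched in userspace C. `toy_gang_lookup` is a hypothetical stand-in that follows the contract of the kernel's `radix_tree_gang_lookup()` (return up to `max_items` items with index >= `first_index`, in ascending order) but simply scans a sorted array. `struct toy_inode` and `find_cluster_neighbour` are likewise illustrative, not the patch's code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached inode: like a real xfs_inode, it records its own number. */
struct toy_inode {
	uint64_t ino;
};

#define CLUSTER_MASK ((uint64_t)64 - 1)

/* Toy gang lookup: up to max_items items with ino >= first_index, in
 * ascending order.  `cache` must be sorted by ino; a real radix tree
 * walks its nodes instead of scanning. */
static unsigned toy_gang_lookup(struct toy_inode **cache, size_t n,
				struct toy_inode **results,
				uint64_t first_index, unsigned max_items)
{
	unsigned found = 0;

	assert(max_items > 0);
	for (size_t i = 0; i < n && found < max_items; i++)
		if (cache[i]->ino >= first_index)
			results[found++] = cache[i];
	return found;
}

/* Is any inode from ino's 64-inode cluster already cached?  Mask down to
 * the cluster start, ask for one item, then check it is still inside the
 * cluster. */
static struct toy_inode *find_cluster_neighbour(struct toy_inode **cache,
						size_t n, uint64_t ino)
{
	uint64_t first = ino & ~CLUSTER_MASK;
	struct toy_inode *hit;

	if (toy_gang_lookup(cache, n, &hit, first, 1) == 1 &&
	    hit->ino <= (first | CLUSTER_MASK))
		return hit;
	return NULL;
}
```

Asking for a single item starting at the masked cluster base answers "is anything from this cluster cached?" in one lookup, which is the question the cluster hash used to answer.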

The following three patches sit on top of the recently posted
i_flags cleanup patch.
(http://marc.theaimsgroup.com/?l=linux-xfs&m=115985254820322&w=2)

The first patch replaces the inode hash chains with radix trees.  A
single radix tree with a read/write lock does not provide enough
parallelism to prevent performance regressions under simultaneous
create/unlink workloads, so we hash the inode clusters into
different radix trees each with their own read/write lock. The
default is to create (2*ncpus)-1 radix trees up to a maximum of 15.
At this point I have left the ihashsize mount option alone but
limited the maximum number it can take to 128. If you specify more
than 128 (i.e. everyone currently using this mount option), it
falls back to the default.
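A sketch of the sizing rules described above, plus one possible way to map a cluster to a tree. The clamp values come from the description; treating `ihashsize == 0` as "option not given" and hashing by `(ino >> 6) % ntrees` are assumptions of this sketch, not necessarily what the patch does. With 4 CPUs the default works out to 7 trees:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEFAULT_TREES 15   /* default cap from the description */
#define MAX_IHASHSIZE     128  /* larger ihashsize values are ignored */

/* Default: (2 * ncpus) - 1 trees, capped at 15.  An explicit ihashsize is
 * honoured only if it is between 1 and 128; anything else falls back to
 * the default.  ihashsize == 0 means "not given" (assumption). */
static unsigned num_trees(unsigned ncpus, unsigned ihashsize)
{
	unsigned def;

	assert(ncpus >= 1);
	def = 2 * ncpus - 1;
	if (def > MAX_DEFAULT_TREES)
		def = MAX_DEFAULT_TREES;
	if (ihashsize >= 1 && ihashsize <= MAX_IHASHSIZE)
		return ihashsize;
	return def;
}

/* Hypothetical cluster-to-tree hash: hashing the cluster number (ino >> 6)
 * keeps a whole cluster in one tree, under one lock. */
static unsigned tree_for_inode(uint64_t ino, unsigned ntrees)
{
	return (unsigned)((ino >> 6) % ntrees);
}
```

Hashing whole clusters rather than individual inodes keeps the "one cluster, one node" property from the previous paragraph intact across multiple trees.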

The second patch introduces a per-cluster object lock for chaining
the inodes in the cluster together (for xfs_iflush()). The inode
chain is currently locked by the cluster hash chain lock, so we need
some other method of locking if we are to remove the cluster hash
altogether.
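The per-cluster lock can be pictured as a small object that owns both the lock and the chain of cached inodes in that cluster. This is a hypothetical userspace model: the spinlock is a C11 `atomic_flag`, and `struct toy_cluster`, `cluster_add` and `cluster_count_dirty` are illustrative names, not XFS structures:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_inode {
	unsigned long ino;
	int dirty;
	struct toy_inode *next;    /* next cached inode in the same cluster */
};

/* Hypothetical per-cluster object: its lock protects only this chain. */
struct toy_cluster {
	atomic_flag lock;          /* spinlock standing in for a kernel lock */
	struct toy_inode *chain;
};

#define TOY_CLUSTER_INIT { ATOMIC_FLAG_INIT, NULL }

static void cluster_lock(struct toy_cluster *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;  /* spin until the holder releases it */
}

static void cluster_unlock(struct toy_cluster *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

static void cluster_add(struct toy_cluster *c, struct toy_inode *ip)
{
	assert(ip != NULL);
	cluster_lock(c);
	ip->next = c->chain;
	c->chain = ip;
	cluster_unlock(c);
}

/* Walk the chain under the cluster lock, as an xfs_iflush()-style cluster
 * flush would, and count the dirty inodes it would write back. */
static int cluster_count_dirty(struct toy_cluster *c)
{
	int n = 0;

	cluster_lock(c);
	for (struct toy_inode *ip = c->chain; ip != NULL; ip = ip->next)
		n += ip->dirty;
	cluster_unlock(c);
	return n;
}
```

Because the lock lives in the cluster object rather than in a hash bucket, the chain stays protected once the cluster hash is gone.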

The third patch removes the cluster hash and replaces it with some
masking and a radix tree gang lookup.

Overall, the patchset removes more than 200 lines of code from the
xfs inode caching and lookup code and provides more consistent
scalability for large numbers of cached inodes. The only down side
is that it limits us to 32 bit inode numbers on 32 bit platforms due
to the way the radix tree uses unsigned longs for its indexes.
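To make the limitation concrete: on a 32-bit platform an `unsigned long` radix tree index holds only 32 bits, so two inode numbers that differ only above bit 31 would map to the same slot. The sketch below models that index as `uint32_t`; `ino_fits_index32` is a hypothetical guard, not something the patches are known to contain:

```c
#include <assert.h>
#include <stdint.h>

/* Model a 32-bit platform's unsigned long index as uint32_t. */
typedef uint32_t index32_t;

static index32_t ino_to_index32(uint64_t ino)
{
	return (index32_t)ino;   /* the high 32 bits are silently dropped */
}

/* Hypothetical check: does this inode number survive the conversion? */
static int ino_fits_index32(uint64_t ino)
{
	return (uint64_t)ino_to_index32(ino) == ino;
}
```

Inode 0x100000005 and inode 5 would land on the same index, which is why inode numbers above 32 bits cannot be cached this way on such platforms.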

Comments, thoughts, etc are welcome.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

* Re: [RFC 0/3] Convert XFS inode hashes to radix trees
From: Chris Wedgwood @ 2006-10-03 21:23 UTC
  To: David Chinner; +Cc: xfs-dev, xfs, dhowells, LKML

On Tue, Oct 03, 2006 at 04:06:10PM +1000, David Chinner wrote:

> Overall, the patchset removes more than 200 lines of code from the
> xfs inode caching and lookup code and provides more consistent
> scalability for large numbers of cached inodes. The only down side
> is that it limits us to 32 bit inode numbers on 32 bit platforms due
> to the way the radix tree uses unsigned longs for its indexes.

    commit afefdbb28a0a2af689926c30b94a14aea6036719
    tree 6ee500575cac928cd90045bcf5b691cf2b8daa09
    parent 1d32849b14bc8792e6f35ab27dd990d74b16126c
    author David Howells <dhowells@redhat.com> 1159863226 -0700
    committer Linus Torvalds <torvalds@g5.osdl.org> 1159887820 -0700

    [PATCH] VFS: Make filldir_t and struct kstat deal in 64-bit inode numbers

    These patches make the kernel pass 64-bit inode numbers internally when
    communicating to userspace, even on a 32-bit system.  They are required
    because some filesystems have intrinsic 64-bit inode numbers: NFS3+ and XFS
    for example.  The 64-bit inode numbers are then propagated to userspace
    automatically where the arch supports it.
    [...]

Doing this will mean XFS won't be able to support 32-bit inodes on
32-bit platforms the above (merged) patch --- though given that cheap
64-bit systems are now abundant does anyone really care?

* Re: [RFC 0/3] Convert XFS inode hashes to radix trees
From: David Chinner @ 2006-10-03 22:22 UTC
  To: Chris Wedgwood; +Cc: David Chinner, xfs-dev, xfs, dhowells, LKML

On Tue, Oct 03, 2006 at 02:23:35PM -0700, Chris Wedgwood wrote:
> On Tue, Oct 03, 2006 at 04:06:10PM +1000, David Chinner wrote:
> > Overall, the patchset removes more than 200 lines of code from the
> > xfs inode caching and lookup code and provides more consistent
> > scalability for large numbers of cached inodes. The only down side
> > is that it limits us to 32 bit inode numbers on 32 bit platforms due
> > to the way the radix tree uses unsigned longs for its indexes.
> 
>     commit afefdbb28a0a2af689926c30b94a14aea6036719
>     tree 6ee500575cac928cd90045bcf5b691cf2b8daa09
>     parent 1d32849b14bc8792e6f35ab27dd990d74b16126c
>     author David Howells <dhowells@redhat.com> 1159863226 -0700
>     committer Linus Torvalds <torvalds@g5.osdl.org> 1159887820 -0700
> 
>     [PATCH] VFS: Make filldir_t and struct kstat deal in 64-bit inode numbers
> 
>     These patches make the kernel pass 64-bit inode numbers internally when
>     communicating to userspace, even on a 32-bit system.  They are required
>     because some filesystems have intrinsic 64-bit inode numbers: NFS3+ and XFS
>     for example.  The 64-bit inode numbers are then propagated to userspace
>     automatically where the arch supports it.
>     [...]
> 
> Doing this will mean XFS won't be able to support 32-bit inodes on
> 32-bit platforms the above (merged) patch --- though given that cheap
> 64-bit systems are now abundant does anyone really care?

That's a good question. In a recent thread on linux-fsdevel about
these patches Christoph Hellwig pointed out that 32bit user space is
not ready for 64 bit inodes, so it's probably going to be a while
before the second half of this mod is ready (which exports 64 bit
inodes into userspace on 32bit platforms).

http://marc.theaimsgroup.com/?l=linux-fsdevel&m=115946211808497&w=2
http://marc.theaimsgroup.com/?l=linux-fsdevel&m=115948836023569&w=2

ISTR someone else also mentioning that 64bit inodes on 32-bit machines
break the dynamic linker, but I can't find a reference to that
atm.

As it stands, there are still a few barriers to getting 64 bit inodes
on 32 bit platforms and I can't see them going away quickly. Right
now I see little reason to move to 64 bit inodes for 32 bit
platforms for XFS because of the 16TB filesystem size limit (that
only needs 33-36 bit inodes depending on the inode size) and no
32bit platform is currently able to repair a filesystem of that
size.
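The 33-36 bit figure can be checked with a rough calculation. Treating an XFS inode number as addressing one inode-sized slot on disk, a 16TB (2^44 byte) filesystem needs about 44 - log2(inode size) bits: 36 bits for 256-byte inodes down to 33 bits for 2048-byte inodes. This approximation ignores the rounding of allocation group sizes up to a power of two in the real inode number layout, which can push the figure up slightly:

```c
#include <assert.h>

/* Smallest n with 2^n >= x, for x > 0. */
static unsigned log2_ceil(unsigned long long x)
{
	unsigned n = 0;

	assert(x > 0);
	while ((1ULL << n) < x)
		n++;
	return n;
}

/* Approximate bits needed for inode numbers: one number per inode-sized
 * slot, so roughly log2(fs_bytes) - log2(inode_size). */
static unsigned inode_number_bits(unsigned long long fs_bytes,
				  unsigned long long inode_size)
{
	return log2_ceil(fs_bytes) - log2_ceil(inode_size);
}
```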

And yes, 64 bit systems are cheap, cheap, cheap so IMO this
functionality is really irrelevant moving forward. If it had come
along a couple of years ago then it would be different, but I think
mainstream technology is finally catching up with XFS so it's not a
critical issue anymore... ;)

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

* Re: [RFC 0/3] Convert XFS inode hashes to radix trees
From: Chris Wedgwood @ 2006-10-04  0:47 UTC
  To: David Chinner; +Cc: xfs-dev, xfs, dhowells, LKML

On Wed, Oct 04, 2006 at 08:22:56AM +1000, David Chinner wrote:

> That's a good question. In a recent thread on linux-fsdevel about
> these patches Christoph Hellwig pointed out that 32bit user space is
> not ready for 64 bit inodes, so it's probably going to be a while
> before the second half of this mod is ready (which exports 64 bit
> inodes into userspace on 32bit platforms).

yes a patch changing struct kstat and filldir* was merged...

> http://marc.theaimsgroup.com/?l=linux-fsdevel&m=115946211808497&w=2
> http://marc.theaimsgroup.com/?l=linux-fsdevel&m=115948836023569&w=2

> As it stands, there are still a few barriers to getting 64 bit inodes
> on 32 bit platforms and I can't see them going away quickly. Right
> now I see little reason to move to 64 bit inodes for 32 bit
> platforms for XFS because of the 16TB filesystem size limit (that
> only needs 33-36 bit inodes depending on the inode size) and no
> 32bit platform is currently able to repair a filesystem of that
> size.

so that leaves NFS3+

is it really worth the pain then?

* Re: [RFC 0/3] Convert XFS inode hashes to radix trees
From: David Chinner @ 2006-10-04  1:43 UTC
  To: David Chinner; +Cc: Chris Wedgwood, xfs-dev, xfs, dhowells, LKML

On Wed, Oct 04, 2006 at 08:22:56AM +1000, David Chinner wrote:
> On Tue, Oct 03, 2006 at 02:23:35PM -0700, Chris Wedgwood wrote:
> > On Tue, Oct 03, 2006 at 04:06:10PM +1000, David Chinner wrote:
> > > Overall, the patchset removes more than 200 lines of code from the
> > > xfs inode caching and lookup code and provides more consistent
> > > scalability for large numbers of cached inodes. The only down side
> > > is that it limits us to 32 bit inode numbers on 32 bit platforms due
> > > to the way the radix tree uses unsigned longs for its indexes.
> > 
> >     commit afefdbb28a0a2af689926c30b94a14aea6036719
> >     tree 6ee500575cac928cd90045bcf5b691cf2b8daa09
> >     parent 1d32849b14bc8792e6f35ab27dd990d74b16126c
> >     author David Howells <dhowells@redhat.com> 1159863226 -0700
> >     committer Linus Torvalds <torvalds@g5.osdl.org> 1159887820 -0700
> > 
> >     [PATCH] VFS: Make filldir_t and struct kstat deal in 64-bit inode numbers
> > 
> >     These patches make the kernel pass 64-bit inode numbers internally when
> >     communicating to userspace, even on a 32-bit system.  They are required
> >     because some filesystems have intrinsic 64-bit inode numbers: NFS3+ and XFS
> >     for example.  The 64-bit inode numbers are then propagated to userspace
> >     automatically where the arch supports it.
> >     [...]
> > 
> > Doing this will mean XFS won't be able to support 32-bit inodes on
> > 32-bit platforms the above (merged) patch --- though given that cheap
> > 64-bit systems are now abundant does anyone really care?
> 
> That's a good question. In a recent thread on linux-fsdevel about
> these patches Christoph Hellwig pointed out that 32bit user space is
> not ready for 64 bit inodes, so it's probably going to be a while
> before the second half of this mod is ready (which exports 64 bit
> inodes into userspace on 32bit platforms).

Ahhh.... I think I misread what Chris wrote here - _32_ bit inodes on
32 bit platforms not working? I can't see how this would be the
case with the mods I posted given that they are entirely internal to
XFS and don't change any external inode number interfaces. And the
above commit shouldn't break XFS either.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

* Re: [RFC 0/3] Convert XFS inode hashes to radix trees
From: Andi Kleen @ 2006-10-04 17:59 UTC
  To: David Chinner; +Cc: xfs-dev, xfs, dhowells, LKML

David Chinner <dgc@sgi.com> writes:
> 
> And yes, 64 bit systems are cheap, cheap, cheap so IMO this
> functionality is really irrelevant moving forward. If it had come
> along a couple of years ago then it would be different, but I think
> mainstream technology is finally catching up with XFS so it's not a
> critical issue anymore... ;)

One issue is that people often still run a lot of 32bit userland
even with 64bit kernels. The compat layer will just truncate
the inodes I think. But so far I haven't heard of anybody
complaining on x86-64.

-Andi

* Re: [RFC 0/3] Convert XFS inode hashes to radix trees
From: Trond Myklebust @ 2006-10-04 19:22 UTC
  To: David Chinner; +Cc: Chris Wedgwood, xfs-dev, xfs, dhowells, LKML

On Wed, 2006-10-04 at 08:22 +1000, David Chinner wrote:
> On Tue, Oct 03, 2006 at 02:23:35PM -0700, Chris Wedgwood wrote:
> > On Tue, Oct 03, 2006 at 04:06:10PM +1000, David Chinner wrote:
> > > Overall, the patchset removes more than 200 lines of code from the
> > > xfs inode caching and lookup code and provides more consistent
> > > scalability for large numbers of cached inodes. The only down side
> > > is that it limits us to 32 bit inode numbers on 32 bit platforms due
> > > to the way the radix tree uses unsigned longs for its indexes.
> > 
> >     commit afefdbb28a0a2af689926c30b94a14aea6036719
> >     tree 6ee500575cac928cd90045bcf5b691cf2b8daa09
> >     parent 1d32849b14bc8792e6f35ab27dd990d74b16126c
> >     author David Howells <dhowells@redhat.com> 1159863226 -0700
> >     committer Linus Torvalds <torvalds@g5.osdl.org> 1159887820 -0700
> > 
> >     [PATCH] VFS: Make filldir_t and struct kstat deal in 64-bit inode numbers
> > 
> >     These patches make the kernel pass 64-bit inode numbers internally when
> >     communicating to userspace, even on a 32-bit system.  They are required
> >     because some filesystems have intrinsic 64-bit inode numbers: NFS3+ and XFS
> >     for example.  The 64-bit inode numbers are then propagated to userspace
> >     automatically where the arch supports it.
> >     [...]
> > 
> > Doing this will mean XFS won't be able to support 32-bit inodes on
> > 32-bit platforms the above (merged) patch --- though given that cheap
> > 64-bit systems are now abundant does anyone really care?

Which completely ignored the fact that NFS systems are already having to
truncate 64-bit inode numbers to 32-bits and pass these truncated values
up to userspace. Collisions have been observed in the wild, and I've
already had to change the 64-bit->32-bit hashing algorithm on at least
one occasion.

By moving that truncation into userspace, we will at least give 64-bit
standards-compliant programs a chance to work correctly.
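For illustration, here is one simple way a 64-bit NFS fileid can be folded into a 32-bit inode number: XOR the high half into the low half. This is similar in spirit to what the Linux NFS client does, but treat it as a sketch rather than its exact code. Any such fold is many-to-one, which is why collisions show up in practice; `fold_fileid` and `fold_collides` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 64-bit fileid into 32 bits by XORing the two halves. */
static uint32_t fold_fileid(uint64_t fileid)
{
	return (uint32_t)fileid ^ (uint32_t)(fileid >> 32);
}

/* Do two distinct fileids become the same 32-bit inode number? */
static int fold_collides(uint64_t a, uint64_t b)
{
	assert(a != b);
	return fold_fileid(a) == fold_fileid(b);
}
```

Fileids 0x100000001 and 0 both fold to 0, so a 32-bit program comparing st_ino values would wrongly treat them as the same file.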

Trond

* Re: [RFC 0/3] Convert XFS inode hashes to radix trees
From: David Chinner @ 2006-10-05  0:37 UTC
  To: Andi Kleen; +Cc: David Chinner, xfs-dev, xfs, dhowells, LKML

On Wed, Oct 04, 2006 at 07:59:15PM +0200, Andi Kleen wrote:
> David Chinner <dgc@sgi.com> writes:
> > 
> > And yes, 64 bit systems are cheap, cheap, cheap so IMO this
> > functionality is really irrelevant moving forward. If it had come
> > along a couple of years ago then it would be different, but I think
> > mainstream technology is finally catching up with XFS so it's not a
> > critical issue anymore... ;)
> 
> One issue is that people often still run a lot of 32bit userland
> even with 64bit kernels.

Which is one of the reasons why XFS uses 32 bit inodes by default
even on 64 bit kernels. XFS does not use 64 bit inodes unless you
tell it to via the inode64 mount option....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

* Re: [RFC 0/3] Convert XFS inode hashes to radix trees
From: Shailendra Tripathi @ 2006-11-15  1:09 UTC
  To: David Chinner; +Cc: xfs-dev, xfs

Hi David,
I regret making comments and asking questions on this so late
(somehow I missed the email).
It does appear to me that this approach can potentially help with
cluster hash list related manipulations.
However, this appears (to me) to come at the cost of regular inode lookup.

As of now, each of the hash buckets has its own lock. This helps
keep the xfs_iget operations from becoming hot. I have not seen
xfs_iget anywhere near the top in my profiling of Linux for SPECFS.
With the current code, the number of hash buckets can be appropriately
sized (based upon memory availability).

However, it appears to me that the radix trees (even with 15 of them)
can become a bottleneck. Let's assume that there are 600K inodes on a
reasonably big end system and, assuming fair distribution, each of the
radix trees will have 600K/15 ~ 40K inodes. Insertions and deletions
have to take the writer lock and, given their frequency, both readers
(lookups) and writers will be affected.

That means that if one tree is locked for insertion or deletion,
lookups on the remaining 40K inodes in that tree will simply be
serialized. In the current design, however, by sacrificing a little
extra memory we can allocate more hash buckets, so the number of
inodes locked down at any time can be made pretty small. My knowledge
of radix trees is a little limited, but I think increasing the number
of trees would be much more costly in memory terms. Given the lower
memory usage and the performance, I tend to believe that a hash table
is more scalable than a radix tree for inode tables.
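The arithmetic behind this concern, as a sketch: the average number of cached inodes sharing one lock is the inode count divided by the number of lock domains. The 8192-bucket hash used in the comparison is an assumed figure for illustration, not a measured configuration, and `inodes_per_lock` is a hypothetical helper:

```c
#include <assert.h>

/* Average number of cached inodes behind each lock when `ninodes` are
 * spread evenly over `nlocks` lock domains (trees or hash buckets). */
static unsigned long inodes_per_lock(unsigned long ninodes, unsigned long nlocks)
{
	assert(nlocks > 0);
	return ninodes / nlocks;
}
```

With 600K inodes, 15 trees leave about 40,000 inodes behind each lock, while an 8192-bucket hash leaves about 73 per bucket lock.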
Have you done any performance testing with these patches? I am quite
curious to know the results. If not, maybe I can try to do some
performance testing with these changes, albeit on an old kernel tree.
Am I missing something here? Please let me know.

Thanks and Regards,
Shailendra

* Re: [RFC 0/3] Convert XFS inode hashes to radix trees
From: David Chinner @ 2006-11-20  2:13 UTC
  To: Shailendra Tripathi; +Cc: xfs-dev, xfs

On Tue, Nov 14, 2006 at 05:09:03PM -0800, Shailendra Tripathi wrote:
> Hi David,
> I regret making comments and asking questions on this so late
> (somehow I missed the email).
> It does appear to me that this approach can potentially help with
> cluster hash list related manipulations.
> However, this appears (to me) to come at the cost of regular inode lookup.

Yes, there is less parallelism in the radix tree approach, as I stated
in the original description.

> As of now, each of the hash buckets has its own lock. This helps
> keep the xfs_iget operations from becoming hot. I have not seen
> xfs_iget anywhere near the top in my profiling of Linux for SPECFS.
> With the current code, the number of hash buckets can be appropriately
> sized (based upon memory availability).

Sure, but tuning for specsfs is not the problem we are trying to
solve here.  The problem we are solving is scaling to tens of
millions of cached inodes in core -without needing to tune- the
filesystem and the inode hashes are the number one problem there.

> However, it appears to me that the radix trees (even with 15 of them)
> can become a bottleneck. Let's assume that there are 600K inodes on a
> reasonably big end system and, assuming fair

Only 600k cached inodes? That's not a "big end" system - we're
seeing problems with single filesystem inode caches almost two
_orders of magnitude_ larger than this on production machines.

> distribution, each of the radix trees will have 600K/15 ~ 40K inodes.
> Insertions and deletions have to take the writer lock and, given their
> frequency, both readers (lookups) and writers will be affected.

Right, but we've been hacking at this code time and time again
because of scalability problems due to hash sizing, inefficient list
traversal, non MRU ordering of the hash lists, etc.  Hash tables are
simply too inflexible when it comes to scaling to really, really
large numbers of cached inodes.

The advantage of radix trees is logarithmic scaling, so the length
of time the lock is held (either shared or exclusive) is reduced
substantially when cache misses (i.e. when you need to do an insert)
occur. Hence the reduction in the number of locks is somewhat
negated by the reduced time we need to hold the lock for.
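A rough way to see the logarithmic argument: a 64-way radix tree needs one more level only each time the largest index grows by a factor of 64, while the average chain in a fixed-size hash grows linearly with the number of cached inodes. The sketch assumes 64 slots per node; note that the tree's height depends on the largest inode number stored, not on how many inodes are cached, and the bucket count in the example is an assumption:

```c
#include <assert.h>

#define MAP_SHIFT 6   /* 64 slots per node (assumed) */

/* Levels a 64-way radix tree needs so that max_index is addressable:
 * each level resolves MAP_SHIFT bits of the index. */
static unsigned radix_height(unsigned long long max_index)
{
	unsigned h = 1;

	while (h * MAP_SHIFT < 64 && (max_index >> (h * MAP_SHIFT)) != 0)
		h++;
	return h;
}

/* Average hash chain length a lookup walks with a fixed bucket count. */
static unsigned long long avg_chain_len(unsigned long long ninodes,
					unsigned long long nbuckets)
{
	assert(nbuckets > 0);
	return ninodes / nbuckets;
}
```

For example, indices up to ~13 million fit in a 4-level tree, whereas 13 million inodes spread over a hypothetical 8192-bucket hash average about 1,586 entries per chain.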

So, I've traded off massively overblown parallelism for a structure
that scales far better and, by my measurements, provides the same
throughput.

And, FWIW, I'm not really concerned about cache hit parallelism in
the face of insert and delete exclusive locking because this patch
in the -mm tree from Nick Piggin:

radix-tree-rcu-lockless-readside.patch

is the right way to solve this problem and will be far better
than even the existing hash is in terms of lookup parallelism.
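The core of a lockless read side is that writers fully initialise an object before publishing its pointer, and readers load that pointer without taking the lock. The sketch below shows only that publish/lookup pairing using C11 acquire/release atomics; it is not RCU and not the radix-tree-rcu-lockless-readside patch. Real RCU also defers freeing removed items until all pre-existing readers have finished, which is omitted here, and writers are assumed to be serialised by a lock that is not shown. All names are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NSLOTS 64

/* One 64-slot node whose slots readers may load without a lock. */
struct toy_node {
	_Atomic(void *) slots[NSLOTS];
};

struct toy_item {
	unsigned long index;
};

static void node_init(struct toy_node *n)
{
	for (size_t i = 0; i < NSLOTS; i++)
		atomic_init(&n->slots[i], NULL);
}

/* Writer side: initialise the item first, then publish it with a release
 * store so a reader can never observe a half-built object. */
static void publish(struct toy_node *n, struct toy_item *it, unsigned long index)
{
	assert(index < NSLOTS);
	it->index = index;
	atomic_store_explicit(&n->slots[index], it, memory_order_release);
}

/* Reader side: no lock, just an acquire load paired with the release store. */
static struct toy_item *lookup(struct toy_node *n, unsigned long index)
{
	assert(index < NSLOTS);
	return atomic_load_explicit(&n->slots[index], memory_order_acquire);
}
```

Cache hits then never contend with the exclusive insert/delete lock, which is why lookup parallelism stops depending on how many trees (locks) exist.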

> Have you done any performance testing with these patches? I am quite
> curious to know the results. If not, maybe I can try to do some
> performance testing with these changes, albeit on an old kernel tree.

Yes, I have done some performance testing on them (but not specsfs).
IIRC (I can't find the results right now), a single radix tree performed
the same as a default hash up to ~8 parallel threads all doing
creates or removes and the tests ran up to about 5 million inodes
in core on the one filesystem. With a hash of 7 radix trees (4p machine)
the radix tree implementation at 8 threads had about a 10% improvement
in throughput and this increased to about 15% by 128 threads. Also,
there was a reduction in CPU usage of about 10% when the thread count
increased past about 16....

The other big difference is the improvement in inode reclaim speed:
unmount of a filesystem with ~13 million inodes in core dropped from
about 20 minutes to under 2 minutes i.e. ~18 minutes reclaiming
inodes (i.e. removing them from the hashes) down to ~30s during
unmount.

> Am I missing something here? Please let me know.

The potential that lockless radix tree lookups imply, I think ;)

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

Thread overview: 10+ messages
2006-10-03  6:06 [RFC 0/3] Convert XFS inode hashes to radix trees David Chinner
2006-10-03 21:23 ` Chris Wedgwood
2006-10-03 22:22   ` David Chinner
2006-10-04  0:47     ` Chris Wedgwood
2006-10-04  1:43     ` David Chinner
2006-10-04 19:22     ` Trond Myklebust
     [not found]   ` <20061003222256.GW4695059__33273.3314754025$1159914338$gmane$org@melbourne.sgi.com>
2006-10-04 17:59     ` Andi Kleen
2006-10-05  0:37       ` David Chinner
2006-11-15  1:09 ` Shailendra Tripathi
2006-11-20  2:13   ` David Chinner
