From: Alex Elder <aelder@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 0/18] xfs: metadata and buffer cache scalability improvements
Date: Fri, 17 Sep 2010 08:21:40 -0500
Message-ID: <1284729700.5524.53.camel@doink>
In-Reply-To: <1284461777-1496-1-git-send-email-david@fromorbit.com>
On Tue, 2010-09-14 at 20:55 +1000, Dave Chinner wrote:
> This patchset has grown quite a bit - it started out as a "convert
> the buffer cache to rbtrees" patch, and has gotten bigger as I
> peeled the onion from one bottleneck to another.
I know you're going to re-submit this series. I would
like you to split it into several smaller series if you
don't mind. Some of these are simpler than others,
and there are some somewhat logical groupings (you
even described them here as two sets). But beyond
that it would be nice to get at least some of them
committed before the full series is perfected.
To be constructive, here's a grouping based on what
seems to be each significant change. I'm not
suggesting they all be separated; I'm just trying
to identify the many things you're doing with
this series.
[01/18] xfs: single thread inode cache shrinking.
[02/18] xfs: reduce the number of CIL lock round trips during commit
[05/18] xfs: convert inode cache lookups to use RCU locking
[06/18] xfs: convert pag_ici_lock to a spin lock
[07/18] xfs: don't use vfs writeback for pure metadata modifications
[08/18] xfs: rename xfs_buf_get_nodaddr to be more appropriate
[09/18] xfs: introduced uncached buffer read primitve
[10/18] xfs: store xfs_mount in the buftarg instead of in the xfs_buf
[11/18] xfs: kill XBF_FS_MANAGED buffers
[12/18] xfs: use unhashed buffers for size checks
[13/18] xfs: remove buftarg hash for external devices
[03/18] xfs: remove debug assert for per-ag reference counting
[04/18] xfs: lockless per-ag lookups
[14/18] xfs: convert buffer cache hash to rbtree
[15/18] xfs; pack xfs_buf structure more tightly
[16/18] xfs: convert xfsbud shrinker to a per-buftarg shrinker.
[17/18] xfs: add a lru to the XFS buffer cache
[18/18] xfs: stop using the page cache to back the buffer cache
Thanks.
-Alex
> Performance numbers here are 8-way fs_mark create to 50M files, and
> 8-way rm -rf to remove the files created.
>
>                 wall time    fs_mark rate
> 2.6.36-rc4:
>     create:     13m10s       65k files/s
>     unlink:     23m58s       N/A
>
> The first set of patches are generic infrastructure changes that
> address pain points the rbtree based buffer cache introduces. I've
> put them first because they are simpler to review and have immediate
> impact on performance. These patches address lock contention as
> measured by the kernel lockstat infrastructure.
>
> xfs: single thread inode cache shrinking.
> - prevents per-ag contention during cache shrinking
>
> xfs: reduce the number of CIL lock round trips during commit
> - reduces lock traffic on the xc_cil_lock by two orders of
> magnitude
>
> xfs: remove debug assert for per-ag reference counting
> xfs: lockless per-ag lookups
> - hottest lock in the system with buffer cache rbtree path
> - converted to use RCU.
>
> xfs: convert inode cache lookups to use RCU locking
> xfs: convert pag_ici_lock to a spin lock
> - addresses lookup vs reclaim contention on pag_ici_lock
> - converted to use RCU.
>
> xfs: don't use vfs writeback for pure metadata modifications
> - inode writeback does not keep up with dirtying 100,000
> inodes a second. Avoids the superblock dirty list where
> possible by using the AIL as the age-order flusher.
>
> Performance with these patches:
>
> 2.6.36-rc4 + shrinker + CIL + RCU:
>     create:     11m38s       80k files/s
>     unlink:     14m29s       N/A
>
> Create rate has improved by 20%, unlink time has almost halved. On
> large numbers of inodes, the unlink rate improves even more
> dramatically.
>
> The buffer cache to rbtree series currently stands at:
>
> xfs: rename xfs_buf_get_nodaddr to be more appropriate
> xfs: introduced uncached buffer read primitve
> xfs: store xfs_mount in the buftarg instead of in the xfs_buf
> xfs: kill XBF_FS_MANAGED buffers
> xfs: use unhashed buffers for size checks
> xfs: remove buftarg hash for external devices
> - preparatory buffer cache API cleanup patches
>
> xfs: convert buffer cache hash to rbtree
> - what it says ;)
> - includes changes based on Alex's review.
>
> xfs; pack xfs_buf structure more tightly
> - memory usage reduction, means adding the LRU list head is
> effectively memory usage neutral.
>
> xfs: convert xfsbud shrinker to a per-buftarg shrinker.
> xfs: add a lru to the XFS buffer cache
> - Add an LRU for reclaim
>
> xfs: stop using the page cache to back the buffer cache
> - kill all the page cache code
>
> 2.6.36-rc4 + shrinker + CIL + RCU + rbtree:
>     create:     9m47s        95k files/s
>     unlink:     14m16s       N/A
>
> Create rate has improved by another 20%, unlink rate has improved
> marginally (noise, really).
>
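As a rough cross-check of the figures quoted so far, the step-to-step
changes can be worked out in a few lines of Python. This is only a
sketch: the helper names (to_seconds, pct_change) are made up for
illustration, and the inputs are just the wall times and fs_mark rates
quoted above.

```python
def to_seconds(wall):
    """Convert a wall time such as '13m10s' into seconds."""
    minutes, seconds = wall.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


def pct_change(old, new):
    """Percentage change going from old to new (negative means a drop)."""
    return (new - old) / old * 100.0


# (label, create wall time, create rate in k files/s, unlink wall time),
# copied from the three result sets quoted in this mail.
results = [
    ("2.6.36-rc4", "13m10s", 65, "23m58s"),
    ("+ shrinker + CIL + RCU", "11m38s", 80, "14m29s"),
    ("+ rbtree", "9m47s", 95, "14m16s"),
]

# Compare each result set with the one before it.
for prev, cur in zip(results, results[1:]):
    rate = pct_change(prev[2], cur[2])
    unlink = pct_change(to_seconds(prev[3]), to_seconds(cur[3]))
    print(f"{cur[0]}: create rate {rate:+.1f}%, unlink time {unlink:+.1f}%")
```

By this arithmetic the create rate rises by roughly 23% and then 19%,
and the unlink time drops by about 40% in the first step and by under
2% in the second, which is consistent with the "20%", "almost halved"
and "noise" descriptions.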
> There are two remaining parts to the buffer cache conversions:
>
> 1. Work out how to efficiently support block size smaller
>    than page size. The current code works, but uses a page per
>    sub-page buffer. A set of slab caches would be perfect for
>    this use, but I'm not sure that we are allowed to use them
>    for IO anymore. Christoph?
>
> 2. Connect up the buffer type specific reclaim priority
>    reference counting and convert the LRU reclaim to a cursor
>    based walk that simply drops reclaim reference counts and
>    frees anything that has a zero reclaim reference.
>
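To make item 2 concrete, here is a toy user-space model of what a
cursor based reclaim walk might look like. It is a sketch under
assumptions, not the proposed kernel code: the names (Buf, BufferLRU,
reclaim_refs, scan) are hypothetical, and checking for zero before
dropping a reference is just one possible ordering.

```python
from dataclasses import dataclass


@dataclass
class Buf:
    name: str
    reclaim_refs: int  # hypothetical per-buffer-type reclaim priority


class BufferLRU:
    """Toy LRU whose reclaim position (cursor) persists across scans."""

    def __init__(self, bufs):
        self.bufs = list(bufs)
        self.cursor = 0

    def scan(self, nr_to_scan):
        """Visit up to nr_to_scan buffers, starting at the cursor.

        A buffer with zero reclaim references is freed; any other
        buffer has one reclaim reference dropped and is skipped.
        """
        freed = []
        for _ in range(nr_to_scan):
            if not self.bufs:
                break
            if self.cursor >= len(self.bufs):
                self.cursor = 0  # wrap around to the start of the list
            buf = self.bufs[self.cursor]
            if buf.reclaim_refs == 0:
                # Removing the entry leaves the cursor on its successor.
                freed.append(self.bufs.pop(self.cursor).name)
            else:
                buf.reclaim_refs -= 1
                self.cursor += 1
        return freed


lru = BufferLRU([Buf("data", 0), Buf("inode", 1), Buf("btree", 2)])
first_pass = lru.scan(3)
second_pass = lru.scan(3)
print(first_pass, second_pass)
```

In this model a buffer that starts with a higher reclaim reference
count survives more passes of the walk before it is freed, which is
the effect the per-type priority is meant to have.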
> Overall, I can swap the order of the two patch sets, and the
> incremental performance increases for create are pretty much
> identical. For unlink, the benefit comes from the shrinker
> modification. For those that care, the rbtree patch set in isolation
> results in a time of 4h38m to create 1 billion inodes on my 8p/4GB
> RAM test VM. I haven't run this test with the RCU and writeback
> modifications yet.
>
> Moving on from this point is to start testing against Nick Piggin's
> VFS scalability tree, as the inode_lock and dcache_lock are now the
> performance limiting factors. That will, without doubt, bring new
> hotspots out in XFS so I'll be starting this cycle over again soon.
>
> Overall diffstat at this point is:
>
> fs/xfs/linux-2.6/kmem.h | 1 +
> fs/xfs/linux-2.6/xfs_buf.c | 588 ++++++++++++++--------------------------
> fs/xfs/linux-2.6/xfs_buf.h | 61 +++--
> fs/xfs/linux-2.6/xfs_iops.c | 18 +-
> fs/xfs/linux-2.6/xfs_super.c | 11 +-
> fs/xfs/linux-2.6/xfs_sync.c | 49 +++-
> fs/xfs/linux-2.6/xfs_trace.h | 2 +-
> fs/xfs/quota/xfs_qm_syscalls.c | 4 +-
> fs/xfs/xfs_ag.h | 9 +-
> fs/xfs/xfs_buf_item.c | 3 +-
> fs/xfs/xfs_fsops.c | 11 +-
> fs/xfs/xfs_iget.c | 46 +++-
> fs/xfs/xfs_inode.c | 22 +-
> fs/xfs/xfs_inode_item.c | 9 -
> fs/xfs/xfs_log.c | 3 +-
> fs/xfs/xfs_log_cil.c | 116 +++++----
> fs/xfs/xfs_log_recover.c | 18 +-
> fs/xfs/xfs_mount.c | 126 ++++-----
> fs/xfs/xfs_mount.h | 2 +
> fs/xfs/xfs_rtalloc.c | 29 +-
> fs/xfs/xfs_vnodeops.c | 2 +-
> 21 files changed, 502 insertions(+), 628 deletions(-)
>
> So it is improving performance, removing code and fixing
> longstanding bugs all at the same time. ;)
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs