public inbox for linux-xfs@vger.kernel.org
* [PATCH 00/16] xfs: current patch stack for 2.6.38 window
@ 2010-11-08  8:55 Dave Chinner
  2010-11-08  8:55 ` [PATCH 01/16] xfs: fix per-ag reference counting in inode reclaim tree walking Dave Chinner
                   ` (16 more replies)
  0 siblings, 17 replies; 42+ messages in thread
From: Dave Chinner @ 2010-11-08  8:55 UTC (permalink / raw)
  To: xfs

Folks,

FYI, here is my current XFS patch stack that I'll be trying to get ready in
time for the 2.6.38 merge window.  Note that the first two patches are
candidates for 2.6.37-rc. They are a perag reference counting fix and the
movement of a trace point.

My tree is currently based on the VFS locking changes I have out for review,
so there are a couple of patches that won't apply sanely to a mainline or OSS xfs
dev tree. See below for a pointer to a git tree with all the patches in it.

First patch is a per-cpu superblock counter rewrite. This uses the generic
per-cpu counter infrastructure to do the heavy lifting. It still needs to be
split into two patches.
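The key operation the generic counter code gains here is an "add unless the result would drop below a threshold" primitive for ENOSPC detection. A minimal single-threaded userspace sketch of those semantics (the real kernel version distributes the count per-CPU and only falls back to an accurate sum near the threshold; all names here are illustrative, not the kernel API):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of "add delta unless the counter would drop below
 * threshold". The kernel variant operates on a distributed per-cpu
 * counter; this sketch shows only the semantics.
 */
struct sketch_counter {
	int64_t count;
};

/* Returns 0 and applies the delta, or -1 (ENOSPC-like) leaving it alone. */
static int sketch_add_unless_lt(struct sketch_counter *c, int64_t delta,
				int64_t threshold)
{
	if (c->count + delta < threshold)
		return -1;
	c->count += delta;
	return 0;
}
```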

Following this are the dynamic speculative allocation patches. These have been
rewritten to be based on the current inode size rather than a thumb-in-the-air
how-many-preallocs-have-we-already-done algorithm. The second patch also fixes
some assumptions about ip->i_delayed_blks being zero after a flush.
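To illustrate the sizing change: a size-based scheme reserves space proportional to how big the file already is, clamped to sane bounds, so the reservation grows with the file. This is only a sketch of that idea; the names and bounds are assumptions, not the actual XFS code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative sketch of sizing speculative EOF preallocation from the
 * current inode size: preallocate roughly the file's current size,
 * clamped between a floor and a ceiling.
 */
static uint64_t sketch_eof_prealloc(uint64_t isize,
				    uint64_t min_alloc, uint64_t max_alloc)
{
	uint64_t want = isize;	/* grow the file by ~its own size */

	if (want < min_alloc)
		want = min_alloc;
	if (want > max_alloc)
		want = max_alloc;
	return want;
}
```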

Next up we have the inode cache RCU freeing and lookup patches, including one
that avoids putting the inode in the VFS hash (similar to Christoph's patch,
but using different VFS code).
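RCU-freeing the inodes means a lockless lookup can race with reclaim, so the lookup has to find a candidate and then re-validate it under the inode's lock before taking a reference. A single-threaded userspace model of that pattern (all names illustrative; in the kernel the lookup walks the per-ag radix tree under rcu_read_lock()):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch of the lookup/validate pattern an RCU-freed inode cache needs:
 * after finding a candidate, re-check a "being reclaimed" flag under
 * the inode's lock before taking a reference.
 */
struct sketch_inode {
	unsigned long	ino;
	int		reclaiming;	/* set by reclaim before freeing */
	int		refcount;
};

static struct sketch_inode *
sketch_iget(struct sketch_inode **cache, int n, unsigned long ino)
{
	for (int i = 0; i < n; i++) {
		struct sketch_inode *ip = cache[i];

		if (!ip || ip->ino != ino)
			continue;
		/* in the kernel: take ip's spin lock here */
		if (ip->reclaiming)
			return NULL;	/* lost the race; caller retries */
		ip->refcount++;
		return ip;
	}
	return NULL;
}
```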

Then there are buffer cache reclaim changes. First is a per-buftarg shrinker
interface, followed by a lazily updated per-buftarg buffer LRU. Building on
this, the prioritised buffer reclaim hooks are connected up to ensure more
critical buffers are harder to reclaim.
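One simple way to make critical buffers harder to reclaim, sketched below, is to give each buffer a reclaim grace count that a shrinker pass decrements, freeing the buffer only when it reaches zero, so higher-priority buffers survive more passes. This is an illustrative model of the idea, not the patch itself:

```c
#include <assert.h>

/*
 * Sketch of prioritised LRU reclaim: each buffer carries a grace count;
 * a shrinker pass decrements it and only frees the buffer once it hits
 * zero, so more critical buffers (higher count) survive more passes.
 */
struct sketch_buf {
	int lru_ref;	/* reclaim grace count, set at buffer setup */
};

/* Returns 1 if the buffer should be freed on this shrinker pass. */
static int sketch_shrink_one(struct sketch_buf *bp)
{
	if (bp->lru_ref > 1) {
		bp->lru_ref--;
		return 0;
	}
	bp->lru_ref = 0;
	return 1;
}
```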

AIL lock contention fixes are next, with bulk AIL insert and removal functions
being implemented and connected up to the transaction commit and inode buffer
IO completion routines. These significantly reduce AIL lock contention, and
combined with a reduction in the granularity of xfsaild push wakeups, the AIL
lock drops out of the "top 10" contended locks on 8-way workloads.
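The win from bulk insertion is amortisation: one AIL lock round trip per batch of log items instead of one per item. A toy model with a sorted array standing in for the LSN-ordered AIL (names illustrative, not the xfs_trans_ail API):

```c
#include <assert.h>

/*
 * Sketch of bulk AIL insertion: the caller gathers a batch of log items
 * and splices them into the LSN-sorted AIL under a single lock round
 * trip, instead of locking once per item.
 */
#define AIL_MAX 64

struct sketch_ail {
	int lsn[AIL_MAX];
	int count;
	int lock_acquisitions;	/* what the bulk API is minimising */
};

static void sketch_ail_insert_bulk(struct sketch_ail *ail,
				   const int *items, int n)
{
	ail->lock_acquisitions++;	/* one lock for the whole batch */
	for (int i = 0; i < n; i++) {
		int j = ail->count++;

		/* keep the AIL sorted by LSN */
		while (j > 0 && ail->lsn[j - 1] > items[i]) {
			ail->lsn[j] = ail->lsn[j - 1];
			j--;
		}
		ail->lsn[j] = items[i];
	}
}
```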

There's a fix to stop error injection from burning CPU on debug kernels - with
a badly fragmented freespace tree, the btree block validation was taking ~60%
of the CPU time, with most of that running error injection checks.
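The shape of that fix is to gate the expensive per-block test on a cheap "is any injection armed at all?" check, so the common no-injection case costs a single flag read. A hedged sketch of that fast-path gate (names are made up for illustration):

```c
#include <assert.h>

/*
 * Sketch of gating a debug-only error injection test on whether any
 * injection is actually configured: the common case is one flag read.
 */
static int sketch_errortag_armed;	/* 0 = no injection configured */

static int sketch_error_test(int tag)
{
	if (!sketch_errortag_armed)	/* fast path: nothing armed */
		return 0;
	/* slow path: check the configured injection point(s) */
	return tag == sketch_errortag_armed;
}
```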

Finally, there's a patch to split up the log grant lock. This needs splitting
into 4 or 5 smaller patches (as you can see from the commit log, it was
originally one big change). It splits the grant lock into two list locks
(reserve and write queues), and converts all the other variables that the
grant lock protected into atomic variables. Grant head calculations are made
atomic by converting them into 64 bit "LSNs" and using cmpxchg loops on atomic
64 bit variables. All log tail and sync LSN updates are made atomic via
conversion to atomic variables.
With this, the grant lock goes away completely, and the transaction reserve
fast path now only has two cmpxchg loops instead of a heavily contended spin
lock.
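The cmpxchg-loop idea can be shown in userspace with C11 atomics: pack the grant head's cycle and byte offset into one 64-bit word (here, cycle in the high 32 bits and offset in the low 32 - an illustrative layout, not the exact xlog representation) and advance it with a compare-exchange loop instead of a spin lock:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Sketch of a lock-free grant head update: the head is one 64-bit
 * value (cycle << 32 | byte offset) advanced with a cmpxchg loop.
 */
static void sketch_grant_add(_Atomic uint64_t *head, uint32_t bytes,
			     uint32_t log_size)
{
	uint64_t old, new;

	old = atomic_load(head);
	do {
		uint32_t cycle = old >> 32;
		uint32_t space = (uint32_t)old + bytes;

		if (space >= log_size) {	/* wrap to the next cycle */
			space -= log_size;
			cycle++;
		}
		new = ((uint64_t)cycle << 32) | space;
		/* on failure, 'old' is reloaded and we recompute */
	} while (!atomic_compare_exchange_weak(head, &old, new));
}
```

Two of these loops on the transaction reserve fast path replace the heavily contended spin lock.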

The result of all this is raw cpu bound 8-way create performance of just over
100,000 inodes/s, and unlink performance of over 90,000 inodes/s. 8-way dbench
performance is improved from ~1150MB/s to ~1650MB/s by this patchset.

For 8-way creation and unlink of small files (~50 million), the lockstat
profiles look like:


				contended	total		Lock
		Lock		acquisitions  acquisitions	Description
-----------------------------   -----------  ------------	-------------------
           inode_wb_list_lock:    496330785    836287347	VFS
                  dcache_lock:    116299583    681450027	VFS
        &(&vblk->lock)->rlock:     52829329    131054495	virtio block device
    &sb->s_type->i_lock_key#1:     41772196   2375571240	VFS (inode->i_lock)
  &(&cil->xc_cil_lock)->rlock:     29549897    410553961	XFS (CIL commit lock)
         &irq_desc_lock_class:     27520142     63908701	IRQ edge lock
 &(&pag->pag_buf_lock)->rlock:     11756249   1838039685	XFS (buffer cache lock)
    &(&dentry->d_lock)->rlock:      5735657   1225028487	VFS
 &(&parent->list_lock)->rlock:      4356293    249408696	VM (SLAB list lock)
           inode_sb_list_lock:      3616366    203712449	VFS
                        key#5:      2075310    139221312	XFS SB percpu counter
              inode_hash_lock:      1529969    102359626	VFS
             rcu_node_level_0:      1363470     13730113	RCU
        &(&zone->lock)->rlock:      1247467     16469316	VM (free list lock)
 &(&pag->pag_ici_lock)->rlock:       770880    337090972	XFS (inode cache lock)
                    &rq->lock:       589111    184220946	Scheduler
               inode_lru_lock:       527163    102791204	VFS
g->l_grant_write_lock)->rlock:       526471     51279626	XFS (grant write lock)
    &(&pag->pagb_lock)->rlock:       402878    208861744	XFS (busy extent list)
    &(&zone->lru_lock)->rlock:       167692     25383748	VM (page cache LRU)
              &on_slab_l3_key:       166183     58470153	VM (slab cache)
            semaphore->lock#2:       161321   3659173925	???
     &(&ailp->xa_lock)->rlock:       143859    164470123	XFS (AIL lock)
          &cil->xc_ctx_lock-W:        32850       173279	XFS (CIL push lock)
          &cil->xc_ctx_lock-R:        90868    357572724	XFS (CIL push lock)

I've yet to determine whether I'll have time to finish removing the page cache
from the buffer cache - for pure inode create/unlink workloads the buftarg
mapping tree lock is the second most heavily contended lock in the system.
Hence this definitely needs solving in some way or another....

Anyway, comments are welcome - just keep in mind that there is still some
polish required for these patches. ;)

If you want the git version, everything is here:

  git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev.git working

Dave Chinner (16):
      xfs: fix per-ag reference counting in inode reclaim tree walking
      xfs: move delayed write buffer trace
      [RFC] xfs: use generic per-cpu counter infrastructure
      xfs: dynamic speculative EOF preallocation
      xfs: don't truncate prealloc from frequently accessed inodes
      patch xfs-inode-hash-fake
      xfs: convert inode cache lookups to use RCU locking
      xfs: convert pag_ici_lock to a spin lock
      xfs: convert xfsbud shrinker to a per-buftarg shrinker.
      xfs: add a lru to the XFS buffer cache
      xfs: connect up buffer reclaim priority hooks
      xfs: bulk AIL insertion during transaction commit
      xfs: reduce the number of AIL push wakeups
      xfs: remove all the inodes on a buffer from the AIL in bulk
      xfs: only run xfs_error_test if error injection is active
      xfs: make xlog_space_left() independent of the grant lock

 fs/xfs/linux-2.6/xfs_buf.c     |  239 ++++++++----
 fs/xfs/linux-2.6/xfs_buf.h     |   43 ++-
 fs/xfs/linux-2.6/xfs_iops.c    |   11 +-
 fs/xfs/linux-2.6/xfs_linux.h   |    9 -
 fs/xfs/linux-2.6/xfs_super.c   |   22 +-
 fs/xfs/linux-2.6/xfs_sync.c    |   28 +-
 fs/xfs/linux-2.6/xfs_trace.h   |   36 +-
 fs/xfs/quota/xfs_dquot.c       |    2 +-
 fs/xfs/quota/xfs_qm_syscalls.c |    3 +
 fs/xfs/xfs_ag.h                |    2 +-
 fs/xfs/xfs_alloc.c             |    4 +-
 fs/xfs/xfs_bmap.c              |    9 +-
 fs/xfs/xfs_btree.c             |   11 +-
 fs/xfs/xfs_buf_item.c          |   17 +-
 fs/xfs/xfs_da_btree.c          |    4 +-
 fs/xfs/xfs_dfrag.c             |   13 +
 fs/xfs/xfs_error.c             |    3 +
 fs/xfs/xfs_error.h             |    5 +-
 fs/xfs/xfs_extfree_item.c      |   85 +++--
 fs/xfs/xfs_extfree_item.h      |   12 +-
 fs/xfs/xfs_fsops.c             |    4 +-
 fs/xfs/xfs_ialloc.c            |    2 +-
 fs/xfs/xfs_iget.c              |   55 ++-
 fs/xfs/xfs_inode.c             |   24 +-
 fs/xfs/xfs_inode.h             |    1 +
 fs/xfs/xfs_inode_item.c        |  112 +++++-
 fs/xfs/xfs_iomap.c             |   53 ++-
 fs/xfs/xfs_log.c               |  678 +++++++++++++++++---------------
 fs/xfs/xfs_log_cil.c           |    9 +-
 fs/xfs/xfs_log_priv.h          |   40 ++-
 fs/xfs/xfs_log_recover.c       |   27 +-
 fs/xfs/xfs_mount.c             |  837 +++++++++++-----------------------------
 fs/xfs/xfs_mount.h             |   80 +---
 fs/xfs/xfs_trans.c             |   70 ++++-
 fs/xfs/xfs_trans.h             |    2 +-
 fs/xfs/xfs_trans_ail.c         |  189 ++++++++-
 fs/xfs/xfs_trans_extfree.c     |    4 +-
 fs/xfs/xfs_trans_priv.h        |   13 +-
 fs/xfs/xfs_vnodeops.c          |   61 ++-
 include/linux/percpu_counter.h |   16 +
 lib/percpu_counter.c           |   79 ++++
 41 files changed, 1593 insertions(+), 1321 deletions(-)

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 01/16] xfs: fix per-ag reference counting in inode reclaim tree walking
  2010-11-08  8:55 [PATCH 00/16] xfs: current patch stack for 2.6.38 window Dave Chinner
@ 2010-11-08  8:55 ` Dave Chinner
  2010-11-08  9:23   ` Christoph Hellwig
  2010-11-08  8:55 ` [PATCH 02/16] xfs: move delayed write buffer trace Dave Chinner
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2010-11-08  8:55 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The walk fails to decrement the per-ag reference count when the
non-blocking walk fails to obtain the per-ag reclaim lock, leading
to an assert failure on debug kernels when unmounting a filesystem.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_sync.c |    1 +
 fs/xfs/xfs_mount.c          |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 37d3325..afb0d7c 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -853,6 +853,7 @@ restart:
 		if (trylock) {
 			if (!mutex_trylock(&pag->pag_ici_reclaim_lock)) {
 				skipped++;
+				xfs_perag_put(pag);
 				continue;
 			}
 			first_index = pag->pag_ici_reclaim_cursor;
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index b1498ab..19e9dfa 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -275,6 +275,7 @@ xfs_free_perag(
 		pag = radix_tree_delete(&mp->m_perag_tree, agno);
 		spin_unlock(&mp->m_perag_lock);
 		ASSERT(pag);
+		ASSERT(atomic_read(&pag->pag_ref) == 0);
 		call_rcu(&pag->rcu_head, __xfs_free_perag);
 	}
 }
-- 
1.7.2.3



* [PATCH 02/16] xfs: move delayed write buffer trace
  2010-11-08  8:55 [PATCH 00/16] xfs: current patch stack for 2.6.38 window Dave Chinner
  2010-11-08  8:55 ` [PATCH 01/16] xfs: fix per-ag reference counting in inode reclaim tree walking Dave Chinner
@ 2010-11-08  8:55 ` Dave Chinner
  2010-11-08  9:24   ` Christoph Hellwig
  2010-11-08  8:55 ` [PATCH 03/16] [RFC] xfs: use generic per-cpu counter infrastructure Dave Chinner
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2010-11-08  8:55 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The delayed write buffer split trace currently issues a trace for
every buffer it scans. These buffers are not necessarily queued for
delayed write. Indeed, when buffers are pinned, there can be
thousands of traces of buffers that aren't actually queued for
delayed write, and the ones that are get lost in the noise. Move the
trace point to record only buffers that are split out for IO to be
issued on.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_buf.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index 63fd2c0..aa1d353 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -1781,7 +1781,6 @@ xfs_buf_delwri_split(
 	INIT_LIST_HEAD(list);
 	spin_lock(dwlk);
 	list_for_each_entry_safe(bp, n, dwq, b_list) {
-		trace_xfs_buf_delwri_split(bp, _RET_IP_);
 		ASSERT(bp->b_flags & XBF_DELWRI);
 
 		if (!XFS_BUF_ISPINNED(bp) && !xfs_buf_cond_lock(bp)) {
@@ -1795,6 +1794,7 @@ xfs_buf_delwri_split(
 					 _XBF_RUN_QUEUES);
 			bp->b_flags |= XBF_WRITE;
 			list_move_tail(&bp->b_list, list);
+			trace_xfs_buf_delwri_split(bp, _RET_IP_);
 		} else
 			skipped++;
 	}
-- 
1.7.2.3



* [PATCH 03/16] [RFC] xfs: use generic per-cpu counter infrastructure
  2010-11-08  8:55 [PATCH 00/16] xfs: current patch stack for 2.6.38 window Dave Chinner
  2010-11-08  8:55 ` [PATCH 01/16] xfs: fix per-ag reference counting in inode reclaim tree walking Dave Chinner
  2010-11-08  8:55 ` [PATCH 02/16] xfs: move delayed write buffer trace Dave Chinner
@ 2010-11-08  8:55 ` Dave Chinner
  2010-11-08 12:13   ` Christoph Hellwig
  2010-11-08  8:55 ` [PATCH 04/16] xfs: dynamic speculative EOF preallocation Dave Chinner
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2010-11-08  8:55 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

XFS has a per-cpu counter implementation for in-core superblock
counters that pre-dated the generic implementation. It is complex
and baroque as it is tailored directly to the needs of ENOSPC
detection. Implement the complex accurate-compare-and-add
calculation in the generic per-cpu counter code and convert the
XFS counters to use the much simpler generic counter code.

Passes xfsqa on an SMP system.

Still to do:

	1. UP build and test.
	2. split into separate patches

For discussion:
	1. kill the no-per-cpu-counter mode?
	2. do we need a custom batch size?
	3. do we need to factor xfs_mod_sb_incore()?
	4. should all the readers just sum the counters themselves
	   and kill the wrappers?

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_linux.h   |    9 -
 fs/xfs/linux-2.6/xfs_super.c   |    4 +-
 fs/xfs/xfs_fsops.c             |    4 +-
 fs/xfs/xfs_mount.c             |  834 +++++++++++-----------------------------
 fs/xfs/xfs_mount.h             |   80 +---
 include/linux/percpu_counter.h |   16 +
 lib/percpu_counter.c           |   79 ++++
 7 files changed, 341 insertions(+), 685 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_linux.h b/fs/xfs/linux-2.6/xfs_linux.h
index 214ddd7..9fa4f2a 100644
--- a/fs/xfs/linux-2.6/xfs_linux.h
+++ b/fs/xfs/linux-2.6/xfs_linux.h
@@ -88,15 +88,6 @@
 #include <xfs_super.h>
 #include <xfs_buf.h>
 
-/*
- * Feature macros (disable/enable)
- */
-#ifdef CONFIG_SMP
-#define HAVE_PERCPU_SB	/* per cpu superblock counters are a 2.6 feature */
-#else
-#undef  HAVE_PERCPU_SB	/* per cpu superblock counters are a 2.6 feature */
-#endif
-
 #define irix_sgid_inherit	xfs_params.sgid_inherit.val
 #define irix_symlink_mode	xfs_params.symlink_mode.val
 #define xfs_panic_mask		xfs_params.panic_mask.val
diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 53ab47f..fa789b7 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1230,9 +1230,9 @@ xfs_fs_statfs(
 	statp->f_fsid.val[0] = (u32)id;
 	statp->f_fsid.val[1] = (u32)(id >> 32);
 
-	xfs_icsb_sync_counters(mp, XFS_ICSB_LAZY_COUNT);
-
 	spin_lock(&mp->m_sb_lock);
+	xfs_icsb_sync_counters_locked(mp);
+
 	statp->f_bsize = sbp->sb_blocksize;
 	lsize = sbp->sb_logstart ? sbp->sb_logblocks : 0;
 	statp->f_blocks = sbp->sb_dblocks - lsize;
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index a7c116e..44ecf1b 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -478,7 +478,7 @@ xfs_fs_counts(
 	xfs_mount_t		*mp,
 	xfs_fsop_counts_t	*cnt)
 {
-	xfs_icsb_sync_counters(mp, XFS_ICSB_LAZY_COUNT);
+	xfs_icsb_sync_counters(mp);
 	spin_lock(&mp->m_sb_lock);
 	cnt->freedata = mp->m_sb.sb_fdblocks - XFS_ALLOC_SET_ASIDE(mp);
 	cnt->freertx = mp->m_sb.sb_frextents;
@@ -540,7 +540,7 @@ xfs_reserve_blocks(
 	 */
 retry:
 	spin_lock(&mp->m_sb_lock);
-	xfs_icsb_sync_counters_locked(mp, 0);
+	xfs_icsb_sync_counters_locked(mp);
 
 	/*
 	 * If our previous reservation was larger than the current value,
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 19e9dfa..0d9a030 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -46,19 +46,6 @@
 
 STATIC void	xfs_unmountfs_wait(xfs_mount_t *);
 
-
-#ifdef HAVE_PERCPU_SB
-STATIC void	xfs_icsb_balance_counter(xfs_mount_t *, xfs_sb_field_t,
-						int);
-STATIC void	xfs_icsb_balance_counter_locked(xfs_mount_t *, xfs_sb_field_t,
-						int);
-STATIC void	xfs_icsb_disable_counter(xfs_mount_t *, xfs_sb_field_t);
-#else
-
-#define xfs_icsb_balance_counter(mp, a, b)		do { } while (0)
-#define xfs_icsb_balance_counter_locked(mp, a, b)	do { } while (0)
-#endif
-
 static const struct {
 	short offset;
 	short type;	/* 0 = integer
@@ -280,6 +267,111 @@ xfs_free_perag(
 	}
 }
 
+
+/*
+ * Per-cpu incore superblock counters
+ *
+ * Simple concept, difficult implementation, now somewhat simplified by generic
+ * per-cpu counter support.  This provides distributed per cpu counters for
+ * contended fields (e.g.  free block count).
+ *
+ * Difficulties arise in that the incore sb is used for ENOSPC checking, and
+ * hence needs to be accurately read when we are running low on space. Hence We
+ * need to check against counter error bounds and determine how accurately to
+ * sum based on that metric. The percpu counters take care of this for us,
+ * so we only need to modify the fast path to handle per-cpu counter error
+ * cases.
+ */
+static inline int
+xfs_icsb_add(
+	struct xfs_mount	*mp,
+	int			counter,
+	int64_t			delta,
+	int64_t			threshold)
+{
+	int			ret;
+
+	ret = percpu_counter_add_unless_lt(&mp->m_icsb[counter], delta,
+								threshold);
+	if (ret < 0)
+		return -ENOSPC;
+	return 0;
+}
+
+static inline void
+xfs_icsb_set(
+	struct xfs_mount	*mp,
+	int			counter,
+	int64_t			value)
+{
+	percpu_counter_set(&mp->m_icsb[counter], value);
+}
+
+static inline int64_t
+xfs_icsb_sum(
+	struct xfs_mount	*mp,
+	int			counter)
+{
+	return percpu_counter_sum_positive(&mp->m_icsb[counter]);
+}
+
+static inline int64_t
+xfs_icsb_read(
+	struct xfs_mount	*mp,
+	int			counter)
+{
+	return percpu_counter_read_positive(&mp->m_icsb[counter]);
+}
+
+void
+xfs_icsb_reinit_counters(
+	struct xfs_mount	*mp)
+{
+	xfs_icsb_set(mp, XFS_ICSB_FDBLOCKS, mp->m_sb.sb_fdblocks);
+	xfs_icsb_set(mp, XFS_ICSB_IFREE, mp->m_sb.sb_ifree);
+	xfs_icsb_set(mp, XFS_ICSB_ICOUNT, mp->m_sb.sb_icount);
+}
+
+int
+xfs_icsb_init_counters(
+	struct xfs_mount	*mp)
+{
+	int			i;
+	int			error;
+
+	for (i = 0; i < XFS_ICSB_MAX; i++) {
+		error = percpu_counter_init(&mp->m_icsb[i], 0);
+		if (error)
+			goto out_error;
+	}
+	xfs_icsb_reinit_counters(mp);
+	return 0;
+
+out_error:
+	for (; i >= 0; i--)
+		percpu_counter_destroy(&mp->m_icsb[i]);
+	return error;
+}
+
+void
+xfs_icsb_destroy_counters(
+	xfs_mount_t	*mp)
+{
+	int		i;
+
+	for (i = 0; i < XFS_ICSB_MAX; i++)
+		percpu_counter_destroy(&mp->m_icsb[i]);
+}
+
+void
+xfs_icsb_sync_counters_locked(
+	xfs_mount_t	*mp)
+{
+	mp->m_sb.sb_icount = xfs_icsb_sum(mp, XFS_ICSB_ICOUNT);
+	mp->m_sb.sb_ifree = xfs_icsb_sum(mp, XFS_ICSB_IFREE);
+	mp->m_sb.sb_fdblocks = xfs_icsb_sum(mp, XFS_ICSB_FDBLOCKS);
+}
+
 /*
  * Check size of device based on the (data/realtime) block count.
  * Note: this check is used by the growfs code as well as mount.
@@ -1562,7 +1654,7 @@ xfs_log_sbcount(
 	if (!xfs_fs_writable(mp))
 		return 0;
 
-	xfs_icsb_sync_counters(mp, 0);
+	xfs_icsb_sync_counters(mp);
 
 	/*
 	 * we don't need to do this if we are updating the superblock
@@ -1674,9 +1766,9 @@ xfs_mod_incore_sb_unlocked(
 	int64_t		delta,
 	int		rsvd)
 {
-	int		scounter;	/* short counter for 32 bit fields */
-	long long	lcounter;	/* long counter for 64 bit fields */
-	long long	res_used, rem;
+	int		scounter = 0;	/* short counter for 32 bit fields */
+	long long	lcounter = 0;	/* long counter for 64 bit fields */
+	long long	res_used;
 
 	/*
 	 * With the in-core superblock spin lock held, switch
@@ -1708,43 +1800,45 @@ xfs_mod_incore_sb_unlocked(
 			mp->m_sb.sb_fdblocks - XFS_ALLOC_SET_ASIDE(mp);
 		res_used = (long long)(mp->m_resblks - mp->m_resblks_avail);
 
-		if (delta > 0) {		/* Putting blocks back */
+		/*
+		 * if we are putting blocks back, put them into the reserve
+		 * block pool first.
+		 */
+		if (res_used && delta > 0) {
 			if (res_used > delta) {
 				mp->m_resblks_avail += delta;
+				delta = 0;
 			} else {
-				rem = delta - res_used;
 				mp->m_resblks_avail = mp->m_resblks;
-				lcounter += rem;
+				delta -= res_used;
 			}
-		} else {				/* Taking blocks away */
-			lcounter += delta;
-			if (lcounter >= 0) {
-				mp->m_sb.sb_fdblocks = lcounter +
-							XFS_ALLOC_SET_ASIDE(mp);
+			if (!delta)
 				return 0;
-			}
+		}
 
-			/*
-			 * We are out of blocks, use any available reserved
-			 * blocks if were allowed to.
-			 */
-			if (!rsvd)
-				return XFS_ERROR(ENOSPC);
+		lcounter += delta;
+		if (likely(lcounter >= 0)) {
+			mp->m_sb.sb_fdblocks = lcounter +
+						XFS_ALLOC_SET_ASIDE(mp);
+			return 0;
+		}
 
-			lcounter = (long long)mp->m_resblks_avail + delta;
-			if (lcounter >= 0) {
-				mp->m_resblks_avail = lcounter;
-				return 0;
-			}
-			printk_once(KERN_WARNING
-				"Filesystem \"%s\": reserve blocks depleted! "
-				"Consider increasing reserve pool size.",
-				mp->m_fsname);
+		/* ENOSPC */
+		ASSERT(delta < 0);
+		if (!rsvd)
 			return XFS_ERROR(ENOSPC);
+
+		lcounter = (long long)mp->m_resblks_avail + delta;
+		if (lcounter >= 0) {
+			mp->m_resblks_avail = lcounter;
+			return 0;
 		}
+		printk_once(KERN_WARNING
+			"Filesystem \"%s\": reserve blocks depleted! "
+			"Consider increasing reserve pool size.",
+			mp->m_fsname);
+		return XFS_ERROR(ENOSPC);
 
-		mp->m_sb.sb_fdblocks = lcounter + XFS_ALLOC_SET_ASIDE(mp);
-		return 0;
 	case XFS_SBS_FREXTENTS:
 		lcounter = (long long)mp->m_sb.sb_frextents;
 		lcounter += delta;
@@ -1846,9 +1940,7 @@ xfs_mod_incore_sb(
 {
 	int			status;
 
-#ifdef HAVE_PERCPU_SB
 	ASSERT(field < XFS_SBS_ICOUNT || field > XFS_SBS_FDBLOCKS);
-#endif
 	spin_lock(&mp->m_sb_lock);
 	status = xfs_mod_incore_sb_unlocked(mp, field, delta, rsvd);
 	spin_unlock(&mp->m_sb_lock);
@@ -1907,6 +1999,89 @@ unwind:
 	return error;
 }
 
+int
+xfs_icsb_modify_counters(
+	xfs_mount_t	*mp,
+	xfs_sb_field_t	field,
+	int64_t		delta,
+	int		rsvd)
+{
+	int64_t		lcounter;
+	int64_t		res_used;
+	int		ret = 0;
+
+
+	switch (field) {
+	case XFS_SBS_ICOUNT:
+		ret = xfs_icsb_add(mp, XFS_ICSB_ICOUNT, delta, 0);
+		if (ret < 0) {
+			ASSERT(0);
+			return XFS_ERROR(EINVAL);
+		}
+		return 0;
+
+	case XFS_SBS_IFREE:
+		ret = xfs_icsb_add(mp, XFS_ICSB_IFREE, delta, 0);
+		if (ret < 0) {
+			ASSERT(0);
+			return XFS_ERROR(EINVAL);
+		}
+		return 0;
+
+	case XFS_SBS_FDBLOCKS:
+		/*
+		 * if we are putting blocks back, put them into the reserve
+		 * block pool first.
+		 */
+		if (mp->m_resblks != mp->m_resblks_avail && delta > 0) {
+			spin_lock(&mp->m_sb_lock);
+			res_used = (int64_t)(mp->m_resblks -
+						mp->m_resblks_avail);
+			if (res_used > delta) {
+				mp->m_resblks_avail += delta;
+				delta = 0;
+			} else {
+				delta -= res_used;
+				mp->m_resblks_avail = mp->m_resblks;
+			}
+			spin_unlock(&mp->m_sb_lock);
+			if (!delta)
+				return 0;
+		}
+
+		/* try the change */
+		ret = xfs_icsb_add(mp, XFS_ICSB_FDBLOCKS, delta,
+						XFS_ALLOC_SET_ASIDE(mp));
+		if (likely(ret >= 0))
+			return 0;
+
+		/* ENOSPC */
+		ASSERT(ret == -ENOSPC);
+		ASSERT(delta < 0);
+
+		if (!rsvd)
+			return XFS_ERROR(ENOSPC);
+
+		spin_lock(&mp->m_sb_lock);
+		lcounter = (int64_t)mp->m_resblks_avail + delta;
+		if (lcounter >= 0) {
+			mp->m_resblks_avail = lcounter;
+			spin_unlock(&mp->m_sb_lock);
+			return 0;
+		}
+		spin_unlock(&mp->m_sb_lock);
+		printk_once(KERN_WARNING
+			"Filesystem \"%s\": reserve blocks depleted! "
+			"Consider increasing reserve pool size.",
+			mp->m_fsname);
+		return XFS_ERROR(ENOSPC);
+	default:
+		ASSERT(0);
+		return XFS_ERROR(EINVAL);
+	}
+	return 0;
+}
+
 /*
  * xfs_getsb() is called to obtain the buffer for the superblock.
  * The buffer is returned locked and read in from disk.
@@ -2000,572 +2175,3 @@ xfs_dev_is_read_only(
 	}
 	return 0;
 }
-
-#ifdef HAVE_PERCPU_SB
-/*
- * Per-cpu incore superblock counters
- *
- * Simple concept, difficult implementation
- *
- * Basically, replace the incore superblock counters with a distributed per cpu
- * counter for contended fields (e.g.  free block count).
- *
- * Difficulties arise in that the incore sb is used for ENOSPC checking, and
- * hence needs to be accurately read when we are running low on space. Hence
- * there is a method to enable and disable the per-cpu counters based on how
- * much "stuff" is available in them.
- *
- * Basically, a counter is enabled if there is enough free resource to justify
- * running a per-cpu fast-path. If the per-cpu counter runs out (i.e. a local
- * ENOSPC), then we disable the counters to synchronise all callers and
- * re-distribute the available resources.
- *
- * If, once we redistributed the available resources, we still get a failure,
- * we disable the per-cpu counter and go through the slow path.
- *
- * The slow path is the current xfs_mod_incore_sb() function.  This means that
- * when we disable a per-cpu counter, we need to drain its resources back to
- * the global superblock. We do this after disabling the counter to prevent
- * more threads from queueing up on the counter.
- *
- * Essentially, this means that we still need a lock in the fast path to enable
- * synchronisation between the global counters and the per-cpu counters. This
- * is not a problem because the lock will be local to a CPU almost all the time
- * and have little contention except when we get to ENOSPC conditions.
- *
- * Basically, this lock becomes a barrier that enables us to lock out the fast
- * path while we do things like enabling and disabling counters and
- * synchronising the counters.
- *
- * Locking rules:
- *
- * 	1. m_sb_lock before picking up per-cpu locks
- * 	2. per-cpu locks always picked up via for_each_online_cpu() order
- * 	3. accurate counter sync requires m_sb_lock + per cpu locks
- * 	4. modifying per-cpu counters requires holding per-cpu lock
- * 	5. modifying global counters requires holding m_sb_lock
- *	6. enabling or disabling a counter requires holding the m_sb_lock 
- *	   and _none_ of the per-cpu locks.
- *
- * Disabled counters are only ever re-enabled by a balance operation
- * that results in more free resources per CPU than a given threshold.
- * To ensure counters don't remain disabled, they are rebalanced when
- * the global resource goes above a higher threshold (i.e. some hysteresis
- * is present to prevent thrashing).
- */
-
-#ifdef CONFIG_HOTPLUG_CPU
-/*
- * hot-plug CPU notifier support.
- *
- * We need a notifier per filesystem as we need to be able to identify
- * the filesystem to balance the counters out. This is achieved by
- * having a notifier block embedded in the xfs_mount_t and doing pointer
- * magic to get the mount pointer from the notifier block address.
- */
-STATIC int
-xfs_icsb_cpu_notify(
-	struct notifier_block *nfb,
-	unsigned long action,
-	void *hcpu)
-{
-	xfs_icsb_cnts_t *cntp;
-	xfs_mount_t	*mp;
-
-	mp = (xfs_mount_t *)container_of(nfb, xfs_mount_t, m_icsb_notifier);
-	cntp = (xfs_icsb_cnts_t *)
-			per_cpu_ptr(mp->m_sb_cnts, (unsigned long)hcpu);
-	switch (action) {
-	case CPU_UP_PREPARE:
-	case CPU_UP_PREPARE_FROZEN:
-		/* Easy Case - initialize the area and locks, and
-		 * then rebalance when online does everything else for us. */
-		memset(cntp, 0, sizeof(xfs_icsb_cnts_t));
-		break;
-	case CPU_ONLINE:
-	case CPU_ONLINE_FROZEN:
-		xfs_icsb_lock(mp);
-		xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT, 0);
-		xfs_icsb_balance_counter(mp, XFS_SBS_IFREE, 0);
-		xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS, 0);
-		xfs_icsb_unlock(mp);
-		break;
-	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
-		/* Disable all the counters, then fold the dead cpu's
-		 * count into the total on the global superblock and
-		 * re-enable the counters. */
-		xfs_icsb_lock(mp);
-		spin_lock(&mp->m_sb_lock);
-		xfs_icsb_disable_counter(mp, XFS_SBS_ICOUNT);
-		xfs_icsb_disable_counter(mp, XFS_SBS_IFREE);
-		xfs_icsb_disable_counter(mp, XFS_SBS_FDBLOCKS);
-
-		mp->m_sb.sb_icount += cntp->icsb_icount;
-		mp->m_sb.sb_ifree += cntp->icsb_ifree;
-		mp->m_sb.sb_fdblocks += cntp->icsb_fdblocks;
-
-		memset(cntp, 0, sizeof(xfs_icsb_cnts_t));
-
-		xfs_icsb_balance_counter_locked(mp, XFS_SBS_ICOUNT, 0);
-		xfs_icsb_balance_counter_locked(mp, XFS_SBS_IFREE, 0);
-		xfs_icsb_balance_counter_locked(mp, XFS_SBS_FDBLOCKS, 0);
-		spin_unlock(&mp->m_sb_lock);
-		xfs_icsb_unlock(mp);
-		break;
-	}
-
-	return NOTIFY_OK;
-}
-#endif /* CONFIG_HOTPLUG_CPU */
-
-int
-xfs_icsb_init_counters(
-	xfs_mount_t	*mp)
-{
-	xfs_icsb_cnts_t *cntp;
-	int		i;
-
-	mp->m_sb_cnts = alloc_percpu(xfs_icsb_cnts_t);
-	if (mp->m_sb_cnts == NULL)
-		return -ENOMEM;
-
-#ifdef CONFIG_HOTPLUG_CPU
-	mp->m_icsb_notifier.notifier_call = xfs_icsb_cpu_notify;
-	mp->m_icsb_notifier.priority = 0;
-	register_hotcpu_notifier(&mp->m_icsb_notifier);
-#endif /* CONFIG_HOTPLUG_CPU */
-
-	for_each_online_cpu(i) {
-		cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
-		memset(cntp, 0, sizeof(xfs_icsb_cnts_t));
-	}
-
-	mutex_init(&mp->m_icsb_mutex);
-
-	/*
-	 * start with all counters disabled so that the
-	 * initial balance kicks us off correctly
-	 */
-	mp->m_icsb_counters = -1;
-	return 0;
-}
-
-void
-xfs_icsb_reinit_counters(
-	xfs_mount_t	*mp)
-{
-	xfs_icsb_lock(mp);
-	/*
-	 * start with all counters disabled so that the
-	 * initial balance kicks us off correctly
-	 */
-	mp->m_icsb_counters = -1;
-	xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT, 0);
-	xfs_icsb_balance_counter(mp, XFS_SBS_IFREE, 0);
-	xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS, 0);
-	xfs_icsb_unlock(mp);
-}
-
-void
-xfs_icsb_destroy_counters(
-	xfs_mount_t	*mp)
-{
-	if (mp->m_sb_cnts) {
-		unregister_hotcpu_notifier(&mp->m_icsb_notifier);
-		free_percpu(mp->m_sb_cnts);
-	}
-	mutex_destroy(&mp->m_icsb_mutex);
-}
-
-STATIC void
-xfs_icsb_lock_cntr(
-	xfs_icsb_cnts_t	*icsbp)
-{
-	while (test_and_set_bit(XFS_ICSB_FLAG_LOCK, &icsbp->icsb_flags)) {
-		ndelay(1000);
-	}
-}
-
-STATIC void
-xfs_icsb_unlock_cntr(
-	xfs_icsb_cnts_t	*icsbp)
-{
-	clear_bit(XFS_ICSB_FLAG_LOCK, &icsbp->icsb_flags);
-}
-
-
-STATIC void
-xfs_icsb_lock_all_counters(
-	xfs_mount_t	*mp)
-{
-	xfs_icsb_cnts_t *cntp;
-	int		i;
-
-	for_each_online_cpu(i) {
-		cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
-		xfs_icsb_lock_cntr(cntp);
-	}
-}
-
-STATIC void
-xfs_icsb_unlock_all_counters(
-	xfs_mount_t	*mp)
-{
-	xfs_icsb_cnts_t *cntp;
-	int		i;
-
-	for_each_online_cpu(i) {
-		cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
-		xfs_icsb_unlock_cntr(cntp);
-	}
-}
-
-STATIC void
-xfs_icsb_count(
-	xfs_mount_t	*mp,
-	xfs_icsb_cnts_t	*cnt,
-	int		flags)
-{
-	xfs_icsb_cnts_t *cntp;
-	int		i;
-
-	memset(cnt, 0, sizeof(xfs_icsb_cnts_t));
-
-	if (!(flags & XFS_ICSB_LAZY_COUNT))
-		xfs_icsb_lock_all_counters(mp);
-
-	for_each_online_cpu(i) {
-		cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
-		cnt->icsb_icount += cntp->icsb_icount;
-		cnt->icsb_ifree += cntp->icsb_ifree;
-		cnt->icsb_fdblocks += cntp->icsb_fdblocks;
-	}
-
-	if (!(flags & XFS_ICSB_LAZY_COUNT))
-		xfs_icsb_unlock_all_counters(mp);
-}
-
-STATIC int
-xfs_icsb_counter_disabled(
-	xfs_mount_t	*mp,
-	xfs_sb_field_t	field)
-{
-	ASSERT((field >= XFS_SBS_ICOUNT) && (field <= XFS_SBS_FDBLOCKS));
-	return test_bit(field, &mp->m_icsb_counters);
-}
-
-STATIC void
-xfs_icsb_disable_counter(
-	xfs_mount_t	*mp,
-	xfs_sb_field_t	field)
-{
-	xfs_icsb_cnts_t	cnt;
-
-	ASSERT((field >= XFS_SBS_ICOUNT) && (field <= XFS_SBS_FDBLOCKS));
-
-	/*
-	 * If we are already disabled, then there is nothing to do
-	 * here. We check before locking all the counters to avoid
-	 * the expensive lock operation when being called in the
-	 * slow path and the counter is already disabled. This is
-	 * safe because the only time we set or clear this state is under
-	 * the m_icsb_mutex.
-	 */
-	if (xfs_icsb_counter_disabled(mp, field))
-		return;
-
-	xfs_icsb_lock_all_counters(mp);
-	if (!test_and_set_bit(field, &mp->m_icsb_counters)) {
-		/* drain back to superblock */
-
-		xfs_icsb_count(mp, &cnt, XFS_ICSB_LAZY_COUNT);
-		switch(field) {
-		case XFS_SBS_ICOUNT:
-			mp->m_sb.sb_icount = cnt.icsb_icount;
-			break;
-		case XFS_SBS_IFREE:
-			mp->m_sb.sb_ifree = cnt.icsb_ifree;
-			break;
-		case XFS_SBS_FDBLOCKS:
-			mp->m_sb.sb_fdblocks = cnt.icsb_fdblocks;
-			break;
-		default:
-			BUG();
-		}
-	}
-
-	xfs_icsb_unlock_all_counters(mp);
-}
-
-STATIC void
-xfs_icsb_enable_counter(
-	xfs_mount_t	*mp,
-	xfs_sb_field_t	field,
-	uint64_t	count,
-	uint64_t	resid)
-{
-	xfs_icsb_cnts_t	*cntp;
-	int		i;
-
-	ASSERT((field >= XFS_SBS_ICOUNT) && (field <= XFS_SBS_FDBLOCKS));
-
-	xfs_icsb_lock_all_counters(mp);
-	for_each_online_cpu(i) {
-		cntp = per_cpu_ptr(mp->m_sb_cnts, i);
-		switch (field) {
-		case XFS_SBS_ICOUNT:
-			cntp->icsb_icount = count + resid;
-			break;
-		case XFS_SBS_IFREE:
-			cntp->icsb_ifree = count + resid;
-			break;
-		case XFS_SBS_FDBLOCKS:
-			cntp->icsb_fdblocks = count + resid;
-			break;
-		default:
-			BUG();
-			break;
-		}
-		resid = 0;
-	}
-	clear_bit(field, &mp->m_icsb_counters);
-	xfs_icsb_unlock_all_counters(mp);
-}
-
-void
-xfs_icsb_sync_counters_locked(
-	xfs_mount_t	*mp,
-	int		flags)
-{
-	xfs_icsb_cnts_t	cnt;
-
-	xfs_icsb_count(mp, &cnt, flags);
-
-	if (!xfs_icsb_counter_disabled(mp, XFS_SBS_ICOUNT))
-		mp->m_sb.sb_icount = cnt.icsb_icount;
-	if (!xfs_icsb_counter_disabled(mp, XFS_SBS_IFREE))
-		mp->m_sb.sb_ifree = cnt.icsb_ifree;
-	if (!xfs_icsb_counter_disabled(mp, XFS_SBS_FDBLOCKS))
-		mp->m_sb.sb_fdblocks = cnt.icsb_fdblocks;
-}
-
-/*
- * Accurate update of per-cpu counters to incore superblock
- */
-void
-xfs_icsb_sync_counters(
-	xfs_mount_t	*mp,
-	int		flags)
-{
-	spin_lock(&mp->m_sb_lock);
-	xfs_icsb_sync_counters_locked(mp, flags);
-	spin_unlock(&mp->m_sb_lock);
-}
-
-/*
- * Balance and enable/disable counters as necessary.
- *
- * Thresholds for re-enabling counters are somewhat magic.  inode counts are
- * chosen to be the same number as single on disk allocation chunk per CPU, and
- * free blocks is something far enough zero that we aren't going thrash when we
- * get near ENOSPC. We also need to supply a minimum we require per cpu to
- * prevent looping endlessly when xfs_alloc_space asks for more than will
- * be distributed to a single CPU but each CPU has enough blocks to be
- * reenabled.
- *
- * Note that we can be called when counters are already disabled.
- * xfs_icsb_disable_counter() optimises the counter locking in this case to
- * prevent locking every per-cpu counter needlessly.
- */
-
-#define XFS_ICSB_INO_CNTR_REENABLE	(uint64_t)64
-#define XFS_ICSB_FDBLK_CNTR_REENABLE(mp) \
-		(uint64_t)(512 + XFS_ALLOC_SET_ASIDE(mp))
-STATIC void
-xfs_icsb_balance_counter_locked(
-	xfs_mount_t	*mp,
-	xfs_sb_field_t  field,
-	int		min_per_cpu)
-{
-	uint64_t	count, resid;
-	int		weight = num_online_cpus();
-	uint64_t	min = (uint64_t)min_per_cpu;
-
-	/* disable counter and sync counter */
-	xfs_icsb_disable_counter(mp, field);
-
-	/* update counters  - first CPU gets residual*/
-	switch (field) {
-	case XFS_SBS_ICOUNT:
-		count = mp->m_sb.sb_icount;
-		resid = do_div(count, weight);
-		if (count < max(min, XFS_ICSB_INO_CNTR_REENABLE))
-			return;
-		break;
-	case XFS_SBS_IFREE:
-		count = mp->m_sb.sb_ifree;
-		resid = do_div(count, weight);
-		if (count < max(min, XFS_ICSB_INO_CNTR_REENABLE))
-			return;
-		break;
-	case XFS_SBS_FDBLOCKS:
-		count = mp->m_sb.sb_fdblocks;
-		resid = do_div(count, weight);
-		if (count < max(min, XFS_ICSB_FDBLK_CNTR_REENABLE(mp)))
-			return;
-		break;
-	default:
-		BUG();
-		count = resid = 0;	/* quiet, gcc */
-		break;
-	}
-
-	xfs_icsb_enable_counter(mp, field, count, resid);
-}
-
-STATIC void
-xfs_icsb_balance_counter(
-	xfs_mount_t	*mp,
-	xfs_sb_field_t  fields,
-	int		min_per_cpu)
-{
-	spin_lock(&mp->m_sb_lock);
-	xfs_icsb_balance_counter_locked(mp, fields, min_per_cpu);
-	spin_unlock(&mp->m_sb_lock);
-}
-
-int
-xfs_icsb_modify_counters(
-	xfs_mount_t	*mp,
-	xfs_sb_field_t	field,
-	int64_t		delta,
-	int		rsvd)
-{
-	xfs_icsb_cnts_t	*icsbp;
-	long long	lcounter;	/* long counter for 64 bit fields */
-	int		ret = 0;
-
-	might_sleep();
-again:
-	preempt_disable();
-	icsbp = this_cpu_ptr(mp->m_sb_cnts);
-
-	/*
-	 * if the counter is disabled, go to slow path
-	 */
-	if (unlikely(xfs_icsb_counter_disabled(mp, field)))
-		goto slow_path;
-	xfs_icsb_lock_cntr(icsbp);
-	if (unlikely(xfs_icsb_counter_disabled(mp, field))) {
-		xfs_icsb_unlock_cntr(icsbp);
-		goto slow_path;
-	}
-
-	switch (field) {
-	case XFS_SBS_ICOUNT:
-		lcounter = icsbp->icsb_icount;
-		lcounter += delta;
-		if (unlikely(lcounter < 0))
-			goto balance_counter;
-		icsbp->icsb_icount = lcounter;
-		break;
-
-	case XFS_SBS_IFREE:
-		lcounter = icsbp->icsb_ifree;
-		lcounter += delta;
-		if (unlikely(lcounter < 0))
-			goto balance_counter;
-		icsbp->icsb_ifree = lcounter;
-		break;
-
-	case XFS_SBS_FDBLOCKS:
-		BUG_ON((mp->m_resblks - mp->m_resblks_avail) != 0);
-
-		lcounter = icsbp->icsb_fdblocks - XFS_ALLOC_SET_ASIDE(mp);
-		lcounter += delta;
-		if (unlikely(lcounter < 0))
-			goto balance_counter;
-		icsbp->icsb_fdblocks = lcounter + XFS_ALLOC_SET_ASIDE(mp);
-		break;
-	default:
-		BUG();
-		break;
-	}
-	xfs_icsb_unlock_cntr(icsbp);
-	preempt_enable();
-	return 0;
-
-slow_path:
-	preempt_enable();
-
-	/*
-	 * serialise with a mutex so we don't burn lots of cpu on
-	 * the superblock lock. We still need to hold the superblock
-	 * lock, however, when we modify the global structures.
-	 */
-	xfs_icsb_lock(mp);
-
-	/*
-	 * Now running atomically.
-	 *
-	 * If the counter is enabled, someone has beaten us to rebalancing.
-	 * Drop the lock and try again in the fast path....
-	 */
-	if (!(xfs_icsb_counter_disabled(mp, field))) {
-		xfs_icsb_unlock(mp);
-		goto again;
-	}
-
-	/*
-	 * The counter is currently disabled. Because we are
-	 * running atomically here, we know a rebalance cannot
-	 * be in progress. Hence we can go straight to operating
-	 * on the global superblock. We do not call xfs_mod_incore_sb()
-	 * here even though we need to get the m_sb_lock. Doing so
-	 * will cause us to re-enter this function and deadlock.
-	 * Hence we get the m_sb_lock ourselves and then call
-	 * xfs_mod_incore_sb_unlocked() as the unlocked path operates
-	 * directly on the global counters.
-	 */
-	spin_lock(&mp->m_sb_lock);
-	ret = xfs_mod_incore_sb_unlocked(mp, field, delta, rsvd);
-	spin_unlock(&mp->m_sb_lock);
-
-	/*
-	 * Now that we've modified the global superblock, we
-	 * may be able to re-enable the distributed counters
-	 * (e.g. lots of space just got freed). After that
-	 * we are done.
-	 */
-	if (ret != ENOSPC)
-		xfs_icsb_balance_counter(mp, field, 0);
-	xfs_icsb_unlock(mp);
-	return ret;
-
-balance_counter:
-	xfs_icsb_unlock_cntr(icsbp);
-	preempt_enable();
-
-	/*
-	 * We may have multiple threads here if multiple per-cpu
-	 * counters run dry at the same time. This will mean we can
-	 * do more balances than strictly necessary but it is not
-	 * the common slowpath case.
-	 */
-	xfs_icsb_lock(mp);
-
-	/*
-	 * running atomically.
-	 *
-	 * This will leave the counter in the correct state for future
-	 * accesses. After the rebalance, we simply try again and our retry
-	 * will either succeed through the fast path or slow path without
-	 * another balance operation being required.
-	 */
-	xfs_icsb_balance_counter(mp, field, delta);
-	xfs_icsb_unlock(mp);
-	goto again;
-}
-
-#endif
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 5861b49..7efae1d 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -65,44 +65,19 @@ struct xfs_nameops;
 struct xfs_ail;
 struct xfs_quotainfo;
 
-#ifdef HAVE_PERCPU_SB
-
 /*
- * Valid per-cpu incore superblock counters. Note that if you add new counters,
- * you may need to define new counter disabled bit field descriptors as there
- * are more possible fields in the superblock that can fit in a bitfield on a
- * 32 bit platform. The XFS_SBS_* values for the current current counters just
- * fit.
+ * Per-cpu incore superblock counters.
  */
-typedef struct xfs_icsb_cnts {
-	uint64_t	icsb_fdblocks;
-	uint64_t	icsb_ifree;
-	uint64_t	icsb_icount;
-	unsigned long	icsb_flags;
-} xfs_icsb_cnts_t;
-
-#define XFS_ICSB_FLAG_LOCK	(1 << 0)	/* counter lock bit */
+enum {
+	XFS_ICSB_FDBLOCKS = 0,
+	XFS_ICSB_IFREE,
+	XFS_ICSB_ICOUNT,
+	XFS_ICSB_MAX,
+};
 
-#define XFS_ICSB_LAZY_COUNT	(1 << 1)	/* accuracy not needed */
-
-extern int	xfs_icsb_init_counters(struct xfs_mount *);
-extern void	xfs_icsb_reinit_counters(struct xfs_mount *);
-extern void	xfs_icsb_destroy_counters(struct xfs_mount *);
-extern void	xfs_icsb_sync_counters(struct xfs_mount *, int);
-extern void	xfs_icsb_sync_counters_locked(struct xfs_mount *, int);
 extern int	xfs_icsb_modify_counters(struct xfs_mount *, xfs_sb_field_t,
 						int64_t, int);
 
-#else
-#define xfs_icsb_init_counters(mp)		(0)
-#define xfs_icsb_destroy_counters(mp)		do { } while (0)
-#define xfs_icsb_reinit_counters(mp)		do { } while (0)
-#define xfs_icsb_sync_counters(mp, flags)	do { } while (0)
-#define xfs_icsb_sync_counters_locked(mp, flags) do { } while (0)
-#define xfs_icsb_modify_counters(mp, field, delta, rsvd) \
-	xfs_mod_incore_sb(mp, field, delta, rsvd)
-#endif
-
 typedef struct xfs_mount {
 	struct super_block	*m_super;
 	xfs_tid_t		m_tid;		/* next unused tid for fs */
@@ -186,12 +161,6 @@ typedef struct xfs_mount {
 	struct xfs_chash	*m_chash;	/* fs private inode per-cluster
 						 * hash table */
 	atomic_t		m_active_trans;	/* number trans frozen */
-#ifdef HAVE_PERCPU_SB
-	xfs_icsb_cnts_t __percpu *m_sb_cnts;	/* per-cpu superblock counters */
-	unsigned long		m_icsb_counters; /* disabled per-cpu counters */
-	struct notifier_block	m_icsb_notifier; /* hotplug cpu notifier */
-	struct mutex		m_icsb_mutex;	/* balancer sync lock */
-#endif
 	struct xfs_mru_cache	*m_filestream;  /* per-mount filestream data */
 	struct task_struct	*m_sync_task;	/* generalised sync thread */
 	xfs_sync_work_t		m_sync_work;	/* work item for VFS_SYNC */
@@ -202,6 +171,7 @@ typedef struct xfs_mount {
 	__int64_t		m_update_flags;	/* sb flags we need to update
 						   on the next remount,rw */
 	struct shrinker		m_inode_shrink;	/* inode reclaim shrinker */
+	struct percpu_counter	m_icsb[XFS_ICSB_MAX];
 } xfs_mount_t;
 
 /*
@@ -333,26 +303,6 @@ struct xfs_perag *xfs_perag_get_tag(struct xfs_mount *mp, xfs_agnumber_t agno,
 void	xfs_perag_put(struct xfs_perag *pag);
 
 /*
- * Per-cpu superblock locking functions
- */
-#ifdef HAVE_PERCPU_SB
-static inline void
-xfs_icsb_lock(xfs_mount_t *mp)
-{
-	mutex_lock(&mp->m_icsb_mutex);
-}
-
-static inline void
-xfs_icsb_unlock(xfs_mount_t *mp)
-{
-	mutex_unlock(&mp->m_icsb_mutex);
-}
-#else
-#define xfs_icsb_lock(mp)
-#define xfs_icsb_unlock(mp)
-#endif
-
-/*
  * This structure is for use by the xfs_mod_incore_sb_batch() routine.
  * xfs_growfs can specify a few fields which are more than int limit
  */
@@ -379,6 +329,20 @@ extern int	xfs_sb_validate_fsb_count(struct xfs_sb *, __uint64_t);
 
 extern int	xfs_dev_is_read_only(struct xfs_mount *, char *);
 
+extern int	xfs_icsb_init_counters(struct xfs_mount *);
+extern void	xfs_icsb_reinit_counters(struct xfs_mount *);
+extern void	xfs_icsb_destroy_counters(struct xfs_mount *);
+extern void	xfs_icsb_sync_counters_locked(struct xfs_mount *);
+
+static inline void
+xfs_icsb_sync_counters(
+	struct xfs_mount	*mp)
+{
+	spin_lock(&mp->m_sb_lock);
+	xfs_icsb_sync_counters_locked(mp);
+	spin_unlock(&mp->m_sb_lock);
+}
+
 #endif	/* __KERNEL__ */
 
 extern void	xfs_mod_sb(struct xfs_trans *, __int64_t);
diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 46f6ba5..32014a4 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -41,6 +41,8 @@ void percpu_counter_set(struct percpu_counter *fbc, s64 amount);
 void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch);
 s64 __percpu_counter_sum(struct percpu_counter *fbc);
 int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs);
+int percpu_counter_add_unless_lt(struct percpu_counter *fbc, s64 amount,
+							s64 threshold);
 
 static inline void percpu_counter_add(struct percpu_counter *fbc, s64 amount)
 {
@@ -153,6 +155,20 @@ static inline int percpu_counter_initialized(struct percpu_counter *fbc)
 	return 1;
 }
 
+static inline int
+percpu_counter_test_and_add_delta(struct percpu_counter *fbc, s64 delta)
+{
+	s64 count;
+
+	preempt_disable();
+	count = fbc->count + delta;
+	if (count >= 0)
+		fbc->count = count;
+	preempt_enable();
+	/* -1 on underrun (counter unchanged), 0 if zero, 1 if positive */
+	return count < 0 ? -1 : (count ? 1 : 0);
+}
+
+
 #endif	/* CONFIG_SMP */
 
 static inline void percpu_counter_inc(struct percpu_counter *fbc)
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 604678d..13c4ff3 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -213,6 +213,85 @@ int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
 }
 EXPORT_SYMBOL(percpu_counter_compare);
 
+/**
+ *
+ * percpu_counter_add_unless_lt - add to a counter avoiding underruns
+ * @fbc:	counter
+ * @amount:	amount to add
+ * @threshold:	underrun threshold
+ *
+ * Add @amount to @fbc if and only if the result of the addition is greater
+ * than or equal to @threshold. Return 1 if greater and added, 0 if equal and
+ * added, and -1 if an underrun would have occurred.
+ *
+ * This is useful for operations that must accurately and atomically only add a
+ * delta to a counter if the result is above a given threshold (e.g. freespace
+ * accounting with ENOSPC checking in filesystems).
+ */
+int percpu_counter_add_unless_lt(struct percpu_counter *fbc, s64 amount,
+				 s64 threshold)
+{
+	s64	count;
+	s64	error = 2 * percpu_counter_batch * num_online_cpus();
+	int	cpu;
+	int	ret = -1;
+
+	preempt_disable();
+
+	/* Check to see if rough count will be sufficient for comparison */
+	count = percpu_counter_read(fbc);
+	if (count + amount < threshold - error)
+		goto out;
+
+	/*
+	 * If the counter is over the threshold and the change is less than the
+	 * batch size, we might be able to avoid locking.
+	 */
+	if (count > threshold + error && abs(amount) < percpu_counter_batch) {
+		__percpu_counter_add(fbc, amount, percpu_counter_batch);
+		ret = 1;
+		goto out;
+	}
+
+	/*
+	 * If the result is over the error threshold, we can just add it
+	 * into the global counter ignoring what is in the per-cpu counters
+	 * as they will not change the result of the calculation.
+	 */
+	spin_lock(&fbc->lock);
+	if (fbc->count + amount > threshold + error) {
+		fbc->count += amount;
+		ret = 1;
+		goto out_unlock;
+	}
+
+	/*
+	 * Result is within the error margin. Run an open-coded sum of the
+	 * per-cpu counters to get the exact value at this point in time,
+	 * and if the result would be at or above the threshold, add the amount to
+	 * the global counter.
+	 */
+	count = fbc->count;
+	for_each_online_cpu(cpu) {
+		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
+		count += *pcount;
+	}
+	WARN_ON(count < threshold);
+
+	if (count + amount >= threshold) {
+		ret = 0;
+		if (count + amount > threshold)
+			ret = 1;
+		fbc->count += amount;
+	}
+out_unlock:
+	spin_unlock(&fbc->lock);
+out:
+	preempt_enable();
+	return ret;
+}
+EXPORT_SYMBOL(percpu_counter_add_unless_lt);
+
 static int __init percpu_counter_startup(void)
 {
 	compute_batch_value();
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 04/16] xfs: dynamic speculative EOF preallocation
  2010-11-08  8:55 [PATCH 00/16] xfs: current patch stack for 2.6.38 window Dave Chinner
                   ` (2 preceding siblings ...)
  2010-11-08  8:55 ` [PATCH 03/16] [RFC] xfs: use generic per-cpu counter infrastructure Dave Chinner
@ 2010-11-08  8:55 ` Dave Chinner
  2010-11-08 11:43   ` Christoph Hellwig
  2010-11-08  8:55 ` [PATCH 05/16] xfs: don't truncate prealloc from frequently accessed inodes Dave Chinner
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2010-11-08  8:55 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Currently the size of the speculative preallocation during delayed
allocation is fixed by either the allocsize mount option or a
default size. We are seeing a lot of cases where we need to
recommend using the allocsize mount option to prevent fragmentation
when buffered writes land in the same AG.

Rather than using a fixed preallocation size by default (up to 64k),
make it dynamic by basing it on the current inode size. That way the
EOF preallocation will increase as the file size increases.  Hence
for streaming writes we are much more likely to get large
preallocations exactly when we need it to reduce fragmentation.

For default settings, the size of the initial extents is determined
by the number of parallel writers and the amount of memory in the
machine. For 4GB RAM and 4 concurrent 32GB file writes:

EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
   0: [0..1048575]:         1048672..2097247      0 (1048672..2097247)      1048576
   1: [1048576..2097151]:   5242976..6291551      0 (5242976..6291551)      1048576
   2: [2097152..4194303]:   12583008..14680159    0 (12583008..14680159)    2097152
   3: [4194304..8388607]:   25165920..29360223    0 (25165920..29360223)    4194304
   4: [8388608..16777215]:  58720352..67108959    0 (58720352..67108959)    8388608
   5: [16777216..33554423]: 117440584..134217791  0 (117440584..134217791) 16777208
   6: [33554424..50331511]: 184549056..201326143  0 (184549056..201326143) 16777088
   7: [50331512..67108599]: 251657408..268434495  0 (251657408..268434495) 16777088

and for 16 concurrent 16GB file writes:

 EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
   0: [0..262143]:          2490472..2752615      0 (2490472..2752615)       262144
   1: [262144..524287]:     6291560..6553703      0 (6291560..6553703)       262144
   2: [524288..1048575]:    13631592..14155879    0 (13631592..14155879)     524288
   3: [1048576..2097151]:   30408808..31457383    0 (30408808..31457383)    1048576
   4: [2097152..4194303]:   52428904..54526055    0 (52428904..54526055)    2097152
   5: [4194304..8388607]:   104857704..109052007  0 (104857704..109052007)  4194304
   6: [8388608..16777215]:  209715304..218103911  0 (209715304..218103911)  8388608
   7: [16777216..33554423]: 452984848..469762055  0 (452984848..469762055) 16777208

The allocsize mount option still controls the minimum preallocation size, so
the smallest extent size can still be bound in situations where this behaviour
is not sufficient.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_iomap.c |   53 ++++++++++++++++++++++++++++++++++++++++++---------
 1 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 2057614..0227ac1 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -389,6 +389,9 @@ error_out:
  * If the caller is doing a write at the end of the file, then extend the
  * allocation out to the file system's write iosize.  We clean up any extra
  * space left over when the file is closed in xfs_inactive().
+ *
+ * If we find we already have delalloc preallocation beyond EOF, don't do more
+ * preallocation as it is not needed.
  */
 STATIC int
 xfs_iomap_eof_want_preallocate(
@@ -405,6 +408,7 @@ xfs_iomap_eof_want_preallocate(
 	xfs_filblks_t   count_fsb;
 	xfs_fsblock_t	firstblock;
 	int		n, error, imaps;
+	int		found_delalloc = 0;
 
 	*prealloc = 0;
 	if ((offset + count) <= ip->i_size)
@@ -427,11 +431,16 @@ xfs_iomap_eof_want_preallocate(
 			if ((imap[n].br_startblock != HOLESTARTBLOCK) &&
 			    (imap[n].br_startblock != DELAYSTARTBLOCK))
 				return 0;
+
 			start_fsb += imap[n].br_blockcount;
 			count_fsb -= imap[n].br_blockcount;
+
+			if (imap[n].br_startblock == DELAYSTARTBLOCK)
+				found_delalloc = 1;
 		}
 	}
-	*prealloc = 1;
+	if (!found_delalloc)
+		*prealloc = 1;
 	return 0;
 }
 
@@ -469,6 +478,7 @@ xfs_iomap_write_delay(
 	extsz = xfs_get_extsz_hint(ip);
 	offset_fsb = XFS_B_TO_FSBT(mp, offset);
 
+
 	error = xfs_iomap_eof_want_preallocate(mp, ip, offset, count,
 				ioflag, imap, XFS_WRITE_IMAPS, &prealloc);
 	if (error)
@@ -476,9 +486,23 @@ xfs_iomap_write_delay(
 
 retry:
 	if (prealloc) {
+		xfs_fileoff_t	alloc_blocks = 0;
+		/*
+		 * If we don't have a user specified preallocation size, dynamically
+		 * increase the preallocation size as the size of the file
+		 * grows. Cap the maximum size at a single extent.
+		 */
+		if (!(mp->m_flags & XFS_MOUNT_DFLT_IOSIZE)) {
+			alloc_blocks = XFS_B_TO_FSB(mp, ip->i_size);
+			alloc_blocks = XFS_FILEOFF_MIN(MAXEXTLEN,
+					rounddown_pow_of_two(alloc_blocks));
+		}
+		if (alloc_blocks < mp->m_writeio_blocks)
+			alloc_blocks = mp->m_writeio_blocks;
+
 		aligned_offset = XFS_WRITEIO_ALIGN(mp, (offset + count - 1));
 		ioalign = XFS_B_TO_FSBT(mp, aligned_offset);
-		last_fsb = ioalign + mp->m_writeio_blocks;
+		last_fsb = ioalign + alloc_blocks;
 	} else {
 		last_fsb = XFS_B_TO_FSB(mp, ((xfs_ufsize_t)(offset + count)));
 	}
@@ -496,22 +520,31 @@ retry:
 			  XFS_BMAPI_DELAY | XFS_BMAPI_WRITE |
 			  XFS_BMAPI_ENTIRE, &firstblock, 1, imap,
 			  &nimaps, NULL);
-	if (error && (error != ENOSPC))
+	switch (error) {
+	case 0:
+	case ENOSPC:
+	case EDQUOT:
+		break;
+	default:
 		return XFS_ERROR(error);
+	}
 
 	/*
-	 * If bmapi returned us nothing, and if we didn't get back EDQUOT,
-	 * then we must have run out of space - flush all other inodes with
-	 * delalloc blocks and retry without EOF preallocation.
+	 * If bmapi returned us nothing, we got either ENOSPC or EDQUOT.  For
+	 * ENOSPC, flush all other inodes with delalloc blocks to free up
+	 * some of the excess reserved metadata space. For both cases, retry
+	 * without EOF preallocation.
 	 */
 	if (nimaps == 0) {
 		trace_xfs_delalloc_enospc(ip, offset, count);
 		if (flushed)
-			return XFS_ERROR(ENOSPC);
+			return XFS_ERROR(error ? error : ENOSPC);
 
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-		xfs_flush_inodes(ip);
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
+		if (error == ENOSPC) {
+			xfs_iunlock(ip, XFS_ILOCK_EXCL);
+			xfs_flush_inodes(ip);
+			xfs_ilock(ip, XFS_ILOCK_EXCL);
+		}
 
 		flushed = 1;
 		error = 0;
-- 
1.7.2.3


* [PATCH 05/16] xfs: don't truncate prealloc from frequently accessed inodes
  2010-11-08  8:55 [PATCH 00/16] xfs: current patch stack for 2.6.38 window Dave Chinner
                   ` (3 preceding siblings ...)
  2010-11-08  8:55 ` [PATCH 04/16] xfs: dynamic speculative EOF preallocation Dave Chinner
@ 2010-11-08  8:55 ` Dave Chinner
  2010-11-08 11:36   ` Christoph Hellwig
  2010-11-08  8:55 ` [PATCH 06/16] patch xfs-inode-hash-fake Dave Chinner
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2010-11-08  8:55 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

A long standing problem for streaming writes through the NFS server
has been that the NFS server opens and closes file descriptors on an
inode for every write. The result of this behaviour is that the
->release() function is called on every close and that results in
XFS truncating speculative preallocation beyond the EOF.  This has
an adverse effect on file layout when multiple files are being
written at the same time - they interleave their extents and can
result in severe fragmentation.

To avoid this problem, keep a count of the number of ->release calls
made on an inode. For most cases, an inode is only going to be opened
once for writing and then closed again during its lifetime in
cache. Hence if there are multiple ->release calls, there is a good
chance that the inode is being accessed by the NFS server. Hence
count up every time ->release is called while there are delalloc
blocks still outstanding on the inode.

If this count is non-zero when ->release is next called, then do not
truncate away the speculative preallocation - leave it there so that
subsequent writes do not need to reallocate the delalloc space. This
will prevent interleaving of extents of different inodes written
concurrently to the same AG.

If we get this wrong, it is not a big deal as we truncate
speculative allocation beyond EOF anyway in xfs_inactive() when the
inode is thrown out of the cache.

The new counter in the struct xfs_inode fits into a hole in the
structure on 64 bit machines, so does not grow the size of the inode
at all.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_bmap.c     |    9 +++++-
 fs/xfs/xfs_dfrag.c    |   13 ++++++++++
 fs/xfs/xfs_iget.c     |    1 +
 fs/xfs/xfs_inode.h    |    1 +
 fs/xfs/xfs_vnodeops.c |   61 ++++++++++++++++++++++++++++++++-----------------
 5 files changed, 62 insertions(+), 23 deletions(-)

diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 8abd12e..7764a4f 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -5471,8 +5471,13 @@ xfs_getbmap(
 			if (error)
 				goto out_unlock_iolock;
 		}
-
-		ASSERT(ip->i_delayed_blks == 0);
+		/*
+		 * even after flushing the inode, there can still be delalloc
+		 * blocks on the inode beyond EOF due to speculative
+		 * preallocation. These are not removed until the release
+		 * function is called or the inode is inactivated. Hence we
+		 * cannot assert here that ip->i_delayed_blks == 0.
+		 */
 	}
 
 	lock = xfs_ilock_map_shared(ip);
diff --git a/fs/xfs/xfs_dfrag.c b/fs/xfs/xfs_dfrag.c
index 3b9582c..e60490b 100644
--- a/fs/xfs/xfs_dfrag.c
+++ b/fs/xfs/xfs_dfrag.c
@@ -377,6 +377,19 @@ xfs_swap_extents(
 	ip->i_d.di_format = tip->i_d.di_format;
 	tip->i_d.di_format = tmp;
 
+	/*
+	 * The extents in the source inode could still contain speculative
+	 * preallocation beyond EOF (e.g. the file is open but not modified
+	 * while defrag is in progress). In that case, we need to copy over the
+	 * number of delalloc blocks the data fork in the source inode is
+	 * tracking beyond EOF so that when the fork is truncated away when the
+	 * temporary inode is unlinked we don't underrun the i_delayed_blks
+	 * counter on that inode.
+	 */
+	ASSERT(tip->i_delayed_blks == 0);
+	tip->i_delayed_blks = ip->i_delayed_blks;
+	ip->i_delayed_blks = 0;
+
 	ilf_fields = XFS_ILOG_CORE;
 
 	switch(ip->i_d.di_format) {
diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index 0cdd269..18991a9 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -84,6 +84,7 @@ xfs_inode_alloc(
 	memset(&ip->i_d, 0, sizeof(xfs_icdinode_t));
 	ip->i_size = 0;
 	ip->i_new_size = 0;
+	ip->i_dirty_releases = 0;
 
 	/* prevent anyone from using this yet */
 	VFS_I(ip)->i_state = I_NEW;
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index fb2ca2e..ea2f34e 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -260,6 +260,7 @@ typedef struct xfs_inode {
 	xfs_fsize_t		i_size;		/* in-memory size */
 	xfs_fsize_t		i_new_size;	/* size when write completes */
 	atomic_t		i_iocount;	/* outstanding I/O count */
+	int			i_dirty_releases; /* dirty ->release calls */
 
 	/* VFS inode */
 	struct inode		i_vnode;	/* embedded VFS inode */
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index 8e4a63c..49f3a5a 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -964,29 +964,48 @@ xfs_release(
 			xfs_flush_pages(ip, 0, -1, XBF_ASYNC, FI_NONE);
 	}
 
-	if (ip->i_d.di_nlink != 0) {
-		if ((((ip->i_d.di_mode & S_IFMT) == S_IFREG) &&
-		     ((ip->i_size > 0) || (VN_CACHED(VFS_I(ip)) > 0 ||
-		       ip->i_delayed_blks > 0)) &&
-		     (ip->i_df.if_flags & XFS_IFEXTENTS))  &&
-		    (!(ip->i_d.di_flags &
-				(XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)))) {
+	if (ip->i_d.di_nlink == 0)
+		return 0;
 
-			/*
-			 * If we can't get the iolock just skip truncating
-			 * the blocks past EOF because we could deadlock
-			 * with the mmap_sem otherwise.  We'll get another
-			 * chance to drop them once the last reference to
-			 * the inode is dropped, so we'll never leak blocks
-			 * permanently.
-			 */
-			error = xfs_free_eofblocks(mp, ip,
-						   XFS_FREE_EOF_TRYLOCK);
-			if (error)
-				return error;
-		}
-	}
+	if ((((ip->i_d.di_mode & S_IFMT) == S_IFREG) &&
+	     ((ip->i_size > 0) || (VN_CACHED(VFS_I(ip)) > 0 ||
+	       ip->i_delayed_blks > 0)) &&
+	     (ip->i_df.if_flags & XFS_IFEXTENTS))  &&
+	    (!(ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)))) {
+		/*
+		 * If we can't get the iolock just skip truncating the blocks
+		 * past EOF because we could deadlock with the mmap_sem
+		 * otherwise.  We'll get another chance to drop them once the
+		 * last reference to the inode is dropped, so we'll never leak
+		 * blocks permanently.
+		 *
+		 * Further, count the number of times we get here in the life
+		 * of this inode. If the inode is being opened, written and
+		 * closed frequently and we have delayed allocation blocks
+		 * outstanding (e.g. streaming writes from the NFS server),
+		 * truncating the blocks past EOF will cause fragmentation to
+		 * occur.
+		 *
+		 * In this case don't do the truncation, either, but we have to
+		 * be careful how we detect this case. Blocks beyond EOF show
+		 * up as i_delayed_blks even when the inode is clean, so we
+		 * need to truncate them away first before checking for a dirty
+		 * release. Hence on the first couple of dirty closes, we will
+		 * still remove the speculative allocation, but then we will
+		 * leave it in place.
+		 */
+		if (ip->i_dirty_releases > 1)
+			return 0;
 
+		error = xfs_free_eofblocks(mp, ip,
+					   XFS_FREE_EOF_TRYLOCK);
+		if (error)
+			return error;
+
+		/* delalloc blocks after truncation means it really is dirty */
+		if (ip->i_delayed_blks)
+			ip->i_dirty_releases++;
+	}
 	return 0;
 }
 
-- 
1.7.2.3


* [PATCH 06/16] patch xfs-inode-hash-fake
  2010-11-08  8:55 [PATCH 00/16] xfs: current patch stack for 2.6.38 window Dave Chinner
                   ` (4 preceding siblings ...)
  2010-11-08  8:55 ` [PATCH 05/16] xfs: don't truncate prealloc from frequently accessed inodes Dave Chinner
@ 2010-11-08  8:55 ` Dave Chinner
  2010-11-08  9:19   ` Christoph Hellwig
  2010-11-08  8:55 ` [PATCH 07/16] xfs: convert inode cache lookups to use RCU locking Dave Chinner
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2010-11-08  8:55 UTC (permalink / raw)
  To: xfs

---
 fs/xfs/linux-2.6/xfs_iops.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
index 496455a..8b46867 100644
--- a/fs/xfs/linux-2.6/xfs_iops.c
+++ b/fs/xfs/linux-2.6/xfs_iops.c
@@ -753,6 +753,10 @@ xfs_diflags_to_iflags(
  * We are always called with an uninitialised linux inode here.
  * We need to initialise the necessary fields and take a reference
  * on it.
+ *
+ * We don't use the VFS inode hash for lookups anymore, so make the inode look
+ * hashed to the VFS by faking it. This avoids needing to touch inode hash
+ * locks in this path, but makes the VFS believe the inode is validly hashed.
  */
 void
 xfs_setup_inode(
@@ -764,7 +768,7 @@ xfs_setup_inode(
 	inode->i_state = I_NEW;
 
 	inode_sb_list_add(inode);
-	insert_inode_hash(inode);
+	hlist_nulls_add_fake(&inode->i_hash);
 
 	inode->i_mode	= ip->i_d.di_mode;
 	inode->i_nlink	= ip->i_d.di_nlink;
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 07/16] xfs: convert inode cache lookups to use RCU locking
  2010-11-08  8:55 [PATCH 00/16] xfs: current patch stack for 2.6.38 window Dave Chinner
                   ` (5 preceding siblings ...)
  2010-11-08  8:55 ` [PATCH 06/16] patch xfs-inode-hash-fake Dave Chinner
@ 2010-11-08  8:55 ` Dave Chinner
  2010-11-08 23:09   ` Christoph Hellwig
  2010-11-08  8:55 ` [PATCH 08/16] xfs: convert pag_ici_lock to a spin lock Dave Chinner
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2010-11-08  8:55 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

With delayed logging greatly increasing the sustained parallelism of inode
operations, the inode cache locking is showing significant read vs write
contention when inode reclaim runs at the same time as lookups. There are
also far more write lock acquisitions than read locks (a 4:1 ratio), so
the read locking is not really buying us much in the way of parallelism.

To avoid the read vs write contention, change the cache to use RCU locking on
the read side. To avoid needing to RCU free every single inode, use the built
in slab RCU freeing mechanism. This requires us to be able to detect lookups of
freed inodes, so ensure that every freed inode has an inode number of zero and
the XFS_IRECLAIM flag set. We already check the XFS_IRECLAIM flag in the cache
hit lookup path, but add a check for a zero inode number as well.

We can then convert all the read-locking lookups to use RCU read side
locking, and hence remove the rwlock from the read side entirely.
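
For illustration only, the reuse-detection protocol described above can be
sketched in userspace C. The names here (fake_inode, cache_hit_valid) are
illustrative stand-ins, not the actual XFS structures, and the real code does
these checks under RCU and ip->i_flags_lock:

```c
#include <assert.h>
#include <stdbool.h>

#define IRECLAIM 0x1

/* Stand-in for struct xfs_inode: with SLAB_DESTROY_BY_RCU the memory can
 * be recycled for a new inode within a grace period, so a freed object
 * must stay distinguishable to concurrent lookups. */
struct fake_inode {
	unsigned long	i_ino;		/* zeroed on free */
	unsigned int	i_flags;	/* IRECLAIM set on free */
};

/* On teardown, publish the "freed" state before the slab object can be
 * recycled (done under i_flags_lock in the real code). */
static void fake_inode_free(struct fake_inode *ip)
{
	ip->i_flags = IRECLAIM;
	ip->i_ino = 0;
}

/* Cache-hit path: reject objects freed or recycled for a different inode
 * between the radix tree lookup and this check. */
static bool cache_hit_valid(struct fake_inode *ip, unsigned long want_ino)
{
	if (ip->i_ino == 0)
		return false;		/* freed, not yet recycled */
	if (ip->i_ino != want_ino)
		return false;		/* recycled for another inode */
	if (ip->i_flags & IRECLAIM)
		return false;		/* being torn down */
	return true;
}
```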

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
 fs/xfs/linux-2.6/xfs_iops.c    |    7 +++++-
 fs/xfs/linux-2.6/xfs_sync.c    |   13 +++++++++--
 fs/xfs/quota/xfs_qm_syscalls.c |    3 ++
 fs/xfs/xfs_iget.c              |   44 ++++++++++++++++++++++++++++++---------
 fs/xfs/xfs_inode.c             |   22 ++++++++++++-------
 5 files changed, 67 insertions(+), 22 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
index 8b46867..909bd9c 100644
--- a/fs/xfs/linux-2.6/xfs_iops.c
+++ b/fs/xfs/linux-2.6/xfs_iops.c
@@ -757,6 +757,8 @@ xfs_diflags_to_iflags(
  * We don't use the VFS inode hash for lookups anymore, so make the inode look
  * hashed to the VFS by faking it. This avoids needing to touch inode hash
  * locks in this path, but makes the VFS believe the inode is validly hashed.
+ * We initialise i_state and i_hash under the i_lock so that we follow the same
+ * setup rules that the rest of the VFS follows.
  */
 void
 xfs_setup_inode(
@@ -765,10 +767,13 @@ xfs_setup_inode(
 	struct inode		*inode = &ip->i_vnode;
 
 	inode->i_ino = ip->i_ino;
+
+	spin_lock(&inode->i_lock);
 	inode->i_state = I_NEW;
+	hlist_nulls_add_fake(&inode->i_hash);
+	spin_unlock(&inode->i_lock);
 
 	inode_sb_list_add(inode);
-	hlist_nulls_add_fake(&inode->i_hash);
 
 	inode->i_mode	= ip->i_d.di_mode;
 	inode->i_nlink	= ip->i_d.di_nlink;
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index afb0d7c..9a53cc9 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -53,6 +53,10 @@ xfs_inode_ag_walk_grab(
 {
 	struct inode		*inode = VFS_I(ip);
 
+	/* check for stale RCU freed inode */
+	if (!ip->i_ino)
+		return ENOENT;
+
 	/* nothing to sync during shutdown */
 	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
 		return EFSCORRUPTED;
@@ -98,12 +102,12 @@ restart:
 		int		error = 0;
 		int		i;
 
-		read_lock(&pag->pag_ici_lock);
+		rcu_read_lock();
 		nr_found = radix_tree_gang_lookup(&pag->pag_ici_root,
 					(void **)batch, first_index,
 					XFS_LOOKUP_BATCH);
 		if (!nr_found) {
-			read_unlock(&pag->pag_ici_lock);
+			rcu_read_unlock();
 			break;
 		}
 
@@ -129,7 +133,7 @@ restart:
 		}
 
 		/* unlock now we've grabbed the inodes. */
-		read_unlock(&pag->pag_ici_lock);
+		rcu_read_unlock();
 
 		for (i = 0; i < nr_found; i++) {
 			if (!batch[i])
@@ -639,6 +643,9 @@ xfs_reclaim_inode_grab(
 	struct xfs_inode	*ip,
 	int			flags)
 {
+	/* check for stale RCU freed inode */
+	if (!ip->i_ino)
+		return 1;
 
 	/*
 	 * do some unlocked checks first to avoid unnecceary lock traffic.
diff --git a/fs/xfs/quota/xfs_qm_syscalls.c b/fs/xfs/quota/xfs_qm_syscalls.c
index bdebc18..8b207fc 100644
--- a/fs/xfs/quota/xfs_qm_syscalls.c
+++ b/fs/xfs/quota/xfs_qm_syscalls.c
@@ -875,6 +875,9 @@ xfs_dqrele_inode(
 	struct xfs_perag	*pag,
 	int			flags)
 {
+	if (!ip->i_ino)
+		return ENOENT;
+
 	/* skip quota inodes */
 	if (ip == ip->i_mount->m_quotainfo->qi_uquotaip ||
 	    ip == ip->i_mount->m_quotainfo->qi_gquotaip) {
diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index 18991a9..edeb918 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -69,6 +69,7 @@ xfs_inode_alloc(
 	ASSERT(atomic_read(&ip->i_pincount) == 0);
 	ASSERT(!spin_is_locked(&ip->i_flags_lock));
 	ASSERT(completion_done(&ip->i_flush));
+	ASSERT(ip->i_ino == 0);
 
 	mrlock_init(&ip->i_iolock, MRLOCK_BARRIER, "xfsio", ip->i_ino);
 
@@ -86,9 +87,6 @@ xfs_inode_alloc(
 	ip->i_new_size = 0;
 	ip->i_dirty_releases = 0;
 
-	/* prevent anyone from using this yet */
-	VFS_I(ip)->i_state = I_NEW;
-
 	return ip;
 }
 
@@ -135,6 +133,16 @@ xfs_inode_free(
 	ASSERT(!spin_is_locked(&ip->i_flags_lock));
 	ASSERT(completion_done(&ip->i_flush));
 
+	/*
+	 * because we use SLAB_DESTROY_BY_RCU freeing, ensure the inode
+	 * always appears to be reclaimed with an invalid inode number
+	 * when in the free state. The ip->i_flags_lock provides the barrier
+	 * against lookup races.
+	 */
+	spin_lock(&ip->i_flags_lock);
+	ip->i_flags = XFS_IRECLAIM;
+	ip->i_ino = 0;
+	spin_unlock(&ip->i_flags_lock);
 	kmem_zone_free(xfs_inode_zone, ip);
 }
 
@@ -146,12 +154,28 @@ xfs_iget_cache_hit(
 	struct xfs_perag	*pag,
 	struct xfs_inode	*ip,
 	int			flags,
-	int			lock_flags) __releases(pag->pag_ici_lock)
+	int			lock_flags) __releases(RCU)
 {
 	struct inode		*inode = VFS_I(ip);
 	struct xfs_mount	*mp = ip->i_mount;
 	int			error;
 
+	/*
+	 * check for re-use of an inode within an RCU grace period due to the
+	 * radix tree nodes not being updated yet. We monitor for this by
+	 * setting the inode number to zero before freeing the inode structure.
+	 * We don't need to recheck this after taking the i_flags_lock because
+	 * the check against XFS_IRECLAIM will catch a freed inode.
+	 */
+	if (ip->i_ino == 0) {
+		trace_xfs_iget_skip(ip);
+		XFS_STATS_INC(xs_ig_frecycle);
+		rcu_read_unlock();
+		/* Expire the grace period so we don't trip over it again. */
+		synchronize_rcu();
+		return EAGAIN;
+	}
+
 	spin_lock(&ip->i_flags_lock);
 
 	/*
@@ -195,7 +219,7 @@ xfs_iget_cache_hit(
 		ip->i_flags |= XFS_IRECLAIM;
 
 		spin_unlock(&ip->i_flags_lock);
-		read_unlock(&pag->pag_ici_lock);
+		rcu_read_unlock();
 
 		error = -inode_init_always(mp->m_super, inode);
 		if (error) {
@@ -203,7 +227,7 @@ xfs_iget_cache_hit(
 			 * Re-initializing the inode failed, and we are in deep
 			 * trouble.  Try to re-add it to the reclaim list.
 			 */
-			read_lock(&pag->pag_ici_lock);
+			rcu_read_lock();
 			spin_lock(&ip->i_flags_lock);
 
 			ip->i_flags &= ~XFS_INEW;
@@ -231,7 +255,7 @@ xfs_iget_cache_hit(
 
 		/* We've got a live one. */
 		spin_unlock(&ip->i_flags_lock);
-		read_unlock(&pag->pag_ici_lock);
+		rcu_read_unlock();
 		trace_xfs_iget_hit(ip);
 	}
 
@@ -245,7 +269,7 @@ xfs_iget_cache_hit(
 
 out_error:
 	spin_unlock(&ip->i_flags_lock);
-	read_unlock(&pag->pag_ici_lock);
+	rcu_read_unlock();
 	return error;
 }
 
@@ -376,7 +400,7 @@ xfs_iget(
 
 again:
 	error = 0;
-	read_lock(&pag->pag_ici_lock);
+	rcu_read_lock();
 	ip = radix_tree_lookup(&pag->pag_ici_root, agino);
 
 	if (ip) {
@@ -384,7 +408,7 @@ again:
 		if (error)
 			goto out_error_or_again;
 	} else {
-		read_unlock(&pag->pag_ici_lock);
+		rcu_read_unlock();
 		XFS_STATS_INC(xs_ig_missed);
 
 		error = xfs_iget_cache_miss(mp, pag, tp, ino, &ip,
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 108c7a0..25becb1 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2000,13 +2000,14 @@ xfs_ifree_cluster(
 		 */
 		for (i = 0; i < ninodes; i++) {
 retry:
-			read_lock(&pag->pag_ici_lock);
+			rcu_read_lock();
 			ip = radix_tree_lookup(&pag->pag_ici_root,
 					XFS_INO_TO_AGINO(mp, (inum + i)));
 
 			/* Inode not in memory or stale, nothing to do */
-			if (!ip || xfs_iflags_test(ip, XFS_ISTALE)) {
-				read_unlock(&pag->pag_ici_lock);
+			if (!ip || !ip->i_ino ||
+			    xfs_iflags_test(ip, XFS_ISTALE)) {
+				rcu_read_unlock();
 				continue;
 			}
 
@@ -2019,11 +2020,11 @@ retry:
 			 */
 			if (ip != free_ip &&
 			    !xfs_ilock_nowait(ip, XFS_ILOCK_EXCL)) {
-				read_unlock(&pag->pag_ici_lock);
+				rcu_read_unlock();
 				delay(1);
 				goto retry;
 			}
-			read_unlock(&pag->pag_ici_lock);
+			rcu_read_unlock();
 
 			xfs_iflock(ip);
 			xfs_iflags_set(ip, XFS_ISTALE);
@@ -2629,7 +2630,7 @@ xfs_iflush_cluster(
 
 	mask = ~(((XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog)) - 1);
 	first_index = XFS_INO_TO_AGINO(mp, ip->i_ino) & mask;
-	read_lock(&pag->pag_ici_lock);
+	rcu_read_lock();
 	/* really need a gang lookup range call here */
 	nr_found = radix_tree_gang_lookup(&pag->pag_ici_root, (void**)ilist,
 					first_index, inodes_per_cluster);
@@ -2640,6 +2641,11 @@ xfs_iflush_cluster(
 		iq = ilist[i];
 		if (iq == ip)
 			continue;
+
+		/* check we've got a valid inode */
+		if (!iq->i_ino)
+			continue;
+
 		/* if the inode lies outside this cluster, we're done. */
 		if ((XFS_INO_TO_AGINO(mp, iq->i_ino) & mask) != first_index)
 			break;
@@ -2692,7 +2698,7 @@ xfs_iflush_cluster(
 	}
 
 out_free:
-	read_unlock(&pag->pag_ici_lock);
+	rcu_read_unlock();
 	kmem_free(ilist);
 out_put:
 	xfs_perag_put(pag);
@@ -2704,7 +2710,7 @@ cluster_corrupt_out:
 	 * Corruption detected in the clustering loop.  Invalidate the
 	 * inode buffer and shut down the filesystem.
 	 */
-	read_unlock(&pag->pag_ici_lock);
+	rcu_read_unlock();
 	/*
 	 * Clean up the buffer.  If it was B_DELWRI, just release it --
 	 * brelse can handle it with no problems.  If not, shut down the
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 08/16] xfs: convert pag_ici_lock to a spin lock
  2010-11-08  8:55 [PATCH 00/16] xfs: current patch stack for 2.6.38 window Dave Chinner
                   ` (6 preceding siblings ...)
  2010-11-08  8:55 ` [PATCH 07/16] xfs: convert inode cache lookups to use RCU locking Dave Chinner
@ 2010-11-08  8:55 ` Dave Chinner
  2010-11-08 23:10   ` Christoph Hellwig
  2010-11-08  8:55 ` [PATCH 09/16] xfs: convert xfsbud shrinker to a per-buftarg shrinker Dave Chinner
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2010-11-08  8:55 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Now that we are using RCU protection for the inode cache lookups,
the lock is only needed on the modification side. Hence it is not
necessary for the lock to be a rwlock as there are no read side
holders anymore. Convert it to a spin lock to reflect its exclusive
nature.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
 fs/xfs/linux-2.6/xfs_sync.c |   14 +++++++-------
 fs/xfs/xfs_ag.h             |    2 +-
 fs/xfs/xfs_iget.c           |   10 +++++-----
 fs/xfs/xfs_mount.c          |    2 +-
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 9a53cc9..0b3d367 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -596,12 +596,12 @@ xfs_inode_set_reclaim_tag(
 	struct xfs_perag *pag;
 
 	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
-	write_lock(&pag->pag_ici_lock);
+	spin_lock(&pag->pag_ici_lock);
 	spin_lock(&ip->i_flags_lock);
 	__xfs_inode_set_reclaim_tag(pag, ip);
 	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
 	spin_unlock(&ip->i_flags_lock);
-	write_unlock(&pag->pag_ici_lock);
+	spin_unlock(&pag->pag_ici_lock);
 	xfs_perag_put(pag);
 }
 
@@ -802,12 +802,12 @@ reclaim:
 	 * added to the tree assert that it's been there before to catch
 	 * problems with the inode life time early on.
 	 */
-	write_lock(&pag->pag_ici_lock);
+	spin_lock(&pag->pag_ici_lock);
 	if (!radix_tree_delete(&pag->pag_ici_root,
 				XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino)))
 		ASSERT(0);
 	__xfs_inode_clear_reclaim(pag, ip);
-	write_unlock(&pag->pag_ici_lock);
+	spin_unlock(&pag->pag_ici_lock);
 
 	/*
 	 * Here we do an (almost) spurious inode lock in order to coordinate
@@ -871,14 +871,14 @@ restart:
 			struct xfs_inode *batch[XFS_LOOKUP_BATCH];
 			int	i;
 
-			write_lock(&pag->pag_ici_lock);
+			spin_lock(&pag->pag_ici_lock);
 			nr_found = radix_tree_gang_lookup_tag(
 					&pag->pag_ici_root,
 					(void **)batch, first_index,
 					XFS_LOOKUP_BATCH,
 					XFS_ICI_RECLAIM_TAG);
 			if (!nr_found) {
-				write_unlock(&pag->pag_ici_lock);
+				spin_unlock(&pag->pag_ici_lock);
 				break;
 			}
 
@@ -905,7 +905,7 @@ restart:
 			}
 
 			/* unlock now we've grabbed the inodes. */
-			write_unlock(&pag->pag_ici_lock);
+			spin_unlock(&pag->pag_ici_lock);
 
 			for (i = 0; i < nr_found; i++) {
 				if (!batch[i])
diff --git a/fs/xfs/xfs_ag.h b/fs/xfs/xfs_ag.h
index 63c7a1a..58632cc 100644
--- a/fs/xfs/xfs_ag.h
+++ b/fs/xfs/xfs_ag.h
@@ -227,7 +227,7 @@ typedef struct xfs_perag {
 
 	atomic_t        pagf_fstrms;    /* # of filestreams active in this AG */
 
-	rwlock_t	pag_ici_lock;	/* incore inode lock */
+	spinlock_t	pag_ici_lock;	/* incore inode cache lock */
 	struct radix_tree_root pag_ici_root;	/* incore inode cache root */
 	int		pag_ici_reclaimable;	/* reclaimable inodes */
 	struct mutex	pag_ici_reclaim_lock;	/* serialisation point */
diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index edeb918..e00d88c 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -237,14 +237,14 @@ xfs_iget_cache_hit(
 			goto out_error;
 		}
 
-		write_lock(&pag->pag_ici_lock);
+		spin_lock(&pag->pag_ici_lock);
 		spin_lock(&ip->i_flags_lock);
 		ip->i_flags &= ~(XFS_IRECLAIMABLE | XFS_IRECLAIM);
 		ip->i_flags |= XFS_INEW;
 		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
 		inode->i_state = I_NEW;
 		spin_unlock(&ip->i_flags_lock);
-		write_unlock(&pag->pag_ici_lock);
+		spin_unlock(&pag->pag_ici_lock);
 	} else {
 		/* If the VFS inode is being torn down, pause and try again. */
 		if (!igrab(inode)) {
@@ -322,7 +322,7 @@ xfs_iget_cache_miss(
 			BUG();
 	}
 
-	write_lock(&pag->pag_ici_lock);
+	spin_lock(&pag->pag_ici_lock);
 
 	/* insert the new inode */
 	error = radix_tree_insert(&pag->pag_ici_root, agino, ip);
@@ -337,14 +337,14 @@ xfs_iget_cache_miss(
 	ip->i_udquot = ip->i_gdquot = NULL;
 	xfs_iflags_set(ip, XFS_INEW);
 
-	write_unlock(&pag->pag_ici_lock);
+	spin_unlock(&pag->pag_ici_lock);
 	radix_tree_preload_end();
 
 	*ipp = ip;
 	return 0;
 
 out_preload_end:
-	write_unlock(&pag->pag_ici_lock);
+	spin_unlock(&pag->pag_ici_lock);
 	radix_tree_preload_end();
 	if (lock_flags)
 		xfs_iunlock(ip, lock_flags);
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 0d9a030..7544258 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -564,7 +564,7 @@ xfs_initialize_perag(
 			goto out_unwind;
 		pag->pag_agno = index;
 		pag->pag_mount = mp;
-		rwlock_init(&pag->pag_ici_lock);
+		spin_lock_init(&pag->pag_ici_lock);
 		mutex_init(&pag->pag_ici_reclaim_lock);
 		INIT_RADIX_TREE(&pag->pag_ici_root, GFP_ATOMIC);
 		spin_lock_init(&pag->pag_buf_lock);
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 09/16] xfs: convert xfsbud shrinker to a per-buftarg shrinker.
  2010-11-08  8:55 [PATCH 00/16] xfs: current patch stack for 2.6.38 window Dave Chinner
                   ` (7 preceding siblings ...)
  2010-11-08  8:55 ` [PATCH 08/16] xfs: convert pag_ici_lock to a spin lock Dave Chinner
@ 2010-11-08  8:55 ` Dave Chinner
  2010-11-08  8:55 ` [PATCH 10/16] xfs: add a lru to the XFS buffer cache Dave Chinner
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2010-11-08  8:55 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Before we introduce per-buftarg LRU lists, split the shrinker
implementation into per-buftarg shrinker callbacks. At the moment
we wake all the xfsbufds to run the delayed write queues to free
the dirty buffers and make their pages available for reclaim.
However, with an LRU, we want to be able to free clean, unused
buffers as well, so we need to separate the xfsbufd from the
shrinker callbacks.
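
As a rough userspace sketch of the mechanism this enables, the callback
receives only the embedded struct shrinker and recovers the owning buftarg
with container_of(). The struct and function names below are illustrative,
not the kernel API:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace container_of(), as in the kernel. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct fake_shrinker {
	int (*shrink)(struct fake_shrinker *sh, int nr_to_scan);
};

/* One shrinker embedded per buftarg, so each callback only ever sees
 * its own target's state. */
struct fake_buftarg {
	int			queued;		/* delwri queue depth */
	struct fake_shrinker	shrinker;
};

static int fake_buftarg_shrink(struct fake_shrinker *sh, int nr_to_scan)
{
	struct fake_buftarg *btp =
		container_of(sh, struct fake_buftarg, shrinker);

	if (!btp->queued)
		return -1;	/* nothing this target can free */
	/* the real callback would wake this target's xfsbufd here */
	return btp->queued;
}
```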

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
 fs/xfs/linux-2.6/xfs_buf.c |   89 ++++++++++++--------------------------------
 fs/xfs/linux-2.6/xfs_buf.h |    4 +-
 2 files changed, 27 insertions(+), 66 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index aa1d353..f21803b 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -44,12 +44,7 @@
 
 static kmem_zone_t *xfs_buf_zone;
 STATIC int xfsbufd(void *);
-STATIC int xfsbufd_wakeup(struct shrinker *, int, gfp_t);
 STATIC void xfs_buf_delwri_queue(xfs_buf_t *, int);
-static struct shrinker xfs_buf_shake = {
-	.shrink = xfsbufd_wakeup,
-	.seeks = DEFAULT_SEEKS,
-};
 
 static struct workqueue_struct *xfslogd_workqueue;
 struct workqueue_struct *xfsdatad_workqueue;
@@ -337,7 +332,6 @@ _xfs_buf_lookup_pages(
 					__func__, gfp_mask);
 
 			XFS_STATS_INC(xb_page_retries);
-			xfsbufd_wakeup(NULL, 0, gfp_mask);
 			congestion_wait(BLK_RW_ASYNC, HZ/50);
 			goto retry;
 		}
@@ -1464,28 +1458,23 @@ xfs_wait_buftarg(
 	}
 }
 
-/*
- *	buftarg list for delwrite queue processing
- */
-static LIST_HEAD(xfs_buftarg_list);
-static DEFINE_SPINLOCK(xfs_buftarg_lock);
-
-STATIC void
-xfs_register_buftarg(
-	xfs_buftarg_t           *btp)
-{
-	spin_lock(&xfs_buftarg_lock);
-	list_add(&btp->bt_list, &xfs_buftarg_list);
-	spin_unlock(&xfs_buftarg_lock);
-}
-
-STATIC void
-xfs_unregister_buftarg(
-	xfs_buftarg_t           *btp)
+int
+xfs_buftarg_shrink(
+	struct shrinker		*shrink,
+	int			nr_to_scan,
+	gfp_t			mask)
 {
-	spin_lock(&xfs_buftarg_lock);
-	list_del(&btp->bt_list);
-	spin_unlock(&xfs_buftarg_lock);
+	struct xfs_buftarg	*btp = container_of(shrink,
+					struct xfs_buftarg, bt_shrinker);
+	if (nr_to_scan) {
+		if (test_bit(XBT_FORCE_SLEEP, &btp->bt_flags))
+			return -1;
+		if (list_empty(&btp->bt_delwrite_queue))
+			return -1;
+		set_bit(XBT_FORCE_FLUSH, &btp->bt_flags);
+		wake_up_process(btp->bt_task);
+	}
+	return list_empty(&btp->bt_delwrite_queue) ? -1 : 1;
 }
 
 void
@@ -1493,17 +1482,14 @@ xfs_free_buftarg(
 	struct xfs_mount	*mp,
 	struct xfs_buftarg	*btp)
 {
+	unregister_shrinker(&btp->bt_shrinker);
+
 	xfs_flush_buftarg(btp, 1);
 	if (mp->m_flags & XFS_MOUNT_BARRIER)
 		xfs_blkdev_issue_flush(btp);
 	iput(btp->bt_mapping->host);
 
-	/* Unregister the buftarg first so that we don't get a
-	 * wakeup finding a non-existent task
-	 */
-	xfs_unregister_buftarg(btp);
 	kthread_stop(btp->bt_task);
-
 	kmem_free(btp);
 }
 
@@ -1600,20 +1586,13 @@ xfs_alloc_delwrite_queue(
 	xfs_buftarg_t		*btp,
 	const char		*fsname)
 {
-	int	error = 0;
-
-	INIT_LIST_HEAD(&btp->bt_list);
 	INIT_LIST_HEAD(&btp->bt_delwrite_queue);
 	spin_lock_init(&btp->bt_delwrite_lock);
 	btp->bt_flags = 0;
 	btp->bt_task = kthread_run(xfsbufd, btp, "xfsbufd/%s", fsname);
-	if (IS_ERR(btp->bt_task)) {
-		error = PTR_ERR(btp->bt_task);
-		goto out_error;
-	}
-	xfs_register_buftarg(btp);
-out_error:
-	return error;
+	if (IS_ERR(btp->bt_task))
+		return PTR_ERR(btp->bt_task);
+	return 0;
 }
 
 xfs_buftarg_t *
@@ -1636,6 +1615,9 @@ xfs_alloc_buftarg(
 		goto error;
 	if (xfs_alloc_delwrite_queue(btp, fsname))
 		goto error;
+	btp->bt_shrinker.shrink = xfs_buftarg_shrink;
+	btp->bt_shrinker.seeks = DEFAULT_SEEKS;
+	register_shrinker(&btp->bt_shrinker);
 	return btp;
 
 error:
@@ -1740,27 +1722,6 @@ xfs_buf_runall_queues(
 	flush_workqueue(queue);
 }
 
-STATIC int
-xfsbufd_wakeup(
-	struct shrinker		*shrink,
-	int			priority,
-	gfp_t			mask)
-{
-	xfs_buftarg_t		*btp;
-
-	spin_lock(&xfs_buftarg_lock);
-	list_for_each_entry(btp, &xfs_buftarg_list, bt_list) {
-		if (test_bit(XBT_FORCE_SLEEP, &btp->bt_flags))
-			continue;
-		if (list_empty(&btp->bt_delwrite_queue))
-			continue;
-		set_bit(XBT_FORCE_FLUSH, &btp->bt_flags);
-		wake_up_process(btp->bt_task);
-	}
-	spin_unlock(&xfs_buftarg_lock);
-	return 0;
-}
-
 /*
  * Move as many buffers as specified to the supplied list
  * idicating if we skipped any buffers to prevent deadlocks.
@@ -1955,7 +1916,6 @@ xfs_buf_init(void)
 	if (!xfsconvertd_workqueue)
 		goto out_destroy_xfsdatad_workqueue;
 
-	register_shrinker(&xfs_buf_shake);
 	return 0;
 
  out_destroy_xfsdatad_workqueue:
@@ -1971,7 +1931,6 @@ xfs_buf_init(void)
 void
 xfs_buf_terminate(void)
 {
-	unregister_shrinker(&xfs_buf_shake);
 	destroy_workqueue(xfsconvertd_workqueue);
 	destroy_workqueue(xfsdatad_workqueue);
 	destroy_workqueue(xfslogd_workqueue);
diff --git a/fs/xfs/linux-2.6/xfs_buf.h b/fs/xfs/linux-2.6/xfs_buf.h
index 383a3f3..9344103 100644
--- a/fs/xfs/linux-2.6/xfs_buf.h
+++ b/fs/xfs/linux-2.6/xfs_buf.h
@@ -128,10 +128,12 @@ typedef struct xfs_buftarg {
 
 	/* per device delwri queue */
 	struct task_struct	*bt_task;
-	struct list_head	bt_list;
 	struct list_head	bt_delwrite_queue;
 	spinlock_t		bt_delwrite_lock;
 	unsigned long		bt_flags;
+
+	/* LRU control structures */
+	struct shrinker		bt_shrinker;
 } xfs_buftarg_t;
 
 /*
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 10/16] xfs: add a lru to the XFS buffer cache
  2010-11-08  8:55 [PATCH 00/16] xfs: current patch stack for 2.6.38 window Dave Chinner
                   ` (8 preceding siblings ...)
  2010-11-08  8:55 ` [PATCH 09/16] xfs: convert xfsbud shrinker to a per-buftarg shrinker Dave Chinner
@ 2010-11-08  8:55 ` Dave Chinner
  2010-11-08 23:19   ` Christoph Hellwig
  2010-11-08  8:55 ` [PATCH 11/16] xfs: connect up buffer reclaim priority hooks Dave Chinner
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2010-11-08  8:55 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Introduce a per-buftarg LRU for memory reclaim to operate on. This
is the last piece we need to put in place so that we can fully
control the buffer lifecycle. This allows XFS to be responsible for
maintaining the working set of buffers under memory pressure instead
of relying on the VM reclaim not to take pages we need out from
underneath us.

The implementation introduces a b_lru_ref counter into the buffer.
This is currently set to 1 whenever the buffer is referenced and so is used to
determine if the buffer should be added to the LRU or not when freed.
Effectively it allows lazy LRU initialisation of the buffer so we do not need
to touch the LRU list and locks in xfs_buf_find().

Instead, when the buffer is being released and we drop the last
reference to it, we check the b_lru_ref count and if it is non-zero
we re-add the buffer reference and add the buffer to the LRU. The
b_lru_ref counter is decremented by the shrinker, and whenever the
shrinker comes across a buffer with a zero b_lru_ref counter it
releases the LRU reference on the buffer. In the absence of a lookup
race, this will result in the buffer being freed.

This counting mechanism is used instead of a reference flag so that
it is simple to re-introduce buffer-type specific reclaim reference
counts to prioritise reclaim more effectively. We still have all
those hooks in the XFS code, so this will provide the infrastructure
to re-implement that functionality.
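
The decrement-unless-zero step the shrinker relies on (atomic_add_unless()
in the patch) can be sketched in userspace C11; the function name here is
illustrative:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace equivalent of atomic_add_unless(&ref, -1, 0): decrement the
 * LRU reference count unless it is already zero, returning true if we
 * decremented. The shrinker reclaims a buffer only once this fails,
 * giving the buffer one LRU pass per reference it has accumulated. */
static bool lru_ref_dec_unless_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		if (atomic_compare_exchange_weak(ref, &old, old - 1))
			return true;
		/* old was reloaded on failure; retry */
	}
	return false;
}
```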

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_buf.c |  168 ++++++++++++++++++++++++++++++++++++++------
 fs/xfs/linux-2.6/xfs_buf.h |    8 ++-
 2 files changed, 153 insertions(+), 23 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index f21803b..80d9f13 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -163,8 +163,79 @@ test_page_region(
 }
 
 /*
- *	Internal xfs_buf_t object manipulation
+ * xfs_buf_lru_add - add a buffer to the LRU.
+ *
+ * The LRU takes a new reference to the buffer so that it will only be freed
+ * once the shrinker takes the buffer off the LRU.
+ */
+STATIC void
+xfs_buf_lru_add(
+	struct xfs_buf	*bp)
+{
+	struct xfs_buftarg *btp = bp->b_target;
+
+	spin_lock(&btp->bt_lru_lock);
+	if (list_empty(&bp->b_lru)) {
+		atomic_inc(&bp->b_hold);
+		list_add_tail(&bp->b_lru, &btp->bt_lru);
+		btp->bt_lru_nr++;
+	}
+	spin_unlock(&btp->bt_lru_lock);
+}
+
+/*
+ * xfs_buf_lru_del - remove a buffer from the LRU
+ *
+ * The unlocked check is safe here because it only occurs when there are no
+ * b_lru_ref counts left on the buffer under the pag->pag_buf_lock. It is there
+ * to optimise the shrinker removing the buffer from the LRU and calling
+ * xfs_buf_free(), i.e. it removes an unnecessary round trip on the
+ * bt_lru_lock.
+ */
+STATIC void
+xfs_buf_lru_del(
+	struct xfs_buf	*bp)
+{
+	struct xfs_buftarg *btp = bp->b_target;
+
+	if (list_empty(&bp->b_lru))
+		return;
+
+	spin_lock(&btp->bt_lru_lock);
+	if (!list_empty(&bp->b_lru)) {
+		list_del_init(&bp->b_lru);
+		btp->bt_lru_nr--;
+	}
+	spin_unlock(&btp->bt_lru_lock);
+}
+
+/*
+ * When we mark a buffer stale, we remove the buffer from the LRU and clear the
+ * b_lru_ref count so that the buffer is freed immediately when the buffer
+ * reference count falls to zero. If the buffer is already on the LRU, we need
+ * to remove the reference that LRU holds on the buffer.
+ *
+ * This prevents build-up of stale buffers on the LRU.
  */
+void
+xfs_buf_stale(
+	struct xfs_buf	*bp)
+{
+	bp->b_flags |= XBF_STALE;
+	atomic_set(&(bp)->b_lru_ref, 0);
+	if (!list_empty(&bp->b_lru)) {
+		struct xfs_buftarg *btp = bp->b_target;
+
+		spin_lock(&btp->bt_lru_lock);
+		if (!list_empty(&bp->b_lru)) {
+			list_del_init(&bp->b_lru);
+			btp->bt_lru_nr--;
+			atomic_dec(&bp->b_hold);
+		}
+		spin_unlock(&btp->bt_lru_lock);
+	}
+	ASSERT(atomic_read(&bp->b_hold) >= 1);
+}
 
 STATIC void
 _xfs_buf_initialize(
@@ -181,7 +252,9 @@ _xfs_buf_initialize(
 
 	memset(bp, 0, sizeof(xfs_buf_t));
 	atomic_set(&bp->b_hold, 1);
+	atomic_set(&bp->b_lru_ref, 1);
 	init_completion(&bp->b_iowait);
+	INIT_LIST_HEAD(&bp->b_lru);
 	INIT_LIST_HEAD(&bp->b_list);
 	RB_CLEAR_NODE(&bp->b_rbnode);
 	sema_init(&bp->b_sema, 0); /* held, no waiters */
@@ -257,6 +330,8 @@ xfs_buf_free(
 {
 	trace_xfs_buf_free(bp, _RET_IP_);
 
+	ASSERT(list_empty(&bp->b_lru));
+
 	if (bp->b_flags & (_XBF_PAGE_CACHE|_XBF_PAGES)) {
 		uint		i;
 
@@ -471,6 +546,8 @@ _xfs_buf_find(
 		/* the buffer keeps the perag reference until it is freed */
 		new_bp->b_pag = pag;
 		spin_unlock(&pag->pag_buf_lock);
+
+		xfs_buf_lru_add(new_bp);
 	} else {
 		XFS_STATS_INC(xb_miss_locked);
 		spin_unlock(&pag->pag_buf_lock);
@@ -835,6 +912,7 @@ xfs_buf_rele(
 
 	if (!pag) {
 		ASSERT(!bp->b_relse);
+		ASSERT(list_empty(&bp->b_lru));
 		ASSERT(RB_EMPTY_NODE(&bp->b_rbnode));
 		if (atomic_dec_and_test(&bp->b_hold))
 			xfs_buf_free(bp);
@@ -842,13 +920,19 @@ xfs_buf_rele(
 	}
 
 	ASSERT(!RB_EMPTY_NODE(&bp->b_rbnode));
+
 	ASSERT(atomic_read(&bp->b_hold) > 0);
 	if (atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock)) {
 		if (bp->b_relse) {
 			atomic_inc(&bp->b_hold);
 			spin_unlock(&pag->pag_buf_lock);
 			bp->b_relse(bp);
+		} else if (!(bp->b_flags & XBF_STALE) &&
+			   atomic_read(&bp->b_lru_ref)) {
+			xfs_buf_lru_add(bp);
+			spin_unlock(&pag->pag_buf_lock);
 		} else {
+			xfs_buf_lru_del(bp);
 			ASSERT(!(bp->b_flags & (XBF_DELWRI|_XBF_DELWRI_Q)));
 			rb_erase(&bp->b_rbnode, &pag->pag_buf_tree);
 			spin_unlock(&pag->pag_buf_lock);
@@ -1435,27 +1519,35 @@ xfs_buf_iomove(
  */
 
 /*
- *	Wait for any bufs with callbacks that have been submitted but
- *	have not yet returned... walk the hash list for the target.
+ * Wait for any bufs with callbacks that have been submitted but have not yet
+ * returned. These buffers will have an elevated hold count, so wait on those
+ * while freeing all the buffers only held by the LRU.
  */
 void
 xfs_wait_buftarg(
 	struct xfs_buftarg	*btp)
 {
-	struct xfs_perag	*pag;
-	uint			i;
+	struct xfs_buf		*bp;
 
-	for (i = 0; i < btp->bt_mount->m_sb.sb_agcount; i++) {
-		pag = xfs_perag_get(btp->bt_mount, i);
-		spin_lock(&pag->pag_buf_lock);
-		while (rb_first(&pag->pag_buf_tree)) {
-			spin_unlock(&pag->pag_buf_lock);
+restart:
+	spin_lock(&btp->bt_lru_lock);
+	while (!list_empty(&btp->bt_lru)) {
+		bp = list_first_entry(&btp->bt_lru, struct xfs_buf, b_lru);
+		if (atomic_read(&bp->b_hold) > 1) {
+			spin_unlock(&btp->bt_lru_lock);
 			delay(100);
-			spin_lock(&pag->pag_buf_lock);
+			goto restart;
 		}
-		spin_unlock(&pag->pag_buf_lock);
-		xfs_perag_put(pag);
+		/*
+		 * clear the LRU reference count so the buffer doesn't get
+		 * ignored in xfs_buf_rele().
+		 */
+		atomic_set(&bp->b_lru_ref, 0);
+		spin_unlock(&btp->bt_lru_lock);
+		xfs_buf_rele(bp);
+		spin_lock(&btp->bt_lru_lock);
 	}
+	spin_unlock(&btp->bt_lru_lock);
 }
 
 int
@@ -1466,15 +1558,45 @@ xfs_buftarg_shrink(
 {
 	struct xfs_buftarg	*btp = container_of(shrink,
 					struct xfs_buftarg, bt_shrinker);
-	if (nr_to_scan) {
-		if (test_bit(XBT_FORCE_SLEEP, &btp->bt_flags))
-			return -1;
-		if (list_empty(&btp->bt_delwrite_queue))
-			return -1;
-		set_bit(XBT_FORCE_FLUSH, &btp->bt_flags);
-		wake_up_process(btp->bt_task);
+	struct xfs_buf		*bp;
+	LIST_HEAD(dispose);
+
+	if (!nr_to_scan)
+		return btp->bt_lru_nr;
+
+	spin_lock(&btp->bt_lru_lock);
+	while (!list_empty(&btp->bt_lru)) {
+		if (nr_to_scan-- <= 0)
+			break;
+
+		bp = list_first_entry(&btp->bt_lru, struct xfs_buf, b_lru);
+
+		/*
+		 * Decrement the b_lru_ref count unless the value is already
+		 * zero. If the value is already zero, we need to reclaim the
+		 * buffer, otherwise it gets another trip through the LRU.
+		 */
+		if (!atomic_add_unless(&bp->b_lru_ref, -1, 0)) {
+			list_move_tail(&bp->b_lru, &btp->bt_lru);
+			continue;
+		}
+
+		/*
+		 * remove the buffer from the LRU now to avoid needing another
+		 * lock round trip inside xfs_buf_rele().
+		 */
+		list_move(&bp->b_lru, &dispose);
+		btp->bt_lru_nr--;
 	}
-	return list_empty(&btp->bt_delwrite_queue) ? -1 : 1;
+	spin_unlock(&btp->bt_lru_lock);
+
+	while (!list_empty(&dispose)) {
+		bp = list_first_entry(&dispose, struct xfs_buf, b_lru);
+		list_del_init(&bp->b_lru);
+		xfs_buf_rele(bp);
+	}
+
+	return btp->bt_lru_nr;
 }
 
 void
@@ -1609,6 +1731,8 @@ xfs_alloc_buftarg(
 	btp->bt_mount = mp;
 	btp->bt_dev =  bdev->bd_dev;
 	btp->bt_bdev = bdev;
+	INIT_LIST_HEAD(&btp->bt_lru);
+	spin_lock_init(&btp->bt_lru_lock);
 	if (xfs_setsize_buftarg_early(btp, bdev))
 		goto error;
 	if (xfs_mapping_buftarg(btp, bdev))
diff --git a/fs/xfs/linux-2.6/xfs_buf.h b/fs/xfs/linux-2.6/xfs_buf.h
index 9344103..4601eab 100644
--- a/fs/xfs/linux-2.6/xfs_buf.h
+++ b/fs/xfs/linux-2.6/xfs_buf.h
@@ -134,6 +134,9 @@ typedef struct xfs_buftarg {
 
 	/* LRU control structures */
 	struct shrinker		bt_shrinker;
+	struct list_head	bt_lru;
+	spinlock_t		bt_lru_lock;
+	unsigned int		bt_lru_nr;
 } xfs_buftarg_t;
 
 /*
@@ -166,9 +169,11 @@ typedef struct xfs_buf {
 	xfs_off_t		b_file_offset;	/* offset in file */
 	size_t			b_buffer_length;/* size of buffer in bytes */
 	atomic_t		b_hold;		/* reference count */
+	atomic_t		b_lru_ref;	/* lru reclaim ref count */
 	xfs_buf_flags_t		b_flags;	/* status flags */
 	struct semaphore	b_sema;		/* semaphore for lockables */
 
+	struct list_head	b_lru;		/* lru list */
 	wait_queue_head_t	b_waiters;	/* unpin waiters */
 	struct list_head	b_list;
 	struct xfs_perag	*b_pag;		/* contains rbtree root */
@@ -266,7 +271,8 @@ extern void xfs_buf_terminate(void);
 #define XFS_BUF_ZEROFLAGS(bp)	((bp)->b_flags &= \
 		~(XBF_READ|XBF_WRITE|XBF_ASYNC|XBF_DELWRI|XBF_ORDERED))
 
-#define XFS_BUF_STALE(bp)	((bp)->b_flags |= XBF_STALE)
+void xfs_buf_stale(struct xfs_buf *bp);
+#define XFS_BUF_STALE(bp)	xfs_buf_stale(bp);
 #define XFS_BUF_UNSTALE(bp)	((bp)->b_flags &= ~XBF_STALE)
 #define XFS_BUF_ISSTALE(bp)	((bp)->b_flags & XBF_STALE)
 #define XFS_BUF_SUPER_STALE(bp)	do {				\
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 11/16] xfs: connect up buffer reclaim priority hooks
  2010-11-08  8:55 [PATCH 00/16] xfs: current patch stack for 2.6.38 window Dave Chinner
                   ` (9 preceding siblings ...)
  2010-11-08  8:55 ` [PATCH 10/16] xfs: add a lru to the XFS buffer cache Dave Chinner
@ 2010-11-08  8:55 ` Dave Chinner
  2010-11-08 11:25   ` Christoph Hellwig
  2010-11-08  8:55 ` [PATCH 12/16] xfs: bulk AIL insertion during transaction commit Dave Chinner
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2010-11-08  8:55 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Now that the buffer reclaim infrastructure can handle different reclaim
priorities for different types of buffers, reconnect the hooks in the
XFS code that have been sitting dormant since it was ported to Linux.
This should finally give us reclaim prioritisation that is on a par
with the functionality that Irix provided XFS 15 years ago.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_buf.h |   31 +++++++++++++++++++++++++++++--
 fs/xfs/quota/xfs_dquot.c   |    2 +-
 fs/xfs/xfs_alloc.c         |    4 ++--
 fs/xfs/xfs_btree.c         |   11 +++++------
 fs/xfs/xfs_da_btree.c      |    4 ++--
 fs/xfs/xfs_ialloc.c        |    2 +-
 fs/xfs/xfs_inode.c         |    2 +-
 fs/xfs/xfs_trans.h         |    2 +-
 8 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_buf.h b/fs/xfs/linux-2.6/xfs_buf.h
index 4601eab..2dff03f 100644
--- a/fs/xfs/linux-2.6/xfs_buf.h
+++ b/fs/xfs/linux-2.6/xfs_buf.h
@@ -336,9 +336,36 @@ void xfs_buf_stale(struct xfs_buf *bp);
 #define XFS_BUF_SIZE(bp)		((bp)->b_buffer_length)
 #define XFS_BUF_SET_SIZE(bp, cnt)	((bp)->b_buffer_length = (cnt))
 
-#define XFS_BUF_SET_VTYPE_REF(bp, type, ref)	do { } while (0)
+/*
+ * buffer types 
+ */
+#define	B_FS_DQUOT	1
+#define	B_FS_AGFL	2
+#define	B_FS_AGF	3
+#define	B_FS_ATTR_BTREE	4
+#define	B_FS_DIR_BTREE	5
+#define	B_FS_MAP	6
+#define	B_FS_INOMAP	7
+#define	B_FS_AGI	8
+#define	B_FS_INO	9
+
+static inline void
+xfs_buf_set_vtype_ref(
+	struct xfs_buf	*bp,
+	int		type,
+	int		lru_ref)
+{
+	atomic_set(&bp->b_lru_ref, lru_ref);
+}
+
+static inline void
+xfs_buf_set_ref(
+	struct xfs_buf	*bp,
+	int		lru_ref)
+{
+	atomic_set(&bp->b_lru_ref, lru_ref);
+}
 #define XFS_BUF_SET_VTYPE(bp, type)		do { } while (0)
-#define XFS_BUF_SET_REF(bp, ref)		do { } while (0)
 
 #define XFS_BUF_ISPINNED(bp)	atomic_read(&((bp)->b_pin_count))
 
diff --git a/fs/xfs/quota/xfs_dquot.c b/fs/xfs/quota/xfs_dquot.c
index faf8e1a..682cbf5 100644
--- a/fs/xfs/quota/xfs_dquot.c
+++ b/fs/xfs/quota/xfs_dquot.c
@@ -607,7 +607,7 @@ xfs_qm_dqread(
 	dqp->q_res_rtbcount = be64_to_cpu(ddqp->d_rtbcount);
 
 	/* Mark the buf so that this will stay incore a little longer */
-	XFS_BUF_SET_VTYPE_REF(bp, B_FS_DQUOT, XFS_DQUOT_REF);
+	xfs_buf_set_vtype_ref(bp, B_FS_DQUOT, XFS_DQUOT_REF);
 
 	/*
 	 * We got the buffer with a xfs_trans_read_buf() (in dqtobp())
diff --git a/fs/xfs/xfs_alloc.c b/fs/xfs/xfs_alloc.c
index 112abc4..702b097 100644
--- a/fs/xfs/xfs_alloc.c
+++ b/fs/xfs/xfs_alloc.c
@@ -463,7 +463,7 @@ xfs_alloc_read_agfl(
 		return error;
 	ASSERT(bp);
 	ASSERT(!XFS_BUF_GETERROR(bp));
-	XFS_BUF_SET_VTYPE_REF(bp, B_FS_AGFL, XFS_AGFL_REF);
+	xfs_buf_set_vtype_ref(bp, B_FS_AGFL, XFS_AGFL_REF);
 	*bpp = bp;
 	return 0;
 }
@@ -2160,7 +2160,7 @@ xfs_read_agf(
 		xfs_trans_brelse(tp, *bpp);
 		return XFS_ERROR(EFSCORRUPTED);
 	}
-	XFS_BUF_SET_VTYPE_REF(*bpp, B_FS_AGF, XFS_AGF_REF);
+	xfs_buf_set_vtype_ref(*bpp, B_FS_AGF, XFS_AGF_REF);
 	return 0;
 }
 
diff --git a/fs/xfs/xfs_btree.c b/fs/xfs/xfs_btree.c
index 04f9cca..20cec22 100644
--- a/fs/xfs/xfs_btree.c
+++ b/fs/xfs/xfs_btree.c
@@ -634,9 +634,8 @@ xfs_btree_read_bufl(
 		return error;
 	}
 	ASSERT(!bp || !XFS_BUF_GETERROR(bp));
-	if (bp != NULL) {
-		XFS_BUF_SET_VTYPE_REF(bp, B_FS_MAP, refval);
-	}
+	if (bp)
+		xfs_buf_set_vtype_ref(bp, B_FS_MAP, refval);
 	*bpp = bp;
 	return 0;
 }
@@ -944,13 +943,13 @@ xfs_btree_set_refs(
 	switch (cur->bc_btnum) {
 	case XFS_BTNUM_BNO:
 	case XFS_BTNUM_CNT:
-		XFS_BUF_SET_VTYPE_REF(*bpp, B_FS_MAP, XFS_ALLOC_BTREE_REF);
+		xfs_buf_set_vtype_ref(bp, B_FS_MAP, XFS_ALLOC_BTREE_REF);
 		break;
 	case XFS_BTNUM_INO:
-		XFS_BUF_SET_VTYPE_REF(*bpp, B_FS_INOMAP, XFS_INO_BTREE_REF);
+		xfs_buf_set_vtype_ref(bp, B_FS_INOMAP, XFS_INO_BTREE_REF);
 		break;
 	case XFS_BTNUM_BMAP:
-		XFS_BUF_SET_VTYPE_REF(*bpp, B_FS_MAP, XFS_BMAP_BTREE_REF);
+		xfs_buf_set_vtype_ref(bp, B_FS_MAP, XFS_BMAP_BTREE_REF);
 		break;
 	default:
 		ASSERT(0);
diff --git a/fs/xfs/xfs_da_btree.c b/fs/xfs/xfs_da_btree.c
index 1c00bed..eea90ff 100644
--- a/fs/xfs/xfs_da_btree.c
+++ b/fs/xfs/xfs_da_btree.c
@@ -2056,10 +2056,10 @@ xfs_da_do_buf(
 			continue;
 		if (caller == 1) {
 			if (whichfork == XFS_ATTR_FORK) {
-				XFS_BUF_SET_VTYPE_REF(bp, B_FS_ATTR_BTREE,
+				xfs_buf_set_vtype_ref(bp, B_FS_ATTR_BTREE,
 						XFS_ATTR_BTREE_REF);
 			} else {
-				XFS_BUF_SET_VTYPE_REF(bp, B_FS_DIR_BTREE,
+				xfs_buf_set_vtype_ref(bp, B_FS_DIR_BTREE,
 						XFS_DIR_BTREE_REF);
 			}
 		}
diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
index 0626a32..7fe7f35 100644
--- a/fs/xfs/xfs_ialloc.c
+++ b/fs/xfs/xfs_ialloc.c
@@ -1517,7 +1517,7 @@ xfs_read_agi(
 		return XFS_ERROR(EFSCORRUPTED);
 	}
 
-	XFS_BUF_SET_VTYPE_REF(*bpp, B_FS_AGI, XFS_AGI_REF);
+	xfs_buf_set_vtype_ref(*bpp, B_FS_AGI, XFS_AGI_REF);
 
 	xfs_check_agi_unlinked(agi);
 	return 0;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 25becb1..fc09a22 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -887,7 +887,7 @@ xfs_iread(
 	 * around for a while.  This helps to keep recently accessed
 	 * meta-data in-core longer.
 	 */
-	XFS_BUF_SET_REF(bp, XFS_INO_REF);
+	xfs_buf_set_ref(bp, XFS_INO_REF);
 
 	/*
 	 * Use xfs_trans_brelse() to release the buffer containing the
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 246286b..c2042b7 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -294,8 +294,8 @@ struct xfs_log_item_desc {
 #define	XFS_ALLOC_BTREE_REF	2
 #define	XFS_BMAP_BTREE_REF	2
 #define	XFS_DIR_BTREE_REF	2
+#define	XFS_INO_REF		2
 #define	XFS_ATTR_BTREE_REF	1
-#define	XFS_INO_REF		1
 #define	XFS_DQUOT_REF		1
 
 #ifdef __KERNEL__
-- 
1.7.2.3


* [PATCH 12/16] xfs: bulk AIL insertion during transaction commit
  2010-11-08  8:55 [PATCH 00/16] xfs: current patch stack for 2.6.38 window Dave Chinner
                   ` (10 preceding siblings ...)
  2010-11-08  8:55 ` [PATCH 11/16] xfs: connect up buffer reclaim priority hooks Dave Chinner
@ 2010-11-08  8:55 ` Dave Chinner
  2010-11-08  8:55 ` [PATCH 13/16] xfs: reduce the number of AIL push wakeups Dave Chinner
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2010-11-08  8:55 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When inserting items into the AIL from the transaction committed
callbacks, we take the AIL lock for every single item that is to be
inserted. For a CIL checkpoint commit, this can be tens of thousands
of individual inserts, yet almost all of the items will be inserted
at the same point in the AIL because they have the same index.

To reduce the overhead and contention on the AIL lock for such
operations, introduce a "bulk insert" operation which allows a list
of log items with the same LSN to be inserted in a single operation
via a list splice. To do this, we need to pre-sort the log items
being committed into a temporary list for insertion.

The complexity is that not every log item will end up with the same
LSN, and not every item is actually inserted into the AIL. Items
that don't match the commit LSN will be inserted and unpinned as per
the current one-at-a-time method (relatively rare), while items that
are not to be inserted will be unpinned and freed immediately. Items
that are to be inserted at the given commit lsn are placed in a
temporary array and inserted into the AIL in bulk each time the
array fills up.

As a result of this, we trade off AIL hold time for a significant
reduction in traffic. lock_stat output shows that the worst case
hold time is unchanged, but contention from AIL inserts drops by an
order of magnitude and the number of lock traversals decreases
significantly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_extfree_item.c  |   85 +++++++++++++++++-------------
 fs/xfs/xfs_extfree_item.h  |   12 ++--
 fs/xfs/xfs_inode_item.c    |   20 +++++++
 fs/xfs/xfs_log_cil.c       |    9 +---
 fs/xfs/xfs_log_recover.c   |    4 +-
 fs/xfs/xfs_trans.c         |   70 ++++++++++++++++++++++++-
 fs/xfs/xfs_trans_ail.c     |  124 +++++++++++++++++++++++++++++++++++++-------
 fs/xfs/xfs_trans_extfree.c |    4 +-
 fs/xfs/xfs_trans_priv.h    |    9 +++-
 9 files changed, 259 insertions(+), 78 deletions(-)

diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index a55e687..5e16d7d 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -74,7 +74,8 @@ xfs_efi_item_format(
 	struct xfs_efi_log_item	*efip = EFI_ITEM(lip);
 	uint			size;
 
-	ASSERT(efip->efi_next_extent == efip->efi_format.efi_nextents);
+	ASSERT(atomic_read(&efip->efi_next_extent) ==
+				efip->efi_format.efi_nextents);
 
 	efip->efi_format.efi_type = XFS_LI_EFI;
 
@@ -99,10 +100,10 @@ xfs_efi_item_pin(
 }
 
 /*
- * While EFIs cannot really be pinned, the unpin operation is the
- * last place at which the EFI is manipulated during a transaction.
- * Here we coordinate with xfs_efi_cancel() to determine who gets to
- * free the EFI.
+ * While EFIs cannot really be pinned, the unpin operation is the last place at
+ * which the EFI is manipulated during a transaction.  Here we coordinate with
+ * xfs_efi_release() (via XFS_EFI_COMMITTED) to determine who gets to free
+ * the EFI.
  */
 STATIC void
 xfs_efi_item_unpin(
@@ -112,18 +113,18 @@ xfs_efi_item_unpin(
 	struct xfs_efi_log_item	*efip = EFI_ITEM(lip);
 	struct xfs_ail		*ailp = lip->li_ailp;
 
-	spin_lock(&ailp->xa_lock);
-	if (efip->efi_flags & XFS_EFI_CANCELED) {
-		if (remove)
-			xfs_trans_del_item(lip);
-
-		/* xfs_trans_ail_delete() drops the AIL lock. */
-		xfs_trans_ail_delete(ailp, lip);
-		xfs_efi_item_free(efip);
-	} else {
-		efip->efi_flags |= XFS_EFI_COMMITTED;
-		spin_unlock(&ailp->xa_lock);
+	if (remove) {
+		/* transaction cancel - delete and free the item */
+		xfs_trans_del_item(lip);
+	} else if (test_and_clear_bit(XFS_EFI_COMMITTED, &efip->efi_flags)) {
+		/* efd has not been processed yet, it will free the efi */
+		return;
 	}
+
+	spin_lock(&ailp->xa_lock);
+	/* xfs_trans_ail_delete() drops the AIL lock. */
+	xfs_trans_ail_delete(ailp, lip);
+	xfs_efi_item_free(efip);
 }
 
 /*
@@ -152,16 +153,22 @@ xfs_efi_item_unlock(
 }
 
 /*
- * The EFI is logged only once and cannot be moved in the log, so
- * simply return the lsn at which it's been logged.  The canceled
- * flag is not paid any attention here.  Checking for that is delayed
- * until the EFI is unpinned.
+ * The EFI is logged only once and cannot be moved in the log, so simply return
+ * the lsn at which it's been logged.  The canceled flag is not paid any
+ * attention here.  Checking for that is delayed until the EFI is unpinned.
+ *
+ * For bulk transaction committed processing, the EFI may be processed but not
+ * yet unpinned prior to the EFD being processed. Set the XFS_EFI_COMMITTED
+ * flag so this case can be detected when processing the EFD.
  */
 STATIC xfs_lsn_t
 xfs_efi_item_committed(
 	struct xfs_log_item	*lip,
 	xfs_lsn_t		lsn)
 {
+	struct xfs_efi_log_item	*efip = EFI_ITEM(lip);
+
+	set_bit(XFS_EFI_COMMITTED, &efip->efi_flags);
 	return lsn;
 }
 
@@ -289,36 +296,38 @@ xfs_efi_copy_format(xfs_log_iovec_t *buf, xfs_efi_log_format_t *dst_efi_fmt)
 }
 
 /*
- * This is called by the efd item code below to release references to
- * the given efi item.  Each efd calls this with the number of
- * extents that it has logged, and when the sum of these reaches
- * the total number of extents logged by this efi item we can free
- * the efi item.
+ * This is called by the efd item code below to release references to the given
+ * efi item.  Each efd calls this with the number of extents that it has
+ * logged, and when the sum of these reaches the total number of extents logged
+ * by this efi item we can free the efi item.
+ *
+ * Freeing the efi item requires that we remove it from the AIL if it has
+ * already been placed there. However, the EFI may not yet have been placed in
+ * the AIL due to a bulk insert operation, so we have to be careful here. This
+ * case is detected if the XFS_EFI_COMMITTED flag is set. This code is
+ * tricky - both xfs_efi_item_unpin() and this code do test_and_clear_bit()
+ * operations on this flag - if it is not set here, then it means that the
+ * unpin has run and we don't need to free it. If it is set here, then we clear
+ * it to tell the unpin we have run and that the unpin needs to free the EFI.
  *
- * Freeing the efi item requires that we remove it from the AIL.
- * We'll use the AIL lock to protect our counters as well as
- * the removal from the AIL.
  */
 void
 xfs_efi_release(xfs_efi_log_item_t	*efip,
 		uint			nextents)
 {
 	struct xfs_ail		*ailp = efip->efi_item.li_ailp;
-	int			extents_left;
 
-	ASSERT(efip->efi_next_extent > 0);
-	ASSERT(efip->efi_flags & XFS_EFI_COMMITTED);
+	ASSERT(atomic_read(&efip->efi_next_extent) > 0);
 
-	spin_lock(&ailp->xa_lock);
-	ASSERT(efip->efi_next_extent >= nextents);
-	efip->efi_next_extent -= nextents;
-	extents_left = efip->efi_next_extent;
-	if (extents_left == 0) {
+	ASSERT(atomic_read(&efip->efi_next_extent) >= nextents);
+	if (!atomic_sub_and_test(nextents, &efip->efi_next_extent))
+		return;
+
+	if (!test_and_clear_bit(XFS_EFI_COMMITTED, &efip->efi_flags)) {
+		spin_lock(&ailp->xa_lock);
 		/* xfs_trans_ail_delete() drops the AIL lock. */
 		xfs_trans_ail_delete(ailp, (xfs_log_item_t *)efip);
 		xfs_efi_item_free(efip);
-	} else {
-		spin_unlock(&ailp->xa_lock);
 	}
 }
 
diff --git a/fs/xfs/xfs_extfree_item.h b/fs/xfs/xfs_extfree_item.h
index 0d22c56..26a7550 100644
--- a/fs/xfs/xfs_extfree_item.h
+++ b/fs/xfs/xfs_extfree_item.h
@@ -111,11 +111,11 @@ typedef struct xfs_efd_log_format_64 {
 #define	XFS_EFI_MAX_FAST_EXTENTS	16
 
 /*
- * Define EFI flags.
+ * Define EFI flag bits. Manipulated by set/clear/test_bit operators.
  */
-#define	XFS_EFI_RECOVERED	0x1
-#define	XFS_EFI_COMMITTED	0x2
-#define	XFS_EFI_CANCELED	0x4
+#define	XFS_EFI_RECOVERED	1
+#define	XFS_EFI_CANCELED	2
+#define	XFS_EFI_COMMITTED	3
 
 /*
  * This is the "extent free intention" log item.  It is used
@@ -125,8 +125,8 @@ typedef struct xfs_efd_log_format_64 {
  */
 typedef struct xfs_efi_log_item {
 	xfs_log_item_t		efi_item;
-	uint			efi_flags;	/* misc flags */
-	uint			efi_next_extent;
+	atomic_t		efi_next_extent;
+	unsigned long		efi_flags;	/* misc flags */
 	xfs_efi_log_format_t	efi_format;
 } xfs_efi_log_item_t;
 
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index c7ac020..3be7bdc 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -663,12 +663,32 @@ xfs_inode_item_unlock(
  * all dirty data in an inode, the latest copy in the on disk log
  * is the only one that matters.  Therefore, simply return the
  * given lsn.
+ *
+ * If the inode has been marked stale because the cluster is being freed,
+ * we don't want to (re-)insert this inode into the AIL. There is a race
+ * condition where the cluster buffer may be unpinned before the inode is
+ * inserted into the AIL during transaction committed processing. If the buffer
+ * is unpinned before the inode item has been committed and inserted, then
+ * it is possible for the buffer to be written and process IO completions
+ * before the inode is inserted into the AIL. In that case, we'd be inserting a
+ * clean, stale inode into the AIL which will never get removed. It will,
+ * however, get reclaimed which triggers an assert in xfs_inode_free()
+ * complaining about freeing an inode still in the AIL.
+ *
+ * To avoid this, return a lower LSN than the one passed in so that the
+ * transaction committed code will not move the inode forward in the AIL
+ * but will still unpin it properly.
  */
 STATIC xfs_lsn_t
 xfs_inode_item_committed(
 	struct xfs_log_item	*lip,
 	xfs_lsn_t		lsn)
 {
+	struct xfs_inode_log_item *iip = INODE_ITEM(lip);
+	struct xfs_inode	*ip = iip->ili_inode;
+
+	if (xfs_iflags_test(ip, XFS_ISTALE))
+		return lsn - 1;
 	return lsn;
 }
 
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 23d6ceb..f36f1a2 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -361,15 +361,10 @@ xlog_cil_committed(
 	int	abort)
 {
 	struct xfs_cil_ctx	*ctx = args;
-	struct xfs_log_vec	*lv;
-	int			abortflag = abort ? XFS_LI_ABORTED : 0;
 	struct xfs_busy_extent	*busyp, *n;
 
-	/* unpin all the log items */
-	for (lv = ctx->lv_chain; lv; lv = lv->lv_next ) {
-		xfs_trans_item_committed(lv->lv_item, ctx->start_lsn,
-							abortflag);
-	}
+	xfs_trans_committed_bulk(ctx->cil->xc_log->l_ailp, ctx->lv_chain,
+					ctx->start_lsn, abort);
 
 	list_for_each_entry_safe(busyp, n, &ctx->busy_extents, list)
 		xfs_alloc_busy_clear(ctx->cil->xc_log->l_mp, busyp);
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 966d3f9..baad94a 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2717,8 +2717,8 @@ xlog_recover_do_efi_trans(
 		xfs_efi_item_free(efip);
 		return error;
 	}
-	efip->efi_next_extent = efi_formatp->efi_nextents;
-	efip->efi_flags |= XFS_EFI_COMMITTED;
+	atomic_set(&efip->efi_next_extent, efi_formatp->efi_nextents);
+	clear_bit(XFS_EFI_COMMITTED, &efip->efi_flags);
 
 	spin_lock(&log->l_ailp->xa_lock);
 	/*
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index f6d956b..5180b18 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1350,7 +1350,7 @@ xfs_trans_fill_vecs(
  * they could be immediately flushed and we'd have to race with the flusher
  * trying to pull the item from the AIL as we add it.
  */
-void
+static void
 xfs_trans_item_committed(
 	struct xfs_log_item	*lip,
 	xfs_lsn_t		commit_lsn,
@@ -1426,6 +1426,74 @@ xfs_trans_committed(
 }
 
 /*
+ * Bulk operation version of xfs_trans_committed that takes a log vector of
+ * items to insert into the AIL. This uses bulk AIL insertion techniques to
+ * minimise lock traffic.
+ */
+void
+xfs_trans_committed_bulk(
+	struct xfs_ail		*ailp,
+	struct xfs_log_vec	*log_vector,
+	xfs_lsn_t		commit_lsn,
+	int			aborted)
+{
+#define LGIA_SIZE	32
+	struct xfs_log_item	*lgia[LGIA_SIZE];
+	struct xfs_log_vec	*lv;
+	int			i = 0;
+
+	/* unpin all the log items */
+	for (lv = log_vector; lv; lv = lv->lv_next ) {
+		struct xfs_log_item	*lip = lv->lv_item;
+		xfs_lsn_t		item_lsn;
+
+		if (aborted)
+			lip->li_flags |= XFS_LI_ABORTED;
+		item_lsn = IOP_COMMITTED(lip, commit_lsn);
+
+		/* item_lsn of -1 means the item was freed */
+		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
+			continue;
+
+		if (item_lsn != commit_lsn) {
+
+			/*
+			 * Not a bulk update option due to unusual item_lsn.
+			 * Push into AIL immediately, rechecking the lsn once
+			 * we have the ail lock. Then unpin the item.
+			 */
+			spin_lock(&ailp->xa_lock);
+			if (XFS_LSN_CMP(item_lsn, lip->li_lsn) > 0) {
+				xfs_trans_ail_update(ailp, lip, item_lsn);
+			} else {
+				spin_unlock(&ailp->xa_lock);
+			}
+			IOP_UNPIN(lip, 0);
+			continue;
+		}
+
+		/* Item is a candidate for bulk AIL insert.  */
+		lgia[i++] = lv->lv_item;
+		if (i >= LGIA_SIZE) {
+			xfs_trans_ail_update_bulk(ailp, lgia, LGIA_SIZE,
+							commit_lsn);
+			for (i = 0; i < LGIA_SIZE; i++)
+				IOP_UNPIN(lgia[i], 0);
+			i = 0;
+		}
+	}
+
+	/* make sure we insert the remainder! */
+	if (i) {
+		int j;
+
+		xfs_trans_ail_update_bulk(ailp, lgia, i, commit_lsn);
+		for (j = 0; j < i; j++)
+			IOP_UNPIN(lgia[j], 0);
+	}
+}
+
+/*
  * Called from the trans_commit code when we notice that
  * the filesystem is in the middle of a forced shutdown.
  */
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index dc90695..c83e6e9 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -29,7 +29,8 @@
 #include "xfs_error.h"
 
 STATIC void xfs_ail_insert(struct xfs_ail *, xfs_log_item_t *);
-STATIC xfs_log_item_t * xfs_ail_delete(struct xfs_ail *, xfs_log_item_t *);
+STATIC void xfs_ail_splice(struct xfs_ail *, struct list_head *, xfs_lsn_t);
+STATIC void xfs_ail_delete(struct xfs_ail *, xfs_log_item_t *);
 STATIC xfs_log_item_t * xfs_ail_min(struct xfs_ail *);
 STATIC xfs_log_item_t * xfs_ail_next(struct xfs_ail *, xfs_log_item_t *);
 
@@ -468,16 +469,13 @@ xfs_trans_ail_update(
 	xfs_log_item_t	*lip,
 	xfs_lsn_t	lsn) __releases(ailp->xa_lock)
 {
-	xfs_log_item_t		*dlip = NULL;
 	xfs_log_item_t		*mlip;	/* ptr to minimum lip */
 	xfs_lsn_t		tail_lsn;
 
 	mlip = xfs_ail_min(ailp);
 
 	if (lip->li_flags & XFS_LI_IN_AIL) {
-		dlip = xfs_ail_delete(ailp, lip);
-		ASSERT(dlip == lip);
-		xfs_trans_ail_cursor_clear(ailp, dlip);
+		xfs_ail_delete(ailp, lip);
 	} else {
 		lip->li_flags |= XFS_LI_IN_AIL;
 	}
@@ -485,7 +483,7 @@ xfs_trans_ail_update(
 	lip->li_lsn = lsn;
 	xfs_ail_insert(ailp, lip);
 
-	if (mlip == dlip) {
+	if (mlip == lip) {
 		mlip = xfs_ail_min(ailp);
 		/*
 		 * It is not safe to access mlip after the AIL lock is
@@ -505,6 +503,74 @@ xfs_trans_ail_update(
 }	/* xfs_trans_update_ail */
 
 /*
+ * Bulk update version of xfs_trans_ail_update.
+ *
+ * This version takes an array of log items that all need to be positioned at
+ * the same LSN in the AIL. This function takes the AIL lock once to execute
+ * the update operations on all the items in the array, and as such should not
+ * be called with the AIL lock held. As a result, once we have the AIL lock,
+ * we need to check each log item LSN to confirm it needs to be moved forward
+ * in the AIL.
+ *
+ * To optimise the insert operation, we delete all the items from the AIL in
+ * the first pass, moving them into a temporary list, then splice the temporary
+ * list into the correct position in the AIL. This avoids needing to do an
+ * insert operation on every item.
+ */
+void
+xfs_trans_ail_update_bulk(
+	struct xfs_ail		*ailp,
+	struct xfs_log_item	**lgia,
+	int			nr_items,
+	xfs_lsn_t		lsn)
+{
+	xfs_log_item_t		*mlip;
+	int			mlip_changed = 0;
+	int			i;
+	LIST_HEAD(tmp);
+
+	spin_lock(&ailp->xa_lock);
+	mlip = xfs_ail_min(ailp);
+
+	for (i = 0; i < nr_items; i++) {
+		struct xfs_log_item *lip = lgia[i];
+		if (lip->li_flags & XFS_LI_IN_AIL) {
+			/* check if we really need to move the item */
+			if (XFS_LSN_CMP(lsn, lip->li_lsn) <= 0)
+				continue;
+
+			xfs_ail_delete(ailp, lip);
+			if (mlip == lip)
+				mlip_changed = 1;
+		} else {
+			lip->li_flags |= XFS_LI_IN_AIL;
+		}
+		lip->li_lsn = lsn;
+		list_add(&lip->li_ail, &tmp);
+	}
+
+	xfs_ail_splice(ailp, &tmp, lsn);
+
+	if (mlip_changed) {
+		/*
+		 * It is not safe to access mlip after the AIL lock is
+		 * dropped, so we must get a copy of li_lsn before we do
+		 * so.  This is especially important on 32-bit platforms
+		 * where accessing and updating 64-bit values like li_lsn
+		 * is not atomic.
+		 */
+		xfs_lsn_t	tail_lsn;
+
+		mlip = xfs_ail_min(ailp);
+		tail_lsn = mlip->li_lsn;
+		spin_unlock(&ailp->xa_lock);
+		xfs_log_move_tail(ailp->xa_mount, tail_lsn);
+		return;
+	}
+	spin_unlock(&ailp->xa_lock);
+}
+
+/*
  * Delete the given item from the AIL.  It must already be in
  * the AIL.
  *
@@ -524,21 +590,18 @@ xfs_trans_ail_delete(
 	struct xfs_ail	*ailp,
 	xfs_log_item_t	*lip) __releases(ailp->xa_lock)
 {
-	xfs_log_item_t		*dlip;
 	xfs_log_item_t		*mlip;
 	xfs_lsn_t		tail_lsn;
 
 	if (lip->li_flags & XFS_LI_IN_AIL) {
 		mlip = xfs_ail_min(ailp);
-		dlip = xfs_ail_delete(ailp, lip);
-		ASSERT(dlip == lip);
-		xfs_trans_ail_cursor_clear(ailp, dlip);
+		xfs_ail_delete(ailp, lip);
 
 
 		lip->li_flags &= ~XFS_LI_IN_AIL;
 		lip->li_lsn = 0;
 
-		if (mlip == dlip) {
+		if (mlip == lip) {
 			mlip = xfs_ail_min(ailp);
 			/*
 			 * It is not safe to access mlip after the AIL lock
@@ -632,7 +695,6 @@ STATIC void
 xfs_ail_insert(
 	struct xfs_ail	*ailp,
 	xfs_log_item_t	*lip)
-/* ARGSUSED */
 {
 	xfs_log_item_t	*next_lip;
 
@@ -658,21 +720,45 @@ xfs_ail_insert(
 	return;
 }
 
+STATIC void
+xfs_ail_splice(
+	struct xfs_ail	*ailp,
+	struct list_head *list,
+	xfs_lsn_t	lsn)
+{
+	xfs_log_item_t	*next_lip;
+
+	/*
+	 * If the list is empty, just insert the item.
+	 */
+	if (list_empty(&ailp->xa_ail)) {
+		list_splice(list, &ailp->xa_ail);
+		return;
+	}
+
+	list_for_each_entry_reverse(next_lip, &ailp->xa_ail, li_ail) {
+		if (XFS_LSN_CMP(next_lip->li_lsn, lsn) <= 0)
+			break;
+	}
+
+	ASSERT((&next_lip->li_ail == &ailp->xa_ail) ||
+	       (XFS_LSN_CMP(next_lip->li_lsn, lsn) <= 0));
+
+	list_splice_init(list, &next_lip->li_ail);
+	return;
+}
+
 /*
  * Delete the given item from the AIL.  Return a pointer to the item.
  */
-/*ARGSUSED*/
-STATIC xfs_log_item_t *
+STATIC void
 xfs_ail_delete(
 	struct xfs_ail	*ailp,
 	xfs_log_item_t	*lip)
-/* ARGSUSED */
 {
 	xfs_ail_check(ailp, lip);
-
 	list_del(&lip->li_ail);
-
-	return lip;
+	xfs_trans_ail_cursor_clear(ailp, lip);
 }
 
 /*
@@ -682,7 +768,6 @@ xfs_ail_delete(
 STATIC xfs_log_item_t *
 xfs_ail_min(
 	struct xfs_ail	*ailp)
-/* ARGSUSED */
 {
 	if (list_empty(&ailp->xa_ail))
 		return NULL;
@@ -699,7 +784,6 @@ STATIC xfs_log_item_t *
 xfs_ail_next(
 	struct xfs_ail	*ailp,
 	xfs_log_item_t	*lip)
-/* ARGSUSED */
 {
 	if (lip->li_ail.next == &ailp->xa_ail)
 		return NULL;
diff --git a/fs/xfs/xfs_trans_extfree.c b/fs/xfs/xfs_trans_extfree.c
index f783d5e..143ff840 100644
--- a/fs/xfs/xfs_trans_extfree.c
+++ b/fs/xfs/xfs_trans_extfree.c
@@ -69,12 +69,12 @@ xfs_trans_log_efi_extent(xfs_trans_t		*tp,
 	tp->t_flags |= XFS_TRANS_DIRTY;
 	efip->efi_item.li_desc->lid_flags |= XFS_LID_DIRTY;
 
-	next_extent = efip->efi_next_extent;
+	next_extent = atomic_read(&efip->efi_next_extent);
 	ASSERT(next_extent < efip->efi_format.efi_nextents);
 	extp = &(efip->efi_format.efi_extents[next_extent]);
 	extp->ext_start = start_block;
 	extp->ext_len = ext_len;
-	efip->efi_next_extent++;
+	atomic_inc(&efip->efi_next_extent);
 }
 
 
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 62da86c..d25460f 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -22,15 +22,17 @@ struct xfs_log_item;
 struct xfs_log_item_desc;
 struct xfs_mount;
 struct xfs_trans;
+struct xfs_ail;
+struct xfs_log_vec;
 
 void	xfs_trans_add_item(struct xfs_trans *, struct xfs_log_item *);
 void	xfs_trans_del_item(struct xfs_log_item *);
 void	xfs_trans_free_items(struct xfs_trans *tp, xfs_lsn_t commit_lsn,
 				int flags);
-void	xfs_trans_item_committed(struct xfs_log_item *lip,
-				xfs_lsn_t commit_lsn, int aborted);
 void	xfs_trans_unreserve_and_mod_sb(struct xfs_trans *tp);
 
+void	xfs_trans_committed_bulk(struct xfs_ail *ailp, struct xfs_log_vec *lv,
+				xfs_lsn_t commit_lsn, int aborted);
 /*
  * AIL traversal cursor.
  *
@@ -76,6 +78,9 @@ struct xfs_ail {
 void			xfs_trans_ail_update(struct xfs_ail *ailp,
 					struct xfs_log_item *lip, xfs_lsn_t lsn)
 					__releases(ailp->xa_lock);
+void			xfs_trans_ail_update_bulk(struct xfs_ail *ailp,
+					struct xfs_log_item **lgia,
+					int nr_items, xfs_lsn_t lsn);
 void			xfs_trans_ail_delete(struct xfs_ail *ailp,
 					struct xfs_log_item *lip)
 					__releases(ailp->xa_lock);
-- 
1.7.2.3


* [PATCH 13/16] xfs: reduce the number of AIL push wakeups
  2010-11-08  8:55 [PATCH 00/16] xfs: current patch stack for 2.6.38 window Dave Chinner
                   ` (11 preceding siblings ...)
  2010-11-08  8:55 ` [PATCH 12/16] xfs: bulk AIL insertion during transaction commit Dave Chinner
@ 2010-11-08  8:55 ` Dave Chinner
  2010-11-08 11:32   ` Christoph Hellwig
  2010-11-08  8:55 ` [PATCH 14/16] xfs: remove all the inodes on a buffer from the AIL in bulk Dave Chinner
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2010-11-08  8:55 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The xfsaild often tries to rest to wait for congestion to pass or for
IO to complete, but is regularly woken in tail-pushing situations.
In severe cases, the xfsaild is getting woken tens of thousands of
times a second. Reduce the number of needless wakeups by only waking
the xfsaild if the new target is larger than the old one. Further,
make short sleeps uninterruptible, as they occur when the xfsaild has
decided it needs to back off to allow some IO to complete and being
woken early is counter-productive.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_super.c |   18 +++++++++++++++---
 1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index fa789b7..668b010 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -837,8 +837,11 @@ xfsaild_wakeup(
 	struct xfs_ail		*ailp,
 	xfs_lsn_t		threshold_lsn)
 {
-	ailp->xa_target = threshold_lsn;
-	wake_up_process(ailp->xa_task);
+	/* only ever move the target forwards */
+	if (XFS_LSN_CMP(threshold_lsn, ailp->xa_target) > 0) {
+		ailp->xa_target = threshold_lsn;
+		wake_up_process(ailp->xa_task);
+	}
 }
 
 STATIC int
@@ -850,8 +853,17 @@ xfsaild(
 	long		tout = 0; /* milliseconds */
 
 	while (!kthread_should_stop()) {
-		schedule_timeout_interruptible(tout ?
+		/*
+		 * for short sleeps indicating congestion, don't allow us to
+		 * get woken early. Otherwise all we do is bang on the AIL lock
+		 * without making progress.
+		 */
+		if (tout && tout <= 20) {
+			schedule_timeout_uninterruptible(msecs_to_jiffies(tout));
+		} else {
+			schedule_timeout_interruptible(tout ?
 				msecs_to_jiffies(tout) : MAX_SCHEDULE_TIMEOUT);
+		}
 
 		/* swsusp */
 		try_to_freeze();
-- 
1.7.2.3


* [PATCH 14/16] xfs: remove all the inodes on a buffer from the AIL in bulk
  2010-11-08  8:55 [PATCH 00/16] xfs: current patch stack for 2.6.38 window Dave Chinner
                   ` (12 preceding siblings ...)
  2010-11-08  8:55 ` [PATCH 13/16] xfs: reduce the number of AIL push wakeups Dave Chinner
@ 2010-11-08  8:55 ` Dave Chinner
  2010-11-08  8:55 ` [PATCH 15/16] xfs: only run xfs_error_test if error injection is active Dave Chinner
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2010-11-08  8:55 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When inode buffer IO completes, usually all of the inodes are removed from
the AIL. This involves processing them one at a time and taking the AIL lock
once for every inode. When all CPUs are processing inode IO completions, this
causes an excessive amount of contention on the AIL lock.

Instead, change the way we process inode IO completion in the buffer
IO done callback. Allow the inode IO done callback to walk the list
of IO done callbacks and pull all the inodes off the buffer in one
go and then process them as a batch.

Once all the inodes for removal are collected, take the AIL lock
once and do a bulk removal operation to minimise traffic on the AIL
lock.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf_item.c   |   17 +++++++--
 fs/xfs/xfs_inode_item.c |   92 ++++++++++++++++++++++++++++++++++++++---------
 fs/xfs/xfs_trans_ail.c  |   65 +++++++++++++++++++++++++++++++++
 fs/xfs/xfs_trans_priv.h |    4 ++
 4 files changed, 158 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 2686d0d..46a7ef2 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -918,6 +918,18 @@ xfs_buf_attach_iodone(
 	XFS_BUF_SET_IODONE_FUNC(bp, xfs_buf_iodone_callbacks);
 }
 
+/*
+ * We can have many callbacks on a buffer. Running the callbacks individually
+ * can cause a lot of contention on the AIL lock, so we allow for a single
+ * callback to be able to scan the remaining lip->li_bio_list for other items
+ * of the same type and callback to be processed in the first call.
+ *
+ * As a result, the loop walking the callback list below will also modify the
+ * list. It removes the first item from the list and then runs the callback.
+ * The loop then restarts from the new head of the list. This allows the
+ * callback to scan and modify the list attached to the buffer and we don't
+ * have to care about maintaining a next item pointer.
+ */
 STATIC void
 xfs_buf_do_callbacks(
 	xfs_buf_t	*bp,
@@ -925,8 +937,8 @@ xfs_buf_do_callbacks(
 {
 	xfs_log_item_t	*nlip;
 
-	while (lip != NULL) {
-		nlip = lip->li_bio_list;
+	while ((lip = XFS_BUF_FSPRIVATE(bp, xfs_log_item_t *)) != NULL) {
+		XFS_BUF_SET_FSPRIVATE(bp, lip->li_bio_list);
 		ASSERT(lip->li_cb != NULL);
 		/*
 		 * Clear the next pointer so we don't have any
@@ -936,7 +948,6 @@ xfs_buf_do_callbacks(
 		 */
 		lip->li_bio_list = NULL;
 		lip->li_cb(bp, lip);
-		lip = nlip;
 	}
 }
 
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 3be7bdc..cb341ba 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -843,15 +843,64 @@ xfs_inode_item_destroy(
  * flushed to disk.  It is responsible for removing the inode item
  * from the AIL if it has not been re-logged, and unlocking the inode's
  * flush lock.
+ *
+ * To reduce AIL lock traffic as much as possible, we scan the buffer log item
+ * list for other inodes that will run this function. We remove them from the
+ * buffer list so we can process all the inode IO completions in one AIL lock
+ * traversal.
  */
 void
 xfs_iflush_done(
 	struct xfs_buf		*bp,
 	struct xfs_log_item	*lip)
 {
-	struct xfs_inode_log_item *iip = INODE_ITEM(lip);
-	xfs_inode_t		*ip = iip->ili_inode;
+	struct xfs_inode_log_item *iip;
+	struct xfs_log_item	*blip;
+	struct xfs_log_item	*next;
+	struct xfs_log_item	*prev;
 	struct xfs_ail		*ailp = lip->li_ailp;
+	int			need_ail = 0;
+
+	/*
+	 * Scan the buffer IO completions for other inodes being completed and
+	 * attach them to the current inode log item.
+	 */
+	blip = XFS_BUF_FSPRIVATE(bp, xfs_log_item_t *);
+	prev = NULL;
+	while (blip != NULL) {
+		if (blip->li_cb != xfs_iflush_done) {
+			prev = blip;
+			blip = blip->li_bio_list;
+			continue;
+		}
+
+		/* remove from list */
+		next = blip->li_bio_list;
+		if (!prev) {
+			XFS_BUF_SET_FSPRIVATE(bp, next);
+		} else {
+			prev->li_bio_list = next;
+		}
+
+		/* add to current list */
+		blip->li_bio_list = lip->li_bio_list;
+		lip->li_bio_list = blip;
+
+		/*
+		 * while we have the item, do the unlocked check for needing
+		 * the AIL lock.
+		 */
+		iip = INODE_ITEM(blip);
+		if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn)
+			need_ail++;
+
+		blip = next;
+	}
+
+	/* make sure we capture the state of the initial inode. */
+	iip = INODE_ITEM(lip);
+	if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn)
+		need_ail++;
 
 	/*
 	 * We only want to pull the item from the AIL if it is
@@ -862,28 +911,37 @@ xfs_iflush_done(
 	 * the lock since it's cheaper, and then we recheck while
 	 * holding the lock before removing the inode from the AIL.
 	 */
-	if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) {
+	if (need_ail) {
+		struct xfs_log_item *lgia[need_ail];
+		int i = 0;
 		spin_lock(&ailp->xa_lock);
-		if (lip->li_lsn == iip->ili_flush_lsn) {
-			/* xfs_trans_ail_delete() drops the AIL lock. */
-			xfs_trans_ail_delete(ailp, lip);
-		} else {
-			spin_unlock(&ailp->xa_lock);
+		for (blip = lip; blip; blip = blip->li_bio_list) {
+			iip = INODE_ITEM(blip);
+			if (iip->ili_logged &&
+			    blip->li_lsn == iip->ili_flush_lsn) {
+				lgia[i++] = blip;
+			}
+			ASSERT(i <= need_ail);
 		}
+		/* xfs_trans_ail_delete_bulk() drops the AIL lock. */
+		xfs_trans_ail_delete_bulk(ailp, lgia, i);
 	}
 
-	iip->ili_logged = 0;
 
 	/*
-	 * Clear the ili_last_fields bits now that we know that the
-	 * data corresponding to them is safely on disk.
+	 * Clean up and unlock the flush lock now that we are done. We can clear the
+	 * ili_last_fields bits now that we know that the data corresponding to
+	 * them is safely on disk.
 	 */
-	iip->ili_last_fields = 0;
+	for (blip = lip; blip; blip = next) {
+		next = blip->li_bio_list;
+		blip->li_bio_list = NULL;
 
-	/*
-	 * Release the inode's flush lock since we're done with it.
-	 */
-	xfs_ifunlock(ip);
+		iip = INODE_ITEM(blip);
+		iip->ili_logged = 0;
+		iip->ili_last_fields = 0;
+		xfs_ifunlock(iip->ili_inode);
+	}
 }
 
 /*
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index c83e6e9..4261d75 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -634,6 +634,71 @@ xfs_trans_ail_delete(
 	}
 }
 
+/*
+ * Bulk update version of xfs_trans_ail_delete
+ *
+ * This version takes an array of log items that all need to be removed from the
+ * AIL. The caller is already holding the AIL lock, and has done all the checks
+ * necessary to ensure the items passed in via @lgia are ready for deletion.
+ *
+ * This function will not drop the AIL lock until all items are removed from
+ * the AIL to minimise the amount of lock traffic on the AIL. This does not
+ * greatly increase the AIL hold time, but does significantly reduce the amount
+ * of traffic on the lock, especially during IO completion.
+ */
+void
+xfs_trans_ail_delete_bulk(
+	struct xfs_ail		*ailp,
+	struct xfs_log_item	**lgia,
+	int			nr_items) __releases(ailp->xa_lock)
+{
+	xfs_log_item_t		*mlip;
+	int			mlip_changed = 0;
+	int			i;
+
+	mlip = xfs_ail_min(ailp);
+
+	for (i = 0; i < nr_items; i++) {
+		struct xfs_log_item *lip = lgia[i];
+		if (!(lip->li_flags & XFS_LI_IN_AIL)) {
+			struct xfs_mount	*mp = ailp->xa_mount;
+
+			spin_unlock(&ailp->xa_lock);
+			if (!XFS_FORCED_SHUTDOWN(mp)) {
+				xfs_cmn_err(XFS_PTAG_AILDELETE, CE_ALERT, mp,
+		"%s: attempting to delete a log item that is not in the AIL",
+						__func__);
+				xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+			}
+			return;
+		}
+
+		xfs_ail_delete(ailp, lip);
+		lip->li_flags &= ~XFS_LI_IN_AIL;
+		lip->li_lsn = 0;
+		if (mlip == lip)
+			mlip_changed = 1;
+	}
+
+	if (mlip_changed) {
+		/*
+		 * It is not safe to access mlip after the AIL lock is
+		 * dropped, so we must get a copy of li_lsn before we do
+		 * so.  This is especially important on 32-bit platforms
+		 * where accessing and updating 64-bit values like li_lsn
+		 * is not atomic. It is possible we've emptied the AIL here,
+		 * so if that is the case, return a LSN of 0.
+		 */
+		xfs_lsn_t	tail_lsn;
+
+		mlip = xfs_ail_min(ailp);
+		tail_lsn = mlip ? mlip->li_lsn : 0;
+		spin_unlock(&ailp->xa_lock);
+		xfs_log_move_tail(ailp->xa_mount, tail_lsn);
+		return;
+	}
+	spin_unlock(&ailp->xa_lock);
+}
 
 
 /*
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index d25460f..8110e6f 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -84,6 +84,10 @@ void			 xfs_trans_ail_update_bulk(struct xfs_ail *ailp,
 void			xfs_trans_ail_delete(struct xfs_ail *ailp,
 					struct xfs_log_item *lip)
 					__releases(ailp->xa_lock);
+void			xfs_trans_ail_delete_bulk(struct xfs_ail *ailp,
+					struct xfs_log_item **lgia,
+					int nr_items)
+					__releases(ailp->xa_lock);
 void			xfs_trans_ail_push(struct xfs_ail *, xfs_lsn_t);
 void			xfs_trans_unlocked_item(struct xfs_ail *,
 					xfs_log_item_t *);
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 15/16] xfs: only run xfs_error_test if error injection is active
  2010-11-08  8:55 [PATCH 00/16] xfs: current patch stack for 2.6.38 window Dave Chinner
                   ` (13 preceding siblings ...)
  2010-11-08  8:55 ` [PATCH 14/16] xfs: remove all the inodes on a buffer from the AIL in bulk Dave Chinner
@ 2010-11-08  8:55 ` Dave Chinner
  2010-11-08 11:33   ` Christoph Hellwig
  2010-11-08  8:55 ` [PATCH 16/16] xfs: make xlog_space_left() independent of the grant lock Dave Chinner
  2010-11-08 14:17 ` [PATCH 00/16] xfs: current patch stack for 2.6.38 window Christoph Hellwig
  16 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2010-11-08  8:55 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Recent tests writing lots of small files showed the flusher thread
being CPU bound and taking a long time to do allocations on a debug
kernel. perf showed this as the prime reason:

             samples  pcnt function                    DSO
             _______ _____ ___________________________ _________________

           224648.00 36.8% xfs_error_test              [kernel.kallsyms]
            86045.00 14.1% xfs_btree_check_sblock      [kernel.kallsyms]
            39778.00  6.5% prandom32                   [kernel.kallsyms]
            37436.00  6.1% xfs_btree_increment         [kernel.kallsyms]
            29278.00  4.8% xfs_btree_get_rec           [kernel.kallsyms]
            27717.00  4.5% random32                    [kernel.kallsyms]

Walking the btree blocks during allocation and checking them requires
that each block (a cache hit, so no I/O) call xfs_error_test(), which
then does a random32() call as the first operation. IOWs, ~50% of the
CPU is being consumed just testing whether we need to inject an
error, even though error injection is not active.

Kill this overhead when error injection is not active by adding a
global counter of active error traps and only calling into
xfs_error_test when fault injection is active.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_error.c |    3 +++
 fs/xfs/xfs_error.h |    5 +++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
index ed99902..c78cc6a 100644
--- a/fs/xfs/xfs_error.c
+++ b/fs/xfs/xfs_error.c
@@ -58,6 +58,7 @@ xfs_error_trap(int e)
 int	xfs_etest[XFS_NUM_INJECT_ERROR];
 int64_t	xfs_etest_fsid[XFS_NUM_INJECT_ERROR];
 char *	xfs_etest_fsname[XFS_NUM_INJECT_ERROR];
+int	xfs_error_test_active;
 
 int
 xfs_error_test(int error_tag, int *fsidp, char *expression,
@@ -108,6 +109,7 @@ xfs_errortag_add(int error_tag, xfs_mount_t *mp)
 			len = strlen(mp->m_fsname);
 			xfs_etest_fsname[i] = kmem_alloc(len + 1, KM_SLEEP);
 			strcpy(xfs_etest_fsname[i], mp->m_fsname);
+			xfs_error_test_active++;
 			return 0;
 		}
 	}
@@ -137,6 +139,7 @@ xfs_errortag_clearall(xfs_mount_t *mp, int loud)
 			xfs_etest_fsid[i] = 0LL;
 			kmem_free(xfs_etest_fsname[i]);
 			xfs_etest_fsname[i] = NULL;
+			xfs_error_test_active--;
 		}
 	}
 
diff --git a/fs/xfs/xfs_error.h b/fs/xfs/xfs_error.h
index c2c1a07..f338847 100644
--- a/fs/xfs/xfs_error.h
+++ b/fs/xfs/xfs_error.h
@@ -127,13 +127,14 @@ extern void xfs_corruption_error(const char *tag, int level,
 #define	XFS_RANDOM_BMAPIFORMAT				XFS_RANDOM_DEFAULT
 
 #ifdef DEBUG
+extern int xfs_error_test_active;
 extern int xfs_error_test(int, int *, char *, int, char *, unsigned long);
 
 #define	XFS_NUM_INJECT_ERROR				10
 #define XFS_TEST_ERROR(expr, mp, tag, rf)		\
-	((expr) || \
+	((expr) || (xfs_error_test_active && \
 	 xfs_error_test((tag), (mp)->m_fixedfsid, "expr", __LINE__, __FILE__, \
-			(rf)))
+			(rf))))
 
 extern int xfs_errortag_add(int error_tag, xfs_mount_t *mp);
 extern int xfs_errortag_clearall(xfs_mount_t *mp, int loud);
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 16/16] xfs: make xlog_space_left() independent of the grant lock
  2010-11-08  8:55 [PATCH 00/16] xfs: current patch stack for 2.6.38 window Dave Chinner
                   ` (14 preceding siblings ...)
  2010-11-08  8:55 ` [PATCH 15/16] xfs: only run xfs_error_test if error injection is active Dave Chinner
@ 2010-11-08  8:55 ` Dave Chinner
  2010-11-08 14:17 ` [PATCH 00/16] xfs: current patch stack for 2.6.38 window Christoph Hellwig
  16 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2010-11-08  8:55 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Convert the xlog_space_left() calculation to take the tail_lsn as a
parameter.  This allows the function to be called with fixed values
rather than sampling the tail_lsn during the call and hence
requiring it to be called under the log grant lock.

Signed-off-by: Dave Chinner <dchinner@redhat.com>

Header from folded patch 'xfs-log-ail-push-tail-unlocked':

xfs: make AIL tail pushing independent of the grant lock

Convert the xlog_grant_push_ail() calculation to take the tail_lsn
and the last_sync_lsn as parameters.  This allows the function to be
called with fixed values rather than sampling variables protected by
the grant lock.  This allows us to move the grant lock outside the
push function which immediately reduces unnecessary grant lock
traffic, but also allows use to split the function away from the
grant lock in future.

Signed-off-by: Dave Chinner <dchinner@redhat.com>

Header from folded patch 'xfs-log-ticket-queue-list-head':

xfs: Convert the log space ticket queue to use list_heads

The current code uses a roll-your-own double linked list, so convert
it to a standard list_head structure and convert all the list
traversals to use list_for_each_entry(). We can also get rid of the
XLOG_TIC_IN_Q flag as we can use the list_empty() check to tell if
the ticket is in a list or not.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_trace.h |   36 +--
 fs/xfs/xfs_log.c             |  678 ++++++++++++++++++++++--------------------
 fs/xfs/xfs_log_priv.h        |   40 ++-
 fs/xfs/xfs_log_recover.c     |   23 +-
 4 files changed, 409 insertions(+), 368 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_trace.h b/fs/xfs/linux-2.6/xfs_trace.h
index acef2e9..1a029bd 100644
--- a/fs/xfs/linux-2.6/xfs_trace.h
+++ b/fs/xfs/linux-2.6/xfs_trace.h
@@ -766,12 +766,10 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class,
 		__field(int, curr_res)
 		__field(int, unit_res)
 		__field(unsigned int, flags)
-		__field(void *, reserve_headq)
-		__field(void *, write_headq)
-		__field(int, grant_reserve_cycle)
-		__field(int, grant_reserve_bytes)
-		__field(int, grant_write_cycle)
-		__field(int, grant_write_bytes)
+		__field(void *, reserveq)
+		__field(void *, writeq)
+		__field(xfs_lsn_t, grant_reserve_lsn)
+		__field(xfs_lsn_t, grant_write_lsn)
 		__field(int, curr_cycle)
 		__field(int, curr_block)
 		__field(xfs_lsn_t, tail_lsn)
@@ -784,15 +782,15 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class,
 		__entry->curr_res = tic->t_curr_res;
 		__entry->unit_res = tic->t_unit_res;
 		__entry->flags = tic->t_flags;
-		__entry->reserve_headq = log->l_reserve_headq;
-		__entry->write_headq = log->l_write_headq;
-		__entry->grant_reserve_cycle = log->l_grant_reserve_cycle;
-		__entry->grant_reserve_bytes = log->l_grant_reserve_bytes;
-		__entry->grant_write_cycle = log->l_grant_write_cycle;
-		__entry->grant_write_bytes = log->l_grant_write_bytes;
+		__entry->reserveq = log->l_reserveq.next;
+		__entry->writeq = log->l_writeq.next;
+		__entry->grant_reserve_lsn =
+				atomic64_read(&log->l_grant_reserve_lsn);
+		__entry->grant_write_lsn =
+				atomic64_read(&log->l_grant_write_lsn);
 		__entry->curr_cycle = log->l_curr_cycle;
 		__entry->curr_block = log->l_curr_block;
-		__entry->tail_lsn = log->l_tail_lsn;
+		__entry->tail_lsn = atomic64_read(&log->l_tail_lsn);
 	),
 	TP_printk("dev %d:%d type %s t_ocnt %u t_cnt %u t_curr_res %u "
 		  "t_unit_res %u t_flags %s reserve_headq 0x%p "
@@ -807,12 +805,12 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class,
 		  __entry->curr_res,
 		  __entry->unit_res,
 		  __print_flags(__entry->flags, "|", XLOG_TIC_FLAGS),
-		  __entry->reserve_headq,
-		  __entry->write_headq,
-		  __entry->grant_reserve_cycle,
-		  __entry->grant_reserve_bytes,
-		  __entry->grant_write_cycle,
-		  __entry->grant_write_bytes,
+		  __entry->reserveq,
+		  __entry->writeq,
+		  CYCLE_LSN(__entry->grant_reserve_lsn),
+		  BLOCK_LSN(__entry->grant_reserve_lsn),
+		  CYCLE_LSN(__entry->grant_write_lsn),
+		  BLOCK_LSN(__entry->grant_write_lsn),
 		  __entry->curr_cycle,
 		  __entry->curr_block,
 		  CYCLE_LSN(__entry->tail_lsn),
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index cee4ab9..12c726b 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -47,7 +47,8 @@ STATIC xlog_t *  xlog_alloc_log(xfs_mount_t	*mp,
 				xfs_buftarg_t	*log_target,
 				xfs_daddr_t	blk_offset,
 				int		num_bblks);
-STATIC int	 xlog_space_left(xlog_t *log, int cycle, int bytes);
+STATIC int	 xlog_space_left(xfs_lsn_t tail_lsn, int log_size,
+				xfs_lsn_t marker);
 STATIC int	 xlog_sync(xlog_t *log, xlog_in_core_t *iclog);
 STATIC void	 xlog_dealloc_log(xlog_t *log);
 
@@ -70,8 +71,8 @@ STATIC void xlog_state_want_sync(xlog_t	*log, xlog_in_core_t *iclog);
 /* local functions to manipulate grant head */
 STATIC int  xlog_grant_log_space(xlog_t		*log,
 				 xlog_ticket_t	*xtic);
-STATIC void xlog_grant_push_ail(xfs_mount_t	*mp,
-				int		need_bytes);
+STATIC void xlog_grant_push_ail(struct log *log, xfs_lsn_t tail_lsn,
+				xfs_lsn_t last_sync_lsn, int need_bytes);
 STATIC void xlog_regrant_reserve_log_space(xlog_t	 *log,
 					   xlog_ticket_t *ticket);
 STATIC int xlog_regrant_write_log_space(xlog_t		*log,
@@ -81,7 +82,8 @@ STATIC void xlog_ungrant_log_space(xlog_t	 *log,
 
 #if defined(DEBUG)
 STATIC void	xlog_verify_dest_ptr(xlog_t *log, char *ptr);
-STATIC void	xlog_verify_grant_head(xlog_t *log, int equals);
+STATIC void	xlog_verify_grant_head(struct log *log, int equals);
+STATIC void	xlog_verify_grant_tail(struct log *log);
 STATIC void	xlog_verify_iclog(xlog_t *log, xlog_in_core_t *iclog,
 				  int count, boolean_t syncing);
 STATIC void	xlog_verify_tail_lsn(xlog_t *log, xlog_in_core_t *iclog,
@@ -89,90 +91,85 @@ STATIC void	xlog_verify_tail_lsn(xlog_t *log, xlog_in_core_t *iclog,
 #else
 #define xlog_verify_dest_ptr(a,b)
 #define xlog_verify_grant_head(a,b)
+#define xlog_verify_grant_tail(a)
 #define xlog_verify_iclog(a,b,c,d)
 #define xlog_verify_tail_lsn(a,b,c)
 #endif
 
 STATIC int	xlog_iclogs_empty(xlog_t *log);
 
-
-static void
-xlog_ins_ticketq(struct xlog_ticket **qp, struct xlog_ticket *tic)
-{
-	if (*qp) {
-		tic->t_next	    = (*qp);
-		tic->t_prev	    = (*qp)->t_prev;
-		(*qp)->t_prev->t_next = tic;
-		(*qp)->t_prev	    = tic;
-	} else {
-		tic->t_prev = tic->t_next = tic;
-		*qp = tic;
-	}
-
-	tic->t_flags |= XLOG_TIC_IN_Q;
-}
-
-static void
-xlog_del_ticketq(struct xlog_ticket **qp, struct xlog_ticket *tic)
+/*
+ * Grant space calculations use 64 bit atomic variables to store the current reserve
+ * and write grant markers. However, these are really two 32 bit numbers which
+ * need to be cracked out of the 64 bit variable, modified, recombined and then
+ * written back into the 64 bit atomic variable. And it has to be done
+ * atomically (i.e. without locks).
+ *
+ * The upper 32 bits is the log cycle, just like a xfs_lsn_t. The lower 32 bits
+ * is the byte offset into the log for the marker. Unlike the xfs_lsn_t, this
+ * is held in bytes rather than basic blocks, even though it uses the
+ * BLOCK_LSN() macro to extract it.
+ *
+ * Essentially, we use a compare and exchange algorithm to atomically update
+ * the markers. That is, we sample the current marker, crack it, perform the
+ * calculation, recombine it into a new value, and then conditionally set the
+ * value back into the atomic variable only if it hasn't changed since we first
+ * sampled it. This provides atomic updates of the marker, even though we do
+ * non-atomic, multi-step calculation on the value.
+ */
+static inline void
+xlog_grant_sub_space(
+	struct log	*log,
+	int		space,
+	atomic64_t	*val)
 {
-	if (tic == tic->t_next) {
-		*qp = NULL;
-	} else {
-		*qp = tic->t_next;
-		tic->t_next->t_prev = tic->t_prev;
-		tic->t_prev->t_next = tic->t_next;
-	}
+	xfs_lsn_t	last, old, new;
 
-	tic->t_next = tic->t_prev = NULL;
-	tic->t_flags &= ~XLOG_TIC_IN_Q;
-}
-
-static void
-xlog_grant_sub_space(struct log *log, int bytes)
-{
-	log->l_grant_write_bytes -= bytes;
-	if (log->l_grant_write_bytes < 0) {
-		log->l_grant_write_bytes += log->l_logsize;
-		log->l_grant_write_cycle--;
-	}
+	last = atomic64_read(val);
+	do {
+		int	cycle, bytes;
 
-	log->l_grant_reserve_bytes -= bytes;
-	if ((log)->l_grant_reserve_bytes < 0) {
-		log->l_grant_reserve_bytes += log->l_logsize;
-		log->l_grant_reserve_cycle--;
-	}
+		old = last;
+		cycle = CYCLE_LSN(old);
+		bytes = BLOCK_LSN(old);
 
+		bytes -= space;
+		if (bytes < 0) {
+			bytes += log->l_logsize;
+			cycle--;
+		}
+		new = xlog_assign_lsn(cycle, bytes);
+		last = atomic64_cmpxchg(val, old, new);
+	} while (last != old);
 }
 
 static void
-xlog_grant_add_space_write(struct log *log, int bytes)
+xlog_grant_add_space(
+	struct log	*log,
+	int		space,
+	atomic64_t	*val)
 {
-	int tmp = log->l_logsize - log->l_grant_write_bytes;
-	if (tmp > bytes)
-		log->l_grant_write_bytes += bytes;
-	else {
-		log->l_grant_write_cycle++;
-		log->l_grant_write_bytes = bytes - tmp;
-	}
-}
+	xfs_lsn_t	last, old, new;
 
-static void
-xlog_grant_add_space_reserve(struct log *log, int bytes)
-{
-	int tmp = log->l_logsize - log->l_grant_reserve_bytes;
-	if (tmp > bytes)
-		log->l_grant_reserve_bytes += bytes;
-	else {
-		log->l_grant_reserve_cycle++;
-		log->l_grant_reserve_bytes = bytes - tmp;
-	}
-}
+	last = atomic64_read(val);
+	do {
+		int	cycle, bytes, available;
+
+		old = last;
+		cycle = CYCLE_LSN(old);
+		bytes = BLOCK_LSN(old);
+		available = log->l_logsize - bytes;
+
+		if (available > space)
+			bytes += space;
+		else {
+			cycle++;
+			bytes = space - available;
+		}
 
-static inline void
-xlog_grant_add_space(struct log *log, int bytes)
-{
-	xlog_grant_add_space_write(log, bytes);
-	xlog_grant_add_space_reserve(log, bytes);
+		new = xlog_assign_lsn(cycle, bytes);
+		last = atomic64_cmpxchg(val, old, new);
+	} while (last != old);
 }
 
 static void
@@ -321,12 +318,12 @@ xfs_log_release_iclog(
 int
 xfs_log_reserve(
 	struct xfs_mount	*mp,
-	int		 	unit_bytes,
-	int		 	cnt,
+	int			unit_bytes,
+	int			cnt,
 	struct xlog_ticket	**ticket,
-	__uint8_t	 	client,
-	uint		 	flags,
-	uint		 	t_type)
+	__uint8_t		client,
+	uint			flags,
+	uint			t_type)
 {
 	struct log		*log = mp->m_log;
 	struct xlog_ticket	*internal_ticket;
@@ -339,7 +336,6 @@ xfs_log_reserve(
 
 	XFS_STATS_INC(xs_try_logspace);
 
-
 	if (*ticket != NULL) {
 		ASSERT(flags & XFS_LOG_PERM_RESERV);
 		internal_ticket = *ticket;
@@ -355,7 +351,9 @@ xfs_log_reserve(
 
 		trace_xfs_log_reserve(log, internal_ticket);
 
-		xlog_grant_push_ail(mp, internal_ticket->t_unit_res);
+		xlog_grant_push_ail(log, atomic64_read(&log->l_tail_lsn),
+				    atomic64_read(&log->l_last_sync_lsn),
+				    internal_ticket->t_unit_res);
 		retval = xlog_regrant_write_log_space(log, internal_ticket);
 	} else {
 		/* may sleep if need to allocate more tickets */
@@ -369,14 +367,15 @@ xfs_log_reserve(
 
 		trace_xfs_log_reserve(log, internal_ticket);
 
-		xlog_grant_push_ail(mp,
+		xlog_grant_push_ail(log, atomic64_read(&log->l_tail_lsn),
+				    atomic64_read(&log->l_last_sync_lsn),
 				    (internal_ticket->t_unit_res *
 				     internal_ticket->t_cnt));
 		retval = xlog_grant_log_space(log, internal_ticket);
 	}
 
 	return retval;
-}	/* xfs_log_reserve */
+}
 
 
 /*
@@ -699,73 +698,80 @@ xfs_log_write(
 
 void
 xfs_log_move_tail(xfs_mount_t	*mp,
-		  xfs_lsn_t	tail_lsn)
+		  xfs_lsn_t	new_tail_lsn)
 {
 	xlog_ticket_t	*tic;
 	xlog_t		*log = mp->m_log;
-	int		need_bytes, free_bytes, cycle, bytes;
+	int		need_bytes, free_bytes;
 
 	if (XLOG_FORCED_SHUTDOWN(log))
 		return;
 
-	if (tail_lsn == 0) {
-		/* needed since sync_lsn is 64 bits */
-		spin_lock(&log->l_icloglock);
-		tail_lsn = log->l_last_sync_lsn;
-		spin_unlock(&log->l_icloglock);
-	}
-
-	spin_lock(&log->l_grant_lock);
-
-	/* Also an invalid lsn.  1 implies that we aren't passing in a valid
-	 * tail_lsn.
+	/*
+	 * new_tail_lsn == 1 implies that we aren't passing in a valid
+	 * tail_lsn, so don't set the tail.
 	 */
-	if (tail_lsn != 1) {
-		log->l_tail_lsn = tail_lsn;
+	switch (new_tail_lsn) {
+	case 0:
+		/* AIL is empty, so tail is what was last written to disk */
+		atomic64_set(&log->l_tail_lsn,
+				atomic64_read(&log->l_last_sync_lsn));
+		break;
+	case 1:
+		/* Current tail is unknown, so just use the existing one */
+		break;
+	default:
+		/* update the tail with the new lsn. */
+		atomic64_set(&log->l_tail_lsn, new_tail_lsn);
+		break;
 	}
 
-	if ((tic = log->l_write_headq)) {
+	if (!list_empty(&log->l_writeq)) {
 #ifdef DEBUG
 		if (log->l_flags & XLOG_ACTIVE_RECOVERY)
 			panic("Recovery problem");
 #endif
-		cycle = log->l_grant_write_cycle;
-		bytes = log->l_grant_write_bytes;
-		free_bytes = xlog_space_left(log, cycle, bytes);
-		do {
+		spin_lock(&log->l_grant_write_lock);
+		free_bytes = xlog_space_left(atomic64_read(&log->l_tail_lsn),
+				log->l_logsize,
+				atomic64_read(&log->l_grant_write_lsn));
+
+		list_for_each_entry(tic, &log->l_writeq, t_queue) {
 			ASSERT(tic->t_flags & XLOG_TIC_PERM_RESERV);
 
-			if (free_bytes < tic->t_unit_res && tail_lsn != 1)
+			if (free_bytes < tic->t_unit_res && new_tail_lsn != 1)
 				break;
-			tail_lsn = 0;
+			new_tail_lsn = 0;
 			free_bytes -= tic->t_unit_res;
 			sv_signal(&tic->t_wait);
-			tic = tic->t_next;
-		} while (tic != log->l_write_headq);
+		}
+		spin_unlock(&log->l_grant_write_lock);
 	}
-	if ((tic = log->l_reserve_headq)) {
+
+	if (!list_empty(&log->l_reserveq)) {
 #ifdef DEBUG
 		if (log->l_flags & XLOG_ACTIVE_RECOVERY)
 			panic("Recovery problem");
 #endif
-		cycle = log->l_grant_reserve_cycle;
-		bytes = log->l_grant_reserve_bytes;
-		free_bytes = xlog_space_left(log, cycle, bytes);
-		do {
+		spin_lock(&log->l_grant_reserve_lock);
+		free_bytes = xlog_space_left(atomic64_read(&log->l_tail_lsn),
+				log->l_logsize,
+				atomic64_read(&log->l_grant_reserve_lsn));
+
+		list_for_each_entry(tic, &log->l_reserveq, t_queue) {
 			if (tic->t_flags & XLOG_TIC_PERM_RESERV)
 				need_bytes = tic->t_unit_res*tic->t_cnt;
 			else
 				need_bytes = tic->t_unit_res;
-			if (free_bytes < need_bytes && tail_lsn != 1)
+			if (free_bytes < need_bytes && new_tail_lsn != 1)
 				break;
-			tail_lsn = 0;
+			new_tail_lsn = 0;
 			free_bytes -= need_bytes;
 			sv_signal(&tic->t_wait);
-			tic = tic->t_next;
-		} while (tic != log->l_reserve_headq);
+		}
+		spin_unlock(&log->l_grant_reserve_lock);
 	}
-	spin_unlock(&log->l_grant_lock);
-}	/* xfs_log_move_tail */
+}
 
 /*
  * Determine if we have a transaction that has gone to disk
@@ -837,16 +843,13 @@ xlog_assign_tail_lsn(xfs_mount_t *mp)
 	xlog_t	  *log = mp->m_log;
 
 	tail_lsn = xfs_trans_ail_tail(mp->m_ail);
-	spin_lock(&log->l_grant_lock);
-	if (tail_lsn != 0) {
-		log->l_tail_lsn = tail_lsn;
-	} else {
-		tail_lsn = log->l_tail_lsn = log->l_last_sync_lsn;
+	if (tail_lsn) {
+		atomic64_set(&log->l_tail_lsn, tail_lsn);
+		return tail_lsn;
 	}
-	spin_unlock(&log->l_grant_lock);
-
-	return tail_lsn;
-}	/* xlog_assign_tail_lsn */
+	atomic64_set(&log->l_tail_lsn, atomic64_read(&log->l_last_sync_lsn));
+	return atomic64_read(&log->l_tail_lsn);
+}
 
 
 /*
@@ -864,16 +867,21 @@ xlog_assign_tail_lsn(xfs_mount_t *mp)
  * result is that we return the size of the log as the amount of space left.
  */
 STATIC int
-xlog_space_left(xlog_t *log, int cycle, int bytes)
+xlog_space_left(
+	xfs_lsn_t	tail_lsn,
+	int		log_size,
+	xfs_lsn_t	head)
 {
 	int free_bytes;
-	int tail_bytes;
-	int tail_cycle;
+	int tail_bytes = BBTOB(BLOCK_LSN(tail_lsn));
+	int tail_cycle = CYCLE_LSN(tail_lsn);
+	int cycle = CYCLE_LSN(head);
+	int bytes = BLOCK_LSN(head);
 
-	tail_bytes = BBTOB(BLOCK_LSN(log->l_tail_lsn));
-	tail_cycle = CYCLE_LSN(log->l_tail_lsn);
+	tail_bytes = BBTOB(BLOCK_LSN(tail_lsn));
+	tail_cycle = CYCLE_LSN(tail_lsn);
 	if ((tail_cycle == cycle) && (bytes >= tail_bytes)) {
-		free_bytes = log->l_logsize - (bytes - tail_bytes);
+		free_bytes = log_size - (bytes - tail_bytes);
 	} else if ((tail_cycle + 1) < cycle) {
 		return 0;
 	} else if (tail_cycle < cycle) {
@@ -885,13 +893,13 @@ xlog_space_left(xlog_t *log, int cycle, int bytes)
 		 * In this case we just want to return the size of the
 		 * log as the amount of space left.
 		 */
-		xfs_fs_cmn_err(CE_ALERT, log->l_mp,
+		cmn_err(CE_ALERT,
 			"xlog_space_left: head behind tail\n"
 			"  tail_cycle = %d, tail_bytes = %d\n"
 			"  GH   cycle = %d, GH   bytes = %d",
 			tail_cycle, tail_bytes, cycle, bytes);
 		ASSERT(0);
-		free_bytes = log->l_logsize;
+		free_bytes = log_size;
 	}
 	return free_bytes;
 }	/* xlog_space_left */
@@ -1047,12 +1055,17 @@ xlog_alloc_log(xfs_mount_t	*mp,
 	log->l_flags	   |= XLOG_ACTIVE_RECOVERY;
 
 	log->l_prev_block  = -1;
-	log->l_tail_lsn	   = xlog_assign_lsn(1, 0);
 	/* log->l_tail_lsn = 0x100000000LL; cycle = 1; current block = 0 */
-	log->l_last_sync_lsn = log->l_tail_lsn;
 	log->l_curr_cycle  = 1;	    /* 0 is bad since this is initial value */
-	log->l_grant_reserve_cycle = 1;
-	log->l_grant_write_cycle = 1;
+	atomic64_set(&log->l_tail_lsn, xlog_assign_lsn(log->l_curr_cycle, 0));
+	atomic64_set(&log->l_last_sync_lsn, atomic64_read(&log->l_tail_lsn));
+	atomic64_set(&log->l_grant_reserve_lsn, atomic64_read(&log->l_tail_lsn));
+	atomic64_set(&log->l_grant_write_lsn, atomic64_read(&log->l_tail_lsn));
+
+	spin_lock_init(&log->l_grant_reserve_lock);
+	INIT_LIST_HEAD(&log->l_reserveq);
+	spin_lock_init(&log->l_grant_write_lock);
+	INIT_LIST_HEAD(&log->l_writeq);
 
 	error = EFSCORRUPTED;
 	if (xfs_sb_version_hassector(&mp->m_sb)) {
@@ -1094,7 +1107,6 @@ xlog_alloc_log(xfs_mount_t	*mp,
 	log->l_xbuf = bp;
 
 	spin_lock_init(&log->l_icloglock);
-	spin_lock_init(&log->l_grant_lock);
 	sv_init(&log->l_flush_wait, 0, "flush_wait");
 
 	/* log record size must be multiple of BBSIZE; see xlog_rec_header_t */
@@ -1175,7 +1187,6 @@ out_free_iclog:
 		kmem_free(iclog);
 	}
 	spinlock_destroy(&log->l_icloglock);
-	spinlock_destroy(&log->l_grant_lock);
 	xfs_buf_free(log->l_xbuf);
 out_free_log:
 	kmem_free(log);
@@ -1223,11 +1234,12 @@ xlog_commit_record(
  * water mark.  In this manner, we would be creating a low water mark.
  */
 STATIC void
-xlog_grant_push_ail(xfs_mount_t	*mp,
-		    int		need_bytes)
+xlog_grant_push_ail(
+	struct log	*log,
+	xfs_lsn_t	tail_lsn,
+	xfs_lsn_t	last_sync_lsn,
+	int		need_bytes)
 {
-    xlog_t	*log = mp->m_log;	/* pointer to the log */
-    xfs_lsn_t	tail_lsn;		/* lsn of the log tail */
     xfs_lsn_t	threshold_lsn = 0;	/* lsn we'd like to be at */
     int		free_blocks;		/* free blocks left to write to */
     int		free_bytes;		/* free bytes left to write to */
@@ -1237,11 +1249,8 @@ xlog_grant_push_ail(xfs_mount_t	*mp,
 
     ASSERT(BTOBB(need_bytes) < log->l_logBBsize);
 
-    spin_lock(&log->l_grant_lock);
-    free_bytes = xlog_space_left(log,
-				 log->l_grant_reserve_cycle,
-				 log->l_grant_reserve_bytes);
-    tail_lsn = log->l_tail_lsn;
+    free_bytes = xlog_space_left(tail_lsn, log->l_logsize,
+				atomic64_read(&log->l_grant_reserve_lsn));
     free_blocks = BTOBBT(free_bytes);
 
     /*
@@ -1264,10 +1273,9 @@ xlog_grant_push_ail(xfs_mount_t	*mp,
 	/* Don't pass in an lsn greater than the lsn of the last
 	 * log record known to be on disk.
 	 */
-	if (XFS_LSN_CMP(threshold_lsn, log->l_last_sync_lsn) > 0)
-	    threshold_lsn = log->l_last_sync_lsn;
+	if (XFS_LSN_CMP(threshold_lsn, last_sync_lsn) > 0)
+	    threshold_lsn = last_sync_lsn;
     }
-    spin_unlock(&log->l_grant_lock);
 
     /*
      * Get the transaction layer to kick the dirty buffers out to
@@ -1277,7 +1285,7 @@ xlog_grant_push_ail(xfs_mount_t	*mp,
     if (threshold_lsn &&
 	!XLOG_FORCED_SHUTDOWN(log))
 	    xfs_trans_ail_push(log->l_ailp, threshold_lsn);
-}	/* xlog_grant_push_ail */
+}
 
 /*
  * The bdstrat callback function for log bufs. This gives us a central
@@ -1365,19 +1373,17 @@ xlog_sync(xlog_t		*log,
 	}
 	roundoff = count - count_init;
 	ASSERT(roundoff >= 0);
-	ASSERT((v2 && log->l_mp->m_sb.sb_logsunit > 1 && 
-                roundoff < log->l_mp->m_sb.sb_logsunit)
-		|| 
-		(log->l_mp->m_sb.sb_logsunit <= 1 && 
+	ASSERT((v2 && log->l_mp->m_sb.sb_logsunit > 1 &&
+                roundoff < log->l_mp->m_sb.sb_logsunit) ||
+		(log->l_mp->m_sb.sb_logsunit <= 1 &&
 		 roundoff < BBTOB(1)));
 
 	/* move grant heads by roundoff in sync */
-	spin_lock(&log->l_grant_lock);
-	xlog_grant_add_space(log, roundoff);
-	spin_unlock(&log->l_grant_lock);
+	xlog_grant_add_space(log, roundoff, &log->l_grant_reserve_lsn);
+	xlog_grant_add_space(log, roundoff, &log->l_grant_write_lsn);
 
 	/* put cycle number in every block */
-	xlog_pack_data(log, iclog, roundoff); 
+	xlog_pack_data(log, iclog, roundoff);
 
 	/* real byte length */
 	if (v2) {
@@ -1497,7 +1503,6 @@ xlog_dealloc_log(xlog_t *log)
 		iclog = next_iclog;
 	}
 	spinlock_destroy(&log->l_icloglock);
-	spinlock_destroy(&log->l_grant_lock);
 
 	xfs_buf_free(log->l_xbuf);
 	log->l_mp->m_log = NULL;
@@ -2240,19 +2245,14 @@ xlog_state_do_callback(
 
 				iclog->ic_state = XLOG_STATE_CALLBACK;
 
-				spin_unlock(&log->l_icloglock);
-
-				/* l_last_sync_lsn field protected by
-				 * l_grant_lock. Don't worry about iclog's lsn.
-				 * No one else can be here except us.
-				 */
-				spin_lock(&log->l_grant_lock);
-				ASSERT(XFS_LSN_CMP(log->l_last_sync_lsn,
+				ASSERT(XFS_LSN_CMP(
+				       atomic64_read(&log->l_last_sync_lsn),
 				       be64_to_cpu(iclog->ic_header.h_lsn)) <= 0);
-				log->l_last_sync_lsn =
-					be64_to_cpu(iclog->ic_header.h_lsn);
-				spin_unlock(&log->l_grant_lock);
 
+				atomic64_set(&log->l_last_sync_lsn,
+					be64_to_cpu(iclog->ic_header.h_lsn));
+
+				spin_unlock(&log->l_icloglock);
 			} else {
 				spin_unlock(&log->l_icloglock);
 				ioerrors++;
@@ -2527,6 +2527,18 @@ restart:
  *
  * Once a ticket gets put onto the reserveq, it will only return after
  * the needed reservation is satisfied.
+ *
+ * This function is structured so that it has a lock free fast path. This is
+ * necessary because every new transaction reservation will come through this
+ * path. Hence any lock will be globally hot if we take it unconditionally on
+ * every pass.
+ *
+ * As tickets are only ever moved on and off the reserveq under the
+ * l_grant_reserve_lock, we only need to take that lock if we are going
+ * to add the ticket to the queue and sleep. We can avoid taking the lock if the
+ * ticket was never added to the reserveq because the t_queue list head will be
+ * empty and we hold the only reference to it so it can safely be checked
+ * unlocked.
  */
 STATIC int
 xlog_grant_log_space(xlog_t	   *log,
@@ -2534,24 +2546,27 @@ xlog_grant_log_space(xlog_t	   *log,
 {
 	int		 free_bytes;
 	int		 need_bytes;
-#ifdef DEBUG
-	xfs_lsn_t	 tail_lsn;
-#endif
-
 
 #ifdef DEBUG
 	if (log->l_flags & XLOG_ACTIVE_RECOVERY)
 		panic("grant Recovery problem");
 #endif
 
-	/* Is there space or do we need to sleep? */
-	spin_lock(&log->l_grant_lock);
-
 	trace_xfs_log_grant_enter(log, tic);
 
+	need_bytes = tic->t_unit_res;
+	if (tic->t_flags & XFS_LOG_PERM_RESERV)
+		need_bytes *= tic->t_ocnt;
+
 	/* something is already sleeping; insert new transaction at end */
-	if (log->l_reserve_headq) {
-		xlog_ins_ticketq(&log->l_reserve_headq, tic);
+	if (!list_empty(&log->l_reserveq)) {
+		spin_lock(&log->l_grant_reserve_lock);
+		if (list_empty(&log->l_reserveq)) {
+			spin_unlock(&log->l_grant_reserve_lock);
+			goto redo;
+		}
+
+		list_add_tail(&tic->t_queue, &log->l_reserveq);
 
 		trace_xfs_log_grant_sleep1(log, tic);
 
@@ -2563,71 +2578,64 @@ xlog_grant_log_space(xlog_t	   *log,
 			goto error_return;
 
 		XFS_STATS_INC(xs_sleep_logspace);
-		sv_wait(&tic->t_wait, PINOD|PLTWAIT, &log->l_grant_lock, s);
+		sv_wait(&tic->t_wait, PINOD|PLTWAIT,
+			&log->l_grant_reserve_lock, s);
 		/*
 		 * If we got an error, and the filesystem is shutting down,
 		 * we'll catch it down below. So just continue...
 		 */
 		trace_xfs_log_grant_wake1(log, tic);
-		spin_lock(&log->l_grant_lock);
 	}
-	if (tic->t_flags & XFS_LOG_PERM_RESERV)
-		need_bytes = tic->t_unit_res*tic->t_ocnt;
-	else
-		need_bytes = tic->t_unit_res;
 
 redo:
-	if (XLOG_FORCED_SHUTDOWN(log))
+	if (XLOG_FORCED_SHUTDOWN(log)) {
+		spin_lock(&log->l_grant_reserve_lock);
 		goto error_return;
+	}
 
-	free_bytes = xlog_space_left(log, log->l_grant_reserve_cycle,
-				     log->l_grant_reserve_bytes);
+	free_bytes = xlog_space_left(atomic64_read(&log->l_tail_lsn),
+				log->l_logsize,
+				atomic64_read(&log->l_grant_reserve_lsn));
 	if (free_bytes < need_bytes) {
-		if ((tic->t_flags & XLOG_TIC_IN_Q) == 0)
-			xlog_ins_ticketq(&log->l_reserve_headq, tic);
+		spin_lock(&log->l_grant_reserve_lock);
+		if (list_empty(&tic->t_queue))
+			list_add_tail(&tic->t_queue, &log->l_reserveq);
 
-		trace_xfs_log_grant_sleep2(log, tic);
-
-		spin_unlock(&log->l_grant_lock);
-		xlog_grant_push_ail(log->l_mp, need_bytes);
-		spin_lock(&log->l_grant_lock);
+		xlog_grant_push_ail(log, atomic64_read(&log->l_tail_lsn),
+				    atomic64_read(&log->l_last_sync_lsn),
+				    need_bytes);
 
-		XFS_STATS_INC(xs_sleep_logspace);
-		sv_wait(&tic->t_wait, PINOD|PLTWAIT, &log->l_grant_lock, s);
+		trace_xfs_log_grant_sleep2(log, tic);
 
-		spin_lock(&log->l_grant_lock);
 		if (XLOG_FORCED_SHUTDOWN(log))
 			goto error_return;
 
+		XFS_STATS_INC(xs_sleep_logspace);
+		sv_wait(&tic->t_wait, PINOD|PLTWAIT,
+			&log->l_grant_reserve_lock, s);
+
 		trace_xfs_log_grant_wake2(log, tic);
 
 		goto redo;
-	} else if (tic->t_flags & XLOG_TIC_IN_Q)
-		xlog_del_ticketq(&log->l_reserve_headq, tic);
+	}
 
 	/* we've got enough space */
-	xlog_grant_add_space(log, need_bytes);
-#ifdef DEBUG
-	tail_lsn = log->l_tail_lsn;
-	/*
-	 * Check to make sure the grant write head didn't just over lap the
-	 * tail.  If the cycles are the same, we can't be overlapping.
-	 * Otherwise, make sure that the cycles differ by exactly one and
-	 * check the byte count.
-	 */
-	if (CYCLE_LSN(tail_lsn) != log->l_grant_write_cycle) {
-		ASSERT(log->l_grant_write_cycle-1 == CYCLE_LSN(tail_lsn));
-		ASSERT(log->l_grant_write_bytes <= BBTOB(BLOCK_LSN(tail_lsn)));
+	if (!list_empty(&tic->t_queue)) {
+		spin_lock(&log->l_grant_reserve_lock);
+		list_del_init(&tic->t_queue);
+		spin_unlock(&log->l_grant_reserve_lock);
 	}
-#endif
+	xlog_grant_add_space(log, need_bytes, &log->l_grant_reserve_lsn);
+	xlog_grant_add_space(log, need_bytes, &log->l_grant_write_lsn);
+
 	trace_xfs_log_grant_exit(log, tic);
+	xlog_verify_grant_tail(log);
 	xlog_verify_grant_head(log, 1);
-	spin_unlock(&log->l_grant_lock);
 	return 0;
 
  error_return:
-	if (tic->t_flags & XLOG_TIC_IN_Q)
-		xlog_del_ticketq(&log->l_reserve_headq, tic);
+	list_del_init(&tic->t_queue);
+	spin_unlock(&log->l_grant_reserve_lock);
 
 	trace_xfs_log_grant_error(log, tic);
 
@@ -2638,25 +2646,23 @@ redo:
 	 */
 	tic->t_curr_res = 0;
 	tic->t_cnt = 0; /* ungrant will give back unit_res * t_cnt. */
-	spin_unlock(&log->l_grant_lock);
 	return XFS_ERROR(EIO);
-}	/* xlog_grant_log_space */
+}
 
 
 /*
  * Replenish the byte reservation required by moving the grant write head.
  *
- *
+ * Regranting log space is not a particularly hot path, so no real effort has
+ * been made to make the fast path lock free. If contention on the
+ * l_grant_write_lock becomes evident, it should be easy to apply the same
+ * modifications made to xlog_grant_log_space to this function.
  */
 STATIC int
 xlog_regrant_write_log_space(xlog_t	   *log,
 			     xlog_ticket_t *tic)
 {
 	int		free_bytes, need_bytes;
-	xlog_ticket_t	*ntic;
-#ifdef DEBUG
-	xfs_lsn_t	tail_lsn;
-#endif
 
 	tic->t_curr_res = tic->t_unit_res;
 	xlog_tic_reset_res(tic);
@@ -2669,10 +2675,9 @@ xlog_regrant_write_log_space(xlog_t	   *log,
 		panic("regrant Recovery problem");
 #endif
 
-	spin_lock(&log->l_grant_lock);
-
 	trace_xfs_log_regrant_write_enter(log, tic);
 
+	spin_lock(&log->l_grant_write_lock);
 	if (XLOG_FORCED_SHUTDOWN(log))
 		goto error_return;
 
@@ -2683,36 +2688,43 @@ xlog_regrant_write_log_space(xlog_t	   *log,
 	 * this transaction.
 	 */
 	need_bytes = tic->t_unit_res;
-	if ((ntic = log->l_write_headq)) {
-		free_bytes = xlog_space_left(log, log->l_grant_write_cycle,
-					     log->l_grant_write_bytes);
-		do {
+	if (!list_empty(&log->l_writeq)) {
+		struct xlog_ticket *ntic;
+		free_bytes = xlog_space_left(atomic64_read(&log->l_tail_lsn),
+				log->l_logsize,
+				atomic64_read(&log->l_grant_write_lsn));
+		list_for_each_entry(ntic, &log->l_writeq, t_queue) {
 			ASSERT(ntic->t_flags & XLOG_TIC_PERM_RESERV);
 
 			if (free_bytes < ntic->t_unit_res)
 				break;
 			free_bytes -= ntic->t_unit_res;
 			sv_signal(&ntic->t_wait);
-			ntic = ntic->t_next;
-		} while (ntic != log->l_write_headq);
+		}
 
-		if (ntic != log->l_write_headq) {
-			if ((tic->t_flags & XLOG_TIC_IN_Q) == 0)
-				xlog_ins_ticketq(&log->l_write_headq, tic);
+		if (ntic != list_first_entry(&log->l_writeq,
+						struct xlog_ticket, t_queue)) {
+			if (list_empty(&tic->t_queue))
+				list_add_tail(&tic->t_queue, &log->l_writeq);
 
 			trace_xfs_log_regrant_write_sleep1(log, tic);
 
-			spin_unlock(&log->l_grant_lock);
-			xlog_grant_push_ail(log->l_mp, need_bytes);
-			spin_lock(&log->l_grant_lock);
+			spin_unlock(&log->l_grant_write_lock);
+
+			xlog_grant_push_ail(log,
+					atomic64_read(&log->l_tail_lsn),
+					atomic64_read(&log->l_last_sync_lsn),
+					need_bytes);
+
+			spin_lock(&log->l_grant_write_lock);
 
 			XFS_STATS_INC(xs_sleep_logspace);
 			sv_wait(&tic->t_wait, PINOD|PLTWAIT,
-				&log->l_grant_lock, s);
+				&log->l_grant_write_lock, s);
 
 			/* If we're shutting down, this tic is already
 			 * off the queue */
-			spin_lock(&log->l_grant_lock);
+			spin_lock(&log->l_grant_write_lock);
 			if (XLOG_FORCED_SHUTDOWN(log))
 				goto error_return;
 
@@ -2724,50 +2736,48 @@ redo:
 	if (XLOG_FORCED_SHUTDOWN(log))
 		goto error_return;
 
-	free_bytes = xlog_space_left(log, log->l_grant_write_cycle,
-				     log->l_grant_write_bytes);
+	free_bytes = xlog_space_left(atomic64_read(&log->l_tail_lsn),
+				log->l_logsize,
+				atomic64_read(&log->l_grant_write_lsn));
 	if (free_bytes < need_bytes) {
-		if ((tic->t_flags & XLOG_TIC_IN_Q) == 0)
-			xlog_ins_ticketq(&log->l_write_headq, tic);
-		spin_unlock(&log->l_grant_lock);
-		xlog_grant_push_ail(log->l_mp, need_bytes);
-		spin_lock(&log->l_grant_lock);
+		if (list_empty(&tic->t_queue))
+			list_add_tail(&tic->t_queue, &log->l_writeq);
 
+		spin_unlock(&log->l_grant_write_lock);
+
+		xlog_grant_push_ail(log, atomic64_read(&log->l_tail_lsn),
+					atomic64_read(&log->l_last_sync_lsn),
+					need_bytes);
+
+		spin_lock(&log->l_grant_write_lock);
 		XFS_STATS_INC(xs_sleep_logspace);
 		trace_xfs_log_regrant_write_sleep2(log, tic);
-
-		sv_wait(&tic->t_wait, PINOD|PLTWAIT, &log->l_grant_lock, s);
+		sv_wait(&tic->t_wait, PINOD|PLTWAIT,
+			&log->l_grant_write_lock, s);
 
 		/* If we're shutting down, this tic is already off the queue */
-		spin_lock(&log->l_grant_lock);
+		spin_lock(&log->l_grant_write_lock);
 		if (XLOG_FORCED_SHUTDOWN(log))
 			goto error_return;
 
 		trace_xfs_log_regrant_write_wake2(log, tic);
 		goto redo;
-	} else if (tic->t_flags & XLOG_TIC_IN_Q)
-		xlog_del_ticketq(&log->l_write_headq, tic);
+	}
 
 	/* we've got enough space */
-	xlog_grant_add_space_write(log, need_bytes);
-#ifdef DEBUG
-	tail_lsn = log->l_tail_lsn;
-	if (CYCLE_LSN(tail_lsn) != log->l_grant_write_cycle) {
-		ASSERT(log->l_grant_write_cycle-1 == CYCLE_LSN(tail_lsn));
-		ASSERT(log->l_grant_write_bytes <= BBTOB(BLOCK_LSN(tail_lsn)));
-	}
-#endif
+	list_del_init(&tic->t_queue);
+	spin_unlock(&log->l_grant_write_lock);
+	xlog_grant_add_space(log, need_bytes, &log->l_grant_write_lsn);
 
 	trace_xfs_log_regrant_write_exit(log, tic);
-
+	xlog_verify_grant_tail(log);
 	xlog_verify_grant_head(log, 1);
-	spin_unlock(&log->l_grant_lock);
 	return 0;
 
 
  error_return:
-	if (tic->t_flags & XLOG_TIC_IN_Q)
-		xlog_del_ticketq(&log->l_reserve_headq, tic);
+	list_del_init(&tic->t_queue);
+	spin_unlock(&log->l_grant_write_lock);
 
 	trace_xfs_log_regrant_write_error(log, tic);
 
@@ -2778,9 +2788,8 @@ redo:
 	 */
 	tic->t_curr_res = 0;
 	tic->t_cnt = 0; /* ungrant will give back unit_res * t_cnt. */
-	spin_unlock(&log->l_grant_lock);
 	return XFS_ERROR(EIO);
-}	/* xlog_regrant_write_log_space */
+}
 
 
 /* The first cnt-1 times through here we don't need to
@@ -2799,30 +2808,27 @@ xlog_regrant_reserve_log_space(xlog_t	     *log,
 	if (ticket->t_cnt > 0)
 		ticket->t_cnt--;
 
-	spin_lock(&log->l_grant_lock);
-	xlog_grant_sub_space(log, ticket->t_curr_res);
+	xlog_grant_sub_space(log, ticket->t_curr_res, &log->l_grant_write_lsn);
+	xlog_grant_sub_space(log, ticket->t_curr_res, &log->l_grant_reserve_lsn);
+
 	ticket->t_curr_res = ticket->t_unit_res;
 	xlog_tic_reset_res(ticket);
 
 	trace_xfs_log_regrant_reserve_sub(log, ticket);
-
 	xlog_verify_grant_head(log, 1);
 
 	/* just return if we still have some of the pre-reserved space */
-	if (ticket->t_cnt > 0) {
-		spin_unlock(&log->l_grant_lock);
+	if (ticket->t_cnt > 0)
 		return;
-	}
 
-	xlog_grant_add_space_reserve(log, ticket->t_unit_res);
+	xlog_grant_add_space(log, ticket->t_unit_res, &log->l_grant_reserve_lsn);
 
 	trace_xfs_log_regrant_reserve_exit(log, ticket);
-
 	xlog_verify_grant_head(log, 0);
-	spin_unlock(&log->l_grant_lock);
+
 	ticket->t_curr_res = ticket->t_unit_res;
 	xlog_tic_reset_res(ticket);
-}	/* xlog_regrant_reserve_log_space */
+}
 
 
 /*
@@ -2843,28 +2849,31 @@ STATIC void
 xlog_ungrant_log_space(xlog_t	     *log,
 		       xlog_ticket_t *ticket)
 {
-	if (ticket->t_cnt > 0)
-		ticket->t_cnt--;
+	int	space;
 
-	spin_lock(&log->l_grant_lock);
 	trace_xfs_log_ungrant_enter(log, ticket);
 
-	xlog_grant_sub_space(log, ticket->t_curr_res);
-
-	trace_xfs_log_ungrant_sub(log, ticket);
+	if (ticket->t_cnt > 0)
+		ticket->t_cnt--;
 
-	/* If this is a permanent reservation ticket, we may be able to free
+	/*
+	 * If this is a permanent reservation ticket, we may be able to free
 	 * up more space based on the remaining count.
 	 */
+	space = ticket->t_curr_res;
 	if (ticket->t_cnt > 0) {
 		ASSERT(ticket->t_flags & XLOG_TIC_PERM_RESERV);
-		xlog_grant_sub_space(log, ticket->t_unit_res*ticket->t_cnt);
+		space += ticket->t_unit_res * ticket->t_cnt;
 	}
 
-	trace_xfs_log_ungrant_exit(log, ticket);
+	trace_xfs_log_ungrant_sub(log, ticket);
+
+	xlog_grant_sub_space(log, space, &log->l_grant_write_lsn);
+	xlog_grant_sub_space(log, space, &log->l_grant_reserve_lsn);
 
+	trace_xfs_log_ungrant_exit(log, ticket);
 	xlog_verify_grant_head(log, 1);
-	spin_unlock(&log->l_grant_lock);
+
 	xfs_log_move_tail(log->l_mp, 1);
 }	/* xlog_ungrant_log_space */
 
@@ -2901,11 +2910,12 @@ xlog_state_release_iclog(
 
 	if (iclog->ic_state == XLOG_STATE_WANT_SYNC) {
 		/* update tail before writing to iclog */
-		xlog_assign_tail_lsn(log->l_mp);
+		xfs_lsn_t tail_lsn = xlog_assign_tail_lsn(log->l_mp);
+
 		sync++;
 		iclog->ic_state = XLOG_STATE_SYNCING;
-		iclog->ic_header.h_tail_lsn = cpu_to_be64(log->l_tail_lsn);
-		xlog_verify_tail_lsn(log, iclog, log->l_tail_lsn);
+		iclog->ic_header.h_tail_lsn = cpu_to_be64(tail_lsn);
+		xlog_verify_tail_lsn(log, iclog, tail_lsn);
 		/* cycle incremented when incrementing curr_block */
 	}
 	spin_unlock(&log->l_icloglock);
@@ -3435,6 +3445,7 @@ xlog_ticket_alloc(
         }
 
 	atomic_set(&tic->t_ref, 1);
+	INIT_LIST_HEAD(&tic->t_queue);
 	tic->t_unit_res		= unit_bytes;
 	tic->t_curr_res		= unit_bytes;
 	tic->t_cnt		= cnt;
@@ -3484,18 +3495,48 @@ xlog_verify_dest_ptr(
 }
 
 STATIC void
-xlog_verify_grant_head(xlog_t *log, int equals)
+xlog_verify_grant_head(
+	struct log	*log,
+	int		equals)
 {
-    if (log->l_grant_reserve_cycle == log->l_grant_write_cycle) {
-	if (equals)
-	    ASSERT(log->l_grant_reserve_bytes >= log->l_grant_write_bytes);
-	else
-	    ASSERT(log->l_grant_reserve_bytes > log->l_grant_write_bytes);
-    } else {
-	ASSERT(log->l_grant_reserve_cycle-1 == log->l_grant_write_cycle);
-	ASSERT(log->l_grant_write_bytes >= log->l_grant_reserve_bytes);
-    }
-}	/* xlog_verify_grant_head */
+/* this check is racy under concurrent modifications */
+#if 0
+	xfs_lsn_t reserve = atomic64_read(&log->l_grant_reserve_lsn);
+	xfs_lsn_t write = atomic64_read(&log->l_grant_write_lsn);
+
+	if (CYCLE_LSN(reserve) == CYCLE_LSN(write)) {
+		if (equals)
+			ASSERT(BLOCK_LSN(reserve) >= BLOCK_LSN(write));
+		else
+			ASSERT(BLOCK_LSN(reserve) > BLOCK_LSN(write));
+	} else {
+		ASSERT(CYCLE_LSN(reserve) - 1 == CYCLE_LSN(write));
+		ASSERT(BLOCK_LSN(write) >= BLOCK_LSN(reserve));
+	}
+#endif
+}
+
+STATIC void
+xlog_verify_grant_tail(
+	struct log	*log)
+{
+	xfs_lsn_t	 tail_lsn;
+	xfs_lsn_t	 write_lsn;
+
+	tail_lsn = atomic64_read(&log->l_tail_lsn);
+	write_lsn = atomic64_read(&log->l_grant_write_lsn);
+
+	/*
+	 * Check to make sure the grant write head didn't just overlap the
+	 * tail.  If the cycles are the same, we can't be overlapping.
+	 * Otherwise, make sure that the cycles differ by exactly one and
+	 * check the byte count.
+	 */
+	if (CYCLE_LSN(tail_lsn) != CYCLE_LSN(write_lsn)) {
+		ASSERT(CYCLE_LSN(write_lsn) - 1 == CYCLE_LSN(tail_lsn));
+		ASSERT(BLOCK_LSN(write_lsn) <= BBTOB(BLOCK_LSN(tail_lsn)));
+	}
+}
 
 /* check if it will fit */
 STATIC void
@@ -3721,7 +3762,6 @@ xfs_log_force_umount(
 	 * everybody up to tell the bad news.
 	 */
 	spin_lock(&log->l_icloglock);
-	spin_lock(&log->l_grant_lock);
 	mp->m_flags |= XFS_MOUNT_FS_SHUTDOWN;
 	if (mp->m_sb_bp)
 		XFS_BUF_DONE(mp->m_sb_bp);
@@ -3742,27 +3782,21 @@ xfs_log_force_umount(
 	spin_unlock(&log->l_icloglock);
 
 	/*
-	 * We don't want anybody waiting for log reservations
-	 * after this. That means we have to wake up everybody
-	 * queued up on reserve_headq as well as write_headq.
-	 * In addition, we make sure in xlog_{re}grant_log_space
-	 * that we don't enqueue anything once the SHUTDOWN flag
-	 * is set, and this action is protected by the GRANTLOCK.
+	 * We don't want anybody waiting for log reservations after this. That
+	 * means we have to wake up everybody queued up on reserveq as well as
+	 * writeq.  In addition, we make sure in xlog_{re}grant_log_space that
+	 * we don't enqueue anything once the SHUTDOWN flag is set, and this
+	 * action is protected by the grant locks.
 	 */
-	if ((tic = log->l_reserve_headq)) {
-		do {
-			sv_signal(&tic->t_wait);
-			tic = tic->t_next;
-		} while (tic != log->l_reserve_headq);
-	}
-
-	if ((tic = log->l_write_headq)) {
-		do {
-			sv_signal(&tic->t_wait);
-			tic = tic->t_next;
-		} while (tic != log->l_write_headq);
-	}
-	spin_unlock(&log->l_grant_lock);
+	spin_lock(&log->l_grant_reserve_lock);
+	list_for_each_entry(tic, &log->l_reserveq, t_queue)
+		sv_signal(&tic->t_wait);
+	spin_unlock(&log->l_grant_reserve_lock);
+
+	spin_lock(&log->l_grant_write_lock);
+	list_for_each_entry(tic, &log->l_writeq, t_queue)
+		sv_signal(&tic->t_wait);
+	spin_unlock(&log->l_grant_write_lock);
 
 	if (!(log->l_iclog->ic_state & XLOG_STATE_IOERROR)) {
 		ASSERT(!logerror);
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index edcdfe0..4d6bf38 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -133,12 +133,10 @@ static inline uint xlog_get_client_id(__be32 i)
  */
 #define XLOG_TIC_INITED		0x1	/* has been initialized */
 #define XLOG_TIC_PERM_RESERV	0x2	/* permanent reservation */
-#define XLOG_TIC_IN_Q		0x4
 
 #define XLOG_TIC_FLAGS \
 	{ XLOG_TIC_INITED,	"XLOG_TIC_INITED" }, \
-	{ XLOG_TIC_PERM_RESERV,	"XLOG_TIC_PERM_RESERV" }, \
-	{ XLOG_TIC_IN_Q,	"XLOG_TIC_IN_Q" }
+	{ XLOG_TIC_PERM_RESERV,	"XLOG_TIC_PERM_RESERV" }
 
 #endif	/* __KERNEL__ */
 
@@ -245,8 +243,7 @@ typedef struct xlog_res {
 
 typedef struct xlog_ticket {
 	sv_t		   t_wait;	 /* ticket wait queue            : 20 */
-	struct xlog_ticket *t_next;	 /*			         :4|8 */
-	struct xlog_ticket *t_prev;	 /*				 :4|8 */
+	struct list_head   t_queue;	 /* reserve/write queue */
 	xlog_tid_t	   t_tid;	 /* transaction identifier	 : 4  */
 	atomic_t	   t_ref;	 /* ticket reference count       : 4  */
 	int		   t_curr_res;	 /* current reservation in bytes : 4  */
@@ -509,23 +506,34 @@ typedef struct log {
 						 * log entries" */
 	xlog_in_core_t		*l_iclog;       /* head log queue	*/
 	spinlock_t		l_icloglock;    /* grab to change iclog state */
-	xfs_lsn_t		l_tail_lsn;     /* lsn of 1st LR with unflushed
-						 * buffers */
-	xfs_lsn_t		l_last_sync_lsn;/* lsn of last LR on disk */
 	int			l_curr_cycle;   /* Cycle number of log writes */
 	int			l_prev_cycle;   /* Cycle number before last
 						 * block increment */
 	int			l_curr_block;   /* current logical log block */
 	int			l_prev_block;   /* previous logical log block */
 
-	/* The following block of fields are changed while holding grant_lock */
-	spinlock_t		l_grant_lock ____cacheline_aligned_in_smp;
-	xlog_ticket_t		*l_reserve_headq;
-	xlog_ticket_t		*l_write_headq;
-	int			l_grant_reserve_cycle;
-	int			l_grant_reserve_bytes;
-	int			l_grant_write_cycle;
-	int			l_grant_write_bytes;
+	/*
+	 * The l_tail_lsn and l_last_sync_lsn variables are set up as atomic
+	 * variables so they can be safely set and read without locking. While
+	 * they are often read together, they are updated differently with the
+	 * l_tail_lsn being quite hot, so place them on separate cachelines.
+	 */
+	/* lsn of 1st LR with unflushed buffers */
+	atomic64_t		l_tail_lsn ____cacheline_aligned_in_smp;
+	/* lsn of last LR on disk */
+	atomic64_t		l_last_sync_lsn ____cacheline_aligned_in_smp;
+
+	/*
+	 * ticket grant locks, queues and accounting have their own cachelines
+	 * as these are quite hot and can be operated on concurrently.
+	 */
+	spinlock_t		l_grant_reserve_lock ____cacheline_aligned_in_smp;
+	struct list_head	l_reserveq;
+	atomic64_t		l_grant_reserve_lsn;
+
+	spinlock_t		l_grant_write_lock ____cacheline_aligned_in_smp;
+	struct list_head	l_writeq;
+	atomic64_t		l_grant_write_lsn;
 
 	/* The following field are used for debugging; need to hold icloglock */
 #ifdef DEBUG
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index baad94a..f73a215 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -925,12 +925,13 @@ xlog_find_tail(
 	log->l_curr_cycle = be32_to_cpu(rhead->h_cycle);
 	if (found == 2)
 		log->l_curr_cycle++;
-	log->l_tail_lsn = be64_to_cpu(rhead->h_tail_lsn);
-	log->l_last_sync_lsn = be64_to_cpu(rhead->h_lsn);
-	log->l_grant_reserve_cycle = log->l_curr_cycle;
-	log->l_grant_reserve_bytes = BBTOB(log->l_curr_block);
-	log->l_grant_write_cycle = log->l_curr_cycle;
-	log->l_grant_write_bytes = BBTOB(log->l_curr_block);
+	atomic64_set(&log->l_tail_lsn, be64_to_cpu(rhead->h_tail_lsn));
+	atomic64_set(&log->l_last_sync_lsn, be64_to_cpu(rhead->h_lsn));
+
+	atomic64_set(&log->l_grant_reserve_lsn,
+		xlog_assign_lsn(log->l_curr_cycle, BBTOB(log->l_curr_block)));
+	atomic64_set(&log->l_grant_write_lsn,
+		xlog_assign_lsn(log->l_curr_cycle, BBTOB(log->l_curr_block)));
 
 	/*
 	 * Look for unmount record.  If we find it, then we know there
@@ -960,7 +961,7 @@ xlog_find_tail(
 	}
 	after_umount_blk = (i + hblks + (int)
 		BTOBB(be32_to_cpu(rhead->h_len))) % log->l_logBBsize;
-	tail_lsn = log->l_tail_lsn;
+	tail_lsn = atomic64_read(&log->l_tail_lsn);
 	if (*head_blk == after_umount_blk &&
 	    be32_to_cpu(rhead->h_num_logops) == 1) {
 		umount_data_blk = (i + hblks) % log->l_logBBsize;
@@ -975,12 +976,12 @@ xlog_find_tail(
 			 * log records will point recovery to after the
 			 * current unmount record.
 			 */
-			log->l_tail_lsn =
+			atomic64_set(&log->l_tail_lsn,
 				xlog_assign_lsn(log->l_curr_cycle,
-						after_umount_blk);
-			log->l_last_sync_lsn =
+						after_umount_blk));
+			atomic64_set(&log->l_last_sync_lsn,
 				xlog_assign_lsn(log->l_curr_cycle,
-						after_umount_blk);
+						after_umount_blk));
 			*tail_blk = after_umount_blk;
 
 			/*
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 06/16] patch xfs-inode-hash-fake
  2010-11-08  8:55 ` [PATCH 06/16] patch xfs-inode-hash-fake Dave Chinner
@ 2010-11-08  9:19   ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2010-11-08  9:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Missing a good subject and description.

Anyway, I think we should add an inode_mark_hashed helper, similar to
the read-side inode_unhashed, to avoid exposing the exact list
implementation to filesystems.


* Re: [PATCH 01/16] xfs: fix per-ag reference counting in inode reclaim tree walking
  2010-11-08  8:55 ` [PATCH 01/16] xfs: fix per-ag reference counting in inode reclaim tree walking Dave Chinner
@ 2010-11-08  9:23   ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2010-11-08  9:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> +++ b/fs/xfs/linux-2.6/xfs_sync.c
> @@ -853,6 +853,7 @@ restart:
>  		if (trylock) {
>  			if (!mutex_trylock(&pag->pag_ici_reclaim_lock)) {
>  				skipped++;
> +				xfs_perag_put(pag);
>  				continue;
>  			}
>  			first_index = pag->pag_ici_reclaim_cursor;

One way to make this loop more maintainable is to split the guts of it
into a xfs_reclaim_inodes_ag helper, and make the existing function a
wrapper around it (and remove the incorrect _ag prefix), but that's .38
material.

For .37 the patch looks good,


Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 02/16] xfs: move delayed write buffer trace
  2010-11-08  8:55 ` [PATCH 02/16] xfs: move delayed write buffer trace Dave Chinner
@ 2010-11-08  9:24   ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2010-11-08  9:24 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, Nov 08, 2010 at 07:55:05PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The delayed write buffer split trace currently issues a trace for
> every buffer it scans. These buffers are not necessarily queued for
> delayed write. Indeed, when buffers are pinned, there can be
> thousands of traces of buffers that aren't actually queued for
> delayed write and the ones that are are lost in the noise. Move the
> trace point to record only buffers that are split out for IO to be
> issued on.

Looks good,


Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 11/16] xfs: connect up buffer reclaim priority hooks
  2010-11-08  8:55 ` [PATCH 11/16] xfs: connect up buffer reclaim priority hooks Dave Chinner
@ 2010-11-08 11:25   ` Christoph Hellwig
  2010-11-08 23:50     ` Dave Chinner
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2010-11-08 11:25 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> +/*
> + * buffer types 
> + */
> +#define	B_FS_DQUOT	1
> +#define	B_FS_AGFL	2
> +#define	B_FS_AGF	3
> +#define	B_FS_ATTR_BTREE	4
> +#define	B_FS_DIR_BTREE	5
> +#define	B_FS_MAP	6
> +#define	B_FS_INOMAP	7
> +#define	B_FS_AGI	8
> +#define	B_FS_INO	9

Is there any good reason to keep/reintroduce the buffer types?  In this
series we're only using the refcounts, and I can't see any good use for
the types either.


* Re: [PATCH 13/16] xfs: reduce the number of AIL push wakeups
  2010-11-08  8:55 ` [PATCH 13/16] xfs: reduce the number of AIL push wakeups Dave Chinner
@ 2010-11-08 11:32   ` Christoph Hellwig
  2010-11-08 23:51     ` Dave Chinner
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2010-11-08 11:32 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

>  STATIC int
> @@ -850,8 +853,17 @@ xfsaild(
>  	long		tout = 0; /* milliseconds */
>  
>  	while (!kthread_should_stop()) {
> -		schedule_timeout_interruptible(tout ?
> +		/*
> +		 * for short sleeps indicating congestion, don't allow us to
> +		 * get woken early. Otherwise all we do is bang on the AIL lock
> +		 * without making progress.
> +		 */
> +		if (tout && tout <= 20) {
> +			schedule_timeout_uninterruptible(msecs_to_jiffies(tout));
> +		} else {
> +			schedule_timeout_interruptible(tout ?
>  				msecs_to_jiffies(tout) : MAX_SCHEDULE_TIMEOUT);
> +		}

How about just setting the state ourselves and calling schedule_timeout?
That seems a lot more readable to me.  Also we can switch to
TASK_KILLABLE for the short sleeps, just to not introduce any delay
in shutting down aild when kthread_stop is called.  It would look
something like this:

		if (tout && tout <= 20)
			__set_current_state(TASK_KILLABLE);
		else
			__set_current_state(TASK_UNINTERRUPTIBLE);
		schedule_timeout(tout ?
				 msecs_to_jiffies(tout) : MAX_SCHEDULE_TIMEOUT);


* Re: [PATCH 15/16] xfs: only run xfs_error_test if error injection is active
  2010-11-08  8:55 ` [PATCH 15/16] xfs: only run xfs_error_test if error injection is active Dave Chinner
@ 2010-11-08 11:33   ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2010-11-08 11:33 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> Walking btree blocks during allocation to check them requires each
> block (a cache hit, so no I/O) to call xfs_error_test(), which then
> does a random32() call as the first operation.  IOWs, ~50% of the
> CPU is being consumed just testing whether we need to inject an
> error, even though error injection is not active.
> 
> Kill this overhead when error injection is not active by adding a
> global counter of active error traps and only calling into
> xfs_error_test when fault injection is active.

Looks good.  And a good reminder that we should optimize the code to not
call xfs_btree_check_block on cache hits once we put the CRC checks
into it later.

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 05/16] xfs: don't truncate prealloc from frequently accessed inodes
  2010-11-08  8:55 ` [PATCH 05/16] xfs: don't truncate prealloc from frequently accessed inodes Dave Chinner
@ 2010-11-08 11:36   ` Christoph Hellwig
  2010-11-08 23:56     ` Dave Chinner
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2010-11-08 11:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

I'd be much more happy about fixing this properly in nfsd.  But I guess
the fix is simple enough that we can put it into XFS for now.  Any
reason you use up a whole int in the inode instead of using a flag in
i_flags?

> -
> -		ASSERT(ip->i_delayed_blks == 0);
> +		/*
> +		 * even after flushing the inode, there can still be delalloc
> +		 * blocks on the inode beyond EOF due to speculative
> +		 * preallocation. These are not removed until the release
> +		 * function is called or the inode is inactivated. Hence we
> +		 * cannot assert here that ip->i_delayed_blks == 0.
> +		 */

Shouldn't this be in a separate patch given that we can fail the flush
due to iolock contention?  I think this and the swapext fix are .37
material in fact.


* Re: [PATCH 04/16] xfs: dynamic speculative EOF preallocation
  2010-11-08  8:55 ` [PATCH 04/16] xfs: dynamic speculative EOF preallocation Dave Chinner
@ 2010-11-08 11:43   ` Christoph Hellwig
  2010-11-09  0:08     ` Dave Chinner
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2010-11-08 11:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> For default settings, ???e size and the initial extents is determined

weird character.

> The allocsize mount option still controls the minimum preallocation size, so
> the smallest extent size can still be bound in situations where this behaviour
> is not sufficient.

Do we also need a way to keep an upper boundary?  Think lots of slowly
growing log files on a filesystem not having tons of free space.


* Re: [PATCH 03/16] [RFC] xfs: use generic per-cpu counter infrastructure
  2010-11-08  8:55 ` [PATCH 03/16] [RFC] xfs: use generic per-cpu counter infrastructure Dave Chinner
@ 2010-11-08 12:13   ` Christoph Hellwig
  2010-11-09  0:20     ` Dave Chinner
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2010-11-08 12:13 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, Nov 08, 2010 at 07:55:06PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> XFS has a per-cpu counter implementation for in-core superblock
> counters that pre-dated the generic implementation. It is complex
> and baroque as it is tailored directly to the needs of ENOSPC
> detection. Implement the complex accurate-compare-and-add
> calculation in the generic per-cpu counter code and convert the
> XFS counters to use the much simpler generic counter code.
> 
> Passes xfsqa on SMP system.

Some mostly cosmetic comments below.  I haven't looked at the more
hairy bits like the changes to the generic percpu code and the
reservation handling yet.

> 	1. kill the no-per-cpu-counter mode?

already done.

> 	3. do we need to factor xfs_mod_sb_incore()?

Doesn't exist anymore. 

> -	xfs_icsb_sync_counters(mp, XFS_ICSB_LAZY_COUNT);
> +	xfs_icsb_sync_counters(mp);
>  	spin_lock(&mp->m_sb_lock);

Can be moved inside the lock and use the unlocked version, too.

> +static inline int
> +xfs_icsb_add(
> +	struct xfs_mount	*mp,
> +	int			counter,
> +	int64_t			delta,
> +	int64_t			threshold)
> +{
> +	int			ret;
> +
> +	ret = percpu_counter_add_unless_lt(&mp->m_icsb[counter], delta,
> +								threshold);
> +	if (ret < 0)
> +		return -ENOSPC;
> +	return 0;
> +}
> +
> +static inline void
> +xfs_icsb_set(
> +	struct xfs_mount	*mp,
> +	int			counter,
> +	int64_t			value)
> +{
> +	percpu_counter_set(&mp->m_icsb[counter], value);
> +}
> +
> +static inline int64_t
> +xfs_icsb_sum(
> +	struct xfs_mount	*mp,
> +	int			counter)
> +{
> +	return percpu_counter_sum_positive(&mp->m_icsb[counter]);
> +}
> +
> +static inline int64_t
> +xfs_icsb_read(
> +	struct xfs_mount	*mp,
> +	int			counter)
> +{
> +	return percpu_counter_read_positive(&mp->m_icsb[counter]);
> +}

I would just opencode all these helpers in their callers.  There's
generally just one caller of each, which iterates over the three
counters anyway.


> +int
> +xfs_icsb_modify_counters(
> +	xfs_mount_t	*mp,
> +	xfs_sb_field_t	field,
> +	int64_t		delta,
> +	int		rsvd)

I can't see the point of keeping this multiplexer.  The inode counts
are handled entirely differently from the block count, so they should
have separate functions.

> +{
> +	int64_t		lcounter;
> +	int64_t		res_used;
> +	int		ret = 0;
> +
> +
> +	switch (field) {
> +	case XFS_SBS_ICOUNT:
> +		ret = xfs_icsb_add(mp, XFS_ICSB_ICOUNT, delta, 0);
> +		if (ret < 0) {
> +			ASSERT(0);
> +			return XFS_ERROR(EINVAL);
> +		}
> +		return 0;
> +
> +	case XFS_SBS_IFREE:
> +		ret = xfs_icsb_add(mp, XFS_ICSB_IFREE, delta, 0);
> +		if (ret < 0) {
> +			ASSERT(0);
> +			return XFS_ERROR(EINVAL);
> +		}
> +		return 0;

If you're keeping a common helper for both inode counts this can be
simplified by sharing the code and just passing on the field instead
of having two cases.

> +	struct percpu_counter	m_icsb[XFS_ICSB_MAX];

I wonder if there's all that much of a point in keeping the array.
We basically only use the fact it's an array for the init/destroy
code.  Maybe it would be a tad cleaner to just have three separate
percpu counters.

> +static inline void
> +xfs_icsb_sync_counters(
> +	struct xfs_mount	*mp)
> +{
> +	spin_lock(&mp->m_sb_lock);
> +	xfs_icsb_sync_counters_locked(mp);
> +	spin_unlock(&mp->m_sb_lock);
> +}

There's only one caller of this left after my comment above is
addressed. I'd just make xfs_icsb_sync_counters the locked version,
throw in an assert_spin_locked and have the one remaining caller
take the lock opencoded as well.

> --- a/include/linux/percpu_counter.h
> +++ b/include/linux/percpu_counter.h
> @@ -41,6 +41,8 @@ void percpu_counter_set(struct percpu_counter *fbc, s64 amount);
>  void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch);
>  s64 __percpu_counter_sum(struct percpu_counter *fbc);
>  int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs);
> +int percpu_counter_add_unless_lt(struct percpu_counter *fbc, s64 amount,
> +							s64 threshold);
>  
>  static inline void percpu_counter_add(struct percpu_counter *fbc, s64 amount)
>  {
> @@ -153,6 +155,20 @@ static inline int percpu_counter_initialized(struct percpu_counter *fbc)
>  	return 1;
>  }
>  
> +static inline int percpu_counter_test_and_add_delta(struct percpu_counter *fbc, s64 delta)

This doesn't match the function provided for CONFIG_SMP.

> +/**
> + *

spurious line.

> +int percpu_counter_add_unless_lt(struct percpu_counter *fbc, s64 amount, s64
> +threshold)

too long line


* Re: [PATCH 00/16] xfs: current patch stack for 2.6.38 window
  2010-11-08  8:55 [PATCH 00/16] xfs: current patch stack for 2.6.38 window Dave Chinner
                   ` (15 preceding siblings ...)
  2010-11-08  8:55 ` [PATCH 16/16] xfs: make xlog_space_left() independent of the grant lock Dave Chinner
@ 2010-11-08 14:17 ` Christoph Hellwig
  2010-11-09  0:21   ` Dave Chinner
  16 siblings, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2010-11-08 14:17 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, Nov 08, 2010 at 07:55:03PM +1100, Dave Chinner wrote:
> My tree is currently based on the VFS locking changes I have out for review,
> so there's a couple fo patches that won't apply sanely to a mainline or OSS xfs
> dev tree. See below for a pointer to a git tree with all the patches in it.

The only thing that should depend on it are the inode hash changes.  I
suspect it might be a better idea if we feed those via Al together with
the VFS scalability bits, and only feed the rest through the XFS tree to
avoid having nasty dependencies.


* Re: [PATCH 07/16] xfs: convert inode cache lookups to use RCU locking
  2010-11-08  8:55 ` [PATCH 07/16] xfs: convert inode cache lookups to use RCU locking Dave Chinner
@ 2010-11-08 23:09   ` Christoph Hellwig
  2010-11-09  0:24     ` Dave Chinner
  2010-11-09  3:36     ` Paul E. McKenney
  0 siblings, 2 replies; 42+ messages in thread
From: Christoph Hellwig @ 2010-11-08 23:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: paulmck, eric.dumazet, xfs

This patch generally looks good to me, but with so much RCU magic I'd prefer
if Paul & Eric could look over it.

On Mon, Nov 08, 2010 at 07:55:10PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> With delayed logging greatly increasing the sustained parallelism of inode
> operations, the inode cache locking is showing significant read vs write
> contention when inode reclaim runs at the same time as lookups. There is
> also a lot more write lock acquisitions than there are read locks (4:1 ratio)
> so the read locking is not really buying us much in the way of parallelism.
> 
> To avoid the read vs write contention, change the cache to use RCU locking on
> the read side. To avoid needing to RCU free every single inode, use the built
> in slab RCU freeing mechanism. This requires us to be able to detect lookups of
> freed inodes, so ensure that every freed inode has an inode number of zero and
> the XFS_IRECLAIM flag set. We already check the XFS_IRECLAIM flag in cache hit
> lookup path, but also add a check for a zero inode number as well.
> 
> We can then convert all the read locking lookups to use RCU read side locking
> and hence remove all read side locking.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Alex Elder <aelder@sgi.com>
> ---
>  fs/xfs/linux-2.6/xfs_iops.c    |    7 +++++-
>  fs/xfs/linux-2.6/xfs_sync.c    |   13 +++++++++--
>  fs/xfs/quota/xfs_qm_syscalls.c |    3 ++
>  fs/xfs/xfs_iget.c              |   44 ++++++++++++++++++++++++++++++---------
>  fs/xfs/xfs_inode.c             |   22 ++++++++++++-------
>  5 files changed, 67 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
> index 8b46867..909bd9c 100644
> --- a/fs/xfs/linux-2.6/xfs_iops.c
> +++ b/fs/xfs/linux-2.6/xfs_iops.c
> @@ -757,6 +757,8 @@ xfs_diflags_to_iflags(
>   * We don't use the VFS inode hash for lookups anymore, so make the inode look
>   * hashed to the VFS by faking it. This avoids needing to touch inode hash
>   * locks in this path, but makes the VFS believe the inode is validly hashed.
> + * We initialise i_state and i_hash under the i_lock so that we follow the same
> + * setup rules that the rest of the VFS follows.
>   */
>  void
>  xfs_setup_inode(
> @@ -765,10 +767,13 @@ xfs_setup_inode(
>  	struct inode		*inode = &ip->i_vnode;
>  
>  	inode->i_ino = ip->i_ino;
> +
> +	spin_lock(&inode->i_lock);
>  	inode->i_state = I_NEW;
> +	hlist_nulls_add_fake(&inode->i_hash);
> +	spin_unlock(&inode->i_lock);

This screams for another VFS helper, even if it's XFS-specific for now.
Having to duplicate inode.c-private locking rules in XFS seems a bit
nasty to me.

>  
>  	inode_sb_list_add(inode);
> -	hlist_nulls_add_fake(&inode->i_hash);
>  
>  	inode->i_mode	= ip->i_d.di_mode;
>  	inode->i_nlink	= ip->i_d.di_nlink;
> diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
> index afb0d7c..9a53cc9 100644
> --- a/fs/xfs/linux-2.6/xfs_sync.c
> +++ b/fs/xfs/linux-2.6/xfs_sync.c
> @@ -53,6 +53,10 @@ xfs_inode_ag_walk_grab(
>  {
>  	struct inode		*inode = VFS_I(ip);
>  
> +	/* check for stale RCU freed inode */
> +	if (!ip->i_ino)
> +		return ENOENT;

Assuming i_ino is never 0 is fine for XFS, unlike for the generic VFS
code, so ACK.

>  	/* nothing to sync during shutdown */
>  	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
>  		return EFSCORRUPTED;
> @@ -98,12 +102,12 @@ restart:
>  		int		error = 0;
>  		int		i;
>  
> -		read_lock(&pag->pag_ici_lock);
> +		rcu_read_lock();
>  		nr_found = radix_tree_gang_lookup(&pag->pag_ici_root,
>  					(void **)batch, first_index,
>  					XFS_LOOKUP_BATCH);
>  		if (!nr_found) {
> -			read_unlock(&pag->pag_ici_lock);
> +			rcu_read_unlock();
>  			break;
>  		}
>  
> @@ -129,7 +133,7 @@ restart:
>  		}
>  
>  		/* unlock now we've grabbed the inodes. */
> -		read_unlock(&pag->pag_ici_lock);
> +		rcu_read_unlock();
>  
>  		for (i = 0; i < nr_found; i++) {
>  			if (!batch[i])
> @@ -639,6 +643,9 @@ xfs_reclaim_inode_grab(
>  	struct xfs_inode	*ip,
>  	int			flags)
>  {
> +	/* check for stale RCU freed inode */
> +	if (!ip->i_ino)
> +		return 1;
>  
>  	/*
>  	 * do some unlocked checks first to avoid unnecceary lock traffic.
> diff --git a/fs/xfs/quota/xfs_qm_syscalls.c b/fs/xfs/quota/xfs_qm_syscalls.c
> index bdebc18..8b207fc 100644
> --- a/fs/xfs/quota/xfs_qm_syscalls.c
> +++ b/fs/xfs/quota/xfs_qm_syscalls.c
> @@ -875,6 +875,9 @@ xfs_dqrele_inode(
>  	struct xfs_perag	*pag,
>  	int			flags)
>  {
> +	if (!ip->i_ino)
> +		return ENOENT;
> +

Why do we need the check here again?  Having it in
xfs_inode_ag_walk_grab should be enough.

>  	/* skip quota inodes */
>  	if (ip == ip->i_mount->m_quotainfo->qi_uquotaip ||
>  	    ip == ip->i_mount->m_quotainfo->qi_gquotaip) {
> diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
> index 18991a9..edeb918 100644
> --- a/fs/xfs/xfs_iget.c
> +++ b/fs/xfs/xfs_iget.c
> @@ -69,6 +69,7 @@ xfs_inode_alloc(
>  	ASSERT(atomic_read(&ip->i_pincount) == 0);
>  	ASSERT(!spin_is_locked(&ip->i_flags_lock));
>  	ASSERT(completion_done(&ip->i_flush));
> +	ASSERT(ip->i_ino == 0);
>  
>  	mrlock_init(&ip->i_iolock, MRLOCK_BARRIER, "xfsio", ip->i_ino);
>  
> @@ -86,9 +87,6 @@ xfs_inode_alloc(
>  	ip->i_new_size = 0;
>  	ip->i_dirty_releases = 0;
>  
> -	/* prevent anyone from using this yet */
> -	VFS_I(ip)->i_state = I_NEW;
> -
>  	return ip;
>  }
>  
> @@ -135,6 +133,16 @@ xfs_inode_free(
>  	ASSERT(!spin_is_locked(&ip->i_flags_lock));
>  	ASSERT(completion_done(&ip->i_flush));
>  
> +	/*
> +	 * because we use SLAB_DESTROY_BY_RCU freeing, ensure the inode
> +	 * always appears to be reclaimed with an invalid inode number
> +	 * when in the free state. The ip->i_flags_lock provides the barrier
> +	 * against lookup races.
> +	 */
> +	spin_lock(&ip->i_flags_lock);
> +	ip->i_flags = XFS_IRECLAIM;
> +	ip->i_ino = 0;
> +	spin_unlock(&ip->i_flags_lock);
>  	kmem_zone_free(xfs_inode_zone, ip);
>  }
>  
> @@ -146,12 +154,28 @@ xfs_iget_cache_hit(
>  	struct xfs_perag	*pag,
>  	struct xfs_inode	*ip,
>  	int			flags,
> -	int			lock_flags) __releases(pag->pag_ici_lock)
> +	int			lock_flags) __releases(RCU)
>  {
>  	struct inode		*inode = VFS_I(ip);
>  	struct xfs_mount	*mp = ip->i_mount;
>  	int			error;
>  
> +	/*
> +	 * check for re-use of an inode within an RCU grace period due to the
> +	 * radix tree nodes not being updated yet. We monitor for this by
> +	 * setting the inode number to zero before freeing the inode structure.
> +	 * We don't need to recheck this after taking the i_flags_lock because
> +	 * the check against XFS_IRECLAIM will catch a freed inode.
> +	 */
> +	if (ip->i_ino == 0) {
> +		trace_xfs_iget_skip(ip);
> +		XFS_STATS_INC(xs_ig_frecycle);
> +		rcu_read_unlock();
> +		/* Expire the grace period so we don't trip over it again. */
> +		synchronize_rcu();
> +		return EAGAIN;
> +	}
> +
>  	spin_lock(&ip->i_flags_lock);
>  
>  	/*
> @@ -195,7 +219,7 @@ xfs_iget_cache_hit(
>  		ip->i_flags |= XFS_IRECLAIM;
>  
>  		spin_unlock(&ip->i_flags_lock);
> -		read_unlock(&pag->pag_ici_lock);
> +		rcu_read_unlock();
>  
>  		error = -inode_init_always(mp->m_super, inode);
>  		if (error) {
> @@ -203,7 +227,7 @@ xfs_iget_cache_hit(
>  			 * Re-initializing the inode failed, and we are in deep
>  			 * trouble.  Try to re-add it to the reclaim list.
>  			 */
> -			read_lock(&pag->pag_ici_lock);
> +			rcu_read_lock();
>  			spin_lock(&ip->i_flags_lock);
>  
>  			ip->i_flags &= ~XFS_INEW;
> @@ -231,7 +255,7 @@ xfs_iget_cache_hit(
>  
>  		/* We've got a live one. */
>  		spin_unlock(&ip->i_flags_lock);
> -		read_unlock(&pag->pag_ici_lock);
> +		rcu_read_unlock();
>  		trace_xfs_iget_hit(ip);
>  	}
>  
> @@ -245,7 +269,7 @@ xfs_iget_cache_hit(
>  
>  out_error:
>  	spin_unlock(&ip->i_flags_lock);
> -	read_unlock(&pag->pag_ici_lock);
> +	rcu_read_unlock();
>  	return error;
>  }
>  
> @@ -376,7 +400,7 @@ xfs_iget(
>  
>  again:
>  	error = 0;
> -	read_lock(&pag->pag_ici_lock);
> +	rcu_read_lock();
>  	ip = radix_tree_lookup(&pag->pag_ici_root, agino);
>  
>  	if (ip) {
> @@ -384,7 +408,7 @@ again:
>  		if (error)
>  			goto out_error_or_again;
>  	} else {
> -		read_unlock(&pag->pag_ici_lock);
> +		rcu_read_unlock();
>  		XFS_STATS_INC(xs_ig_missed);
>  
>  		error = xfs_iget_cache_miss(mp, pag, tp, ino, &ip,
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 108c7a0..25becb1 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -2000,13 +2000,14 @@ xfs_ifree_cluster(
>  		 */
>  		for (i = 0; i < ninodes; i++) {
>  retry:
> -			read_lock(&pag->pag_ici_lock);
> +			rcu_read_lock();
>  			ip = radix_tree_lookup(&pag->pag_ici_root,
>  					XFS_INO_TO_AGINO(mp, (inum + i)));
>  
>  			/* Inode not in memory or stale, nothing to do */
> -			if (!ip || xfs_iflags_test(ip, XFS_ISTALE)) {
> -				read_unlock(&pag->pag_ici_lock);
> +			if (!ip || !ip->i_ino ||
> +			    xfs_iflags_test(ip, XFS_ISTALE)) {
> +				rcu_read_unlock();
>  				continue;
>  			}
>  
> @@ -2019,11 +2020,11 @@ retry:
>  			 */
>  			if (ip != free_ip &&
>  			    !xfs_ilock_nowait(ip, XFS_ILOCK_EXCL)) {
> -				read_unlock(&pag->pag_ici_lock);
> +				rcu_read_unlock();
>  				delay(1);
>  				goto retry;
>  			}
> -			read_unlock(&pag->pag_ici_lock);
> +			rcu_read_unlock();
>  
>  			xfs_iflock(ip);
>  			xfs_iflags_set(ip, XFS_ISTALE);
> @@ -2629,7 +2630,7 @@ xfs_iflush_cluster(
>  
>  	mask = ~(((XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog)) - 1);
>  	first_index = XFS_INO_TO_AGINO(mp, ip->i_ino) & mask;
> -	read_lock(&pag->pag_ici_lock);
> +	rcu_read_lock();
>  	/* really need a gang lookup range call here */
>  	nr_found = radix_tree_gang_lookup(&pag->pag_ici_root, (void**)ilist,
>  					first_index, inodes_per_cluster);
> @@ -2640,6 +2641,11 @@ xfs_iflush_cluster(
>  		iq = ilist[i];
>  		if (iq == ip)
>  			continue;
> +
> +		/* check we've got a valid inode */
> +		if (!iq->i_ino)
> +			continue;
> +
>  		/* if the inode lies outside this cluster, we're done. */
>  		if ((XFS_INO_TO_AGINO(mp, iq->i_ino) & mask) != first_index)
>  			break;
> @@ -2692,7 +2698,7 @@ xfs_iflush_cluster(
>  	}
>  
>  out_free:
> -	read_unlock(&pag->pag_ici_lock);
> +	rcu_read_unlock();
>  	kmem_free(ilist);
>  out_put:
>  	xfs_perag_put(pag);
> @@ -2704,7 +2710,7 @@ cluster_corrupt_out:
>  	 * Corruption detected in the clustering loop.  Invalidate the
>  	 * inode buffer and shut down the filesystem.
>  	 */
> -	read_unlock(&pag->pag_ici_lock);
> +	rcu_read_unlock();
>  	/*
>  	 * Clean up the buffer.  If it was B_DELWRI, just release it --
>  	 * brelse can handle it with no problems.  If not, shut down the
> -- 
> 1.7.2.3
> 

* Re: [PATCH 08/16] xfs: convert pag_ici_lock to a spin lock
  2010-11-08  8:55 ` [PATCH 08/16] xfs: convert pag_ici_lock to a spin lock Dave Chinner
@ 2010-11-08 23:10   ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2010-11-08 23:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, Nov 08, 2010 at 07:55:11PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Now that we are using RCU protection for the inode cache lookups,
> the lock is only needed on the modification side. Hence it is not
> necessary for the lock to be a rwlock as there are no read side
> holders anymore. Convert it to a spin lock to reflect its exclusive
> nature.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Alex Elder <aelder@sgi.com>

Looks good, and this will make Thomas happy given that XFS is now
rwlock_t-free.

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 10/16] xfs: add a lru to the XFS buffer cache
  2010-11-08  8:55 ` [PATCH 10/16] xfs: add a lru to the XFS buffer cache Dave Chinner
@ 2010-11-08 23:19   ` Christoph Hellwig
  2010-11-08 23:45     ` Dave Chinner
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2010-11-08 23:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> @@ -471,6 +546,8 @@ _xfs_buf_find(
>  		/* the buffer keeps the perag reference until it is freed */
>  		new_bp->b_pag = pag;
>  		spin_unlock(&pag->pag_buf_lock);
> +
> +		xfs_buf_lru_add(new_bp);

Why do we add the buffer to the lru when we find it?  Normally we
would remove it here (unless we want a lazy lru scheme), and potentially
increment b_lru_ref - although that seems to be done by the callers
in the next patch.


* Re: [PATCH 10/16] xfs: add a lru to the XFS buffer cache
  2010-11-08 23:19   ` Christoph Hellwig
@ 2010-11-08 23:45     ` Dave Chinner
  0 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2010-11-08 23:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Mon, Nov 08, 2010 at 06:19:28PM -0500, Christoph Hellwig wrote:
> > @@ -471,6 +546,8 @@ _xfs_buf_find(
> >  		/* the buffer keeps the perag reference until it is freed */
> >  		new_bp->b_pag = pag;
> >  		spin_unlock(&pag->pag_buf_lock);
> > +
> > +		xfs_buf_lru_add(new_bp);
> 
> Why do we add the buffer to the lru when we find it?  Normally we
> would remove it here (unless we want a lazy lru scheme),

Oh, I forgot to remove that from the patch when rewriting it to use
lazy updates. Good catch! (*)

>
> and potentially increment b_lru_ref - although that seems to be
> done by the callers in the next patch.

b_lru_ref is initialised to 1 when the buffer is first initialised,
so it doesn't need to be done here. And yes, the next patch allows
the users of the buffers to set the reclaim reference count
themselves when the buffer is read for prioritisation. Ideally I
don't want to have to touch the reclaim state of the buffer during
xfs_buf_find....

Cheers,

Dave.

(*) Catching this sort of bug is exactly why I posted the series
early. Little details are easy to miss in a forest of changes this
large....
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 11/16] xfs: connect up buffer reclaim priority hooks
  2010-11-08 11:25   ` Christoph Hellwig
@ 2010-11-08 23:50     ` Dave Chinner
  0 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2010-11-08 23:50 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Mon, Nov 08, 2010 at 06:25:09AM -0500, Christoph Hellwig wrote:
> > +/*
> > + * buffer types 
> > + */
> > +#define	B_FS_DQUOT	1
> > +#define	B_FS_AGFL	2
> > +#define	B_FS_AGF	3
> > +#define	B_FS_ATTR_BTREE	4
> > +#define	B_FS_DIR_BTREE	5
> > +#define	B_FS_MAP	6
> > +#define	B_FS_INOMAP	7
> > +#define	B_FS_AGI	8
> > +#define	B_FS_INO	9
> 
> Is there any good reason to keep/reintroduce the buffer types?  In this
> series we're only using the refcounts, and I can't see any good use for
> the types either.

I just wanted to use the existing hooks without modifying them to be
plain "set reference" hooks. I'm not really fussed - this was just
the simplest way to get the hooks working.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 13/16] xfs: reduce the number of AIL push wakeups
  2010-11-08 11:32   ` Christoph Hellwig
@ 2010-11-08 23:51     ` Dave Chinner
  0 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2010-11-08 23:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Mon, Nov 08, 2010 at 06:32:04AM -0500, Christoph Hellwig wrote:
> >  STATIC int
> > @@ -850,8 +853,17 @@ xfsaild(
> >  	long		tout = 0; /* milliseconds */
> >  
> >  	while (!kthread_should_stop()) {
> > -		schedule_timeout_interruptible(tout ?
> > +		/*
> > +		 * for short sleeps indicating congestion, don't allow us to
> > +		 * get woken early. Otherwise all we do is bang on the AIL lock
> > +		 * without making progress.
> > +		 */
> > +		if (tout && tout <= 20) {
> > +			schedule_timeout_uninterruptible(msecs_to_jiffies(tout));
> > +		} else {
> > +			schedule_timeout_interruptible(tout ?
> >  				msecs_to_jiffies(tout) : MAX_SCHEDULE_TIMEOUT);
> > +		}
> 
> How about just setting the state ourselves and calling schedule_timeout?
> That seems a lot more readable to me.  Also we can switch to
> TASK_KILLABLE for the short sleeps, just to not introduce any delay
> in shutting down aild when kthread_stop is called.  It would look
> something like this:
> 
> 		if (tout && tout <= 20)
> 			__set_current_state(TASK_KILLABLE);
> 		else
> 			__set_current_state(TASK_UNINTERRUPTIBLE);
> 		schedule_timeout(tout ?
> 				 msecs_to_jiffies(tout) : MAX_SCHEDULE_TIMEOUT);

Yes, seems reasonable. I'll convert it to do this.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 05/16] xfs: don't truncate prealloc from frequently accessed inodes
  2010-11-08 11:36   ` Christoph Hellwig
@ 2010-11-08 23:56     ` Dave Chinner
  0 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2010-11-08 23:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Mon, Nov 08, 2010 at 06:36:45AM -0500, Christoph Hellwig wrote:
> I'd be much more happy about fixing this properly in nfsd. 

So would I, but we've been saying that for years and it still ain't
done....

> But I guess
> the fix is simple enough that we can put it into XFS for now.  Any
> reason you use up a whole int in the inode instead of using a flag in
> i_flags?

I wasn't sure how many dirty releases we wanted before triggering
the change of behaviour. A single dirty release seems to be fine in
my testing so far, and if that continues then I think that, like
you suggest, changing it to a flag in i_flags is the right thing to
do.

> > -
> > -		ASSERT(ip->i_delayed_blks == 0);
> > +		/*
> > +		 * even after flushing the inode, there can still be delalloc
> > +		 * blocks on the inode beyond EOF due to speculative
> > +		 * preallocation. These are not removed until the release
> > +		 * function is called or the inode is inactivated. Hence we
> > +		 * cannot assert here that ip->i_delayed_blks == 0.
> > +		 */
> 
> Shouldn't this be in a separate patch given that we can fail the flush
> due to iolock contention?  I think this and the swapext fix are .37
> material in fact.

Agreed. I should have noted in the series preamble that I thought these
probably need splitting out into separate bug fixing patches rather
than being lumped in here.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 04/16] xfs: dynamic speculative EOF preallocation
  2010-11-08 11:43   ` Christoph Hellwig
@ 2010-11-09  0:08     ` Dave Chinner
  0 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2010-11-09  0:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Mon, Nov 08, 2010 at 06:43:25AM -0500, Christoph Hellwig wrote:
> > For default settings, ???e size and the initial extents is determined
> 
> weird character.
> 
> > The allocsize mount option still controls the minimum preallocation size, so
> > the smallest extent size can stil be bound in situations where this behaviour
> > is not sufficient.
> 
> Do we also need a way to keep an upper boundary?  Think lots of slowly
> growing log files on a filesystem not having tons of free space.

Perhaps - it's one of the things I've been debating backwards and
forwards and done nothing about yet.

It's hard to trim back preallocation before we hit ENOSPC via a
static threshold (e.g. 1% free space could be terabytes of space), but
once ENOSPC is hit we drop new preallocation completely. Perhaps a
gradual decrease in the maximum prealloc size based on freespace
remaining? e.g.

freespace	max prealloc size
  >5%		  full extent (8GB)
  4-5%		   2GB (8GB >> 2)
  3-4%		   1GB (8GB >> 3)
  2-3%		 512MB (8GB >> 4)
  1-2%		 256MB (8GB >> 5)
  <1%		 128MB (8GB >> 6)

I'm open to other ideas on what to do here.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 03/16] [RFC] xfs: use generic per-cpu counter infrastructure
  2010-11-08 12:13   ` Christoph Hellwig
@ 2010-11-09  0:20     ` Dave Chinner
  0 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2010-11-09  0:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Mon, Nov 08, 2010 at 07:13:22AM -0500, Christoph Hellwig wrote:
> On Mon, Nov 08, 2010 at 07:55:06PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > XFS has a per-cpu counter implementation for in-core superblock
> > counters that pre-dated the generic implementation. It is complex
> > and baroque as it is tailored directly to the needs of ENOSPC
> > detection. Implement the complex accurate-compare-and-add
> > calculation in the generic per-cpu counter code and convert the
> > XFS counters to use the much simpler generic counter code.
> > 
> > Passes xfsqa on SMP system.
> 
> Some mostly cosmetic comments below.  I haven't looked at the more
> hairy bits like the changes to the generic percpu code and the
> reservation handling yet.
> 
> > 	1. kill the no-per-cpu-counter mode?
> 
> already done.
> 
> > 	3. do we need to factor xfs_mod_sb_incore()?
> 
> Doesn't exist anymore. 

Ah, forgot to update the commit message ;)

> > -	xfs_icsb_sync_counters(mp, XFS_ICSB_LAZY_COUNT);
> > +	xfs_icsb_sync_counters(mp);
> >  	spin_lock(&mp->m_sb_lock);
> 
> Can be moved inside the lock and use the unlocked version, too.

OK, I just went for the straight transformation approach.

> > +static inline int
> > +xfs_icsb_add(
> > +	struct xfs_mount	*mp,
> > +	int			counter,
> > +	int64_t			delta,
> > +	int64_t			threshold)
> > +{
> > +	int			ret;
> > +
> > +	ret = percpu_counter_add_unless_lt(&mp->m_icsb[counter], delta,
> > +								threshold);
> > +	if (ret < 0)
> > +		return -ENOSPC;
> > +	return 0;
> > +}
> > +
> > +static inline void
> > +xfs_icsb_set(
> > +	struct xfs_mount	*mp,
> > +	int			counter,
> > +	int64_t			value)
> > +{
> > +	percpu_counter_set(&mp->m_icsb[counter], value);
> > +}
> > +
> > +static inline int64_t
> > +xfs_icsb_sum(
> > +	struct xfs_mount	*mp,
> > +	int			counter)
> > +{
> > +	return percpu_counter_sum_positive(&mp->m_icsb[counter]);
> > +}
> > +
> > +static inline int64_t
> > +xfs_icsb_read(
> > +	struct xfs_mount	*mp,
> > +	int			counter)
> > +{
> > +	return percpu_counter_read_positive(&mp->m_icsb[counter]);
> > +}
> 
> I would just opencode all these helpers in their callers.  There's
> generally just one caller of each, which iterates over the three
> counters anyway.

That seems reasonable, but I had a good reason for adding the
wrappers: I'm not sure that the fixed percpu counter batch
size (32) scales well enough for large systems. In the bdi code, a
custom batch size that is logarithmically scaled with the number of
CPUs is used, and I suspect we'll need to do this here, too. Hence
I'd like to keep the wrappers to minimise the number of places we'd
need to modify to handle customised batch sizes.
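
To illustrate - the bdi code sizes its batch as BDI_STAT_BATCH,
i.e. 8 * (1 + ilog2(nr_cpu_ids)). A userspace sketch of a similarly
scaled batch for the icsb counters (icsb_batch_size() is a made-up
name for illustration):

```c
#include <assert.h>

/* integer log2 for runtime values, like the kernel's ilog2() */
static int ilog2(unsigned int n)
{
	int r = -1;

	while (n) {
		n >>= 1;
		r++;
	}
	return r;
}

/*
 * Batch scaled like BDI_STAT_BATCH: grows logarithmically with the
 * CPU count instead of the fixed percpu_counter default of 32.
 */
static int icsb_batch_size(unsigned int nr_cpus)
{
	return 8 * (1 + ilog2(nr_cpus));
}
```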

> > +int
> > +xfs_icsb_modify_counters(
> > +	xfs_mount_t	*mp,
> > +	xfs_sb_field_t	field,
> > +	int64_t		delta,
> > +	int		rsvd)
> 
> I can't see the point of keeping this multiplexer.  The inode counts
> are handled entirely different from the block count, so they should
> have separate functions.

I just went for the simple approach - I wanted to get it working
without having to modify lots of other code. Now that it is working,
I can see why getting rid of the wrapper altogether would be good.

> 
> > +{
> > +	int64_t		lcounter;
> > +	int64_t		res_used;
> > +	int		ret = 0;
> > +
> > +
> > +	switch (field) {
> > +	case XFS_SBS_ICOUNT:
> > +		ret = xfs_icsb_add(mp, XFS_ICSB_ICOUNT, delta, 0);
> > +		if (ret < 0) {
> > +			ASSERT(0);
> > +			return XFS_ERROR(EINVAL);
> > +		}
> > +		return 0;
> > +
> > +	case XFS_SBS_IFREE:
> > +		ret = xfs_icsb_add(mp, XFS_ICSB_IFREE, delta, 0);
> > +		if (ret < 0) {
> > +			ASSERT(0);
> > +			return XFS_ERROR(EINVAL);
> > +		}
> > +		return 0;
> 
> If you're keeping a common helper for both inode counts this can be
> simplified by sharing the code and just passing on the field instead
> of having two cases.
> 
> > +	struct percpu_counter	m_icsb[XFS_ICSB_MAX];
> 
> I wonder if there's all that much of a point in keeping the array.
> We basically only use the fact it's an array for the init/destroy
> code.  Maybe it would be a tad cleaner to just have three separate
> percpu counters.

Not sure - I'd like to extend the per-cpu counters to more fields in
the superblock (e.g. the rt extent counter), and having an array
makes that pretty simple...


> > +++ b/include/linux/percpu_counter.h
> > @@ -41,6 +41,8 @@ void percpu_counter_set(struct percpu_counter *fbc, s64 amount);
> >  void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch);
> >  s64 __percpu_counter_sum(struct percpu_counter *fbc);
> >  int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs);
> > +int percpu_counter_add_unless_lt(struct percpu_counter *fbc, s64 amount,
> > +							s64 threshold);
> >  
> >  static inline void percpu_counter_add(struct percpu_counter *fbc, s64 amount)
> >  {
> > @@ -153,6 +155,20 @@ static inline int percpu_counter_initialized(struct percpu_counter *fbc)
> >  	return 1;
> >  }
> >  
> > +static inline int percpu_counter_test_and_add_delta(struct percpu_counter *fbc, s64 delta)
> 
> This doesn't match the function provided for CONFIG_SMP.
> 

Doh - I hadn't retested UP since I renamed the function that did all
the work.

And I just realised that with UP using the icsb functions, I
can kill all the cases in the locked variant....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 00/16] xfs: current patch stack for 2.6.38 window
  2010-11-08 14:17 ` [PATCH 00/16] xfs: current patch stack for 2.6.38 window Christoph Hellwig
@ 2010-11-09  0:21   ` Dave Chinner
  0 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2010-11-09  0:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Mon, Nov 08, 2010 at 09:17:46AM -0500, Christoph Hellwig wrote:
> On Mon, Nov 08, 2010 at 07:55:03PM +1100, Dave Chinner wrote:
> > My tree is currently based on the VFS locking changes I have out for review,
> > so there's a couple of patches that won't apply sanely to a mainline or OSS xfs
> > dev tree. See below for a pointer to a git tree with all the patches in it.
> 
> The only thing that should depend on it is the inode hash changes.  I
> suspect it might be a better idea if we feed those via Al together with
> the VFS scalability bits, and only feed the rest through the XFS tree to
> avoid having nasty dependencies.

Yes, sounds reasonable.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 07/16] xfs: convert inode cache lookups to use RCU locking
  2010-11-08 23:09   ` Christoph Hellwig
@ 2010-11-09  0:24     ` Dave Chinner
  2010-11-09  3:36     ` Paul E. McKenney
  1 sibling, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2010-11-09  0:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: paulmck, eric.dumazet, xfs

On Mon, Nov 08, 2010 at 06:09:29PM -0500, Christoph Hellwig wrote:
> This patch generally looks good to me, but with so much RCU magic I'd prefer
> if Paul & Eric could look over it.
> 
> On Mon, Nov 08, 2010 at 07:55:10PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > With delayed logging greatly increasing the sustained parallelism of inode
> > operations, the inode cache locking is showing significant read vs write
> > contention when inode reclaim runs at the same time as lookups. There is
> > also a lot more write lock acquisitions than there are read locks (4:1 ratio)
> > so the read locking is not really buying us much in the way of parallelism.
> > 
> > To avoid the read vs write contention, change the cache to use RCU locking on
> > the read side. To avoid needing to RCU free every single inode, use the built
> > in slab RCU freeing mechanism. This requires us to be able to detect lookups of
> > freed inodes, so ensure that every freed inode has an inode number of zero and
> > the XFS_IRECLAIM flag set. We already check the XFS_IRECLAIM flag in the cache
> > hit lookup path, but also add a check for a zero inode number as well.
> > 
> > We can then convert all the read locking lookups to use RCU read side locking
> > and hence remove all read side locking.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Reviewed-by: Alex Elder <aelder@sgi.com>
> > ---
> >  fs/xfs/linux-2.6/xfs_iops.c    |    7 +++++-
> >  fs/xfs/linux-2.6/xfs_sync.c    |   13 +++++++++--
> >  fs/xfs/quota/xfs_qm_syscalls.c |    3 ++
> >  fs/xfs/xfs_iget.c              |   44 ++++++++++++++++++++++++++++++---------
> >  fs/xfs/xfs_inode.c             |   22 ++++++++++++-------
> >  5 files changed, 67 insertions(+), 22 deletions(-)
> > 
> > diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
> > index 8b46867..909bd9c 100644
> > --- a/fs/xfs/linux-2.6/xfs_iops.c
> > +++ b/fs/xfs/linux-2.6/xfs_iops.c
> > @@ -757,6 +757,8 @@ xfs_diflags_to_iflags(
> >   * We don't use the VFS inode hash for lookups anymore, so make the inode look
> >   * hashed to the VFS by faking it. This avoids needing to touch inode hash
> >   * locks in this path, but makes the VFS believe the inode is validly hashed.
> > + * We initialise i_state and i_hash under the i_lock so that we follow the same
> > + * setup rules that the rest of the VFS follows.
> >   */
> >  void
> >  xfs_setup_inode(
> > @@ -765,10 +767,13 @@ xfs_setup_inode(
> >  	struct inode		*inode = &ip->i_vnode;
> >  
> >  	inode->i_ino = ip->i_ino;
> > +
> > +	spin_lock(&inode->i_lock);
> >  	inode->i_state = I_NEW;
> > +	hlist_nulls_add_fake(&inode->i_hash);
> > +	spin_unlock(&inode->i_lock);
> 
> This screams for another VFS helper, even if it's XFS-specific for now.
> Having to duplicate inode.c-private locking rules in XFS seems a bit
> nasty to me.

Agreed. I was thinking that it would be a good idea to do this, but
I hadn't decided on how to do it yet....

> > diff --git a/fs/xfs/quota/xfs_qm_syscalls.c b/fs/xfs/quota/xfs_qm_syscalls.c
> > index bdebc18..8b207fc 100644
> > --- a/fs/xfs/quota/xfs_qm_syscalls.c
> > +++ b/fs/xfs/quota/xfs_qm_syscalls.c
> > @@ -875,6 +875,9 @@ xfs_dqrele_inode(
> >  	struct xfs_perag	*pag,
> >  	int			flags)
> >  {
> > +	if (!ip->i_ino)
> > +		return ENOENT;
> > +
> 
> Why do we need the check here again?  Having it in
> xfs_inode_ag_walk_grab should be enough.

Yes, you are right. I'll fix that.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 07/16] xfs: convert inode cache lookups to use RCU locking
  2010-11-08 23:09   ` Christoph Hellwig
  2010-11-09  0:24     ` Dave Chinner
@ 2010-11-09  3:36     ` Paul E. McKenney
  2010-11-09  5:04       ` Dave Chinner
  1 sibling, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2010-11-09  3:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: eric.dumazet, xfs

On Mon, Nov 08, 2010 at 06:09:29PM -0500, Christoph Hellwig wrote:
> This patch generally looks good to me, but with so much RCU magic I'd prefer
> if Paul & Eric could look over it.

Is there a git tree, tarball, or whatever?  For example, I don't see
how this patch handles the case of an inode being freed just as an RCU
reader gains a reference to it, but then reallocated as some other inode
(so that ->ino is nonzero) before the RCU reader gets a chance to actually
look at the inode.  But such a check might well be in the code that this
patch didn't change...

							Thanx, Paul

> On Mon, Nov 08, 2010 at 07:55:10PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > With delayed logging greatly increasing the sustained parallelism of inode
> > operations, the inode cache locking is showing significant read vs write
> > contention when inode reclaim runs at the same time as lookups. There is
> > also a lot more write lock acquisitions than there are read locks (4:1 ratio)
> > so the read locking is not really buying us much in the way of parallelism.
> > 
> > To avoid the read vs write contention, change the cache to use RCU locking on
> > the read side. To avoid needing to RCU free every single inode, use the built
> > in slab RCU freeing mechanism. This requires us to be able to detect lookups of
> > freed inodes, so ensure that every freed inode has an inode number of zero and
> > the XFS_IRECLAIM flag set. We already check the XFS_IRECLAIM flag in the cache
> > hit lookup path, but also add a check for a zero inode number as well.
> > 
> > We can then convert all the read locking lookups to use RCU read side locking
> > and hence remove all read side locking.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Reviewed-by: Alex Elder <aelder@sgi.com>
> > ---
> >  fs/xfs/linux-2.6/xfs_iops.c    |    7 +++++-
> >  fs/xfs/linux-2.6/xfs_sync.c    |   13 +++++++++--
> >  fs/xfs/quota/xfs_qm_syscalls.c |    3 ++
> >  fs/xfs/xfs_iget.c              |   44 ++++++++++++++++++++++++++++++---------
> >  fs/xfs/xfs_inode.c             |   22 ++++++++++++-------
> >  5 files changed, 67 insertions(+), 22 deletions(-)
> > 
> > diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
> > index 8b46867..909bd9c 100644
> > --- a/fs/xfs/linux-2.6/xfs_iops.c
> > +++ b/fs/xfs/linux-2.6/xfs_iops.c
> > @@ -757,6 +757,8 @@ xfs_diflags_to_iflags(
> >   * We don't use the VFS inode hash for lookups anymore, so make the inode look
> >   * hashed to the VFS by faking it. This avoids needing to touch inode hash
> >   * locks in this path, but makes the VFS believe the inode is validly hashed.
> > + * We initialise i_state and i_hash under the i_lock so that we follow the same
> > + * setup rules that the rest of the VFS follows.
> >   */
> >  void
> >  xfs_setup_inode(
> > @@ -765,10 +767,13 @@ xfs_setup_inode(
> >  	struct inode		*inode = &ip->i_vnode;
> >  
> >  	inode->i_ino = ip->i_ino;
> > +
> > +	spin_lock(&inode->i_lock);
> >  	inode->i_state = I_NEW;
> > +	hlist_nulls_add_fake(&inode->i_hash);
> > +	spin_unlock(&inode->i_lock);
> 
> This screams for another VFS helper, even if it's XFS-specific for now.
> Having to duplicate inode.c-private locking rules in XFS seems a bit
> nasty to me.
> 
> >  
> >  	inode_sb_list_add(inode);
> > -	hlist_nulls_add_fake(&inode->i_hash);
> >  
> >  	inode->i_mode	= ip->i_d.di_mode;
> >  	inode->i_nlink	= ip->i_d.di_nlink;
> > diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
> > index afb0d7c..9a53cc9 100644
> > --- a/fs/xfs/linux-2.6/xfs_sync.c
> > +++ b/fs/xfs/linux-2.6/xfs_sync.c
> > @@ -53,6 +53,10 @@ xfs_inode_ag_walk_grab(
> >  {
> >  	struct inode		*inode = VFS_I(ip);
> >  
> > +	/* check for stale RCU freed inode */
> > +	if (!ip->i_ino)
> > +		return ENOENT;
> 
> Assuming i_ino is never 0 is fine for XFS, unlike for the generic VFS
> code, so ACK.
> 
> >  	/* nothing to sync during shutdown */
> >  	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
> >  		return EFSCORRUPTED;
> > @@ -98,12 +102,12 @@ restart:
> >  		int		error = 0;
> >  		int		i;
> >  
> > -		read_lock(&pag->pag_ici_lock);
> > +		rcu_read_lock();
> >  		nr_found = radix_tree_gang_lookup(&pag->pag_ici_root,
> >  					(void **)batch, first_index,
> >  					XFS_LOOKUP_BATCH);
> >  		if (!nr_found) {
> > -			read_unlock(&pag->pag_ici_lock);
> > +			rcu_read_unlock();
> >  			break;
> >  		}
> >  
> > @@ -129,7 +133,7 @@ restart:
> >  		}
> >  
> >  		/* unlock now we've grabbed the inodes. */
> > -		read_unlock(&pag->pag_ici_lock);
> > +		rcu_read_unlock();
> >  
> >  		for (i = 0; i < nr_found; i++) {
> >  			if (!batch[i])
> > @@ -639,6 +643,9 @@ xfs_reclaim_inode_grab(
> >  	struct xfs_inode	*ip,
> >  	int			flags)
> >  {
> > +	/* check for stale RCU freed inode */
> > +	if (!ip->i_ino)
> > +		return 1;
> >  
> >  	/*
> >  	 * do some unlocked checks first to avoid unnecceary lock traffic.
> > diff --git a/fs/xfs/quota/xfs_qm_syscalls.c b/fs/xfs/quota/xfs_qm_syscalls.c
> > index bdebc18..8b207fc 100644
> > --- a/fs/xfs/quota/xfs_qm_syscalls.c
> > +++ b/fs/xfs/quota/xfs_qm_syscalls.c
> > @@ -875,6 +875,9 @@ xfs_dqrele_inode(
> >  	struct xfs_perag	*pag,
> >  	int			flags)
> >  {
> > +	if (!ip->i_ino)
> > +		return ENOENT;
> > +
> 
> Why do we need the check here again?  Having it in
> xfs_inode_ag_walk_grab should be enough.
> 
> >  	/* skip quota inodes */
> >  	if (ip == ip->i_mount->m_quotainfo->qi_uquotaip ||
> >  	    ip == ip->i_mount->m_quotainfo->qi_gquotaip) {
> > diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
> > index 18991a9..edeb918 100644
> > --- a/fs/xfs/xfs_iget.c
> > +++ b/fs/xfs/xfs_iget.c
> > @@ -69,6 +69,7 @@ xfs_inode_alloc(
> >  	ASSERT(atomic_read(&ip->i_pincount) == 0);
> >  	ASSERT(!spin_is_locked(&ip->i_flags_lock));
> >  	ASSERT(completion_done(&ip->i_flush));
> > +	ASSERT(ip->i_ino == 0);
> >  
> >  	mrlock_init(&ip->i_iolock, MRLOCK_BARRIER, "xfsio", ip->i_ino);
> >  
> > @@ -86,9 +87,6 @@ xfs_inode_alloc(
> >  	ip->i_new_size = 0;
> >  	ip->i_dirty_releases = 0;
> >  
> > -	/* prevent anyone from using this yet */
> > -	VFS_I(ip)->i_state = I_NEW;
> > -
> >  	return ip;
> >  }
> >  
> > @@ -135,6 +133,16 @@ xfs_inode_free(
> >  	ASSERT(!spin_is_locked(&ip->i_flags_lock));
> >  	ASSERT(completion_done(&ip->i_flush));
> >  
> > +	/*
> > +	 * because we use SLAB_DESTROY_BY_RCU freeing, ensure the inode
> > +	 * always appears to be reclaimed with an invalid inode number
> > +	 * when in the free state. The ip->i_flags_lock provides the barrier
> > +	 * against lookup races.
> > +	 */
> > +	spin_lock(&ip->i_flags_lock);
> > +	ip->i_flags = XFS_IRECLAIM;
> > +	ip->i_ino = 0;
> > +	spin_unlock(&ip->i_flags_lock);
> >  	kmem_zone_free(xfs_inode_zone, ip);
> >  }
> >  
> > @@ -146,12 +154,28 @@ xfs_iget_cache_hit(
> >  	struct xfs_perag	*pag,
> >  	struct xfs_inode	*ip,
> >  	int			flags,
> > -	int			lock_flags) __releases(pag->pag_ici_lock)
> > +	int			lock_flags) __releases(RCU)
> >  {
> >  	struct inode		*inode = VFS_I(ip);
> >  	struct xfs_mount	*mp = ip->i_mount;
> >  	int			error;
> >  
> > +	/*
> > +	 * check for re-use of an inode within an RCU grace period due to the
> > +	 * radix tree nodes not being updated yet. We monitor for this by
> > +	 * setting the inode number to zero before freeing the inode structure.
> > +	 * We don't need to recheck this after taking the i_flags_lock because
> > +	 * the check against XFS_IRECLAIM will catch a freed inode.
> > +	 */
> > +	if (ip->i_ino == 0) {
> > +		trace_xfs_iget_skip(ip);
> > +		XFS_STATS_INC(xs_ig_frecycle);
> > +		rcu_read_unlock();
> > +		/* Expire the grace period so we don't trip over it again. */
> > +		synchronize_rcu();
> > +		return EAGAIN;
> > +	}
> > +
> >  	spin_lock(&ip->i_flags_lock);
> >  
> >  	/*
> > @@ -195,7 +219,7 @@ xfs_iget_cache_hit(
> >  		ip->i_flags |= XFS_IRECLAIM;
> >  
> >  		spin_unlock(&ip->i_flags_lock);
> > -		read_unlock(&pag->pag_ici_lock);
> > +		rcu_read_unlock();
> >  
> >  		error = -inode_init_always(mp->m_super, inode);
> >  		if (error) {
> > @@ -203,7 +227,7 @@ xfs_iget_cache_hit(
> >  			 * Re-initializing the inode failed, and we are in deep
> >  			 * trouble.  Try to re-add it to the reclaim list.
> >  			 */
> > -			read_lock(&pag->pag_ici_lock);
> > +			rcu_read_lock();
> >  			spin_lock(&ip->i_flags_lock);
> >  
> >  			ip->i_flags &= ~XFS_INEW;
> > @@ -231,7 +255,7 @@ xfs_iget_cache_hit(
> >  
> >  		/* We've got a live one. */
> >  		spin_unlock(&ip->i_flags_lock);
> > -		read_unlock(&pag->pag_ici_lock);
> > +		rcu_read_unlock();
> >  		trace_xfs_iget_hit(ip);
> >  	}
> >  
> > @@ -245,7 +269,7 @@ xfs_iget_cache_hit(
> >  
> >  out_error:
> >  	spin_unlock(&ip->i_flags_lock);
> > -	read_unlock(&pag->pag_ici_lock);
> > +	rcu_read_unlock();
> >  	return error;
> >  }
> >  
> > @@ -376,7 +400,7 @@ xfs_iget(
> >  
> >  again:
> >  	error = 0;
> > -	read_lock(&pag->pag_ici_lock);
> > +	rcu_read_lock();
> >  	ip = radix_tree_lookup(&pag->pag_ici_root, agino);
> >  
> >  	if (ip) {
> > @@ -384,7 +408,7 @@ again:
> >  		if (error)
> >  			goto out_error_or_again;
> >  	} else {
> > -		read_unlock(&pag->pag_ici_lock);
> > +		rcu_read_unlock();
> >  		XFS_STATS_INC(xs_ig_missed);
> >  
> >  		error = xfs_iget_cache_miss(mp, pag, tp, ino, &ip,
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 108c7a0..25becb1 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -2000,13 +2000,14 @@ xfs_ifree_cluster(
> >  		 */
> >  		for (i = 0; i < ninodes; i++) {
> >  retry:
> > -			read_lock(&pag->pag_ici_lock);
> > +			rcu_read_lock();
> >  			ip = radix_tree_lookup(&pag->pag_ici_root,
> >  					XFS_INO_TO_AGINO(mp, (inum + i)));
> >  
> >  			/* Inode not in memory or stale, nothing to do */
> > -			if (!ip || xfs_iflags_test(ip, XFS_ISTALE)) {
> > -				read_unlock(&pag->pag_ici_lock);
> > +			if (!ip || !ip->i_ino ||
> > +			    xfs_iflags_test(ip, XFS_ISTALE)) {
> > +				rcu_read_unlock();
> >  				continue;
> >  			}
> >  
> > @@ -2019,11 +2020,11 @@ retry:
> >  			 */
> >  			if (ip != free_ip &&
> >  			    !xfs_ilock_nowait(ip, XFS_ILOCK_EXCL)) {
> > -				read_unlock(&pag->pag_ici_lock);
> > +				rcu_read_unlock();
> >  				delay(1);
> >  				goto retry;
> >  			}
> > -			read_unlock(&pag->pag_ici_lock);
> > +			rcu_read_unlock();
> >  
> >  			xfs_iflock(ip);
> >  			xfs_iflags_set(ip, XFS_ISTALE);
> > @@ -2629,7 +2630,7 @@ xfs_iflush_cluster(
> >  
> >  	mask = ~(((XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog)) - 1);
> >  	first_index = XFS_INO_TO_AGINO(mp, ip->i_ino) & mask;
> > -	read_lock(&pag->pag_ici_lock);
> > +	rcu_read_lock();
> >  	/* really need a gang lookup range call here */
> >  	nr_found = radix_tree_gang_lookup(&pag->pag_ici_root, (void**)ilist,
> >  					first_index, inodes_per_cluster);
> > @@ -2640,6 +2641,11 @@ xfs_iflush_cluster(
> >  		iq = ilist[i];
> >  		if (iq == ip)
> >  			continue;
> > +
> > +		/* check we've got a valid inode */
> > +		if (!iq->i_ino)
> > +			continue;
> > +
> >  		/* if the inode lies outside this cluster, we're done. */
> >  		if ((XFS_INO_TO_AGINO(mp, iq->i_ino) & mask) != first_index)
> >  			break;
> > @@ -2692,7 +2698,7 @@ xfs_iflush_cluster(
> >  	}
> >  
> >  out_free:
> > -	read_unlock(&pag->pag_ici_lock);
> > +	rcu_read_unlock();
> >  	kmem_free(ilist);
> >  out_put:
> >  	xfs_perag_put(pag);
> > @@ -2704,7 +2710,7 @@ cluster_corrupt_out:
> >  	 * Corruption detected in the clustering loop.  Invalidate the
> >  	 * inode buffer and shut down the filesystem.
> >  	 */
> > -	read_unlock(&pag->pag_ici_lock);
> > +	rcu_read_unlock();
> >  	/*
> >  	 * Clean up the buffer.  If it was B_DELWRI, just release it --
> >  	 * brelse can handle it with no problems.  If not, shut down the
> > -- 
> > 1.7.2.3
> > 
> ---end quoted text---


* Re: [PATCH 07/16] xfs: convert inode cache lookups to use RCU locking
  2010-11-09  3:36     ` Paul E. McKenney
@ 2010-11-09  5:04       ` Dave Chinner
  2010-11-10  5:12         ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2010-11-09  5:04 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Christoph Hellwig, eric.dumazet, xfs

On Mon, Nov 08, 2010 at 07:36:28PM -0800, Paul E. McKenney wrote:
> On Mon, Nov 08, 2010 at 06:09:29PM -0500, Christoph Hellwig wrote:
> > This patch generally looks good to me, but with so much RCU magic I'd prefer
> > if Paul & Eric could look over it.
> 
> Is there a git tree, tarball, or whatever? 

git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev.git working

contains the series that this patch is in.

> For example, I don't see
> how this patch handles the case of an inode being freed just as an RCU
> reader gains a reference to it,

The XFS_IRECLAIM flag is set on inodes as they transition into the
reclaim state, long before they are freed, and it is left set once
the inode is freed. Hence lookups in xfs_iget_cache_hit() will see
it.

If the inode has been reallocated, the inode number will not yet be
set, or the inode state will have changed to XFS_INEW, both of which
xfs_iget_cache_hit() will also reject.

> but then reallocated as some other inode
> (so that ->ino is nonzero) before the RCU reader gets a chance to actually
> look at the inode.

XFS_INEW is not cleared until well after a new ->i_ino is set, so
the lookup should trip over XFS_INEW in that case. I think that
I may need to move the inode number check under the i_flags_lock
after validating the flags - more to check that we've got the
correct inode than to validate we have a freed inode.

> But such a check might well be in the code that this
> patch didn't change...

Yeah, most of the XFS code is already in a form compatible with such
RCU use because inodes have always had a quiescent "reclaimable"
state between active and reclaim (XFS_INEW -> active ->
XFS_IRECLAIMABLE -> XFS_IRECLAIM) where the inode can be reused
before being freed. The result is that lookups have always had to
handle races with inodes that have just transitioned into the
XFS_IRECLAIM state and hence cannot be immediately reused...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 07/16] xfs: convert inode cache lookups to use RCU locking
  2010-11-09  5:04       ` Dave Chinner
@ 2010-11-10  5:12         ` Paul E. McKenney
  2010-11-10  6:20           ` Dave Chinner
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2010-11-10  5:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, eric.dumazet, xfs

On Tue, Nov 09, 2010 at 04:04:17PM +1100, Dave Chinner wrote:
> On Mon, Nov 08, 2010 at 07:36:28PM -0800, Paul E. McKenney wrote:
> > On Mon, Nov 08, 2010 at 06:09:29PM -0500, Christoph Hellwig wrote:
> > > This patch generally looks good to me, but with so much RCU magic I'd prefer
> > > if Paul & Eric could look over it.
> > 
> > Is there a git tree, tarball, or whatever? 
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev.git working

Thank you -- I have downloaded this and will look it over.

Once the C++ guys get done grilling me on memory-model issues...

> contains the series that this patch is in.
> 
> > For example, I don't see
> > how this patch handles the case of an inode being freed just as an RCU
> > reader gains a reference to it,
> 
> XFS_IRECLAIM flag is set on inodes as they transition into the
> reclaim state long before they are freed. The XFS_IRECLAIM flag is left there once
> freed. Hence lookups in xfs_iget_cache_hit() will see this.
> 
> If the inode has been reallocated, the inode number will not yet be
> set, or the inode state will have changed to XFS_INEW, both of which
> xfs_iget_cache_hit() will also reject.
> 
> > but then reallocated as some other inode
> > (so that ->ino is nonzero) before the RCU reader gets a chance to actually
> > look at the inode.
> 
> XFS_INEW is not cleared until well after a new ->i_ino is set, so
> the lookup should trip over XFS_INEW in that case. I think that
> I may need to move the inode number check under the i_flags_lock
> after validating the flags - more to check that we've got the
> correct inode than to validate we have a freed inode.

OK, this sounds promising.  Of course, the next question is "how quickly
can the inode number be available for reuse?"

> > But such a check might well be in the code that this
> > patch didn't change...
> 
> Yeah, most of the XFS code is already in a form compatible with such
> RCU use because inodes have always had a quiescent "reclaimable"
> state between active and reclaim (XFS_INEW -> active ->
> XFS_IRECLAIMABLE -> XFS_IRECLAIM) where the inode can be reused
> before being freed. The result is that lookups have always had to
> handle races with inodes that have just transitioned into the
> XFS_IRECLAIM state and hence cannot be immediately reused...

Cool!!!

							Thanx, Paul


* Re: [PATCH 07/16] xfs: convert inode cache lookups to use RCU locking
  2010-11-10  5:12         ` Paul E. McKenney
@ 2010-11-10  6:20           ` Dave Chinner
  0 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2010-11-10  6:20 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Christoph Hellwig, eric.dumazet, xfs

On Tue, Nov 09, 2010 at 09:12:42PM -0800, Paul E. McKenney wrote:
> On Tue, Nov 09, 2010 at 04:04:17PM +1100, Dave Chinner wrote:
> > On Mon, Nov 08, 2010 at 07:36:28PM -0800, Paul E. McKenney wrote:
> > > On Mon, Nov 08, 2010 at 06:09:29PM -0500, Christoph Hellwig wrote:
> > > > This patch generally looks good to me, but with so much RCU magic I'd prefer
> > > > if Paul & Eric could look over it.
> > > 
> > > Is there a git tree, tarball, or whatever? 
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev.git working
> 
> Thank you -- I have downloaded this and will look it over.

fs/xfs/xfs_iget.c is the place to start - that's where the inode
cache lookups occur...

> Once the C++ guys get done grilling me on memory-model issues...
> 
> > contains the series that this patch is in.
> > 
> > > For example, I don't see
> > > how this patch handles the case of an inode being freed just as an RCU
> > > reader gains a reference to it,
> > 
> > XFS_IRECLAIM flag is set on inodes as they transition into the
> > reclaim state long before they are freed. The XFS_IRECLAIM flag is left there once
> > freed. Hence lookups in xfs_iget_cache_hit() will see this.
> > 
> > If the inode has been reallocated, the inode number will not yet be
> > set, or the inode state will have changed to XFS_INEW, both of which
> > xfs_iget_cache_hit() will also reject.
> > 
> > > but then reallocated as some other inode
> > > (so that ->ino is nonzero) before the RCU reader gets a chance to actually
> > > look at the inode.
> > 
> > XFS_INEW is not cleared until well after a new ->i_ino is set, so
> > the lookup should trip over XFS_INEW in that case. I think that
> > I may need to move the inode number check under the i_flags_lock
> > after validating the flags - more to check that we've got the
> > correct inode than to validate we have a freed inode.
> 
> OK, this sounds promising.  Of course, the next question is "how quickly
> can the inode number be available for reuse?"

Immediately. Indeed, an inode number can be reused even before the
inode is reclaimed.  However, looking at the case of having already
freed the inode when the new lookup comes in, I think checking
everything under the i_flags_lock is safe.

That is, if we've freed inode #X (@ &A) and find &A during the RCU
protected lookup for inode #X, the only way the inode number in the
structure at &A would match #X is if the new #X was reallocated
at &A again.  In that case, if the inode wasn't fully set up, we'd
find either XFS_INEW|XFS_IRECLAIM still set on it and we'd back off
and try the lookup again. However, if inode #X was reallocated at
address &B then the inode at &A would not match #X regardless of
whether &A had been reallocated or not.

Hence I think checking the inode number under the i_flags_lock after
checking XFS_INEW|XFS_IRECLAIM are not set is sufficient to validate
we have both an active inode and the correct inode.
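
To make that ordering concrete, here's a userspace sketch of the
check (the flag values and the lookup_valid() name are invented for
illustration - the real logic lives in xfs_iget_cache_hit(), run
under ip->i_flags_lock):

```c
#include <assert.h>
#include <stdint.h>

#define XFS_INEW	0x1
#define XFS_IRECLAIM	0x2

struct fake_inode {
	uint64_t	i_ino;
	unsigned int	i_flags;
	/* ip->i_flags_lock would be held around the checks below */
};

/*
 * Validate a candidate found by RCU lookup. With SLAB_DESTROY_BY_RCU
 * the memory may have been freed and reused within a grace period, so
 * under i_flags_lock we must check:
 *   1. the inode is not in a transient state (XFS_INEW | XFS_IRECLAIM)
 *   2. the inode number still matches what we looked up
 */
static int lookup_valid(struct fake_inode *ip, uint64_t ino)
{
	if (ip->i_flags & (XFS_INEW | XFS_IRECLAIM))
		return 0;	/* back off and retry the lookup */
	if (ip->i_ino != ino)
		return 0;	/* memory reused for a different inode */
	return 1;
}
```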

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


end of thread, other threads:[~2010-11-10  6:19 UTC | newest]

Thread overview: 42+ messages
2010-11-08  8:55 [PATCH 00/16] xfs: current patch stack for 2.6.38 window Dave Chinner
2010-11-08  8:55 ` [PATCH 01/16] xfs: fix per-ag reference counting in inode reclaim tree walking Dave Chinner
2010-11-08  9:23   ` Christoph Hellwig
2010-11-08  8:55 ` [PATCH 02/16] xfs: move delayed write buffer trace Dave Chinner
2010-11-08  9:24   ` Christoph Hellwig
2010-11-08  8:55 ` [PATCH 03/16] [RFC] xfs: use generic per-cpu counter infrastructure Dave Chinner
2010-11-08 12:13   ` Christoph Hellwig
2010-11-09  0:20     ` Dave Chinner
2010-11-08  8:55 ` [PATCH 04/16] xfs: dynamic speculative EOF preallocation Dave Chinner
2010-11-08 11:43   ` Christoph Hellwig
2010-11-09  0:08     ` Dave Chinner
2010-11-08  8:55 ` [PATCH 05/16] xfs: don't truncate prealloc from frequently accessed inodes Dave Chinner
2010-11-08 11:36   ` Christoph Hellwig
2010-11-08 23:56     ` Dave Chinner
2010-11-08  8:55 ` [PATCH 06/16] patch xfs-inode-hash-fake Dave Chinner
2010-11-08  9:19   ` Christoph Hellwig
2010-11-08  8:55 ` [PATCH 07/16] xfs: convert inode cache lookups to use RCU locking Dave Chinner
2010-11-08 23:09   ` Christoph Hellwig
2010-11-09  0:24     ` Dave Chinner
2010-11-09  3:36     ` Paul E. McKenney
2010-11-09  5:04       ` Dave Chinner
2010-11-10  5:12         ` Paul E. McKenney
2010-11-10  6:20           ` Dave Chinner
2010-11-08  8:55 ` [PATCH 08/16] xfs: convert pag_ici_lock to a spin lock Dave Chinner
2010-11-08 23:10   ` Christoph Hellwig
2010-11-08  8:55 ` [PATCH 09/16] xfs: convert xfsbud shrinker to a per-buftarg shrinker Dave Chinner
2010-11-08  8:55 ` [PATCH 10/16] xfs: add a lru to the XFS buffer cache Dave Chinner
2010-11-08 23:19   ` Christoph Hellwig
2010-11-08 23:45     ` Dave Chinner
2010-11-08  8:55 ` [PATCH 11/16] xfs: connect up buffer reclaim priority hooks Dave Chinner
2010-11-08 11:25   ` Christoph Hellwig
2010-11-08 23:50     ` Dave Chinner
2010-11-08  8:55 ` [PATCH 12/16] xfs: bulk AIL insertion during transaction commit Dave Chinner
2010-11-08  8:55 ` [PATCH 13/16] xfs: reduce the number of AIL push wakeups Dave Chinner
2010-11-08 11:32   ` Christoph Hellwig
2010-11-08 23:51     ` Dave Chinner
2010-11-08  8:55 ` [PATCH 14/16] xfs: remove all the inodes on a buffer from the AIL in bulk Dave Chinner
2010-11-08  8:55 ` [PATCH 15/16] xfs: only run xfs_error_test if error injection is active Dave Chinner
2010-11-08 11:33   ` Christoph Hellwig
2010-11-08  8:55 ` [PATCH 16/16] xfs: make xlog_space_left() independent of the grant lock Dave Chinner
2010-11-08 14:17 ` [PATCH 00/16] xfs: current patch stack for 2.6.38 window Christoph Hellwig
2010-11-09  0:21   ` Dave Chinner