public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Alex Elder <aelder@sgi.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 0/18] xfs: metadata and buffer cache scalability improvements
Date: Tue, 21 Sep 2010 12:02:03 +1000	[thread overview]
Message-ID: <20100921020203.GC2614@dastard> (raw)
In-Reply-To: <1284729700.5524.53.camel@doink>

On Fri, Sep 17, 2010 at 08:21:40AM -0500, Alex Elder wrote:
> On Tue, 2010-09-14 at 20:55 +1000, Dave Chinner wrote:
> > This patchset has grown quite a bit - it started out as a "convert
> > the buffer cache to rbtrees" patch, and has gotten bigger as I
> > peeled the onion from one bottleneck to another.
> 
> I know you're going to re-submit this series.  I would
> like you to split it into several smaller series if you
> don't mind.  Some of these are simpler than others,
> and there are some somewhat logical groupings (you
> even described them here as two sets).  But beyond
> that it would be nice to get at least some of them
> committed before the full series is perfected.
> 
> To be constructive, here's a grouping based on what
> seems to be a change of significance somehow.  I'm
> not suggesting they all be separated, but I'm just
> trying to identify the many things you're doing with
> this series.  
> 
> [01/18] xfs: single thread inode cache shrinking.

This is separate, and I'm completely reworking this patch.

Fundamentally, single threading the shrinker is not robust enough
to prevent OOM situations because reclaim relies on the shrinkers to
throttle reclaim sufficiently that some memory is reclaimed while
they are running. Hence every shrinker call needs to do something,
otherwise the direct reclaim priority will wind up very quickly and
declare OOM instead of waiting for IO completion to clean and free
objects.

I've introduced per-ag reclaim walk locks to prevent multiple
shrinkers from walking the same AG, and this has prevented most of
the lock contention. instead, the shrinkers burn CPU walking
instead. hence I've needed to add batched lookups (new patch)
and optimise the way we check whether an inode is reclaimable by
avoiding locking the inode first.

I've also added a scan cursor that tracks where the last reclaim
walk got up to, and restarts the next shrinker reclaim walk from
that inode. This means it is not walkiagn over the same unreclaimable
inodes over and over again.

Then I've changed shrinker reclaim to be sort-of-blocking. That is, it
doesn't block on locks (because that would cause deadlocks), but it
does does SYNC_WAIT inode cluster IO when it finds a dirty inode.
This provides sufficient throttling for reclaim not to OOM.

To make matters even more complex, the inode cache shrinker was
being overloaded by the VM because the buffer cache lru shrinker was
not being asked to do enough work. As a result, the buffer cache
would grow to 1.5GB, and the VM would spent all it's time trying to
shrinker the inode cache! The cause of this was leaving stale
buffers on the LRU, so avoiding leaving them on the LRU has
prevented buffer cache windup on parallel unlink workloads.

The reclaim code is not as efficient as the "just do a full, single
threaded non-blocking reclaim pass" code, but it is still ~4x faster
than the existing code and does not randomly go OOM on parallel
unlink workloads.


> [02/18] xfs: reduce the number of CIL lock round trips during commit

Stand-alone fix.

> [05/18] xfs: convert inode cache lookups to use RCU locking
> [06/18] xfs: convert pag_ici_lock to a spin lock

So up to this point I now have:

d10d7d6 xfs: reduce the number of CIL lock round trips during commit
e9a1a2a xfs: remove debug assert for per-ag reference counting
0a055a3 xfs: lockless per-ag lookups
3403f75 xfs: convert inode cache lookups to use RCU locking
84c5b79 xfs: convert pag_ici_lock to a spin lock
033bc9b xfs: batch per-ag inode lookup walks (NEW)
c5bbc30 xfs: rework inode cache shrinking. (NEW)




> [07/18] xfs: don't use vfs writeback for pure metadata modifications

Standalone.

> [08/18] xfs: rename xfs_buf_get_nodaddr to be more appropriate
> [09/18] xfs: introduced uncached buffer read primitve
> [10/18] xfs: store xfs_mount in the buftarg instead of in the xfs_buf
> [11/18] xfs: kill XBF_FS_MANAGED buffers
> [12/18] xfs: use unhashed buffers for size checks
> [13/18] xfs: remove buftarg hash for external devices

cleanups needed for rbtree conversion.

> [03/18] xfs: remove debug assert for per-ag reference counting
> [04/18] xfs: lockless per-ag lookups

Stand alone.

> [14/18] xfs: convert buffer cache hash to rbtree

Goes with the cleanups.

> [15/18] xfs; pack xfs_buf structure more tightly
> [16/18] xfs: convert xfsbud shrinker to a per-buftarg shrinker.
> [17/18] xfs: add a lru to the XFS buffer cache
> [18/18] xfs: stop using the page cache to back the buffer cache

All together, with the LRU code being reworked a bit w.r.t. stale
buffers and shrinker behaviour.

In reality, though, i don't think that separating them into separate
series make much sense. The order they are in right now is
bisectable and fairly logical....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2010-09-21  2:01 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-14 10:55 [PATCH 0/18] xfs: metadata and buffer cache scalability improvements Dave Chinner
2010-09-14 10:56 ` [PATCH 01/18] xfs: single thread inode cache shrinking Dave Chinner
2010-09-14 18:48   ` Alex Elder
2010-09-14 22:48     ` Dave Chinner
2010-09-14 10:56 ` [PATCH 02/18] xfs: reduce the number of CIL lock round trips during commit Dave Chinner
2010-09-14 14:48   ` Christoph Hellwig
2010-09-14 17:21   ` Alex Elder
2010-09-14 10:56 ` [PATCH 03/18] xfs: remove debug assert for per-ag reference counting Dave Chinner
2010-09-14 14:48   ` Christoph Hellwig
2010-09-14 17:22   ` Alex Elder
2010-09-14 10:56 ` [PATCH 04/18] xfs: lockless per-ag lookups Dave Chinner
2010-09-14 12:35   ` Dave Chinner
2010-09-14 14:50   ` Christoph Hellwig
2010-09-14 17:28   ` Alex Elder
2010-09-14 10:56 ` [PATCH 05/18] xfs: convert inode cache lookups to use RCU locking Dave Chinner
2010-09-14 16:27   ` Christoph Hellwig
2010-09-14 23:17     ` Dave Chinner
2010-09-14 21:23   ` Alex Elder
2010-09-14 23:42     ` Dave Chinner
2010-09-14 10:56 ` [PATCH 06/18] xfs: convert pag_ici_lock to a spin lock Dave Chinner
2010-09-14 21:26   ` Alex Elder
2010-09-14 10:56 ` [PATCH 07/18] xfs: don't use vfs writeback for pure metadata modifications Dave Chinner
2010-09-14 14:54   ` Christoph Hellwig
2010-09-15  0:14     ` Dave Chinner
2010-09-15  0:17       ` Christoph Hellwig
2010-09-14 22:12   ` Alex Elder
2010-09-15  0:28     ` Dave Chinner
2010-11-08 10:47   ` Christoph Hellwig
2010-09-14 10:56 ` [PATCH 08/18] xfs: rename xfs_buf_get_nodaddr to be more appropriate Dave Chinner
2010-09-14 14:56   ` Christoph Hellwig
2010-09-14 22:14   ` Alex Elder
2010-09-14 10:56 ` [PATCH 09/18] xfs: introduced uncached buffer read primitve Dave Chinner
2010-09-14 14:56   ` Christoph Hellwig
2010-09-14 22:16   ` Alex Elder
2010-09-14 10:56 ` [PATCH 10/18] xfs: store xfs_mount in the buftarg instead of in the xfs_buf Dave Chinner
2010-09-14 14:57   ` Christoph Hellwig
2010-09-14 22:21   ` Alex Elder
2010-09-14 10:56 ` [PATCH 11/18] xfs: kill XBF_FS_MANAGED buffers Dave Chinner
2010-09-14 14:59   ` Christoph Hellwig
2010-09-14 22:26   ` Alex Elder
2010-09-14 10:56 ` [PATCH 12/18] xfs: use unhashed buffers for size checks Dave Chinner
2010-09-14 15:00   ` Christoph Hellwig
2010-09-14 22:29   ` Alex Elder
2010-09-14 10:56 ` [PATCH 13/18] xfs: remove buftarg hash for external devices Dave Chinner
2010-09-14 22:29   ` Alex Elder
2010-09-14 10:56 ` [PATCH 14/18] xfs: convert buffer cache hash to rbtree Dave Chinner
2010-09-14 16:29   ` Christoph Hellwig
2010-09-15 17:46   ` Alex Elder
2010-09-14 10:56 ` [PATCH 15/18] xfs; pack xfs_buf structure more tightly Dave Chinner
2010-09-14 16:30   ` Christoph Hellwig
2010-09-15 18:01   ` Alex Elder
2010-09-14 10:56 ` [PATCH 16/18] xfs: convert xfsbud shrinker to a per-buftarg shrinker Dave Chinner
2010-09-14 16:32   ` Christoph Hellwig
2010-09-15 20:19   ` Alex Elder
2010-09-16  0:28     ` Dave Chinner
2010-09-14 10:56 ` [PATCH 17/18] xfs: add a lru to the XFS buffer cache Dave Chinner
2010-09-14 23:16   ` Christoph Hellwig
2010-09-15  0:05     ` Dave Chinner
2010-09-15 21:28   ` Alex Elder
2010-09-14 10:56 ` [PATCH 18/18] xfs: stop using the page cache to back the " Dave Chinner
2010-09-14 23:20   ` Christoph Hellwig
2010-09-15  0:06     ` Dave Chinner
2010-09-14 14:25 ` [PATCH 0/18] xfs: metadata and buffer cache scalability improvements Christoph Hellwig
2010-09-17 13:21 ` Alex Elder
2010-09-21  2:02   ` Dave Chinner [this message]
2010-09-21 16:23     ` Alex Elder
2010-09-21 22:34       ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100921020203.GC2614@dastard \
    --to=david@fromorbit.com \
    --cc=aelder@sgi.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox