public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Cc: chris.mason@oracle.com
Subject: [RFC, PATCH 0/5] xfs: Reduce OOM kill problems under heavy load
Date: Wed, 23 Feb 2011 09:16:04 +1100
Message-ID: <1298412969-14389-1-git-send-email-david@fromorbit.com>

Chris Mason reported recently that a concurrent stress test (basically copying
the Linux kernel tree 20 times, verifying md5sums and deleting it in a loop
concurrently) under low memory conditions was triggering the OOM killer
much more easily than for btrfs.

It turns out there are two main problems. The first is that unlinked inodes were
not being reclaimed fast enough, leading to OOM being declared while large
numbers of reclaimable inodes were still around. The second was that atime
updates due to the verify step were creating large numbers of inodes that were
dirty at the VFS level and were not being written back, and hence were not made
reclaimable, before the system declared OOM and killed stuff.

The first problem is fixed by making background inode reclaim more frequent and
faster, kicking background reclaim from the inode cache shrinker so that when
memory is low we always have background inode reclaim in progress, and finally
making the shrinker reclaim scan block waiting on inodes to reclaim. This last
step throttles memory reclaim to the speed at which we can reclaim inodes, a
key step needed to prevent memory reclaim from declaring OOM while there are
still reclaimable inodes around. In most cases the background inode reclaim
means this synchronous flush does not find dirty inodes to block on, and hence
prevents performance regressions in more common workloads due to reclaim
stalls.
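To illustrate why a blocking scan throttles reclaim where a non-blocking one
does not, here is a minimal userspace sketch. It is not the patch code: struct
inode_model, wait_for_flush() and reclaim_scan() are hypothetical names, and
"busy" simply stands for an inode that is being flushed and cannot be freed
yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one reclaimable inode. */
struct inode_model {
    bool busy;  /* being flushed, cannot be freed right now */
    bool freed; /* already reclaimed */
};

/* Hypothetical stand-in for waiting until an in-flight flush completes. */
static void wait_for_flush(struct inode_model *ip)
{
    ip->busy = false;
}

/*
 * Scan up to nr_to_scan inodes. A non-blocking scan skips busy inodes, so
 * it can report little or no progress even though reclaimable inodes
 * remain. A blocking scan waits for them, which throttles the caller to
 * the rate at which inodes can actually be reclaimed.
 */
static size_t reclaim_scan(struct inode_model *inodes, size_t n,
                           size_t nr_to_scan, bool blocking)
{
    size_t freed = 0;

    assert(inodes != NULL || n == 0);
    for (size_t i = 0; i < n && freed < nr_to_scan; i++) {
        struct inode_model *ip = &inodes[i];

        if (ip->freed)
            continue;
        if (ip->busy) {
            if (!blocking)
                continue;       /* trylock style: skip and move on */
            wait_for_flush(ip); /* blocking: wait, then reclaim */
        }
        ip->freed = true;
        freed++;
    }
    return freed;
}
```

With three of four inodes busy, a non-blocking pass frees only one of them
and reports almost no progress, while a blocking pass over the same array
frees the remaining three.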

To enable this new functionality, the per-filesystem xfssyncd threads are
replaced with a single global workqueue, and the existing xfssyncd work is
converted to run on it. Hence all filesystems will share the same workqueue and
we remove all the xfssyncd threads from the system. The ENOSPC inode flush is
converted to use the workqueue, and optimised to only allow a single flush at a
time. This significantly speeds up ENOSPC processing under concurrent workloads
as it removes all the unnecessary scanning that every single ENOSPC event
currently queues to the xfssyncd. Finally, a new inode reclaim worker is added
to the workqueue that runs 5x more frequently than the xfssyncd work to do the
background inode reclaim scan.

The second problem is fixed simply by making the XFS inode cache shrinker kick
the bdi flusher to write back inodes if the bdi flusher is not already active.
This ensures that in low memory situations we are always actively writing back
inodes that are dirty at the VFS level, and hence prevents them from building
up in an unreclaimable state. Once again, this does not affect performance in
situations where memory is not constrained.

The result is not yet perfect - the stress test still triggers the OOM killer
somewhere between 3 and 6 hours into the test on a CONFIG_XFS_DEBUG kernel with
lockdep enabled (so inodes consume roughly 2x the memory of a production
kernel), though this is a marked improvement. The OOM kill trigger appears to
be different from the above two, so expect more patches to address that
soon.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


Thread overview: 19+ messages
2011-02-22 22:16 Dave Chinner [this message]
2011-02-22 22:16 ` [PATCH 1/5] xfs: introduce inode cluster buffer trylocks for xfs_iflush Dave Chinner
2011-03-03 15:55   ` Christoph Hellwig
2011-03-03 22:04     ` Dave Chinner
2011-02-22 22:16 ` [PATCH 2/5] xfs: introduce a xfssyncd workqueue Dave Chinner
2011-02-22 22:16 ` [PATCH 3/5] xfs: convert ENOSPC inode flushing to use new syncd workqueue Dave Chinner
2011-03-03 15:34   ` Christoph Hellwig
2011-03-03 22:41     ` Dave Chinner
2011-03-04 12:40       ` Christoph Hellwig
2011-02-22 22:16 ` [PATCH 4/5] xfs: introduce background inode reclaim work Dave Chinner
2011-03-03 15:36   ` Christoph Hellwig
2011-03-03 22:43     ` Dave Chinner
2011-02-22 22:16 ` [PATCH 5/5] xfs: kick inode writeback when low on memory Dave Chinner
2011-03-02  3:06   ` Dave Chinner
2011-03-02 14:12     ` Christoph Hellwig
2011-03-03  2:42       ` Dave Chinner
2011-03-03 15:48         ` Christoph Hellwig
2011-03-03 16:19           ` Christoph Hellwig
2011-03-09  5:46             ` Dave Chinner
