From: Dave Chinner <david@fromorbit.com>
Subject: [PATCH 2/2] xfs: kick inode writeback when low on memory
Date: Thu, 7 Apr 2011 16:19:56 +1000
Message-Id: <1302157196-1988-3-git-send-email-david@fromorbit.com>
In-Reply-To: <1302157196-1988-1-git-send-email-david@fromorbit.com>
References: <1302157196-1988-1-git-send-email-david@fromorbit.com>
To: xfs@oss.sgi.com
Cc: linux-fsdevel@vger.kernel.org

When the inode cache shrinker runs, we may have lots of dirty inodes
queued up in the VFS dirty queues that have not been expired. The
typical case for this with XFS is atime updates. The result is that a
highly concurrent workload that copies files and then later reads them
(say to verify checksums) dirties all the inodes again, even when
relatime is used.

In a constrained memory environment, this results in a large number of
dirty inodes using all of the available memory, with memory reclaim
unable to free them because dirty inodes are considered active. This
problem was uncovered by Chris Mason during recent low memory stress
testing.

The fix is to trigger VFS-level writeback from the XFS inode cache
shrinker if there isn't already writeback in progress. This ensures
that when we enter a low memory situation we start cleaning inodes (via
the flusher thread) on the filesystem immediately, thereby making it
more likely that we will be able to evict those dirty inodes from the
VFS in the near future.

The mechanism is not perfect - it only acts on the current filesystem,
so if all the dirty inodes are on a different filesystem it won't help.
However, it seems a valid assumption that the filesystem with lots of
dirty inodes will have its shrinker called very soon after the memory
shortage begins, so this shouldn't be an issue. The other flaw is that
there is no guarantee that the flusher thread will make progress fast
enough to clean the dirty inodes so they can be reclaimed in the near
future.

However, this mechanism does improve the resilience of the filesystem
under the test conditions - instead of reliably triggering the OOM
killer 20 minutes into the stress test, it took more than 6 hours
before it happened. This small addition definitely improves the low
memory resilience of XFS on this type of workload, and best of all it
has no impact on performance when memory is not constrained.
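
For reference, the "if there isn't already writeback in progress"
gating is done by the generic helper called in the patch below. A
minimal sketch of its behaviour, simplified from fs/fs-writeback.c of
this era (exact locking and return conventions may differ):

	int writeback_inodes_sb_nr_if_idle(struct super_block *sb,
					   unsigned long nr)
	{
		/* flusher already busy on this sb? then do nothing */
		if (writeback_in_progress(sb->s_bdi))
			return 0;

		/* pin the sb and kick writeback of up to nr inodes */
		down_read(&sb->s_umount);
		writeback_inodes_sb_nr(sb, nr);
		up_read(&sb->s_umount);
		return 1;
	}

This is why calling it from the shrinker is cheap once the flusher
thread is already running - the call degenerates to a single state
check.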
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_sync.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 9ad9560..c240d46 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -1038,6 +1038,17 @@ xfs_reclaim_inode_shrink(
 	if (!(gfp_mask & __GFP_FS))
 		return -1;
 
+	/*
+	 * make sure VFS is cleaning inodes so they can be pruned
+	 * and marked for reclaim in the XFS inode cache. If we don't
+	 * do this the VFS can accumulate dirty inodes and we can OOM
+	 * before they are cleaned by the periodic VFS writeback.
+	 *
+	 * This takes VFS level locks, so we can only do this after
+	 * the __GFP_FS checks otherwise lockdep gets really unhappy.
+	 */
+	writeback_inodes_sb_nr_if_idle(mp->m_super, nr_to_scan);
+
 	xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT, &nr_to_scan);
 
 	/* terminate if we don't exhaust the scan */
-- 
1.7.2.3
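
For context, xfs_reclaim_inode_shrink() is the per-mount inode cache
shrinker callback. A rough sketch of how it is wired up, based on the
xfs_inode_shrinker_register() helper in this era's
fs/xfs/linux-2.6/xfs_sync.c (treat as illustrative rather than exact):

	void
	xfs_inode_shrinker_register(
		struct xfs_mount	*mp)
	{
		/*
		 * Register a per-mount shrinker at mount time; under
		 * memory pressure the VM calls back into
		 * xfs_reclaim_inode_shrink() for this filesystem.
		 */
		mp->m_inode_shrink.shrink = xfs_reclaim_inode_shrink;
		mp->m_inode_shrink.seeks = DEFAULT_SEEKS;
		register_shrinker(&mp->m_inode_shrink);
	}

This per-mount registration is also the source of the "only acts on
the current filesystem" limitation noted above: each filesystem's
shrinker can only kick writeback on its own superblock.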