From: Dave Chinner <david@fromorbit.com>
Subject: [PATCH 2/2] xfs: kick inode writeback when low on memory
Date: Thu, 7 Apr 2011 16:19:56 +1000
Message-Id: <1302157196-1988-3-git-send-email-david@fromorbit.com>
In-Reply-To: <1302157196-1988-1-git-send-email-david@fromorbit.com>
References: <1302157196-1988-1-git-send-email-david@fromorbit.com>
To: xfs@oss.sgi.com
Cc: linux-fsdevel@vger.kernel.org

When the inode cache shrinker runs, we may have lots of dirty inodes
queued up in the VFS dirty queues that have not been expired. The
typical case for this with XFS is atime updates. The result is that a
highly concurrent workload that copies files and then later reads them
(say to verify checksums) dirties all the inodes again, even when
relatime is used.

In a constrained memory environment, this results in a large number of
dirty inodes using all of the available memory, with memory reclaim
unable to free them because dirty inodes are considered active. This
problem was uncovered by Chris Mason during recent low memory stress
testing.

The fix is to trigger VFS-level writeback from the XFS inode cache
shrinker if there isn't already writeback in progress. This ensures
that when we enter a low memory situation we start cleaning inodes (via
the flusher thread) on the filesystem immediately, thereby making it
more likely that we will be able to evict those dirty inodes from the
VFS in the near future.

The mechanism is not perfect - it only acts on the current filesystem,
so if all the dirty inodes are on a different filesystem it won't help.
However, it seems a valid assumption that the filesystem with lots of
dirty inodes will have its shrinker called very soon after the memory
shortage begins, so this shouldn't be an issue. The other flaw is that
there is no guarantee that the flusher thread will make progress fast
enough to clean the dirty inodes so they can be reclaimed in the near
future.

However, this mechanism does improve the resilience of the filesystem
under the test conditions - instead of reliably triggering the OOM
killer 20 minutes into the stress test, it took more than 6 hours
before it happened. This small addition definitely improves the low
memory resilience of XFS on this type of workload, and best of all it
has no impact on performance when memory is not constrained.
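
For reference, the "if there isn't already writeback in progress"
gating is done by the generic helper called in the patch below. A
minimal sketch of its behaviour, simplified from fs/fs-writeback.c of
this era (exact locking and return conventions may differ):

	int writeback_inodes_sb_nr_if_idle(struct super_block *sb,
					   unsigned long nr)
	{
		/* flusher already busy on this sb? then do nothing */
		if (writeback_in_progress(sb->s_bdi))
			return 0;

		/* pin the sb and kick writeback of up to nr inodes */
		down_read(&sb->s_umount);
		writeback_inodes_sb_nr(sb, nr);
		up_read(&sb->s_umount);
		return 1;
	}

This is why calling it from the shrinker is cheap once the flusher
thread is already running - the call degenerates to a single state
check.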
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_sync.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 9ad9560..c240d46 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -1038,6 +1038,17 @@ xfs_reclaim_inode_shrink(
 	if (!(gfp_mask & __GFP_FS))
 		return -1;
 
+	/*
+	 * make sure VFS is cleaning inodes so they can be pruned
+	 * and marked for reclaim in the XFS inode cache. If we don't
+	 * do this the VFS can accumulate dirty inodes and we can OOM
+	 * before they are cleaned by the periodic VFS writeback.
+	 *
+	 * This takes VFS level locks, so we can only do this after
+	 * the __GFP_FS checks otherwise lockdep gets really unhappy.
+	 */
+	writeback_inodes_sb_nr_if_idle(mp->m_super, nr_to_scan);
+
 	xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT, &nr_to_scan);
 
 	/* terminate if we don't exhaust the scan */
-- 
1.7.2.3
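
For context, xfs_reclaim_inode_shrink() is the per-mount inode cache
shrinker callback. A rough sketch of how it is wired up, based on the
xfs_inode_shrinker_register() helper in this era's
fs/xfs/linux-2.6/xfs_sync.c (treat as illustrative rather than exact):

	void
	xfs_inode_shrinker_register(
		struct xfs_mount	*mp)
	{
		/*
		 * Register a per-mount shrinker at mount time; under
		 * memory pressure the VM calls back into
		 * xfs_reclaim_inode_shrink() for this filesystem.
		 */
		mp->m_inode_shrink.shrink = xfs_reclaim_inode_shrink;
		mp->m_inode_shrink.seeks = DEFAULT_SEEKS;
		register_shrinker(&mp->m_inode_shrink);
	}

This per-mount registration is also the source of the "only acts on
the current filesystem" limitation noted above: each filesystem's
shrinker can only kick writeback on its own superblock.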