From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andreas Dilger
Subject: Re: [rfc][patch] fs: turn iprune_mutex into rwsem
Date: Sun, 16 Aug 2009 16:11:59 -0600
Message-ID: <20090816221159.GR5931@webber.adilger.int>
References: <20090814152504.GA19195@wotan.suse.de> <20090815195742.GA14842@infradead.org>
Mime-Version: 1.0
Content-Type: text/plain; CHARSET=US-ASCII
Content-Transfer-Encoding: 7BIT
Cc: Nick Piggin, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Jan Kara, Andrew Morton
To: Christoph Hellwig
Return-path:
Received: from sca-es-mail-2.Sun.COM ([192.18.43.133]:46558 "EHLO sca-es-mail-2.sun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755398AbZHPWMZ (ORCPT ); Sun, 16 Aug 2009 18:12:25 -0400
Received: from fe-sfbay-09.sun.com ([192.18.43.129]) by sca-es-mail-2.sun.com (8.13.7+Sun/8.12.9) with ESMTP id n7GMBxap018877 for ; Sun, 16 Aug 2009 15:12:01 -0700 (PDT)
Content-disposition: inline
Received: from conversion-daemon.fe-sfbay-09.sun.com by fe-sfbay-09.sun.com (Sun Java(tm) System Messaging Server 7u2-7.04 64bit (built Jul 2 2009)) id <0KOH00500PGRL200@fe-sfbay-09.sun.com> for linux-fsdevel@vger.kernel.org; Sun, 16 Aug 2009 15:11:59 -0700 (PDT)
In-reply-to: <20090815195742.GA14842@infradead.org>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Aug 15, 2009 15:57 -0400, Christoph Hellwig wrote:
> On Fri, Aug 14, 2009 at 05:25:05PM +0200, Nick Piggin wrote:
> > Now I think the main problem is having the filesystem block (and do IO
> > in inode reclaim. The problem is that this doesn't get accounted well
> > and penalizes a random allocator with a big latency spike caused by
> > work generated from elsewhere.
> >
> > I think the best idea would be to avoid this. By design if possible,
> > or by deferring the hard work to an asynchronous context. If the latter,
> > then the fs would probably want to throttle creation of new work with
> > queue size of the deferred work, but let's not get into those details.
>
> I don't really see a good way to avoid this. For any filesystem that
> does some sort of preallocations we need to drop them in ->clear_inode.

One of the problems I've seen in the past is that filesystem memory
reclaim (in particular dentry/inode cleanup) cannot happen from within
filesystem code due to potential deadlocks. This is particularly
problematic when there is a lot of memory pressure from within the
kernel (e.g. from updatedb or find) and very little from userspace.

However, many/most inodes/dentries in the filesystem could be discarded
quite easily and would not deadlock the system. I wonder if it makes
sense to keep a mask in the inode, set by the filesystem, that
determines whether it is safe to clean up the inode even though
__GFP_FS is not set?

That would potentially allow e.g. shrink_icache_memory() to free a large
number of "non-tricky" inodes if needed (e.g. ones without
locks/preallocation/expensive cleanup).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
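
To make the per-inode mask idea above a bit more concrete, here is a
rough userspace sketch. It is not kernel code and not a patch: the
struct, the SKETCH_I_EASY_RECLAIM flag, the SKETCH_GFP_FS value and both
helper functions are made-up names for illustration only, and a plain
array stands in for the unused-inode list. It only shows the decision a
shrinker could make when __GFP_FS is not set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-inode flag, set by the filesystem when freeing the
 * inode needs no locks, no preallocation teardown and no I/O.
 * This is an assumption for illustration, not an existing kernel flag. */
#define SKETCH_I_EASY_RECLAIM 0x1u

/* Stand-in for the kernel's __GFP_FS bit; the value here is arbitrary. */
#define SKETCH_GFP_FS 0x80u

/* Minimal stand-in for struct inode. */
struct sketch_inode {
	unsigned int reclaim_flags; /* mask maintained by the filesystem */
	bool freed;                 /* set once the sketch has "freed" it */
};

/* An inode is reclaimable if the caller may re-enter the filesystem,
 * or if the filesystem has marked the inode as safe to drop anyway. */
static bool sketch_can_reclaim(const struct sketch_inode *inode,
			       unsigned int gfp_mask)
{
	if (gfp_mask & SKETCH_GFP_FS)
		return true;
	return (inode->reclaim_flags & SKETCH_I_EASY_RECLAIM) != 0;
}

/* Free up to max_free eligible inodes and return how many were freed.
 * Inodes that are not safe in this context are skipped, not waited on. */
static size_t sketch_shrink_icache(struct sketch_inode *inodes, size_t count,
				   size_t max_free, unsigned int gfp_mask)
{
	size_t freed = 0;

	assert(inodes != NULL || count == 0);

	for (size_t i = 0; i < count && freed < max_free; i++) {
		if (inodes[i].freed)
			continue;
		if (!sketch_can_reclaim(&inodes[i], gfp_mask))
			continue;
		inodes[i].freed = true;
		freed++;
	}
	return freed;
}
```

With a gfp_mask that lacks SKETCH_GFP_FS, only the inodes the filesystem
flagged are dropped and the "tricky" ones are left for a later pass from
a context that is allowed to call back into the filesystem.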