From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id o4R3xpwO107690 for ; Wed, 26 May 2010 22:59:52 -0500 Received: from mail.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id A976E37A662 for ; Wed, 26 May 2010 21:02:13 -0700 (PDT) Received: from mail.internode.on.net (bld-mail13.adl6.internode.on.net [150.101.137.98]) by cuda.sgi.com with ESMTP id fjkAEYvPTyNaV4kN for ; Wed, 26 May 2010 21:02:13 -0700 (PDT) Date: Thu, 27 May 2010 14:02:10 +1000 From: Dave Chinner Subject: Re: [PATCH 1/5] inode: Make unused inode LRU per superblock Message-ID: <20100527040210.GI12087@dastard> References: <1274777588-21494-1-git-send-email-david@fromorbit.com> <1274777588-21494-2-git-send-email-david@fromorbit.com> <20100526161732.GC22536@laptop> <20100526230129.GA1395@dastard> <20100527020445.GF22536@laptop> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20100527020445.GF22536@laptop> List-Id: XFS Filesystem from SGI List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Sender: xfs-bounces@oss.sgi.com Errors-To: xfs-bounces@oss.sgi.com To: Nick Piggin Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, xfs@oss.sgi.com T24gVGh1LCBNYXkgMjcsIDIwMTAgYXQgMTI6MDQ6NDVQTSArMTAwMCwgTmljayBQaWdnaW4gd3Jv dGU6Cj4gT24gVGh1LCBNYXkgMjcsIDIwMTAgYXQgMDk6MDE6MjlBTSArMTAwMCwgRGF2ZSBDaGlu bmVyIHdyb3RlOgo+ID4gT24gVGh1LCBNYXkgMjcsIDIwMTAgYXQgMDI6MTc6MzNBTSArMTAwMCwg TmljayBQaWdnaW4gd3JvdGU6Cj4gPiA+IE9uIFR1ZSwgTWF5IDI1LCAyMDEwIGF0IDA2OjUzOjA0 UE0gKzEwMDAsIERhdmUgQ2hpbm5lciB3cm90ZToKPiA+ID4gPiBGcm9tOiBEYXZlIENoaW5uZXIg PGRjaGlubmVyQHJlZGhhdC5jb20+Cj4gPiA+ID4gCj4gPiA+ID4gVGhlIGlub2RlIHVudXNlZCBs aXN0IGlzIGN1cnJlbnRseSBhIGdsb2JhbCBMUlUuIFRoaXMgZG9lcyBub3QgbWF0Y2gKPiA+ID4g PiB0aGUgb3RoZXIgZ2xvYmFsIGZpbGVzeXN0ZW0gY2FjaGUgLSB0aGUgZGVudHJ5IGNhY2hlIC0g d2hpY2ggdXNlcwo+ID4gPiA+IHBlci1zdXBlcmJsb2NrIExSVSBsaXN0cy4gSGVuY2Ugd2UgaGF2 ZSByZWxhdGVkIGZpbGVzeXN0ZW0gb2JqZWN0Cj4gPiA+ID4gdHlwZXMgdXNpbmcgZGlmZmVyZW50 IExSVSByZWNsYWltYXRpbiBzY2hlbWVzLgo+ID4gPiAKPiA+ID4gSXMgdGhpcyBhbiBpbXByb3Zl bWVudCBJIHdvbmRlcj8gVGhlIGRjYWNoZSBpcyB1c2luZyBwZXIgc2IgbGlzdHMKPiA+ID4gYmVj YXVzZSBpdCBzcGVjaWZpY2FsbHkgcmVxdWlyZXMgc2IgdHJhdmVyc2FsLgo+ID4gCj4gPiBSaWdo dCAtIEkgb3JpZ2luYWxseSBpbXBsZW1lbnRlZCB0aGUgcGVyLXNiIGRlbnRyeSBsaXN0cyBmb3IK PiA+IHNjYWxhYmlsaXR5IHB1cnBvc2VzLiBpLmUuIHRvIGF2b2lkIG1vbm9wb2xpc2luZyB0aGUg ZGVudHJ5X2xvY2sKPiA+IGR1cmluZyB1bm1vdW50IGxvb2tpbmcgZm9yIGRlbnRyaWVzIG9uIGEg c3BlY2lmaWMgc2IgYW5kIGhhbmdpbmcgdGhlCj4gPiBzeXN0ZW0gZm9yIHNldmVyYWwgbWludXRl cy4KPiA+IAo+ID4gSG93ZXZlciwgdGhlIHJlYXNvbiBmb3IgZG9pbmcgdGhpcyB0byB0aGUgaW5v ZGUgY2FjaGUgaXMgbm90IGZvcgo+ID4gc2NhbGFiaWxpdHksIGl0J3MgYmVjYXVzZSB3ZSBoYXZl IGEgdGlnaHQgcmVsYXRpb25zaGlwIGJldHdlZW4gdGhlCj4gPiBkZW50cnkgYW5kIGlub2RlIGNh Y2hl0ZUuIFRoYXQgaXMsIHJlY2xhaW0gZnJvbSB0aGUgZGVudHJ5IExSVSBncm93cwo+ID4gdGhl IGlub2RlIExSVS4gIExpa2UgdGhlIHJlZ2lzdHJhdGlvbiBvZiB0aGUgc2hyaW5rZXJzLCB0aGlz IGlzIGtpbmQKPiA+IG9mIGFuIGltcGxpY2l0LCB1bmRvY3VtZW50ZWQgYmVoYXZvdXIgb2YgdGhl IGN1cnJlbnQgc2hyaW5rZXIKPiA+IGltcGxlbWVuYXRpb24uCj4gCj4gUmlnaHQsIHRoYXQncyB3 aHkgSSB3b25kZXIgd2hldGhlciBpdCBpcyBhbiBpbXByb3ZlbWVudC4gSXQgd291bGQKPiBiZSBp bnRlcmVzdGluZyB0byBzZWUgc29tZSB0ZXN0cyAoc2hvd2luZyBhdCBsZWFzdCBwYXJpdHkpLgoK SSd2ZSBkb25lIHNvbWUgdGVzdGluZyBzaG93aW5nIHBhcml0eS4gVGhleSd2ZSBiZWVuIGFsb25n IHRoZSBsaW5lcwpvZjoKCS0gcG9wdWxhdGUgY2FjaGUgd2l0aCAxbSBkZW50cmllcyArIGlub2Rl cwoJLSBydW4gJ3RpbWUgZWNobyAyID4gL3Byb2Mvc3lzL3ZtL2Ryb3BfY2FjaGVzJwoKSSd2ZSB1 c2VkIGRpZmZlcmVudCBtZXRob2RzIG9mIHBvcHVsYXRpbmcgdGhlIGNhY2hlcyB0byBoYXZlIHRo ZW0Kbm9uLXNlcXVlbnRpYWwgaW4gdGhlIExSVSAoaS5lLiB0cmlnZ2VyIGZyYWdtZW50YXRpb24p LCBoYXZlIGRpcnR5CmJhY2tpbmcgaW5vZGVzIChlLmcuIHRoZSBWRlMgaW5vZGUgY2xlYW4sIHRo ZSB4ZnMgaW5vZGUgZGlydHkKYmVjYXVzZSB0cmFuc2FjdGlvbnMgaGF2ZW4ndCBjb21wbGV0ZWQp LCBldGMuCgpUaGUgdmFyaWF0aW9uIG9uIHRoZSB0ZXN0IGlzIGFyb3VuZCArLTEwJSwgd2l0aCB0 aGUgcGVyLXNiIHNocmlua2VycwphdmVyYWdpbmcgYWJvdXQgNSUgbG93ZXIgdGltZSB0byByZWNs YWltLiBUaGlzIGlzIHdpdGhpbiB0aGUgZXJyb3IKbWFyZ2luIG9mIHRoZSB0ZXN0LCBzbyBpdCdz IG5vdCByZWFsbHkgYSBjb25jbHVzaXZlIHdpbiwgYnV0IGl0IGlzCmNlcnRhaW5seSBzaG93cyB0 aGF0IGl0IGRvZXMgbm90IHNsb3cgYW55dGhpbmcgZG93bi4gSWYgeW91J3ZlIGdvdCBhCmJldHRl ciB3YXkgdG8gdGVzdCBpdCwgdGhlbiBJJ20gYWxsIGVhcnMuLi4uCgo+ID4gV2hhdCB0aGlzIHBh dGNoIHNlcmllcyBkb2VzIGlzIHRha2UgdGhhdCBpbXBsaWNpdCByZWxhdGlvbnNoaXAgYW5kCj4g PiBtYWtlIGl0IGV4cGxpY2l0LiAgSXQgYWxzbyBhbGxvd3Mgb3RoZXIgZmlsZXN5c3RlbSBjYWNo ZXMgdG8gdGllCj4gPiBpbnRvIHRoZSByZWxhdGlvbnNoaXAgaWYgdGhleSBuZWVkIHRvIChlLmcu IHRoZSBYRlMgaW5vZGUgY2FjaGUpLgo+ID4gV2hhdCBpdCBfZG9lc24ndCBkb18gaXMgY2hhbmdl IHRoZSBtYWNybyBsZXZlbCBiZWhhdmlvdXIgb2YgdGhlCj4gPiBzaHJpbmtlcnMuLi4KPiA+IAo+ ID4gPiBXaGF0IGFsbG9jYXRpb24vcmVjbGFpbSByZWFsbHkgd2FudHMgKGZvciBnb29kIHNjYWxh YmlsaXR5IGFuZCBOVU1BCj4gPiA+IGNoYXJhY3RlcmlzdGljcykgaXMgcGVyLXpvbmUgbGlzdHMg Zm9yIHRoZXNlIHRoaW5ncy4gSXQncyBlYXN5IHRvCj4gPiA+IGNvbnZlcnQgYSBzaW5nbGUgbGlz dCBpbnRvIHBlci16b25lIGxpc3RzLgo+ID4gPgo+ID4gPiBJdCBpcyBtdWNoIGhhcmRlciB0byBj b252ZXJ0IHBlci1zYiBsaXN0cyBpbnRvIHBlci1zYiB4IHBlci16b25lIGxpc3RzLgo+ID4gCj4g PiBObyBpdCdzIG5vdC4gSnVzdCBjb252ZXJ0IHRoZSBzX3tkZW50cnksaW5vZGV9X2xydSBsaXN0 cyBvbiBlYWNoCj4gPiBzdXBlcmJsb2NrIGFuZCBjYWxsIHRoZSBzaHJpbmtlciB3aXRoIGEgbmV3 IHpvbmUgbWFzayBmaWVsZCB0byBwaWNrCj4gPiB0aGUgY29ycmVjdCBMUlUuIFRoYXQncyBubyBo YXJkZXIgdGhhbiBjb252ZXJ0aW5nIGEgZ2xvYmFsIExSVS4KPiA+IEFueXdheSwgeW91J2Qgc3Rp bGwgaGF2ZSB0byBkbyBwZXItc2IgeCBwZXItem9uZSBsaXN0cyBmb3IgdGhlIGRlbnRyeSBMUlVz LAo+ID4gc28gY2hhbmdpbmcgdGhlIGlub2RlIGNhY2hlIHRvIHBlci1zYiBtYWtlcyBubyBkaWZm ZXJlbmNlLgo+IAo+IFJpZ2h0LCBpdCBqdXN0IG1ha2VzIGl0IGhhcmRlciB0byBkby4gQnkgbXVj aCBoYXJkZXIsIEkgZGlkIG1vc3RseSBtZWFuCj4gdGhlIGV4dHJhIG1lbW9yeSBvdmVyaGVhZC4K CllvdSd2ZSBzdGlsbCBnb3QgdG8gYWxsb2NhdGUgdGhhdCBleHRyYSBtZW1vcnkgb24gdGhlIHBl ci1zYiBkZW50cnkKTFJVcyBzbyBpdCdzIG5vdCByZWFsbHkgYSB2YWxpZCBhcmd1bWVudC4gSU9X cywgaWYgaXQncyB0b28gbXVjaAptZW1vcnkgZm9yIHBlci1zYiBpbm9kZSBMUlVzLCB0aGVuIGl0 J3MgdG9vIG11Y2ggbWVtb3J5IGZvciB0aGUKcGVyLXNiIGRlbnRyeSBMUlVzIGFzIHdlbGwuLi4K Cj4gSWYgdGhlcmUgaXMgKm5vKiBiZW5lZml0IGZyb20gZG9pbmcgcGVyLXNiCj4gaWNhY2hlIHRo ZW4gSSB3b3VsZCBxdWVzdGlvbiB3aGV0aGVyIHdlIHNob3VsZC4KClRoZSBzYW1lIHZhZ3VlIHF1 ZXN0aW9ucyB3b25kZXJpbmcgYWJvdXQgdGhlIGJlbmVmaXQgb2YgcGVyLXNiCmRlbnRyeSBMUlVz IHdlcmUgcmFpc2VkIHdoZW4gSSBmaXJzdCBwcm9wb3NlZCB0aGVtIHllYXJzIGFnbywgYW5kCmxv b2sgd2hlcmUgd2UgYXJlIG5vdy4gIEJlc2lkZXMsIGZvY3Vzc2luZyBvbiB3aGV0aGVyIHRoaXMg b25lIHBhdGNoCmlzIGEgYmVuZWZpdCBvciBub3QgaXMgcmVhbGx5IG1pc3NpbmcgdGhlIHBvaW50 IGJlY2F1c2UgaXQncyB0aGUKYmVuZWZpdHMgb2YgdGhpcyBwYXRjaHNldCBhcyBhIHdob2xlIHRo YXQgbmVlZCB0byBiZSBjb25zaWRlcmVkLi4uLgoKQ2hlZXJzLAoKRGF2ZS4KLS0gCkRhdmUgQ2hp bm5lcgpkYXZpZEBmcm9tb3JiaXQuY29tCgpfX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f X19fX19fX19fX19fX19fXwp4ZnMgbWFpbGluZyBsaXN0Cnhmc0Bvc3Muc2dpLmNvbQpodHRwOi8v b3NzLnNnaS5jb20vbWFpbG1hbi9saXN0aW5mby94ZnMK From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750913Ab0E0ECX (ORCPT ); Thu, 27 May 2010 00:02:23 -0400 Received: from bld-mail13.adl6.internode.on.net ([150.101.137.98]:52670 "EHLO mail.internode.on.net" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750724Ab0E0ECV (ORCPT ); Thu, 27 May 2010 00:02:21 -0400 Date: Thu, 27 May 2010 14:02:10 +1000 From: Dave Chinner To: Nick Piggin Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, xfs@oss.sgi.com Subject: Re: [PATCH 1/5] inode: Make unused inode LRU per superblock Message-ID: <20100527040210.GI12087@dastard> References: <1274777588-21494-1-git-send-email-david@fromorbit.com> <1274777588-21494-2-git-send-email-david@fromorbit.com> <20100526161732.GC22536@laptop> <20100526230129.GA1395@dastard> <20100527020445.GF22536@laptop> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20100527020445.GF22536@laptop> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 27, 2010 at 12:04:45PM +1000, Nick Piggin wrote: > On Thu, May 27, 2010 at 09:01:29AM +1000, Dave Chinner wrote: > > On Thu, May 27, 2010 at 02:17:33AM +1000, Nick Piggin wrote: > > > On Tue, May 25, 2010 at 06:53:04PM +1000, Dave Chinner wrote: > > > > From: Dave Chinner > > > > > > > > The inode unused list is currently a global LRU. This does not match > > > > the other global filesystem cache - the dentry cache - which uses > > > > per-superblock LRU lists. Hence we have related filesystem object > > > > types using different LRU reclaimatin schemes. > > > > > > Is this an improvement I wonder? The dcache is using per sb lists > > > because it specifically requires sb traversal. > > > > Right - I originally implemented the per-sb dentry lists for > > scalability purposes. i.e. to avoid monopolising the dentry_lock > > during unmount looking for dentries on a specific sb and hanging the > > system for several minutes. > > > > However, the reason for doing this to the inode cache is not for > > scalability, it's because we have a tight relationship between the > > dentry and inode cacheѕ. That is, reclaim from the dentry LRU grows > > the inode LRU. Like the registration of the shrinkers, this is kind > > of an implicit, undocumented behavour of the current shrinker > > implemenation. > > Right, that's why I wonder whether it is an improvement. It would > be interesting to see some tests (showing at least parity). I've done some testing showing parity. They've been along the lines of: - populate cache with 1m dentries + inodes - run 'time echo 2 > /proc/sys/vm/drop_caches' I've used different methods of populating the caches to have them non-sequential in the LRU (i.e. trigger fragmentation), have dirty backing inodes (e.g. the VFS inode clean, the xfs inode dirty because transactions haven't completed), etc. The variation on the test is around +-10%, with the per-sb shrinkers averaging about 5% lower time to reclaim. This is within the error margin of the test, so it's not really a conclusive win, but it is certainly shows that it does not slow anything down. If you've got a better way to test it, then I'm all ears.... > > What this patch series does is take that implicit relationship and > > make it explicit. It also allows other filesystem caches to tie > > into the relationship if they need to (e.g. the XFS inode cache). > > What it _doesn't do_ is change the macro level behaviour of the > > shrinkers... > > > > > What allocation/reclaim really wants (for good scalability and NUMA > > > characteristics) is per-zone lists for these things. It's easy to > > > convert a single list into per-zone lists. > > > > > > It is much harder to convert per-sb lists into per-sb x per-zone lists. > > > > No it's not. Just convert the s_{dentry,inode}_lru lists on each > > superblock and call the shrinker with a new zone mask field to pick > > the correct LRU. That's no harder than converting a global LRU. > > Anyway, you'd still have to do per-sb x per-zone lists for the dentry LRUs, > > so changing the inode cache to per-sb makes no difference. > > Right, it just makes it harder to do. By much harder, I did mostly mean > the extra memory overhead. You've still got to allocate that extra memory on the per-sb dentry LRUs so it's not really a valid argument. IOWs, if it's too much memory for per-sb inode LRUs, then it's too much memory for the per-sb dentry LRUs as well... > If there is *no* benefit from doing per-sb > icache then I would question whether we should. The same vague questions wondering about the benefit of per-sb dentry LRUs were raised when I first proposed them years ago, and look where we are now. Besides, focussing on whether this one patch is a benefit or not is really missing the point because it's the benefits of this patchset as a whole that need to be considered.... Cheers, Dave. -- Dave Chinner david@fromorbit.com From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dave Chinner Subject: Re: [PATCH 1/5] inode: Make unused inode LRU per superblock Date: Thu, 27 May 2010 14:02:10 +1000 Message-ID: <20100527040210.GI12087@dastard> References: <1274777588-21494-1-git-send-email-david@fromorbit.com> <1274777588-21494-2-git-send-email-david@fromorbit.com> <20100526161732.GC22536@laptop> <20100526230129.GA1395@dastard> <20100527020445.GF22536@laptop> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, xfs@oss.sgi.com To: Nick Piggin Return-path: Content-Disposition: inline In-Reply-To: <20100527020445.GF22536@laptop> Sender: owner-linux-mm@kvack.org List-Id: linux-fsdevel.vger.kernel.org On Thu, May 27, 2010 at 12:04:45PM +1000, Nick Piggin wrote: > On Thu, May 27, 2010 at 09:01:29AM +1000, Dave Chinner wrote: > > On Thu, May 27, 2010 at 02:17:33AM +1000, Nick Piggin wrote: > > > On Tue, May 25, 2010 at 06:53:04PM +1000, Dave Chinner wrote: > > > > From: Dave Chinner > > > >=20 > > > > The inode unused list is currently a global LRU. This does not ma= tch > > > > the other global filesystem cache - the dentry cache - which uses > > > > per-superblock LRU lists. Hence we have related filesystem object > > > > types using different LRU reclaimatin schemes. > > >=20 > > > Is this an improvement I wonder? The dcache is using per sb lists > > > because it specifically requires sb traversal. > >=20 > > Right - I originally implemented the per-sb dentry lists for > > scalability purposes. i.e. to avoid monopolising the dentry_lock > > during unmount looking for dentries on a specific sb and hanging the > > system for several minutes. > >=20 > > However, the reason for doing this to the inode cache is not for > > scalability, it's because we have a tight relationship between the > > dentry and inode cache=D1=95. That is, reclaim from the dentry LRU gr= ows > > the inode LRU. Like the registration of the shrinkers, this is kind > > of an implicit, undocumented behavour of the current shrinker > > implemenation. >=20 > Right, that's why I wonder whether it is an improvement. It would > be interesting to see some tests (showing at least parity). I've done some testing showing parity. They've been along the lines of: - populate cache with 1m dentries + inodes - run 'time echo 2 > /proc/sys/vm/drop_caches' I've used different methods of populating the caches to have them non-sequential in the LRU (i.e. trigger fragmentation), have dirty backing inodes (e.g. the VFS inode clean, the xfs inode dirty because transactions haven't completed), etc. The variation on the test is around +-10%, with the per-sb shrinkers averaging about 5% lower time to reclaim. This is within the error margin of the test, so it's not really a conclusive win, but it is certainly shows that it does not slow anything down. If you've got a better way to test it, then I'm all ears.... > > What this patch series does is take that implicit relationship and > > make it explicit. It also allows other filesystem caches to tie > > into the relationship if they need to (e.g. the XFS inode cache). > > What it _doesn't do_ is change the macro level behaviour of the > > shrinkers... > >=20 > > > What allocation/reclaim really wants (for good scalability and NUMA > > > characteristics) is per-zone lists for these things. It's easy to > > > convert a single list into per-zone lists. > > > > > > It is much harder to convert per-sb lists into per-sb x per-zone li= sts. > >=20 > > No it's not. Just convert the s_{dentry,inode}_lru lists on each > > superblock and call the shrinker with a new zone mask field to pick > > the correct LRU. That's no harder than converting a global LRU. > > Anyway, you'd still have to do per-sb x per-zone lists for the dentry= LRUs, > > so changing the inode cache to per-sb makes no difference. >=20 > Right, it just makes it harder to do. By much harder, I did mostly mean > the extra memory overhead. You've still got to allocate that extra memory on the per-sb dentry LRUs so it's not really a valid argument. IOWs, if it's too much memory for per-sb inode LRUs, then it's too much memory for the per-sb dentry LRUs as well... > If there is *no* benefit from doing per-sb > icache then I would question whether we should. The same vague questions wondering about the benefit of per-sb dentry LRUs were raised when I first proposed them years ago, and look where we are now. Besides, focussing on whether this one patch is a benefit or not is really missing the point because it's the benefits of this patchset as a whole that need to be considered.... Cheers, Dave. --=20 Dave Chinner david@fromorbit.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with SMTP id 8B314600385 for ; Thu, 27 May 2010 00:02:18 -0400 (EDT) Date: Thu, 27 May 2010 14:02:10 +1000 From: Dave Chinner Subject: Re: [PATCH 1/5] inode: Make unused inode LRU per superblock Message-ID: <20100527040210.GI12087@dastard> References: <1274777588-21494-1-git-send-email-david@fromorbit.com> <1274777588-21494-2-git-send-email-david@fromorbit.com> <20100526161732.GC22536@laptop> <20100526230129.GA1395@dastard> <20100527020445.GF22536@laptop> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20100527020445.GF22536@laptop> Sender: owner-linux-mm@kvack.org To: Nick Piggin Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, xfs@oss.sgi.com List-ID: On Thu, May 27, 2010 at 12:04:45PM +1000, Nick Piggin wrote: > On Thu, May 27, 2010 at 09:01:29AM +1000, Dave Chinner wrote: > > On Thu, May 27, 2010 at 02:17:33AM +1000, Nick Piggin wrote: > > > On Tue, May 25, 2010 at 06:53:04PM +1000, Dave Chinner wrote: > > > > From: Dave Chinner > > > > > > > > The inode unused list is currently a global LRU. This does not match > > > > the other global filesystem cache - the dentry cache - which uses > > > > per-superblock LRU lists. Hence we have related filesystem object > > > > types using different LRU reclaimatin schemes. > > > > > > Is this an improvement I wonder? The dcache is using per sb lists > > > because it specifically requires sb traversal. > > > > Right - I originally implemented the per-sb dentry lists for > > scalability purposes. i.e. to avoid monopolising the dentry_lock > > during unmount looking for dentries on a specific sb and hanging the > > system for several minutes. > > > > However, the reason for doing this to the inode cache is not for > > scalability, it's because we have a tight relationship between the > > dentry and inode cacheN?. That is, reclaim from the dentry LRU grows > > the inode LRU. Like the registration of the shrinkers, this is kind > > of an implicit, undocumented behavour of the current shrinker > > implemenation. > > Right, that's why I wonder whether it is an improvement. It would > be interesting to see some tests (showing at least parity). I've done some testing showing parity. They've been along the lines of: - populate cache with 1m dentries + inodes - run 'time echo 2 > /proc/sys/vm/drop_caches' I've used different methods of populating the caches to have them non-sequential in the LRU (i.e. trigger fragmentation), have dirty backing inodes (e.g. the VFS inode clean, the xfs inode dirty because transactions haven't completed), etc. The variation on the test is around +-10%, with the per-sb shrinkers averaging about 5% lower time to reclaim. This is within the error margin of the test, so it's not really a conclusive win, but it is certainly shows that it does not slow anything down. If you've got a better way to test it, then I'm all ears.... > > What this patch series does is take that implicit relationship and > > make it explicit. It also allows other filesystem caches to tie > > into the relationship if they need to (e.g. the XFS inode cache). > > What it _doesn't do_ is change the macro level behaviour of the > > shrinkers... > > > > > What allocation/reclaim really wants (for good scalability and NUMA > > > characteristics) is per-zone lists for these things. It's easy to > > > convert a single list into per-zone lists. > > > > > > It is much harder to convert per-sb lists into per-sb x per-zone lists. > > > > No it's not. Just convert the s_{dentry,inode}_lru lists on each > > superblock and call the shrinker with a new zone mask field to pick > > the correct LRU. That's no harder than converting a global LRU. > > Anyway, you'd still have to do per-sb x per-zone lists for the dentry LRUs, > > so changing the inode cache to per-sb makes no difference. > > Right, it just makes it harder to do. By much harder, I did mostly mean > the extra memory overhead. You've still got to allocate that extra memory on the per-sb dentry LRUs so it's not really a valid argument. IOWs, if it's too much memory for per-sb inode LRUs, then it's too much memory for the per-sb dentry LRUs as well... > If there is *no* benefit from doing per-sb > icache then I would question whether we should. The same vague questions wondering about the benefit of per-sb dentry LRUs were raised when I first proposed them years ago, and look where we are now. Besides, focussing on whether this one patch is a benefit or not is really missing the point because it's the benefits of this patchset as a whole that need to be considered.... Cheers, Dave. -- Dave Chinner david@fromorbit.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org