From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 27 May 2010 12:04:45 +1000
From: Nick Piggin
To: Dave Chinner
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: [PATCH 1/5] inode: Make unused inode LRU per superblock
Message-ID: <20100527020445.GF22536@laptop>
References: <1274777588-21494-1-git-send-email-david@fromorbit.com> <1274777588-21494-2-git-send-email-david@fromorbit.com> <20100526161732.GC22536@laptop> <20100526230129.GA1395@dastard>
In-Reply-To: <20100526230129.GA1395@dastard>

On Thu, May 27, 2010 at 09:01:29AM +1000, Dave Chinner wrote:
> On Thu, May 27, 2010 at 02:17:33AM +1000, Nick Piggin wrote:
> > On Tue, May 25, 2010 at 06:53:04PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > The inode unused list is currently a global LRU. This does not match
> > > the other global filesystem cache - the dentry cache - which uses
> > > per-superblock LRU lists. Hence we have related filesystem object
> > > types using different LRU reclamation schemes.
> >
> > Is this an improvement, I wonder? The dcache is using per-sb lists
> > because it specifically requires sb traversal.
>
> Right - I originally implemented the per-sb dentry lists for
> scalability purposes, i.e. to avoid monopolising the dentry_lock
> during unmount looking for dentries on a specific sb and hanging the
> system for several minutes.
>
> However, the reason for doing this to the inode cache is not for
> scalability, it's because we have a tight relationship between the
> dentry and inode caches. That is, reclaim from the dentry LRU grows
> the inode LRU. Like the registration of the shrinkers, this is kind
> of an implicit, undocumented behaviour of the current shrinker
> implementation.

Right, that's why I wonder whether it is an improvement. It would
be interesting to see some tests (showing at least parity).

> What this patch series does is take that implicit relationship and
> make it explicit. It also allows other filesystem caches to tie
> into the relationship if they need to (e.g. the XFS inode cache).
> What it _doesn't do_ is change the macro-level behaviour of the
> shrinkers...
>
> > What allocation/reclaim really wants (for good scalability and NUMA
> > characteristics) is per-zone lists for these things. It's easy to
> > convert a single list into per-zone lists.
> >
> > It is much harder to convert per-sb lists into per-sb x per-zone lists.
>
> No it's not. Just convert the s_{dentry,inode}_lru lists on each
> superblock and call the shrinker with a new zone mask field to pick
> the correct LRU. That's no harder than converting a global LRU.
> Anyway, you'd still have to do per-sb x per-zone lists for the dentry LRUs,
> so changing the inode cache to per-sb makes no difference.

Right, it just makes it harder to do. By "much harder", I mostly meant
the extra memory overhead. If there is *no* benefit from doing per-sb
icache then I would question whether we should.

> However, this is a moot point because we don't have per-zone shrinker
> interfaces. That's an entirely separate discussion because of the
> macro-level behavioural changes it implies....

Yep. I have some patches for it, but they're currently behind the other
fine-grained locking stuff. But it's something that really needs to be
implemented, IMO.
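
As a rough illustration of the per-sb x per-zone layout discussed in this
thread, here is a minimal userspace sketch in C. It is not kernel code and is
not taken from either author's patches: sketch_sb, sketch_inode,
SKETCH_MAX_ZONES and the zone_mask argument to sketch_sb_shrink() are all
hypothetical names made up for the example. It assumes a superblock that
carries one unused-inode LRU per zone, a shrink call that uses a zone mask to
pick which lists to trim, and a helper showing that list-head overhead scales
with superblocks times zones, which is the memory cost raised above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ZONES 4  /* hypothetical zone count for this sketch */

/* Circular doubly linked list node, in the spirit of the kernel's list_head. */
struct lru_node {
    struct lru_node *prev;
    struct lru_node *next;
};

/* One LRU: a sentinel head plus an item count. */
struct lru_list {
    struct lru_node head;
    size_t nr_items;
};

/* Hypothetical superblock carrying one unused-inode LRU per zone. */
struct sketch_sb {
    struct lru_list inode_lru[SKETCH_MAX_ZONES];
};

/* Hypothetical inode; zone records which zone its memory came from. */
struct sketch_inode {
    struct lru_node lru;
    unsigned int zone;
};

/* Make every per-zone list on this superblock empty. */
static void sketch_sb_init(struct sketch_sb *sb)
{
    for (unsigned int z = 0; z < SKETCH_MAX_ZONES; z++) {
        struct lru_list *l = &sb->inode_lru[z];

        l->head.prev = &l->head;
        l->head.next = &l->head;
        l->nr_items = 0;
    }
}

/* Put an unused inode at the tail (most recently used end) of its zone's list. */
static void sketch_lru_add(struct sketch_sb *sb, struct sketch_inode *inode)
{
    struct lru_list *l;
    struct lru_node *n = &inode->lru;

    assert(inode->zone < SKETCH_MAX_ZONES);
    l = &sb->inode_lru[inode->zone];

    n->prev = l->head.prev;
    n->next = &l->head;
    l->head.prev->next = n;
    l->head.prev = n;
    l->nr_items++;
}

/*
 * Unlink up to nr_to_scan of the oldest inodes from the lists selected by
 * zone_mask (bit z selects zone z). Returns how many were unlinked.
 */
static size_t sketch_sb_shrink(struct sketch_sb *sb, unsigned int zone_mask,
                               size_t nr_to_scan)
{
    size_t freed = 0;

    for (unsigned int z = 0; z < SKETCH_MAX_ZONES; z++) {
        struct lru_list *l = &sb->inode_lru[z];

        if (!(zone_mask & (1u << z)))
            continue;
        while (freed < nr_to_scan && l->nr_items > 0) {
            struct lru_node *oldest = l->head.next;

            /* Detach the oldest entry from the front of the list. */
            l->head.next = oldest->next;
            oldest->next->prev = &l->head;
            oldest->prev = oldest;
            oldest->next = oldest;
            l->nr_items--;
            freed++;
        }
    }
    return freed;
}

/* Bytes spent on list heads alone: grows with superblocks x zones. */
static size_t sketch_lru_head_bytes(size_t nr_sb, size_t nr_zones)
{
    return nr_sb * nr_zones * sizeof(struct lru_list);
}
```

With inodes queued in zones 0 and 1, a call such as
sketch_sb_shrink(&sb, 1u << 1, 1) unlinks only the oldest zone-1 inode and
leaves the zone-0 list untouched. A single global list would need only
nr_zones heads, so the extra cost of the per-sb variant is the factor of
nr_sb in sketch_lru_head_bytes().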