From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <xfs-bounces@oss.sgi.com>
Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11])
	by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id
	o4S5H7Oj196346 for <xfs@oss.sgi.com>; Fri, 28 May 2010 00:17:07 -0500
Received: from mx2.suse.de (localhost [127.0.0.1])
	by cuda.sgi.com (Spam Firewall) with ESMTP id 84098130C1A0
	for <xfs@oss.sgi.com>; Thu, 27 May 2010 22:21:22 -0700 (PDT)
Received: from mx2.suse.de (cantor2.suse.de [195.135.220.15]) by cuda.sgi.com
	with ESMTP id 8IlAKx3OwH4b0pAq for <xfs@oss.sgi.com>;
	Thu, 27 May 2010 22:21:22 -0700 (PDT)
Date: Fri, 28 May 2010 15:19:24 +1000
From: Nick Piggin <npiggin@suse.de>
Subject: Re: [PATCH 3/5] superblock: introduce per-sb cache shrinker
	infrastructure
Message-ID: <20100528051924.GZ22536@laptop>
References: <1274777588-21494-1-git-send-email-david@fromorbit.com>
	<1274777588-21494-4-git-send-email-david@fromorbit.com>
	<20100527063523.GJ22536@laptop> <20100527224034.GO12087@dastard>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20100527224034.GO12087@dastard>
List-Id: XFS Filesystem from SGI <xfs.oss.sgi.com>
List-Unsubscribe: <http://oss.sgi.com/mailman/options/xfs>,
	<mailto:xfs-request@oss.sgi.com?subject=unsubscribe>
List-Archive: <http://oss.sgi.com/pipermail/xfs>
List-Post: <mailto:xfs@oss.sgi.com>
List-Help: <mailto:xfs-request@oss.sgi.com?subject=help>
List-Subscribe: <http://oss.sgi.com/mailman/listinfo/xfs>,
	<mailto:xfs-request@oss.sgi.com?subject=subscribe>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Sender: xfs-bounces@oss.sgi.com
Errors-To: xfs-bounces@oss.sgi.com
To: Dave Chinner <david@fromorbit.com>
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, xfs@oss.sgi.com

T24gRnJpLCBNYXkgMjgsIDIwMTAgYXQgMDg6NDA6MzRBTSArMTAwMCwgRGF2ZSBDaGlubmVyIHdy
b3RlOgo+IE9uIFRodSwgTWF5IDI3LCAyMDEwIGF0IDA0OjM1OjIzUE0gKzEwMDAsIE5pY2sgUGln
Z2luIHdyb3RlOgo+ID4gQnV0IHdlIGNhbiB0aGluayBvZiBpbm9kZXMgdGhhdCBhcmUgb25seSBp
biB1c2UgYnkgdW51c2VkIChhbmQgYWdlZCkKPiA+IGRlbnRyaWVzIGFzIGVmZmVjdGl2ZWx5IHVu
dXNlZCB0aGVtc2VsdmVzLiBTbyB0aGlzIHNlcXVlbmNlIHVuZGVyCj4gPiBlc3RpbWF0ZXMgaG93
IG1hbnkgaW5vZGVzIHRvIHNjYW4uIFRoaXMgY291bGQgYmlhcyBwcmVzc3VyZSBhZ2FpbnN0Cj4g
PiBkY2FjaGUgSSdkIHRoaW5rLCBlc3BlY2lhbGx5IGNvbnNpZGVyaW5nIGlub2RlcyBhcmUgZmFy
IGxhcmdlciB0aGFuCj4gPiBkZW50cmllcy4gTWF5YmUgcmVxdWlyZSAyIHBhc3NlcyB0byBnZXQg
dGhlIGlub2RlcyB1bnVzZWQgaW50aGUKPiA+IGZpcnN0IHBhc3MuCj4gCj4gSXQncyBzZWxmLWJh
bGFuY2luZyAtIGl0IHRyZW5kcyB0b3dhcmRzIGFuIGVxdWFsIG51bWJlciBvZiB1bnVzZWQKPiBk
ZW50cmllcyBhbmQgaW5vZGVzIGluIHRoZSBjYWNoZXMuIFllcywgaXQgd2lsbCB0ZWFyIGRvd24g
bW9yZQo+IGRlbnRyaWVzIGF0IGZpcnN0LCBidXQgd2UgbmVlZCB0byBkbyB0aGF0IHRvIGJlIGFi
bGUgdG8gcmVjbGFpbQo+IGlub2Rlcy4KCkJ1dCB0aGVuIGl0IGRvZXNuJ3Qgc2NhbiBlbm91Z2gg
aW5vZGVzIG9uIHRoZSBpbm9kZSBwYXNzLgoKCj4g4oSrcyByZWNsYWltIHByb2dyZXNzZXMgdGhl
IHByb3BvdGlvbiBvZiBpbm9kZXMgaW5jcmVhc2VzLCBzbwo+IHRoZSBhbW91bnQgb2YgaW5vZGVz
IHJlY2xhaW1lZCBpbmNyZWFzZXMuIAo+IAo+IEJhc2ljYWxseSB0aGlzIGlzIGEgcmVjb2duaXRp
b24gdGhhdCB0aGUgaW1wb3J0YW50IGNhY2hlIGZvcgo+IGF2b2lkaW5nIElPIGlzIHRoZSBpbm9k
ZSBjYWNoZSwgbm90IGhlIGRlbnRyeSBjYWNoZS4gT25jZSB0aGUgaW5vZGUKCllvdSBjYW4gYmlh
cyBhZ2FpbnN0IHRoZSBkY2FjaGUgdXNpbmcgbXVsdGlwbGllcnMuCgoKPiBjYWNoZSBpcyBmcmVl
ZCB0aGF0IHdlIG5lZWQgdG8gZG8gSU8gdG8gcmVwb3B1bGF0ZSBpdCwgYnV0Cj4gcmVidWlsZGlu
ZyBkZW50cmllcyBmcm9tdGVoIGlub2RlIGNhY2hlIG9ubHkgY29zdHMgQ1BVIHRpbWUuIEhlbmNl
Cj4gdW5kZXIgbGlnaHQgcmVjbGFpbSwgaW5vZGVzIGFyZSBtb3N0bHkgbGVmdCBpbiBjYWNoZSBi
dXQgd2UgZnJlZSB1cAo+IG1lbW9yeSB0aGF0IG9ubHkgY29zdHMgQ1BVIHRvIHJlYnVpbGQuIFVu
ZGVyIGhlYXZ5LCBzdXN0YWluZWQKPiByZWNsYWltLCB3ZSB0cmVuZCB0b3dhcmRzIGZyZWVpbmcg
ZXF1YWwgYW1vdW50cyBvZiBvYmplY3RzIGZyb20gYm90aAo+IGNhY2hlcy4KCkkgZG9uJ3Qga25v
dyBpZiB5b3UndmUgZ290IG51bWJlcnMgb3IgcGF0dGVybnMgdG8ganVzdGlmeSB0aGF0LgpNeSBw
b2ludCBpcyB0aGF0IHRoaW5ncyBzaG91bGQgc3RheSBhcyBjbG9zZSB0byB0aGUgb2xkIGNvZGUg
YXMKcG9zc2libGUgd2l0aG91dCBnb29kIHJlYXNvbi4KCiAKPiBUaGlzIGlzIHByZXR0eSBtdWNo
IHdoYXQgdGhlIGN1cnJlbnQgY29kZSBhdHRlbXB0cyB0byBkbyAtIGZyZWUgYQo+IGxvdCBvZiBk
ZW50cmllcywgdGhlbiBmcmVlIGEgc21hbGxlciBhbW91bnQgb2YgdGhlIGlub2RlcyB0aGF0IHdl
cmUKPiB1c2VkIGJ5IHRoZSBmcmVlZCBkZW50cmllcy4gT25jZSBhZ2FpbiBpdCBpcyBhIGRpcmVj
dCBlbmNvZGluZyBvZgo+IHdoYXQgaXMgY3VycmVudGx5IGFuIGltcGxpY2l0IGRlc2lnbiBmZWF0
dXJlIC0gaXQgbWFrZXMgaXQgKm9idmlvdXMqCj4gaG93IHdlIGFyZSB0cnlpbmcgdG8gYmFsYW5j
ZSB0aGUgY2FjaGVzLgoKV2l0aCB5b3VyIHBhdGNoZXMsIGlmIHRoZXJlIGFyZSBubyBpbm9kZXMg
ZnJlZSB5b3Ugd291bGQgbmVlZCB0byB0YWtlCjIgcGFzc2VzIGF0IGZyZWVpbmcgdGhlIGRlbnRy
eSBjYWNoZS4gTXkgc3VnZ2VzdGlvbiBpcyBjbG9zZXIgdG8gdGhlCmN1cnJlbnQgY29kZS4KIAoK
PiBBbm90aGVyIHJlYXNvbiBmb3IgdGhpcyBpcyB0aGF0IHRoZSBjYWxjdWxhdGlvbiBjaGFuZ2Vz
IGFnYWluIHRvCj4gYWxsb3cgZmlsZXN5c3RlbSBjYWNoZXMgdG8gbW9kaXkgdGhpcyBwcm9wb3J0
aW9uaW5nIGluIHRoZSBuZXh0Cj4gcGF0Y2guLi4uCj4gCj4gRldJVywgdGhpcyBhbHNvIG1ha2Vz
IHdvcmtsb2FkcyB0aGF0IGdlbmVyYXRlIGh1bmRyZWRzIG9mIHRob3VzYW5kcwo+IG9mIG5ldmVy
LXRvLWJlLXVzZWQgYWdhaW4gbmVnYXRpdmUgZGVudHJpZXMgZnJlZSBkY2FjaGUgbWVtb3J5IHJl
YWxseQo+IHF1aWNrbHkgb24gbWVtb3J5IHByZXNzdXJlLi4uCgpUaGF0IHdvdWxkIHN0aWxsIGJl
IHRoZSBjYXNlIGJlY2F1c2UgdXNlZCBpbm9kZXMgYXJlbid0IGdldHRpbmcgdGhlaXIKZGVudHJp
ZXMgZnJlZWQgc28gbGl0dGxlIGlub2RlIHNjYW5uaW5nIHdpbGwgb2NjdXIuCgo+IAo+ID4gUGFy
dCBvZiB0aGUgcHJvYmxlbSBpcyB0aGUgZnVubnkgc2hyaW5rZXIgQVBJLgo+ID4gCj4gPiBUaGUg
cmlnaHQgd2F5IHRvIGRvIGl0IGlzIHRvIGNoYW5nZSB0aGUgc2hyaW5rZXIgQVBJIHNvIHRoYXQg
aXQgcGFzc2VzCj4gPiBkb3duIHRoZSBscnVfcGFnZXMgYW5kIHNjYW5uZWQgaW50byB0aGUgY2Fs
bGJhY2suIEZyb20gdGhlcmUsIHRoZQo+ID4gc2hyaW5rZXJzIGNhbiBjYWxjdWxhdGUgdGhlIGFw
cHJvcHJpYXRlIHJhdGlvIG9mIG9iamVjdHMgdG8gc2Nhbi4KPiA+IE5vIG5lZWQgZm9yIDItY2Fs
bCBzY2hlbWUsIG5vIG5lZWQgZm9yIHNocmlua2VyLT5zZWVrcywgYW5kIHRoZQo+ID4gYWJpbGl0
eSB0byBjYWxjdWxhdGUgYW4gYXBwcm9wcmlhdGUgcmF0aW8gZmlyc3QgZm9yIGRjYWNoZSwgYW5k
ICp0aGVuKgo+ID4gZm9yIGljYWNoZS4KPiAKPiBNeSBvbmx5IGNvbmNlcm4gYWJvdXQgdGhpcyBp
cyB0aGF0IGV4cG9zZXMgdGhlIGlubmVyIHdvcmtpbmdzIG9mIHRoZQo+IHNocmlua2VyIGFuZCBt
bSBzdWJzeXN0ZW0gdG8gY29kZSB0aGF0IHNpbXBseSBkb2Vzbid0IG5lZWQgdG8ga25vdwo+IGFi
b3V0IGl0LgoKSXQncyBqdXN0IHByb3ZpZGluZyBhIHJhdGlvLiBUaGUgc2hyaW5rZXJzIGFsbHJl
YWR5IGtub3cgdGhleSBhcmUKc2Nhbm5pbmcgYmFzZWQgb24gYSByYXRpbyBvZiBwYWdlY2FjaGUg
c2Nhbm5lZC4KCgo+ID4gQSBoZWxwZXIgb2YgY291cnNlIGNhbiBkbyB0aGUgY2FsY3VsYXRpb24g
KGNvbnNpZGVyaW5nIHRoYXQgZXZlcnkKPiA+IGRyaXZlciBhbmQgdGhlaXIgZG9nIHdpbGwgZG8g
dGhlIHdyb25nIHRoaW5nIGlmIHdlIGxldCB0aGVtIDopKS4KPiA+IAo+ID4gdW5zaWduZWQgbG9u
ZyBzaHJpbmtlcl9zY2FuKHVuc2lnbmVkIGxvbmcgbHJ1X3BhZ2VzLAo+ID4gCQkJdW5zaWduZWQg
bG9uZyBscnVfc2Nhbm5lZCwKPiA+IAkJCXVuc2lnbmVkIGxvbmcgbnJfb2JqZWN0cywKPiA+IAkJ
CXVuc2lnbmVkIGxvbmcgc2Nhbl9yYXRpbykKPiA+IHsKPiA+IAl1bnNpZ25lZCBsb25nIGxvbmcg
dG1wID0gbnJfb2JqZWN0czsKPiA+IAo+ID4gCXRtcCAqPSBscnVfc2Nhbm5lZCAqIDEwMDsKPiA+
IAlkb19kaXYodG1wLCAobHJ1X3BhZ2VzICogc2Nhbl9yYXRpbykgKyAxKTsKPiA+IAo+ID4gCXJl
dHVybiAodW5zaWduZWQgbG9uZyl0bXA7Cj4gPiB9Cj4gPiAKPiA+IFRoZW4gdGhlIHNocmlua2Vy
IGNhbGxiYWNrIHdpbGwgZ286Cj4gPiAJc2ItPnNfbnJfZGVudHJ5X3NjYW4gKz0gc2hyaW5rZXJf
c2NhbihscnVfcGFnZXMsIGxydV9zY2FubmVkLAo+ID4gCQkJCXNiLT5zX25yX2RlbnRyeV91bnVz
ZWQsCj4gPiAJCQkJdmZzX2NhY2hlX3ByZXNzdXJlICogU0VFS1NfUEVSX0RFTlRSWSk7Cj4gPiAJ
aWYgKHNiLT5zX25yX2RlbnRyeV9zY2FuID4gU0hSSU5LX0JBVENIKQo+ID4gCQlwcnVuZV9kY2Fj
aGUoKQo+ID4gCj4gPiAJc2ItPnNfbnJfaW5vZGVfc2NhbiArPSBzaHJpbmtlcl9zY2FuKGxydV9w
YWdlcywgbHJ1X3NjYW5uZWQsCj4gPiAJCQkJc2ItPnNfbnJfaW5vZGVzX3VudXNlZCwKPiA+IAkJ
CQl2ZnNfY2FjaGVfcHJlc3N1cmUgKiBTRUVLU19QRVJfSU5PREUpOwo+ID4gCS4uLgo+ID4gCj4g
PiBXaGF0IGRvIHlvdSB0aGluayBvZiB0aGF0PyBTZWVpbmcgYXMgd2UncmUgY2hhbmdpbmcgdGhl
IHNocmlua2VyIEFQSQo+ID4gYW55d2F5LCBJJ2QgdGhpbmsgaXQgaXMgaGlnaCB0aW1lIHRvIGRv
IHNvbXRoaW5nIGxpa2UgdGhpcy4KPiAKPiBJZ25vcmluZyB0aGUgZGNhY2hlL2ljYWNoZSByZWNs
YWltIHJhdGlvIGlzc3VlcywgSSdkIHByZWZlciBhIHR3bwoKV2VsbCBpZiBpdCBpcyBhbiBpc3N1
ZSwgaXQgc2hvdWxkIGJlIGNoYW5nZWQgaW4gYSBkaWZmZXJlbnQgcGF0Y2gKSSB0aGluayAod2l0
aCBudW1iZXJzKS4KCgo+IGNhbGwgQVBJIHRoYXQgbWF0Y2hlcyB0aGUgY3VycmVudCBiZWhhdmlv
dXIsIGxlYXZpbmcgdGhlIGNhY2x1bGF0aW9uCj4gb2YgaG93IG11Y2ggdG8gcmVjbGFpbSBpbiBz
aHJpbmtfc2xhYigpLiBFbmNvZGluZyBpdCB0aGlzIHdheSBtYWtlcwo+IGl0IG1vcmUgZGlmZmlj
dWx0IHRvIGNoYW5nZSB0aGUgaGlnaCBsZXZlbCBiZWhhdmlvdXIgZS5nLiBpZiB3ZSB3YW50Cj4g
dG8gbW9kaWZ5IHRoZSBhbW91bnQgb2Ygc2xhYiByZWNsYWltIGJhc2VkIG9uIHJlY2xhaW0gcHJp
b3JpdHksIHdlJ2QKPiBoYXZlIHRvIGNhaG5nZSBldmVyeSBzaHJpbmtlciBpbnN0ZWFkIG9mIGp1
c3Qgc2hyaW5rX3NsYWIoKS4KCldlIGNhbiBtb2RpZml5IHRoZSByYXRpb3MgYmVmb3JlIGNhbGxp
bmcgaWYgbmVlZGVkLCBvciBoYXZlIGEgZGVmYXVsdApyYXRpbyBkZWZpbmUgdG8gbXVsdGlwbHkg
d2l0aCBhcyB3ZWxsLgoKQnV0IHNocmlua2VycyBhcmUgdmVyeSBzdWJzeXN0ZW0gc3BlY2lmaWMu
CgpfX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fXwp4ZnMgbWFp
bGluZyBsaXN0Cnhmc0Bvc3Muc2dpLmNvbQpodHRwOi8vb3NzLnNnaS5jb20vbWFpbG1hbi9saXN0
aW5mby94ZnMK

From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754009Ab0E1FT3 (ORCPT <rfc822;w@1wt.eu>);
	Fri, 28 May 2010 01:19:29 -0400
Received: from cantor2.suse.de ([195.135.220.15]:48576 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751260Ab0E1FT2 (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Fri, 28 May 2010 01:19:28 -0400
Date: Fri, 28 May 2010 15:19:24 +1000
From: Nick Piggin <npiggin@suse.de>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
       linux-mm@kvack.org, xfs@oss.sgi.com
Subject: Re: [PATCH 3/5] superblock: introduce per-sb cache shrinker
 infrastructure
Message-ID: <20100528051924.GZ22536@laptop>
References: <1274777588-21494-1-git-send-email-david@fromorbit.com>
 <1274777588-21494-4-git-send-email-david@fromorbit.com>
 <20100527063523.GJ22536@laptop>
 <20100527224034.GO12087@dastard>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20100527224034.GO12087@dastard>
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 28, 2010 at 08:40:34AM +1000, Dave Chinner wrote:
> On Thu, May 27, 2010 at 04:35:23PM +1000, Nick Piggin wrote:
> > But we can think of inodes that are only in use by unused (and aged)
> > dentries as effectively unused themselves. So this sequence under
> > estimates how many inodes to scan. This could bias pressure against
> > dcache I'd think, especially considering inodes are far larger than
> > dentries. Maybe require 2 passes to get the inodes unused inthe
> > first pass.
> 
> It's self-balancing - it trends towards an equal number of unused
> dentries and inodes in the caches. Yes, it will tear down more
> dentries at first, but we need to do that to be able to reclaim
> inodes.

But then it doesn't scan enough inodes on the inode pass.


> Ås reclaim progresses the propotion of inodes increases, so
> the amount of inodes reclaimed increases. 
> 
> Basically this is a recognition that the important cache for
> avoiding IO is the inode cache, not he dentry cache. Once the inode

You can bias against the dcache using multipliers.


> cache is freed that we need to do IO to repopulate it, but
> rebuilding dentries fromteh inode cache only costs CPU time. Hence
> under light reclaim, inodes are mostly left in cache but we free up
> memory that only costs CPU to rebuild. Under heavy, sustained
> reclaim, we trend towards freeing equal amounts of objects from both
> caches.

I don't know if you've got numbers or patterns to justify that.
My point is that things should stay as close to the old code as
possible without good reason.

 
> This is pretty much what the current code attempts to do - free a
> lot of dentries, then free a smaller amount of the inodes that were
> used by the freed dentries. Once again it is a direct encoding of
> what is currently an implicit design feature - it makes it *obvious*
> how we are trying to balance the caches.

With your patches, if there are no inodes free you would need to take
2 passes at freeing the dentry cache. My suggestion is closer to the
current code.
 

> Another reason for this is that the calculation changes again to
> allow filesystem caches to modiy this proportioning in the next
> patch....
> 
> FWIW, this also makes workloads that generate hundreds of thousands
> of never-to-be-used again negative dentries free dcache memory really
> quickly on memory pressure...

That would still be the case because used inodes aren't getting their
dentries freed so little inode scanning will occur.

> 
> > Part of the problem is the funny shrinker API.
> > 
> > The right way to do it is to change the shrinker API so that it passes
> > down the lru_pages and scanned into the callback. From there, the
> > shrinkers can calculate the appropriate ratio of objects to scan.
> > No need for 2-call scheme, no need for shrinker->seeks, and the
> > ability to calculate an appropriate ratio first for dcache, and *then*
> > for icache.
> 
> My only concern about this is that exposes the inner workings of the
> shrinker and mm subsystem to code that simply doesn't need to know
> about it.

It's just providing a ratio. The shrinkers allready know they are
scanning based on a ratio of pagecache scanned.


> > A helper of course can do the calculation (considering that every
> > driver and their dog will do the wrong thing if we let them :)).
> > 
> > unsigned long shrinker_scan(unsigned long lru_pages,
> > 			unsigned long lru_scanned,
> > 			unsigned long nr_objects,
> > 			unsigned long scan_ratio)
> > {
> > 	unsigned long long tmp = nr_objects;
> > 
> > 	tmp *= lru_scanned * 100;
> > 	do_div(tmp, (lru_pages * scan_ratio) + 1);
> > 
> > 	return (unsigned long)tmp;
> > }
> > 
> > Then the shrinker callback will go:
> > 	sb->s_nr_dentry_scan += shrinker_scan(lru_pages, lru_scanned,
> > 				sb->s_nr_dentry_unused,
> > 				vfs_cache_pressure * SEEKS_PER_DENTRY);
> > 	if (sb->s_nr_dentry_scan > SHRINK_BATCH)
> > 		prune_dcache()
> > 
> > 	sb->s_nr_inode_scan += shrinker_scan(lru_pages, lru_scanned,
> > 				sb->s_nr_inodes_unused,
> > 				vfs_cache_pressure * SEEKS_PER_INODE);
> > 	...
> > 
> > What do you think of that? Seeing as we're changing the shrinker API
> > anyway, I'd think it is high time to do somthing like this.
> 
> Ignoring the dcache/icache reclaim ratio issues, I'd prefer a two

Well if it is an issue, it should be changed in a different patch
I think (with numbers).


> call API that matches the current behaviour, leaving the caclulation
> of how much to reclaim in shrink_slab(). Encoding it this way makes
> it more difficult to change the high level behaviour e.g. if we want
> to modify the amount of slab reclaim based on reclaim priority, we'd
> have to cahnge every shrinker instead of just shrink_slab().

We can modifiy the ratios before calling if needed, or have a default
ratio define to multiply with as well.

But shrinkers are very subsystem specific.


From mboxrd@z Thu Jan  1 00:00:00 1970
From: Nick Piggin <npiggin@suse.de>
Subject: Re: [PATCH 3/5] superblock: introduce per-sb cache shrinker
 infrastructure
Date: Fri, 28 May 2010 15:19:24 +1000
Message-ID: <20100528051924.GZ22536@laptop>
References: <1274777588-21494-1-git-send-email-david@fromorbit.com>
 <1274777588-21494-4-git-send-email-david@fromorbit.com>
 <20100527063523.GJ22536@laptop>
 <20100527224034.GO12087@dastard>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, xfs@oss.sgi.com
To: Dave Chinner <david@fromorbit.com>
Return-path: <owner-linux-mm@kvack.org>
Content-Disposition: inline
In-Reply-To: <20100527224034.GO12087@dastard>
Sender: owner-linux-mm@kvack.org
List-Id: linux-fsdevel.vger.kernel.org

On Fri, May 28, 2010 at 08:40:34AM +1000, Dave Chinner wrote:
> On Thu, May 27, 2010 at 04:35:23PM +1000, Nick Piggin wrote:
> > But we can think of inodes that are only in use by unused (and aged)
> > dentries as effectively unused themselves. So this sequence under
> > estimates how many inodes to scan. This could bias pressure against
> > dcache I'd think, especially considering inodes are far larger than
> > dentries. Maybe require 2 passes to get the inodes unused inthe
> > first pass.
>=20
> It's self-balancing - it trends towards an equal number of unused
> dentries and inodes in the caches. Yes, it will tear down more
> dentries at first, but we need to do that to be able to reclaim
> inodes.

But then it doesn't scan enough inodes on the inode pass.


> =E2=84=ABs reclaim progresses the propotion of inodes increases, so
> the amount of inodes reclaimed increases.=20
>=20
> Basically this is a recognition that the important cache for
> avoiding IO is the inode cache, not he dentry cache. Once the inode

You can bias against the dcache using multipliers.


> cache is freed that we need to do IO to repopulate it, but
> rebuilding dentries fromteh inode cache only costs CPU time. Hence
> under light reclaim, inodes are mostly left in cache but we free up
> memory that only costs CPU to rebuild. Under heavy, sustained
> reclaim, we trend towards freeing equal amounts of objects from both
> caches.

I don't know if you've got numbers or patterns to justify that.
My point is that things should stay as close to the old code as
possible without good reason.

=20
> This is pretty much what the current code attempts to do - free a
> lot of dentries, then free a smaller amount of the inodes that were
> used by the freed dentries. Once again it is a direct encoding of
> what is currently an implicit design feature - it makes it *obvious*
> how we are trying to balance the caches.

With your patches, if there are no inodes free you would need to take
2 passes at freeing the dentry cache. My suggestion is closer to the
current code.
=20

> Another reason for this is that the calculation changes again to
> allow filesystem caches to modiy this proportioning in the next
> patch....
>=20
> FWIW, this also makes workloads that generate hundreds of thousands
> of never-to-be-used again negative dentries free dcache memory really
> quickly on memory pressure...

That would still be the case because used inodes aren't getting their
dentries freed so little inode scanning will occur.

>=20
> > Part of the problem is the funny shrinker API.
> >=20
> > The right way to do it is to change the shrinker API so that it passe=
s
> > down the lru_pages and scanned into the callback. From there, the
> > shrinkers can calculate the appropriate ratio of objects to scan.
> > No need for 2-call scheme, no need for shrinker->seeks, and the
> > ability to calculate an appropriate ratio first for dcache, and *then=
*
> > for icache.
>=20
> My only concern about this is that exposes the inner workings of the
> shrinker and mm subsystem to code that simply doesn't need to know
> about it.

It's just providing a ratio. The shrinkers allready know they are
scanning based on a ratio of pagecache scanned.


> > A helper of course can do the calculation (considering that every
> > driver and their dog will do the wrong thing if we let them :)).
> >=20
> > unsigned long shrinker_scan(unsigned long lru_pages,
> > 			unsigned long lru_scanned,
> > 			unsigned long nr_objects,
> > 			unsigned long scan_ratio)
> > {
> > 	unsigned long long tmp =3D nr_objects;
> >=20
> > 	tmp *=3D lru_scanned * 100;
> > 	do_div(tmp, (lru_pages * scan_ratio) + 1);
> >=20
> > 	return (unsigned long)tmp;
> > }
> >=20
> > Then the shrinker callback will go:
> > 	sb->s_nr_dentry_scan +=3D shrinker_scan(lru_pages, lru_scanned,
> > 				sb->s_nr_dentry_unused,
> > 				vfs_cache_pressure * SEEKS_PER_DENTRY);
> > 	if (sb->s_nr_dentry_scan > SHRINK_BATCH)
> > 		prune_dcache()
> >=20
> > 	sb->s_nr_inode_scan +=3D shrinker_scan(lru_pages, lru_scanned,
> > 				sb->s_nr_inodes_unused,
> > 				vfs_cache_pressure * SEEKS_PER_INODE);
> > 	...
> >=20
> > What do you think of that? Seeing as we're changing the shrinker API
> > anyway, I'd think it is high time to do somthing like this.
>=20
> Ignoring the dcache/icache reclaim ratio issues, I'd prefer a two

Well if it is an issue, it should be changed in a different patch
I think (with numbers).


> call API that matches the current behaviour, leaving the caclulation
> of how much to reclaim in shrink_slab(). Encoding it this way makes
> it more difficult to change the high level behaviour e.g. if we want
> to modify the amount of slab reclaim based on reclaim priority, we'd
> have to cahnge every shrinker instead of just shrink_slab().

We can modifiy the ratios before calling if needed, or have a default
ratio define to multiply with as well.

But shrinkers are very subsystem specific.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=3Dmailto:"dont@kvack.org"> email@kvack.org </a>

From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3])
	by kanga.kvack.org (Postfix) with SMTP id 2B4596B01BC
	for <linux-mm@kvack.org>; Fri, 28 May 2010 01:19:30 -0400 (EDT)
Date: Fri, 28 May 2010 15:19:24 +1000
From: Nick Piggin <npiggin@suse.de>
Subject: Re: [PATCH 3/5] superblock: introduce per-sb cache shrinker
 infrastructure
Message-ID: <20100528051924.GZ22536@laptop>
References: <1274777588-21494-1-git-send-email-david@fromorbit.com>
 <1274777588-21494-4-git-send-email-david@fromorbit.com>
 <20100527063523.GJ22536@laptop>
 <20100527224034.GO12087@dastard>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20100527224034.GO12087@dastard>
Sender: owner-linux-mm@kvack.org
To: Dave Chinner <david@fromorbit.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, xfs@oss.sgi.com
List-ID: <linux-mm.kvack.org>

On Fri, May 28, 2010 at 08:40:34AM +1000, Dave Chinner wrote:
> On Thu, May 27, 2010 at 04:35:23PM +1000, Nick Piggin wrote:
> > But we can think of inodes that are only in use by unused (and aged)
> > dentries as effectively unused themselves. So this sequence under
> > estimates how many inodes to scan. This could bias pressure against
> > dcache I'd think, especially considering inodes are far larger than
> > dentries. Maybe require 2 passes to get the inodes unused inthe
> > first pass.
> 
> It's self-balancing - it trends towards an equal number of unused
> dentries and inodes in the caches. Yes, it will tear down more
> dentries at first, but we need to do that to be able to reclaim
> inodes.

But then it doesn't scan enough inodes on the inode pass.


> a?<<s reclaim progresses the propotion of inodes increases, so
> the amount of inodes reclaimed increases. 
> 
> Basically this is a recognition that the important cache for
> avoiding IO is the inode cache, not he dentry cache. Once the inode

You can bias against the dcache using multipliers.


> cache is freed that we need to do IO to repopulate it, but
> rebuilding dentries fromteh inode cache only costs CPU time. Hence
> under light reclaim, inodes are mostly left in cache but we free up
> memory that only costs CPU to rebuild. Under heavy, sustained
> reclaim, we trend towards freeing equal amounts of objects from both
> caches.

I don't know if you've got numbers or patterns to justify that.
My point is that things should stay as close to the old code as
possible without good reason.

 
> This is pretty much what the current code attempts to do - free a
> lot of dentries, then free a smaller amount of the inodes that were
> used by the freed dentries. Once again it is a direct encoding of
> what is currently an implicit design feature - it makes it *obvious*
> how we are trying to balance the caches.

With your patches, if there are no inodes free you would need to take
2 passes at freeing the dentry cache. My suggestion is closer to the
current code.
 

> Another reason for this is that the calculation changes again to
> allow filesystem caches to modiy this proportioning in the next
> patch....
> 
> FWIW, this also makes workloads that generate hundreds of thousands
> of never-to-be-used again negative dentries free dcache memory really
> quickly on memory pressure...

That would still be the case because used inodes aren't getting their
dentries freed so little inode scanning will occur.

> 
> > Part of the problem is the funny shrinker API.
> > 
> > The right way to do it is to change the shrinker API so that it passes
> > down the lru_pages and scanned into the callback. From there, the
> > shrinkers can calculate the appropriate ratio of objects to scan.
> > No need for 2-call scheme, no need for shrinker->seeks, and the
> > ability to calculate an appropriate ratio first for dcache, and *then*
> > for icache.
> 
> My only concern about this is that exposes the inner workings of the
> shrinker and mm subsystem to code that simply doesn't need to know
> about it.

It's just providing a ratio. The shrinkers allready know they are
scanning based on a ratio of pagecache scanned.


> > A helper of course can do the calculation (considering that every
> > driver and their dog will do the wrong thing if we let them :)).
> > 
> > unsigned long shrinker_scan(unsigned long lru_pages,
> > 			unsigned long lru_scanned,
> > 			unsigned long nr_objects,
> > 			unsigned long scan_ratio)
> > {
> > 	unsigned long long tmp = nr_objects;
> > 
> > 	tmp *= lru_scanned * 100;
> > 	do_div(tmp, (lru_pages * scan_ratio) + 1);
> > 
> > 	return (unsigned long)tmp;
> > }
> > 
> > Then the shrinker callback will go:
> > 	sb->s_nr_dentry_scan += shrinker_scan(lru_pages, lru_scanned,
> > 				sb->s_nr_dentry_unused,
> > 				vfs_cache_pressure * SEEKS_PER_DENTRY);
> > 	if (sb->s_nr_dentry_scan > SHRINK_BATCH)
> > 		prune_dcache()
> > 
> > 	sb->s_nr_inode_scan += shrinker_scan(lru_pages, lru_scanned,
> > 				sb->s_nr_inodes_unused,
> > 				vfs_cache_pressure * SEEKS_PER_INODE);
> > 	...
> > 
> > What do you think of that? Seeing as we're changing the shrinker API
> > anyway, I'd think it is high time to do somthing like this.
> 
> Ignoring the dcache/icache reclaim ratio issues, I'd prefer a two

Well if it is an issue, it should be changed in a different patch
I think (with numbers).


> call API that matches the current behaviour, leaving the caclulation
> of how much to reclaim in shrink_slab(). Encoding it this way makes
> it more difficult to change the high level behaviour e.g. if we want
> to modify the amount of slab reclaim based on reclaim priority, we'd
> have to cahnge every shrinker instead of just shrink_slab().

We can modifiy the ratios before calling if needed, or have a default
ratio define to multiply with as well.

But shrinkers are very subsystem specific.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>